We are building cloud-based LLM products that rewrite consumer-facing messages to improve conversion rates for short-term loan applications.
We are looking for a Data Scientist to own the quantitative side of this system: designing experiments, evaluating model outputs, building classifiers, and translating findings into clear recommendations for the team.
What You Will Do
- Design and analyze A/B tests measuring message performance across conversion funnels
- Build evaluation frameworks for LLM-generated content (automated metrics, human eval, regression testing)
- Develop classifiers to detect and categorize message quality issues before they reach customers
- Engineer features from text and behavioral data to predict applicant engagement
- Monitor production model performance and define alerting and rollback criteria
- Communicate findings clearly in writing to both technical and non-technical stakeholders
Required Skills & Experience
- 3+ years in a data science or ML engineering role
- Strong Python fluency (pandas, scikit-learn, statsmodels, Jupyter)
- Hands-on experience with A/B testing and experimentation design (power analysis, multiple comparisons, confidence intervals)
- Familiarity with NLP / text classification — traditional (TF-IDF, logistic regression) or modern (embeddings, transformers, LLM prompting)
- Experience working with or evaluating large language model outputs
- Strong written communication — you will write analysis narratives, not just code
Nice to Have
- Experience in fintech or lending
- Experience with LLM evaluation methods (ROUGE, BERTScore, LLM-as-judge)
- Production ML experience (monitoring, drift detection, CI/CD for models)
Engagement Details
- Type: Contract (part-time, flexible)
- Location: Remote. We are based in Atlanta, GA (US Eastern Time)
- Duration: Likely 6+ months, with an initial 1-month trial period
How to Apply
Please include in your application:
- A brief description of your most relevant data science project
- A short summary of your experience with experimentation and/or NLP
- Briefly describe one project where you measured whether a change to text or messaging improved a business outcome. What did you test, how did you measure it, and what did you find? (3-5 sentences)
- How do you evaluate whether an LLM is producing good outputs at scale, beyond eyeballing examples? Describe your approach in 3-5 sentences.
- Please confirm: (a) the hours per week you can commit, (b) your hourly rate in USD, (c) your timezone and overlap with US Eastern, and (d) whether you are applying as an individual or as part of an agency.