We are building cloud-based LLM products that rewrite consumer-facing messages to improve conversion rates for short-term loan applications.

We are looking for a Data Scientist to own the quantitative side of this system: designing experiments, evaluating model outputs, building classifiers, and translating findings into clear recommendations for the team.

What You Will Do

Design and analyze A/B tests measuring message performance across conversion funnels
Build evaluation frameworks for LLM-generated content (automated metrics, human eval, regression testing)
Develop classifiers to detect and categorize message quality issues before they reach customers
Engineer features from text and behavioral data to predict applicant engagement
Monitor production model performance and define alerting and rollback criteria
Communicate findings clearly in writing to both technical and non-technical stakeholders

Required Skills & Experience

3+ years in a data science or ML engineering role
Strong Python fluency (pandas, scikit-learn, statsmodels, Jupyter)
Hands-on experience with A/B testing and experimentation design (power analysis, multiple comparisons, confidence intervals)
Familiarity with NLP / text classification — traditional (TF-IDF, logistic regression) or modern (embeddings, transformers, LLM prompting)
Experience working with or evaluating large language model outputs
Strong written communication — you will write analysis narratives, not just code

Nice to Have

Experience in fintech or lending
Experience with LLM evaluation methods (ROUGE, BERTScore, LLM-as-judge)
Production ML experience (monitoring, drift detection, CI/CD for models)

Engagement Details

Type: Contract (part-time, flexible)
Location: Remote. We are based in Atlanta, GA (US Eastern Time)
Duration: Likely 6+ months, with an initial 1-month trial period

How to Apply

Please include in your application:

A brief description of your most relevant data science project
Your experience with experimentation and/or NLP
Briefly describe one project where you measured whether a change to text or messaging improved a business outcome. What did you test, how did you measure it, and what did you find? (3-5 sentences)
How do you evaluate whether an LLM is producing good outputs at scale, beyond eyeballing examples? Describe your approach in 3-5 sentences.
Please confirm: (a) the hours per week you can commit, (b) your hourly rate in USD, (c) your timezone and overlap with US Eastern, and (d) whether you are applying as an individual or as part of an agency.

Part-time Data Scientist: LLM Message Optimization

Upwork