We're a ten‑year‑old SaaS company that started in a cramped garage in Raymore, Missouri, and has since grown into a 200‑person organization serving more than 15,000 small‑business customers across North America. Our product – a real‑time inventory‑visibility platform – lives in the cloud, and the decisions our customers make every day depend on the predictions we generate. That's why we're looking for a senior‑level Remote Data Scientist who can take ownership of the end‑to‑end machine‑learning pipeline, from raw data ingestion to production‑grade model monitoring. The role is remote, but the team still meets once a week on a video call that we all jokingly call "the coffee‑break stand‑up."
In the last twelve months we added two new data sources: a POS stream from a major grocery chain and a fleet of IoT sensors on delivery trucks. Those streams increased our daily data volume by 68 % and opened a new line of business we're calling "Predictive Re‑stock." To turn those streams into actionable insights we need a data scientist who can design, validate, and ship models that run on both AWS and GCP. Our current team of six data engineers and two junior scientists has built a solid feature store, but we lack a senior person who can set technical standards, mentor the junior members, and embed robust governance into the model lifecycle. We've also committed to a new Service Level Agreement (SLA) with a marquee client – 95 % model‑drift detection within 24 hours – and we need your expertise to meet that target.
| Share of time | Activity |
|---|---|
| 20 % | Data exploration & cleansing – write Jupyter notebooks in Python and R to profile the new POS and sensor data, flag anomalies, and document findings in Confluence. |
| 20 % | Feature engineering – design time‑series features using pandas, dask, and Spark, store them in our Snowflake data warehouse, and push them to the feature store managed by Feast (see the pandas sketch just below this table). |
| 20 % | Model development – prototype with scikit‑learn, XGBoost, and TensorFlow; run hyper‑parameter sweeps on Vertex AI (GCP) or SageMaker (AWS) (see the hyper‑parameter‑search sketch below). |
| 15 % | Productionization – containerize models with Docker, orchestrate pipelines in Airflow, and deploy to Kubernetes clusters that auto‑scale based on traffic. |
| 15 % | Monitoring & governance – set up Prometheus alerts, Grafana dashboards, and drift detection using Evidently AI (a bare‑bones drift check is sketched below); write post‑mortems that feed back into the data catalog. |
| 10 % | Mentorship & collaboration – pair‑program with junior scientists, review pull requests on GitHub, and run fortnightly brown‑bag sessions on emerging ML research. |
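To give a flavour of the feature‑engineering work, here is a minimal pandas sketch of the kind of rolling‑window features we build from the POS stream. The column names (`store_id`, `sku`, `ts`, `units_sold`) and the local parquet file are purely illustrative; the real schema lives in Snowflake, and the output is registered in Feast rather than written to disk.

```python
import pandas as pd

# Illustrative sample of the POS stream; the real data is read from Snowflake.
pos = pd.read_parquet("pos_stream_sample.parquet")  # store_id, sku, ts, units_sold

pos["ts"] = pd.to_datetime(pos["ts"])
pos = pos.sort_values("ts").set_index("ts")

# 7-day rolling demand statistics per store/SKU, keyed by timestamp.
features = (
    pos.groupby(["store_id", "sku"])["units_sold"]
       .rolling("7D")
       .agg(["mean", "std", "sum"])
       .rename(columns={"mean": "units_7d_mean",
                        "std": "units_7d_std",
                        "sum": "units_7d_sum"})
       .reset_index()
)
print(features.head())
```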
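Model prototyping usually starts on a laptop before the heavier sweeps move to Vertex AI or SageMaker. Here is a rough sketch of a local hyper‑parameter search with scikit‑learn and XGBoost; the synthetic features and target stand in for data that would really come from the Feast feature store, and the parameter grid is just a placeholder.

```python
import numpy as np
from sklearn.model_selection import RandomizedSearchCV, TimeSeriesSplit
from xgboost import XGBRegressor

# Synthetic stand-in for features pulled from the feature store.
rng = np.random.default_rng(0)
X = rng.normal(size=(2_000, 8))            # e.g. rolling demand / sensor features
y = X[:, 0] * 3 + rng.normal(size=2_000)   # e.g. next-day restock quantity

search = RandomizedSearchCV(
    estimator=XGBRegressor(objective="reg:squarederror", n_estimators=200),
    param_distributions={
        "max_depth": [3, 5, 7],
        "learning_rate": [0.01, 0.05, 0.1],
        "subsample": [0.7, 0.9, 1.0],
    },
    n_iter=10,
    cv=TimeSeriesSplit(n_splits=5),   # respect temporal ordering of the data
    scoring="neg_mean_absolute_error",
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, -search.best_score_)
```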
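On the monitoring side, the production drift checks run through Evidently AI, but the core idea fits in a few lines. Below is a bare‑bones illustration using a two‑sample Kolmogorov–Smirnov test on synthetic prediction windows rather than Evidently's own API; the threshold and window sizes are made‑up placeholders.

```python
import numpy as np
from scipy.stats import ks_2samp

def detect_drift(reference: np.ndarray, current: np.ndarray, alpha: float = 0.05) -> bool:
    """Flag drift when the two samples differ significantly (two-sample KS test)."""
    _, p_value = ks_2samp(reference, current)
    return p_value < alpha

# Synthetic example: today's prediction distribution has shifted upward.
rng = np.random.default_rng(42)
reference = rng.normal(loc=100, scale=10, size=5_000)  # last month's predictions
current = rng.normal(loc=110, scale=10, size=1_000)    # today's predictions
print(detect_drift(reference, current))  # True -> page the on-call and write the post-mortem
```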
Note: All work is done remotely, so we rely on a strong culture of async communication. You'll use Slack for quick questions, Notion for project roadmaps, and our internal wiki for knowledge sharing.
We also use MLflow for experiment tracking, DVC for data versioning, and Looker for dashboarding, but the tools listed above are the daily workhorses.