Senior Research Engineer / Senior Machine Learning Engineer
Location: Remote – United States
Duration: 12 Months (W2 Contract, with potential extension)
About the Role
We are seeking a Senior Research Engineer to join a highly collaborative research and engineering team focused on building scalable deep learning systems and distributed training infrastructure. This role involves developing high-quality machine learning libraries, enabling large-scale model training, and translating cutting-edge research into real-world products that operate at massive scale. You will work closely with scientists, engineers, and cross-functional partners to design, implement, and optimize deep learning solutions using modern frameworks and distributed systems.
Must-Have Technical Skills
- 5–10 years of professional Python experience
- 3–5 years of experience with distributed machine learning training (e.g., FSDP, DDP, or similar approaches)
- 3–5 years of hands-on PyTorch experience
- 3–5 years working with datasets, data pipelines, and PyTorch DataLoader
Nice-to-Have Skills
- Active or past contributions to open-source ML/AI repositories
- Strong engineering background with a focus on scalable systems
- Experience collaborating with research teams to productionize ML models
Responsibilities
- Design, develop, and maintain deep learning libraries supporting large-scale distributed training
- Implement and optimize distributed training strategies using techniques such as Distributed Data Parallel (DDP) and Fully Sharded Data Parallel (FSDP)
- Build and maintain robust data pipelines and dataset loading systems for large-scale training
- Translate research ideas into production-ready ML systems
- Write clean, efficient, and well-documented code with a strong emphasis on reproducibility
- Contribute to open-source projects and publish high-quality, reusable code when applicable
- Collaborate closely with researchers, engineers, and product partners in a fast-paced environment
- Debug and optimize performance across GPU-based training systems
Qualifications
- Bachelor's, Master's, or PhD in Computer Science, Computer Engineering, or a related technical field
- 5+ years of hands-on experience in deep learning and machine learning engineering
- Strong experience developing ML algorithms or infrastructure using Python and/or C/C++
- Experience with PyTorch and distributed training approaches (e.g., DDP, FSDP or equivalent)
- Experience working with large datasets, data preprocessing, and data loading pipelines
- Solid understanding of algorithms, data structures, and software engineering best practices
- Proven ability to work effectively in a collaborative, team-oriented environment
- Strong problem-solving and communication skills
Preferred Qualifications
- Demonstrated software engineering experience through professional work or widely used open-source contributions
- Prior contributions to open-source AI/ML projects
- Experience training large transformer-based or deep neural network models
- Familiarity with performance optimization, memory efficiency, and scalable training systems
Skills Required for This Job
- C++
- Data pipelines
- Deep learning
- Distributed Data Parallel (DDP)
- Distributed training
- Fully Sharded Data Parallel (FSDP)
- Machine learning
- PyTorch
- Python
- Software engineering
- TensorFlow