ML Researcher: Embodied AI

KGEN

Software Engineering, Data Science

Bengaluru, Karnataka, India

Posted on Apr 30, 2026

ML Researcher: Embodied AI at Humyn Labs

The Role:

We're hiring an ML Researcher to contribute to the scientific direction of our dataset. The central question you'll own: what data will move humanoid robot training forward fastest, and how do we prove it? This means thinking deeply about pre-training dynamics — what data mixtures, task distributions, and annotation richness actually shift what a model learns at scale.

You'll study where current embodied AI models break, translate those failure modes into concrete data requirements, and run the experiments that validate whether our dataset addresses them. This work directly shapes what we capture, how we label it, and how we evaluate it.

What You'll Do:

  • Characterize the limitations of current embodied AI. Run systematic evaluations of open vision-language-action (VLA) and egocentric models to identify where they fail — long-horizon manipulation, dexterous contact-rich tasks, generalization to unseen objects and environments, ego-motion robustness, hand-object interaction modeling.
  • Run pre-training experiments. Design and execute data ablations and mixture studies that measure how dataset composition affects pre-training outcomes — task coverage, modality balance, annotation granularity. Understand not just what breaks at inference, but what the model failed to learn during training and why.
  • Translate failure modes into dataset requirements. Convert model weaknesses into specific, testable hypotheses about what data richness — modalities, task diversity, annotation granularity, scene coverage — would address them.
  • Design benchmarks. Build internal evaluation suites that measure whether dataset changes actually improve downstream policy learning. Where public benchmarks are insufficient, design new ones.
  • Lead labeling science. Drive technical decisions on hand pose representations (MANO topology, 6D vs axis-angle rotations; see the sketch after this list), object tracking, action segmentation, and 3D grounding. Validate label quality at scale and catch systemic annotation errors before they compound.
  • Orchestrate ML pipelines. Fine-tune and evaluate vision and multimodal foundation models used in our annotation and QA stack (pose estimation, open-vocabulary detection, tracking, VLMs). Make pragmatic build-vs-adopt decisions.
  • Publish. Continue contributing to the research community through papers, benchmarks, and open artifacts.
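
To make the rotation-representation decision above concrete: below is a minimal, hypothetical sketch (not our codebase) of the 6D rotation parameterization from Zhou et al. (2019). It stores the first two columns of a rotation matrix and recovers the third via Gram-Schmidt, avoiding the discontinuities that make axis-angle targets harder for networks to regress.

    # Hypothetical illustration only; the function name and input values are made up.
    import numpy as np

    def rotation_from_6d(x: np.ndarray) -> np.ndarray:
        """Map a 6D vector (two stacked 3-vectors) to a 3x3 rotation matrix."""
        a, b = x[:3], x[3:]
        u = a / np.linalg.norm(a)           # first column, normalized
        v = b - np.dot(u, b) * u            # remove the component along u
        v = v / np.linalg.norm(v)           # second column, orthonormal to u
        w = np.cross(u, v)                  # third column, right-handed frame
        return np.stack([u, v, w], axis=-1)

    # Even a noisy 6D prediction maps to a valid (orthonormal) rotation:
    R = rotation_from_6d(np.array([1.0, 0.1, 0.0, 0.0, 1.0, 0.2]))
    assert np.allclose(R @ R.T, np.eye(3))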

What We're Looking For:

Must Have

  • MS/PhD in computer vision, machine learning, robotics, or a closely related field — or equivalent research output.
  • Strong hands-on experience fine-tuning vision and multimodal foundation models.
  • Publications in one or more of: egocentric vision, VLA or VLM modeling, 3D hand and human pose reconstruction, video understanding, or robot learning.
  • Pre-training is a core part of your background — you've run or closely contributed to pre-training or mid-training runs of foundation models for embodied AI (VLAs, world models, or visuomotor policies), and you have a point of view on how data composition shapes what models learn.
  • Deep familiarity with vision dataset construction — labeling pipelines, annotation quality, inter-annotator agreement, and the failure modes that degrade dataset value at scale.
  • At least 4 years of hands-on experience in these or closely related areas.
  • A clear point of view on where embodied AI is currently limited and what kinds of data would move it forward.

Nice to Have

  • Prior experience at a research lab (academic or industrial).
  • Experience teaching or advising (professor, assistant professor, postdoc, or research mentor).
  • Familiarity with MCAP, ROS bags, LeRobot, or similar robotics data formats.
  • Experience with dataset scaling studies (Ego4D, EgoExo4D, Open X-Embodiment, EgoScale).
  • Contributions to open-source vision or robotics projects.