Training strong models starts with building strong datasets. In this talk, we’ll present a practical, end-to-end view of how data strategy drives model quality, development speed, and iteration at scale. We’ll cover the core decisions in data collection: what to capture, how much is enough, and which operating-condition variables (lighting, viewpoints, environments, edge cases) most influence the required volume. We’ll then share curation principles for high-leverage training sets: matching real deployment conditions, maintaining balance across classes and scenarios, and pruning data that is irrelevant or misleading. Next, we’ll discuss task-specific labeling, including how we handle ambiguity, reduce inconsistency, and implement quality checks. Finally, we’ll focus on evaluation data and iteration: building representative holdout sets, segmenting performance to expose failure modes, and running a data-refinement loop in which deployment feedback guides targeted new collection and dataset updates. Attendees will leave with a repeatable framework for making data decisions that scale.

