Mirage is an AI-native video platform that intelligently orchestrates production and editing through natural language. Our models leverage contextual awareness to execute the same creative decisions a professional editor would, dramatically improving productivity for experienced teams while making video creation accessible to anyone.

We're an interdisciplinary team addressing some of the most difficult technical and creative challenges in generative media. As an early member of our team, you'll tackle foundational problems that remain largely unsolved across the industry, driving an outsized impact on the future of creative expression.

More about us
- Product: Captions by Mirage
- Research: Seeing Voices, technical white paper
- Updates: Mirage on X / Twitter
- Press: TechCrunch, Forbes AI 50, Fast Company

Our Investors
We're very fortunate to have some of the best investors and entrepreneurs backing us, including Index Ventures, Kleiner Perkins, Sequoia Capital, Andreessen Horowitz, General Catalyst, Uncommon Projects, Kevin Systrom, Mike Krieger, Lenny Rachitsky, Antoine Martin, Julie Zhuo, Ben Rubin, Jaren Glover, SVAngel, 20VC, Ludlow Ventures, Chapter One, and more.

Please note that all of our roles require you to be in person at our NYC HQ, located in Union Square.

About the role
Mirage is seeking a Research Scientist to advance the frontier of multimodal video generation.
You'll work on novel modeling approaches, training objectives, and scaling strategies for large-scale video models, contributing directly to systems used by millions of creators. You'll focus on pushing generation quality, controllability, and realism, especially in facial expression, audio-to-video synchronization, human motion, and storytelling, while validating ideas through real-world product impact.

Responsibilities
- Develop novel approaches to video and multimodal generative modeling
- Design new training objectives, loss functions, and evaluation methods optimized for highly compute-efficient, low-latency generation
- Explore temporal modeling, controllability, and multimodal alignment
- Conduct empirical studies to understand scaling behavior and model performance
- Drive rapid experimentation across architectures and training strategies
- Analyze model behavior and identify opportunities for improvement
- Translate research insights into measurable product improvements

What makes you a great fit
- MS/PhD in ML, CS, or a related field
- Strong publication record (NeurIPS, ICML, ICLR, etc.) or equivalent work
- Deep expertise in generative modeling (diffusion, autoregressive architectures, etc.)
- Deep understanding of transformers and modern multimodal systems
- Experience with large-scale training, empirical research, and optimizing models for real-time inference efficiency
- Strong experience working with audio representations and audio-visual datasets

Benefits
- Comprehensive medical, dental, and vision plans
- 401(k) with employer match
- Commuter benefits
- Catered lunch multiple days per week
- Dinner stipend every night if you're working late and want a bite
- Grubhub subscription
- Health & wellness perks
- Multiple team offsites per year, with team events every month
- Generous PTO policy

Captions provides equal employment opportunities to all employees and applicants for employment and prohibits discrimination and harassment of any type without regard to race, color, religion, age, sex, national origin, disability status, genetics, protected veteran status, sexual orientation, gender identity or expression, or any other characteristic protected by federal, state, or local laws.

Please note benefits apply to full-time employees only.
Job Title: Research Scientist, Video Generation