Job Description

Insight Global is looking for a Data Engineer to join a dedicated team building and evolving a clinical data platform serving the clinical operations space. You will architect and build the large-scale data pipelines that power clinical insights, processing billions of records across medical claims, clinical trials, publications, and provider data.

This is a core infrastructure role. You will be responsible for designing, building, and maintaining ETL frameworks that feed into analytics, machine learning, and product surfaces. You should be deeply comfortable with distributed computing at scale and experienced working alongside ML and data science teams in production environments.

Responsibilities Include:

· Design, build, and maintain large-scale ETL pipelines and data frameworks using Apache Spark (PySpark/Scala) on cloud infrastructure
· Architect scalable data models and pipeline patterns to process structured and unstructured healthcare data at volume
· Build and optimize data layers on Azure cloud services, including Databricks, Delta Lake, and supporting compute and storage infrastructure
· Ensure data quality, lineage, and governance across the platform, implementing validation, monitoring, and alerting at scale
· Collaborate with AI Scientists and MLOps teams to build data pipelines that serve model training, inference, and retraining workflows
· Work with data analysts and product teams to ensure curated, reliable data is available for downstream insights and reporting
· Contribute to platform architecture decisions and help define best practices for data engineering within the team

We are a company committed to creating diverse and inclusive environments where people can bring their full, authentic selves to work every day. We are an equal opportunity/affirmative action employer that believes everyone matters.
Qualified candidates will receive consideration for employment regardless of their race, color, ethnicity, religion, sex (including pregnancy), sexual orientation, gender identity and expression, marital status, national origin, ancestry, genetic factors, age, disability, protected veteran status, military or uniformed service member status, or any other status or characteristic protected by applicable laws, regulations, and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or recruiting process, please send a request. To learn more about how we collect, keep, and process your private information, please review Insight Global's Workforce Privacy Policy.

Skills and Requirements

· 5+ years of experience in data engineering with a focus on large-scale distributed data systems
· Strong proficiency in Python, SQL, and Scala
· Deep hands-on experience with Apache Spark (PySpark, Spark SQL) for building ETL pipelines and data transformations at scale
· Experience with Azure cloud services, including Databricks, Delta Lake, and Azure Data Factory
· Understanding of MLOps practices and experience building data infrastructure that supports machine learning workflows
· Experience with data quality frameworks, data lineage, and governance tooling
· Comfortable working independently in a remote setting with a distributed, cross-time-zone team
· Familiarity with Kubernetes and container orchestration for data workloads
· Background in healthcare, life sciences, pharma, or clinical research is a strong plus

Job Title

Company : Insight Global

Location : New York City, NY

Created : 2026-04-04

Job Type : Full Time