Josh Scherbenski
Hands-on AI and data leader with 15+ years building machine learning, analytics, and data platforms from zero to scale across healthcare, marketplaces, and venture-backed technology companies. Most recently founded and built a HIPAA-compliant digital health platform using generative AI, agentic workflows, longitudinal user data, and medical research to deliver personalized health recommendations. Experienced defining AI roadmaps, building production ML systems, leading multidisciplinary data and AI teams, and translating emerging AI techniques into practical, trustworthy products. Strong fit for mission-driven health technology environments where data quality, clinical relevance, privacy, and user impact matter.
Experience
Founded and built a consumer digital health platform, owning product, engineering, data, analytics, and AI systems end-to-end
- Built HIPAA-compliant ML and data infrastructure powering personalized health recommendations for 1,000+ users
- Implemented LLM parsing and agentic workflows to turn medical research, user-reported outcomes, and longitudinal health signals into structured recommendations
- Designed production data pipelines, analytics tracking, experimentation, and evaluation workflows for personalization, responsible AI, and model iteration
Head of Data & AI
Scale Venture Partners
2024 – 2025
Architected the firm’s first AI-powered sourcing and investment intelligence platform, translating structured company data, founder signals, financial indicators, and unstructured text into predictive analytics workflows
- Built ensemble models using Random Forest, XGBoost, regression, and domain-specific features to evaluate 110,000+ companies, increasing qualified deal flow 4×
- Engineered modeling pipelines using 300+ behavioral, financial, and founder-quality parameters, with cross-validation and out-of-time testing for generalization
- Created Snowflake + dbt + Airflow pipelines supporting LLM-based summarization, predictive scoring, ranking, analytics, and model evaluation
Partnered with Marketing, Sales, Product, and Engineering to build analytics, experimentation, and data infrastructure for a SaaS prop-tech platform
- Designed automated attribution, real-time ETL, and ML targeting pipelines, increasing new-user acquisition by 50%
- Built causal inference models to measure channel incrementality, improving CAC/LTV by 25%
- Developed forecasting, churn, funnel diagnostics, and retention analytics to improve growth and product-market-fit decisions
Head of Data & Pricing
Zeus Living
2019 – 2023
Led a global 33-person organization across data science, ML engineering, pricing, analytics, and data platforms supporting a $100M+ marketplace business
- Built dynamic-pricing ML models generating 2M+ optimized daily price points and increasing gross profit by $10M/year
- Developed geospatial regression and demand-surface models supporting underwriting and market risk decisions, reducing launch time by 3 months
- Built centralized data lake and MLOps platform enabling 10+ production ML models
- Established model monitoring, retraining, governance, and operational workflows for production ML reliability
Head of Data
Riley Corporation
2017 – 2018
Developed NLP automation that cut manual processing costs by 50%, enabling profitable unit economics
- Defined the organization’s first KPI framework and statistical monitoring systems
Lead Data Scientist – ML
Glassdoor
2011 – 2017
First data scientist at Glassdoor; architected experimentation, ML, and analytics foundations for marketing, product, and international growth
- Created job-search experimentation framework using bootstrap, nonparametric, and Bayesian methods
- Built semi-supervised models to predict user/customer value and optimize paid marketing and email sequencing
- Led 8-person data science team; supported product and international expansion contributing to 15M+ new users
Data Analyst
Google (Maps)
Program Manager / Systems Engineer
NASA
Focused on probabilistic modeling, systems engineering, and innovation programs
Education
WorldQuant University
M.S.
University of Michigan
M.Eng.
Skills
CORE CAPABILITIES
AI strategy, roadmap development, governance, and executive advising
Healthcare AI, HIPAA-compliant systems, and regulated data governance
Generative AI, LLM applications, agentic workflows, and RAG-style retrieval
Longitudinal health data, behavioral modeling, and personalized recommendations
Production ML systems, MLOps, model evaluation, monitoring, and iteration
Experimentation, causal inference, performance evaluation, and measurable product impact
Multidisciplinary AI, ML, data science, and engineering leadership
Cross-functional partnership with Product, Engineering, clinical/domain experts, and executives
TECHNICAL SKILLS
Python, SQL, R, JavaScript
Generative AI, LLM applications, agentic workflows, recommendation systems, predictive modeling, ensemble models, NLP, causal inference, experimentation
Health Data & Personalization
Longitudinal user data, patient-level personalization, behavioral modeling, treatment-response analysis, biomarker-adjacent analytics
ETL/ELT workflows, feature pipelines, Airflow, dbt, SQLMesh, Snowflake, BigQuery, AWS, GCP
Model validation, monitoring, retraining workflows, evaluation frameworks, privacy-aware ML, regulated data practices
Looker, Tableau, Mode, executive dashboards, product metrics