CartoChrome computes Healthcare Access Scores for every populated ZIP code in the United States -- approximately 33,000 ZCTAs (ZIP Code Tabulation Areas) as defined by the Census Bureau. Each ZIP code receives 11 scores: one overall and 10 condition-specific. That is roughly 363,000 scores (33,000 x 11), each derived from an 8-component spatial analysis algorithm that considers provider locations, facility quality, population demand, and social determinants of health. And the entire process runs on autopilot.
The Data Ingestion Layer
Our pipeline begins with 21 free, public data sources published by federal agencies. These sources update on different schedules -- weekly, monthly, quarterly, and annually -- and our ingestion layer handles each cadence automatically.
The core data sources include:
- CMS NPPES (National Plan and Provider Enumeration System) -- Updated monthly with a full file of ~7.5 million provider records, plus weekly delta files. We filter this to ~4 million active, patient-facing providers.
- CMS Hospital Compare -- Monthly quality metrics including star ratings, mortality rates, readmission rates, and patient experience scores for every Medicare-certified hospital.
- Census ACS 5-Year Estimates -- Annual demographic data at the ZCTA level: population, age distribution, income, insurance coverage, vehicle access, disability rates, and more.
- CDC PLACES -- Census tract-level health outcome measures used as our primary calibration target.
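To make the idea of per-source cadences concrete, here is a minimal sketch of how a scheduler might decide which sources are due for a refresh. It is an illustration, not our production code: the `Source` class, the `CADENCE_DAYS` intervals, the source keys, and the `is_due` / `due_sources` helpers are all hypothetical names, and the day counts are rough assumptions. CDC PLACES is left out only because its cadence is not listed above.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Approximate refresh intervals in days. These values are assumptions
# chosen for illustration, not the pipeline's real configuration.
CADENCE_DAYS = {"weekly": 7, "monthly": 30, "quarterly": 91, "annual": 365}


@dataclass(frozen=True)
class Source:
    name: str      # hypothetical internal key for the data source
    cadence: str   # one of the keys in CADENCE_DAYS


# Cadences taken from the list above; the names are made up.
SOURCES = [
    Source("nppes_full", "monthly"),
    Source("nppes_delta", "weekly"),
    Source("hospital_compare", "monthly"),
    Source("acs_5yr", "annual"),
]


def is_due(source, last_loaded, today):
    """Return True if the source has never been loaded or its interval has elapsed."""
    if last_loaded is None:
        return True
    interval = timedelta(days=CADENCE_DAYS[source.cadence])
    return today - last_loaded >= interval


def due_sources(last_loaded_by_name, today):
    """List the names of all sources that should be checked today."""
    return [
        s.name
        for s in SOURCES
        if is_due(s, last_loaded_by_name.get(s.name), today)
    ]
```

Calling `due_sources({}, date.today())` on a fresh install returns every source, since nothing has been loaded yet. A real scheduler would more likely check the publisher for a new release than rely on elapsed days alone.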
Each source has a dedicated Celery task that checks for new data, downloads it, validates the schema, and loads it into PostgreSQL with PostGIS spatial extensions. If a source is unavailable or returns malformed data, the pipeline logs an alert and retries -- it never silently proceeds with stale data.
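The validate-then-load behavior described above can be sketched in plain Python. This is a simplified stand-in for a Celery task, written without Celery so it runs on its own; in a real task the loop would probably be handled by Celery's retry mechanism instead. The exception classes, `REQUIRED_COLUMNS`, and the `fetch` / `load` callables are hypothetical placeholders, not the actual pipeline's names.

```python
import logging
import time

log = logging.getLogger("ingest")


class SourceUnavailable(Exception):
    """Raised by fetch() when the upstream source cannot be reached."""


class MalformedData(Exception):
    """Raised when downloaded rows do not match the expected schema."""


# Hypothetical required columns; a real source would have its own schema.
REQUIRED_COLUMNS = {"npi", "zip"}


def validate_schema(rows, required=REQUIRED_COLUMNS):
    """Check that every row (a dict) contains all required columns."""
    for i, row in enumerate(rows):
        missing = required - set(row)
        if missing:
            raise MalformedData(f"row {i} missing columns: {sorted(missing)}")
    return rows


def ingest(fetch, load, max_attempts=3, delay_seconds=0.0):
    """Fetch, validate, and load; retry on failure and never load bad data.

    load() is only called after validation succeeds. If every attempt
    fails, the last error is re-raised so the failure is visible rather
    than hidden behind stale data.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            rows = validate_schema(fetch())
        except (SourceUnavailable, MalformedData) as exc:
            log.warning("attempt %d/%d failed: %s", attempt, max_attempts, exc)
            if attempt == max_attempts:
                raise
            time.sleep(delay_seconds)
        else:
            load(rows)
            return len(rows)
```

The key design choice is that a failed run ends in a raised exception, not a quiet fallback: downstream scoring can then refuse to run until the source has been ingested successfully.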
