Building Maps at Scale: From TIGER Files to Interactive Choropleth
Rendering 33,000 colored polygons and 4 million point markers at 60fps in a web browser requires a very specific technology stack. Here is how CartoChrome builds its maps -- from Census shapefiles to GPU-accelerated vector tiles.
CartoChrome Team · 11 min read
GIS · Mapping · Data Science
The CartoChrome interactive map is not a simple Google Maps embed with some colored pins. It is a GPU-accelerated, multi-layer visualization system that renders 33,000 ZIP code polygons (each colored by Healthcare Access Score), 4 million provider markers (clustered dynamically as you zoom), and multiple toggleable overlay layers -- all at 60 frames per second in a standard web browser. Building this required solving a series of technical challenges that pushed the boundaries of web-based cartography.
Starting Point: Census TIGER/Line Shapefiles
Every polygon on the CartoChrome map starts as a shapefile from the Census Bureau's TIGER/Line program. TIGER (Topologically Integrated Geographic Encoding and Referencing) provides the official geographic boundaries for every statistical geography in the United States -- states, counties, census tracts, and the ZIP Code Tabulation Areas (ZCTAs) that form our primary visualization layer.
The raw ZCTA shapefile contains approximately 33,000 polygons with complex geometries. Some ZCTAs in Alaska span thousands of square miles; others in Manhattan cover a few city blocks. The raw file weighs in at approximately **900 MB** -- far too large to serve directly to a web browser.
Step 1: Geometry Simplification and Format Conversion
The first processing step converts TIGER shapefiles to GeoJSON using **ogr2ogr** (part of the GDAL/OGR geospatial library) and applies geometry simplification to reduce file size without visible distortion at target zoom levels.
The key decisions at this stage:
Simplification tolerance varies by zoom level. At zoom 4-6, aggressive simplification (removing vertices within 0.01 degrees) reduces polygon complexity by 80% with no visible impact at that zoom. At zoom 10+, minimal simplification preserves the detailed boundary shapes users expect when zoomed in on their neighborhood.
Coordinate precision is truncated to 6 decimal places (~11cm), which is more than sufficient for choropleth visualization and saves significant storage.
Properties are stripped to the minimum needed for rendering: ZCTA code, state FIPS code, and a pre-joined health score field. All other data is fetched on-demand via API when a user clicks a polygon.
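Under these decisions, the conversion step might look like the following sketch. The filenames, the tolerance value, and the `health_score` field name are illustrative placeholders; `-simplify` and the GeoJSON driver's `COORDINATE_PRECISION` option are standard GDAL/OGR features:

```shell
# Sketch: TIGER shapefile -> simplified GeoJSON.
# -simplify takes a tolerance in the source units (degrees here);
# COORDINATE_PRECISION truncates coordinates to N decimal places;
# -select keeps only the properties needed for rendering.
ogr2ogr -f GeoJSON \
  -simplify 0.001 \
  -lco COORDINATE_PRECISION=6 \
  -select ZCTA5CE20,STATEFP,health_score \
  zcta_simplified.geojson \
  tl_2023_us_zcta520.shp
```

In practice the tolerance would be varied per target zoom level, as described above, producing one GeoJSON export per zoom band.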
Step 2: Tippecanoe -- Building Vector Tiles
The simplified GeoJSON is fed into **Tippecanoe**, an open-source tool originally developed at Mapbox that generates vector tile archives. Tippecanoe is the secret weapon of modern web cartography -- it intelligently selects which features to include at each zoom level, simplifies geometries progressively, and outputs a single **PMTiles** archive file.
Key Tippecanoe parameters for our ZCTA layer:
Minimum zoom 4 -- Below this, we render a pre-computed state-level choropleth instead of individual ZCTAs
Maximum zoom 12 -- Beyond this, the zoom-12 tiles are simply overzoomed (boundaries gain no new detail) and provider markers take over as the primary data layer
Feature dropping is disabled for ZCTAs -- every ZIP code must be visible at its target zoom level, even if that means larger tile sizes
Tile size limit of 500KB ensures fast loading even on mobile connections
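Those parameters translate to a Tippecanoe invocation along these lines (the output and layer names are illustrative; `--no-feature-limit` lifts the per-tile feature cap, and `--maximum-tile-bytes` enforces the size budget):

```shell
# Sketch: simplified GeoJSON -> a single PMTiles archive.
tippecanoe \
  -o zcta.pmtiles \
  -l zcta \
  --minimum-zoom=4 \
  --maximum-zoom=12 \
  --no-feature-limit \
  --maximum-tile-bytes=500000 \
  zcta_simplified.geojson
```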
The output is a single PMTiles file of approximately **150 MB** for the full ZCTA layer.
Step 3: PMTiles on S3 -- Serverless Tile Serving
Traditional tile servers (like TileServer GL or Martin) require always-on compute resources to serve tiles on demand. CartoChrome uses **PMTiles**, a cloud-native single-file tile archive format that eliminates the tile server entirely.
PMTiles works through HTTP range requests: the file contains an internal directory that maps tile coordinates (z/x/y) to byte ranges within the archive. The client library reads this directory, then fetches individual tiles as byte-range requests from any static file host. We serve our PMTiles from **S3 behind CloudFront**, which means:
Zero server management -- no tile server to maintain, patch, or scale
Cost efficiency -- S3 storage ($0.023/GB/month) + CloudFront transfer, totaling approximately $5-15/month versus $50+/month for a dedicated tile server
Infinite scalability -- S3 and CloudFront handle any traffic volume without configuration changes
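The byte-range mechanics are simple enough to sketch: given an offset and length from the PMTiles internal directory, the client builds a standard HTTP `Range` header and fetches exactly that slice from the static file. The offset and length below are made up for illustration:

```javascript
// Sketch: how a PMTiles client turns a directory entry into an
// HTTP range request against a static file host like S3/CloudFront.
function rangeHeader(offset, length) {
  // HTTP Range is inclusive on both ends: bytes=first-last
  return `bytes=${offset}-${offset + length - 1}`;
}

// Suppose the directory says a tile lives at byte 1048576 and
// spans 42000 bytes; the client requests exactly that slice.
const header = rangeHeader(1048576, 42000);
console.log(header); // "bytes=1048576-1090575"
// fetch(tileUrl, { headers: { Range: header } }) would return one tile.
```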
Step 4: MapLibre GL JS -- GPU-Accelerated Rendering
The browser-side rendering is handled by **MapLibre GL JS**, the open-source fork of Mapbox GL JS. MapLibre uses WebGL to render vector tiles directly on the GPU, which is what makes 60fps pan and zoom possible with 33,000 polygons.
For the choropleth coloring, we use a MapLibre **data-driven style expression** that maps health scores to colors on a continuous red-to-teal scale:
Scores 0-24 render as deep red (#d73027)
Scores 25-49 render as orange (#fc8d59)
Scores 50-69 render as warm yellow (#fee08b)
Scores 70-89 render as medium teal (#4dac8a)
Scores 90-100 render as deep teal (#1a9876)
Intermediate values are interpolated using perceptually informed interpolators from **d3-scale-chromatic**, ensuring smooth visual transitions; the red-to-teal palette also remains distinguishable under the two most common forms of color blindness, deuteranopia and protanopia.
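In MapLibre style terms, that scale is an `interpolate` expression over a feature property. The `health_score` property name here is an assumed placeholder for whatever the pre-joined field is actually called:

```javascript
// Sketch: a MapLibre GL JS data-driven fill color expression.
// Stop values mirror the scale described above.
const fillColor = [
  "interpolate", ["linear"], ["get", "health_score"],
  0, "#d73027",   // deep red
  25, "#fc8d59",  // orange
  50, "#fee08b",  // warm yellow
  70, "#4dac8a",  // medium teal
  90, "#1a9876",  // deep teal
];

// Used inside a fill layer definition, e.g.:
// map.addLayer({ id: "zcta-fill", type: "fill", source: "zcta",
//   "source-layer": "zcta", paint: { "fill-color": fillColor } });
```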
Step 5: deck.gl -- 4 Million Provider Markers
Rendering 4 million point markers is beyond even MapLibre's capabilities for native layers. This is where **deck.gl** enters the stack -- the open-source framework for GPU-accelerated geospatial visualization originally developed at Uber and now maintained under the OpenJS Foundation.
At zoom levels 12+, individual provider markers appear as a deck.gl **ScatterplotLayer**. But at zoom levels below 12, displaying 4 million individual points would be visually chaotic and computationally expensive. We use **Supercluster** -- a fast JavaScript library for geospatial point clustering -- to aggregate nearby providers into cluster markers that display the count. As the user zooms in, clusters smoothly split into smaller clusters and eventually into individual markers.
The clustering is performed client-side on a Web Worker thread, keeping the main thread free for rendering. The initial clustering computation for 4 million points takes approximately **400ms** -- well within our 500ms performance target.
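Supercluster itself builds a KD-tree index for speed, but the core idea of zoom-dependent aggregation can be illustrated with a much simpler (and much slower) grid sketch. This is not Supercluster's algorithm, only the concept: bucket points into cells whose size shrinks as zoom increases, then emit one marker per cell with a count.

```javascript
// Toy illustration of zoom-dependent point clustering (stand-in for
// Supercluster). Each point is [lng, lat] in degrees.
function clusterPoints(points, zoom) {
  const cellSize = 360 / Math.pow(2, zoom); // degrees per grid cell
  const cells = new Map();
  for (const [lng, lat] of points) {
    const key = `${Math.floor(lng / cellSize)}:${Math.floor(lat / cellSize)}`;
    const cell = cells.get(key) ?? { lngSum: 0, latSum: 0, count: 0 };
    cell.lngSum += lng;
    cell.latSum += lat;
    cell.count += 1;
    cells.set(key, cell);
  }
  // One marker per occupied cell, positioned at the member centroid.
  return [...cells.values()].map(c => ({
    lng: c.lngSum / c.count,
    lat: c.latSum / c.count,
    count: c.count,
  }));
}
```

Zooming in shrinks the cells, so a single cluster naturally splits apart -- the same visual behavior Supercluster produces, minus the indexing that makes it fast at 4-million-point scale.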
The Full Pipeline
Putting it all together, tile generation runs as a shell script in our CI/CD pipeline:
Download latest TIGER/Line shapefiles from Census Bureau
Join health scores to ZCTA geometries via PostGIS
Export to GeoJSON with ogr2ogr
Generate PMTiles with Tippecanoe
Upload to S3 with cache-control headers
Invalidate CloudFront cache for updated tile paths
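The last two steps are plain AWS CLI calls; the bucket name and distribution ID below are placeholders:

```shell
# Upload the archive with a cache policy CloudFront will respect
aws s3 cp zcta.pmtiles s3://example-tiles-bucket/zcta.pmtiles \
  --cache-control "public, max-age=86400"

# Drop the stale copy from edge caches
aws cloudfront create-invalidation \
  --distribution-id EXAMPLEDISTID \
  --paths "/zcta.pmtiles"
```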
The entire pipeline completes in approximately 45 minutes and runs automatically whenever health scores are recomputed. No human touches a shapefile, a tile, or a deploy button.