The U.S. Census Bureau and Child Health

Why Does Geographic Data Matter?

In medicine, we often have some kind of data that we want to understand better in a geographic sense:

  • Families (how far do they live from our urgent care?)
  • Patients / Research Participants (do they live in a high-crime area?)
  • Sensors (what are the air quality patterns near my patient’s school?)

We may have clinical uses for geographic data. For example, if a child lives in an area marked by high levels of lead in the soil, we may want providers to suggest care around outside play or additional blood tests.

Or, we may have public health data we want to understand more fully. We may want to ask, “What are the patterns that mark hepatitis outbreaks in Philadelphia?” or “How does gun violence in Philadelphia affect outside play and childhood obesity?”

And of course, we have research questions, such as whether children who live downwind of major highways are at higher risk for neurological or developmental disorders.

In short, the social determinants of health are often highly affected by geographic location, so, as biomedical researchers and clinicians, we should be aware of the uses of geographic data. This module will teach you how to work with a very important source of geographic data, the U.S. Census Bureau.

At CHOP, we “geocode” the addresses we have for our patients (that is, we pinpoint the physical location of each street address), and we also locate each address within a Census tract and block. This is important because we can get aggregate statistics on those areas, which help us understand the environment the child is living in.

Your patients or participants might not give you information about the poverty, crime rate, or walkability of their neighborhood. But with their geographic locations you can get aggregate information that can help you make some (broad, sweeping, therefore suspect!) generalizations.

Introduction to the U.S. Census Bureau

The US Census Bureau is bound by the Constitution to do a full (not sampled) census of all people within the US every ten years. This is the Decennial Census. Its results determine how many seats each state gets in the US House of Representatives and are used to draw district boundaries. There are two additional censuses performed by the Census Bureau that we won’t talk about: an Economic Census and a Census of Governments, each done every five years.

In addition to the full population census, the Census Bureau is also responsible for conducting the American Community Survey (ACS), which uses sampling and inferential statistics to make estimates of things like:

  • Education levels
  • Poverty
  • Mean and median income
  • Computer usage
  • Health insurance coverage
  • and much more!

Note that the ACS also comes in one-year and five-year versions. Five-year ACS data includes estimates for the entire country, while one-year versions concentrate on population-dense areas and have smaller sample sizes. This means that if you’re doing analysis on, say, NYC, you can get very up-to-date (but less reliable) one-year estimates. If you’re interested in studying Iowa, or in getting NYC estimates with a smaller margin of error, you’d be better off with the somewhat less current but broader and more reliable five-year ACS. That’s what we’ll use here: five-year ACS estimates.

Census data is collected at and aggregated to various levels:

  • The country as a whole
  • States / territories
  • Counties
  • ZIP Code Tabulation Areas (approximations of ZIP Codes)
  • Urban areas
  • Census Tracts (roughly 1.2k - 8k people)
  • Census Block Groups (600 - 3k people)
  • Census Blocks (the smallest unit)
  • and probably more I’ve forgotten about!

The website of the Census Bureau is a veritable treasure trove of data about populations. It can be hard to manage the sheer quantity of data.

FIPS

“FIPS” stands for “Federal Information Processing Standards” but often, when you talk to people, they’ll apply the term to whatever their particular federal data is… so, e.g., instead of “Census tract identifier” they’ll say “the FIPS”. It’s a term that therefore ends up having lots of meanings.

There are FIPS codes for states, counties, tracts, and blocks, and when concatenated, they end up being a single geographic id. For example, the state code for Pennsylvania is 42, the county code for Philadelphia is 101, and the census tract within Philadelphia where the main campus of the Children’s Hospital of Philadelphia stands is 036900 (the last two digits can be thought of as ‘after the decimal point’, so this has a “human” name of Census Tract 369). Further, the block group is 2, and the full block number is 2011, so you might be using a “GEOID” of 421010369002011 (if the block is included), or just 42101036900 (if you have tract level data only).
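
To make the concatenation concrete, here is a minimal Python sketch that assembles and splits a GEOID using the field widths described above (2 digits for state, 3 for county, 6 for tract, 4 for block). The function names build_geoid and split_geoid are hypothetical helpers written for this illustration, not part of any Census Bureau library.

```python
def build_geoid(state, county, tract, block=None):
    """Concatenate zero-padded FIPS components into one GEOID string.

    Hypothetical helper: widths are 2 (state), 3 (county), 6 (tract), 4 (block).
    """
    geoid = str(state).zfill(2) + str(county).zfill(3) + str(tract).zfill(6)
    if block is not None:
        geoid += str(block).zfill(4)
    return geoid


def split_geoid(geoid):
    """Split an 11-digit (tract) or 15-digit (block) GEOID into its parts."""
    if len(geoid) not in (11, 15):
        raise ValueError("expected an 11- or 15-character GEOID")
    tract = geoid[5:11]
    # The last two tract digits act like digits after a decimal point.
    tract_name = str(int(tract[:4]))
    if tract[4:] != "00":
        tract_name += "." + tract[4:]
    parts = {
        "state": geoid[0:2],
        "county": geoid[2:5],
        "tract": tract,
        "tract_name": tract_name,
    }
    if len(geoid) == 15:
        parts["block_group"] = geoid[11]   # first digit of the block number
        parts["block"] = geoid[11:15]
    return parts


print(build_geoid("42", "101", "036900", "2011"))
print(split_geoid("42101036900"))
```

One design note: GEOIDs are best stored as text rather than numbers. Several state codes begin with a zero (Alabama is 01), and a numeric column will silently drop it.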

Access to Census Data

APIs

Plan to work with Census Bureau data over and over again? It’s worth the time to use APIs.

The Census Bureau offers free API credentials (and a Slack channel, and more) at their Developers page. Among their list of API endpoints is a geocoding service – which is how we can translate street addresses to a geospatial point (lat/long). You can also just do a one-off at https://geocoding.geo.census.gov/geocoder/.
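
As a rough illustration of calling the geocoding service from code, the Python sketch below builds a request URL for a single-line address and fetches the JSON response. The endpoint path (locations/onelineaddress), the parameter names, and the benchmark value Public_AR_Current reflect our understanding of the public geocoder and should be checked against the Census Bureau’s current documentation. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL taken from the geocoder address above.
GEOCODER_BASE = "https://geocoding.geo.census.gov/geocoder"


def geocoder_url(address, benchmark="Public_AR_Current"):
    """Build a one-line-address geocoding URL (path and parameters are assumptions)."""
    query = urlencode({"address": address, "benchmark": benchmark, "format": "json"})
    return f"{GEOCODER_BASE}/locations/onelineaddress?{query}"


def geocode(address):
    """Fetch the raw JSON response for an address. Requires network access."""
    with urlopen(geocoder_url(address), timeout=30) as response:
        return json.load(response)


# Example (not run here, since it needs the network):
#   result = geocode("1 Main St, Philadelphia, PA")
#   Inspect the returned dictionary to find the matched coordinates.
```

Remember that patient addresses are identifiable information, so check your institution’s policies before sending them to any external service.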

tidycensus is an R package that helps you work with specific APIs offered by the Census Bureau. I highly recommend it and will demonstrate its use in upcoming articles!
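
tidycensus wraps the Census Data API for R users. To show what such a request looks like underneath, here is a small Python sketch that builds a query URL for five-year ACS estimates for every tract in one county. The URL layout (api.census.gov/data/{year}/acs/acs5), the variable code B19013_001E (which we understand to be median household income), and its margin-of-error counterpart ending in M are assumptions to verify against the API’s variable listing. The function name is hypothetical.

```python
from urllib.parse import quote, urlencode


def acs5_tract_url(year, variables, state_fips, county_fips, api_key=None):
    """Build a Census Data API URL for ACS 5-year tract-level estimates.

    Hypothetical helper; endpoint layout and variable codes are assumptions.
    """
    params = {
        "get": ",".join(["NAME", *variables]),
        "for": "tract:*",                                   # every tract...
        "in": f"state:{state_fips} county:{county_fips}",   # ...within one county
    }
    if api_key:
        params["key"] = api_key
    # Keep ':', '*' and ',' readable; encode the space as %20.
    query = urlencode(params, quote_via=quote, safe=":*,")
    return f"https://api.census.gov/data/{year}/acs/acs5?{query}"


# Estimate and margin of error for Philadelphia County, Pennsylvania.
print(acs5_tract_url(2017, ["B19013_001E", "B19013_001M"], "42", "101"))
```

Passing your own API key through the api_key argument appends it to the query string.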

Web GUI

You can also manually choose data and download it using American FactFinder (https://factfinder.census.gov), which the Census Bureau has since retired in favor of https://data.census.gov. A few asides here:

  • you will probably want to transpose rows and columns
  • you will probably want to leave the optional boxes unchecked.

Caveats

Granularity of Data

Census data is very specific. If, for example, you’re interested in income data for a given tract, you might find columns that include descriptions like:

  • INCOME AND BENEFITS (IN 2017 INFLATION-ADJUSTED DOLLARS) - Total households - Less than $10,000
  • INCOME AND BENEFITS (IN 2017 INFLATION-ADJUSTED DOLLARS) - Total households - $10,000 to $14,999
  • INCOME AND BENEFITS (IN 2017 INFLATION-ADJUSTED DOLLARS) - Total households - $15,000 to $24,999
  • INCOME AND BENEFITS (IN 2017 INFLATION-ADJUSTED DOLLARS) - Total households - $25,000 to $34,999
  • … and so on …

Or:

  • INCOME AND BENEFITS (IN 2017 INFLATION-ADJUSTED DOLLARS) - Families - Less than $10,000
  • INCOME AND BENEFITS (IN 2017 INFLATION-ADJUSTED DOLLARS) - Families - $10,000 to $14,999
  • … and so on …

Or:

  • INCOME AND BENEFITS (IN 2017 INFLATION-ADJUSTED DOLLARS) - With Supplemental Security Income - Mean Supplemental Security Income (dollars)
  • INCOME AND BENEFITS (IN 2017 INFLATION-ADJUSTED DOLLARS) - With cash public assistance income - Mean cash public assistance income (dollars)
  • … and so on …

Or:

  • INCOME AND BENEFITS (IN 2017 INFLATION-ADJUSTED DOLLARS) - Median earnings for workers (dollars)
  • INCOME AND BENEFITS (IN 2017 INFLATION-ADJUSTED DOLLARS) - Median earnings for male full-time, year-round workers (dollars)
  • INCOME AND BENEFITS (IN 2017 INFLATION-ADJUSTED DOLLARS) - Median earnings for female full-time, year-round workers (dollars)

You will likely need to hone your question: families only, or all households (say, a single person, or a group home)? Do you want to look at statistics across the board or specify race, sex, or hispanicity? What is considered income, and what counts as benefits? Do you want to include SSI? Measure it separately? What about welfare?

Estimates and MOEs

You’ll also find, for any given measure, a few variables related to it:

  • Estimate – used when a scalar count or value is needed, like median income or number of white women
  • Margin of error – used to indicate the precision of the estimate
  • Percent – used when a percent is needed, like percent of families below the poverty line
  • Percent Margin of Error – used to indicate the precision of the percent estimate

Note that all four columns are generally present although only two make sense for any given measure!
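
To show how an estimate and its margin of error fit together, here is a short Python sketch. It relies on the fact that ACS margins of error are published at the 90% confidence level, so dividing by 1.645 recovers the standard error. The function names and the example numbers (an estimate of 50,000 with a margin of error of 5,000) are hypothetical and chosen only for illustration.

```python
Z_90 = 1.645  # ACS margins of error correspond to a 90% confidence level
Z_95 = 1.96   # multiplier for a 95% interval


def standard_error(moe_90):
    """Convert a published 90% margin of error to a standard error."""
    return moe_90 / Z_90


def confidence_interval(estimate, moe_90, z=Z_90):
    """Return (low, high) bounds around the estimate at the chosen z level."""
    half_width = standard_error(moe_90) * z
    return estimate - half_width, estimate + half_width


def coefficient_of_variation(estimate, moe_90):
    """Standard error relative to the estimate; larger means less reliable."""
    return standard_error(moe_90) / estimate


# Hypothetical tract: median income estimate 50,000 with a 5,000 margin of error.
print(confidence_interval(50_000, 5_000))          # 90% interval
print(confidence_interval(50_000, 5_000, z=Z_95))  # wider 95% interval
print(coefficient_of_variation(50_000, 5_000))
```

A quick check like this is useful at the tract level, where small samples can make the margin of error large relative to the estimate itself.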

Sparsity

Every area of the US belongs to a census tract, even if it’s an area in which people don’t normally live (like a park or lake or airport). That’s why you might see census tracts with little to no data. Don’t panic if you see that a few tracts have very sparse data – they may be one of these special tracts.

Find Out More!

Arcus Education is creating a “Census” module to help researchers master the skills necessary to find and use geographic data from the U.S. Census Bureau more effectively. Interested? Email Arcus support.