Technical Notes and Instructions for the Environmental Public Health Tracking Network

The Centers for Disease Control and Prevention's (CDC) Environmental Public Health Tracking Network provides users access to nationally consistent health, exposure, and environmental hazard data via the public portal. The Tracking Network allows users to:

  • view data in maps, tables, and charts;
  • search and view metadata; and
  • explore information about their health and their environment.

Data are presented as measures and organized by indicator and content area. For Tracking, an indicator is one or more items, characteristics or other things that will be assessed and that provide information about a population's health status, their environment, and other factors. A content area is a topic within environmental public health. Content areas focus on health, exposure, the environment, or the intersection of health and the environment. More information about indicators, measures, and data can be found in the indicator templates and the metadata.

The purpose of the technical notes page is to offer guidance and insights for using the Environmental Public Health Tracking Network efficiently and for interpreting its indicators and measures accurately.

Disclaimer

By using these data, you signify your agreement to comply with the following requirements:

  1. Use the data for statistical reporting and analysis only.
  2. Do not attempt to learn the identity of any person included in the data and do not combine these data with other data for the purpose of matching records to identify individuals.
  3. Do not disclose or make use of the identity of any person or establishment discovered inadvertently and report the discovery to: trackingsupport@cdc.gov.
  4. Do not imply or state, either in written or oral form, that interpretations based on the data are those of the original data sources and CDC unless the data user and data source are formally collaborating.
  5. Acknowledge, in all reports or presentations based on these data, the original source of the data and CDC.
  6. Suggested citation: Centers for Disease Control and Prevention. National Environmental Public Health Tracking Network. (n.d.) Web. Accessed: 04/18/2018. www.cdc.gov/ephtracking.

Note: The Tracking Network allows you to view different query results at the same time. This does not mean or suggest that the data are related, and results should not be interpreted in this way.

Important Definitions

Age-adjusted Rate: a method of accounting for the effects of age differences on health event rates across geographies. Adjusted rates are statistically altered to remove the effect of a variable, in this case age, to allow for unbiased comparison across places with different age distributions. Age-adjusted rates presented on the Tracking Network are calculated by the direct method using the 2000 U.S. standard population. An example from the Tracking portal is the age-adjusted rate of hospitalizations for heat stress per 100,000 population.
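
The direct method can be illustrated with a minimal sketch. The age groups, counts, and standard weights below are hypothetical and are not the actual 2000 U.S. standard population; the function names are also chosen only for this illustration. Each age-specific rate is weighted by the share of the standard population in that age group, and the result is compared with the crude rate (total events divided by total population).

```python
def crude_rate(cases, population, per=100_000):
    """Total events divided by total population, scaled to `per` people."""
    return sum(cases) / sum(population) * per


def age_adjusted_rate(cases, population, standard_weights, per=100_000):
    """Direct method: weight each age-specific rate by its standard share."""
    total_weight = sum(standard_weights)
    adjusted = 0.0
    for c, p, w in zip(cases, population, standard_weights):
        adjusted += (c / p) * (w / total_weight)
    return adjusted * per


# Hypothetical county with three age groups (young, middle, older).
cases = [10, 20, 60]
population = [50_000, 30_000, 20_000]
# Hypothetical weights; NOT the actual 2000 U.S. standard population.
standard_weights = [0.4, 0.4, 0.2]

print(round(crude_rate(cases, population), 1))
print(round(age_adjusted_rate(cases, population, standard_weights), 1))
```

When the standard weights match a place's own age distribution, the adjusted and crude rates coincide; the two diverge as the local age structure departs from the standard.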

Confidence Interval: Confidence intervals provide an index of uncertainty for a measure. They are a form of estimation that provides a range of plausible values (instead of a single estimate) along with a confidence level. The confidence level describes the method over repeated trials rather than any single interval: if the data collection were repeated many times, about 95% of the 95% confidence intervals produced would contain the true value, although any one interval may or may not contain it. Estimating with greater confidence requires accepting a wider range of possible values, so there is a trade-off between confidence and precision. Confidence intervals that contain the value indicating a “null” association (typically zero for a difference or one for a ratio) suggest that results are likely not statistically significant and could have been observed by chance. Confidence intervals can be found on the portal for various indicators and measures, including the estimated number of deaths avoided from all causes by reducing PM2.5 levels.
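
The trade-off between confidence and precision can be seen in a small sketch. It assumes the event count follows a Poisson distribution and uses a normal approximation, so the standard error of a rate is taken as the rate divided by the square root of the count. These are assumptions made for illustration; the notes do not state which interval method the Tracking Network uses, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def rate_confidence_interval(count, population, confidence=0.95, per=100_000):
    """Normal-approximation interval for a rate (assumes Poisson counts)."""
    rate = count / population * per
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    standard_error = rate / sqrt(count)
    return rate - z * standard_error, rate + z * standard_error


# Hypothetical example: 100 events in a population of 50,000.
low95, high95 = rate_confidence_interval(100, 50_000, confidence=0.95)
low99, high99 = rate_confidence_interval(100, 50_000, confidence=0.99)
print(round(low95, 1), round(high95, 1))
print(round(low99, 1), round(high99, 1))
```

The 99% interval is wider than the 95% interval for the same data, which is the loss of precision that comes with higher confidence.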

Crude Rate: the number of cases, events, or deaths occurring in a population over a period of time, usually expressed as the number of cases per 100,000 or 1 million population. Crude rates do not make any special adjustments to remove the effect of a variable, such as age distributions. An example from the Tracking portal is the crude rate of hospitalizations for heat stress per 100,000 population.

Incidence Rate: the number of new cases of disease over a period of time divided by the population at risk. An example is the number of new bladder cancer cases per 100,000 persons.

Indicator: For Tracking, an indicator is one or more items, characteristics or other things that will be assessed and that provide information about a population's health status, their environment, and other factors. The goal of using indicators is to allow users to monitor trends, compare situations, and better understand the link between environment and health. This relationship is assessed through direct and indirect measures (e.g., levels of a pollutant in the environment as a measure of possible exposure) that describe health or a factor associated with health (e.g., environmental hazard, age) in a specified population. A content area may have more than one indicator.

Measure: on the Tracking Network, a measure is a summary characteristic or statistic, such as a sum, percentage, rate, ratio, or proportion. There may be several measures of a specific indicator which, when considered in conjunction, fully describe the indicator.

Person-time: a way of expressing a rate when different individuals are observed for different lengths of time. The denominator consists of the sum of the units of time that each individual was at risk and was observed, and the numerator is the number of health outcome events. Person-time is often expressed in terms of person-days, person-months, or person-years. An example from the air quality content area is the number of person-days with PM2.5 over the National Ambient Air Quality Standard.
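
A short sketch of both uses of person-time follows. The follow-up times, daily PM2.5 values, population, and threshold are hypothetical values chosen for illustration (the threshold is a parameter, not a statement of the current standard), and the function names are not taken from the Tracking Network.

```python
def person_time_rate(events, time_at_risk, per=1_000):
    """Events divided by the summed time each individual was at risk."""
    return events / sum(time_at_risk) * per


def person_days_over_threshold(daily_values, population, threshold):
    """Days above the threshold multiplied by the number of people exposed."""
    days_over = sum(1 for value in daily_values if value > threshold)
    return days_over * population


# Four hypothetical individuals followed for different numbers of years.
years_at_risk = [2.0, 5.0, 0.5, 2.5]
print(person_time_rate(events=2, time_at_risk=years_at_risk))

# Hypothetical daily PM2.5 readings for a county of 10,000 residents.
daily_pm25 = [12.0, 36.5, 40.1, 20.3]
print(person_days_over_threshold(daily_pm25, population=10_000, threshold=35.0))
```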

Prevalence Rate: the number of existing cases of disease at a point in time divided by the total population. An example is the number of existing cases of a birth defect per 10,000 live births.

Proportion: a type of ratio representing the number of events or cases that meet a set of criteria divided by the maximum number of events or cases that could meet those criteria. In this case, the numerator is included in the denominator. Proportions are usually expressed as percentages. An example is the percentage of low birth weight births among all term singleton births.

Ratio: the number of events or cases that meet a set of criteria divided by the number of events or cases that meet a different set of criteria. Ratios are used to compare the occurrence of a variable in two different groups. An example is the ratio of males to females among term singleton births.

Data Differences

Data presented on CDC's Tracking Network may differ from data that are presented on state tracking networks, state health department web sites, and other source sites for the same measures. The differences may occur for many reasons, such as:

  • methods used at the state level for generating population estimates;
  • processes for updating data; or
  • definitions used for measures for environmental public health tracking purposes.

Consult Tracking's indicator descriptions and metadata pages for information on specific data differences.

Methods for calculating legend class breaks

Default Setting: By default, data used for CDC’s Tracking Network maps are classified into 5 class breaks using the quantile method. Users can change the number of class breaks (3 to 7) and can change the classification method (quantile, equal interval, or natural break). Class breaks are calculated using all the data in a user’s query. This means that the same legend will be used for all the years or advanced options queried. When a query is changed (e.g. dropping or adding a year), the class breaks are re-calculated using the default setting of 5 class breaks divided into quantiles. There is no one correct way to assign class breaks to a data set, and different methods will produce different map patterns, especially if data are skewed or include extreme outliers. The way in which data are presented can influence how trends are interpreted.

Quantiles: This method is a variation of percentiles and assigns the same frequency of data points to each class break (or group) based on their ordered ranking. For example, four quantiles divide the data into four equal bins known as quartiles, and classes are usually centered on the median (a robust indicator of central tendency). The range of the quantile class breaks can be unequal as long as the same number of data points is allotted to each group. Quantile classification is well suited for normally distributed (bell shaped) data. There are different ways to handle data that have identical values. For the Tracking portal, all records with identical values are placed into the same class break group, and thresholds of subsequent classes may have to be adjusted to maintain the same number of data points in each break group. Increasing the number of class breaks can minimize the distortion caused by adjusting for tied data values. Quantile classifications are commonly used in public health spatial visualizations.
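
The quantile method can be sketched as follows. This is a simplified illustration, not the Tracking portal's implementation: it takes the upper bound of each class from the ordered ranks and assigns identical values to the same class, but it does not adjust later thresholds to rebalance the classes after ties. The function names are hypothetical.

```python
from bisect import bisect_left
from collections import Counter


def quantile_breaks(values, n_classes=5):
    """Upper bound (inclusive) of each class, from equal-sized rank groups."""
    ordered = sorted(values)
    n = len(ordered)
    breaks = []
    for k in range(1, n_classes + 1):
        last_rank = (k * n + n_classes - 1) // n_classes  # ceil(k * n / n_classes)
        breaks.append(ordered[last_rank - 1])
    return breaks


def classify(value, breaks):
    """Index of the first class whose upper bound is >= value."""
    return bisect_left(breaks, value)


values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
breaks = quantile_breaks(values, n_classes=5)
print(breaks)
print(Counter(classify(v, breaks) for v in values))

# With many tied values, identical records share one class and a class can be empty.
tied = [1, 1, 1, 1, 5, 6, 7, 8, 9, 10]
tied_breaks = quantile_breaks(tied, n_classes=5)
print(Counter(classify(v, tied_breaks) for v in tied))
```

The second example shows the distortion that ties introduce: the four identical values fill the first class and leave the second class unused.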

Equal Interval: This method divides the data value range into equal segments for predictable and equal class ranges. Unlike quantile class breaks, the number of data points in each class can vary (but the interval range from one class break to another remains the same). Equal interval class breaks may not be the best option when the data are skewed (have several data points that are very different from the rest). Because the method divides the value range equally, some break groups may not contain any values if the data are not normally distributed. This would reduce visualization effectiveness by eliminating the use of one fill color in the resulting maps.
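
A minimal sketch of equal interval classification, using hypothetical skewed data, shows how classes can end up empty. The function names are chosen only for this illustration.

```python
from bisect import bisect_left


def equal_interval_breaks(values, n_classes=5):
    """Upper bound (inclusive) of each class; every class spans the same width."""
    low, high = min(values), max(values)
    width = (high - low) / n_classes
    return [low + width * k for k in range(1, n_classes)] + [high]


def class_counts(values, breaks):
    """Number of data points that fall in each class."""
    counts = [0] * len(breaks)
    for value in values:
        counts[bisect_left(breaks, value)] += 1
    return counts


# Hypothetical skewed data: one extreme value stretches the range.
skewed = [1, 2, 3, 4, 100]
breaks = equal_interval_breaks(skewed, n_classes=5)
print(class_counts(skewed, breaks))
```

Here four of the five values share the lowest class and three classes are unused, so a map drawn with these breaks would show only two of its five fill colors.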

Natural Breaks: Natural breaks, also known as “Jenks,” are based on optimization methods proposed by George Jenks in the 1970s. This method searches for natural gaps or “valleys” in the data based on the original break groups produced from the quantile analysis (described above). Class breaks are placed in these gaps in a way that minimizes variation within each class group and allows for some variation between different classes. With this approach, enumeration units that share a color are statistically more similar to each other than to units in other color classes. CDC’s Tracking Network uses a standard algorithm to determine natural breaks for each query. The algorithm uses an iterative process, meaning it is repeated based on statistical methods, until it determines the best combination of breaks after numerous trials. The process is repeated until the sum of the within class deviations reaches a minimal value or until all break combinations are examined and the combination with the lowest squared deviation from the class mean is selected. The algorithm used on the Tracking Network does not incorporate a goodness of fit summary statistic.
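
The idea of examining break combinations and keeping the one with the lowest within-class squared deviation can be sketched with an exhaustive search. This is an illustration under stated assumptions, not the algorithm used on the Tracking Network: it tries every possible placement of breaks in the sorted data, which is only practical for small data sets, and it does not start from quantile groups. Widely used implementations rely on more efficient optimization. The function names are hypothetical.

```python
from itertools import combinations


def within_class_ssd(group):
    """Sum of squared deviations of a class from its own mean."""
    mean = sum(group) / len(group)
    return sum((x - mean) ** 2 for x in group)


def natural_breaks(values, n_classes):
    """Upper bound of each class for the split with the lowest total SSD."""
    ordered = sorted(values)
    n = len(ordered)
    best_cost, best_groups = float("inf"), None
    # Each combination picks the positions where one class ends and the next begins.
    for cuts in combinations(range(1, n), n_classes - 1):
        edges = (0, *cuts, n)
        groups = [ordered[a:b] for a, b in zip(edges, edges[1:])]
        cost = sum(within_class_ssd(g) for g in groups)
        if cost < best_cost:
            best_cost, best_groups = cost, groups
    return [group[-1] for group in best_groups]


# Hypothetical data with two visible gaps.
data = [1, 2, 3, 10, 11, 12, 30, 31]
print(natural_breaks(data, n_classes=3))  # [3, 12, 31]
```

The breaks fall in the gaps between 3 and 10 and between 12 and 30, so values that share a class are closer to each other than to values in other classes.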

No events and missing data

States or counties where data were collected, but there are no health outcome cases or no measured occurrences of an environmental hazard, are labeled as "no events."

States or counties with "missing data" either did not collect data or did not report data to CDC. For example, some counties and states do not have air monitors, and some community water systems do not sample or test for each hazard every year or every reporting period. Missing data can occur in other content areas as well. Additional data may be available through the original data source.

Privacy and Confidentiality

For the Tracking Program, data privacy means that health data will not be used for anything other than the specific public health reason for which they were shared. Confidentiality means making sure that health information is only seen by people who are authorized to have access to it.

No personal identifiers are included in the data tables for CDC's Tracking Network. For rare health outcomes, the number of cases for a selected place, time, and group of people is often small, particularly in sparsely populated areas. In these situations, data are aggregated across place, time, and people in order to balance, as much as possible, utility and the protection of confidentiality. For example, a measure may be presented as an average over a 5-year period instead of an annual number, or a measure might only be viewed by race or age separately but not at the same time.

When small cell counts exist, they are suppressed, meaning their values are not shown. Non-zero counts that are less than six are suppressed for counties with a total population that is less than 100,000 persons. Some datasets have special suppression rules. For cancer data, all non-zero counts that are less than 16 are suppressed. This is done for confidentiality purposes and also to account for stability.

To prevent users from being able to learn the identity of any person included in the data, some selected non-zero counts that are 6 or greater may also be suppressed. For example, if for a given county with a total population that is less than 100,000 persons, there are 7 asthma hospitalizations among males and 3 among females, then the 3 would be suppressed. The 7 hospitalizations among males would also be suppressed, so that the county total (10) and the male total (7) could not be used to calculate the female total (3).
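
The two suppression steps can be sketched as follows. The thresholds come from the rules above; the choice of which additional cell to hide (the smallest remaining non-zero count) is an assumption made for this illustration, as are the function name and the use of `None` to mark a suppressed value.

```python
def suppress_counts(counts, county_population, threshold=6,
                    population_cutoff=100_000):
    """Return counts with suppressed cells replaced by None.

    `counts` maps group labels to counts that add up to a published total.
    """
    if county_population >= population_cutoff:
        return dict(counts)

    # Primary suppression: non-zero counts below the threshold.
    result = {k: (None if 0 < v < threshold else v) for k, v in counts.items()}

    # Complementary suppression: a single hidden cell could be recovered by
    # subtraction from the total, so hide one more cell.
    # ASSUMPTION: the smallest remaining non-zero cell is the one chosen.
    hidden = [k for k, v in result.items() if v is None]
    if len(hidden) == 1:
        candidates = [k for k, v in result.items() if v is not None and v > 0]
        if candidates:
            result[min(candidates, key=lambda k: counts[k])] = None
    return result


# The asthma hospitalization example: both cells end up suppressed.
print(suppress_counts({"male": 7, "female": 3}, county_population=80_000))
```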

Stability

Rates, proportions, and percentages are checked for their stability so that trends over time and between geographic areas or persons can be evaluated with reasonable confidence. Instability can arise from small numerators (number of cases or events) or small denominators (populations or subpopulations). Any rate or measure with a relative standard error (RSE) greater than or equal to 30% is flagged as unstable or, in the case of cancer data, suppressed. Statistical stability and confidentiality go hand-in-hand. The numerator required to generate an RSE of less than 30% is well above the cutoff level of 6. Suppression of any rate or measure due to instability also protects confidentiality.
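
The link between the 30% RSE threshold and the count cutoff can be checked with a short calculation. The sketch assumes the event count follows a Poisson distribution, so that the RSE of a rate is approximately 100 divided by the square root of the count; the notes do not state the exact formula used on the Tracking Network, and the function names are hypothetical.

```python
from math import sqrt


def relative_standard_error(count):
    """RSE in percent, assuming a Poisson-distributed event count."""
    if count <= 0:
        raise ValueError("RSE is undefined for a count of zero or less")
    return 100 / sqrt(count)


def is_unstable(count, threshold=30.0):
    """Flag a rate as unstable when its RSE is at or above the threshold."""
    return relative_standard_error(count) >= threshold


# Smallest count whose rate would not be flagged under this assumption.
smallest_stable = next(c for c in range(1, 1_000) if not is_unstable(c))
print(smallest_stable)  # 12
```

Under this assumption a rate needs at least 12 events to fall below the 30% threshold, which is consistent with the statement that the required numerator is well above the suppression cutoff of 6.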

Smoothing

One way to stabilize data is to generate smoothed rates, proportions, and percentages. Smoothed measures are geographically based averages that use algorithms to incorporate information from neighboring areas (typically areas from within the same state) to stabilize results from sparsely populated areas. This reduces the variability in the data, allowing patterns to emerge, but it increases the bias in the estimates for each small area. Smoothed rates or measures for any single county should not be interpreted as the result for that county; instead, such rates or measures are used to identify patterns or trends across a state or group of counties.
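
One simple form of smoothing pools each area with its neighbors before computing a rate. The sketch below is only an illustration of the general idea; the notes do not specify the algorithm used for smoothed measures on the Tracking Network, and the county names, counts, and neighbor lists are hypothetical.

```python
def smoothed_rate(area, cases, population, neighbors, per=100_000):
    """Pooled rate for an area and its neighbors (simple spatial averaging)."""
    pool = [area, *neighbors.get(area, [])]
    pooled_cases = sum(cases[a] for a in pool)
    pooled_population = sum(population[a] for a in pool)
    return pooled_cases / pooled_population * per


# Hypothetical counties: A is sparsely populated, B and C are its neighbors.
cases = {"A": 1, "B": 40, "C": 59}
population = {"A": 1_000, "B": 100_000, "C": 99_000}
neighbors = {"A": ["B", "C"]}

crude_a = cases["A"] / population["A"] * 100_000
print(round(crude_a, 1))                                          # 100.0
print(round(smoothed_rate("A", cases, population, neighbors), 1))  # 50.0
```

The single case in county A produces a crude rate of 100 per 100,000, while the pooled value is 50. The smoothed value is more stable but mostly reflects the neighbors, which is why it should not be read as the result for county A itself.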

Transition from ICD-9-CM to ICD-10-CM

The International Classification of Diseases is a system for classifying diagnoses and reasons for health care visits. ICD codes are periodically updated by the National Center for Health Statistics using a standardized clinical revision method. This may change how certain diseases and health care visits are defined and recorded by doctors and other health care professionals. When codes are updated, the data collected before a coding change may not be directly comparable to data collected after a coding change, and interpretation becomes complex.

On October 1, 2015, in the United States, the International Classification of Diseases, Tenth Revision, Clinical Modification (ICD-10-CM) replaced the ninth revision (ICD-9-CM) for coding of medical terminology and disease classification. As a direct result of this change, there are nearly five times as many diagnosis codes in ICD-10-CM as in ICD-9-CM. This coding change affects information classifications for hospital discharge, emergency department, and outpatient records for administrative and financial transactions in all healthcare settings. In 2015, data were coded as ICD-9-CM from January to September and as ICD-10-CM from October to December. Differences in counts and rates in years prior to 2015 (ICD-9-CM) compared with 2015 (ICD-9-CM and ICD-10-CM) and subsequent years (ICD-10-CM) could be a result of this coding change and not an actual difference in the number of events.
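
When working with exported data, one way to guard against comparing across the coding change is to label each record or year by the code set in effect. The sketch below uses the October 1, 2015 transition date given above; the function names and the "mixed" label for 2015 are conventions invented for this illustration.

```python
from datetime import date

ICD10_CM_START = date(2015, 10, 1)  # transition date in the United States


def coding_system(record_date):
    """Code set in effect on the date of a record."""
    return "ICD-10-CM" if record_date >= ICD10_CM_START else "ICD-9-CM"


def year_coding(year):
    """Code set covering a full calendar year; 2015 spans both."""
    if year < 2015:
        return "ICD-9-CM"
    if year == 2015:
        return "mixed"
    return "ICD-10-CM"


def directly_comparable(year_a, year_b):
    """True when both years are the same or fall entirely under one code set."""
    a, b = year_coding(year_a), year_coding(year_b)
    return year_a == year_b or (a == b and a != "mixed")


print(coding_system(date(2015, 9, 30)))   # ICD-9-CM
print(coding_system(date(2015, 10, 1)))   # ICD-10-CM
print(directly_comparable(2013, 2016))    # False
```

A result of False does not mean the years cannot be compared at all, only that a difference between them may reflect the coding change rather than a change in the number of events.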
