What Data Should Not Be Open?

Because Open Data are meant to be discoverable, accessible, and reusable by others, including the general public, some types of data should not be made open. These include sensitive and confidential data, such as:

  • medical information
  • financial records
  • personal information (not de-identified)
  • information containing trade secrets
  • research data under embargo

If you work with human subjects, the Data Preparation Guide from the ICPSR (Inter-university Consortium for Political and Social Research) has an entire chapter devoted to Preparing Data for Sharing, including considerations for protecting respondent confidentiality and treating direct and indirect identifiers when preparing datasets for public use.

The U.S. Department of Health & Human Services also provides guidance for de-identifying protected health information (PHI) in accordance with the Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule. The guidance defines the two de-identification methods the Privacy Rule permits:

  • safe harbor: removing the 18 specified identifiers; also called the heuristic method
  • expert determination: a qualified expert applies statistical or scientific methods and determines that the risk of identifying subjects from the data provided is very small; some of the 18 specified identifiers may be retained, with other information scrambled
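
As a rough illustration of the safe harbor idea, the Python sketch below removes a few direct identifiers from a record and coarsens dates and ages. The field names (name, email, birth_date, and so on) and the deidentify function are hypothetical and chosen only for this example. The sketch covers only a handful of the 18 identifier categories, so it is not a complete or compliant safe harbor implementation.

```python
# Hypothetical field names for illustration only; a real dataset must be
# checked against all 18 HIPAA identifier categories.
DIRECT_IDENTIFIER_FIELDS = {"name", "email", "phone", "ssn", "street_address"}


def deidentify(record, identifier_fields=DIRECT_IDENTIFIER_FIELDS):
    """Return a copy of one record with direct identifiers removed."""
    cleaned = {
        key: value
        for key, value in record.items()
        if key not in identifier_fields
    }
    age = cleaned.get("age")
    if isinstance(age, int) and age > 89:
        # Group ages over 89 into one category and drop the birth date,
        # since even the birth year would reveal such an age.
        cleaned["age"] = "90+"
        cleaned.pop("birth_date", None)
    elif cleaned.get("birth_date"):
        # Keep only the year of an ISO-formatted date (YYYY-MM-DD).
        cleaned["birth_date"] = cleaned["birth_date"][:4]
    return cleaned


records = [
    {"name": "Jane Doe", "email": "jane@example.org",
     "birth_date": "1980-05-17", "age": 44, "diagnosis_code": "J45"},
]
public_records = [deidentify(record) for record in records]
print(public_records)
```

Returning a cleaned copy rather than editing records in place leaves the original data untouched, which helps when the full dataset must be retained under restricted access while only the de-identified version is shared.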

Please also consult the Emory Office of Compliance site about HIPAA & Research.