Secondary Data Analysis

Some research involving existing data sets and archives may not meet the definition of "human subjects" research requiring IRB review; some secondary data analysis may be exempt from the HHS regulations at 45 CFR 46; and some secondary data analysis may require IRB review. Whether an analysis of secondary data requires IRB review turns in large part on whether the data is "identifiable." Data may contain "direct" identifiers (such as an individual's name or Social Security number) or "indirect" identifiers (that is, a coding system in which codes made up of letters, numbers, symbols, or a combination of these replace direct identifiers). The HHS Office for Human Research Protections (OHRP) has issued the following guidance on secondary analysis of data:

Under the definition of human subject at 45 CFR 46.102(f), obtaining identifiable private information or identifiable specimens for research purposes constitutes human subjects research. Obtaining identifiable private information or identifiable specimens includes, but is not limited to:

  • using, studying, or analyzing for research purposes identifiable private information or identifiable specimens that have been provided to investigators from any source; and
  • using, studying, or analyzing for research purposes identifiable private information or identifiable specimens that were already in the possession of the investigator.

Private information must be individually identifiable (i.e., the identity of the subject is or may readily be ascertained by the investigator or associated with the information) in order for obtaining the information to constitute research involving human subjects. 

Data is considered to be coded if:

  • identifying information (such as name or Social Security number) that would enable the investigator to readily ascertain the identity of the individual to whom the private information or specimens pertain has been replaced with a number, letter, symbol, or combination thereof (i.e., the code); and
  • a key to decipher the code exists, enabling linkage of the identifying information to the private information or specimens.

In general, OHRP considers private information or specimens to be individually identifiable when they can be linked to specific individuals by the investigator(s) either directly OR indirectly through coding systems. Conversely, OHRP considers private information or specimens NOT to be individually identifiable when they cannot be linked to specific individuals by the investigator(s) either directly or indirectly through coding systems.

For example, OHRP does not consider research involving only coded private information or specimens to involve human subjects if the following conditions are met:

1. the private information or specimens were not collected specifically for the currently proposed research project through an interaction or intervention with living individuals; AND

2. the investigator(s) cannot readily ascertain the identity of the individual(s) to whom the coded private information or specimens pertain because, for example:

  a. the investigators and the holder of the key enter into an agreement prohibiting the release of the key to the investigators under any circumstances, until the individuals are deceased (note that the HHS regulations do not require the IRB to review and approve this agreement); or
  b. there are IRB-approved written policies and operating procedures for a repository or data management center that prohibit the release of the key to the investigators under any circumstances, until the individuals are deceased; or
  c. there are other legal requirements prohibiting the release of the key to the investigators, until the individuals are deceased.

For further discussion of IRB review of secondary data analysis research, please see the SBS IRB's Guidance on Secondary Data Analysis.