
Data Sharing

Advances in research often happen when scientists can conduct larger studies that draw on many samples. However, collecting new samples from research participants can be very time-consuming and expensive, because it can take a long time to recruit new participants and collect their samples. It has therefore become common for scientists to share or exchange their sample collections with other scientists, which increases the total number of samples available for a research study. Sharing samples often saves money for researchers and for funding agencies. Sharing samples and data also allows research participants and their tribes to be included in the largest possible number of studies dealing with diverse health issues, so that they can potentially benefit from those studies.

The National Institutes of Health (NIH) is one of the largest funders of genetics research, and all of its funds come from taxpayer dollars. To learn as much as possible from the genetics research studies supported with public funds, and to maximize the public benefit achieved, the NIH developed a new database through which scientists can share genetic data from their studies with each other. The database also allows scientists to combine one dataset with other existing datasets, which increases the statistical power available to answer genetic questions. This online database is an example of data sharing and is called the Database of Genotypes and Phenotypes (dbGaP). Genotypes are the information or data that come from a person’s DNA. Phenotypes are physical measurements and characteristics of a person, such as height, weight, blood pressure, and disease status.

When dbGaP was created, the NIH also developed new guidelines asking researchers to submit genetic data from their studies to dbGaP if the genotyping was paid for with funds from an NIH grant. Researchers are asked to provide a data sharing plan stating when and how the data from their study will be released for use by other researchers. A researcher who does not or cannot share data must explain why data sharing is not possible. Researchers working on the Healing of the Canoe Project in Washington State, for example, developed a model data sharing plan that requires tribal approval for access to data. More information about tribal data control and options for data sharing is available in another section of this resource guide.

All data submitted to dbGaP must be coded or de-identified in such a way that nobody will be able to link a sample to a person. Because the data will contain genotypes and phenotypes, scientists have to be careful to remove personal identifying information such as names, addresses (including residential, mailing, and email), zip codes, dates of birth, social security numbers, and other types of identifying information. Data can be submitted to dbGaP only after the research institution’s IRB has considered the potential risks to individuals, families, groups, and populations associated with the data, and has determined that submission to dbGaP is permissible. The IRB also verifies that submission of the data is consistent with the informed consent provided by study participants.

Researchers who wish to access and use the data in dbGaP must apply through an NIH Data Access Committee (DAC). Researchers must state how they will use the data, name the members of their research team who will have access to it, and explain how they will ensure the data is used and managed properly. Before approving a request for dbGaP data, the DAC makes sure that the researchers promise to use the data only in a way that is consistent with the informed consent provided. If the researchers fulfill all requirements, they may be granted access.

The DAC is in place to review requests for data and to oversee ongoing data use in order to reduce risks to research participants. Increasingly, the data deposited into dbGaP comes from whole genome sequencing projects, in which all of a person’s genetic information is collected, analyzed, and then deposited. Great care must be taken to ensure that only those with authorized access can use whole genome data, that individuals are not re-identified, and that potentially stigmatizing research is not carried out.

Data in dbGaP must be de-identified at the individual level, meaning that it should not be possible to link names or other identifying information to the data collected about a person. However, if tribes participate in genetic studies in which the researcher plans to deposit the data into dbGaP, they should think carefully about whether to allow tribal affiliation to be associated with a sample. Tribes should carefully evaluate research proposals of this sort and discuss the risks, benefits, and implications with the researcher before deciding whether to allow their data to be deposited into dbGaP. One risk of having tribal affiliation associated with the sample is that potentially stigmatizing research may be carried out, or that a researcher might use the data for ancestry studies. On the other hand, if tribal affiliation is not mentioned, researchers might not be able to carry out a study with individuals from one tribe and might instead pool together all of the samples labeled as “American Indian.” The resulting analysis might be less scientifically valid. The researcher also would not be able to return results to the tribe if the samples are labeled only as “American Indian.” More information on tribal data control and options for data sharing is available in another section of this resource guide.

Discussion Questions:

  1. What does your tribe think about existing federal data sharing policies? 
  2. Would your tribe have any control over how the samples or the data are used?  How would your tribe maintain control?
  3. Which of the data sharing options discussed above would work best for your tribe?
  4. What issues would dbGaP raise for your tribe?
  5. Would it be okay for researchers to use your tribal affiliation (i.e., your tribe’s name) in publications, or would you prefer that your tribe remain anonymous (e.g., described only as “an American Indian tribe”)?  What are the reasons for your decision?
