When preparing to publish your research data, there are many things to consider:
How do I select which data to publish?
Selecting data to publish depends on:
- the overall objectives of your research project
- how you have presented or published findings from your research with that data
- whether your data have any special considerations, such as participant identities to protect, or whether you have used data from a commercial source that limits or prohibits dissemination of the data
Which file formats should I publish?
When publishing your data, non-proprietary file formats are the best option. Because they can be opened in many different software applications, others can re-use the data more easily. You can learn more about recommended file formats under What and How to Preserve on Emory's Research Data site.
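As one illustration of a non-proprietary format, the sketch below serializes tabular records as CSV text using only the Python standard library. The function name `to_csv_text` and the sample column names are hypothetical, chosen for this example; CSV is offered here as a common open format, not as a format this guidance specifically prescribes.

```python
import csv
import io


def to_csv_text(fieldnames, rows):
    """Serialize a list of dict records as CSV text (hypothetical helper).

    fieldnames: column names, in the order they should appear.
    rows: list of dicts keyed by those column names.
    """
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()      # first line holds the column names
    writer.writerows(rows)    # one line per record
    return buffer.getvalue()


# Example records with made-up column names and values.
sample = [
    {"id": 1, "value": 3.5},
    {"id": 2, "value": 4.0},
]
csv_text = to_csv_text(["id", "value"], sample)
```

The resulting text can be saved with a `.csv` extension and opened in spreadsheet software, statistical packages, or a plain text editor.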
What documentation should I publish with my data to make them meaningful to others in the future?
The more thoroughly you document your data during the data collection and analysis phases, the easier the data will be for future users to understand. Some types of data have established metadata standards to follow. If your discipline has no such standard, we recommend including a “readme” file that provides context about your data. Learn more about metadata and other descriptive information.
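A readme file can be assembled by hand, but a small script helps keep it consistent across datasets. The following is a minimal sketch, assuming a plain-text layout; the function `build_readme` and the fields it records (title, creators, date, license, description, file list) are illustrative choices, not a required or standard template.

```python
from datetime import date


def build_readme(title, creators, description, files, license_name, created=None):
    """Return plain-text readme content for a dataset (hypothetical layout).

    files: dict mapping each file name to a one-line summary of its contents.
    created: a datetime.date; defaults to today's date when omitted.
    """
    created = created or date.today()
    lines = [
        f"Title: {title}",
        f"Creators: {', '.join(creators)}",
        f"Date: {created.isoformat()}",
        f"License: {license_name}",
        "",
        "Description:",
        description,
        "",
        "Files:",
    ]
    for name, summary in files.items():
        lines.append(f"  {name}: {summary}")
    return "\n".join(lines) + "\n"


# Example with made-up values.
readme_text = build_readme(
    title="Example survey dataset",
    creators=["A. Researcher", "B. Collaborator"],
    description="Responses collected for an example project.",
    files={"responses.csv": "One row per respondent."},
    license_name="CC0",
    created=date(2020, 1, 2),
)
```

Saving `readme_text` alongside the data files gives future users the context they need without requiring any special software.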
Where should I publish my data? Is that location stable and likely to endure?
Established data repositories are the most reliable places to publish your data. Disciplinary repositories are often the best option. For example, consider:
- GenBank for genetic sequencing data
- Dryad for general bioscience data
- ICPSR for quantitative and statistical data
For help in selecting a repository for your data, see our Research Data site.
Some data are not suitable for deposit in a repository but must still be kept for a certain length of time under government regulations or sponsor requirements. In most cases, responsibility for stewardship of the data rests with the principal investigator. Data retention is covered in more detail on the Research Data site.
Should I worry about publishing data from human subjects?
Take care before sharing certain kinds of data. Personal identifiers, both direct and indirect, should be removed from datasets. Some datasets should not be published for public access but instead released with provisions for restricted access, and some repositories allow data to be shared in restricted-use versions. Separately, you may not be ready to share your data until you (and your collaborators) have had a chance to publish initial findings or secure patents. In these cases, you may choose to embargo the release of your published data until a future date.
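Removing direct identifiers can be sketched as dropping the columns that hold them before a dataset is shared. The example below is a minimal illustration only: the column names in `DIRECT_IDENTIFIERS` and the function `strip_identifiers` are hypothetical, and dropping columns does not address indirect identifiers (combinations of fields such as birth date and postal code), which usually call for a separate review.

```python
# Hypothetical column names treated as direct identifiers in this sketch.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def strip_identifiers(rows, identifiers=DIRECT_IDENTIFIERS):
    """Return copies of the records without the listed identifier columns.

    rows: list of dicts, one per participant record.
    identifiers: lowercase column names to drop (matching ignores case).
    """
    return [
        {key: value for key, value in row.items() if key.lower() not in identifiers}
        for row in rows
    ]


# Example records with made-up values.
records = [
    {"Name": "Participant One", "Email": "one@example.org", "age_group": "30-39"},
    {"Name": "Participant Two", "Email": "two@example.org", "age_group": "40-49"},
]
shareable = strip_identifiers(records)
```

The original records are left unchanged, so the identifiable version can be retained securely while only the stripped copy is prepared for deposit.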