The recent federal agency changes and confusion over access to federally funded research, including the removal of access to data, the alteration of existing datasets, and the removal and alteration of documentation, reaffirm the Data Curation Network’s dedication to data curation and preservation. We are grateful for the time and energy of our colleagues across the United States, including data stewards, researchers, and individuals, who are working to collect and steward these datasets. 

To aid in that endeavor, we created an abbreviated version of the CURATE(D) steps specifically with data rescue efforts in mind. To maximize the utility of data in need of rescue, it is imperative that the data are curated and documented. Below are some considerations for those capturing research data from existing access points of concern.

  • Check what you’ve downloaded and be sure you can understand the file(s). Is the dataset you’ve downloaded complete? Are related documents (articles, README files, etc.) present as well? Did a copy of the metadata download automatically with the files, or do you need to download it separately?
    • To aid in understanding the data, now and into the future, create your own curator log; it will serve as the provenance record for the dataset. When did you download the data (the specific date, time, and time zone)? How did you download it? If there were parts of the data you DIDN’T download, on purpose or accidentally, note that too. (A minimal sketch of such a log appears after this list.)
  • Requesting additional context for the data is harder in a rescue situation. You do not want to place more burden on those who are feeling unstable right now. Keep notes on what data might be missing and on any questions you have. They may never be answered, but documenting the gaps may be useful for future researchers.
  • Augment the dataset by improving the documentation and metadata to aid in findability. Include information about why the data were downloaded and whether the download was part of a concerted effort to rescue data. As with other curation efforts, document any software, code packages, or additional documentation needed to use the dataset.
  • Transform the file formats if applicable, and consider how the files will be most usable and useful for researchers. Should you have one zip file? Multiple? Is there an arrangement that may help with understanding? (A minimal packaging sketch also follows this list.)
  • Evaluate where this data should be stored. Is your local repository the best fit? Would a generalist repository be better? Remember: some disciplinary repositories are funded by federal dollars and should be considered at least partially at risk at this moment. While some generalist repositories (e.g., OSF and Zenodo) are available to store datasets, consider whether storing the data privately is a better solution. If a data location and access point cannot be easily determined, store the content robustly so that this determination can be made in the near future.
  • Document your data rescue efforts so that others can find and use the data.
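
As a concrete illustration of the curator log suggested above, here is a minimal sketch in Python using only the standard library. The folder name, source URL, and download method are hypothetical placeholders, not DCN tooling; the point is simply to capture when and how the data were downloaded, any known gaps, and a checksum for every file.

```python
# A minimal, hypothetical sketch (not official DCN tooling): record when the
# files were captured, from where, and a SHA-256 checksum for each file so
# future users can verify that nothing has changed after download.
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

DOWNLOAD_DIR = Path("rescued_dataset")          # hypothetical folder of downloaded files
SOURCE_URL = "https://example.gov/dataset/123"  # hypothetical access point of concern


def sha256(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


log = {
    "downloaded_at": datetime.now(timezone.utc).isoformat(),  # date, time, and time zone (UTC)
    "source": SOURCE_URL,
    "method": "manual download via web browser",  # note how you captured the data
    "known_gaps": [],                             # list anything you did NOT download
    "files": [
        {
            "name": str(path.relative_to(DOWNLOAD_DIR)),
            "bytes": path.stat().st_size,
            "sha256": sha256(path),
        }
        for path in sorted(DOWNLOAD_DIR.rglob("*"))
        if path.is_file()
    ],
}

# Store the log alongside the data so the provenance travels with the dataset.
(DOWNLOAD_DIR / "curator_log.json").write_text(json.dumps(log, indent=2))
```

Keeping the resulting curator_log.json next to the files means the provenance record moves with the dataset wherever it is eventually deposited.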
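
And if you decide that a single zip file is the right arrangement for the transform step, a short sketch of that packaging (again assuming the hypothetical rescued_dataset folder above) might look like this:

```python
# A minimal sketch, reusing the hypothetical "rescued_dataset" folder above:
# package the files and the curator log into one zip so that data,
# documentation, and provenance travel together.
import zipfile
from pathlib import Path

DOWNLOAD_DIR = Path("rescued_dataset")
ARCHIVE = Path("rescued_dataset.zip")

with zipfile.ZipFile(ARCHIVE, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    for path in sorted(DOWNLOAD_DIR.rglob("*")):
        if path.is_file():
            # Keep paths relative to the parent folder so the original
            # arrangement is preserved when the archive is unpacked.
            zf.write(path, arcname=path.relative_to(DOWNLOAD_DIR.parent))
```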

We are grateful for our colleagues in IASSIST, including Lynda Kellam, for aggregating related efforts. We remain committed to FAIR data that aligns with the CARE principles. Regardless of funding mandates, or lack thereof, we will continue to advocate for responsible open scholarship and data stewardship.

Thanks to members of the Data Curation Network, including Lynda Kellam, Melinda Kernik, Joel Herndon, Jon Petters, and Shannon Farrell, for their thoughtful feedback on this piece. Original draft by Mikala Narlock.
