This post was authored by Seth Erickson (UCSB), Neggin Keshavarzian (Princeton), Sophia Lafferty-Hess (Duke), Wanda Marsolek (Minnesota), and Jennifer Moore (WashU).
In 2022, the Data Curation Network received a grant from the Institute of Museum and Library Services (IMLS) for the project “Developing Specialized Data Curation Training to Address Needed Expertise in Focused Areas” [RE-252343-OLS-22]. This project relied upon a cohort model that brought together information and data professionals from around the country to develop new curricula for four unique data types: code, simulations, geospatial data, and scientific images. Each cohort worked with a DCN mentor to develop its curriculum. The curricula were tested during a pilot workshop held at Duke in October 2023, where we gathered feedback from the community on the learning experience so that we could iterate and improve.
Within the DCN we hold a shared value of making things openly accessible for all. For this reason, we are happy to share that all curriculum materials generated as part of this project, as well as materials from our core CURATED curriculum, are now openly available within the DCN GitHub, with archival copies preserved through the University of Minnesota.
We also recognize the importance of reflective practice: looking back on a project to consider what we accomplished, what we learned, and how we might grow in the future. Below, we have invited the mentors for each data type to share some of the learning materials as well as some of their own lessons learned.
Code
Code curriculum hosted on GitHub
Archived version of the curriculum
The code cohort’s goal was not to turn curators into software developers but to provide foundations for improving code-based datasets. The curriculum we developed outlines broad concepts like computing platforms, software dependencies, and documentation (to name a few), and it emphasizes common programming mistakes that can interfere with computational reproducibility. We hope curators will find these resources useful and empowering, but we’ve also found that there is a fine line between “too much” and “too little” when it comes to technical detail. Finding the right balance will take time and more opportunities to develop the curriculum—we are fortunate to have both!
Simulations
Simulation curriculum hosted on GitHub
Archived version of curriculum
The simulation cohort, Tidy Sim, was made up of simulation research experts and novices, which actually made for a well-rounded group. Those who had more experience helped the others catch up, and those with less experience slowed the others down (in a good way), which helped the group make connections and provide context. In this way, the Tidy Sim cohort had a built-in peer review system. There were no weak links – our deficits were at times our strengths.
We figured out near the end of our time working together that working meetings were key. For future iterations, we recommend scheduling all meetings in advance and scheduling them more often than you think you will need. A meeting can always be canceled if there is nothing substantive to discuss or work through. Meetings are helpful deadlines – let’s face it, accountability is a big deal.
Scientific Images
Scientific Images curriculum hosted on GitHub
Archived version of curriculum
The scientific images cohort brought together a variety of research and curation experience. A challenge we had from the beginning was figuring out how to scope and focus the data type. Considerations for curating scientific images can vary depending on the field of research and the instrument producing the image. Therefore, our group used the CURATED steps to explore some of the complexity around curating scientific images, and we provided training on common image curation tools, particularly ImageJ (Fiji). We also developed a participant-focused role-playing exercise on the “Request” step of CURATED using an AI chatbot. You can paste the instructions into a chatbot of your choice and then ask “Dr. Roe Bott” questions about the dataset that you are curating.
Geospatial
Geospatial data curriculum hosted on GitHub
Archived version of curriculum
The geospatial cohort’s wealth of knowledge caused the earth to tilt on its axis, as demonstrated by our mascot, Globy. Members of the cohort were geographically distributed in the east-to-middle US, but we had a hotspot in Florida. Distilling the salient information into a curriculum, and then condensing it into a short workshop, was no small task. We came to appreciate which pieces needed more or less time. Software installation is an example of something that needs more time: it should be included as part of the workshop rather than assigned as pre-work. Less time might be spent on topics that, although important to using GIS data, may not need the depth we provided in the workshop. One example is projected coordinate reference systems. They are an important topic, indeed, but the amount of relevant information has to be right-sized.
Final Thoughts and Next Steps
The project team first wants to thank the cohort team members (listed below) for their amazing work. Without their hard work this project would not have been possible. One of the overall lessons learned from the project is the power of community and what we can accomplish when we work together on a shared task. As one participant of the pilot workshop shared: “For me, this workshop showcased the DCN’s unique position and ability to bring experienced practitioners and novice curators together in a joint learning experience, and to advance the tools, techniques, and principles of data curation.”
We are also happy to see these curricula being extended to provide a new training opportunity in 2025 for the curation community in partnership with NIH. These workshops will pair the curricula by data type, Code with Simulations and Geospatial Data with Scientific Images, in two 2-day hands-on training experiences held in Bethesda, MD. See more about this new workshop series.
The project team has also been presenting at numerous conferences to share out our lessons learned and engage with others doing similar work. We are excited to be at IDCC this February and will be sharing more lessons learned in our paper. We hope to engage with some of you there!
Thanks to everyone who has participated in this project!
Code
Mentor: Seth Erickson
- Greg Janée, Director, Library Research Data Services, University of California at Santa Barbara
- Nick Ruhs, Research Data Management Librarian, Florida State University
- Talya Cooper, Software Curation Specialist, New York University
- Kaypounyers Maye, Scholarly Engagement Librarian for Social Sciences and Data, Tulane University
Simulation data
Mentor: Wanda Marsolek
- L. Wynholds, Research Data Librarian, University of California at Los Angeles
- Heather Shimon, Science & Engineering Librarian, University of Wisconsin-Madison
- Fernando Rios, Research Data Management Specialist, University of Arizona
- Girmaye Misgna, Mapping & Geospatial Data Librarian, University of Pennsylvania
Scientific images
Mentor: Neggin Keshavarzian
- Mariah Kenney, Data Curator and Metadata Librarian, Brain Image Library
- Amy Schuler, Director, Information Services & Library, Cary Institute of Ecosystem Studies
- Sarah Wright, Research Data and Life Sciences Librarian, Cornell University
- Paul Gignac, Associate Professor, Director of Global Graduate Programs, University of Arizona College of Medicine
Geospatial
Mentor: Jennifer Moore
- Leighton L Christiansen, Data Curator, National Transportation Library, US DOT
- Kelly Grove, GIS and Earth Sciences Librarian, Florida State University
- Tim Norris, Data Scientist, University of Miami Libraries
- Melinda Kernik, Spatial Data Analyst and Curator, University of Minnesota
This project was funded by the Institute of Museum and Library Services and led by Sophia Lafferty-Hess at Duke University in partnership with the University of Minnesota (Wanda Marsolek and Mikala Narlock), Princeton University (Neggin Keshavarzian), Washington University in St. Louis (Jennifer Moore), University of California – Santa Barbara (Seth Erickson), and the Association of Research Libraries (Cynthia Hudson Vitale).
Institute of Museum and Library Services
The Institute of Museum and Library Services is the primary source of federal support for the nation’s libraries and museums. We advance, support, and empower America’s museums, libraries, and related organizations through grantmaking, research, and policy development. Our vision is a nation where museums and libraries work together to transform the lives of individuals and communities. To learn more, visit www.imls.gov and follow us on Facebook and Twitter.