This post was authored by Jen Jordan, Repository Services Analyst at Duke University (Left), and Alex Provo, Research Curation Librarian at New York University (Right).
Intro
Jen: As one of the newest DCN members, I had the opportunity to gather with a small group of other curators back in October for the first in-person workshop since the pandemic began. It was a wonderful introduction to the group, and I was looking forward to the larger All Hands Meeting in June, which I was scheduled to attend with several colleagues from Duke Libraries. Alas, Newark Airport and the weather had other plans: my flight was canceled as I was grabbing my keys to walk out the door and drive to the airport.
Alex: Like Jen, I am a new DCN member. I recently joined NYU’s Data Services department after working in our technical services unit, so June’s activities were my first official introduction to the group. I was able to attend the workshop earlier in the week, which was an excellent hands-on warmup to the All Hands Meeting.
If COVID delivered anything positive, it made us all better at improvising ways to stay connected, both at work and in our personal lives. So it likely comes as no surprise that those in charge of the meeting (shout out to the AHM Planning Committee) quickly sprang into action to hybridize what was intended to be a fully in-person gathering. Although it would have been preferable to see everyone in person, the hybrid format meant that several attendees who couldn’t make the trip were still able to join virtually.
Day 1
What follows are a few highlights from day one.
I would be remiss not to mention the ChatGPT-generated Mad Libs icebreaker! Our breakout group committed to a theme, which should be easy to identify in the following snippet:
With great determination, the (12) dog embarked on a quest to (13) hide the lost data. It was a long and (14) dreamy journey, but finally, they (15) ate the missing information and restored order. From that day forward, the data management team became legends, admired for their (4) pompous skills and ability to (3b) hunt any challenge that came their way.
If only we had to hunt our challenges! The opposite feels more like reality, though we are quite resourceful at tracking down solutions. OK, on to the actual meeting…
The first session of the day was the SIG Palooza, where we heard updates from each of the DCN’s special interest and working groups, followed by a brainstorm of other topics potentially worthy of their own group. These groups wrestle with weighty, ongoing issues in our work, from racial equity in data curation activities and policies to keeping pace with the exponential increase in size of many types of data deposits. Each topic could occupy its own blog post, so for now I’ll just express my gratitude for their work and look forward to learning and sharing more.
Next up was the workflow diagramming session, which was meant to be a whiteboarding activity led by my colleague Joel Herndon. We were a bit uncertain about how a hybrid version of this would work, but I thoroughly enjoyed what folks brought to the Jamboards, which were a blend of on-the-fly creations and smart reuse of existing slides/documentation. It was interesting to see the similarities between institutions, despite differences in repository platforms, tools, and levels of curation. The standards established by the DCN in its CURATE(D) checklist appeared throughout. As a small aside, I have most enjoyed learning about the ‘R’ (Request) step. Each step of the process is important, but the R can be the most critical, as a response is often required to move a dataset forward to publication. I suspect most of us have challenging examples related to this particular step, so I was amused to see clever representations of this step in various workflow diagrams.
There were more similarities during the institutional report outs, most of all among academic institutions. It was heartening to see that so many have received funding for new staff positions that include the word “data” in their titles. In terms of challenges, a recurring theme was managing researcher expectations for what kinds of data they can deposit (e.g., institutional repositories generally can’t accommodate sensitive data), and many have felt the effects of the new NIH data management and sharing policy. By March of this year, our curators at Duke had conducted more consultations with researchers in the span of a month than they had in the entire year prior. I greatly enjoyed checking out NYU’s new UltraViolet repository, and although I had heard some of their numbers before, I am continually amazed at the level the Michael J. Fox Foundation is operating at in terms of big data. Half a petabyte…no big deal.
The Data Accessibility session that closed out the day was possibly the most impactful and humbling for me. Rachel Woodbrook led this session, and about half of it involved time spent using our computers’ screen readers to independently explore a spreadsheet populated with sample data. I confess that I was not prepared for how disorienting and overwhelming that experience was going to be, but I was thankful for the introduction. What I realized is that I was sorely deficient in my understanding of accessibility issues as a whole, so I’ve been spending some time since then trying to learn more. If I could recommend a timely podcast, Endless Thread did a recent episode with moderators from the Reddit community r/Blind, where they discussed how Reddit’s recent changes to API access were going to make it difficult (if not impossible) for blind moderators who rely on third-party apps to moderate, because of how poor Reddit’s own accessibility tools are. Whether or not readers here are Reddit users, the moderators interviewed demonstrated how they interact with screen readers, which I found interesting and useful. Circling back to the session, I recommend that folks who haven’t investigated how screen readers work experiment with a few common scenarios (email, webpage) before diving into a tabular dataset. I suspect the differences will be striking to most people who don’t require screen readers to navigate digital information.
That was it for the first day! I missed out on the social aspects of the gathering, but derived a lot of value from what I was able to participate in.
Day 2
By day two, we were icebreaker-ed out (perhaps an indication that our hybrid group had quickly developed a sense of community and rapport). After breakfast, we embarked on a day full of group discussions and hands-on activities in both small and large groups.
To start off, facilitators rotated among three groups for co-learning sessions themed around automating curation activities, documentation, and slow curation. The format was excellent both for getting to know a smaller cluster of DCN colleagues and because attendees didn’t have to choose just one topic of discussion. In fact, each topic flowed and connected with the others. In my automation session, we talked about automated metadata extraction and using metadata APIs to create README files; later on, in my documentation session, we had a great discussion on the distinctions between metadata and README files. I especially appreciated the chance to meditate on what “Slow Curation” might look like. Jumping off from the concepts of Slow Librarianship and Slow Archives, we talked about the centrality of relationships, different kinds of power, and how to respect varied capacities and temporalities. One insight that particularly resonated with me was that relationships are perhaps more built into our work than into other niches of the field, in that the curation process inherently involves a lot of back-and-forth and direct communication with the researcher.
Following a break, attendees had two options for hands-on work: helping curate a dataset or a DCN primer edit-a-relay. As a new curator, I opted to work with the dataset to get more practice. It was invaluable to hear from both my small group members, who had science expertise I didn’t, and the other groups, since everyone uncovered different details of the submitted files. We had a thorough discussion about spectral data and associated software.
Our next activity, led by Heidi Imker and Sandi Caldrone, was a brainstorming session on professional development. We sent a flurry of post-its about curation actions and the importance of curation to the collaborative Jamboard (image below), which will surely be a helpful bank of language curators can use to craft everything from job descriptions to self-evaluations.
After lunch, we broke into small groups again for another round of co-learning sessions. Facilitators rotated among our groups to guide us in looking at a potential DCN Ethics Statement spearheaded by the Racial Justice Interest Group, walking through learning modules and providing feedback, discussing the future of DCN Primers, and brainstorming research topics for ourselves and the wider network. Again, I was glad not to have to choose between these topics, since each showed a different way to get involved with the work of DCN. As a new member, I was especially glad to see how foregrounded ethics were throughout the day.
To wrap up, we all reconvened on Zoom and in the main meeting room. Shawna gave an update on behalf of the Realities of Academic Data Sharing (RADS) team. Again, as a new curator, I found it fascinating and extremely helpful to get a big-picture sense of the labor, cost, and realities of where and how data is being shared. As a former metadata librarian, I was of course especially interested in the team’s analysis of metadata quality using the FAIR principles as dimensions. Finally, we heard from the DCN Advisory Board before some folks went on a tour of Princeton’s amazing Makerspace.
Conclusion/wrap up
Jen: It has been an unusually busy July for me, and thus my contributions to this blog were somewhat delayed. However, this delay has provided me the opportunity to share how excited I am about the recent announcement that Duke will be hosting next year’s AHM! As thankful as I was to still be able to participate in the meeting this year, I hope to see you all in person next spring.