Remember 2019? That was the first year of piloting our collaborative data curation service in the Data Curation Network. Our 10 partner institutions submitted 74 datasets from their overall deposits that year (see below) to be matched with a data curator with domain and software expertise, and 95% of these datasets were successfully matched to one of our DCN curators, who reviewed them for quality and FAIRness. At our busiest, the DCN handled 11 datasets in one month (September), while the rest of the year we averaged 6 datasets per month (see below).
We are very proud of how smoothly the network ran in its first pilot year, and how many researchers were impacted (249!), but we wanted to test our capacity with more datasets. So rather than each partner institution choosing a subset of datasets to send to the network for curation, we tried something different in January 2020, which we called the “January Jam”.
For January 2020 – typically a busy month for dataset submissions due to the winter break – we asked each of our ten partners to submit every dataset they received to the DCN for curation matchmaking. Since Dryad has a much higher volume than the rest of the repositories, they chose to submit only 1 dataset per work day. Partners could still decide to curate a dataset locally (e.g., a repeat submitter, a tight deadline, etc.), but this approach allowed us to get a better picture of the overall demand and curation effort happening across the network.
This experiment went really well! In January 2020, 44 datasets passed through the DCN – more than half of what we saw in all of 2019! While 8 of these datasets were curated locally, the other 36 (~82%) were successfully matched to a DCN curator at another institution.
Datasets submitted during the January Jam event show a more representative sample of the domains and data types at each of our DCN institutions (see figures below).
DCN curators typically commit 5% FTE time to the DCN project. In January our curators logged 74.9 curation hours (43% of their commitment). This would seem to indicate that we haven’t reached our full capacity yet; however, our curation capacity for any discipline, data type, or file format will probably not match up perfectly to the datasets we receive in any given month. We also must consider the availability of our curators (e.g., vacation, existing workload) to complete a curation assignment by the deadline. This makes calculating our maximum operating capacity very tricky!
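For readers curious how a utilization figure like 43% falls out of the 5% FTE commitment, here is a minimal sketch of the arithmetic. The hours-per-month and curator-count values are illustrative assumptions, not actual DCN numbers:

```python
# Rough sketch of the capacity arithmetic. Only logged_hours (74.9) comes
# from the post; the other values are assumptions for illustration.

FTE_HOURS_PER_MONTH = 174   # assumed full-time working hours in a month
COMMITMENT_FRACTION = 0.05  # each curator commits ~5% FTE to the DCN
NUM_CURATORS = 20           # assumed curator count (not stated in the post)

committed_hours = FTE_HOURS_PER_MONTH * COMMITMENT_FRACTION * NUM_CURATORS
logged_hours = 74.9         # curation hours logged in January

utilization = logged_hours / committed_hours
print(f"{utilization:.0%}")  # roughly 43% under these assumed values
```

With different assumed curator counts or FTE baselines the committed-hours pool changes, which is one reason pinning down a true maximum operating capacity is tricky.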
Things went well in January – tracking the full picture of data curated across the network was a valuable addition to our implementation pilot. Therefore, we decided to continue this experiment into February, and see how it goes from there!
See more details about individual datasets on our website!