This post was authored by intern Kimberly Gisselle Carlo, as part of DCN’s partnership with the National Center for Data Services (NCDS). These internships are funded with Federal funds from the National Library of Medicine (NLM) and the National Institutes of Health (NIH).*
I participated in the NNLM National Center for Data Services (NCDS) internship this summer, with the Data Curation Network (DCN) as my site host and just one semester of the MLIS program at the University at Buffalo under my belt. I was given the opportunity to work and learn from a researcher’s perspective, as well as to learn what it takes to be a data curator or data librarian.
I entered this internship not expecting to use artificial intelligence (AI) at any point. When I found out that using AI to curate data was an option, albeit an experimental one, I instantly thought to myself, “no, thanks.” I held a view of AI that most people probably share: AI is unethical, it steals work from others, and we don’t understand precisely where its information comes from.
Generative AI in particular left a bad taste in my mouth, since I had mostly viewed this emerging technology from an artist’s perspective. Anyone can type a few words into a website, come out with an “art piece,” and say, “I created this!” AI platforms such as Midjourney misappropriate art from the websites and databases on which artists share their work. I even learned recently that AI therapy exists. I also know that journalists and writers worry that “AI technologies can undermine or even threaten journalism…” AI gives quick information at a time when people are used to getting answers rapidly, even if those answers are imprecise or not based on verified sources. Again and again, I was shown reasons to be averse to AI.
Working within the DCN alongside data curators, librarians, and researchers, I quickly learned that AI was a hot topic in the field, and I began to reflect on it a bit more. I learned about new technologies and programs I had no idea existed before this summer. Over the span of 10 short weeks, I was exposed to an immense amount of new information while reevaluating my career interests, including a newfound curiosity about data librarianship. This led me to be more open-minded about how generative AI could assist with my data curation project, which focuses on visualizing salaries for data curation positions advertised on the IASSIST job board.
Briefly, this summer I used ChatGPT to gather more information about salaries for data curator jobs posted between 2008 and 2024. I was given “messy data” from the IASSIST job board, a combination of qualitative and quantitative data that required cleaning. I was able to refine this dataset experimentally with AI, as well as with standard curation tools such as OpenRefine and Voyant. Because only a small number of institutions included salaries in their job postings, I had few salary ranges against which to compare ChatGPT’s answers. Many of the job postings I reviewed came from private institutions that did not list salary information in the posting or anywhere else online. I prompted ChatGPT, “Where does ChatGPT get its information to answer questions about salary?”, and ChatGPT told me that it draws on a combination of sources, including general knowledge (sources like the Bureau of Labor Statistics, industry reports, and salary websites), historical data, aggregate data (from various surveys and reports), and common knowledge (broad trends about various professions and locations). It was advantageous to go to a single website and have the chatbot gather information from a plethora of sources for my specific question, even knowing that this information may be imperfect, biased, or even inaccurate.

I think AI could be a valuable tool for jobseekers, giving them the information they need to advocate for themselves when negotiating pay, especially when job postings do not include salaries. However, we should keep in mind that the answers we receive will vary depending on how we phrase our questions to the AI tool.
I do think all data can be made valuable when assessed properly and made accessible to all. However, we need to be aware of where this information may be coming from, especially when we cannot trace its source, and we should disclose when the findings we share come from an AI tool such as ChatGPT. I am looking forward to seeing how AI can be used responsibly and ethically. Data curators could use generative AI tools to share their findings, create visualizations (as with Akkio, another tool I experimented with this summer), and refine messy data. Even so, we still need the minds and empathy of human data curators, who can continue to make data usable by all, as well as ethically sourced and responsibly distributed.
To cite this blog post, please use: Carlo, Kimberly Gisselle. (2024) “A Newcomer’s Thoughts on Artificial Intelligence.” Retrieved from the University of Minnesota Digital Conservancy, https://hdl.handle.net/11299/265723.
*This project was partially funded by Federal funds from the National Library of Medicine (NLM), National Institutes of Health (NIH), under cooperative agreement number UG4LM01234 with the University of Massachusetts Chan Medical School, Lamar Soutter Library. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.