Summary: October Core e-Forum, “Regarding Library Database Cleanup”

Thank you all for an engaging start to the “Regarding Database Cleanup” e-Forum.

We concluded with 134 messages from 70 contributors! However, messages were delayed or not delivered halfway through the first day, so we plan to continue this discussion at a later date, tentatively early December 2020. Any emails posted to the list are viewable at the list archives:

The e-Forum participants included librarians from public, academic, and special libraries across the United States as well as Canada, Great Britain, Bermuda, Thailand, and Egypt. Their ILS/LSPs included (in alphabetical order) Aleph, Alma, Auto-Graphics’ Verso, Axiell’s Book-It, BookSystems by Atrium, EOS.web, Evergreen, Horizon, Koha, Polaris, Sierra, Symphony, TLC, Voyager, and WMS. The most frequently mentioned ILS was Symphony.

The database errors that our participants mentioned most often were those found in bibliographic data: typographical errors; overlaying brief records; handling or removing missing, lost, or in-transit items after a library-selected period of time; correcting issues in fixed fields; mismatched locations; invalid URLs; and the retrospective conversion or addition of genre headings. Authority record projects included identifying outdated subject headings and unused or orphan authority records. One approach described was downloading the LCSH file and then using Python to scan the database for 6XX fields with second indicator 0 in order to identify any non-valid subject headings.
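The Python approach mentioned above might look roughly like the following. This is only a minimal sketch, not the participant's actual script: it assumes the bibliographic records have already been exported and parsed into simple dictionaries, and that the authorized LCSH headings are available as a plain list of strings. The record layout, the function names, and the sample data are all hypothetical, and the comparison ignores subdivisions and other matching subtleties that a real project would need to handle.

```python
import re


def normalize(heading):
    """Collapse whitespace, drop trailing punctuation, and lowercase a heading."""
    collapsed = re.sub(r"\s+", " ", heading).strip()
    return collapsed.rstrip(" .,;").lower()


def find_unmatched_headings(records, authorized_headings):
    """Return (record id, tag, heading) for each 6XX field with second
    indicator 0 whose heading is not in the authorized list."""
    authorized = {normalize(h) for h in authorized_headings}
    unmatched = []
    for record in records:
        for field in record["fields"]:
            tag = field["tag"]
            is_subject = len(tag) == 3 and tag.startswith("6")
            is_lcsh = field.get("ind2") == "0"  # second indicator 0 = LCSH
            if is_subject and is_lcsh and normalize(field["heading"]) not in authorized:
                unmatched.append((record["id"], tag, field["heading"]))
    return unmatched


# Illustrative data only (hypothetical record ids and layout).
sample_records = [
    {"id": "b1001", "fields": [
        {"tag": "650", "ind2": "0", "heading": "Cookery."},
        {"tag": "650", "ind2": "0", "heading": "Libraries."},
    ]},
    {"id": "b1002", "fields": [
        {"tag": "650", "ind2": "7", "heading": "Cookery."},   # not coded as LCSH, skipped
        {"tag": "245", "ind2": "0", "heading": "A title"},    # not a 6XX field, skipped
    ]},
]
sample_authorized = ["Cooking", "Libraries"]

unmatched = find_unmatched_headings(sample_records, sample_authorized)
for record_id, tag, heading in unmatched:
    print(record_id, tag, heading)
```

The output of a scan like this is a review list for catalogers rather than a set of automatic corrections, since a heading can fail an exact match for reasons other than being invalid.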

A few participants made the point that database cleanup efforts should focus on the projects that most severely affect patrons’ experiences using the database, so they also consider the barriers that patrons may encounter from non-overt errors. One participant wrote that “insensitive subject headings or outdated language in a 5XX field could be just as big of a barrier as not being able to access material for some patrons.” Another added that pristine data should not be the ultimate goal; providing access to resources should be the highest priority.

Beyond the review of bibliographic records at the time of loading, the database cleanup projects that participants work on were primarily discovered through routine work in the catalog, through regularly run reports, or through emails or Google form submissions from colleagues reporting problems. A few participants mentioned having a formal ticketing system or an informal one that they track in Trello, LibAnswers, or similar software. Some have forms or comment cards for reporting database errors, which are filled out and placed inside the relevant item(s) for cataloger review; some libraries use a shared email account that employees across many technical services areas monitor; and some have added a “Report a Problem”, “Catalog Watch”, or similar link to their public display to collect feedback. One participant mentioned that they sample their database by periodically checking 100 random records to determine the scope of problems. Another noted that they proactively added a chat feature to their LibGuides and to “some databases to help increase visibility of our reference services” and that this has resulted in more problems being reported.
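The periodic sampling idea can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than any participant's workflow: it assumes a list of record identifiers can be exported from the ILS/LSP, and the function names, the identifiers, and the `has_problem` check (which stands in for a cataloger's manual review) are all hypothetical.

```python
import random


def sample_record_ids(record_ids, sample_size=100, seed=None):
    """Pick up to sample_size distinct record ids at random."""
    rng = random.Random(seed)  # pass a seed to make a sample repeatable
    ids = list(record_ids)
    return rng.sample(ids, min(sample_size, len(ids)))


def estimate_error_rate(sampled_ids, has_problem):
    """Share of sampled records for which has_problem(record_id) is true."""
    if not sampled_ids:
        return 0.0
    flagged = sum(1 for record_id in sampled_ids if has_problem(record_id))
    return flagged / len(sampled_ids)


# Illustrative data only: 5,000 made-up record ids.
all_ids = [f"b{n}" for n in range(1000, 6000)]
review_batch = sample_record_ids(all_ids, sample_size=100, seed=2020)
print(len(review_batch), "records selected for review")
```

Multiplying the estimated rate by the total number of records gives a rough sense of how large a cleanup project might be, keeping in mind that a sample of 100 yields only a coarse estimate.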

Of the participants who mentioned authority control cleanup projects, many use authority error reports from their ILS/LSP or their vendor.

Finally, two additional noteworthy problems were mentioned:

One participant noted that, following an ILS migration, the statistics they reported to a national library statistics agency changed significantly from the year before the migration to the current year. This led them to discover that materials had been coded improperly, which resulted in several large database cleanup projects to maintain the accuracy of their assessment data.

Another participant noted that OCLC’s merging of bibliographic records that its deduping software deems duplicates has significantly affected their bibliographic records, particularly for non-book items.

Helpful database cleanup programs or tools that were mentioned in the initial part of the discussion were (in alphabetical order) Alma’s Normalization Rules, MarcEdit, OpenRefine, Python, Sierra Global Update, SirsiDynix Data Control, SQL, Vger Select, and Voyager Global Data Change (GDC).

We hope to continue this discussion in December, adding discussion of best practices and tips for database cleanup projects, projects involving authority control, and the project management and documentation of database cleanup projects.

We hope you’ll join us then!

Julene Jones, e-Forum moderator

Core e-Forums are two-day, moderated, electronic discussion forums that provide an opportunity for librarians and library professionals to discuss matters of interest on an email discussion list. These discussions are free of charge and available to anyone who wishes to subscribe to the email list.