Digital Commonwealth Program
Metadata Resources
BPL metadata staff may be able to help partner institutions plan and execute a metadata strategy for both ingested and harvested items. If your materials are not in a repository-ready format, or if your institution has not yet created records for your items, staff may be able to offer options for metadata editing, crosswalking, and/or creation. Note that, while staff cannot help create metadata records held in harvested collections, they can offer guidance and advice.
Metadata records for objects in BPL’s digital repository system conform to the Metadata Object Description Schema (MODS) created by the Library of Congress. MODS provides for complex item description and supports the use of a wide variety of controlled vocabularies and content standards.
BPL has established minimum requirements for metadata records but strongly encourages partner institutions to create descriptions that are as rich and complete as possible. This will facilitate the best search and browse experience for the user.
All metadata records in the digital repository system are accessible in the public search interface. Additionally, metadata records will be harvested into the Digital Public Library of America (DPLA). Please note that all metadata records held in the repository and harvested into DPLA will be made available under a Creative Commons CC0 license.
Because these metadata records will also become available in DPLA, institutions should be mindful of the larger shared metadata environment and tailor descriptions accordingly (see Best Practices for Shareable Metadata from the Digital Library Federation).
Metadata Profile
A summary of the requirements for MODS records in the digital repository system — including usage guidelines for elements, sub-elements, and attributes, as well as recommendations for when to apply content standards and controlled vocabularies — is available upon request.
Metadata Template
A spreadsheet template that can be used to create metadata records conforming to the Digital Commonwealth metadata guidelines and used for batch upload of records and objects into the digital repository system is available upon request.
Controlled Vocabularies
Use of controlled data values facilitates browsing, enhances searching, ensures data consistency, and facilitates record sharing. The use of controlled vocabularies for subject headings, personal and corporate names, creator roles, format/genre terms, geographic entities, and languages is strongly recommended. Vocabularies used for a project should be chosen to best complement its unique characteristics. Recommended controlled vocabularies include, but are not limited to:
Subject/Format Headings
- Library of Congress Thesaurus for Graphic Materials
- Library of Congress Subject Headings (LCSH)
- Getty Art & Architecture Thesaurus
- RBMS Controlled Vocabulary for Rare Materials Cataloging
- FAST (Faceted Application of Subject Terminology)
- Homosaurus (an international LGBTQ+ linked data vocabulary)
Authorized Name Headings/Creator Roles
- Library of Congress Name Authority File (NAF)
- Getty Union List of Artist Names
- MARC Code List for Relators
Geographic Headings
Languages
Cataloging Rules
The use of formal cataloging rules (also known as content standards) is strongly recommended to provide specific guidance on the choice and format of data where applicable. The content standard chosen for each collection or project should be appropriate for the institution and the type of material being described. Some commonly used standards include:
- Descriptive Cataloging of Rare Materials — the DCRM suite offers standards for use with artwork, books, early and rare manuscripts, music, maps, serials, and more. Metadata in the digital repository system frequently uses:
- DCRM(G): Descriptive Cataloging of Rare Materials (Graphics) — this is the 2013 update to Graphic Materials: Rules for Describing Original Items and Historical Collections
- DCRM(MSS): Descriptive Cataloging of Rare Materials (Manuscripts)
- DCRM(C): Descriptive Cataloging of Rare Materials (Cartographic)
- AMREMM: Descriptive Cataloging of Ancient, Medieval, Renaissance, and Early Modern Manuscripts
- Resource Description & Access (RDA)
- Describing Archives: A Content Standard (DACS)
- Cataloging Cultural Objects (CCO)
BPL’s metadata staff manages the descriptions that accompany digital items into the repository system. Those descriptions might be created from information on the items themselves, or they might be compiled from information supplied by whoever has custody of the materials. In the course of this work, the team uses a number of applications to facilitate data manipulation and remediation. These tools include, but are not limited to:
- Excel
- Google Sheets
- OpenRefine
- MarcEdit
- PowerShell
- VBA (Visual Basic for Applications)
- Python
As use of Artificial Intelligence (AI) tools becomes more widespread, staff are exploring ways to safely incorporate their use into workflows. While generative AI is an emerging technology, forms of AI have been in use in libraries for some time now. For example, optical character recognition (OCR) technology is used routinely with digitized print materials. On-the-fly transcription to caption meetings or audio/visual materials is also widely used.
Our goal is to find new ways AI can streamline certain metadata tasks while building on established best practices and maintaining human oversight throughout. To start, we are prioritizing OCR enrichment and metadata enhancement rather than content generation. Some tools we have been experimenting with are:
- OpenAI ChatGPT
- Google Gemini
- Anthropic Claude
- Perplexity
If we do begin to use AI for content generation, we will be transparent in how and when it is used.
Date Format Requirements
BPL’s digital repository system requires that dates be formatted using the Extended Date/Time Format (EDTF) specification. Standardization of date entry will allow the system to parse the information into a human-readable display date. EDTF requires that dates be recorded as follows:
Year: YYYY (e.g., 1872)
Year and month: YYYY-MM (e.g., 1872-11)
Complete date: YYYY-MM-DD (e.g., 1872-11-09)
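Before a batch upload, it can help to check that every date value uses one of the three formats above. The following is a minimal Python sketch; the function name `is_valid_basic_edtf` is hypothetical, and it covers only these three forms, not the full EDTF specification (which also defines intervals, qualifiers, and unspecified digits).

```python
import re
from datetime import date

# The three basic forms listed above: YYYY, YYYY-MM, YYYY-MM-DD.
_EDTF_BASIC = re.compile(r"([0-9]{4})(?:-([0-9]{2})(?:-([0-9]{2}))?)?")


def is_valid_basic_edtf(value: str) -> bool:
    """Return True if value is a calendar-valid YYYY, YYYY-MM, or YYYY-MM-DD date."""
    match = _EDTF_BASIC.fullmatch(value)
    if match is None:
        return False
    year, month, day = match.groups()
    if month is None:
        return True  # year only
    if not 1 <= int(month) <= 12:
        return False
    if day is None:
        return True  # year and month
    try:
        # date() rejects impossible days such as 1872-02-30
        date(int(year), int(month), int(day))
    except ValueError:
        return False
    return True
```

For example, `is_valid_basic_edtf("1872-11-09")` is accepted, while `"11/09/1872"` and `"ca. 1872"` are rejected and would need to be rewritten (with any qualifier recorded separately).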
To express a date range, record the start and end dates separately. For example, an item created on November 9 and 10, 1872 would be recorded as:
Date start = 1872-11-09
Date end = 1872-11-10
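Because start and end dates may be recorded at different precisions (for example, a start of 1872-11 and an end of 1872-11-10), one way to sanity-check a pair is to expand each value to the days it covers and flag a range whose end falls entirely before its start. This is a minimal Python sketch, assuming both values are already in one of the three formats above; the function names are hypothetical and are not part of the repository system.

```python
import calendar
from datetime import date


def earliest_day(value: str) -> date:
    """Expand a YYYY, YYYY-MM, or YYYY-MM-DD string to the first day it covers."""
    parts = [int(p) for p in value.split("-")]
    year = parts[0]
    month = parts[1] if len(parts) > 1 else 1
    day = parts[2] if len(parts) > 2 else 1
    return date(year, month, day)


def latest_day(value: str) -> date:
    """Expand a YYYY, YYYY-MM, or YYYY-MM-DD string to the last day it covers."""
    parts = [int(p) for p in value.split("-")]
    year = parts[0]
    month = parts[1] if len(parts) > 1 else 12
    # monthrange returns (weekday of day 1, number of days in the month)
    day = parts[2] if len(parts) > 2 else calendar.monthrange(year, month)[1]
    return date(year, month, day)


def is_valid_range(start: str, end: str) -> bool:
    """False only if the end date lies entirely before the start date."""
    return earliest_day(start) <= latest_day(end)
```

With the example above, `is_valid_range("1872-11-09", "1872-11-10")` passes, while swapping the two values fails.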
If a date or date range requires a qualifier (e.g., “circa 1872”) or can only be inferred, record the numeric dates and apply the appropriate qualifier from the list below.
Qualifiers:
approximate: Used to identify dates that have been approximated and may not be exact, such as circa dates (e.g., “ca. 1872”).
questionable: Used to identify questionable dates (e.g., “1872?”).
inferred: Used to identify dates that have not been transcribed directly from the resource, but have been inferred from another source (e.g., “[1872]”).
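When converting transcribed dates in bulk, the conventions shown in the examples above (“ca. 1872”, “1872?”, “[1872]”) can be mapped to a numeric date plus qualifier. The Python sketch below is illustrative only: `split_qualifier` is a hypothetical helper, it handles year-only values, and it returns None for anything it does not recognize (including combinations such as “[1872?]”) so those can be reviewed by hand.

```python
import re
from typing import Optional, Tuple

# Hypothetical mapping from transcription conventions to the qualifiers listed above.
# Only a four-digit year is handled; patterns are tried in order.
_PATTERNS = [
    (re.compile(r"(?:ca\.|circa)\s*([0-9]{4})", re.IGNORECASE), "approximate"),
    (re.compile(r"([0-9]{4})\?"), "questionable"),
    (re.compile(r"\[([0-9]{4})\]"), "inferred"),
    (re.compile(r"([0-9]{4})"), None),  # plain year, no qualifier needed
]


def split_qualifier(transcribed: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return (numeric year, qualifier) for a transcribed year, or None if unrecognized."""
    text = transcribed.strip()
    for pattern, qualifier in _PATTERNS:
        match = pattern.fullmatch(text)
        if match:
            return match.group(1), qualifier
    return None
```

For example, `split_qualifier("[1872]")` yields the date 1872 with the qualifier “inferred”, matching the third entry in the list.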
Note that the date used to describe the digital surrogate should be the date on which the item was originally created or issued, not the date on which it was digitally recreated.
Non-numeric date formats: Sometimes physical items will be notated with dates such as “Early 1850,” “19th century,” or “Summer 1907.” This style of dating cannot be expressed in the YYYY, YYYY-MM, or YYYY-MM-DD formats the repository requires. In such instances, replace the textual date with a numeric date range, recording start and end dates as described above.
Though by no means exhaustive, the following list provides examples of how one might approach date reconfiguration in certain instances.
Centuries:
For centuries, use xx00 – xx99. For example, 19th century = 1800 – 1899 (approximate), 20th century = 1900 – 1999 (approximate)
For early centuries, use xx00 – xx39. For example, early 19th century = 1800 – 1839 (approximate), early 20th century = 1900 – 1939 (approximate)
For mid centuries, use xx30 – xx69. For example, mid 19th century = 1830 – 1869 (approximate), mid 20th century = 1930 – 1969 (approximate)
For late centuries, use xx60 – xx99. For example, late 19th century = 1860 – 1899 (approximate), late 20th century = 1960 – 1999 (approximate)
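The century rules above reduce to fixed year offsets from the start of the century. A minimal Python sketch, using a hypothetical helper `century_range` that returns only the start and end years; the “approximate” qualifier would still be recorded separately.

```python
from typing import Optional, Tuple

# Year offsets within a century, taken from the rules above:
# whole century xx00-xx99, early xx00-xx39, mid xx30-xx69, late xx60-xx99.
_CENTURY_PARTS = {None: (0, 99), "early": (0, 39), "mid": (30, 69), "late": (60, 99)}


def century_range(century: int, part: Optional[str] = None) -> Tuple[int, int]:
    """Map e.g. (19, "early") to the approximate year range (1800, 1839)."""
    base = (century - 1) * 100  # under this convention the 19th century starts at 1800
    low, high = _CENTURY_PARTS[part]
    return base + low, base + high
```

For example, `century_range(19, "mid")` returns `(1830, 1869)`. Note that the early, mid, and late ranges deliberately overlap, as in the guidelines.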
Decades:
For decades, use xxx0 – xxx9. For example, 1970s = 1970 – 1979 (approximate), 1850s = 1850 – 1859 (approximate)
For early decades, use xxx0 – xxx3. For example, early 1970s = 1970 – 1973 (approximate), early 1850s = 1850 – 1853 (approximate)
For mid decades, use xxx4 – xxx6. For example, mid 1970s = 1974 – 1976 (approximate), mid 1850s = 1854 – 1856 (approximate)
For late decades, use xxx7 – xxx9. For example, late 1970s = 1977 – 1979 (approximate), late 1850s = 1857 – 1859 (approximate)
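The decade rules follow the same pattern. A minimal Python sketch with a hypothetical helper `decade_range`, which expects the first year of the decade (e.g., 1970 for “1970s”) and returns start and end years; the “approximate” qualifier is recorded separately.

```python
from typing import Optional, Tuple

# Year offsets within a decade, taken from the rules above:
# whole decade xxx0-xxx9, early xxx0-xxx3, mid xxx4-xxx6, late xxx7-xxx9.
_DECADE_PARTS = {None: (0, 9), "early": (0, 3), "mid": (4, 6), "late": (7, 9)}


def decade_range(decade: int, part: Optional[str] = None) -> Tuple[int, int]:
    """Map e.g. (1970, "mid") to the approximate year range (1974, 1976)."""
    if decade % 10 != 0:
        raise ValueError(f"expected the first year of a decade, got {decade}")
    low, high = _DECADE_PARTS[part]
    return decade + low, decade + high
```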
Parts of years (early, mid, late):
Early part of year = Jan, Feb, Mar, Apr. For example, early 1970 = 1970-01 to 1970-04 (approximate), early 1850 = 1850-01 to 1850-04 (approximate)
Mid part of year = May, Jun, Jul, Aug. For example, mid 1970 = 1970-05 to 1970-08 (approximate), mid 1850 = 1850-05 to 1850-08 (approximate)
Late part of year = Sep, Oct, Nov, Dec. For example, late 1970 = 1970-09 to 1970-12 (approximate), late 1850 = 1850-09 to 1850-12 (approximate)
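Parts of a year map to four-month spans expressed in YYYY-MM form. A minimal Python sketch, using a hypothetical helper `year_part_range`:

```python
from typing import Tuple

# Month spans for each part of the year, taken from the rules above.
_YEAR_PARTS = {"early": (1, 4), "mid": (5, 8), "late": (9, 12)}


def year_part_range(year: int, part: str) -> Tuple[str, str]:
    """Map e.g. (1970, "early") to ("1970-01", "1970-04") in YYYY-MM form."""
    first, last = _YEAR_PARTS[part.lower()]
    return f"{year:04d}-{first:02d}", f"{year:04d}-{last:02d}"
```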
Seasons:
Winter = Dec, Jan, Feb. For example, winter 1970 = 1969-12 to 1970-02 (approximate), winter 1855 = 1854-12 to 1855-02 (approximate)
Spring = Mar, Apr, May. For example, spring 1970 = 1970-03 to 1970-05 (approximate), spring 1855 = 1855-03 to 1855-05 (approximate)
Summer = Jun, Jul, Aug. For example, summer 1970 = 1970-06 to 1970-08 (approximate), summer 1855 = 1855-06 to 1855-08 (approximate)
Fall = Sep, Oct, Nov. For example, fall 1970 = 1970-09 to 1970-11 (approximate), fall 1855 = 1855-09 to 1855-11 (approximate)
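Seasons differ from the other cases in one respect: winter crosses the year boundary, so its start date falls in December of the previous year. A minimal Python sketch, using a hypothetical helper `season_range` that handles this wrap-around:

```python
from typing import Tuple

# Start and end months for each season, taken from the rules above.
_SEASONS = {"winter": (12, 2), "spring": (3, 5), "summer": (6, 8), "fall": (9, 11)}


def season_range(season: str, year: int) -> Tuple[str, str]:
    """Map e.g. ("winter", 1970) to ("1969-12", "1970-02") in YYYY-MM form."""
    start_month, end_month = _SEASONS[season.lower()]
    # A season whose start month is later than its end month (winter) wraps the
    # year boundary, so it begins in the previous calendar year.
    start_year = year - 1 if start_month > end_month else year
    return f"{start_year:04d}-{start_month:02d}", f"{year:04d}-{end_month:02d}"
```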
Geocoding
BPL’s digital repository system can display locations as points on a map. To represent a country, state, town, or physical landmark such as a lake or mountain, we recommend the Getty Thesaurus of Geographic Names (TGN), a hierarchical vocabulary that uses a code to represent each location along with its full hierarchy (city->county->state->country->world, etc.). If you include the location’s TGN code in your metadata (for example, Boston, MA = 7013445), the repository will place a pointer on the map at the center of the location.
For a more specific location, such as a building, you can try GeoNames, which also uses a code to represent a location. If you include the location’s GeoNames code in your metadata (for example, Boston Public Library = 4931010), the repository will place a pointer on the map at the center of that more specific location.
The repository is also capable of displaying specific addresses as points on a map. In order to utilize this functionality, addresses must be expressed within your metadata as coordinates of latitude and longitude (for example, 42.349394,-71.078378). Texas A&M University Geoservices provides tools for geocoding and related tasks. BatchGeo also provides a similar service.
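Coordinate strings can be checked for the latitude,longitude form shown above before upload. A minimal Python sketch, assuming decimal degrees with latitude first; `parse_coordinates` is a hypothetical helper and does not call the Texas A&M Geoservices or BatchGeo services.

```python
from typing import Tuple


def parse_coordinates(value: str) -> Tuple[float, float]:
    """Parse 'latitude,longitude' in decimal degrees and check both are in range."""
    lat_text, lon_text = value.split(",")  # ValueError unless there is exactly one comma
    lat, lon = float(lat_text), float(lon_text)  # float() tolerates surrounding spaces
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    return lat, lon
```

This catches malformed strings and out-of-range values, but it cannot detect latitude and longitude that were swapped when both still fall within range (as with -71.078378,42.349394), so a spot check on the map is still worthwhile.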