When the Data Becomes Meta: Quality Control for Digitized Ancient Heritage Collections

Session 6A

Author: Victoria Gioia Désirée Landau

Affiliation: University of Basel

Published: September 13, 2024

DOI: 10.5281/zenodo.13907777

Abstract

Digitizing analog historical collections has become common practice for institutions with the necessary funds and infrastructure for the process. This endeavor is typically undertaken in isolation or with external assistance, adjusted to the needs and capabilities of the collection-holding institution. Anticipating the ways in which users will access online collections can be difficult. There can be an underlying assumption that they are a uniform target group: professionals with knowledge in the respective fields/periods/languages, and amateurs expected to take these prerequisites into account. However, institutions housing antiquities are as diverse as their viewers, from museums to art galleries, academic institutions and private collections – all with differing priorities regarding form, content, context and meaning of their objects. The implementation of measures must be considered to ensure online presences account for the hurdles and roadblocks to holistic digital access to ancient heritage collections. Digital formats raise issues of web accessibility, language barriers and interoperability, in addition to complications imported from the analog, such as understanding across disciplines and vocabularies. Drawing from a comparative dataset of papyrus-housing institutions in around 40 countries, expanded from Trismegistos, this paper utilizes selected digitized papyrological collections to spotlight their successes and limitations, as well as characteristics common to the digitization of ancient heritage collections. This includes how collections present themselves to audiences of varying digital knowledge and familiarity, and speak to the understanding of multiple disciplines. Other aspects are increased dedication to provenance investigation, offering contextualization of objects and facilitating cross-institutional/-disciplinary research.

Keywords

ancient heritage, digitization, accessibility, metadata, papyrology

Introduction

Historical objects from ancient civilizations have made their way from their point of origin to institutions worldwide, the majority over the past two centuries, some long before. Today, they are often housed in knowledge, art and cultural institutions, known collectively as GLAMs (galleries, libraries, archives, museums). When an institution makes its holdings available online, it attracts a varied audience that approaches the objects with different interests and backgrounds. Among them are researchers investigating items for their archeological and (art) historical properties, or for the textual documents contained on their surface.

Documents on papyrus are one such object type, predominantly from modern-day Egypt, but also discovered at sites in countries such as Afghanistan, Jordan, Greece and Italy (famously the Pompeii and Herculaneum papyri). While papyrus collections can display common characteristics and patterns of acquisition (e.g., targeted excavation or purchase in the late 19th and early 20th century, private donations and bequests to Classics departments and national museums in the century since), they can differ vastly in their treatment and prioritization by their holding institutions. From fully interoperable, comprehensive metadata generation and object digitization endeavors to mere mentions of the existence of papyri (edited and unedited) at a given institution in print publications, the prerequisites for accessing and interacting with a set of papyri largely depend on institutional investment in their collection.

So even before texts deciphered from ancient heritage objects like papyri can or should be used for innovative computational methods, such as machine learning (Sommerschield et al. 2023), digital paleography and character recognition (Marthot-Santaniello 2021), and fragment reassembly and reconstruction (Pirrone, Beurton-Aimar, and Journet 2021), we stand to benefit from enhancing existing metadata and from taking a closer look at the institutions and platforms that provide us with the information needed to engage with historical source material digitally.

State of Collection Metadata

Most publications in papyrology up to the 21st century, and especially the editiones principes of texts on papyrus, have been print editions, leaving it to the community to keep track of digitized papyrological editions and the texts contained in them. This is an area where digital tools and resources were developed early on, from creating born-digital editions to digitizing collection holdings (Ast 2022). A starting point for an overview of collection locations is a dataset expanded from Trismegistos’ TM Collections database, which identifies around 800 papyrus-holding institutions in close to 40 countries and gives a sense of how widely this ancient heritage object type is distributed.

For collections that have been digitized, there are questions and problem areas each institution must address throughout the digitization process and after, many of which can start in the analog and are transferred to the digital. Often, these are challenges connected to historical designations, institutional decision-making and accessibility of the ancient heritage objects.

In terms of historical designations, this can include terminology (description, typology and classification of texts, e.g., “private letter”, “amulet”, “writing exercise”) and periodization, both for objects dated to specific years (365 BCE) and to broader timespans (4th c. BCE); even assigning objects to categories like “Ptolemaic” and “Late Roman”, or “Greek world” (a cultural or geographic descriptor), is connected to both terminology and periodization.

Institutional decision-making over many years further determines the storage of an object; how it is housed and inventoried is a categorization in itself, and designates which specialist or curator is in charge (classifying an object among, e.g., manuscripts, or in the Antiquities, Classics or Archeology category/department). The papyrus-holding institution (as well as the respective cataloger, curator or data manager) also sets a focus for its collections, anticipating or encouraging audience types, selecting presentation forms and grouping items.

Lastly, accessibility encompasses many aspects: how findable an object is (whether it can be retrieved and uniquely identified), whether proper contextualization is in place to support understanding (object provenance, acquisition), and whether connections and links within and beyond the collection are offered, such as to aggregators, projects or related institutions (seeing as objects are often fragmented, the pieces having been distributed across collections decades ago). It also includes the information provided about the metadata itself, such as how current it is (date last edited), who generated it (metadata editor), and whether it can be displayed in more than one language (through tailor-made translations, or the availability of an automatic translation of the webpage as a whole). With multilingualism, one again runs into the terminology problem, and into the difficulty of creating equivalence between languages.
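To make these metadata aspects concrete, the following is a minimal sketch of how a catalog entry and a few basic quality checks might be modeled. The class `ObjectRecord`, its field names and the function `quality_issues` are assumptions introduced purely for illustration and do not reflect any institution's actual schema. Years are stored as simple signed integers (negative for BCE), which is one possible convention among several; it allows an exact date (365 BCE) and a broad timespan (4th c. BCE) to be compared on the same scale.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List, Optional, Tuple


@dataclass
class ObjectRecord:
    """Hypothetical catalog entry; field names are illustrative, not a standard."""
    inventory_number: str
    text_type: str                 # e.g. "private letter", "amulet"
    not_before: int                # signed year, negative = BCE (-365 = 365 BCE)
    not_after: int
    last_edited: Optional[date] = None
    metadata_editor: Optional[str] = None
    labels: Dict[str, str] = field(default_factory=dict)  # language code -> label


def century_bce(n: int) -> Tuple[int, int]:
    """Signed (earliest, latest) years of the n-th century BCE."""
    return (-(n * 100), -((n - 1) * 100 + 1))


def quality_issues(rec: ObjectRecord) -> List[str]:
    """List the accessibility-related gaps discussed above for one record."""
    issues = []
    if rec.not_before > rec.not_after:
        issues.append("date range is inverted")
    if rec.last_edited is None:
        issues.append("no 'date last edited'")
    if not rec.metadata_editor:
        issues.append("no metadata editor named")
    if len(rec.labels) < 2:
        issues.append("available in fewer than two languages")
    return issues


# Invented example records: one dated to a year, one to a century.
exact = ObjectRecord("inv. 1 (invented)", "private letter", -365, -365)
broad = ObjectRecord("inv. 2 (invented)", "amulet", *century_bce(4),
                     last_edited=date(2024, 9, 13), metadata_editor="N.N.",
                     labels={"en": "amulet", "de": "Amulett"})
print(quality_issues(exact))
print(quality_issues(broad))
```

Storing both kinds of date as a range means that the exact year falls inside the century range and can be found by the same query, although the choice of boundaries for a label such as “4th c. BCE” is itself a terminological decision.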

With all these elements, uniformity can be difficult to implement, particularly when analog predecessors (e.g., card catalogs) are transposed to digital metadata as a collection is digitized. Standards are seldom agreed upon and correctly implemented, and even if they were, accessibility pertains not only to the objects themselves and whether they can be viewed, but also to the historical (even specifically papyrological) education necessary to engage with them. When terms are not even agreed upon by scholars engaging with them every day, how can a wider audience be expected to understand and interact with either the objects or the scholarship derived from them?

Approaches to Online/Digital Collections

When the digitization of collections is analyzed, this is usually done by approaching a single institution, by conducting a national survey (e.g., of an institution type, such as all museums or all libraries), or by using a similar institution- or collection-focused scope; thus, few studies of this kind consider a specific object type across institutions worldwide. However, this is how researchers tend to approach collections (gathering source material on particular topics, geographic regions or time periods by papyri, ostraca, inscriptions, and other object types), and papyrology is a sub-discipline spanning several fields (Egyptology, Ancient History, Greek Philology, Latin Philology, Coptology, Arabic Studies, Iranian Studies, and more) devoted to the study of papyri. Using papyri as a case study object for such research is therefore promising.

Papyri have enjoyed increased accessibility in the past years, thanks in great part to the dedication and work of individuals generating connections to aggregators, such as Trismegistos (Depauw 2018) and Papyri.info (Berkes 2018). Even Europeana, with its aim of being a digital ecosystem for the European cultural heritage sector, has a growing “Papyrus” category (including, e.g., the Rijksmuseum van Oudheden in Leiden), in which institutions have made their collections available as entries on Europeana that link back to the institution’s own online catalog via an object permalink. Europeana also allows users to customize the website language to one of 24 options. This is rarely the case for institutional websites: the Musées Royaux d’Art et d’Histoire in Brussels have an entirely trilingual web presence (English, Dutch and French), while there have also been interesting compromises, such as in the case of the Medelhavsmuseet in Stockholm, where metadata categories are partially available in both English and Swedish.

The digitization of papyri has not always been linked to available digital metadata, meaning photographs and the pertinent information connected to them were not uploaded at the same time, or made available in one place. This is unfortunate, since having only the image or only the metadata at hand is rarely enough: one often helps correct the other (in the case of, e.g., inventory number mix-ups) and each offers contextualization for the other. Nowadays, the two are usually published together: the British Library and other larger UK institutions will combine individual digitization requests (made by researchers or projects) for items not yet available in their online catalog with a metadata upload, using the requesting party’s expertise on the item. Technical solutions providing images and metadata together, like making a collection accessible online using IIIF (International Image Interoperability Framework), are gaining traction, typically implemented by the holding institution, since an image server is required to store the digitized assets. Biblioteca Universitaria di Bologna is among the institutions that have recently migrated their digital collections system to one that supports IIIF.
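As a brief illustration of why IIIF depends on an image server, the IIIF Image API addresses each image (or a region of it) through a URL of the form `{base}/{identifier}/{region}/{size}/{rotation}/{quality}.{format}`, with the technical description of the image available at `{base}/{identifier}/info.json`. The sketch below builds such URLs; the server address `iiif.example.org` and the identifier are invented placeholders, and the default size keyword `max` follows version 3 of the Image API (version 2 servers use `full` instead).

```python
from urllib.parse import quote


def iiif_image_url(base: str, identifier: str, region: str = "full",
                   size: str = "max", rotation: str = "0",
                   quality: str = "default", fmt: str = "jpg") -> str:
    """Build an IIIF Image API request URL from its path components."""
    # Identifiers must be percent-encoded, including any slashes they contain.
    ident = quote(identifier, safe="")
    return f"{base.rstrip('/')}/{ident}/{region}/{size}/{rotation}/{quality}.{fmt}"


def iiif_info_url(base: str, identifier: str) -> str:
    """URL of the info.json document describing the image."""
    return f"{base.rstrip('/')}/{quote(identifier, safe='')}/info.json"


# Hypothetical server and identifier, for illustration only.
BASE = "https://iiif.example.org/image"
print(iiif_image_url(BASE, "papyri/inv 12"))
# Detail request: a 600x400 pixel region starting at x=100, y=200.
print(iiif_image_url(BASE, "papyri/inv 12", region="100,200,600,400"))
print(iiif_info_url(BASE, "papyri/inv 12"))
```

Because the region and size are part of the URL, a catalog entry or an external project can link directly to a detail of a papyrus without the institution having to prepare a separate image file.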

When single institutions have not been willing or able to provide collections with a digital presence, there have been successful attempts by collections at pooling their resources: initiatives like the Papyrus Portal as a platform for primarily German papyrus-holding collections, or collaborations like UC Berkeley’s “Regional Partners” (Badè Museum of Biblical Archaeology, California State University, Stanford University, and Washington State University) for uploading to APIS (Advanced Papyrological Information System), now part of Papyri.info.

While DOIs or similar persistent identifiers for single object entries are rare on institutional websites (one example being the Library of Trinity College Dublin), Trismegistos provides persistent identifiers for a number of categories related to its work, such as TM Texts IDs, TM Archive IDs and even TM Collections IDs. Currently, not all institutions with digitized papyri in their own online collections catalog refer to the connected TM identifier(s), much less provide a hyperlink to the related TM Texts entry. This would have to be remedied with institution-specific outreach initiatives. When it comes to the aforementioned fragmentation of once-whole papyri, identified matches should also be pointed out in online catalogs. As yet, this remains a service rendered by Trismegistos Texts, which must also decide how to handle linking or merging related, physically separated texts in its IDs.
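A check for missing TM references could be sketched along the following lines. The record layout (dictionaries with `inventory_number` and `tm_text_id` keys) and all ID values are invented for illustration, and the URL pattern in `TM_TEXT_BASE` is an assumption that should be verified against Trismegistos' own guidance on stable links before use.

```python
from typing import Iterable, List, Optional

# Assumed stable-link pattern for TM Texts entries; verify before relying on it.
TM_TEXT_BASE = "https://www.trismegistos.org/text/"


def tm_text_link(tm_id: Optional[int]) -> Optional[str]:
    """Return a hyperlink for a TM Texts ID, or None if no ID is recorded."""
    if tm_id is None:
        return None
    return f"{TM_TEXT_BASE}{int(tm_id)}"


def records_without_tm(records: Iterable[dict]) -> List[str]:
    """Inventory numbers of catalog entries that carry no TM Texts ID."""
    return [r["inventory_number"] for r in records
            if r.get("tm_text_id") is None]


# Invented catalog entries; the IDs do not refer to real texts.
catalog = [
    {"inventory_number": "inv. 1 (invented)", "tm_text_id": 12345},
    {"inventory_number": "inv. 2 (invented)", "tm_text_id": None},
    {"inventory_number": "inv. 3 (invented)"},
]
print(records_without_tm(catalog))
print(tm_text_link(catalog[0]["tm_text_id"]))
```

A list of this kind would only identify where a reference is absent; establishing which TM Texts ID belongs to an unreferenced object still requires papyrological expertise.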

Metadata standards from the field of cultural heritage offer themselves for implementation by GLAMs. There are institutional online collection catalogs utilizing Dublin Core (e.g., Biblioteca Universitaria di Bologna) and permitting DCMI metadata downloads (e.g., Library of Trinity College Dublin), but as with any standard, while some objects are equipped with expressive metadata classes, others rely on the «description» class for full text instead of encoding more specific metadata. This is usually a sign that a papyrus has not yet been optimized for online viewing, for any number of reasons, or that papyri represent only a small part of the institution’s overall collection. ICOM’s CIDOC CRM is also becoming increasingly relevant (Liu, Hindmarch, and Hess 2023), including the extension CRMtex intended for cultural heritage objects with textual content, such as ancient documents, akin to CRMsci and CRMarchaeo for scientific and archeological data respectively (Felicetti and Murano 2017).
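The reliance on «description» can be detected mechanically. The following sketch flags records in which the description element is filled while few of the other fifteen Dublin Core elements are. The threshold `min_specific` is an arbitrary assumption chosen for illustration, and the sample records are invented.

```python
# The fifteen elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = frozenset({
    "contributor", "coverage", "creator", "date", "description", "format",
    "identifier", "language", "publisher", "relation", "rights", "source",
    "subject", "title", "type",
})


def filled_elements(record: dict) -> set:
    """Dublin Core elements of a record that hold a non-empty value."""
    return {k for k, v in record.items()
            if k in DC_ELEMENTS and v is not None and str(v).strip()}


def relies_on_description(record: dict, min_specific: int = 4) -> bool:
    """True if 'description' is filled but few other elements are.

    min_specific is a hypothetical cut-off, not part of any standard.
    """
    filled = filled_elements(record)
    specific = filled - {"description"}
    return "description" in filled and len(specific) < min_specific


# Invented records for illustration.
sparse = {
    "title": "Papyrus",
    "identifier": "inv. 1 (invented)",
    "description": "Greek private letter, 4th c. BCE, purchased; 12 x 8 cm.",
}
rich = {
    "title": "Private letter",
    "identifier": "inv. 2 (invented)",
    "type": "private letter",
    "date": "4th c. BCE",
    "language": "grc",
    "format": "12 x 8 cm",
    "description": "Letter concerning a delivery.",
}
print(relies_on_description(sparse))
print(relies_on_description(rich))
```

In the first record, date, language, text type and dimensions are all present but only as free text, so none of them can be filtered, sorted or linked to a controlled vocabulary.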

Papyrology has benefitted from many of the thought processes and implementations originally intended for its sister field of epigraphy. Standards have been developed specifically for ancient texts when it comes to transcription (according to the Leiden system) and encoding, namely EpiDoc for the digital edition of ancient texts in TEI XML (Bodard 2010). The EpiDoc Guidelines also offer recommendations for metadata fields aimed at the description of text-bearing objects, which cover provenance information in addition to physical characteristics. They also allow for linking to external controlled vocabularies.
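To illustrate how such metadata can be read by a machine, the sketch below parses a heavily trimmed TEI fragment with Python's standard library. The fragment is invented and is not schema-valid (required elements such as the title statement and the object identifier are omitted); it uses the TEI elements `origPlace`, `origDate` and `provenance`, and the place URI is a placeholder standing in for a gazetteer reference. Which attribute values a given project uses should be checked against the EpiDoc Guidelines.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Invented, trimmed fragment: not a complete or valid EpiDoc file.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc><sourceDesc><msDesc>
    <history>
      <origin>
        <origPlace>
          <placeName ref="https://gazetteer.example.org/places/0000">Example Town</placeName>
        </origPlace>
        <origDate notBefore="-0400" notAfter="-0301">4th c. BCE</origDate>
      </origin>
      <provenance type="found">Findspot not recorded.</provenance>
    </history>
  </msDesc></sourceDesc></fileDesc></teiHeader>
</TEI>"""


def extract_history(xml_text: str) -> dict:
    """Pull dating, place and provenance statements out of a TEI header."""
    root = ET.fromstring(xml_text)
    date_el = root.find(".//tei:origDate", TEI_NS)
    place_el = root.find(".//tei:origPlace/tei:placeName", TEI_NS)
    provenance = [(p.get("type"), "".join(p.itertext()).strip())
                  for p in root.findall(".//tei:provenance", TEI_NS)]
    return {
        "not_before": date_el.get("notBefore") if date_el is not None else None,
        "not_after": date_el.get("notAfter") if date_el is not None else None,
        "place_name": place_el.text if place_el is not None else None,
        "place_ref": place_el.get("ref") if place_el is not None else None,
        "provenance": provenance,
    }


print(extract_history(SAMPLE))
```

The `ref` attribute is the point at which an external controlled vocabulary comes in: the human-readable place name may vary between catalogs and languages, while the URI stays the same.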

Existing and developing controlled vocabularies and gazetteers for locations (e.g., Getty, Pleiades, Elliott 2021), chronology (e.g., PeriodO, Golden and Shaw 2016) and terminology (e.g., EAGLE, continued by a Working Group at Epigraphy.info, Mannocci et al. 2014; Liuzzo 2018; and FAIR Epigraphy, Heřmánková, Horster, and Prag 2022) have been massive accomplishments, continuously maintained and ready to be utilized. With regard to accessibility, multilingualism and openness of collections to a wider audience, inspiration could also be drawn from less complex but highly informative resources, such as the recent IFLA «Open Access Vocabulary», composed in English, with a translation of the terminology into Spanish, Chinese and Arabic (Bradley and Reilly 2024).

Conclusion

This paper presents work in progress from one section of an ongoing dissertation project, discussing preliminary findings and continuing research angles. There are further considerations to be made regarding the funding, available infrastructure and size of institutions, the priority-setting of collections, and institutional guidelines that determine if, why and how a collection is digitized. Marked changes can already be seen in the approach of institutions towards their online collections, in the direction of the FAIR principles, making their objects more findable (DOIs, consistently maintained platforms), accessible (digitization, machine-readable metadata), interoperable (standards, nomenclature, vocabularies, IIIF) and even reusable (open licensing of images and metadata). This is also seen in how institutions are becoming increasingly sensitized to a broader audience engaging with their collections, with explicit use of the FAIR principles, and in some cases the CARE principles, in their online collection presentation and the steps taken to get there (Carroll et al. 2021).

References

Ast, Rodney. 2022. “Can the Digital Humanities Make Us Better Humanists? A Case Study in Papyrology.” In Digital Text Analysis of Greek and Latin Sources. Methods, Tools, Perspectives: Special Issue, Classics@ 21, edited by Stelios Chronopoulos, Felix K. Maier, and Anna Novokhatko. https://classics-at.chs.harvard.edu/volume/classics20-digital-text-analysis-of-greek-and-latin-sources/.

Berkes, Lajos. 2018. “Perspectives and Challenges in Editing Documentary Papyri Online: A Report on Born-Digital Editions Through Papyri.info.” In Digital Papyrology II. Case Studies on the Digital Edition of Ancient Greek Papyri, edited by Nicola Reggiani, 75–85. Berlin/Boston: De Gruyter. https://doi.org/10.1515/9783110547450-004.

Bodard, Gabriel. 2010. “EpiDoc: Epigraphic Documents in XML for Publication and Interchange.” In Latin on Stone. Epigraphic Research and Electronic Archives, edited by Francisca Feraudi-Gruénais, 101–18. Lanham, MD: Lexington Books.

Bradley, Fiona, and Susan Reilly. 2024. “IFLA Open Access Vocabulary.” International Federation of Library Associations and Institutions. https://repository.ifla.org/handle/123456789/3272.

Carroll, Stephanie Russo, Edit Herczog, Maui Hudson, Keith Russell, and Shelley Stall. 2021. “Operationalizing the CARE and FAIR Principles for Indigenous Data Futures.” Scientific Data 8 (108): 1–6. https://doi.org/10.1038/s41597-021-00892-0.

Depauw, Mark. 2018. “Trismegistos: Optimizing Interoperability for Texts from the Ancient World.” In Crossing Experiences in Digital Epigraphy. From Practice to Discipline, edited by Annamaria De Santis and Irene Rossi, 193–201. Warsaw/Berlin: De Gruyter. https://doi.org/10.1515/9783110607208-016.

Elliott, Tom. 2021. “The Pleiadic Gaze: Looking at Archaeology from the Perspective of a Digital Gazetteer.” In Classical Archaeology in the Digital Age – the AIAC Presidential Panel: Proceedings of the 19th International Congress of Classical Archaeology. Cologne/Bonn, 22–26 May 2018, edited by Kristian Göransson, 43–51. Archaeology and Economy in the Ancient World. Heidelberg: Propylaeum. https://doi.org/10.11588/PROPYLAEUM.708.

Felicetti, Achille, and Francesca Murano. 2017. “Scripta Manent: A CIDOC CRM Semiotic Reading of Ancient Texts.” International Journal on Digital Libraries 18 (4): 263–70. https://doi.org/10.1007/s00799-016-0189-z.

Golden, Patrick, and Ryan Shaw. 2016. “Nanopublication Beyond the Sciences: The PeriodO Period Gazetteer.” PeerJ Computer Science 2: 1–18. https://doi.org/10.7717/peerj-cs.44.

Heřmánková, Petra, Marietta Horster, and Jonathan Prag. 2022. “Digital Epigraphy in 2022: A Report from the Scoping Survey of the FAIR Epigraphy Project.” Zenodo. https://doi.org/10.5281/ZENODO.6610695.

Liu, Fangchao, John Hindmarch, and Mona Hess. 2023. “A Review of the Cultural Heritage Linked Open Data Ontologies and Models.” The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences XLVIII-M-2-2023: 943–50. https://doi.org/10.5194/isprs-archives-XLVIII-M-2-2023-943-2023.

Liuzzo, Pietro M. 2018. “EAGLE Continued: IDEA. The International Digital Epigraphy Association.” In Crossing Experiences in Digital Epigraphy. From Practice to Discipline, edited by Annamaria De Santis and Irene Rossi, 216–30. Warsaw/Berlin: De Gruyter. https://doi.org/10.1515/9783110607208-018.

Mannocci, Andrea, Vittore Casarosa, Paolo Manghi, and Franco Zoppi. 2014. “The Europeana Network of Ancient Greek and Latin Epigraphy Data Infrastructure.” In Metadata and Semantics Research: 8th Research Conference, MTSR 2014, Karlsruhe, Germany, November 27-29, 2014, Proceedings, edited by Sissi Closs, Rudi Studer, Emmanouel Garoufallou, and Miguel-Angel Sicilia, 286–300. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-13674-5.

Marthot-Santaniello, Isabelle. 2021. “D-Scribes Project and Beyond: Building a Virtual Research Environment for the Digital Palaeography of Ancient Greek and Coptic Papyri.” In Ancient Manuscripts and Virtual Research Environments: Special Issue, Classics@ 18, edited by Claire Clivaz and Garrick V. Allen. https://classics-at.chs.harvard.edu/volume/classics18-ancient-manuscripts-and-virtual-research-environments/.

Pirrone, Antoine, Marie Beurton-Aimar, and Nicholas Journet. 2021. “Self-Supervised Deep Metric Learning for Ancient Papyrus Fragments Retrieval.” International Journal on Document Analysis and Recognition (IJDAR) 24 (3): 219–34. https://doi.org/10.1007/s10032-021-00369-1.

Sommerschield, Thea, Yannis Assael, John Pavlopoulos, Vanessa Stefanak, Andrew Senior, Chris Dyer, John Bodel, Jonathan Prag, Ion Androutsopoulos, and Nando de Freitas. 2023. “Machine Learning for Ancient Languages: A Survey.” Computational Linguistics 49 (3): 703–47. https://doi.org/10.1162/coli_a_00481.

Reuse

CC BY-SA 4.0

Citation

BibTeX citation:

@misc{gioia_désirée_landau2024,
  author = {Gioia Désirée Landau, Victoria},
  editor = {Baudry, Jérôme and Burkart, Lucas and Joyeux-Prunel,
    Béatrice and Kurmann, Eliane and Mähr, Moritz and Natale, Enrico and
    Sibille, Christiane and Twente, Moritz},
  title = {When the {Data} {Becomes} {Meta:} {Quality} {Control} for
    {Digitized} {Ancient} {Heritage} {Collections}},
  date = {2024-09-13},
  url = {https://digihistch24.github.io/submissions/464/},
  doi = {10.5281/zenodo.13907777},
  langid = {en},
  abstract = {Digitizing analog historical collections has become common
    practice for institutions with the necessary funds and
    infrastructure for the process. This endeavor is typically
    undertaken in isolation or with external assistance, adjusted to the
    needs and capabilities of the collection-holding institution.
    Anticipating the ways in which users will access online collections
    can be difficult. There can be an underlying assumption that they
    are a uniform target group: professionals with knowledge in the
    respective fields/periods/languages, and amateurs expected to take
    these prerequisites into account. However, institutions housing
    antiquities are as diverse as their viewers, from museums to art
    galleries, academic institutions and private collections – all with
    differing priorities regarding form, content, context and meaning of
    their objects. The implementation of measures must be considered to
    ensure online presences account for the hurdles and roadblocks to
    holistic digital access to ancient heritage collections. Digital
    formats raise issues of web accessibility, language barriers and
    interoperability, in addition to complications imported from the
    analog, such as understanding across disciplines and vocabularies.
    Drawing from a comparative dataset of papyrus-housing institutions
    in around 40 countries, expanded from Trismegistos, this paper
    utilizes selected digitized papyrological collections to spotlight
    their successes and limitations, as well as characteristics common
    to the digitization of ancient heritage collections. This includes
    how collections present themselves to audiences of varying digital
    knowledge and familiarity, and speak to the understanding of
    multiple disciplines. Other aspects are increased dedication to
    provenance investigation, offering contextualization of objects and
    facilitating cross-institutional/-disciplinary research.}
}

For attribution, please cite this work as:

Gioia Désirée Landau, Victoria. 2024. “When the Data Becomes Meta: Quality Control for Digitized Ancient Heritage Collections.” Edited by Jérôme Baudry, Lucas Burkart, Béatrice Joyeux-Prunel, Eliane Kurmann, Moritz Mähr, Enrico Natale, Christiane Sibille, and Moritz Twente. Digital History Switzerland 2024: Book of Abstracts. https://doi.org/10.5281/zenodo.13907777.