DigiHistCH24

transcriptiones – Create, Share and Access Transcriptions of Historical Manuscripts

Poster Session

Authors and affiliations

Yvonne Fuchs (ORCID 0009-0007-4545-606X): University of Basel; University of Lucerne

Dominic Weber (ORCID 0000-0002-9265-3388): University of Bern; University of Basel

Published: September 12, 2024

DOI: 10.5281/zenodo.13908159

Abstract

Transcriptions are crucial for historical research but largely inaccessible, leading to redundant work. transcriptiones revolutionizes the access to transcriptions and metadata of historical sources through a collaborative platform, empowering researchers, students, and citizen scientists to contribute. Thus, it takes transcriptions to the age of FAIR and open research data.

Keywords

Transcriptions, Open Research Data, FAIR Data, Crowdsourcing

A PDF version of the poster is available on Zenodo (PDF).

Background

Open Research Data (ORD) is rapidly gaining significance in the research landscape because it promotes transparency, reproducibility, and reuse (for more information about ORD in the Swiss higher education system, see swissuniversities 2021a and swissuniversities 2021b). In historical research, transcriptions are crucial research data and indispensable resources for interpreting the past. Despite this value, transcriptions have often remained unpublished or difficult to find, and no central platform has existed for accessing them. As a result, historians have frequently had to re-transcribe the same sources. transcriptiones addresses this problem by providing the infrastructure for sharing, editing and reusing transcriptions (Fuchs et al. n.d.c).

Project Overview

transcriptiones is for everyone: researchers, students, and citizen scientists. By contributing their transcriptions, they enhance the visibility and impact of their work, institutional barriers diminish, and collaborations are established. The shared transcriptions are not restricted to a particular period or region. Importantly, contributors are not bound to the digitisation programmes of GLAM institutions (galleries, libraries, archives, and museums) but can provide transcriptions of whatever sources they are working on. This leads to the inclusion of diverse sources not typically found on platforms focused on digital copies. In addition, transcriptiones gathers metadata on the transcribed sources, harnessing a rich pool of crowdsourced knowledge, some of which would otherwise remain uncollected. Overall, transcriptiones enables the reuse of transcriptions and provides valuable insights into sources.

In order to build and uphold a diverse community, transcriptiones needs to cater for the needs and skill sets of many different groups. This includes, for example, balancing a low-threshold, lightweight upload process for those who wish to publish their transcriptions quickly against the comprehensive metadata that researchers require to properly contextualize the transcriptions they obtain from transcriptiones.

Once shared, transcriptions are not intended to remain static. Rather, the community is encouraged to adapt, enhance and thereby reuse them. Users can also revise a document by adding metadata. The different versions of a transcribed document can be viewed in its version history. Each revised version is assigned a unique, permanent URL, which ensures that the exact version of a transcription is easy to find and can be cited accurately.

By design, the contributed transcriptions vary in their state of completion. Sometimes only parts of a source are transcribed, or incomplete raw versions are provided. Even such partial transcriptions are useful for transcriptiones, as they provide valuable insights into archival collections. Moreover, their quality improves through collaboration, following a principle similar to that of Wikipedia.

The open and collaborative nature of transcriptiones does, however, require users to possess a certain degree of data literacy. Those who access the transcriptions and metadata need to understand what to expect and be prepared to preprocess the data before further use. Contributors, on the other hand, should not be afraid to publish transcriptions that contain unclear readings or cover only parts of a source. They can expect other users to be aware that transcriptions may be in such a state and to edit or expand them later. This is in line with the Swiss Data Literacy Charter, according to which data literacy enables people to act as data producers and data consumers alike (Swiss Academies of Arts and Sciences 2024, 4).
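The version history described above, in which every revision receives its own permanent URL, can be pictured as an append-only list of versions. The following is only an illustrative sketch and not the platform's actual data model: the `Document` class, the field names and the URL pattern under `BASE_URL` are all assumptions made for this example.

```python
from dataclasses import dataclass, field

# ASSUMPTION: placeholder URL pattern, not the real transcriptiones URL scheme.
BASE_URL = "https://example.org/documents"


@dataclass
class Document:
    """Hypothetical model of a transcribed document with a version history."""

    doc_id: str
    versions: list = field(default_factory=list)

    def add_version(self, text, metadata=None):
        """Append a new version; earlier versions are never modified."""
        self.versions.append({"text": text, "metadata": dict(metadata or {})})
        return self.version_url(len(self.versions))

    def version_url(self, number):
        """Return the stable URL of version `number` (1-based)."""
        if not 1 <= number <= len(self.versions):
            raise IndexError(f"no version {number} for document {self.doc_id}")
        return f"{BASE_URL}/{self.doc_id}/versions/{number}/"

    def latest(self):
        """Return the most recent version."""
        return self.versions[-1]
```

Because versions are only ever appended, a URL handed out for an earlier version keeps pointing at the same text after later edits, which is what makes a specific version citable.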

Another goal of transcriptiones is building a community of transcribers who interact with one another and enhance the transcriptions together. To facilitate this, several features have been implemented. Users can subscribe to other users, specific institutions, and reference numbers in order to stay up to date with recent developments related to their interests. Additionally, users can contact other contributors directly to exchange information about sources, manuscripts, or scientific findings.

Towards FAIR transcriptions

From the research community’s perspective, findability, and therefore effective search, is essential. For that reason, two distinct ways of navigating the transcriptiones collection have been implemented, each serving a specific purpose. The first is a field search, which allows users to query at varying levels of detail and to locate transcriptions of specific sources (Fuchs et al. n.d.b). By combining multiple fields, users can refine their searches and, for example, discover similar sources from a particular time period. The second is an inventory search, which offers access to transcriptions based on archives, reference numbers (signatures), scribes, and source types (Fuchs et al. n.d.a). This approach resembles an archive plan search: it follows a search pattern historians are used to and transposes it to a platform that spans multiple GLAM institutions. With regard to the FAIR principles, these search strategies are crucial in making transcriptions of handwritten sources findable.
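How combining fields narrows a search can be illustrated with a small in-memory filter. This is a sketch under stated assumptions, not the platform's search implementation: the `Record` fields, the `field_search` function and the sample entries in `SAMPLE` are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical, simplified metadata record for one transcription."""

    title: str
    institution: str
    source_type: str
    year: int


# Invented sample data, for illustration only.
SAMPLE = [
    Record("Letter A", "Archive X", "letter", 1520),
    Record("Account book B", "Archive X", "account book", 1525),
    Record("Letter C", "Archive Y", "letter", 1610),
]


def field_search(records, *, institution=None, source_type=None,
                 year_from=None, year_to=None):
    """Return the records that match every supplied field (logical AND)."""

    def matches(record):
        return (
            (institution is None or record.institution == institution)
            and (source_type is None or record.source_type == source_type)
            and (year_from is None or record.year >= year_from)
            and (year_to is None or record.year <= year_to)
        )

    return [record for record in records if matches(record)]
```

Each additional field can only shrink the result set, so a query such as `field_search(SAMPLE, source_type="letter", year_from=1500, year_to=1550)` moves from "all letters" to "letters from a particular period".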

Given the increasing importance of digital research methods in history, data from transcriptiones must be accessible not only to humans but also to machines. Access to transcriptions and metadata is therefore provided through both the web application and a REST API (Fuchs et al. n.d.d). Via the REST API, lists and metadata of institutions, reference numbers, source types, scribes and documents can be retrieved automatically in JSON format. The transcriptions themselves can be retrieved automatically as well, either as plain text or as TEI-XML. Thanks to the API, digital historians can access the transcriptiones collection in the form they need to conduct quantitative research, to train language models or to carry out any other task that requires automated access to data and metadata. Furthermore, the API enables interoperability with other stakeholders and ensures that the impact of data reuse extends beyond the platform itself. One example is the interface between transcriptiones and the Digitaler Lesesaal of the Staatsarchiv Basel-Stadt. Currently in development, this connection will enable direct links to existing transcriptions within the archive catalogue.
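As a rough idea of what machine access of this kind might look like, the sketch below requests JSON from a REST endpoint and extracts plain text from a TEI-XML document using only the Python standard library. The base path in `API_BASE` and resource names such as `documents` are assumptions made for this example; the real routes and parameters should be taken from the API documentation (Fuchs et al. n.d.d). The helper names `build_url`, `fetch_json` and `tei_body_text` are likewise hypothetical.

```python
import json
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# ASSUMPTION: illustrative base path; check the API documentation for real routes.
API_BASE = "https://transcriptiones.ch/api/"
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}  # standard TEI namespace


def build_url(resource, **params):
    """Build the URL for a (hypothetical) resource with optional query parameters."""
    url = urllib.parse.urljoin(API_BASE, resource.strip("/") + "/")
    if params:
        url += "?" + urllib.parse.urlencode(sorted(params.items()))
    return url


def fetch_json(url, opener=urllib.request.urlopen):
    """Request a URL and decode its JSON body; `opener` can be swapped for testing."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with opener(request) as response:
        return json.loads(response.read().decode("utf-8"))


def tei_body_text(tei_xml):
    """Return the whitespace-normalised text of the TEI <body>, or "" if absent."""
    root = ET.fromstring(tei_xml)
    body = root.find(".//tei:body", TEI_NS)
    if body is None:
        return ""
    return " ".join("".join(body.itertext()).split())
```

Under these assumptions, a call such as `fetch_json(build_url("documents", page=1))` would return the decoded JSON of one listing page, and `tei_body_text` reduces a TEI export to running text, for instance as a preprocessing step for quantitative analysis.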

The central aspects of transcriptiones are accessibility, transparency, collaboration, and reuse. The features and strategies described above address these aspects with regard to the transcriptions and their metadata, but the platform also promotes them in its code and development. For this reason, the source code is openly available on Zenodo and GitHub under the permissive BSD-3-Clause license (Fuchs et al. 2023a, 2023b).

Conclusion

transcriptiones provides the infrastructure for sharing and editing transcriptions, which it understands as research data. In doing so, it takes this type of data into the age of FAIR and open research data. As an open and collaborative platform that requires metadata at upload to ensure proper attribution to the source, and that offers several search strategies, it ensures that transcriptions are findable. Accessibility is provided by the free web application, which allows transcriptions to be viewed without registration, as well as by the various export formats and the API. The API is also a cornerstone of interoperability for transcriptions and metadata. Reusability is achieved through rich metadata and the versioning of edited transcriptions and metadata (for further information on the FAIR data principles, see Wilkinson et al. 2016). At the same time, transcriptiones prompts a reconsideration of how transcriptions are perceived, encouraging contributors to open up their work to collaboration. Together, these elements support an understanding of transcriptions as invaluable research data that are worth gathering, sharing, enhancing and documenting so that many historians can use them for downstream research.

References

Fuchs, Yvonne, Dominic Weber, Sorin Marti, and Nicolas Diener. 2023a. “Transcriptiones.” Zenodo. https://doi.org/10.5281/zenodo.8124784.
———. 2023b. “Transcriptiones.” https://github.com/transcriptiones/transcriptiones.
———. n.d.a. “Browse Collection.” Transcriptiones. Accessed July 21, 2024. https://transcriptiones.ch/display/.
———. n.d.b. “Search.” Transcriptiones. Accessed July 21, 2024. https://transcriptiones.ch/search/.
———. n.d.c. “Transcriptiones.” Transcriptiones. Accessed July 22, 2024. https://transcriptiones.ch/.
———. n.d.d. “Transcriptiones API.” Transcriptiones. Accessed July 22, 2024. https://transcriptiones.ch/api/documentation/.
Swiss Academies of Arts and Sciences. 2024. Swiss Data Literacy Charter. Swiss Academies of Arts and Sciences.
swissuniversities. 2021a. Swiss National Open Research Data Strategy.
———. 2021b. Swiss National Strategy Open Research Data. Version 1.0. Action Plan.
Wilkinson, Mark D., Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, et al. 2016. “The FAIR Guiding Principles for Scientific Data Management and Stewardship.” Scientific Data 3 (1): 160018. https://doi.org/10.1038/sdata.2016.18.

Reuse

CC BY-SA 4.0

Citation

BibTeX citation:
@misc{fuchs2024,
  author = {Fuchs, Yvonne and Weber, Dominic},
  editor = {Baudry, Jérôme and Burkart, Lucas and Joyeux-Prunel,
    Béatrice and Kurmann, Eliane and Mähr, Moritz and Natale, Enrico and
    Sibille, Christiane and Twente, Moritz},
  title = {Transcriptiones – {Create,} {Share} and {Access}
    {Transcriptions} of {Historical} {Manuscripts}},
  date = {2024-09-12},
  url = {https://digihistch24.github.io/submissions/poster/463/},
  doi = {10.5281/zenodo.13908159},
  langid = {en},
  abstract = {Transcriptions are crucial for historical research but
    largely inaccessible, leading to redundant work. transcriptiones
    revolutionizes the access to transcriptions and metadata of
    historical sources through a collaborative platform, empowering
    researchers, students, and citizen scientists to contribute. Thus,
    it takes transcriptions to the age of FAIR and open research data.}
}
For attribution, please cite this work as:
Fuchs, Yvonne, and Dominic Weber. 2024. “Transcriptiones – Create, Share and Access Transcriptions of Historical Manuscripts.” Edited by Jérôme Baudry, Lucas Burkart, Béatrice Joyeux-Prunel, Eliane Kurmann, Moritz Mähr, Enrico Natale, Christiane Sibille, and Moritz Twente. Digital History Switzerland 2024: Book of Abstracts. https://doi.org/10.5281/zenodo.13908159.