DigiHistCH24
  • Home
  • Book of Abstracts
  • Conference Program
  • Call for Contributions
  • About

Learning to Read Digital? Constellations of Correspondence Project and Humanist Perspectives on the Aggregated 19th-century Finnish Letter Metadata

  • Home
  • Book of Abstracts
    • Data-Driven Approaches to Studying the History of Museums on the Web: Challenges and Opportunities for New Discoveries
    • On a solid ground. Building software for a 120-year-old research project applying modern engineering practices
    • Tables are tricky. Testing Text Encoding Initiative (TEI) Guidelines for FAIR upcycling of digitised historical statistics.
    • Training engineering students through a digital humanities project: Techn’hom Time Machine
    • From manual work to artificial intelligence: developments in data literacy using the example of the Repertorium Academicum Germanicum (2001-2024)
    • A handful of pixels of blood
    • Impresso 2: Connecting Historical Digitised Newspapers and Radio. A Challenge at the Crossroads of History, User Interfaces and Natural Language Processing.
    • Learning to Read Digital? Constellations of Correspondence Project and Humanist Perspectives on the Aggregated 19th-century Finnish Letter Metadata
    • Teaching the use of Automated Text Recognition online. Ad fontes goes ATR
    • Geovistory, a LOD Research Infrastructure for Historical Sciences
    • Using GIS to Analyze the Development of Public Urban Green Spaces in Hamburg and Marseille (1945 - 1973)
    • Belpop, a history-computer project to study the population of a town during early industrialization
    • Contributing to a Paradigm Shift in Historical Research by Teaching Digital Methods to Master’s Students
    • Revealing the Structure of Land Ownership through the Automatic Vectorisation of Swiss Cadastral Plans
    • Rockefeller fellows as heralds of globalization: the circulation of elites, knowledge, and practices of modernization (1920–1970s): global history, database connection, and teaching experience
    • Theory and Practice of Historical Data Versioning
    • Towards Computational Historiographical Modeling
    • Efficacy of Chat GPT Correlations vs. Co-occurrence Networks in Deciphering Chinese History
    • Data Literacy and the Role of Libraries
    • 20 godparents and 3 wives – studying migrant glassworkers in post-medieval Estonia
    • From record cards to the dynamics of real estate transactions: Working with automatically extracted information from Basel’s historical land register, 1400-1700
    • When the Data Becomes Meta: Quality Control for Digitized Ancient Heritage Collections
    • On the Historiographic Authority of Machine Learning Systems
    • Films as sources and as means of communication for knowledge gained from historical research
    • Develop Yourself! Development according to the Rockefeller Foundation (1913 – 2013)
    • AI-assisted Search for Digitized Publication Archives
    • Digital Film Collection Literacy – Critical Research Interfaces for the “Encyclopaedia Cinematographica”
    • From Source-Criticism to System-Criticism, Born Digital Objects, Forensic Methods, and Digital Literacy for All
    • Connecting floras and herbaria before 1850 – challenges and lessons learned in digital history of biodiversity
    • A Digital History of Internationalization. Operationalizing Concepts and Exploring Millions of Patent Documents
    • From words to numbers. Methodological perspectives on large scale Named Entity Linking
    • Go Digital, They Said. It Will Be Fun, They Said. Teaching DH Methods for Historical Research
    • Unveiling Historical Depth: Semantic annotation of the Panorama of the Battle of Murten
    • When Literacy Goes Digital: Rethinking the Ethics and Politics of Digitisation
  • Conference Program
    • Schedule
    • Keynote
    • Practical Information
    • Event Digital History Network
    • Event SSH ORD
  • Call for Contributions
    • Key Dates
    • Evaluation Criteria
    • Submission Guidelines
  • About
    • Code of Conduct
    • Terms and Conditions

On this page

  • Introduction
  • The Data and the Portal
  • Constant Datawork in an Interdiscplinary Team
  • Portal Test Users and their Feedback
  • References
  • Edit this page
  • Report an issue

Other Links

  • Presentation Slides (PDF)

Learning to Read Digital? Constellations of Correspondence Project and Humanist Perspectives on the Aggregated 19th-century Finnish Letter Metadata

Session 2A
Authors
Affiliation

Hanna-Leena Paloposki

The Finnish Literature Society, Helsinki, Finland

Ilona Pikkanen

The Finnish Literature Society, Helsinki, Finland

Published

September 12, 2024

Modified

November 15, 2024

Doi

10.5281/zenodo.14851507

Abstract

This paper discusses a digital humanities consortium project, Constellations of Correspondence - Large and Small Networks of Epistolary Exchange in the Grand Duchy of Finland (CoCo), which aggregates, analyses and publishes 19th-century epistolary metadata from the collections of Finnish cultural heritage organisations as an open semantic web portal. First, we will approach digital literacy from our own experience as humanists operating in a digital environment. We will share how we have learned to work in a hybrid team with computer scientists and have been able to create fluent ways of analysing the data and developing the portal together. Secondly, we will explore the user testing of the portal carried out in spring 2024. It showed that building a committed test group is challenging and that ‘traditional’ humanists have difficulties in studying mass data. However, the user feedback was encouraging, and guides us in developing the portal before its launch.

Keywords

Semantic web portal, interdisciplinary collaboration, user testing survey

For this paper, slides are available on Zenodo (PDF).

Introduction

This paper discusses data and practices related to an ongoing digital humanities consortium project Constellations of Correspondence – Large and Small Networks of Epistolary Exchange in the Grand Duchy of Finland (CoCo; Research Council of Finland, 2021–2025). The project aggregates, analyses and publishes 19th-century epistolary metadata from letter collections of Finnish cultural heritage (CH) organisations on a Linked Open Data service and as a semantic web portal (the ‘CoCo Portal’), and it consists of three research teams, bringing together computational and humanities expertise. We focus exclusively on metadata considering them to be part of the cultural heritage and a fruitful starting point for research, providing access i.e. to 19th-century epistolary culture and archival biases. The project started with a webropol survey addressed to over 100 CH organisations to get an overview of the preserved 19th-century letters and the Finnish public organisations willing to share their letter metadata with us. Currently the CoCo portal includes seven CH organisations and four online letter publications with the metadata of over 997.000 letters and with 95.000 actors (senders and recipients of letters).

The Data and the Portal

The CoCo Data Model is based on international standards, such as CIDOC CRM (Doerr 2003), Dublin Core, and ICA Records in Contexts to promote interoperability with other datasets. The model supports the modelling of the relevant properties of letter metadata (Letter, Actor, Place, and Time-Span, provenance i.e. MetadataRecord and archival/collection level information) from the source datasets. To represent actors in these different source datasets, we use an adaptation of the Open Archives Initiative Object Reuse and Exchange (OAI-ORE) proxy concept. The collected metadata are transformed into linked open data using an automated transformation pipeline (Drobac et al. 2023), which consists of several steps. First, each received dataset is processed into an intermediate RDF format. Then the data are harmonised with the CoCo Data Model and enriched by linking the recognised actors and places to external resources, such as Wikidata, Finnish National Biographies, as well as the Finnish AcademySampo (Drobac et al. 2023) and the BiographySampo (Tamper et al. 2023). Finally, the transformation pipeline produces a harmonised dataset of correspondence metadata. The availability of the aggregated letter metadata in the linked open data format facilitates the use of the data for data exploration and answering humanities research questions by publishing the data in a semantic portal or by using SPARQL queries. The project’s semantic portal allows users to search, browse and analyse the letters, archives, actors, and places in the CoCo dataset. It is based on the Sampo model (Hyvönen 2022) and is implemented using the Sampo-UI programming framework (Ikkala et al. 2022). The user interface works on a faceted search paradigm, which allows the user to search, e.g. for letters sent by a certain person, letters from a certain period or letters kept in a certain organisation. Data such as the sending places can be visualised on a map, and other visualisations include the yearly distributions of letters, top correspondents, and correspondence networks. The portal also offers some network analysis figures.

Constant Datawork in an Interdiscplinary Team

At this stage (mid 2024), the three consortium teams have been working together for three full years. Two of the teams (Aalto University and the University of Helsinki) have a computational profile and are responsible for data modelling and transformation and the development of the user interface (semantic portal), and the team based at the Finnish Literature Society maintains relations with the data providers (CH organisations), but also participates in the first stages of data processing, for example by manually harmonising finding aids in word format to prepare them for algorithmic processing, and by performing quality checks on automatically processed data. Members of all teams participate in research activities (conferences, seminars, etc.) in their fields, but we mostly publish co-authored papers, currently focusing on what we call ‘critical collection history’. The starting point for this interdisciplinary work was fortuitous, as the key members had years of experience in – to borrow a term from Jo Guldi (2023) – hybrid teamwork. Guldi prefers the word hybrid to the more common interdisciplinary because ‘it is more specific to the kind of aliveness, the ongoing exchange, that I have in mind as the basis for the genesis of a new field. … I can be interdisciplinary in solitude … But hybrid teams require ongoing support and thinking between people trained in and who identify as members of far-flung disciplines’. (Guldi 2023). Thinking together across disciplines requires a strong commitment to developing a common language and an atmosphere in which it is possible to express uncertainty and ask questions. Despite the previous experience, working with the specificities of archival metadata and transforming them into ‘big metadata’ has forced us to study and develop a new shared vocabulary that reflects the particular characteristics or ‘localities’ (Loukissas 2019) of this particular data in the context of Linked Open Data and the chosen Humanities’ questions. One of the realisations and lessons of the project is the importance of collection-level information (identifying the so-called records creators and the fonds, i.e. the archival collections). Data models developed in previous projects dealing with epistolary metadata (usually based on curated and published editions of letters) did not highlight these characteristics, and all the datasets we acquired from archives, libraries and museums conveyed this information in different ways. In order to understand and visualise this aspect of the data, we needed to combine a practical understanding of archival work, theoretical reflections from cultural heritage studies and critical archival theory, and domain expertise in cultural heritage data ontologies and models. Such hybrid thinking takes time, especially when the aim is both to create a new dataset and to understand and analyse it in order to answer historical research questions. About seven months into the project, we started to see aggregated data. It was also at this point that we realised that monthly project meetings were far from enough; we needed to develop systematic ways of looking at the data, and to do so together as often as possible. As a result, a series of events were developed to explore and discuss the data: First, data meetings, where we decided, for example, which parts of the various data collected from the CH organisations would be transformed into linked open data; second, SPARQL meetings, where we learned the query language and used it to interrogate the datasets (this proved to be a good way of detecting errors and anomalies either in the transformation process or in the original data); third, portal meetings, where we discuss the features and properties of the interface and, again, look for patterns in the data by looking at the visualisations; and fourth, the so-called ‘CoCothons’, annual two-day intensive workshops dedicated to data exploration, but with some research content as well. On a monthly basis, we spend a total of 1–2 working days together ‘at the data’. The funding mechanism – a three-part consortium project – makes it possible to work across different universities and research institutions to gain expertise that is not available in any one place. The downside, however, is that we are working in different organisations; this is not a fashionable digital humanities ‘LAB’. All three teams in the project work in the Helsinki metropolitan area, which makes face-to-face meetings possible, but even with the time and effort invested in team work and the digital tools for easy communication (Slack, Trello, etc.), we still find from time to time that the algorithmic premises used by the computational team do not stand up to historical scrutiny, or that the research assistants have misinterpreted instructions for manual data processing.

Portal Test Users and their Feedback

We realised early on that it was extremely important to gather user experience before actually launching the CoCo portal. So, in early February 2024, we opened the portal to a test group of 17 people for a period of 2.5 months. The group was assembled partly through an open call on social media and at some conferences, and partly by asking specific people to join. The volunteers were mostly academic humanities researchers and invited specialists from two museums with 19th-century letter collections. We provided the testers with background material on the portal and the CoCo data endpoint documentation, the organisations whose data was currently in the portal, with user instructions, and with some questions or tasks to help them get started. We asked five specific questions to which we wanted answers, but we welcomed all kinds of feedback. Building an engaged and motivated community of test users proved to be a challenge. The opening online session, where the project’s portal experts taught how to use it, received only four participants. However, they were active in asking questions and satisfied with the introduction. We also offered the possibility of a joint online closing session to discuss the experiences, but only two people registered, so we cancelled it. All this shows that once the initial excitement of joining the test group had worn off, it was difficult to reach the testers and very difficult to engage them in the test or create a group spirit. People got lost in their own research and daily work. In the end, however, after a reminder, we received feedback from eight testers, i.e. from half of the group. We are still waiting for feedback from the later test group of archivists in the Finnish Literary Society. The feedback can be divided into two groups: comments on errors in the data, mostly errors in the disambiguation of actors, and comments on the functionalities and performance of the portal. The latter was what we had hoped for. It is interesting to note that the testers, who work in CH organisations and are used to working with collection management systems and have even catalogued archival material themselves, commented more on the functionalities, while the researchers clearly focused on data errors. In our view, this shows how much hands-on experience with databases affects a humanist’s ability to study mass data material and turn into digital humanities. You have to change your mindset, as our own experience in the project has shown. We also noticed that working only with the metadata of letters was a barrier for some of the testers; in the wish list for the future development of the portal, digitised letters or their transliterations stood out. A positive result from the project’s point of view was that the testers found the portal easy to use, the user interface with its four perspectives clear and the data offered useful both for their own research and for information services in CH organisations. They were able to find unknown connections, relationships and people in the data. The metadata available seemed to respond well to their research questions and to encourage further research. The ability to do queries with places was particularly appreciated. The fact that we had to write ‘preliminary’ instructions and update the portal for the testers in advance stimulated internal discussion about the future development of the portal and how to create a smooth feedback system for data errors and functional problems. Ethical data work (Ahnert et al. 2020) requires us to be very open, precise and thorough in documenting what data we have received and how we have harmonised it, which organisations are included and how they are implemented in the portal. Some difficulties with functionalities, visualisations and data errors have either already been fixed or are in progress or at least under discussion for further action. The feedback has provided us with an insight into the difficulties that users, both ‘traditional’ humanists and members of the wider public, may encounter when using the portal, and some guidelines for us to help them learn to read digital. The feedback proves that we are on the right track towards our goal of creating a new research resource, a virtual archive that crosses organisational silos.

References

Ahnert, Ruth, Sebastian E. Ahnert, Catherine Nicole Coleman, and Scott B. Weingart. 2020. The Network Turn: Changing Perspectives in the Humanities. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781108866804.
Doerr, Martin. 2003. “The CIDOC CRM—an Ontological Approach to Semantic Interoperability of Metadata.” AI Magazine 24(3), 75–92.
Drobac, Senka, Johanna Enqvist, Petri Leskinen, Muhammad Faiz Wahjoe, Heikki Rantala, Mikko Koho, Ilona Pikkanen, et al. 2023. “The Laborious Cleaning: Acquiring and Transforming 19th-Century Epistolary Metadata,” 248–62. https://doi.org/10.5617/dhnbpub.10669.
Guldi, Jo. 2023. The Dangerous Art of Text Mining. A Methodology for Digital History. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781009263016.
Hyvönen, Eero. 2022. “Digital Humanities on the Semantic Web: Sampo Model and Portal Series.” Semantic Web – Interoperability, Usability, Applicability, 14(4), 729–44. https://doi.org/10.3233/SW-223034.
Ikkala, Esko, Eero Hyvönen, Heikki Rantala, and Mikko Koho. 2022. “Sampo-UI: A Full Stack JavaScript Framework for Developing Semantic Portal User Interfaces.” Semantic Web – Interoperability, Usability, Applicability, 13(1), 69–84. https://doi.org/10.3233/SW-210428.
Loukissas, Yanni Alexander. 2019. All Data Are Local: Thinking Critically in a Data-Driven Society. Cambridge: The MIT Press. https://doi.org/10.7551/mitpress/11543.001.0001.
Tamper, Minna, Petri Leskinen, Eero Hyvönen, Risto Valjus, and Kirsi Keravuori. 2023. “Analyzing Biography Collection Historiographically as Linked Data: Case National Biography of Finland.” Semantic Web – Interoperability, Usability, Applicability, 14(2), 385–419. https://doi.org/10.3233/SW-222887.
Back to top

Reuse

CC BY-SA 4.0

Citation

BibTeX citation:
@misc{paloposki2024,
  author = {Paloposki, Hanna-Leena and Pikkanen, Ilona},
  editor = {Baudry, Jérôme and Burkart, Lucas and Joyeux-Prunel,
    Béatrice and Kurmann, Eliane and Mähr, Moritz and Natale, Enrico and
    Sibille, Christiane and Twente, Moritz},
  title = {Learning to {Read} {Digital?} {Constellations} of
    {Correspondence} {Project} and {Humanist} {Perspectives} on the
    {Aggregated} 19th-Century {Finnish} {Letter} {Metadata}},
  date = {2024-09-12},
  url = {https://digihistch24.github.io/submissions/444/},
  doi = {10.5281/zenodo.14851507},
  langid = {en},
  abstract = {This paper discusses a digital humanities consortium
    project, *Constellations of Correspondence - Large and Small
    Networks of Epistolary Exchange in the Grand Duchy of Finland*
    (CoCo), which aggregates, analyses and publishes 19th-century
    epistolary metadata from the collections of Finnish cultural
    heritage organisations as an open semantic web portal. First, we
    will approach digital literacy from our own experience as humanists
    operating in a digital environment. We will share how we have
    learned to work in a hybrid team with computer scientists and have
    been able to create fluent ways of analysing the data and developing
    the portal together. Secondly, we will explore the user testing of
    the portal carried out in spring 2024. It showed that building a
    committed test group is challenging and that “traditional” humanists
    have difficulties in studying mass data. However, the user feedback
    was encouraging, and guides us in developing the portal before its
    launch.}
}
For attribution, please cite this work as:
Paloposki, Hanna-Leena, and Ilona Pikkanen. 2024. “Learning to Read Digital? Constellations of Correspondence Project and Humanist Perspectives on the Aggregated 19th-Century Finnish Letter Metadata.” Edited by Jérôme Baudry, Lucas Burkart, Béatrice Joyeux-Prunel, Eliane Kurmann, Moritz Mähr, Enrico Natale, Christiane Sibille, and Moritz Twente. Digital History Switzerland 2024: Book of Abstracts. https://doi.org/10.5281/zenodo.14851507.
Impresso 2: Connecting Historical Digitised Newspapers and Radio. A Challenge at the Crossroads of History, User Interfaces and Natural Language Processing.
Teaching the use of Automated Text Recognition online. Ad fontes goes ATR
Source Code
---
submission_id: 444
categories: 'Session 2A'
title: Learning to Read Digital? Constellations of Correspondence Project and Humanist Perspectives on the Aggregated 19th-century Finnish Letter Metadata
author:
  - name: Hanna-Leena Paloposki
    orcid: 0000-0003-1412-8622
    email: hanna-leena.paloposki@finlit.fi
    affiliations:
      - The Finnish Literature Society, Helsinki, Finland
  - name: Ilona Pikkanen
    orcid: 0000-0001-9435-7163
    email: ilona.pikkanen@finlit.fi
    affiliations:
      - The Finnish Literature Society, Helsinki, Finland

keywords:
  - Semantic web portal
  - interdisciplinary collaboration
  - user testing survey
abstract: |
  This paper discusses a digital humanities consortium project, *Constellations of Correspondence - Large and Small Networks of Epistolary Exchange in the Grand Duchy of Finland* (CoCo), which aggregates, analyses and publishes 19th-century epistolary metadata from the collections of Finnish cultural heritage organisations as an open semantic web portal. First, we will approach digital literacy from our own experience as humanists operating in a digital environment. We will share how we have learned to work in a hybrid team with computer scientists and have been able to create fluent ways of analysing the data and developing the portal together. Secondly, we will explore the user testing of the portal carried out in spring 2024. It showed that building a committed test group is challenging and that 'traditional' humanists have difficulties in studying mass data. However, the user feedback was encouraging, and guides us in developing the portal before its launch. 
key-points:
  - Key point 1 Hybrid thinking or multidisciplinary collaboration always takes much more time than one estimates, and it is useful to develop several complementary ways of working with data together in order to understand their local specificities.
  - Key point 2 Archival metadata is an untapped research resource for digital humanities, but its use requires close collaboration with cultural heritage organisations and practical knowledge of archival practices.
  - Key point 3 The user test survey of the portal with 19th-century letter metadata showed that building a committed test group is challenging and that 'traditional' humanists have difficulties in studying mass data.
date: 09-12-2024
date-modified: 11-15-2024
doi: 10.5281/zenodo.14851507
other-links:
  - text: Presentation Slides (PDF)
    href: https://doi.org/10.5281/zenodo.14851507
bibliography: references.bib
---

::: {.callout-note appearance="simple" icon=false}

For this paper, slides are available [on Zenodo (PDF)](https://zenodo.org/records/14851507/files/444_DigiHistCH24_LetterMetadata_Slides.pdf).

:::

## Introduction

This paper discusses data and practices related to an ongoing digital humanities consortium project *Constellations of Correspondence – Large and Small Networks of Epistolary Exchange in the Grand Duchy of Finland* (CoCo; Research Council of Finland, 2021–2025). The project aggregates, analyses and publishes 19th-century epistolary metadata from letter collections of Finnish cultural heritage (CH) organisations on a Linked Open Data service and as a semantic web portal (the ‘CoCo Portal’), and it consists of three research teams, bringing together computational and humanities expertise. We focus exclusively on metadata considering them to be part of the cultural heritage and a fruitful starting point for research, providing access i.e. to 19th-century epistolary culture and archival biases. The project started with a webropol survey addressed to over 100 CH organisations to get an overview of the preserved 19th-century letters and the Finnish public organisations willing to share their letter metadata with us. Currently the CoCo portal includes seven CH organisations and four online letter publications with the metadata of over 997.000 letters and with 95.000 actors (senders and recipients of letters).

## The Data and the Portal

The CoCo Data Model is based on international standards, such as CIDOC CRM [@Doerr], Dublin Core, and ICA Records in Contexts to promote interoperability with other datasets. The model supports the modelling of the relevant properties of letter metadata (Letter, Actor, Place, and Time-Span, provenance i.e. MetadataRecord and archival/collection level information) from the source datasets. To represent actors in these different source datasets, we use an adaptation of the Open Archives Initiative Object Reuse and Exchange (OAI-ORE) proxy concept. The collected metadata are transformed into linked open data using an automated transformation pipeline [@Drobac], which consists of several steps. First, each received dataset is processed into an intermediate RDF format. Then the data are harmonised with the CoCo Data Model and enriched by linking the recognised actors and places to external resources, such as Wikidata, Finnish National Biographies, as well as the Finnish AcademySampo [@Drobac] and the BiographySampo [@Tamper]. Finally, the transformation pipeline produces a harmonised dataset of correspondence metadata. The availability of the aggregated letter metadata in the linked open data format facilitates the use of the data for data exploration and answering humanities research questions by publishing the data in a semantic portal or by using SPARQL queries.
The project’s semantic portal allows users to search, browse and analyse the letters, archives, actors, and places in the CoCo dataset. It is based on the Sampo model [@Hyvönen] and is implemented using the Sampo-UI programming framework [@Ikkala]. The user interface works on a faceted search paradigm, which allows the user to search, e.g. for letters sent by a certain person, letters from a certain period or letters kept in a certain organisation. Data such as the sending places can be visualised on a map, and other visualisations include the yearly distributions of letters, top correspondents, and correspondence networks. The portal also offers some network analysis figures.

## Constant Datawork in an Interdiscplinary Team

At this stage (mid 2024), the three consortium teams have been working together for three full years. Two of the teams (Aalto University and the University of Helsinki) have a computational profile and are responsible for data modelling and transformation and the development of the user interface (semantic portal), and the team based at the Finnish Literature Society maintains relations with the data providers (CH organisations), but also participates in the first stages of data processing, for example by manually harmonising finding aids in word format to prepare them for algorithmic processing, and by performing quality checks on automatically processed data. Members of all teams participate in research activities (conferences, seminars, etc.) in their fields, but we mostly publish co-authored papers, currently focusing on what we call ‘critical collection history’.
The starting point for this interdisciplinary work was fortuitous, as the key members had years of experience in – to borrow a term from Jo Guldi [-@Guldi] – hybrid teamwork. Guldi prefers the word hybrid to the more common interdisciplinary because ‘it is more specific to the kind of aliveness, the ongoing exchange, that I have in mind as the basis for the genesis of a new field. ... I can be interdisciplinary in solitude ... But hybrid teams require ongoing support and thinking between people trained in and who identify as members of far-flung disciplines’. [@Guldi]. Thinking together across disciplines requires a strong commitment to developing a common language and an atmosphere in which it is possible to express uncertainty and ask questions. Despite the previous experience, working with the specificities of archival metadata and transforming them into ‘big metadata’ has forced us to study and develop a new shared vocabulary that reflects the particular characteristics or ‘localities’ [@Loukissas] of this particular data in the context of Linked Open Data and the chosen Humanities’ questions.
One of the realisations and lessons of the project is the importance of collection-level information (identifying the so-called records creators and the fonds, i.e. the archival collections). Data models developed in previous projects dealing with epistolary metadata (usually based on curated and published editions of letters) did not highlight these characteristics, and all the datasets we acquired from archives, libraries and museums conveyed this information in different ways. In order to understand and visualise this aspect of the data, we needed to combine a practical understanding of archival work, theoretical reflections from cultural heritage studies and critical archival theory, and domain expertise in cultural heritage data ontologies and models.
Such hybrid thinking takes time, especially when the aim is both to create a new dataset and to understand and analyse it in order to answer historical research questions. About seven months into the project, we started to see aggregated data. It was also at this point that we realised that monthly project meetings were far from enough; we needed to develop systematic ways of looking at the data, and to do so together as often as possible. As a result, a series of events were developed to explore and discuss the data: First, data meetings, where we decided, for example, which parts of the various data collected from the CH organisations would be transformed into linked open data; second, SPARQL meetings, where we learned the query language and used it to interrogate the datasets (this proved to be a good way of detecting errors and anomalies either in the transformation process or in the original data); third, portal meetings, where we discuss the features and properties of the interface and, again, look for patterns in the data by looking at the visualisations; and fourth, the so-called ‘CoCothons’, annual two-day intensive workshops dedicated to data exploration, but with some research content as well. On a monthly basis, we spend a total of 1–2 working days together ‘at the data’.
The funding mechanism – a three-part consortium project – makes it possible to work across different universities and research institutions to gain expertise that is not available in any one place. The downside, however, is that we are working in different organisations; this is not a fashionable digital humanities ‘LAB’. All three teams in the project work in the Helsinki metropolitan area, which makes face-to-face meetings possible, but even with the time and effort invested in team work and the digital tools for easy communication (Slack, Trello, etc.), we still find from time to time that the algorithmic premises used by the computational team do not stand up to historical scrutiny, or that the research assistants have misinterpreted instructions for manual data processing.

## Portal Test Users and their Feedback

We realised early on that it was extremely important to gather user experience before actually launching the CoCo portal. So, in early February 2024, we opened the portal to a test group of 17 people for a period of 2.5 months. The group was assembled partly through an open call on social media and at some conferences, and partly by asking specific people to join. The volunteers were mostly academic humanities researchers and invited specialists from two museums with 19th-century letter collections. We provided the testers with background material on the portal and the CoCo data endpoint documentation, the organisations whose data was currently in the portal, with user instructions, and with some questions or tasks to help them get started. We asked five specific questions to which we wanted answers, but we welcomed all kinds of feedback.
Building an engaged and motivated community of test users proved to be a challenge. The opening online session, where the project’s portal experts taught how to use it, received only four participants. However, they were active in asking questions and satisfied with the introduction. We also offered the possibility of a joint online closing session to discuss the experiences, but only two people registered, so we cancelled it. All this shows that once the initial excitement of joining the test group had worn off, it was difficult to reach the testers and very difficult to engage them in the test or create a group spirit. People got lost in their own research and daily work. In the end, however, after a reminder, we received feedback from eight testers, i.e. from half of the group. We are still waiting for feedback from the later test group of archivists in the Finnish Literary Society.
The feedback can be divided into two groups: comments on errors in the data, mostly errors in the disambiguation of actors, and comments on the functionalities and performance of the portal. The latter was what we had hoped for. It is interesting to note that the testers, who work in CH organisations and are used to working with collection management systems and have even catalogued archival material themselves, commented more on the functionalities, while the researchers clearly focused on data errors. In our view, this shows how much hands-on experience with databases affects a humanist’s ability to study mass data material and turn into digital humanities. You have to change your mindset, as our own experience in the project has shown. We also noticed that working only with the metadata of letters was a barrier for some of the testers; in the wish list for the future development of the portal, digitised letters or their transliterations stood out.
A positive result from the project’s point of view was that the testers found the portal easy to use, the user interface with its four perspectives clear and the data offered useful both for their own research and for information services in CH organisations. They were able to find unknown connections, relationships and people in the data. The metadata available seemed to respond well to their research questions and to encourage further research. The ability to do queries with places was particularly appreciated.
The fact that we had to write ‘preliminary’ instructions and update the portal for the testers in advance stimulated internal discussion about the future development of the portal and how to create a smooth feedback system for data errors and functional problems. Ethical data work [@Ahnert] requires us to be very open, precise and thorough in documenting what data we have received and how we have harmonised it, which organisations are included and how they are implemented in the portal. Some difficulties with functionalities, visualisations and data errors have either already been fixed or are in progress or at least under discussion for further action.
The feedback has provided us with an insight into the difficulties that users, both ‘traditional’ humanists and members of the wider public, may encounter when using the portal, and some guidelines for us to help them learn to read digital. The feedback proves that we are on the right track towards our goal of creating a new research resource, a virtual archive that crosses organisational silos.

::: {#CoCo_portal}

![](images/CoCo_Portal.png)

:::

## References

::: {#refs}
:::
  • Edit this page
  • Report an issue