
Modeling in history: using LLMs to automatically produce diagrammatic models synthesizing Piketty’s historiographical thesis on economic inequalities

Poster Session

Author: Axel Matthey
Affiliation: Université de Lausanne
Published: September 12, 2024
DOI: 10.5281/zenodo.13908110

Abstract

This study seeks to merge two realms: theoretical digital history, specifically modeling in history, and economic history, with a focus on the history of income and wealth inequalities. The central objective is to apply theoretical research outcomes concerning models and their application in history to scrutinize a historical explanation of the evolution of economic inequalities between 1914 and 1950. Traditionally, predictive models with reproducible results were paramount for validating explanations through observed data. However, the role of models has expanded, moving beyond mere predictive functions. This paradigm shift, acknowledged by the philosophy of science in recent decades, emphasizes that models now serve broader purposes, guiding exploration and research rather than just prediction. These models are not merely tools for validating predictions; they serve to bring clarity to our thinking processes, establishing the conditions under which our intuitions prove valid. Beyond merely representing systematic relationships between predetermined facets of reality, models aspire to elucidate causal connections. When a historical model aims to provide causal explanations, the process involves identifying the “explanandum” (the aspect of reality being explained) as the dependent variable and working backward to pinpoint its hypothetical causes as independent variables. Using a diagrammatic approach, we formalized a qualitative model aligned with a historiographical explanation of the evolution of economic inequalities by Thomas Piketty between 1914 and 1950. The intent was to employ causal diagrams, translating the narrative embedded in Piketty’s historiography of inequalities into a formal model. This endeavor sought to make explicit the implicit causal relationships within the historian’s narrative: the resulting causal model serves as a symbolic synthesis of our comprehension of a specific subject, enabling the construction of a highly refined narrative synthesis from a complex topic.

Keywords

LLM, diagrams, economic inequality, pedagogy, Artificial Intelligence

Introduction

This research explores the potential of Large Language Models (LLMs) to automatically generate diagrammatic models that synthesize complex historical narratives. Specifically, we focus on Thomas Piketty’s analysis of economic inequalities in the first half of the 20th century, as presented in his seminal work, Capital in the Twenty-First Century. Our project aims to bridge the gap between theoretical digital history and economic history by employing LLMs to translate Piketty’s causal explanations into visual representations, thus enhancing understanding and facilitating further analysis.

Models in history: from prediction to exploration

Traditionally, models in the natural sciences served primarily as predictive tools, used to validate explanations against observed data. However, their role has expanded to encompass exploration, research, and the clarification of thought processes. This shift, acknowledged by the philosophy of science in recent decades, emphasizes the ability of models to provide new perspectives and challenge existing assumptions: their main function becomes clarifying our thinking.

Alongside this evolution of modeling, it should be noted that the digital humanities also contain a more “introspective” branch of research on models, one that can be described as the “meta-discipline” of the digital humanities. This branch evaluates the epistemological effects of models on research in the humanities and “calls for a shift from models as static objects (e.g., what functionalities they enable) to the dynamic process of modeling”. The distinction between the simple use of models (model-based quantitative operationalization) and epistemological research into the implications of formal modeling can be mapped onto the division between applied and theoretical digital humanities proposed by Michael Piotrowski.

“Manual” causal modeling and Piketty’s historical narrative

Our research delves into the realm of causal modeling, aiming to elucidate the cause-and-effect relationships within historical narratives. In this context, the “explanandum” (the phenomenon to be explained) is treated as the dependent variable, and the model seeks to identify its potential causes. We began by manually creating a semi-formal qualitative model based on Piketty’s explanation of economic inequalities from 1914 to 1950. Utilizing causal diagrams, as described by Judea Pearl (Pearl 2018), we formalized Piketty’s narrative, making explicit the implicit causal relationships within his analysis. The resulting model serves as a symbolic synthesis of our understanding of Piketty’s core argument: that outside periods of significant economic interventionism, wealth tends to grow faster than economic output, leading to increased inequality (r > g, where r is the rate of return on capital and g is the rate of economic growth). While this trend was mitigated in the first half of the 20th century due to major sociopolitical shocks like the World Wars and the Great Depression, Piketty argues that it has resurfaced since the 1970s and 1980s, a phenomenon he terms the “return of capital.”
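To make the shape of such a model concrete, here is a minimal sketch of how a qualitative causal diagram of this kind can be encoded with the Python graphviz package. The node labels and edges are our illustrative reading of the narrative summarized above, added for demonstration purposes only; they are not a reproduction of the exact model presented on the poster.

```python
# Illustrative sketch: a toy causal diagram for Piketty's 1914-1950 narrative.
# Node labels and edges are our own reading, given here for demonstration only.
from graphviz import Digraph  # pip install graphviz (plus the Graphviz binaries)

dag = Digraph(comment="Piketty 1914-1950 (illustrative)")
dag.attr(rankdir="LR")  # lay the diagram out left to right

# Hypothesized causes (independent variables)
dag.node("shocks", "Sociopolitical shocks\n(World Wars, Great Depression)")
dag.node("intervention", "Economic interventionism\n(taxation, regulation)")
dag.node("r_gt_g", "r > g\n(return on capital outpaces growth)")

# Explanandum (dependent variable)
dag.node("inequality", "Evolution of economic inequality")

dag.edge("shocks", "intervention", label="triggers")
dag.edge("shocks", "inequality", label="destroys capital, reduces")
dag.edge("intervention", "r_gt_g", label="suppresses")
dag.edge("r_gt_g", "inequality", label="increases")

dag.render("piketty_dag", format="png", cleanup=True)  # writes piketty_dag.png
```

In the rendered graph, the explanandum sits downstream of every hypothesized cause and has no outgoing edges, which mirrors the dependent/independent-variable structure described above.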

LLMs and the automatic generation of historiographical diagrams: starting with a small article

Our initial exploration will involve using Google’s LLM (Gemini 1.5 Pro) to convert a concise historical article by Piketty into a simplified causal diagram. This article will be A Historical Approach to Property, Inequality and Debt: Reflections on Capital in the 21st Century.

Our previous experience with manually constructing a causal model based on Piketty’s work highlighted the potential for automation using LLMs. LLMs have demonstrated remarkable capabilities in various domains, including understanding and generating code, translating between languages, and producing varied creative text formats. We believe that LLMs can be trained to analyze historical texts, identify causal relationships, and automatically generate corresponding diagrammatic models. This could significantly enhance our ability to visualize and comprehend complex historical narratives, making implicit connections explicit and facilitating further exploration and analysis.

Historiographical theories explore the nature of historical inquiry, focusing on how historians represent and interpret the past. Diagrams have been employed to represent such theories, in particular to illustrate causal narratives: they provide a visual means to support historical understanding, sharpen the clarity of historical explanations, and communicate research findings effectively.

Large Language Models, for their part, have been increasingly integrated into coding practice, from understanding and generating code to assisting in software development and customization. Leveraging vast amounts of data, they support a wide range of programming-related tasks across languages and applications, offer user-friendly interactions, and extend support to low-resource languages. Challenges remain, such as bias in code generation and the need for human oversight in code review. Overall, LLMs are becoming an integral part of the software development process, offering both opportunities and areas for further research and development.
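As a rough sketch of what this automation could look like, the snippet below asks Gemini 1.5 Pro to extract causal claims from a text and return them as a Graphviz diagram. It assumes the google-generativeai Python client; the prompt wording, the output contract, and the file name in the usage example are our illustrative assumptions, not a tested pipeline from the project.

```python
# Illustrative sketch: prompting Gemini 1.5 Pro to turn a historical text into
# a causal diagram. The prompt design and expected output format are
# assumptions for demonstration, not the project's actual workflow.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder; supply your own key
model = genai.GenerativeModel("gemini-1.5-pro")

PROMPT = """Read the following historical text. First list every causal claim
it makes as a directed edge of the form "cause -> effect". Then output a single
Graphviz DOT digraph containing exactly those edges, and nothing else.

Text:
{article}
"""

def article_to_dot(article_text: str) -> str:
    """Return the DOT source the model proposes for the text's causal claims."""
    response = model.generate_content(PROMPT.format(article=article_text))
    return response.text  # DOT source, to be checked against the text by a historian

# Hypothetical usage:
# dot_source = article_to_dot(open("piketty_article_excerpt.txt").read())
# print(dot_source)
```

Keeping a historian in the loop to verify each extracted edge against the source text addresses the oversight concern raised above.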

Benefits and implications of this research

The ability to automatically generate historiographical diagrams using LLMs offers several potential benefits:

  • Enhanced understanding of complex historical narratives: Visual representations can clarify intricate causal relationships and make historical analysis more accessible to a wider audience.
  • Identification of uncertainties and biases: LLMs can be trained to recognize subtle markers of uncertainty and bias within historical texts, encouraging critical engagement with historical interpretations.
  • Efficiency and scalability: Automating the process of diagram generation would save time and resources, allowing researchers and teachers to explore a wider range of historical topics and narratives.

References


Reuse

CC BY-SA 4.0

Citation

BibTeX citation:
@misc{matthey2024,
  author = {Matthey, Axel},
  editor = {Baudry, Jérôme and Burkart, Lucas and Joyeux-Prunel,
    Béatrice and Kurmann, Eliane and Mähr, Moritz and Natale, Enrico and
    Sibille, Christiane and Twente, Moritz},
  title = {Modeling in History: Using {LLMs} to Automatically Produce
    Diagrammatic Models Synthesizing {Piketty’s} Historiographical
    Thesis on Economic Inequalities},
  date = {2024-09-12},
  url = {https://digihistch24.github.io/submissions/poster/476/},
  doi = {10.5281/zenodo.13908110},
  langid = {en},
  abstract = {This study seeks to merge two realms: theoretical digital
    history, specifically the modeling in history, and economic history,
    with a focus on the history of income and wealth inequalities. The
    central objective is to apply theoretical research outcomes
    concerning models and their application in history to scrutinize a
    historical explanation of the evolution of economic inequalities
    between 1914 and 1950. Traditionally, predictive models with
    reproducible results were paramount for validating explanations
    through observed data. However, the role of models has expanded,
    moving beyond mere predictive functions. This paradigm shift,
    acknowledged by the philosophy of science in recent decades,
    emphasizes that models now serve broader purposes, guiding
    exploration and research rather than just prediction. These models
    are not merely tools for validating predictions; they serve to bring
    clarity to our thinking processes, establishing the conditions under
    which our intuitions prove valid. Beyond merely representing
    systematic relationships between predetermined facets of reality,
    models aspire to elucidate causal connections. When a historical
    model aims to provide causal explanations, the process involves
    identifying the “explanandum” (the aspect of reality being
    explained) as the dependent variable and working backward to
    pinpoint its hypothetical causes as independent. Using a
    diagrammatic approach, we formalized a qualitative model aligning
    with an historiographical explanation of the evolution of economic
    inequalities by Thomas Piketty during 1914-1950. The intent was to
    employ causal diagrams, translating the narrative embedded in
    Piketty’s historiography of inequalities into a formal model. This
    endeavor sought to make explicit the implicit causal relationships
    within the historian’s narrative: the resulting causal model serves
    as a symbolic synthesis of our comprehension of a specific subject,
    enabling the construction of a highly refined narrative synthesis
    from a complex topic.}
}
For attribution, please cite this work as:
Matthey, Axel. 2024. “Modeling in History: Using LLMs to Automatically Produce Diagrammatic Models Synthesizing Piketty’s Historiographical Thesis on Economic Inequalities.” Edited by Jérôme Baudry, Lucas Burkart, Béatrice Joyeux-Prunel, Eliane Kurmann, Moritz Mähr, Enrico Natale, Christiane Sibille, and Moritz Twente. Digital History Switzerland 2024: Book of Abstracts. https://doi.org/10.5281/zenodo.13908110.