From Source-Criticism to System-Criticism, Born Digital Objects, Forensic Methods, and Digital Literacy for All
In an era dominated by digital information processing and communication systems, digital literacy has emerged as a critical competence. This competence is vital at all educational levels, fostering a profound and critical understanding of how information is processed digitally. Especially crucial is the ability to discern information sources, evaluate their expertise, and recognize potential biases, an ability fundamental to the stability of democratic societies. This imperative becomes even more pronounced for historians engaging with born-digital sources and digital analytical tools. Without reflecting on the epistemic conditions and consequences of these methods, historians risk compromising the reproducibility of their findings and undermining their long-term epistemic authority. This paper contends that historians, with their specific expertise and perspectives, can contribute significantly to establishing and disseminating a canon of digital literacy. Source criticism and hermeneutics, integral to interpreting information, transcend the medium, be it digital or otherwise. In the digital realm, however, source criticism requires verifying the completeness, authenticity, provenance, context, and environment of data. To achieve this, the historian’s toolkit must integrate methods from digital forensics. Conducting digital forensics and source criticism demands a foundational understanding of computer architecture and functionality. An accessible route to grasping the complexities of modern digital technology is to study its historical genesis. Viewing digital information and communication systems as culturally situated technologies within historical and social contexts allows historians to contribute significantly to digital literacy in education. Positioned as a mediator, the historical discipline can play a pivotal role in understanding the evolution of information technology, for which knowledge of its cultural, political, economic, and social contexts is imperative. Embracing this role would position historians as key agents in teaching digital literacy in schools and universities alike.
Keywords: Digital Literacy, History of Computing, Digital Forensics, Born-Digital Sources, Source Criticism
Introduction
We live in the information age or, as others put it, the information society. Scholars might disagree on when exactly it began and what its defining characteristics are, but they all agree that the flow and power of information are of unprecedented importance in our world (Castells and Castells 2000; Cortada 2002; Beniger 1997). The advent of digital information and communication technology has accelerated the pace at which humanity produces, processes, and transmits information and has increased its amount. To navigate a world made up of information, it has therefore become imperative to develop “digital literacy,” commonly defined as the ability to acquire information, assess its quality, and apply it to a given problem (Lankshear and Knobel 2008; Reedy and Parker 2018; Carmi and Yates 2020). This essay argues that historians can contribute significantly to formulating a canon of digital literacy, because their training and epistemic traditions are based on evaluating the authenticity, credibility, perspective, and context of sources. However, it also emphasizes that a foundational understanding of the functional principles of digital information processing and of basic approaches from digital forensics must be incorporated into the historian’s toolbox, and it demonstrates that the history of computing offers a path to acquiring and disseminating this knowledge. Finally, the paper points out that the technological, social, political, cultural, and economic contexts in which information is produced and circulated are fundamental to its interpretation and understanding, which again places historians in a favorable position to provide orientation and critique in the information age and to contribute to general digital literacy.
Forensic Source Criticism
The need for historians and archivists to engage with the methods of computer forensics is well established among those who work with born-digital objects (Ries 2022; Duranti and Endicott-Popovsky 2010; Fickers 2020). Increasingly, digitized and reborn-digital sources are retrieved via the internet and incorporated into scholarship. Therefore, “digital basics” become indispensable for evaluating the credibility and meaning of a given source (Milligan 2019, 241). Learning to deal with digital objects and their specific qualities, elements, and characteristics is as indispensable for historians of the information age as learning the scripts and languages of the past is for historians of the pre-modern period. As Trevor Owens and Thomas Padilla stated: “In much the same way that a historian who studies eighteenth-century documents needs to learn to read various kinds of handwriting scripts to develop an ability to read and decipher those texts, historians are going to need to develop sophisticated understandings of how digital media systems functioned at particular points in time and how different kinds of users used them” (Owens and Padilla 2021, 12). Arguing for the importance of forensic methods for historical inquiry and source criticism, Thorsten Ries has stated: “If historians are to critically appraise primary sources and establish the circumstances of their creation, provenance, processing history, so as to facilitate the identification of forgeries, fakes and disinformation, it is essential to explore the forensic history of the material creation of these records” (Ries 2022, 184).
The most basic and essential skills of forensic source criticism for historians working on born-digital objects include overcoming “screen essentialism,” retrieving and interpreting metadata, knowing encoding formats and their meaning, and reading and understanding code. Overcoming “screen essentialism” (Owens 2018, 46) means acknowledging that the interfaces through which we interact with computing devices should be understood as performances. Interfaces are not only the displays, screens, input-output devices, and peripherals that allow us to control the operations of the computer; they are complex interactions between various programs, routines, hardware, and software. They are designed to free the user’s mind from thinking about all of these underlying functions, enabling her to focus on her specific task. The downside of this comfortable arrangement is that the user loses sight of many preconfigured decisions on how to process, render, and display data. “Screen essentialism” refers to the tendency to take “what you see for what there is.” Overcoming it means understanding that the visual impression we get on a given system is just one of many possible renderings. Consider resolution and colors on a basic level, or the difference between how a text editor and an integrated development environment present the same file. Being able to distinguish between the “performative” elements of the display and the core properties of a digital object is an essential part of digital literacy. The properties characterizing a digital object are often stored in its metadata.
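A small illustration of screen essentialism, offered as a sketch rather than a forensic procedure: in Unicode, the word “café” can be stored either with a single precomposed “é” or with a plain “e” followed by a combining accent. Most screens render both forms identically, yet the stored data differ. The example uses Python’s standard `unicodedata` module; the sample word is an arbitrary choice.

```python
import unicodedata

# The same visible word in two Unicode normalization forms.
composed = unicodedata.normalize("NFC", "café")    # "é" stored as one code point, U+00E9
decomposed = unicodedata.normalize("NFD", "café")  # "e" plus combining acute accent, U+0301

# On most displays both lines look the same ...
print(composed)
print(decomposed)

# ... but the underlying data are not identical.
print(composed == decomposed)          # False
print(len(composed), len(decomposed))  # 4 5
print(composed.encode("utf-8"))        # b'caf\xc3\xa9'
print(decomposed.encode("utf-8"))      # b'cafe\xcc\x81'
```

Normalizing both strings to the same form (for instance with `unicodedata.normalize("NFC", ...)`) makes them compare as equal, which is one reason why search and comparison tools can behave differently on texts that look the same.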
“Metadata is our friend,” James Baker wrote (Baker 2019). Indeed, metadata, i.e., descriptive data attached to digital objects by the system that produced them, can contain valuable information for answering some of the most important questions of source criticism: date of production, authorship, size, format, and various other more or less useful properties. Knowing where to find the metadata of files and objects and checking whether they are consistent with the content of the data is thus a basic and pivotal first step in evaluating born-digital sources. For example, one can check whether the creation and last-modification times recorded in a text file’s metadata are consistent with what the text claims to report and the perspective from which it reports. To substantiate such evaluations, however, historians also need to know how metadata can be misleading or manipulated. The timestamp attached to each file, for example, is generated automatically by the operating system. It therefore depends on the configured system time and time zone, both of which can be changed in the system’s settings (Baker 2019).
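The following sketch, using only Python’s standard library, illustrates how fragile such timestamps are. It creates a hypothetical temporary file, overwrites its modification time with an arbitrary date using `os.utime` (the content is left untouched), and then renders the same stored value in two time zones. The date and the UTC+1 offset are chosen purely for illustration. Note also that on Unix-like systems `st_ctime` records the last change of a file’s metadata, not its creation, a reminder that metadata must be read in light of the system that produced it.

```python
import os
import tempfile
from datetime import datetime, timedelta, timezone

# Create a hypothetical file; the operating system assigns its timestamps.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"draft of a letter")
    path = f.name

# Overwrite access and modification times with an arbitrary date:
# 1 January 2000, 12:00 UTC. The file's content is not touched.
forged = datetime(2000, 1, 1, 12, 0, tzinfo=timezone.utc).timestamp()
os.utime(path, (forged, forged))

# Python reports the stored value as seconds since 1970-01-01 UTC ...
mtime = os.stat(path).st_mtime
os.remove(path)

# ... and its human-readable form depends on the time zone used to render it.
as_utc = datetime.fromtimestamp(mtime, tz=timezone.utc)
as_utc_plus_1 = datetime.fromtimestamp(mtime, tz=timezone(timedelta(hours=1)))

print(as_utc.isoformat())         # 2000-01-01T12:00:00+00:00
print(as_utc_plus_1.isoformat())  # 2000-01-01T13:00:00+01:00
```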
Further important insights into the characteristics of any given digital object can be derived from an understanding of the principles of encoding and formats. At bottom, all digital objects are composed of bits, each taking one of two values (1 or 0), but there are infinitely many ways to encode information on the basis of this binary representation of signals. Text can be encoded in different ways: Morse code, for example, uses only short and long signals, while modern and widely used character encodings include ASCII and UTF-8, the latter of which also supports non-Western characters (Pargman and Palme 2009). The same goes for numbers, which can be represented in binary, hexadecimal, or other notations. Images can be represented in various ways, depending on how the distribution of black, white, or colored pixels in a grid is encoded (Dourish 2017). To read the encoding of a digital object critically is to recognize and scrutinize the choices that the respective format makes about which properties are significant and how they are represented.
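A minimal Python sketch can make the point about encodings concrete. It encodes a place name in UTF-8, shows that ASCII cannot represent one of its characters, decodes the UTF-8 bytes with a different encoding (Latin-1) to produce the familiar garbled text known as mojibake, and writes one number in binary and hexadecimal. The sample word and number are arbitrary choices.

```python
text = "Zürich"

# UTF-8 needs two bytes for "ü" (U+00FC) and one byte for each ASCII letter.
utf8 = text.encode("utf-8")
print(utf8)                  # b'Z\xc3\xbcrich'
print(len(text), len(utf8))  # 6 7

# ASCII defines only 128 characters and has no "ü".
try:
    text.encode("ascii")
except UnicodeEncodeError as err:
    print("not encodable in ASCII:", err.object[err.start])  # ü

# Reading the same bytes with a different encoding yields a different text.
mojibake = utf8.decode("latin-1")
print(mojibake)              # ZÃ¼rich

# One number, three notations.
n = 2024
print(n, bin(n), hex(n))     # 2024 0b11111101000 0x7e8
```

The bytes never change in this example; only the encoding assumed when reading them does, which is exactly the kind of choice a critical reading of a digital object has to make explicit.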
Finally, an integral part of digital literacy is a basic understanding of algorithms and code. Here again, the history of computing serves as both introduction and explanation. In his 1976 book “Computer Power and Human Reason,” computing pioneer Joseph Weizenbaum provides a simple explanation of the principle of a Turing machine, demonstrating that there is no functional difference between data and processing instructions, because both are encoded in binary and stored in the same memory. Almost in passing, he introduces the practice of giving instructions human-readable, easy-to-memorize names such as “STORE” or “GET,” thereby conveying the idea of assembly language to his readers (Weizenbaum 1976). With these concepts in mind, it is easy to see that different higher-level programming languages employ a similar set of basic concepts and instructions, such as functions, values, arguments, loops, and conditional statements. These basics, which are covered in countless introductory chapters and online tutorials, are sufficient to follow the arguments that proponents of critical code studies make about single lines of code or longer programs (Marino 2020; Krajewski 2020; Jaton and Bowker 2020; Montfort 2014).
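To illustrate the idea attributed to Weizenbaum above, the following Python sketch implements a toy machine in which instructions and data are both plain numbers in one memory, and in which mnemonics such as `GET` and `STORE` are merely labels for numeric codes. The instruction set (GET, ADD, STORE, HALT) and the encoding `opcode * 100 + address` are hypothetical simplifications, neither Weizenbaum’s own example nor any real processor’s.

```python
# Hypothetical opcodes for a toy stored-program machine (not a real instruction set).
HALT, GET, ADD, STORE = 0, 1, 2, 3
MNEMONICS = {HALT: "HALT", GET: "GET", ADD: "ADD", STORE: "STORE"}


def disassemble(memory: list[int], start: int, end: int) -> list[str]:
    """Translate numeric cells into human-readable mnemonics (a minimal 'assembly' view)."""
    return [f"{MNEMONICS[cell // 100]} {cell % 100}" for cell in memory[start:end]]


def run(memory: list[int]) -> list[int]:
    """Execute instructions stored in the same memory as the data they operate on.

    Each instruction is a single integer encoded as opcode * 100 + address.
    """
    acc = 0  # accumulator: holds the value currently being worked on
    pc = 0   # program counter: index of the next instruction in memory
    while True:
        opcode, address = divmod(memory[pc], 100)
        pc += 1
        if opcode == HALT:
            return memory
        if opcode == GET:
            acc = memory[address]
        elif opcode == ADD:
            acc += memory[address]
        elif opcode == STORE:
            memory[address] = acc
        else:
            raise ValueError(f"unknown opcode {opcode} in cell {pc - 1}")


# Cells 0-3 hold the program, cells 5-7 hold data: all of them are plain integers.
memory = [105, 206, 307, 0, 0, 40, 2, 0]
print(disassemble(memory, 0, 4))  # ['GET 5', 'ADD 6', 'STORE 7', 'HALT 0']
print(run(memory)[7])             # 42
```

Nothing in the memory itself marks cell 0 as an instruction and cell 5 as data; only the way the machine reads them decides. A `STORE` aimed at a program cell would therefore change the program itself.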
Digital literacy and a source criticism of born-digital objects that employs basic forensic concepts therefore aim to understand an object’s logical and functional location within, and relations to, its environment and operating system, because such objects can neither exist nor be understood outside these relationships. How can these insights be made productive for source criticism, i.e., for evaluating the integrity and authenticity of a born-digital object? One application is to scrutinize a given file’s integrity by comparing different versions of it. Many systems and applications automatically produce backup copies of files and store temporary versions while a file is in use. These earlier and alternative versions are often invisible in directory listings as displayed by the common file managers of personal computers. In addition, such files are often stored within an application’s directory instead of the directory the user is working in (Kirschenbaum 2012). If a copy or a previous version of a file can be located, inferences about the file’s coherence and originality become possible. Even without the ability to open or read a file, comparing its size with that of a previous version can be revealing. Whether a copy and the original of a given file are truly identical can be verified by comparing the hash values (checksums) of both files. Numerous open-source tools and instructions for doing so are available, also to novices (Altheide and Carvey 2011; Hosmer and Kessler 2014).
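The two checks described above, comparing file sizes and comparing hash values, can be sketched in a few lines of Python with the standard `hashlib` module. SHA-256 is used here as one common choice of hash function; the file names and contents are hypothetical and are created in a temporary directory only for the demonstration. On many Linux systems, the `sha256sum` command-line utility performs the same hash computation.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the SHA-256 digest of a file, read in chunks so large files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare(original: Path, copy: Path) -> dict:
    """Compare two files first by size, then (only if sizes match) by hash."""
    size_o, size_c = original.stat().st_size, copy.stat().st_size
    return {
        "sizes": (size_o, size_c),
        # Different sizes already rule out identity; equal sizes need the hash check.
        "identical": size_o == size_c and sha256_of(original) == sha256_of(copy),
    }


# Demonstration with hypothetical files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    letter = Path(tmp, "letter.txt")
    backup = Path(tmp, "letter_backup.txt")
    edited_copy = Path(tmp, "letter_edited.txt")
    letter.write_bytes(b"Dear colleague,\n")
    backup.write_bytes(b"Dear colleague,\n")
    edited_copy.write_bytes(b"Dear colleagues,\n")
    same = compare(letter, backup)
    edited = compare(letter, edited_copy)

print(same)    # {'sizes': (16, 16), 'identical': True}
print(edited)  # {'sizes': (16, 17), 'identical': False}
```

A matching hash shows only that two byte sequences are identical; it says nothing about which of them is the original or when either was created.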
Another approach to source criticism inspired by digital forensics is to read between the lines or, more precisely, “between the bits.” Thorsten Ries, for example, has demonstrated that text files can contain much more than they reveal at first sight, a finding that requires moving beyond the perspective of screen essentialism (Ries 2022, 176–78). In his examples, he reads the Revision Identifier for Style Definition (RSID) automatically attached to each MS Word file to make statements about a file’s creation and revision history. Similarly, Trevor Owens has demonstrated that important information about a file’s history can be retrieved simply by changing its file-type extension and thus opening it with different software (Owens 2018, 43). In Owens’ example, he opens an .mp3 music file with a simple text editor by changing the extension to .txt, which enables him literally to “read” the file’s metadata. Depending on the scheme employed by the file management system, this reading “against the grain” might reveal information that is not accessible with a simple right-click. Likewise, it can be worthwhile to open a file of unknown format with a text editor such as vim for a first inspection.
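Reading a binary file “as text” can be approximated programmatically by extracting runs of printable characters, as the Unix `strings` utility does. The sketch below applies this to a hypothetical byte sequence loosely modeled on an MP3 file: a few placeholder bytes standing in for audio data, followed by a 128-byte ID3v1 tag, the fixed-length metadata block that older MP3 files carry at their end. All sample values are invented, and the minimum run length of four characters mirrors the default of `strings`.

```python
import re


def printable_runs(data: bytes, min_length: int = 4) -> list[str]:
    """Return runs of printable ASCII characters (space to tilde) of at least min_length."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_length
    return [match.decode("ascii") for match in re.findall(pattern, data)]


def field(value: bytes, size: int) -> bytes:
    """Pad a text value with NUL bytes to a fixed field size."""
    return value.ljust(size, b"\x00")


# Hypothetical ID3v1 tag: "TAG", title (30), artist (30), album (30),
# year (4), comment (30), genre (1) = 128 bytes.
tag = (b"TAG" + field(b"Song Title", 30) + field(b"Some Artist", 30)
       + field(b"An Album", 30) + b"1999" + field(b"Ripped 2003", 30) + b"\x0d")
# Placeholder bytes standing in for encoded audio frames.
audio = bytes([0xFF, 0xFB, 0x90, 0x64, 0x00, 0x01]) * 4
blob = audio + tag

print(len(tag))              # 128
print(printable_runs(blob))
# ['TAGSong Title', 'Some Artist', 'An Album', '1999Ripped 2003']
```

Where two fields sit next to each other without padding, as the tag marker and title or the year and comment do here, their contents run together: the raw reading exposes the text but not the structure, which only knowledge of the format restores.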
System and Environment: Contextualizing Digital Objects
All elements of the expanded and updated version of source criticism outlined above point to increased attention to the systems and environments in which the production and processing of digital objects are embedded. On a basic level, computation always relies on specific logical, material, and technical systems and environments, i.e., the operating system, hardware, storage media, exchange formats, transmission protocols, etc. Inspired by platform studies and the “new materialism,” recent research on digital objects has emphasized the platform character of all digital media and objects and argued for understanding them as “assemblages” (Owens 2018; Zuanni 2021). This line of research emphasizes the multiple relations and dependencies that tie all digital objects to systems and environments. All data has to be organized according to certain file formats and standards to be transferable and processable; file formats require specific applications to be read and manipulated; applications and programs, in turn, rely on operating systems, which are bound to specific hardware configurations and must be maintained and updated, and so on. With networked computing, web-based applications, and cloud storage, complex and nested platforms and assemblages have become the norm. Consequently, any concept of digital literacy or data literacy must incorporate critical reflection on the relations, dependencies, and determinations of systems and infrastructure (Gray, Gerlitz, and Bounegru 2018). This is in line with recent research in science and technology studies and with the materialist turn in the history of computing, both of which focus on connectivity and on the reliance on large and complex infrastructure networks (Parks and Starosielski 2015; Galloway and Thacker 2014; Edwards et al. 2009).
Here again, tracing the historical development of these infrastructures helps us understand both their general functionality and their specific features, which sometimes result more from traditions and path dependencies than from technical necessity.
In the same way that historians trace the provenance, perspective, and implicit presuppositions of a “classic” paper-based source, they must reflect on the system environment of a digital object, the object’s relations to and location within that environment, and the epistemic consequences of this position and these relations.
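One practical consequence is to record, alongside any digital object one examines, a description of the environment in which it was opened, since operating systems, encodings, and time zones shape what is displayed. The following Python sketch collects a few such properties with standard-library calls; the selection of properties and the function name `describe_environment` are illustrative assumptions, not an established documentation standard.

```python
import json
import locale
import platform
import sys
import time


def describe_environment() -> dict:
    """Collect properties of the current system that affect how files are read and displayed."""
    return {
        "operating_system": platform.system(),    # e.g. "Linux", "Windows", "Darwin"
        "os_release": platform.release(),
        "machine": platform.machine(),            # hardware architecture, e.g. "x86_64"
        "python_version": platform.python_version(),
        "preferred_encoding": locale.getpreferredencoding(False),
        "filesystem_encoding": sys.getfilesystemencoding(),
        "time_zone_names": list(time.tzname),     # (standard, daylight saving) names
        "utc_offset_seconds": -time.timezone,     # standard-time offset east of UTC
    }


print(json.dumps(describe_environment(), indent=2))
```

Saved next to one’s notes on a source, such a record helps others reproduce, or at least understand, how the object appeared on the researcher’s system.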
Theorizing the characteristics of born-digital sources, Chiara Zuanni illustrates this positionality within technological assemblages with the example of a social media post: “Provenance might refer in the first instance to the author of a post, but it can also be traced to the data center hosting the specific content (thinking about where the content is written on a server, its forensic origin), reflecting the ways the post has traveled through the infrastructure, e.g., from a personal device to a server, and has subsequently been queried by its viewer. The agency of assemblage is therefore critical in delivering content through its material infrastructure. This agency leads to the circulation of information, a global participation in events and cultural trends, and the environmental and economic impacts of the infrastructure” (Zuanni 2021, 189).
This quote touches upon a different meaning of the term “environment,” referring to the ecological consequences of data processing and transmission and of the infrastructures they require. The impact and environmental costs of data transfer and routing via the internet are difficult to evaluate, but the energy demands and the direct and indirect environmental damage are enormous (Pasek, Vaughan, and Starosielski 2023). Similarly, the term “system” can describe the socio-economic, political, and cultural configurations and power structures in which the production of computing software and hardware, as well as of data-driven knowledge, is organized and enforced. While proponents of “new materialism” have convincingly demonstrated that the “cloud” is actually a very material and manifest assemblage of data centers and routing and transmission infrastructures, the term “smart technologies” suggests that data processing is a predominantly automated and dehumanized affair, obscuring the human work performed at every level and in every corner of the information society: from underpaid female workers assembling processors, to click-workers around the world training algorithms and normalizing data, to the often unremarked and uncredited specialists maintaining and repairing the systems we take for granted. Even artificial intelligence, the very symbol of working and thinking without human involvement, relies on a great deal of human labor (Mullaney et al. 2021).
Including context, systems, and environment in the analysis of and reflection on computing and born-digital objects is therefore both a productive research agenda for the history of computing and an approach to an updated variant of source criticism and general digital literacy. Historians, trained to situate information provided by sources within specific historical, social, cultural, and spatial contexts, can readily apply their instruments of critique and evaluation to digital objects and, in addition, provide guidance for formulating a general digital literacy.
Contextualization and Critique
This essay has so far argued that historians already have valuable methods and approaches at their disposal for adapting their inquiries to novel, born-digital artifacts and media, and that they need to incorporate knowledge about the basic principles of computing and its history into their toolbox to make sense of new media and archives. However, it is pivotal to keep in mind that providing interpretation and critique remains their core task. Decoding a digital artifact, tracing the history of its emergence, and understanding its relation to its technical environment serve but one objective: to make claims and arguments about its meaning. This is and remains a fundamentally critical approach, one that extends to reflection on the methods themselves. Zack Lischer-Katz, for example, reminds us that digital forensics was not developed by and for historians but serves a specific purpose in police investigations and the courtroom: “However, caution must be exercised when considering forensics as a guiding approach to archives. The epistemological basis of forensic science embeds particular assumptions about knowledge and particular systems of verification and evidence that are based on hierarchical relations of power, positivist constructions of knowledge, and the role of evidence […] A critical approach to the tools of digital forensics by archivists and media scholars requires thinking through how the forensic imagination may impose forms of knowing that reproduce particular power relations” (Lischer-Katz 2016, 5–6).
At a very basic level, historians, and humanists in general, are at their strongest precisely when their findings amount to more than an addition and rearrangement of available information. Drawing on the distinction between symbols and signals, Berry and colleagues have formulated an eloquent reminder of this task: “Digital humanists must address the limits of signal processing head-on, which becomes even more pressing if we also consider another question brought about by the analogy to Shannon and Weaver’s model of communication. The sender-receiver model describes the transmission of information. The charge of the digital humanities is, instead, the production of knowledge. An uncritical trust in signal processing becomes, from this perspective, quite problematic, insofar as it can confuse information for knowledge, and vice versa. […] Neither encoding or coding (textual analysis) is in fact a substitute for humanistic critique (understood in the broad sense)” (Berry et al. 2019).
Critique is and must remain the central concern of historians. This critique must be directed at the authenticity and credibility of born-digital objects and of the systems that produce them. To exercise it, historians must learn from the tools and approaches of computer forensics. What distinguishes historians from forensic experts, however, is that they do not stop at the limits of the technical systems but extend their contextualization to the broader cultural, economic, and social structures that enable the development of specific technologies. This is why the historian’s perspective and approach are indispensable for general digital literacy.
References
Citation
@misc{feichtinger2024,
author = {Feichtinger, Moritz},
editor = {Baudry, Jérôme and Burkart, Lucas and Joyeux-Prunel,
Béatrice and Kurmann, Eliane and Mähr, Moritz and Natale, Enrico and
Sibille, Christiane and Twente, Moritz},
title = {From {Source-Criticism} to {System-Criticism,} {Born}
{Digital} {Objects,} {Forensic} {Methods,} and {Digital} {Literacy}
for {All}},
date = {2024-07-26},
url = {https://digihistch24.github.io/submissions/474/},
langid = {en},
abstract = {In an era dominated by digital information processing and
communication systems, digital literacy has emerged as a critical
competence. This competency is vital at all educational levels,
fostering a profound and critical understanding of how information
is processed digitally. Especially crucial is the ability to discern
information sources, evaluate their expertise, and recognize
potential biases, fundamental for the stability of democratic
societies. This imperative becomes even more pronounced for
historians engaging with original digital sources and analytical
tools. Without reflective consideration of the epistemic conditions
and consequences of these methods, historians risk compromising the
reproducibility of their findings and undermining their long-term
epistemic authority. This paper contends that historians, with their
specific expertise and perspectives, can significantly contribute to
the establishment and dissemination of a digital literacy canon.
Source criticism and hermeneutics, integral to interpreting
information, transcend the medium, be it digital or otherwise.
However, in the digital realm, source criticism requires verifying
the completeness, authenticity, provenance, context, and environment
of data. To achieve this, the historian’s toolkit must integrate
methods from digital forensics. Conducting digital forensics and
source criticism demands a foundational understanding of computer
architecture and functionality. An accessible route to grasp the
complexities of modern digital technology is to study its historical
genesis. Viewing digital information and communication systems as
culturally situated technologies within historical and social
contexts allows historians to contribute significantly to digital
literacy in education. The historical discipline, positioned as a
mediator, can play a pivotal role in comprehending the evolution of
information technology. Understanding the cultural, political,
economic, and social contexts is imperative. Embracing this role
would position historians as key agents in transmitting digital
literacy in educational settings, both at schools and universities.}
}