Digital approaches to translation history

Digital translation history is defined here as a methodological approach that uses digital technologies to produce, enhance or disseminate research on translation history. This can help translation historians pose fresh questions and answer new and old ones. It entails mastering technical competencies in varying degrees while remaining grounded in the fundamentals of the historian’s craft. This paper outlines the main affordances of digital approaches as applied to the study of translation history (how these can help translation historians do things better and/or differently in some respects), as well as the limitations. It introduces relevant techniques of text analysis (such as distant reading, topic modelling and stylometrics) and data visualization, which can help tease out patterns and relationships (e.g. textual, conceptual, geographic and personal networks) in dynamic ways that potentially create new knowledge and facilitate public engagement with scholarship.


Relevance of digital humanities for translation historians
The digital humanities (DH) investigate traditional humanities questions and questions made newly possible by applying computing tools and techniques to digitized and born-digital materials.[1] Although translation historians make wide use of digital media to facilitate and enhance conventional research (e.g. information retrieval and management; software for presenting and disseminating research), many have not fully explored how information technologies can help pose and/or answer research questions that might otherwise be difficult to even envisage.
DH methods have been applied to both historical and textual studies, which suggests their relevance to studying the history of translated texts. Digital translation history is defined here as a methodological approach that uses digital technologies to produce, enhance or disseminate research on translation history, including the study of digitized texts, born-digital texts, and other digital artefacts (e.g. images, audio) relevant to translation history. The goals include: supporting conventional research agendas, by saving time and effort and allowing more thorough and extensive investigations; revising previous assumptions and findings on the basis of new and more data and newly revealed patterns and connections; generating unanticipated research questions and facilitating new kinds of research and new presentation modes; and facilitating teamwork and public engagement.
Examples of databases relevant to translation history include the Perso-Indica database of Persian works on Indian learned traditions, which identifies the proportion of translations in relation to original works in India between the thirteenth and nineteenth centuries;[2] the Renaissance Cultural Crossroads project,[3] which has served as a basis for research by Barker and Hosington (2013) and others; the French Book Trade in Enlightenment Europe database[4] of book trade-based cultural transfers in late eighteenth-century francophone Europe; and the TETRA (Teatro e Tradução) project, which focuses on the history of theatre translation in Portugal (1800-2009).[5] Databases are not, however, the only useful tool for translation historians, as outlined later.
Digital translation history requires not only the skills and insights of any historian (including source evaluation and comparison, contextualization, critical interpretation, the imagination to envisage new questions and approaches), but also those of a data analyst. It involves considering who created the digital material, for what purpose, when, what was excluded, whether the digital source is "a coherent body of materials" since its origin or an assembly from diverse sources (Cohen & Rosenzweig, 2006, p. 25), and whether non-digital materials were altered during digitization, possibly without readers being notified. Although digitized texts might seem to be a textual and visual facsimile, they are often decontextualized. Hence Weller (2013, p. 7) stresses the importance of noting "the original experience, the original medium", particularly because material is often shifted from medium to medium nowadays.
So how can translation historians use DH to complement non-computational methods in historiographically valid ways? What are the advantages, implications and potential pitfalls of a partial shift from documents to data (or documents as data)?

Advantages and potential
Digital media allow us to do history better in several respects:
Capacity and comprehensiveness: digital media make more of the historical record available because of the low cost of saving it, while massive data sets allow more extensive investigations than relying on random or 'representative' cases.
Time-saving: "text-mining methods allow us to direct our scarce attention to those materials in which we already have reason to believe we will find relevant information" (Wilkens, 2012, p. 255).
Flexibility: digital media can handle sounds, images and moving pictures, opening up translation and interpreting history beyond the textual medium.
Diversity: digital media enable more public engagement, allowing "experts and users alike to comment on original source material" (Terras, 2012, p. 49).
Digital historians can also do things differently because of the following features:
Manipulability: electronic tools allow searches not otherwise (readily) possible, particularly across documents. Nevertheless, Jockers (2013, p. 9) argues that "the sheer amount of data now available makes search ineffectual as a means of evidence gathering. […] What are required are methods for aggregating and making sense out of both the nuggets and the tailings." DH also provides tools for this.
Interactivity: user-generated content is a feature of Web 2.0 interfaces, which facilitate "multiple forms of historical dialogue - among professionals, between professionals and nonprofessionals, between teachers and students, among students, among people reminiscing about the past" (Cohen & Rosenzweig, 2006, p. 6).
Hypertextuality: this allows non-linear movement through data or narratives. Hyperlinks to other texts can enhance digitally published translations. Calhoun (2017, p. 139) says "a digital edition might incorporate supplementary material such as definitions, textual variants, and bibliographic references, at a hypertext level; render primary sources searchable for specific tokens and metalanguage; or enable users to define, isolate, and then save subcorpora."
Time analyses: Robertson and Mullen (2017, p. 20) observe that "computing affords a view of the longue durée otherwise obscured by individual examples", with the potential to problematize existing periodizations. Time- and date-stamping of digitally created documents allows "a new form of temporaneous comparison and analysis" (Weller, 2013, p. 8).
Challenge to canonicity: by minimizing bias in text selection, digital approaches can supplement, even undermine, existing canons, which can be somewhat arbitrary and self-perpetuating.
Identification of the typical, anomalous, (dis)continuities and clusters: shifting away from the canonical allows greater focus on the 'mundane' translations that constitute the bulk of translation history. Software can help identify the typical and the exceptional, cluster items into categories, and reduce big data to a small dataset that represents the corpus more comprehensively than standard sampling. If something of interest appears in the smaller set, the computer can retrieve similar items. Researchers can go back and forth between the two sets, "experimenting with new categories and groupings" (Manovich, 2012, p. 469).
Patterns: digital corpora can reveal systematicity, e.g. through corpus approaches focusing on keywords (e.g. how their frequency differs from that in other corpora) and collocations.[6] These patterns might not otherwise be apparent or sufficiently delineated. Interpreting their significance, however, requires human judgment.
Repurposing: with little time or effort, datasets can be "adapted, supplemented and transformed" (Mussell, 2013, p. 87) or placed in new contexts that can reveal "unexpected properties and relationships" (p. 90). Metadata also offer a source for mining (although translations and translators are not always assigned a field in databases).
Virtual unification: digital collections can bring scattered sources together.

New concepts of 'text', 'author' and 'language'
Digital media have expanded textual notions to include multimedia forms that differ in some respects from oral, manuscript and print texts. Websites, wikis, blogs, email and tweets are subject to translation and can constitute historical sources (sometimes with untraceable authors). Many sources are already available only in digital form. This requires rethinking our concept of archives, the connection between medium and knowledge production, and how we preserve, access and interpret these artefacts. Digital texts are also affecting models of authorship and readership. Web tools facilitate collaborative writing, so the meaning of author is changing. The fact that "all digital work can be easily manipulated and remixed" undermines textual authority (Eyman, 2015, p. 72). Readers can also "customize the presentation of data to isolate issues of particular interest to them, rather than depending on the author" (Theibault, 2013, p. 180).
The growing perception of programming languages as language and of programming as writing acknowledges source code as a semiotic system with its own stylistic elegance and as a signifying cultural object. This arguably places computer programs within the purview of translation research, particularly in terms of intersemiotic translation. The field of Critical Code Studies applies literary analysis methods to computer code, and this can be done within a historical context. Although studying the history of translation between programming languages or between natural and computer language lies beyond the interests and expertise of most translation historians, these possibilities suggest how digital media broaden our object of study.

Building digital resources
Although the consensus seems to be that designing or building digital archives, tools or methods (not just digitizing material, but knowing how to code) is not necessary for qualifying as a digital humanist, "sensitivity to the capacities and possibilities of working in a digital environment" is essential (What is Digital Humanities, 2012).
The verbal, visual and structural design of resources can affect their argument and use. If one is creating a website, for example, it is essential to decide on its main purpose ("to share knowledge, to educate the public, to appeal to donors, to connect to a wider research community, etc."; Potts, 2015, p. 259), its audience (e.g. translation historians, interdisciplinary researchers, the public), and whether to take a hands-off approach, interpret the materials, or mix archival materials with interpretive essays. Sample features for a translation history website include biographical sketches, oral histories (audio- or videotaped interviews, with or without transcripts), primary documents (preferably searchable both within and across texts), background essays, historic photographs, zoomable and pannable maps, a bibliography, links to relevant websites, and a glossary.[7] Cohen and Rosenzweig caution, however, that "topical sites … sometimes lack focus and wind up being a hodgepodge of materials centered on a particular theme. Often, it makes more sense to try to excel at one thing - at providing access to a rich archive, offering an intriguing interpretive exhibit, or supplying effective classroom tools or resources" (2006, pp. 49-50).
Preparing critical editions is one approach. Boyle (2015, p. 134) suggests considering "as one corpus, the evolving relations between primary texts, secondary scholarship, and tertiary commentary". For instance, The Quintilian Project[8] aims to compile "all the English translations alongside secondary scholarship" regarding the classical Roman rhetorician, so as to offer "a unique vantage point from which to visualize how Quintilian is taken up over time, determine which passages are cited most frequently, and discover which translations instigate the most responses" (Boyle, 2015, p. 134). Boyle also mentions digital editions that emphasize contexts of text production and reception (p. 130), e.g. by including the notebooks, manuscript fragments, prose essays, letters and journalistic articles of a translator or theorist from the past. With digitized manuscripts, Calhoun (2017, p. 147) stresses the importance of quality images, faithful transcription, and the inclusion of annotations about "lineation, hand changes, scribal emendations and abbreviations", details whose omission hinders access to "the underlying manuscript reality".
An example of digital tools designed to compare retranslations over time is the Version Variation Visualization project,[9] where researchers have built language-neutral tools for analysing parallel multi-translation corpora to "uncover patterns relating to different types of translation, historical periods and genetic relations and patterns relating to different sub-sets of segments" (Geng et al., 2015, p. 274).

Distant reading
An alternative to creating digital resources is to make more effective use of existing ones. Although the immersive reading long applied to print texts can be used with digital texts, the extensiveness of big data can offer a different, more comprehensive and representative picture. Franco Moretti (2005, 2013) advocates 'distant reading' of massive numbers of canonical and unexceptional texts, through text analysis methods such as word frequencies, sentiment analysis[10] (systematically identifying and classifying a writer's attitudes on a particular topic and comparing the results with norms identified in other texts; this can be used to trace attitudinal changes over time), topic modeling, pattern recognition, and visualization in the form of graphs, maps, trees and clouds. The focus is on quantitative breadth rather than qualitative, interpretive depth, but it is possible to drill down to more granular levels.
Although distant reading "can flatten the particularity and ambiguity of the objects and processes that literary critics often seek to capture" (Long, 2015, p. 289), it complements close reading that focuses on singularities, facilitating back-and-forth movement between the micro- and macro-scales.
Some reader-related websites of potential interest to translation historians include the Reading Experience Database (RED)[11] and The Archaeology of Reading in Early Modern Europe website (focusing on manuscript annotations).[12]

Text analysis tools
One place to start looking for useful software is DIRT (Digital Research Tools[13]), which helps with choosing a tool based on one's aims, e.g. annotation, collaboration, network analysis, publishing, statistical analysis, text cleaning or visualization.
Corpus researchers already use text analysis software, and a corpus-informed approach (e.g. concordancing; retrieving lexical clusters) can be applied to certain aspects of translation history, such as analysing translated works or paratexts, oral history transcripts, or "changes and constants in language and vocabulary use" (Hudson, 2000, p. 241).
Textual analysis packages have four broad functions (Hoffman & Waisanen, 2015):
"[G]enerate basic statistics about a text, such as word count, average sentence length, number of adjectives" (p. 171), to gauge lexical richness, frequent syntactical patterns and readability indexes. This allows "simple but substantiated generalizations" (p. 171) and comparisons of these features between source and target texts and also over time. Features such as frequency do not, however, necessarily correlate with (historical) significance. Nor does the absence of a term in surviving texts necessarily mean it was never used or that the concept was not in play (p. 172).
"[C]reate indexes and concordances", showing expressions in context (Hoffman & Waisanen, 2015, p. 170). This reveals usage patterns and, for instance, positive or negative valences of culturally or theoretically important conceptual words and how these have changed or spread over time and/or space.[14] The mass digitization of (mostly Western) books now under way (as well as newspaper databases, which are disproportionately prominent) offers rudimentary concordances, but these holdings are not representative of commercially available works or the works of interest to translation historians. Nor are they amenable to proper corpus searches such as those possible with specialized software (e.g. AntConc[15] or WordSmith Tools[16]).
Use preprogrammed or user-generated dictionary-based programs to indicate "how common or deviant a text's language is in comparison with other texts" (Hoffman & Waisanen, 2015, p. 176). These programs cannot, however, indicate "how the actual locations of various terms relate and link with other terms" (p. 177).
"[D]o cluster analyses […] to determine the most important concepts in a given text or group of texts and how they are related to each other" (pp. 170-171), e.g. not just how terms tend to collocate linguistically but also how they are related conceptually, which might change over time. Pinpointing conceptual clusters could be particularly useful, for instance, in examining historical texts discussing translation theory. Automated semantic analysis can identify classes of comments in paratexts, revealing patterns in how translators have conceptualized the act of translating.
Topic modeling is a related technique for identifying recurring themes in a corpus (rather than searching for predetermined keywords).[17] A sample project might involve exploring (changes in) the preoccupations and discursive framework in a translation journal or theorist's writing over time.
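Two of the functions listed above, basic text statistics and concordancing, can be illustrated with a minimal sketch. It is not how any of the packages mentioned works internally; the tokenizer and sentence splitter are deliberately naive, and the sample text, function names and window size are assumptions made for illustration only.

```python
import re

def tokenize(text):
    """Lowercase a text and split it into word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())

def basic_stats(text):
    """Return word count, type-token ratio and average sentence length in words."""
    tokens = tokenize(text)
    # Naive sentence split on ., ! or ? followed by whitespace or the end of the text.
    sentences = [s for s in re.split(r"[.!?]+(?:\s+|$)", text) if s.strip()]
    return {
        "words": len(tokens),
        "type_token_ratio": len(set(tokens)) / len(tokens) if tokens else 0.0,
        "avg_sentence_length": len(tokens) / len(sentences) if sentences else 0.0,
    }

def concordance(text, keyword, width=3):
    """Return keyword-in-context lines with `width` tokens on either side of each hit."""
    tokens = tokenize(text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines

# Invented sample standing in for a translator's preface (not a real source).
SAMPLE = ("The translator has kept close to the sense. "
          "A faithful translator cannot always keep the words. "
          "The words are not the sense.")

stats = basic_stats(SAMPLE)
hits = concordance(SAMPLE, "translator", width=2)
```

Applied to source and target texts, or to prefaces from different periods, such figures allow the comparisons described above, but interpreting them remains the historian's task.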
Stylometrics software such as the Java Graphical Authorship Attribution Program could help ascribe translatorship of anonymous translations, based on translations of known provenance.[18] Hung, Bingenheimer and Wiles (2010) used a digital approach to show that 24 Buddhist sutras, traditionally attributed to different Chinese translators, were translated by the same translator or group of translators. Other uses include textual dating, verifying the authenticity of historical documents and examining the relationship (e.g. stylistic diversity over time) among different translations by the same translator or among translators from different periods, genders, locations, classes or educational backgrounds, or among translations of the works of the same author. The assumption is that a translator's stylistic habits remain detectable through the style of the different authors translated. As Jockers (2013, p. 63) points out, however, external factors (e.g. genre, register, age, ethnicity, nationality, time period) might "influence or even overpower the latent … signal". Research suggests that features such as articles, conjunctions and pronouns are most indicative of individual style (p. 64). Forsyth and Lam (2014) found that inter-translator discriminability was possible in their digital study of nineteenth-century French translations (i.e. the translators' 'handprints' were present, although less so than the authors'). Historians might use stylometrics to explore questions such as the nature of the differences between canonical and marginal translators, or whether women translators have historically been more likely to use sentence fragments, for instance, and how any such tendencies have changed over time.
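The underlying idea of function-word stylometry can be sketched in a few lines. This is a simplified illustration, not the method of the Java Graphical Authorship Attribution Program or of the studies cited: it compares relative frequencies of a small, arbitrarily chosen list of English function words using a plain Manhattan distance. The word list, the texts and the translator labels are all invented assumptions.

```python
import re
from collections import Counter

# Small illustrative list of English function words (an assumption, not a standard list).
FUNCTION_WORDS = ["the", "a", "and", "but", "of", "to", "in", "it", "he", "she"]

def profile(text):
    """Relative frequency of each function word per token of the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]

def distance(p, q):
    """Manhattan distance between two function-word profiles."""
    return sum(abs(a - b) for a, b in zip(p, q))

def attribute(anonymous_text, known_texts):
    """Return the label of the known text whose profile is closest to the anonymous one."""
    target = profile(anonymous_text)
    return min(known_texts, key=lambda name: distance(target, profile(known_texts[name])))

# Invented toy texts; real studies need far longer samples of known provenance.
KNOWN = {
    "Translator A": "the king and the queen and the court of the realm "
                    "went to the sea and to the hills",
    "Translator B": "she said it was late but he said it was early but she went in",
}
ANONYMOUS = "the lord and the lady of the house and the men of the town"

best_match = attribute(ANONYMOUS, KNOWN)
```

With samples this short the result is a toy; the external factors that Jockers mentions (genre, register, period) would need to be controlled before any attribution could be taken seriously.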
Another relevant function is automatic extraction of places and names (people, organizations) through named entity recognition (NER). Place names identified in a corpus of texts about Translation Studies, for example, might trace the shifting 'balance of power' in the discipline, or personal names in translators' correspondence might point to social networks. Other useful tools are image-processing techniques and handwritten text recognition (HTR) technology that facilitate the reading of old documents. For instance, SMART-GS[19] is a tool for transcribing and studying digitized historical manuscripts (mainly Japanese).
Grossman (2015, p. 42) points out that "Ironically, an excess of information resists analysis and comprehension in much the same way a lack of it does." Data do not necessarily equate with knowledge and understanding. One aid here is data visualization, the intersemiotic 'translation' of statistical or other information into visual representations. Beyond merely displaying findings more efficiently than in print, it can help tease out patterns and relationships in ways that create new knowledge and facilitate public engagement.

Information visualization
Historians have long made use of tables, graphs, dynastic and genealogical charts, timelines, maps and cartograms, but less static possibilities are now available, such as animated maps or interactive timelines. Visualization packages include Wordle, Many Eyes and Phrase Net, but simple word-cloud tools can lead to erroneous conclusions. As noted above, frequency does not always equate with significance, and word length and the space around words can distort relative importance. Other possible problems with visualization software include unclear legends, "false visual cues" and "unnecessary clutter and contrived images that [make] visualizations confusing" (Theibault, 2013, p. 177). Ironically, complex visualizations can require textual explanations and argumentation for historians lacking visual literacy.
Google's Ngram Viewer is a search engine that helps chart the trajectory of words and phrases in Google's text corpora (8 languages) between 1500 and 2008. It could be used, for instance, to trace the changing interest value of particular translators or theorists. However, "The only metadata provided are publication dates, and even these are frequently incorrect. Different printings, different editions, and the unaccounted-for presence of duplicate works in the corpus complicate matters even further" (Jockers, 2013, p. 120). Jockers concludes that Ngram Viewer "cannot tell us why a particular word was popular or not; it cannot address the historical meaning of the word at the time it was used …, and it cannot offer very much at all in terms of how readers might have perceived the use of the word" (p. 122). Moreover, the corpus changes over time; there is no way to find "words near other words" or search for synonyms; and the interface is poor (Shea, 2014, para. 39).
One alternative is Bookworm,[20] which "makes it easy to turn any collection of texts into a richly searchable database; you can visualize trends, but with many more ways to slice data than Ngram Viewer allows" (Shea, 2014, para. 42). Although word frequency-based conclusions about themes or significance are open to error[21] and frequency results do not explain underlying causal mechanisms, they might challenge existing ideas or narratives and trigger questions or hypotheses for follow-up by other means.
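The kind of frequency trajectory charted by such tools can be approximated on a small, self-assembled corpus. The sketch below is an illustration under stated assumptions rather than a description of how Ngram Viewer or Bookworm work: it counts exact single-word matches only, normalizes per 1,000 tokens for each year, and uses invented dated snippets.

```python
import re
from collections import defaultdict

def relative_frequency_by_year(documents, term):
    """Map each year to occurrences of `term` per 1,000 tokens in that year's texts."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for year, text in documents:
        tokens = re.findall(r"[a-z']+", text.lower())
        totals[year] += len(tokens)
        hits[year] += tokens.count(term.lower())  # exact single-word matches only
    return {y: 1000 * hits[y] / totals[y] for y in sorted(totals) if totals[y]}

# Invented dated snippets standing in for, e.g., journal abstracts.
DOCUMENTS = [
    (1990, "fidelity and fidelity again in translation"),
    (1990, "translation as rewriting"),
    (2000, "rewriting not fidelity"),
]

trend = relative_frequency_by_year(DOCUMENTS, "fidelity")
```

Because only exact word forms are matched, variant spellings, inflections and multi-word names are missed, so any rise or fall in the resulting figures is a prompt for further investigation rather than a finding in itself.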
Another use of visualization software is to show historical networks (textual, conceptual, geographical and personal), as exemplified, for instance, through ties and communications among translators, authors and stakeholders such as publishers. Network analysis can be used to explore correlations between position within a network and "strategies of translation and selection", as in Long (2015). The possibilities are suggested by network analysis software such as Gephi[22] and sites such as Mapping the Republic of Letters,[23] while the challenges are noted by Theibault (2013, pp. 182-183) and Da (2019, pp. 630-631). Despite potential drawbacks, visualization tools help generate questions and test hypotheses (e.g. about centrality and marginality). The translation historian can then explore the underlying causes.
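One simple measure of centrality and marginality is normalized degree centrality: the share of other actors in the network to which a given actor is directly tied. The sketch below computes it in plain Python for an undirected network; it is not Gephi's implementation, and the correspondents and ties are invented placeholders.

```python
from collections import defaultdict

def degree_centrality(edges):
    """Normalized degree centrality for an undirected network given as (a, b) pairs."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            neighbours[a].add(b)
            neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    # Each node can be tied to at most n - 1 others.
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}

# Invented ties, e.g. as they might be extracted from correspondence.
EDGES = [
    ("Translator A", "Publisher X"),
    ("Translator A", "Author Y"),
    ("Translator A", "Translator B"),
    ("Translator B", "Publisher X"),
    ("Translator C", "Publisher X"),
]

centrality = degree_centrality(EDGES)
```

Here Translator A and Publisher X emerge as the best connected and Translator C as marginal, which is the sort of pattern that would then call for historical explanation.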

Spatial analysis
Visualization is particularly helpful with geographical data. Historical materials often contain location information, and historians have long paid attention to how space and place shape historical experiences and processes. Recent years have witnessed a focus on "themes of region, diaspora, colonial territory, and contact zones and rubrics such as 'border' and 'boundary'" (Bodenhamer, 2013, p. 24), all relevant to translation history, as are questions of core and periphery. Maps support spatially embedded arguments and narratives, and computer-based spatial analysis helps historians formulate questions and identify patterns that textual sources alone might not readily suggest. Putnam (2016, p. 398) adds that "Visualizations of geotagged data can free us from reliance on predetermined spatial units" (e.g. nation-states).
Geographic Information Systems (GIS) software highlights aspects such as scale and proximity.[24] It "captures, stores, manages, displays, and analyses information linked to a location on earth. […] It also is an intelligent or interactive map that allows users to query the database and see the results visualized" (Bodenhamer, 2013, p. 25), including in terms of temporal change. GIS software integrates and interrelates not just quantitative data, but also textual, image, audio and other qualitative data that share a location. For instance, it would be possible to link population, publication and employment statistics, oral histories, videos, or images of historical texts and translators related to a particular site of translation. Information can be viewed separately or together and at different scales, and different layers can represent different themes.
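The principle of linking heterogeneous records through a shared location can be sketched without GIS software. The example below is a rough illustration, not a substitute for a GIS package: it uses the haversine formula for great-circle distance and groups records within a chosen radius by thematic layer. The records are hypothetical and the coordinates (for Lisbon and Porto) are approximate.

```python
import math
from collections import defaultdict

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def records_near(records, lat, lon, radius_km):
    """Group the records within `radius_km` of a point by their thematic layer."""
    layers = defaultdict(list)
    for rec in records:
        if haversine_km(lat, lon, rec["lat"], rec["lon"]) <= radius_km:
            layers[rec["layer"]].append(rec["label"])
    return dict(layers)

# Hypothetical records; coordinates are approximate values for Lisbon and Porto.
RECORDS = [
    {"layer": "publications", "label": "hypothetical 1850 play translation",
     "lat": 38.72, "lon": -9.14},
    {"layer": "oral histories", "label": "hypothetical translator interview",
     "lat": 38.74, "lon": -9.15},
    {"layer": "publications", "label": "hypothetical 1900 novel translation",
     "lat": 41.15, "lon": -8.61},
]

near_lisbon = records_near(RECORDS, 38.72, -9.14, radius_km=50)
```

Querying by radius rather than by named place is one way of avoiding reliance on predetermined spatial units, although the choice of radius is itself an interpretive decision.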
An example of a text-to-map move would be georeferencing source text publications in a given language and the site of their translation in one or more target languages to highlight 'hot spots' or 'blank spaces'. Mapping could also be used, for instance, to identify patterns in translators' locations. Other tasks might involve creating a translation history layer for Google Earth or mapping translation theorists' institutional affiliations using Neatline.[25] Bodenhamer (2013) presents several valid criticisms of GIS as a tool for historians. There is now a trend toward simpler mapping software, such as databases with mapping capabilities and the even simpler web mapping (Google Maps, etc.). Some governments make digitized maps available to researchers. Despite the drawbacks of spatial approaches, translation historians can benefit from giving greater consideration to spatial relationality. Although this concept underpins connected history, translation historians have been slow to explore digital tools that help to reveal such connections and construct spatial arguments.
[20] http://bookworm.culturomics.org/
[21] For instance, a search for "Lawrence Venuti" would miss references to "Venuti" and "Larry Venuti" (false negatives) or might include people with the same name who are not the translation theorist (false positives). See Da (2019, p. 605) for a critique of word frequency-based studies.
[22] https://gephi.org/
[23] http://republicofletters.stanford.edu/
[24] ESRI ArcGIS is the most widely used GIS software. It is expensive, but many universities have licenses. Free GIS software includes QGIS (https://qgis.org/en/site/). Sample mapping software includes eSpatial (https://www.espatial.com) and iMapBuilder (https://www.imapbuilder.com/). A helpful bibliography about historical GIS can be found at http://www.hgis.org.uk/bibliography.htm.

Digital oral history
Oral histories can offer embodied, unmediated voices from people involved in recent translation history, thereby sharing authorship/authority in generating knowledge. Digital technologies can enhance oral history through improved recording and new engagement modes, such as allowing listeners to add their voices to online oral histories in an evolving 'conversation'. The Internet has opened up access to oral histories in terms of distribution, archiving and content management. Boyd and Larson note: "Media outlets such as YouTube or SoundCloud offer near instant and free distribution of audio and video oral histories, while digital repository and content management systems like Omeka or CONTENTdm, or even Drupal or Wordpress, provide powerful infrastructure for housing oral histories in a digital archive or library" (2014, p. 4). Although creating an online oral history database is a major undertaking,[26] translation-related searches of existing oral history repositories can prove beneficial.
Nevertheless, digital oral history raises issues such as the "increased vulnerability of narrators, infrastructure obsolescence, and a host of other ethical issues, particularly with heritage collections" (Boyd & Larson, 2014, p. 5), so it is important to balance availability with an ethical approach. Oral recordings are also difficult to search or navigate, so descriptive metadata in textual form are necessary. Boyd and Larson (2014, pp. 4-5) note that systems such as OHMS (Oral History Metadata Synchronizer) "enhance access to oral histories online, connecting a textual search of a transcript or an index to the correlating moment in the online audio or video interview." Transcription, an expensive process, raises issues such as whether to correct grammatical errors, which affects the reliability and unmediated nature of accounts. Preservation costs are another aspect.

Collaboration and publicly engaged scholarship
DH makes information more freely sharable and lends itself to participatory, multi-authored forms of knowledge production with other researchers and the public. Translation historians wishing to build digital resources will find it helpful, even essential, to collaborate with information sciences colleagues and can in turn contribute "qualitative and interpretive perspectives" (Grossman, 2012, para. 5), not to mention linguistic and area studies expertise. Digital translation history, particularly large data-driven studies, can benefit from collaboration, since it is difficult for single researchers to 'cover' the relevant materials and skill sets.
Another focus of DH is scholarship that engages the public more, as well as more directly. This can help break down barriers between translation researchers, professional translators and the community by making research more relevant, personalized and accessible (e.g. through blogs and podcasts). For instance, CommentPress, a WordPress plug-in, "allows users to read a document and comment on specific paragraphs, thus forming communities of discourse around discrete zones of text" (Liu, 2013). Potts (2015, p. 256) argues that rather than data-driven experiences, what is needed is more experiences and design. Even without direct interaction, DH projects typically encourage readers to interpret the information for themselves.
There is democratic potential in crowdsourcing (e.g. of text transcription[27]) and user-generated content. Online platforms for collaborative volunteer research (e.g. annotating and tagging documents for projects at Zooniverse.org) have similar potential. Wikipedia-like approaches can augment professionally written or archived sources. Davidson (2012, p. 480) suggests that "users might contribute information about the projects in which they are using the archive ..., or engage in theoretical debates in an open forum, or even contribute digitized content to the archive itself." Wikis offer an opportunity for dialogue between researchers and the (professional translation) community. Digital outreach projects can go beyond knowledge production and knowledge-sharing to collective activism, participating in broader cultural debates driven by a social purpose. Nevertheless, despite the potential of more publicly engaged scholarship, public participation in online translation history projects is likely to be low even among translators, and it might hinder innovative research that runs counter to accepted norms.

Limitations
Digital possibilities are seductive, but translation historians need to consider the following limitations and adopt an informed approach complemented by non-digital historical procedures and arguments. The "technical problems, logical fallacies, and conceptual flaws" in computational literary analysis, many of which are also relevant to computational historical analysis, are detailed in Da (2019).
Complexity: many meaningful aspects of translation history (e.g. causality) are too 'messy' for the quantitative approaches underpinning many (not all [28]) digital tools. Digital history also tends to rely on homogenous sources (Robertson & Mullen, 2017, p. 18), rather than the range of sources typically used by historians. Another challenge is the fluidity of categories over time. Country names and borders shift, and social changes mean that labels (e.g. socioeconomic labels) from one period might not reflect realities at other times. Although this fluidity also presents challenges in non-digital approaches, it makes it "difficult to insert any kind of authority control" into database fields (Crone & Halsey, 2013, p. 104).
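One possible way of easing the authority-control problem, offered here only as a minimal sketch, is to attach a validity period to each label rather than storing a single 'correct' name per place. The identifier `place-spb`, the record layout and the function names below are invented for illustration, and the renaming dates are simplified to whole years; a real project would need to decide how to handle overlapping or contested usage.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NameRecord:
    """One label for a place, valid for an inclusive span of years."""
    place_id: str        # hypothetical stable identifier for the entity
    label: str           # the name as it appears in sources of the period
    start: int           # first year the label applies
    end: Optional[int]   # last year it applies; None means still current


# Illustrative authority file; renaming dates are simplified to whole years.
AUTHORITY = [
    NameRecord("place-spb", "Saint Petersburg", 1703, 1913),
    NameRecord("place-spb", "Petrograd", 1914, 1923),
    NameRecord("place-spb", "Leningrad", 1924, 1990),
    NameRecord("place-spb", "Saint Petersburg", 1991, None),
]


def covers(rec: NameRecord, year: int) -> bool:
    """True if the record's validity span includes the given year."""
    return rec.start <= year and (rec.end is None or year <= rec.end)


def label_in_year(place_id: str, year: int) -> Optional[str]:
    """Return the label in use for a place in a given year, if recorded."""
    for rec in AUTHORITY:
        if rec.place_id == place_id and covers(rec, year):
            return rec.label
    return None


def resolve(label: str, year: int) -> Optional[str]:
    """Map a label found in a source dated `year` to a stable identifier.

    Returns None when the label is unknown for that year or is ambiguous.
    """
    ids = {r.place_id for r in AUTHORITY if r.label == label and covers(r, year)}
    return ids.pop() if len(ids) == 1 else None
```

With records of this kind, a place of publication transcribed as "Petrograd" from a 1920 title page and one transcribed as "Leningrad" from a 1950 title page resolve to the same entity, while the source's own wording is kept intact. The same pattern could in principle be applied to other period-bound labels, although interpretive categories such as socioeconomic labels are unlikely to reduce to date ranges so neatly.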
Quality (and authenticity): all historians face questions of how and where to source reliable material, the completeness, accuracy and impartiality of sources, and how much constitutes an adequate sample. Apart from the possibility of digitally forged or manipulated documents, many digital materials do not exactly match the archival materials (e.g. in terms of selection, presentation or completeness) [29], and optical character recognition errors (particularly with older texts) or human input errors can lead to incorrect conclusions. [30] Borgman (2010, p. 217) concludes that page images (rather than digitized texts) are "better for comparing features of the original artefact". The archivist's selection of keywords can skew searches. Large datasets might be collected on an ad hoc basis and contain gaps and errors, often inherited from smaller datasets, but users might be unaware of this unless already knowledgeable about the topic. Nor might they realize how interpretive decisions, namely the selection (and exclusion), collation, structuring, and presentation of resources, shape their understanding or privilege particular ways of interacting with the materials (Crone & Halsey, 2013, p. 96).
to exploit the potential: data collection and storage modes can limit the kinds of analysis possible, and some modes of online interaction can be rather passive or foster unnuanced responses (Cohen & Rosenzweig, 2006, p. 12). Users usually need to know in advance what they are looking for, and this must be describable in a search query, which is not always easy with the interpretative research typical of the humanities. Research questions need to be scaled appropriately, and the data needs to be organized using useful conceptual frameworks.
Durability: Terras (2012, p. 50) notes that digitalizing historical texts is "not a substitute for proper preservation" and might even "damage or compromise fragile or rare original materials". Moreover, there are challenges as to which aspects of the digital present to preserve for future translation historians. The ephemerality and sheer quantity of digital evidence (e.g. email correspondence between translators, authors and publishers) has implications for archiving born-digital material. Translators' successive drafts might not be available unless efforts are made to retain each electronic iteration. Similarly, online texts have multiple instantiations, so stable data capture becomes important. "Version control systems such as Git or Subversion trace changesets, or iterative development histories of live digital projects. All these forms (and many others) contain metadata that may be mined for research purposes" (Kennedy & Long, 2015, p. 142). There will be an ongoing need to recopy digital materials to new storage media and convert them into new formats to ensure continued accessibility. Another problem is link rot, so it is good practice to use permanent links. [31]
Culture blindness: since text production is in part a social process, cross-cultural differences are to be expected. Robertson and Mullen (2017, p. 20) point out that "Text analysis algorithms, for example, rely on cultural assumptions regarding language and its use that have repercussions for historical analysis." Anglo-American and European languages and cultures are over-represented in digitalized sources. Differences in access to technology in different parts of the world also risk perpetuating imbalances between scholars from the North and South.
Ethics: DH raises issues of privacy, cultural heritage, interpretive control and the right of representation. Relevant here are the Association of Internet Researchers 2012 guidelines on ethics and the 2006 Protocols for Native American Archival Materials, for example. It is possible to give varying levels of access to different groups (e.g. not allowing non-Aboriginals access to sensitive Aboriginal sources). [32] Other issues are that intellectual property gates hamper access, rights to reproduce material from archives and books are expensive, and books still in copyright cannot be subjected to large-scale data-driven investigation. Large datasets relevant to translation historians' concerns, particularly with 'minor' languages or cultures, might not exist, and research on social media sites (e.g. networks of translation activists) might face bans on "scraping" material. A lack of interoperability with other interfaces is another constraint on access.

[27] E.g. Scripto at http://scripto.org/.
[28] Information technology can handle not just quantitative data and structured textual information, but also unstructured texts such as books, web pages, sounds, and images.
[29] For instance, the physical properties of manuscripts and printed media (signifiers in their own right) are easily lost in digital versions unless precautions are taken (e.g. specifying the dimensions). Other facts might also be obscured (e.g. a book's borrowing history) or altered (e.g. how readers navigate through the work).
[30] Standardizing spellings before input affects source integrity. "If it becomes necessary to code or standardize in order to speed processing or create algorithms, this is added (rather than substituted for column fields) at a later stage" (Hudson, 2000, p. 231).
[31] For instance, see https://perma.cc/.
[32] The Mukurtu project (http://www.mukurtu.org) is a "platform built with indigenous communities to manage and share digital cultural heritage" (Sano-Franchini, 2015, p. 161).

In addition, computational history "tends to work on a scale that elides individual historical actors" (Robertson & Mullen, 2017, pp. 18-19). Conley et al. (2015) point out such "big-data pitfalls" as reverse causality (Y causing changes in X, rather than the expected direction of X causing a change in Y), unobserved heterogeneity (relevant variables that correlate with observed variables but are unobserved), sample-selection issues, aggregation bias (inappropriate extrapolation to a sub-group or individual from data aggregated for a group), or "spatial or temporal autocorrelation" (similarity between nearby observations as a function of spatial or temporal proximity). More fundamentally, DH risks a reductionist, positivist or uncritical approach with banal results. It is important to avoid fetishizing big data, which needs to be complemented by case studies and conventional sources. Lara Putnam (2016, p. 392) points out that digitized sources make it possible to bypass contextual browsing, which can lead to negative results. Adequate theorization is also essential if the data are not to seem trivial. Although digital approaches create new intellectual possibilities, they risk occluding others.
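The aggregation bias mentioned above can be made concrete with a toy calculation. The sketch below uses entirely invented numbers for two hypothetical groups of observations (they might stand for two national book markets, with x and y as any two measured quantities); it is not drawn from any dataset discussed here. It computes a Pearson correlation by hand for each group and for the pooled data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented observations for two hypothetical groups.
group_a_x, group_a_y = [1, 2, 3], [5, 4, 3]
group_b_x, group_b_y = [6, 7, 8], [10, 9, 8]

# Correlation within each group, then across the pooled data.
r_a = pearson(group_a_x, group_a_y)
r_b = pearson(group_b_x, group_b_y)
r_pooled = pearson(group_a_x + group_b_x, group_a_y + group_b_y)

print(f"within A: {r_a:.2f}, within B: {r_b:.2f}, pooled: {r_pooled:.2f}")
```

In this constructed case the pooled data show a strong positive association, while within each group the association is perfectly negative. A claim about any one group (let alone an individual translator or publisher) based only on the pooled figure would therefore point in the wrong direction, which is the kind of inappropriate extrapolation that Conley et al. warn against.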
Digital translation history also presents practical challenges. One involves the necessary skills, although not all projects require advanced computing skills. Another is the sheer work involved in digitalizing and describing items in an existing collection or creating digital projects. Labour and infrastructure costs make DH challenging for researchers with little funding.

Closing thoughts
Digital resources and methods offer additional tools for exploring historical experiences of translation. Naturally, the tool must fit the purpose, and not all research projects or paradigms lend themselves to digital approaches. Nevertheless, in the early stages of any project it is worth considering such possibilities. If appropriate and implemented thoughtfully, DH can add a dimension to how we understand translation history. In addition, Gibbs and Owens (2013, p. 159) argue that "[T]he new methods used to explore and interpret historical data demand a new level of methodological transparency in history writing. Examples include discussions of data queries, workflows with particular tools, and the production and interpretation of data visualizations. At a minimum, historians' research publications need to reflect new priorities that explicate the process of interfacing with, exploring, and then making sense of historical sources in a fundamentally digital form, that is, the hermeneutics of data. This may mean de-emphasizing narrative in favor of illustrating the rich complexities between an argument and the data that supports it. It may mean calling attention to productive failure, when a certain methodology or technique proved ineffective or had to be abandoned."
Although digital tools (no matter how carefully chosen) do not replace 'analogue' research or critical thinking, I hope this preliminary examination of the transformative potential of digital translation history will encourage further explorations. Ultimately, however, what is of interest is the results of research enabled by these tools, rather than the platform or methodology or unsubstantiated promises.

[32, continued] The Mukurtu platform uses cultural protocols that allow users to "define a range of access levels for digital heritage objects and collections".