Adventures in Digital Humanities Research

11 January 2013

Apologies for the lull in posting. It is that time of year for us mere (grad) students when we must write, fly home, write some more, visit family, write even more, fly back, et cetera. I have been passing many hours writing papers, eating with family, drinking with friends, and slowly learning Python (you should see the mighty tome on the subject now weighing down one of the tiny bookshelves in my student flat in Aberdeen). I have some more substantive stuff in the works, but in the meantime I hope you enjoy this report on some of my recent work in the exciting field of digital history.

My final paper for my Introduction to Historical Research Seminar consisted in a primary source analysis of an online archive which I am hoping to employ for my dissertation. The archive, The Correspondence of Hugo Grotius, is a digitised version of Briefwisseling van Hugo Grotius (1928-2001), a print collection of transcriptions of the surviving correspondence of Hugo Grotius, spanning his entire life, published as part of the ‘Rijksgeschiedkundige Publicatiën, Grote Serie’ (Source Publications of the Dutch Government, Large Folio Series). Alongside the decidedly mind-numbing banality of describing the archive and discussing its pros and cons, I made some attempts at analysing the letters themselves with a basic topic-modelling tool, in order to explore the applicability of digital research methodologies to the resource. As a colourful and illustrative example, I used the online word-cloud generator Wordle. Though simple, my results, which are visible below, demonstrated the potential applicability of distant-reading tools to this sort of intellectual-historical research.

The three letters were chosen as random exemplars from across Grotius’ career. The first is from his earliest days as a jurist, the second from immediately prior to or at the very beginning of his exile from the United Provinces, the third from his tenure as Swedish ambassador (it is actually addressed to Axel Oxenstierna, then regent for Christina of Sweden). Clicking on the images below will take you to a larger version of each, hosted by Wordle’s public library.

Wordle: CHG 170. 1609 Sept 18. Aan P. Jeannin

Wordle: 618. 1621 Febr. 27. Aan B. Aubéry du Maurier

Wordle: 4299. 1639 september 17. Aan A. Oxenstierna

Both advantages and problematics of current distant-reading methodologies are in evidence in the word clouds. A basic knowledge of Latin will reveal that Wordle was not able to exclude a large number of common words, which I elected not to remove manually in order to highlight the deficiency.[1] Certainly this can be overcome with a program (or module) that removes Latin stop words before a wordlist is introduced into Wordle, or indeed into any such topic-modeller which cannot adequately handle these distracting terms. It was this very experience which helped to inspire my current project in the digital humanities: modelling-grotius, a program/code repository, written in Python, to facilitate topic-modelling of the archive which I have been working with here. It’s a project I hope will help not only my own research but also facilitate more in-depth research of this excellent resource. I am also looking forward to forking the project in the future, when The Grotius Archive, a still-in-early-stages project to digitise Grotius’ manuscript working papers, goes online, to create a similar resource for that archive, which I hope and expect will be equally, if not more, valuable to Grotius scholarship in particular and early modern intellectual history more generally.
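To give a rough idea of the stop-word step described above, here is a minimal sketch in Python. It is not code from modelling-grotius: the function name, the short stop list, and the sample sentence are all invented for illustration, and a real Latin stop list would need to be far longer.

```python
import re
from collections import Counter

# A small, illustrative sample of common Latin function words.
# Assumption: this is NOT a complete or authoritative stop list.
LATIN_STOP_WORDS = {
    "et", "in", "non", "ad", "ut", "cum", "quod", "qui", "quae", "est",
    "sed", "si", "ex", "de", "a", "ab", "per", "aut", "nec", "atque",
}


def remove_stop_words(text, stop_words=LATIN_STOP_WORDS):
    """Return the lower-cased words of `text` that are not stop words."""
    # Match runs of letters only (no digits, underscores or punctuation).
    words = re.findall(r"[^\W\d_]+", text.lower())
    return [word for word in words if word not in stop_words]


# Invented sample sentence, not taken from the archive.
sample = "Non est in libris sed in iure et in pace spes"

kept = remove_stop_words(sample)
counts = Counter(kept)

# The filtered text can then be pasted into a word-cloud generator.
print(" ".join(kept))
for word, n in counts.most_common():
    print(word, n)
```

Note that this only filters exact word forms. It does nothing to group the inflected forms of a single noun or verb, which is a separate problem.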

Notes and Bibliography

[1] Latin is not the only language of Grotius’ correspondences. He wrote also in French, German, and (unsurprisingly) Dutch. Of these, Dutch is completely unknown to me, which does pose a problem for my further use of this particular archive.

  1. 14 February 2013 6:17 pm

    What a very exciting possibility. I am glad you addressed the limitation of ‘common words’ diluting the utility. My first thought was, I admit, the problematic of using word clouds with a declined language such as Latin; if a significant noun, say, appears throughout a text, but in many various cases, it may not even break the frequency threshold and won’t show up in the visual presentation at all. But I will definitely be exploring some possibilities for using this or similar tools in my own research, and look forward to hearing more about your own efforts in this direction.

    • 15 February 2013 8:50 am

      Though I remain a beginner in the world of programming in general and natural language processing in particular, I am hopeful that there exist means of associating words by stem, rather than form, thus making such topic-modelling more applicable to highly inflected languages such as Latin.

