Jan Rybicki
Jagiellonian University, Kraków, Poland

Pretty Things Done with (Electronic) Texts: Why We Need Full-Text Access

Wednesday, 07.09.2016, 9:30 – 10:30

Stylometry, aka computational stylistics, is a field that has produced compelling visualizations of patterns of similarity and difference between texts, based on various quantitative, i.e. countable, features. These countable features are often very basic elements that have not been traditionally associated with “style” or “meaning” or “message”: more often than not, stylometrists work with frequencies of function words or of part-of-speech n-grams. Even so, the various statistical measures applied to them often yield results that “make sense” in terms of authorial attribution, chronology, genre, or gender, or simply from the point of view of traditional literary studies. It has now become quite simple to produce a “map,” or in fact a network analysis, of 1000 novels in English that shows a very clear progression from early (green) to modern (purple) writing.

Whether or not this is a simple effect of linguistic change (there are reasons to think it is not), the feasibility of such approaches, quite apart from a plethora of methodological problems, relies on stylometrists’ access to full texts. This is still a particularly unpleasant stumbling block: even when dealing with public-domain material, textual collections are dispersed or incompatible or unreliable or fragmentary (pick one or more of these), and stylometrists continue to struggle, often operating on the margins of reliability and of (copyright) law. Perhaps the main reason is that they do not complain enough to the right people; digital librarians might be a good group to start with.
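
As a rough illustration of why full texts matter, the function-word approach mentioned above can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the abstract does not say which statistical measure is used, so the example swaps in Burrows's Delta, one commonly used stylometric distance, and the word list and function names (`FUNCTION_WORDS`, `relative_frequencies`, `distance_matrix`) are hypothetical choices made for illustration.

```python
import re
from statistics import mean, pstdev

# Hypothetical short list of English function words (an assumption);
# real studies usually take the most frequent words of the corpus itself.
FUNCTION_WORDS = ["the", "and", "of", "to", "a", "in", "that", "it"]


def relative_frequencies(text, words=FUNCTION_WORDS):
    """Return each listed word's share of all tokens in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens) or 1  # guard against empty texts
    return [tokens.count(w) / total for w in words]


def z_scores(freq_table):
    """Standardize each word's frequency across all texts (column-wise)."""
    columns = list(zip(*freq_table))
    means = [mean(c) for c in columns]
    sds = [pstdev(c) or 1.0 for c in columns]  # avoid division by zero
    return [[(f - m) / s for f, m, s in zip(row, means, sds)]
            for row in freq_table]


def delta(z_a, z_b):
    """Burrows's Delta: mean absolute difference of z-scores."""
    return mean(abs(a - b) for a, b in zip(z_a, z_b))


def distance_matrix(texts):
    """Pairwise Delta distances between all texts."""
    z = z_scores([relative_frequencies(t) for t in texts])
    return [[delta(a, b) for b in z] for a in z]
```

A matrix of this kind could then be passed to a clustering or network-layout tool to draw a “map” of the texts. Because every frequency is computed over all tokens of a text, fragmentary or unreliable editions distort the result directly.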


Jan Rybicki is Assistant Professor at the Institute of English Studies, Jagiellonian University, Kraków, Poland; he has also taught at Rice University, Houston, TX, and at Kraków’s Pedagogical University. His interests include translation, comparative literature and humanities computing (especially stylometry and authorship attribution). He has worked extensively (both traditionally and digitally) on Henryk Sienkiewicz and the reception of the Polish novelist’s works in English, and on the reception of English literature in Poland. Rybicki is also an active literary translator, with more than twenty translated novels by authors such as Coupland, Fitzgerald, Golding, Gordimer, le Carré and Winterson.