History and philosophy

Looking back: PsyBorgs on the loose!

Christopher D. Green and his team are taking the history of psychology into the digital realm, producing surprising insights

18 April 2014

William James published more than 300 books and articles over the course of his career (Perry, 1920). I would venture to guess that not one living person has read all of them. Wilhelm Wundt published nearly as many items (Titchener & Geissler, 1908–1914, 1922), and probably many more printed pages (consider just the 10-volume Völkerpsychologie). I would speculate that more than half of his work goes unread today. And those are just two of the most significant and prolific figures in the history of psychology. Consider that in just the first 10 years of the existence of Psychological Review, that journal published nearly 400 articles. Who has read all of that in the past, say, 50 years? Or even a quarter of it?

What this highlights is the fact that historians are faced with an impossible task: to read, comprehend, interpret and ultimately know the material from the time and place of their research topic. Nearly always, however, there is far too much material for any one person to absorb and understand. As a result, historians do a great deal of culling. This work was significant or influential, or relevant to what I am doing; that work was trivial, redundant or peripheral. This kind of filtering is necessary, but only because of our cognitive limitations as humans. It is not a requirement of scholarship in principle. An omniscient being would not need to identify 'less important' works so that they could then be ignored; it could draw on any portion of the entire archive at any time, no matter how large or small. That is not to say that some works are not, in fact, more important or relevant than others. Of course they are, but everything in the complete archive plays some role in the story of a significant individual, institution or event. In reality, there is no cut-off point below which an item should go unnoticed. We do it just to make things manageable.

The problem becomes particularly acute when one aims to tell the story of a very large historical unit: a nation, a century or an entire academic discipline. Tens of thousands of documents may be involved and there is no possibility of giving every one its due. We often identify a small number of key documents – the 'canon' – and frame our story around their production, and the reactions to them. The result can easily be a superficial historical account, which is very common in 'popular histories'. They may be engaging, but in trying to capture the grand sweep of things one can be led to gloss over a wealth of stubborn little details that all too often turn out to undermine whatever snappy and memorable characterisation of events one is trying to peddle; for example, no, Dr Spengler (1918–1923/1991), cultures are not really very much at all like organisms in their 'rise' and 'fall'. Partly in an effort to avoid such traps, many scholarly historians in recent decades have turned to 'micro-histories': accounts of very short periods of time, tightly delimited in location and the number of characters involved. Many people outside of the circle of professional historians, however, find this kind of history dull and even trivial.

Things would be very different if there were a way to capture the totality of an entire archive at once without drowning in the details. We need to learn to 'swim' in the wealth of available information. This is precisely what the modern era of 'Big Data' is all about. Although we tend to hear about Big Data in the context of spying scandals and the like, roughly the same principles apply to any large electronic database. If we can organise the data according to principles that are relevant to our interests, then we can display it graphically in a way that makes key elements of its structure visually salient. This general approach to humanistic research has become increasingly popular in recent years (see Gold, 2012). Several American universities have developed institutes and programmes dedicated to 'Digital Humanities'. Even Google has created a public tool called the 'Ngram Viewer', which enables anyone to search and graph the historical frequency of particular terms or phrases in the vast collection of Google Books (see, for example, my blog post on mesmerism at tinyurl.com/nrraw8d).

The insights revealed by a digital examination of documents are, of course, not the same as those revealed by the 'close reading' of the traditional historian. Such an examination is, in a certain sense, 'superficial' because one does not access directly the 'deep' meanings of the text. But the tradeoff is that it becomes practicable to include information about a vastly wider array of texts than one would have been able to consider otherwise. The Stanford historian of literature Franco Moretti (2005, 2013) calls this 'distant reading', in which one rises far enough above the 'trees' that one is now able to glean the overall shape of the 'wood'. Unlike superficial popular history, however, in digital history all the detail remains in place. Each and every item is always present, though it might be just one bit among thousands, and one can always 'zoom in' for a closer look if that is what the research demands.

My colleagues at the York Digital History of Psychology Laboratory – we jokingly call ourselves 'The PsyBorgs' – have spent the past couple of years poring over a number of large computerised databases, borrowed and purpose-built, both to confirm things that were already known about psychology's past (to assess the validity of the methods) and to uncover new aspects of the discipline's development. For instance, Ingo Feinerer (Vienna University of Technology), Jeremy Burman (a York doctoral student) and I have been converting decades-long runs of articles in the early volumes of American Journal of Psychology and Psychological Review into a series of networks. In each one, the nodes represent particular articles, and the links between them represent how similar the vocabularies used in any two articles are (short link = high similarity). The results are networks in which lexically similar articles cluster together spatially into distinct 'research communities'. Examining one of these networks, one can directly see information about how many such communities there were, their relative sizes, how close the communities were to each other in 'lexical space', and how densely interconnected the articles were within each community (i.e. how elaborate and mandatory was the subdisciplinary 'dialect' that each had developed?).
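To make the idea of a lexical-similarity network concrete, here is a minimal sketch in Python. It is not the lab's actual pipeline: the choice of cosine similarity over word counts, the tiny stop-word list, the threshold and the three toy 'articles' are all assumptions made for illustration.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Assumed, deliberately tiny stop-word list; a real study would use a fuller one.
STOPWORDS = {"a", "and", "as", "in", "of", "the", "with"}

def word_counts(text):
    """Lower-case the text and count alphabetic tokens, skipping stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)

def cosine_similarity(a, b):
    """Cosine similarity of two word-count vectors (0.0 means no shared words)."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def similarity_network(articles, threshold=0.2):
    """Return edges (id1, id2, similarity) for article pairs at or above threshold."""
    counts = {aid: word_counts(text) for aid, text in articles.items()}
    edges = []
    for a, b in combinations(sorted(counts), 2):
        sim = cosine_similarity(counts[a], counts[b])
        if sim >= threshold:
            edges.append((a, b, sim))
    return edges

# Hypothetical toy texts, invented for this example only.
articles = {
    "A": "consciousness and the self in the stream of thought",
    "B": "the self and consciousness as a problem of mind",
    "C": "reaction time measured with the chronoscope in the laboratory",
}
edges = similarity_network(articles)
```

In this toy case only articles A and B are linked, because they share vocabulary once stop words are removed. A layout program could then draw each link with length proportional to one minus the similarity, so that lexically similar articles sit close together.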

For example, the figure above [see PDF] shows a network of all of the full articles published in Psychological Review from its launch in 1894 up to 1898. One of the more interesting findings here is that the largest and most densely interconnected research communities pertain to philosophical psychology (e.g. consciousness, mind–body problem, the self) and to psychological metatheory (e.g. Is psychology a science? If so, what kind? What are its relationships to neighbouring disciplines?). They are represented by the green and blue clusters, respectively, at the far left of the image. This runs contrary to the great emphasis that is placed on the development of laboratories and 'schools' in traditional historical writing about this period in American psychology.

Another interesting finding was that the 'Notes' written by G. Stanley Hall in early volumes of the American Journal of Psychology (in essence, Hall's reviews of psychological literature published elsewhere) served to draw together a number of autonomous research groups under the common banner of 'psychology', effectively creating the discipline out of research communities that were already present on the American scene. How did we draw so 'deep' a conclusion from a simple network of shared words? The network revealed a hub-and-spoke configuration in which the 'Notes' were the hub, and each of the distinct research communities was strongly connected with this hub, but they were not very strongly connected to each other. That is to say, Hall cleverly wrote his 'Notes' to share some substantive vocabulary with each of his constituent groups even though, at first, they did not share very much with each other. A PhD student of ours, Daniel Lahham, has been working on a similar project in which the journals of interest are early 20th-century comparative psychology and behaviourism periodicals.
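One simple way to put a number on a hub-and-spoke pattern is to compare how similar the candidate hub is to each community with how similar the communities are to one another. The sketch below is only an illustration of that comparison, not the measure used in the study; the community labels and similarity values are hypothetical.

```python
from itertools import combinations
from statistics import mean

def hub_and_spoke_ratio(similarity, hub):
    """Mean hub-to-community similarity divided by mean community-to-community similarity.

    `similarity` maps unordered pairs of node names to a similarity score.
    Values well above 1 suggest a hub-and-spoke configuration.
    Assumes at least two nodes besides the hub.
    """
    others = sorted({name for pair in similarity for name in pair} - {hub})

    def sim(a, b):
        # Pairs may be stored in either order; missing pairs count as 0.0.
        return similarity.get((a, b), similarity.get((b, a), 0.0))

    to_hub = mean(sim(hub, n) for n in others)
    among = mean(sim(a, b) for a, b in combinations(others, 2))
    return to_hub / among if among else float("inf")

# Hypothetical labels and scores, invented for this example only.
similarity = {
    ("Notes", "community 1"): 0.6,
    ("Notes", "community 2"): 0.5,
    ("Notes", "community 3"): 0.7,
    ("community 1", "community 2"): 0.1,
    ("community 1", "community 3"): 0.2,
    ("community 2", "community 3"): 0.3,
}
ratio = hub_and_spoke_ratio(similarity, hub="Notes")
```

With these invented numbers the hub is on average about three times as similar to each community as the communities are to each other, which is the kind of asymmetry the hub-and-spoke description implies.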

The York PsyBorgs are also experimenting with a number of other digital approaches to the history of psychology. The lab's co-director, Michael Pettit, is pursuing a number of projects that involve geographical maps. For example, when the popular press writes about the research of an academic psychologist, where does the bulk of the resulting correspondence from the public come from (i.e. where in the country does psychology have the greatest public 'uptake') and why? In another project, Pettit, David Schmit (St. Catherine University, St. Paul, Minnesota), and Eric Oosenbrug (a York doctoral student) are investigating mesmerism, which was popular in the US prior to the Civil War, but not equally so everywhere. Using newspaper advertisements they can discover where itinerant mesmerists plied their trade most successfully. In addition, one of our MA students, Shayna Fox Lee, is mapping both the biographical origins and the ultimate occupational fates of the dozens of women who attended graduate programmes in social science around the turn of the 20th century. Similarly, one of our doctoral students, Jacy Young, is geographically tracking the public lectures of G. Stanley Hall, who travelled far and wide to spread his message of 'Child Study'.

Jacy Young and Jeremy Burman are collaborating to develop a computer program to automatically extract all of the proper names mentioned in a series of long texts, such as books. Once the names of the deceased are excluded, the resulting list is a set of candidates whose archival collections might contain letters from the texts' author. This is especially useful in the case of important historical figures who did not leave behind major archival collections of their own. In addition, Burman is using the descriptions used in the American Psychological Association's PsycINFO database to track very broad topic trends in psychology across nearly 2,000,000 articles published over the past 50 years.
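A very rough sketch of the name-extraction step might look like the following. It is not the program under development: it assumes that a 'name' is simply a run of two or more capitalised words or initials, it handles only unaccented Latin letters, and the sample text and exclusion list are invented for illustration.

```python
import re
from collections import Counter

# A name token is either a capitalised word ("James") or an initial ("G.").
_TOKEN = r"(?:[A-Z][a-z]+|[A-Z]\.)"
# A candidate name is two or more such tokens separated by whitespace.
_NAME = re.compile(rf"\b{_TOKEN}(?:\s+{_TOKEN})+")

def extract_names(text, exclude=()):
    """Count candidate personal names in text, skipping any listed in `exclude`."""
    excluded = set(exclude)
    names = (re.sub(r"\s+", " ", m.group(0)) for m in _NAME.finditer(text))
    return Counter(n for n in names if n not in excluded)

# Hypothetical sample text and exclusion list, invented for this example only.
text = (
    "In 1890 William James wrote to G. Stanley Hall. Hall later mentioned "
    "James Mark Baldwin in a reply to William James."
)
names = extract_names(text, exclude={"James Mark Baldwin"})
```

A heuristic this crude will also pick up capitalised phrases that are not people (place names, titles, sentence openers followed by a name), so in practice the candidate list would need checking by hand or by a proper named-entity recogniser.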

Arlie Belliveau, also a doctoral student, is using 'bubble charts' to track the membership numbers of the many divisions (now over 50) of the American Psychological Association over the past 60-some years. Although it is well known that divisional membership in the APA has been dropping for the past decade or so, the fates of all divisions have not been the same. Much is revealed in the membership dynamics of individual divisions.

Many of these projects are still in progress. My aim here has been to sketch just a few examples of the wide range of novel historical research projects that can be pursued when one begins to think of digital databases as research resources, and starts thinking of statistical visualisations as a way of slicing into those databases to reveal their inner structures. Although some enthusiasts for this new approach have predicted that digital history will soon begin to replace the conventional qualitative study of the past (Moretti, 2005, 2013), we are not so inclined to speculate about that. In our experience, the qualitative informs the quantitative and vice versa. Sometimes a visualisation raises a new question that we need to pursue via qualitative means (e.g. why were a disproportionate number of women involved in vision research at the beginning of the 20th century?). At other times a matter that is murky in the textual literature can be clarified by looking at the 'right' visualisation of it (e.g. the nearly forgotten figure Hiram Stanley, librarian at Lake Forest College in Illinois, was involved in a wide range of distinct research communities in the 1890s).

We invite queries and comments on our work, and are eager to consult and collaborate with others who have similar interests. To keep in touch with our progress, check out our website (http://psyborgs.lab.yorku.ca) or follow our blog (http://ahp.yorku.ca).

Christopher D. Green is at the Department of Psychology, York University, Toronto.

References

Gold, M.K. (Ed.) (2012). Debates in the digital humanities. Minneapolis, MN: University of Minnesota Press.
Moretti, F. (2005). Graphs, maps, trees. London & New York: Verso.
Moretti, F. (2013). Distant reading. London: Verso.
Perry, R.B. (1920). Annotated bibliography of the writings of William James. New York: Longmans.
Spengler, O. (1991). The decline of the West (A. Helps & H. Werner, Eds.; C.F. Atkinson, Trans.). New York: Oxford University Press. (Original work published in German, 1918–1923)
Titchener, E.B. & Geissler, L.R. (1908–1914, 1922). A bibliography of the scientific writings of Wilhelm Wundt. American Journal of Psychology, 19, 541–556; 20, 570; 21, 603–604; 22, 586–587; 23, 533; 24, 586; 25, 599; 33, 260–262.