Methodologies and Challenges in Forensic Linguistic Casework. Group of authors
is known that the set of possible authors contains the actual author of the questioned document, then the analysis of consistency and distinctiveness can lead to a correct attribution. Crucially, in a problem such as this, it is not the responsibility of the forensic linguist to select the set of possible authors (this is not a linguistic question). Equally crucially, however, the analyst’s opinion becomes conditional, for example: “If A and B are the only possible authors of the Q texts, then, as the Q texts are consistent with A’s distinctive features and further from B’s distinctive features, a conclusion can be drawn that A is the more likely author.”
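To make the conditional logic concrete, here is a minimal Python sketch of closed-set comparison. It is not the method used in the Starbuck analysis: the feature list (a few common function words), the relative-frequency profile and the Manhattan distance are illustrative assumptions, and the helper names `profile`, `manhattan` and `closest_candidate` are hypothetical. What the sketch shows is that such a procedure can only rank the candidates it is given; it cannot establish that the true author is among them.

```python
import re
from collections import Counter

# Hypothetical feature set: a handful of common function words (an assumption,
# not the feature set used in any real case).
FEATURES = ["the", "and", "of", "to", "a", "i", "you", "it"]


def profile(text: str) -> dict[str, float]:
    """Relative frequency of each feature word in a text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero for empty texts
    return {f: counts[f] / total for f in FEATURES}


def manhattan(p: dict[str, float], q: dict[str, float]) -> float:
    """Sum of absolute differences between two feature profiles."""
    return sum(abs(p[f] - q[f]) for f in FEATURES)


def closest_candidate(questioned: str, candidates: dict[str, str]) -> str:
    """Return the candidate whose known writing is closest to the Q text.

    The answer is only meaningful under the closed-set assumption, i.e. that
    the true author is one of the keys of `candidates`.
    """
    q = profile(questioned)
    distances = {name: manhattan(profile(known), q) for name, known in candidates.items()}
    return min(distances, key=distances.get)
```

Whatever texts are supplied, the function returns the nearer of the candidates even if neither of them wrote Q, which is why any opinion based on such a comparison has to be stated conditionally.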
In forensic cases the linguist cannot determine whether a forensic problem is a closed-set problem or how big that closed set might be. It is, however, incumbent upon the linguist to ask the police or lawyers questions to understand whether the set is truly closed or whether the assumption of a closed set is unfounded. Once the linguist is reassured that the closed-set assumption is reasonable, it is possible to accept the provided set of possible authors and to work within the limits that this decision places on the analysis. It is, of course, always possible to question whether a problem is truly closed, so the decision to treat it as a closed set needs to be made consciously and carefully.
On many occasions, the initial response of the consulting forensic linguist has to be to request further investigation by the police, so that they can convincingly demonstrate that the structure of the problem is closed. This interrogation of the problem structure often requires a full understanding of the background of the case, and if this information is carried forward into the analysis, it can become a source of bias. In the Starbuck analysis, TG was responsible for establishing the basis for treating the problem as a closed set and closely questioned the investigating officers on this issue.
As noted, closed-set authorship attribution is far easier than the alternative: open-set authorship verification, which arises when the forensic linguist is asked to determine whether the known author of a set of texts did or did not write a questioned document. In such cases, we can offer an opinion on how consistent the candidate’s style is with that of the questioned document and on how distinctive any shared features are, but the task then becomes one of establishing population-level distinctiveness (Grant, 2020), which is especially challenging. How do we decide which consistent features are distinctive? Against what comparison corpus? And how many distinctive features do we need before we can claim a reasonable match?
We do not have clear answers to these questions. Coulthard (2004) suggests that because no two people have the same linguistic history, no two people will have exactly the same style. However, this position is hard to demonstrate, and even if it is true, it is unclear how much data we would need before we could distinguish a given author from all other authors. Further, there are suggestions (e.g., Wright, 2017) that some authors are more distinctive than others at a population level, and this, too, might need to be accounted for in any particular problem. These difficulties of authorship verification can be sidestepped in closed-set authorship attribution problems, such as the Starbuck case.
The second point of evaluation for TG was whether there was sufficient, relevant comparison data for the analysis. It is essential to have a good quantity of known material, because all comparative authorship analysis involves describing a pattern of known usage to compare with the usage in the disputed material. The analysis therefore depends on the frequency, or even the mere occurrence, of linguistic forms in the comparison material. If the quantity of known-author data is limited, we cannot speak with any confidence about a pattern of use, and we cannot say whether the use of a given feature is typical of a given author. In particular, as the amount of data decreases, so does the number of features that could possibly be meaningful. Conversely, with more data we can compare the use of a wider range of features, creating a basis for a more reliable attribution.
The amount of data here is not large compared with many of the historical and hypothetical nonforensic cases considered in stylometry and computational stylistics research, but in our experience it is relatively large for a forensic linguistic investigation. The decision on whether the amount of data is sufficient for the analysis is thus a further entry point for potential bias. In the Starbuck analysis, this decision fell to TG (although it would have been possible for JG to report back that there was insufficient data to proceed).
In terms of the relevance of the comparison material in the Starbuck case, register variation was largely controlled: all the texts under analysis were emails. This homogeneity is of great value in any authorship analysis, because language, both in general and within individuals, varies across communicative contexts, as people use different linguistic structures to achieve different communicative goals (Biber, 1995). Comparing the authorship of texts written across multiple registers can be very challenging, as the register signal will almost always be stronger than the authorship signal. Consider how quickly any reader can distinguish registers: it takes only a few seconds to tell an email from a newspaper article, whereas determining authorship is clearly much more difficult. Register variation is all the harder to deal with because we do not have a strong understanding of how the style of individual authors tends to shift across registers. Indeed, it seems likely that different authors shift in different ways across registers, making the task harder still. This is an important area for future research in authorship analysis; it is perhaps the main challenge facing the field, and one of real practical importance in the forensic context, where data often comes from different genres.
It is not the case, however, that there was no register variation at all. In particular, the types of emails differed substantially across the three subsets of data, including in topic and audience. Debbie Starbuck’s known emails were mostly to family and friends, and many of them narrated her travels from before she met Jamie. Jamie’s known emails were mostly exchanges with his personal assistant about practical matters while he traveled, and the disputed texts were mostly replies to emails from Debbie’s family, giving them updates and assuring them that she was well. These differences in communicative context necessarily have linguistic consequences. To take the most superficial example, consider the difference in length between Debbie’s and Jamie’s emails, which clearly reflects these differences in purpose and audience.
Nevertheless, the registers here were judged to be sufficiently similar that we felt confident looking for consistently and distinctively used authorship patterns that could not simply be explained by register variation, although we kept these differences in mind and adjusted our interpretation of the results accordingly. Similarly, it was helpful that all the data came from a relatively narrow time period, as we know that people’s language can change over time (Wagner, 2012). On the basis of these considerations, TG judged that the relevance of the comparison set was not a major issue in this case, and thus that the problem was tractable and should be passed to JG. TG’s evaluation of the texts, however, raised other points worthy of discussion.
One advantage for the analysis was that the texts were precisely time-stamped. Emails as a genre naturally create an ordered series of texts, and this structure can assist in devising a method and in forming and testing hypotheses. For example, a working hypothesis that a different writer took over an account at some point in a series of emails offers an analytic advantage over a situation in which an email account might have been hacked and used only occasionally by a second author, because the series can then be divided at a single point into texts before and after the hypothesized takeover.
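Because time-stamped emails form an ordered series, a single-takeover hypothesis lets the series be divided at one point. The minimal Python sketch that follows assumes a hypothetical `Email` record holding a send time and body text, and a hypothetical `split_at_cutoff` helper. It illustrates the idea only and does not describe the procedure used in the Starbuck case; choosing the cutoff (for example, the last confirmed sighting) remains an investigative rather than a computational decision.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Email:
    """A single email with its send timestamp (hypothetical structure)."""
    sent: datetime
    text: str


def split_at_cutoff(emails: list[Email], cutoff: datetime) -> tuple[list[Email], list[Email]]:
    """Order emails by timestamp and split them at a hypothesized takeover point.

    Emails sent at or before `cutoff` form the known set; later emails form the
    set in which any style shift would be expected. Both lists remain in
    chronological order, so each email's position in the series is preserved.
    """
    ordered = sorted(emails, key=lambda e: e.sent)
    known = [e for e in ordered if e.sent <= cutoff]
    later = [e for e in ordered if e.sent > cutoff]
    return known, later
```

Under a takeover hypothesis, each email in the later list can then be compared, in order, with the known set to see whether and where a style shift appears; an occasional-use hypothesis offers no such single dividing point.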
In the Starbuck case, TG was able to confirm with the police investigator that the hypothesis of an account takeover was indeed central, and he was thus able to take this into account in designing the analysis. This was an advantage because it allowed the texts to be divided into different sets. The first set was a group of known emails sent from Debbie’s account before any account takeover could have occurred; this group included emails up to the last time Debbie had been seen alive and well. The second set comprised the emails sent after that point, during which an account takeover might have occurred. If a style shift were to be found, it would most likely be within this second group, with the later emails in the group differing stylistically from those in the known set. This is not to say that each email was not considered individually, but that they were also considered in terms of their position