What AntConc’s Concordance Tool Can Do For You

by Dr. Ryan Nichols, Philosophy, Cal State Fullerton, Orange County CA

AntConc’s surprising capacity for interesting analyses and for digging deep into sets of texts becomes apparent as soon as you put the concordance tool to work. We left off last post noting some discoveries from our exploration of Joyce’s early novels with AntConc’s clustering tool. Using the same three novels, we’ll now go into more detail about the role of God in them.

To get a better sense of the role of God in Dubliners, Portrait, and Ulysses, let’s first get a bird’s-eye view of the word’s dispersal through the three works. Loading them back into AntConc, in Concordance view, we enter ‘God’ into the search bar. Up comes a window of all the sentences in which that term occurs in the novels. But now toggle to Concordance Plot view, and you will see this image:

antconc2photo1

We can identify some differences between the novels’ use of ‘God’ by noting that Ulysses disperses ‘God’ throughout the length of the novel, though with a gap near the end. This contrasts with the other two works, where the term is less evenly represented. Portrait, though, has a dense, telling representation of the word in the middle. Ulysses closes with its most dense representation of this term.

Those who know the novels can begin to map these patterns easily onto content. Consider Portrait. In the middle of the book, as Stephen sits in the pew, thinking about memories of his childhood, Father Arnall offers a homily of sorts at the beginning of the boys’ retreat. This part of the book culminates in Stephen going to confession for the first time in a long time. This is probably the section with dense representations of ‘God’ in the middle of the book. We can find out for sure simply by clicking on one of the thick black filaments in the horizontal bar representing Portrait. Doing so brings us to File View:

antconc2photo2

This might give the impression that ‘God’, for Joyce, is associated with negative words, words like ‘crimes’, ‘misfortunes’, and ‘dishonourable’. Indeed, several scholars in the secondary literature have discussed Joyce’s negative view of God and Catholicism. It is one thing to say this on the basis of close interpretations of dramatic, attention-getting scenes like this fateful event in the confession booth. But it is another to show that this association holds by testing it with text analytics. So is God, as represented in Joyce’s early novels, given a negative moral valence?

This question brings us to some of AntConc’s most useful features. Leaving all three books loaded in, click over to the Collocates tab. This area will allow us to identify words that ‘God’ appears with in the texts. In turn, this will provide us data from which to draw inferences about, for example, the moral valence of God in early Joyce novels. AntConc offers us several settings we can adjust manually. These include the word window span left, the word window span right, the minimum collocate frequency, and the test statistic by which we want to sort our results. ‘Window span’ refers to the range of words, separated by spaces, to the left or the right of the target word. Setting the window span at 5 left and 5 right is good for starters. But given English grammar, authors might describe or discuss God using words that are more than five words away from the token ‘God’. So we might want to expand it later for good measure.

The default “Sort By” is “Stat”, and the default test statistic is MI, or Mutual Information. Mutual Information compares the observed frequency (fo) of a combination of two words (or two-word phrases) with the expected frequency (fe) of that combination, as the ratio fo / fe. The score is typically reported as the base-2 logarithm of this ratio (see: Biber, Quantitative Methods in Corpus Linguistics). The expected frequency is the frequency we would see if the combination occurred by chance. The observed frequency, of course, is the actual number of times the two words co-occur in the corpus. (AntConc also supports T-scores, which are used to assess the dissimilarity of collocates between two terms. For reasons we will enter into in a later post, T-scores are not the best test statistic for collocation studies.) To toggle between one or the other, go to Settings/Tool Preferences/Options. AntConc also allows you to sort by frequency, or by frequency on the left or on the right (useful when looking for relations via parts of speech).
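For readers who like to see the arithmetic, here is a minimal Python sketch of the two statistics. It uses the common textbook definitions (MI as the base-2 logarithm of fo / fe, and the T-score as (fo - fe) divided by the square root of fo), which are assumptions on my part: AntConc’s exact calculations may differ in detail. The function names and all of the counts below are hypothetical and are not taken from the Joyce corpus.

```python
import math


def mutual_information(observed, expected):
    """MI as the base-2 log of observed over expected co-occurrence frequency."""
    return math.log2(observed / expected)


def t_score(observed, expected):
    """Textbook collocation t-score: (observed - expected) / sqrt(observed)."""
    return (observed - expected) / math.sqrt(observed)


# Hypothetical counts, for illustration only (not from the Joyce texts):
# a rare word that co-occurs twice where almost none is expected,
# and a very frequent word that co-occurs often but is also expected to.
rare = (2, 0.001)
common = (262, 100.0)

for label, (fo, fe) in (("rare", rare), ("common", common)):
    print(label,
          "MI =", round(mutual_information(fo, fe), 2),
          "T =", round(t_score(fo, fe), 2))
```

Under these assumptions the rare word gets the higher MI while the frequent word gets the higher T-score. MI rewards surprise, even on very few occurrences, whereas the T-score rewards sheer quantity of evidence.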

To get a sense of the utility of AntConc’s Collocates tool for your own research, please note the differences between MI-score outputs and T-score outputs. To generate this pair of outputs, I left the defaults as they were, and set the minimum collocate frequency to 2. See the two tables below.

MI Collocates with ‘God’ in Dubliners, Portrait, & Ulysses

Rank Freq FreqL FreqR MI Collocate
1 2 0 2 10.7387 enlighten
2 2 2 0 10.7387 cherubim
3 3 2 1 10.3236 displease
4 2 2 0 9.73870 thoth
5 2 0 2 9.73870 reigneth
6 2 1 1 9.73870 inalienably
7 6 3 3 9.32366 omnipotent
8 9 6 3 9.20818 almighty
9 3 2 1 9.00173 dieu
10 11 0 11 8.80581 bless
11 7 1 6 8.73870 merciful
12 3 1 2 8.73870 catholics
13 2 1 1 8.73870 begotten
14 7 7 0 8.54605 declare
15 9 7 2 8.44919 created

T-score Collocates with ‘God’ in Dubliners, Portrait, & Ulysses

Rank Freq FreqL FreqR T-score Collocate
1 262 165 97 14.34391 the
2 173 130 43 11.95026 of
3 151 82 69 11.44190 to
4 151 75 76 11.04698 and
5 92 15 77 9.10928 s
6 93 24 69 8.69440 he
7 94 47 47 8.45890 a
8 75 21 54 8.06571 i
9 73 31 42 7.68971 his
10 69 34 35 7.66831 that
11 63 56 7 7.64881 by
12 66 27 39 7.49530 was
13 59 18 41 7.16244 you
14 56 34 22 7.00579 for
15 65 32 33 6.94264 in

These columns of data are almost self-explanatory, given their names. However, the Freq column deserves a word. This column refers to how many times the collocate appears in the targeted word window. In this case ‘enlighten’ appears twice within 5L/5R of ‘God’.

The T-score table is dominated by function words like ‘the’ and ‘of’, which tell us little about how God is portrayed. The Mutual Information scores, however, appear quite useful. We learn from the first of the tables above that ‘God’ appears to be surrounded by rather positive terms. But MI can also be misleading.

Consider ‘enlighten’ again. This term appears to occur twice within 5L/5R of ‘God’. Let’s find out for ourselves by clearing the search window, switching back to the Concordance tab, and typing ‘enlighten’. Here we find that we have only a single occurrence of that word in all three books.

antconc2photo3

When we click on the sentence, we can read this in File View: “And let you, Stephen, make a novena to your holy patron saint, the first martyr, who is very powerful with God, that God may enlighten your mind. But you must be quite sure, Stephen, that you have a vocation because it would be terrible if you found afterwards that you had none.” ‘Enlighten’ has earned its MI of 10.7 by occurring a single time in the entire corpus, but within the window of two different tokens of ‘God’. This does not appear to be an important collocate of ‘God’ after all.
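To see how a single word can be counted twice, here is a minimal Python sketch of window-based collocate counting applied to the sentence just quoted. It is not AntConc’s implementation. The tokenizer (lowercased runs of the letters a-z) and the function name collocate_counts are assumptions made for illustration.

```python
import re
from collections import Counter


def collocate_counts(text, node, left=5, right=5):
    """Count the words within `left`/`right` tokens of each occurrence of `node`."""
    # Assumed tokenizer: lowercase the text and keep runs of letters only.
    tokens = re.findall(r"[a-z]+", text.lower())
    left_counts, right_counts = Counter(), Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        # Every occurrence of the node word opens its own window, so a word
        # sitting near two node tokens is counted once for each of them.
        left_counts.update(tokens[max(0, i - left):i])
        right_counts.update(tokens[i + 1:i + 1 + right])
    return left_counts, right_counts


SENTENCE = ("And let you, Stephen, make a novena to your holy patron saint, "
            "the first martyr, who is very powerful with God, that God may "
            "enlighten your mind.")

left_counts, right_counts = collocate_counts(SENTENCE, "god")
freq_l = left_counts["enlighten"]
freq_r = right_counts["enlighten"]
print("enlighten", freq_l + freq_r, freq_l, freq_r)
```

Run as written, this prints enlighten 2 0 2, the same Freq, FreqL, and FreqR values that the first table reports for ‘enlighten’, even though the word occurs only once in the sentence.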

Now that we have identified a problem we can do something about it. Let’s return to the collocate window. This time, in the “Min. Collocate Frequency” box, set the value to 10. This way we will avoid focusing on under-represented collocations.

MI Collocates at 10 LR with ‘God’ in Dubliners, Portrait, & Ulysses

Rank Freq FreqL FreqR MI Collocate
1 10 7 3 9.36018 almighty
2 11 0 11 8.80581 bless
3 11 9 2 7.84057 prayed
4 15 10 5 7.81270 souls
5 13 7 6 7.68425 pray
6 12 1 11 7.51630 save
7 13 3 10 7.21032 help
8 11 5 6 7.17576 grace
9 82 41 41 7.11041 god
10 11 5 6 7.04838 blessed
11 20 14 6 7.00534 sin
12 12 0 12 6.98381 knows
13 14 7 7 6.68807 holy
14 11 5 6 6.53992 instant
15 16 7 9 6.50029 cried

The result, shown above, indicates that the highest-scoring collocates of ‘God’ that meet the new frequency threshold represent a wide mix of parts of speech. On the whole, the terms have a very positive valence. Looking at collocates that occur mostly on the right side of the term, we infer that Joyce’s God does a lot of blessing (rank 2), saving (rank 6), helping (rank 7), and knowing (rank 12). This raises some doubt about scholarly claims to the effect that God in Joyce’s early novels is primarily dour or punishing. Of course this is only the first step in a longer journey to evaluate such a claim. We might, next, click on certain collocates to view them in File View. But I leave that as homework of a sort.

In the next post we will work on using AntConc to examine contrasts in word use between different texts.