Patterns – Field – Corpus

A corpus search for speech acts depends on the availability of typical conventionalized patterns for a particular speech act. Deutschmann (2003), for instance, found that apologies regularly include expressions such as sorry, pardon, excuse, which allow the identification of many, perhaps even most, apologies in a large computerized corpus, such as the BNC. For compliments this is more difficult since compliments are less conventionalized than apologies. They do not display standard illocutionary force indicating devices.

However, Manes and Wolfson (1981) proposed a range of syntactic patterns that they claim to be typical for their data of 686 compliments collected with the diary method (Patterns – Field – Diary). Jucker et al. (2008) tried to turn these compliment patterns into search strings that can be used to retrieve compliments from a large computerized corpus. They used the British National Corpus for this purpose.

Manes and Wolfson’s pattern (1a), for instance, was turned into the search string (1b). Extract (1c) from the BNC gives a relevant example (Jucker et al. 2008: 279).

(1a) NP {is/looks} (really) ADJ (53.6%)
(1b) _NN* (is|'re|are|were|look*|seem*) (really|very|such|so) _AJ0
(1c) "Your hair looks amazing," said Christina. (BNC FRS 3252-56)

The search string in (1b) seriously overgenerates and undergenerates: it retrieves hits that are not actually compliments, and it misses strings that are. It overgenerates because the adjective at the end of the string is not restricted to positive ones, so it also retrieves cases that are clearly not compliments because the adjective has negative connotations. It undergenerates for three reasons. First, the intensifier is required, whereas in the original pattern it is optional. If the intensifier is left out of search string (1b), however, the pattern overgenerates to such an extent that the results can no longer be searched manually. Second, both the linking verbs and the intensifiers are limited to those that are explicitly listed, whereas in the original pattern these lists are not restricted. The listed items are the most frequent ones, and it is not clear how many have been left out. Third, the search string only allows for NPs that end in a noun (i.e. NPs without postmodification).
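The behaviour described above can be illustrated with a minimal Python sketch. It is not the query engine used by Jucker et al. (2008); it assumes that `_NN*` and `_AJ0` constrain the part-of-speech tag of a token and that `*` is a wildcard over characters. The function name `matches_1b` and all example sentences are invented for illustration and hand-tagged with C5-style tags; they are not BNC data.

```python
import re

# Linking verbs and intensifiers exactly as listed in search string (1b).
LINKING = re.compile(r"is|'re|are|were|look\w*|seem\w*", re.IGNORECASE)
INTENSIFIERS = {"really", "very", "such", "so"}


def matches_1b(tokens):
    """Return the start indices of four-token windows that fit (1b).

    `tokens` is a list of (word, tag) pairs. This is an assumed reading
    of the search string, not the original implementation.
    """
    hits = []
    for i in range(len(tokens) - 3):
        (w1, t1), (w2, t2), (w3, t3), (w4, t4) = tokens[i:i + 4]
        if (t1.startswith("NN")                # _NN*: any noun tag
                and LINKING.fullmatch(w2)      # listed linking verbs only
                and w3.lower() in INTENSIFIERS # intensifier is obligatory
                and t4 == "AJ0"):              # any adjective, positive or not
            hits.append(i)
    return hits


# Invented, hand-tagged examples (hypothetical, not from the BNC).
positive = [("Your", "DPS"), ("hair", "NN1"), ("looks", "VVZ"),
            ("really", "AV0"), ("nice", "AJ0")]
negative = [("The", "AT0"), ("food", "NN1"), ("is", "VBZ"),
            ("really", "AV0"), ("awful", "AJ0")]
no_intensifier = [("Your", "DPS"), ("hair", "NN1"), ("looks", "VVZ"),
                  ("nice", "AJ0")]
postmodified = [("The", "AT0"), ("dress", "NN1"), ("you", "PNP"),
                ("made", "VVD"), ("is", "VBZ"), ("really", "AV0"),
                ("lovely", "AJ0")]

print(matches_1b(positive))        # compliment, retrieved
print(matches_1b(negative))        # not a compliment, still retrieved
print(matches_1b(no_intensifier))  # compliment, missed
print(matches_1b(postmodified))    # compliment, missed
```

The second example shows overgeneration (a negative adjective is retrieved), while the third and fourth show undergeneration (a missing intensifier and a postmodified NP).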

Jucker et al. (2008) propose a range of modifications to this and all the other search strings to reach a good balance between precision and recall. For several search strings, it was possible to hand-search all the results and pick out those that were actually compliments. For other sets of results, it was necessary to hand-search a subset and extrapolate the frequency to the entire set.
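The extrapolation step amounts to simple proportional arithmetic, sketched below in Python. The function name `extrapolate` and all figures are hypothetical and chosen only to show the calculation; they are not taken from Jucker et al. (2008), and the sketch assumes that the hand-searched subset is a random sample of the hits.

```python
def extrapolate(total_hits, sample_size, sample_compliments):
    """Estimate the number of true compliments among all hits.

    Assumes the hand-searched sample is representative of the full
    result set. Returns (precision in the sample, estimated count).
    """
    if sample_size <= 0 or sample_size > total_hits:
        raise ValueError("sample_size must be between 1 and total_hits")
    if not 0 <= sample_compliments <= sample_size:
        raise ValueError("sample_compliments must be between 0 and sample_size")
    precision = sample_compliments / sample_size
    estimate = round(total_hits * sample_compliments / sample_size)
    return precision, estimate


# Invented figures: 5,000 hits, 200 checked by hand, 8 of them compliments.
precision, estimate = extrapolate(5000, 200, 8)
print(f"precision in sample: {precision:.2%}, estimated compliments: {estimate}")
```

With these invented figures, 4% of the sampled hits are compliments, which extrapolates to an estimated 200 compliments among the 5,000 hits.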

On this basis, Jucker et al. (2008: 290) conclude that there are approximately 343 compliments in the 100 million-word British National Corpus. They then compared their results with the results obtained by Manes and Wolfson (1981) on the basis of their diary collection of compliments. Figure 1 compares the two sets of results. Note that Manes and Wolfson’s patterns 4 and 6 had to be merged because they overlap and proved difficult to distinguish systematically.

Figure 1: Compliment pattern frequencies in the BNC compared to Manes and Wolfson’s (1981) data

It is obvious that the results obtained on the basis of search strings are very limited. The procedure only allows the identification of patterns that have been identified in advance. It will not produce any new patterns, and it may miss compliments that follow one of the required patterns but deviate from the search string in some small way. Such a compliment may include a discourse particle, a false start, a correction, a filled hesitation or some other speech-related phenomenon that does not alter the underlying pattern but changes the sequence of elements enough that the appropriate search string no longer catches it.
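A small Python sketch can make this last point concrete. It encodes an assumed reading of search string (1b) as a regular expression over `word_TAG` text and applies it to two invented, hand-tagged utterances (not BNC data). The relaxed variant, which tolerates tokens tagged `UNC` or `ITJ` between the slots of the pattern, is one conceivable remedy and is not part of the procedure described by Jucker et al. (2008); all names in the sketch are hypothetical.

```python
import re

# The four slots of (1b), written for text in word_TAG format.
NOUN = r"\S+_NN\w"
VERB = r"(?:is|'re|are|were|look\w*|seem\w*)_\w+"
INTENSIFIER = r"(?:really|very|such|so)_\w+"
ADJECTIVE = r"\S+_AJ0"

# Strict version: the four slots must be directly adjacent.
STRICT = re.compile(r"\s+".join([NOUN, VERB, INTENSIFIER, ADJECTIVE]))

# Relaxed version (an assumption for illustration): any number of
# filler tokens tagged UNC or ITJ may follow each of the first three slots.
FILLER = r"(?:\s+\S+_(?:UNC|ITJ))*"
RELAXED = re.compile(
    NOUN + FILLER + r"\s+" + VERB + FILLER + r"\s+"
    + INTENSIFIER + FILLER + r"\s+" + ADJECTIVE
)

# Invented examples: a fluent utterance and one with a filled hesitation.
fluent = "Your_DPS hair_NN1 looks_VVZ really_AV0 nice_AJ0"
hesitant = "Your_DPS hair_NN1 looks_VVZ er_UNC really_AV0 nice_AJ0"

for label, text in [("fluent", fluent), ("hesitant", hesitant)]:
    print(label,
          "strict:", bool(STRICT.search(text)),
          "relaxed:", bool(RELAXED.search(text)))
```

The strict expression finds the fluent utterance but not the hesitant one, although both follow the same underlying pattern. Relaxing the expression recovers the second utterance, but any such relaxation is likely to retrieve additional non-compliments as well.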