COGNITIVE AND PROSODIC ASPECTS OF SPEECH SEGMENTATION IN IMPERATIVE UTTERANCES IN READING ALOUD: WITH OR WITHOUT A PAUSE?

Research article
DOI: https://doi.org/10.18454/RULB.2020.22.2.2
Issue: No. 2 (22), 2020

Abstract

The article examines the cognitive and prosodic aspects of speech perception and speech production in reading aloud. Using empirical data on pause distribution in reading aloud, the author shows that speech segmentation in speech production takes place at several levels. The experimental study employed corpus analysis and computer-assisted acoustic analysis. Speech chunking is performed at the lower level of short-term memory and can be realized over short stretches of speech without audibly perceptible pauses. Mechanisms of information processing and decision-making about the distribution and location of pauses are activated by contextualization cues; they provide segmentation of the utterance at a higher level and determine its communicative meaning. These mechanisms facilitate the processing of sentences with complex syntactic structure. The distribution and location of pauses in utterances follow patterns and tendencies specific to the context of the utterance.

Introduction

This paper presents a study that combines corpus analysis of written texts with acoustic analysis of utterances sampled from audio versions of those texts in order to describe the cognitive and prosodic aspects of read-aloud speech. The cognitive aspect comprises the comprehension and decision-making processes the reader uses when segmenting speech. The prosodic aspect concerns pause placement in the prosodic structure of utterances. We argue that pauses do not always function as indicators of ‘chunks’ as such: in many cases there may be no perceived pause between ‘chunks’. Pause placement, as the result of information processing and decision-making, represents a higher level of speech segmentation and follows patterns or tendencies established by the accompanying contextualization cues.

1.1 Corpus-assisted discourse analysis and information processing

As prosodic research advances, researchers are increasingly interested in the cognitive aspects that determine the prosodic organization of speech. A widespread view holds that the underlying mechanisms of everyday speech, especially its prosody, can only be understood by studying spontaneous speech. We agree with Yi Xu [1, P. 329-330] that, although spontaneous speech is rich in variation, this is exactly the feature that renders it unusable: it is always difficult to recognize and control the factors that contribute to it. Speech recorded in a controlled environment may be more useful for investigating the cognitive aspects of speech production. We propose that audio variants of fiction texts may be good material for studying the comprehension and decision-making processes that readers use when segmenting (chunking) read-aloud utterances.

Definitions of discourse fall into three main categories: (1) anything beyond the sentence, (2) language use, and (3) a range of social practice with nonlinguistic/nonspecific instances of language [2, P.23]. We believe that a fiction text meets all three provisions: (1) it is a unity beyond the sentence, (2) it covers a range of language usage, and (3) it functions in a specific social environment. The pragmatics of fiction texts work through an active cooperative effort shared between reader and author [3], [4], [5, P.12], and the same is true of their audio variants. The reader aims both to understand the text and to create credible, true-to-life spoken impersonations of the characters, which means that the reader processes information online during the reading, and this processing shapes the intonation of the characters' direct speech. Audio variants provide empirical material that can be examined and interpreted to identify and describe the prosodic structure of the utterance and those of its constituents that reflect information processing.

We employ corpus-assisted discourse analysis to investigate how readers comprehend and process information from the written text and then encode it in the prosodic structures of reproduced speech through pause placement in the direct speech of fiction characters. Corpus analysis helps to study contextualization cues, which combine syntactic and semantic information from and about the components of the sentence, accompany prosodic features, and determine their distribution [6], [7], [8], [9]. It also makes it possible to investigate how these cues help the prosodic structure of the utterance to fulfil its communicative purpose [10], [11].

1.2 Chunking as means of information processing and prosodic structure of utterances

Segmenting speech into meaningful units has received due attention in the literature since the 1950s [12] and is still an issue in psycholinguistics in connection with the mechanisms of short-term memory [for an extensive review, see 13]. In cognitive science it was termed ‘chunking’, and it has been established as the key unifying information-processing mechanism of human cognition [14], [15]. ‘Chunks’ have much in common with the syntactic, semantic, or prosodic constituents of the utterance but are not identical to them.

Speech segmentation is generally associated with pause placement. As B.L. Webber [18, P.800] stresses, “aspects of intonation and syntactic choice are generally associated with information structure”; pause placement is therefore not only a feature of the prosodic structure of the utterance but also pertains to the syntactic choices made by the speaker. Because ‘chunks’ are not identical to the syntactic, semantic, or prosodic constituents of the utterance, it would be fundamentally wrong to equate them with the prosodic phrases that map syntactic forms [19]. This creates a controversy between segmenting speech into intonation contours (ICs) and segmenting it into chunks, a controversy reinforced by extensive evidence that no definitive pause length for pause detection has been established empirically so far [20], [21].

Prosodic structure is relevant for information processing in reading, both reading aloud and silent reading [16], [17]. Pause placement and distribution, as elements of the prosodic structure, function in compliance with the Implicit Prosody Hypothesis (IPH) [22]. “Inner” prosody that contributes to sentence processing in reading treats constituents of the prosodic structure of the utterance as “contextualization cues” which activate interpretation programs supporting assumptions specific to the particular context in which the utterance is produced [23]. As decision-making in speech production represents resource-limited inferential search, the comprehension process in reading (as well as speech reproduction in reading aloud) is crucially dependent on typical scenarios. When insufficient/irrelevant or no context is supplied, the speakers use knowledge of ‘default’, most typical scenarios and corresponding prosodic patterns of whole utterances, or patterns of their constituents.

Consistent with the theoretical proposal known as the “Now-or-Never” bottleneck [24], we suggest that pause placement and distribution operate at a higher phonological level than speech chunking and facilitate the processing of complex sentences with regard to their communicative meaning. We suggest that contextualization cues identified by the corpus analysis of written fiction texts help readers to activate scenarios and the corresponding patterns of pause placement and pause distribution in reproduced direct speech.

Method, materials, procedure

2.1 Corpus selection and processing

The research data was drawn from 24 fiction texts by native British authors read by 6 native British speakers (3 male, 3 female), selected according to the principles of corpus compilation. The data was limited to the British variety of English to avoid the intonational variation found across national varieties of English. Corpus analysis of the written texts was performed with the Sketch Engine service [25].

2.2 Acoustic analysis

The data for acoustic analysis was limited to imperative utterances from the direct speech of fiction characters in monologue and polylogue interactions. Because intonation is highly variable, we had to limit the number of factors that may influence the ICs representing the prosodic structure of utterances in order to draw substantiated conclusions about how readers decide on pause placement during information processing. We limited the intonation patterns in the research to those of “utterances of imperative sentences” characterized as “genuine”, “canonical” second-person imperatives [26, P.14, 28], [27, P.18]. “Genuine” imperatives were chosen for two reasons: (1) they have a specific linguistic form; (2) they are characterized by a “sole prototypical function”, which is “the performance of the full range of directive speech acts” [26, P.36, 305]. Major non-imperative sentence types that can be used as directive speech acts (locatives, operatives, declaratives) were excluded from the analysis [28]. Written speech allows for longer, syntactically complex sentences, which is why utterances containing imperative constructions were also included in the selection for acoustic analysis, provided the imperative constructions met the requirements for “canonical” imperatives.

Imperative utterances (9117 in total) were extracted from the audio corpora for further audio processing. Audio recordings were converted to .wav format (CBR 129 kbps, 44,100 Hz, stereo) and analyzed acoustically with the Prosogram script [29]. The “Prosogram and prosodic profile” task (wide rich view, values pitch targets function) automatically segments the utterance into syllable-sized units, motivated by phonetic, acoustic, or perceptual properties. Because pause placement belongs to the temporal prosodic features, we collected data on a number of temporal and pitch variables that appeared meaningful: the proportion of estimated phonation time to speech time, the estimated speech rate, the number of automatically segmented phonetic syllables, and the proportion of syllables with large pitch movement (abs(distance) >= 4 ST (semitones)).
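The variables listed above can be illustrated with a minimal sketch. It is not Prosogram's implementation: the `Syllable` record, the function name `temporal_profile`, and the choice to compute speech rate as syllables per second of total speech time are assumptions made for illustration, and Prosogram's own definitions may differ.

```python
from dataclasses import dataclass


@dataclass
class Syllable:
    """Hypothetical record for one automatically segmented syllable."""
    start: float          # onset, seconds
    end: float            # offset, seconds
    pitch_move_st: float  # signed pitch movement inside the syllable, semitones


def temporal_profile(syllables, large_move_st=4.0):
    """Compute the four variables named in the text for one utterance.

    Assumption: speech time runs from the first onset to the last offset,
    and phonation time is the summed duration of the syllables.
    """
    if not syllables:
        raise ValueError("need at least one syllable")
    speech_time = syllables[-1].end - syllables[0].start
    phonation_time = sum(s.end - s.start for s in syllables)
    n = len(syllables)
    large = sum(1 for s in syllables if abs(s.pitch_move_st) >= large_move_st)
    return {
        "prop_phonation": phonation_time / speech_time,
        "speech_rate": n / speech_time,  # syllables per second (assumed definition)
        "n_syllables": n,
        "prop_large_moves": large / n,
    }


# Invented toy utterance: three syllables with two silent gaps.
toy = [
    Syllable(0.0, 0.2, 1.0),
    Syllable(0.3, 0.5, -5.0),
    Syllable(0.8, 1.0, 4.0),
]
profile = temporal_profile(toy)
```

The threshold of 4 semitones is taken from the text; every number in the toy utterance is invented.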

We also marked the location and measured the duration of perceived pauses (150 ms or longer) at syntactic boundaries within utterances. To see whether the patterns of pause placement and distribution hold across readers for imperative utterances and utterances with imperative constructions, we measured these parameters in two corpora of written fiction texts (AImperative and YAImperative) and in two corresponding samples of imperative utterances selected from their audio variants.
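The 150 ms criterion can be sketched as a simple gap filter. This is only an illustration under stated assumptions: the input format (sorted start and end times of sounding stretches, in integer milliseconds) and the name `find_pauses` are hypothetical, and the sketch does not check whether a gap falls at a syntactic boundary, which the study did by annotation.

```python
def find_pauses(intervals_ms, min_pause_ms=150):
    """Return (pause_onset_ms, pause_duration_ms) for every silent gap
    between consecutive sounding intervals that lasts at least min_pause_ms.

    intervals_ms: list of (start, end) pairs in milliseconds, sorted by start.
    """
    pauses = []
    for (_, prev_end), (next_start, _) in zip(intervals_ms, intervals_ms[1:]):
        gap = next_start - prev_end
        if gap >= min_pause_ms:
            pauses.append((prev_end, gap))
    return pauses


# Invented example: gaps of 100 ms, 150 ms and 400 ms.
sounding = [(0, 400), (500, 900), (1050, 1500), (1900, 2300)]
detected = find_pauses(sounding)
mean_duration_s = sum(d for _, d in detected) / len(detected) / 1000
```

Working in integer milliseconds keeps the comparison with the threshold exact; the 100 ms gap is dropped, while the 150 ms and 400 ms gaps are kept.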

Results

We found that contextualization cues in discourse may function as formal markers that make canonical imperatives functionally marked [30]. These structural elements perform a double function: (1) they mark the context of the utterance, and (2) as stimuli, they activate cognitive mechanisms of information processing through the detection and activation of prosodic cues, directing the reader's controlled search for the appropriate prosodic pattern when reproducing speech [31].

3.1 Pause placement

Even though written speech allows for longer, syntactically complex sentences, authors prefer to use short imperative sentences. We found that syntactic structure functions as a contextualization cue both for the authors of the written texts and for the readers (as shown in Fig. 1 and 2).

The readers also prefer to separate clauses with imperative constructions from the rest of the utterance by a pause, even when the sentences contain more than one clause (as shown in Tables 1-2).

Table 1: Utterances with a single imperative construction separated from the rest of the utterance (AImperative)

| AImperative (1788 utterances) | Utterances | Pauses | Average duration (s) |
| --- | --- | --- | --- |
| Utterances with 1 clause | 1566 | 83 | 0.354 |
| Utterances with 2 clauses | 131 | 116 | 0.348 |
| Utterances with 3 clauses | 53 | 55 | 0.368 |
| Utterances with 4 clauses | 25 | 27 | 0.362 |
| Utterances with 5+ clauses | 13 | 13 | 0.346 |

Table 2: Utterances with a single imperative construction separated from the rest of the utterance (YAImperative)

| YAImperative (5529 utterances) | Utterances | Pauses | Average duration (s) |
| --- | --- | --- | --- |
| Utterances with 1 clause | 3930 | 497 | 0.264 |
| Utterances with 2 clauses | 940 | 842 | 0.297 |
| Utterances with 3 clauses | 384 | 392 | 0.315 |
| Utterances with 4 clauses | 144 | 172 | 0.407 |
| Utterances with 5+ clauses | 131 | 166 | 0.306 |

It can be seen from the results that the readers also put relatively long pauses into sentences with a single clause, which can be explained by the presence of contextualization cues of a different type, which will be discussed further in the Pause distribution section of the paper.
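One way to read Tables 1-2 is to divide the number of pauses by the number of utterances in each row. The sketch below does this with the counts copied from the tables; treating the Pauses column as the total count of perceived pauses in the row is an assumption, and the names `TABLE_1`, `TABLE_2`, and `pauses_per_utterance` are hypothetical.

```python
# (utterances, pauses) per clause count, copied from Tables 1 and 2.
TABLE_1 = {"1": (1566, 83), "2": (131, 116), "3": (53, 55),
           "4": (25, 27), "5+": (13, 13)}
TABLE_2 = {"1": (3930, 497), "2": (940, 842), "3": (384, 392),
           "4": (144, 172), "5+": (131, 166)}


def pauses_per_utterance(table):
    """Average number of pauses per utterance for each clause count."""
    return {clauses: pauses / utterances
            for clauses, (utterances, pauses) in table.items()}


a_ratio = pauses_per_utterance(TABLE_1)
ya_ratio = pauses_per_utterance(TABLE_2)

for clauses in TABLE_1:
    print(f"{clauses:>2} clause(s): "
          f"AImperative {a_ratio[clauses]:.2f}, "
          f"YAImperative {ya_ratio[clauses]:.2f}")
```

Under this reading, single-clause utterances carry far fewer pauses per utterance than multi-clause ones in both samples, while the average pause durations in the tables stay in a similar range.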

We have marked pauses in utterances from AImperative and YAImperative selections with respect to their location in the utterance. Table 3 shows how the pauses are placed by the readers in regard to the imperative construction.

Table 3: Number and average duration of pauses in different locations

| Pause location | Number of pauses, AImperative (2413 utterances) | Average duration (s), AImperative | Number of pauses, YAImperative (6704 utterances) | Average duration (s), YAImperative |
| --- | --- | --- | --- | --- |
| Pause before the clause with the imperative construction | 140 | 0.353 | 920 | 0.303 |
| Pause after the clause with the imperative construction | 176 | 0.366 | 1059 | 0.32 |
| Pause between the 1st and 2nd clauses with imperative constructions | 70 | 0.319 | 430 | 0.305 |

The placement of pauses shows that the writers tend to put clauses with imperative constructions at the beginning of the sentence, and the readers follow this tendency by placing pauses after the clauses with imperative constructions.

3.2 Pause distribution

We distinguished two kinds of structural elements that appeared to influence pause distribution: (1) syntactic contextualization cues and (2) punctuation contextualization cues. Syntactic contextualization cues that appear in the imperative constructions help the reader to extract information from the message and encode it in the prosodic contours of utterances (Table 4).

Table 4: Grammatical contextualization cues in utterances and in clauses with imperative constructions separated by the pause from the rest of the utterance

| Contextualization cues | In utterances, AImperative (2413 utterances) | Separated by the pause from the rest of the utterance (AImperative) | In utterances, YAImperative (6704 utterances) | Separated by the pause from the rest of the utterance (YAImperative) | Examples |
| --- | --- | --- | --- | --- | --- |
| Vocative in preposition to the imperative | 68 | 30 | 347 | 233 | Madam, please! |
| Vocative in postposition to the imperative | 346 | 8 | 1039 | 105 | Come in, Bill. |
| Subject in preposition to the imperative | 60 | 0 | 203 | 6 | You go ahead. |
| Subject in postposition to the imperative | 27 | 0 | 61 | 0 | Don’t you believe it! |
| Please in preposition to the imperative | 30 | 12 | 104 | 79 | Please be seated. |
| Please in postposition to the imperative | 4 | 0 | 45 | 2 | Give me my parcel, please. |

When reading utterances with a vocative in preposition to the imperative, readers tend to put a pause after the vocative, separating it from the clause with the imperative construction. They do not do so when the vocative is in postposition to the imperative construction. Subjects tend to be placed in the same prosodic group as the imperative whether they precede or follow it. ‘Please’ is generally separated from the clause with the imperative construction when it is in preposition to the imperative. Punctuation also appears to be less obligatory than it seems: the readers easily recognize the need to separate semantically relevant elements of the sentences. However, when punctuation is present, usually marking contextualization cues such as ‘please’ and vocatives, the readers follow the authors' instructions and prefer to put pauses after the clauses with imperative constructions, so punctuation contextualization cues act as complementary ones.
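The tendencies described above can be checked against Table 4 by computing, for each cue, the share of occurrences that are separated by a pause. The counts below are copied from the table; the dictionary layout and the name `separated_share` are illustrative assumptions.

```python
# cue: (A total, A separated, YA total, YA separated), copied from Table 4.
TABLE_4 = {
    "vocative, preposition":  (68, 30, 347, 233),
    "vocative, postposition": (346, 8, 1039, 105),
    "subject, preposition":   (60, 0, 203, 6),
    "subject, postposition":  (27, 0, 61, 0),
    "please, preposition":    (30, 12, 104, 79),
    "please, postposition":   (4, 0, 45, 2),
}


def separated_share(table):
    """Share of each cue that is set off by a pause, per sample."""
    shares = {}
    for cue, (a_total, a_sep, ya_total, ya_sep) in table.items():
        shares[cue] = (a_sep / a_total, ya_sep / ya_total)
    return shares


shares = separated_share(TABLE_4)
for cue, (a_share, ya_share) in shares.items():
    print(f"{cue:<24} AImperative {a_share:6.1%}   YAImperative {ya_share:6.1%}")
```

In both samples the share is much higher for vocatives and ‘please’ in preposition than in postposition, and it is zero or close to zero for subjects, which matches the description in the text.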

3.3 Correlation analysis

Correlation analysis of the temporal variables shows a considerable negative correlation between the PropPhon and SpeechRate parameters in both selections: -0.35 (AImperative) and -0.42 (YAImperative). The readers tend to slow down their speech when the duration of the IC increases.

There is a considerable positive correlation between the PropPause and SpeechRate parameters in both selections: 0.46 (AImperative) and 0.43 (YAImperative). This shows that the readers tend to speak faster when the amount of unvoiced phonation in the IC increases. At the same time, there is no significant correlation between the amount of unvoiced phonation and the duration of the IC, which shows that the proportion of pauses within the IC depends not on the duration of the IC itself but on the number of clauses in the utterance.

There is also a considerable positive correlation between the NuclDur and Nucleus parameters in both selections: 0.92 (AImperative) and 0.67 (YAImperative). This shows that the duration of the nuclear syllable increases together with the duration of the other syllables in the IC.
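For readers who want to reproduce this kind of analysis on their own measurements, a correlation coefficient between two per-utterance variables can be computed as in the sketch below. The text does not name the coefficient used, so Pearson's r is an assumption here, and the two short lists are invented toy values, not data from the study.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Invented per-utterance values: proportion of phonation time and
# speech rate in syllables per second.
prop_phon = [0.95, 0.90, 0.85, 0.80, 0.75]
speech_rate = [4.1, 4.6, 4.3, 5.0, 5.2]
r = pearson_r(prop_phon, speech_rate)
```

A negative `r` on such data would correspond to the kind of relation reported for PropPhon and SpeechRate; the values above are chosen only to make the sign visible.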

Discussion

According to the “Now-or-Never” bottleneck proposal, speech chunking is not the only segmentation mechanism involved in speech comprehension and production. The empirical evidence shows that, in reading aloud, speech segmentation during sentence processing takes place at multiple levels of linguistic abstraction: pause placement and distribution in the ICs do not correspond to speech chunking. Whereas chunking in longer utterances is supposed to result in sequences of 3-4 elements [14], [15], ICs in the same utterances form bigger units, and pauses signal the boundaries of ICs, not of speech chunks.

Perceived pauses relate to the semantic and syntactic relevance of units. The decision about which units are semantically and syntactically relevant, and about the prosodic structure of these units, is made in accordance with typical scenarios and patterns, which are activated by contextualization cues. The default syntactic structure for the imperative utterance is a simple sentence with one independent clause, and this proved true in both the written and the audio samples: 64.8% of the imperative utterances in the AImperative sample and 58.6% in the YAImperative sample had the default syntactic structure. In making decisions, people can rely both on knowledge resulting from context and experience and on algorithmic mechanisms based on rules and form [32], [33]. When the context in the utterances from the research selections was insufficient or ambiguous, the readers relied on algorithmic mechanisms based on the rules and forms of the language norm, dividing utterances by pauses into smaller units with one clause each. The authors of the written fiction texts do not combine many imperative constructions within one sentence because of the clearly defined prototypical function of imperatives. It is also evident that in most cases the readers decide to separate the clause with the imperative construction from the rest of the utterance with a pause on the basis of the same presumptions.

Pause placement and distribution, as the research found, follow the same pattern of decision-making: even utterances that contain a single clause are divided by a pause into two prosodic groups if the sentence includes contextualization cues that can form an independent intonation group (vocatives, ‘please’, ‘oh’, ‘well’, etc.). This conclusion is supported by the number of utterances in which a clause with the imperative construction is separated from the rest of the utterance when the element serving as the contextualization cue is in preposition to the imperative. In sentences with a vocative or ‘please’ in preposition, readers tend to set these elements off as separate prosodic phrases. The elements then acquire an independent pragmatic meaning: a vocative becomes ‘calling a person from a distance/identifying a specific person’, and ‘please’ forms a formulaic sentence. This shows that the readers prefer to identify and reproduce imperatives as autonomous prosodic structures if they consider that the contextualization cue should be interpreted as a separate message.

Our findings show that pause placement and pause distribution in reading do not necessarily reflect speech chunking as the result of a short-term memory mechanism. Rather, they follow patterns that are statistical in nature and result from information retrieval mechanisms of long-term memory activated by contextualization cues. The evidence supports the hierarchical character of speech segmentation processes and the complexity of the interaction between cognitive and prosodic aspects of speech comprehension and production in reading. Pause placement and distribution patterns result from the cognitive aspect of speech production; they influence the prosodic structure of the utterance by establishing boundaries for prosodic structures and providing enough syllables for chunking within them.

Conclusion

Because decision-making in speech production represents a resource-limited inferential search, the comprehension and speech reproduction mechanisms in reading aloud are complex and hierarchical. They are activated by elements of the syntactic and prosodic structures that support assumptions specific to the particular context in which the utterance is produced. Pause placement and distribution result from the work of these mechanisms at a higher phonological level than speech chunking. The main purpose of pause placement and distribution is to facilitate the processing of complex sentences with regard to their communicative meaning and to provide enough space for lower-level speech chunking.

Further research on utterances with similar semantic and/or pragmatic structures (e.g. performative modals vs. imperatives) will make it possible to compute distribution patterns of pause placement, their similarities conditioned by shared semantics or pragmatics, and their differences determined by contextualization cues. This would help to establish cognitive and pragmatic criteria for the classification of prosodic patterns and to resolve difficulties that arise when prosodic contours overlap, creating prosodic homophony and ambiguity. We also suggest that the prosodic means that manifest speech chunking are to be found in the tonal variation within the prenuclear part of the IC: when no unvoiced phonation can be found, tonal variation in adjacent syllables may be perceived as the boundary between chunks.

References

  1. Xu Y. In defense of lab speech / Y. Xu // Journal of Phonetics. – 2010. – №38. – P. 329-336.

  2. Schiffrin D. Introduction / D. Schiffrin, D. Tannen, H.E. Hamilton // The Handbook of Discourse Analysis. – Oxford, UK: Blackwell Publishing, 2001. – P. 1-10.

  3. Mey J.L. Literary Pragmatics / J.L. Mey // The Handbook of Discourse Analysis. – Oxford, UK: Blackwell Publishing, 2001. – P. 787-797.

  4. Goodwin C. Conversational Organization: Interaction Between Speakers and Hearers / C. Goodwin – New York: Academic Press, 1981. – 195 p.

  5. Garnham A. Observations on the Past and Future of Psycholinguistics / A. Garnham, S. Garrod, A. Sanford // Handbook of Psycholinguistics. – San Diego, CA: Academic Press. – 2006. – P. 1-18.

  6. Schiffrin D. Approaches to Discourse: Language as Social Interaction / D. Schiffrin – Oxford, UK: Blackwell Textbooks in Linguistics, 2005. – 482 p.

  7. Upton T.A. An Approach to Corpus-based Discourse Analysis: The Move Analysis as Example / T.A. Upton, M.A. Cohen // Discourse Studies. – 2009. – №11(5). – P. 585-605.

  8. Fery C. Intonation and prosodic structure / C. Fery – Cambridge: Cambridge University Press, 2017. – 374 p.

  9. Stojnic U. Discourse, Context, and Coherence / U. Stojnic // Beyond Semantics and Pragmatics. – Oxford, UK: Oxford University Press, 2018. – P. 97-124.

  10. Kawaguchi Y. Introduction / Y. Kawaguchi, M. Minegishi, J. Durand // Corpus analysis and variation in linguistics. – Amsterdam: John Benjamins Pub. Co., 2009. – 398 p.

  11. Cheng W. A corpus-driven study of discourse intonation: The Hong Kong corpus of spoken English (prosodic) / W. Cheng, C. Greaves, M. Warren – Amsterdam: John Benjamins Pub., 2008. – 325 p.

  12. Miller G.A. The magical number seven, plus or minus two: some limits on our capacity for processing information / G.A. Miller // Psychological Review. – 1956. – №63. – P. 81-97.

  13. Cowan N. The magical number 4 in short-term memory: a reconsideration of mental storage capacity / N. Cowan // Behavioral and Brain Sciences. – 2000. – №24. – P. 87-185.

  14. Yanushevskaya I. The distribution of pitch patterns and communicative types in speech chunks preceding pauses and gaps / I. Yanushevskaya, J. Kane, C. de Looze et al. // The Proceedings of the 7th International Conference on Speech Prosody 2014. – Speech Prosody, 2014. – P. 959-963.

  15. McCauley S.M. Chunking Ability Shapes Sentence Processing at Multiple Levels of Abstraction / S.M. McCauley, E.S. Isbilen, M.H. Christiansen // Cognitive Science, 2017. – P. 2681-2686.

  16. Paige D.D. Is prosodic reading a strategy for comprehension? / D.D. Paige, W.H. Rupley, G.S. Smith et al. // Journal for Educational Research Online. – 2017. – №9 (2). – P. 245-275.

  17. Groen M.A. The role of prosody in reading comprehension: evidence from poor comprehenders / M.A. Groen, N.J. Veenendaal, L. Verhoeven // Journal of Research in Reading. – 2019. – Volume 42. Issue 1. – P. 37-57.

  18. Webber B.L. Computational Perspectives on Discourse and Dialog / B.L. Webber // The Handbook of Discourse Analysis. – Oxford, UK: Blackwell Publishing, 2001. – P. 798-816.

  19. Frazier L. Prosodic phrasing is central to language comprehension / L. Frazier, K. Carlson, C. Clifton // Trends in Cognitive Sciences. – 2006. – №10. – P. 244-249.

  20. Stejskal V. Empty Speech Pause Detection in Spontaneous Speech / V. Stejskal, N. Bourbakis, A. Esposito // The Proceedings of the 21st IEEE International Conference on Tools with Artificial Intelligence, Newark, NJ, 2009. – P. 237-242.

  21. Holzgrefe-Lang J. Infants’ Processing of Prosodic Cues: Electrophysiological Evidence for Boundary Perception beyond Pause Detection / J. Holzgrefe-Lang, C. Wellmann, B. Höhle et al. // Language and Speech. – 2018. – №61 (1). – P. 153-169.

  22. Fodor J.D. Psycholinguistics cannot escape prosody / J.D. Fodor // Proceedings of Speech Prosody, Aix-en-Provence, France. – 2002. – P. 83-88.

  23. Drury J.E. Punctuation and Implicit Prosody in Silent Reading: An ERP Study Investigating English Garden-Path Sentences / J.E. Drury, S.R. Baum, H. Valeriote et al. // Frontiers in Psychology. – 2016. – №7. Article 1375.

  24. Christiansen M.H. The Now-or-Never bottleneck: A fundamental constraint on language / M.H. Christiansen, N. Chater // Behavioral and Brain Sciences. – 2016. – 39:e62.

  25. Kilgarriff A. The Sketch Engine: Ten Years On / A. Kilgarriff, V. Baisa, J. Busta et al. // Lexicography. – 2014. – №1. – P. 7-36.

  26. Jary M. Imperatives / M. Jary, M. Kissine – Cambridge: Cambridge University Press, 2014. – 326 p.

  27. Aikhenvald A.Y. Imperatives and Commands / A.Y. Aikhenvald – Oxford: Oxford University Press, 2010. – 520 p.

  28. Jary M. When Terminology Matters: The Imperative as a Comparative Concept / M. Jary, M. Kissine // Linguistics. – 2016. – №54 (1). – P. 119-148.

  29. Mertens P. The Prosogram: Semi-automatic Transcription of Prosody Based on a Tonal Perception Model / P. Mertens // Proceedings of Speech Prosody, Nara 2004. – P. 549-552.

  30. Giavazzi M. Structural priming in sentence comprehension: A single prime is enough / M. Giavazzi, S. Sambin, R. de Diego-Balaguer et al. // PLoS ONE. – 2018. – №13(4). – e0194959.

  31. Prieto P. Prosody: Stress, Rhythm, and Intonation / P. Prieto, P. Roseano // The Cambridge Handbook of Spanish Linguistics. – Cambridge: CUP, 2018. – P. 211-236.

  32. Husband E.M. The role of selection in the comprehension of focus alternatives / E.M. Husband, F. Ferreira // Language, Cognition and Neuroscience. – 2016. – №31:2. – P. 217-235.

  33. Dwivedi V.D. Heuristics in Language Comprehension / V.D. Dwivedi, K.E. Goertz, J. Selvanayagam // Journal of Behavioral and Brain Science. – 2018. – №8. – P. 430-446.