LINGUISTIC FEATURES OF RUSSIAN-LANGUAGE TEXTS BY INDIVIDUALS WHO COMMITTED SUICIDE AND INDIVIDUALS AT HIGH RISK OF AUTOAGGRESSIVE BEHAVIOR

Research article
DOI: https://doi.org/10.18454/RULB.12.19
Issue: No. 4 (12), 2017

Abstract

One promising direction in current research is the analysis of speech to identify the mental state and assess the mental health of the speaker or writer. In recent years there has been growing interest in tackling such tasks with the methods and tools of computational linguistics and data mining. A separate scientific problem, far from being solved and clearly requiring the joint efforts of psychologists, linguists and data mining specialists, is that of diagnosing an individual's propensity for autoaggressive behavior (and for suicide as its extreme form) on the basis of linguistic analysis of their speech. The problem is of not only theoretical but also obvious practical importance, and in recent years it has been actively studied by linguists and psychologists. Using natural language processing methods, researchers analyze texts (mostly English-language ones) by suicidal individuals and build models that classify a text as written or not written by a suicidal individual, as well as identify the distinctive features of such texts. Whereas earlier work mostly analyzed fiction by suicidal authors, the most recent studies examine Internet texts (blogs, tweets, social media posts) by individuals who committed suicide or expressed an intention to do so. The Russian language long remained on the periphery of such research. This article presents the results of studies aimed at identifying the linguistic features of Russian-language texts by individuals who completed suicide and by individuals prone to autoaggressive behavior. These studies used methods and techniques of corpus linguistics, computational linguistics and statistical analysis. Directions for further research are outlined.

Introduction

The problem of profiling the authors of written texts has been studied for several decades, but interest in it has recently grown because of the rapid expansion of Internet communication and an increasing need for methods that, based on quantitative analysis of anonymous and pseudonymous online texts, can infer the characteristics (gender, age, education level, native language, psychological traits, etc.) of their authors. Psychologists and linguists have led the way in text-based personality profiling (see [2] for more detail).

Within text-based personality profiling, attempts have been made to diagnose certain mental disorders (depression, schizophrenia, bipolar disorder, etc.) in the authors of written texts [6; 8; 9]. Another problem that calls for the joint effort of psychologists, linguists and data mining experts is detecting suicidal tendencies in individuals from their speech, a problem of both theoretical and practical significance.

Method

More than 800,000 people are reported to die by suicide every year, and only 30% of them have previously stated their suicidal intentions [20]. There is therefore a pressing need for methods that identify individuals with suicidal tendencies so that their suicides can be prevented. Linguistic analysis appears to be one way to tackle this [8]. Recently, much attention has been paid to analyzing suicide-related Internet texts (blogs, tweets, etc.) [7; 17; 18]. However, these studies mostly analyze texts about suicide rather than Internet texts written by individuals who actually committed suicide (the few exceptions are case studies of a single author; see, for example, [14]). It should also be noted that most linguistic studies of suicidal individuals have been performed on English-language texts. Yet, as rightly pointed out in [8], addressing the problem requires data from other languages in order to identify culturally (and linguistically) universal predictors of suicide. It is also essential that such studies be interdisciplinary.

Another approach is to identify individuals at high risk of autoaggressive behavior, whose extreme form is suicidal behavior.

For a long time, Russian texts other than fiction [10; 12] were not analyzed in this context. Russian Internet texts by individuals who committed suicide were first analyzed in [16], and the linguistic features of Russian texts by individuals at high risk of autoaggressive behavior were investigated in [3]. This paper summarizes these findings and outlines directions for future research.

Results and Discussion

1. Linguistic features of Russian fiction texts by individuals who committed suicide

S. W. Stirman and J. W. Pennebaker [22] compared texts by poets who committed suicide, including Russian poets (in English translation) and poets of other nationalities, with texts by poets who did not, regardless of the author's nationality and native language. Overall, the suicidal poets used the first-person pronoun "I" more often and words describing social interaction less often.

Texts by Russian suicidal poets were examined in a dedicated study by Davidson [10], which used the linguistic parameters of S. W. Stirman and J. W. Pennebaker [22] (the article does not describe how the texts were annotated). However, Davidson found that in texts by suicidal poets the proportion of the pronoun "I" and the corresponding object pronouns rose steadily over time rather than remaining consistently high, whereas in texts by the control group it declined. The number of negations (no, not) was also analyzed: their proportion increased over time in texts by suicidal poets and decreased in those by the control group.
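The pronoun and negation measures in [10] and [22] come down to relative frequencies tracked across a poet's texts in chronological order. The Python sketch below shows one way such measures could be computed. The word lists, the regex tokenizer and the use of a least-squares slope as the "trend" are assumptions made for illustration, since [10] does not describe its procedure.

```python
import re

# Hypothetical word lists (not taken from [10] or [22]): Russian forms of the
# first-person singular pronoun ("I" and its object forms) and the
# negations "не" ("not") and "нет" ("no").
FIRST_PERSON = {"я", "меня", "мне", "мной", "мною"}
NEGATIONS = {"не", "нет"}


def tokenize(text: str) -> list[str]:
    """Lowercased word tokens; in Python 3 the \\w class also matches Cyrillic."""
    return re.findall(r"\w+", text.lower())


def proportion(text: str, vocabulary: set[str]) -> float:
    """Share of the tokens in `text` that belong to `vocabulary`."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(token in vocabulary for token in tokens) / len(tokens)


def trend(values: list[float]) -> float:
    """Least-squares slope of `values` against their chronological index.

    A positive slope means the proportion rises from earlier to later texts.
    """
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


# Invented example lines, ordered from earliest to latest.
poems = ["Я иду домой по улице", "Мне не спится, я один"]
shares = [proportion(p, FIRST_PERSON) for p in poems]
print(shares, trend(shares))
```

A slope is only one way to operationalize "steadily on the rise"; with few texts per poet it is sensitive to any single outlying text.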

The authors of [12] built classifiers to distinguish fiction texts by Russian suicidal poets from those of the control group. The classifier based on the full feature set (word n-grams, relative frequencies of parts of speech, punctuation marks, word length, etc.) performed best (F-measure = 0.825). Unfortunately, the differences between texts by suicidal individuals and those of the control group were not analyzed.
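The F-measure reported in [12] combines precision and recall into one score. A minimal sketch of the standard F-beta formula follows, assuming a binary task in which texts by suicidal poets form the positive class; how [12] averaged the score is not stated here, and the counts in the usage line are invented.

```python
def f_measure(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score from confusion-matrix counts for the positive class.

    With beta = 1 this is the F1 score, the harmonic mean of precision
    and recall.
    """
    if tp == 0:
        # No true positives: precision and recall are 0 (or undefined), so F is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Invented counts for a binary "suicidal poet vs. control" test set.
print(f_measure(tp=8, fp=2, fn=2))
```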

2. Linguistic features of Russian online texts by individuals who committed suicide

The study [16] describes a pilot analysis of Internet texts, namely online diaries on the LiveJournal platform, written by individuals who committed suicide (the SUI corpus). Forty-five such diaries were found through manual search and subsequent verification; the SUI corpus contains 196,037 words in total. The comparison (control group) texts were writing samples by students of Russian universities from the RusPersonality corpus [15], totaling 198,045 words (the NON-SUI corpus).

All texts were annotated with the LIWC software [19] using user dictionaries compiled by the authors, for a total of 104 parameters. Statistically significant differences were found between the parameters of texts by suicidal individuals and those of the control group. After a series of feature selection steps, a classifier with an accuracy of 71.5% was built. The approach achieved high classification accuracy considering that the linguistic parameters used were as content-independent as possible (proportion of commas, function words, etc.), which shows how effective natural language processing and data mining methods can be in identifying suicidal tendencies from an individual's texts.
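LIWC-style annotation assigns each word to dictionary categories and reports each category as a share of all words. The sketch below imitates that idea in a few lines. The category names, the tiny dictionary and the comma feature are hypothetical stand-ins, not the 104-parameter user dictionaries of [16], and the function is not the LIWC program itself.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary in the spirit of a LIWC user dictionary.
# A trailing "*" matches any ending, which helps with Russian inflection.
USER_DICTIONARY = {
    "negemo": ["груст*", "боль*", "одинок*"],
    "negation": ["не", "нет", "никогда"],
    "see": ["смотр*", "вид*"],
}


def matches(token: str, entry: str) -> bool:
    """True if `token` matches a dictionary entry (with optional trailing *)."""
    if entry.endswith("*"):
        return token.startswith(entry[:-1])
    return token == entry


def dictionary_profile(text: str, dictionary: dict[str, list[str]]) -> dict[str, float]:
    """Percentage of words in each category, plus commas per 100 words."""
    words = re.findall(r"\w+", text.lower())
    total = len(words)
    counts = Counter()
    for word in words:
        for category, entries in dictionary.items():
            # A word counts once per category but may belong to several categories.
            if any(matches(word, entry) for entry in entries):
                counts[category] += 1
    profile = {cat: 100 * counts[cat] / total if total else 0.0 for cat in dictionary}
    profile["commas"] = 100 * text.count(",") / total if total else 0.0
    return profile


print(dictionary_profile("Мне грустно, не смотрю никуда", USER_DICTIONARY))
```

Prefix wildcards are convenient for an inflected language but can over-match: "боль*" ("pain") also matches "больше" ("more"), so real dictionaries need manual checking.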

The analysis suggests that Russian texts by suicidal individuals contain more function words, verbs, conjunctions, cognitive-process words, commas, comparison words and pronouns, and fewer prepositions. These texts appear more abstract and contain fewer spatial references.
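Statements such as "more function words" or "fewer prepositions" rest on comparing per-text feature values between the SUI and NON-SUI groups. The summary above does not name the significance test used in [16]; as an assumption, the sketch below computes Welch's unequal-variance t statistic for a single feature, using invented values.

```python
import math
from statistics import mean, variance


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    `a` and `b` hold one feature value per text (e.g. the share of
    prepositions) for the two groups; each needs at least two values.
    """
    va = variance(a) / len(a)  # squared standard error of the mean of a
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Invented per-text preposition shares, for illustration only.
sui = [0.08, 0.09, 0.07, 0.08]
non_sui = [0.11, 0.10, 0.12, 0.11]
print(welch_t(sui, non_sui))  # negative t: fewer prepositions in the first group
```

If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same Welch test and also returns a p-value. The function above raises `ZeroDivisionError` when both groups have zero variance.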

Texts by suicidal individuals were also found to contain more negative-emotion words and fewer words describing social relations and perception (particularly visual perception), which suggests that their authors are more preoccupied with their own thoughts and more isolated from the outside world (see [16] for more detail).

Note that existing linguistic studies of suicidal individuals commonly rely on the sociological concept of suicide [11], according to which a suicidal individual is unable to integrate socially and is excluded from society. According to the psychological concept of suicide, a suicidal individual experiences a sense of hopelessness, despair and helplessness together with a range of associated negative emotions [21]. Accordingly, one can expect these texts to contain more words referring to the author and more negative-emotion words.

Existing studies have produced varying, sometimes contradictory, results regarding the linguistic features of texts by suicidal individuals (see [16] for more detail), but they have mostly supported the above theories. Our analysis of Russian texts showed that these theories generally hold for Russian as well.

The authors of [13] hypothesize that suicidal poets perceive the world as unstable, indeterminate and hostile, and that this perception is expressed through ontological and epistemological categories that reflect the poet's inner world. Suicidal poets were found to use fewer words describing motion, space and bodily states (general characteristics of the world); more words for negation and exclusion (relationship with the world); and more words expressing uncertainty, but fewer words related to vision and to perception overall. The authors of [13] suggest that it is the perception of the outside world as hostile and incomprehensible that causes these individuals to shut themselves off from it and, as a result, to become increasingly self-centered and isolated. These results are largely consistent with those obtained for Russian blogs [16].

3. Linguistic features of texts by individuals with autoaggressive tendencies

The authors of [3] investigated whether it is possible to identify personality traits of the authors of written texts that may act as personal determinants of autoaggressive behavior (of which suicidal behavior is one form), drawing on data on the neurobiological basis of individual differences, on the one hand, and on the cerebral mechanisms of discourse production, on the other.

The scientific literature suggests that individuals with suicidal tendencies predominantly use a right-hemispheric mode of solving both verbal and visual-spatial problems, which is associated with left prefrontal dysfunction (see [1; 3] for more detail). At the same time, studies of temporary inactivation of the cerebral hemispheres have shown which brain regions are responsible for producing certain discourse units (e.g., abstract nouns, function words, complex syntactic structures) and which "language functions" can be performed by the right and left hemispheres (see the reviews in [4; 5]). The authors of [3] hypothesized that texts by individuals at high risk of autoaggressive behavior contain more language structures for which the right hemisphere is responsible than texts by people without autoaggressive tendencies, and fewer structures for which the left hemisphere, particularly the left prefrontal cortex, is responsible.

The study material was the RusPersonality text corpus [15]. As shown in [3], texts by individuals at high risk of autoaggressive behavior are generally characterized by lower lexical diversity, fewer prepositions, more pronouns (particularly personal pronouns), a higher index of logical cohesion (due to more conjunctions and deictic units) and a greater average word length. The lower lexical diversity agrees with data on a reduced vocabulary level under right-hemisphere hyperactivity; the lower proportion of prepositions is attributed to insufficient activation of the left-hemisphere zones known to be responsible for most abstract vocabulary; and a higher pronominalisation index is commonly observed when paradigmatic language connections, which rely on the posterior regions of the left hemisphere [5], are weakened. It has also been shown that insufficient activation of these posterior regions is associated with aggressive and suicidal behavior [1].
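Several of the indices in [3] can be approximated with simple counts. The sketch below assumes the type-token ratio as the lexical diversity measure and uses a hypothetical, deliberately incomplete personal-pronoun list; the exact formulas in [3], including the index of logical cohesion, are not given here and are not reproduced.

```python
import re

# Hypothetical and deliberately incomplete list of Russian personal pronoun
# forms; this is not the word list used in [3].
PERSONAL_PRONOUNS = {
    "я", "меня", "мне", "ты", "тебя", "тебе", "он", "она", "оно",
    "мы", "нас", "нам", "вы", "вас", "вам", "они", "их", "им",
}


def text_indices(text: str) -> dict[str, float]:
    """Type-token ratio, mean word length (in characters) and personal-pronoun share."""
    tokens = re.findall(r"\w+", text.lower())
    n = len(tokens)
    if n == 0:
        return {"ttr": 0.0, "mean_word_length": 0.0, "personal_pronouns": 0.0}
    return {
        # Lexical diversity as distinct word forms divided by all word forms.
        "ttr": len(set(tokens)) / n,
        "mean_word_length": sum(len(t) for t in tokens) / n,
        "personal_pronouns": sum(t in PERSONAL_PRONOUNS for t in tokens) / n,
    }


print(text_indices("Я знаю, что я знаю"))
```

The type-token ratio falls as texts get longer, so comparisons of this kind are fairer on samples of similar length.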

Conclusion

According to the World Health Organization, suicide is one of the leading causes of death among young people aged 15–19. Unfortunately, the problem is also acute in Russia, where teenage suicide has drawn considerable attention from the state and the public. In several countries, systems that monitor social networks and redirect users to psychological counseling websites, as well as mobile applications, are being developed for the timely detection and counseling of individuals at high risk of suicidal behavior [9]. Natural language processing methods are essential for building such systems, yet Russian texts have only recently begun to be studied in this context.

As the analysis suggests, texts by individuals who committed suicide and by those with suicidal and, more broadly, autoaggressive tendencies have characteristic linguistic features. The data obtained certainly require further refinement, primarily by expanding the language material studied. One direction for future research is to analyze how the idiolect of individuals who committed suicide changes over time, as well as the linguistic complexity indices of their texts.

References

  • Egorov A. Yu. Features of individual profiles of functional asymmetry in persons who have attempted suicide / A. Yu. Egorov, O. V. Ivanov // Social and Clinical Psychiatry. – 2007. – No. 2 (17). – Pp. 20-24. (In Russian)

  • Litvinova T.A. Establishing the characteristics (profiling) of the author of a written text / T.A. Litvinova // Philological Sciences. Issues of Theory and Practice. – 2012. – No. 2 (13). – Pp. 90-94. (In Russian)

  • Litvinova T.A. Diagnosing the propensity of the author of a written text for autoaggressive behavior / T.A. Litvinova, P.V. Seredin, O.A. Litvinova et al. // Proceedings of Voronezh State University. Series: Linguistics and Intercultural Communication. – 2015. – No. 3. – Pp. 98-104. (In Russian)

  • Petrova T.E. Features of text construction in terms of functional brain asymmetry: Cand. Sc. (Philology) dissertation / T.E. Petrova. – St. Petersburg, 2000. (In Russian)

  • Fotekova T. A. Diagnosing speech disorders in schoolchildren using neuropsychological methods: a manual for speech therapists and psychologists / T. A. Fotekova, T. V. Akhutina. – Moscow: ARKTI, 2002. (In Russian)

  • Baddeley J.L. Email Communications Among People with and Without Major Depressive Disorder: Doctoral diss / J.L. Baddeley. – Austin, TX, 2011.

  • Barak A. Writing characteristics of suicidal people on the Internet: a psychological investigation of emerging social environments / A. Barak, O. Miron // Suicide and Life-Threatening Behavior. – 2005. – Vol. 35, Iss. 5. – Pp. 507-524.

  • Calvo R. Natural language processing in mental health applications using non-clinical texts / R. Calvo, D. Milne, M. Hussain, H. Christensen // Natural Language Engineering. – 2017. – Vol. 23, Iss. 5. – Pp. 649-685.

  • Christensen H. E-Health Interventions for Suicide Prevention / H. Christensen // Int J Environ Res Public Health. – 2014. – Vol. 11, Iss. 8. – Pp. 8193-8212.

  • Davidson Ch. Comparative Psychological Analysis of Six Russian Poets / Ch. Davidson // US-China Foreign Language. – 2013. – Vol. 11, Iss. 1. – Pp. 40-45.

  • Durkheim E. Suicide / E. Durkheim. – New York: Free Press, 1951.

  • Ermakov S. Linguistic Approach to Suicide Detection / S. Ermakov, L. Ermakova // Proceedings of ISP RAS. – 2014. – Vol. 26, Iss. 4. – Pp. 113-121.

  • Katarzyna P. Escaping the World: Linguistic Indicators of Suicide Attempts in Poets / P. Katarzyna, J. Trzebiński // Journal of Loss and Trauma: International Perspectives on Stress & Coping. – 2014. – Vol. 19, Iss. 5. – Pp. 389-402.

  • Li T.M. Temporal and computerized psycholinguistic analysis of the blog / T.M. Li, M. Chau, P.S. Yip, P.W. Wong // Crisis. – 2014. – Vol. 35, Iss. 3. – Pp. 168-175.

  • Litvinova T. “RusPersonality”: A Russian corpus for authorship profiling and deception detection / T. Litvinova, O. Litvinova, O. Zagorovskaya, P. Seredin, A. Sboev, O. Romanchenko // Proceedings of International FRUCT Conference on Intelligence, Social Media and Web (ISMW FRUCT). – St. Petersburg, 2016. – Pp. 1-7.

  • Litvinova T.A. Identification of Suicidal Tendencies of Individuals Based on the Quantitative Analysis of Their Internet Texts / T.A. Litvinova, P.V. Seredin, O.A. Litvinova, O.V. Zagorovskaya // Computación y Sistemas. – 2017. – Vol. 21, Iss. 2. – Pp. 243-252.

  • Masuda N. Suicide ideation of individuals in online social networks / N. Masuda, I. Kurahashi, H. Onari // PLoS ONE. – 2013. – Vol. 8, Iss. 4. – P. e62262.

  • O’Dea B. Detecting suicidality on Twitter / B. O’Dea, S. Wan, P. J. Batterham, A. L. Calear, C. Paris, H. Christensen // Internet Interventions. – 2015. – Vol. 2, Iss. 2. – Pp. 183–188.

  • Pennebaker J. W. The development and psychometric properties of LIWC2007 / J. W. Pennebaker. – Austin, TX: LIWC.net, 2007.

  • Pestian J. Suicide Note Classification Using Natural Language Processing: A Content Analysis / J. Pestian, H. Nasrallah, P. Matykiewicz, A. Bennett, A. Leenaars // Biomed Inform Insights. – 2010. – Iss. 3. – Pp. 19-28.

  • Petrie K. Sense of coherence, self-esteem, depression and hopelessness as correlates of reattempting suicide / K. Petrie, R. Brook // British Journal of Clinical Psychology. – 1992. – Iss. 31. – Pp. 293-300.

  • Stirman S. Word use in the poetry of suicidal and nonsuicidal poets / S. Stirman, J. Pennebaker // Psychosomatic Medicine. – 2001. – Iss. 63. – Pp. 517-522.