2313-0288

2411-2968

Russian Linguistic Bulletin

Cifra LLC

10.60797/RULB.2026.73.14

Brief communication

LLMs and the Domain of Machine Translation

Kostinikova

Olga Alekseevna

olk2004@mail.ru 1

1 The Russian Presidential Academy of National Economy and Public Administration

16 01 2026

2026

3 73 1 3 19 11 2025 30 12 2025

2022

This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CC-BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. See http://creativecommons.org/licenses/by/4.0/.

LLMs have become powerful tools for synthetic data creation, automatic corpus expansion, and structure-aware prompting, all of which have unlocked progress in areas previously constrained by data scarcity. From augmenting speech-to-speech translation resources to generating domain-specific evaluation corpora, LLMs now act as both the objects of study and the instruments that enable the field to advance. This review of articles published in November 2025 shows a clear meta-trend across speech, translation, and multimodal generation: work on low-resource, domain-intense, and structurally complex tasks that relies on synthetic data, linguistically informed representations, and human-aligned evaluation metrics is now the main driver of progress in machine translation.

LLMs, automatic speech recognition, machine translation

1. Introduction

With the advent of large language models (LLMs), the field of translation has entered a period of accelerated development, marked by new levels of fluency, robustness, and cross-lingual generalization. Yet the integration of LLMs into translation and speech systems is far from straightforward. Despite their impressive capabilities, LLMs must cope with data imbalance, complex morphology, diverse dialects, multimodal constraints, and discourse-level coherence, factors that traditional pipelines often handled through specialized modules and handcrafted linguistic knowledge [1], [5], [8], [10].

In translation, LLMs have demonstrated remarkable gains on general-domain text, but their performance declines in expert-level, culture-heavy, or long-document scenarios. These limitations have motivated the creation of new evaluation frameworks and benchmarks that can capture discourse structure, terminology consistency, and stylistic fidelity, dimensions essential for real-world deployment. At the same time, LLMs have proven invaluable for synthetic parallel data generation, enabling low-resource machine translation (MT) and speech-to-speech translation (S2ST) systems to reach a quality that human-curated datasets alone could not support. The emergence of LLM-centred translation thus marks a turning point in the field. The integration of generative models with traditional signal-processing and linguistic components is enabling hybrid systems that overcome existing limitations. As research pushes into more complex, low-resource, and domain-intense scenarios, LLMs are poised not only to enhance translation but also to redefine the scientific and engineering principles that underpin it [2], [3], [6], [9]. This review examines three recent studies, published in November 2025, that illustrate these developments.

2. Crossing Borders: A Multimodal Challenge for Indian Poetry Translation and Image Generation

This article

[6]

In the second step, TAI uses a semantic graph–based prompt generator to prepare text descriptions for latent diffusion models. These graphs map dependencies, metaphors, and symbolic relations within the poem, enabling the image generation system to produce visuals that reflect conceptual meaning rather than shallow keyword associations. The authors show an example where a Punjabi poem describing a farming scene is translated into English and visualized with culturally accurate elements such as landscape, attire, and mood.
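As a rough illustration of how a semantic graph might be turned into a text prompt for an image generator, the sketch below linearizes a small set of relation triples. It is not the authors' implementation: the `Triple` type, the relation labels, the `TEMPLATES` table, the `graph_to_prompt` function, and the example poem content are all hypothetical and were chosen only to make the idea concrete.

```python
from typing import Dict, List, Tuple

# A graph edge as (head, relation, tail). Hypothetical representation.
Triple = Tuple[str, str, str]

# Hypothetical relation labels and the phrasing used for each in the prompt.
TEMPLATES: Dict[str, str] = {
    "dependency": "{head} {tail}",
    "metaphor": "{head} depicted as {tail}",
    "symbol": "{head} symbolising {tail}",
}


def graph_to_prompt(triples: List[Triple], setting: str) -> str:
    """Linearize graph edges into one prompt string, prefixed by the setting."""
    phrases = []
    for head, relation, tail in triples:
        template = TEMPLATES.get(relation)
        if template is None:
            raise ValueError(f"unknown relation: {relation}")
        phrases.append(template.format(head=head, tail=tail))
    return f"{setting}: " + ", ".join(phrases)


# Invented example edges for a farming scene (not taken from the paper).
triples = [
    ("farmer", "dependency", "ploughing a wheat field"),
    ("harvest", "metaphor", "a golden sea"),
    ("turban", "symbol", "dignity"),
]
prompt = graph_to_prompt(triples, "Punjabi village at dawn")
print(prompt)
```

Keeping metaphorical and symbolic relations as explicit edges means the prompt states what an image stands for, not only which objects appear in it.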

A major contribution of the work is the introduction of MorphoVerse, a new dataset of 1,570 poems across 21 low-resource Indian languages, each with rich morphological variation. This dataset helps address the lack of training resources for Indic literary NLP and supports broader research on cross-cultural poetic understanding. The authors argue that existing LLMs struggle with such texts because morphological richness and metaphorical density exceed the patterns seen in mainstream training corpora. They observe that raw LLM translations often omit cultural nuance or misinterpret figurative expressions, motivating the need for alignment mechanisms.

The paper also emphasizes the importance of multimodal comprehension for global accessibility of literary heritage. By linking translation and visual generation, TAI provides an interpretive bridge for readers unfamiliar with the cultural or linguistic context of Indian poetry. Experimental evaluation, both automated and human, shows that the proposed TAI Diffusion approach outperforms strong baselines in producing accurate, meaningful images aligned with poetic content. The authors conclude that multimodal methods, enriched with structural linguistic knowledge, offer a promising direction for revitalizing interest in low-resource literary traditions and making them accessible to a worldwide audience.

3. Improving Direct Persian-English Speech-to-Speech Translation with Discrete Units and Synthetic Parallel Data

This article

[9]

A major contribution is the construction of a synthetic Persian–English parallel corpus, created by translating Persian transcripts using GPT-4o and synthesizing English audio with VoiceCraft TTS. This synthetic corpus expands existing parallel data by a factor of six and significantly improves training effectiveness.
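The two-stage construction described above (machine translation of the transcripts, then speech synthesis of the translations) can be sketched as a small pipeline. This is only an outline under stated assumptions: `S2STPair`, `build_synthetic_pairs`, and the stub functions are hypothetical names, and the translation and synthesis steps are passed in as plain callables, so no real GPT-4o or VoiceCraft API is shown or implied.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class S2STPair:
    """One synthetic training example (hypothetical record layout)."""
    source_audio: str   # path to the original Persian audio
    source_text: str    # Persian transcript
    target_text: str    # machine-translated English text
    target_audio: str   # path to the synthesized English audio


def build_synthetic_pairs(
    items: Iterable[Tuple[str, str]],
    translate: Callable[[str], str],
    synthesize: Callable[[str], str],
) -> List[S2STPair]:
    """Translate each transcript, synthesize the result, and collect the pairs."""
    pairs: List[S2STPair] = []
    for audio_path, transcript in items:
        english = translate(transcript).strip()
        if not english:
            continue  # drop items whose translation came back empty
        pairs.append(S2STPair(audio_path, transcript, english, synthesize(english)))
    return pairs


# Stand-ins for the LLM translator and the TTS system (assumptions, not real APIs).
def stub_translate(text: str) -> str:
    return text.strip().upper()


def stub_synthesize(text: str) -> str:
    return f"tts/{len(text)}.wav"


items = [("fa/001.wav", "salam"), ("fa/002.wav", "   ")]
pairs = build_synthetic_pairs(items, stub_translate, stub_synthesize)
print(pairs)
```

Passing the two models in as callables keeps the pipeline independent of any particular translator or TTS system, which matters if the approach is to be reused for other under-resourced language pairs.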

Qualitative analyses show that synthetic data increases fluency and reduces omissions, particularly in long or rare constructions. The paper also states that discrete units help stabilize training and disentangle content from prosody. The authors argue that their pipeline is especially suitable for dubbing applications where speaker identity preservation and low latency matter. Their results demonstrate that combining self-supervised pretraining, discrete units, and synthetic corpora is a powerful strategy for low-resource S2ST.

They conclude that this framework can generalize to other under-resourced languages and plan future work involving prosody-aware unit modelling, multilingual training, and cross-lingual pretraining. Overall, this study provides both a new dataset and a strong model that advance the state of direct S2ST for Persian–English translation.

4. Non-Linear Scoring Models for Translation Quality Evaluation

This article

[3]

The authors then present a mathematically simple, two-parameter logarithmic scoring model which captures actual tolerance patterns. Figures in the paper show that logarithmic curves closely follow expert judgments, while linear curves significantly diverge, especially for long texts. Importantly, the authors emphasize that for very small samples (<250 words), no deterministic curve is reliable, and statistical quality-control methods must be used instead.
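To make the contrast between linear and logarithmic tolerance concrete, the sketch below fits a two-parameter curve through two calibration points and compares it with a linear extrapolation. The functional form E(x) = a * ln(1 + b * x) is an assumption made here for illustration, since this review does not reproduce the paper's formula. The calibration points and all function names are likewise invented. Only the 250-word threshold comes from the text above.

```python
import math

# Below this size the text above says no deterministic curve is reliable.
MIN_RELIABLE_WORDS = 250


def calibrate_log_tolerance(x1, e1, x2, e2, iters=200):
    """Fit E(x) = a * ln(1 + b * x) through two (word count, tolerance) points."""
    if not (0 < x1 < x2 and 0 < e1 < e2):
        raise ValueError("need 0 < x1 < x2 and 0 < e1 < e2")
    target = e1 / e2

    def ratio(b):
        # E(x1) / E(x2) depends only on b and increases with b.
        return math.log1p(b * x1) / math.log1p(b * x2)

    lo, hi = 1e-12, 1e12
    if not (ratio(lo) < target < ratio(hi)):
        raise ValueError("points must grow sub-linearly and lie in the search range")
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # bisection on a logarithmic scale
        if ratio(mid) < target:
            lo = mid
        else:
            hi = mid
    b = math.sqrt(lo * hi)
    a = e1 / math.log1p(b * x1)
    return a, b


def log_tolerance(words, a, b):
    """Tolerated penalty for a sample of the given size under the fitted curve."""
    if words < MIN_RELIABLE_WORDS:
        raise ValueError("sample too small for a deterministic curve")
    return a * math.log1p(b * words)


def linear_tolerance(words, ref_words, ref_tolerance):
    """Tolerance scaled proportionally from a single reference point."""
    return ref_tolerance * words / ref_words


# Invented calibration points: 5 penalty points at 1000 words, 12 at 5000 words.
a, b = calibrate_log_tolerance(1000, 5.0, 5000, 12.0)
log_5000 = log_tolerance(5000, a, b)
lin_5000 = linear_tolerance(5000, 1000, 5.0)
print(f"logarithmic: {log_5000:.2f}, linear: {lin_5000:.2f}")
```

With these invented points, the linear rule allows more than twice the penalty of the fitted curve at 5000 words, which is the kind of divergence on long texts that the paragraph above describes.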

They argue that implementing this model will bring translation-quality evaluation closer to real-world perception and reduce unfair pass/fail outcomes. They also propose integrating this tolerance function into AI-evaluation systems, since LLM outputs suffer from similar short- vs. long-text biases. In their view, non-linear calibration is essential for fair, human-aligned translation-quality evaluation.

5. Conclusion

This collection of research papers reveals a coherent picture of modern computational linguistics: driven by LLMs, the field has moved decisively toward low-resource languages, specialized domains, and complex multimodal tasks that go far beyond traditional translation or automatic speech recognition (ASR). A recurring theme is that data scarcity, not modeling limitations, is the primary barrier, and the papers propose novel strategies to overcome it.

First, the papers attack low-resource problems through synthetic data generation or major dataset expansion. The Persian–English S2ST work expands the available parallel data sixfold using LLM translation and TTS synthesis, showing that synthetic data substantially boosts BLEU. The poetry paper introduces MorphoVerse, the first sizeable dataset for 21 Indic languages. Together, these results suggest that data augmentation using LLMs, TTS, or expert curation consistently yields the largest jumps in performance.

Second, multiple papers demonstrate the importance of representation learning tailored to each linguistic challenge. Discrete speech units simplify S2ST in resource-scarce conditions. Semantic graph prompting captures metaphorical structure in poetry, bridging a major gap in literary translation.

Third, evaluation itself emerges as a central research frontier. The scoring-model paper argues that the translation industry misjudges quality because it relies on inappropriate linear scoring models, and it proposes a perception-aligned logarithmic alternative. Together, these papers show that evaluation is not a solved problem and that future benchmarks must incorporate discourse, domain expertise, and human-like tolerance curves.

1. Berdejo-Espinola V. AI tools can improve equity in science / V. Berdejo-Espinola, T. Amano // Science. — 2023. — Vol. 379 (6636). — P. 991. — DOI: 10.1126/science.adg9714.

2. Dubey P. The Hindi to Dogri machine translation system: grammatical perspective / P. Dubey // International Journal of Information Technology. — 2019. — Vol. 11 (1). — P. 171–182. — DOI: 10.1007/s41870-018-0085-4.

3. Gladkoff S. Non-Linear Scoring Models for Translation Quality Evaluation / S. Gladkoff, L. Han, K. Gasova // arXiv. — 2025. — DOI: 10.48550/arXiv.2511.13467.

4. Gray A. ChatGPT “contamination”: estimating the prevalence of LLMs in the scholarly literature / A. Gray // arXiv. — 2024. — DOI: 10.48550/arXiv.2403.16887.

5. Hu E.J. LoRA: Low-Rank Adaptation of Large Language Models / E.J. Hu, Y. Shen, P. Wallis [et al.] // arXiv. — 2021. — DOI: 10.48550/arXiv.2106.09685.

6. Jamil S. Crossing Borders: A Multimodal Challenge for Indian Poetry Translation and Image Generation / S. Jamil, K.S. Charan, S. Saha [et al.] // arXiv. — 2025. — DOI: 10.48550/arXiv.2511.13689.

7. Litvinova T.A. Writing in the era of large language models: a bibliometric analysis of research field / T.A. Litvinova, G.K. Mikros, O.V. Dekhnich // Research Result. Theoretical and Applied Linguistics. — 2024. — Vol. 10. — № 4. — P. 5–16. — DOI: 10.18413/2313-8912-2024-10-4-0-1.

8. Navigli R. Biases in large language models: origins, inventory, and discussion / R. Navigli, S. Conia, B. Ross // ACM Journal of Data and Information Quality. — 2023. — Vol. 15 (2). — P. 1–21. — DOI: 10.1145/3597307.

9. Rashidi S. Improving Direct Persian-English Speech-to-Speech Translation with Discrete Units and Synthetic Parallel Data / S. Rashidi, H. Sameti // arXiv. — 2025. — DOI: 10.48550/arXiv.2511.12690.

10. Wang P. Large language models are not fair evaluators / P. Wang, L. Li, L. Chen [et al.] // arXiv. — 2023. — DOI: 10.48550/arXiv.2305.17926.