Systems and Means of Informatics
2025, Volume 35, Issue 2, pp 103-115
METHOD OF AUTOMATED DETECTION OF PUNCTUATION ASYMMETRY IN PARALLEL TEXTS
- S. D. Ignatova
- A. A. Goncharov
- N. V. Buntman
Abstract
The paper explores a method for the automated detection of interlingual punctuation asymmetry in parallel texts. Analyzing how punctuation marks function requires a large body of empirical data, which motivates the use of parallel text corpora. The study outlines how search with exclusion in a parallel text database can automate the detection of punctuation asymmetry between two languages. A search with exclusion identifies pairs of text fragments that contain certain language units in one language but contain no units from a defined set in the other language. The feasibility of automated detection was tested on the use of the exclamation mark in Russian and French. In the course of the study, seven types of language substitutions were identified and quantitatively analyzed.
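The search with exclusion described in the abstract can be illustrated with a minimal sketch. It assumes the parallel corpus is already aligned into fragment pairs and treats a language unit as a plain substring. The `AlignedPair` structure, the `search_with_exclusion` function, and the sample sentences are hypothetical illustrations, not the authors' database or query interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedPair:
    """One aligned fragment pair from a parallel corpus (hypothetical structure)."""
    source: str  # e.g. the Russian fragment
    target: str  # e.g. the French translation


def search_with_exclusion(pairs, include_units, exclude_units):
    """Return pairs whose source contains at least one unit from include_units
    while the target contains no unit from exclude_units."""
    hits = []
    for pair in pairs:
        source_matches = any(unit in pair.source for unit in include_units)
        target_matches = any(unit in pair.target for unit in exclude_units)
        if source_matches and not target_matches:
            hits.append(pair)
    return hits


# Invented toy data, not drawn from the authors' corpus.
pairs = [
    AlignedPair("Как хорошо!", "Comme c'est bien !"),
    AlignedPair("Уходи!", "Va-t'en."),
    AlignedPair("Он ушёл.", "Il est parti."),
]

# Fragments with an exclamation mark in the source but none in the target.
asymmetric = search_with_exclusion(pairs, include_units=["!"], exclude_units=["!"])
for pair in asymmetric:
    print(pair.source, "->", pair.target)
```

In this toy run only the second pair is returned, because the exclamation mark in the source has no counterpart in the target. Such candidate pairs would still need manual inspection to classify the kind of substitution involved.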
About this article
Cover Date
2025-05-20
DOI
10.14357/08696527250207
Print ISSN
0869-6527
Publisher
Institute of Informatics Problems, Russian Academy of Sciences
Key words
punctuation; interlingual asymmetry; parallel texts; search with exclusion; databases
Authors
S. D. Ignatova, A. A. Goncharov, and N. V. Buntman
Author Affiliations
Federal Research Center "Computer Science and Control", Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
M.V. Lomonosov Moscow State University, 1-52 Leninskie Gory, GSP-1, Moscow 119991, Russian Federation