Systems and Means of Informatics
2025, Volume 35, Issue 4, pp 45-59
AN ARCHITECTURE OF A SYSTEM FOR THE SEMANTIC ANALYSIS OF THE SCIENTIFIC AND TECHNICAL DOCUMENTS
- A. V. Kan
- Alexander A. Khoroshilov
- I. A. Chechulin
- S. A. Stupnikov
- Yu. V. Nikitin
- Alexey A. Khoroshilov
Abstract
The paper considers the functional architecture of a system for the semantic analysis of scientific and technical documents. The system belongs to the class of intelligent platforms for aggregating and analyzing such documents. Its core technological solution is a set of declarative tools that can be adapted to various domains of science and technology and that implement the concept of phraseological conceptual analysis of texts. This approach combines the results of graphemic, morphological, conceptual, and semantic-syntactic analysis of texts into a formalized metadata model. The subsystems and modules that make up the architecture are considered, and their functions and interactions are described. One of the most important components of the system is a set of morphological, semantic, syntactic, and conceptual dictionaries that are generated automatically and customized to specific domains of knowledge, which keeps the linguistic resources relevant.
A classification of systems for the semantic analysis of texts and of the problems they solve is also considered. The system under development is positioned within this classification, and its distinctive features are identified.
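As a rough illustration of how the analysis stages named in the abstract could feed a single metadata record, the following minimal sketch chains four toy stages. It is not the authors' implementation: the names `MORPH_DICT`, `CONCEPT_DICT`, `DocumentMetadata`, and `analyze`, the dictionary contents, and the "of"-based relation rule are all assumptions made for illustration only.

```python
import re
from dataclasses import dataclass, field

# Hypothetical declarative resources. In this sketch, adapting the pipeline
# to another domain means replacing these dictionaries, not the code.
MORPH_DICT = {"documents": "document", "texts": "text"}
CONCEPT_DICT = {
    ("semantic", "analysis"): "SEMANTIC_ANALYSIS",
    ("technical", "document"): "TECHNICAL_DOCUMENT",
}


@dataclass
class DocumentMetadata:
    """Toy stand-in for a formalized metadata model."""
    tokens: list = field(default_factory=list)
    lemmas: list = field(default_factory=list)
    concepts: list = field(default_factory=list)   # (position, concept id)
    relations: list = field(default_factory=list)  # (concept, link, concept)


def graphemic(text):
    # Split the text into lowercase alphabetic tokens.
    return re.findall(r"[a-z]+", text.lower())


def morphological(tokens):
    # Map each word form to a lemma; unknown forms are kept unchanged.
    return [MORPH_DICT.get(t, t) for t in tokens]


def conceptual(lemmas):
    # Look up two-word phrases in the concept dictionary.
    found = []
    for i in range(len(lemmas) - 1):
        key = (lemmas[i], lemmas[i + 1])
        if key in CONCEPT_DICT:
            found.append((i, CONCEPT_DICT[key]))
    return found


def semantic_syntactic(lemmas, concepts):
    # Assumed rule: link consecutive concepts when "of" occurs between them.
    relations = []
    for (i, a), (j, b) in zip(concepts, concepts[1:]):
        if "of" in lemmas[i + 2:j]:
            relations.append((a, "of", b))
    return relations


def analyze(text):
    # Run the stages in order and collect their results in one record.
    tokens = graphemic(text)
    lemmas = morphological(tokens)
    concepts = conceptual(lemmas)
    relations = semantic_syntactic(lemmas, concepts)
    return DocumentMetadata(tokens, lemmas, concepts, relations)


if __name__ == "__main__":
    print(analyze("Semantic analysis of technical documents"))
```

Keeping the linguistic knowledge in plain dictionaries rather than in the stage functions mirrors, in a very simplified way, the declarative and domain-adaptable design described in the abstract.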
About this article
Cover Date
2025-12-25
DOI
10.14357/08696527250404
Print ISSN
0869-6527
Publisher
Institute of Informatics Problems, Russian Academy of Sciences
Key words
phraseological conceptual analysis of texts; semantic analysis of texts; scientific and technological text processing
Authors
A. V. Kan, Alexander A. Khoroshilov, I. A. Chechulin, S. A. Stupnikov, Yu. V. Nikitin, and Alexey A. Khoroshilov
Author Affiliations
 All-Russian Institute for Scientific and Technical Information of the Russian Academy of Sciences, 20 Usievicha Str., Moscow 125190, Russian Federation
 Moscow State Aviation Institute (National Research University), 4 Volokolamskoe Shosse, Moscow 125933, Russian Federation
 Federal Research Center "Computer Science and Control", Russian Academy of Sciences, 44-2 Vavilov Str., Moscow 119333, Russian Federation
 National Research Center "Zhukovsky Institute," 1 Zhukovsky Str., Zhukovsky 140180, Russian Federation