Features of calculating the information entropy of the text in case of attacking the linguistic stegosystem by semantic compression
DOI:
https://doi.org/10.18372/2225-5036.24.12954
Keywords:
linguistic steganography, counteraction the steganography methods, information entropy, semantic compression, semantic compression limit, semantic redundancy, steganalysis, textual steganography, removal of the stegomessage
Abstract
The article improves well-known methods for calculating the entropy of text and describes the peculiarities of calculating the information entropy of text under a semantic compression attack on a linguistic stegosystem, as implemented in the software complex of the same name. The problem of determining the entropy of a natural-language text in the context of subsequent discourse analysis and semantic redundancy removal is formalized. Additional parameters are introduced that help determine the semantic entropy of meaningful and artificially generated text for a semantic compression attack on a linguistic stegosystem whose container is textual information in a natural language (English). The variation of entropy across language styles is substantiated, and its dependence on style is explained by the need to add specialized terminology dictionaries to the general terminology dictionary. In addition to the features of calculating conditional and unconditional entropy when the software complex is used to attack the linguistic stegosystem, the size of the dictionary it uses and the size of its prescribed set of grammar rules are given; these are the additional parameters that determine the entropy calculation in a particular case. The calculation of the maximum entropy for meaningless texts is shown, along with the amount of information carried by a single word or grammatical form at maximum and real entropy. The calculation of the semantic compression limit is also given, and the task of determining semantic information redundancy is formalized. This makes it possible to assess the quality of a compression attack carried out with the software complex. The results can be used in further research to improve the means of conducting the attack, increasing its efficiency by approaching the semantic compression limit as closely as possible.
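The quantities named in the abstract (real entropy, maximum entropy, and redundancy) can be illustrated with the standard Shannon definitions. The sketch below is an illustration only and does not reproduce the article's own formulas: it assumes a simple unconditional word-frequency model, takes the maximum entropy as the base-2 logarithm of the dictionary size, and ignores the grammar-rule parameter. All function names are hypothetical.

```python
import math
from collections import Counter


def word_entropy(text):
    """Unconditional Shannon entropy (bits per word) of the word distribution."""
    words = text.lower().split()
    total = len(words)
    if total == 0:
        return 0.0
    counts = Counter(words)
    # H = -sum(p * log2(p)) over the observed relative word frequencies
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def max_entropy(dictionary_size):
    """Maximum entropy (bits per word) when every dictionary entry is
    equally likely, as in a meaningless random text (an assumption)."""
    return math.log2(dictionary_size)


def redundancy(h_real, h_max):
    """Relative redundancy R = 1 - H_real / H_max."""
    return 1.0 - h_real / h_max


# Usage: a toy text over a hypothetical 4-word dictionary
h = word_entropy("a b a b")
h_max = max_entropy(4)
print(h, h_max, redundancy(h, h_max))
```

Under these assumptions, the closer the entropy of the compressed text comes to the maximum, the less redundancy remains for a stegomessage to occupy.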