A method of accelerated fake news recognition based on natural language processing and removal of vowels in words

L.D. Mishchenko; I.A. Klymenko

doi:10.18372/2073-4751.73.17643

Authors

L.D. Mishchenko National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”
I.A. Klymenko National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute” https://orcid.org/0000-0001-5345-8806

DOI:

https://doi.org/10.18372/2073-4751.73.17643

Keywords:

Natural Language Processing technology, fake, manipulation, text analysis, Levenshtein distance, vector word representation

Abstract

News websites are steadily gaining popularity. Such sources of information can exploit the audience's trust to manipulate facts and spread fakes, so protecting readers against them is a major challenge today.

Speed is among the most important properties of any software. Fakes appear every day, yet there are currently no fully automatic fact-checking systems: all checks are performed by journalists or by semi-automated systems that are either limited to narrow tasks or too slow. This article therefore proposes a fact-checking method based on NLP and the Levenshtein distance algorithm. The method accelerates text analysis by performing calculations on a minimal vector representation of words, which is achieved at the NLP lemmatization stage by removing vowels from words.
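As a rough illustration of the idea described above, the following minimal sketch strips vowels from tokens and then compares two statements with the Levenshtein distance. It is not the authors' implementation: the function names, the Latin vowel set, whitespace tokenization, and the rule of keeping vowel-only words unchanged are all assumptions made for this example.

```python
# Hypothetical helpers for illustration only; names and the vowel set are assumptions.
VOWELS = set("aeiouAEIOU")


def strip_vowels(word: str) -> str:
    """Remove vowels from a word; keep the word as-is if nothing would remain."""
    reduced = "".join(ch for ch in word if ch not in VOWELS)
    return reduced or word


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) using two rolling rows."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def reduced_distance(text_a: str, text_b: str) -> int:
    """Compare two statements after lowercasing and vowel removal."""
    tokens_a = [strip_vowels(t) for t in text_a.lower().split()]
    tokens_b = [strip_vowels(t) for t in text_b.lower().split()]
    return levenshtein(" ".join(tokens_a), " ".join(tokens_b))


if __name__ == "__main__":
    print(strip_vowels("president"))
    print(reduced_distance("the president signed the law",
                           "the president vetoed the law"))
```

Because the cost of the dynamic-programming comparison grows with the product of the two string lengths, shorter vowel-free strings mean fewer cell updates, which is one plausible reading of where the speed-up comes from.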

The topic has been studied quite extensively. However, most research focuses on applying NLP technology to natural language analysis in specific areas such as text search, bots, and text markup. Moreover, reducing the vector representation of words to accelerate text analysis and the structuring of its tokens has not been considered.

The main task of the research is to develop an effective fake detection system using Natural Language Processing technology that produces results quickly by reducing the length of words rather than by relying on prior training of the system.

The paper demonstrates that NLP technology can solve the fact-checking task. However, several directions remain for further work, for example using a trainable neural network to detect the most common fakes, or investigating collisions in the vector representation of words without vowels.
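The collision question raised here can be explored with a small experiment: group a vocabulary by its vowel-free form and report the groups that contain more than one distinct word. The sketch below is an assumption-laden illustration (hypothetical function names, Latin vowels only, a toy vocabulary), not a procedure taken from the paper.

```python
from collections import defaultdict

# Assumed vowel set; a Ukrainian vocabulary would need its own set.
VOWELS = set("aeiou")


def strip_vowels(word: str) -> str:
    """Remove vowels from a word; keep the word as-is if nothing would remain."""
    reduced = "".join(ch for ch in word if ch not in VOWELS)
    return reduced or word


def find_collisions(vocabulary):
    """Map each vowel-free form to the distinct words that collapse onto it."""
    groups = defaultdict(set)
    for word in vocabulary:
        lowered = word.lower()
        groups[strip_vowels(lowered)].add(lowered)
    # Only forms shared by two or more different words count as collisions.
    return {form: sorted(words) for form, words in groups.items() if len(words) > 1}


if __name__ == "__main__":
    toy_vocabulary = ["form", "from", "farm", "firm", "news"]
    for form, words in find_collisions(toy_vocabulary).items():
        print(form, "<-", ", ".join(words))
```

On a real corpus, the share of words that fall into such groups would give a first estimate of how much information vowel removal discards.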

Published

2023-04-28

How to Cite

Mishchenko, L., & Klymenko, I. (2023). A method of accelerated fake news recognition based on natural language processing and removal of vowels in words. Problems of Informatization and Management, 1(73), 39–44. https://doi.org/10.18372/2073-4751.73.17643

Issue

Vol. 1 No. 73 (2023)

Section

Articles

License

Authors who publish in this journal agree to the following terms:

Authors retain the right to authorship of their work and grant the journal the right of first publication of the work under the terms of the Creative Commons Attribution License, which allows others to freely distribute the published work with mandatory attribution to the authors of the original work and to its first publication in this journal.

Authors have the right to enter into separate additional agreements for the non-exclusive distribution of the work in the form in which it was published by this journal (for example, placing it in an institutional repository or publishing it as part of a monograph), provided that the reference to its first publication in this journal is retained.

The journal's policy permits and encourages authors to post the manuscript of their work online (for example, in institutional repositories or on personal websites), both before submitting it to the editorial office and during its editorial processing, since this promotes productive scholarly discussion and has a positive effect on how quickly and how actively the published work is cited (see The Effect of Open Access).