A method of accelerated fake news recognition based on natural language processing and removal of vowels in words

Authors

  • L.D. Mishchenko National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”
  • I.A. Klymenko National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute” https://orcid.org/0000-0001-5345-8806

DOI:

https://doi.org/10.18372/2073-4751.73.17643

Keywords:

Natural Language Processing technology, fake, manipulation, text analysis, Levenshtein distance, vector word representation

Abstract

News websites are becoming increasingly popular. Such sources of information can exploit their audience's trust to manipulate facts and spread fake news. Protecting readers against such resources is therefore a major challenge today.

Speed is one of the most important properties of such software. Fakes appear every day, yet no fully automatic fact-checking systems exist today: all checks are performed either by journalists or by semi-automated systems that are limited to narrow tasks or are too slow. This article therefore proposes a fact-checking method based on NLP and the Levenshtein distance. The method also accelerates text analysis by performing calculations on the shortest possible vector representation of words. This is achieved at the NLP lemmatization stage by removing vowels from words.
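The two building blocks named above, vowel removal and the Levenshtein distance, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the vowel alphabet (English plus Ukrainian letters) is an assumption, since the abstract does not state which alphabet or tokenisation the method uses. The distance is the classic dynamic-programming edit distance, whose cost grows with the product of the two word lengths, which is why shorter, vowel-less words are cheaper to compare.

```python
# Minimal sketch: strip vowels from words, then compare them with the
# classic dynamic-programming Levenshtein (edit) distance.
# ASSUMPTION: this vowel set (English + Ukrainian letters) is illustrative;
# the paper does not specify the exact alphabet.
VOWELS = set("aeiouAEIOU" "аеєиіїоуюяАЕЄИІЇОУЮЯ")


def remove_vowels(word: str) -> str:
    """Return the word with every character from VOWELS removed."""
    return "".join(ch for ch in word if ch not in VOWELS)


def levenshtein(a: str, b: str) -> int:
    """Edit distance where insertion, deletion and substitution each cost 1.

    Keeps only two rows of the DP matrix: O(len(a) * len(b)) time and
    O(len(b)) memory, so shorter strings mean fewer cells to compute.
    """
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]
```

For example, `levenshtein("kitten", "sitting")` is 3 and fills a 6 × 7 table, while the vowel-less pair `"kttn"` / `"sttng"` gives 2 and fills only a 4 × 5 table.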

The topic has been studied extensively. However, most research focuses on using NLP technology to analyse natural language in specific areas such as text search, chatbots, text annotation, etc. Moreover, reducing the vector representation of words to accelerate text analysis and structure its tokens has not been considered.

The main goal of the research is to develop an effective fake detection system based on Natural Language Processing technology that produces results quickly by shortening words and does not rely on prior training of the system.
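To illustrate what a training-free comparison on shortened words might look like, the sketch below scores how well a claim is covered by a trusted reference text. Everything here beyond "remove vowels, compare with Levenshtein distance" is a hypothetical choice made for illustration: the regex tokeniser, the normalised similarity `1 - distance / max_length`, and the averaging of best matches in `support_score` are not taken from the paper, and no threshold for labelling a text as fake is implied.

```python
import re

# ASSUMPTION: illustrative vowel set; text is lower-cased before stripping.
VOWELS = set("aeiouаеєиіїоуюя")


def remove_vowels(word: str) -> str:
    """Return the word with every character from VOWELS removed."""
    return "".join(ch for ch in word if ch not in VOWELS)


def levenshtein(a: str, b: str) -> int:
    """Classic two-row dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def normalize(text: str) -> list[str]:
    """Lower-case, split into word tokens, drop vowels, discard empty tokens."""
    tokens = re.findall(r"\w+", text.lower())
    stripped = (remove_vowels(t) for t in tokens)
    return [t for t in stripped if t]


def token_similarity(a: str, b: str) -> float:
    """1.0 for identical tokens, decreasing towards 0.0 as edits accumulate."""
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def support_score(claim: str, reference: str) -> float:
    """HYPOTHETICAL score: mean best-match similarity of claim tokens to the reference."""
    claim_tokens = normalize(claim)
    ref_tokens = normalize(reference)
    if not claim_tokens or not ref_tokens:
        return 0.0
    best = [max(token_similarity(c, r) for r in ref_tokens) for c in claim_tokens]
    return sum(best) / len(best)
```

A claim that only reorders the reference, such as `"The minister resigned on Monday"` against `"On Monday the minister resigned"`, scores 1.0, while a claim that introduces new words (`"The minister was arrested on Monday"`) scores below 1.0. No model is trained at any point; all work is string comparison on the shortened tokens.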

The paper demonstrates that NLP technology can be used for fact checking. Several directions remain for further work, for example, using a trained neural network to detect the most common fakes, or investigating collisions in the vector representation of words without vowels.
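A collision occurs when distinct words become identical once their vowels are removed, so the shortened representation can no longer tell them apart. A simple way to start measuring this is to group a vocabulary by its vowel-less form, as in the sketch below. The vowel set and the `find_collisions` helper are hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict

# ASSUMPTION: illustrative vowel set (English + Ukrainian, lower case).
VOWELS = set("aeiouаеєиіїоуюя")


def remove_vowels(word: str) -> str:
    """Return the word with every character from VOWELS removed."""
    return "".join(ch for ch in word if ch not in VOWELS)


def find_collisions(vocabulary):
    """Map each vowel-less key to the sorted distinct words sharing it.

    Only keys shared by two or more different words are returned.
    """
    groups = defaultdict(set)
    for word in vocabulary:
        w = word.lower()
        groups[remove_vowels(w)].add(w)
    return {key: sorted(words) for key, words in groups.items() if len(words) > 1}
```

For example, the vocabulary `["bat", "bet", "bit", "but", "boat", "cat", "coat", "dog"]` yields `{"bt": ["bat", "bet", "bit", "boat", "but"], "ct": ["cat", "coat"]}`: five unrelated English words collapse onto `"bt"`, which is exactly the kind of ambiguity that further work would need to quantify.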

References

Mishchenko L., Klymenko I. Method for detecting fake news based on natural language processing. The VI International Scientific and Practical Conference “Modern ways of solving the problems in science in the world”, February 13-15, Warsaw, Poland. – P. 375-378. URL: https://eu-conf.com/ua/events/modern-ways-of-solving-the-problems-of-science-in-the-world/.

Zhou X., Zafarani R. A survey of fake news: Fundamental theories, detection methods, and opportunities. ACM Computing Surveys (CSUR). – 2020. – Vol. 53, No. 5. – P. 1-40.

Ruchansky N., Seo S., Liu Y. CSI: A hybrid deep model for fake news detection. Proceedings of the 2017 ACM on Conference on Information and Knowledge Management. – 2017. – P. 797-806.

McCormick Ch. Word2Vec Tutorial – The Skip-Gram Model. – P. 1-5. URL: https://www.fer.unizg.hr/_download/repository/TAR-2020-reading-05.pdf.

Mikolov T., Sutskever I., Chen K., Corrado G., Dean J. Distributed Representations of Words and Phrases and their Compositionality. Advances in Neural Information Processing Systems. – 2013. – Vol. 26. URL: https://proceedings.neurips.cc/paper/2013/file/9aa42b31882ec039965f3c4923ce901b-Paper.pdf.

Convert Word to Vector component. Microsoft documentation. – 2021. URL: https://learn.microsoft.com/en-us/azure/machine-learning/component-reference/convert-word-to-vector.

Published

2023-04-28

Issue

Section

Articles