Filter for confidential information

Authors

  • Oleksii Bezymiannyi National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”
  • Nataliia Shapoval National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute” https://orcid.org/0000-0002-8509-6886

DOI:

https://doi.org/10.18372/1990-5548.78.18256

Keywords:

large language models, confidential information filter, word embedding, prompt injection, jailbreaking, NLP model, SBERT

Abstract

The research is conducted on the topic of preventing various types of attacks on large language models (LLM), as well as preventing the leakage of confidential data when working with local text databases. Research is performed by implementing a filter and testing it on an example that aims to filter requests to the model. The proposed filter does not block the request to the LLM, but removes its parts, which is much faster and makes it impossible for an attacker to pick up a request, as it destroys its structure. The filter uses word embedding to evaluate the request to the LLM, which together with the use of a hash table for forbidden topics, speeds up the operation of the filter. To protect against attacks such as prompt injection and prompt leaking attack, the filter uses the method of randomly closing the sequence. During the testing process, significant improvements were obtained in maintaining the security of data used by LLM. Currently, the use of such filters in product projects and startups is an extremely important step, but there is a lack of ready-made implementations of filters with similar properties. The uniqueness of the filter lies in its independence from LLM and the use of semantic similarity as a fine-tuned way of classifying queries.

Author Biographies

Oleksii Bezymiannyi , National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”

The research is conducted on the topic of preventing various types of attacks on large language models, as well as preventing the leakage of confidential data when working with local text databases. Research is performed by implementing a filter and testing it on an example that aims to filter requests to the model. The proposed filter does not block the request to the large language models, but removes its parts, which is much faster and makes it impossible for an attacker to pick up a request, as it destroys its structure. The filter uses word embedding to evaluate the request to the large language models, which together with the use of a hash table for forbidden topics, speeds up the operation of the filter. To protect against attacks such as prompt injection and prompt leaking attack, the filter uses the method of randomly closing the sequence. During the testing process, significant improvements were obtained in maintaining the security of data used by large language models. Currently, the use of such filters in product projects and startups is an extremely important step, but there is a lack of ready-made implementations of filters with similar properties. The uniqueness of the filter lies in its independence from large language models and the use of semantic similarity as a fine-tuned way of classifying queries.

Nataliia Shapoval, National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”

Candidate of Science (Engineering)

Associate Professor

References

ChatGPT Question Filter. [Electronic resource]. URL:https://github.com/derwiki/llm-prompt-injection-filtering (accessed 30.09.23).

KANG, Daniel, et al. Exploiting programmatic behavior of llms: Dual-use through standard security attacks. arXiv preprint arXiv:2302.05733, 2023.

NI, Jianmo, et al. Sentence-t5: Scalable sentence encoders from pre-trained text-to-text models. arXiv preprint arXiv:2108.08877, 2021. https://doi.org/10.18653/v1/2022.findings-acl.146

Using GPT-Eliezer against ChatGPT Jailbreaking. [Electronic resource]. URL:https://www.alignmentforum.org/posts/pNcFYZnPdXyL2RfgA/using-gpt-eliezer-against-chatgpt-jailbreaking (accessed 30.09.23).

“Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks,” Nils Reimers, Iryna Gurevych, 2019, arXiv:1908.10084.

Downloads

Published

2023-12-27

Issue

Section

COMPUTER SCIENCES AND INFORMATION TECHNOLOGIES