APPLICATION OF SEMANTIC AND OBJECT SEARCH FOR OPTIMIZING MEDIA SEARCH IN OSINT INVESTIGATIONS
DOI:
https://doi.org/10.18372/2310-5461.66.20246

Keywords:
cybersecurity, OSINT, open source intelligence, media search, video analysis, semantic search, object search, object detection, perceptual hashing, information filtering

Abstract
The modern information space is characterized by rapid growth in the volume of diverse media data, a trend that has become particularly acute since the widespread adoption of large language models and the increase in user activity on social media. This growth creates fundamentally new challenges for professionals in the field of Open Source Intelligence (OSINT), who must promptly process and analyze large arrays of textual, audio, photo, and video information. At the same time, the quality of this data does not always improve in proportion to its quantity, which creates a need for new approaches to effective filtering and retrieval of relevant information under conditions of information overload. One tool that can help in this regard is computer vision, which allows visual content analysis to be automated and key elements in media data to be identified. The application of computer vision methods combined with modern machine learning algorithms opens new possibilities for improving the efficiency of OSINT investigations and therefore warrants further study. The article examines the challenges of searching for and filtering information in the context of Open Source Intelligence amid increasing volumes of data. Existing methods of video frame comparison are analyzed, including pixel-level methods, feature-based comparison, perceptual hashing, object detection algorithms, and semantic embeddings. A combined approach that integrates semantic and object analysis for effective key frame detection in video materials is proposed. A method for evaluating frame importance is developed based on five criteria: the appearance of new object classes, the disappearance of object classes, changes in object quantity, the appearance of priority objects, and detection confidence. The combined score is calculated as a weighted sum of the semantic distance and the object-based score, providing balanced detection of key video moments. This approach yields an adaptive analysis system that provides a deeper understanding of video content for OSINT investigations while effectively filtering out irrelevant data.
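To illustrate the scoring scheme summarized above, the sketch below combines a semantic distance between embeddings of consecutive frames (e.g., CLIP vectors) with an object-based score built from the five listed criteria. This is a minimal sketch under stated assumptions, not the authors' published implementation: the weight ALPHA, the per-criterion weights, and the PRIORITY_CLASSES set are all illustrative values an analyst would tune.

# Hedged sketch of the combined frame-importance score described in the
# abstract. ALPHA, the per-criterion weights, and PRIORITY_CLASSES are
# illustrative assumptions, not values taken from the article.
from collections import Counter

import numpy as np

ALPHA = 0.5                           # weight of the semantic component
PRIORITY_CLASSES = {"person", "car"}  # classes an analyst flags as priority

def semantic_distance(emb_prev, emb_curr):
    """Cosine distance between embeddings of consecutive frames."""
    a, b = np.asarray(emb_prev, float), np.asarray(emb_curr, float)
    cos_sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return 1.0 - cos_sim              # 0 = identical, 2 = opposite

def object_score(dets_prev, dets_curr):
    """Score the current frame by the five criteria from the abstract.

    Each detection is a (class_name, confidence) pair produced by any
    object detector (e.g., YOLO)."""
    prev_counts = Counter(cls for cls, _ in dets_prev)
    curr_counts = Counter(cls for cls, _ in dets_curr)

    new_classes = set(curr_counts) - set(prev_counts)      # criterion 1
    gone_classes = set(prev_counts) - set(curr_counts)     # criterion 2
    count_change = sum(abs(curr_counts[c] - prev_counts[c])
                       for c in set(prev_counts) | set(curr_counts))  # criterion 3
    priority_hits = len(new_classes & PRIORITY_CLASSES)    # criterion 4
    mean_conf = (sum(conf for _, conf in dets_curr) / len(dets_curr)
                 if dets_curr else 0.0)                    # criterion 5

    # Illustrative weighted sum; the article does not publish these weights.
    return (0.30 * len(new_classes) + 0.20 * len(gone_classes)
            + 0.15 * count_change + 0.25 * priority_hits + 0.10 * mean_conf)

def combined_score(emb_prev, emb_curr, dets_prev, dets_curr):
    """Weighted sum of semantic distance and object evaluation."""
    return (ALPHA * semantic_distance(emb_prev, emb_curr)
            + (1.0 - ALPHA) * object_score(dets_prev, dets_curr))

In such a scheme, a frame whose combined score exceeds an analyst-chosen threshold would be retained as a key frame; lowering ALPHA shifts the system toward the object criteria, while raising it emphasizes semantic scene changes.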