Machine learning-based anomaly detection with Isolation Forest in large-scale data analysis

Authors

  • M.O. Kalashnyk

DOI:

https://doi.org/10.18372/2073-4751.83.20511

Keywords:

anomaly detection, Isolation Forest, unsupervised learning, threshold calibration, multivariate analysis, scalable analytics, interpretability

Abstract

This paper presents an applied study of unsupervised anomaly detection with Isolation Forest on large multivariate sensor data. The study implements a concise Python workflow that acquires city‑scale measurements for Kyiv, merges reference metadata, removes invalid records, selects two continuous indicators, and trains an Isolation Forest model with explicitly specified parameters. Temporal analysis shows that anomalies concentrate in contiguous intervals rather than in isolated single points. A two‑feature projection indicates that many flags coincide with jointly high values, while others arise from atypical value combinations, which highlights multivariate effects.
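The workflow summarized above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the study's code: the column names (`station_id`, `pm25`, `pm10`), the injected episode, the validity rule (non-missing, non-negative), and the parameter values are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest

# Synthetic stand-in for city-scale measurements (all columns are hypothetical).
rng = np.random.default_rng(0)
n = 500
readings = pd.DataFrame({
    "station_id": rng.integers(1, 4, size=n),
    "timestamp": pd.Timestamp("2024-01-01") + pd.to_timedelta(np.arange(n), unit="h"),
    "pm25": np.clip(rng.normal(20.0, 5.0, size=n), 1.0, None),
    "pm10": np.clip(rng.normal(35.0, 8.0, size=n), 1.0, None),
})
# Inject a contiguous episode of jointly high values and three invalid records.
readings.loc[200:211, ["pm25", "pm10"]] += 60.0
readings.loc[[10, 50], "pm25"] = np.nan
readings.loc[[70], "pm10"] = -1.0

# Reference metadata keyed by station (hypothetical).
stations = pd.DataFrame({"station_id": [1, 2, 3], "city": ["Kyiv"] * 3})

# Merge metadata, then drop missing or negative measurements.
data = readings.merge(stations, on="station_id", how="left")
features = ["pm25", "pm10"]
clean = data.dropna(subset=features)
clean = clean[(clean[features] >= 0).all(axis=1)].copy()

# Train on the two selected indicators; parameter values are illustrative only.
model = IsolationForest(n_estimators=100, contamination=0.05, random_state=42)
clean["anomaly"] = model.fit_predict(clean[features]) == -1

# Group consecutive flagged rows into intervals to inspect temporal clustering.
clean["run_id"] = (clean["anomaly"] != clean["anomaly"].shift()).cumsum()
intervals = (
    clean[clean["anomaly"]]
    .groupby("run_id")["timestamp"]
    .agg(start="min", end="max", n_points="count")
)
print(intervals.sort_values("n_points", ascending=False).head())
```

Sorting the intervals by `n_points` is one simple way to check whether flags concentrate in contiguous stretches or are scattered as single points.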
The study documents practical advantages of Isolation Forest, including minimal distributional assumptions, direct control of alert volume via the contamination parameter, and near‑linear scaling that supports repeated retraining. It also notes limitations, such as sensitivity on small samples due to random tree construction, dependence on threshold calibration that can drift across datasets, and limited inherent explainability of individual alerts. Configuration guidance, robustness checks, and lightweight diagnostics are provided to support deployment and to maintain stable performance under changing conditions.
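Two of the points above, control of alert volume through `contamination` and sensitivity to random tree construction, can be checked with a short diagnostic. The sketch below is an assumption-laden illustration on synthetic data; the helper names `flagged_fraction` and `seed_overlap` are hypothetical and are not taken from the paper.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic two-feature data with a small shifted group (hypothetical).
rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 2))
X[:40] += 6.0


def flagged_fraction(X, contamination, seed=0):
    """Share of training points labelled anomalous for a given contamination."""
    model = IsolationForest(contamination=contamination, random_state=seed)
    return float(np.mean(model.fit_predict(X) == -1))


def seed_overlap(X, contamination, seeds):
    """Mean pairwise Jaccard overlap of the flagged index sets across seeds."""
    flagged_sets = []
    for seed in seeds:
        model = IsolationForest(contamination=contamination, random_state=seed)
        labels = model.fit_predict(X)
        flagged_sets.append(set(np.flatnonzero(labels == -1)))
    overlaps = [
        len(a & b) / len(a | b)
        for i, a in enumerate(flagged_sets)
        for b in flagged_sets[i + 1:]
    ]
    return float(np.mean(overlaps))


for c in (0.01, 0.02, 0.05):
    print(f"contamination={c:.2f} -> flagged share={flagged_fraction(X, c):.3f}")

print("seed overlap:", round(seed_overlap(X, 0.02, seeds=range(5)), 3))
```

A flagged share that tracks `contamination` closely reflects how the threshold is set on the training scores. A low overlap across seeds would suggest that individual alerts depend on the random trees and that more estimators or a larger sample may be worth trying.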

Published

2025-12-19

Section

Articles