Overfitting in Machine Learning

M.V. Struk; Yu.B. Modenov

doi:10.18372/2073-4751.78.18968


Authors

M.V. Struk, National Aviation University
Yu.B. Modenov, National Aviation University, ORCID: https://orcid.org/0000-0003-3898-4159


Keywords:

overfitting, regularization (dropout, L1, L2), bias-variance tradeoff, polynomial regression, VC dimension

Abstract

Overfitting is a pressing problem in machine learning, and addressing it is essential for accurate and reliable predictions on real data. This article examines overfitting from a mathematical perspective. It opens with a general overview of the problem and its significance for scientific and practical tasks such as pattern recognition, forecasting, and diagnostics. It then defines key concepts, namely model complexity, sample size, bias, and variance, explains how they are related through the bias-variance tradeoff, and shows how sample size affects the learning process. To demonstrate these concepts, the authors develop Python code that uses polynomial regression as the model under analysis: by generating synthetic data and fitting models of different complexity to it, the code illustrates overfitting and its impact on prediction accuracy. The concluding remarks emphasize that understanding the mathematical aspects of overfitting is important for building reliable and effective machine learning models. A review of recent research and publications in the field covers various approaches to the problem, including regularization, ensemble methods, and new neural network architectures. Open questions, such as finding the optimal balance between model complexity and generalization ability, are highlighted for further investigation. The ultimate goal of the article is to identify the key aspects of the overfitting problem and to formulate goals for further research in this area.
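The experiment described in the abstract can be approximated with a short NumPy script. This is a minimal sketch, not the authors' code: the target function sin(2πx), the noise level (sd 0.2), the sample sizes (15 training points, 500 test points, 200 for the larger training set), the degrees 1, 3 and 12, and the ridge strength `l2=1e-3` are all assumptions chosen for illustration. The helper names `make_data`, `design`, `fit` and `mse` are hypothetical. The L2 (ridge) penalty is one of the regularization methods the abstract mentions; for simplicity it is applied to every coefficient, including the intercept.

```python
import numpy as np

# Assumed setup for illustration only: target sin(2*pi*x), Gaussian noise with
# sd 0.2, inputs drawn uniformly from [0, 1]. None of these values come from the article.
NOISE_SD = 0.2


def make_data(n, rng):
    """Synthetic 1-D regression data: y = sin(2*pi*x) + noise."""
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, NOISE_SD, n)
    return x, y


def design(x, degree):
    """Polynomial feature matrix with columns 1, x, x^2, ..., x^degree."""
    return np.vander(x, degree + 1, increasing=True)


def fit(x, y, degree, l2=0.0):
    """Least-squares polynomial fit; l2 > 0 adds a ridge (L2) penalty on all coefficients."""
    X = design(x, degree)
    if l2 > 0:
        # Ridge as augmented least squares: minimise ||Xw - y||^2 + l2 * ||w||^2
        X = np.vstack([X, np.sqrt(l2) * np.eye(degree + 1)])
        y = np.concatenate([y, np.zeros(degree + 1)])
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w


def mse(w, x, y):
    """Mean squared error of the polynomial with coefficients w on (x, y)."""
    pred = design(x, len(w) - 1) @ w
    return float(np.mean((pred - y) ** 2))


rng = np.random.default_rng(0)
x_train, y_train = make_data(15, rng)   # small training set
x_test, y_test = make_data(500, rng)    # large held-out set approximates the true error

# Increasing model complexity: training error falls, test error eventually rises
results = {}
for degree in (1, 3, 12):
    w = fit(x_train, y_train, degree)
    results[degree] = (mse(w, x_train, y_train), mse(w, x_test, y_test))
    print(f"degree {degree:2d}: train MSE {results[degree][0]:.4f}, "
          f"test MSE {results[degree][1]:.4f}")

# Same degree-12 model with an L2 penalty that shrinks the coefficients
w_plain = fit(x_train, y_train, 12)
w_ridge = fit(x_train, y_train, 12, l2=1e-3)
print(f"degree 12 + ridge: test MSE {mse(w_ridge, x_test, y_test):.4f}")

# Same degree-12 model trained on more data (effect of sample size)
x_big, y_big = make_data(200, rng)
w_big = fit(x_big, y_big, 12)
print(f"degree 12, n=200: test MSE {mse(w_big, x_test, y_test):.4f}")
```

Because the polynomial models are nested, training error can only decrease as the degree grows, while the degree-12 model fitted to 15 points typically shows a large gap between training and test error: the variance side of the bias-variance tradeoff, whereas the straight line (degree 1) underfits because of high bias. The last two runs show the two remedies the sketch exercises, shrinking the coefficients with an L2 penalty and enlarging the training sample; exact numbers depend on the assumed seed and noise level.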

References

What is overfitting? URL: https://www.ibm.com/topics/overfitting?source=post_page-----09af234e9ce4-------------------------------- (accessed: 21.04.2024).

Fang C. et al. 4 – The Overfitting Iceberg. URL: https://blog.ml.cmu.edu/2020/08/31/4-overfitting/ (accessed: 26.04.2024).

Dijkinga F. J. Explaining L1 and L2 regularization in machine learning. URL: https://medium.com/@fernando.dijkinga/explaining-l1-and-l2-regularization-in-machine-learning-2356ee91c8e3 (accessed: 26.04.2024).

Oppermann A. Regularization in Deep Learning – L1, L2, and Dropout. URL: https://towardsdatascience.com/regularization-in-deep-learning-l1-l2-and-dropout-377e75acc036 (accessed: 25.04.2024).

Vignesh Sh. The Perfect Fit for a DNN. URL: https://medium.com/analytics-vidhya/the-perfect-fit-for-a-dnn-596954c9ea39 (accessed: 26.04.2024).

Downloads

PDF (Ukrainian)

Published

2024-07-01

Issue

Vol. 2 No. 78 (2024)

Section

Articles

License

Authors who publish in this journal agree to the following terms:

Authors retain authorship of their work and grant the journal the right of first publication under the terms of the Creative Commons Attribution License, which allows others to freely distribute the published work provided that they credit the authors of the original work and its first publication in this journal.

Authors may enter into separate, additional agreements for non-exclusive distribution of the work in the form in which it was published by this journal (for example, posting it in an institutional repository or publishing it as part of a monograph), provided that a reference to its first publication in this journal is retained.

The journal's policy allows and encourages authors to post the manuscript online (for example, in institutional repositories or on personal websites), both before it is submitted to the editorial office and during the editorial process, as this fosters productive scholarly discussion and has a positive effect on how quickly and how often the published work is cited (see The Effect of Open Access).