Exploratory data analysis and visualization on the example of an e-commerce enterprise

Authors

  • S.R. Suleimanova, National Aviation University

DOI:

https://doi.org/10.18372/2073-4751.79.19385

Keywords:

exploratory data analysis, data visualization, time series, correlation analysis, cluster analysis, machine learning

Abstract

This article presents an approach to exploratory data analysis and visualization, illustrated with data from an e-commerce company. It covers the key stages of exploratory data analysis (data preprocessing, visualization, anomaly detection, correlation analysis, and cluster analysis), which prepare the data for machine learning tasks in future research. These tasks include fitting a time series model; identifying the trend, seasonal, and cyclical components of the time series; clustering customers; classifying new customers; and predicting the quantity of items sold within each customer cluster. The proposed approach can be applied to other e-commerce datasets.
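
Three of the stages named in the abstract (preprocessing, anomaly detection, and correlation analysis) can be illustrated with a minimal sketch. It is not the article's implementation: the field names `quantity` and `unit_price`, the toy order records, the choice to drop incomplete records, and the 1.5 × IQR outlier rule are all assumptions made for illustration, and only the Python standard library is used.

```python
from statistics import mean, quantiles

# Hypothetical order records; field names and values are illustrative only.
orders = [
    {"quantity": 1, "unit_price": 12.0},
    {"quantity": 2, "unit_price": 11.0},
    {"quantity": 2, "unit_price": None},    # missing price
    {"quantity": 2, "unit_price": 11.5},
    {"quantity": 3, "unit_price": 10.0},
    {"quantity": 500, "unit_price": 8.0},   # suspiciously large order
    {"quantity": 3, "unit_price": 10.5},
    {"quantity": 4, "unit_price": 9.0},
    {"quantity": 4, "unit_price": 9.5},
]


def preprocess(rows):
    """Drop records with missing values (one simple preprocessing choice)."""
    return [r for r in rows if all(v is not None for v in r.values())]


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [v for v in values if v < low or v > high]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Stage 1: preprocessing.
clean = preprocess(orders)

# Stage 2: anomaly detection on the ordered quantity.
anomalies = iqr_outliers([r["quantity"] for r in clean])

# Stage 3: correlation between quantity and price, excluding the anomalies.
typical = [r for r in clean if r["quantity"] not in anomalies]
corr = pearson(
    [r["quantity"] for r in typical],
    [r["unit_price"] for r in typical],
)

print(f"kept {len(clean)} records, anomalies: {anomalies}, r = {corr:.2f}")
```

On this toy data the single oversized order is flagged, and quantity and unit price are strongly negatively correlated among the remaining records. On a real dataset, the treatment of missing values and the outlier threshold would have to be chosen from the data rather than fixed in advance.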

Published

2024-11-04

Section

Articles