GENERATION OF SYNTHETICAL MEDICAL DATA BY MDR-ANALYSIS

Authors

  • Kateryna Sazonova National Technical University of Ukraine «Igor Sikorsky Kyiv Polytechnic Institute»
  • Olena Nosovets National Technical University of Ukraine «Igor Sikorsky Kyiv Polytechnic Institute»
  • Vitalii Babenko National Technical University of Ukraine «Igor Sikorsky Kyiv Polytechnic Institute»
  • Olga Averianova National Technical University of Ukraine «Igor Sikorsky Kyiv Polytechnic Institute»

DOI:

https://doi.org/10.18372/2306-1472.87.15719

Keywords:

data generation, synthetic data, entropy, correlation, communication direction, MDR-analysis

Abstract

Purpose: This article outlines an algorithm for generating synthetic medical data in order to augment small data samples. Methods: Three methods were used: correlation analysis (to identify significant variables and the relationships between them), MDR analysis (to build logical chains of relationships between medical data variables), and regression analysis (to model the variables so that the models can be used to generate synthetic data). Results: The developed algorithm was tested on a publicly available database of heart failure patients; statistical relationships between the variables were found and used to build linear regression models. Discussion: The proposed algorithm generates medical data in a few simple but important steps, which makes it possible to obtain large data sets for applying machine learning methods to tasks related to medicine.
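The correlation and regression steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the MDR-analysis step is omitted because the abstract does not specify it, and the function names, the correlation threshold of 0.5, the normal sampling of the predictor, and the toy data are all assumptions made for illustration only.

```python
import numpy as np


def fit_linear(x, y):
    """Least-squares fit of y ~ intercept + slope * x.

    Returns (intercept, slope, residual_std).
    """
    slope, intercept = np.polyfit(x, y, 1)  # coefficients, highest degree first
    residuals = y - (intercept + slope * x)
    return intercept, slope, residuals.std(ddof=2)


def generate_synthetic(x, y, n_new, threshold=0.5, seed=0):
    """Generate n_new synthetic (x, y) pairs from a small real sample.

    Assumption: a pair of variables is modelled only if the absolute
    Pearson correlation reaches `threshold` (hypothetical cut-off).
    """
    rng = np.random.default_rng(seed)
    r = np.corrcoef(x, y)[0, 1]
    if abs(r) < threshold:
        raise ValueError(f"correlation {r:.2f} is too weak to model")
    intercept, slope, sigma = fit_linear(x, y)
    # Assumption: the predictor is roughly normal, so new values are drawn
    # from a normal distribution with the sample mean and standard deviation.
    x_new = rng.normal(x.mean(), x.std(ddof=1), size=n_new)
    # The response follows the fitted line plus noise at the residual scale.
    y_new = intercept + slope * x_new + rng.normal(0.0, sigma, size=n_new)
    return x_new, y_new


# Toy "small sample" (made-up numbers, not the heart failure data set).
toy_rng = np.random.default_rng(42)
x_real = toy_rng.normal(60.0, 10.0, size=30)
y_real = 2.0 + 0.5 * x_real + toy_rng.normal(0.0, 1.0, size=30)

x_syn, y_syn = generate_synthetic(x_real, y_real, n_new=1000)
```

Adding residual-scale noise to the fitted line keeps the synthetic pairs from lying exactly on the regression line, so the strength of the relationship in the generated sample stays close to that of the original sample.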

Author Biographies

Kateryna Sazonova, National Technical University of Ukraine «Igor Sikorsky Kyiv Polytechnic Institute»

Student. Department of Biomedical Cybernetics, National Technical University of Ukraine «Igor Sikorsky Kyiv Polytechnic Institute». Research area: information technologies in medicine, computer science, data science, deep learning.

Olena Nosovets, National Technical University of Ukraine «Igor Sikorsky Kyiv Polytechnic Institute»

PhD of Technical Science. Department of Biomedical Cybernetics, National Technical University of Ukraine «Igor Sikorsky Kyiv Polytechnic Institute». Education: National Technical University of Ukraine «Igor Sikorsky Kyiv Polytechnic Institute» (2015). Research area: information technologies in medicine, computer science, data science, deep learning.

Vitalii Babenko, National Technical University of Ukraine «Igor Sikorsky Kyiv Polytechnic Institute»

Master of Science. Department of Biomedical Cybernetics, National Technical University of Ukraine «Igor Sikorsky Kyiv Polytechnic Institute». Education: National Technical University of Ukraine «Igor Sikorsky Kyiv Polytechnic Institute» (2021). Research area: information technologies in medicine, computer science, data science, deep learning.

Olga Averianova, National Technical University of Ukraine «Igor Sikorsky Kyiv Polytechnic Institute»

Senior Lecturer. Department of Biomedical Cybernetics, National Technical University of Ukraine «Igor Sikorsky Kyiv Polytechnic Institute». Research area: information technologies in medicine, computer science, data science, deep learning, system analysis, information system design, IT management.

References

Patki N. The Synthetic Data Vault / N. Patki, R. Wedge, K. Veeramachaneni // IEEE International Conference on Data Science and Advanced Analytics (DSAA). – 2016. – Available at: https://bit.ly/3uU1IWU.

Towards Fairer Datasets: Filtering and Balancing the Distribution of the People Subtree in the ImageNet Hierarchy / [K. Yang, K. Qinami, L. Fei-Fei et al.] // Conference on Fairness, Accountability and Transparency. – 2020. – Available at: https://doi.org/10.1145/3351095.3372833.

Dodge S. A Study and Comparison of Human and Deep Learning Recognition Performance Under Visual Distortions / S. Dodge, L. Karam. – 2017. – Available at: https://arxiv.org/pdf/1705.02498.pdf.

Watson A. Using generative, differentially-private models to build privacy-enhancing, synthetic datasets from real data / Alexander Watson. – 2020. – Available at: https://medium.com/gretel-ai/using-generative-differentially-private-models-to-build-privacy-enhancing-synthetic-datasets-c0633856184.

Privacy: Theory meets Practice on the Map / [A. Machanavajjhala, D. Kifer, J. Abowd et al.]. – 2018. – Available at: https://bit.ly/33RpdnC.

Walters A. Why You Don’t Necessarily Need Data for Data Science / Austin Walters // Capital One Tech. – 2018. – Available at: https://bit.ly/2SZm4Qz.

Pouget-Abadie J. Generative Adversarial Networks / J. Pouget-Abadie, M. Mirza, B. Xu. – 2014. – Available at: https://arxiv.org/abs/1406.2661.

Fernández S. An application of recurrent neural networks to discriminative keyword spotting / S. Fernández, A. Graves, J. Schmidhuber // ICANN'07: Proceedings of the 17th international conference on Artificial neural networks. – 2007. – Available at: https://dl.acm.org/doi/10.5555/1778066.1778092.

Heart Failure Prediction. – Available at: https://www.kaggle.com/andrewmvd/heart-failure-clinical-data.

Published

2021-07-27

How to Cite

Sazonova, K., Nosovets, O., Babenko, V., & Averianova, O. (2021). GENERATION OF SYNTHETICAL MEDICAL DATA BY MDR-ANALYSIS. Proceedings of the National Aviation University, 87(2), 31–36. https://doi.org/10.18372/2306-1472.87.15719

Section

INFORMATION TECHNOLOGY