GENERATION OF SYNTHETICAL MEDICAL DATA BY MDR-ANALYSIS
DOI:
https://doi.org/10.18372/2306-1472.87.15719
Keywords:
data generation, synthetic data, entropy, correlation, communication direction, MDR-analysis
Abstract
Purpose: This article outlines an algorithm for generating synthetic medical data in order to augment small data samples. Methods: Three methods were used: correlation analysis (to identify significant variables and the relationships between them), MDR analysis (to build logical chains of relationships between medical data), and regression analysis (to model medical data variables and use the models to generate synthetic data). Results: The algorithm was tested on a publicly available database of heart failure patients; statistical relationships between the variables were found and used to build linear regression models. Discussion: The proposed algorithm generates medical data in a few simple steps, which makes it possible to obtain large data sets for applying machine learning methods to tasks related to medicine.
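The correlation and regression steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the MDR step is omitted entirely, and the 0.5 correlation threshold, the Gaussian residual noise, the toy input data, and all function names (`pearson_r`, `fit_line`, `generate_synthetic`) are assumptions made for illustration only.

```python
import random
import statistics


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def fit_line(xs, ys):
    """Least-squares fit y = slope * x + intercept, plus residual std."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    sigma = (sum(r * r for r in residuals) / (len(xs) - 2)) ** 0.5
    return slope, intercept, sigma


def generate_synthetic(xs, ys, n_new, rng, threshold=0.5):
    """Generate n_new synthetic (x, y) pairs from a small real sample.

    The threshold and the Gaussian noise model are assumptions of this
    sketch, not values taken from the article.
    """
    if abs(pearson_r(xs, ys)) < threshold:
        raise ValueError("variables are not correlated strongly enough")
    slope, intercept, sigma = fit_line(xs, ys)
    mx, sx = statistics.mean(xs), statistics.stdev(xs)
    # Draw new predictor values, then predict the response and add noise.
    x_new = [rng.gauss(mx, sx) for _ in range(n_new)]
    y_new = [slope * x + intercept + rng.gauss(0.0, sigma) for x in x_new]
    return x_new, y_new


# Toy "real" sample of 40 records (hypothetical values, not patient data).
rng = random.Random(0)
x_real = [rng.gauss(60.0, 10.0) for _ in range(40)]
y_real = [0.8 * x + 5.0 + rng.gauss(0.0, 3.0) for x in x_real]

x_syn, y_syn = generate_synthetic(x_real, y_real, n_new=1000, rng=rng)
```

Adding residual noise to the regression prediction keeps the synthetic points from lying exactly on the fitted line, so the augmented sample retains roughly the same strength of relationship as the original one.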
License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.