Exploring how synthetic data can be used to share valuable primary care information for AI modelling, incorporating geolocation and temporal drift.
The project centres on our synthetic data generator, built in collaboration with the Medicines and Healthcare products Regulatory Agency (MHRA) to enable the sharing of valuable primary care information without risking patient privacy. It will explore the extent to which temporal information (how primary care data change over time) and spatial information (the impact of regional differences) can be incorporated into synthetic primary care data generation.
Geolocation data will be incorporated into Bayesian networks (BNs) for modelling primary care data in the UK. Latent variables in these models will be explored using inference and visualisation techniques, to understand their importance and what they represent. Temporal drift in models of primary care data will be measured using concept drift metrics.
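The project does not say which concept drift metrics it will use. As a minimal sketch, assuming patient records are reduced to hypothetical (region, outcome) pairs, the Python below estimates a region-conditioned probability table, P(outcome | region), for each of two time windows. This resembles a single conditional probability table in a BN with a region parent node. It then scores drift per region with the Kullback–Leibler (KL) divergence. The function names (`cpt`, `kl_divergence`, `drift_by_region`), the Laplace smoothing parameter `alpha`, the choice of KL divergence and the example data are all illustrative assumptions, not part of the MHRA generator.

```python
import math
from collections import Counter

# Hypothetical sketch: drift in P(outcome | region) between two time windows.
# KL divergence is an assumed choice of drift metric, not the project's stated one.


def cpt(records, regions, outcomes, alpha=1.0):
    """Estimate P(outcome | region) from (region, outcome) pairs.

    Laplace smoothing (alpha > 0) keeps every probability non-zero, so the
    KL divergence below is always finite.
    """
    counts = {r: Counter() for r in regions}
    for region, outcome in records:
        counts[region][outcome] += 1
    table = {}
    for r in regions:
        total = sum(counts[r].values()) + alpha * len(outcomes)
        table[r] = {o: (counts[r][o] + alpha) / total for o in outcomes}
    return table


def kl_divergence(p, q):
    """KL(p || q) in nats for two distributions over the same keys."""
    return sum(p[k] * math.log(p[k] / q[k]) for k in p if p[k] > 0)


def drift_by_region(old, new, alpha=1.0):
    """Per-region KL divergence of the new window's table from the old one's."""
    regions = sorted({r for r, _ in old + new})
    outcomes = sorted({o for _, o in old + new})
    p_old = cpt(old, regions, outcomes, alpha)
    p_new = cpt(new, regions, outcomes, alpha)
    return {r: kl_divergence(p_new[r], p_old[r]) for r in regions}


# Illustrative (made-up) data: prevalence shifts in "South" but not in "North".
old = ([("North", "diabetes")] * 2 + [("North", "none")] * 8
       + [("South", "diabetes")] * 2 + [("South", "none")] * 8)
new = ([("North", "diabetes")] * 2 + [("North", "none")] * 8
       + [("South", "diabetes")] * 6 + [("South", "none")] * 4)

drift = drift_by_region(old, new)
print(drift)  # North is 0.0; South is about 0.249
```

A larger value flags a region whose relationship with the outcome has shifted between windows. In a full BN, the same comparison could be repeated for each node's conditional probability table. Any threshold for declaring that drift has occurred would be a further assumption.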
Publications
de Benedetti, J., Oues, N., Wang, Z., Myles, P., Tucker, A. (2020). Practical lessons from generating synthetic healthcare data with Bayesian networks. In: Koprinska, I., et al. (eds) ECML PKDD 2020 Workshops. Communications in Computer and Information Science, vol. 1323. Springer, Cham.
Tucker, A., Wang, Z., Rotalinti, Y., et al. (2020). Generating high-fidelity synthetic patient data for assessing machine learning healthcare software. npj Digital Medicine, 3, 147.
Wang, Z., Myles, P., Tucker, A. (2021). Generating and evaluating cross-sectional synthetic electronic healthcare data: Preserving data utility and patient privacy. Computational Intelligence, 1–33.