Advancing Clinical Research with Large-Scale EHR Data and Machine Learning
Yi Guo, PhD, FAMIA
Department of Health Outcomes and Biomedical Informatics, University of Florida
The widespread adoption of electronic health record (EHR) systems has created unprecedented opportunities for data-driven clinical research by enabling access to large-scale, longitudinal real-world data (RWD). National initiatives such as PCORnet have demonstrated how harmonized EHR and claims data can support population-scale analytics across diverse health systems. As one of the eight PCORnet Clinical Research Networks (CRNs), the OneFlorida+ Clinical Research Network maintains a centralized repository of linked, longitudinal RWD for over 20 million patients across multiple U.S. states. This talk will introduce the OneFlorida+ data infrastructure from a data science perspective. It will also discuss how large-scale clinical data can be integrated with machine learning and artificial intelligence methods for tasks such as phenotyping, risk prediction, and outcome modeling. Particular emphasis will be placed on the methodological considerations that are critical for translating advanced analytics into robust and trustworthy clinical insights, including data heterogeneity, temporal structure, bias, and causal interpretation.
Keywords: Real-World Data, Machine Learning in Healthcare, Causal Inference.