medRxiv [Preprint]. 2024 Mar 30:2024.03.28.24305045. doi: 10.1101/2024.03.28.24305045.


BACKGROUND: Electronic health records (EHR) are increasingly used for studying multimorbidities. However, concerns about their accuracy and completeness, and the fact that EHRs are designed primarily for billing and administration, raise questions about the consistency and reproducibility of EHR-based multimorbidity research.

METHODS: Using phecodes to represent the disease phenome, we analyzed pairwise comorbidity strengths using a dual logistic regression approach and constructed multimorbidity as an undirected weighted graph. We assessed the consistency of the multimorbidity networks within and between two major EHR systems at local (nodes and edges), meso (neighboring patterns), and global (network statistics) scales. We present case studies to identify disease clusters and uncover clinically interpretable disease relationships. We provide an interactive web tool and a knowledge base combining data from multiple sources for online multimorbidity analysis.
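
As an illustration only, the undirected weighted graph described above can be sketched in a few lines of Python. This is not the authors' implementation: the function names (build_network, weighted_degree), the threshold parameter, and the placeholder disease labels and strengths are all hypothetical, and the sketch assumes the pairwise comorbidity strengths have already been estimated by some regression step.

```python
from collections import defaultdict


def build_network(edge_strengths, threshold=0.0):
    """Build an undirected weighted graph as a dict of dicts.

    edge_strengths maps a pair of disease labels (a, b) to a comorbidity
    strength. Pairs at or below `threshold` are dropped (hypothetical
    cutoff, used here only to show thresholded graph construction).
    """
    graph = defaultdict(dict)
    for (a, b), weight in edge_strengths.items():
        if a == b or weight <= threshold:
            continue  # skip self-loops and weak associations
        # Store the edge in both directions because the graph is undirected.
        graph[a][b] = weight
        graph[b][a] = weight
    return dict(graph)


def weighted_degree(graph, node):
    """Sum of edge weights attached to a node (0 if the node is absent)."""
    return sum(graph.get(node, {}).values())


# Placeholder labels and made-up strengths, not values from the study.
toy_strengths = {("A", "B"): 2.0, ("B", "C"): 0.5, ("A", "C"): 1.5}
toy_graph = build_network(toy_strengths, threshold=1.0)
```

Raising the threshold keeps only the stronger comorbidity edges, which is one simple way to expose denser disease clusters in such a graph.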

FINDINGS: Analyzing data from 500,000 patients across Vanderbilt University Medical Center and Mass General Brigham health systems, we observed a strong correlation in disease frequencies (Kendall’s τ = 0.643) and comorbidity strengths (Pearson ρ = 0.79). Consistent network statistics across EHRs suggest a similar structure of multimorbidity networks at various scales. Comorbidity strengths and similarities of multimorbidity connection patterns align with the disease genetic correlations. Graph-theoretic analyses revealed a consistent core-periphery structure, implying efficient network clustering through threshold graph construction. Using hydronephrosis as a case study, we demonstrated the network’s ability to uncover clinically relevant disease relationships and provide novel insights.
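
For readers unfamiliar with the two agreement measures reported above, the following Python sketch shows how a Pearson correlation and a Kendall rank correlation can be computed for paired values from two systems. It is a minimal illustration under stated assumptions: the abstract does not specify which Kendall variant was used, so the sketch uses the simple tau-a form, which does not adjust for ties, and the function names and input lists are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) / total pairs.

    Assumption: ties are simply counted as neither concordant nor
    discordant; a tie-adjusted variant would differ when ties exist.
    """
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


# Made-up paired values standing in for the same diseases in two systems.
system_a = [1, 2, 3, 4]
system_b = [1, 3, 2, 4]
r = pearson(system_a, system_b)
tau = kendall_tau_a(system_a, system_b)
```

The pairwise loop is quadratic in the number of items, which is fine for illustration but would be replaced by a library routine for a full phenome.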

INTERPRETATION: Our findings demonstrate the robustness of large-scale EHR data for studying complex disease interactions. The alignment of multimorbidity patterns with genetic data suggests the potential utility for uncovering shared etiology of diseases. The consistent core-periphery network structure offers a strategic approach to analyze disease clusters. This work also sets the stage for advanced disease modeling, with implications for precision medicine.

FUNDING: VUMC Biostatistics Development Award, UL1 TR002243, R21DK127075, R01HL140074, P50GM115305, R01CA227481.

PMID:38585743 | PMC:PMC10996752 | DOI:10.1101/2024.03.28.24305045