Events

DMS Statistics and Data Science Seminar

Time: Oct 29, 2025 (01:00 PM)
Location: ONLINE

Details:

Speaker: Weidong Ma (Univ. of Pennsylvania, Perelman School of Medicine, Biostatistics and Epidemiology)

Title: A Novel Framework for Addressing Disease Under-Diagnosis Using EHR Data

 
Abstract: Effective treatment of medical conditions begins with an accurate diagnosis. However, many conditions are often underdiagnosed, either being overlooked or diagnosed after significant delays. Electronic Health Records (EHRs) contain extensive patient health information, offering an opportunity to probabilistically identify underdiagnosed individuals. The rationale is that both diagnosed and underdiagnosed patients may display similar health profiles in EHR data, distinguishing them from condition-free patients. Thus, EHR data can be leveraged to develop models that assess an individual’s risk of having a condition. To date, this opportunity has remained largely unexploited, partly due to the lack of suitable statistical methods. The key challenge is the positive-unlabeled structure of EHR data, which consist of diagnosed ("positive") patients and the remaining ("unlabeled") patients, who include both underdiagnosed patients and many condition-free patients. Consequently, data on patients who are unambiguously condition-free, which are essential for developing risk assessment models, are unavailable. To overcome this challenge, we propose ascertaining the condition status of a small subset of unlabeled patients. We develop a novel statistical method that uses this supplemented EHR data to build accurate models for estimating the probability that a patient has the condition of interest. Building on the resulting risk prediction model, we further study the factors that may contribute to under-diagnosis. Numerical simulation studies and real data applications are conducted to assess the performance of the proposed methods.
 
 
Host: Huan He