Statistics and Data Science Seminars



DMS Statistics and Data Science Seminar
Sep 25, 2024 02:00 PM
ZOOM



Speaker: Dr. Lan Luo (Assistant Professor, Department of Biostatistics and Epidemiology at Rutgers University)

Title: Online statistical inference with streaming data: renewability, dependence, and dynamics

 

Abstract: New data collection and storage technologies have given rise to a new field of streaming data analytics, including real-time statistical methodology for online data analyses. Streaming data refers to high-throughput recordings with large volumes of observations gathered sequentially and perpetually over time. Such a data collection scheme is pervasive not only in biomedical sciences such as mobile health, but also in other fields such as IT, finance, services, and operations. Despite a large body of work in online learning, most existing methods are established under a strong assumption of independent and identically distributed data, and very few target statistical inference. This talk will center around three key components in streaming data analyses: (i) renewable updating, (ii) cross-batch dependency, and (iii) time-varying effects. I will first introduce how to conduct a renewable updating procedure, in the case of independent data batches, with a particular aim of achieving similar statistical properties to the offline oracle methods but enjoying great computational efficiency. Then I will discuss how we handle the dependency structure that spans across a sequence of data batches to maintain statistical efficiency in the process of renewable updating. Lastly, a dynamic weighting scheme will be integrated into the online inference framework to account for time-varying effects. I will provide both conceptual understanding and theoretical guarantees of the proposed method and illustrate its performance via numerical examples.

 


DMS Statistics and Data Science Seminar
Sep 18, 2024 02:00 PM
ZOOM



Speaker: Dr. James Zou (Associate Professor of Biomedical Data Science, Computer Science, and Electrical Engineering at Stanford University)

Title: Generative AI agents for science and medicine 

 

Abstract: This talk will explore how we can develop and use generative AI to help researchers. I will first discuss how generative AI can act as research co-advisors. We will then discuss how genAI can expand researchers' creativity by designing and experimentally validating new drugs. Finally, I will present how visual-language AI helps clinicians aggregate and interpret noisy data. I will conclude by sharing some thoughts on the future of AI agents for science. 

 

Mini Bio: James Zou is an associate professor of Biomedical Data Science, CS and EE at Stanford University. He is also the faculty director of Stanford AI4Health. He works on advancing the foundations of ML and in-depth scientific and clinical applications. Many of his innovations are widely used in tech and biotech industries. He has received a Sloan Fellowship, an NSF CAREER Award, two Chan-Zuckerberg Investigator Awards, a Top Ten Clinical Achievement Award, several best paper awards, and faculty awards from Google, Amazon, and Adobe. His research has also been profiled in the popular press, including the NY Times, WSJ, and WIRED.

 


DMS Statistics and Data Science Seminar
Sep 11, 2024 02:00 PM
354 Parker Hall



Speaker: Dr. Mariangela Guidolin (University of Padua)

Title: Innovation Diffusion Models: Theory and Practice

 

Abstract: The seminar is a general overview of a class of innovation diffusion models that can be used to describe and forecast the evolution over time of sales of new products or technologies. Starting from the basic Bass model (BM), the seminar will present some of its generalizations, which account for the presence of exogenous shocks affecting the timing of the diffusion process, and for the presence of a dynamic market potential as a function of a communication process that develops over time. Moreover, some generalizations of the univariate BM are proposed to account for the presence of competition. The statistical techniques involved in model estimation combine time-series analysis with nonlinear regression techniques. The key objectives of the seminar are: to describe the main mathematical features of the models, discussing the meaning of the parameters from the economic point of view with real-data applications; to present and discuss the statistical aspects involved in model estimation and selection; to show and discuss the predictive and explanatory ability of the proposed models, highlighting the properties and limitations of each of the models described.


DMS Statistics and Data Science Seminar
Sep 04, 2024 02:00 PM
ZOOM



Speaker: Dr. JungWun Lee (Assistant Professor, Department of Biostatistics, Boston University School of Public Health)

Title: A latent trajectory analysis for multivariate mixed outcomes: a study on the effect of bariatric surgery via electronic health records.

 

Abstract: Trajectory analysis can be a statistical solution for explaining heterogeneities by partitioning patients into less heterogeneous subgroups based on similarities in outcome variables. This work proposes a novel trajectory analysis for electronic health records, a longitudinal data set containing multiple biomarkers, demographic factors of patients, and many missing values. The proposed model discovers subgroups of patients so that patients with the same trajectory group memberships are similar in their observed outcomes, while patients with different trajectories are heterogeneous. The proposed model can accommodate multivariate mixed outcomes consisting of categorical and continuous variables simultaneously. We suggest an estimation strategy using the expectation-maximization algorithm, which provides the maximum-likelihood estimates and remains highly stable in the presence of many missing values. We also present an application of our methodology to the DURABLE data set, an NIH-funded study examining long-term outcomes of patients who underwent bariatric surgery between 2007 and 2011.


DMS Statistics and Data Science Seminar
Apr 24, 2024 02:00 PM
ZOOM



Speaker: Dr. Shuoyang Wang (Assistant Professor, University of Louisville)

Title: Inference on High-dimensional Mediation Analysis with Convoluted Confounding via Deep Neural Networks

 

Abstract: Traditional linear mediation analysis has inherent limitations when it comes to handling high-dimensional mediators. Particularly, accurately estimating and rigorously inferring mediation effects is challenging, primarily due to the intertwined nature of the mediator selection issue. Despite recent developments, the existing methods are inadequate for addressing the complex relationships introduced by confounders. To tackle these challenges, we propose a novel approach called DP2LM (Deep neural network based Penalized Partially Linear Mediation). DP2LM incorporates deep neural network techniques to account for nonlinear effects in confounders and utilizes the penalized partially linear model to accommodate high dimensionality. In addition, to address the influence of outliers on mediation effects, we present an enhanced version of DP2LM called QDP2LM (Quantile Deep Neural Network-based Penalized Partially Linear Mediation). QDP2LM builds upon DP2LM and provides a comprehensive assessment of mediation effects across various quantiles. Unlike most existing works that concentrate on mediator selection, our methods prioritize estimation and inference on mediation effects. Specifically, we develop test procedures for testing the direct and indirect mediation effects. Theoretical analysis shows that the proposed procedures control type I error rates for hypothesis testing on mediation effects. Numerical studies show that the proposed methods outperform existing approaches under a variety of settings, demonstrating their versatility and reliability as modeling tools for complex data. Applying the proposed methods to study how DNA methylation mediates the effect of childhood trauma on cortisol stress reactivity reveals previously undiscovered relationships.


DMS Statistics and Data Science Seminar
Apr 17, 2024 02:00 PM
354 Parker Hall



Speaker: Dr. Shujie Ma (University of California at Riverside)

Title: Causal Inference on Quantile Dose-response Functions via Local ReLU Least Squares Weighting

 

Abstract: In this talk, I will introduce a novel local ReLU network least squares weighting method to estimate quantile dose-response functions in observational studies. Unlike the conventional inverse propensity weighting (IPW) method, we estimate the weighting function involved in the treatment effect estimator directly through local ReLU least squares optimization. The proposed method takes advantage of ReLU networks applied for the baseline covariates with increasing dimension to alleviate the dimensionality problem while retaining flexibility and local kernel smoothing for the continuous treatment to precisely estimate the quantile dose-response function and prepare for statistical inference. Our method enjoys computational convenience, scalability, and flexibility. It also improves robustness and numerical stability compared to the conventional IPW method. We show that the ReLU networks can break the notorious "curse of dimensionality" when the weighting function belongs to a newly introduced smoothness class. We also establish the convergence rate for the ReLU network estimator and the asymptotic normality of the proposed estimator for the quantile dose-response function. We further propose a multiplier bootstrap method to construct confidence bands for quantile dose-response functions. The finite sample performance of our proposed method is illustrated through simulations and a real data application.


DMS Statistics and Data Science Seminar
Apr 10, 2024 02:00 PM
354 Parker Hall



Speaker: Dr. Ted Westling (Assistant Professor, University of Massachusetts Amherst)

Title: Consistency of the bootstrap for asymptotically linear estimators based on machine learning

 


Abstract: The bootstrap is a popular method of constructing confidence intervals due to its ease of use and broad applicability. Theoretical properties of bootstrap procedures have been established in a variety of settings.  However, there is limited theoretical research on the use of the bootstrap in the context of estimation of a differentiable functional in a nonparametric or semiparametric model when nuisance functions are estimated using machine learning. In this article, we provide general conditions for consistency of the bootstrap in such scenarios. Our results cover a range of estimator constructions, nuisance estimation methods, bootstrap sampling distributions, and bootstrap confidence interval types. We provide refined results for the empirical bootstrap and smoothed bootstraps, and for one-step estimators, plug-in estimators, empirical mean plug-in estimators, and estimating equations-based estimators. We illustrate the use of our general results by demonstrating the asymptotic validity of bootstrap confidence intervals for the average density value and G-computed conditional mean parameters, and compare their performance in finite samples using numerical studies. Throughout, we emphasize whether and how the bootstrap can produce asymptotically valid confidence intervals when standard methods fail to do so.

This is joint work with UMass Amherst Statistics PhD student Zhou Tang. A preprint of the paper is available here: https://arxiv.org/abs/2404.03064.


DMS Statistics and Data Science Seminar
Apr 03, 2024 02:00 PM
ZOOM



Speaker: Dr. Panpan Zhang (Assistant Professor, Vanderbilt University Medical Center)

Title: Challenges and Opportunities for Longitudinal Analysis of Neurodegenerative Disorders

 

 
Abstract: Alzheimer's disease (AD) and Parkinson's disease (PD) are chronic neurodegenerative disorders that gradually destroy memory, thinking skills, and mobility, significantly reducing quality of life and imposing a substantial economic burden. Longitudinal analysis is a promising tool that helps clinicians and neuroscientists better understand changes in the characteristics of the target population over the continuum of AD (or PD) progression. However, the lengthy course of development of such diseases poses many challenges in biostatistical studies. In this presentation, I will introduce two recent projects focusing, respectively, on missing covariate problems and mismatched time-scale problems arising from the longitudinal modeling of AD and PD. I will showcase the novelty of the proposed methods, and also discuss their limitations and potential improvements. The applications in these two projects are primarily based on the open data from the Parkinson's Progression Markers Initiative (PPMI) and the Alzheimer's Disease Neuroimaging Initiative (ADNI).


DMS Statistics and Data Science Seminar
Mar 20, 2024 02:00 PM
354 Parker Hall



Speaker: Dr. Linbo Wang (University of Toronto)

Title: Sparse Causal Learning

 

Abstract: In many observational studies, researchers are interested in studying the effects of multiple exposures on the same outcome. Unmeasured confounding is a key challenge in these studies as it may bias the causal effect estimate. To mitigate the confounding bias, we introduce a novel device, called the synthetic instrument, to leverage the information contained in multiple exposures for causal effect identification and estimation. We show that under linear structural equation models, the problem of causal effect estimation can be formulated as an \(\ell_0\)-penalization problem, and hence can be solved efficiently using off-the-shelf software. Simulations show that our approach outperforms state-of-the-art methods in both low-dimensional and high-dimensional settings. We further illustrate our method using a mouse obesity dataset.

 

Bio: Linbo Wang is an assistant professor in the Department of Statistical Sciences and the Department of Computer and Mathematical Sciences, University of Toronto. He is also a faculty affiliate at the Vector Institute, a CANSSI Ontario STAGE program mentor, and an Affiliate Assistant Professor in the Department of Statistics, University of Washington, and Department of Computer Science, University of Toronto. Prior to these roles, he was a postdoc at Harvard T.H. Chan School of Public Health. He obtained his Ph.D. from the University of Washington. His research interest is centered around causality and its interaction with statistics and machine learning.


DMS Statistics and Data Science Seminar
Mar 13, 2024 02:00 PM
354 Parker Hall



Speaker: Dr. Sathyanarayanan Aakur (Assistant Professor, Auburn CSSE)

Title: Towards Multimodal Open World Event Understanding with Neuro Symbolic Reasoning.

 

Abstract: Deep learning models for multimodal understanding have taken great strides in tasks such as event recognition, segmentation, and localization. However, there appears to be an implicit closed-world assumption in these approaches; i.e., they assume that all observed data is composed of a static, known set of objects (nouns), actions (verbs), and activities (noun+verb combinations) that are in 1:1 correspondence with the vocabulary from the training data. One must account for every eventuality when training these systems to ensure their performance in real-world environments. In this talk, I will present our recent efforts to build open-world understanding models that leverage the general-purpose knowledge embedded in large-scale knowledge bases for providing supervision using a neuro-symbolic framework based on Grenander’s Pattern Theory formalism. Then I will talk about how this framework can be extended to abductive reasoning for natural language inference and commonsense reasoning for visual understanding. Finally, I will briefly present some results from the bottom-up neural side of open-world event perception that helps navigate clutter and provides cues for the abductive reasoning frameworks.

 

 

