Statistics and Data Science Seminars



Upcoming Statistics and Data Science Seminars
DMS Statistics and Data Science Seminar
Oct 08, 2025 01:00 PM
358 Parker Hall


Speaker: Prof. Bo Li (Department of Statistics and Data Science, Washington University in St. Louis)

Title: Spatially Varying Changepoint Detection with Application to Mapping the Impact of the Mount Pinatubo Eruption

 

Abstract: Significant events such as volcanic eruptions can exert global and long-lasting impacts on climate. These impacts, however, are not uniform across space and time. Motivated by the need to understand how the 1991 Mt. Pinatubo eruption influenced global and regional climate, we propose a Bayesian framework to simultaneously detect and estimate spatially varying temporal changepoints. Our approach accounts for the diffusive nature of volcanic effects and leverages spatial correlation. We then extend the changepoint detection framework to large-scale spherical spatiotemporal data and develop a scalable method for global applications. The framework enables Gibbs sampling of the changepoints within MCMC, offering greater computational efficiency than the Metropolis–Hastings algorithm. To address the high dimensionality of global data, we incorporate spherical harmonic transformations, which substantially reduce the computational burden further while preserving accuracy. We demonstrate the effectiveness of our method using both simulated datasets and real data on stratospheric aerosol optical depth and surface temperature to detect and estimate changepoints associated with the Mt. Pinatubo eruption.
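Why a discrete changepoint can be updated by an exact Gibbs draw can be seen on a much simpler problem. The sketch below assumes a single time series with one mean shift, a known noise standard deviation `sigma`, flat priors on the two segment means, and a uniform prior on the changepoint location. It is not the speaker's spatial or spherical-harmonic model. Because the changepoint takes finitely many values, its full conditional can be normalized over every candidate location and sampled directly, with no Metropolis–Hastings proposal. The function names are hypothetical.

```python
import numpy as np


def sample_changepoint(y, mu1, mu2, sigma, rng):
    """Exact draw of tau from p(tau | y, mu1, mu2, sigma).

    Convention (an assumption of this sketch): y[:tau] has mean mu1,
    y[tau:] has mean mu2, and tau is uniform on {1, ..., n-1}.
    """
    n = len(y)
    taus = np.arange(1, n)
    # Cumulative sums of squared residuals under each segment mean;
    # ss1[k] is the sum over the first k observations.
    ss1 = np.concatenate(([0.0], np.cumsum((y - mu1) ** 2)))
    ss2 = np.concatenate(([0.0], np.cumsum((y - mu2) ** 2)))
    # Gaussian log-likelihood of every candidate tau, up to a constant.
    loglik = -(ss1[taus] + ss2[n] - ss2[taus]) / (2.0 * sigma**2)
    # Subtract the max for numerical stability, normalize, then sample.
    prob = np.exp(loglik - loglik.max())
    prob /= prob.sum()
    return int(rng.choice(taus, p=prob))


def gibbs_changepoint(y, sigma=1.0, n_iter=1000, seed=0):
    """Alternate Gibbs updates of the two segment means and tau."""
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    tau = len(y) // 2
    draws = np.empty(n_iter, dtype=int)
    for it in range(n_iter):
        left, right = y[:tau], y[tau:]
        # With a flat prior, mu_k | tau, y ~ N(segment mean, sigma^2 / n_k).
        mu1 = rng.normal(left.mean(), sigma / np.sqrt(left.size))
        mu2 = rng.normal(right.mean(), sigma / np.sqrt(right.size))
        tau = sample_changepoint(y, mu1, mu2, sigma, rng)
        draws[it] = tau
    return draws


# Simulated series whose mean shifts after observation 60.
rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(0.0, 1.0, 60), rng.normal(4.0, 1.0, 40)])
draws = gibbs_changepoint(y)
tau_mode = int(np.bincount(draws[200:]).argmax())
print("posterior mode of tau:", tau_mode)
```

Thanks to the cumulative sums, each changepoint draw costs O(n) for a series of length n, regardless of how many candidate locations there are.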

 

SDS seminar’s website: https://auburn.edu/cosam/datascienceseminar/index.htm



Past Statistics and Data Science Seminars
DMS Statistics and Data Science Seminar
Sep 24, 2025 01:00 PM
ZOOM


Speaker: Xiaodong Li (University of California, Davis)

Title: Estimating SNR in High-Dimensional Linear Models: Robust REML and a Multivariate Method of Moments

 

Abstract: This talk presents two complementary approaches to estimating signal-to-noise ratios (and residual variances) in high-dimensional linear models, motivated by heritability analysis. First, I show that the REML estimator remains consistent and asymptotically normal under substantial model misspecification—fixed coefficients and heteroskedastic and possibly correlated errors. Second, I extend a method-of-moments framework to multivariate responses for both fixed- and random-effects models, deriving asymptotic distributions and heteroskedasticity-robust standard-error formulas. Simulations corroborate the theory and demonstrate strong finite-sample performance.

Host: Haoran Li

SDS seminar’s website: https://auburn.edu/cosam/datascienceseminar/index.htm


DMS Statistics and Data Science (SDS) Seminar
Sep 17, 2025 01:00 PM
352 Parker Hall


Speaker: Yin Tang (University of Kentucky)

Title: Belted and Ensembled Neural Network for Linear and Nonlinear Sufficient Dimension Reduction

 

Abstract: We introduce a unified, flexible, and easy-to-implement framework for sufficient dimension reduction that accommodates both linear and nonlinear dimension reduction, with either the conditional distribution or the conditional mean as the target of estimation. This unified framework is achieved by a specially structured neural network, the Belted and Ensembled Neural Network (BENN), which consists of a narrow latent layer, called the belt, and a family of transformations of the response, called the ensemble. By strategically placing the belt at different layers of the neural network, we can achieve linear or nonlinear sufficient dimension reduction, and by choosing appropriate transformation families, we can target either the conditional distribution or the conditional mean. Moreover, because it is built on a neural network, the method is very fast to compute, overcoming a computational bottleneck of traditional sufficient dimension reduction estimators, which require inverting a matrix of dimension p or n. We develop the algorithm, derive the convergence rate of our method, compare it with existing sufficient dimension reduction methods, and apply it to two data examples.

https://arxiv.org/abs/2412.08961

Host: Haotian Xu

SDS seminar’s website: https://auburn.edu/cosam/datascienceseminar/index.htm


DMS Statistics and Data Science Seminar
Apr 23, 2025 02:00 PM
354 Parker Hall


Speaker: Dr. Andrés Felipe Barrientos (Assistant Professor, Department of Statistics, Florida State University)

Title: Bayesian nonparametric modeling of mixed-type bounded data

 

Abstract: We propose a Bayesian nonparametric model for mixed-type bounded data, where some variables are compositional and others are interval-bounded. Compositional variables are non-negative and sum to a given constant, such as the proportion of time an individual spends on different activities during the day or the fraction of different types of nutrients in a person's diet. Interval-bounded variables, on the other hand, are real numbers constrained by both a lower and an upper bound. Our approach relies on a novel class of random multivariate Bernstein polynomials, which induce a Dirichlet process mixture model of products of Dirichlet and beta densities. We study the theoretical properties of the model, including its topological support and posterior consistency. The model can be used for density and conditional density estimation, where both the response and predictors take values in the simplex space and/or hypercube. We illustrate the model's behavior through the analysis of simulated data and data from the 2005-2006 cycle of the U.S. National Health and Nutrition Examination Survey.

Joint work with Rufeng Liu, Claudia Wehrhahn, and Alejandro Jara.


DMS Statistics and Data Science Seminar
Apr 16, 2025 02:00 PM
354 Parker Hall


Speaker: Dr. Jiwon Park (postdoctoral researcher, Department of Epidemiology, Johns Hopkins University)

 

Title: A Robust Pleiotropy Testing Method with Applications to Inflammatory Bowel Disease Subtypes with Sample Overlap

 

Abstract: Pleiotropy, where a genetic region influences multiple traits, is common in complex diseases and provides insight into shared biological mechanisms. However, identifying pleiotropic loci remains challenging, especially for correlated traits or case-control studies with overlapping samples. We present PLACO+, a statistical method for detecting pleiotropic associations using GWAS summary statistics from two traits. PLACO+ models a composite null hypothesis with an inflated variance structure, allowing for partial associations, and computes analytical p-values based on the distribution of the product of correlated Z-scores. Applied to genome-wide studies of inflammatory bowel disease (IBD) subtypes—Crohn’s disease and ulcerative colitis—PLACO+ identifies shared genetic loci missed by conventional approaches, particularly when effects are in opposite directions. These results demonstrate the utility of PLACO+ in uncovering novel pleiotropic signals in complex trait genetics.


DMS Statistics and Data Science Seminar
Apr 09, 2025 02:00 PM
250 Parker Hall


Speaker: Marzia A. Cremona (Department of Operations and Decision Systems, Université Laval, Québec, Canada)

Title: Local clustering and motif discovery of functional data

 

Abstract: Recent evolution in data acquisition technologies enabled the generation of high-dimensional, complex data in several research areas – in the sciences and engineering, among other disciplines. Increasingly sophisticated statistical and computational methods are needed in order to analyze these data. Functional data analysis (FDA) can be broadly employed to analyze functional data, i.e., data that vary over a continuum and can be naturally viewed as smooth curves or surfaces, exploiting information in their shapes.

In this talk, I will present probabilistic K-means with local alignment (probKMA [1]), an unsupervised learning method that locally clusters a set of misaligned curves and addresses the problem of discovering functional motifs, i.e., typical “shapes” or “patterns” that may recur several times along and across a set of curves, capturing important local characteristics of these curves. After demonstrating the performance of the method on simulated data and showing how it generalizes other clustering methods for functional data, I will present three applications to functional data from different fields. First, I will apply probKMA to discover functional motifs in “Omics” signals related to mutagenesis and genome dynamics. Second, I will employ probKMA as a probabilistic clustering method to group the COVID-19 death curves of the Italian regions during the first wave of the pandemic. Finally, I will present a generalization of probKMA and its application to the discovery and characterization of functional motifs in stock market prices [2].

 

[1] Cremona, Chiaromonte (2023). Probabilistic K-means with local alignment for clustering and motif discovery in functional data. Journal of Computational and Graphical Statistics 32(3): 1119-1130.

[2] Cremona, Doroshenko, Severino (2023). Functional motif discovery in stock market prices. SSRN 4642040.


DMS Statistics and Data Science Seminar
Apr 02, 2025 02:00 PM
ZOOM


Speaker: Jessie Tong (Assistant Professor, Department of Biostatistics, Johns Hopkins University)

Title: Using Electronic Health Records Data for Clinical Evidence Generation

 

Abstract: With real-world data (RWD) increasingly available from distributed research networks (DRNs), it has become essential to leverage these data to generate evidence for clinical questions relevant to stakeholders in the healthcare system. In answering questions about hospital and treatment options, medications, and other topics, we still face practical challenges in analyzing RWD, such as reporting bias, confounding, and rare events. Integrating data from multiple clinical sites within a DRN is particularly challenging because of data privacy, patient heterogeneity across sites (the so-called case-mix problem), and communication cost. In this presentation, centered on health system performance assessment based on real-world evidence, I will introduce our distributed learning frameworks, designed to produce actionable analytical insights for hospital profiling, with the ultimate goal of improving hospitals’ quality of care and patients’ clinical outcomes. The effectiveness and reliability of the proposed frameworks have been validated through a real-world application in collaboration with 12 clinical sites across three countries within the Observational Health Data Sciences and Informatics (OHDSI) network.


DMS Statistics and Data Science Seminar
Mar 26, 2025 02:00 PM
354 Parker Hall


Speaker: Dr. Jacopo Di Iorio (Emory University) 

Title: Identifying functional motifs with funBIalign

 

Abstract: Functional data analysis faces a novel challenge: the identification of functional motifs, or “shapes” that may be repeated multiple times within each functional observation or across multiple curves belonging to the same set. To address this issue, we introduce the funBIalign algorithm, a multi-step approach that employs agglomerative hierarchical clustering with complete linkage and functional distances based on two widely used validation measures from the biclustering literature: the mean squared residue score and the virtual error. These distances enable funBIalign to detect functional motifs that are shifted or scaled along the y-axis. To validate our methodology, we present simulations and case studies that demonstrate its ability to identify functional motifs.
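The mean squared residue score named in this abstract comes from the biclustering method of Cheng and Church. A minimal sketch, treating a candidate motif as a matrix whose rows are equal-length curve segments sampled on a common grid (an assumption of this illustration), shows why the score ignores vertical shifts: a purely additive per-curve offset is removed exactly. How funBIalign turns this score into a distance between curves, and the virtual error measure, are not reproduced here.

```python
import numpy as np


def mean_squared_residue(segments):
    """Cheng-Church mean squared residue of a (curves x grid points) array.

    Each row is one curve segment evaluated on a common grid.
    """
    a = np.asarray(segments, dtype=float)
    row_means = a.mean(axis=1, keepdims=True)  # one mean per curve
    col_means = a.mean(axis=0, keepdims=True)  # one mean per grid point
    residue = a - row_means - col_means + a.mean()
    return float(np.mean(residue**2))


t = np.linspace(0.0, 1.0, 50)
base = np.sin(2 * np.pi * t)

# Copies of the same shape shifted along the y-axis: the residue vanishes.
shifted = np.vstack([base, base + 1.5, base - 0.7])
# Two different shapes: the residue is clearly positive.
different = np.vstack([base, np.cos(2 * np.pi * t)])

print(mean_squared_residue(shifted))
print(mean_squared_residue(different))
```

Multiplicative scaling of a curve does not leave this score at zero, which is consistent with the abstract pairing it with a second measure.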

 


DMS Statistics and Data Science Seminar
Mar 19, 2025 02:00 PM
352 Parker Hall


Speaker: Dr. Haoran Li (Auburn University)

Title: Tracy-Widom Law of High Dimensional Ridge-Regularized F-matrix. 

 

Abstract: In multivariate analysis, many core problems involve the eigenanalysis of an F matrix constructed from two Wishart matrices. These so-called double Wishart problems arise in contexts such as MANOVA, testing the equality of covariance matrices, and hypothesis testing in multivariate linear regression. A prominent classical approach, Roy's largest root test, bases inference on the largest eigenvalue of the F matrix. However, in high-dimensional settings this test becomes impractical because the Wishart matrix that must be inverted is singular or nearly singular. To address this challenge, we propose a ridge-regularization framework that introduces a ridge term. Specifically, we develop a family of ridge-regularized largest root tests based on the largest eigenvalue of the ridge-regularized F matrix. Under mild assumptions, we establish that the regularized largest root, after appropriate scaling, asymptotically follows the Tracy-Widom distribution. We also propose an efficient method for estimating the scaling parameters using the Marčenko-Pastur equation.
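To make the test statistic concrete, here is a minimal sketch under an assumed form of the ridge-regularized F matrix, (W + λI)^{-1}B, where B is the hypothesis (between-group) scatter matrix, W is the error (within-group) Wishart matrix, and λ > 0 is the ridge parameter. This form and the value of λ are assumptions of the illustration and may differ from the talk; the Tracy-Widom centering and scaling estimated through the Marčenko-Pastur equation are not reproduced. The example uses a dimension larger than the sample size, where W itself cannot be inverted.

```python
import numpy as np


def ridge_largest_root(B, W, lam):
    """Largest eigenvalue of (W + lam * I)^{-1} B.

    B and W are symmetric positive semidefinite p x p matrices and lam > 0,
    so W + lam * I is positive definite even when W is singular (p > n).
    """
    p = W.shape[0]
    L = np.linalg.cholesky(W + lam * np.eye(p))
    # M = L^{-1} B L^{-T} is symmetric and shares its eigenvalues
    # with (W + lam * I)^{-1} B, so eigvalsh can be used.
    X = np.linalg.solve(L, B)    # L^{-1} B
    M = np.linalg.solve(L, X.T)  # L^{-1} B L^{-T} (uses B = B^T)
    M = (M + M.T) / 2.0          # remove round-off asymmetry
    return float(np.linalg.eigvalsh(M)[-1])


# Two groups in dimension p = 50 with only 20 observations in total,
# so the within-group scatter W is singular.
rng = np.random.default_rng(0)
p, n1, n2 = 50, 10, 10
X1 = rng.normal(size=(n1, p))
X2 = rng.normal(size=(n2, p))
m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
W = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
d = (m1 - m2)[:, None]
B = (n1 * n2 / (n1 + n2)) * (d @ d.T)

print(ridge_largest_root(B, W, lam=1.0))
```

The Cholesky-based symmetrization avoids forming a nonsymmetric matrix product, so a symmetric eigensolver can be used.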


DMS Statistics and Data Science Seminar
Feb 12, 2025 02:00 PM
250 Parker Hall


Speaker: Dr. Gaetan Bakalli (Assistant Professor of Econometrics and Data Science, Emlyon Business School, Lyon)

Title: Nonstandard Errors

 

Abstract: In statistics, samples are drawn from a population in a data-generating process (DGP). Standard errors measure the uncertainty in estimates of population parameters. In science, evidence is generated to test hypotheses in an evidence-generating process (EGP). We claim that variation in the EGP across researchers adds uncertainty, which we call nonstandard errors (NSEs). We study NSEs by letting 164 teams test the same hypotheses on the same data. NSEs turn out to be sizable, but smaller for more reproducible or higher-rated research. Adding peer-review stages reduces NSEs. We further find that participants underestimate this type of uncertainty.
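One simple way to make the notion concrete, sketched with made-up numbers: when several teams estimate the same quantity from the same data, the spread of their point estimates reflects uncertainty from the evidence-generating process, separate from the standard error each team reports. Using the sample standard deviation as the dispersion measure is an assumption here; the study may summarize dispersion differently.

```python
import statistics


def nonstandard_error(estimates):
    """Across-team dispersion of estimates of the same quantity.

    Measured here by the sample standard deviation (an assumption of
    this sketch; other dispersion measures could be used).
    """
    return statistics.stdev(estimates)


# Hypothetical numbers: five teams estimate the same effect on the same data,
# and each team also reports its own (conventional) standard error.
estimates = [0.12, 0.18, 0.05, 0.22, 0.13]
reported_ses = [0.03, 0.04, 0.03, 0.05, 0.04]

nse = nonstandard_error(estimates)
mean_se = statistics.mean(reported_ses)
print(f"nonstandard error: {nse:.3f}, mean reported standard error: {mean_se:.3f}")
```

With these invented figures the across-team dispersion (about 0.064) exceeds the average reported standard error (0.038).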


DMS Statistics and Data Science Seminar
Dec 04, 2024 02:00 PM
ZOOM


Speaker: Russell J. Bowater (Independent Statistical Consultant, Oaxaca City, Mexico)

Title: The 7 hardest lessons to learn in statistics 

 

Abstract: What is the current state of the theory of statistical inference? Is it essentially in good shape, apart from a relatively small number of issues that need to be tidied up? Or is what is usually presented as the standard, accepted theory of statistical inference so full of conceptual holes that it is nothing short of an embarrassment for anyone who wishes to describe themselves as a statistician? This talk explores these questions by presenting lessons that arguably need to be learnt but have proved difficult to learn, for reasons largely unrelated to doing good, independent, and impartial science. By exposing ourselves to such an uncomfortable level of introspection, we can gain a greater understanding of what we have done, where we are now, and where we should be going.

 

