| Wednesday, February 13 at 12 pm |
Lu Tian, Assistant Professor, Department of Preventive Medicine, Northwestern University
Title: Lasso Regularization for the Accelerated Failure Time Model
Abstract: It is challenging to develop a stable regression model for predicting failure time outcomes when the dimension of the covariates is large relative to the sample size. A further complication arises because failure time responses are often not completely observed due to right censoring. In this paper, we propose to couple LASSO-type regularization methods with Gehan's rank-based estimator in the setting of the accelerated failure time model to construct a stable and parsimonious prediction model. Unlike the inverse probability weighting approach, the proposed estimators are valid under the general noninformative censoring assumption. We also propose an efficient numerical algorithm for obtaining the entire regularization path to facilitate the adaptive selection of the tuning parameter. We illustrate the proposed methods with an application to predicting the survival time of breast cancer patients based on a set of clinical prognostic factors and collected gene signatures, and we evaluate their finite-sample performance through a simulation study.
|
| Wednesday, February 27 at 12 pm |
Peter McCullagh, John D. MacArthur Distinguished Service Professor, Department of Statistics, University of Chicago
Title: Sampling bias and logistic models
Abstract: In a regression model, the joint distribution for each finite sample of units is determined by a function p_x(y) depending only on the list of covariate values x = (x(u_1), ..., x(u_n)) on the sampled units. No random sampling of units is involved. In biological work, random sampling is frequently unavoidable, in which case the joint distribution p(y, x) depends on the sampling scheme. Regression models can be used for the study of dependence provided that the conditional distribution p(y | x) for random samples agrees with p_x(y) as determined by the regression model for a fixed sample having a non-random configuration x. This paper develops a model that avoids the concept of a fixed population of units, thereby forcing the sampling plan to be incorporated into the sampling distribution. For a quota sample having a predetermined covariate configuration x, the sampling distribution agrees with the standard logistic regression model with correlated components. For most natural sampling plans such as sequential or simple random sampling, the conditional distribution p(y | x) is not the same as the regression distribution unless p_x(y) has independent components. In this sense, most natural sampling schemes involving binary random-effects models are biased. The implications of this formulation for subject-specific and population-averaged procedures are explored.
|
| Wednesday, March 5 at 12 pm |
Sandy L. Zabell, Professor, Department of Statistics and Department of Mathematics, Northwestern University
Title: On Student’s 1908 paper “The probable error of a mean”
Abstract: This month marks the one-hundredth anniversary of the appearance of William Sealy Gosset’s celebrated paper “The probable error of a mean”. Gosset’s elegant contribution represented the first in a series of exact, “small-sample” results that were developed by Gosset, Fisher, and others to form a central component of the modern theory of statistical inference. This talk celebrates the centenary of Gosset’s paper by discussing both its background and its impact on modern statistical theory and practice.
|
| Wednesday, March 12 at 12 pm |
Rong Chen, Professor, Department of Statistics, Rutgers University
Title: Constrained Sequential Monte Carlo (CSMC)
Abstract: Sequential Monte Carlo (SMC) methodologies have shown great promise in solving the very high-dimensional and complex problems often encountered in applications such as communication, bioinformatics and financial data analysis. The key to a successful SMC implementation is efficiency, not only in terms of statistical inference accuracy, but also in terms of computational complexity. Efficiency is directly related to the design of the key components of SMC, including the intermediate distributions, the trial 'growth' distribution, and the resampling method. Many problems in applications share a common feature: the target distribution is highly constrained. That is, the target distribution is a truncated distribution on an ill-shaped subspace of a high-dimensional space. The constraints, without careful treatment, are a main obstacle to successful implementations of SMC. In this talk, we develop a set of algorithms categorized as Constrained Sequential Monte Carlo (CSMC) for solving such problems, including strategies for designing the intermediate distributions, the trial distributions, the resampling steps and Markov moves with CSMC.
|
Created by Noelle I. Samia
Last Updated 03/03/2008