College of Liberal Arts & Sciences
Patrick Breheny - Colloquium Speaker
Abstract: Penalized regression methods are an attractive tool for feature selection with many appealing properties, although their widespread adoption has been hampered by the difficulty of applying inferential tools. In particular, the question "How reliable is the selection of those features?" has proved difficult to address, partially due to the complexity of defining a false discovery in the penalized regression setting. In this talk, I will define a false inclusion as a variable that is independent of the outcome regardless of whether other variables are conditioned on. I show that this definition permits straightforward estimation of the number of false inclusions in near-independence conditions. I also discuss a permutation-based approach and show that it yields more accurate estimation of the false inclusion rate in highly correlated settings.
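To make the permutation idea in the abstract concrete, here is a minimal sketch and not the speaker's actual method. It assumes a lasso fit by coordinate descent at a fixed penalty `lam`, and it assumes that the number of features selected after shuffling the outcome can serve as an estimate of the number of false inclusions. The function names, the penalty value, and the simulated data are all hypothetical and chosen only for illustration.

```python
import random


def soft_threshold(z, lam):
    """Soft-thresholding operator used by the lasso coordinate update."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def standardize(X):
    """Return the columns of X, each centered and scaled to mean square 1.

    Assumes no column is constant (otherwise the scale would be zero).
    """
    n, p = len(X), len(X[0])
    cols = []
    for j in range(p):
        col = [X[i][j] for i in range(n)]
        mean = sum(col) / n
        sd = (sum((v - mean) ** 2 for v in col) / n) ** 0.5
        cols.append([(v - mean) / sd for v in col])
    return cols


def lasso_selected(cols, y, lam, n_iter=50):
    """Indices with nonzero lasso coefficients, minimizing
    (1/2n)||y - Xb||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = len(y), len(cols)
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]  # residuals at beta = 0
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            xj = cols[j]
            # Columns have mean square 1, so the univariate solution is
            # x_j' r / n + beta_j, which is then soft-thresholded.
            z = sum(a * b for a, b in zip(xj, resid)) / n + beta[j]
            new = soft_threshold(z, lam)
            if new != beta[j]:
                delta = new - beta[j]
                resid = [r - delta * a for r, a in zip(resid, xj)]
                beta[j] = new
    return [j for j in range(p) if beta[j] != 0.0]


def permutation_fir(X, y, lam, n_perm=20, seed=0):
    """Illustrative permutation estimate of the false inclusion rate.

    Shuffling y breaks any association with the features, so every
    feature selected on shuffled data counts as a false inclusion.
    """
    rng = random.Random(seed)
    cols = standardize(X)
    selected = lasso_selected(cols, y, lam)
    counts = []
    for _ in range(n_perm):
        y_perm = y[:]
        rng.shuffle(y_perm)
        counts.append(len(lasso_selected(cols, y_perm, lam)))
    expected_false = sum(counts) / n_perm
    fir = min(1.0, expected_false / len(selected)) if selected else 0.0
    return selected, expected_false, fir


def simulate(n, p, seed=0):
    """Hypothetical data: independent features, only the first two matter."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [row[0] + row[1] + rng.gauss(0, 1) for row in X]
    return X, y
```

For example, `X, y = simulate(100, 20)` followed by `permutation_fir(X, y, lam=0.4)` returns the selected feature indices, the average number of features selected under permutation, and their ratio as a rough false inclusion rate. The features in this simulation are independent, so the sketch says nothing about the highly correlated settings the talk addresses.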