Nonparametric Feature Screening for Interaction Effects in Ultrahigh-dimensional Survival Data

Jing ZHANG, Yan Yan LIU

Acta Mathematica Sinica, Chinese Series, 2024, Vol. 67, Issue (3): 582-598. DOI: 10.12386/A20220179

Abstract

Linear regression models are widely used to study relationships between variables in many fields of scientific research, such as medicine, genetics, and economics. In complex situations, however, main effects alone may not be sufficient to characterize the relationship between the response and the predictors, because in many practical problems interaction effects between variables also have an important influence on the response. An interaction model that includes both main effects and interaction effects can therefore describe the relationship between variables more comprehensively. For high-dimensional data the number of variables p is large, and the number of second-order interaction terms, p(p+1)/2, is much larger still, so the statistical analysis of interaction models faces many difficulties and challenges. Selecting, from this huge number of candidates, the interaction effects that have a significant impact on the event of interest is thus a very important problem. Existing research on this problem has mainly focused on complete data within the linear model framework. In this paper, we consider the problem for ultrahigh-dimensional right-censored survival data. Based on distance correlation and a two-step analysis method, we propose a model-free screening method for interaction effects that does not depend on any model assumptions. The method selects important main effects and important interaction effects simultaneously and can handle ultrahigh-dimensional data with large p. Extensive simulation studies are carried out to evaluate the finite-sample performance of the proposed procedure, and the results show that the method can effectively select important interaction effects for ultrahigh-dimensional right-censored survival data. As an illustration, we apply the proposed method to the diffuse large-B-cell lymphoma (DLBCL) data.
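The abstract describes the method only at a high level, so the following Python/NumPy sketch is an illustration under stated assumptions, not the authors' procedure. It computes the sample distance correlation of Székely, Rizzo and Bakirov (2007), dCor^2 = dCov^2(X, Y) / sqrt(dVar^2(X) dVar^2(Y)), from double-centered distance matrices, and then runs a generic two-step screen: step 1 ranks the p main effects, and step 2 ranks pairwise products among the main effects retained in step 1. Two choices are assumptions introduced here and are not taken from the paper: the survival "response" is the bivariate vector (observed time, censoring indicator), which distance correlation accepts because it handles vector-valued variables, and interactions are searched only among the retained main effects. The names `distance_correlation` and `two_step_interaction_screen`, the cut-offs `d1` and `d2`, and the simulated data are hypothetical.

```python
import numpy as np


def _centered_distances(z):
    """Double-centered Euclidean distance matrix of the rows of z."""
    z = np.asarray(z, dtype=float)
    if z.ndim == 1:
        z = z[:, None]  # a 1-D sample is treated as n points in R^1
    d = np.sqrt(((z[:, None, :] - z[None, :, :]) ** 2).sum(axis=-1))
    # Subtract column means and row means, then add back the grand mean.
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y):
    """Sample distance correlation (V-statistic form, Szekely, Rizzo and Bakirov, 2007)."""
    a, b = _centered_distances(x), _centered_distances(y)
    dcov2_xy = (a * b).mean()
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    if denom == 0.0:
        return 0.0  # at least one sample is constant
    return float(np.sqrt(max(dcov2_xy, 0.0) / denom))


def two_step_interaction_screen(X, time, status, d1, d2):
    """Hypothetical two-step interaction screen (illustration only).

    Assumptions, not taken from the paper: the survival "response" is the
    bivariate vector (observed time, censoring indicator), and interactions
    are only formed among the d1 main effects retained in step 1.
    """
    X = np.asarray(X, dtype=float)
    response = np.column_stack([time, status])
    # Step 1: rank all p main effects by distance correlation with the response.
    main_scores = np.array(
        [distance_correlation(X[:, j], response) for j in range(X.shape[1])]
    )
    kept = sorted(int(j) for j in np.argsort(-main_scores)[:d1])
    # Step 2: score every product X_j * X_k with j <= k among the kept features
    # (squares included, mirroring the p(p+1)/2 count of second-order terms).
    pair_scores = {
        (j, k): distance_correlation(X[:, j] * X[:, k], response)
        for i, j in enumerate(kept)
        for k in kept[i:]
    }
    top_pairs = sorted(pair_scores, key=pair_scores.get, reverse=True)[:d2]
    return kept, top_pairs


# Hypothetical simulated example: main effects X0, X1 and interaction X0*X1.
rng = np.random.default_rng(0)
n, p = 300, 30
X = rng.standard_normal((n, p))
log_t = 0.5 * (X[:, 0] + X[:, 1] + X[:, 0] * X[:, 1]) + 0.1 * rng.standard_normal(n)
t_event = np.exp(log_t)
t_cens = rng.exponential(scale=10.0, size=n)  # independent censoring times
time = np.minimum(t_event, t_cens)  # observed (possibly censored) time
status = (t_event <= t_cens).astype(float)  # 1 = event observed, 0 = censored
kept, top_pairs = two_step_interaction_screen(X, time, status, d1=5, d2=3)
print(kept, top_pairs)
```

Restricting step 2 to the d1 retained main effects reduces the number of candidate products from p(p+1)/2 to d1(d1+1)/2, which is what makes a two-step scheme feasible for large p; the price is that an interaction whose components carry no detectable marginal signal can be missed in step 1.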

Key words

interaction effect / ultrahigh-dimensional survival data / distance correlation / two-stage method / feature screening

Cite this article

Jing ZHANG, Yan Yan LIU. Nonparametric Feature Screening for Interaction Effects in Ultrahigh-dimensional Survival Data. Acta Mathematica Sinica, Chinese Series, 2024, 67(3): 582-598 https://doi.org/10.12386/A20220179