Journal Portal of the Academy of Mathematics and Systems Science, Chinese Academy of Sciences

15 February 2025, Volume 41 Issue 2
    

    Articles
  • Zhiming Ma, Fuzhou Gong, Liuquan Sun
    Acta Mathematica Sinica. 2025, 41(2): 497-497. https://doi.org/10.1007/s10114-025-4551-1
  • Xiangyu Zheng, Songxi Chen
    Acta Mathematica Sinica. 2025, 41(2): 498-521. https://doi.org/10.1007/s10114-025-3349-5
    Tree-based models have been widely applied in both academic and industrial settings due to their natural interpretability, good predictive accuracy, and high scalability. In this paper, we focus on improving the single-tree method and propose the segmented linear regression trees (SLRT) model, which replaces the traditional constant leaf model with linear ones. From the parametric view, SLRT can be employed as a recursive change-point detection procedure for segmented linear regression (SLR) models, which is much more efficient and flexible than the traditional grid search method. Along these lines, we propose to use the conditional Kendall's $\tau$ correlation coefficient to select the underlying change points. From the non-parametric view, we propose an efficient greedy splitting method that selects the splits by analyzing the association between residuals and each candidate split variable. Further, with the SLRT as a single-tree predictor, we propose a linear random forest approach that aggregates the SLRTs by a weighted average. Both simulation and empirical studies show significant improvements over CART trees and even random forests.
  • Lingyue Zhang, Dawei Lu, Hengjian Cui
    Acta Mathematica Sinica. 2025, 41(2): 522-546. https://doi.org/10.1007/s10114-025-3225-3
    Measuring and testing tail dependence is important in finance, insurance, and risk management. This paper proposes two tail dependence matrices based on classic rank correlation coefficients, which possess the desired population properties and interpretability. Their nonparametric estimators with strong consistency and asymptotic distributions are derived using the limit theory of $U$-processes. The simulation and application studies show that, compared to the tail dependence matrix based on Spearman's $\rho$ with large deviation, the Kendall-based tail dependence measure has stable variances under different tail conditions; thus, it is an effective approach to testing and quantifying tail dependence between random variables.
  • Changhu Wang, Jianhua Guo, Yanyuan Ma, Shurong Zheng
    Acta Mathematica Sinica. 2025, 41(2): 547-552. https://doi.org/10.1007/s10114-025-3383-3
    Despite the wide use of factor models, the issue of determining the number of factors has not been resolved in the statistics literature. An ad hoc approach is to set the number of factors to be the number of eigenvalues of the data correlation matrix that are larger than one, and subsequent statistical analysis proceeds assuming the resulting factor number is correct. In this work, we study the relation between the number of such eigenvalues and the number of factors, and provide necessary and sufficient conditions under which the two numbers are equal. We show that the equality relies only on the properties of the loading matrix of the factor model. Guided by the newly discovered condition, we further reveal how the model error affects the estimation of the number of factors.
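The ad hoc rule described in this abstract (count the eigenvalues of the data correlation matrix that exceed one) can be sketched as follows. This is a minimal illustration, not the authors' analysis: the function name, the simulated two-factor data, and all numeric settings are assumptions chosen so that the population correlation matrix has two eigenvalues of 2.62 and four of 0.19.

```python
import numpy as np

def count_eigenvalues_above_one(data):
    """Count eigenvalues of the sample correlation matrix that exceed one."""
    corr = np.corrcoef(data, rowvar=False)   # columns are treated as variables
    eigvals = np.linalg.eigvalsh(corr)       # symmetric matrix, real eigenvalues
    return int(np.sum(eigvals > 1.0))

# Simulated two-factor data; every number below is an illustrative assumption.
rng = np.random.default_rng(0)
n = 500
loadings = np.array([[0.9, 0.0],
                     [0.9, 0.0],
                     [0.9, 0.0],
                     [0.0, 0.9],
                     [0.0, 0.9],
                     [0.0, 0.9]])
factors = rng.standard_normal((n, 2))
noise = np.sqrt(0.19) * rng.standard_normal((n, 6))  # unit total variance per variable
data = factors @ loadings.T + noise

n_factors_guess = count_eigenvalues_above_one(data)
print(n_factors_guess)
```

In this well-separated setting the rule recovers the two simulated factors; the abstract's point is that such agreement depends on the loading matrix and should not be taken for granted.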
  • Kang Hu, Danning Li, Binghui Liu
    Acta Mathematica Sinica. 2025, 41(2): 553-568. https://doi.org/10.1007/s10114-025-3324-1
    Gaussian graphical models (GGMs) are widely used as intuitive and efficient tools for data analysis in several application domains. To address the reproducibility issue of structure learning of a GGM, it is essential to control the false discovery rate (FDR) of the estimated edge set of the graph in terms of the graphical model. Hence, in recent years, the problem of GGM estimation with FDR control has received increasing attention. In this paper, we propose a new GGM estimation method by implementing multiple data splitting. Instead of using node-by-node regressions to estimate each row of the precision matrix, we suggest directly estimating the entire precision matrix using the graphical Lasso in the multiple data splitting, which makes the computation $p$ times faster than the previous approach. We show that the proposed method can asymptotically control the FDR and has significant advantages in computational efficiency. Finally, we demonstrate the usefulness of the proposed method through a real data analysis.
  • Zhen Meng, Yuke Shi, Jinyi Lin, Qizhai Li
    Acta Mathematica Sinica. 2025, 41(2): 569-587. https://doi.org/10.1007/s10114-024-3328-2
    Combining p-values is a well-known problem in statistical inference. When faced with a study involving $m$ p-values, determining how to effectively combine them to arrive at a comprehensive and reliable conclusion becomes a significant concern in various fields, including genetics, genomics, and economics. The literature offers a range of combination strategies tailored to different research objectives and data characteristics. In this work, we aim to provide users with a systematic exploration of the p-value combination problem. We present theoretical results for combining p-values using a logarithmic transformation, which highlight the benefits of this approach. Additionally, we propose a combination strategy based on the golden section method, together with its statistical properties, and showcase its performance through extensive computer simulations. To further illustrate its effectiveness, we apply this approach to a real-world scenario.
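As a concrete instance of combining $m$ p-values through a logarithmic transformation, the sketch below implements Fisher's classical method, in which $-2\sum_i \log p_i$ follows a $\chi^2_{2m}$ distribution under the null hypothesis when the p-values are independent. Fisher's method is used here only as a familiar example; the abstract does not say which logarithmic combination the paper studies, and the function name is hypothetical.

```python
import math

def fisher_combine(p_values):
    """Combined p-value from Fisher's method for independent p-values.

    Uses the closed-form chi-square survival function for even degrees
    of freedom 2m:  P(X > t) = exp(-t/2) * sum_{i<m} (t/2)^i / i!.
    """
    if not p_values or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    m = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)   # equals T/2 with T = -2 * sum(log p)
    tail = sum(half_stat ** i / math.factorial(i) for i in range(m))
    return math.exp(-half_stat) * tail

combined = fisher_combine([0.05, 0.05])
print(combined)
```

With a single p-value the formula returns that p-value unchanged, which is a quick sanity check on the implementation.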
  • Xun Zhao, Ling Zhou, Weijia Zhang, Huazhen Lin
    Acta Mathematica Sinica. 2025, 41(2): 588-618. https://doi.org/10.1007/s10114-024-3310-z
    To learn the subgroup structure generated by multidimensional interaction, we propose a novel multiview subgroup integration technique based on tensor decomposition. Compared to the traditional subgroup analysis that can only handle single-view heterogeneity, our proposed method achieves a greater level of homogeneity within the subgroups, leading to enhanced interpretability and predictive power. For computational readiness of the proposed method, we build an algorithm that incorporates pairwise shrinkage-encouraging penalties and ADMM techniques. Theoretically, we establish the asymptotic consistency and normality of the proposed estimators. Extensive simulation studies and real data analysis demonstrate that our proposal outperforms other methods in terms of prediction accuracy and grouping consistency. In addition, the analysis based on the proposed method indicates that intergenerational care significantly increases the risk of chronic diseases associated with diet and fatigue in all provinces while only reducing the risk of emotion-related chronic diseases in the eastern coastal and central regions of China.
  • Yijin Zhang, Liuquan Sun
    Acta Mathematica Sinica. 2025, 41(2): 619-639. https://doi.org/10.1007/s10114-024-3381-x
    Doubly truncated data arise when the survival times of interest are observed only if they fall within certain random intervals. In this paper, we consider a semiparametric additive hazards model with doubly truncated data, and propose a weighted estimating equation approach to estimate the regression coefficients, where the weights are estimated both parametrically and nonparametrically. The asymptotic properties of the resulting estimators are established. Simulation studies demonstrate that the proposed estimators perform well in a finite sample. An application to Parkinson's disease data is provided.
  • Zhihuang Yang, Siming Zheng, Niansheng Tang
    Acta Mathematica Sinica. 2025, 41(2): 640-676. https://doi.org/10.1007/s10114-025-3335-y
    The single-index model offers greater modelling flexibility than generalized linear models while retaining the interpretability of the model to some extent. Although many standard approaches such as kernels or penalized/smoothing splines have been proposed to estimate the smooth link function, they cannot effectively approximate complicated unknown link functions together with the corresponding derivatives, due to their poor approximation ability for a finite sample size. To alleviate this problem, this paper proposes a semiparametric least squares estimation approach for a single-index model using deep neural networks activated by the rectifier quadratic unit (ReQU), called the deep semiparametric least squares (DSLS) estimation method. Under some regularity conditions, we show non-asymptotic properties of the proposed DSLS estimator, and show that the index coefficient estimator can achieve semiparametric efficiency. In particular, we obtain the consistency and the convergence rate of the proposed DSLS estimator when the response variable is conditionally sub-exponential. This is an attempt to incorporate deep learning techniques into semiparametrically efficient estimation in a single-index model. Several simulation studies and a real data example are conducted to illustrate the proposed DSLS estimator.
  • Junfeng Cui, Guanghui Wang, Fengyi Song, Xiaoyan Ma, Changliang Zou
    Acta Mathematica Sinica. 2025, 41(2): 677-702. https://doi.org/10.1007/s10114-025-3362-8
    We consider the problem of multi-task regression with time-varying low-rank patterns, where the collected data may be contaminated by heavy-tailed distributions and/or outliers. Our approach is based on a piecewise robust multi-task learning formulation, in which a robust loss function (not necessarily convex, but with a bounded derivative) is used, and each piecewise low-rank pattern is induced by a nuclear norm regularization term. We propose using the composite gradient descent algorithm to obtain stationary points within a data segment and employing the dynamic programming algorithm to determine the optimal segmentation. The theoretical properties of the detected number and time points of pattern shifts are studied under mild conditions. Numerical results confirm the effectiveness of our method.
  • Yu Zheng, Jin Zhu, Junxian Zhu, Xueqin Wang
    Acta Mathematica Sinica. 2025, 41(2): 703-732. https://doi.org/10.1007/s10114-025-3329-9
    Finding a highly interpretable nonlinear model has been an important yet challenging problem, and related research is relatively scarce in the current literature. To tackle this issue, we propose a new algorithm called Feat-ABESS based on a framework that utilizes feature transformation and selection for re-interpreting many machine learning algorithms. The core idea behind Feat-ABESS is to parameterize interpretable feature transformation within this framework and construct an objective function based on these parameters. This approach enables us to identify a proper interpretable feature transformation from the optimization perspective. By leveraging a recently advanced optimization technique, Feat-ABESS can obtain a concise and interpretable model. Moreover, Feat-ABESS can perform nonlinear variable selection. Our extensive experiments on 205 benchmark datasets and case studies on two datasets demonstrate that Feat-ABESS can achieve high prediction accuracy while maintaining a high level of interpretability. The comparison with existing nonlinear variable selection methods shows that Feat-ABESS has a higher true positive rate and a lower false discovery rate.
  • Yunzhi Jin, Yanqing Zhang
    Acta Mathematica Sinica. 2025, 41(2): 733-756. https://doi.org/10.1007/s10114-025-3390-4
    Quantile regression is widely used in variable relationship research for statistical learning. The traditional quantile regression model is based on vector-valued covariates and can be efficiently estimated via traditional estimation methods. However, many modern applications involve tensor data with an intrinsic tensor structure, which traditional quantile regression cannot handle well. To this end, we consider a tensor quantile regression with tensor-valued covariates and develop a novel variational Bayesian approach for estimation and prediction based on the asymmetric Laplace model and the CANDECOMP/PARAFAC decomposition of tensor coefficients. To incorporate the sparsity of tensor coefficients, we consider multiway shrinkage priors for the marginal factor vectors of tensor coefficients. The key idea of the proposed method is to efficiently combine the prior structural information of the tensor and utilize the matricization of the tensor decomposition to simplify the complexity of tensor coefficient estimation. The coordinate ascent algorithm is employed to optimize the variational lower bound. Simulation studies and a real example illustrate the numerical performance of the proposed method.
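For readers unfamiliar with why the asymmetric Laplace model appears in quantile regression: its negative log-density is, up to constants and scale, the check loss $\rho_\tau(u) = u\,(\tau - \mathbf{1}\{u < 0\})$, so maximizing that likelihood matches minimizing the check loss. The sketch below shows only this scalar building block on a made-up sample; it is not the tensor model or the variational algorithm from the abstract, and the function names are hypothetical.

```python
def check_loss(u, tau):
    """Quantile check loss: u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))

def empirical_quantile_by_check_loss(values, tau):
    """Return a data point minimizing the total check loss over constants.

    The total loss is convex and piecewise linear with kinks at the data
    points, so searching over the observed values is sufficient.
    """
    def total_loss(c):
        return sum(check_loss(v - c, tau) for v in values)
    return min(sorted(values), key=total_loss)

sample = [1.0, 2.0, 3.0, 4.0, 10.0]   # illustrative data, not from the paper
median = empirical_quantile_by_check_loss(sample, 0.5)
print(median)
```

Setting `tau` to 0.5 gives the median, which is unaffected by the outlying value 10.0 in this sample.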
  • Senyuan Zheng, Ling Zhou
    Acta Mathematica Sinica. 2025, 41(2): 757-779. https://doi.org/10.1007/s10114-025-3305-4
    With the advent of modern devices, such as smartphones and wearable devices, high-dimensional data are collected on many participants for a period of time or even in perpetuity. For this type of data, dependencies between and within data batches exist because data are collected from the same individual over time. Under the framework of streamed data, individual historical data are not available due to the storage and computation burden. It is urgent to develop computationally efficient methods with statistical guarantees to analyze high-dimensional streamed data and make reliable inferences in practice. In addition, the homogeneity assumption on the model parameters may not remain valid over time. To address the above issues, in this paper, we develop a new renewable debiased-lasso inference method for high-dimensional streamed data that allows dependencies between and within data batches and allows model parameters to change gradually. We establish the large sample properties of the proposed estimators, including consistency and asymptotic normality. The numerical results, including simulations and real data analysis, show the superior performance of the proposed method.
  • Haili Zhang, Alan T. K. Wan, Kang You, Guohua Zou
    Acta Mathematica Sinica. 2025, 41(2): 780-826. https://doi.org/10.1007/s10114-025-3409-x
    Ridge regression is an effective tool to handle multicollinearity in regressions. It is also an essential shrinkage and regularization method and is widely used in big data and distributed data applications. The divide and conquer trick, which combines the estimator in each subset with equal weight, is commonly applied in distributed data. To overcome multicollinearity and improve estimation accuracy in the presence of distributed data, we propose a Mallows-type model averaging method for ridge regressions, which combines estimators from all subsets. Our method is proved to be asymptotically optimal while allowing the number of subsets and the dimension of variables to diverge. The consistency of the resultant weight estimators for the theoretically optimal weights is also derived. Furthermore, the asymptotic normality of the model averaging estimator is demonstrated. Our simulation study and real data analysis show that the proposed model averaging method often performs better than commonly used model selection and model averaging methods in distributed data cases.
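The equal-weight divide and conquer baseline mentioned in this abstract can be sketched as below: fit a ridge estimator on each subset and average the results. This illustrates only the baseline, not the Mallows-type weighting the paper proposes; the function names, the penalty value, and the simulated data are assumptions.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge estimator: (X'X + lam * I)^{-1} X'y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def divide_and_conquer_ridge(X, y, n_subsets, lam):
    """Average the subset-wise ridge estimators with equal weights."""
    X_parts = np.array_split(X, n_subsets)
    y_parts = np.array_split(y, n_subsets)
    estimates = [ridge_fit(Xk, yk, lam) for Xk, yk in zip(X_parts, y_parts)]
    return np.mean(estimates, axis=0)

# Simulated data; coefficients, noise level and penalty are illustrative assumptions.
rng = np.random.default_rng(1)
beta = np.array([1.0, -2.0, 0.5])
X = rng.standard_normal((600, 3))
y = X @ beta + 0.1 * rng.standard_normal(600)

beta_dc = divide_and_conquer_ridge(X, y, n_subsets=4, lam=1.0)
print(beta_dc)
```

A model averaging approach would replace the plain mean in the last step of `divide_and_conquer_ridge` with data-driven weights.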