Sparse regression for related problems • sparselink

Estimating sparse regression models in multi-task learning and transfer learning through adaptive penalisation

Armin Rauschenberger\(~^{1,2,*}\) , Petr V. Nazarov\(~^{1,\dagger}\) , and Enrico Glaab\(~^{2,\dagger}\)

\(^1\)Bioinformatics and Artificial Intelligence, Department of Medical Informatics, Luxembourg Institute of Health (LIH), Strassen, Luxembourg.

\(^2\)Biomedical Data Science, Luxembourg Centre for Systems Biomedicine (LCSB), University of Luxembourg, Esch-sur-Alzette, Luxembourg.

\(^{*}\)To whom correspondence should be addressed.

\(^{\dagger}\)Petr V. Nazarov and Enrico Glaab share senior authorship.

Abstract

Here we propose a simple two-stage procedure for sharing information between related high-dimensional prediction or classification problems. In both stages, we perform sparse regression separately for each problem. While this is done without prior information in the first stage, we use the coefficients from the first stage as prior information for the second stage. Specifically, we designed feature-specific and sign-specific adaptive weights to share information on feature selection, effect directions and effect sizes between different problems. The proposed approach is applicable to multi-task learning as well as transfer learning. It provides sparse models (i.e., with few non-zero coefficients for each problem) that are easy to interpret. We show by simulation and application that it tends to select fewer features while achieving a similar predictive performance as compared to available methods. An implementation is available in the R package ‘sparselink’ (https://github.com/rauschenberger/sparselink).

Full text

Rauschenberger et al. (2025). “Estimating sparse regression models in multi-task learning and transfer learning through adaptive penalisation”. Under revision. (Available on ORBilu.)