Estimates sparse regression models (i.e., performs feature selection) in multi-task learning or transfer learning. Multi-task learning involves multiple targets; transfer learning involves multiple datasets.

Usage

sparselink(
  x,
  y,
  family,
  alpha.init = 0.95,
  alpha = 1,
  type = "exp",
  nfolds = 10,
  cands = NULL
)

Arguments

x

\(n \times p\) matrix (multi-task learning) or list of \(n_k \times p\) matrices (transfer learning)

y

\(n \times q\) matrix (multi-task learning) or list of \(n_k\)-dimensional vectors (transfer learning)

family

character "gaussian" or "binomial"

alpha.init

elastic net mixing parameter for initial regressions, default: 0.95 (lasso-like elastic net)

alpha

elastic net mixing parameter of final regressions, default: 1 (lasso)

type

default "exp" scales weights with \(w_{ext}^{v_{ext}}+w_{int}^{v_{int}}\) (see internal function construct_penfacs for details)

nfolds

number of internal cross-validation folds, default: 10 (10-fold cross-validation)

cands

candidate values for both scaling parameters, default: NULL ({0, 0.2, 0.4, 0.6, 0.8, 1})
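The arguments above can be combined in a single call. A minimal sketch, assuming a binomial outcome; the specific values chosen for alpha.init, nfolds, and cands are illustrative, not recommendations:

```r
set.seed(1)
n <- 100; p <- 20
x <- matrix(stats::rnorm(n*p), nrow=n, ncol=p)
y <- matrix(stats::rbinom(n*2, size=1, prob=0.5), nrow=n, ncol=2)
# multi-task learning with a binomial family, a coarser grid of
# candidate exponents, and 5-fold internal cross-validation
object <- sparselink(x=x, y=y, family="binomial",
                     alpha.init=0.9, alpha=1,
                     nfolds=5, cands=c(0, 0.5, 1))
```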

Value

Returns an object of class sparselink, a list with multiple slots:

  • Stage 1 regressions (before sharing information): Slot glm.one contains \(q\) objects of type cv.glmnet (one for each problem).

  • Candidate scaling parameters (exponents): Slot weight contains a data frame with \(n\) combinations of exponents for the external (source) and internal (target) weights.

  • Stage 2 regressions (after sharing information): Slot glm.two contains \(q\) lists (one for each problem) of \(n\) objects of type cv.glmnet (one for each combination of exponents).

  • Optimal regularisation parameters: Slot lambda.min contains the cross-validated regularisation parameters for the stage 2 regressions.

  • Optimal scaling parameters: Slots weight.ind and weight.min indicate or contain the cross-validated scaling parameters.
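Because the return value is a list, its slots can be inspected directly. A brief sketch, assuming an object fitted as in the Examples below; accessing slots with `$` follows from the list structure described above:

```r
# stage 1: one cv.glmnet fit per problem
length(object$glm.one)
# data frame of candidate exponent combinations
# for the external (source) and internal (target) weights
object$weight
# cross-validated scaling parameters
object$weight.min
```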

References

Armin Rauschenberger, Petr N. Nazarov, and Enrico Glaab (2025). "Estimating sparse regression models in multi-task learning and transfer learning through adaptive penalisation". Under revision. https://hdl.handle.net/10993/63425

See also

Use coef to extract coefficients and predict to make predictions.

Examples

#--- multi-task learning ---
n <- 100
p <- 200
q <- 3
family <- "gaussian"
x <- matrix(data=stats::rnorm(n*p),nrow=n,ncol=p)
y <- matrix(data=stats::rnorm(n*q),nrow=n,ncol=q)
object <- sparselink(x=x,y=y,family=family)
#> mode: multi-target learning, alpha.init=0.95 (elastic net), alpha=1 (lasso)
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold

#--- transfer learning ---
n <- c(100,50)
p <- 200
x <- lapply(X=n,function(nk) matrix(data=stats::rnorm(nk*p),nrow=nk,ncol=p))
y <- lapply(X=n,function(nk) stats::rnorm(nk))
family <- "gaussian"
object <- sparselink(x=x,y=y,family=family)
#> mode: transfer learning, alpha.init=0.95 (elastic net), alpha=1 (lasso)
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
#> Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per fold
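As noted under "See also", the fitted object works with coef and predict. A hedged sketch continuing the transfer-learning example above; the argument name newx is an assumption borrowed from glmnet-style predict methods:

```r
# extract the estimated coefficients, one set per problem
coefs <- coef(object)
# predictions for new data supplied in the same format as x
# (here the training data, for illustration only)
y_hat <- predict(object, newx=x)
```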