select_coglasso()
selects the best combination of hyperparameters given to
coglasso()
according to the chosen model selection method. The three
available options for the argument method
are "xstars",
"xestars" and "ebic".
Usage
select_coglasso(
coglasso_obj,
method = "xestars",
stars_thresh = 0.1,
stars_subsample_ratio = NULL,
rep_num = 20,
max_iter = 10,
old_sampling = FALSE,
light = TRUE,
ebic_gamma = 0.5,
verbose = TRUE
)
Arguments
- coglasso_obj
The object of S3 class coglasso returned by coglasso().
- method
The model selection method used to select the best combination of hyperparameters. The available options are "xstars", "xestars" and "ebic". Defaults to "xestars".
- stars_thresh
The threshold set for the variability of the explored networks at each iteration of the algorithm. The \(\lambda_w\) or the \(\lambda_b\) associated with the most stable network before the threshold is exceeded is selected.
- stars_subsample_ratio
The proportion of samples in the multi-omics data set to be randomly subsampled to estimate the variability of the network under the given hyperparameters setting. Defaults to 80% when the number of samples is smaller than 144, otherwise it defaults to \(\frac{10}{n}\sqrt{n}\).
- rep_num
The number of subsamples of the multi-omics data set used to estimate the variability of the network under the given hyperparameters setting. Defaults to 20.
- max_iter
The greatest number of times the algorithm is allowed to choose a new best \(\lambda_w\). Defaults to 10.
- old_sampling
Perform the same subsampling xstars() would if set to TRUE. Makes a difference with bigger data sets, where computing a correlation matrix could take significantly longer. Defaults to FALSE.
- light
Do not store the "merged" matrices recording average variability of each edge, making the algorithm more memory efficient, if set to TRUE. Defaults to TRUE.
- ebic_gamma
The \(\gamma\) tuning parameter for eBIC selection, to set between 0 and 1. When set to 0 one has the standard BIC. Defaults to 0.5.
- verbose
Print information regarding the progress of the selection procedure on the console. Defaults to TRUE.
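The default described for stars_subsample_ratio can be written out as a small rule. The following is a minimal sketch in base R; default_subsample_ratio() is a hypothetical helper introduced here for illustration and is not part of coglasso, whose internal code may differ.

```r
# Hypothetical helper (not a coglasso function): reproduces the documented
# default for stars_subsample_ratio given the number of samples n.
default_subsample_ratio <- function(n) {
  if (n < 144) {
    0.8                  # 80% of the samples for smaller data sets
  } else {
    10 * sqrt(n) / n     # (10 / n) * sqrt(n), which shrinks as n grows
  }
}

default_subsample_ratio(100)  # 0.8
default_subsample_ratio(400)  # 10 * 20 / 400 = 0.5
```

Passing an explicit value to stars_subsample_ratio overrides this default.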
Value
select_coglasso()
returns an object of S3 class select_coglasso containing the results of the
selection procedure, built upon an object of S3 class coglasso. Some
output elements depend on the chosen model selection method.
These elements are returned by all methods:
- ... are the same elements returned by coglasso().
- sel_index_c, sel_index_lw and sel_index_lb are the indexes of the final selected parameters \(c\), \(\lambda_w\) and \(\lambda_b\) leading to the most stable sparse network.
- sel_c, sel_lambda_w and sel_lambda_b are the final selected parameters \(c\), \(\lambda_w\) and \(\lambda_b\) leading to the most stable sparse network.
- sel_adj is the adjacency matrix of the final selected network.
- sel_density is the density of the final selected network.
- sel_icov is the inverse covariance matrix of the final selected network.
- call is the matched call.
- method is the chosen model selection method.
These are the additional elements returned when choosing "xestars":
- opt_adj is a list of the adjacency matrices finally selected for each \(c\) parameter explored.
- opt_variability is a numeric vector containing the variabilities associated with the adjacency matrices in opt_adj.
- opt_index_lw and opt_index_lb are integer vectors containing the index of the selected \(\lambda_w\) (or \(\lambda_b\)) for each \(c\) parameter explored.
- opt_lambda_w and opt_lambda_b are vectors containing the selected \(\lambda_w\) (or \(\lambda_b\)) for each \(c\) parameter explored.
- merge_lw and merge_lb are returned only if light is set to FALSE. They are lists with as many elements as the number of \(c\) parameters explored. Every element is a "merged" adjacency matrix: the average of all the adjacency matrices estimated for that specific \(c\) and the selected \(\lambda_w\) (or \(\lambda_b\)) value across all the subsamples in the last path explored before convergence, that is, the path in which the final combination of \(\lambda_w\) and \(\lambda_b\) is selected for the given \(c\) value.
These are the additional elements returned when choosing "xstars":
- merge_lw and merge_lb are lists with as many elements as the number of \(c\) parameters explored. Every element is in turn a list of as many matrices as the number of \(\lambda_w\) (or \(\lambda_b\)) values explored. Each matrix is the "merged" adjacency matrix: the average of all the adjacency matrices estimated for those specific \(c\) and \(\lambda_w\) (or \(\lambda_b\)) values across all the subsamples in the last path explored before convergence, that is, the path in which the final combination of \(\lambda_w\) and \(\lambda_b\) is selected for the given \(c\) value.
- variability_lw and variability_lb are lists with as many elements as the number of \(c\) parameters explored. Every element is a numeric vector of as many items as the number of \(\lambda_w\) (or \(\lambda_b\)) values explored. Each item is the variability of the network estimated for those specific \(c\) and \(\lambda_w\) (or \(\lambda_b\)) values in the last path explored before convergence, that is, the path in which the final combination of \(\lambda_w\) and \(\lambda_b\) is selected for the given \(c\) value.
- opt_adj is a list of the adjacency matrices finally selected for each \(c\) parameter explored.
- opt_variability is a numeric vector containing the variabilities associated with the adjacency matrices in opt_adj.
- opt_index_lw and opt_index_lb are integer vectors containing the index of the selected \(\lambda_w\) (or \(\lambda_b\)) for each \(c\) parameter explored.
- opt_lambda_w and opt_lambda_b are vectors containing the selected \(\lambda_w\) (or \(\lambda_b\)) for each \(c\) parameter explored.
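To make the link between a "merged" adjacency matrix and a variability value concrete, here is a minimal sketch in base R. It follows the instability definition of the original StARS method, where an edge observed with frequency \(\theta\) across subsamples contributes \(2\theta(1-\theta)\), averaged over all possible edges. stars_variability() is a hypothetical helper, not a coglasso function, and the scaling used internally by coglasso may differ from this one.

```r
# Hypothetical helper (not part of coglasso): total instability of a network
# from its "merged" matrix, following the original StARS definition.
stars_variability <- function(merged) {
  theta <- merged[upper.tri(merged)]  # edge frequencies, one per node pair
  xi <- 2 * theta * (1 - theta)       # per-edge instability
  mean(xi)                            # average over all choose(p, 2) pairs
}

# Toy merged matrix for 3 nodes: edge 1-2 always present, edge 1-3 present
# in half of the subsamples, edge 2-3 never present.
merged <- matrix(0, 3, 3)
merged[1, 2] <- merged[2, 1] <- 1
merged[1, 3] <- merged[3, 1] <- 0.5

stars_variability(merged)  # (0 + 0.5 + 0) / 3 = 1/6
```

Under this definition, edges that are always or never selected contribute nothing, so only edges that flip between subsamples push the variability toward stars_thresh.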
These are the additional elements returned when choosing "ebic":
- ebic_scores is a numeric vector containing the eBIC scores for all the hyperparameter combinations.
Details
select_coglasso()
provides three model selection strategies:
- "xstars" uses eXtended StARS (XStARS), selecting the most stable, yet sparse network. Stability is computed upon network estimation from multiple subsamples of the multi-omics data set, allowing repetition. Subsamples are collected a fixed number of times (rep_num), each with a fixed proportion of the total number of samples (stars_subsample_ratio). See xstars() for more information on the methodology.
- "xestars" uses eXtended Efficient StARS (XEStARS), a significantly faster and more memory-efficient version of XStARS. It could produce marginally different results from "xstars" due to a different sampling strategy. See xestars() for more information on the methodology.
- "ebic" uses the extended Bayesian Information Criterion (eBIC), selecting the network that minimizes it. ebic_gamma sets the weight given to the extended component, turning the model selection method into the standard BIC if set to 0.
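For orientation, the eBIC for Gaussian graphical models is commonly written as below. This is the general form from the literature, not a statement of coglasso's exact implementation; the symbols are assumptions introduced here: \(\ell_n(\hat{\Theta})\) is the maximized log-likelihood of the estimated network, \(|E|\) its number of edges, \(n\) the number of samples and \(p\) the number of variables.

\[
\mathrm{eBIC}_{\gamma} = -2\,\ell_n(\hat{\Theta}) + |E|\,\log n + 4\,\gamma\,|E|\,\log p
\]

Setting \(\gamma = 0\) removes the last term and leaves the standard BIC, while larger values of \(\gamma\) penalize dense networks more heavily.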
Examples
cg <- coglasso(multi_omics_sd_micro, p = c(4, 2), nlambda_w = 3,
               nlambda_b = 3, nc = 3, verbose = FALSE)
# Using eXtended Efficient StARS, takes less than five seconds
sel_cg_xestars <- select_coglasso(cg, method = "xestars", verbose = FALSE)
# \donttest{
# Using eXtended StARS, takes around a minute
sel_cg_xstars <- select_coglasso(cg, method = "xstars", verbose = FALSE)
# }
# Using eBIC
sel_cg_ebic <- select_coglasso(cg, method = "ebic", verbose = FALSE)