bs() wraps the two main functions of the package in a single one: coglasso(), to build multiple multi-omics networks, and select_coglasso(), to select the best one according to the chosen criterion.
Usage
bs(
  data,
  p = NULL,
  pX = lifecycle::deprecated(),
  lambda_w = NULL,
  lambda_b = NULL,
  c = NULL,
  nlambda_w = NULL,
  nlambda_b = NULL,
  nc = NULL,
  lambda_w_max = NULL,
  lambda_b_max = NULL,
  c_max = NULL,
  lambda_w_min_ratio = NULL,
  lambda_b_min_ratio = NULL,
  c_min_ratio = NULL,
  icov_guess = NULL,
  cov_output = FALSE,
  lock_lambdas = FALSE,
  method = "xestars",
  stars_thresh = 0.1,
  stars_subsample_ratio = NULL,
  rep_num = 20,
  max_iter = 10,
  old_sampling = FALSE,
  light = TRUE,
  ebic_gamma = 0.5,
  verbose = TRUE
)
Arguments
- data
The input multi-omics data set. Rows should be samples, columns should be variables. Variables should be grouped by their assay (e.g. transcripts first, then metabolites).
data is a required parameter.
- p
A vector with the number of variables for each omic layer of the data set (e.g. the number of transcripts, metabolites etc.), in the same order the layers have in the data set. If given a single number, coglasso() assumes that the data set contains two omic layers and that the number given is the dimension of the first one.
- pX
Deprecated.
- lambda_w
A vector of values for the parameter \(\lambda_w\), the penalization parameter for the "within" interactions. Overrides nlambda_w.
- lambda_b
A vector of values for the parameter \(\lambda_b\), the penalization parameter for the "between" interactions. Overrides nlambda_b.
- c
A vector of values for the parameter \(c\), the weight given to collaboration. Overrides nc.
- nlambda_w
The number of requested \(\lambda_w\) parameters to explore. A sequence of size nlambda_w of \(\lambda_w\) parameters will be generated. Defaults to 8. Ignored when lambda_w is set by the user.
- nlambda_b
The number of requested \(\lambda_b\) parameters to explore. A sequence of size nlambda_b of \(\lambda_b\) parameters will be generated. Defaults to 8. Ignored when lambda_b is set by the user.
- nc
The number of requested \(c\) parameters to explore. A sequence of size nc of \(c\) parameters will be generated. Defaults to 8. Ignored when c is set by the user.
- lambda_w_max
The greatest generated \(\lambda_w\). By default it is computed with a data-driven approach. Ignored when lambda_w is set by the user.
- lambda_b_max
The greatest generated \(\lambda_b\). By default it is computed with a data-driven approach. Ignored when lambda_b is set by the user.
- c_max
The greatest generated \(c\). Defaults to 10. Ignored when c is set by the user.
- lambda_w_min_ratio
The ratio of the smallest generated \(\lambda_w\) over the greatest generated \(\lambda_w\). Defaults to 0.1. Ignored when lambda_w is set by the user.
- lambda_b_min_ratio
The ratio of the smallest generated \(\lambda_b\) over the greatest generated \(\lambda_b\). Defaults to 0.1. Ignored when lambda_b is set by the user.
- c_min_ratio
The ratio of the smallest generated \(c\) over the greatest generated \(c\). Defaults to 0.1. Ignored when c is set by the user.
- icov_guess
Use a predetermined inverse covariance matrix as an initial guess for the network estimation.
- cov_output
Add the estimated variance-covariance matrix to the output.
- lock_lambdas
Set \(\lambda_w = \lambda_b\). Force a single lambda parameter for both "within" and "between" interactions.
- method
The model selection method to select the best combination of hyperparameters. The available options are "xstars", "xestars" and "ebic". Defaults to "xestars".
- stars_thresh
The threshold set for the variability of the explored networks at each iteration of the algorithm. The \(\lambda_w\) or the \(\lambda_b\) associated with the most stable network before the threshold is exceeded is selected. Defaults to 0.1.
- stars_subsample_ratio
The proportion of samples in the multi-omics data set to be randomly subsampled to estimate the variability of the network under the given hyperparameter setting. Defaults to 80% when the number of samples is smaller than 144, otherwise it defaults to \(\frac{10}{n}\sqrt{n}\).
- rep_num
The number of subsamples of the multi-omics data set used to estimate the variability of the network under the given hyperparameter setting. Defaults to 20.
- max_iter
The greatest number of times the algorithm is allowed to choose a new best \(\lambda_w\). Defaults to 10.
- old_sampling
If set to TRUE, perform the same subsampling that xstars() would. Makes a difference with bigger data sets, where computing a correlation matrix could take significantly longer. Defaults to FALSE.
- light
If set to TRUE, do not store the "merged" matrices recording the average variability of each edge, making the algorithm more memory efficient. Defaults to TRUE.
- ebic_gamma
The \(\gamma\) tuning parameter for eBIC selection, to be set between 0 and 1. When set to 0, the criterion reduces to the standard BIC. Defaults to 0.5.
- verbose
Print information regarding the network building and the network selection processes.
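As a worked illustration of some of the defaults listed above, the following base R sketch restates the arithmetic behind stars_subsample_ratio, the smallest generated \(c\), and the size of the default hyperparameter grid. The helpers default_subsample_ratio() and smallest_generated() are hypothetical names introduced here and are not part of the package; the grid size assumes that one network is estimated for every combination of the three hyperparameters.

```r
# Hypothetical helper (not part of coglasso): default subsampling proportion
# as described for `stars_subsample_ratio`, given the number of samples n.
default_subsample_ratio <- function(n) {
  if (n < 144) 0.8 else 10 * sqrt(n) / n
}

# Hypothetical helper: smallest value of a generated sequence, implied by its
# maximum and its min ratio (for example c_max and c_min_ratio).
smallest_generated <- function(max_value, min_ratio = 0.1) {
  max_value * min_ratio
}

default_subsample_ratio(100)  # 0.8, fewer than 144 samples
default_subsample_ratio(400)  # 10 * sqrt(400) / 400 = 0.5
smallest_generated(10)        # 1, with the defaults c_max = 10 and c_min_ratio = 0.1

# Default grid sizes nlambda_w = nlambda_b = nc = 8, one network per combination
n_combinations <- 8 * 8 * 8   # 512
```

The grid size grows multiplicatively, which is why the example at the end of this page lowers the three counts to 3 (27 combinations).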
Value
bs() returns an object of S3 class select_coglasso containing several elements. The most important is probably sel_adj, the adjacency matrix of the selected network. Some output elements depend on the chosen model selection method.
These elements are always returned, and they are the result of network estimation with coglasso():
- loglik is a numerical vector containing the log-likelihoods of all the estimated networks.
- density is a numerical vector containing a measure of the density of all the estimated networks.
- df is an integer vector containing the degrees of freedom of all the estimated networks.
- convergence is a binary vector indicating whether a network was successfully estimated for the given combination of hyperparameters.
- path is a list containing the adjacency matrices of all the estimated networks.
- icov is a list containing the inverse covariance matrices of all the estimated networks.
- nexploded is the number of combinations of hyperparameters for which coglasso() failed to converge.
- data is the input multi-omics data set.
- hpars is the ordered table of all the combinations of hyperparameters given as input to bs(), with \(\alpha(\lambda_w+\lambda_b)\) being the key to sort rows.
- lambda_w, lambda_b, and c are numerical vectors with, respectively, all the \(\lambda_w\), \(\lambda_b\), and \(c\) values bs() used.
- p is the vector with the number of variables for each omic layer of the data set.
- D is the number of omics layers in the data set.
- cov, optional and returned only when cov_output is TRUE, is a list containing the variance-covariance matrices of all the estimated networks.
These elements are returned by all selection methods available:
- sel_index_c, sel_index_lw and sel_index_lb are the indexes of the final selected parameters \(c\), \(\lambda_w\) and \(\lambda_b\) leading to the most stable sparse network.
- sel_c, sel_lambda_w and sel_lambda_b are the final selected parameters \(c\), \(\lambda_w\) and \(\lambda_b\) leading to the most stable sparse network.
- sel_adj is the adjacency matrix of the final selected network.
- sel_density is the density of the final selected network.
- sel_icov is the inverse covariance matrix of the final selected network.
- call is the matched call.
- method is the chosen model selection method.
These are the additional elements returned when choosing "xestars":
- opt_adj is a list of the adjacency matrices finally selected for each \(c\) parameter explored.
- opt_variability is a numerical vector containing the variabilities associated with the adjacency matrices in opt_adj.
- opt_index_lw and opt_index_lb are integer vectors containing the index of the selected \(\lambda_w\)s (or \(\lambda_b\)s) for each \(c\) parameter explored.
- opt_lambda_w and opt_lambda_b are vectors containing the selected \(\lambda_w\)s (or \(\lambda_b\)s) for each \(c\) parameter explored.
- merge_lw and merge_lb are returned only if light is set to FALSE. They are lists with as many elements as the number of \(c\) parameters explored. Every element is a "merged" adjacency matrix, the average of all the adjacency matrices estimated for those specific \(c\) and the selected \(\lambda_w\) (or \(\lambda_b\)) values across all the subsamples in the last path explored before convergence, that is, the path in which the final combination of \(\lambda_w\) and \(\lambda_b\) is selected for the given \(c\) value.
These are the additional elements returned when choosing "xstars":
- merge_lw and merge_lb are lists with as many elements as the number of \(c\) parameters explored. Every element is in turn a list of as many matrices as the number of \(\lambda_w\) (or \(\lambda_b\)) values explored. Each matrix is the "merged" adjacency matrix, the average of all the adjacency matrices estimated for those specific \(c\) and \(\lambda_w\) (or \(\lambda_b\)) values across all the subsamples in the last path explored before convergence, that is, the path in which the final combination of \(\lambda_w\) and \(\lambda_b\) is selected for the given \(c\) value.
- variability_lw and variability_lb are lists with as many elements as the number of \(c\) parameters explored. Every element is a numeric vector of as many items as the number of \(\lambda_w\) (or \(\lambda_b\)) values explored. Each item is the variability of the network estimated for those specific \(c\) and \(\lambda_w\) (or \(\lambda_b\)) values in the last path explored before convergence, that is, the path in which the final combination of \(\lambda_w\) and \(\lambda_b\) is selected for the given \(c\) value.
- opt_adj is a list of the adjacency matrices finally selected for each \(c\) parameter explored.
- opt_variability is a numerical vector containing the variabilities associated with the adjacency matrices in opt_adj.
- opt_index_lw and opt_index_lb are integer vectors containing the index of the selected \(\lambda_w\)s (or \(\lambda_b\)s) for each \(c\) parameter explored.
- opt_lambda_w and opt_lambda_b are vectors containing the selected \(\lambda_w\)s (or \(\lambda_b\)s) for each \(c\) parameter explored.
These are the additional elements returned when choosing "ebic":
- ebic_scores is a numerical vector containing the eBIC scores for all the hyperparameter combinations.
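To make the structure of the returned object more concrete, here is a minimal sketch of how some of the elements listed above might be inspected. It assumes that the coglasso package is installed and reuses the multi_omics_sd_micro data set and the call from the Examples section; the element names are the ones documented above, and no particular output values are implied.

```r
library(coglasso)

# Same call as in the Examples section (default method "xestars")
sel_mo_net <- bs(multi_omics_sd_micro, p = c(4, 2), nlambda_w = 3,
                 nlambda_b = 3, nc = 3, verbose = FALSE)

# Elements returned by every selection method
sel_mo_net$method                     # chosen model selection method
sel_mo_net$sel_adj                    # adjacency matrix of the selected network
c(c = sel_mo_net$sel_c,               # selected hyperparameters
  lambda_w = sel_mo_net$sel_lambda_w,
  lambda_b = sel_mo_net$sel_lambda_b)

# Elements produced by the network estimation step
sel_mo_net$nexploded                  # combinations that failed to converge
head(sel_mo_net$hpars)                # first rows of the hyperparameter table
```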
Details
When using bs(), first, coglasso() estimates multiple multi-omics networks with the collaborative graphical lasso algorithm, one for each combination of input values for the hyperparameters \(\lambda_w\), \(\lambda_b\) and \(c\). Then, select_coglasso() selects the best combination of hyperparameters given to coglasso() according to the selected model selection method. The three available options that can be set for the argument method are "xstars", "xestars" and "ebic". For more information on these selection methods, visit the help page of select_coglasso().
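The two steps described above can also be run separately. The sketch below is an illustration under assumptions rather than a definitive recipe: it assumes that the coglasso package is installed, that coglasso() accepts the same data, p, nlambda_w, nlambda_b, nc and verbose arguments as bs(), and that select_coglasso() takes the coglasso() result as its first argument together with method and verbose. The "ebic" method is used here because it does not rely on random subsampling.

```r
library(coglasso)

# Step 1: estimate one network per hyperparameter combination
# (argument names assumed to mirror those of bs())
cg <- coglasso(multi_omics_sd_micro, p = c(4, 2), nlambda_w = 3,
               nlambda_b = 3, nc = 3, verbose = FALSE)

# Step 2: select the best combination with the chosen criterion
sel_two_steps <- select_coglasso(cg, method = "ebic", verbose = FALSE)

# The single call that wraps both steps
sel_one_step <- bs(multi_omics_sd_micro, p = c(4, 2), nlambda_w = 3,
                   nlambda_b = 3, nc = 3, method = "ebic", verbose = FALSE)

# Both objects carry the documented selection elements
sel_two_steps$sel_adj
sel_one_step$sel_adj
```

Keeping the steps separate can be convenient when the same set of estimated networks is to be compared under more than one selection method, since the estimation step does not need to be repeated.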
Examples
# Suggested usage: give the input data set, set the values for `p` and the
# number of hyperparameters to explore (to choose how extensively to explore
# the possible hyperparameters). Then, let the default behavior do the rest:
sel_mo_net <- bs(multi_omics_sd_micro, p = c(4, 2), nlambda_w = 3,
                 nlambda_b = 3, nc = 3, verbose = FALSE)
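As a further illustration, the selection criterion can be switched through the method argument. The sketch below assumes that the coglasso package is installed and uses only arguments and output elements documented on this page; the object names sel_ebic and sel_bic are chosen here for illustration.

```r
library(coglasso)

# eBIC-based selection with the default gamma of 0.5
sel_ebic <- bs(multi_omics_sd_micro, p = c(4, 2), nlambda_w = 3,
               nlambda_b = 3, nc = 3, method = "ebic", verbose = FALSE)

# Setting ebic_gamma to 0 gives the standard BIC
sel_bic <- bs(multi_omics_sd_micro, p = c(4, 2), nlambda_w = 3,
              nlambda_b = 3, nc = 3, method = "ebic", ebic_gamma = 0,
              verbose = FALSE)

# Scores for all the explored hyperparameter combinations
sel_ebic$ebic_scores
sel_bic$ebic_scores

# Selected hyperparameters under each criterion
c(sel_ebic$sel_c, sel_ebic$sel_lambda_w, sel_ebic$sel_lambda_b)
c(sel_bic$sel_c, sel_bic$sel_lambda_w, sel_bic$sel_lambda_b)
```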