Title: | Unsupervised Multi-Task and Transfer Learning on Gaussian Mixture Models |
Description: | Unsupervised learning has been widely used in many real-world applications. One of the simplest and most important unsupervised learning models is the Gaussian mixture model (GMM). In this work, we study the multi-task learning problem on GMMs, which aims to leverage potentially similar GMM parameter structures among tasks to obtain improved learning performance compared to single-task learning. We propose a multi-task GMM learning procedure based on the Expectation-Maximization (EM) algorithm that not only effectively utilizes the unknown similarity between related tasks but is also robust against a fraction of outlier tasks from arbitrary sources. The proposed procedure is shown to achieve the minimax optimal rate of convergence for both the parameter estimation error and the excess mis-clustering error, in a wide range of regimes. Moreover, we generalize our approach to tackle the problem of transfer learning for GMMs, where similar theoretical results are derived. Finally, we demonstrate the effectiveness of our methods through simulations and a real data analysis. To the best of our knowledge, this is the first work studying multi-task and transfer learning on GMMs with theoretical guarantees. This package implements the algorithms proposed in Tian, Y., Weng, H., & Feng, Y. (2022) <arXiv:2209.15224>. |
Authors: | Ye Tian [aut, cre], Haolei Weng [aut], Yang Feng [aut] |
Maintainer: | Ye Tian <[email protected]> |
License: | GPL-2 |
Version: | 0.1.0 |
Built: | 2025-02-24 05:49:55 UTC |
Source: | https://github.com/cran/mtlgmm |
Align the initializations. This function implements the two alignment algorithms (Algorithms 2 and 3) in Tian, Y., Weng, H., & Feng, Y. (2022). It is mainly intended for users who want to align the single-task initializations manually; the alignment procedure is already applied automatically inside mtlgmm and tlgmm, so there is no need to call this function when fitting MTL-GMM or TL-GMM.
alignment(mu1, mu2, method = c("exhaustive", "greedy"))
mu1 |
the initializations for mu1 of all tasks. Should be a matrix of which each column is a mu1 estimate of a task. |
mu2 |
the initializations for mu2 of all tasks. Should be a matrix of which each column is a mu2 estimate of a task. |
method |
alignment method. Can be either "exhaustive" (Algorithm 2 in Tian, Y., Weng, H., & Feng, Y. (2022)) or "greedy" (Algorithm 3 in Tian, Y., Weng, H., & Feng, Y. (2022)). Default: "exhaustive" |
the indices of the two clusters to be well-aligned, i.e., the "r_k" in Section 2.4.2 of Tian, Y., Weng, H., & Feng, Y. (2022). The output can be passed to function alignment_swap to obtain the well-aligned initializations.
For examples, see the "fit single-task GMMs" part of the examples in function mtlgmm.
Tian, Y., Weng, H., & Feng, Y. (2022). Unsupervised Multi-task and Transfer Learning on Gaussian Mixture Models. arXiv preprint arXiv:2209.15224.
mtlgmm, tlgmm, predict_gmm, data_generation, initialize, alignment_swap, estimation_error, misclustering_error.
alignment_swap
Complete the alignment of initializations based on the output of function alignment. This function is mainly intended for users who want to align the single-task initializations manually; the alignment procedure is already applied automatically inside mtlgmm and tlgmm, so there is no need to call this function when fitting MTL-GMM or TL-GMM.
alignment_swap(L1, L2, initial_value_list)
L1 |
the component "L1" of the output from function |
L2 |
the component "L2" of the output from function |
initial_value_list |
the output from function initialize. |
A list with the following components (well-aligned).
w |
the estimate of mixture proportion in GMMs for each task. Will be a vector. |
mu1 |
the estimate of Gaussian mean in the first cluster of GMMs for each task. Will be a matrix, where each column represents the estimate for a task. |
mu2 |
the estimate of Gaussian mean in the second cluster of GMMs for each task. Will be a matrix, where each column represents the estimate for a task. |
beta |
the estimate of the discriminant coefficient for each task. Will be a matrix, where each column represents the estimate for a task. |
Sigma |
the estimate of the common covariance matrix for each task. Will be a list, where each component represents the estimate for a task. |
For examples, see the "fit single-task GMMs" part of the examples in function mtlgmm.
Tian, Y., Weng, H., & Feng, Y. (2022). Unsupervised Multi-task and Transfer Learning on Gaussian Mixture Models. arXiv preprint arXiv:2209.15224.
mtlgmm, tlgmm, predict_gmm, data_generation, initialize, alignment, estimation_error, misclustering_error.
Generate data for simulations. All models used in Tian, Y., Weng, H., & Feng, Y. (2022) are implemented.
data_generation(K = 10, outlier_K = 1, simulation_no = c("MTL-1", "MTL-2"), h_w = 0.1, h_mu = 1, n = 50)
K |
the number of tasks (data sets). Default: 10 |
outlier_K |
the number of outlier tasks. Default: 1 |
simulation_no |
simulation number in Tian, Y., Weng, H., & Feng, Y. (2022). Can be "MTL-1" or "MTL-2". Default: "MTL-1". |
h_w |
the value of h_w. Default: 0.1 |
h_mu |
the value of h_mu. Default: 1 |
n |
the sample size of each task. Can be either a positive integer or a vector of length K. Default: 50 |
a list of two sub-lists "data" and "parameter". List "data" contains a list of design matrices x, a list of hidden labels y, and a vector of outlier task indices outlier_index. List "parameter" contains a vector w of mixture proportions, a matrix mu1 whose columns are the first-cluster GMM means of the tasks, a matrix mu2 whose columns are the second-cluster GMM means, a matrix beta whose columns are the discriminant coefficients, and a list Sigma of covariance matrices, one per task.
Tian, Y., Weng, H., & Feng, Y. (2022). Unsupervised Multi-task and Transfer Learning on Gaussian Mixture Models. arXiv preprint arXiv:2209.15224.
mtlgmm, tlgmm, predict_gmm, initialize, alignment, alignment_swap, estimation_error, misclustering_error.
data_list <- data_generation(K = 5, outlier_K = 1, simulation_no = "MTL-1", h_w = 0.1, h_mu = 1, n = 50)
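The returned object can be inspected directly; a small sketch reusing data_list from above:
length(data_list$data$x)                 # number of tasks K
dim(data_list$data$x[[1]])               # design matrix of task 1
data_list$data$outlier_index             # indices of the outlier tasks
str(data_list$parameter, max.level = 1)  # true GMM parameters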
Calculate the estimation error of GMM parameters under the MTL setting (the worst performance among all tasks). Euclidean norms are used.
estimation_error(estimated_value, true_value, parameter = c("w", "mu", "beta", "Sigma"))
estimated_value |
estimate of GMM parameters. The form of the input depends on the parameter argument: a vector for "w", a list of two matrices (the mu1 and mu2 estimates) for "mu", a matrix for "beta", and a list of covariance matrices for "Sigma". |
true_value |
true values of GMM parameters, in the same form as estimated_value. |
parameter |
which parameter to calculate the estimation error for. Can be "w", "mu", "beta", or "Sigma". |
the largest estimation error among all tasks.
For examples, see the examples in function mtlgmm.
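A minimal sketch, reusing data_list and fit from the mtlgmm example, computing the worst-case error of the mixture-proportion estimates over the non-outlier tasks:
ok <- setdiff(seq_along(data_list$data$x), data_list$data$outlier_index)  # non-outlier tasks
estimation_error(fit$w[ok], data_list$parameter$w[ok], parameter = "w")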
Tian, Y., Weng, H., & Feng, Y. (2022). Unsupervised Multi-task and Transfer Learning on Gaussian Mixture Models. arXiv preprint arXiv:2209.15224.
mtlgmm, tlgmm, predict_gmm, data_generation, initialize, alignment, alignment_swap, misclustering_error.
Initialize the estimators of GMM parameters on each task.
initialize(x, method = c("kmeans", "EM"))
x |
design matrices from multiple data sets. Should be a list, of which each component is the design matrix of one data set. |
method |
initialization method. This indicates the method to initialize the estimates of GMM parameters for each data set. Can be either "EM" or "kmeans". Default: "EM". |
A list with the following components.
w |
the estimate of mixture proportion in GMMs for each task. Will be a vector. |
mu1 |
the estimate of Gaussian mean in the first cluster of GMMs for each task. Will be a matrix, where each column represents the estimate for a task. |
mu2 |
the estimate of Gaussian mean in the second cluster of GMMs for each task. Will be a matrix, where each column represents the estimate for a task. |
beta |
the estimate of the discriminant coefficient for each task. Will be a matrix, where each column represents the estimate for a task. |
Sigma |
the estimate of the common covariance matrix for each task. Will be a list, where each component represents the estimate for a task. |
mtlgmm, tlgmm, predict_gmm, data_generation, alignment, alignment_swap, estimation_error, misclustering_error.
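For intuition, a sketch of one common k-means recipe for initializing a single task's GMM parameters; the helper init_one_task below is hypothetical, and the package's initialize may differ in details:
init_one_task <- function(x) {
  km <- kmeans(x, centers = 2, nstart = 10)
  w <- mean(km$cluster == 1)                     # mixture proportion of cluster 1
  mu1 <- colMeans(x[km$cluster == 1, , drop = FALSE])
  mu2 <- colMeans(x[km$cluster == 2, , drop = FALSE])
  centered <- x - rbind(mu1, mu2)[km$cluster, ]  # within-cluster residuals
  Sigma <- crossprod(centered) / (nrow(x) - 2)   # pooled covariance estimate
  beta <- solve(Sigma, mu1 - mu2)                # discriminant coefficient
  list(w = w, mu1 = mu1, mu2 = mu2, beta = beta, Sigma = Sigma)
}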
set.seed(0, kind = "L'Ecuyer-CMRG")
## Consider a 5-task multi-task learning problem in the setting "MTL-1"
data_list <- data_generation(K = 5, outlier_K = 1, simulation_no = "MTL-1",
                             h_w = 0.1, h_mu = 1, n = 50)  # generate the data
fit <- mtlgmm(x = data_list$data$x, C1_w = 0.05, C1_mu = 0.2, C1_beta = 0.2,
              C2_w = 0.05, C2_mu = 0.2, C2_beta = 0.2, kappa = 1/3,
              initial_method = "EM", trim = 0.1, lambda_choice = "fixed",
              step_size = "lipschitz")
## Initialize the estimators of GMM parameters on each task.
fitted_values_EM <- initialize(data_list$data$x, "EM")          # initialize the estimates by the single-task EM algorithm
fitted_values_kmeans <- initialize(data_list$data$x, "kmeans")  # initialize the estimates by single-task k-means
Calculate the misclustering error given the predicted cluster labels.
misclustering_error(y_pred, y_test, type = c("max", "all", "avg"))
y_pred |
predicted cluster labels. Should be a list, of which each component is the vector of predicted labels for one task. |
y_test |
true cluster labels. Should be a list, of which each component is the vector of true labels for one task. |
type |
which type of the misclustering error rate to return. Can be "max", "all", or "avg". Default: "max". |
Depends on type: "max" returns the largest misclustering error rate among tasks, "avg" returns the average across tasks, and "all" returns the error rate of each task.
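A sketch of one common convention for the per-task error of binary cluster labels, minimizing over the two possible label assignments (the package's exact definition may differ); y_pred and y_test are lists of label vectors as in the example below:
per_task_error <- function(yp, yt) {
  err <- mean(yp != yt)
  min(err, 1 - err)  # cluster labels are identifiable only up to swapping
}
errors <- mapply(per_task_error, y_pred, y_test)  # one error rate per task
# type = "max" -> max(errors); type = "avg" -> mean(errors); type = "all" -> errors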
Tian, Y., Weng, H., & Feng, Y. (2022). Unsupervised Multi-task and Transfer Learning on Gaussian Mixture Models. arXiv preprint arXiv:2209.15224.
mtlgmm, tlgmm, data_generation, predict_gmm, initialize, alignment, alignment_swap, estimation_error.
set.seed(23, kind = "L'Ecuyer-CMRG")
## Consider a 5-task multi-task learning problem in the setting "MTL-1"
data_list <- data_generation(K = 5, outlier_K = 1, simulation_no = "MTL-1",
                             h_w = 0.1, h_mu = 1, n = 100)  # generate the data
x_train <- sapply(1:length(data_list$data$x), function(k){
  data_list$data$x[[k]][1:50, ]
}, simplify = FALSE)
x_test <- sapply(1:length(data_list$data$x), function(k){
  data_list$data$x[[k]][-(1:50), ]
}, simplify = FALSE)
y_test <- sapply(1:length(data_list$data$x), function(k){
  data_list$data$y[[k]][-(1:50)]
}, simplify = FALSE)
fit <- mtlgmm(x = x_train, C1_w = 0.05, C1_mu = 0.2, C1_beta = 0.2,
              C2_w = 0.05, C2_mu = 0.2, C2_beta = 0.2, kappa = 1/3,
              initial_method = "EM", trim = 0.1, lambda_choice = "fixed",
              step_size = "lipschitz")
y_pred <- sapply(1:length(data_list$data$x), function(i){
  predict_gmm(w = fit$w[i], mu1 = fit$mu1[, i], mu2 = fit$mu2[, i],
              beta = fit$beta[, i], newx = x_test[[i]])
}, simplify = FALSE)
misclustering_error(y_pred[-data_list$data$outlier_index],
                    y_test[-data_list$data$outlier_index], type = "max")
Fit binary Gaussian mixture models (GMMs) on multiple data sets under a multi-task learning (MTL) setting. This function implements the modified EM algorithm (Algorithm 1) proposed in Tian, Y., Weng, H., & Feng, Y. (2022).
mtlgmm(x, step_size = c("lipschitz", "fixed"), eta_w = 0.1, eta_mu = 0.1,
       eta_beta = 0.1, lambda_choice = c("cv", "fixed"), cv_nfolds = 5,
       cv_upper = 5, cv_lower = 0.01, cv_length = 5, C1_w = 0.05, C1_mu = 0.2,
       C1_beta = 0.2, C2_w = 0.05, C2_mu = 0.2, C2_beta = 0.2, kappa = 1/3,
       tol = 1e-05, initial_method = c("EM", "kmeans"),
       alignment_method = ifelse(length(x) <= 10, "exhaustive", "greedy"),
       trim = 0.1, iter_max = 1000, iter_max_prox = 100, ncores = 1)
x |
design matrices from multiple data sets. Should be a list, of which each component is the design matrix of one data set. |
step_size |
step size choice in the proximal gradient method used to solve each optimization problem in the revised EM algorithm (Algorithm 1 in Tian, Y., Weng, H., & Feng, Y. (2022)). Can be either "lipschitz" or "fixed". Default: "lipschitz". See the schematic sketch before the examples. |
eta_w |
step size in the proximal gradient method to learn w (Step 3 of Algorithm 1 in Tian, Y., Weng, H., & Feng, Y. (2022)). Default: 0.1. Only used when step_size = "fixed". |
eta_mu |
step size in the proximal gradient method to learn mu (Steps 4 and 5 of Algorithm 1 in Tian, Y., Weng, H., & Feng, Y. (2022)). Default: 0.1. Only used when step_size = "fixed". |
eta_beta |
step size in the proximal gradient method to learn beta (Step 9 of Algorithm 1 in Tian, Y., Weng, H., & Feng, Y. (2022)). Default: 0.1. Only used when step_size = "fixed". |
lambda_choice |
the choice of constants in the penalty parameter used in the optimization problems. See Algorithm 1 of Tian, Y., Weng, H., & Feng, Y. (2022). Can be either "cv" or "fixed". Default: "cv". |
cv_nfolds |
the number of cross-validation folds. Default: 5 |
cv_upper |
the upper bound of the grid of candidate penalty constants searched by cross-validation. Default: 5 |
cv_lower |
the lower bound of the grid of candidate penalty constants searched by cross-validation. Default: 0.01 |
cv_length |
the number of candidate penalty constants in the cross-validation grid. Default: 5 |
C1_w |
the initial value of C1_w. See equation (7) in Tian, Y., Weng, H., & Feng, Y. (2022). Default: 0.05 |
C1_mu |
the initial value of C1_mu. See equation (8) in Tian, Y., Weng, H., & Feng, Y. (2022). Default: 0.2 |
C1_beta |
the initial value of C1_beta. See equation (9) in Tian, Y., Weng, H., & Feng, Y. (2022). Default: 0.2 |
C2_w |
the initial value of C2_w. See equation (10) in Tian, Y., Weng, H., & Feng, Y. (2022). Default: 0.05 |
C2_mu |
the initial value of C2_mu. See equation (11) in Tian, Y., Weng, H., & Feng, Y. (2022). Default: 0.2 |
C2_beta |
the initial value of C2_beta. See equation (12) in Tian, Y., Weng, H., & Feng, Y. (2022). Default: 0.2 |
kappa |
the decaying rate used in equations (7)-(12) in Tian, Y., Weng, H., & Feng, Y. (2022). Default: 1/3 |
tol |
convergence tolerance for all optimization problems. If the difference between the last update and the current update is smaller than this value, the optimization iterations stop. Default: 1e-05 |
initial_method |
initialization method. This indicates the method to initialize the estimates of GMM parameters for each data set. Can be either "EM" or "kmeans". Default: "EM". |
alignment_method |
the alignment algorithm to use. See Section 2.4 of Tian, Y., Weng, H., & Feng, Y. (2022). Can be either "exhaustive" or "greedy". Default: "exhaustive" when the number of tasks is at most 10, and "greedy" otherwise. |
trim |
the proportion of trimmed data sets in the cross-validation procedure of choosing tuning parameters. Setting it to a non-zero small value can help avoid the impact of outlier tasks on the choice of tuning parameters. Default: 0.1 |
iter_max |
the maximum iteration number of the revised EM algorithm (i.e. the parameter T in Algorithm 1 in Tian, Y., Weng, H., & Feng, Y. (2022)). Default: 1000 |
iter_max_prox |
the maximum iteration number of the proximal gradient method. Default: 100 |
ncores |
the number of cores to use. Parallel computing is strongly suggested, especially when lambda_choice = "cv". Default: 1 |
A list with the following components.
w |
the estimate of mixture proportion in GMMs for each task. Will be a vector. |
mu1 |
the estimate of Gaussian mean in the first cluster of GMMs for each task. Will be a matrix, where each column represents the estimate for a task. |
mu2 |
the estimate of Gaussian mean in the second cluster of GMMs for each task. Will be a matrix, where each column represents the estimate for a task. |
beta |
the estimate of the discriminant coefficient for each task. Will be a matrix, where each column represents the estimate for a task. |
Sigma |
the estimate of the common covariance matrix for each task. Will be a list, where each component represents the estimate for a task. |
w_bar |
the center estimate of w. Numeric. See Algorithm 1 in Tian, Y., Weng, H., & Feng, Y. (2022). |
mu1_bar |
the center estimate of mu1. Will be a vector. See Algorithm 1 in Tian, Y., Weng, H., & Feng, Y. (2022). |
mu2_bar |
the center estimate of mu2. Will be a vector. See Algorithm 1 in Tian, Y., Weng, H., & Feng, Y. (2022). |
beta_bar |
the center estimate of beta. Will be a vector. See Algorithm 1 in Tian, Y., Weng, H., & Feng, Y. (2022). |
C1_w |
the initial value of C1_w. |
C1_mu |
the initial value of C1_mu. |
C1_beta |
the initial value of C1_beta. |
C2_w |
the initial value of C2_w. |
C2_mu |
the initial value of C2_mu. |
C2_beta |
the initial value of C2_beta. |
initial_mu1 |
the well-aligned initial estimate of mu1 of different tasks. Useful for the alignment problem in transfer learning. See Section 3.4 in Tian, Y., Weng, H., & Feng, Y. (2022). |
initial_mu2 |
the well-aligned initial estimate of mu2 of different tasks. Useful for the alignment problem in transfer learning. See Section 3.4 in Tian, Y., Weng, H., & Feng, Y. (2022). |
Tian, Y., Weng, H., & Feng, Y. (2022). Unsupervised Multi-task and Transfer Learning on Gaussian Mixture Models. arXiv preprint arXiv:2209.15224.
Parikh, N., & Boyd, S. (2014). Proximal algorithms. Foundations and trends in Optimization, 1(3), 127-239.
tlgmm, predict_gmm, data_generation, initialize, alignment, alignment_swap, estimation_error, misclustering_error.
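Before the examples, a schematic sketch of a single proximal gradient update underlying the step_size options; grad_f (gradient of the smooth loss) and prox (proximal operator of the penalty) are hypothetical placeholders, not package internals:
prox_grad_step <- function(theta, grad_f, prox, lambda,
                           step_size = c("lipschitz", "fixed"),
                           eta = 0.1, lipschitz_const = NULL) {
  step_size <- match.arg(step_size)
  if (step_size == "lipschitz") {
    stopifnot(!is.null(lipschitz_const))
    eta <- 1 / lipschitz_const                     # step from the Lipschitz constant
  }
  prox(theta - eta * grad_f(theta), eta * lambda)  # gradient step, then prox
}
With the "fixed" option, the user-supplied eta_w, eta_mu, and eta_beta play the role of eta above.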
set.seed(0, kind = "L'Ecuyer-CMRG")
library(mclust)
## Consider a 5-task multi-task learning problem in the setting "MTL-1"
data_list <- data_generation(K = 5, outlier_K = 1, simulation_no = "MTL-1",
                             h_w = 0.1, h_mu = 1, n = 50)  # generate the data
fit <- mtlgmm(x = data_list$data$x, C1_w = 0.05, C1_mu = 0.2, C1_beta = 0.2,
              C2_w = 0.05, C2_mu = 0.2, C2_beta = 0.2, kappa = 1/3,
              initial_method = "EM", trim = 0.1, lambda_choice = "fixed",
              step_size = "lipschitz")

## compare the performance with that of single-task estimators
# fit single-task GMMs
fitted_values <- initialize(data_list$data$x, "EM")  # initialize the estimates
L <- alignment(fitted_values$mu1, fitted_values$mu2, method = "exhaustive")  # call the alignment algorithm
fitted_values <- alignment_swap(L$L1, L$L2, initial_value_list = fitted_values)  # obtain the well-aligned initial estimates

# fit a pooled GMM
x.comb <- Reduce("rbind", data_list$data$x)
fit_pooled <- Mclust(x.comb, G = 2, modelNames = "EEE")
fitted_values_pooled <- list(w = NULL, mu1 = NULL, mu2 = NULL, beta = NULL, Sigma = NULL)
fitted_values_pooled$w <- rep(fit_pooled$parameters$pro[1], length(data_list$data$x))
fitted_values_pooled$mu1 <- matrix(rep(fit_pooled$parameters$mean[, 1], length(data_list$data$x)),
                                   ncol = length(data_list$data$x))
fitted_values_pooled$mu2 <- matrix(rep(fit_pooled$parameters$mean[, 2], length(data_list$data$x)),
                                   ncol = length(data_list$data$x))
fitted_values_pooled$Sigma <- sapply(1:length(data_list$data$x), function(k){
  fit_pooled$parameters$variance$Sigma
}, simplify = FALSE)
fitted_values_pooled$beta <- sapply(1:length(data_list$data$x), function(k){
  solve(fit_pooled$parameters$variance$Sigma) %*%
    (fit_pooled$parameters$mean[, 1] - fit_pooled$parameters$mean[, 2])
})

error <- matrix(nrow = 3, ncol = 4,
                dimnames = list(c("Single-task-GMM", "Pooled-GMM", "MTL-GMM"),
                                c("w", "mu", "beta", "Sigma")))
error["Single-task-GMM", "w"] <- estimation_error(
  fitted_values$w[-data_list$data$outlier_index],
  data_list$parameter$w[-data_list$data$outlier_index], "w")
error["Pooled-GMM", "w"] <- estimation_error(
  fitted_values_pooled$w[-data_list$data$outlier_index],
  data_list$parameter$w[-data_list$data$outlier_index], "w")
error["MTL-GMM", "w"] <- estimation_error(
  fit$w[-data_list$data$outlier_index],
  data_list$parameter$w[-data_list$data$outlier_index], "w")
error["Single-task-GMM", "mu"] <- estimation_error(
  list(fitted_values$mu1[, -data_list$data$outlier_index],
       fitted_values$mu2[, -data_list$data$outlier_index]),
  list(data_list$parameter$mu1[, -data_list$data$outlier_index],
       data_list$parameter$mu2[, -data_list$data$outlier_index]), "mu")
error["Pooled-GMM", "mu"] <- estimation_error(
  list(fitted_values_pooled$mu1[, -data_list$data$outlier_index],
       fitted_values_pooled$mu2[, -data_list$data$outlier_index]),
  list(data_list$parameter$mu1[, -data_list$data$outlier_index],
       data_list$parameter$mu2[, -data_list$data$outlier_index]), "mu")
error["MTL-GMM", "mu"] <- estimation_error(
  list(fit$mu1[, -data_list$data$outlier_index],
       fit$mu2[, -data_list$data$outlier_index]),
  list(data_list$parameter$mu1[, -data_list$data$outlier_index],
       data_list$parameter$mu2[, -data_list$data$outlier_index]), "mu")
error["Single-task-GMM", "beta"] <- estimation_error(
  fitted_values$beta[, -data_list$data$outlier_index],
  data_list$parameter$beta[, -data_list$data$outlier_index], "beta")
error["Pooled-GMM", "beta"] <- estimation_error(
  fitted_values_pooled$beta[, -data_list$data$outlier_index],
  data_list$parameter$beta[, -data_list$data$outlier_index], "beta")
error["MTL-GMM", "beta"] <- estimation_error(
  fit$beta[, -data_list$data$outlier_index],
  data_list$parameter$beta[, -data_list$data$outlier_index], "beta")
error["Single-task-GMM", "Sigma"] <- estimation_error(
  fitted_values$Sigma[-data_list$data$outlier_index],
  data_list$parameter$Sigma[-data_list$data$outlier_index], "Sigma")
error["Pooled-GMM", "Sigma"] <- estimation_error(
  fitted_values_pooled$Sigma[-data_list$data$outlier_index],
  data_list$parameter$Sigma[-data_list$data$outlier_index], "Sigma")
error["MTL-GMM", "Sigma"] <- estimation_error(
  fit$Sigma[-data_list$data$outlier_index],
  data_list$parameter$Sigma[-data_list$data$outlier_index], "Sigma")
error

# use cross-validation to choose the tuning parameters
# warning: can be quite slow, a large "ncores" input is suggested!!
fit <- mtlgmm(x = data_list$data$x, kappa = 1/3, initial_method = "EM",
              ncores = 2, cv_length = 5, trim = 0.1, cv_upper = 2,
              cv_lower = 0.01, lambda_choice = "cv", step_size = "lipschitz")
Cluster new observations based on fitted GMM estimators; this is an empirical version of the Bayes classifier. See equation (13) in Tian, Y., Weng, H., & Feng, Y. (2022).
predict_gmm(w, mu1, mu2, beta, newx)
w |
the estimate of mixture proportion in the GMM. Numeric. |
mu1 |
the estimate of Gaussian mean of the first cluster in the GMM. Should be a vector. |
mu2 |
the estimate of Gaussian mean of the second cluster in the GMM. Should be a vector. |
beta |
the estimate of the discriminant coefficient for the GMM. Should be a vector. |
newx |
design matrix of new observations. Should be a matrix. |
A vector of predicted labels of new observations.
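For intuition, a minimal sketch of the standard two-component, common-covariance Bayes rule that predict_gmm estimates empirically; the helper bayes_rule below is hypothetical, not the package's implementation (see equation (13) of the paper for the exact version):
bayes_rule <- function(w, mu1, mu2, beta, newx) {
  log_odds <- log(w / (1 - w))                                   # prior log-odds of cluster 1
  scores <- as.numeric(newx %*% beta) - sum(beta * (mu1 + mu2) / 2)
  ifelse(scores + log_odds >= 0, 1, 2)                           # assign cluster 1 when favored
}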
Tian, Y., Weng, H., & Feng, Y. (2022). Unsupervised Multi-task and Transfer Learning on Gaussian Mixture Models. arXiv preprint arXiv:2209.15224.
mtlgmm, tlgmm, data_generation, initialize, alignment, alignment_swap, estimation_error, misclustering_error.
set.seed(23, kind = "L'Ecuyer-CMRG")
## Consider a 5-task multi-task learning problem in the setting "MTL-1"
data_list <- data_generation(K = 5, outlier_K = 1, simulation_no = "MTL-1",
                             h_w = 0.1, h_mu = 1, n = 100)  # generate the data
x_train <- sapply(1:length(data_list$data$x), function(k){
  data_list$data$x[[k]][1:50, ]
}, simplify = FALSE)
x_test <- sapply(1:length(data_list$data$x), function(k){
  data_list$data$x[[k]][-(1:50), ]
}, simplify = FALSE)
y_test <- sapply(1:length(data_list$data$x), function(k){
  data_list$data$y[[k]][-(1:50)]
}, simplify = FALSE)
fit <- mtlgmm(x = x_train, C1_w = 0.05, C1_mu = 0.2, C1_beta = 0.2,
              C2_w = 0.05, C2_mu = 0.2, C2_beta = 0.2, kappa = 1/3,
              initial_method = "EM", trim = 0.1, lambda_choice = "fixed",
              step_size = "lipschitz")
y_pred <- sapply(1:length(data_list$data$x), function(i){
  predict_gmm(w = fit$w[i], mu1 = fit$mu1[, i], mu2 = fit$mu2[, i],
              beta = fit$beta[, i], newx = x_test[[i]])
}, simplify = FALSE)
misclustering_error(y_pred[-data_list$data$outlier_index],
                    y_test[-data_list$data$outlier_index], type = "max")
Fit the binary Gaussian mixture model (GMM) on the target data set by leveraging multiple source data sets under a transfer learning (TL) setting. This function implements the modified EM algorithm (Algorithm 4) proposed in Tian, Y., Weng, H., & Feng, Y. (2022).
tlgmm(x, fitted_bar, step_size = c("lipschitz", "fixed"), eta_w = 0.1,
      eta_mu = 0.1, eta_beta = 0.1, lambda_choice = c("fixed", "cv"),
      cv_nfolds = 5, cv_upper = 2, cv_lower = 0.01, cv_length = 5,
      C1_w = 0.05, C1_mu = 0.2, C1_beta = 0.2, C2_w = 0.05, C2_mu = 0.2,
      C2_beta = 0.2, kappa0 = 1/3, tol = 1e-05,
      initial_method = c("kmeans", "EM"), iter_max = 1000,
      iter_max_prox = 100, ncores = 1)
x |
design matrix of the target data set. Should be a matrix. |
fitted_bar |
the output from function mtlgmm. |
step_size |
step size choice in the proximal gradient method used to solve each optimization problem in the revised EM algorithm (Algorithm 4 in Tian, Y., Weng, H., & Feng, Y. (2022)). Can be either "lipschitz" or "fixed". Default: "lipschitz". |
eta_w |
step size in the proximal gradient method to learn w (Step 3 of Algorithm 4 in Tian, Y., Weng, H., & Feng, Y. (2022)). Default: 0.1. Only used when step_size = "fixed". |
eta_mu |
step size in the proximal gradient method to learn mu (Steps 4 and 5 of Algorithm 4 in Tian, Y., Weng, H., & Feng, Y. (2022)). Default: 0.1. Only used when step_size = "fixed". |
eta_beta |
step size in the proximal gradient method to learn beta (Step 7 of Algorithm 4 in Tian, Y., Weng, H., & Feng, Y. (2022)). Default: 0.1. Only used when step_size = "fixed". |
lambda_choice |
the choice of constants in the penalty parameter used in the optimization problems. See Algorithm 4 of Tian, Y., Weng, H., & Feng, Y. (2022). Can be either "fixed" or "cv". Default: "fixed". |
cv_nfolds |
the number of cross-validation folds. Default: 5 |
cv_upper |
the upper bound of the grid of candidate penalty constants searched by cross-validation. Default: 2 |
cv_lower |
the lower bound of the grid of candidate penalty constants searched by cross-validation. Default: 0.01 |
cv_length |
the number of candidate penalty constants in the cross-validation grid. Default: 5 |
C1_w |
the initial value of C1_w. See equation (19) in Tian, Y., Weng, H., & Feng, Y. (2022). Default: 0.05 |
C1_mu |
the initial value of C1_mu. See equation (20) in Tian, Y., Weng, H., & Feng, Y. (2022). Default: 0.2 |
C1_beta |
the initial value of C1_beta. See equation (21) in Tian, Y., Weng, H., & Feng, Y. (2022). Default: 0.2 |
C2_w |
the initial value of C2_w. See equation (22) in Tian, Y., Weng, H., & Feng, Y. (2022). Default: 0.05 |
C2_mu |
the initial value of C2_mu. See equation (23) in Tian, Y., Weng, H., & Feng, Y. (2022). Default: 0.2 |
C2_beta |
the initial value of C2_beta. See equation (24) in Tian, Y., Weng, H., & Feng, Y. (2022). Default: 0.2 |
kappa0 |
the decaying rate used in equations (19)-(24) in Tian, Y., Weng, H., & Feng, Y. (2022). Default: 1/3 |
tol |
convergence tolerance for all optimization problems. If the difference between the last update and the current update is smaller than this value, the optimization iterations stop. Default: 1e-05 |
initial_method |
initialization method. This indicates the method to initialize the estimates of GMM parameters for the target data set. Can be either "kmeans" or "EM". Default: "kmeans". |
iter_max |
the maximum iteration number of the revised EM algorithm (i.e. the parameter T in Algorithm 4 in Tian, Y., Weng, H., & Feng, Y. (2022)). Default: 1000 |
iter_max_prox |
the maximum iteration number of the proximal gradient method. Default: 100 |
ncores |
the number of cores to use. Parallel computing is strongly suggested, especially when lambda_choice = "cv". Default: 1 |
A list with the following components.
w |
the estimate of the mixture proportion in the GMM for the target task. Numeric. |
mu1 |
the estimate of the Gaussian mean in the first cluster of the GMM for the target task. Will be a vector. |
mu2 |
the estimate of the Gaussian mean in the second cluster of the GMM for the target task. Will be a vector. |
beta |
the estimate of the discriminant coefficient for the target task. Will be a vector. |
Sigma |
the estimate of the common covariance matrix for the target task. Will be a matrix. |
C1_w |
the initial value of C1_w. |
C1_mu |
the initial value of C1_mu. |
C1_beta |
the initial value of C1_beta. |
C2_w |
the initial value of C2_w. |
C2_mu |
the initial value of C2_mu. |
C2_beta |
the initial value of C2_beta. |
Tian, Y., Weng, H., & Feng, Y. (2022). Unsupervised Multi-task and Transfer Learning on Gaussian Mixture Models. arXiv preprint arXiv:2209.15224.
Parikh, N., & Boyd, S. (2014). Proximal algorithms. Foundations and trends in Optimization, 1(3), 127-239.
mtlgmm, predict_gmm, data_generation, initialize, alignment, alignment_swap, estimation_error, misclustering_error.
set.seed(0, kind = "L'Ecuyer-CMRG")
## Consider a transfer learning problem with 3 source tasks and 1 target task
## in the setting "MTL-1"
data_list_source <- data_generation(K = 3, outlier_K = 0, simulation_no = "MTL-1",
                                    h_w = 0, h_mu = 0, n = 50)  # generate the source data
data_target <- data_generation(K = 1, outlier_K = 0, simulation_no = "MTL-1",
                               h_w = 0.1, h_mu = 1, n = 50)  # generate the target data
fit_mtl <- mtlgmm(x = data_list_source$data$x, C1_w = 0.05, C1_mu = 0.2,
                  C1_beta = 0.2, C2_w = 0.05, C2_mu = 0.2, C2_beta = 0.2,
                  kappa = 1/3, initial_method = "EM", trim = 0.1,
                  lambda_choice = "fixed", step_size = "lipschitz")
fit_tl <- tlgmm(x = data_target$data$x[[1]], fitted_bar = fit_mtl, C1_w = 0.05,
                C1_mu = 0.2, C1_beta = 0.2, C2_w = 0.05, C2_mu = 0.2,
                C2_beta = 0.2, kappa0 = 1/3, initial_method = "EM", ncores = 1,
                lambda_choice = "fixed", step_size = "lipschitz")

# use cross-validation to choose the tuning parameters
# warning: can be quite slow, a large "ncores" input is suggested!!
fit_tl <- tlgmm(x = data_target$data$x[[1]], fitted_bar = fit_mtl, kappa0 = 1/3,
                initial_method = "EM", ncores = 2, lambda_choice = "cv",
                step_size = "lipschitz")
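The transferred fit can then be used to cluster the target observations; a sketch reusing data_target from above (assuming misclustering_error accepts single-task label lists):
y_target <- predict_gmm(w = fit_tl$w, mu1 = fit_tl$mu1, mu2 = fit_tl$mu2,
                        beta = fit_tl$beta, newx = data_target$data$x[[1]])
misclustering_error(list(y_target), list(data_target$data$y[[1]]), type = "max")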