This function fits a semi-supervised mixture model. It simultaneously estimates two mixture components and assigns the unlabelled observations to one of them.

mixtura(y, z, dist = "norm",
        phi = NULL, pi = NULL, gamma = NULL,
        test = NULL, iter = 100, kind = 0.05,
        debug = TRUE, ...)

Arguments

y

observations: numeric vector of length n

z

class labels: integer vector of length n, with entries 0, 1 and NA

dist

distributional assumption: character "norm" (Gaussian), "nbinom" (negative binomial), or "zinb" (zero-inflated negative binomial)

phi

dispersion parameters: numeric vector of length q, or NULL

pi

zero-inflation parameter(s): numeric vector of length q, or NULL

gamma

offset: numeric vector of length n, or NULL

test

resampling procedure: character "perm" (permutation) or "boot" (parametric bootstrap), or NULL

iter

(maximum) number of resampling iterations: positive integer, or NULL

kind

resampling accuracy: numeric between 0 and 1, or NULL; all p-values above kind are approximate

debug

verification of arguments: TRUE or FALSE

...

settings for the EM algorithm: starts, it.em, and epsilon (see the example below this list)
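
The call below is a minimal sketch of how the EM settings are passed through the ... argument, alongside the resampling controls; the data simulation mirrors the Examples section, and the chosen values of starts, it.em, epsilon, iter, and kind are arbitrary illustrations rather than package defaults.

# sketch: passing EM settings (starts, it.em, epsilon) through ...
n <- 100
z <- rep(0:1, each = n/2)
y <- rnorm(n = n, mean = 2, sd = 1)
z[(n/4):n] <- NA
mixtura(y, z, dist = "norm", test = "perm", iter = 50, kind = 0.05,
        starts = 3, it.em = 200, epsilon = 1e-04)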

Value

This function fits and compares a one-component (H0) and a two-component (H1) mixture model.

posterior

probability of belonging to class 1: numeric vector of length n

converge

path of the log-likelihood: numeric vector with maximum length it.em

estim0

parameter estimates under H0: data frame

estim1

parameter estimates under H1: data frame

loglik0

log-likelihood under H0: numeric

loglik1

log-likelihood under H1: numeric

lrts

likelihood-ratio test statistic: positive numeric (see the note below this list)

p.value

H0 versus H1: numeric between 0 and 1, or NULL
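
Note that, in the example output at the end of this page, lrts equals twice the gap between the two log-likelihoods, consistent with a standard likelihood-ratio statistic: 2 * (-134.5664 - (-134.9813)) = 0.8298, which matches lrts up to rounding.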

Details

By default, phi and pi are estimated by the maximum likelihood method, and gamma is replaced by a vector of ones.
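
The sketch below illustrates supplying phi and gamma explicitly instead of relying on these defaults, assuming phi is the usual negative binomial dispersion and that a single value (q = 1) applies to one observation vector; the simulated counts use size = 1/phi, and all numeric choices are arbitrary.

# sketch: user-supplied dispersion and offset for negative binomial counts
n <- 100
z <- rep(0:1, each = n/2)
y <- rnbinom(n = n, mu = 5, size = 1/0.1)
z[(n/4):n] <- NA
mixtura(y, z, dist = "nbinom", phi = 0.1, gamma = rep(1, n))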

Reference

A Rauschenberger, RX Menezes, MA van de Wiel, NM van Schoor, and MA Jonker (2020). "Semi-supervised mixture test for detecting markers associated with a quantitative trait", Manuscript in preparation.

See also

Use scrutor for hypothesis testing. All other functions are internal.

Examples

# data simulation
n <- 100
z <- rep(0:1, each = n/2)
y <- rnorm(n = n, mean = 2, sd = 1)
z[(n/4):n] <- NA

# model fitting
mixtura(y, z, dist = "norm", test = "perm")
#> $posterior
#>  [1] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
#>  [8] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
#> [15] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
#> [22] 0.0000000 0.0000000 0.0000000 0.5847103 0.5124022 0.6710150 0.7145550
#> [29] 0.6353941 0.5111437 0.5245252 0.6564112 0.5513211 0.6156385 0.7259136
#> [36] 0.5135151 0.6892393 0.5938403 0.3489121 0.5357646 0.5921529 0.5082185
#> [43] 0.3033514 0.4582151 0.4585708 0.5627931 0.5788645 0.3543564 0.5564960
#> [50] 0.6498360 0.5270052 0.4829803 0.5348777 0.5795999 0.5069320 0.6916956
#> [57] 0.7726151 0.4988517 0.5904737 0.5509167 0.6037588 0.5191042 0.5314533
#> [64] 0.5233184 0.6672887 0.6109655 0.5894274 0.6037971 0.5958933 0.5769878
#> [71] 0.4481745 0.4886986 0.4535777 0.6025104 0.6098765 0.4845336 0.6961400
#> [78] 0.4827533 0.4660534 0.4627152 0.5541518 0.6367138 0.5137884 0.3828986
#> [85] 0.4508566 0.5926182 0.6911656 0.7089736 0.5254296 0.4207828 0.5654328
#> [92] 0.6932749 0.5066595 0.4976020 0.6301265 0.4756116 0.6041247 0.5584574
#> [99] 0.5494746 0.5086801
#> 
#> $converge
#>  [1] -135.2916 -134.9897 -134.8180 -134.7161 -134.6553 -134.6191 -134.5977
#>  [8] -134.5850 -134.5776 -134.5731 -134.5704 -134.5688 -134.5678 -134.5672
#> [15] -134.5669 -134.5666 -134.5665 -134.5664
#> 
#> $estim0
#>   p0    mean0      sd0 p1 mean1 sd1
#> 1  1 1.918567 0.933193  0   NaN NaN
#> 
#> $estim1
#>          p0    mean0       sd0        p1    mean1       sd1
#> 1 0.4445477 2.064563 0.9185471 0.5554523 1.718718 0.9173225
#> 
#> $loglik0
#> [1] -134.9813
#> 
#> $loglik1
#> [1] -134.5664
#> 
#> $lrts
#> [1] 0.8297895
#> 
#> $p.value
#> [1] 0.6666667
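
As a follow-up sketch, reusing y and z from the simulation above, the fitted object could be stored so that its components are directly accessible; the 0.5 cut-off for the posterior probabilities is an arbitrary choice.

# store the fit and use the returned components
fit <- mixtura(y, z, dist = "norm", test = "perm")
membership <- ifelse(fit$posterior >= 0.5, 1, 0)  # 0.5 is an arbitrary cut-off
table(membership[is.na(z)])                       # assignments for the unlabelled observations
fit$p.value                                       # resampling-based p-value (H0 versus H1)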