This function fits a semi-supervised mixture model. It simultaneously estimates two mixture components and assigns the unlabelled observations to one of them.

mixtura(y, z, dist = "norm",
        phi = NULL, pi = NULL, gamma = NULL,
        test = NULL, iter = 100, kind = 0.05,
        debug = TRUE, ...)

Arguments

y

observations: numeric vector of length n

z

class labels: integer vector of length n, with entries 0, 1 and NA

dist

distributional assumption: character "norm" (Gaussian), "nbinom" (negative binomial), or "zinb" (zero-inflated negative binomial)

phi

dispersion parameters: numeric vector of length q, or NULL

pi

zero-inflation parameter(s): numeric vector of length q, or NULL

gamma

offset: numeric vector of length n, or NULL

test

resampling procedure: character "perm" (permutation) or "boot" (parametric bootstrap), or NULL

iter

(maximum) number of resampling iterations: positive integer, or NULL

kind

resampling accuracy: numeric between 0 and 1, or NULL; all p-values above kind are approximate

debug

verification of arguments: TRUE or FALSE

...

settings for the EM algorithm: starts, it.em, and epsilon (see the example below this list)
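
The call below is a minimal sketch of how the EM settings are passed through the ... argument, alongside the resampling controls; the data simulation mirrors the Examples section, and the chosen values of starts, it.em, epsilon, iter, and kind are arbitrary illustrations rather than package defaults.

# sketch: passing EM settings (starts, it.em, epsilon) through ...
n <- 100
z <- rep(0:1, each = n/2)
y <- rnorm(n = n, mean = 2, sd = 1)
z[(n/4):n] <- NA
mixtura(y, z, dist = "norm", test = "perm", iter = 50, kind = 0.05,
        starts = 3, it.em = 200, epsilon = 1e-04)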

Value

This function fits and compares a one-component (H0) and a two-component (H1) mixture model.

posterior

probability of belonging to class 1: numeric vector of length n

converge

path of the log-likelihood: numeric vector with maximum length it.em

estim0

parameter estimates under H0: data frame

estim1

parameter estimates under H1: data frame

loglik0

log-likelihood under H0: numeric

loglik1

log-likelihood under H1: numeric

lrts

likelihood-ratio test statistic: positive numeric (see the note below this list)

p.value

H0 versus H1: numeric between 0 and 1, or NULL
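
Note that, in the example output at the end of this page, lrts equals twice the gap between the two log-likelihoods, consistent with a standard likelihood-ratio statistic: 2 * (-134.5664 - (-134.9813)) = 0.8298, which matches lrts up to rounding.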

Details

By default, phi and pi are estimated by the maximum likelihood method, and gamma is replaced by a vector of ones.
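
The sketch below illustrates supplying phi and gamma explicitly instead of relying on these defaults, assuming phi is the usual negative binomial dispersion and that a single value (q = 1) applies to one observation vector; the simulated counts use size = 1/phi, and all numeric choices are arbitrary.

# sketch: user-supplied dispersion and offset for negative binomial counts
n <- 100
z <- rep(0:1, each = n/2)
y <- rnbinom(n = n, mu = 5, size = 1/0.1)
z[(n/4):n] <- NA
mixtura(y, z, dist = "nbinom", phi = 0.1, gamma = rep(1, n))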

Reference

A Rauschenberger, RX Menezes, MA van de Wiel, NM van Schoor, and MA Jonker (2020). "Semi-supervised mixture test for detecting markers associated with a quantitative trait", Manuscript in preparation.

See also

Use scrutor for hypothesis testing. All other functions are internal.

Examples

# data simulation
n <- 100
z <- rep(0:1, each = n/2)
y <- rnorm(n = n, mean = 2, sd = 1)
z[(n/4):n] <- NA

# model fitting
mixtura(y, z, dist = "norm", test = "perm")
#> $posterior
#>  [1] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
#>  [8] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
#> [15] 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
#> [22] 0.0000000 0.0000000 0.0000000 0.5847103 0.5124022 0.6710150 0.7145550
#> [29] 0.6353941 0.5111437 0.5245252 0.6564112 0.5513211 0.6156385 0.7259136
#> [36] 0.5135151 0.6892393 0.5938403 0.3489121 0.5357646 0.5921529 0.5082185
#> [43] 0.3033514 0.4582151 0.4585708 0.5627931 0.5788645 0.3543564 0.5564960
#> [50] 0.6498360 0.5270052 0.4829803 0.5348777 0.5795999 0.5069320 0.6916956
#> [57] 0.7726151 0.4988517 0.5904737 0.5509167 0.6037588 0.5191042 0.5314533
#> [64] 0.5233184 0.6672887 0.6109655 0.5894274 0.6037971 0.5958933 0.5769878
#> [71] 0.4481745 0.4886986 0.4535777 0.6025104 0.6098765 0.4845336 0.6961400
#> [78] 0.4827533 0.4660534 0.4627152 0.5541518 0.6367138 0.5137884 0.3828986
#> [85] 0.4508566 0.5926182 0.6911656 0.7089736 0.5254296 0.4207828 0.5654328
#> [92] 0.6932749 0.5066595 0.4976020 0.6301265 0.4756116 0.6041247 0.5584574
#> [99] 0.5494746 0.5086801
#> 
#> $converge
#>  [1] -135.2916 -134.9897 -134.8180 -134.7161 -134.6553 -134.6191 -134.5977
#>  [8] -134.5850 -134.5776 -134.5731 -134.5704 -134.5688 -134.5678 -134.5672
#> [15] -134.5669 -134.5666 -134.5665 -134.5664
#> 
#> $estim0
#>   p0    mean0      sd0 p1 mean1 sd1
#> 1  1 1.918567 0.933193  0   NaN NaN
#> 
#> $estim1
#>          p0    mean0       sd0        p1    mean1       sd1
#> 1 0.4445477 2.064563 0.9185471 0.5554523 1.718718 0.9173225
#> 
#> $loglik0
#> [1] -134.9813
#> 
#> $loglik1
#> [1] -134.5664
#> 
#> $lrts
#> [1] 0.8297895
#> 
#> $p.value
#> [1] 0.6666667
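
As a follow-up sketch, reusing y and z from the simulation above, the fitted object could be stored so that its components are directly accessible; the 0.5 cut-off for the posterior probabilities is an arbitrary choice.

# store the fit and use the returned components
fit <- mixtura(y, z, dist = "norm", test = "perm")
membership <- ifelse(fit$posterior >= 0.5, 1, 0)  # 0.5 is an arbitrary cut-off
table(membership[is.na(z)])                       # assignments for the unlabelled observations
fit$p.value                                       # resampling-based p-value (H0 versus H1)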