# How to get p-values for mass-univariate LMMs
There are currently two ways to obtain p-values for LMMs: Wald's t-test and the likelihood-ratio test (mass-univariate fits only).
## Setup
```julia
using MixedModels, Unfold # MixedModels must be loaded to activate the package extension
using DataFrames
using UnfoldSim
using CairoMakie

data_epoch, evts =
    UnfoldSim.predef_2x2(; n_items = 52, n_subjects = 40, return_epoched = true)
data_epoch = reshape(data_epoch, size(data_epoch, 1), :) # concatenate the trials of all subjects
times = range(0, 1, length = size(data_epoch, 1))
```

```
0.0:0.010101010101010102:1.0
```
## Define f0 & f1 and fit!
```julia
f0 = @formula 0 ~ 1 + A + (1 + A | subject);
f1 = @formula 0 ~ 1 + A + B + (1 + A | subject); # could also differ in the random effects
m0 = fit(UnfoldModel, [Any => (f0, times)], evts, data_epoch);
m1 = fit(UnfoldModel, [Any => (f1, times)], evts, data_epoch);
```
## Likelihood ratio
```julia
uf_lrt = likelihoodratiotest(m0, m1)
uf_lrt[1]
```

| | model-dof | deviance | χ² | χ²-dof | P(>χ²) |
|---|---|---|---|---|---|
| 1 + A + (1 + A \| subject) | 6 | 6189 | | | |
| 1 + A + B + (1 + A \| subject) | 7 | 6188 | 0 | 1 | 0.5115 |
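Under the hood, the LRT statistic is simply the deviance difference of the two nested models, referred to a χ² distribution with the difference in model dof as degrees of freedom. A minimal sketch using the (rounded) deviances from the table above; because of the rounding, the resulting p-value will differ from the printed 0.5115:

```julia
using Distributions

dev0, dev1 = 6189.0, 6188.0 # deviances from the table (rounded)
chisq = dev0 - dev1         # LRT statistic
dof = 7 - 6                 # difference in model dof
p = ccdf(Chisq(dof), chisq) # upper-tail probability of the χ² distribution
```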
As you can see, we get one likelihood-ratio test per time point, exciting!
## Extract p-values
```julia
pvalues(uf_lrt)
```

```
100-element Vector{Vector{Float64}}:
 [0.511455717458945]
 [0.637941796064881]
 [0.5540668244816294]
 [0.5502097524931825]
 [0.362546264884788]
 [0.35494238658866406]
 [0.29035542852494334]
 [0.1782510271318645]
 [0.08368531129824638]
 [0.03072016496904737]
 ⋮
 [0.17770403001354876]
 [0.33688149917579724]
 [0.4664701704477222]
 [0.5260221394554021]
 [0.6236465364380881]
 [0.9176409352787988]
 [0.9845498321443239]
 [0.9090829816880928]
 [0.9105533234565882]
```
We have extracted the p-values; now we need to bring them into a usable shape. The solution can be found in the documentation under `?pvalues`.
```julia
pvals_lrt = vcat(pvalues(uf_lrt)...)
nchan = 1
ntime = length(times)
reshape(pvals_lrt, ntime, nchan)' # note the final transpose via ' !
```

```
1×100 adjoint(::Matrix{Float64}) with eltype Float64:
 0.511456 0.637942 0.554067 0.55021 … 0.98455 0.909083 0.910553
```
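To illustrate the reshape mechanics for a multi-channel case (a hypothetical sketch with dummy numbers, not output of the model above; for real multi-channel data, check which channel/time ordering Unfold actually uses when flattening):

```julia
# dummy flat p-value vector for 3 channels × 4 time points
pvals_demo = collect(1:12) ./ 12
ntime_demo, nchan_demo = 4, 3
P = reshape(pvals_demo, ntime_demo, nchan_demo)' # channels × time
size(P) # (3, 4)
```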
Perfecto, these are the LRT p-values of the model `condA` vs. `condA+condB` with the same random-effect structure.
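A quick follow-up you might try (hypothetical, and uncorrected for multiple comparisons): flag the time points whose LRT p-value falls below α = 0.05. Shown here with a small dummy vector standing in for `pvals_lrt`:

```julia
pvals_demo = [0.51, 0.03, 0.004, 0.62] # stand-in for pvals_lrt

alpha = 0.05
signif = pvals_demo .< alpha # BitVector marking "significant" time points
sum(signif)                  # number of time points below threshold
```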
## Wald's t-test
This method is easier to compute, but it is also less accurate: the degrees of freedom are estimated liberally, random effects cannot be tested, and only single predictors can be tested, which may not be appropriate for spline effects. A full treatment of these issues is beyond the scope of this package.
```julia
res = coeftable(m1)
# keep only the fixed effects: whatever is not in a random-effect group is a fixed effect
res = res[isnothing.(res.group), :]
# calculate the t-values
res[:, :tvalue] = res.estimate ./ res.stderror
```

```
300-element Vector{Float64}:
 8.753118942611785
 8.822489307544313
 8.78980028648741
 8.846062744090224
 8.838764488011073
 8.864859807828397
 8.791044719802002
 8.699423870284289
 8.707512203323395
 8.67515737784441
 ⋮
 1.3351971683803914
 0.9511374554947897
 0.7212681368883016
 0.628063090859258
 0.4859472324613865
 0.10240562623839347
 0.019175819533040064
 -0.11308577669944529
 0.11125437202896354
```
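As a reminder, Wald's t is just the estimate divided by its standard error; a toy check with made-up numbers:

```julia
estimate, stderror = 2.0, 0.5 # hypothetical coefficient and its standard error
tvalue = estimate / stderror  # 4.0
```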
We obtained Wald's t-values, but how do we translate them into p-values?
Determining the appropriate degrees of freedom for the t-distribution is a complex, much-debated issue. One approach is to use the number of subjects, which gives an upper bound for the p-value (the true df lies somewhere between $n_{subjects}$ and $\sum{n_{trials}}$).
```julia
df = length(unique(evts.subject))
```

```
40
```
Plug them into the t-distribution. Note that `pdf` would return the density at the t-value, not a p-value; for a two-sided test we need twice the upper-tail probability (`ccdf`) of the absolute t-value:

```julia
using Distributions
res.pvalue = 2 .* ccdf.(TDist(df), abs.(res.tvalue))
```
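To see why the df choice matters (a hedged sketch with a made-up t-value): lower df means heavier tails, so the same t-value yields a larger, more conservative p-value. Here 40 stands for $n_{subjects}$ and 2000 is a made-up stand-in for the total number of trials:

```julia
using Distributions

t = 2.1 # hypothetical Wald t-value
p_subjects = 2 * ccdf(TDist(40), abs(t))   # df = n_subjects: conservative upper bound
p_trials = 2 * ccdf(TDist(2000), abs(t))   # df ≈ total trials: liberal
p_subjects > p_trials # true: fewer df, larger p-value
```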
## Comparison of methods
Cool! Let's compare both methods of p-value calculation!
```julia
# use a fresh name to avoid shadowing the degrees of freedom `df` from above
pvals_df = DataFrame(:walds => res[res.coefname .== "B: b_tiny", :pvalue], :lrt => pvals_lrt)

f = Figure()
scatter(f[1, 1], times, res[res.coefname .== "B: b_tiny", :estimate], axis = (; xlabel = "time", title = "coef: B:b_tiny"))
scatter(f[1, 2], pvals_df.walds, pvals_df.lrt, axis = (; title = "walds-t pvalue", ylabel = "LRT pvalue"))
scatter(f[2, 1], times, pvals_df.walds, axis = (; title = "walds-t pvalue", xlabel = "time"))
scatter(f[2, 2], times, pvals_df.lrt, axis = (; title = "lrt pvalue", xlabel = "time"))
f
```
They look pretty similar! Note that Wald's t-test is typically too liberal (the LRT as well, but to a lesser extent). Better options are the forthcoming MixedModelsPermutations.jl, or going via R and using Kenward-Roger degrees of freedom (data not yet published).