# How to get p-values for mass-univariate LMMs
There are currently two ways to obtain p-values for LMMs: Wald's t-test and the likelihood-ratio test (mass-univariate fits only).
## Setup
```julia
using MixedModels, Unfold # MixedModels must be loaded to activate the package extension
using DataFrames
using UnfoldSim
using CairoMakie

data_epoch, evts =
    UnfoldSim.predef_2x2(; n_items = 52, n_subjects = 40, return_epoched = true)
data_epoch = reshape(data_epoch, size(data_epoch, 1), :) # concatenate the trials of all subjects
times = range(0, 1, length = size(data_epoch, 1))
```

```
0.0:0.010101010101010102:1.0
```
## Define f0 & f1 and fit!
```julia
f0 = @formula 0 ~ 1 + A + (1 + A | subject);
f1 = @formula 0 ~ 1 + A + B + (1 + A | subject); # could also differ in the random effects
m0 = fit(UnfoldModel, [Any => (f0, times)], evts, data_epoch);
m1 = fit(UnfoldModel, [Any => (f1, times)], evts, data_epoch);
```
## Likelihood ratio
```julia
uf_lrt = likelihoodratiotest(m0, m1)
uf_lrt[1]
```

| | model-dof | deviance | χ² | χ²-dof | P(>χ²) |
|---|---|---|---|---|---|
| 1 + A + (1 + A \| subject) | 6 | 6189 | | | |
| 1 + A + B + (1 + A \| subject) | 7 | 6188 | 0 | 1 | 0.5115 |
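Under the hood, the LRT statistic is simply the deviance difference of the two nested models, referred to a χ² distribution with the difference in model dof as degrees of freedom. A minimal sketch using the (rounded) deviances from the table above; because of the rounding, the resulting p-value will differ from the printed 0.5115:

```julia
using Distributions

dev0, dev1 = 6189.0, 6188.0 # deviances from the table (rounded)
chisq = dev0 - dev1         # LRT statistic
dof = 7 - 6                 # difference in model dof
p = ccdf(Chisq(dof), chisq) # upper-tail probability of the χ² distribution
```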
As you can see, we get one likelihood-ratio test per time point, exciting!
## Extract p-values
```julia
pvalues(uf_lrt)
```

```
100-element Vector{Vector{Float64}}:
 [0.511455717458945]
 [0.637941796064881]
 [0.5540668244816294]
 [0.5502097524931825]
 [0.362546264884788]
 [0.35494238658866406]
 [0.29035542852494334]
 [0.1782510271318645]
 [0.08368531129824638]
 [0.03072016496904737]
 ⋮
 [0.17770403001354876]
 [0.33688149917579724]
 [0.4664701704477222]
 [0.5260221394554021]
 [0.6236465364380881]
 [0.9176409352787988]
 [0.9845498321443239]
 [0.9090829816880928]
 [0.9105533234565882]
```
We have extracted the p-values; now we need to bring them into a usable shape. The solution can be found in the documentation under `?pvalues`.
```julia
pvals_lrt = vcat(pvalues(uf_lrt)...)
nchan = 1
ntime = length(times)
reshape(pvals_lrt, ntime, nchan)' # note the final transpose via ' !
```

```
1×100 adjoint(::Matrix{Float64}) with eltype Float64:
 0.511456 0.637942 0.554067 0.55021 … 0.98455 0.909083 0.910553
```
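To illustrate the reshape mechanics for a multi-channel case (a hypothetical sketch with dummy numbers, not output of the model above; for real multi-channel data, check which channel/time ordering Unfold actually uses when flattening):

```julia
# dummy flat p-value vector for 3 channels × 4 time points
pvals_demo = collect(1:12) ./ 12
ntime_demo, nchan_demo = 4, 3
P = reshape(pvals_demo, ntime_demo, nchan_demo)' # channels × time
size(P) # (3, 4)
```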
Perfecto, these are the LRT p-values of the model `condA` vs. `condA+condB` with the same random-effect structure.
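A quick follow-up you might try (hypothetical, and uncorrected for multiple comparisons): flag the time points whose LRT p-value falls below α = 0.05. Shown here with a small dummy vector standing in for `pvals_lrt`:

```julia
pvals_demo = [0.51, 0.03, 0.004, 0.62] # stand-in for pvals_lrt

alpha = 0.05
signif = pvals_demo .< alpha # BitVector marking "significant" time points
sum(signif)                  # number of time points below threshold
```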
## Wald's t-test
This method is easier to compute, but it is also less accurate: the degrees of freedom are estimated liberally, random effects cannot be tested, and only single predictors can be tested, which may not be appropriate for spline effects. A full treatment of these issues is beyond the scope of this package.
```julia
res = coeftable(m1)
# keep only the fixed effects: whatever is not in a random-effect group is a fixed effect
res = res[isnothing.(res.group), :]
# calculate the t-values
res[:, :tvalue] = res.estimate ./ res.stderror
```

```
300-element Vector{Float64}:
 8.753118942611785
 8.822489307544313
 8.78980028648741
 8.846062744090224
 8.838764488011073
 8.864859807828397
 8.791044719802002
 8.699423870284289
 8.707512203323395
 8.67515737784441
 ⋮
 1.3351971683803914
 0.9511374554947897
 0.7212681368883016
 0.628063090859258
 0.4859472324613865
 0.10240562623839347
 0.019175819533040064
 -0.11308577669944529
 0.11125437202896354
```
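As a reminder, Wald's t is just the estimate divided by its standard error; a toy check with made-up numbers:

```julia
estimate, stderror = 2.0, 0.5 # hypothetical coefficient and its standard error
tvalue = estimate / stderror  # 4.0
```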
We obtained Wald's t-values, but how do we translate them into p-values?
Determining the appropriate degrees of freedom for the t-distribution is a complex, much-debated issue. One approach is to use the number of subjects, which gives an upper bound for the p-value (the true df lies somewhere between $n_{subjects}$ and $\sum{n_{trials}}$).
```julia
df = length(unique(evts.subject))
```

```
40
```
Plug them into the t-distribution. Note that `pdf` would return the density at the t-value, not a p-value; for a two-sided test we need twice the upper-tail probability (`ccdf`) of the absolute t-value:

```julia
using Distributions
res.pvalue = 2 .* ccdf.(TDist(df), abs.(res.tvalue))
```
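To see why the df choice matters (a hedged sketch with a made-up t-value): lower df means heavier tails, so the same t-value yields a larger, more conservative p-value. Here 40 stands for $n_{subjects}$ and 2000 is a made-up stand-in for the total number of trials:

```julia
using Distributions

t = 2.1 # hypothetical Wald t-value
p_subjects = 2 * ccdf(TDist(40), abs(t))   # df = n_subjects: conservative upper bound
p_trials = 2 * ccdf(TDist(2000), abs(t))   # df ≈ total trials: liberal
p_subjects > p_trials # true: fewer df, larger p-value
```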
## Comparison of methods
Cool! Let's compare both methods of p-value calculation!
```julia
# use a fresh name to avoid shadowing the degrees of freedom `df` from above
pvals_df = DataFrame(:walds => res[res.coefname .== "B: b_tiny", :pvalue], :lrt => pvals_lrt)

f = Figure()
scatter(f[1, 1], times, res[res.coefname .== "B: b_tiny", :estimate], axis = (; xlabel = "time", title = "coef: B:b_tiny"))
scatter(f[1, 2], pvals_df.walds, pvals_df.lrt, axis = (; title = "walds-t pvalue", ylabel = "LRT pvalue"))
scatter(f[2, 1], times, pvals_df.walds, axis = (; title = "walds-t pvalue", xlabel = "time"))
scatter(f[2, 2], times, pvals_df.lrt, axis = (; title = "lrt pvalue", xlabel = "time"))
f
```
They look pretty similar! Note that Wald's t-test is typically too liberal (the LRT as well, but to a lesser extent). Better options are the forthcoming MixedModelsPermutations.jl, or going via R and using Kenward-Roger degrees of freedom (data not yet published).