Cross-validated MANOVA (cvMANOVA)

This tutorial demonstrates how to perform cross-validated MANOVA on a two-level factor.

Setup

using Unfold
using UnfoldSim
using UnfoldStats
using StatsModels
using Random
using Statistics
using CairoMakie
using UnfoldMakie
using DataFrames

Set seed for reproducible cross-validation folds.

Random.seed!(42)

Generate two-class EEG data

Generate simulated multichannel EEG data with two conditions (classes).

eeg, evts =
    UnfoldSim.predef_eeg(; return_epoched = true, multichannel = true, noiselevel = 1.0)
fake_times = 1:size(eeg, 2)

Please cite: HArtMuT: Harmening Nils, Klug Marius, Gramann Klaus and Miklody Daniel - 10.1088/1741-2552/aca8ce

Check the event structure with condition variable

first(evts, 5)

5×3 DataFrame

Row	continuous	condition	latency
	Float64	String	Int64
1	2.77778	car	62
2	-5.0	face	132
3	-1.66667	car	196
4	-5.0	car	249
5	5.0	car	303

Fit an Unfold-Model

For cvMANOVA we need a overspecified designmatrix. Thus instead of using an intercept, we have one column for each level of any categorical predictor we want to use.

f = @formula 0 ~ 0 + condition
contrasts = Dict(:condition => StatsModels.FullDummyCoding())

m = fit(UnfoldModel, f, evts, eeg, fake_times; contrasts = contrasts, fit = false)

Unfold[38;2;239;83;80m-[39mType: [38;2;206;147;216m::Unfold.UnfoldLinearModel[39m{{Float64}} [1m Any[22m [38;2;239;83;80m=[39m[38;2;239;83;80m>[39m [38;2;144;202;249m0[39m [38;2;239;83;80m+[39m condition [31m❌[39m model not fit. Use fit[38;2;239;83;80m![39m(uf,data) to fit it. Useful functions: [38;2;255;238;88m`design(uf)`[39m, [38;2;255;238;88m`designmatrix(uf)`[39m, [38;2;255;238;88m`coef(uf)`[39m, [38;2;255;238;88m`coeftable(uf)`[39m

As you can see, we have a binary designmatrix now. This is later important to define the contrasts.

modelmatrix(m)[1][1:5, :]

5×2 Matrix{Float64}:
 1.0  0.0
 0.0  1.0
 1.0  0.0
 1.0  0.0
 1.0  0.0

Fit with k-fold cross-validation

We use the cross-validation solver, and importantly, we directly fit the test-set Β estimates as well

cv_solver = Unfold.solver_cv(n_folds = 5, shuffle = true, fit_test = true)
fit!(m, eeg; solver = cv_solver) # note: we could have run the fit(...;solver=...) directly

Run cvMANOVA

Define a two-level contrast to compare the two classes: [-1, 1] This means: (class 2) - (class 1)

C = [-1, 1]

2-element Vector{Int64}:
 -1
  1

We use only the "baseline" period (first 10 time samples) to estimate a noise covariance.

Y_baseline = eeg[:, 1:10, :]

Compute cvMANOVA's D for each timepoint

D_per_fold = cvMANOVA(m, Y_baseline; C = C)

5-element Vector{Vector{Float64}}:
 [0.027764980774131687, -0.2546290504159589, -0.24916255993144862, 0.19109306235976473, 0.08125641863393376, 0.047149431331211404, -0.07127463856809141, -0.0424445617953821, 0.036007711927381836, -0.14097786992346634  …  -0.3045294245836672, 0.0010104390135587732, -0.06389238983220265, 0.20715023826012918, -0.16748790182220147, 0.041758021115041365, 0.02133607821357223, 0.008909666709274745, -0.11532777191507937, 0.05890502175583695]
 [-0.15980178425853017, 0.0567619471091535, -0.13829405093931982, -0.08833080637488536, -0.3772458352800997, -0.10523779272367637, 0.3751074078513164, 0.36816568014750845, -0.3422369087877687, -0.08246093227027577  …  -0.49210556489171203, -0.4466499863420076, -0.6995368476691959, 0.048525762107107066, -0.19821026719850826, 0.04319158674313825, -0.10107023935738152, 0.08274159863244354, 0.22537898104739343, -0.22603420516239028]
 [0.22540145150017704, 0.19552162538612966, -0.33067070328062426, -0.11426298416186839, -0.22144710964976785, -0.200703440529621, 0.07636985531337431, 0.15435807266655927, -0.06007567495241057, -0.1716224756186039  …  -1.6450892306119074, -0.7426585544833768, -0.6752736647900289, -0.4326952390982113, -0.2826195531644225, -0.17356810994284125, 0.11417251568977872, 0.1262187754813273, 0.09496638119599776, 0.05913056434454566]
 [0.1926049836375232, 0.022495399970289317, -0.1186917295229486, 0.013546452580904386, 0.060913869518597286, 0.19640103528957148, -0.16732198777517585, 0.13487161034038553, -0.13221882228900858, -0.3298635120578531  …  0.00996087261523129, -0.08363913685303505, -0.03968250035538556, -0.3701357264346371, 0.022715106203780606, -0.2984585994136954, 0.031856292141955636, -0.4007169165079941, 0.35607551804292314, -0.4516968241504468]
 [-0.22472525328764564, -0.09821601861110911, -0.20736883169193687, 0.14774335825157445, -0.15315134855775078, 0.2743354028522875, -0.2856923668712996, -0.1736102452056607, -0.12945606450584815, -0.06319524951776864  …  -1.4564279213231213, -0.5549322936228176, -0.9736331073427926, -0.49393253744025883, -0.0745937089558469, -0.26568060849691766, 0.2317535457478839, -0.16344529718626588, 0.1933543144862743, 0.018084313523514618]

Aggregate the discriminability statistic across all CV folds.

D_mean = mean(D_per_fold)

This is the time-resolved discriminability: higher values = better discrimination

f, ax, h = series(reduce(hcat, D_per_fold)', linestyle = :dot)
lines!(ax, D_mean, color = :black)

plot_erp!(f[2, 1], subset(coeftable(m), :channel => (x -> x .== 1)))
f

This page was generated using Literate.jl.