Mass Univariate Linear Models (no overlap correction)
In this notebook we will fit regression models to simulated EEG data. We will see that we need some type of overlap correction, as the events are close in time to each other, so that the respective brain responses overlap. If you want more detailed introduction to this topic check out our paper.
Setting up & loading the data
using DataFrames
using Unfold
using UnfoldMakie, CairoMakie # for plotting
using UnfoldSim
Load Data
We'll start with some predefined simulated continuos EEG data. We have 2000 events, 1 channel and one condition with two levels
data, evts = UnfoldSim.predef_eeg()
Inspection
The data has only little noise. The underlying signal pattern is a positive-negative-positive spike.
times_cont = range(0,length=200,step=1/100) # we simulated with 100hz for 0.5 seconds
f,ax,h = plot(times_cont,data[1:200])
vlines!(evts[evts.latency .<= 200, :latency] ./ 100;color=:black) # show events, latency in samples!
ax.xlabel = "time [s]"
ax.ylabel = "voltage [ยตV]"
f

To inspect the event dataframe we use
show(first(evts, 6), allcols = true)
6ร3 DataFrame
Row โ continuous condition latency
โ Float64 String Int64
โโโโโโผโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
1 โ 2.77778 car 62
2 โ -5.0 face 132
3 โ -1.66667 car 196
4 โ -5.0 car 249
5 โ 5.0 car 303
6 โ -0.555556 car 366
Every row is an experimental event. Note that :latency
refers to time in samples, (in BIDS-specification, :onset
would typically refer to seconds).
Traditional Mass Univariate Analysis
To perform a mass univariate analysis, you must complete the following steps:
- Split data into epochs
- Specify a formula
- Fit a linear model to each time point & channel
- Visualize the results.
1. Split data into epochs
Initially, you have data with a duration that represents the whole experimental trial. You need to cut the data into small regular epochs related to the some event, e.g. start of fixation.
# Unfold supports multi-channel, so we could provide matrix ch x time, which we can create like this from a vector:
data_r = reshape(data, (1,:))
# cut the data into epochs
data_epochs, times = Unfold.epoch(data = data, tbl = evts, ฯ = (-0.4, 0.8), sfreq = 100); # channel x timesteps x trials
size(data_epochs)
(1, 121, 2000)
ฯ
specifies the epoch size.sfreq
- sampling rate, convertsฯ
to samples.
typeof(data_epochs)
Array{Union{Missing, Float64}, 3}
In julia, missing
is supported throughout the ecosystem. Thus, we can have partial trials and they will be incorporated / ignored at the respective functions. Helpful functions are the julia-base disallowmissing
and the internal Unfold.drop_missing_epochs
functions
2. Specify a formula
Define a formula to be applied to each time point (and each channel) relative to the event. condition
and continuous
are the names of the event-describing columns in evts
that we want to use for modelling.
f = @formula 0 ~ 1 + condition + continuous # note the formulas left side is `0 ~ ` for technical reasons`
3. Fit a linear model to each time point & channel
Fit the "UnfoldModel
" (the fit
syntax is used throughout the Julia ecosystem, with the first element indicating what kind of model to fit)
m = fit(UnfoldModel, f, evts, data_epochs, times);
โ Warning: Missings in data - we remove any trial from data and designmatrix
โ @ Unfold ~/work/Unfold.jl/Unfold.jl/src/solver/prepare.jl:19
Alternative way to call this model is below. This syntax allows you to fit multiple events at once. For example, replacing Any
with :fixation =>...
will fit this model specifically to the fixation event type.
m = fit(UnfoldModel, [Any=>(f, times)], evts, data_epochs);
โ Warning: Missings in data - we remove any trial from data and designmatrix
โ @ Unfold ~/work/Unfold.jl/Unfold.jl/src/solver/prepare.jl:19
Inspect the fitted model:
m
Note these functions to discover the model: design
, designmatrix
, modelfit
and most importantly, coeftable
.
There are of course further methods, e.g. `coef`, `ranef`, `Unfold.formula`, `modelmatrix` which might be helpful at some point, but not important now.
Using coeftable
, we can get a tidy DataFrames, very useful for your further analysis.
first(coeftable(m), 6)
Row | channel | coefname | estimate | eventname | group | stderror | time |
---|---|---|---|---|---|---|---|
Int64 | String | Float64 | DataType | Nothing | Nothing | Float64 | |
1 | 1 | (Intercept) | 0.328882 | Any | -0.4 | ||
2 | 1 | (Intercept) | 0.462337 | Any | -0.39 | ||
3 | 1 | (Intercept) | 0.694753 | Any | -0.38 | ||
4 | 1 | (Intercept) | 1.04318 | Any | -0.37 | ||
5 | 1 | (Intercept) | 1.47384 | Any | -0.36 | ||
6 | 1 | (Intercept) | 1.9311 | Any | -0.35 |
4. Visualize the results
Tidy DataFrames are easy to visualize using e.g. AlgebraOfGraphics.jl. Function plot_erp
from UnfoldMakie
makes it even easier.
results = coeftable(m)
plot_erp(results)

As you can see, there is a lot going on, even in the baseline period! This is because the signal was simulated with overlapping events. In the next tutorial you will learn how to fix this.