Mass Univariate Linear Models (no overlap correction)

In this notebook we will fit regression models to simulated EEG data. We will see that we need some type of overlap correction, as the events are close in time to each other, so that the respective brain responses overlap. If you want more detailed introduction to this topic check out our paper.

Setting up & loading the data

using DataFrames
using Unfold
using UnfoldMakie, CairoMakie # for plotting
using UnfoldSim

Load Data

We'll start with some predefined simulated continuos EEG data. We have 2000 events, 1 channel and one condition with two levels

data, evts = UnfoldSim.predef_eeg()

Inspection

The data has only little noise. The underlying signal pattern is a positive-negative-positive spike.

times_cont = range(0,length=200,step=1/100) # we simulated with 100hz for 0.5 seconds

f,ax,h = plot(times_cont,data[1:200])
vlines!(evts[evts.latency .<= 200, :latency] ./ 100;color=:black) # show events, latency in samples!
ax.xlabel = "time [s]"
ax.ylabel = "voltage [ยตV]"
f
Example block output

To inspect the event dataframe we use

show(first(evts, 6), allcols = true)
6ร—3 DataFrame
 Row โ”‚ continuous  condition  latency 
     โ”‚ Float64     String     Int64   
โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€
   1 โ”‚    5.0      car             64
   2 โ”‚   -2.77778  car            131
   3 โ”‚    1.66667  car            183
   4 โ”‚   -1.66667  face           233
   5 โ”‚    3.88889  car            284
   6 โ”‚   -3.88889  car            354

Every row is an experimental event. Note that :latency refers to time in samples, (in BIDS-specification, :onset would typically refer to seconds).

Traditional Mass Univariate Analysis

To perform a mass univariate analysis, you must complete the following steps:

  1. Split data into epochs
  2. Specify a formula
  3. Fit a linear model to each time point & channel
  4. Visualize the results.

1. Split data into epochs

Initially, you have data with a duration that represents the whole experimental trial. You need to cut the data into small regular epochs related to the some event, e.g. start of fixation.

# Unfold supports multi-channel, so we could provide matrix ch x time, which we can create like this from a vector:
data_r = reshape(data, (1,:))
# cut the data into epochs
data_epochs, times = Unfold.epoch(data = data, tbl = evts, ฯ„ = (-0.4, 0.8), sfreq = 100); # channel x timesteps x trials
size(data_epochs)
(1, 121, 2000)
  • ฯ„ specifies the epoch size.
  • sfreq - sampling rate, converts ฯ„ to samples.
typeof(data_epochs)
Array{Union{Missing, Float64}, 3}
Note

In julia, missing is supported throughout the ecosystem. Thus, we can have partial trials and they will be incorporated / ignored at the respective functions. Helpful functions are the julia-base disallowmissing and the internal Unfold.drop_missing_epochs functions

2. Specify a formula

Define a formula to be applied to each time point (and each channel) relative to the event. condition and continuous are the names of the event-describing columns in evts that we want to use for modelling.

f = @formula 0 ~ 1 + condition + continuous # note the formulas left side is `0 ~ ` for technical reasons`

3. Fit a linear model to each time point & channel

Fit the "UnfoldModel" (the fit syntax is used throughout the Julia ecosystem, with the first element indicating what kind of model to fit)

m = fit(UnfoldModel, f, evts, data_epochs, times);

Alternative way to call this model is below. This syntax allows you to fit multiple events at once. For example, replacing Any with :fixation =>... will fit this model specifically to the fixation event type.

m = fit(UnfoldModel, [Any=>(f, times)], evts, data_epochs);

Inspect the fitted model:

m

Note these functions to discover the model: design, designmatrix, modelfit and most importantly, coeftable.

Info
There are of course further methods, e.g. `coef`, `ranef`, `Unfold.formula`, `modelmatrix` which might be helpful at some point, but not important now.

Using coeftable, we can get a tidy DataFrames, very useful for your further analysis.

first(coeftable(m), 6)
6ร—7 DataFrame
Rowchannelcoefnameestimateeventnamegroupstderrortime
Int64StringFloat64DataTypeNothingFloat64Float64
11(Intercept)1.46643Any0.113649-0.4
21(Intercept)1.52096Any0.117307-0.39
31(Intercept)1.65619Any0.121052-0.38
41(Intercept)1.90642Any0.12281-0.37
51(Intercept)2.23058Any0.120375-0.36
61(Intercept)2.57112Any0.114245-0.35

4. Visualize the results

Tidy DataFrames are easy to visualize using e.g. AlgebraOfGraphics.jl. Function plot_erp from UnfoldMakiemakes it even easier.

results = coeftable(m)
plot_erp(results)
Example block output

As you can see, there is a lot going on, even in the baseline period! This is because the signal was simulated with overlapping events. In the next tutorial you will learn how to fix this.