Benchmarks

We ran the benchmarks on 2024-11-07 using the script ./benchmark/cuda/solver_comparison.jl. Because some of them require a GPU, they cannot be run as part of our online continuous integration.

Important
  • Allocations are CPU allocations only; GPU allocations were not counted.
  • Solvers other than default_multi are currently not multi-threaded.
  • Solvers other than default_multi and krylov_gpu solve the normal equations $X'Xb = X'y$ instead of solving $Xb=y$ directly. They are likely less accurate, but should be faster for multi-channel data, because a Cholesky, QR, or similar factorization can be precomputed once and the matrix to be inverted is much smaller.
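The trade-off in the last point can be made concrete with a short derivation. This is a sketch under stated assumptions, not a description of the solvers' internals: it assumes that `sizeDesign` in the tables below is the size $(n, p)$ of the design matrix $X$, that $c$ is the number of channels, and the symbols $Y$, $B$, $L$ and $\kappa_2$ are introduced here for illustration only.

$$
\begin{aligned}
&\min_{B}\ \lVert XB - Y \rVert_F^2, \qquad X \in \mathbb{R}^{n \times p},\quad Y \in \mathbb{R}^{n \times c},\quad B \in \mathbb{R}^{p \times c} \\
&\Longrightarrow\quad X'X\,B = X'Y, \qquad X'X \in \mathbb{R}^{p \times p} \\
&X'X = LL' \ \text{(factorized once)}, \qquad LL'\,b_j = X'y_j \quad \text{for each channel } j = 1, \dots, c \\
&\kappa_2(X'X) = \kappa_2(X)^2 \quad \text{(for } X \text{ of full column rank)}
\end{aligned}
$$

Under that reading, the large model below has $n = 3001479$ and $p = 9616$, so the factorized matrix is $9616 \times 9616$ rather than $3001479 \times 9616$, and the same factorization is reused for all 128 channels. The squared condition number is why the normal-equation route is likely less accurate. A Cholesky factorization additionally requires $X'X$ to be numerically positive definite, which does not hold when $X$ is rank deficient.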

Small Model

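n_channels = 1,
sfreq = 10,
n_splines = 4,
n_repeats = 10;

| gpu | method | el_type | time | GB | percent_X_filled | sizeDesign | n_channels | overlap | comment |
|---|---|---|---|---|---|---|---|---|---|
| true | cholesky | Float64 | | | 0.068 | (1190, 130) | 1 | (0.2, 0.2) | PosDefException(-1) |
| false | cholesky | Float64 | 0.00056 | 0.00069 | 0.068 | (1190, 130) | 1 | (0.2, 0.2) | |
| true | intern | Float64 | 0.00088 | 0.00017 | 0.068 | (1190, 130) | 1 | (0.2, 0.2) | |
| false | intern | Float64 | 0.0011 | 0.00069 | 0.068 | (1190, 130) | 1 | (0.2, 0.2) | |
| true | qr | Float64 | 0.0013 | 0.00019 | 0.068 | (1190, 130) | 1 | (0.2, 0.2) | |
| false | cg | Float64 | 0.0015 | 0.00057 | 0.068 | (1190, 130) | 1 | (0.2, 0.2) | |
| false | default_multi | Float64 | 0.0017 | 0.00016 | 0.068 | (1190, 130) | 1 | (0.2, 0.2) | |
| false | qr | Float64 | 0.002 | 0.00076 | 0.068 | (1190, 130) | 1 | (0.2, 0.2) | |
| true | cg | Float64 | 0.0054 | 0.00056 | 0.068 | (1190, 130) | 1 | (0.2, 0.2) | |
| true | pinv | Float64 | 0.0054 | 0.00032 | 0.068 | (1190, 130) | 1 | (0.2, 0.2) | |
| false | pinv | Float64 | 0.016 | 0.0016 | 0.068 | (1190, 130) | 1 | (0.2, 0.2) | |
| true | krylov_gpu | Float64 | 0.032 | 0.0013 | 0.068 | (1190, 130) | 1 | (0.2, 0.2) | |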

small-to-midsize: multi-channel

n_channels = 128,
sfreq = 100,
n_splines = 4,
n_repeats = 200;

Float64

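| gpu | method | el_type | time | GB | percent_X_filled | sizeDesign | n_channels | overlap | comment |
|---|---|---|---|---|---|---|---|---|---|
| true | cholesky | Float64 | | | 0.0068 | (239522, 1210) | 128 | (0.2, 0.2) | PosDefException(-1) |
| true | qr | Float64 | 0.38 | 0.25 | 0.0068 | (239522, 1210) | 128 | (0.2, 0.2) | |
| true | pinv | Float64 | 0.42 | 0.26 | 0.0068 | (239522, 1210) | 128 | (0.2, 0.2) | |
| true | intern | Float64 | 0.7 | 0.25 | 0.0068 | (239522, 1210) | 128 | (0.2, 0.2) | |
| true | cg | Float64 | 1.2 | 0.32 | 0.0068 | (239522, 1210) | 128 | (0.2, 0.2) | |
| false | cholesky | Float64 | 1.5 | 0.31 | 0.0068 | (239522, 1210) | 128 | (0.2, 0.2) | |
| false | qr | Float64 | 1.7 | 0.31 | 0.0068 | (239522, 1210) | 128 | (0.2, 0.2) | |
| false | cg | Float64 | 2.0 | 0.3 | 0.0068 | (239522, 1210) | 128 | (0.2, 0.2) | |
| false | pinv | Float64 | 2.1 | 0.38 | 0.0068 | (239522, 1210) | 128 | (0.2, 0.2) | |
| true | krylov_gpu | Float64 | 5.9 | 0.4 | 0.0068 | (239522, 1210) | 128 | (0.2, 0.2) | |
| false | default_multi | Float64 | 13.0 | 1.2 | 0.0068 | (239522, 1210) | 128 | (0.2, 0.2) | |
| false | intern | Float64 | 13.0 | 1.7 | 0.0068 | (239522, 1210) | 128 | (0.2, 0.2) | |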

Float32

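| gpu | method | el_type | time | GB | percent_X_filled | sizeDesign | n_channels | overlap | comment |
|---|---|---|---|---|---|---|---|---|---|
| true | cholesky | Float32 | | | 0.0068 | (239522, 1210) | 128 | (0.2, 0.2) | PosDefException(-1) |
| true | krylov_gpu | Float32 | | | 0.0068 | (239522, 1210) | 128 | (0.2, 0.2) | |
| true | pinv | Float32 | 0.39 | 0.25 | 0.0068 | (239522, 1210) | 128 | (0.2, 0.2) | |
| true | qr | Float32 | 0.62 | 0.24 | 0.0068 | (239522, 1210) | 128 | (0.2, 0.2) | |
| true | intern | Float32 | 0.69 | 0.24 | 0.0068 | (239522, 1210) | 128 | (0.2, 0.2) | |
| true | cg | Float32 | 1.2 | 0.31 | 0.0068 | (239522, 1210) | 128 | (0.2, 0.2) | |
| false | cholesky | Float32 | 1.2 | 0.17 | 0.0068 | (239522, 1210) | 128 | (0.2, 0.2) | |
| false | cg | Float32 | 1.3 | 0.16 | 0.0068 | (239522, 1210) | 128 | (0.2, 0.2) | |
| false | qr | Float32 | 1.4 | 0.17 | 0.0068 | (239522, 1210) | 128 | (0.2, 0.2) | |
| false | pinv | Float32 | 1.6 | 0.21 | 0.0068 | (239522, 1210) | 128 | (0.2, 0.2) | |
| false | intern | Float32 | 13.0 | 0.86 | 0.0068 | (239522, 1210) | 128 | (0.2, 0.2) | |
| false | default_multi | Float32 | 13.0 | 0.97 | 0.0068 | (239522, 1210) | 128 | (0.2, 0.2) | |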

large, realistic model

n_channels = 128,
sfreq = 500,
n_splines = (4, 4),
n_repeats = 500;

| gpu | method | el_type | time | GB | percent_X_filled | sizeDesign | n_channels | overlap | comment |
|---|---|---|---|---|---|---|---|---|---|
| true | cholesky | Float64 | | | 0.0015 | (3001479, 9616) | 128 | (0.2, 0.2) | PosDefException(-1) |
| false | cholesky | Float64 | | | 0.0015 | (3001479, 9616) | 128 | (0.2, 0.2) | PosDefException(2760) |
| false | intern | Float64 | | | 0.0015 | (3001479, 9616) | 128 | (0.2, 0.2) | SingularException(9599) |
| true | cg | Float64 | 9.3 | 3.6 | 0.0015 | (3001479, 9616) | 128 | (0.2, 0.2) | |
| true | qr | Float64 | 11.0 | 3.5 | 0.0015 | (3001479, 9616) | 128 | (0.2, 0.2) | |
| true | intern | Float64 | 13.0 | 3.5 | 0.0015 | (3001479, 9616) | 128 | (0.2, 0.2) | |
| false | qr | Float64 | 80.0 | 6.3 | 0.0015 | (3001479, 9616) | 128 | (0.2, 0.2) | |
| true | pinv | Float64 | 80.0 | 4.2 | 0.0015 | (3001479, 9616) | 128 | (0.2, 0.2) | |
| true | krylov_gpu | Float64 | 107.0 | 3.9 | 0.0015 | (3001479, 9616) | 128 | (0.2, 0.2) | |
| false | default_multi | Float64 | 500.0 | 15.0 | 0.0015 | (3001479, 9616) | 128 | (0.2, 0.2) | |
| false | pinv | Float64 | 520.0 | 11.0 | 0.0015 | (3001479, 9616) | 128 | (0.2, 0.2) | |
| false | cg | Float64 | 939.0 | 5.7 | 0.0015 | (3001479, 9616) | 128 | (0.2, 0.2) | |