Feature extraction and principal components

Hey everyone!

I had a question about extracting principal components (PCs) from waveforms on dense multi-electrode arrays. What is the typical procedure for this? Do you extract PCs from only the largest waveform per event or do you extract PCs from multiple channels at once and use those? Are there other strategies or details that people take into consideration when doing this feature extraction step?

Hi Cole.

(I hope it’s okay that I moved this post outside of the private category, as I think it’s of general interest – and I’m trying to bootstrap the forum :) )

The question of PC features for waveforms on multi-electrode arrays is tricky in my mind. I haven’t studied the different sorters in detail w.r.t. this question… and I’ve always had the same question myself. People talk about PC features used in clustering as though they can all be put in the same space. But really there is more than one way to do it.

First, you could just compute per-channel features. For example, with 3 features per channel you would have 21 features if the waveform spanned 7 channels.
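To make the per-channel option concrete, here is a minimal sketch using plain NumPy. The function name, the array layout `(n_spikes, n_channels, n_samples)`, and the random test data are all assumptions for illustration, not taken from any particular sorter.

```python
import numpy as np

def per_channel_pca_features(waveforms, n_components=3):
    """Fit a separate PCA on each channel and concatenate the scores.

    waveforms: array of shape (n_spikes, n_channels, n_samples).
    Returns an array of shape (n_spikes, n_channels * n_components).
    """
    n_spikes, n_channels, _ = waveforms.shape
    features = np.empty((n_spikes, n_channels, n_components))
    for ch in range(n_channels):
        x = waveforms[:, ch, :]
        x = x - x.mean(axis=0)  # center each time sample across spikes
        # Rows of vt are the principal axes for this channel.
        _, _, vt = np.linalg.svd(x, full_matrices=False)
        features[:, ch, :] = x @ vt[:n_components].T
    return features.reshape(n_spikes, -1)

# Hypothetical data: 200 spikes, 7 channels, 40 samples per snippet.
rng = np.random.default_rng(0)
waveforms = rng.normal(size=(200, 7, 40))
features = per_channel_pca_features(waveforms)
print(features.shape)  # (200, 21)
```

With 7 channels and 3 components each, every spike ends up with the 21 features mentioned above.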

The way MountainSort (and others?) does it is to compute features specific to a channel neighborhood by first flattening all the channels / timepoints into one big vector. But with this strategy you cannot compare features across different neighborhoods, because each neighborhood gets its own PCA basis and the coordinates mean different things. And then there’s the question of which neighborhood is relevant for a particular cluster – and a per-spike neighborhood is even more difficult. So in the neighborhood strategy, there really is no such thing as universal per-event PC features (across all events in the recording).
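A rough sketch of the flattening idea, again in plain NumPy. This is not MountainSort's actual code; the function name, the choice of 10 components, and the two example neighborhoods are assumptions made only to show why the resulting features live in different spaces.

```python
import numpy as np

def neighborhood_pca_features(waveforms, channel_ids, n_components=10):
    """PCA on snippets flattened over the channels of one neighborhood.

    waveforms: array of shape (n_spikes, n_channels, n_samples).
    channel_ids: list of channel indices forming the neighborhood.
    Returns (scores, basis); basis has shape (n_components, n_ch * n_samples).
    """
    sub = waveforms[:, channel_ids, :]
    flat = sub.reshape(sub.shape[0], -1)  # one long vector per spike
    flat = flat - flat.mean(axis=0)
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    basis = vt[:n_components]
    return flat @ basis.T, basis

# Hypothetical data and two overlapping neighborhoods.
rng = np.random.default_rng(1)
waveforms = rng.normal(size=(200, 7, 40))
feats_a, basis_a = neighborhood_pca_features(waveforms, [0, 1, 2])
feats_b, basis_b = neighborhood_pca_features(waveforms, [2, 3, 4])

# Same shapes, but each basis was fit on different channels, so column k of
# feats_a and column k of feats_b are not the same coordinate.
print(feats_a.shape, feats_b.shape)
```

Both feature matrices have the same shape, which is exactly what makes it tempting (and wrong) to stack them into one table.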

MountainSort has a multi-phase approach where separate detection, feature extraction, and clustering steps are performed in each channel’s neighborhood. So if there are 64 channels, then 64 separate detect/feature/cluster procedures are performed (potentially in parallel). Then there is a way of removing duplicates (not merging!) and then a second phase of clustering. Because of the complexity I always need to pause when somebody asks “can you export the features used for clustering”. I simply don’t know how to answer it without describing the entire algorithm (a weakness of mine).

In summary my answer is “I don’t know”. And maybe somebody could help illuminate how the other sorters do it and whether it makes any sense to export PC features for large MEAs.


Very interesting! I am happy to have this conversation in the public forum. The MountainSort approach is quite interesting. What constitutes a neighbourhood in MountainSort? Is that the adjacency radius then? Does this separate detection and clustering step for each channel scale well to large systems, say in the 100s to 1000s of channels?

In HerdingSpikes, we try to explain each event with one spike and one estimated location. When we detect a spike, we first calculate a variant of center of mass and then we keep the largest spike for future feature extraction. We then use the spikes that we kept (up to some large number of spikes) to compute two PCs for each spike, which are then whitened. We end up with a 4D vector describing each spike: [x, y, alpha * PC1, alpha * PC2], where alpha scales the principal components relative to the location. We cluster this with a very fast, parallelized mean-shift algorithm.
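For readers who want to see the shape of that 4D vector, here is a minimal sketch. It is not the HerdingSpikes implementation: the amplitude-weighted center of mass stands in for the "variant of center of mass", taking the waveform on the largest-amplitude channel is my reading of "the largest spike", and the function names, probe geometry, and `alpha` value are hypothetical. The mean-shift clustering step is not shown.

```python
import numpy as np

def center_of_mass(amplitudes, positions):
    """Amplitude-weighted mean channel position (an assumed variant).

    amplitudes: (n_spikes, n_channels), non-negative.
    positions: (n_channels, 2) channel coordinates.
    """
    weights = amplitudes / amplitudes.sum(axis=1, keepdims=True)
    return weights @ positions

def whitened_pcs(snippets, n_components=2):
    """PCA scores rescaled so each component has unit variance."""
    x = snippets - snippets.mean(axis=0)
    u, _, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :n_components] * np.sqrt(x.shape[0] - 1)

def spike_features(waveforms, positions, alpha=1.0):
    """Build [x, y, alpha * PC1, alpha * PC2] for every spike."""
    amplitudes = np.abs(waveforms).max(axis=2)
    xy = center_of_mass(amplitudes, positions)
    largest = amplitudes.argmax(axis=1)
    # Keep only the waveform on each spike's largest-amplitude channel.
    main = waveforms[np.arange(len(waveforms)), largest]
    return np.hstack([xy, alpha * whitened_pcs(main)])

# Hypothetical data: 200 spikes on 7 channels at random positions.
rng = np.random.default_rng(2)
waveforms = rng.normal(size=(200, 7, 40))
positions = rng.uniform(0.0, 100.0, size=(7, 2))
features = spike_features(waveforms, positions, alpha=1.0)
print(features.shape)  # (200, 4)
```

Because location and whitened PCs are in different units, `alpha` effectively sets how much waveform shape counts relative to position when distances are computed during clustering.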

This approach scales quite well to big probes, but may suffer a bit in accuracy and potentially with overlapping spikes although we would need more tests to quantify this. Using the new localization method I created, I would imagine that the accuracy could increase quite a bit at the cost of speed during the training of the network (although inference is quite fast).

I can’t always understand how people use different waveform features when sorting dense MEAs so I am excited to hear different approaches.

Yes, the adjacency radius determines the neighborhood around each channel. Yes, it scales linearly (in theory) with number of channels. I think it can handle thousands of channels (in theory) although the present implementation may need to be optimized further for that case. We’ll see when we do more rigorous benchmarking on spikeforest.
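As a rough illustration of what an adjacency radius means, here is a small sketch that builds one neighborhood per channel. It assumes plain Euclidean distance between channel positions and a hypothetical 8x8 grid with 50 µm pitch; the function name is made up and this is not MountainSort's code.

```python
import numpy as np

def channel_neighborhoods(positions, adjacency_radius):
    """For each channel, list the channels within adjacency_radius of it.

    positions: (n_channels, 2) channel coordinates.
    Assumes Euclidean distance; each channel is in its own neighborhood.
    """
    diffs = positions[:, None, :] - positions[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    return [np.flatnonzero(row <= adjacency_radius) for row in dists]

# Hypothetical 8x8 grid, 50 um pitch, 75 um radius.
xs, ys = np.meshgrid(np.arange(8), np.arange(8))
positions = 50.0 * np.column_stack([xs.ravel(), ys.ravel()])
neighborhoods = channel_neighborhoods(positions, adjacency_radius=75.0)

print(len(neighborhoods))  # 64
print(max(len(n) for n in neighborhoods))  # 9
```

On a regular grid the neighborhood size is bounded by the radius rather than by the probe size, which is one way to see why the per-channel work could stay roughly constant as channels are added.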
