BINLESS STRATEGIES FOR ESTIMATION OF INFORMATION IN SPIKE TRAINS

Jonathan D. Victor
Department of Neurology and Neuroscience
Weill Medical College of Cornell University
1300 York Avenue
New York, NY 10021

The amount of information contained in a spike train is an important quantity for understanding neural coding. However, as is becoming increasingly appreciated, estimation of this quantity from empirical data can be fraught with difficulties. Estimation of information in spike trains generally consists of several steps: (i) embedding spike trains into a space, (ii) clustering similar spike trains into groups, (iii) using a "plug-in" estimate for transmitted information based on how these groups relate to the stimuli, and (iv) estimating biases due to small sample size. Traditional approaches use binning as part of the embedding stage (i), with each bin corresponding to a separate dimension. Bins that are too wide lead to underestimates of information since temporal detail is lost, while bins that are too narrow lead to biases associated with extreme undersampling. This difficulty is especially critical when spike trains are not Poisson, since the effect of the deviations from Poisson-ness on information content may only be appreciated when the analysis method is sensitive to precise timing (the small-bin regime). Metric space methods (Victor and Purpura 1997) avoid the binning problem, but may still underestimate information due to the clustering at stage (ii). Jackknife estimators at stage (iv) can be superior to the standard (Treves and Panzeri 1995) bias correction, but no bias correction is effective when the amount of data is limited.
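As a rough illustration of the traditional bin-based pipeline described above, the following Python sketch embeds spike trains as binned count vectors (stage i), treats identical count vectors as one group (a trivial stand-in for stage ii), and computes a plug-in information estimate (stage iii). No bias correction (stage iv) is applied. The function names and the toy data are hypothetical and chosen only for illustration; they are not taken from any published implementation.

```python
import math
from collections import Counter

def bin_spike_train(spike_times, duration, bin_width):
    """Embed a spike train as a tuple of spike counts, one entry per time bin."""
    n_bins = int(round(duration / bin_width))
    counts = [0] * n_bins
    for t in spike_times:
        # Clamp so that a spike at t == duration falls in the last bin.
        idx = min(int(t / bin_width), n_bins - 1)
        counts[idx] += 1
    return tuple(counts)

def plugin_entropy(labels):
    """Plug-in (maximum-likelihood) entropy in bits of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def plugin_information(stimuli, words):
    """Plug-in information: H(words) minus the stimulus-weighted H(words | stimulus)."""
    n = len(words)
    h_conditional = 0.0
    for s in set(stimuli):
        words_s = [w for st, w in zip(stimuli, words) if st == s]
        h_conditional += (len(words_s) / n) * plugin_entropy(words_s)
    return plugin_entropy(words) - h_conditional

# Toy data (assumed): two stimuli that differ only in spike timing.
stimuli = [0, 0, 1, 1]
trains = [[0.1], [0.1], [0.9], [0.9]]

narrow = [bin_spike_train(t, duration=1.0, bin_width=0.5) for t in trains]
wide = [bin_spike_train(t, duration=1.0, bin_width=1.0) for t in trains]
info_narrow = plugin_information(stimuli, narrow)
info_wide = plugin_information(stimuli, wide)
```

In this toy case the single wide bin discards the timing difference between the two stimuli, so the estimate collapses to zero, while the narrower bins recover it. With realistic data, narrowing the bins further would multiply the number of possible words and bring on the undersampling bias noted above.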

We present an alternative that bypasses the difficulties associated with binning and clustering. The approach applies to datasets that are repeated measures of responses to a small number of discrete stimuli, rather than responses to a continuous "rich" stimulus (e.g., pseudorandom noise). The idea rests on a little-known asymptotically unbiased "binless" estimator of differential entropy (Kozachenko and Leonenko 1987). This estimator can be applied to spike trains, provided that the spike trains are embedded in low-dimensional Euclidean spaces. To preserve the advantages of the binless estimator, we use linear, continuous embeddings of spike trains, rather than embeddings based on discretization of time (binning). Information is then estimated from the difference between the entropy of the set of all spike trains and the entropies of the spike trains elicited by each stimulus. Calculations on simulated data show that this approach can be dramatically more efficient than standard bin-based approaches, both for Poisson and non-Poisson data.
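The entropy-difference idea can be sketched in Python as follows, using the nearest-neighbor form of the Kozachenko-Leonenko estimator. This is a minimal sketch under stated assumptions, not the procedure used in this work: it assumes the spike trains have already been embedded as points in a d-dimensional Euclidean space, that no two points coincide, and that the conditional entropies are weighted by the fraction of trials per stimulus. The function names (kl_entropy_bits, binless_information_bits) and the Gaussian test data are hypothetical.

```python
import math
import numpy as np

def kl_entropy_bits(points):
    """Nearest-neighbor (Kozachenko-Leonenko) differential entropy estimate, in bits.

    points: array of shape (n, d), or (n,) for scalar data. Assumes continuous
    data with no duplicate points (a zero nearest-neighbor distance gives -inf).
    """
    x = np.asarray(points, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    n, d = x.shape
    # All pairwise Euclidean distances; O(n^2) memory, fine for a small sketch.
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)   # exclude each point's distance to itself
    rho = dist.min(axis=1)           # distance to the nearest neighbor
    # Log volume of the unit ball in d dimensions: pi^(d/2) / Gamma(d/2 + 1).
    log_unit_ball = (d / 2) * math.log(math.pi) - math.lgamma(d / 2 + 1)
    h_nats = (d * np.mean(np.log(rho)) + log_unit_ball
              + math.log(n - 1) + np.euler_gamma)
    return h_nats / math.log(2)

def binless_information_bits(responses_by_stimulus):
    """Entropy of all responses minus the trial-weighted per-stimulus entropies."""
    groups = [np.asarray(g, dtype=float) for g in responses_by_stimulus]
    groups = [g[:, None] if g.ndim == 1 else g for g in groups]
    pooled = np.concatenate(groups, axis=0)
    n_total = pooled.shape[0]
    h_conditional = sum((g.shape[0] / n_total) * kl_entropy_bits(g) for g in groups)
    return kl_entropy_bits(pooled) - h_conditional

# Assumed toy data: one-dimensional "embedded responses" to two stimuli.
rng = np.random.default_rng(0)
sample = rng.normal(0.0, 1.0, size=1000)
h_est = kl_entropy_bits(sample)
h_true = 0.5 * math.log2(2 * math.pi * math.e)   # entropy of a unit Gaussian

resp_a = rng.normal(0.0, 1.0, size=500)
resp_b = rng.normal(10.0, 1.0, size=500)
info_est = binless_information_bits([resp_a, resp_b])   # expected near 1 bit
```

Because the estimate depends only on nearest-neighbor distances, no bin width has to be chosen. The cost is that the data must sit in a Euclidean space of modest dimension, which is why the embedding step matters.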