When presented with complex rhythmic auditory stimuli, humans are able to track underlying temporal structure (e.g., a "beat"), both covertly and with their movements. This capacity goes far beyond that of a simple entrained oscillator, drawing on contextual and enculturated timing expectations and adjusting rapidly to perturbations in event timing, phase, and tempo. Here we propose that the problem of rhythm tracking is most naturally characterized as a problem of continuously estimating an underlying phase and tempo based on precise event times and their correspondence to timing expectations. We formalize this problem as a case of inferring a distribution on a hidden state from point process data in continuous time: either Phase Inference from Point Process Event Timing (PIPPET) or Phase And Tempo Inference (PATIPPET). This approach to rhythm tracking generalizes to non-isochronous and multi-voice rhythms. We demonstrate that these inference problems can be approximately solved using a variational Bayesian method that generalizes the Kalman-Bucy filter to point-process data. These solutions reproduce multiple characteristics of overt and covert human rhythm tracking, including period-dependent phase corrections, illusory contraction of unexpectedly empty intervals, and failure to track excessively syncopated rhythms, and could plausibly be approximated in the brain. PIPPET can serve as the basis for models of performance on a wide range of timing and entrainment tasks and opens the door to even richer predictive processing and active inference models of rhythmic timing.
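The filtering idea summarized above can be illustrated with a highly simplified toy sketch (not the paper's actual PIPPET/PATIPPET filter): a Gaussian posterior on phase drifts forward at unit rate between events while its variance grows, and at each observed event the posterior is fused with a Gaussian timing expectation centered on the nearest whole-beat phase via a Kalman-style gain. All parameter names and values here (`sigma_drift`, `sigma_template`, `sigma0`) are illustrative assumptions, not quantities from the source.

```python
def track_phase(event_times, sigma_drift=0.05, sigma_template=0.05, sigma0=0.1):
    """Toy beat tracker: maintain a Gaussian posterior (mu, var) on phase.

    Between events, phase advances deterministically at rate 1 and
    uncertainty accumulates; at each event, the posterior is corrected
    toward the nearest expected (integer) beat phase. A crude sketch of
    event-driven phase inference, not the variational filter in the paper.
    """
    mu, var = 0.0, sigma0 ** 2
    t = 0.0
    estimates = []
    for et in event_times:
        # drift between events: mean advances, variance grows with elapsed time
        elapsed = et - t
        mu += elapsed
        var += sigma_drift ** 2 * elapsed
        t = et
        # event update: Gaussian fusion with a timing-expectation template
        expected = round(mu)                      # nearest whole-beat phase
        gain = var / (var + sigma_template ** 2)  # Kalman-style gain
        mu = mu + gain * (expected - mu)
        var = (1.0 - gain) * var
        estimates.append((t, mu, var))
    return estimates
```

For events near integer times (e.g., `[1.02, 1.98, 3.05]`), each update pulls the phase estimate toward the nearest beat and shrinks its variance, mirroring the event-driven corrections described above.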
bioRxiv Subject Collection: Neuroscience