Accurate and reliable whole-brain segmentation is a key requirement of longitudinal neuroimaging studies. The ability to measure structural changes reliably is fundamental to detecting biological effects with confidence, especially when these effects are small. In this work, we undertake a thorough comparative analysis of the reliability, bias, sensitivity to longitudinal change and diagnostic sensitivity to Alzheimer’s disease of two subcortical segmentation methods, Automatic Segmentation (ASEG) and Sequence Adaptive Multimodal Segmentation (SAMSEG). Both are provided by the recently released version 7.1 of the open-source neuroimaging package FreeSurfer, with ASEG being the default segmentation method. First, we used a large sample of participants (n = 1629) distributed across the lifespan (age range = 4-93 years) to assess within-session test-retest reliability in eight bilateral subcortical structures: amygdala, caudate, hippocampus, lateral ventricles, nucleus accumbens, pallidum, putamen and thalamus. We performed the analyses separately for a sub-sample scanned on a 1.5T Siemens Avanto (n = 774) and a sub-sample scanned on a 3T Siemens Skyra (n = 855). The absolute symmetrized percent differences indicated relatively constant reliability across the lifespan, except for the younger children in the Avanto dataset with ASEG. Although both methods showed high reliability (ICC > 0.95), SAMSEG yielded significantly lower volumetric differences between repeated measures for all subcortical segmentations (p < 0.05) and higher spatial overlap in all structures except the putamen, where spatial overlap was significantly higher for ASEG. Second, we tested how well each method detected neuroanatomical volumetric change using longitudinal follow-up scans (n = 491 for Avanto and n = 245 for Skyra; interscan interval = 1-10 years).
Both methods showed excellent ability to detect longitudinal change, but yielded age trajectories with notable differences for most structures, including the hippocampus and the amygdala. For instance, ASEG hippocampal volumes showed a steady decline with age starting from subjects in their twenties, whereas SAMSEG hippocampal volumes were stable until subjects reached their sixties. Finally, we tested the sensitivity of each method to clinically relevant change. We compared the annual rate of hippocampal atrophy in cognitively normal older adults (n = 20), patients with mild cognitive impairment (n = 20) and patients with Alzheimer’s disease (n = 20). SAMSEG was more sensitive to differences in atrophy between the groups, demonstrating its ability to detect clinically relevant longitudinal changes. Both ASEG and SAMSEG were reliable and detected within-person longitudinal change. However, SAMSEG yielded more consistent measurements between repeated scans without a loss of sensitivity to change.
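The abstract does not spell out how the two test-retest measures are computed. As a minimal sketch, assuming the commonly used definitions (absolute volume difference divided by the mean of the two volumes, expressed in percent, and the Dice coefficient for spatial overlap), they could be written as follows. The function names `aspd` and `dice` and the example values are hypothetical and illustrative only; they are not taken from the study or from FreeSurfer.

```python
def aspd(v1: float, v2: float) -> float:
    """Absolute symmetrized percent difference between two volume estimates.

    Assumed definition: 100 * |v1 - v2| / mean(v1, v2).
    """
    mean = (v1 + v2) / 2.0
    if mean == 0:
        raise ValueError("mean volume is zero; ASPD is undefined")
    return 100.0 * abs(v1 - v2) / mean


def dice(labels_a, labels_b, label) -> float:
    """Dice overlap of one structure between two equally sized label arrays.

    labels_a and labels_b are flat sequences of voxel labels from the
    test and retest segmentations (hypothetical input format).
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label arrays must have the same length")
    size_a = sum(1 for x in labels_a if x == label)
    size_b = sum(1 for x in labels_b if x == label)
    if size_a + size_b == 0:
        raise ValueError("structure is absent from both segmentations")
    overlap = sum(1 for x, y in zip(labels_a, labels_b) if x == label and y == label)
    return 2.0 * overlap / (size_a + size_b)


# Illustrative values only: two volume estimates (mm^3) from repeated scans,
# and two tiny made-up label arrays in which label 1 marks the structure.
test_retest_aspd = aspd(4000.0, 4100.0)
test_retest_dice = dice([1, 1, 0, 0], [1, 0, 0, 0], label=1)
```

Under these definitions, lower values of the percent difference and higher Dice values both indicate better agreement between repeated scans, which is the direction of the comparison reported above.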
bioRxiv Subject Collection: Neuroscience