Introduction. The last decade has seen a surge in well-powered genome-wide association studies (GWASs) of complex behavioural traits, disorders and, more recently, brain structural and functional neuroimaging features. However, the extreme polygenicity of these complex traits makes it difficult to translate the GWAS signal into mechanistic biological insights. We postulate that the covariance of SNP effects across many brain features can be captured by latent genomic components of SNP effect sizes. These components may partly reflect concerted multi-locus genomic effects acting through known molecular pathways and protein-protein interactions. Here, we test the feasibility of a new data-driven method to derive such latent components of genome-wide effects on more than a thousand neuroimaging-derived traits, and investigate their utility in interpreting the complex biological processes that shape the GWAS signal. Methods. We downloaded the GWAS summary statistics of 3,143 brain imaging-derived phenotypes (IDPs) from the UK Biobank, provided by the Oxford Brain Imaging Genetics (BIG) Server (Elliott et al. 2018). Probabilistic independent component analysis (ICA) was used to extract 200 independent genomic components from the matrix of SNP effect sizes. We qualitatively describe the distribution of the latent components' loadings in the neuroimaging and the genomic dimensions. Gene-wide statistics were calculated for each genomic component. We tested the genomic components' enrichment for molecular pathways using MSigDB, and for cell-type-specific gene expression using single-cell RNA sequencing of adult and foetal brain cells. Results. The 200 components explained 80% of the variance in SNP effect sizes. Each MRI modality and data-processing method projected the imaging data into a clearly distinct cluster in the genomic-component embedding space. Among the 200 genomic components, 157 were clearly driven by a single locus, while 39 were highly polygenic. 
Together, these 39 components were significantly enriched for 2,274 MSigDB gene sets (fully corrected for multiple testing across gene sets and components). Several components were sensitive to molecular pathways, single-cell expression profiles, and brain traits in patterns consistent with existing knowledge across these biological levels. To illustrate this, we highlight a component that implicated axonal regeneration pathways, was specifically enriched for gene expression in oligodendrocyte precursors, microglia and astrocytes, and loaded highly on white matter neuroimaging traits. We highlight a second component that implicated synaptic function and neuron projection organization pathways and was specifically enriched for neuronal cell transcriptomes. Conclusion. We propose genomic ICA as a new method to identify latent genetic factors influencing brain structure and function as measured by multimodal MRI. The derived latent genomic dimensions are highly sensitive to known molecular pathways and cell-specific gene expression profiles. Genomic ICA may help to disentangle the many different biological routes by which the genome shapes inter-individual variation in the brain. Future research will aim to use this method to profile individual subjects' genomic data along the new latent dimensions and to evaluate the utility of these dimensions in stratifying heterogeneous patient populations.
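As an illustration of the decomposition step described in the Methods, the following is a minimal sketch and not the authors' pipeline. It assumes a SNP-by-trait matrix of effect sizes, simulates one from a handful of heavy-tailed latent components (synthetic data, not UK Biobank results), reduces it by PCA, and then applies a plain symmetric FastICA with a tanh nonlinearity, which is swapped in here for the probabilistic ICA used in the study. All names (`pca_whiten`, `fastica`, `S_hat`, `A_hat`), the matrix dimensions, and the choice to treat components as independent across SNPs are assumptions made for the example.

```python
import numpy as np


def pca_whiten(X, k):
    """Center the columns of X and return k whitened PCA scores per row."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    # Share of total variance retained by the first k principal components
    var_explained = (s[:k] ** 2).sum() / (s ** 2).sum()
    # Scaled left singular vectors: uncorrelated columns with unit variance
    Z = U[:, :k] * np.sqrt(X.shape[0])
    return Xc, Z, var_explained


def sym_decorrelate(W):
    """Make the rows of W orthonormal: W <- (W W^T)^(-1/2) W."""
    vals, vecs = np.linalg.eigh(W @ W.T)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T @ W


def fastica(Z, n_iter=200, tol=1e-6, seed=0):
    """Symmetric FastICA (tanh nonlinearity) on whitened data Z (n x k)."""
    n, k = Z.shape
    rng = np.random.default_rng(seed)
    W = sym_decorrelate(rng.standard_normal((k, k)))
    for _ in range(n_iter):
        Y = Z @ W.T                      # current source estimates (n x k)
        G = np.tanh(Y)
        G_prime = 1.0 - G ** 2
        # Fixed-point update for every unmixing row at once
        W_new = (G.T @ Z) / n - G_prime.mean(axis=0)[:, None] * W
        W_new = sym_decorrelate(W_new)
        # Converged when each new row matches the old one up to sign
        delta = np.max(np.abs(np.abs(np.sum(W_new * W, axis=1)) - 1.0))
        W = W_new
        if delta < tol:
            break
    return Z @ W.T


# Synthetic stand-in for a SNP-by-trait effect-size matrix (assumed sizes)
rng = np.random.default_rng(42)
n_snps, n_traits, k = 5000, 60, 5
S_true = rng.laplace(size=(n_snps, k))            # heavy-tailed SNP loadings
A_true = rng.standard_normal((k, n_traits))       # trait loadings
X = S_true @ A_true + 0.1 * rng.standard_normal((n_snps, n_traits))

Xc, Z, var_explained = pca_whiten(X, k)
S_hat = fastica(Z)                                # components over SNPs
A_hat = S_hat.T @ Xc / n_snps                     # components over traits

print(f"variance retained by {k} components: {var_explained:.3f}")
```

In this sketch, `S_hat` holds each component's loadings across SNPs (the genomic dimension) and `A_hat` its loadings across traits (the neuroimaging dimension). Recovered components are only defined up to sign and ordering, so any comparison with a reference should allow for both.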
bioRxiv Subject Collection: Neuroscience