Machine learning for spatial and network biology
Biological systems are organized across a hierarchy of scales, from the spatial organization of cells in tissues to networks of interactions between genes and proteins. However, the systematic study of spatial and network processes is challenged by high levels of heterogeneity, sparsity, and other forms of noise in modern sequencing data. In this talk, I present new statistical and machine learning (ML) methodologies for spatial and network biology. First, I introduce “gene expression topography”, a fundamentally new paradigm for modeling spatial gradients and tissue geometry from sparse spatial data. I derive algorithms for learning “topographic maps” of 2-D tissue slices using tools from complex analysis and interpretable deep learning. These maps reveal the spatial and molecular organization of tissues from the brain, the skin, and the tumor microenvironment. Second, I introduce a statistical framework for anomaly detection in biological interaction networks, that is, the problem of identifying anomalous subnetworks of interacting disease genes or proteins. I prove that many widely used algorithms are statistically biased, resolving a 20-year-old open question of why these methods identify large and unrealistic subnetworks. I then derive asymptotically unbiased and efficient algorithms for network anomaly detection. Taken together, my research underscores the need for specialized, principled, and interpretable ML approaches to advance biomedical discovery.