PhD Proposal: Long-read, assembly-based metagenomic and cancer phasing
IRB-3137
Intra-species variation in bacteria gives rise to distinct strains, which often differ in important phenotypes such as pathogenicity or antimicrobial resistance, and in some cases in their association with cancer. Detecting these strains is essential but challenging, because the metagenomic samples used to study bacteria typically contain a mixture of multiple species and strains. Sequencing such samples produces reads of mixed origin, without strain-level resolution. One way to obtain strain-level resolution is to assemble the haplotype of each strain individually. In this proposal, we present Strainy, our assembly-based method for addressing this multiploid phasing problem using single nucleotide variants (SNVs). We then propose methods to detect a different class of genomic variation, inversions, in bacterial genomes. More accurate detection of inversions would not only allow them to be integrated into phasing pipelines but also enable deeper exploration of their biological implications. Finally, we propose an extension of the Strainy algorithm to detect and reconstruct heterogeneous subpopulations of cancer cells, known as cancer clones. Identifying these clones offers critical insights into tumor evolution and supports the development of improved diagnostics and targeted cancer therapies.