Scalable methods for k-mer based biological sequence analysis

Talk

Paul Medvedev

Talk Series: Visitors

Time: 02.03.2025 11:00 to 12:00

Location: IRB 4105 or https://umd.zoom.us/j/94340703410?pwd=rrXaGSXSpabcMTtDNmeCNf2Ih2fQYE.1

URL: https://talks.cs.umd.edu/talks/4095

Scalable analysis of biological sequences often starts by breaking long strings into their constituent k-mers. A k-mer is simply a substring of a short fixed length k. Compact data structures and efficient algorithms for storing and analyzing k-mer datasets have therefore become one of the bottlenecks for biological discovery. In this talk, I will present several techniques we have developed to push the boundaries of what is possible with such datasets. I will present the spectrum-preserving string set representation (RECOMB 2020, best paper award) as well as space-efficient data structures for querying large sequence archives (RECOMB 2017). Time permitting, I will also present our work on the use of sketching algorithms to estimate sequence similarity from k-mer sets (RECOMB 2021 and ISMB 2022).
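
For readers unfamiliar with the terminology, the k-mer definition above can be illustrated with a minimal Python sketch. It is not taken from the talk or from any of the cited papers: the function names `kmers` and `jaccard` are hypothetical, and the exact Jaccard index computed here is only one common notion of k-mer set similarity, the kind of quantity that sketching methods aim to approximate without storing every k-mer.

```python
def kmers(seq: str, k: int) -> set[str]:
    """Return the set of distinct substrings of length k in seq."""
    if k <= 0:
        raise ValueError("k must be positive")
    # A sequence of length n has n - k + 1 windows of length k
    # (none if the sequence is shorter than k).
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Exact Jaccard index |a & b| / |a | b| of two k-mer sets."""
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    x = kmers("ACGTAC", 3)
    y = kmers("ACGTAG", 3)
    print(sorted(x))
    print(jaccard(x, y))
```

This naive version keeps every k-mer in memory, which is exactly what stops scaling on large sequence archives and motivates the compact representations and sketches described in the abstract.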
