Fast joint estimation of alignment and phylogeny from genomics sequences in a frequentist framework
At a glance
The availability of large molecular datasets demands accurate and
fast bioinformatics methods to analyze them. Molecular
sequences of common origin are used to infer phylogenetic trees,
which help to test various biological hypotheses or to support
subsequent analyses. Phylogeny inference relies on sequence
alignments, which are usually inferred during a heuristic search
directed by a guide tree. Because the guide tree is itself a phylogeny,
this circularity calls for methods for joint inference of phylogeny
and alignment. This project will
develop a fast and practical solution.
The goal is to develop a fast and accurate algorithm for joint alignment and tree inference in the frequentist framework, implemented in a user-friendly software package and applicable to large genomic and metagenomic datasets with thousands of sequences. We will connect our recent successful methods, currently implemented in independent packages: CodonPhyML, for fast maximum likelihood phylogeny inference for protein-coding genes, and ProGraphMSA, for fast probabilistic graph-based phylogeny-aware alignment. To circumvent the computational difficulties, we will use the Poisson indel process, a modification of the classical model that has linear time complexity. High-performance computing techniques, including parallelization, will be used to optimize the implementation for memory usage and speed.
The new method will support phylogenetic analyses of genomic data with thousands of sequences, such as sequences from microbial pathogens or antibody data from infected donors. Based on our current collaborations with industry, the new method promises to be in high demand not only in academic projects but also in the pharmaceutical and biotech industries.
NAR Genomics and Bioinformatics.
Available from: https://doi.org/10.1093/nargab/lqaa092