ProtTest: Selection of Best-Fit Models of Protein Evolution
ProtTest is a widely used bioinformatics tool for finding the most appropriate model of amino acid replacement for a given protein sequence alignment. Selecting an appropriate evolutionary model is a foundational step in molecular phylogenetics, because the model directly affects the accuracy of the reconstructed phylogenetic trees and of the estimated parameters. Developed by David Posada and his team, the tool fills a gap in protein sequence analysis by offering functionality similar to what jModelTest or Modeltest provides for nucleotide data.
🔬 Core Functions
ProtTest automates the statistical comparison of candidate substitution models to determine which configuration best fits the empirical data at hand.
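As a rough illustration of how such a comparison works, the sketch below applies the standard textbook formulas for AIC, AICc and BIC to a few candidate fits, turns score differences into weights proportional to exp(-delta/2), and sums those weights to obtain a relative importance for a model component such as +G. It is not ProtTest's own code: the `CandidateFit` class and the helper functions are hypothetical, the log-likelihoods and parameter counts are made-up numbers standing in for what a likelihood program would report, the sample size `n` is assumed to be the number of alignment sites, and the DT criterion is left out.

```python
import math
from dataclasses import dataclass


@dataclass
class CandidateFit:
    """Hypothetical container for one fitted candidate model."""
    name: str       # model label such as "LG+I+G"
    log_lik: float  # maximized log-likelihood (made-up values below)
    k: int          # number of free parameters


def aic(fit: CandidateFit) -> float:
    return -2.0 * fit.log_lik + 2.0 * fit.k


def aicc(fit: CandidateFit, n: int) -> float:
    # Small-sample correction of AIC; only defined when n > k + 1.
    return aic(fit) + (2.0 * fit.k * (fit.k + 1)) / (n - fit.k - 1)


def bic(fit: CandidateFit, n: int) -> float:
    return -2.0 * fit.log_lik + fit.k * math.log(n)


def criterion_weights(scores: dict[str, float]) -> dict[str, float]:
    # Weight of each model is proportional to exp(-delta / 2), where delta is
    # its score minus the best (lowest) score; weights are normalized to sum to 1.
    best = min(scores.values())
    raw = {name: math.exp(-0.5 * (s - best)) for name, s in scores.items()}
    total = sum(raw.values())
    return {name: r / total for name, r in raw.items()}


def importance(weights: dict[str, float], component: str) -> float:
    # Relative importance of a component ("I", "G" or "F"): the summed weight
    # of every candidate whose label contains that component.
    return sum(w for name, w in weights.items()
               if component in name.split("+")[1:])


# Illustrative inputs only: 300 sites; parameter counts assume 17 branch lengths.
n_sites = 300
fits = [
    CandidateFit("LG", -5200.0, 17),
    CandidateFit("LG+G", -5100.0, 18),
    CandidateFit("LG+I+G", -5099.5, 19),
    CandidateFit("WAG+G", -5120.0, 18),
]

aic_scores = {f.name: aic(f) for f in fits}
aicc_scores = {f.name: aicc(f, n_sites) for f in fits}
bic_scores = {f.name: bic(f, n_sites) for f in fits}

best_aic = min(aic_scores, key=aic_scores.get)
best_aicc = min(aicc_scores, key=aicc_scores.get)
best_bic = min(bic_scores, key=bic_scores.get)

aic_weights = criterion_weights(aic_scores)
print(best_aic, best_aicc, best_bic)
print(round(importance(aic_weights, "G"), 3), round(importance(aic_weights, "I"), 3))
```

With these made-up numbers, LG+I+G gains only 0.5 log-likelihood units for one extra parameter, so all three criteria prefer LG+G; LG+I+G still receives a sizable AIC weight, which is what gives +I a nonzero relative importance.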
Model Likelihood Evaluation: Computes the maximized likelihood of the protein alignment under each of up to 120 candidate models built from empirical replacement matrices.
Statistical Selection Criteria: Ranks the candidate models using the Akaike Information Criterion (AIC), its small-sample correction (AICc), the Bayesian Information Criterion (BIC), and the Decision Theory Criterion (DT).
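Parameter Estimation: Reports the relative importance of the model components, namely observed amino acid frequencies (+F), gamma-distributed rate variation among sites (+G), and a proportion of invariable sites (+I), along with model-averaged estimates of parameters such as the gamma shape parameter and the proportion of invariable sites.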
Phylogenetic Inference: Can produce a model-averaged phylogenetic tree, a consensus of the trees inferred under the candidate models weighted by their support, so that conclusions about evolutionary history do not hinge on a single selected model.
🛠️ Supported Matrices and Models
The backend of ProtTest relies heavily on the PhyML framework to execute maximum likelihood (ML) optimizations. It evaluates combinations of popular empirical amino acid substitution matrices, including:
Supported Replacement Matrices
Standard / General: WAG, Dayhoff, JTT, Blosum62, VT, LG, DCMut
Mitochondrial: mtREV, MtMam, MtArt
Specialized / Viral: CpREV, RtREV, HIVw, HIVb
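The size of the candidate set follows from simple combinatorics. Assuming each matrix is tried on its own, with +I, with +G, and with +I+G, and each of those with or without +F, every matrix contributes eight variants. The minimal sketch below enumerates those combinations for the fourteen matrices listed above; `candidate_models` and `free_parameters` are hypothetical helpers, and the parameter count (unrooted-tree branch lengths plus one for +I, one for +G and 19 for +F) is an assumed convention for illustration, not a statement of exactly how ProtTest counts parameters.

```python
from itertools import product

# The fourteen matrices listed in the text.
MATRICES = ["WAG", "Dayhoff", "JTT", "Blosum62", "VT", "LG", "DCMut",
            "mtREV", "MtMam", "MtArt", "CpREV", "RtREV", "HIVw", "HIVb"]

RATE_VARIANTS = ["", "+I", "+G", "+I+G"]  # rate variation among sites
FREQ_VARIANTS = ["", "+F"]                # observed amino acid frequencies


def candidate_models(matrices=MATRICES):
    """Hypothetical helper: every matrix x rate variant x frequency variant."""
    return [m + r + f for m, r, f in product(matrices, RATE_VARIANTS, FREQ_VARIANTS)]


def free_parameters(model: str, n_taxa: int) -> int:
    """Assumed counting scheme: 2n - 3 branch lengths of an unrooted tree,
    +1 for +I, +1 for +G, +19 for +F (20 frequencies summing to 1)."""
    parts = model.split("+")[1:]
    k = 2 * n_taxa - 3
    k += int("I" in parts) + int("G" in parts)
    if "F" in parts:
        k += 19
    return k


models = candidate_models()
print(len(models))  # 14 matrices x 4 rate variants x 2 frequency variants = 112
print(free_parameters("LG+I+G+F", n_taxa=10))
```

Because every candidate is scored independently, the list produced here is also the natural unit of work to distribute when the evaluation is parallelized.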
Combining these matrices with rate-variation parameters (+I, +G, or both) and observed amino acid frequencies (+F) yields eight variants per matrix, and the tool can check all of these combinations exhaustively rather than relying on a preselected few.
💻 Architecture and High-Performance Computing (HPC)
The software is written in Java and is cross-platform, running on Windows, macOS, and Linux systems.
To handle modern, large-scale genomic datasets that feature hundreds or thousands of sequences, developers introduced ProtTest 3 (and its subsequent ProtTest-HPC variants). This architecture leverages three separate execution strategies: