Detail view

Goodness of fit and robustness of phylogenetic methods in the light of intermittent evolution
Thi Minh Anh Nguyen
Type of thesis
Doctoral dissertation
University
Universität Wien
Faculty
Fakultät für Lebenswissenschaften
Supervisor
Arndt von Haeseler
All rights reserved
DOI
10.25365/thesis.16620
URN
urn:nbn:at:at-ubw:1-30250.63614.638266-5

Abstracts

Abstract
(German)
not provided
Abstract
(English)
Charles Darwin's theory of 'The Origin of Species' (1859) states that species have evolved from common ancestors. Reconstructing so-called phylogenetic trees to elucidate the evolutionary relationships among species has since become one of the main objectives in biology. In recent years, more and more phylogenetic studies have been published thanks to the advent of massive sequence data and to the development of efficient software packages. However, before drawing biological implications from the inferred evolutionary relationships, several issues should be taken into account. This thesis investigates two of these issues in more detail. First, how can one know that the model used describes the data adequately? We present MISFITS, a novel approach to evaluate the goodness of fit between a phylogenetic model and an alignment, which at the same time pinpoints alignment site patterns that do not fit. MISFITS introduces a minimum number of extra substitutions on the inferred tree to provide a biologically motivated justification for the deviation between the observed site pattern frequency and the corresponding expectation. The extra substitutions plus the evolutionary model then fully explain the alignment. Moreover, the significance of the required number of extra substitutions can be determined by conducting a parametric bootstrap analysis. MISFITS can therefore reject models that fit the data inadequately. We demonstrate MISFITS on several examples and present a survey of the goodness of fit of the best-fit models (suggested by model selection) to thousands of alignments in the PANDIT database. Second, insights into the performance of tree inference methods are essential because they may help to avoid wrong conclusions from the inferred phylogenies due to reconstruction artefacts such as long branch attraction.
Among the criteria to evaluate the performance of a phylogenetic method, robustness to model violation is of particular practical importance, as complete a priori knowledge of evolutionary processes is typically unavailable. We first develop ImOSM, a convenient tool to embed intermittent evolution as model violation into an alignment. Intermittent evolution refers to extra substitutions occurring randomly on branches of a tree and thus changing alignment site patterns. We then study the robustness of widely used phylogenetic methods, namely maximum likelihood (ML), maximum parsimony (MP) and a distance-based method (BIONJ), to various scenarios of model violation. We show that violation of rates-across-sites (RaS) heterogeneity, and simultaneous violation of RaS and the transition/transversion ratio along two nonadjacent external branches, hinder all methods' recovery of the true topology for a four-taxon tree. For an eight-taxon balanced tree, these violations cause each of the three methods to infer a different topology: both ML and MP fail, whilst BIONJ reconstructs the true tree. Furthermore, we report that several tests, including the MISFITS test, have enough power to detect such model violations. Thus, for analyses of real data, such reconstruction results require further investigation, and these tests are recommended as a first check.

Keywords

Keywords
(English)
sequence evolution, phylogeny inference, model test, model adequacy, model violation, maximum likelihood, maximum parsimony, neighbor joining
Author(s)
Thi Minh Anh Nguyen
Main title (English)
Goodness of fit and robustness of phylogenetic methods in the light of intermittent evolution
Publication year
2011
Extent
XIV, 113 pp. : ill.
Language
English
Reviewers
Dirk Metzler,
Simon Whelan
Classifications
42 Biology > 42.10 Theoretical biology,
54 Computer science > 54.80 Applied computer science
AC number
AC08960046
Utheses ID
14896
Degree programme code
UA | 091 | 490 | |
Universität Wien, Universitätsbibliothek, 1010 Wien, Universitätsring 1