Principles of Microbial Diversity. James W. Brown
[Image: an example alignment built by randomly sampling columns of the original alignment]
from which a similarity matrix and tree would be generated.
Notice that some positions in the alignment are included multiple times (9 and 2) and some are not included (4, 6, and 7); that is, the columns are sampled with replacement. In realistically large alignments, such randomly sampled alignments yield good trees if the branching arrangements are well supported by the sequence data. Therefore, in a bootstrap analysis, a tree is generated from each of 100 to 1,000 alignments produced by random sampling of the original alignment, and the number of these trees that contain each branch of the reference tree (generated in the usual way) is determined. In other words, the presence or absence of every branch of the reference tree is scored in each bootstrap tree. The percentage of trees from the bootstrapped alignments that contain each branch in the reference tree is used to label the branches (Fig. 5.5). Often, the same type of analysis is performed using more than one method of tree construction, e.g., neighbor joining and maximum likelihood.
The evaluation of bootstrap scores is subjective, but generally branches that show up in at least 50% of trees generated from bootstrapped data sets are considered to be reliable.
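The procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used to produce the figure: the four-sequence alignment is invented, the function names are hypothetical, and simple average-linkage clustering (UPGMA) on the fraction of differing positions stands in for neighbor joining or maximum likelihood purely to keep the example short. A "branch" is represented here by the set of sequences that it groups together.

```python
import random
from itertools import combinations

def resample_columns(alignment, rng):
    """Build one bootstrap alignment by sampling columns with replacement."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}

def p_distance(a, b):
    """Fraction of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def upgma_clades(alignment):
    """Cluster by average pairwise distance (UPGMA, an assumption made for
    brevity) and return the groups formed, excluding the root."""
    names = sorted(alignment)
    dist = {frozenset(pair): p_distance(alignment[pair[0]], alignment[pair[1]])
            for pair in combinations(names, 2)}

    def average(c1, c2):
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    clusters = [frozenset([n]) for n in names]
    clades = set()
    while len(clusters) > 2:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: average(*pair))
        merged = c1 | c2
        clusters = [c for c in clusters if c not in (c1, c2)] + [merged]
        clades.add(merged)
    return clades

def bootstrap_support(alignment, replicates=100, seed=0):
    """Percentage of bootstrap trees containing each branch of the reference tree."""
    rng = random.Random(seed)
    reference = upgma_clades(alignment)          # tree from the original alignment
    counts = {clade: 0 for clade in reference}
    for _ in range(replicates):
        replicate = upgma_clades(resample_columns(alignment, rng))
        for clade in reference:
            if clade in replicate:               # score presence or absence
                counts[clade] += 1
    return {clade: 100 * n / replicates for clade, n in counts.items()}

# Invented toy alignment: A and B are similar, as are C and D.
toy_alignment = {
    "A": "A" * 20,
    "B": "A" * 19 + "T",
    "C": "T" * 10 + "A" * 10,
    "D": "T" * 10 + "A" * 9 + "C",
}

support = bootstrap_support(toy_alignment, replicates=200, seed=1)
for clade, percent in support.items():
    print(sorted(clade), f"{percent:.0f}%")
```

Because the columns are drawn with replacement, each bootstrap alignment has the same length as the original but typically repeats some positions and omits others. The percentages printed for each group are what would be written on the corresponding branches of the reference tree.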
Questions for thought
1. Compare the bootstrap values on the tree in the bootstrap section above. Which treeing algorithm seems to have generated the more robust trees, maximum likelihood or maximum parsimony? Are long or short branches more reliably predicted?
2. What kind of artifact would you predict from a substitution model that overestimated, instead of underestimated, the relative evolutionary distance of long branches?
3. During bootstrapping of an alignment, the sequences become scrambled (the residues appear out of order). Does this matter? Why or why not? (Think about it this way: would a tree come out the same if the alignment were scrambled, or not?)
4. In the substitution models we talked about, we focused on substitutions (obviously). How do you suppose these models deal with gaps? What about underspecified bases (e.g., R, Y, or N), or instances where sequence data are absent (i.e., in the case where only a piece of the sequence is available)?
5. In a parsimony analysis, a by-product of the analysis is predicted ancestral sequences. Can you think of a situation in which this might be useful?