J. A. Lake, The order of sequence alignment can bias the selection of tree topology, Mol Biol Evol, vol.8, pp.378-385, 1991.

D. A. Morrison and J. T. Ellis, Effects of nucleotide sequence alignment on phylogeny estimation: a case study of 18S rDNAs of Apicomplexa, Mol Biol Evol, vol.14, pp.428-441, 1997.

T. H. Ogden and M. S. Rosenberg, Multiple sequence alignment accuracy and phylogenetic inference, Syst Biol, vol.55, pp.314-328, 2006.

L. Wang, J. Leebens-Mack, P. K. Wall, K. Beckmann, C. W. dePamphilis et al., The impact of multiple protein sequence alignment on phylogenetic estimation, IEEE/ACM Trans Comput Biol Bioinf, 2009.

R. C. Edgar and S. Batzoglou, Multiple sequence alignment, Curr Opin Struct Biol, vol.16, pp.368-373, 2006.

C. Notredame, Recent evolutions of multiple sequence alignment algorithms, PLoS Comput Biol, vol.3, p.123, 2007.

J. Castresana, Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis, Mol Biol Evol, vol.17, pp.540-552, 2000.

G. Talavera and J. Castresana, Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments, Syst Biol, vol.56, pp.564-577, 2007.

G. J. Olsen and C. R. Woese, Ribosomal RNA: a key to phylogeny, FASEB J, vol.7, pp.113-123, 1993.

A. G. Rodrigo, P. R. Bergquist, and P. L. Bergquist, Inadequate support for an evolutionary link between the Metazoa and the Fungi, Syst Biol, vol.43, pp.578-584, 1994.

D. L. Swofford, G. J. Olsen, P. J. Waddell, D. M. Hillis, C. Moritz et al., Phylogenetic inference, in Molecular Systematics, edited by D. M. Hillis et al., Sunderland: Sinauer Associates, pp.407-514, 1996.

N. Rodríguez-Ezpeleta, H. Brinkmann, S. C. Burey, B. Roure, G. Burger et al., Monophyly of primary photosynthetic eukaryotes: green plants, red algae, and glaucophytes, Curr Biol, vol.15, pp.1325-1330, 2005.

J. Huerta-Cepas, A. Bueno, J. Dopazo, and T. Gabaldón, PhylomeDB: a database for genome-wide collections of gene phylogenies, Nucleic Acids Res, vol.36, pp.491-496, 2008.

A. Dress, C. Flamm, G. Fritzsch, S. Grünewald, M. Kruspe et al., Noisy: Identification of problematic columns in multiple sequence alignments, Algorithms for Molecular Biology, vol.3, p.7, 2008.

W. D. Swingley, R. E. Blankenship, and J. Raymond, Integrating Markov clustering and molecular phylogenetics to reconstruct the cyanobacterial species tree from conserved protein families, Mol Biol Evol, vol.25, pp.643-654, 2008.

S. Capella-Gutiérrez, J. M. Silla-Martínez, and T. Gabaldón, trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses, Bioinformatics, vol.25, pp.1972-1973, 2009.

V. Daubin, M. Gouy, and G. Perrière, A phylogenomic approach to bacterial phylogeny: evidence of a core of genes sharing a common history, Genome Res, vol.12, pp.1080-1090, 2002.
URL : https://hal.archives-ouvertes.fr/hal-00427259

F. D. Ciccarelli, T. Doerks, C. von Mering, C. J. Creevey, B. Snel et al., Toward automatic reconstruction of a highly resolved tree of life, Science, vol.311, pp.1283-1287, 2006.

J. Adachi, P. J. Waddell, W. Martin, and M. Hasegawa, Plastid genome phylogeny and a model of amino acid substitution for protein encoded by chloroplast DNA, J Mol Evol, vol.50, pp.348-358, 2000.

M. M. McMahon and M. J. Sanderson, Phylogenetic supermatrix analysis of GenBank sequences from 2228 Papilionoid legumes, Syst Biol, vol.55, pp.818-836, 2006.

T. D. Schneider and R. M. Stephens, Sequence logos: a new way to display consensus sequences, Nucl Acids Res, vol.18, pp.6097-6100, 1990.

V. Jayaswal, L. S. Jermiin, and J. Robinson, Estimation of phylogeny using a general Markov model, Evol Bioinf Online, vol.1, pp.62-80, 2005.

N. Galtier and M. Gouy, Inferring pattern and process: maximum-likelihood implementation of a nonhomogeneous model of DNA sequence evolution for phylogenetic analysis, Mol Biol Evol, vol.15, pp.871-879, 1998.
URL : https://hal.archives-ouvertes.fr/hal-00428472

L. S. Jermiin, S. Ho, F. Ababneh, J. Robinson, and A. Larkum, The biasing effect of compositional heterogeneity on phylogenetic estimates may be underestimated, Syst Biol, vol.53, pp.638-643, 2004.

M. J. Phillips, F. Delsuc, and D. Penny, Genome-scale phylogeny and the detection of systematic biases, Mol Biol Evol, vol.21, pp.1455-1458, 2004.
URL : https://hal.archives-ouvertes.fr/halsde-00193019

International Union of Pure and Applied Chemistry and International Union of Biochemistry (IUPAC-IUB) Commission on Biochemical Nomenclature: Abbreviations and symbols for nucleic acids, polynucleotides and their constituents, Biochem J, vol.120, pp.449-454, 1970.

M. O. Dayhoff, R. M. Schwartz, and B. D. Orcutt, in Atlas of Protein Sequence and Structure, edited by M. O. Dayhoff, Washington: National Biomedical Research Foundation, issue.3, pp.345-352, 1978.

T. M. Embley, M. van der Giezen, D. S. Horner, P. L. Dyal, and P. Foster, Mitochondria and hydrogenosomes are two forms of the same fundamental organelle, Philos Trans R Soc Lond B Biol Sci, vol.358, pp.191-203, 2003.

E. Susko and A. J. Roger, On reduced amino acid alphabets for phylogenetic inference, Mol Biol Evol, vol.24, pp.2139-2150, 2007.

A. Stuart, A test for homogeneity of the marginal distributions in a two-way classification, Biometrika, vol.42, pp.412-416, 1955.

F. Ababneh, L. S. Jermiin, C. Ma, and J. Robinson, Matched-pairs tests of homogeneity with applications to homologous nucleotide sequences, Bioinformatics, vol.22, pp.1225-1231, 2006.

International Union of Pure and Applied Chemistry and International Union of Biochemistry (IUPAC-IUB) Commission on Biochemical Nomenclature: A one-letter notation for amino acid sequences (definitive rules), Pure Appl Chem, vol.31, pp.639-645, 1972.

J. von Neumann, Mathematische Grundlagen der Quantenmechanik, 1932.

D. R. Caffrey, S. Somaroo, J. D. Hughes, J. Mintseris, and E. S. Huang, Are protein-protein interfaces more conserved in sequence than the rest of the protein surface?, Protein Sci, vol.13, pp.190-202, 2004.

JAMA: A Java Matrix Package

C. Shannon, A mathematical theory of communication, Bell System Tech J, vol.27, pp.623-656, 1948.

W. R. Taylor, The classification of amino acid conservation, J Theor Biol, vol.119, pp.205-218, 1986.

D. Bordo and P. Argos, Suggestions for "safe" residue substitutions in site-directed mutagenesis, J Mol Biol, vol.217, pp.721-729, 1991.

S. Henikoff and J. G. Henikoff, Amino acid substitution matrices from protein blocks, Proc Natl Acad Sci, vol.89, pp.10915-10919, 1992.

BLOSUM matrices

S. R. Eddy, Where did the BLOSUM62 alignment score matrix come from?, Nat Biotechnol, vol.22, pp.1035-1036, 2004.

D. J. States, W. Gish, and S. F. Altschul, Improved sensitivity of nucleic acid database searches using application-specific scoring matrices, Methods: A Companion to Methods Enzymol, vol.3, pp.66-70, 1991.

H. Uesaka, Validity and applicability of several tests for comparing marginal distributions of a square table with ordered categories, Behaviormetrika, vol.30, pp.65-78, 1991.

W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes in C: The Art of Scientific Computing, 2nd ed., Cambridge: Cambridge University Press, 1992.

A. Rambaut and N. C. Grassly, Seq-Gen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees, Comput Appl Biosci, vol.13, pp.235-238, 1997.

D. Jones, W. Taylor, and J. Thornton, The rapid generation of mutation data matrices from protein sequences, Comput Appl Biosci, vol.8, pp.275-282, 1992.

S. Guindon and O. Gascuel, A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood, Syst Biol, vol.52, pp.696-704, 2003.

R. C. Edgar, MUSCLE: multiple sequence alignment with high accuracy and high throughput, Nucleic Acids Res, vol.32, pp.1792-1797, 2004.

R. C. Edgar, MUSCLE: a multiple sequence alignment method with reduced time and space complexity, BMC Bioinformatics, vol.5, p.113, 2004.

J. P. Egan, Signal detection theory and ROC analysis, Series in Cognition and Perception, 1975.

C. E. Metz, Basic principles of ROC analysis, Semin Nucl Med, vol.8, pp.283-298, 1978.

J. Swets, Measuring the accuracy of diagnostic systems, Science, vol.240, pp.1285-1293, 1988.

T. Fawcett, An introduction to ROC analysis, Pattern Recogn Lett, vol.27, pp.861-874, 2006.

W. MacStewart, A note on the power of the sign test, Ann Math Stat, vol.12, pp.279-303, 1941.

W. J. Dixon and A. M. Mood, The statistical sign test, J Am Statist Assoc, vol.41, pp.557-566, 1946.

J. Hemelrijk, A theorem on the sign test when ties are present, Proc Nederl Akad Weten Ser A, vol.55, p.322, 1952.

O. Gascuel, BIONJ: an improved version of the NJ algorithm based on a simple model of sequence data, Mol Biol Evol, vol.14, pp.685-695, 1997.
URL : https://hal.archives-ouvertes.fr/lirmm-00730410

P. Goloboff, S. Farris, and K. C. Nixon, TNT (Tree analysis using New Technology) ver

K. C. Nixon, The parsimony ratchet, a new method for rapid parsimony analysis, Cladistics, vol.15, pp.407-414, 1999.

G. F. Estabrook, F. R. McMorris, and C. A. Meacham, Comparison of undirected phylogenetic trees based on subtrees of four evolutionary units, Syst Zool, vol.34, pp.193-200, 1985.

J. Felsenstein, Confidence limits on phylogenies: an approach using the bootstrap, Evolution, vol.39, pp.783-791, 1985.

M. Anisimova and O. Gascuel, Approximate likelihood ratio test for branches: a fast, accurate and powerful alternative, Syst Biol, vol.55, pp.539-552, 2006.
URL : https://hal.archives-ouvertes.fr/lirmm-00136658

S. Guindon, J. F. Dufayard, V. Lefort, M. Anisimova, W. Hordijk et al., New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0, Syst Biol, vol.59, pp.307-321, 2010.
URL : https://hal.archives-ouvertes.fr/lirmm-00511784

J. A. Hanley and B. J. McNeil, The meaning and use of the area under a receiver operating characteristic (ROC) curve, Radiology, vol.143, pp.29-36, 1982.

J. Fogarty, R. S. Baker, and S. E. Hudson, Case studies in the use of ROC curve analysis for sensor-based estimates in human computer interaction, Proceedings of Graphics Interface, pp.129-136, 2005.

Concatenate: a software to build supermatrices of characters

J. Adachi and M. Hasegawa, Model of amino acid substitution in proteins encoded by mitochondrial DNA, J Mol Evol, vol.42, pp.459-468, 1996.

N. Rodríguez-Ezpeleta, H. Brinkmann, S. C. Burey, B. Roure, G. Burger et al., Monophyly of Primary Photosynthetic Eukaryotes: Green Plants, Red Algae, and Glaucophytes, Curr Biol, vol.15, pp.1325-1330, 2005.

J. D. Hackett, H. S. Yoon, S. Li, A. Reyes-Prieto, S. E. Rümmele et al., Phylogenomic analysis supports the monophyly of Cryptophytes and Haptophytes and the association of Rhizaria with Chromalveolates, Mol Biol Evol, vol.24, pp.1702-1713, 2007.

F. Burki, K. Shalchian-Tabrizi, and J. Pawlowski, Phylogenomics reveals a new 'metagroup' including most photosynthetic eukaryotes, Biol Lett, vol.4, pp.366-369, 2008.

V. Hampl, L. Hug, J. W. Leigh, J. B. Dacks, B. F. Lang et al., Phylogenomic analyses support the monophyly of Excavata and resolve relationships among eukaryotic "supergroups", Proc Natl Acad Sci, vol.106, pp.3859-3864, 2009.

W. Ewens and G. Grant, Statistical methods in bioinformatics: an introduction, 2005.

J. Felsenstein, Evolutionary tree from DNA sequences: a maximum likelihood approach, J Mol Evol, vol.17, pp.368-376, 1981.

A. Rzhetsky and M. Nei, Tests of applicability of several substitution models for DNA sequence data, Mol Biol Evol, vol.12, pp.131-151, 1995.

A. H. Bowker, A test for symmetry in contingency tables, J Am Stat Assoc, vol.43, pp.572-574, 1948.

A. Rokas, B. L. Williams, N. King, and S. B. Carroll, Genome-scale approaches to resolving incongruence in molecular phylogenies, Nature, vol.425, pp.798-804, 2003.

Yeast multi-gene dataset

C. Lanave, G. Preparata, C. Saccone, and G. Serio, A new method for calculating evolutionary substitution rates, J Mol Evol, vol.20, pp.86-93, 1984.

R. Rodriguez, J. L. Oliver, A. Marin, and J. R. Medina, The general stochastic model of nucleotide substitution, J Theor Biol, vol.142, pp.485-501, 1990.

Z. Yang, Estimating the pattern of nucleotide substitution, J Mol Evol, vol.39, pp.105-111, 1994.

J. A. Lake, Reconstructing evolutionary trees from DNA and protein sequences: paralinear distances, Proc Natl Acad Sci, vol.91, pp.1455-1459, 1994.

P. J. Lockhart, M. A. Steel, M. D. Hendy, and D. Penny, Recovering evolutionary trees under a more realistic model of sequence evolution, Mol Biol Evol, vol.11, pp.605-612, 1994.

M. A. Steel, Recovering a tree from the leaf colourations it generates under a Markov model, Appl Math Lett, vol.7, pp.19-23, 1994.

D. J. Taylor and W. H. Piel, An assessment of accuracy, error, and conflict with support values from genome-scale phylogenetic data, Mol Biol Evol, vol.21, pp.1534-1537, 2004.

F. Ren, H. Tanaka, and Z. Yang, An empirical examination of the utility of codon-substitution models in phylogeny reconstruction, Syst Biol, vol.54, pp.808-818, 2005.

J. G. Burleigh, A. C. Driskell, and M. J. Sanderson, Supertree bootstrapping methods for assessing phylogenetic variation among genes in genome-scale data sets, Syst Biol, vol.55, pp.426-440, 2006.

C. Ané, B. Larget, D. A. Baum, S. D. Smith, and A. Rokas, Bayesian estimation of concordance among gene trees, Mol Biol Evol, vol.24, pp.412-426, 2007.

A. Criscuolo and C. J. Michel, Phylogenetic inference with weighted codon evolutionary distances, J Mol Evol, vol.68, pp.377-392, 2009.

B. Néron, H. Ménager, C. Maufrais, N. Joly, J. Maupetit et al., Mobyle: a new full web bioinformatics framework, Bioinformatics, vol.25, pp.3005-3011, 2009.

Cite this article as: Criscuolo and Gribaldo: BMGE (Block Mapping and Gathering with Entropy): a new software for selection of phylogenetic informative regions from multiple sequence alignments, BMC Evolutionary Biology, vol.10, p.210, 2010.