Theses

Tools for massive bacterial comparative genomics: Development and Applications.

Abstract : Bacterial comparative genomics consists of comparing the gene contents of different strains, i.e. their pangenome. With the increasing number of sequenced strains, the tools available when I started this PhD were reaching their limits in terms of computation time and memory. The aim was to develop a method able to handle thousands of genomes accurately and in a reasonable amount of time. Moreover, to our knowledge, no tool was able to perform all the key steps of a comparative genomics study. This spurred the development of PanACoTA, a tool to standardize and automate the construction of the key datasets needed for these studies. It covers all steps from downloading genomes, with quality control, to inferring a phylogenetic tree based on the core genome (the genes shared by all strains). To adapt to specific needs (parameter exploration, additional steps), we implemented it in a modular way. For the “pangenome” module, we developed a new method based on recent genome comparison and clustering tools. Robust to changes in sample size, this method can infer the pangenome of 4,000 strains in 30 minutes. During its development, we applied PanACoTA to different kinds of studies. We showed its usefulness for short-term studies (identifying the specificities of a pathogenic strain of E. anophelis), for long-term studies (genomic diversity of the E. coli species), and for identifying different species in a little-known genus (Morganella).
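As a minimal illustration of the two gene sets mentioned in the abstract, assume each strain has already been reduced to a set of gene-family identifiers (clusters of homologous genes). The pangenome is then the union of these sets and the core genome their intersection. The strain and family names below are hypothetical toy data, not PanACoTA output or its API, and this sketch does not reproduce the clustering method developed in the thesis.

```python
from functools import reduce

# Hypothetical input: strain name -> set of gene-family IDs it carries.
# This is illustrative toy data, not a PanACoTA data structure.
StrainFamilies = dict[str, set[str]]


def pangenome(strains: StrainFamilies) -> set[str]:
    """Gene families present in at least one strain (union of all sets)."""
    return set().union(*strains.values())


def core_genome(strains: StrainFamilies) -> set[str]:
    """Gene families present in every strain (intersection of all sets)."""
    if not strains:
        return set()
    # Copy the first set so the caller's data is never returned or mutated.
    sets = list(strains.values())
    return reduce(set.intersection, sets[1:], set(sets[0]))


strains = {
    "strain_A": {"fam1", "fam2", "fam3"},
    "strain_B": {"fam1", "fam2", "fam4"},
    "strain_C": {"fam1", "fam2", "fam3", "fam5"},
}
print(sorted(pangenome(strains)))    # ['fam1', 'fam2', 'fam3', 'fam4', 'fam5']
print(sorted(core_genome(strains)))  # ['fam1', 'fam2']
```

In practice the hard part is building the gene families themselves (comparing and clustering proteins across thousands of genomes); once they exist, the pangenome and core genome follow from these simple set operations.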

https://hal-pasteur.archives-ouvertes.fr/tel-03697241
Contributor : Amandine PERRIN
Submitted on : Thursday, June 16, 2022 - 3:54:18 PM
Last modification on : Saturday, June 25, 2022 - 3:44:09 AM

File

these_archivage_1800004895K.pd...
Files produced by the author(s)

Licence


Distributed under a Creative Commons Attribution - NonCommercial - NoDerivatives 4.0 International License

Identifiers

  • HAL Id : tel-03697241, version 1

Citation

Amandine Perrin. Tools for massive bacterial comparative genomics: Development and Applications. Quantitative Methods [q-bio.QM]. Sorbonne Université, 2022. English. ⟨tel-03697241⟩
