Gene Similarity-based Approaches for Determining Core-Genes of Chloroplasts

Title: Gene Similarity-based Approaches for Determining Core-Genes of Chloroplasts
Publication Type: Conference Paper
Year of Publication: 2014
Authors: AlKindy B, Guyeux C, Couchot J-F, Salomon M, Bahi JM
Editor: Zheng H, Hu X, Berrar D, Wang Y, Dubitzky W, Hao JK, Cho KH, Gilbert D
Conference Name: 2014 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM)
Publisher: IEEE; Natl Sci Fdn; Nsilico; Comp Sci Res Inst; BioBusiness; IEEE Comp Soc; Engn Res Inst; Biomed Sci Res Inst
Conference Location: 345 E 47TH ST, NEW YORK, NY 10017 USA
ISBN Number: 978-1-4799-5669-2
Keywords: Chloroplasts, clustering, Core genome, evolution, Methodology, Pan genome, Quality Control
Abstract

In computational biology and bioinformatics, understanding evolutionary processes among related organisms has received considerable attention over the last decades. However, accurate methodologies are still needed to trace the evolution of gene content. In a previous work, we proposed two novel approaches based on sequence similarities and gene features. More precisely, we proposed to use gene names, sequence similarities, or both, obtained from either the NCBI or the DOGMA annotation tool. DOGMA has the advantage of being an up-to-date, accurate automatic tool specifically designed for chloroplasts, whereas NCBI provides high-quality human-curated genes (together with wrongly annotated ones). The key idea of the earlier proposal was to take the best from these two tools. However, it was limited by name variations and spelling errors on the NCBI side, which led to core trees of low quality. In this paper, these flaws are fixed by improving the comparison of NCBI and DOGMA results, and by relaxing constraints on gene names while adding a post-validation stage on gene sequences. Two stages of similarity measures, on names and on sequences, are thus proposed for sequence clustering. This improves on the results that can be obtained using either NCBI or DOGMA alone. Results obtained with this "quality control test" are further investigated and compared with previously released ones, on both computational and biological aspects, considering a set of 99 chloroplast genomes.
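
The two-stage idea in the abstract (a relaxed comparison of gene names followed by post-validation on sequences) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the name-normalization rule, the 0.8 threshold, the function names, and the use of `difflib.SequenceMatcher` as a stand-in for a proper sequence-alignment score are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def normalize_name(name):
    """Stage 1 helper (assumed rule): ignore case and non-alphanumeric characters."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def similar(seq_a, seq_b, threshold=0.8):
    """Stage 2 helper: stand-in similarity test, not a biological alignment score."""
    return SequenceMatcher(None, seq_a, seq_b).ratio() >= threshold


def core_genes(genomes, threshold=0.8):
    """Return normalized names of genes found, and sequence-validated, in every genome.

    genomes: dict mapping genome id -> dict mapping gene name -> sequence.
    """
    # Stage 1: cluster genes across genomes by relaxed gene name.
    clusters = {}
    for genome_id, genes in genomes.items():
        for name, seq in genes.items():
            clusters.setdefault(normalize_name(name), []).append((genome_id, seq))

    # Stage 2: post-validate each cluster against its first sequence
    # (choosing the first member as reference is an assumption).
    core = []
    for key, members in clusters.items():
        reference = members[0][1]
        validated = {g for g, s in members if similar(reference, s, threshold)}
        if validated == set(genomes):  # gene confirmed in every genome
            core.append(key)
    return sorted(core)
```

Under these assumptions, a gene that is spelled `rbcL`, `RbcL`, or `rbcl` in different annotations falls into one cluster, while a gene whose name matches but whose sequence diverges strongly is rejected by the second stage.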