Comparative pangenome analysis of major pneumococcal genotypes from India
- Posted
- Server: bioRxiv
- DOI: 10.1101/2024.01.14.575557
Background
Pneumococcal genomes are highly dynamic, with varying core genome sizes. The Global Pneumococcal Sequence Cluster (GPSC) genotype classification system has identified patterns linking genotype and antibiotic resistance. A few genotypes, such as GPSC10, are frequently associated with antimicrobial resistance and high rates of non-vaccine serotypes.
Objective
To identify and annotate the differences in the core genomes of the major GPSCs in India, and to construct and analyse the Indian Pneumococcal Pangenome (IPPG).
Methods
Using an existing dataset from the Global Pneumococcal Sequencing Project, 618 strains were included. The most frequent GPSCs (GPSC1, GPSC2, GPSC8, GPSC9 and GPSC10) were analysed separately. Pangenomes were constructed using Panaroo, with the family threshold parameter tuned. Differences in protein clusters were identified using the OrthoVenn3 web server. Functional annotations were performed by eggNOG, UniProt and STRING database searches.
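As an illustration of the core versus accessory distinction that underlies the pangenome construction above, the following is a minimal Python sketch. It is not the Panaroo implementation and not the authors' pipeline: the function name `core_genes`, the 0.99 presence threshold, the strain labels and the accessory cluster names are all assumptions chosen for the example. Only `ply` and `lytA` are gene names taken from the abstract.

```python
from typing import Dict, Set


def core_genes(presence: Dict[str, Set[str]], threshold: float = 0.99) -> Set[str]:
    """Return gene clusters present in at least `threshold` of the genomes.

    `presence` maps a strain label to the set of gene clusters found in it.
    The default threshold is an assumption for illustration only.
    """
    n_genomes = len(presence)
    if n_genomes == 0:
        return set()
    # Count in how many genomes each gene cluster occurs.
    counts: Dict[str, int] = {}
    for genes in presence.values():
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1
    return {g for g, c in counts.items() if c / n_genomes >= threshold}


# Toy data (hypothetical strains and accessory clusters).
toy = {
    "strain_a": {"ply", "lytA", "accessory_1"},
    "strain_b": {"ply", "lytA"},
    "strain_c": {"ply", "lytA", "accessory_2"},
}

core = core_genes(toy)
# Everything in the pangenome that is not core is treated as accessory here.
accessory = set().union(*toy.values()) - core
```

Running the same function separately on each genotype's strains and comparing the resulting sets would give a rough view of genotype-specific core differences, under the assumptions stated above.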
Results
The IPPG core genome size (1615 genes) was similar to those reported previously, with a similar distribution of metabolic categories across the five GPSCs. GPSC10 (1619 genes) and GPSC1 (1909 genes) had the lowest and highest core genome sizes, respectively, and both core genomes contained genes encoding macrolide and tetracycline resistance. The virulence genes ply, psaA, pce (cbpE), pavA, nanB, lytA, and hysA were detected in all the core genomes.
Conclusions
There is genotype-specific variation within the core genomes of the major GPSCs in India. The presence of antibiotic resistance genes in the GPSC1 and GPSC10 core genomes explains the widespread drug resistance associated with these genotypes. The core virulence genes identified in all the genotypes indicate conserved pathogenesis mechanisms and could be targets for vaccine development or therapy.