Ann Lab Med 2022; 42(4): 438-446
Published online July 1, 2022 https://doi.org/10.3343/alm.2022.42.4.438
Copyright © Korean Society for Laboratory Medicine.
Hyoshim Shin, M.D.1, Takashi Takahashi, M.D., Ph.D.2, Seungjun Lee, M.D.3, Eun Hwa Choi, M.D., Ph.D.4, Takahiro Maeda, B.P.2, Yasuto Fukushima, B.P.2, and Sunjoo Kim, M.D., Ph.D.3,5
1Department of Laboratory Medicine, Gyeongsang National University Hospital, Jinju, Korea; 2Laboratory of Infectious Diseases, Graduate School of Infection Control Sciences & Ōmura Satoshi Memorial Institute, Kitasato University, Tokyo, Japan; 3Department of Laboratory Medicine, Gyeongsang National University Changwon Hospital, Changwon, Korea; 4Department of Pediatrics, Seoul National University College of Medicine, Seoul, Korea; 5Department of Laboratory Medicine, Gyeongsang National University College of Medicine, Institute of Health Sciences, Jinju, Korea
Correspondence to: Sunjoo Kim, M.D., Ph.D.
Department of Laboratory Medicine, Gyeongsang National University Changwon Hospital, 11 Samjungja-ro, Seongsan-gu, Changwon 51472, Korea
Tel: +82-55-214-3072
Fax: +82-55-214-3087
E-mail: sjkim8239@hanmail.net
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Background: Few studies have investigated the invasiveness of Streptococcus pyogenes based on whole-genome sequencing (WGS). Using WGS, we determined the genomic features associated with invasiveness of S. pyogenes strains in Korea.
Methods: Forty-five S. pyogenes strains from 1997, 2006, and 2017, including common emm types, were selected from the repository at Gyeongsang National University Hospital in Korea. In addition, 48 S. pyogenes strains were randomly selected depending on their invasiveness between 1997 and 2017 to evaluate the genetic evolution and the associations between invasiveness and genetic profiles. Using WGS datasets, we conducted virulence-associated DNA sequence determination, emm genotyping, multi-locus sequence typing (MLST), and superantigen gene profiling.
Results: In total, 87 strains were included in this study. There were no significant differences in the genomic features throughout the study periods. Four genes, csn1, ispE, nisK, and citC, were detected only in invasive strains. There was a significant association between invasiveness and emm cluster type A-C3, including emm1.0, emm1.18, emm1.3, and emm1.76 (P<0.05). The predominant emm1 lineage belonged to ST28. There were no associations between invasiveness and superantigen gene profiles.
Conclusions: This is the first study using WGS datasets of S. pyogenes strains collected between 1997 and 2017 in Korea. Streptococcal invasiveness is associated with the presence of csn1, ispE, nisK, and citC. The emm1 lineage and ST28 clone are explicitly associated with invasiveness, whereas genomic features remained stable over the 20-year period.
Keywords: Streptococcus pyogenes, Invasiveness, Whole genome sequencing
We genomically characterized
All strains used in this study were collected between 1997 and 2017 and stored in the repository at Gyeongsang National University Hospital (GNUH) in Gyeongnam Province, Korea. Forty-five
Bacteria were identified using a Vitek-2 automated identification system (BioMérieux Inc., Marcy l’Étoile, France). All strains were inoculated in 30% glycerol in Todd-Hewitt broth and stored at –70°C. They were recovered on blood agar plates for genetic analysis.
The study protocol was approved by the Institutional Review Board of GNUH (approval number: GNUCH 2018-01-008). Informed consent was waived because of the retrospective nature of the study.
Genomic DNA was extracted using a Wizard Genomic DNA Isolation Kit (Promega, Madison, WI, USA). The extracted DNA was verified and screened for potential culture contamination by 16S rRNA gene sequencing using an ABI 3730 DNA sequencer (Applied Biosystems, Foster City, CA, USA). Sequencing libraries were prepared using the TruSeq DNA LT Sample Prep Kit (Illumina, San Diego, CA, USA), and a draft genome sequence of each strain was generated by MiSeq sequencing (300-bp, paired-end) using a MiSeq Reagent Kit v3 (Illumina). The Illumina sequencing data were assembled with SPAdes v3.13.0 (Algorithmic Biology Lab, St. Petersburg Academic University of the Russian Academy of Sciences, St. Petersburg, Russia). The EzBioCloud genome database (https://www.ezbiocloud.net) was used for gene finding and functional annotation of the whole-genome assemblies. Protein-coding DNA sequences (CDSs) were predicted using Prodigal 2.6.2 [7]. The CDSs were classified based on their roles, with reference to orthologous groups (EggNOG v4.5; http://eggnogdb.embl.de). For more detailed functional annotation, the predicted CDSs were compared with those from the Swiss-Prot (https://www.uniprot.org), KEGG (http://www.genome.jp/kegg/), and SEED (http://pubseed.theseed.org) databases using the UBLAST program (https://www.drive5.com/).
CG analyses comprised two steps (Fig. 1). The first analysis was conducted according to the isolation year (strains of 1997 vs. those of 2006 vs. those of 2017). The second analysis was conducted according to invasiveness (strains from the throat vs. those from blood/joint fluid). CG analysis was conducted by comparing functional genes based on the clustering of orthologous genes. The genome sequences of all strains were obtained from the EzBioCloud database (http://www.ezbiocloud.net/), and average nucleotide identity (ANI) values were calculated. For ANI calculation, the query genomes were cut into small fragments (1,020 bp), and high-scoring pairs between two genome sequences were selected using the USEARCH program (http://www.drive5.com/usearch). Using the calculated ANI values, a dendrogram was constructed using the unweighted pair group method. Homologous regions in a target genome to query open reading frames were determined using the USEARCH program and were aligned using pair-wise global alignment. The matched regions in the subject contig were extracted and saved as homologs [8].
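The ANI and dendrogram steps described above can be illustrated with a minimal Python sketch. It is not the EzBioCloud or USEARCH implementation. It assumes that ANI is taken as the plain mean of per-fragment percent identities, that the distance between two genomes is 100 minus ANI, and that the unweighted pair group method refers to arithmetic-mean linkage (UPGMA). The function names and the three-strain example are hypothetical.

```python
from itertools import combinations

FRAGMENT_LEN = 1020  # fragment size stated in the text


def fragment_genome(seq, size=FRAGMENT_LEN):
    """Cut a query genome into consecutive fragments, dropping a short tail."""
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, size)]


def ani_from_hits(identities):
    """Simplified ANI: mean percent identity over the fragment hits."""
    return sum(identities) / len(identities)


def upgma(names, ani):
    """Build a nested-tuple tree by UPGMA with distance = 100 - ANI.

    `ani` maps frozenset({a, b}) to the ANI (%) between genomes a and b.
    """
    size = {name: 1 for name in names}  # cluster -> number of genomes in it
    dist = {frozenset(p): 100.0 - ani[frozenset(p)]
            for p in combinations(names, 2)}
    while len(size) > 1:
        # Merge the two closest clusters.
        a, b = min(combinations(size, 2), key=lambda p: dist[frozenset(p)])
        n_a, n_b = size.pop(a), size.pop(b)
        merged = (a, b)
        # New distances are size-weighted averages of the old ones.
        for other in size:
            dist[frozenset((merged, other))] = (
                n_a * dist[frozenset((a, other))]
                + n_b * dist[frozenset((b, other))]
            ) / (n_a + n_b)
        size[merged] = n_a + n_b
    return next(iter(size))


# Hypothetical ANI values for three strains.
example_ani = {
    frozenset(("s1", "s2")): 99.5,
    frozenset(("s1", "s3")): 97.0,
    frozenset(("s2", "s3")): 97.2,
}
tree = upgma(["s1", "s2", "s3"], example_ani)
print(tree)
```

In this toy example the two strains with the highest ANI are joined first, and the third strain attaches to that pair at the averaged distance.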
We used DDBJ Fast Annotation and Submission Tool v1.2.4 (DFAST; https://dfast.nig.ac.jp) for annotation and searched the sequences around the
When sequences were incompletely matched with an
Phylogenetic analysis was performed using approximately 1.4 million bp of orthologous protein-coding regions, separately for the 45 strains grouped by isolation year and the 48 strains grouped by invasiveness (data not shown).
To determine five target genes (
Primer sets used to amplify
| Superantigen gene | Forward | Reverse |
|---|---|---|
| | 5´-TAAGAACCAAGAGATGG-3´ | 5´-ATTCTTGAGCAGTTACC-3´ |
| Alternative | 5´-CAAGAACCGAGAGATGT-3´ | |
| | 5´-AAGAAGCAAAAGATAGC-3´ | 5´-TGGTAGAAGTTACGTCC-3´ |
| | 5´-GATTTCTACTATTTCACC-3´ | 5´-AAATATCTGATCTAGTCCC-3´ |
| | 5´-GTGTAGAATTGAGGTAATTG-3´ | 5´-TAATATAGCCTGTCTCGTAC-3´ |
| | 5´-TAACTCCTGAAAAGAGGCT-3´ | 5´-TTGTAGCTAGAACCAGAAG-3´ |
| Alternative | 5´-TAGCTCCTGAAAAGAGGCT-3´ | 5´-TTGTAGTTAGAACCAGAAG-3´ |
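A primer pair such as those listed above can be checked against an assembled sequence by a simple in silico PCR. The Python sketch below is an assumption-laden illustration, not the procedure used in this study: it requires exact primer matches, searches only the given strand, and uses a made-up toy template. Only the first primer pair in the table is taken from the source.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def in_silico_pcr(template, forward, reverse):
    """Return the predicted amplicon for exact primer matches, else None.

    The forward primer is searched as written; the reverse primer anneals
    to the opposite strand, so its reverse complement is searched
    downstream of the forward primer site.
    """
    start = template.find(forward)
    if start == -1:
        return None
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, start + len(forward))
    if end == -1:
        return None
    return template[start:end + len(reverse_site)]


# First primer pair from the table above.
FORWARD = "TAAGAACCAAGAGATGG"
REVERSE = "ATTCTTGAGCAGTTACC"

# Hypothetical toy template carrying both primer sites.
toy_template = "GG" + FORWARD + "ACGTACGT" + reverse_complement(REVERSE) + "CC"
amplicon = in_silico_pcr(toy_template, FORWARD, REVERSE)
print(amplicon)
```

A practical check would also search the reverse strand of each contig and tolerate a small number of mismatches, which is presumably why alternative primers are listed for some targets.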
We determined the STs using allelic profiles consisting of seven housekeeping genes (
For novel allelic numbers/STs, we submitted the data (i.e., bacterial genotypic/phenotypic data and patient backgrounds) to the
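Conceptually, ST assignment is a lookup of a seven-allele profile in a table of known profiles, and a profile with no match is a candidate novel ST. The Python sketch below illustrates this under stated assumptions: the locus names are those of the standard S. pyogenes MLST scheme (gki, gtr, murI, mutS, recP, xpt, and yqiL), whereas the allele numbers and the labels "ST-A" and "ST-B" are hypothetical placeholders rather than real PubMLST entries.

```python
# Loci of the standard S. pyogenes MLST scheme (assumed here).
MLST_LOCI = ("gki", "gtr", "murI", "mutS", "recP", "xpt", "yqiL")

# Hypothetical profile table; a real analysis would load the scheme
# from the MLST database instead.
KNOWN_PROFILES = {
    (1, 1, 1, 1, 1, 1, 1): "ST-A",
    (1, 2, 1, 1, 3, 1, 1): "ST-B",
}


def assign_st(alleles, profiles=KNOWN_PROFILES):
    """Return the ST for a dict of locus -> allele number, or None if novel."""
    key = tuple(alleles[locus] for locus in MLST_LOCI)
    return profiles.get(key)


strain = {"gki": 1, "gtr": 2, "murI": 1, "mutS": 1,
          "recP": 3, "xpt": 1, "yqiL": 1}
print(assign_st(strain))
```

A return value of None marks a profile that is absent from the table, which corresponds to the novel allelic numbers or STs that require submission.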
We used Fisher’s exact test (two-sided) to determine significant differences in categorical variables, and the chi-square test to compare the proportions in each
The
The STs are presented in Supplemental Data Table 1. The 87 strains comprised 21 STs with exact loci matched against the PubMLST database. ST36, ST28, and ST39, accounting for 20.7%, 18.4%, and 11.5%, respectively, were the most frequent. There were strong associations of genetic characteristics within the MLST complex. The goeBURST diagram is shown in Fig. 3. There were 17 singletons in the CG analysis, and ST28 showed a clonal distribution of invasive strains in the second analysis. The predominant
The phylogenetic tree based on the periodic comparison showed a sporadic distribution (data not shown). The second analysis revealed the genetic relationships among
When comparing gene origins by pan-genome orthologous group (POG) analysis, we found
Presence or absence of pan-genome orthologous genes according to invasiveness
| Gene | Function | Non-invasive (N=24) | Invasive (N=24) | P |
|---|---|---|---|---|
| | ATP-binding, helicase, hydrolase, mitochondrion, nucleotide-binding, putative mitochondrial ATP-dependent helicase irc3 | Present | Absent | 0.0006 |
| | DNA helicase | Present | Absent | 0.0094 |
| | Cytoplasm, hydrolase, protease, serine protease, endopeptidase Clp | Present | Absent | 0.0392 |
| | DNA-binding, isomerase, magnesium, metal-binding, topoisomerase, DNA topoisomerase | Present | Absent | 0.0496 |
| | Transferase, acetate CoA-transferase | Present | Absent | 0.0496 |
| | ATP-binding, cell membrane, kinase, membrane, nucleotide-binding, phosphoprotein, transferase, transmembrane, transmembrane helix, two-component regulatory system, histidine kinase | Present | Absent | 0.0496 |
| | 5-Formyltetrahydrofolate cyclo-ligase | Present | Absent | 0.0496 |
| | Antiviral defense, DNA-binding, endonuclease, exonuclease, hydrolase, magnesium, manganese, metal-binding, nuclease, RNA-binding, CRISPR-associated endonuclease Cas9/Csn1 | Absent | Present | 0.0044 |
| | ATP-binding, isoprene biosynthesis, kinase, nucleotide-binding, transferase, 4-(cytidine 5´-diphospho)-2-C-methyl-D-erythritol kinase | Absent | Present | 0.0094 |
| | ATP-binding, cell membrane, kinase, membrane, nucleotide-binding, phosphoprotein, transferase, transmembrane, transmembrane helix, two-component regulatory system, histidine kinase | Absent | Present | 0.0355 |
| | ATP-binding, ligase, nucleotide-binding, (citrate [pro-3S]-lyase) ligase | Absent | Present | 0.0496 |
Abbreviation: CRISPR, clustered regularly interspaced short palindromic repeats.
We looked for common virulence-associated CDSs among all 87 strains by searching for annotated CDSs based on functional annotation of the whole-genome assemblies. We found 25 CDSs associated with bacterial virulence. Among them, 12 (lactocepin, oleate hydratase, putative glycosyltransferases, capsule biosynthesis protein [CapA], regulatory protein MsrR, internalin-I, deoxyribonuclease, biofilm-regulatory protein, listeriolysin-regulatory protein, streptokinase, C5a peptidase, and M protein) were identified in all strains. CDSs encoding exotoxin A and procollagen-proline 3-dioxygenase were frequently detected in invasive strains (all
Comparison of CDSs among all 87 strains by searching annotated CDSs based on functional annotation pipeline of whole-genome assemblies
CDSs | Non-invasive (N=63) | Invasive (N=24) | P |
---|---|---|---|
Chitinase | 27 (42.9%) | 14 (58.3%) | 0.293 |
Exotoxin type A | 13 (20.6%) | 12 (50.0%) | **0.015** |
Procollagen-proline 3-dioxygenase | 8 (12.7%) | 8 (33.3%) | **0.035** |
Platelet binding protein GspB | 13 (36.1%) | 4 (16.7%) | 0.771 |
C protein alpha-antigen | 5 (7.9%) | 3 (12.5%) | 0.679 |
Glycoprotein-gp2 | 4 (6.3%) | 3 (12.5%) | 0.389 |
N-acetylmuramoyl-L-alanine amidase | 1 (1.6%) | 0 (0.0%) | 1.000 |
Trehalose transport system permease protein SugB | 1 (1.6%) | 0 (0.0%) | 1.000 |
Serine-rich adhesin for platelets | 1 (1.6%) | 0 (0.0%) | 1.000 |
Deoxyribonuclease (Yes/No) | 61 (96.8%) | 24 (100.0%) | 1.000 |
Hyaluronan synthase (Yes/No) | 49 (77.8%) | 22 (91.7%) | 0.216 |
Streptopain (Yes/No) | 63 (100.0%) | 23 (95.8%) | 0.276 |
The values are presented as N (%). Bold type indicates statistical significance.
Abbreviation: CDS, coding DNA sequence.
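Each row of the table above is a 2×2 comparison (CDS present or absent, by invasiveness), the kind of categorical comparison for which the Methods specify a two-sided Fisher's exact test. The following Python sketch shows one common way to compute that test from the hypergeometric distribution. It is an illustration, not the statistical software used in this study; the function name is hypothetical, and the two-sided rule (summing all tables no more probable than the observed one) is an assumed convention.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more probable than the observed table.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    p = sum(prob(x) for x in range(lo, hi + 1)
            if prob(x) <= p_obs * (1 + 1e-9))  # tolerance for float ties
    return min(1.0, p)


# Exotoxin type A counts from the table:
# non-invasive 13 present / 50 absent, invasive 12 present / 12 absent.
p_exotoxin = fisher_exact_two_sided(13, 50, 12, 12)
print(round(p_exotoxin, 3))
```

The printed value can be compared with the P value reported for exotoxin type A. Small differences are possible if the original software used a different two-sided convention.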
WGS analyses have proven useful in unraveling the genetic diversity of strains and discriminating between closely related strains. Our study provided information about the genomic characteristics and virulence genes of 87 strains collected in Korea over a 20-year period based on longitudinal analysis of WGS datasets.
Up to 200
Globally,
By searching for virulence-associated CDSs, we found that four genes,
Lactococcal
We investigated virulence-associated CDSs among all 87 strains searched from annotated CDSs based on functional annotation of the whole-genome assemblies. The genes encoding exotoxin A and procollagen-proline 3-dioxygenase were frequently present in invasive strains. Streptococcal exotoxin A is encoded by
The phylogenetic analysis revealed no significant associations between the superantigen profiles and invasiveness. Moreover, establishing links between longitudinal groups within the phylogenetic tree was difficult. These results indicate the preservation of stable genetic elements over time. The
This study had some limitations. Although it spanned two decades and was population-based, only 87 strains were included, which may explain why we did not observe significant genomic changes during the study period. We identified virulence-associated CDSs among all 87 strains only by searching annotated CDSs from a functional annotation pipeline of whole-genome assemblies, rather than by searching the sequences around specific genes or by PCR simulation. We searched for related articles by entering the search terms “
In conclusion, this study provided CG characteristics of
Kim S, Choi E, and Takahashi T conceptualized the study; Shin H, Choi E, and Lee S collected the data; Shin H, Maeda T, Fukushima Y, Lee S, Kim S, and Takahashi T analyzed the data; Shin H wrote the manuscript; Kim S and Takahashi T reviewed and edited the manuscript; all authors reviewed and approved the manuscript.
None declared.
This work was supported by a grant from the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare (H19C0047), and Bio & Medical Technology Development Program of the National Research Foundation (NRF) (2021M3E5E3080382, 2021R1I1A3044483) by the Korean government. The funders had no role in study design, data collection and interpretation, decision to publish, or preparation of the manuscript.