1 Microbiology Research Group, Research Institute of Innovative Technology for the Earth (RITE), Soraku, Kyoto 619-0292, Japan
2 Graduate School of Biological Sciences, Nara Institute of Science and Technology, Ikoma, Nara 630-0101, Japan
Correspondence
Hideaki Yukawa
mmg-lab{at}rite.or.jp
| ABSTRACT |
|---|
The DDBJ/EMBL/GenBank accession numbers for the complete sequence of the C. glutamicum R genome and its native episome PCgR1 are AP009044 and AP009045, respectively.
Tables of strains and plasmids, and of oligonucleotides and primers used in this study are available with the online version of this paper.
Present address: Institute of Plant Biology, Hungarian Academy of Sciences, Szeged, Hungary.
| INTRODUCTION |
|---|
The complete genome sequences of two variants of C. glutamicum ATCC 13032 have been published (Ikeda & Nakagawa, 2003; Kalinowski et al., 2003), as have those of the closely related Corynebacterium efficiens YS-314 (Nishio et al., 2003), and of the two human pathogens Corynebacterium jeikeium K411 (Tauch et al., 2005) and Corynebacterium diphtheriae NCTC 13129 (Cerdeño-Tárraga et al., 2003). We report here the genomic sequence of C. glutamicum R, a strain with industrial potential (Inui et al., 2004) that was isolated from soil sampled in Japan.
The availability of these different genomic data allows the identification of the corynebacterial core genes and of those genes directly related to various ecological niches. In addition, comparative analysis of the sigma factors, secreted proteins, sugar metabolism and two-component systems present in C. glutamicum R enables further assessment of the industrial potential of this strain and its metabolic and regulatory specificities.
| METHODS |
|---|
DNA techniques.
Corynebacterial chromosomal DNA was isolated following standard methods (Sambrook et al., 1989) modified by using 4 mg lysozyme ml⁻¹ at 37 °C for 30 min. Cells were transformed as previously described (Vertès et al., 1993b) using E. coli JM110 plasmid DNA. E. coli plasmid DNA isolation and strain transformation were performed following standard methods (Sambrook et al., 1989). Restriction endonucleases, Klenow fragment and T4 DNA ligase were used as per the manufacturer's instructions (Takara). Restriction fragments were isolated from agarose gels with the GeneClean kit (Bio 101), according to the manufacturer's instructions. PCR was performed using Ex-Taq DNA polymerase (Takara). Prior to sequencing, exonuclease treatment was performed using ExoSAP-IT (USB), as per the manufacturer's instructions.
Library construction.
Random fragments resulting from sonication of C. glutamicum R chromosomal DNA were separated on agarose gel into one 2–3 kb pool and another 8–9 kb pool. The fragments were blunted and ligated into SmaI-digested pUC119. The ligation mixture was used to transform E. coli JM109 and recombinants were selected on IPTG-supplemented plates. Gaps between contigs were closed using a Lambda FIX II/XhoI replacement phage library with a mean insert size of 20 kb, as per the manufacturer's instructions (Stratagene). Fragments (1–2 kb) at the end of the assembled contigs were amplified by PCR from the chromosomal DNA of C. glutamicum R, labelled using a Gene Image Random Prime Labelling Module (GE Healthcare Bio-Sciences Corp.) and used as probes to screen the phage library by plaque hybridization. Several genomic DNA fragments extracted from the positive phage clones were sequenced after subcloning into vector pUC18 (Sambrook et al., 1989). For sequencing purposes, E. coli clones bearing C. glutamicum R chromosomal DNA fragments were grown overnight and the corresponding plasmids were isolated. Gaps in the assembled sequence were closed by PCR-mediated genome walking.
Genome sequencing.
Using the whole-genome shotgun method, libraries of 2–3, 8–9 and 20 kb genomic inserts were sequenced from both ends using M13 universal forward and reverse primers (Sambrook et al., 1989) and cycle-sequenced using the BigDye Terminator method in ABI 3700 CE and ABI 3730 DNA analysers (Applied Biosystems). The sequences were base-called and assembled using Phred, Phrap and Consed (Ewing et al., 1998; Gordon et al., 1998). The Pregap4 program of the Staden package (Bonfield et al., 1995; Staden, 1996) was used for clipping vector sequences, as well as for quality clipping and contamination screening after base-calling by Phred (Ewing & Green, 1998; Ewing et al., 1998). Gaps were closed by primer walking on gap-spanning plasmid clones and direct sequencing of PCR products. Repetitive sequences such as rDNA were confirmed by PCR. The error rate was lower than 2 bases per 10 kb as calculated using Consed (Gordon et al., 1998).
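To put the reported error rate in context, the relationship between a Phred quality score Q and the per-base error probability P (Q = −10 log10 P) can be sketched as below. This is an illustrative Python sketch only; the function names are hypothetical and it does not reproduce the internal calculation performed by Consed.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score Q into a per-base error probability."""
    return 10 ** (-q / 10)


def error_rate_to_phred(errors: float, bases: float) -> float:
    """Express an error rate (errors per number of bases) as a Phred-scaled value."""
    return -10 * math.log10(errors / bases)


def expected_errors(qualities) -> float:
    """Sum the per-base error probabilities over a list of Phred scores."""
    return sum(phred_to_error_probability(q) for q in qualities)


if __name__ == "__main__":
    # 2 errors per 10 kb, the upper bound quoted in the text
    print(round(error_rate_to_phred(2, 10_000), 2))
```

Under this convention, an error rate of 2 bases per 10 kb corresponds to a consensus quality of roughly Q37.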
Gene prediction and analysis.
rRNAs were located by a BLASTN homology search against the 16S, 23S and 5S rRNA sequences of C. glutamicum ATCC 13032. tRNAs were predicted by tRNAscan-SE (Lowe & Eddy, 1997). Protein coding sequences (CDS) were predicted using Glimmer3 (Delcher et al., 1999a; Salzberg et al., 1998) and GeneMarkS (Besemer et al., 2001). Proteome prediction was performed by a BLASTP homology search using an e-value lower than 1e-4 against GenBank release 152, GenPept release 152, UniProt release 4.6, the NCBI clusters of orthologous groups (COG) database (Tatusov et al., 1997) and the Pfam family database (Accelrys GCG Wisconsin package version 11.0). Numerous annotations were checked manually after the auto-annotations were performed. The search for repeats at each extremity of each of the 10 major strain-specific islands (SSIs) present in the genome of C. glutamicum R (Suzuki et al., 2005a) was performed using the EMBOSS package (Rice et al., 2000).
Identification of orthologous CDS and genomic islands.
All the CDS of C. glutamicum R, C. glutamicum ATCC 13032 (Ikeda & Nakagawa, 2003; Kalinowski et al., 2003), C. efficiens YS-314 (Nishio et al., 2003), C. diphtheriae NCTC 13129 (Cerdeño-Tárraga et al., 2003) and C. jeikeium K411 (Tauch et al., 2005) were compared to each other in a reciprocal manner using BLASTP with an e-value of 1e-4 as a cut-off. Genes showing the highest similarity levels in a dual strain comparison were automatically parsed for every CDS present in both strains using an original Perl script. The CDS with a reciprocal best hit were defined as being orthologous CDS of the two strains.
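The reciprocal best-hit criterion described above can be illustrated with the following minimal Python sketch. It is not the original Perl script used in this study: the input layout (tuples of query, subject, e-value and bit score), the use of bit score to rank hits, and all identifiers are assumptions made for illustration.

```python
def best_hits(rows, max_evalue=1e-4):
    """Return {query: best subject} from (query, subject, evalue, bitscore) rows.

    Hits with an e-value above the cut-off are discarded; among the remaining
    hits the one with the highest bit score is kept (first seen wins on ties).
    """
    best = {}
    for query, subject, evalue, bitscore in rows:
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a, max_evalue=1e-4):
    """Return sorted (a, b) pairs in which a and b are each other's best hit."""
    forward = best_hits(a_vs_b, max_evalue)
    reverse = best_hits(b_vs_a, max_evalue)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


if __name__ == "__main__":
    # Hypothetical identifiers and scores, for illustration only
    a_vs_b = [("a1", "b1", 1e-50, 200.0), ("a1", "b2", 1e-10, 60.0),
              ("a2", "b2", 1e-30, 120.0), ("a3", "b3", 1e-2, 30.0)]
    b_vs_a = [("b1", "a1", 1e-48, 198.0), ("b2", "a1", 1e-9, 58.0),
              ("b2", "a2", 1e-29, 119.0)]
    print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

In this toy input, a3 has no orthologue because its only hit fails the e-value cut-off, whereas a1/b1 and a2/b2 are retained as reciprocal best hits.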
Identification of genomic DNA islands in the C. glutamicum R and C. glutamicum ATCC 13032 genomes was performed using MUMmer 2.1 (Delcher et al., 1999b, 2002).
Transposon, deletion, and gene disruption and replacement mutagenesis.
We used a combination of Tn5 and Tn31831 mutagenesis systems to assemble a library of 2300 different transposon mutants (Suzuki et al., 2006; Vertès et al., 2005), and the Cre-loxP system (Suzuki et al., 2005b) to generate deletion mutants. Mutations were verified by PCR using the oligonucleotides and nucleotide primers shown in Supplementary Table S2 (available with the online version of this paper). Gene disruption and replacement mutagenesis were performed as described previously (Vertès et al., 1993a) using primers indicated in Supplementary Table S2.
Gene identification numbers.
Gene identification numbers are from the Virtual Institute of Microbial Stress and Survival (VIMSS) database (Alm et al., 2005).
| RESULTS AND DISCUSSION |
|---|
The genome of strain R encodes 6 rRNA operons and 57 tRNA genes. The first base of the initiation codon of the dnaA gene was designated as the sequence coordinate origin. It is located near characteristic replication regions. The GC skew analysis (Grigoriev, 1998) presented in Fig. 1 supports the view that DNA replication in C. glutamicum is bidirectional, as observed in strain ATCC 13032 (Kalinowski et al., 2003). It is noteworthy that the G+C content variation is low in this genome. Nevertheless, a comparison of the genome of C. glutamicum strain R with the published genome sequences of two isolates of C. glutamicum ATCC 13032 (Ikeda & Nakagawa, 2003; Kalinowski et al., 2003) revealed the presence of hundreds of SSIs (Suzuki et al., 2005b). In particular, strain R exhibits 11 SSIs larger than 10 kb (Fig. 1) that have a G+C content ranging from 45.7 to 60.7 mol% (Suzuki et al., 2005a). Notably, none of these was observed to be flanked by obvious repeats. However, it is noteworthy that strain R is devoid of the AT-rich 211 kb genomic island with clear boundaries (Zhang & Zhang, 2004) that is carried by strain ATCC 13032 and that contains genes typically associated with horizontal transfer, such as a restriction-modification system, transposases, recombination enzymes and phage-derived sequences (Kalinowski et al., 2003). Likewise, the AT-rich 25 kb region harboured by strain ATCC 13032 containing the genes cg0414–cg0440 is absent from the strain R genome. It is noteworthy that products of these genes are involved in cell wall formation, including cell surface polysaccharide (wzz, cg0414), lipopolysaccharide (various glycosyl transferases) or murein formation (murA, cg0422; murB, cg0423). These observations suggest that measurable differences could exist between the cell walls of the two strains that could perhaps form the basis of immunotyping procedures. Furthermore, the 14 kb G+C-rich region identified in the genome of strain ATCC 13032 (Kalinowski et al., 2003) to contain C. diphtheriae sequences (95 % identity at the nucleotide level) is absent from the genome of strain R, which does not exhibit any sequence similar to the gene cluster cg3276–cg3290. This observation promotes the view that these genes have, on an evolutionary timescale, recently been acquired by C. glutamicum ATCC 13032 in a horizontal transfer event originating from C. diphtheriae. On the other hand, it is noteworthy that the strain R SSI-4, characterized by a low G+C content (45.7 mol%) (Suzuki et al., 2005a), and thus probably resulting from a horizontal transfer, does not contain any obvious phage- or mobile-element-related sequence.
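The two quantities used in this paragraph, G+C content (mol%) and GC skew, can be computed as in the following Python sketch. It is a generic illustration with hypothetical function names and window sizes, not the procedure used to draw Fig. 1. GC skew is taken here as (G − C)/(G + C) per window; in cumulative skew plots, the extrema typically coincide with the origin and terminus of bidirectional replication.

```python
def gc_content(seq: str) -> float:
    """Return the G+C content in mol%, ignoring symbols other than A, C, G, T."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return 100.0 * gc / total if total else 0.0


def gc_skew_windows(seq: str, window: int, step: int = 0) -> list:
    """Return (G - C) / (G + C) for successive windows along the sequence."""
    if window <= 0:
        raise ValueError("window must be positive")
    step = step or window  # default: non-overlapping windows
    seq = seq.upper()
    skews = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        skews.append((g - c) / (g + c) if (g + c) else 0.0)
    return skews


def cumulative(values) -> list:
    """Return the running sum of the per-window skew values."""
    total, out = 0.0, []
    for value in values:
        total += value
        out.append(total)
    return out


if __name__ == "__main__":
    toy = "GGGGCCCC"  # toy sequence, for illustration only
    print(gc_content(toy), gc_skew_windows(toy, 4), cumulative(gc_skew_windows(toy, 4)))
```

The same `gc_content` function, applied to the coordinates of a candidate island, gives the kind of mol% value quoted for the SSIs; a value far from the genome average is one indicator of possible horizontal acquisition.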
Strain R encodes 22 sequences homologous to mobile elements, including six incomplete insertion sequence signatures, and five insertion sequences that contain two ORFs which putatively form a full transposase protein via a frameshift (ISCgR3a-b, ISCgR10a-b, ISCgR11a-b, ISCgR12a-b, ISCgR13a-b) (Table 2). The presumably functional mobile elements found in C. glutamicum R originate from two different families (IS3, ISCgR11; IS6, ISCgR1, ISCgR2, ISCgR9), as defined by Mahillon & Chandler (1998). ISCgR9 is an isoform of IS1628 (Mahillon & Chandler, 1998). ISCgR3 and ISCgR13 constitute novel elements that share a relatively low level of identity at the amino acid level with a putative Rhodococcus erythropolis insertion sequence (Stecker et al., 2003). Similar to what is observed in strain ATCC 13032, SSIs in strain R are rich in mobile elements, particularly SSI-3 (G+C content, 60.7 mol%) and SSI-5 (55.2 mol%) (Table 2). On the other hand, fewer genes of phage origin are present in the genome of C. glutamicum R as compared to C. glutamicum ATCC 13032 since only phage remnants are observed (Table 2). Nevertheless, SSI-8, 42.7 kb in size and with a G+C content of 52.1 mol% (Suzuki et al., 2005a), could have originated from a previously unknown phage as it encodes numerous hypothetical proteins and several ORFs, the products of which share some homology to known phage proteins. This is in sharp contrast to what is observed in C. glutamicum ATCC 13032 isolates that harbour three to four different putative prophages (Kalinowski, 2005).
The C. glutamicum R and ATCC 13032 genomes contain similar numbers of putative operons in concordance with the numbers of rho-independent transcription terminators containing a stem–loop and poly-U tail. Using the TIGR Comprehensive Microbial Resource (Ermolaeva et al., 2001), we calculated that C. glutamicum ATCC 13032 contains 455 operons and 485 rho-independent terminators. Notably, cgR1278, encoding the transcription termination factor rho, is probably an essential gene in C. glutamicum R (Suzuki et al., 2006).
Corynebacterial core genes
The corynebacterial backbone, defined by the set of orthologous genes present in all corynebacteria sequenced to date [C. glutamicum ATCC 13032 (Ikeda & Nakagawa, 2003; Kalinowski et al., 2003), C. glutamicum R (this work), C. efficiens YS-314 (Nishio et al., 2003), C. jeikeium K411 (Tauch et al., 2005) and C. diphtheriae NCTC 13129 (Cerdeño-Tárraga et al., 2003)], comprises 835 genes assigned to COG categories (Table 3), with the total number of shared genes corresponding to approximately a third of the genes present in the saprophytic corynebacteria, in agreement with the value reported by other authors (Tauch et al., 2005) (1089 genes). However, the subset of genes that are essential is significantly lower than the core corynebacterial genome, as demonstrated by the successful transposon mutagenesis of at least 75 % of the ORFs of strain R using transposons Tn5 and Tn31831 (Suzuki et al., 2006; Vertès et al., 2005). It has been demonstrated that the genes that are indispensable for growth of Bacillus subtilis cells in rich medium under standard laboratory conditions belong to the following known categories: DNA and RNA metabolism, protein synthesis, cell envelope, cell shape and division, glycolysis, respiratory pathways, nucleotides and cofactors (Kobayashi et al., 2003). Based on the genomic sequences of C. glutamicum R (this work) and of other corynebacteria (Cerdeño-Tárraga et al., 2003; Ikeda & Nakagawa, 2003; Kalinowski et al., 2003; Nishio et al., 2003; Tauch et al., 2005), these fundamental cellular processes are also essentially conserved in corynebacterial genomes.
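The notion of a core gene set defined above, genes with an orthologue in every sequenced genome, amounts to a set intersection. The Python sketch below illustrates this under a simplifying assumption: orthologues are given as one mapping per genome, keyed by the gene identifier in a chosen reference genome. The function name and all identifiers are hypothetical, and a stricter definition could additionally require consistency between every pair of genomes.

```python
def core_genes(reference_genes, orthologue_maps):
    """Return the reference genes that have an orthologue in every other genome.

    orthologue_maps: {genome name: {reference gene id: orthologous gene id}}
    """
    core = set(reference_genes)
    for mapping in orthologue_maps.values():
        # Keep only reference genes that appear as keys in this genome's map
        core.intersection_update(mapping)
    return core


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only
    reference = ["g1", "g2", "g3"]
    maps = {
        "genome_X": {"g1": "x1", "g2": "x2"},
        "genome_Y": {"g2": "y7", "g3": "y9"},
    }
    print(sorted(core_genes(reference, maps)))
```

Genes of the reference that are missing from at least one mapping fall outside the core and become candidates for niche-specific or strain-specific functions.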
σA, the sigma factor that directs the transcription of most genes in growing cells, and σB, which plays a crucial role in the maintenance of the stationary phase in M. smegmatis (Mukherjee & Chatterji, 2005) [...] σD which is disrupted by a spacer region in C. glutamicum R. In M. tuberculosis H37Rv, σD has been shown to control the expression of ribosome-associated gene products in the stationary phase and to be required for full virulence (Calamita et al., 2005) [...] σH has been observed to regulate major components of oxidative and heat stress responses (Raman et al., 2001) [...] σL, an orthologue of which is present in C. diphtheriae NCTC 13129 but not in the other corynebacteria sequenced to date, has been observed to regulate polyketide synthases and secreted or membrane proteins (Hahn et al., 2005) [...] σH factor, with which it forms a putative operon, as observed in Nocardia farcinica IFM 10152 (Ishikawa et al., 2004) [...] σC has been shown to be required for lethality of M. tuberculosis in mice (Sun et al., 2004) [...] σE is induced upon treatment of M. tuberculosis cells with hydrogen peroxide or upon macrophage infection (Jensen-Cain & Quinn, 2001) [...] (σK, σL) and C. jeikeium K411 (σK, σL, σW). This latter observation particularly reinforces the view that σK is involved in bacterial virulence, as a σK orthologue is present in M. tuberculosis H37Rv but absent from M. smegmatis MC2 (Waagmeester et al., 2005) [...]
Sugar metabolism
Corynebacteria are able to utilize only a limited array of different sugars. For example, wild-type C. glutamicum R cells are able to utilize fructose, glucose, glucuronic acid, glucosamine, maltose, mannose, -methylglucoside, ribose, sucrose, trehalose, arbutin and salicin as sole carbon sources, but not arabinose, galactose, lactose, cellobiose, mannitol, rhamnose, xylose or xylitol (this work). C. glutamicum transports several sugars, including sucrose, fructose and glucose, by the phosphotransferase system (PTS) (Moon et al., 2005; Saier, 2002). Sugar transport plays an important role in the observed substrate range limitation of C. glutamicum, as exemplified by the isolation of a spontaneous PTS mutant of C. glutamicum R that is able to extend the catabolic capabilities of this organism to the degradation of cellobiose (Kotrba et al., 2003). However, this limitation in carbon substrate spectrum is also due to the lack of specific catabolic genes, such as the lack of xylose isomerase which prevents the catabolism of the pentose xylose, despite the presence of ATP-binding cassette proteins putatively involved in the transport of this sugar (specifically that encoded by cgR1331, which is part of a five-gene operon containing a lacI-type transcriptional regulator, and which shows, respectively, 48 and 44 % homology to the xylF genes of Geobacillus kaustophilus HTA426 and E. coli K-12 which encode the periplasmic xylose-binding subunit of a high-affinity xylose ABC-transporter; moreover, synteny comparisons suggest the presence also of the xylG (cgR1329) and xylH genes (cgR1330) encoding, respectively, the ATP-binding and membrane components of a xylose ABC transporter). In addition, several sugar/proton symports could also be involved in the uptake of xylose by wild-type C. glutamicum cells, since cgR0261, cgR2943, cgR2864, cgR2290 and cgR2267 all show homology levels greater than 40 % with E. coli xylE, a gene that encodes a D-xylose/proton symporter from the major facilitator superfamily of transporters.
Notably, all corynebacteria sequenced to date, including C. glutamicum R (cgR1200) and ATCC 13032 (VIMSS376610), as well as C. diphtheriae NCTC 13129 (VIMSS520668) and C. jeikeium K411 (VIMSS844316), exhibit a putative β-fructofuranosidase gene encoding a protein that is 49 % similar (34 % identical) to the fruA gene product of Bacillus megaterium ATCC 14581 (Chiou et al., 2002). The capacity of corynebacteria to synthesize fructans, and particularly pathogenic corynebacteria, would thus be interesting to verify since fructans can trigger inflammatory reactions (Shilo & Wolman, 1958) and thus contribute to disease progression. However, only C. glutamicum R (cgR2548) and C. glutamicum ATCC 13032 (VIMSS376610) exhibit a putative sacA gene encoding a second β-fructofuranosidase. The C. glutamicum sacA gene product shares 48 % homology with the B. subtilis 168 SacA (Glaser et al., 1993) and is only 26 % identical to the product of the putative C. glutamicum fruA gene. In both C. glutamicum R and ATCC 13032, sacA is part of a putative four-gene operon that includes a phosphotransferase system component (PTS enzyme IIC) (respectively, cgR2547 and VIMSS376609). Both fruA and sacA are dispensable, as demonstrated by the disruption and replacement of these genes in C. glutamicum R (this work).
Furthermore, consistent with the notion that corynebacteria use glycogen as their major polyglucan reserve, glycogen metabolism genes are highly conserved among these bacteria. In particular, corynebacterial genomes exhibit sequences homologous to a glycosyl transferase (cgR1201) linked in a putative operon to glgC, the gene encoding ADP-glucose pyrophosphorylase (strain R, cgR1202; strain ATCC 13032, VIMSS375129). Also observed are a putative two-gene operon, glgB, encoding glycosyl transferase (cgR1302-cgR1303), and glgX, encoding a glycogen debranching enzyme (cgR1991) that forms a putative operon with a regulatory protein of the TetR family (cgR1990). Likewise, all corynebacterial trehalose metabolic genes identified previously (Tzvetkov et al., 2003) (treS, cgR2175; treY, cgR2002; treZ, cgR2009; otsA, cgR2531; otsB, cgR2533) are present in C. glutamicum R. Therefore, it is likely that in this bacterium all three trehalose metabolic pathways found in bacteria are enabled [TreS pathway from maltose, TreY/TreZ pathway from α(1→4)glucose polymers, OtsA/OtsB pathway from glucose 6-phosphate and UDP-glucose]. This observation reinforces the view that this non-reducing sugar plays a major physiological role in Actinobacteria as energy storage and as an environmental protectant against various stresses such as low water activity (desiccation, dehydration), external osmolality fluctuations, heat, cold and oxidation. It is also noteworthy that the genes otsA and otsB are part of a putative five-gene operon which is adjacent to a transcriptional regulator of the lacI family (cgR2534) located downstream and in the opposite orientation.
Two-component systems
Sensing of environmental conditions to ensure variability and adaptability to environmental changes is enabled by two-component systems linked to signal transduction cascades that function via phosphorelays (Taylor & Zhulin, 1999). Basic two-component systems comprise a membrane-associated sensor kinase that detects external stimuli and a transcriptional regulator that acts upon the cellular machinery to bring about the necessary adaptive changes (Fontan et al., 2004). The genes encoding these two proteins are typically organized in two-gene operons. A similar organization is observed in the genome of C. glutamicum R where 27 ORFs are present that share a high level of homology with two-component system genes (Table 5). One orphan regulator (cgR0730), which is dispensable as demonstrated by the absence of phenotypic change upon its deletion (this work), and 13 putative two-component systems organized in two-gene operons encoding a sensor kinase and a regulator, have been identified. Despite being absent from the genome of C. glutamicum R, narQ, the putative cognate kinase gene of cgR0730, is present in C. glutamicum ATCC 13032, C. efficiens YS-314 and C. diphtheriae NCTC 13129. On the other hand, orthologues of senX3/regX3, mtrA/mtrB, mprB/mprA and phoR/phoP are present in all the corynebacteria and mycobacteria sequenced to date (Table 5).
Of these two-component systems, only mtrA/mtrB has been shown to be essential to the growth of M. tuberculosis as demonstrated by unsuccessful attempts to disrupt the mtrA gene, despite the fact that mtrB appears to be non-essential (Fontan et al., 2004; Zahrt & Deretic, 2000). The two-component system mtrA/mtrB is conserved in all actinobacteria (Hoskisson & Hutchings, 2006) and its regulator is the only response regulator that is induced in M. tuberculosis during macrophage infection, but not in broth culture (Zahrt & Deretic, 2000). The two-component system MtrA/MtrB has been shown to strongly influence cellular morphology, antibiotic susceptibility and genetic expression of osmoprotection as demonstrated by the phenotype exhibited by a C. glutamicum ATCC 13032 double mutant (Möker et al., 2004). In C. glutamicum R we succeeded in disrupting either cgR0863 (putative mtrA) or cgR0864 (putative mtrB) by inserting a transposon in the central regions of these genes. Neither of the resulting single-gene disruptants showed any significantly altered cellular morphology or altered growth pattern, perhaps indicative of cross-talk between the MtrA and MtrB proteins and one or more other two-component system proteins. The gene immediately downstream of (and overlapping) mtrB is conserved in all actinobacteria, where it encodes the actinobacteria signature protein LpqB (Gao et al., 2006). In the corynebacteria and mycobacteria sequenced to date, mtrA and mtrB are part of a putative operon that also contains sahH (cgR0861 and VIMSS374775 in C. glutamicum R and ATCC 13032, respectively), encoding S-adenosylhomocysteine hydrolase, and tmpk (cgR0862 and VIMSS374776, respectively), encoding thymidylate kinase. On the other hand, the cgR0122-cgR0123 two-component system, a putative orthologue of the yycF/yycG system of B. subtilis that modulates the expression of the ftsAZ operon (Fukuchi et al., 2000), can be deleted from the C. glutamicum R genome by excision of SSI-3 without any significant phenotypic change (Suzuki et al., 2005b). Brocker & Bott (cited in Kočan et al., 2006) suggested that this two-component system is involved in the genetic regulation of copper metabolism. Interestingly, although this two-component system is borne by an SSI in C. glutamicum R, it shows a high level of identity with orthologues found in all the other corynebacteria sequenced to date. On the other hand, its relatively high G+C content (greater than 60 mol%) in all corynebacteria sequenced to date would suggest that its acquisition is a recent event in evolutionary terms. Last, the response regulator encoded by gene cgR2566, the product of which shares a high level of homology with the BaeR protein of E. coli, is also conserved in all corynebacteria sequenced to date (Table 5). We did not succeed in disrupting or deleting gene cgR2566 despite being able to mutate cgR2567, its cognate gene encoding a sensor kinase, by transposon insertion.
Interestingly, two corynebacterial two-component systems appear to be specific to saprophytic corynebacteria. Their highest levels of identity are shared with the putative B. subtilis systems ykoG/ykoH (cgR0359/cgR0360) and yvfT/yvfU (cgR1049/cgR1050) (Fabret et al., 1999). Similar to what has been observed in B. subtilis, these genes appear to be non-essential in corynebacteria as demonstrated by the isolation of several transposon mutants of C. glutamicum R where these genes had been inactivated (this work). However, saprophytic corynebacteria appear to be devoid of chrS/chrA or cstS/cstA orthologues. These latter two-component systems, present in both C. diphtheriae and C. jeikeium, have been linked to haem and haemoglobin sensing (Schmitt, 1999). Nevertheless, ORFs cgR2844 and cgR1838 show low identity but relatively high homology to chrA (67 and 63 %, respectively) and cstA (53 %) from pathogenic corynebacteria. Likewise, corynebacteria appear to be devoid of devS/devR orthologues, which have been linked in M. tuberculosis and M. bovis with hypoxic dormancy (Boon & Dick, 2002; Fontan et al., 2004; Saini et al., 2004), albeit that cgR2844 and cgR1838 are 52 and 57 % homologous, respectively, to M. bovis devR. Genes cgR2844 and cgR2845 are dispensable in C. glutamicum R, as demonstrated by their simultaneous deletion via Cre/loxP-mediated rearrangements (this work).
Moreover, the genome of C. glutamicum R reveals the presence of two unique two-component systems, exhibiting high similarity to cutS/cutR of Streptomyces coelicolor involved in negative regulation of actinorhodin synthesis (Hutchings et al., 2004), and to fixL/fixJ of Azorhizobium caulinodans involved in nitrogen fixation in response to low-oxygen conditions (Fischer, 1994). These genes are present on two different SSIs (SSI-5, a region rich in transposase genes, and SSI-9, respectively) (Suzuki et al., 2005a) and are characterized by G+C contents that differ significantly from the typical G+C content of corynebacterial genomes (Table 5). These observations corroborate the view that these sequences have been acquired by C. glutamicum R through horizontal transfer events that occurred relatively recently in evolutionary terms. As previously reported, these sequences are dispensable as demonstrated by the absence of a specific phenotype in the corresponding C. glutamicum R deletion mutants (Suzuki et al., 2005a). Similarly, C. glutamicum ATCC 13032 exhibits one unique two-component system (VIMSS376724 and VIMSS376723) that shows weak identity to two C. efficiens genes. It is also worth noting that C. diphtheriae also possesses a unique two-component system on its chromosome (VIMSS519770 and VIMSS519769), albeit that two similar genes are borne by the C. efficiens plasmid pCE3 (Table 5). Likewise, C. jeikeium K411 encodes two two-component system genes, kdpD and kdpE, putatively involved in turgor pressure sensing and regulation (Hutchings et al., 2004), that are absent from the other corynebacteria sequenced to date, but present in mycobacteria and Streptomyces species (Hutchings et al., 2004).
These different observations promote the view that, except for a few sensor–regulator couples that are highly conserved in Corynebacterineae, the sensing and regulation machinery of corynebacteria is highly plastic. This raises the question of whether horizontal transfers of these important adaptation genes, rather than gene decay, have played a major role during the course of evolution of these bacteria. Such a view is consistent with the observed presence in a few corynebacterial strains of two-component systems for which orthologues can only be found in mycobacteria, Streptomyces species or N. farcinica, but not in the other corynebacteria sequenced to date. Furthermore, a few of these observed systems either are borne by an episome or are part of an SSI. On the other hand, the overall conservation of these systems between corynebacteria and mycobacteria further validates corynebacteria as important model organisms to contribute to the understanding of the biology of slow-growing pathogenic mycobacteria.
Global differences between saprophytic and pathogenic corynebacteria
Based on our calculations and model, the presence in the corynebacteria sequenced to date of up to 762 genes identified in COG appears limited to the saprophytes (Table 3). In contrast, pathogenic corynebacteria display fewer candidate pathogen-specific genes (Table 3). For example, we calculated that the saprophytic corynebacteria sequenced to date exhibit approximately 30 % more transporters per number of ORFs in their genomes than the pathogenic organisms C. diphtheriae and C. jeikeium. This difference constitutes an indication of the relatively greater metabolic versatility of the former organisms. This observation also suggests a higher degree of transporter substrate specificity and a higher number of secondary carriers, since overall the substrate specificities of transport systems of an organism are correlated to its ecological niche and the diversity and relative concentrations of nutrients it encounters (Paulsen et al., 2000).
Likewise, among the genes potentially involved in host interactions, microbiofilm formation, DNA transfer and bacteriophage attachment, C. diphtheriae NCTC 13129 harbours 11 genes putatively encoding fimbrial proteins or fimbria-associated proteins, whereas C. glutamicum R only exhibits four such genes which are also present in C. efficiens (cgR2789, VIMSS301626; cgR2790, VIMSS301627; cgR2791, VIMSS301628; cgR2793, VIMSS301630, respectively). Notably, only one cluster of pili genes is found in saprophytic bacteria, whereas the genome of C. diphtheriae NCTC 13129 exhibits three such clusters (Gaspar & Ton-That, 2006). Similarly, C. diphtheriae NCTC 13129 and C. jeikeium K411 harbour, in comparison with C. glutamicum and C. efficiens, a relatively larger number of secreted or surface-anchored proteins, in relation to the ecological niche occupied by these species.
Notably, genes related to amino acid transport and metabolism are fewer in number in pathogenic corynebacteria than in the saprophytic ones (Table 3). In addition to horizontal DNA transfer that has occurred in the saprophytic corynebacteria, this difference has been ascribed to gene decay that has occurred during the course of the evolution of the former organisms (Nishio et al., 2004). Interestingly, this specialization is not the rule in Corynebacterineae, since for example the chromosome sequence of N. farcinica reveals that this organism includes many genes for virulence, drug resistance and secondary metabolism. Interestingly, analyses of paralogous protein families suggest that gene duplications have resulted in N. farcinica being able to survive not only in soil environments, but also in animal tissues (Ishikawa et al., 2004).
Functional differences between C. glutamicum R and ATCC 13032
A detailed comparison of the genomic sequence of C. glutamicum R with that of ATCC 13032 reveals that only 60 and 189 genes, respectively, are strain-specific. Proportionally, both of these strains encode the same number of genes per COG category (Table 3), with a few genes unique to either C. glutamicum strain R or ATCC 13032 when compared to the corynebacteria sequenced to date (Table 3). With the exception of sequences from mobile elements, the most striking differences in the number of genes are observed in amino acid transport and metabolism, and secondary metabolite transport and metabolism.
Genes unique to C. glutamicum R
A total of 9 % (263) of the predicted genes of C. glutamicum ATCC 13032 remain hypothetical or specific to C. glutamicum when compared to C. efficiens YS-314 and C. diphtheriae NCTC 13129 (Kalinowski et al., 2003). It has been suggested that sequencing more than one or two genomes per species is necessary to access bacterial pan-genomes (Tettelin et al., 2005). For example, the sequencing of eight strains of Streptococcus agalactiae was found to be sufficient to define the core genome of these organisms with 95 % confidence, whereas each new S. agalactiae genome sequence would reveal an extrapolated 33 new genes, with a 6×10⁻⁴ probability that this number falls to zero. The complete genome sequence of C. glutamicum R reveals 39 genes that to the best of our knowledge have not been previously identified in corynebacteria. In addition to the previously described β-glucoside phosphotransferase (cgR0436) that confers cellobiose utilization properties upon occurrence of a single amino acid substitution (Kotrba et al., 2003), and several mobile-element-derived ORFs (ISCgR3a, ISCgR5, ISCgR12b, ISCgR13a, ISCgR13b), C. glutamicum R notably encodes five novel conserved hypothetical proteins (cgR0052, cgR0067, cgR1134, cgR2375, cgR2798), four membrane proteins putatively involved in transport mechanisms (cgR0768, cgR2326, cgR2800, cgR2956), as well as four regulatory proteins (cgR0139, cgR0414, cgR1106, cgR2822) in addition to the two two-component systems cgR2292/cgR2299 and cgR0540/cgR0541 discussed previously. Interestingly, C. glutamicum R encodes several genes that reflect its ecological niche in soil, as exemplified by a putative tyramine oxidase gene (cgR0016) that could be involved in the degradation of phenethylamines, compounds present in natural environments, or by a putative L-asparaginase (cgR2808). As a result, to access the diversity of the metabolism of corynebacteria, a significant number of additional genome sequences would need to be determined.
| ACKNOWLEDGEMENTS |
|---|
Edited by: C. W. Chen
| REFERENCES |
|---|
Besemer, J., Lomsadze, A. & Borodovsky, M. (2001). GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions. Nucleic Acids Res 29, 2607–2618.
Blattner, F. R., Plunkett, G., 3rd, Bloch, C. A., Perna, N. T., Burland, V., Riley, M., Collado-Vides, J., Glasner, J. D., Rode, C. K. & other authors (1997). The complete genome sequence of Escherichia coli K-12. Science 277, 1453–1474.
Bonamy, C., Labarre, J., Reyes, O. & Leblon, G. (1994). Identification of IS1206, a Corynebacterium glutamicum IS3-related insertion sequence and phylogenetic analysis. Mol Microbiol 14, 571–581.