Microbiology
Microbiology 153 (2007), 1042-1058; DOI  10.1099/mic.0.2006/003657-0
© 2007 Society for General Microbiology

Comparative analysis of the Corynebacterium glutamicum group and complete genome sequence of strain R

Hideaki Yukawa1,2, Crispinus A. Omumasaba1, Hiroshi Nonaka1, Péter Kós1,†, Naoko Okai1, Nobuaki Suzuki1, Masako Suda1, Yota Tsuge1,2, Junko Watanabe1, Yoko Ikeda1, Alain A. Vertès1 and Masayuki Inui1

1 Microbiology Research Group, Research Institute of Innovative Technology for the Earth (RITE), Soraku, Kyoto 619-0292, Japan
2 Graduate School of Biological Sciences, Nara Institute of Science and Technology, Ikoma, Nara 630-0101, Japan

Correspondence
Hideaki Yukawa
mmg-lab@rite.or.jp


    ABSTRACT
The complete genome sequence of Corynebacterium glutamicum strain R was determined to allow its comparative analysis with other corynebacteria. The biology of corynebacteria was explored by refining the definition of the subset of genes that constitutes the corynebacterial core as well as those characteristic of saprophytic and pathogenic ecological niches. In addition, the relative scarcity of corynebacterial sigma factors and the plasticity of their two-component system machinery reflect their relatively exacting nutritional requirements and reduced membrane-associated and secreted proteins. The conservation of key genes and pathways between corynebacteria, mycobacteria and Nocardia validates the use of C. glutamicum to study fundamental processes that are conserved in slow-growing mycobacteria, including pathogenesis-associated mechanisms. The discovery of 39 novel genes in C. glutamicum R that have not been previously reported in other corynebacteria supports the rationale for sequencing additional corynebacterial genomes to better define the corynebacterial pan-genome and identify previously undetected metabolic pathways in these organisms.


Abbreviations: CDS, coding sequence(s); COG, clusters of orthologous groups; ECF, extracytoplasmic function; PTS, phosphotransferase system; SSI, strain-specific island

The DDBJ/EMBL/GenBank accession numbers for the complete sequence of the C. glutamicum R genome and its native episome PCgR1 are AP009044 and AP009045, respectively.

Tables of strains and plasmids, and of oligonucleotides and primers used in this study are available with the online version of this paper.

†Present address: Institute of Plant Biology, Hungarian Academy of Sciences, Szeged, Hungary.


    INTRODUCTION
Corynebacterium glutamicum is a fast-growing, aerobic, non-sporulating, non-motile, saprophytic, Gram-positive micro-organism that can be isolated from soil samples on the basis of its ability to secrete large amounts of glutamic acid under suitable conditions (Kumagai, 2000; Vertès et al., 2005). Corynebacteria belong to the order Actinomycetales of the eubacteria, which is characterized by a high G+C content and constitutes a different evolutionary line from that formed by low-G+C content micro-organisms such as bacilli or clostridia (Stackebrandt & Woese, 1981; Stackebrandt et al., 1997). The genus Corynebacterium is closely related to Mycobacterium and Nocardia, among other genera, which form the Corynebacterineae suborder (Liebl, 2005). The genus includes both aerobes and facultative anaerobes, is phenotypically very diverse and forms a monophyletic group that exhibits considerable phylogenetic depth (Liebl, 2005). It consists of 59 validly described species, of which two taxon groups and 35 species are medically relevant (von Graevenitz & Bernard, 2001). Non-medical corynebacteria are widely disseminated in nature and have been isolated from a number of environments other than soil, including dairy products, plant material, faeces and animal skin (Liebl, 2001). Except for phage-mediated transfer and a few conjugative plasmids, corynebacteria appear devoid of a natural competence system for exogenous DNA uptake (Vertès et al., 2005). C. glutamicum has a long history of use for the industrial production of various primary metabolites, including amino acids and nucleotides (Demain, 2000; Hermann, 2003). Moreover, its potential as a commodity chemicals producer (Inui et al., 2004) and for bioremediation applications (Shen et al., 2005) is the focus of increasing research efforts.

The complete genome sequences of two variants of C. glutamicum ATCC 13032 have been published (Ikeda & Nakagawa, 2003; Kalinowski et al., 2003), as have those of the closely related Corynebacterium efficiens YS-314 (Nishio et al., 2003), and of the two human pathogens Corynebacterium jeikeium K411 (Tauch et al., 2005) and Corynebacterium diphtheriae NCTC 13129 (Cerdeño-Tárraga et al., 2003). We report here the genomic sequence of C. glutamicum R, a strain with industrial potential (Inui et al., 2004) that was isolated from soil sampled in Japan.

The availability of these different genomic data allows the identification of the corynebacterial core genes and of those genes directly related to various ecological niches. In addition, comparative analysis of the sigma factors, secreted proteins, sugar metabolism and two-component systems present in C. glutamicum R enables further assessment of the industrial potential of this strain and its metabolic and regulatory specificities.


    METHODS
Plasmids, bacterial strains and culture conditions.
Strains and plasmids used in this study are shown in supplementary Table S1 (available with the online version of this paper). C. glutamicum R is a strain from our laboratory collection that was isolated in Japan from a meadow soil sample. Escherichia coli cells were grown at 37 °C in LB medium supplemented where necessary with 50 µg ml⁻¹ of either ampicillin (Ap) or chloramphenicol (Cm) (Sambrook et al., 1989). Unless otherwise stated, C. glutamicum and C. efficiens cells were grown in nutrient-rich A medium (Inui et al., 2004) at 33 °C for 48 h. For sugar utilization experiments, a corynebacterial cell starter culture grown aerobically until late exponential phase in A medium containing 40 g glucose l⁻¹ was used to inoculate BT minimum medium (Inui et al., 2004) containing 40 g l⁻¹ of the sugar being tested, using glucose as a positive control for growth.

DNA techniques.
Corynebacterial chromosomal DNA was isolated following standard methods (Sambrook et al., 1989) modified by using 4 mg lysozyme ml⁻¹ at 37 °C for 30 min. Cells were transformed as previously described (Vertès et al., 1993b) using E. coli JM110 plasmid DNA. E. coli plasmid DNA isolation and strain transformation were performed following standard methods (Sambrook et al., 1989). Restriction endonucleases, Klenow fragment and T4 DNA ligase were used as per the manufacturer's instructions (Takara). Restriction fragments were isolated from agarose gels with the GeneClean kit (Bio 101), according to the manufacturer's instructions. PCR was performed using Ex-Taq DNA polymerase (Takara). Prior to sequencing, exonuclease treatment was performed using ExoSAP-IT (USB), as per the manufacturer's instructions.

Library construction.
Random fragments resulting from sonication of C. glutamicum R chromosomal DNA were separated on agarose gel into one 2–3 kb pool and another 8–9 kb pool. The fragments were blunted and ligated into SmaI-digested pUC119. The ligation mixture was used to transform E. coli JM109 and recombinants were selected on IPTG-supplemented plates. Gaps between contigs were closed using a Lambda FIX II/XhoI replacement phage library with a mean insert size of 20 kb, as per the manufacturer's instructions (Stratagene). Fragments (1–2 kb) at the end of the assembled contigs were amplified by PCR from the chromosomal DNA of C. glutamicum R, labelled using a Gene Image Random Prime Labelling Module (GE Healthcare Bio-Sciences Corp.) and used as probes to screen the phage library by plaque hybridization. Several genomic DNA fragments extracted from the positive phage clones were sequenced after subcloning into vector pUC18 (Sambrook et al., 1989). For sequencing purposes, E. coli clones bearing C. glutamicum R chromosomal DNA fragments were grown overnight and the corresponding plasmids were isolated. Gaps in the assembled sequence were closed by PCR-mediated genome walking.

Genome sequencing.
Using the whole-genome shotgun method, libraries of 2–3, 8–9 and 20 kb genomic inserts were sequenced from both ends using M13 universal forward and reverse primers (Sambrook et al., 1989) and cycle-sequenced using the BigDye Terminator method in ABI 3700 CE and ABI 3730 DNA analysers (Applied Biosystems). The sequences were base-called and assembled using Phred, Phrap and Consed (Ewing et al., 1998; Gordon et al., 1998). The Pregap4 program of the Staden package (Bonfield et al., 1995; Staden, 1996) was used for clipping vector sequences, as well as for quality clipping and contamination screening after base-calling by Phred (Ewing & Green, 1998; Ewing et al., 1998). Gaps were closed by primer walking on gap-spanning plasmid clones and direct sequencing of PCR products. Repetitive sequences such as rDNA were confirmed by PCR. The error rate was lower than 2 bases per 10 kb as calculated using Consed (Gordon et al., 1998).
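As an illustration of how an error rate of this kind relates to base-call quality, the following minimal Python sketch applies the standard Phred relationship, Q = -10 log10(P). It is not a description of the calculation performed by Consed, which is not detailed here; the function names and the example quality values are hypothetical.

```python
import math


def phred_to_error_prob(q: float) -> float:
    """Standard Phred definition: Q = -10 * log10(P), hence P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def expected_errors_per_10kb(qualities) -> float:
    """Mean per-base error probability, scaled to 10,000 bases."""
    probs = [phred_to_error_prob(q) for q in qualities]
    return 10_000 * sum(probs) / len(probs)


def uniform_quality_for_rate(errors_per_10kb: float) -> float:
    """Uniform Phred quality at which the expected error rate equals the target."""
    return -10 * math.log10(errors_per_10kb / 10_000)


if __name__ == "__main__":
    # Hypothetical consensus qualities, for illustration only.
    print(expected_errors_per_10kb([40, 40, 50, 30]))
    print(uniform_quality_for_rate(2))
```

Under this relationship, a rate of 2 errors per 10 kb corresponds to a uniform consensus quality of roughly Q37.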

Gene prediction and analysis.
rRNAs were located by a BLASTN homology search against the 16S, 23S and 5S rRNA sequences of C. glutamicum ATCC 13032. tRNAs were predicted by tRNAscan-SE (Lowe & Eddy, 1997). Protein coding sequences (CDS) were predicted using Glimmer3 (Delcher et al., 1999a; Salzberg et al., 1998) and GeneMarkS (Besemer et al., 2001). Proteome prediction was performed by a BLASTP homology search using an e-value lower than 1×10⁻⁴ against GenBank release 152, GenPept release 152, UniProt release 4.6, the NCBI clusters of orthologous groups (COG) database (Tatusov et al., 1997) and the Pfam family database (Accelrys GCG Wisconsin package version 11.0). Numerous annotations were checked manually after the auto-annotations were performed. The search for repeats at each extremity of each of the 10 major strain-specific islands (SSIs) present in the genome of C. glutamicum R (Suzuki et al., 2005a) was performed using the EMBOSS package (Rice et al., 2000).

Identification of orthologous CDS and genomic islands.
All the CDS of C. glutamicum R, C. glutamicum ATCC 13032 (Ikeda & Nakagawa, 2003; Kalinowski et al., 2003), C. efficiens YS-314 (Nishio et al., 2003), C. diphtheriae NCTC 13129 (Cerdeño-Tárraga et al., 2003) and C. jeikeium K411 (Tauch et al., 2005) were compared to each other in a reciprocal manner using BLASTP with an e-value of 1×10⁻⁴ as a cut-off. For each pairwise strain comparison, the gene showing the highest similarity level was automatically parsed for every CDS using an original Perl script. CDS with a reciprocal best hit were defined as being orthologous CDS of the two strains.
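The reciprocal best hit criterion can be sketched as follows. This is a minimal Python illustration, not the original Perl script. It assumes that BLASTP results have already been parsed into (query, subject, e-value, bit score) tuples, that hits at or below the cut-off are retained, and that the best hit is the one with the highest bit score. All names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# (query CDS, subject CDS, e-value, bit score); an assumed pre-parsed format.
Hit = Tuple[str, str, float, float]


def best_hits(hits: Iterable[Hit], max_evalue: float = 1e-4) -> Dict[str, str]:
    """Return the highest-scoring subject for each query that passes the cut-off."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, evalue, bitscore in hits:
        if evalue > max_evalue:
            continue  # discard hits weaker than the e-value cut-off
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(
    a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit], max_evalue: float = 1e-4
) -> List[Tuple[str, str]]:
    """Pairs (a, b) where b is the best hit of a and a is the best hit of b."""
    ab = best_hits(a_vs_b, max_evalue)
    ba = best_hits(b_vs_a, max_evalue)
    return sorted((a, b) for a, b in ab.items() if ba.get(b) == a)


if __name__ == "__main__":
    # Toy hits with invented identifiers.
    a_vs_b = [("a1", "b1", 1e-50, 200.0), ("a2", "b2", 1e-30, 120.0)]
    b_vs_a = [("b1", "a1", 1e-50, 200.0), ("b2", "a1", 1e-10, 60.0)]
    print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

In the toy data, a2 has b2 as its best hit, but b2 points back to a1, so only the pair (a1, b1) is reported as orthologous.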

Identification of genomic DNA islands in the C. glutamicum R and C. glutamicum ATCC 13032 genomes was performed using MUMmer 2.1 (Delcher et al., 1999b, 2002).

Transposon, deletion, and gene disruption and replacement mutagenesis.
We used a combination of Tn5 and Tn31831 mutagenesis systems to assemble a library of 2300 different transposon mutants (Suzuki et al., 2006; Vertès et al., 2005), and the Cre-loxP system (Suzuki et al., 2005b) to generate deletion mutants. Mutations were verified by PCR using the oligonucleotides and nucleotide primers shown in Supplementary Table S2 (available with the online version of this paper). Gene disruption and replacement mutagenesis were performed as described previously (Vertès et al., 1993a) using primers indicated in Supplementary Table S2.

Gene identification numbers.
Gene identification numbers are from the Virtual Institute of Microbial Stress and Survival (VIMSS) database (Alm et al., 2005).


    RESULTS AND DISCUSSION
General features of C. glutamicum R
The genetic basis of C. glutamicum R consists of a 49 120 bp native episome (PCgR1) and one circular chromosome of 3 314 179 bp encoding 2990 ORFs with a mean length of 957 bp (Table 1 and Fig. 1). The G+C content of the chromosome is 54.1 mol% overall, 55.2 mol% for the protein coding regions and 47.4 mol% for the non-coding regions. PCgR1 exhibits a G+C content of 53.9 mol% that is very similar to that of the chromosome, suggesting that its acquisition by strain R is not a recent event unless it was acquired by horizontal transfer from an organism with similar G+C content. It encodes 28 putative proteins, 647 bp long on average, and has a relatively low coding density (36.9 %). It does not appear to be similar to any episome previously identified in corynebacteria. The G+C content of the protein coding regions of PCgR1 is 55.4 mol% on average and that of the non-coding regions, 53.1 mol%.
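The coding density quoted for PCgR1 can be checked from the figures given above. The short Python sketch below assumes that coding density is the number of ORFs multiplied by their mean length, divided by the replicon length, with gene overlaps ignored. The chromosomal value it prints is only an estimate derived from the rounded mean ORF length, not a figure taken from Table 1.

```python
def coding_density(n_orfs: int, mean_orf_length_bp: float, replicon_length_bp: int) -> float:
    """Percentage of the replicon covered by ORFs, ignoring any overlaps."""
    return 100.0 * n_orfs * mean_orf_length_bp / replicon_length_bp


# Values quoted in the text for the episome PCgR1 and the chromosome.
plasmid = coding_density(28, 647, 49_120)
chromosome = coding_density(2990, 957, 3_314_179)

if __name__ == "__main__":
    print(f"PCgR1: {plasmid:.1f} %")
    print(f"Chromosome (estimate): {chromosome:.1f} %")
```

For PCgR1, 28 × 647 bp = 18 116 bp of coding sequence over 49 120 bp, which reproduces the stated 36.9 %.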



 
Table 1. General features of C. glutamicum R and related bacteria

Abbreviations: Cg R, C. glutamicum R; Cg K, Kitasato University C. glutamicum ATCC 13032 isolate; Cg B, Bielefeld University C. glutamicum ATCC 13032 isolate; Ce, C. efficiens; Cj, C. jeikeium; Ms, Mycobacterium smegmatis; Mt, Mycobacterium tuberculosis; Nf, Nocardia farcinica. The M. smegmatis sequence was obtained from The Institute for Genomic Research (TIGR).

 


 
Fig. 1. Circular representation of the genome of C. glutamicum R. The first base of the initiation codon of the dnaA gene was set as the origin of the coordinates. Coordinates are given in kb. The two outermost circles represent the predicted CDS on the forward (maroon) strand on the outside and reverse (green) strand on the inside. The third circle from the outside shows the location of SSIs greater than 10 kb (SSI-1–SSI-11) along the genome. The localization of insertion sequences (ISCgR1–ISCgR13) and phage-derived proteins (PPCgR1–PPCgR8) is depicted. The colour of the labelled genes corresponds to the colour of the strand on which they are located. The fourth and fifth circles represent the G+C content and the GC skew (G–C)/(G+C), respectively, each plotted using a 3000 bp window with a 1000 bp window overlap. Red regions (pointing outwards) in these two circles are those of high G+C content or high GC skew, whereas blue regions (pointing inwards) are those of low G+C content or low GC skew. The map was created using original scripts and the CGView software (Stothard & Wishart, 2005).
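The G+C content and GC skew profiles of the kind plotted in Fig. 1 can be sketched in Python as below. This is not the original script. It assumes that a 3000 bp window with a 1000 bp overlap means a step of 2000 bp between window starts, and for simplicity it treats the sequence as linear rather than wrapping around the circular chromosome.

```python
from typing import List, Tuple


def gc_profile(seq: str, window: int = 3000, overlap: int = 1000) -> List[Tuple[int, float, float]]:
    """Return (window start, G+C content in %, GC skew) for each full window.

    GC skew is (G - C) / (G + C), as defined in the figure legend.
    """
    step = window - overlap  # assumed reading of 'window overlap'
    if step <= 0:
        raise ValueError("overlap must be smaller than window")
    seq = seq.upper()
    rows = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        g, c = chunk.count("G"), chunk.count("C")
        gc_content = 100.0 * (g + c) / window
        skew = (g - c) / (g + c) if (g + c) else 0.0  # avoid division by zero
        rows.append((start, gc_content, skew))
    return rows


if __name__ == "__main__":
    # Toy sequence with small windows, for illustration only.
    for row in gc_profile("GGGGCCCC", window=4, overlap=2):
        print(row)
```

In a bidirectionally replicating chromosome, the sign of the skew is expected to switch near the origin and terminus of replication, which is the pattern referred to in the GC skew analysis.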

 
The size of the C. glutamicum R genome is similar to that of the other saprophytic corynebacteria that have been sequenced to date, and significantly larger than that of the genomes of the pathogenic organisms C. jeikeium and C. diphtheriae (Table 1). The lower number of genes observed in the clinical isolates sequenced to date perhaps reflects that fewer metabolic functions are needed for corynebacteria to occupy a clinical ecological niche rather than a soil-based environment, consistent with the observation that gene decay and gene reduction have played a central role in the evolution of the Mycobacterium leprae (Cole et al., 2001) and C. diphtheriae (Nishio et al., 2004) chromosomes. These various corynebacterial complete genome sequences confirm the observation that, of the organisms forming the Corynebacterineae phylogenetic cluster, corynebacteria have both the smallest genomes and the lowest coding densities (Table 1), although this difference is perhaps an artifact as gene finding in genomes of high G+C content tends to predict more or extended coding regions due to lower frequencies of stop codons.

The genome of strain R encodes 6 rRNA operons and 57 tRNA genes. The first base of the initiation codon of the dnaA gene was designated as the sequence coordinate origin. It is located near characteristic replication regions. The GC skew analysis (Grigoriev, 1998) presented in Fig. 1 supports the view that DNA replication in C. glutamicum is bidirectional, as observed in strain ATCC 13032 (Kalinowski et al., 2003). It is noteworthy that the G+C content variation is low in this genome. Nevertheless, a comparison of the genome of C. glutamicum strain R with the published genome sequences of two isolates of C. glutamicum ATCC 13032 (Ikeda & Nakagawa, 2003; Kalinowski et al., 2003) revealed the presence of hundreds of SSIs (Suzuki et al., 2005b). In particular, strain R exhibits 11 SSIs larger than 10 kb (Fig. 1) that have a G+C content ranging from 45.7 to 60.7 mol% (Suzuki et al., 2005a). Notably, none of these was observed to be flanked by obvious repeats. However, it is noteworthy that strain R is devoid of the AT-rich 211 kb genomic island with clear boundaries (Zhang & Zhang, 2004) that is carried by strain ATCC 13032 and that contains genes typically associated with horizontal transfer, such as a restriction-modification system, transposases, recombination enzymes and phage-derived sequences (Kalinowski et al., 2003). Likewise, the AT-rich 25 kb region harboured by strain ATCC 13032 containing the genes cg0414–cg0440 is absent from the strain R genome. It is noteworthy that products of these genes are involved in cell wall formation, including cell surface polysaccharide (wzz, cg0414), lipopolysaccharide (various glycosyl transferases) or murein formation (murA, cg0422; murB, cg0423). These observations suggest that measurable differences could exist between the cell walls of the two strains that could perhaps form the basis of immunotyping procedures. Furthermore, the 14 kb G+C-rich region identified in the genome of strain ATCC 13032 (Kalinowski et al., 2003) to contain C. diphtheriae sequences (95 % identity at the nucleotide level) is absent from the genome of strain R, which does not exhibit any sequence similar to the gene cluster cg3276–cg3290. This observation promotes the view that these genes have, on an evolutionary timescale, recently been acquired by C. glutamicum ATCC 13032 in a horizontal transfer event originating from C. diphtheriae. On the other hand, it is noteworthy that the strain R SSI-4, characterized by a low G+C content (45.7 mol%) (Suzuki et al., 2005a), and thus probably resulting from a horizontal transfer, does not contain any obvious phage- or mobile-element-related sequence.

Strain R encodes 22 sequences homologous to mobile elements, including six incomplete insertion sequence signatures, and five insertion sequences that contain two ORFs which putatively form a full transposase protein via a frameshift (ISCgR3a-b, ISCgR10a-b, ISCgR11a-b, ISCgR12a-b, ISCgR13a-b) (Table 2). The presumably functional mobile elements found in C. glutamicum R originate from two different families (IS3, ISCgR11; IS6, ISCgR1, ISCgR2, ISCgR9), as defined by Mahillon & Chandler (1998). ISCgR9 is an isoform of IS1628 (Mahillon & Chandler, 1998). ISCgR3 and ISCgR13 constitute novel elements that share a relatively low level of identity at the amino acid level with a putative Rhodococcus erythropolis insertion sequence (Stecker et al., 2003). Similar to what is observed in strain ATCC 13032, SSIs in strain R are rich in mobile elements, particularly SSI-3 (G+C content, 60.7 mol%) and SSI-5 (55.2 mol%) (Table 2). On the other hand, fewer genes of phage origin are present in the genome of C. glutamicum R as compared to C. glutamicum ATCC 13032 since only phage remnants are observed (Table 2). Nevertheless, SSI-8, 42.7 kb in size and with a G+C content of 52.1 mol% (Suzuki et al., 2005a), could have originated from a previously unknown phage as it encodes numerous hypothetical proteins and several ORFs, the products of which share some homology to known phage proteins. This is in sharp contrast to what is observed in C. glutamicum ATCC 13032 isolates that harbour three to four different putative prophages (Kalinowski, 2005).



 
Table 2. Insertion sequences and phage-derived sequences in the C. glutamicum R genome

The sizes of ORFs are given in bp. The direction of transcription is given relative to the dnaA gene (+, clockwise; –, anticlockwise). Identity levels are calculated based on putative amino acid sequences. Homologous genes: IS1673 is from the C. glutamicum plasmid pCG4 (Tauch et al., 2003), IS1870 is from the C. glutamicum pTET3 plasmid (Tauch et al., 2002), PBD2.162 and PBD2.163 are from the Rhodococcus erythropolis linear plasmid pBD2 (Stecker et al., 2003), IS30 is from E. coli K-12 (Blattner et al., 1997), IS6110 is from Mycobacterium avium (Li et al., 2005), IS1628 is from the C. glutamicum plasmid pCG4 (Tauch et al., 2003), IS1206 [isoform ISCg14 (Kalinowski et al., 2003)] is from the C. glutamicum chromosome (Bonamy et al., 1994). Putative tnp, putative transposase gene; IR, inverted repeats; NF, not found. SSI-3 has a G+C content of 60.7 mol%; SSI-5, 55.2 mol%; SSI-8, 52.1 mol%; and SSI-10, 53.0 mol% (Suzuki et al., 2005a).

 
Chromosome structure
As previously observed (Kalinowski et al., 2003; Nakamura et al., 2003), corynebacterial genomes, including that of strain R, show a very high degree of synteny with a striking lack of detectable inversions (not shown). The C. jeikeium genome, however, exhibits 10 apparent breakpoints of synteny with C. glutamicum ATCC 13032 (Tauch et al., 2005). Likewise, the genomes of the relatively closely related mycobacteria show that extensive rearrangements have occurred in these species throughout the course of evolution. This phenomenon has been ascribed to the lack of a complete RecBCD recombination repair system in corynebacteria (Nakamura et al., 2003), despite the presence of recA and recB genes. Nevertheless, as demonstrated by the rearrangements that have occurred in the genome of C. jeikeium, other recombination mechanisms may be at play that allow moderate reorganizations in the chromosome architecture.

The C. glutamicum R and ATCC 13032 genomes contain similar numbers of putative operons in concordance with the numbers of rho-independent transcription terminators containing a stem–loop and poly-U tail. Using the TIGR Comprehensive Microbial Resource (Ermolaeva et al., 2001), we calculated that C. glutamicum ATCC 13032 contains 455 operons and 485 rho-independent terminators. Notably, cgR1278, encoding the transcription termination factor rho, is probably an essential gene in C. glutamicum R (Suzuki et al., 2006).

Corynebacterial core genes
The corynebacterial backbone, defined by the set of orthologous genes present in all corynebacteria sequenced to date [C. glutamicum ATCC 13032 (Ikeda & Nakagawa, 2003; Kalinowski et al., 2003), C. glutamicum R (this work), C. efficiens YS-314 (Nishio et al., 2003), C. jeikeium K411 (Tauch et al., 2005) and C. diphtheriae NCTC 13129 (Cerdeño-Tárraga et al., 2003)], comprises 835 genes assigned to COG categories (Table 3), with the total number of shared genes corresponding to approximately a third of the genes present in the saprophytic corynebacteria, in agreement with the value reported by other authors (Tauch et al., 2005) (1089 genes). However, the subset of genes that are essential is significantly smaller than the core corynebacterial genome, as demonstrated by the successful transposon mutagenesis of at least 75 % of the ORFs of strain R using transposons Tn5 and Tn31831 (Suzuki et al., 2006; Vertès et al., 2005). It has been demonstrated that the genes that are indispensable for growth of Bacillus subtilis cells in rich medium under standard laboratory conditions belong to the following known categories: DNA and RNA metabolism, protein synthesis, cell envelope, cell shape and division, glycolysis, respiratory pathways, nucleotides and cofactors (Kobayashi et al., 2003). Based on the genomic sequences of C. glutamicum R (this work) and of other corynebacteria (Cerdeño-Tárraga et al., 2003; Ikeda & Nakagawa, 2003; Kalinowski et al., 2003; Nishio et al., 2003; Tauch et al., 2005), these fundamental cellular processes are also essentially conserved in corynebacterial genomes.
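Given pairwise orthologue assignments, the partition of a reference gene set into core and niche-specific subsets reduces to simple set logic. The Python sketch below is illustrative only: it assumes that, for each comparison genome, the set of reference genes with an orthologue in that genome is already known, and the genome labels and gene identifiers are invented.

```python
from typing import Dict, Iterable, Set, Tuple


def classify_genes(
    reference_genes: Iterable[str],
    orthologues: Dict[str, Set[str]],
    saprophytes: Iterable[str],
    pathogens: Iterable[str],
) -> Tuple[Set[str], Set[str]]:
    """Split reference genes into a core set and a saprophyte-only set.

    orthologues maps a genome label to the reference genes that have an
    orthologue in that genome.
    """
    saprophytes, pathogens = list(saprophytes), list(pathogens)
    core, saprophyte_only = set(), set()
    for gene in reference_genes:
        in_all_saprophytes = all(gene in orthologues[g] for g in saprophytes)
        in_pathogens = [gene in orthologues[g] for g in pathogens]
        if in_all_saprophytes and all(in_pathogens):
            core.add(gene)  # orthologue present in every genome compared
        elif in_all_saprophytes and not any(in_pathogens):
            saprophyte_only.add(gene)  # shared by saprophytes, absent from pathogens
    return core, saprophyte_only


if __name__ == "__main__":
    # Invented toy data, for illustration only.
    orthologues = {"Cg13032": {"g1", "g2"}, "Ce": {"g1", "g2"}, "Cd": {"g1"}, "Cj": {"g1"}}
    print(classify_genes({"g1", "g2", "g3"}, orthologues, ["Cg13032", "Ce"], ["Cd", "Cj"]))
```

Genes that fall into neither set (for example, those shared with only one pathogen) would need further categories of the kind listed in Table 3.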



 
Table 3. Classification of C. glutamicum R CDS by COG

ORFs were analysed by BLASTP for each of the genomes of the corynebacteria sequenced to date against the NCBI COG database. Results were parsed for hits with an e-value cutoff of 1×10⁻²⁵. Proteins containing multiple functional domains were forced into only one COG category by elimination of duplicates. In accordance with their COG identities, the resultant hits were grouped into the following categories: Cg R, total hits of C. glutamicum R; Core, hits with orthologues in all the corynebacteria sequenced to date; Sap, hits with orthologues only in the saprophytic corynebacteria sequenced to date; Cglut, hits with orthologues only in both C. glutamicum R and the Kitasato University C. glutamicum ATCC 13032 isolate; R spec, non-core, non-C. glutamicum and non-saprophytic hits of strain R; Cg K, C. glutamicum ATCC 13032 Kitasato (Ikeda & Nakagawa, 2003); K spec, non-core, non-C. glutamicum and non-saprophytic hits of ATCC 13032 Kitasato strain; Ce, total hits of C. efficiens YS-314; Ce spec, non-core and non-saprophytic hits of C. efficiens; Path, non-core genes present in both of the pathogenic corynebacteria sequenced to date; Cd, total hits of C. diphtheriae NCTC 13129; Cd spec, non-core and non-pathogen-specific hits of C. diphtheriae; Cj, total hits of C. jeikeium K411; Cj spec, non-core and non-pathogen-specific hits of C. jeikeium. Genes that could not be assigned to a COG category are not included in the Table.
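The tallying step described in this legend can be sketched as follows. The legend does not state how duplicate assignments were eliminated, so this Python illustration assumes that each protein keeps only its hit with the lowest e-value. The input format and names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# (protein, COG functional category, e-value); an assumed pre-parsed format.
CogHit = Tuple[str, str, float]


def cog_category_counts(hits: Iterable[CogHit], max_evalue: float = 1e-25) -> Counter:
    """Count proteins per COG category, one category per protein."""
    best: Dict[str, Tuple[str, float]] = {}
    for protein, category, evalue in hits:
        if evalue > max_evalue:
            continue  # hit does not pass the e-value cut-off
        if protein not in best or evalue < best[protein][1]:
            best[protein] = (category, evalue)  # assumed rule: keep the strongest hit
    return Counter(category for category, _ in best.values())


if __name__ == "__main__":
    # Invented toy hits, for illustration only.
    hits = [("p1", "E", 1e-80), ("p1", "G", 1e-40), ("p2", "G", 1e-30), ("p3", "K", 1e-10)]
    print(cog_category_counts(hits))
```

In the toy data, the multi-domain protein p1 is counted once under category E, and p3 is excluded because its only hit fails the cut-off.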

 
Sigma factors
Ecological niches in soil are characterized by the availability of numerous growth substrates. The pan-genome of saprophytic organisms contains numerous enzymic activities necessary for the breakdown of a large variety of complex molecules. While saprophytic fungi play a predominant role in the recycling of organic matter, numerous enzymes are secreted by bacterial saprophytes to hydrolyse polysaccharides, proteins and lipids. Likewise, saprophytic organisms need to deploy a variety of adaptive responses to cope with various environmental stresses that can range, for a soil bacterium, from nutrient limitation, external osmolality fluctuations and oxygen deprivation to temperature shock. These responses are typically regulated via specialized sigma factors of the extracytoplasmic function (ECF) subfamily (Missiakas & Raina, 1998). The saprophytic organisms Mycobacterium smegmatis (Waagmeester et al., 2005) and Streptomyces avermitilis (Ikeda et al., 2003) exhibit 26 and 60 putative sigma factors, respectively. In the case of S. avermitilis, which belongs to a different suborder of the Actinomycetales from M. smegmatis and C. glutamicum, 47 of these belong to the ECF subfamily. On the other hand, C. glutamicum R harbours fewer of these components of the RNA polymerase complex. As shown in Table 4, in addition to σA, the sigma factor that directs the transcription of most genes in growing cells, and σB, which plays a crucial role in the maintenance of the stationary phase in M. smegmatis (Mukherjee & Chatterji, 2005), corynebacteria possess five alternative sigma factors to regulate genetic expression in response to extracellular changes. C. glutamicum R and C. glutamicum ATCC 13032 exhibit the same number of these ECF sigma factors to regulate extracytoplasmic functions, with the exception of σD which is disrupted by a spacer region in C. glutamicum R. In M. tuberculosis H37Rv, σD has been shown to control the expression of ribosome-associated gene products in the stationary phase and to be required for full virulence (Calamita et al., 2005). Similarly, σH has been observed to regulate major components of oxidative and heat stress responses (Raman et al., 2001), and σL, an orthologue of which is present in C. diphtheriae NCTC 13129 but not in the other corynebacteria sequenced to date, has been observed to regulate polyketide synthases and secreted or membrane proteins (Hahn et al., 2005). In C. jeikeium K411 and C. diphtheriae NCTC 13129, sigH is located upstream of rshA, a putative anti-σH factor, with which it forms a putative operon, as observed in Nocardia farcinica IFM 10152 (Ishikawa et al., 2004). All corynebacteria sequenced to date contain a sigC and a sigE gene. Interestingly, σC has been shown to be required for lethality of M. tuberculosis in mice (Sun et al., 2004), and σE is induced upon treatment of M. tuberculosis cells with hydrogen peroxide or upon macrophage infection (Jensen-Cain & Quinn, 2001). In contrast to what is observed in mycobacteria, where the saprophytic M. smegmatis harbours twice as many sigma factors (26 sigma factors) as its pathogenic relative M. tuberculosis (13) (Waagmeester et al., 2005), no sigma factor specific to saprophytic corynebacteria could be identified by homology searches, despite the presence of pathogenic-specific sigma factors in C. diphtheriae NCTC 13129 (σK, σL) and C. jeikeium K411 (σK, σL, σW). This latter observation particularly reinforces the view that σK is involved in bacterial virulence, as a σK orthologue is present in M. tuberculosis H37Rv but absent from M. smegmatis MC2 (Waagmeester et al., 2005). Consistent with these predicted functions, we could isolate C. glutamicum R cells mutated by transposon insertion in any of the genes sigH, sigE, sigB, sigM, or pvdS1 and pvdS2 (these latter genes encode two conserved hypothetical proteins that show limited homology with sigma factors, though they are unlikely to encode sigma factors), but not in sigC or sigA. Notably, sigH and sigE are also dispensable in C. glutamicum ATCC 13032 (Engels et al., 2004).



 
Table 4. Putative sigma factors of corynebacteria

Putative sigma factors were identified by BLASTP searches against the GenBank and VIMSS databases. Gene numbers for Cg K, Ce, Cd and Cj refer to the VIMSS database (Alm et al., 2005). Cg R, C. glutamicum strain R; Cg K, C. glutamicum ATCC 13032 Kitasato (Ikeda & Nakagawa, 2003); Ce, C. efficiens YS-314 (Nishio et al., 2003); Cd, C. diphtheriae NCTC 13129 (Cerdeño-Tárraga et al., 2003); Cj, C. jeikeium K411 (Tauch et al., 2005). Percentage identity scores (Id) were measured against the corresponding C. glutamicum R genes. The designation of the sigma factors is consistent with that used for M. tuberculosis (Waagmeester et al., 2005). The genes listed as pvdS1 and pvdS2 are conserved hypothetical proteins that show a limited homology to sigma factors. In C. glutamicum R, the gene encoding σD is interrupted by a spacer region. Sizes of genes refer to the number of base pairs.

 
Secreted proteins
The relative paucity of sigma factor-mediated adaptive mechanisms in corynebacteria reflects the exacting nutrient requirements typically exhibited by these organisms and their relative limitation in extracellular enzymes to digest complex molecules present in their environments. Corynebacteria excrete a limited number of proteins, as demonstrated by the identification of only approximately 40 protein spots in the 4.0 to 5.0 pI range during proteome analysis experiments of supernatants of late-exponential growth phase cultures (Hermann et al., 2001), and by the identification of 49 cell-surface protein spots and 89 extracellular protein spots in the 3 to 7 pI range from proteome fractions of C. efficiens YS-314 (Hansmeier et al., 2006). In particular, it has long been known that C. glutamicum cell extracts and supernatants do not have broad-spectrum proteolytic activity, as they show only limited extracellular protease activity on skim milk plates. Likewise, these bacteria exhibit limited extracellular lipolytic and nuclease activity, and no cellulase or amylase activity (Yukawa et al., 2007). The large number of transporters found in these bacteria to ensure the uptake of amino acids and peptides (Winnen et al., 2005) can perhaps be ascribed to the limited capability of these organisms to break down complex molecules. Nevertheless, mining of the C. glutamicum R genome reveals the presence of gene cgR1176, which encodes a putative secreted protease that is 29.6 % identical to the product of the B. subtilis epr gene, an extracellular serine protease. Based on in silico homology searches, orthologues of cgR1176 appear to be common in members of the Actinobacteria group. Similarly, cgR1002, a probable pullulanase gene, which is part of a putative three-gene operon in C. glutamicum R, can also be observed in the genome of C. glutamicum ATCC 13032 (VIMSS374909) and in C. efficiens YS-314 as part of a putative four-gene operon, but not in C. diphtheriae NCTC 13129 or C. jeikeium K411. Interestingly, this operon also encodes two hypothetical membrane proteins which are conserved in both saprophytic and pathogenic corynebacteria (cgR1000, cgR1001). In C. efficiens, this operon is predicted (VIMSS database) to be controlled by CE2422, a transcription regulator of the GntR family of proteins. Homologues of CE2422 are present in both C. glutamicum R (cgR2434) and ATCC 13032 (VIMSS376496), and in various Streptomyces species, but not in the pathogenic corynebacteria sequenced to date nor in mycobacteria.

Sugar metabolism
Corynebacteria are able to utilize only a limited array of sugars. For example, wild-type C. glutamicum R cells are able to utilize fructose, glucose, glucuronic acid, glucosamine, maltose, mannose, α-methylglucoside, ribose, sucrose, trehalose, arbutin and salicin as sole carbon sources, but not arabinose, galactose, lactose, cellobiose, mannitol, rhamnose, xylose or xylitol (this work). C. glutamicum transports several sugars, including sucrose, fructose and glucose, by the phosphotransferase system (PTS) (Moon et al., 2005; Saier, 2002). Sugar transport plays an important role in the observed substrate range limitation of C. glutamicum, as exemplified by the isolation of a spontaneous PTS mutant of C. glutamicum R in which the catabolic capabilities of this organism are extended to the degradation of cellobiose (Kotrba et al., 2003). However, this limitation in carbon substrate spectrum is also due to the lack of specific catabolic genes, such as the lack of xylose isomerase, which prevents the catabolism of the pentose xylose despite the presence of ATP-binding cassette proteins putatively involved in the transport of this sugar. Specifically, the protein encoded by cgR1331, which is part of a five-gene operon containing a lacI-type transcriptional regulator, shows 48 and 44 % homology, respectively, to the xylF genes of Geobacillus kaustophilus HTA426 and E. coli K-12, which encode the periplasmic xylose-binding subunit of a high-affinity xylose ABC transporter; moreover, synteny comparisons suggest the presence also of the xylG (cgR1329) and xylH (cgR1330) genes encoding, respectively, the ATP-binding and membrane components of a xylose ABC transporter. In addition, several sugar/proton symporters could also be involved in the uptake of xylose by wild-type C. glutamicum cells, since cgR0261, cgR2943, cgR2864, cgR2290 and cgR2267 all show homology levels greater than 40 % with E. coli xylE, a gene that encodes a D-xylose/proton symporter from the major facilitator superfamily of transporters.

Notably, all corynebacteria sequenced to date, including C. glutamicum R (cgR1200) and ATCC 13032 (VIMSS376610), as well as C. diphtheriae NCTC 13129 (VIMSS520668) and C. jeikeium K411 (VIMSS844316), exhibit a putative beta-fructofuranosidase gene encoding a protein that is 49 % similar (34 % identical) to the fruA gene product of Bacillus megaterium ATCC 14581 (Chiou et al., 2002). The capacity of corynebacteria, and particularly pathogenic corynebacteria, to synthesize fructans would thus be interesting to verify, since fructans can trigger inflammatory reactions (Shilo & Wolman, 1958) and thus contribute to disease progression. However, only C. glutamicum R (cgR2548) and C. glutamicum ATCC 13032 (VIMSS376610) exhibit a putative sacA gene encoding a second beta-fructofuranosidase. The C. glutamicum sacA gene product shares 48 % homology with the B. subtilis 168 SacA (Glaser et al., 1993) and is only 26 % identical to the product of the putative C. glutamicum fruA gene. In both C. glutamicum R and ATCC 13032, sacA is part of a putative four-gene operon that includes a phosphotransferase system component (PTS enzyme IIC) (respectively, cgR2547 and VIMSS376609). Both fruA and sacA are dispensable, as demonstrated by the disruption and replacement of these genes in C. glutamicum R (this work).

Furthermore, consistent with the notion that corynebacteria use glycogen as their major polyglucan reserve, glycogen metabolism genes are highly conserved among these bacteria. In particular, corynebacterial genomes exhibit sequences homologous to a glycosyl transferase (cgR1201) linked in a putative operon to glgC, the gene encoding ADP-glucose pyrophosphorylase (strain R, cgR1202; strain ATCC 13032, VIMSS375129). Also observed are a putative two-gene operon, glgB, encoding glycosyl transferase (cgR1302-cgR1303), and glgX, encoding a glycogen debranching enzyme (cgR1991) that forms a putative operon with a regulatory protein of the TetR family (cgR1990). Likewise, all corynebacterial trehalose metabolic genes identified previously (Tzvetkov et al., 2003) (treS, cgR2175; treY, cgR2002; treZ, cgR2009; otsA, cgR2531; otsB, cgR2533) are present in C. glutamicum R. Therefore, it is likely that in this bacterium all three trehalose metabolic pathways found in bacteria are enabled [TreS pathway from maltose, TreY/TreZ pathway from α(1–4)glucose polymers, OtsA/OtsB pathway from glucose 6-phosphate and UDP-glucose]. This observation reinforces the view that this non-reducing sugar plays a major physiological role in Actinobacteria as energy storage and as an environmental protectant against various stresses such as low water activity (desiccation, dehydration), external osmolality fluctuations, heat, cold and oxidation. It is also noteworthy that the genes otsA and otsB are part of a putative five-gene operon which is adjacent to a transcriptional regulator of the lacI family (cgR2534) located downstream and in the opposite orientation.

Two-component systems
Sensing of environmental conditions to ensure variability and adaptability to environmental changes is enabled by two-component systems linked to signal transduction cascades that function via phosphorelays (Taylor & Zhulin, 1999). Basic two-component systems comprise a membrane-associated sensor kinase that detects external stimuli and a transcriptional regulator that acts upon the cellular machinery to bring about the necessary adaptive changes (Fontan et al., 2004). The genes encoding these two proteins are typically organized in two-gene operons. A similar organization is observed in the genome of C. glutamicum R where 27 ORFs are present that share a high level of homology with two-component system genes (Table 5). One orphan regulator (cgR0730), which is dispensable as demonstrated by the absence of phenotypic change upon its deletion (this work), and 13 putative two-component systems organized in two-gene operons encoding a sensor kinase and a regulator, have been identified. Despite being absent from the genome of C. glutamicum R, narQ, the putative cognate kinase gene of cgR0730, is present in C. glutamicum ATCC 13032, C. efficiens YS-314 and C. diphtheriae NCTC 13129. On the other hand, orthologues of senX3/regX3, mtrA/mtrB, mprB/mprA and phoR/phoP are present in all the corynebacteria and mycobacteria sequenced to date (Table 5).


Table 5. Putative two-component systems of corynebacteria

Putative two-component system genes were identified by BLASTP searches against the GenBank and VIMSS databases. Gene numbers for Cg K, Ce, Cd, Cj, Mtb and Nf refer to the VIMSS database, except CE3P006 and CE3P005, which refer to the GenBank database, and gene numbers for Ms, which refer to the TIGR-CMR database. Cg R, C. glutamicum strain R; Cg K, C. glutamicum ATCC 13032 Kitasato (Ikeda & Nakagawa, 2003); Ce, C. efficiens YS-314 (Nishio et al., 2003); Cd, C. diphtheriae NCTC 13129 (Cerdeño-Tárraga et al., 2003); Cj, C. jeikeium K411 (Tauch et al., 2005); Ms, M. smegmatis MC2 (TIGR-CMR data); Mtb, M. tuberculosis H37Rv (Cole et al., 1998); Nf, N. farcinica IFM 10152 (Ishikawa et al., 2004). Percentage identity scores (Id) were measured against the corresponding C. glutamicum R genes, or, when no orthologue is present in C. glutamicum R, to a reference gene indicated by the annotation REF. Only genes with identity levels greater than or equal to 35 % are indicated, except for genes for which a possible significant link could be identified by identity, homology or synteny, for example the citA/citB genes of N. farcinica. The G+C content (mol%) is given only for the C. glutamicum R genes. Sizes of gene products refer to the number of amino acids. Fnct, Predicted physiological function; S, sensor kinase; R, transcriptional regulator. Known orthologous genes with the highest identity scores are given in bold type. By homology with previously described two-component systems, these genes are as follows: yycF/yycG (CgR0122 is 63 % homologous to the yycF gene product of Bacillus subtilis; Fukuchi et al., 2000); ykoG/ykoH (CgR0360 is 50 % homologous to the ykoH gene product of B. subtilis; Fabret et al., 1999); senX3/regX3 (CgR0476 is 70 % identical to the regX3 gene product of M. tuberculosis H37Rv; Fontan et al., 2004); cutS/cutR (CgR0540 and CgR0541 are, respectively, 50 and 70 % homologous to the cutS and cutR gene products of Streptomyces lividans; Hutchings et al., 2004); mtrA/mtrB (CgR0863 is 68 % identical to the mtrA gene product of M. tuberculosis H37Rv; Fontan et al., 2004; Zahrt & Deretic, 2000); mprB/mprA (CgR0988 is 64 % identical to the mprA gene product of M. tuberculosis H37Rv; Fontan et al., 2004; Zahrt & Deretic, 2000); yocF/yocG (CgR1050 is 62 % homologous to the B. subtilis yocG gene product; Fabret et al., 1999); yvqE/yvqC (CgR1838 and CgR1839 are, respectively, 62 and 49 % homologous to the B. subtilis yvqC and yvqE gene products; similarly, CgR2844 and CgR2845 are 63 and 51 % homologous to the B. subtilis yvqC and yvqE gene products); fixL/fixJ (CgR2292 and CgR2299 are, respectively, 50 and 61 % homologous to the fixL and fixJ gene products of Azorhizobium caulinodans; Fischer, 1994); phoR/phoP (CgR2511 is 67 % identical to the phoP gene product of M. tuberculosis H37Rv; Fontan et al., 2004); baeS/baeR (CgR2567 and CgR2566 are 60 and 53 %, respectively, homologous to the E. coli baeS and baeR gene products; Nishino et al., 2005). The orphan putative regulator protein CgR0730 is 51 % homologous to the NarL protein of E. coli (Goh et al., 2005). The products of the C. diphtheriae genes VIMSS519770 and VIMSS519769 are 48 and 57 % homologous to the yxjM and yxjL putative gene products of B. subtilis 168, respectively. The C. glutamicum ATCC 13032 gene VIMSS376724 is 52 % homologous to the yfiK putative gene product of B. subtilis 168. The M. smegmatis MC2 genes MSMEG 0149 and MSMEG 0150 are adjacent but transcribed in opposite orientations. PAS, PAS sensing domain-containing protein, as defined by Taylor & Zhulin (1999).
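
The percentage identity scores (Id) in the table legends come from BLASTP comparisons. As a rough illustration of what such a score measures, the following minimal Python sketch computes the identity of two protein sequences that are assumed to have been aligned already, with '-' marking gaps. The function names, the toy sequences and the choice to skip gapped columns are assumptions made for illustration and are not the authors' procedure; BLASTP itself reports identities over the full alignment length, gaps included. The 35 % cutoff mirrors the threshold stated in the Table 5 legend.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of identical residues over ungapped alignment columns.

    Both inputs must be the same length and use '-' for gaps.
    Assumption: columns with a gap in either sequence are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


def meets_table_cutoff(identity: float, cutoff: float = 35.0) -> bool:
    """True if an identity score is at or above the reporting cutoff."""
    return identity >= cutoff


# Toy alignment (hypothetical sequences, not taken from any genome).
score = percent_identity("MKT-LV", "MKSALV")
print(score, meets_table_cutoff(score))
```

In this toy case four of the five ungapped columns match. Homology (similarity) scores, which the text also quotes, additionally count conservative substitutions and therefore require a substitution matrix that this sketch does not model.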

 
The stimuli sensed by most of these two-component systems remain unknown, although, for example, the role of citA/citB in the uptake and metabolism of citrate dicarboxylates has been firmly established (Gerharz et al., 2003). In addition, in M. tuberculosis, genes under the positive control of PhoP include genes encoding proteins involved in lipid metabolism, cell wall synthesis, membrane transport and oxidative stress response (Fontan et al., 2004). The role of PhoP in adaptation to phosphate-limited conditions has been confirmed in C. glutamicum ATCC 13032 (Kocan et al., 2006). M. tuberculosis phoP mutants have an altered rounded shape and show altered levels of lipoarabinomannan derivatives compared to the wild-type (Fontan et al., 2004), as well as having an impaired ability to synthesize methyl-branched fatty-acid-containing acyltrehaloses (Gonzalo Asensio et al., 2006). Disruption of the putative phoP and phoR genes of C. glutamicum R demonstrated that these genes are not essential under standard laboratory conditions and that their deletion does not result in any obvious phenotype. Likewise, it has been shown that the mprA/mprB gene pair is linked to the sensing of cell-wall- or outer-membrane-related stress (He & Zahrt, 2005). It is also noteworthy that C. glutamicum R and C. glutamicum ATCC 13032 display, downstream of the putative mprA-mprB genes, a putative pepD gene (respectively, cgR0990 and VIMSS374897), the product of which is a trypsin-like serine protease which is thought to aid in the degradation of misfolded proteins that are generated in response to various stresses. PepD has been demonstrated in M. tuberculosis to be secreted into the culture medium (Skeiky et al., 1999). A similar genetic organization, where the genes mprA-mprB-pepD are part of an operon, has been observed in pathogenic mycobacteria (He & Zahrt, 2005). The conservation of this gene cluster in both saprophytic and pathogenic organisms suggests that it constitutes an important adaptive mechanism of the bacteria of the Corynebacterineae group. However, none of these genes appears to be essential under laboratory conditions, as demonstrated by the disruption of mprA and mprB in M. tuberculosis (Zahrt & Deretic, 2001). Similarly, we achieved the disruption of cgR0988 (mprA), cgR0989 (mprB) and cgR0990 (pepD) in C. glutamicum R by transposon mutagenesis (this work).

Of these two-component systems, only mtrA/mtrB has been shown to be essential to the growth of M. tuberculosis, as demonstrated by unsuccessful attempts to disrupt the mtrA gene, although mtrB appears to be non-essential (Fontan et al., 2004; Zahrt & Deretic, 2000). The two-component system mtrA/mtrB is conserved in all actinobacteria (Hoskisson & Hutchings, 2006) and its regulator is the only response regulator that is induced in M. tuberculosis during macrophage infection, but not in broth culture (Zahrt & Deretic, 2000). The two-component system MtrA/MtrB has been shown to strongly influence cellular morphology, antibiotic susceptibility and genetic expression of osmoprotection, as demonstrated by the phenotype exhibited by a C. glutamicum ATCC 13032 double mutant (Möker et al., 2004). In C. glutamicum R we succeeded in disrupting either cgR0863 (putative mtrA) or cgR0864 (putative mtrB) by inserting a transposon in the central regions of these genes. Neither of the resulting single-gene disruptants showed any significantly altered cellular morphology or altered growth pattern, perhaps indicative of cross-talk between the MtrA and MtrB proteins and one or more other two-component system proteins. The gene immediately downstream of (and overlapping) mtrB is conserved in all actinobacteria, where it encodes the actinobacteria signature protein LpqB (Gao et al., 2006). In the corynebacteria and mycobacteria sequenced to date, mtrA and mtrB are part of a putative operon that also contains sahH (cgR0861 and VIMSS374775 in C. glutamicum R and ATCC 13032, respectively), encoding S-adenosylhomocysteine hydrolase, and tmpk (cgR0862 and VIMSS374776, respectively), encoding thymidylate kinase. On the other hand, the cgR0122-cgR0123 two-component system, a putative orthologue of the yycF/yycG system of B. subtilis that modulates the expression of the ftsAZ operon (Fukuchi et al., 2000), can be deleted from the C. glutamicum R genome by excision of SSI-3 without any significant phenotypic change (Suzuki et al., 2005b). Brocker & Bott (cited in Kocan et al., 2006) suggested that this two-component system is involved in the genetic regulation of copper metabolism. Interestingly, although this two-component system is borne by an SSI in C. glutamicum R, it shows a high level of identity with orthologues found in all the other corynebacteria sequenced to date. On the other hand, its relatively high G+C content (greater than 60 mol%) in all corynebacteria sequenced to date would suggest that its acquisition is a recent event in evolutionary terms. Last, the response regulator encoded by gene cgR2566, the product of which shares a high level of homology with the BaeR protein of E. coli, is also conserved in all corynebacteria sequenced to date (Table 5). We did not succeed in disrupting or deleting gene cgR2566, despite being able to mutate its cognate sensor kinase gene, cgR2567, by transposon insertion.

Interestingly, two corynebacterial two-component systems appear to be specific to saprophytic corynebacteria. Their highest levels of identity are shared with the putative B. subtilis systems ykoG/ykoH (cgR0359/cgR0360) and yvfT/yvfU (cgR1049/cgR1050) (Fabret et al., 1999). Similar to what has been observed in B. subtilis, these genes appear to be non-essential in corynebacteria as demonstrated by the isolation of several transposon mutants of C. glutamicum R where these genes had been inactivated (this work). However, saprophytic corynebacteria appear to be devoid of chrS/chrA or cstS/cstA orthologues. These latter two-component systems, present in both C. diphtheriae and C. jeikeium, have been linked to haem and haemoglobin sensing (Schmitt, 1999). Nevertheless, ORFs cgR2844 and cgR1838 show low identity but relatively high homology to chrA (67 and 63 %, respectively) and cstA (53 %) from pathogenic corynebacteria. Likewise, corynebacteria appear to be devoid of devS/devR orthologues, which have been linked in M. tuberculosis and M. bovis with hypoxic dormancy (Boon & Dick, 2002; Fontan et al., 2004; Saini et al., 2004), albeit that cgR2844 and cgR1838 are 52 and 57 % homologous, respectively, to M. bovis devR. Genes cgR2844 and cgR2845 are dispensable in C. glutamicum R, as demonstrated by their simultaneous deletion via Cre/loxP-mediated rearrangements (this work).

Moreover, the genome of C. glutamicum R reveals the presence of two unique two-component systems, exhibiting high similarity to cutS/cutR of Streptomyces coelicolor involved in negative regulation of actinorhodin synthesis (Hutchings et al., 2004), and to fixL/fixJ of Azorhizobium caulinodans involved in nitrogen fixation in response to low-oxygen conditions (Fischer, 1994). These genes are present on two different SSIs (SSI-5, a region rich in transposase genes, and SSI-9, respectively) (Suzuki et al., 2005a) and are characterized by G+C contents that differ significantly from the typical G+C content of corynebacterial genomes (Table 5). These observations corroborate the view that these sequences have been acquired by C. glutamicum R through horizontal transfer events that occurred relatively recently in evolutionary terms. As previously reported, these sequences are dispensable as demonstrated by the absence of a specific phenotype in the corresponding C. glutamicum R deletion mutants (Suzuki et al., 2005a). Similarly, C. glutamicum ATCC 13032 exhibits one unique two-component system (VIMSS376724 and VIMSS376723) that shows weak identity to two C. efficiens genes. It is also worth noting that C. diphtheriae possesses a unique two-component system on its chromosome (VIMSS519770 and VIMSS519769), albeit that two similar genes are borne by the C. efficiens plasmid pCE3 (Table 5). Likewise, C. jeikeium K411 encodes two two-component system genes, kdpD and kdpE, putatively involved in turgor pressure sensing and regulation (Hutchings et al., 2004), that are absent from the other corynebacteria sequenced to date, but present in mycobacteria and Streptomyces species (Hutchings et al., 2004).
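
The argument above rests on comparing the G+C content (mol%) of individual genes with the genome-wide value. A minimal Python sketch of that comparison is given here for orientation only. The function names, the toy sequence, the genome value of 54 mol% and the 5-point deviation threshold are all placeholders chosen for illustration; none of them is taken from this work, and any real analysis would need a statistically justified criterion.

```python
def gc_content(seq: str) -> float:
    """Return the G+C content of a nucleotide sequence in mol%."""
    # Count only unambiguous bases so that N or gap characters do not skew the result.
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


def deviates_from_genome(gene_seq: str, genome_gc: float, threshold: float = 5.0) -> bool:
    """Flag a gene whose G+C content differs from the genome value
    by more than `threshold` percentage points (arbitrary placeholder)."""
    return abs(gc_content(gene_seq) - genome_gc) > threshold


# Toy example with a hypothetical 10-base sequence and a placeholder genome value.
toy_gene = "GGGGCCCCAT"
print(gc_content(toy_gene), deviates_from_genome(toy_gene, genome_gc=54.0))
```

A deviation flagged in this way is only suggestive of horizontal acquisition; codon usage and flanking mobile elements, such as the transposase genes noted for SSI-5, would normally be weighed alongside it.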

These different observations promote the view that, except for a few sensor–regulator couples that are highly conserved in Corynebacterineae, the sensing and regulation machinery of corynebacteria is highly plastic. This raises the question of whether horizontal transfers of these important adaptation genes, rather than gene decay, have played a major role during the course of evolution of these bacteria. Such a view is consistent with the observed presence in a few corynebacterial strains of two-component systems for which orthologues can only be found in mycobacteria, Streptomyces species or N. farcinica, but not in the other corynebacteria sequenced to date. Furthermore, a few of these observed systems either are borne by an episome or are part of an SSI. On the other hand, the overall conservation of these systems between corynebacteria and mycobacteria further validates corynebacteria as important model organisms to contribute to the understanding of the biology of slow-growing pathogenic mycobacteria.

Global differences between saprophytic and pathogenic corynebacteria
Based on our calculations and model, up to 762 genes identified in COG appear to be limited to the saprophytes among the corynebacteria sequenced to date (Table 3). In contrast, pathogenic corynebacteria display fewer candidate pathogen-specific genes (Table 3). For example, we calculated that the saprophytic corynebacteria sequenced to date exhibit approximately 30 % more transporters per number of ORFs in their genomes than the pathogenic organisms C. diphtheriae and C. jeikeium. This difference constitutes an indication of the relatively greater metabolic versatility of the former organisms. This observation also suggests a higher degree of transporter substrate specificity and a higher number of secondary carriers, since overall the substrate specificities of transport systems of an organism are correlated to its ecological niche and the diversity and relative concentrations of nutrients it encounters (Paulsen et al., 2000).

Likewise, among the genes potentially involved in host interactions, microbiofilm formation, DNA transfer and bacteriophage attachment, C. diphtheriae NCTC 13129 harbours 11 genes putatively encoding fimbrial proteins or fimbria-associated proteins, whereas C. glutamicum R only exhibits four such genes which are also present in C. efficiens (cgR2789, VIMSS301626; cgR2790, VIMSS301627; cgR2791, VIMSS301628; cgR2793, VIMSS301630, respectively). Notably, only one cluster of pili genes is found in saprophytic bacteria, whereas the genome of C. diphtheriae NCTC 13129 exhibits three such clusters (Gaspar & Ton-That, 2006). Similarly, C. diphtheriae NCTC 13129 and C. jeikeium K411 harbour, in comparison with C. glutamicum and C. efficiens, a relatively larger number of secreted or surface-anchored proteins, in relation to the ecological niche occupied by these species.

Notably, genes related to amino acid transport and metabolism are fewer in number in pathogenic corynebacteria than in the saprophytic ones (Table 3). In addition to horizontal DNA transfer that has occurred in the saprophytic corynebacteria, this difference has been ascribed to gene decay that has occurred during the course of the evolution of the pathogenic organisms (Nishio et al., 2004). Interestingly, this specialization is not the rule in Corynebacterineae, since for example the chromosome sequence of N. farcinica reveals that this organism includes many genes for virulence, drug resistance and secondary metabolism. Moreover, analyses of paralogous protein families suggest that gene duplications have resulted in N. farcinica being able to survive not only in soil environments, but also in animal tissues (Ishikawa et al., 2004).

Functional differences between C. glutamicum R and ATCC 13032
A detailed comparison of the genomic sequence of C. glutamicum R with that of ATCC 13032 reveals that only 60 and 189 genes, respectively, are strain-specific. In relative terms, both of these strains encode the same number of genes per COG category (Table 3), with a few genes unique to either C. glutamicum strain R or ATCC 13032 when compared to the corynebacteria sequenced to date (Table 3). With the exception of sequences from mobile elements, the most striking differences in the number of genes are observed in amino acid transport and metabolism, and secondary metabolite transport and metabolism.
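
Conceptually, strain-specific gene counts of this kind follow from partitioning two gene inventories into shared and unique subsets. The Python sketch below illustrates only that final bookkeeping step, assuming each gene has already been assigned to an orthologue group by some upstream method (for example reciprocal BLASTP searches). The function name and the group identifiers are hypothetical, and the sketch is not a description of the pipeline used in this work.

```python
from typing import Dict, Set


def partition_genes(strain_a: Set[str], strain_b: Set[str]) -> Dict[str, Set[str]]:
    """Split two sets of orthologue-group IDs into shared and strain-specific subsets."""
    return {
        "shared": strain_a & strain_b,   # groups present in both strains
        "only_a": strain_a - strain_b,   # groups found only in the first strain
        "only_b": strain_b - strain_a,   # groups found only in the second strain
    }


# Hypothetical orthologue-group IDs standing in for real gene inventories.
strain_r = {"og1", "og2", "og3"}
strain_atcc = {"og2", "og3", "og4", "og5"}

result = partition_genes(strain_r, strain_atcc)
for label, groups in result.items():
    print(label, len(groups), sorted(groups))
```

The counts obtained this way depend heavily on how orthologue groups are defined, so different identity cutoffs can shift genes between the shared and strain-specific sets.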

Genes unique to C. glutamicum R
A total of 9 % (263) of the predicted genes of C. glutamicum ATCC 13032 remain hypothetical or specific to C. glutamicum when compared to C. efficiens YS-314 and C. diphtheriae NCTC 13129 (Kalinowski et al., 2003). It has been suggested that sequencing more than one or two genomes per species is necessary to access bacterial pan-genomes (Tettelin et al., 2005). For example, the sequencing of eight strains of Streptococcus agalactiae was found to be sufficient to define the core genome of these organisms with 95 % confidence, whereas each new S. agalactiae genome sequence would reveal an extrapolated 33 new genes, with a 6×10⁻⁴ probability that this number falls to zero. The complete genome sequence of C. glutamicum R reveals 39 genes that to the best of our knowledge have not been previously identified in corynebacteria. In addition to the previously described beta-glucoside phosphotransferase (cgR0436) that confers cellobiose utilization properties upon occurrence of a single amino acid substitution (Kotrba et al., 2003), and several mobile-element-derived ORFs (ISCgR3a, ISCgR5, ISCgR12b, ISCgR13a, ISCgR13b), C. glutamicum R notably encodes five novel conserved hypothetical proteins (cgR0052, cgR0067, cgR1134, cgR2375, cgR2798), four membrane proteins putatively involved in transport mechanisms (cgR0768, cgR2326, cgR2800, cgR2956), as well as four regulatory proteins (cgR0139, cgR0414, cgR1106, cgR2822) in addition to the two two-component systems cgR2292/cgR2299 and cgR0540/cgR0541 discussed previously. Interestingly, C. glutamicum R encodes several genes that reflect its ecological niche in soil, as exemplified by a putative tyramine oxidase gene (cgR0016) that could be involved in the degradation of phenethylamines, compounds present in natural environments, or by a putative L-asparaginase (cgR2808). As a result, to access the diversity of the metabolism of corynebacteria, a significant number of additional genome sequences would need to be determined.


    ACKNOWLEDGEMENTS
 
This research was supported by the New Energy and Industrial Technology Development Organization (NEDO), Japan.

Edited by: C. W. Chen


    REFERENCES
Alm, E. J., Huang, K. H., Price, M. N., Koche, R. P., Keller, K., Dubchak, I. L. & Arkin, A. P. (2005). The MicrobesOnline Web site for comparative genomics. Genome Res 15, 1015–1022.

Besemer, J., Lomsadze, A. & Borodovsky, M. (2001). GeneMarkS: a self-training method for prediction of gene starts in microbial genomes. Implications for finding sequence motifs in regulatory regions. Nucleic Acids Res 29, 2607–2618.

Blattner, F. R., Plunkett, G., 3rd, Bloch, C. A., Perna, N. T., Burland, V., Riley, M., Collado-Vides, J., Glasner, J. D., Rode, C. K. & other authors (1997). The complete genome sequence of Escherichia coli K-12. Science 277, 1453–1474.

Bonamy, C., Labarre, J., Reyes, O. & Leblon, G. (1994). Identification of IS1206, a Corynebacterium glutamicum IS3-related insertion sequence and phylogenetic analysis. Mol Microbiol 14, 571–581.