Microbiology
Microbiology 153 (2007), 3337-3349; DOI  10.1099/mic.0.2007/005868-0
© 2007 Society for General Microbiology

A scheme for the analysis of microarray measurements based on a quantitative theoretical framework for bacterial cell growth: application to studies of Mycobacterium tuberculosis

Robert A. Cox

Division of Mycobacterial Research, National Institute for Medical Research, London NW7 1AA, UK

Correspondence
Robert A. Cox
rcox{at}nimr.mrc.ac.uk


    ABSTRACT
A theoretical framework was established for the interpretation of microarray measurements. Mathematical equations were derived that link the molecular processes involved in the transcription and translation of an open reading frame (ORF) with the properties of a population of cells. The theory was applied to three published sets of microarray measurements related to the growth of Mycobacterium tuberculosis. For strains growing at the same rate (for example, wild-type and mutant strains), the expression ratio obtained by microarray analysis for a particular ORF was shown to be equal to the ratio of the copy numbers of the encoded protein. The growth of M. tuberculosis in a batch culture was analysed at several time points over a period of 60 days. For cells cultured for 60 days, the following properties were calculated: µ≤0.008 h–1; the number of ribosomes per cell had decreased to 26 % of the value at day 0; and 40 % or less of this reduced number of ribosomes were estimated to be actively synthesizing protein. Profiles of the expression ratio observed for a particular ORF versus the period of cell culture were related to changes in the relative numbers of copies of the encoded protein per cell. Two profiles were found to have theoretical significance: profile I, exemplified by ORFs encoding proteins needed for DNA partition and DNA synthesis; and profile II, exemplified by ORFs encoding proteins (including ribosomal proteins) needed for protein synthesis. Data for a number of other genes, including hspX, icl, dosR and ftsZ, were also analysed.


Abbreviations: qRT-PCR, quantitative real-time polymerase chain reaction; RNAP, RNA polymerase


    INTRODUCTION
Current methods of studying gene expression include the use of reverse transcriptase to make cDNA copies of cellular RNA and assays for particular gene transcripts based either on DNA amplification (quantitative real-time polymerase chain reaction: qRT-PCR) or on hybridization of tagged cDNA to an array of immobilized DNA targets, with each target representing an individual gene (microarray analysis). The results of qRT-PCR are expressed as the number of copies of transcript per unit of RNA; frequently a reference gene is also included as a ‘normalizer’ to place the expression of the gene of interest in a cellular context. Microarray analysis is generally used as a comparative tool to ascertain whether a change in either growth conditions or genotype leads to ‘upregulation’ or ‘downregulation’ of sets of genes of interest. Both techniques are valuable and widely used for studying the expression of individual genes, and mycobacterial gene expression has been studied by both. Thus far, however, the expression ratios (the output from microarray measurements) have not been directly related to cell parameters. Previously (Cox, 2004), I presented a theoretical framework for bacterial cell growth based on the properties of population-average cells and the mathematical relationships involved in transcription and translation, two processes that are coupled in the bacterial cell (Stent, 1964; Byrne et al., 1964; Miller et al., 1970). I now show that this framework can be readily applied to the interpretation of microarray data. This paper describes how parameters of population-average cells are related to microarray measurements, and how microarray data may be used to reveal relative values of properties such as the number of copies of a particular protein per cell and the average number of ribosomes per cell.

The procedures developed were applied to three microarray studies (Bacon et al., 2004; Kendall et al., 2004; Voskuil et al., 2004; for review see Butcher, 2004) of the growth of Mycobacterium tuberculosis. Features of the synthesis of proteins, DNA and RNA were identified, and changes in the physiological parameters of the tubercle bacillus during growth were quantified.


    THEORY
Definition of the problem.
The problem is to find a theoretical framework that links experimental data with cell parameters that are accessible to experiment. Microarray analysis is a comparative technique. Usually a reference strain (identified by a single superscript prime) and an experimental strain (identified by a double superscript prime) are compared. RNA is isolated from each strain. A sample, x µg, of each RNA fraction is copied into cDNA using dNTPs tagged with fluorophores f' and f'', respectively. Samples of cDNA' and cDNA'' are mixed, denatured and competitively hybridized to an array of DNA sequences immobilized on a glass surface. An all-genome array is constructed so that each open reading frame (ORF) is represented by a specific DNA sequence at a known site. The fluorescence measurements are normalized, allowing calculation of the expression ratio r(i)=f''(i)/f'(i) of the two fluorescent labels hybridized to a particular ORF (ORF(i)). Provided that cDNA' and cDNA'' both accurately reflect the composition of their parent RNA fractions, r(i) provides a measure of the expression of ORF(i) under reference and experimental conditions (Bowtell & Sambrook, 2000).

The coupling of transcription and translation that is a characteristic feature of bacterial growth provides the basis for the theoretical framework that is sought. Thus, the problem can be restated as a search for the appropriate cell parameters that define the expression ratio r(i). As a first step, it is helpful to show how the properties of a cell are related to the properties of a culture. See Table 1 for definitions of variables.


Table 1. Definitions of variables

 
Population-average cells.
Suppose that at t hours a bacterial culture growing exponentially comprises ncells(t) cells and that the masses of protein, RNA and DNA are p(t) fg, R(t) fg and D(t) fg, respectively. A population-average cell (Schaechter et al., 1958) has the properties mp(av) fg protein, mRNA(av) fg RNA, mDNA(av) fg DNA and so on, as defined in equations 1, 2 and 3, where the subscript (av) specifies a population-average cell.

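$$m_{\mathrm{p(av)}} = \frac{p(t)}{n_{\mathrm{cells}}(t)} \qquad (1)$$

$$m_{\mathrm{RNA(av)}} = \frac{R(t)}{n_{\mathrm{cells}}(t)} \qquad (2)$$

$$m_{\mathrm{DNA(av)}} = \frac{D(t)}{n_{\mathrm{cells}}(t)} \qquad (3)$$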
The population-average cell reflects the properties of the entire cell population, which includes cells of all ages, a, ranging from newborn cells (a=0) to cells about to divide (a=1); see Cox (2003, 2004) and Cox & Cook (2007). In an individual cell, particular proteins may be expressed at specific stages of cell growth. All such proteins are present within population-average cells, although the element of periodic expression is not recognized. By definition, the abundances of such proteins in an exponentially growing culture increase at the same rate as the population-average cells. The quantity measured in microarray analysis may now be defined in terms of parameters of population-average cells.
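Equations 1–3 amount to dividing each culture total by the number of cells. A minimal Python sketch follows; the class name, function name and culture totals are hypothetical, chosen only to give round numbers, and are not measurements from the studies discussed.

```python
from dataclasses import dataclass


@dataclass
class CultureTotals:
    """Hypothetical culture-level measurements at time t (masses in fg)."""
    n_cells: float   # ncells(t)
    protein: float   # p(t)
    rna: float       # R(t)
    dna: float       # D(t)


def population_average_cell(c: CultureTotals) -> dict[str, float]:
    """Equations 1-3: per-cell averages mp(av), mRNA(av) and mDNA(av)."""
    if c.n_cells <= 0:
        raise ValueError("n_cells must be positive")
    return {
        "m_p_av": c.protein / c.n_cells,
        "m_RNA_av": c.rna / c.n_cells,
        "m_DNA_av": c.dna / c.n_cells,
    }


# Illustrative totals only.
culture = CultureTotals(n_cells=1e9, protein=2.0e11, rna=5.0e10, dna=1.0e10)
print(population_average_cell(culture))
```

Because each value is a ratio of culture totals, it averages over cells of every age a between 0 and 1, which is why periodically expressed proteins appear in the population-average cell without their timing being recorded.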

What is the quantity measured in microarray analysis?
Let prime and double prime superscripts denote the two fluorescently labelled cDNA preparations and also the cell parameters related to the RNA preparations from which the cDNA samples were prepared. For example, M'tr(i) and M''tr(i) each represent the mass of transcripts of ORF(i) per x µg RNA used as the substrate for cDNA synthesis. Thus r(i), the ratio of the intensities of the two fluorescent labels, is given by equation 4:

$$r(i) = \frac{f''(i)}{f'(i)} = \frac{M''_{\mathrm{tr}(i)}}{M'_{\mathrm{tr}(i)}} \qquad (4)$$
Equation 4, which is fundamental to microarray analysis, is based on three assumptions: first, the fluorescence arising from hybridization to immobilized DNA(i) is a measure of the concentration of cDNA(i) in solution; secondly, the ratio r(i) of the fluorescent labels is equal to the ratio M''tr(i)/M'tr(i); thirdly, the ratio r(i) is unchanged by hybridization. In turn, M'tr(i) and M''tr(i) are related to the cell parameters m'mRNA(i) and m''mRNA(i), which are the masses (fg) of mRNA(i) per cell, and to m'RNA(av) and m''RNA(av), as shown in equations 5 and 6:

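$$M'_{\mathrm{tr}(i)} = \frac{x}{m'_{\mathrm{RNA(av)}}}\, m'_{\mathrm{mRNA}(i)} \qquad (5)$$

$$M''_{\mathrm{tr}(i)} = \frac{x}{m''_{\mathrm{RNA(av)}}}\, m''_{\mathrm{mRNA}(i)} \qquad (6)$$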
The terms x/m'RNA(av) and x/m''RNA(av) give the numbers of cells needed to yield x µg RNA. Substituting for M'tr(i) and M''tr(i) in equation 4 leads to equation 7:

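$$r(i) = \frac{m''_{\mathrm{mRNA}(i)} / m''_{\mathrm{RNA(av)}}}{m'_{\mathrm{mRNA}(i)} / m'_{\mathrm{RNA(av)}}} = \frac{m''_{\mathrm{mRNA}(i)}\, m'_{\mathrm{RNA(av)}}}{m'_{\mathrm{mRNA}(i)}\, m''_{\mathrm{RNA(av)}}} \qquad (7)$$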
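Equations 4–7 can be checked numerically: computing r(i) from the transcript masses in x µg of RNA (equations 4–6) gives the same value as the cell-parameter form (equation 7), whatever x is. A minimal Python sketch, assuming hypothetical per-cell masses in fg and a conversion of x from µg to fg (1 µg = 10^9 fg); the function names are illustrative only.

```python
UG_TO_FG = 1e9  # 1 ug = 1e9 fg


def transcript_mass(x_ug: float, m_mrna_i: float, m_rna_av: float) -> float:
    """Equations 5 and 6: fg of mRNA(i) transcripts in x ug of total RNA.

    x / m_RNA(av) is the number of cells that yield x ug of RNA (per-cell
    masses are in fg, so x is converted to fg first); each of those cells
    contributes m_mRNA(i) fg of transcripts of ORF(i).
    """
    cells_needed = x_ug * UG_TO_FG / m_rna_av
    return cells_needed * m_mrna_i


def ratio_eq4(x_ug: float, ref: tuple[float, float], expt: tuple[float, float]) -> float:
    """Equation 4: r(i) = M''tr(i) / M'tr(i); each pair is (m_mRNA(i), m_RNA(av))."""
    return transcript_mass(x_ug, *expt) / transcript_mass(x_ug, *ref)


def ratio_eq7(ref: tuple[float, float], expt: tuple[float, float]) -> float:
    """Equation 7: the same ratio written with cell parameters only (x cancels)."""
    (m1_mrna, m1_rna), (m2_mrna, m2_rna) = ref, expt
    return (m2_mrna / m2_rna) / (m1_mrna / m1_rna)


# Hypothetical per-cell masses (fg): equal transcript mass per cell,
# but the experimental cells carry half as much total RNA.
reference = (0.02, 50.0)
experimental = (0.02, 25.0)
print(ratio_eq4(10.0, reference, experimental), ratio_eq7(reference, experimental))
```

With m_mRNA(i) unchanged, halving m_RNA(av) doubles r(i), because the same x µg of RNA is then drawn from twice as many cells; r(i) is therefore not simply a ratio of transcripts per cell.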
The next step is to identify the cell parameters governing mmRNA(i).

Transcription/translation of an individual ORF (ORF(i)).
A snapshot of an ORF undergoing transcription/translation (Miller et al., 1970) revealed RNA polymerase molecules (RNAPs) at intervals along the gene, with nascent mRNA transcripts increasing in length according to the position of the RNAP, and a proportionate number of ribosomes attached to each nascent mRNA. The specific synthesis rate ωp(i) of protein p(i) is the product of nR(i), the number of ribosomes actively translating mRNA(i); εaa(i), the polypeptide chain elongation rate; and m*aa, the average mass of an amino acid (equation 8):

$$\omega_{\mathrm{p}(i)} = n_{\mathrm{R}(i)}\, \varepsilon_{\mathrm{aa}(i)}\, m^{*}_{\mathrm{aa}} \qquad (8)$$
It is inferred that mRNA(i) that is in the process of being translated is protected from RNase action. If each ribosome protects σ nucleotides of mRNA(i), then mmRNA(i), the mass of mRNA(i) per cell, is given by equation 9, where m*nuc is the average mass of a nucleotide:

$$m_{\mathrm{mRNA}(i)} = n_{\mathrm{R}(i)}\, \sigma\, m^{*}_{\mathrm{nuc}} \qquad (9)$$
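Equations 8 and 9 are both linear in nR(i), so their quotient, mmRNA(i)/ωp(i) = σm*nuc/(εaa(i)m*aa), no longer depends on the number of translating ribosomes; this is what allows the mass of mRNA(i) to be expressed in terms of protein synthesis. A minimal Python sketch; all inputs (20 translating ribosomes, 10 amino acids per second, 110 Da per amino acid, 30 nucleotides protected per ribosome, 330 Da per nucleotide) are hypothetical values chosen for illustration, not values from this study.

```python
def synthesis_rate(n_R_i: float, eps_aa_i: float, m_aa: float) -> float:
    """Equation 8: omega_p(i) = nR(i) * eps_aa(i) * m*_aa."""
    return n_R_i * eps_aa_i * m_aa


def protected_mrna_mass(n_R_i: float, sigma: float, m_nuc: float) -> float:
    """Equation 9: m_mRNA(i) = nR(i) * sigma * m*_nuc."""
    return n_R_i * sigma * m_nuc


# Hypothetical inputs (see lead-in); masses in Da, rate per second.
n_R, eps, m_aa, sigma, m_nuc = 20, 10, 110, 30, 330
omega = synthesis_rate(n_R, eps, m_aa)            # Da of p(i) made per second
m_mrna = protected_mrna_mass(n_R, sigma, m_nuc)   # Da of mRNA(i) protected
# Both quantities scale with nR(i), so their ratio is independent of it.
print(omega, m_mrna, m_mrna / omega)
```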
It is shown in the Appendix that manipulation of equations 9 and 7A and related parameters leads to equation 10, where laa(i) is the length in amino acids of protein p(i) and nc-p(i) is the number of copies of protein p(i) per cell:

[Equation 10]
Substituting equation 10 for m'mRNA(i) and m''mRNA(i) in equation 7 leads to equation 11a:

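$$r(i) = \frac{\mu''\, \varepsilon'_{\mathrm{aa}(i)}\, m'_{\mathrm{RNA(av)}}\, n''_{\mathrm{c\text{-}p}(i)}}{\mu'\, \varepsilon''_{\mathrm{aa}(i)}\, m''_{\mathrm{RNA(av)}}\, n'_{\mathrm{c\text{-}p}(i)}} \qquad (11a)$$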
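The dependence of r(i) on relative values of µ, εaa(i), mRNA(av) and nc-p(i) can be made concrete with a small function. This is a minimal Python sketch, assuming equation 11a has the ratio form r(i) = (µ''/µ')(ε'aa(i)/ε''aa(i))(m'RNA(av)/m''RNA(av))(n''c-p(i)/n'c-p(i)); that form is consistent with the special cases described in the text (equation 11b and the ratios r(i)/r(j) and r(k)/r(j)), but the exact published arrangement of terms is an assumption here. The function name and all numbers are hypothetical.

```python
def expression_ratio_11a(mu, eps_aa, m_rna_av, n_cp):
    """Expression ratio r(i) under the ratio form assumed for equation 11a.

    Each argument is a (reference, experimental) pair, i.e. (X', X'').
    Assumed form:
        r(i) = (mu''/mu') * (eps'/eps'') * (m'_RNA(av)/m''_RNA(av)) * (n''/n')
    """
    (mu1, mu2), (e1, e2), (R1, R2), (n1, n2) = mu, eps_aa, m_rna_av, n_cp
    return (mu2 / mu1) * (e1 / e2) * (R1 / R2) * (n2 / n1)


# Equal growth rates and compositions (the case of equation 11b):
# r(i) collapses to the copy-number ratio n''c-p(i)/n'c-p(i).
same = expression_ratio_11a(mu=(0.03, 0.03), eps_aa=(10, 10),
                            m_rna_av=(50, 50), n_cp=(100, 300))

# Slower growth with less RNA per cell but an unchanged copy number:
slow = expression_ratio_11a(mu=(0.03, 0.0075), eps_aa=(10, 10),
                            m_rna_av=(50, 25), n_cp=(100, 100))
print(same, slow)
```

The second call shows why an unchanged copy number need not give r=1 when growth rates differ: a fourfold fall in µ combined with a halving of mRNA(av) leaves r(i) at about 0.5.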

Simplified forms of equation 11a used in this study.
Equation 11a relates the expression ratio r(i) to parameters of population-average cells; that is, r(i) is a function of the relative values of µ, εaa(i), mRNA(av) and nc-p(i). The solution of equation 11a requires measurements of µ and of the ratios DNA : RNA : protein for each culture. These measurements provide (Cox, 2004) values of m'RNA(av), m''RNA(av), m'p(av), m''p(av), ε'aa(av) and ε''aa(av) (note that ε'aa(av)≈ε'aa(i) and ε''aa(av)≈ε''aa(i)). Several situations allow simplification of equation 11a, as discussed below.

Wild-type and mutant strains grow at the same rate.
It has been shown (Schaechter et al., 1958) that, at a given temperature, the macromolecular composition of a bacterium depends on growth rate; notably, the RNA (ribosome) content has been found to vary with growth rate (for a review see Bremer & Dennis, 1996). Accordingly, it can be assumed that a mutation that has no effect on the specific growth rate has no effect on the cell's macromolecular composition. Suppose that a single prime and a double prime respectively denote wild-type and mutant; when µ'=µ'', then m'RNA(av)=m''RNA(av) and ε'aa(av)=ε''aa(av), and hence ε'aa(i)=ε''aa(i). The expression ratio r(i) is then equal to n''c-p(i)/n'c-p(i). In other words, the expression ratio is equal to the ratio of the copy numbers of protein p(i) in mutant and wild-type (equation 11b, Table 2).


Table 2. Useful forms of equation 11

 
The number nc-p(j) of copies of protein p(j) per cell is independent of growth rate.
Suppose that the bacterial cell uses a fixed complement of protein p(j), so that nc-p(j), the number of copies per cell of protein p(j) encoded by ORF(j), is the same at all growth rates. In this case n''c-p(j)/n'c-p(j)=1, so that equation 11a reduces to equation 11c (Table 2). Candidates for proteins such as p(j) include those involved in processes that take place only once during cell division; for example, chromosome segregation. Note that equation 11c has a further property: the ratio r(i)/r(j) is equal to n''c-p(i)/n'c-p(i).
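The property r(i)/r(j) = n''c-p(i)/n'c-p(i) means that a p(j)-type ORF can act as an internal normalizer: dividing r(i) by r(j) at each time point removes the growth-dependent factor the two ratios share. A minimal Python sketch; the function name and the four-point profiles are hypothetical.

```python
def copy_number_ratios(r_i: list[float], r_j: list[float]) -> list[float]:
    """Equation 11c property: r(i)/r(j) = n''c-p(i)/n'c-p(i) at each time point.

    r_j is the expression ratio of a p(j)-type ORF (fixed copy number per
    cell) measured on the same arrays as r_i, so the factor that depends on
    growth rate and RNA content cancels in the quotient.
    """
    if len(r_i) != len(r_j):
        raise ValueError("r_i and r_j must cover the same time points")
    if any(r <= 0 for r in r_j):
        raise ValueError("r_j values must be positive")
    return [ri / rj for ri, rj in zip(r_i, r_j)]


# Hypothetical profiles over four sampling times.
r_j = [1.0, 0.5, 0.25, 0.25]   # reference ORF(j), e.g. a partition protein
r_i = [1.0, 1.0, 0.5, 0.25]    # ORF(i) of interest
print(copy_number_ratios(r_i, r_j))
```

Here the falling r(i) values hide a transient twofold rise in the copy number of p(i), which the normalization exposes.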

The number nc-p(k) of copies of protein p(k) is directly proportional to the number nR(av) of ribosomes per cell.
Suppose that ORF(k) encodes protein p(k), whose number nc-p(k) of copies per population-average cell is directly proportional to nR(av); that is, nc-p(k) is equal to the product of a constant K (K>0) and nR(av). In this case, n''c-p(k)/n'c-p(k) is equal to n''R(av)/n'R(av), which in turn approximates m''RNA(av)/m'RNA(av). Thus, equation 11a reduces to equation 11e (Table 2). Obvious examples of the p(k) family of proteins are the ribosomal proteins (r-proteins) themselves: L7/L12 is present as four copies per ribosome and all others are present as a single copy per ribosome (Bremer & Dennis, 1996). Except for L7/L12, the copy number nc-p(k) of an r-protein is equal to nR(av). Other components of the translational machinery, such as initiation factors, elongation factors and aminoacyl-tRNA synthetases (Bremer & Dennis, 1996), were found to have copy numbers directly proportional to nR(av). The ratio r(k)/r(j) yields m''RNA(av)/m'RNA(av). The equations shown in Table 2 provide the basis for a systematic analysis of the expression of each ORF of a bacterial genome. The equations derived, and the assumptions upon which they are based, are supported by the microarray studies of M. tuberculosis gene expression (see below).
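Dividing a p(k)-type profile by a p(j)-type profile point by point gives the relative RNA (ribosome) content of population-average cells, using the approximation n''R(av)/n'R(av) ≈ m''RNA(av)/m'RNA(av) stated above. A minimal Python sketch; the function name and the three-point profiles are hypothetical and are not the published data.

```python
def relative_ribosome_content(r_k: list[float], r_j: list[float]) -> list[float]:
    """r(k)/r(j) ~ m''RNA(av)/m'RNA(av) ~ n''R(av)/n'R(av) at each time point.

    r_k: expression ratios of a p(k)-type ORF (copies proportional to nR(av)),
         e.g. the average over ribosomal-protein ORFs.
    r_j: expression ratios of a p(j)-type ORF (fixed copies per cell).
    """
    if len(r_k) != len(r_j):
        raise ValueError("profiles must cover the same time points")
    return [k / j for k, j in zip(r_k, r_j)]


# Hypothetical profiles: reference time point first.
r_k = [1.0, 0.3, 0.1]
r_j = [1.0, 0.6, 0.4]
print(relative_ribosome_content(r_k, r_j))
```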


    RESULTS AND DISCUSSION
Support for equation 11a and the related forms was sought from published microarray studies of M. tuberculosis. Equation 11a was not tested directly because the additional data needed (µ and the ratios DNA : RNA : protein) are not available. However, equation 11b, a simpler form of equation 11a, was applied to two studies in which reference and experimental strains were found to grow at the same rate. It has been shown (Verma et al., 1999) that, like other bacteria (Schaechter et al., 1958; Bremer & Dennis, 1996), the tubercle bacillus maintains the number of ribosomes per cell commensurate with its needs for protein synthesis (growth rate control of ribosome synthesis). It follows that if two strains of M. tuberculosis grow at the same rate, both will synthesize proteins at the same rate and have similar numbers of ribosomes per cell. Equation 11a then reduces to equation 11b, so that the expression ratio r(i) is equal to n''c-p(i)/n'c-p(i). For ribosomal proteins, n''c-p(i)/n'c-p(i)=n''R(av)/n'R(av)=1, so that r=1. These considerations were applied to two studies, as described below in section A.

The results of a third study were examined in order to establish the extent to which equations 11c–11h can be applied to studies in which reference and experimental strains do not grow at the same rate. Voskuil et al. (2004) cultured M. tuberculosis over a period of 60 days and used microarray analysis to follow changes in gene expression as nutrients were consumed and the specific growth rate decreased (see section B below).

A. Reference and experimental strains grown at the same rate
Two sets of microarray data were examined. In the first study (Bacon et al., 2004), it was found that when nitrogen was limiting, the tubercle bacillus grew in a chemostat at the same rate (µ=0.029 h–1) at two different concentrations of dissolved oxygen. In the second study (Kendall et al., 2004), wild-type M. tuberculosis and a dosR ‘knockout’ mutant were shown to grow at the same rate in batch culture. The results of a third study (Voskuil et al., 2004) provide a control: M. tuberculosis was grown in batch culture, and RNA isolated from mid-exponential-phase cells was used as the substrate for the preparation of both the prime and double prime fluorescent cDNA samples. These expression ratios provide a positive control for equation 11b because only a single strain grown at a particular specific growth rate is involved.

A panel of 50 genes encoding ribosomal proteins (see Table 3) was identified. Six of eight putative ribosomal genes (see Table 4) were excluded because the functions of the encoded proteins are uncertain. There are two candidate genes in the genome for each of the ribosomal proteins RpsN, RpsR, RpmB and RpmG, and there is no information to show which of the encoded proteins are incorporated into ribosomes.


Table 3. Ribosomal protein genes studied

 

Table 4. How many of these genes encode ribosomal proteins?

 
The r-values for the three sets of 50 ribosomal protein genes were compared (see Fig. 1); the mean r-values, r(50), and standard deviations were calculated, and the individual r-values were found to be in accord with the corresponding normal distribution. Thus, the r-values of each set of 50 ribosomal protein genes are accurately represented by r(50)±SD (standard deviation). Equation 12 summarizes the data for M. tuberculosis ribosomal gene expression observed by Bacon et al. (2004); the superscript single prime denotes high (50 % dissolved oxygen tension) and the superscript double prime denotes low (1 % dissolved oxygen tension).
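The summary r(50)±SD and the comparison with a normal distribution shown in Fig. 1 can be sketched as follows. This is a minimal illustration using Python's statistics module, assuming a bin width of 0.1r as in Fig. 1; the function names and the five r-values are hypothetical, not the published data.

```python
import math
import statistics


def summarize_ratios(r_values: list[float]) -> tuple[float, float]:
    """Mean expression ratio r(n) and sample standard deviation."""
    return statistics.mean(r_values), statistics.stdev(r_values)


def observed_vs_normal(r_values: list[float], width: float = 0.1):
    """Count ORFs per bin of `width` in r and compare with a fitted normal curve.

    Returns (bin_start, observed_count, expected_count) rows, mirroring the
    histogram-plus-broken-line layout of Fig. 1.
    """
    mean, sd = summarize_ratios(r_values)
    normal = statistics.NormalDist(mean, sd)
    first = math.floor(min(r_values) / width)
    last = math.floor(max(r_values) / width)
    rows = []
    for k in range(first, last + 1):
        lo, hi = k * width, (k + 1) * width
        observed = sum(1 for r in r_values if lo <= r < hi)
        expected = len(r_values) * (normal.cdf(hi) - normal.cdf(lo))
        rows.append((round(lo, 2), observed, expected))
    return rows


# Hypothetical r-values for a handful of ribosomal-protein ORFs.
r = [0.93, 0.97, 1.02, 1.05, 1.12]
mean, sd = summarize_ratios(r)
print(f"r = {mean:.3f} ± {sd:.3f}")
for row in observed_vs_normal(r):
    print(row)
```

A sample standard deviation (statistics.stdev) is used because the 50 ORFs are treated as a sample of ribosomal gene expression; at least two distinct r-values are needed for the fit.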


Fig. 1. Distribution of ORFs encoding ribosomal proteins per 0.1r interval. In each panel the broken line denotes the values expected for a normal distribution with the mean value and standard deviation shown. (a) Comparison of ribosomal gene expression for M. tuberculosis grown in a chemostat at two different dissolved oxygen tensions (1 % and 50 %); see Bacon et al. (2004). (b) Comparison of a dosR mutant with wild-type M. tuberculosis (Kendall et al., 2004). (c) Comparison of mid-exponential-phase cells when cDNA' and cDNA'' were copied from samples of the same RNA preparation (Voskuil et al., 2004).

 

[Equation 12]
Secondly, equation 13 summarizes the data for ribosomal protein genes reported by Kendall et al. (2004); the superscript single prime denotes wild-type M. tuberculosis and the superscript double prime denotes the dosR ‘knockout’ mutant.

[Equation 13]
Thirdly, equation 14 summarizes the data for the control experiment (Voskuil et al., 2004); both the superscript single prime and the superscript double prime refer to the cDNA probes copied from the same RNA fraction. Hence, in the ideal case r=1.

[Equation 14]
As expected, it was found (see equations 12, 13 and 14) that the mean expression ratios for ribosomal protein genes are close to unity. Thus, I infer that equation 11b is applicable to expression values obtained when reference and experimental strains grow at the same rate.

B. Reference and experimental strains grown at different rates
The data analysed were obtained from samples collected from a culture of M. tuberculosis (Voskuil et al., 2004) over a period of 60 days. Cells collected at mid-exponential phase were used as the reference strain. Results are presented (see Tables 5, 6 and 7 and Fig. 2) as plots of expression ratios versus the period of cell culture (expression profiles). Wherever possible, the precision of the microarray data was improved by calculating average expression ratios (r(n)±SD) for sets of genes encoding proteins with related functions, for example ribosomal proteins (see Fig. 2).


Table 5. Representative microarray data obtained for M. tuberculosis grown in batch culture: profiles of expression ratios of highly upregulated and other selected genes

 

Table 6. Estimates of m''p(av)/m'p(av)

(n''c-p(i)/n'c-p(i))av is considered to be an approximation for m''p(av)/m'p(av) (see footnote ‡ of Table 2).

 

Table 7. Changes in the properties of population-average cells of M. tuberculosis grown in batch culture derived from microarray measurements

 

Fig. 2. Representative microarray data obtained for M. tuberculosis grown in batch culture. Profiles of expression ratios of ORFs encoding proteins engaged in A, DNA synthesis and segregation; B, protein synthesis; C, RNA synthesis; and D, proteins whose copy numbers (nc-p(i)) increase twofold at some stage during culture. Expression ratios were obtained from Voskuil et al. (2004) for a culture of M. tuberculosis clinical isolate 1254 grown at 37 °C in 7H9 medium (supplemented with BSA, NaCl, glucose and glycerol) with shaking at 90 r.p.m. Cell culture was continued for 60 days and samples were removed at the times specified in the figure. Time 0 days refers to a mid-exponential-phase culture of OD600 0.15. The expression ratio at this time point was obtained by using the isolated RNA to prepare one sample of cDNA labelled with fluorophore f' and a second sample labelled with fluorophore f''. In each case the reference strain is the culture of mid-exponential-phase cells and the experimental strain is the culture at the time point specified. The expression profiles reflect the changes in gene expression in response to depletion of nutrients in the culture medium. r(50), etc., denote the average expression ratio of the number of related genes specified in parentheses. Standard deviations are shown by the error bars. Profile I is defined in (a) and is equivalent to the profile of r(j) versus time; ORF(j) encodes protein p(j), whose copy number nc-p(j) is independent of growth rate. Profile II is defined in (d) and is equivalent to the profile of r(k) versus time; ORF(k) encodes protein p(k), for which nc-p(k) is directly proportional to nR(av). Profile III (broken line) is specified in (l) and is equivalent to the profile of r(i) versus time when ORF(i) encodes a protein whose copy number doubles at some time between 0 and 6 days and then remains at this value as cell culture proceeds. A, Genes encoding proteins needed for DNA partition and synthesis.
(a) ORFs encoding proteins implicated in DNA partition. r(3) is the average expression ratio for parA, parB and ftsK. (b) ORFs encoding proteins needed for DNA synthesis. r(5) is the average expression ratio for dnaN, dnaG, polA, dnaQ and dnaZX. (c) Expression profile for dnaE1. B, Genes encoding ribosomal proteins and factors needed for protein synthesis. (d) ORFs encoding ribosomal proteins. r(50) denotes the average expression ratio for the 50 ORFs encoding ribosomal proteins listed in Table 3. (e) ORFs encoding protein factors (Group I) needed for protein synthesis. r(8) denotes the average expression ratio for fusA1, tuf, infC, frr, tsf, prfB, tig and infA. (f) ORFs encoding protein factors (Group II) needed for protein synthesis. r(3) denotes the average expression ratio for fusA2, prfA and infB. C, Genes encoding proteins needed for RNA synthesis. (g) ORFs encoding proteins needed for RNA synthesis. r(6) denotes the average expression ratio for nusG, rpoB, rpoC, rho, rpoA and gpsI. (h) Expression ratios for nusA (filled circles) and nusB (open circles). (i) ORFs encoding aminoacyl-tRNA synthetases. r(16) denotes the average expression ratio of 16 ORFs: Rv0041, Rv1007c, Rv1292, Rv1649, Rv1689, Rv2334, Rv2357c, Rv2448c, Rv2555c, Rv2572c, Rv2580c, Rv2614c, Rv2845c, Rv3336c, Rv3580c and Rv3598c. D, Genes encoding proteins whose copy numbers increase twofold at some stage during cell culture. (j), (k) and (l) respectively show expression ratios for Rv0165c, ilvD (an essential gene) and oxyS (an inessential gene). Profile III is defined above.

 
Two expression profiles (I and II) are identified in Fig. 2 to denote proteins with the properties of p(j) and p(k) (see Table 2). By definition (see equation 11c, Table 2), the copy numbers nc-p(j) of proteins of the p(j) family are independent of µ (n''c-p(j)/n'c-p(j)=1) and (see equation 11e, Table 2) the copy numbers of proteins of the p(k) family are directly proportional to nR(av) (n''c-p(k)/n'c-p(k)=n''R(av)/n'R(av)). The specific growth rate is expected to decrease after exponential phase as the supply of nutrients diminishes. The slower growth rate is reflected in Fig. 2(a)–(i) as a decrease in expression ratios with the period of cell culture. However, the expression ratios were similar for days 6–8, possibly because the growth rate remained constant over this period owing to adaptation to an alternative substrate in the complex medium used.
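In this study genes are assigned to profile I or profile II by comparing their expression profiles with Fig. 2(a) and (d); no numerical criterion is given. As a swapped-in illustration only, a least-squares comparison can make such an assignment reproducible. A minimal Python sketch, assuming hypothetical reference profiles sampled at the same four time points as the gene of interest; the function names and values are not from the study.

```python
def sum_sq_diff(a: list[float], b: list[float]) -> float:
    """Sum of squared differences between two expression profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must share the same time points")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def closest_profile(gene: list[float], references: dict[str, list[float]]) -> str:
    """Name of the reference profile that lies closest to `gene`."""
    return min(references, key=lambda name: sum_sq_diff(gene, references[name]))


# Hypothetical reference profiles at four time points.
references = {
    "I": [1.0, 0.7, 0.5, 0.4],   # r(j)-like: fixed copies per cell
    "II": [1.0, 0.4, 0.2, 0.1],  # r(k)-like: copies proportional to nR(av)
}
gene = [1.0, 0.45, 0.22, 0.12]
print(closest_profile(gene, references))
```

An unweighted sum of squares treats all time points equally; with real data the standard deviations shown in Fig. 2 could be used as weights.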

Proteins involved in DNA partition and synthesis.
The mycobacterial cell replicates its genome once during each cell division cycle and then partitions the two genomes equally between the two daughter cells. I propose that proteins involved in DNA partition are required to the same extent by all cells irrespective of growth rate. This proposal is supported by our knowledge of the three-dimensional structure and function of FtsK (Massey et al., 2006). In other words, for a particular strain the copy number nc-p(j) of ‘partition’ proteins, p(j), is constant at all growth rates, so that n''c-p(j)/n'c-p(j)=1; see equation 11c (Table 2).

The average r-value, r(3), for three genes encoding proteins involved in DNA partition (ftsK, parA and parB) at different periods of cell culture is shown in Fig. 2(a). This profile, labelled I, represents r(j) versus the period of cell culture. The average r-values, r(5), for five genes encoding proteins involved in DNA replication lie close to profile I (see Fig. 2b), suggesting that the numbers of copies per population-average cell of these proteins also remain constant and independent of growth rate. The r-values for dnaE1, encoding the α-chain of DNA polymerase III, followed the pattern obtained for ribosomal proteins (see Fig. 2c), which is the reference profile for r(k) (see curve II, Fig. 2d), whereas the data for dnaE2 were too erratic to be helpful. Thus, the possibility remains that, as the growth rate diminishes, the availability of the α-chain of DNA polymerase III governs the concentration of DNA polymerase III holoenzyme.

Proteins involved in protein biosynthesis.
The average value of r (r(50)) for the set of 50 ribosomal protein genes was found to diminish with increasing periods of cell culture. Profile II is the plot of the average values of r(50) against the period of cell culture (Fig. 2d) and is equivalent to the plot of r(k) against time (see Table 2). Eight factors required for protein synthesis, including elongation factor EF-Tu (which participates in the rate-limiting step in peptide bond formation), form a group. Their average r-value (r(8)) follows the trend observed for ribosomal proteins; that is, within experimental error, values of r(8) follow curve II (see Fig. 2d), as shown in Fig. 2(e). It is inferred that the number of copies of these eight factors per ribosome remains constant irrespective of growth rate. The expression of genes encoding three factors, fusA2, prfA and infB, was found to vary with the period of incubation in a different way, as judged by the profile of r(3) values (see Fig. 2f), which is similar to profile I (see Fig. 2a). It is concluded that the copy numbers of these proteins per population-average cell are independent of growth rate.

In contrast with fusA1 (Sassetti et al., 2003), fusA2 is a non-essential gene (Lamichhane et al., 2003). Both FusA1 and FusA2 are classified as having elongation factor EF-G activity. However, their amino acid sequences differ significantly: they share 221/701 identical amino acids, which is comparable with the 233 identical amino acids shared between FusA1 and elongation factor EF-G of Escherichia coli. Both infB, encoding initiation factor IF-2, and prfA, encoding peptide chain release factor RF-1, are essential genes (Sassetti et al., 2003).

Proteins involved in RNA synthesis.
rRNA constitutes a major part of the RNA fraction (mRNA(av)) of population-average cells. The synthesis of rRNA is the rate-limiting step in ribosome synthesis (for a review see Keener & Nomura, 1996). Thus, the rate of synthesis of proteins involved in rRNA synthesis might be expected to be co-ordinated with the rate of synthesis of ribosomal proteins. As anticipated, the average r-value (r(6)) of six genes encoding proteins involved in RNA synthesis was found to follow profile II (Fig. 2g). Two genes, nusA and nusB, encoding proteins implicated in antitermination during rRNA synthesis, followed a different pattern (see Fig. 2h): their r-values followed profile I (Fig. 2a) rather than profile II (Fig. 2d). It is inferred that the copy numbers of NusA and NusB per population-average cell vary little with growth rate.

The tRNA pool functions in protein biosynthesis through the formation of aminoacyl-tRNA derivatives. The rate-limiting step in peptide chain formation is the interaction of a ternary complex of aminoacyl-tRNA, elongation factor EF-Tu and GTP with the A-site of the ribosome. Aminoacyl-tRNA synthetases are responsible for charging tRNA. The average r-value (r(16)) of 16 aminoacyl-tRNA synthetase genes was examined (see Fig. 2i). For the first 14 days, the dependence of r(16) on the period of cell culture followed profile II, indicating that the number of copies of aminoacyl-tRNA synthetases per ribosome is constant over this period. The expression ratios found after longer periods of cell culture are greater than those expected for profile II, suggesting that the number of copies per ribosome increases at very slow growth rates.

Expression profiles other than profile I and profile II were observed (see Fig. 2j–l). Each profile is consistent with a doubling of the copy number (n''c-p(i)=2n'c-p(i)) of the encoded protein at some time during the 60 day period of cell culture. Other examples of periodically expressed genes are presented in Table 5. Large changes in the expression ratios were found for hspX, Rv2626c and Rv2660c. The non-essential gene dosR was found to be upregulated after 14 days of cell culture. The expression ratio of icl diminished as the period of cell culture increased. The expression ratio for the essential gene ftsZ was found to follow profile II, indicating that the number of copies of FtsZ per cell is proportional to the number of ribosomes per cell; that is, FtsZ is a member of the p(k) family of proteins.

C. Evaluation of parameters commonly used to describe cell growth
The growth of a cell culture is often described by the specific growth rate (µ) and a measure of macromolecular composition such as the ratio DNA : RNA : protein or, more recently, by µ and the macromolecular composition of a population-average cell (see the Theory section).

Relative values of mp(av) were deduced in the following way. At time t the ratio r(i)/r(j) provides the ratio n''c-p(i)/n'c-p(i). Thus the sum Σ(r(i)/r(j))t over i=1 to i=nORFs/g, divided by nORFs/g, provides an average value (n''c-p(i)/n'c-p(i))av, which should reflect the ratio m''p(av)/m'p(av). For example, if after 60 days of cell culture m''p(av) is 0.5m'p(av), then (n''c-p(i)/n'c-p(i))av would also tend towards 0.5. In practice, it was found that the average of (r(i)/r(j))t over a sample of 206 genes (i=1 to i=206) provided reliable estimates of (n''c-p(i)/n'c-p(i))av. The calculated values (see Table 6) suggest that, although the composition of the protein fraction may vary with growth rate, the mass of protein per cell m''p(av) is close to m'p(av) for all periods of cell culture.
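The averaging step can be written out as a short Python sketch. The function name and the five ratios below are hypothetical illustrations, not data from this study. The sketch assumes that r(i) for a sample of ORFs and r(j) for the profile I reference at the same time point are already available as plain numbers.

```python
from statistics import mean


def mean_copy_number_ratio(r_i, r_j):
    """Estimate (n''c-p(i)/n'c-p(i))av, and hence m''p(av)/m'p(av), at one time point.

    r_i -- expression ratios r(i) for a sample of ORFs
    r_j -- expression ratio of the profile I reference at the same time point
    Each quotient r(i)/r(j) estimates n''c-p(i)/n'c-p(i).
    """
    if r_j <= 0:
        raise ValueError("r(j) must be positive")
    return mean(r / r_j for r in r_i)


# Hypothetical expression ratios for five ORFs, with r(j) = 0.5
sample = [0.40, 0.55, 0.50, 0.60, 0.45]
print(round(mean_copy_number_ratio(sample, 0.5), 3))  # 1.0
```

A mean close to 1, as in this made-up sample, would correspond to an essentially unchanged mass of protein per cell.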

Inspection of the sample of 206 genes revealed that in 74 cases the number of copies of protein p(i) per cell, nc-p(i), changed very little (n''c-p(i)≈n'c-p(i)); in 44 cases n''c-p(i) increased with the period of cell culture; and in 86 cases n''c-p(i) decreased with the period of culture. However, the changes in the ratio n''c-p(i)/n'c-p(i) were relatively small (a factor of less than 2) and usually smaller than the corresponding decrease noted for ribosomal proteins (see Fig. 2d). Increases in the number of copies of protein p(i) per cell with the period of cell culture followed different patterns; examples are presented in Fig. 2(j–l) and Table 5. Two-dimensional gel analysis of the overall pattern of cell proteins indicates very little change with the period of cell culture in the number of proteins identified, although the relative abundances of individual proteins (as revealed by the intensities of staining) may vary (Betts et al., 2002).

The equations summarized in Table 2 allow evaluation of the relative values of µ, mRNA(av) and mp(av). Values of r(j), r(k) and m''p(av)/m'p(av) are needed to solve the appropriate equations. Profile I (Fig. 2a) is considered to provide values of r(j) as a function of the period of cell culture; that is, the copy numbers of proteins p(j) (DNA partition proteins) are regarded as independent of growth rate: n''c-p(j)/n'c-p(j)=1. Profile II (Fig. 2d), observed for ribosomal proteins, has the properties of r(k) as a function of the period of cell culture. In this case equation 11a reduces to equation 11c, provided that the proportion of mRNA(av) that is rRNA remains unchanged. Then n''c-p(k)/n'c-p(k) is equal to n''R(av)/n'R(av), which is equal to m''RNA(av)/m'RNA(av). The ratio m''p(av)/m'p(av) was estimated as a function of the period of cell culture (see equation 11d), as summarized in Table 6. These relationships were used to calculate the parameters presented in Table 7.
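Combining the relation r(i)/r(j)=n''c-p(i)/n'c-p(i) with the profile II property n''c-p(k)/n'c-p(k)=m''RNA(av)/m'RNA(av) gives the relative RNA content at a given time point directly as r(k)/r(j). The following is a minimal Python sketch of that step. The function name and input ratios are hypothetical and were chosen only for illustration.

```python
def rna_content_ratio(r_k, r_j):
    """Estimate m''RNA(av)/m'RNA(av) (equivalently n''R(av)/n'R(av)) as r(k)/r(j).

    r_k -- expression ratio of a profile II (p(k)) gene at time t
    r_j -- expression ratio of a profile I (p(j)) gene at the same time t
    Valid only if n''c-p(j)/n'c-p(j) = 1 and n''c-p(k)/n'c-p(k) = n''R(av)/n'R(av).
    """
    if r_j <= 0:
        raise ValueError("r(j) must be positive")
    return r_k / r_j


# Hypothetical ratios chosen for illustration only
print(round(rna_content_ratio(r_k=0.13, r_j=0.50), 2))  # 0.26
```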

Maximum values of µ''/µ' (see Table 7) were obtained by means of equation 11e (Table 2) on the basis of the assumption that ε'aa(k)=ε''aa(k). Maximum values of µ'' were then deduced by supposing that µ'=0.04 h–1 (tD=16 h). The following values were obtained: at day 6 and at day 8, µ''≤0.028 h–1 (tD≥24 h); at day 14, µ''≤0.015 h–1 (tD≥46 h); at day 24, µ''≤0.008 h–1 (tD≥80 h); at day 60, µ''≤0.004 h–1 (tD≥160 h). It is technically very difficult to measure µ during stationary phase; however, µ is expected to diminish as the nutrient supply is depleted.

After 60 days of cell culture the cellular RNA content mRNA(av) (and the ribosome content nR(av)) was estimated to have decreased to approximately one-quarter of the content of mid-exponential-phase cells. This decrease in mRNA(av) (and nR(av)) is consistent with data reported by Beste et al. (2005) for Mycobacterium bovis BCG, the attenuated M. bovis strain used as a vaccine against tuberculosis. These authors grew M. bovis BCG in a chemostat at two different rates, µ=0.03 h–1 (tD=23 h) and µ=0.01 h–1 (tD=69 h), to represent mid-exponential-phase and stationary-phase cells, respectively. Their data reveal that, allowing 1.4 genome equivalents per cell, the slower-growing cells had 22 % of the RNA content of the faster-growing cells.

The fraction of ribosomes actively synthesizing protein was also found to diminish over 60 days of cell culture (see Table 7), to no more than 38 % of the value for mid-exponential-phase cells. In other words, at day 60 more than half of the ribosomes were not involved in protein synthesis. Stationary-phase cells of E. coli also maintain an excess of unprogrammed (non-translating) ribosomes (in this case in the form of 100S dimers of 70S ribosomes), which allow the cell to respond rapidly on relief of nutritional constraints (McCarthy, 1960; Feiss & DeMoss, 1965; Wada et al., 1990; for a review see Keener & Nomura, 1996).
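The 38 % bound can be checked with a short calculation. This assumes a single average elongation rate εaa for all ORFs and that, in balanced growth, the total rate of protein synthesis per cell (in amino acid residues h–1) is µ·mp(av)/m*aa. Summing equation 5A over all ORFs and using equation 6A then gives βR·nR(av)·εaa=µ·mp(av)/m*aa. The resulting ratio βR''/βR' is sketched below in Python. The function name is hypothetical. The day-60 inputs are the values quoted in this section, with the protein mass per cell taken as unchanged.

```python
def programmed_fraction_ratio(mu_ratio, protein_ratio, ribosome_ratio, eps_ratio=1.0):
    """Return beta''R/beta'R, the relative fraction of ribosomes translating mRNA.

    Assumes beta_R * n_R(av) * eps_aa = mu * m_p(av) / m*aa, with a single
    average elongation rate eps_aa for all ORFs.
    mu_ratio       -- mu''/mu'
    protein_ratio  -- m''p(av)/m'p(av)
    ribosome_ratio -- n''R(av)/n'R(av)
    eps_ratio      -- eps''aa/eps'aa (1.0 means equal elongation rates)
    """
    return mu_ratio * protein_ratio / (ribosome_ratio * eps_ratio)


# Day 60: mu'' <= 0.004 h-1 against mu' = 0.04 h-1, protein mass per cell
# taken as unchanged, ribosome content about 26 % of the day-0 value
print(round(programmed_fraction_ratio(0.004 / 0.04, 1.0, 0.26), 3))  # 0.385
```

Because µ'' is an upper bound, the result (about 0.38) is also an upper bound.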

In summary, the information presented in Table 7 provides both direct and indirect support for the theoretical framework, which is summarized by the equations shown in Table 2.

D. Review of the support for equation 11 (see Table 2)
The data presented in section A (see Fig. 1) confirm the proposal that equation 11a reduces to equation 11b when reference and experimental strains grow at the same specific growth rate. In this case the expression ratio is directly proportional to the ratio of the copy numbers n''c-p(i)/n'c-p(i) of the encoded protein, as was shown for ribosomal proteins.

A notable feature of section B is the number of different expression profiles that were found (see Fig. 2 and Table 5). This diversity indicates that the expression ratio reflects the function of the encoded protein in cell metabolism. The equations listed in Table 2 provide the theoretical basis for analysing an individual profile. The reference profiles I and II characterize genes encoding the p(j) and p(k) families of proteins, respectively. Eight genes, including five that encode proteins needed for DNA partition and synthesis, are represented by profile I. Sixty-four genes encoding ribosomal proteins, protein factors needed for protein synthesis and proteins needed for RNA synthesis are all represented by profile II. The equations underlying profiles I and II (see Table 2) were used to calculate changes in the properties of population-average cells during the course of cell culture (see Table 7). The trends reported in Table 7 are in accord with current knowledge of the growth of bacterial cells in batch culture. This outcome provides support for equation 11a and the equations derived from it. Table 7 is based on experimental data comprising expression ratios only.

A contrasting approach is that of Nie et al. (2006), who used multiple regression analysis to correlate measurements of mRNA abundance and protein abundance in Desulfovibrio vulgaris. They found only a modest correlation between these two quantities, which is largely explained by the technical problems they encountered. Microarray analysis was used to measure mRNA abundances. Tryptic digests of the protein fraction were separated by high-performance liquid chromatography, and the fractions were monitored by tandem mass spectrometry. No indication was given as to how these measurements might relate to other cell properties.

Concluding remarks
The results presented above complement the studies of Bacon et al. (2004), Betts et al. (2002), Kendall et al. (2004) and Voskuil et al. (2004) by extending the scope of the analysis of microarray data.

Many techniques, including microarray analysis, provide information about a population of cells. For this reason the concept of a population-average cell is both accurate and helpful. This concept provides the basis for developing a virtual schematic cell that summarizes the properties deduced from studies of cell cultures. In turn, the schematic cell can be an aid to designing new experiments.

In this report I have used the concept of a population-average cell to show that the expression ratio measured by microarray analysis is dependent not only on relative copy numbers of the encoded protein but also on several other cell parameters. However, when reference (for example wild-type) and experimental (for example mutant) strains grow at the same rate the expression ratio is equal to the ratio of the copy numbers of the encoded protein (the other terms cancel out).

The analysis of microarray data obtained when experimental and reference strains grew at different rates led to the recognition that in two cases the expression ratio is independent of the copy numbers of the encoded protein: namely genes encoding proteins (p(j)) whose copy numbers vary little with growth rate, and genes encoding proteins (p(k)) whose copy numbers are directly proportional to the number of ribosomes per cell. Proteins of the p(j) family are potentially useful as cell markers; for example, fluorescently labelled protein could be used to monitor both cell size and cell numbers.

At present few data are available for the copy numbers of proteins of the tubercle bacillus. The theoretical framework could therefore prove helpful in the study of metabolic pathways, because it offers the possibility of monitoring changes with growth rate in the copy numbers of proteins involved in a particular process or pathway. This proposal is illustrated by three cell processes that were analysed in this report: DNA replication, protein synthesis and RNA synthesis. This study also highlights the extra information that could be obtained by applying microarray studies to cultures of known growth rates and macromolecular compositions. Further understanding of metabolic processes gained in this way should lead to the identification of new drug targets and so facilitate drug discovery. Finally, the theoretical framework was developed for bacteria in general, and its application is not restricted to M. tuberculosis.


    APPENDIX
Correlation between the expression ratio r(i) and the number of copies nc-p(i) of protein p(i) per cell
Transcription/translation coupling.
Coupling between the processes of bacterial transcription and translation has long been accepted (Stent, 1964; Byrne et al., 1964; Miller et al., 1970). Transcription/translation coupling is thought to require that the rates of mRNA elongation (εmRNA nucleotides h–1) and peptide chain elongation (εaa amino acid residues h–1) are co-ordinated so that εmRNA(i)=3εaa(i), where the factor 3 reflects the number of nucleotides per codon. When ribosomes are translating a previously synthesized mRNA, the rate-limiting step in protein biosynthesis is the rate of interaction of tc (a ternary complex formed between aminoacyl-tRNA, elongation factor EF-Tu and GTP) with the A-site of a ribosome (Pape et al., 1998). For transcription and translation to be coupled, the rate of this rate-limiting step, d(tc·A-site)/dt amino acids h–1, must be equal to or faster than one-third of the elongation rate εmRNA nucleotides h–1 of nascent mRNA(i). Suppose that the maximum rate for peptide chain elongation is ε*aa(i), which is achieved when the formation of the peptide bond is rate limiting. Then, εmRNA(i)=3ε*aa(i). When εmRNA(i)<3ε*aa(i), the rate of transcription, by limiting the availability of codons, becomes the rate-limiting step in the synthesis of protein p(i).
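The rate-limiting logic of the coupling argument can be expressed as a minimum of two rates. The following is a minimal Python sketch of that reading. The function names and the example rates are hypothetical and are chosen only to show the two regimes. The sketch assumes that a ribosome following RNAP can neither read codons faster than they are made (εmRNA/3) nor exceed its own maximum rate ε*aa.

```python
def coupled_elongation_rate(eps_mrna, eps_aa_max):
    """Peptide chain elongation rate (amino acids h-1) for a ribosome that
    follows RNAP along a nascent transcript.

    eps_mrna   -- mRNA chain elongation rate (nucleotides h-1)
    eps_aa_max -- maximum peptide chain elongation rate eps*aa (amino acids h-1)
    The ribosome cannot read codons faster than they are made (eps_mrna / 3)
    or faster than its own maximum rate.
    """
    return min(eps_aa_max, eps_mrna / 3)


def transcription_limited(eps_mrna, eps_aa_max):
    """True when eps_mRNA < 3 * eps*aa, i.e. transcription sets the pace."""
    return eps_mrna < 3 * eps_aa_max


# Hypothetical rates, chosen only to illustrate the two regimes
print(coupled_elongation_rate(120_000, 54_000), transcription_limited(120_000, 54_000))
print(coupled_elongation_rate(180_000, 54_000), transcription_limited(180_000, 54_000))
```

In the first case transcription limits the rate to 40 000 amino acids h–1. In the second case the ribosome runs at its own maximum rate.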

Transcriptional control processes determine the rate of transcription of ORF(i). The maximum number of RNAPs per ORF(i) is determined by the size, ψ bp, of the region of DNA involved in the formation of an open complex, RNAPoc, between RNAP and the promoter to initiate transcription. The need for RNAPoc to move clear of the promoter restricts the maximum number of RNAPs per ORF(i) to one per ψ base pairs. DNase protection studies (Ozoline & Tsyganov, 1995) show that ψ ranges from 60 bp to approximately 80 bp for promoters that are active without a regulator and from 70 bp to 100 bp for regulated promoters.
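As a worked illustration of the "one RNAP per ψ base pairs" limit, the following Python sketch computes the upper bound as the integer part of ORF length/ψ. This treats RNAPs as packed end to end along the ORF, which is an assumption. The function name and the 1200 bp ORF are hypothetical.

```python
def max_rnaps_per_orf(orf_length_bp, psi_bp):
    """Upper bound on the number of RNAPs on ORF(i): one per psi base pairs."""
    if psi_bp <= 0:
        raise ValueError("psi must be positive")
    return orf_length_bp // psi_bp


# Hypothetical 1,200 bp ORF over the range of psi quoted above (60-100 bp)
for psi in (60, 80, 100):
    print(psi, max_rnaps_per_orf(1200, psi))
```

For this ORF the bound falls from 20 RNAPs (ψ=60 bp) to 12 RNAPs (ψ=100 bp).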

The process of translation is thought to protect mRNA from degradation by RNases (degradosomes) (Grunberg-Manago, 1999; Régnier & Arraiano, 2000; Leroy et al., 2002; Khodursky & Bernstein, 2003). It is proposed that the number, σ, of nucleotides protected by translation is related to the size of the region of mRNA protected by a ribosome and the number of nucleotides that form a binding site for a degradosome. The largest dimension of the bacterial 70S ribosome is 22.5±2.5 nm (Noller & Nomura, 1996), which is equivalent in length to a sequence of 60–75 nucleotides, allowing 0.34 nm (3.4 Å) per nucleotide. The maximum value of σ corresponds to the length of the region protected by a ribosome plus a region shorter than the number of nucleotides that form a binding site for a degradosome. For example, unprotected stretches of mRNA of 20 nucleotides or more were considered to be capable of binding degradosomes and, hence, to be very rapidly degraded. Thus, two regions of 10 nucleotides separated by the region covered by a ribosome may approach the limiting size for a sequence protected from RNase action, namely a sequence of 80–95 nucleotides. Kinetic studies of the mRNA fraction of E. coli summarized by Bremer & Dennis (1996) led to estimates for σ of 55–80 nucleotides per ribosome, depending on the growth rate. Thus, a value of σ=80±20 nucleotides per ribosome summarizes the several estimates.
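The arithmetic behind the estimate of σ can be reproduced with a few lines of Python. The constant and function names are hypothetical. The inputs are the 20–25 nm ribosome dimension, the 0.34 nm per nucleotide rise and two flanking regions of 10 nucleotides, as described above.

```python
NM_PER_NT = 0.34  # length per nucleotide assumed in the text (3.4 Angstrom)


def length_in_nucleotides(length_nm):
    """Convert a length along the mRNA (nm) into a number of nucleotides."""
    return length_nm / NM_PER_NT


# Largest dimension of the 70S ribosome: 22.5 +/- 2.5 nm
ribosome_low = length_in_nucleotides(20.0)
ribosome_high = length_in_nucleotides(25.0)

# Add two unprotected flanks of 10 nt each, just below the 20 nt
# taken to be enough for a degradosome to bind
flanks = 2 * 10
print(f"ribosome footprint: {ribosome_low:.0f}-{ribosome_high:.0f} nt")
print(f"protected unit: {ribosome_low + flanks:.0f}-{ribosome_high + flanks:.0f} nt")
```

The printed ranges, 59–74 and 79–94 nucleotides, round to the 60–75 and 80–95 nucleotides quoted in the text.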

Properties of a specified ORF (ORF(i)).
The properties of an ORF that is actively being transcribed can be related to the population-average cell as described below. A particular open reading frame ORF(i) encodes a particular protein p(i), which is present as nc-p(i) copies per population-average cell (nc-p(i)≥0). The length of p(i) is defined as laa(i) amino acid residues, so that the mass mp(i) of p(i) is the product of laa(i) and m*aa, the average mass of an amino acid residue. Hence, mp(av) is the sum, from i=1 to i=nORFs/g, of the terms m*aa·nc-p(i)·laa(i) (see equation 1A). The limits of the summation are omitted to simplify the presentation; nORFs/g is the number of ORFs per genome.

$$m_{p(\mathrm{av})}=\sum_{i} m^{*}_{aa}\,n_{c\text{-}p(i)}\,l_{aa(i)} \qquad (1\mathrm{A})$$
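The summation in equation 1A can be sketched in Python as a sum over a table of ORFs. The ORF names, copy numbers and lengths are hypothetical. The value of about 110 Da for the average mass of an amino acid residue is an approximate figure assumed here, not one given in the text.

```python
M_AA = 110.0  # approximate average mass of an amino acid residue, Da (assumed)


def protein_mass_per_cell(proteome, m_aa=M_AA):
    """Equation 1A: m_p(av) = sum over i of m*aa * n_c-p(i) * l_aa(i).

    proteome -- mapping of ORF name to (n_c-p(i), l_aa(i)); unexpressed
                ORFs simply have n_c-p(i) = 0
    """
    return sum(m_aa * n_copies * l_aa for n_copies, l_aa in proteome.values())


# Hypothetical three-ORF "genome"
proteome = {"orfA": (1000, 300), "orfB": (50, 450), "orfC": (0, 200)}
print(protein_mass_per_cell(proteome))  # 35475000.0 (Da per cell)
```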
Equation 1A allows ωp(av), the specific protein synthesis rate for a population-average cell, to be expressed in terms of a specified ORF. For exponential growth the term ωp(av) is defined by equation 2A, where µ h–1 is the specific growth rate:

$$\omega_{p(\mathrm{av})}=\frac{\mu\,m_{p(\mathrm{av})}}{m^{*}_{aa}} \qquad (2\mathrm{A})$$
Substitution for mp(av) in equation 2A leads to equation 3A:

$$\omega_{p(\mathrm{av})}=\sum_{i}\mu\,n_{c\text{-}p(i)}\,l_{aa(i)} \qquad (3\mathrm{A})$$
Thus, the specific protein synthesis rate ωp(av) is equal to the sum of the specific protein synthesis rates ωp(i). The parameter ωp(i) is thus defined by equation 4A:

$$\omega_{p(i)}=\mu\,n_{c\text{-}p(i)}\,l_{aa(i)} \qquad (4\mathrm{A})$$
The rate ωp(i) (see Fig. 1) is also equal to the product of nR(i), the number of ribosomes translating transcripts of ORF(i), and εaa(i), the rate (amino acids h–1) at which a ribosome elongates a peptide chain; see equation 5A:

$$\omega_{p(i)}=n_{R(i)}\,\varepsilon_{aa(i)} \qquad (5\mathrm{A})$$
The number of ribosomes per population-average cell directly involved in protein synthesis is the product βR·nR(av), where βR is the fraction of ribosomes programmed with mRNA and nR(av) is the number of ribosomes per population-average cell. The parameter nR(av) can also be expressed as a summation; see equation 6A:

$$n_{R(\mathrm{av})}=\frac{1}{\beta_{R}}\sum_{i} n_{R(i)} \qquad (6\mathrm{A})$$
The parameter nR(i) can be related to µ and εaa(i) by equating the right-hand sides of equations 4A and 5A, leading to equation 7A:

$$n_{R(i)}=\frac{\mu\,n_{c\text{-}p(i)}\,l_{aa(i)}}{\varepsilon_{aa(i)}} \qquad (7\mathrm{A})$$
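The sketch below assumes, as the units given for εaa(i) imply, that ωp(i) is expressed in amino acid residues h–1, so that equation 7A reads nR(i)=µ·nc-p(i)·laa(i)/εaa(i). Under that assumption, the following Python sketch evaluates nR(i) per ORF and sums the values as in equation 6A. The function names, ORF values and the elongation rate of 36 000 amino acids h–1 are hypothetical.

```python
def ribosomes_per_orf(mu, n_copies, l_aa, eps_aa):
    """Equation 7A: n_R(i) = mu * n_c-p(i) * l_aa(i) / eps_aa(i).

    mu       -- specific growth rate (h-1)
    n_copies -- copies of p(i) per population-average cell
    l_aa     -- length of p(i) (amino acid residues)
    eps_aa   -- peptide chain elongation rate (amino acids h-1)
    """
    return mu * n_copies * l_aa / eps_aa


def total_programmed_ribosomes(orfs, mu):
    """Equation 6A rearranged: beta_R * n_R(av) = sum over i of n_R(i)."""
    return sum(ribosomes_per_orf(mu, n, l, eps) for n, l, eps in orfs)


# Hypothetical ORFs: (n_c-p(i), l_aa(i), eps_aa(i) in amino acids h-1)
orfs = [(1000, 300, 36_000), (50, 450, 36_000)]
print(round(ribosomes_per_orf(0.04, 1000, 300, 36_000), 3))  # 0.333
print(round(total_programmed_ribosomes(orfs, 0.04), 3))      # 0.358
```

A value below 1 is a time average: on average, about one-third of a ribosome per cell is engaged with transcripts of the first ORF.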
The RNA fraction comprises three principal components, namely rRNA, tRNA and mRNA. In E. coli the proportions are 83 % rRNA, 15 % tRNA and 2 % mRNA (Bremer & Dennis, 1996). Thus, mRNA(av), the mass of RNA per population-average cell, is conveniently expressed as in equation 8A, as the sum of the masses of the individual components:

$$m_{\mathrm{RNA(av)}}=m_{\mathrm{rRNA(av)}}+m_{\mathrm{tRNA(av)}}+m_{\mathrm{mRNA(av)}} \qquad (8\mathrm{A})$$
The parameter mmRNA(av) can be expressed as the sum of the masses of all of the transcripts of all of the ORFs, ΣmmRNA(i), where mmRNA(i)>0 if ORF(i) is expressed and mmRNA(i)=0 if it is not; see equation 9A:

$$m_{\mathrm{mRNA(av)}}=\sum_{i} m_{\mathrm{mRNA}(i)} \qquad (9\mathrm{A})$$
The parameter mmRNA(i) can be evaluated on the basis of the assumption that RNases rapidly degrade mRNA that is unprotected by ribosomes. Suppose that each ribosome protects σ nucleotides from RNase action. Then nR(i) ribosomes translating mRNA(i) will protect σ·nR(i) nucleotides, and mmRNA(i) is given by equation 10A, where m*nuc is the average mass of a nucleotide:

$$m_{\mathrm{mRNA}(i)}=\sigma\,n_{R(i)}\,m^{*}_{nuc} \qquad (10\mathrm{A})$$
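Substituting for nR(i) in equation 10A leads to equation 11A:

$$m_{\mathrm{mRNA}(i)}=\frac{\sigma\,m^{*}_{nuc}\,\mu\,n_{c\text{-}p(i)}\,l_{aa(i)}}{\varepsilon_{aa(i)}} \qquad (11\mathrm{A})$$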
Equation 11A may be presented in an alternative form by defining a new parameter ntr(i), which is also a measure of the transcriptional activity of ORF(i). The number of mRNA(i) nucleotides per ORF(i) is mmRNA(i)/m*nuc, and the number of nucleotides per full-length transcript of the coding region is 3(laa(i)+1), there being three nucleotides per codon and one termination codon per ORF. The approximation 3(laa(i)+1)≈3laa(i) is used throughout without a significant loss of accuracy. Hence, ntr(i), the number of transcript equivalents, is defined by equation 12A:

$$n_{tr(i)}=\frac{m_{\mathrm{mRNA}(i)}}{3\,l_{aa(i)}\,m^{*}_{nuc}} \qquad (12\mathrm{A})$$
Hence, dividing both sides of equation 11A by 3laa(i)·m*nuc and applying equation 12A makes ntr(i) the subject (equation 13A):

$$n_{tr(i)}=\frac{\sigma\,\mu\,n_{c\text{-}p(i)}}{3\,\varepsilon_{aa(i)}} \qquad (13\mathrm{A})$$
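The Python sketch below checks that building ntr(i) step by step from nR(i)=µ·nc-p(i)·laa(i)/εaa(i), mmRNA(i)=σ·nR(i)·m*nuc and ntr(i)=mmRNA(i)/(3laa(i)·m*nuc) gives the same value as the closed form ntr(i)=σ·µ·nc-p(i)/(3εaa(i)). In other words, it confirms that laa(i) and m*nuc cancel. The function names and all numerical values are hypothetical, and the mass unit for m*nuc is arbitrary.

```python
import math


def transcript_equivalents(sigma, mu, n_copies, eps_aa):
    """Closed form: n_tr(i) = sigma * mu * n_c-p(i) / (3 * eps_aa(i))."""
    return sigma * mu * n_copies / (3 * eps_aa)


def transcript_equivalents_stepwise(sigma, mu, n_copies, l_aa, eps_aa, m_nuc=1.0):
    """Same quantity built up from the ribosome count and the protected mRNA mass."""
    n_r = mu * n_copies * l_aa / eps_aa     # ribosomes on transcripts of ORF(i)
    m_mrna = sigma * n_r * m_nuc            # mass of protected mRNA(i)
    return m_mrna / (3 * l_aa * m_nuc)      # full-length transcript equivalents


# Hypothetical values: sigma = 80 nt, mu = 0.04 h-1, 1000 copies, 36,000 aa h-1
direct = transcript_equivalents(80, 0.04, 1000, 36_000)
for l_aa in (100, 300, 900):  # the protein length cancels out
    assert math.isclose(direct, transcript_equivalents_stepwise(80, 0.04, 1000, l_aa, 36_000))
print(round(direct, 4))  # 0.0296
```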
Equations 11A, 12A and 13A need to be modified in order to reflect current experimental procedures. For example, in qRT-PCR experiments the quantity measured is regarded as the number of transcripts of ORF(i) per unit mass of RNA. Effectively, the measured quantity is ntr(i)/mRNA(av); this quotient is obtained by dividing both sides of equation 13A by mRNA(av) (see equation 14A):

$$\frac{n_{tr(i)}}{m_{\mathrm{RNA(av)}}}=\frac{\sigma\,\mu\,n_{c\text{-}p(i)}}{3\,\varepsilon_{aa(i)}\,m_{\mathrm{RNA(av)}}} \qquad (14\mathrm{A})$$
Note that equations 13A and 14A provide the basis for the quantitative analysis of qRT-PCR measurements. The parameter measured in microarray experiments is the ratio r of two cDNA species, labelled with different fluorophores, that compete for binding sites presented by DNA immobilized on a solid surface (see Fig. 1).

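Suppose that prime and double-prime superscripts denote the competing cDNA species and that the two cDNA species are prepared from equivalent masses of the RNA substrates. Hence, r is defined by equations 15A and 16A:

$$r=\frac{m''_{\mathrm{mRNA}(i)}/m''_{\mathrm{RNA(av)}}}{m'_{\mathrm{mRNA}(i)}/m'_{\mathrm{RNA(av)}}} \qquad (15\mathrm{A})$$

$$r=\frac{n''_{tr(i)}/m''_{\mathrm{RNA(av)}}}{n'_{tr(i)}/m'_{\mathrm{RNA(av)}}} \qquad (16\mathrm{A})$$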
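Equations 15A and 16A are equivalent because substitution for either mmRNA(i) or ntr(i), with σ taking the same value in both samples, leads to equation 17A:

$$r=\frac{n''_{c\text{-}p(i)}}{n'_{c\text{-}p(i)}}\cdot\frac{\mu''}{\mu'}\cdot\frac{\varepsilon'_{aa(i)}}{\varepsilon''_{aa(i)}}\cdot\frac{m'_{\mathrm{RNA(av)}}}{m''_{\mathrm{RNA(av)}}} \qquad (17\mathrm{A})$$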
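In the form r=(n''c-p(i)/n'c-p(i))·(µ''/µ')·(ε'aa(i)/ε''aa(i))·(m'RNA(av)/m''RNA(av)), with σ assumed equal in both samples, the expression ratio can be computed from two sets of cell parameters. The Python sketch below does this. The `SampleState` container, the function name and all numbers are hypothetical. The two usage cases illustrate the situations discussed in the main text. In the first, strains growing at the same rate give r equal to the copy-number ratio. In the second, a profile II protein whose copy number tracks the RNA content gives r equal to µ''/µ' when the elongation rates are equal.

```python
from dataclasses import dataclass


@dataclass
class SampleState:
    """Parameters of a population-average cell in one sample (hypothetical container)."""
    n_copies: float  # n_c-p(i): copies of p(i) per cell
    mu: float        # specific growth rate, h-1
    eps_aa: float    # peptide chain elongation rate for p(i), amino acids h-1
    m_rna: float     # RNA mass per cell, any consistent unit


def expression_ratio(exp: SampleState, ref: SampleState) -> float:
    """r with double prime = exp and prime = ref (sigma assumed equal in both)."""
    return ((exp.n_copies / ref.n_copies)
            * (exp.mu / ref.mu)
            * (ref.eps_aa / exp.eps_aa)
            * (ref.m_rna / exp.m_rna))


# Same growth rate and composition: r equals the copy-number ratio
ref = SampleState(n_copies=400, mu=0.04, eps_aa=36_000, m_rna=1.0)
mutant = SampleState(n_copies=1000, mu=0.04, eps_aa=36_000, m_rna=1.0)
print(expression_ratio(mutant, ref))  # 2.5

# Profile II protein whose copy number tracks RNA content: r reduces to mu''/mu'
slow = SampleState(n_copies=400 * 0.26, mu=0.004, eps_aa=36_000, m_rna=0.26)
print(round(expression_ratio(slow, ref), 3))  # 0.1
```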
Equations 15A–17A link the measured quantity r (see Fig. 1) with four cell parameters, namely nc-p(i), the number of copies of protein p(i) per population-average cell; µ, the specific growth rate; εaa(i), the polypeptide chain elongation rate; and mRNA(av), the mass of RNA per population-average cell.


    ACKNOWLEDGEMENTS
 
I thank Simon A. Cox for his help in the preparation of this manuscript, and my colleagues Dr Michael G. Sargent and Dr Maria J. Garcia of the Universidad Autonoma de Madrid for their constructive comments.

Edited by: S. V. Gordon


    REFERENCES
Bacon, J., James, B. W., Wernisch, L., Williams, A., Morley, K. A., Hatch, G. J., Mangan, J. A., Hinds, J., Stoker, N. G. & other authors (2004). The influence of reduced oxygen availability on pathogenicity and gene expression in Mycobacterium tuberculosis. Tuberculosis (Edinb) 84, 205–217.

Beste, D. J. V., Peters, J., Hooper, T., Avignone-Rossa, C., Bushell, M. E. & McFadden, J. (2005). Compiling a molecular inventory for Mycobacterium bovis BCG at two growth rates: evidence for growth rate-mediated regulation of ribosome biosynthesis and lipid metabolism. J Bacteriol 187, 1677–1684.

Betts, J. C., Lukey, P. T., Robb, L. C., McAdam, R. A. & Duncan, K. (2002). Evaluation of a nutrient starvation model of Mycobacterium tuberculosis persistence by gene and protein expression profiling. Mol Microbiol 43, 717–731.

Bowtell, D. & Sambrook, J. (2000). DNA Microarrays: a Molecular Cloning Manual. Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press.

Bremer, H. & Dennis, P. P. (1996). Modulation of chemical composition and other parameters of the cell by growth rate. In Escherichia coli and Salmonella: Cellular and Molecular Biology, 2nd edn, pp. 1553–1568. Edited by F. C. Neidhardt and others. Washington, DC: American Society for Microbiology Press.

Butcher, P. D. (2004). Microarrays for Mycobacterium tuberculosis. Tuberculosis (Edinb) 84, 131–137.

Byrne, R., Levin, J. G., Bladen, H. A. & Nirenberg, M. W. (1964). The in vitro formation of a DNA–ribosome complex. Proc Natl Acad Sci U S A 52, 140–148.

Cox, R. A. (2003). Correlation of the rate of protein synthesis and the third power of the RNA : protein ratio in Escherichia coli a