DNA C-Value Paradox

Introduction:

The quantity of DNA per cell, in all cells of an organism, is constant for a given species. Each of the millions of species on this planet has its own genome, whose size varies from one species to another. So far, no two species, however close in phylogeny, have been found to have genomes of exactly the same size; even where the sizes happen to match, some of the sequences differ. When the haploid genomic content is quantified by species, phylum or kingdom and arranged in increasing order of evolutionary complexity, one finds enormous variation, from 10^5 bp to 10^12 bp overall. Surprisingly, the variation in genomic content (both qualitative and quantitative) is also large within a single phylum, order, family or genus. Among animals, genome sizes vary more than 3,300-fold, and among land plants they differ by a factor of about 1,000. Protist genomes vary more than 300,000-fold in size. There is no simple relationship between the C-values of organisms and their complexity.

 

Origin of the term C:

What is the C value: does the C stand for content, complement, concentration or something else? And if it is content, is it the 2n or the n content? “Many authors have incorrectly assumed that the "C" in "C-value" refers to "characteristic", "content", or "complement". Even among authors who have attempted to trace the origin of the term, there has been some confusion because Hewson Swift did not define it explicitly when he coined it in 1950. In his original paper, Swift appeared to use the designation "1C value", "2C value", etc., in reference to "classes" of DNA content (e.g., Gregory 2001, 2002); however, Swift explained in personal correspondence to Prof. Michael D. Bennett in 1975 that "I am afraid the letter C stood for nothing more glamorous than 'constant', i.e., the amount of DNA that was characteristic of a particular genotype" (quoted in Bennett and Leitch 2005). This is in reference to the report in 1948 by Vendrely and Vendrely of a "remarkable constancy in the nuclear DNA content of all the cells in all the individuals within a given animal species" (translated from the original French). Swift's study of this topic related specifically to variation (or lack thereof) among chromosome sets in different cell types within individuals, but his notation evolved into "C-value" in reference to the haploid DNA content of individual species and retains this usage today”.

However, the discovery of large amounts of non-coding DNA led to the concept of the C-value paradox: the variation in DNA content is so vast that it is called a paradox or an enigma. The paradox lies between the C-value and the number of genes. The disjunction between the number of genes in an organism, such as the human, and the complexity of that organism is termed the G-value paradox. The two should be distinguished: the C-value paradox concerns the amount of DNA, whereas the G-value paradox concerns the number of genes. The I-value concerns the information contained in the G-value complement.

Species                C value (Mb)      G value (genes)

S. cerevisiae          12                6000

D. melanogaster        130               14000

C. elegans             97                19000

A. thaliana            125               26000

H. sapiens             2900              31000

 

 

 

 

 

 

 

 

 

C value: The amount of DNA found in the haploid genome, measured in millions of base pairs (Mb) or in picograms (pg); the C may stand for the constancy of the genome within a species.

G value: The number of genes found in the haploid genome; the number includes predicted genes and ORFs.

I value: The amount of information embedded in the genome; it estimates the effective number of genes, taking into account alternative splicing, post-translational modifications, multidomain proteins and gene redundancy, plus gene expression and gene interactions.
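The mismatch between the C value and the G value can be made concrete with a few lines of arithmetic. The sketch below uses the C and G values tabulated above and compares each species with yeast. The human gene count is entered as about 31,000, an early draft-genome estimate that should be treated as an assumption; the function names are hypothetical and chosen only for illustration.

```python
# C value (Mb) and G value (number of genes) per species, as tabulated above.
# Assumption: the human gene count is taken as ~31,000 (early draft estimate).
GENOMES = {
    "S. cerevisiae": (12, 6000),
    "D. melanogaster": (130, 14000),
    "C. elegans": (97, 19000),
    "A. thaliana": (125, 26000),
    "H. sapiens": (2900, 31000),
}


def genes_per_mb(c_value_mb, g_value):
    """Gene density: number of genes per megabase of haploid genome."""
    return g_value / c_value_mb


def fold_over_reference(genomes, reference):
    """Return {species: (C-value fold, G-value fold)} relative to a reference."""
    ref_c, ref_g = genomes[reference]
    return {name: (c / ref_c, g / ref_g) for name, (c, g) in genomes.items()}


if __name__ == "__main__":
    folds = fold_over_reference(GENOMES, "S. cerevisiae")
    for name, (c, g) in GENOMES.items():
        c_fold, g_fold = folds[name]
        print(f"{name:16s} {genes_per_mb(c, g):7.1f} genes/Mb   "
              f"C x{c_fold:6.1f}   G x{g_fold:4.1f}")
```

With these figures the human genome holds over two hundred times as much DNA as the yeast genome but only about five times as many genes, which is the disproportion the two paradoxes describe.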

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

Since genomes and their organisms are very complex, one research strategy is to reduce the number of genes in a genome to the bare minimum that still allows the organism in question to survive. Experimental work is being done on minimal genomes for single-celled organisms as well as for multicellular organisms. The work is both in vivo and in silico.

“Genome sizes are typically given as gametic nuclear haploid DNA contents (‘C-values’) either in units of mass (picograms, where 1 pg = 10^-12 g) or in number of base pairs (in eukaryotes, most often in megabases, where 1 Mb = 10^6 bases). These are directly interconvertible as 1 pg = 978 Mb (or 1 Mb = 1.022 × 10^-3 pg)”. The C-value varies 8,000-fold among eukaryote genomes.
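The interconversion quoted above is simple enough to express as a small helper. This is only a sketch: the factor of 978 Mb per pg comes from the text, while the function names are hypothetical. The two example masses (about 3 pg for Homo sapiens and 152.23 pg for Paris japonica) are the figures given elsewhere in this article.

```python
# Conversion factor quoted in the text: 1 pg of DNA = 978 Mb.
MB_PER_PG = 978.0


def pg_to_mb(picograms):
    """Convert a DNA mass in picograms to megabases."""
    return picograms * MB_PER_PG


def mb_to_pg(megabases):
    """Convert a genome size in megabases to picograms."""
    return megabases / MB_PER_PG


def pg_to_bp(picograms):
    """Convert a DNA mass in picograms to base pairs (1 Mb = 10^6 bp)."""
    return pg_to_mb(picograms) * 1e6


if __name__ == "__main__":
    for name, pg in [("Homo sapiens", 3.0), ("Paris japonica", 152.23)]:
        print(f"{name}: {pg} pg = {pg_to_mb(pg):.0f} Mb = {pg_to_bp(pg):.3e} bp")
```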

Genomic Contents for Comparison:

Mycoplasma                    10^5 to 10^6 bp

Gram-negative bacteria        5x10^6 bp

Gram-positive bacteria        2x10^6 to 8x10^6 bp

Fungi                         2x10^7 to 5x10^7 bp

Algae                         5x10^7 to 8x10^7 bp

Molds                         6x10^7 to 9x10^7 bp

Worms                         7x10^7 to 2x10^8 bp

Molluscs                      6x10^8 to 7x10^9 bp

Insects                       1.5x10^8 to 6x10^9 bp

Echinoderms                   5x10^8 to 5x10^9 bp

Cartilaginous fishes          3x10^9 to 8x10^9 bp

Bony fishes                   6x10^8 to 9x10^9 bp

Amphibians                    8x10^8 to 9x10^11 bp

Reptiles                      2x10^9 to 5x10^9 bp

Birds                         2x10^9 to 9x10^9 bp

Mammals                       3x10^9 to 5x10^9 bp

Flowering plants              8x10^8 to 2x10^12 bp

Fritillaria assyriaca         132 pg x 978 Mb/pg = 1.29x10^5 Mb = 1.29x10^11 bp

Paris japonica                152 pg x 978 Mb/pg = 1.49x10^5 Mb = 1.49x10^11 bp
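To see how uneven the spread is from group to group, the ranges listed above can be turned into fold differences (largest genome divided by smallest within each group). The sketch below copies a subset of the approximate ranges from the list; the names `GENOME_RANGES_BP`, `fold_variation` and `rank_by_variation` are hypothetical.

```python
# Approximate haploid genome size ranges in bp, copied from the list above.
GENOME_RANGES_BP = {
    "Fungi": (2e7, 5e7),
    "Insects": (1.5e8, 6e9),
    "Bony fishes": (6e8, 9e9),
    "Amphibians": (8e8, 9e11),
    "Reptiles": (2e9, 5e9),
    "Birds": (2e9, 9e9),
    "Mammals": (3e9, 5e9),
    "Flowering plants": (8e8, 2e12),
}


def fold_variation(low_bp, high_bp):
    """Ratio of the largest to the smallest genome in a group."""
    return high_bp / low_bp


def rank_by_variation(ranges):
    """Return (group, fold) pairs sorted from most to least variable."""
    folds = {group: fold_variation(lo, hi) for group, (lo, hi) in ranges.items()}
    return sorted(folds.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for group, fold in rank_by_variation(GENOME_RANGES_BP):
        print(f"{group:18s} {fold:8.1f}-fold")
```

On these figures mammals span less than a two-fold range, while amphibians and flowering plants each span more than a thousand-fold, even though body plans within each group are broadly similar.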

 

Paradox:

If one scans organisms from the simplest unicellular forms to the most complex, such as humans and some flowering plants, one finds genome sizes ranging from 10^5 to 10^12 bp (the upper end being plants). Consider a simple amphibian and a mammal, i.e. a human being: the genome size in both is almost the same, i.e. about 10^9 bp. Another example makes one wonder why there is so much difference between a fruit fly (1.4x10^8 bp) and a house fly (8x10^8 bp), when the body pattern is more or less the same apart from body size. If one compares the genome size of the amphibian X. laevis (10^9 bp) with that of a lungfish (10^11 bp), the difference is huge. There are many such anomalies and paradoxes that are difficult to explain.

Among plants, Arabidopsis thaliana, an angiosperm (dicot) with 5 haploid chromosomes, is known to have one of the smallest genomes, whereas Psilotum nudum, regarded as one of the earliest vascular land plants, surprisingly contains 2.5x10^11 bp, roughly 2,000 times the amount in Arabidopsis thaliana (about 1.25x10^8 bp). The reason is that more than 80% of the Psilotum genome consists of what is now considered repetitive DNA. Similarly, some amphibians and lungfishes contain 30 or more times as much DNA as related species.

Among eukaryotes, one of the smallest genomes is found in Encephalitozoon intestinalis, a parasite of humans, which contains 0.0023 pg. Among plants, Fritillaria assyriaca (132.52 pg) was once considered to contain the largest genome, as was the hybrid Trillium × hagae, but Paris japonica has beaten them with a genome size of 152.23 pg (1 pg = 978 Mb), about 15% bigger than that of the Trillium and bigger than that of the marbled lungfish Protopterus aethiopicus (132.83 pg). In comparison, Homo sapiens contains about 3 pg.

 

 

Logically, one would expect that as the body of an organism becomes more complex, with more cells, tissues and organs, the size of the genome should increase in step with that complexity. But within a group whose members differ little in body pattern, size, function and efficiency, one is baffled to find large differences in genomic content. Such big differences are very difficult to comprehend and explain, so this puzzle is called the C-value paradox.

 

Figure: Paris japonica (Kinugasasou) in Hakusan, 2003.

To explain this anomaly and solve the jigsaw puzzle, one first has to ask: what is the total number of genes absolutely essential for a structure to be called living, that is, one that grows, reproduces, survives under favourable conditions and is able to produce a population?

How many genes does it take to make an organism? Craig Venter:

“The scientists at The Institute for Genomic Research (now known as the J. Craig Venter Institute) who determined the Mycoplasma genitalium sequence have followed this work by systematically destroying its genes (by mutating them with insertions) to see which ones are essential to life and which are dispensable. Of the 485 protein-encoding genes, they have concluded that only 381 of them are essential to life”. The work is remarkable.

One would expect that more cell types, and more complex cell types, require more genes. Furthermore, with the acquisition of more functions, regulatory mechanisms, adaptations and various other devices in the mode of living and reproduction, one generally expects species to have acquired more genes and hence larger genomes. Besides, many DNA segments have been duplicated many thousands of times, and added to this increase are viral DNAs and RNA-mediated DNAs that have been inserted and duplicated. Thus more than 45% of the genomic DNA is found to be transposable elements or repetitive DNA in one form or another. Recent genomic analysis shows that protein-coding regions account for only 1 to 1.2% of the whole genome.

 

This figure gives an idea of genome sizes from lower to higher orders of organisms. Comparing the size of each genome with the number of genes estimated to be present in it reveals the paradox: some genomes have far more DNA than their genes require. Why is such extra DNA present, and what does it do?

 

Given the above, that is, the variation in genome size within a group or between closely related groups, it is not possible to explain why organisms have so much more DNA than can be accounted for. Has there been large-scale duplication of genes into multiple copies? Is there DNA that simply exists without any function? Is it possible that during evolution many genes were duplicated and some lost their functional abilities and now exist as molecular fossils? Is there any possibility of promiscuous en masse transfer of genomic DNA from other organisms? There are innumerable cases, especially in plants, where duplication of the haploid chromosome set has led to diploidy, triploidy and polyploidy, either by autopolyploidy or by allopolyploidy. The genome size has thereby increased many fold, which has created variation in size and also new varieties or, in extreme cases, new species.

 

 

 

It is important to solve this puzzle by assessing how much DNA is actually used for coding, how much is redundant or duplicated, and how many copies of each such sequence exist in a given genome. How much of the DNA has no function, what is the size of these components, and how many copies of each segment are present in a given genome? If some DNA has no function or cannot be accounted for, what is it doing, and where is it located within the genome? Techniques are now available to determine the quality and quantity of protein-coding genes and of the genes coding for ribosomal RNA, tRNA and other small RNAs. It is possible to quantify the total DNA that codes for these components and also to determine the DNA that does not code for any function. It is also possible to find the copy number of each of these categories of DNA. Likewise, one can estimate the number of genes expressed in a tissue-specific manner and the number expressed as housekeeping genes. Whether genes are expressed as housekeeping genes, in a tissue-specific manner, or in response to age, stage of growth and development, or external stimuli, all of these can be determined qualitatively and quantitatively.

 

 

 

During the course of evolution, different components have been added to the existing genome by various means, so the genome size has increased, sometimes abnormally. This increase in genome size is used for various features such as increased metabolism, cell number and body size, organ complexity and developmental complexity.

Solving the Paradox:

The techniques used are DNA:DNA hybridization, DNA:RNA hybridization and DNA microarrays (HAD). Hybridization techniques can be used to study reassociation kinetics, which provides methods for quantifying each kind of DNA or RNA. The methodology is simple: the DNA is fragmented to the required size, and the fragments are heated to their melting temperature (Tm), i.e. 90°C or more, in a defined solution at a particular pH. When the fragments have melted and the strands have separated completely, they are allowed to renature (hybridize or anneal) at a temperature 25°C below the Tm. The temperature at which dsDNA strands melt is called the Tm (melting temperature); it varies from one kind of DNA to another and depends on the GC or AT content. DNA with a higher GC content melts at a higher temperature than DNA richer in AT.
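The dependence of Tm on GC content, and the choice of a reassociation temperature 25°C below it, can be illustrated numerically. The sketch below is an illustration only: it uses the empirical Marmur-Doty relation, Tm = 69.3 + 0.41 x (%GC), which is not given in this article and is assumed here to apply to long DNA in standard saline citrate; the function names are hypothetical.

```python
def gc_percent(sequence):
    """Percentage of G and C bases in a DNA sequence string."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc_count = sum(1 for base in seq if base in "GC")
    return 100.0 * gc_count / len(seq)


def tm_marmur_doty(gc_pct):
    """Approximate Tm in deg C from %GC (assumed Marmur-Doty relation,
    intended for long DNA in standard saline citrate, not short oligos)."""
    return 69.3 + 0.41 * gc_pct


def reassociation_temperature(tm_celsius, offset=25.0):
    """Annealing temperature, taken as Tm minus 25 deg C as in the text."""
    return tm_celsius - offset


if __name__ == "__main__":
    for gc in (30, 40, 50, 60, 70):
        tm = tm_marmur_doty(gc)
        print(f"{gc}% GC: Tm ~ {tm:.1f} C, reanneal at ~ "
              f"{reassociation_temperature(tm):.1f} C")
```

Under this assumed relation, DNA with 50% GC melts at roughly 90°C, in line with the figure quoted above, and each additional 10% of GC raises the Tm by about 4°C.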

 

Free nucleotides can be quantified with a spectrophotometer at the UV wavelength of 260 nm: because of their heterocyclic rings and double bonds, nucleotides absorb light, a property termed chromicity. The same amount of nucleotides in the form of double-stranded polynucleotide chains, such as dsDNA or DNA-RNA hybrids, shows 40% less absorption; this is called hypochromicity. Conversely, if dsDNA is melted into single-stranded DNA, its OD increases by 40%, which is called hyperchromicity. One OD unit of dsDNA at 260 nm corresponds to 50 µg/ml, whereas one OD unit of ssDNA corresponds to 40 µg/ml. This is because in dsDNA not all the nucleotides are exposed or oriented for the absorption of light; some are hidden from the light path because the DNA is in a coiled state. RNA, being single-stranded, has all of its nucleotides exposed to light, so it shows a higher OD, i.e. it is hyperchromic. This phenomenon can be exploited to quantify ssDNA and dsDNA in dissociation and reassociation experiments, and it can also be used to study the kinetics of reassociation, which provides valuable data.
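The absorbance-to-concentration relationship described above amounts to a single multiplication. The following sketch uses the conversion factors exactly as stated in the text (50 µg/ml per OD unit for dsDNA, 40 µg/ml for ssDNA); the function names and the `dilution_factor` parameter are hypothetical additions for illustration.

```python
# Conversion factors as stated in the text (ug/ml per OD unit at 260 nm).
UG_PER_ML_PER_OD = {"dsDNA": 50.0, "ssDNA": 40.0}


def concentration_ug_per_ml(a260, kind="dsDNA", dilution_factor=1.0):
    """Estimate nucleic acid concentration from absorbance at 260 nm."""
    if a260 < 0:
        raise ValueError("absorbance cannot be negative")
    return a260 * UG_PER_ML_PER_OD[kind] * dilution_factor


def hyperchromic_increase(a260_native, a260_denatured):
    """Fractional rise in A260 when double-stranded DNA is melted."""
    return (a260_denatured - a260_native) / a260_native


if __name__ == "__main__":
    # Hypothetical readings, for illustration only.
    print(concentration_ug_per_ml(0.5))                      # dsDNA, undiluted
    print(concentration_ug_per_ml(0.2, dilution_factor=10))  # 1:10 dilution
    print(f"{hyperchromic_increase(1.0, 1.4):.0%} increase on melting")
```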

Melting Point of DNA:

 

 

Curve shows the Tm at which melting starts

 

 

 

                                                           

This Tm curve shows, above and below, how and at what temperature the ds DNA melts into single stranded DNA

 

           

 

 

When a given DNA in an ionic solution of a specific pH is heated, it melts into ssDNA slowly at first and then rapidly as the temperature is raised. The temperature at which it melts completely is called the melting point of the DNA, but the term Tm is defined as the temperature at which half of the DNA is melted. The Tm varies with the G+C content: if the G+C content exceeds the A+T content, the Tm is higher, and the reverse is true of A+T-rich DNA.

 

                                                                        Tm curve
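The definition of Tm as the temperature at which half of the DNA is melted can be applied directly to absorbance readings from a melting experiment. The sketch below converts each A260 reading to a fraction melted and finds the half-melted point by linear interpolation. The readings are hypothetical values invented for illustration, the lowest and highest readings are assumed to represent fully double-stranded and fully single-stranded DNA, and the function names are not from any particular library.

```python
def fraction_melted(absorbance, a_double, a_single):
    """Fraction of DNA that is single-stranded at a given A260 reading."""
    return (absorbance - a_double) / (a_single - a_double)


def estimate_tm(temperatures, absorbances):
    """Temperature at which half the DNA is melted (linear interpolation)."""
    a_double = min(absorbances)  # assumed fully double-stranded baseline
    a_single = max(absorbances)  # assumed fully melted plateau
    if a_single == a_double:
        raise ValueError("absorbance does not change; no melting observed")
    fractions = [fraction_melted(a, a_double, a_single) for a in absorbances]
    for i in range(1, len(fractions)):
        f0, f1 = fractions[i - 1], fractions[i]
        if f0 < 0.5 <= f1:
            t0, t1 = temperatures[i - 1], temperatures[i]
            return t0 + (0.5 - f0) * (t1 - t0) / (f1 - f0)
    raise ValueError("curve never crosses the half-melted point")


# Hypothetical melting-curve readings (temperature in deg C, A260).
TEMPS_C = [60, 70, 75, 80, 85, 90, 95]
A260 = [1.00, 1.01, 1.05, 1.15, 1.30, 1.38, 1.40]

if __name__ == "__main__":
    print(f"Estimated Tm: {estimate_tm(TEMPS_C, A260):.1f} C")
```

In this made-up data set melting begins near 70°C and is complete by 95°C, while the Tm falls between 80 and 85°C, which illustrates that the Tm is the midpoint of the transition and not its start or its end.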