DNA_C_Value_Paradox1
Introduction:
The quantity of DNA per cell is constant in all cells of an organism and is characteristic of the species. Each of the millions of species on this planet has its own genome, whose size varies from one species to another. So far, no two species, however close in phylogeny, have been found to have genomes of exactly the same size; even where the sizes are similar, some of the sequences differ. When haploid genome content is compared species by species, phylum by phylum, or kingdom by kingdom in order of evolutionary position, one finds enormous variation, from about 10^5 bp to 10^12 bp. Surprisingly, the variation in genomic content (both qualitative and quantitative) is large even within a single phylum, order, family or genus. Genome sizes in animals range more than 3,300-fold, and those of land plants differ by a factor of about 1,000. Protist genomes vary more than 300,000-fold in size. There is no simple relationship between the C-value of an organism and its complexity.
Species | C value (Mb) | G value (genes)
S. cerevisiae | 12 | 6000
D. melanogaster | 130 | 14000
C. elegans | 97 | 19000
A. thaliana | 125 | 26000
H. sapiens | 2900 | 31000
G value: the number of genes found in the haploid genome; the number includes predicted genes and ORFs.
I value: the amount of information embedded in the genome; it estimates the effective number of genes, taking into account alternative splicing, post-translational modifications, multidomain proteins and gene redundancy, as well as gene expression and gene interactions.
Since genomes and their organisms are very complex, one research strategy is to reduce the number of genes in a genome to the bare minimum and still have the organism in question survive. There is experimental work being done on minimal genomes for single cell organisms as well as minimal genomes for multicellular organisms (see Developmental biology). The work is both in vivo and in silico.
“Genome sizes are typically given as gametic nuclear haploid DNA contents (‘C-values’) either in units of mass (picograms, where 1 pg = 10^-12 g) or in number of base pairs (in eukaryotes, most often in megabases, where 1 Mb = 10^6 bases). These are directly interconvertible as 1 pg = 978 Mb (or 1 Mb = 1.022 × 10^-3 pg)”. The C-value varies about 8,000-fold among eukaryotic genomes.
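The interconversion quoted above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the factor of 978 Mb per pg is the one given in the text, and the two example masses (3 pg for Homo sapiens, 152.23 pg for Paris japonica) are the values quoted elsewhere in this article.

```python
# Conversion factor quoted in the text: 1 pg of DNA = 978 Mb.
MB_PER_PG = 978.0


def pg_to_mb(picograms: float) -> float:
    """Convert a C-value in picograms to megabases."""
    return picograms * MB_PER_PG


def mb_to_pg(megabases: float) -> float:
    """Convert a genome size in megabases to picograms."""
    return megabases / MB_PER_PG


def pg_to_bp(picograms: float) -> float:
    """Convert a C-value in picograms to base pairs (1 Mb = 10^6 bases)."""
    return pg_to_mb(picograms) * 1e6


if __name__ == "__main__":
    # Example masses are the ones quoted in this article.
    for name, pg in [("Homo sapiens", 3.0), ("Paris japonica", 152.23)]:
        print(f"{name}: {pg} pg = {pg_to_mb(pg):,.0f} Mb = {pg_to_bp(pg):.3e} bp")
```

Keeping the factor in a single named constant makes it easy to substitute a different value if another source uses a slightly different conversion.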
Genomic Contents for Comparison:
Mycoplasma 10^5 to 10^6 bp
Gram-negative bacteria 5x10^6 bp
Gram-positive bacteria 2x10^6 to 8x10^6 bp
Fungi 2x10^7 to 5x10^7 bp
Algae 5x10^7 to 8x10^7 bp
Molds 6x10^7 to 9x10^7 bp
Worms 7x10^7 to 2x10^8 bp
Molluscs 6x10^8 to 7x10^9 bp
Insects 1.5x10^8 to 6x10^9 bp
Echinoderms 5x10^8 to 5x10^9 bp
Cartilaginous fishes 3x10^9 to 8x10^9 bp
Bony fishes 6x10^8 to 9x10^9 bp
Amphibians 8x10^8 to 9x10^11 bp
Reptiles 2x10^9 to 5x10^9 bp
Birds 2x10^9 to 9x10^9 bp
Mammals 3x10^9 to 5x10^9 bp
Flowering plants 8x10^8 to 2x10^12 bp
Fritillaria assyriaca: 132 pg x 978 Mbp = 1.29x10^5 Mbp = 1.29x10^11 bp
Paris japonica: 152 pg x 978 Mbp = 1.487x10^5 Mbp = 1.487x10^11 bp
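To see how unevenly genome size varies within groups, the ranges listed above can be turned into fold differences (largest divided by smallest). The following Python sketch is illustrative only: it copies a subset of the approximate ranges from the list, and the names RANGES_BP, fold_range and rank_by_fold are hypothetical.

```python
# Approximate genome-size ranges in bp, copied from the list above.
RANGES_BP = {
    "Insects": (1.5e8, 6e9),
    "Bony fishes": (6e8, 9e9),
    "Amphibians": (8e8, 9e11),
    "Reptiles": (2e9, 5e9),
    "Birds": (2e9, 9e9),
    "Mammals": (3e9, 5e9),
    "Flowering plants": (8e8, 2e12),
}


def fold_range(low: float, high: float) -> float:
    """Return how many times larger the biggest genome is than the smallest."""
    if low <= 0 or high < low:
        raise ValueError("expected 0 < low <= high")
    return high / low


def rank_by_fold(ranges: dict) -> list:
    """Sort groups from the widest to the narrowest fold range."""
    folds = [(name, fold_range(lo, hi)) for name, (lo, hi) in ranges.items()]
    return sorted(folds, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, fold in rank_by_fold(RANGES_BP):
        print(f"{name:<18} {fold:>8.1f}-fold")
```

On these figures, groups such as mammals, birds and reptiles vary only a few fold, whereas amphibians and flowering plants vary by three orders of magnitude, which is the within-group variation the paradox refers to.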
Paradox:
If one scans organisms from the simplest unicellular forms to the most complex, such as humans and some flowering plants, one finds genome sizes ranging from 10^5 to 10^12 bp (plants). Consider a simple amphibian and a mammal such as a human being: the genome size in both is almost the same, about 10^9 bp. Another example makes one wonder why there is so much difference between a fruit fly (1.4x10^8 bp) and a house fly (8x10^8 bp), whose body plans are more or less the same apart from body size. If one compares the genome of the amphibian X. laevis (10^9 bp) with that of a lungfish (10^11 bp), the difference is huge. There are many such anomalies and paradoxes that are difficult to explain. Among plants, Arabidopsis thaliana, an angiosperm (dicot) with 5 haploid chromosomes, is known for one of the smallest genomes, whereas Psilotum nudum, regarded as one of the first vascular land plants, surprisingly has 2.5x10^11 bp, roughly 2,000 times the amount in Arabidopsis thaliana (taking 125 Mb for Arabidopsis, as in the table above). The reason is that more than 80% of the Psilotum genome consists of what is now considered repetitive DNA. Similarly, some amphibians and lungfishes contain 30 or more times as much DNA as related species. One of the smallest known eukaryotic genomes is that of Encephalitozoon intestinalis, a parasite of humans, at 0.0023 pg. Among plants, Fritillaria assyriaca and the hybrid Trillium × hagae (132.52 pg) were once considered to have the largest genomes, but Paris japonica has overtaken them with a genome of 152.23 pg (1 pg = 978 Mbp), about 15% larger than these or than that of the marbled lungfish Protopterus aethiopicus (132.83 pg). By comparison, Homo sapiens has about 3 pg.
Logically, as the body of an organism becomes more complex, with more cells, tissues and organs, one would expect genome size to increase in step with that complexity. Yet within a group whose members differ little in body plan, size, function or efficiency, one is baffled to find large differences in genomic content. Such differences are very difficult to comprehend or explain, and this puzzle is called the C-value paradox.
Paris japonica
“The scientists at The Institute for Genomic Research (now known as the J. Craig Venter Institute) who determined the Mycoplasma genitalium sequence have followed this work by systematically destroying its genes (by mutating them with insertions) to see which ones are essential to life and which are dispensable. Of the 485 protein-encoding genes, they have concluded that only 381 of them are essential to life”. The work is remarkable.
One would expect that more cell types and greater cellular complexity require more genes. Furthermore, with the acquisition of more functions, regulatory mechanisms, adaptations and other devices in the mode of living and reproduction, one generally expects species to have acquired more genes, and so larger genomes. In addition, many DNA segments have been duplicated many thousands of times, and viral DNA or RNA-derived DNA has been inserted and duplicated, adding to the increase. Thus, in the human genome for example, more than 45% of the genomic DNA is found to be transposable elements or repetitive DNA in one form or another. Recent genomic analyses show that protein-coding regions account for only 1 to 1.2% of the whole genome.


This figure gives an idea of genome sizes from lower to higher orders of organisms. Comparing the size of each genome with the number of genes estimated to be present in it reveals the paradox: some genomes have far more DNA than appears to be required. Why is this extra DNA present, and what does it do?
Given the variation in genome size within a group or between closely related groups, it is not possible to explain why organisms have so much more DNA than can be accounted for. Has there been large-scale duplication of genes into multiple copies? Is there DNA that simply exists without any function? Is it possible that during evolution many genes were duplicated and some lost their function and now exist as molecular fossils? Is there any possibility of promiscuous en masse transfer of genomic material from other organisms? There are innumerable cases, especially among plants, where duplication of the haploid chromosome set has led to diploidy, triploidy and polyploidy, either by autopolyploidy or by allopolyploidy (heteropolyploidy). The genome size has thereby increased many-fold, which has also created variation in size and produced new varieties or, in extreme cases, new species.
It is important to solve this puzzle by assessing how much DNA is actually used for coding, how much is redundant or duplicated, and how many copies of each sequence exist in a given genome. How much of the DNA has no function, what is the size of these components, and how many copies of each segment are present? If some DNA has no function or cannot be accounted for, what is it doing, and where is it located within the genome? Techniques are now available to determine the quality and quantity of protein-coding genes and of genes coding for ribosomal RNA, tRNA and other small RNAs. It is possible to quantify the total DNA that codes for these components and also to identify the DNA that does not code for any function. It is also possible to determine the copy number of each of these categories of DNA. Likewise, one can estimate the number of genes expressed in a tissue-specific manner and the number expressed as housekeeping genes. Whether genes are expressed as housekeeping genes, in a tissue-specific manner, or in response to age, stage of growth and development, or stimuli, all of these can be determined qualitatively and quantitatively.

During the course of evolution, different components have been added to the existing genome by various means, so genome size has increased, sometimes abnormally. This increase in genome size has been linked to various features such as increased metabolism, cell number and body size, organ complexity and developmental complexity.
Solving the Paradox:
The techniques used are DNA:DNA hybridization, DNA:RNA hybridization and DNA microarrays (HAD). Hybridization techniques can be used to study reassociation kinetics, which provide a way to quantify each kind of DNA or RNA. The method is simple: the DNA is sheared into fragments of the required size, and the fragments are heated to or above the melting temperature (about 90°C) in a defined solution at a particular pH. Once the fragments have melted and the strands have separated completely, they are allowed to renature (hybridize or anneal) at a temperature about 25°C below the melting temperature (Tm). The Tm is the temperature at which dsDNA strands melt; it varies from one kind of DNA to another and depends on the GC or AT content. DNA with a higher GC content melts at a higher temperature than AT-rich DNA.
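The dependence of Tm on GC content, and the choice of a renaturation temperature about 25°C below Tm, can be illustrated with a minimal Python sketch. It assumes the empirical relation Tm = 69.3 + 0.41 x (%G+C), which is often quoted for long DNA in standard saline citrate; this formula is not given in the text above, and real values shift with salt concentration and fragment length. All function names are hypothetical.

```python
def gc_percent(sequence: str) -> float:
    """Return the G+C content of a DNA sequence as a percentage of its length."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence is empty")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / len(seq)


def estimate_tm_celsius(gc_pct: float) -> float:
    """Estimate Tm from %GC.

    Assumption: empirical relation Tm = 69.3 + 0.41 * (%GC) for long DNA
    in standard saline citrate; not taken from the article itself.
    """
    return 69.3 + 0.41 * gc_pct


def reassociation_temp_celsius(tm: float) -> float:
    """Renaturation temperature about 25 degrees C below Tm, as described above."""
    return tm - 25.0


if __name__ == "__main__":
    # Two made-up fragments, one AT-rich and one GC-rich.
    for label, seq in [("AT-rich", "ATATTAGCATTA"), ("GC-rich", "GCGCCGATGCGC")]:
        tm = estimate_tm_celsius(gc_percent(seq))
        print(f"{label}: GC = {gc_percent(seq):.1f}%, Tm ~ {tm:.1f} C, "
              f"anneal at ~ {reassociation_temp_celsius(tm):.1f} C")
```

Under this assumed relation, a DNA with 50% G+C has an estimated Tm close to 90°C, in line with the figure mentioned above.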
Because of their heterocyclic rings and double bonds, nucleotides absorb UV light, so free nucleotides can be quantified in a spectrophotometer at a wavelength of 260 nm; this property is termed chromicity. The same amount of nucleotides in the form of polynucleotide chains, such as dsDNA or DNA-RNA hybrids, shows about 40% less absorption, which is called hypochromicity. Conversely, when dsDNA is melted into single-stranded DNA, its OD increases by about 40%, which is called hyperchromicity. One OD unit at 260 nm corresponds to about 50 µg/ml of dsDNA but only about 40 µg/ml of ssDNA. This is because in dsDNA not all the nucleotides are exposed or oriented to absorb light; some are hidden from the light path because the DNA is in a coiled state. RNA, being single-stranded, has all of its nucleotides exposed to light, so it shows a higher OD, that is, it is hyperchromic. This phenomenon can be exploited to quantify ssDNA and dsDNA in dissociation and reassociation experiments, and to study the kinetics of reassociation, which provides valuable data.
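As a rough illustration of how these absorbance figures are used, the Python sketch below converts an OD reading at 260 nm into a concentration using the factors quoted above (50 µg/ml per OD unit for dsDNA, 40 µg/ml for ssDNA), and estimates the fraction of a sample that is single-stranded from where its absorbance lies between the fully double-stranded and fully melted readings. The function names and the linear interpolation are assumptions made for the example, not a method stated in the text.

```python
# Concentration per OD unit at 260 nm, as quoted in the text (ug/ml).
UG_PER_ML_PER_OD = {"dsDNA": 50.0, "ssDNA": 40.0}


def concentration_ug_per_ml(a260: float, kind: str = "dsDNA",
                            dilution_factor: float = 1.0) -> float:
    """Convert an A260 reading to ug/ml, correcting for sample dilution."""
    if kind not in UG_PER_ML_PER_OD:
        raise ValueError(f"unknown nucleic acid type: {kind}")
    return a260 * UG_PER_ML_PER_OD[kind] * dilution_factor


def fraction_single_stranded(a260: float, a260_ds: float, a260_ss: float) -> float:
    """Estimate the melted fraction of a sample.

    Assumption: absorbance rises linearly from the fully double-stranded
    reading (a260_ds) to the fully single-stranded reading (a260_ss).
    """
    if a260_ss <= a260_ds:
        raise ValueError("melted DNA must absorb more than native DNA")
    fraction = (a260 - a260_ds) / (a260_ss - a260_ds)
    return min(1.0, max(0.0, fraction))  # clamp measurement noise to [0, 1]


if __name__ == "__main__":
    print(concentration_ug_per_ml(0.5, "dsDNA"))
    print(fraction_single_stranded(1.2, a260_ds=1.0, a260_ss=1.4))
```

Following the same fraction over time during renaturation is the basis for the reassociation kinetics mentioned above.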
Melting Point of DNA:

Melting curve showing the temperature at which melting starts and the Tm

This Tm curve shows, above and below, how and at what temperature the ds DNA melts into single stranded DNA

When a given DNA is heated in an ionic solution at a specific pH, it melts into ssDNA slowly at first and then rapidly as the temperature is raised. The temperature at which it melts completely is called the melting point of the DNA, but the term Tm is used for the temperature at which half of the DNA has melted. The Tm varies with the G+C content: if the G+C content is higher than the A+T content, the Tm is higher, and the reverse is true of A+T-rich DNA.
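The definition of Tm as the half-melted point can be made concrete with a short Python sketch that reads Tm off a melting curve. It assumes, for illustration only, that absorbance at 260 nm rises from a lower plateau (all dsDNA) to an upper plateau (all ssDNA) and that the half-way absorbance marks the point at which half of the DNA has melted; the data points are invented and the function name is hypothetical.

```python
def tm_from_melting_curve(temps_c, a260):
    """Return the temperature at which A260 is halfway between its plateaus.

    temps_c must be in increasing order; linear interpolation is used
    between the two measurements that bracket the halfway absorbance.
    """
    if len(temps_c) != len(a260) or len(temps_c) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    half = (min(a260) + max(a260)) / 2.0
    points = list(zip(temps_c, a260))
    for (t0, a0), (t1, a1) in zip(points, points[1:]):
        if a0 <= half <= a1 and a1 != a0:
            return t0 + (half - a0) * (t1 - t0) / (a1 - a0)
    raise ValueError("curve never crosses the halfway absorbance")


if __name__ == "__main__":
    # Invented example data: a sigmoid-like rise of about 40% in absorbance.
    temps = [70, 75, 80, 85, 90, 95, 100]
    absorbance = [1.00, 1.00, 1.04, 1.16, 1.32, 1.40, 1.40]
    print(f"Estimated Tm: {tm_from_melting_curve(temps, absorbance):.2f} C")
```

With real data the plateaus are noisier, so in practice one might average several points at each end before taking the midpoint.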


Tm curve