Accurate gene identification requires approaches that extract information from the inherently low signal-to-noise ratio of the human genome. We shall describe some of them in Chapter 8. Here we discuss only one general approach, which is based on the observation that sequences that have a function are relatively conserved during evolution, whereas those without a function are free to mutate randomly. The strategy is therefore to compare the human sequence with that of the corresponding regions of a related genome, such as that of the mouse. Humans and mice are thought to have diverged from a common mammalian ancestor about 80 × 10⁶ years ago, which is long enough for the majority of nucleotides in their genomes to have been changed by random mutational events. Consequently the only regions that will have remained closely similar in the two genomes are those in which mutations would have impaired function and put the animals carrying them at a disadvantage, resulting in their elimination from the population by natural selection. Such closely similar regions are known as conserved regions. The conserved regions include both functionally important exons and regulatory DNA sequences. In contrast, nonconserved regions represent DNA whose sequence is unlikely to be critical for function.
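The comparison strategy described above can be illustrated with a minimal sketch. It assumes that the two sequences have already been aligned to equal length (with "-" marking gaps), slides a fixed-size window along the alignment, and reports the stretches in which the fraction of identical nucleotides meets a chosen threshold. The function names, the window size, the threshold, and the toy "human" and "mouse" sequences are all hypothetical choices made for illustration; they are not taken from any real genome or from any particular genome-analysis tool.

```python
def window_identity(seq_a, seq_b, window=10, step=1):
    """Return (start, identity) for each sliding window of a pairwise alignment.

    Assumes seq_a and seq_b are already aligned (equal length, '-' for gaps).
    Identity is the fraction of window positions where both sequences carry
    the same nucleotide; gap positions never count as matches.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    scores = []
    for start in range(0, len(seq_a) - window + 1, step):
        pairs = zip(seq_a[start:start + window], seq_b[start:start + window])
        matches = sum(1 for a, b in pairs if a == b and a != "-")
        scores.append((start, matches / window))
    return scores


def conserved_regions(seq_a, seq_b, window=10, threshold=0.9):
    """Merge overlapping or adjacent high-identity windows into (start, end) intervals.

    Intervals are half-open: positions start .. end - 1 of the alignment.
    """
    regions = []
    for start, identity in window_identity(seq_a, seq_b, window):
        if identity < threshold:
            continue
        end = start + window
        if regions and start <= regions[-1][1]:
            # This window overlaps or touches the previous region: extend it.
            regions[-1] = (regions[-1][0], max(regions[-1][1], end))
        else:
            regions.append((start, end))
    return regions


# Toy alignment (invented data): two identical flanks around a diverged middle.
human = "ACGTACGTAC" + "TTTTTTTTTT" + "GGCATGCATG"
mouse = "ACGTACGTAC" + "GACGACGACG" + "GGCATGCATG"

if __name__ == "__main__":
    for start, end in conserved_regions(human, mouse, window=10, threshold=0.9):
        print(f"conserved candidate: alignment positions {start}..{end - 1}")
```

Because the score is computed per window, the reported intervals can extend slightly into a diverged stretch; a smaller window gives sharper boundaries at the cost of more chance matches. Real comparisons must also produce the alignment in the first place and use statistical models of neutral mutation rather than a fixed identity cutoff.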
The power of this method can be increased by comparing our genome with the genomes of additional animals whose genomes have been completely sequenced, including the rat, chicken, chimpanzee, and dog. By revealing in this way the results of a very long natural “experiment,” lasting for hundreds of millions of years, such comparative DNA sequencing studies have highlighted the most interesting regions in these genomes. The comparisons reveal that roughly 5% of the human genome consists of “multi-species conserved sequences,” as discussed in detail near the end of this chapter. Unexpectedly, only about one-third of these sequences code for proteins. Some of the conserved noncoding sequences correspond to clusters of protein-binding sites that are involved in gene regulation, while others produce RNA molecules that are not translated into protein. But the function of the majority of these sequences remains unknown. This unexpected discovery has led scientists to conclude that we understand much less about the cell biology of vertebrates than we had previously imagined. Certainly, there are enormous opportunities for new discoveries, and we should expect many surprises ahead.
It will not be enough just to develop ways of treating the hereditary defects. We shall have to find some way to purify the pool of human germ plasm so that there will not be so many seriously defective children born . . . We are going to have to institute birth control, population control.
There should be tattooed on the forehead of every young person a symbol showing possession of the sickle-cell gene or whatever other similar gene . . . It is my opinion that legislation along this line, compulsory testing for defective genes before marriage, and some form of semi-public display of this possession, should be adopted.
[T]he ultimate application of molecular biology would be the direct control of nucleotide sequences in human chromosomes, coupled with recognition, selection and integration of the desired genes, of which the existing population furnishes a considerable variety. These notions of a future eugenics are, I think, the popular view of the distant role of molecular biology in human evolution.
The old eugenics was limited to a numerical enhancement of the best of our existing gene pool. The new eugenics would permit in principle the conversion of all the unfit to the highest genetic level.
We must remember that while objective science will always allow the facts to contribute to its discoveries, the questions we ask just as powerfully determine the answers we get.