Biologists at the Massachusetts Institute of Technology have discovered a mechanism that allows cells to read their own DNA in the correct direction and prevents them from copying most of the so-called "junk DNA."

Only about 15 percent of the human genome consists of protein-coding genes. In recent years, however, scientists have found that a surprising amount of the "filler," or intergenic DNA, is copied into RNA, the molecule that carries DNA's messages to the rest of the cell.

For years, scientists have been trying to figure out what this RNA is up to, if anything.

In 2008, MIT researchers led by Institute Professor Phillip Sharp discovered that much of this RNA is generated through a process called divergent expression, in which cells read their DNA in both directions, moving away from a given starting point.

In the new study, published in the journal Nature, Sharp and his colleagues describe for the first time how cells start and then halt transcription in the upstream, or non-protein-coding, direction while allowing it to continue in the direction in which genes are correctly read.

The finding, the researchers said, helps to explain the existence of many recently discovered types of short strands of RNA whose function is unknown.

"This is part of an RNA revolution where we're seeing different RNAs and new RNAs that we hadn't suspected were present in cells, and trying to understand what role they have in the health of the cell or the viability of the cell," said Sharp, who is a member of MIT's Koch Institute for Integrative Cancer Research. "It gives us a whole new appreciation of the balance of the fundamental processes that allow cells to function."

DNA controls cellular activity by coding for the production of RNAs and proteins. To exert this control, the genetic information encoded by DNA has to be copied, or transcribed, into messenger RNA (mRNA).

To expose its genetic information, the DNA double helix unwinds, and transcription can proceed in either direction. To start the copying, an enzyme called RNA polymerase latches onto the DNA at a site known as a promoter. The RNA polymerase then moves along the strand, building the RNA chain as it goes.

Transcription of a gene normally ends with polyadenylation, in which the RNA is cut and a tail of adenine nucleotides is added to its end. In sequencing RNA transcripts from mouse embryonic stem cells, the researchers discovered that polyadenylation also plays a major role in halting the transcription of upstream, noncoding DNA sequences. These regions have a high density of polyadenylation signal sequences, which prompt enzymes to cut the RNA before it grows very long. Stretches of DNA that code for genes, by contrast, have a low density of these signals.

The researchers also identified another factor that influences whether transcription is allowed to continue.

Earlier work had shown that when a cellular factor known as U1 snRNP binds to RNA, polyadenylation is suppressed. In the new study, the researchers found that genes have a higher concentration of binding sites for U1 snRNP than noncoding sequences do, allowing transcription of genes to continue uninterrupted.
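The tug-of-war between polyadenylation signals and U1 snRNP binding sites can be illustrated with a deliberately simplified toy model. The sketch below is not the researchers' analysis. It assumes the canonical poly(A) signal hexamers AATAAA and ATTAAA (written as DNA), uses the 5' splice-site-like motif GTRAGT as a stand-in for a U1 snRNP binding site, and introduces a hypothetical `shield` parameter for how far downstream a U1 site suppresses polyadenylation. A polymerase starting at position 0 keeps going until it meets a poly(A) signal that no upstream U1 site protects; for simplicity, the transcript is cut right at the end of that signal.

```python
import re

# Simplified motifs, written as DNA (assumptions for illustration only):
# canonical poly(A) signal hexamers, and a 5'-splice-site-like stand-in
# for a U1 snRNP binding site (R = A or G).
PAS = re.compile(r"(?=(AATAAA|ATTAAA))")
U1_SITE = re.compile(r"(?=(GT[AG]AGT))")


def transcript_length(seq: str, shield: int = 1000) -> int:
    """Toy model: return how many bases are transcribed from position 0.

    Transcription stops at the first poly(A) signal that has no U1 site
    within `shield` bases upstream of it (`shield` is hypothetical).
    """
    seq = seq.upper()
    # Lookahead patterns let overlapping motifs be found.
    u1_starts = [m.start() for m in U1_SITE.finditer(seq)]
    for m in PAS.finditer(seq):
        pas_start = m.start()
        protected = any(0 <= pas_start - u1 <= shield for u1 in u1_starts)
        if not protected:
            # Simplification: cut right after the hexamer.
            return pas_start + 6
    # No unprotected signal: transcription runs to the end of the sequence.
    return len(seq)


upstream_like = "CCCC" + "AATAAA" + "C" * 20
gene_like = "CCCC" + "GTAAGT" + "CC" + "AATAAA" + "C" * 20

print(transcript_length(upstream_like))          # 10
print(transcript_length(gene_like, shield=100))  # 38
```

Under these rules, a stretch with a poly(A) signal and no U1 site yields a short transcript, mimicking upstream noncoding RNA, while a U1 site just ahead of the same signal lets transcription run to the end of the sequence, as in a gene.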

The work demonstrates the importance of U1 snRNP in protecting mRNA as it is transcribed from genes and in keeping the cell from unnecessarily copying non-protein-coding DNA, according to Gideon Dreyfuss, a professor of biochemistry and biophysics at the University of Pennsylvania School of Medicine.

"They've identified a very likely mechanism for early termination of these upstream RNAs by depriving them of U1 snRNP suppression of polyadenylation and cleavage," said Dreyfuss, who was not part of the research team.

The function of all this upstream noncoding RNA, however, remains the subject of much investigation.

"That transcriptional process could produce an RNA that has some function, or it could be a product of the nature of the biochemical reaction," Sharp said. "This will be debated for a long time."

Going forward, Sharp's lab is exploring the relationship between this transcription process and the large numbers of long noncoding RNAs (lncRNAs) that have been observed, with plans to investigate the mechanisms that control the synthesis of these RNAs and to determine their functions.

"Once you see some data like this, it raises many more questions to be investigated, which I'm hoping will lead us to deeper insights into how our cells carry out their normal functions and how they change in malignancy," Sharp explained.