Re: pure chance

billgr@cco.caltech.edu
Wed, 15 Jan 1997 11:05:34 -0800 (PST)

Gene Godbold:

> > OK, so these are like the 'start' and 'stop' codons? Or just catalysts
> > which mark off coding and non-coding regions of the genome? (By telling
> > the polymerase where to bind.)
>
> There are a slew of DNA binding proteins which need to bind to the
> promoter in order to generate an RNA transcript. They specifically
> recognize areas of DNA as well as previously bound proteins.
>
> There are other proteins, downstream and upstream
> (3' and 5', respectively) which *can* (they aren't always present as I
> understand) either enhance the generation of the RNA transcript or inhibit
> the generation of the transcript. Other proteins help determine the rate
> at which the RNA polymerase bebops along the DNA strand. This is
> necessary for some proteins which need to be turned on (and off) quickly.
>
> There are also thought to be signals that regulate the stability of the
> RNA transcript in the 3' noncoding end of the RNA.
>
> The information of the enhancers would be transformed from
> the interaction of the protein with the DNA sequence to the number of RNA
> transcripts, I think, but I don't know how you would work backwards. The
> stability information would be lost after translation of the protein from
> mRNA. The information regarding how fast the RNA polymerase transcribes
> the RNA from the DNA, while vital for certain regulatory proteins, is not
> amenable to retrieval either once the RNA transcript is made.

I think you're right--it isn't possible for rate information to be
decoded once the product is made (unless something *else* is made that
records the rate or unless the rate is unique to the protein made).

I guess I had always just thought of reverse translation as a matter of
examining proteins and sticking codons together. Obviously that is only
a fraction of the story. The interaction between the regulatory proteins
and the final products of a given region would seem impossible to
reconstruct. By the time the final product is made, the regulatory
proteins may well have changed in abundance. At the very least, by the
time the product reached some reverse translation mechanism, that
mechanism would have no way of knowing what the regulators had been
doing when the gene was first transcribed into RNA.

Are some genes read at different speeds depending on the needs of the
cell? Or is a particular part always read at a particular rate?

> > That is interesting. As you may know, in coding theory such codes sound
> > like NRZ codes (non-return-to-zero). That is, your code can't allow more
> > than a certain number of zeroes in a row, otherwise the timing will drift
> > on the readout device. For example, in a CD player, I think the max number
> > of zeroes in a row is like 4 or 5. This makes it necessary to pick an
> > ECC which doesn't have the possibility of concatenating more than the
> > critical number of zeroes. i.e. you can't have one codeword with 101000
> > and another of 000101110 because if the two come in order, you have 6
> > zeroes--too many.
>
> I'm not really up on coding theory, though it seems like it wouldn't be
> that foreign to molecular biology. I wonder why they don't teach us any?
> (Not that I'm a molecular biologist; I just sometimes play one at work.
> :-) Why is the NRZ rule used in CD players?

The reason in CD players is the timing one. The CD player maintains an
internal clock which tells it when to read off the bits coming in. Since
it is impossible to maintain the clock independently of the data stream
(it goes so fast that even a little bit of drift would cause large numbers
of errors), you have to sync the clock to the data itself. You can
do this by noticing when there is a transition--if you get a one and then
a zero, the transition should come halfway through your clock cycle. If
it doesn't, then you know the clock is off a little and you can correct
it. If there are too many zeroes (or ones) in a row, then you don't get
this signal for too long, and the clock drifts, and you may read 8 zeroes
when there were really 9. In actual practice, the hardware uses transitions
for encoding the bits, and so it is more like 1010101010 that you want
to avoid, but I think the basic idea is the same.
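
To make the arithmetic concrete, here is a minimal Python sketch. It is
not how a real CD player works; the function names and the 7% clock error
are made up for illustration. The first function checks the run of zeroes
in the two example codewords from earlier in the thread, alone and back to
back. The second assumes a reader that sees no transitions and simply
divides the elapsed time by its own (slightly wrong) bit period.

```python
def longest_zero_run(bits: str) -> int:
    """Return the length of the longest run of consecutive '0' characters."""
    longest = current = 0
    for b in bits:
        current = current + 1 if b == "0" else 0
        longest = max(longest, current)
    return longest


def bits_counted(run_length: int, clock_error: float) -> int:
    """Bits a reader counts over a run with no transitions in it.

    clock_error is the assumed fractional error in the reader's bit
    period (0.07 means its clock ticks 7% too slowly). With nothing to
    resynchronize on, the reader divides elapsed time by its own period.
    """
    return round(run_length / (1.0 + clock_error))


a, b = "101000", "000101110"
print(longest_zero_run(a), longest_zero_run(b))  # each codeword on its own
print(longest_zero_run(a + b))                   # the two in sequence
print(bits_counted(9, 0.07))                     # 9 zeroes, slow clock
```

Each codeword alone has at most 3 zeroes in a row, but the pair in
sequence has 6, which is the concatenation problem. With the assumed 7%
error, a run of 9 zeroes gets counted as 8.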

-Greg