Answers to study questions for required reading.
Study questions for Watson and Crick, 1953:
- Does the WC structure most clearly resemble A, B, or Z form? Why?
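Ans: B-form. As in B-DNA, the helix is right-handed, the bases are perpendicular to the helix axis (they are tilted in A-DNA), and the nucleotides are anti about the glycosidic bond (Z-DNA alternates syn and anti).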
- The authors state that "the two chains (but not the bases) are related
by a dyad perpendicular to the fibre axis." What is a "dyad," and
what does this statement suggest about whether the chains are parallel or
antiparallel? Under what circumstances would the bases in fact be related
by a true dyad axis?
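Ans: A dyad is a C2 symmetry axis: a 180 degree rotation about the axis gives an indistinguishable molecule.
Since the 5' and 3' directions project backward and forward, respectively, on the left side of a base when the minor groove is at the bottom,
rotation about the pseudodyad places 5' forward and 3' back on the other side, leading to antiparallel strands.
The bases themselves would be related by a true dyad only where the sequence is self-complementary (palindromic) about the axis, as at the center of a site such as GAATTC.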
- Draw the preferred enol form of thymine (why is it preferred?). What would
enol-T base pair with? Why was it important to Watson and Crick that the
keto forms of the bases be preferred?
Ans: The enol form of thymine is shown at the right. This tautomer has
all three double bonds conjugated. Its hydrogen-bonding pattern looks just
like that of C, so it would base pair with G. Bromodeoxyuridine is more likely
than T to tautomerize in this way, which makes it mutagenic. Obviously, Watson
and Crick needed to be confident they had the correct tautomeric
forms of the bases to come up with their model.
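To see why enol-T pairs like C, one can write the Watson-Crick edge of each base as a donor/acceptor pattern. The following Python sketch does that. The three-position patterns (read from the major-groove edge to the minor-groove edge) are a simplification, and the sketch assumes the enol tautomer in which the hydrogen has moved from N3 to O4.

```python
# Watson-Crick edge of each base, read from the major-groove side to the
# minor-groove side: "D" = H-bond donor, "A" = acceptor, None = no polar group.
# Pyrimidines: (group at C4, N3, group at C2); purines: (group at C6, N1, group at C2).
PATTERNS = {
    "T (keto)": ("A", "D", "A"),   # O4, N3-H, O2
    "T (enol)": ("D", "A", "A"),   # O4-H, N3, O2 (H moved from N3 to O4)
    "C":        ("D", "A", "A"),   # N4-H, N3, O2
    "A":        ("D", "A", None),  # N6-H, N1, C2-H
    "G":        ("A", "D", "D"),   # O6, N1-H, N2-H
}


def hbonds(pyrimidine, purine):
    """Number of donor-acceptor contacts, or None if two like groups face each other."""
    count = 0
    for x, y in zip(PATTERNS[pyrimidine], PATTERNS[purine]):
        if x is None or y is None:
            continue            # nothing to pair with at this position
        if x == y:
            return None         # donor-donor or acceptor-acceptor clash
        count += 1
    return count


for pyr in ("T (keto)", "T (enol)", "C"):
    for pur in ("A", "G"):
        print(f"{pyr:9s} with {pur}: {hbonds(pyr, pur)}")
```

It reports 2 contacts for keto-T with A, 3 for enol-T with G (the same as C with G), and a clash for enol-T with A.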
- Identify a nomenclature/numbering inconsistency between the WC paper and
today's labeling.
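Ans: The numbering of the pyrimidines is different. Watson and Crick describe the
hydrogen bonds as running from purine position 1 to pyrimidine position 1 and from
purine position 6 to pyrimidine position 6; in today's numbering those pyrimidine
positions are N3 and C4. Note there is no picture of the actual base pairs: they
expect the reader to draw them out for him/herself.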
- The authors specifically suggest that their helical structure cannot apply
to RNA. Where/how do they say this?
Ans: "It is probably impossible to
build this structure with a ribose sugar in place of the deoxyribose, as
the extra oxygen atom would make too close a van der Waals contact." They
don't say what the contact would be with, but it's clear they had built real
models. See "Why no B-RNA."
Study questions for hybridization thermodynamics:
- Download the oligonucleotide hybridization
spreadsheet. Learn to use Microsoft Excel if necessary. You may also
want to look at the underlying math, or not.
Ans: Not much to say here.
- In the spreadsheet, change ΔS° and ΔH° and observe the effects on the
melting curve. Change both ΔH° and ΔS° at the same time but try to keep
the overall stability, roughly expressed by
ΔG°37 = ΔH° − (310 K)ΔS°,
constant.
What happens to the sharpness of the transition as [the absolute magnitudes of] ΔH° and ΔS° increase?
Ans:
As ΔH° becomes more negative, the curve shifts to the right (higher
TM, more stable).
Conversely, a more negative ΔS° shifts the curve to the left. Changing
both can leave TM roughly unchanged, but since the temperature dependence
of ΔG° (dΔG°/dT = −ΔS°) becomes steeper, the curve becomes sharper
(Keq for hybridization changes more rapidly with temperature).
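The same experiment can be sketched in Python, assuming a two-state model for a non-self-complementary duplex with both strands at equal concentration (the spreadsheet's own equations may differ). The two hypothetical ΔH°/ΔS° pairs share the same ΔG°37, and the 20-80% transition width serves as a rough measure of sharpness.

```python
import math

R = 1.987e-3  # gas constant, kcal/(mol*K)


def fraction_duplex(T, dH, dS, c0):
    """Two-state A + B <-> AB, each strand at total concentration c0 (M).
    dH in kcal/mol, dS in kcal/(mol*K), T in K."""
    a = math.exp(-(dH - T * dS) / (R * T)) * c0   # K * c0
    # Root of a*f^2 - (2a+1)*f + a = 0 that lies in [0, 1], written stably
    return 2 * a / ((2 * a + 1) + math.sqrt(4 * a + 1))


def temp_at_fraction(f, dH, dS, c0):
    """Temperature (K) at which a fraction f of strands is in duplex."""
    K = f / ((1 - f) ** 2 * c0)
    return dH / (dS - R * math.log(K))


def transition_width(dH, dS, c0):
    """Temperature span (K) over which the duplex fraction falls from 80% to 20%."""
    return temp_at_fraction(0.2, dH, dS, c0) - temp_at_fraction(0.8, dH, dS, c0)


c0 = 1e-6     # hypothetical strand concentration, M
dG37 = -9.0   # hypothetical fixed stability at 310 K, kcal/mol
for dH in (-60.0, -120.0):
    dS = (dH - dG37) / 310.0   # keeps dH - 310*dS equal to dG37
    tm = temp_at_fraction(0.5, dH, dS, c0)
    print(f"dH={dH:7.1f}  dS={dS:.4f}  TM={tm - 273.15:5.1f} C  "
          f"20-80% width={transition_width(dH, dS, c0):4.1f} K")
```

With these numbers both curves melt near 37 °C, and doubling ΔH° roughly halves the width of the transition.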
- Qualitatively guess at how accurately ΔH°, ΔS°, ΔG°37 and
TM can
each be measured from [this kind of] experimental data. Give your reasoning based on the results
from
above.
Ans: Because errors in ΔH° and ΔS° are compensating,
it takes fairly large changes in both to make an obvious difference in the shape
of the curve, so it's hard to do better than +/- 8-10%. On the other hand, TM can
be measured to better than 1 degree, perhaps a 0.1 % error in the Kelvin temperature.
The errors in ΔH° and ΔS° largely
cancel in ΔG°37, so it is determined much more accurately
than ΔH° or ΔS°, to about 2 %. We continue
to measure enthalpy and entropy changes individually because they are needed to
predict the results at other temperatures.
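The compensation argument can be checked numerically. This Python sketch uses hypothetical "true" values and a two-state TM expression for a non-self-complementary duplex, and it assumes the fitting error is perfectly compensating, i.e. it moves ΔH° and ΔS° together so that TM stays fixed.

```python
import math

R = 1.987e-3   # gas constant, kcal/(mol*K)


def tm(dH, dS, c0):
    """TM (K) for a non-self-complementary duplex, each strand at c0 (M)."""
    return dH / (dS + R * math.log(c0 / 2))


def dG37(dH, dS):
    """Free energy at 310 K, kcal/mol."""
    return dH - 310.0 * dS


c0 = 1e-6
dH, dS = -70.0, -0.19            # hypothetical "true" values
tm_true = tm(dH, dS, c0)

# Suppose the fit overestimates |dH| by 10% and dS shifts so that TM is unchanged
dH_fit = 1.10 * dH
dS_fit = dH_fit / tm_true - R * math.log(c0 / 2)

print(f"dH error:   {dH_fit / dH - 1:.1%}")
print(f"dS error:   {dS_fit / dS - 1:.1%}")
print(f"dG37 error: {dG37(dH_fit, dS_fit) / dG37(dH, dS) - 1:.1%}")
print(f"TM shift:   {tm(dH_fit, dS_fit, c0) - tm_true:.2e} K")
```

Here a 10% error in ΔH° comes with an error of roughly 11-12% in ΔS° but only about 2% in ΔG°37.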
Study questions for Moser and Dervan, 1987:
- Why do we focus on these particular triple base pairs instead of any of
the legion of other possibilities? (Remember what’s important about
the particular Watson-Crick base pairs vs. all the others.)
Ans: These
particular triples can stack on each other in a regular helix, like the WC base pairs.
- Why are the triplexes studied here stabilized by low pH? Why are they stabilized
by polyamines? Why is the increased stability of triplex at acidic pH not
apparent in Figure 5?
Ans: Low pH promotes the protonation of cytosine needed to form the CH+:G-C
triple. Polyamines help neutralize the high density of negative charge from
the phosphates. In Figure 5, binding is monitored by cleavage, but the cleavage
reaction itself is strongly pH-dependent, so at lower pH they are not detecting
the binding that is occurring.
- Figure 3 demonstrates parallel orientation of the third strand oligo-T
and the homopurine strand in the duplex. What is the reasoning leading to
this conclusion? How does it rule out strand displacement as a mode of binding?
Ans: As the Fe-EDTA group is moved from 5' to 3' along the third strand,
the position of cleavage moves from 3' to 5' on the poly-Y strand of the
duplex. Strand displacement (oligo-T forming Watson-Crick pairs with the poly-R
strand) would have placed the third strand antiparallel to the poly-R strand.
- If the third strand probe bound in the minor groove rather than the major
groove, what would the DNA-EDTA 9 footprinting histogram on the bottom right
of figure 4 look like, and why?
Ans: Look at the triple strand structure in Figure 6.
The nearest phosphates in space are offset to the 5' side on the poly-R strand sequence and to the 3' side on the poly-Y strand.
Binding of a cleavage agent in the minor groove gives an offset in the opposite direction, which would be to the 3' on the poly-R and the
5' on the poly-Y.
- Why do the shorter probes or mismatches in Fig. 4 cleave reasonably well
at low temperature but then cleave (and presumably bind) less and less effectively
as the temperature is increased?
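Ans: They are melting off: shorter or mismatched probes form less stable triplexes,
so they dissociate at lower temperatures than the full-length, matched probe.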
- Why has it not been possible to generalize triple-strand recognition to
double-stranded targets of arbitrary sequence? In other words, why the restriction
to homopurine/homopyrimidine tracts?
Ans: They could not complete the code: C and T
don't have enough hydrogen bonding groups on their "Hoogsteen" faces.
Study questions for the chimp genome draft sequence, 2005:
This is a very long and complex paper, though very well-written. You do not
need to understand every detail, but you are responsible for the starred (*)
study questions. You need not read the Methods, but the short Discussion
section is important. You may need to use Google or Wikipedia to answer some
of these questions if too many terms are unfamiliar. Warning: Since this is
not my field, some of the answers provided may be simplistic.
- *What is “sequence redundancy?” Why is it necessary to have
significant redundancy in order to assemble a genome using whole-genome shotgun
sequencing (WGS)?
Ans: Sequence redundancy is the average number of times each base was sequenced.
In WGS you can't choose clones ahead of time, so in order to have a reasonable
chance of sequencing nearly every base at least once you need to sequence most
of them more than once. This also gives more confidence in the accuracy of
the sequence and the assembly.
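The need for redundancy follows from random sampling. Assuming reads land uniformly at random (the Poisson, or Lander-Waterman, idealization), the expected fraction of bases read at least once at redundancy c is 1 − e^(−c). The Python sketch below compares that expectation with a simulation on a hypothetical 1-Mb genome with 500-bp reads.

```python
import math
import random


def simulated_coverage(genome_len, read_len, redundancy, seed=0):
    """Fraction of positions covered by at least one uniformly placed read."""
    rng = random.Random(seed)
    n_reads = round(redundancy * genome_len / read_len)
    delta = [0] * (genome_len + 1)          # difference array of read depth
    for _ in range(n_reads):
        start = rng.randrange(genome_len - read_len + 1)
        delta[start] += 1
        delta[start + read_len] -= 1
    covered = depth = 0
    for i in range(genome_len):
        depth += delta[i]
        if depth > 0:
            covered += 1
    return covered / genome_len


def poisson_coverage(redundancy):
    """Expected covered fraction, 1 - exp(-c), under the Poisson idealization."""
    return 1 - math.exp(-redundancy)


for c in (1, 2, 4):
    print(c, round(simulated_coverage(1_000_000, 500, c), 3), round(poisson_coverage(c), 3))
```

At 1× redundancy about 37% of bases (e^−1) would never be read at all; at 4× only about 2% are missed.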
- What's the difference between nucleotide-level accuracy and structural
accuracy? What is the idea behind the claim that the substitution rate assessed
by comparison with a BAC is about what one would expect because the BAC is
a single haplotype?
Ans: Nucleotide-level accuracy refers to the local sequence, i.e. is a particular
position really a G or might it be an A? Structural accuracy refers to the
assembly of contigs (contiguous assemblies of shorter sequences). The BAC is
derived from only one chromosome, whereas the draft sequence has contributions
from both homologous chromosomes (both haplotypes), so heterozygosity will contribute differences.
- Figure 1b shows the divergence between chimp and human. How was the figure
constructed, and what do the individual symbols mean?
Ans: The graph shows the probability of observing the indicated divergence
frequency in 1-Mb segments of each chromosome. It indicates that there is substantial
Mb-to-Mb variation in the amount of divergence. It would be interesting to
correlate this result with the coding sequence density on each chromosome.
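The construction described in this answer can be made concrete with a toy calculation. Given a pairwise alignment (gaps written as "-"), per-window divergence is the fraction of aligned positions that differ, and the plotted distribution is a normalized histogram of those window values. The Python sketch below uses random sequences, 1-kb windows instead of 1-Mb, and two hypothetical divergence rates; it is not the authors' pipeline.

```python
import random
from collections import Counter


def mutate(seq, rate, rng):
    """Replace each base by a different random base with probability `rate`."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in "ACGT" if b != base]))
        else:
            out.append(base)
    return "".join(out)


def window_divergence(seq_a, seq_b, window):
    """Per-window fraction of aligned (non-gap) positions that differ."""
    rates = []
    for start in range(0, len(seq_a) - window + 1, window):
        pairs = [(x, y) for x, y in zip(seq_a[start:start + window], seq_b[start:start + window])
                 if x != "-" and y != "-"]
        if pairs:
            rates.append(sum(x != y for x, y in pairs) / len(pairs))
    return rates


def divergence_distribution(rates, bin_width):
    """Probability that a window falls in each divergence bin, keyed by bin start."""
    counts = Counter(int(r // bin_width) for r in rates)
    return {round(k * bin_width, 6): n / len(rates) for k, n in sorted(counts.items())}


rng = random.Random(1)
window = 1000                                   # toy stand-in for 1-Mb windows
human = "".join(rng.choice("ACGT") for _ in range(200 * window))
# Hypothetical: the first half diverges at 1%, the second half at 1.5%
chimp = (mutate(human[:100 * window], 0.010, rng)
         + mutate(human[100 * window:], 0.015, rng))
rates = window_divergence(human, chimp, window)
print(divergence_distribution(rates, 0.0025))
```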
- The authors show that sequence divergence is much more rapid at CpG sites,
due to cytosine methylation (at position 5) and then deamination. What would
be the product of methylation/deamination, and why does that lead to more mutation
than other kinds of DNA damage?
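Ans: Deamination of 5-methylcytosine gives T, paired with G, which is not immediately
recognizable as a damaged base (unlike the U produced by deamination of unmethylated C,
which uracil-DNA glycosylase removes). If the CpG is replicated before the T·G mismatch
is repaired, a mutation will result.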
- Is there a simple bottom line for the cause of the variation in divergence
frequency across chromosomes?
Ans: No. Increased divergence is correlated with location near telomeres (chromosome
ends) and in dark-staining bands, but we do not know the mechanisms.
- *What are “indels” and how do they arise?
Ans: Insertions or deletions of a few nucleotides to thousands of bp. Most
are very small, probably replication errors. The large ones are due to transposable
elements like retroviruses and LINE and SINE elements.
- Figure 7 shows that old Alu mobile elements are more likely to be found
in GC-rich regions. What explanation do the authors offer?
Ans: That they are preferentially lost from AT-rich regions.
- *What are “purifying selection” and “positive selection,” and
how are they reflected in the Ka/Ks ratio?
Ans: Purifying (or negative, or stabilizing) selection: changes in coding sequence
are deleterious, therefore expunged when they appear at random. This will yield
a low Ka/Ks. Positive selection: a gene that has changed rapidly as a response
to the environment, therefore has a high rate of coding sequence changes, Ka/Ks
high or even >1. See this
paper in PLoS.
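For concreteness, here is a simplified, Nei-Gojobori-style Ka/Ks calculation in Python. It is a sketch only: it assumes codon-aligned, gap-free sequences that differ at no more than one position per codon, counts changes to stop codons as nonsynonymous, and omits the multiple-hit (e.g. Jukes-Cantor) correction that real analyses apply.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order (first codon position varies slowest); * = stop
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}


def site_counts(codon):
    """(synonymous, nonsynonymous) sites: each of the 9 single-base changes counts 1/3."""
    syn = 0
    for pos in range(3):
        for b in BASES:
            if b != codon[pos]:
                mutant = codon[:pos] + b + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    syn += 1
    return syn / 3, 3 - syn / 3


def ka_ks(seq1, seq2):
    """Ka/Ks for two aligned coding sequences, without multiple-hit correction."""
    S = N = Sd = Nd = 0.0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3], seq2[i:i + 3]
        s1, n1 = site_counts(c1)
        s2, n2 = site_counts(c2)
        S += (s1 + s2) / 2          # average the site counts of the two codons
        N += (n1 + n2) / 2
        diffs = sum(a != b for a, b in zip(c1, c2))
        if diffs > 1:
            raise ValueError("sketch handles at most one difference per codon")
        if diffs == 1:
            if CODE[c1] == CODE[c2]:
                Sd += 1
            else:
                Nd += 1
    if Sd == 0:
        raise ZeroDivisionError("no synonymous differences; Ka/Ks undefined")
    return (Nd / N) / (Sd / S)


print(ka_ks("GCTAAA", "GCCAGA"))   # one synonymous (Ala) + one nonsynonymous (Lys->Arg) change
```

Purifying selection would push this ratio well below 1; a value above 1 needs more amino-acid changes per nonsynonymous site than silent changes per synonymous site.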
- Table 3 shows that rare alleles at human polymorphic loci are more likely
to have changes in coding sequences than common alleles or between human and
chimp. The authors suggest that this reflects the genetic load of mutation.
What does this mean?
Ans: These alleles are probably less fit than the common alleles. They are
constantly popping up due to mutation but seem unlikely to be fixed. Therefore
most will eventually be purged by natural selection.
- *What’s the point of looking closely at genes that have diverged
more rapidly than other genes?
Ans: These are the ones that are most likely to have been under positive selection
during the evolution of the human lineage, or otherwise reflect milestones
in our differentiation from the apes.
- What does the dramatic dip in the middle of Figure 10 mean?
Ans: Fixed changes at splice sites are very rare: such a mutation would probably destroy the protein, so it is removed by purifying selection.
- Figure 12 suggests that transcription factors are one class of genes that
have diverged more rapidly in the evolution of humans than of chimpanzees.
If true, what does this suggest about sources of phenotypic changes? Why isn’t
the TF point an obvious outlier on Fig. 12?
Ans: Evolution through regulation rather than the sequences of structural genes.
The TF family is large, therefore a small deviation from equal Ka/Ks is statistically
meaningful, whereas for smaller families it is not.
- *On page 81, the authors try to identify which human SNPs are ancestral
and which are new by comparison with the chimp as an “outgroup.” What’s
the principle behind this idea?
Ans: It’s a simple idea. The allele in humans that matches the chimp
allele is very likely to be the ancestral allele.
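The outgroup principle amounts to a few lines of code. These hypothetical Python helpers polarize a biallelic human SNP using the chimp base and compute the derived-allele frequency in a sample. They assume no recurrent or back mutation at the site, an assumption that can fail, for example, at hypermutable CpG sites.

```python
def polarize(alleles, chimp_base):
    """Return (ancestral, derived) for a biallelic human SNP, using the chimp base
    as the outgroup; None if the chimp base matches neither human allele."""
    first, second = alleles
    if chimp_base == first:
        return first, second
    if chimp_base == second:
        return second, first
    return None


def derived_allele_frequency(sample, chimp_base):
    """Fraction of sampled human chromosomes carrying the derived allele."""
    alleles = sorted(set(sample))
    if len(alleles) != 2:
        return None                      # not a biallelic SNP in this sample
    polarized = polarize(alleles, chimp_base)
    if polarized is None:
        return None                      # outgroup uninformative
    return sample.count(polarized[1]) / len(sample)


# Hypothetical SNP: 10 human chromosomes, chimp carries C
print(polarize(("C", "T"), "C"))                     # ('C', 'T')
print(derived_allele_frequency("CCCCCCTTTT", "C"))   # 0.4
```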
- What does the slope of Figure 13 mean?
Ans: The ancestral alleles are less likely to dominate than one might expect, which
the authors attribute to population bottlenecks that reduce competition between
alleles. These effects show up more strongly outside Africa.
- *How does reduced diversity relative to divergence suggest a selective
sweep in human history?
Ans: If we’re all very similar to each other but different from the chimp,
that suggests that some strong selection pressure wiped out the other alleles
in the human population. These regions also tend to have high-frequency derived
alleles.
- *The discussion, and in fact most of the paper, focus on distinguishing
adaptive change from neutral drift or even changes to less-fit phenotypes.
Which type of change is responsible for the bulk of the observed changes? Why
have maladaptive changes apparently been maintained more frequently in hominids
than rodents? Is there a possible upside to this genetic load?
Ans: Most of the changes are neutral, although this paper ignores changes
in non-coding regions. We can tolerate more mutations because we have smaller
populations, longer generation times, more cooperation and protection among
individuals, and recently the ability to use technology. The upside is that
we cover more “sequence space” than we would otherwise, so we may have a pool
of people who are especially suited to respond to new challenges. The raw
material for natural selection is variation.
Study questions for Seeman et al., 1976:
- Why is it likely to be difficult for proteins to use all six of the major
groove recognition sites W1, W2, W3, W1’, W2’, and W3’ for
sequence-specific recognition?
Ans: For any given base pair, either W2 or W3 but not the other is occupied,
and small conformational changes would allow a protein to recognize either.
In class, we collapsed our discussion to treat W2 and W3 as one position.
- Why is the Table I entry for (G-C/C-G) discrimination at the S2’ position
a “(0)”? In other words, what is the basis for discrimination between
the two different base pairs, and why is it likely to be difficult?
Ans: The "(O)" means that discrimination is based on subtle geometric
features, in this case the orientation of the heteroatom-hydrogen bonds,
which a protein could in principle distinguish by hydrogen-bond geometry.
Again, we ignored
this in class and treated the amino groups in the GC/CG minor groove as just
a donor. Note that in class we exaggerated the monotony of the minor groove:
see Kielkopf, C.L., White, S., Szewczyk, J.W., Turner, J.M., Baird, E.E.,
Dervan, P.B. and Rees, D.C. (1998) A Structural Basis for Recognition of
A•T and T•A Base Pairs in the Minor Groove of B-DNA. Science, 282, 111-115.

- Inosine, which is the same as guanine except that the 2-NH2 group is replaced
by H, can be used as a probe for the groove recognized by a protein. Compare
the I-C base pair to the A-U and G-C base pairs and predict the result of an
experiment where G is substituted by I and the binding of either a major-groove
binding protein or a minor-groove binding protein is studied.
Ans: The I-C base pair looks like a G-C base pair from the viewpoint of
the major groove but like an A-T base pair from the minor groove. So if I
is substituted for a contacted G and protein binding is unaffected,
the protein probably binds the major groove. If binding is altered, the protein
probably binds the minor groove. Similarly, substituting an I-C pair for an A-T
pair will alter only major-groove recognition. For a classic case study, see Starr
DB, Hawley DK, Cell, 1991, 67:1231-40. "TFIID binds in the
minor groove of the TATA box."
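The groove argument can be written as a lookup of the hydrogen-bonding features presented across each base pair (A = acceptor, D = donor, H = nonpolar hydrogen, M = methyl), a common textbook shorthand. This Python sketch assumes the protein reads only these groove features and ignores backbone and DNA-shape contributions.

```python
# Features presented across each base pair in each groove:
# A = H-bond acceptor, D = donor, H = nonpolar hydrogen, M = methyl group.
GROOVES = {
    "G-C": {"major": "AADH", "minor": "ADA"},
    "A-T": {"major": "ADAM", "minor": "AHA"},
    "A-U": {"major": "ADAH", "minor": "AHA"},   # U lacks the C5 methyl of T
    "I-C": {"major": "AADH", "minor": "AHA"},   # I lacks the 2-NH2 of G
}


def predict(original, substituted, groove):
    """Predicted effect of a base-pair substitution on a protein reading `groove`."""
    same = GROOVES[original][groove] == GROOVES[substituted][groove]
    return "binding likely unaffected" if same else "binding likely altered"


for original in ("G-C", "A-T"):
    for groove in ("major", "minor"):
        print(f"{original} -> I-C, {groove}-groove protein: {predict(original, 'I-C', groove)}")
```

The G-C to I-C substitution changes only the minor-groove pattern, while A-T to I-C changes only the major-groove pattern, which is the logic of the Starr and Hawley experiment.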
Study questions for Ren et al., 2000:
- What is LM-PCR, the method by which IP-enriched DNA was amplified?
Before you look it up, think about what it must do. Why couldn't they
amplify the IP'd DNA with regular old PCR?
Ans: The IP'd DNA is of unknown sequence. What PCR primers would you use?
LM-PCR is ligation-mediated PCR. A common linker is ligated to each end of the
random DNA fragments, and a primer complementary to the linker is used to amplify each entire fragment.
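Why one primer suffices can be shown with a toy string model. This Python sketch assumes a single double-stranded linker ligated in the same orientation to both blunt ends; the linker sequence is made up, and real protocols use more elaborate linker designs that are not modeled here.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


LINKER = "GTACGGATCCTGACGTAGC"   # hypothetical linker sequence, also used as the primer


def ligate(fragment):
    """Top strand after the same duplex linker is ligated to both blunt ends."""
    return LINKER + fragment + revcomp(LINKER)


rng = random.Random(0)
for _ in range(3):
    unknown = "".join(rng.choice("ACGT") for _ in range(rng.randint(100, 300)))
    top = ligate(unknown)
    bottom = revcomp(top)
    # Both strands begin with LINKER and end with its reverse complement, so a
    # single primer with the LINKER sequence anneals to the 3' end of either strand,
    # whatever the unknown insert is.
    print(top.startswith(LINKER), bottom.startswith(LINKER),
          top.endswith(revcomp(LINKER)), bottom.endswith(revcomp(LINKER)))
```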
- What is the point of Figure 1B? What do the red and blue dashed
lines mean? Why do they flare out at the bottom left? What's a P value?
Ans: The devil is in the details in the microarray world. The point of the
paper is to identify sequences that are enriched in the immunoprecipitated
DNA. Such sequences are identified by comparing Cy5-labeled enriched DNA
with Cy3-labeled control DNA (genomic DNA). Figure 1 shows that
the amplification of genomic DNA gives very similar results using the two
fluorophores, so that changes in Cy5 intensity can be ascribed to immunoprecipitation
rather than preferential LM-PCR. Note that the LM-PCR signal varies over
a 100-fold range: there are in fact big differences in amplification, they
just are not dye-specific. The red and blue lines mark the difference in signal
that would be obtained by chance with probability 10^-3 and 10^-5, respectively.
The lines flare out because at lower signal we expect more noise relative to signal,
and therefore a larger probability of seeing substantially different signals by chance.
The P-value (http://en.wikipedia.org/wiki/P-value) is "the probability of
obtaining a result at least as 'impressive' as that obtained, assuming
the truth of the null hypothesis that the finding was the result of chance
alone. The fact that p-values are based on this assumption is crucial to
their correct interpretation." Further down, that page lists many ways in which
the P-value is incorrectly interpreted.
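The flaring of the significance lines can be reproduced with a simple error model in which the noise in the log ratio grows as the signal falls. In this Python sketch the model SD = sqrt(a^2 + b^2/I) and its parameters a and b are hypothetical, not taken from Ren et al., and the cutoffs are two-sided normal thresholds on ln(ratio).

```python
import math
from statistics import NormalDist


def fold_threshold(intensity, p, a=0.1, b=3.0):
    """Fold difference between channels that chance alone exceeds (in either
    direction) with probability p, for a hypothetical error model in which
    the SD of ln(ratio) is sqrt(a^2 + b^2 / intensity)."""
    sigma = math.sqrt(a ** 2 + b ** 2 / intensity)
    z = NormalDist().inv_cdf(1 - p / 2)       # two-sided normal cutoff
    return math.exp(z * sigma)


for intensity in (100, 1_000, 10_000, 100_000):
    print(intensity, round(fold_threshold(intensity, 1e-3), 2),
          round(fold_threshold(intensity, 1e-5), 2))
```

The thresholds settle toward a constant fold (set by a) at high intensity and widen sharply at low intensity, which is the flaring shape; the stricter P value always gives the wider band.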
- Does it strike you as odd that the differences in expression levels
in Figure 2 are so much larger than the differences in occupancy
levels? Similarly, GAL4 is observed to bind at only a few of its many
recognition sites in the genome. What is the likely explanation for
both observations?
Ans: Figure 2 hints at many interesting questions. The expression ratios
are much larger than the binding ratios, and the color scheme conceals this
fact. There must be other proteins involved in expression: if it were simply
one protein binding we woudl expect activation to be proportional to the
binding ratio. It is likely that expression is also affected
by other transcription factors, or that multiple binding of GAL4
is leading to cooperative
recruitment
of chromatin remodeling or basal machinery. Or perhaps downstream regulation
of mRNA translation or lifetime is also operative. The fact that GAL4 does
not bind to many of its genomic consensus sites is consitent with this: it
depends on other proteins for cooperative or anticooperative binding.
- The MTH1, PCL10, and FUR4 genes could have been identified as being
up-regulated by GAL4 simply on the basis of mRNA expression data. Why
does the ChIP-chip method still tell us something we wouldn't have
known only from expression data?
Ans: We know that the effect of GAL4 is direct, that it binds the promoters
of these genes rather than, for example, inducing synthesis of a second factor
that turns these genes on.
- Why are all the rightmost lanes in Figure 2B blank?
Ans: These are controls showing that the Myc antibody does not IP anything
unless Myc-tagged Gal4 is present in the cell.
- What other types of DNA binding proteins (besides transcription
factors) would be interesting to examine using this method?
Ans: YFP ("your favorite protein"), but the most obvious targets are histones and other chromatin proteins, as well as components of the basal transcription machinery.
Study questions for Naktinis et al., 1996:
- What is the basis of the “protein footprinting” assay used
to measure interactions among core, beta, and gamma?
Ans: Beta is labeled at the C-terminus by phosphorylation of a fused protein kinase
recognition sequence. Then the resistance to proteolysis is assessed in the presence
and absence of the other factors: regions protected from cleavage by a partner
protein mark where that partner binds.
- What is the evidence for competitive as opposed to simultaneous binding
of core and gamma to the beta sliding clamp?
Ans: Gel filtration experiments show that beta can bind either core or gamma,
but when all three are mixed the beta co-elutes with gamma, not core, and
no larger ternary complex is observed.
- Figure 7 shows that the gamma complex can remove sliding clamps from DNA,
but the evidence is that the +gamma curve changes very little with time.
Why is much of the clamp still bound at the end of the reaction? What might
the gel filtration profile have looked like in the presence of a large excess
of linear DNA?
Ans: The clamp is at equilibrium: presumably the clamp loader is actively
loading and unloading the beta clamps during the reaction. It would have
been nice to see the reaction start from unloaded beta as well. If it is
truly at equilibrium the product should be the same.
- What is the advantage to the cell in having primer-template DNA but not
nicked DNA stabilize the beta-core interaction? How does this play out during
the synthesis of an Okazaki fragment?
Ans: The primer-template mimics the situation of a lagging-strand polymerase during
synthesis of an Okazaki fragment, so a strong beta-core interaction there increases
processivity. The nicked site models the end of an Okazaki fragment, where weakening
the beta-core interaction lets core be released and rapidly recycled to the next primer.
Study questions for Cosma et al., 1999:
This is a dense paper that concerns a classic system in yeast cell and molecular biology.
You do not need to understand every detail.
- In Figure 1, what is the evidence for specific binding of Swi4p to the URS2 region? What do the different "delta" lanes in 1B mean?
Why do the authors do a dilution series of the WCE (= whole cell extract) as a control?
Ans: The evidence for binding of Swi4p to URS2 is that only one band lights
up when they amplify the IP'd DNA. The delta lanes are deletion mutants, showing
that deletions of genes known to be important for transcription of the HO gene
also knock out the signal in ChIP. The WCE is a control showing that the
amplification has not saturated and that Swi4p can be detected in unsynchronized cells.
- You do not need to understand all the tricks used to manipulate and monitor the yeast cell cycle, but one of the main points of the paper
is the use of cell-cycle regulated genes as a good model system for following the order of events at a promoter.
Why is this important? Why couldn't the authors just do their experiments with an unsynchronized population of cells? How else might one follow
the order of events at a regulated promoter? In figures 2 and 3, FACS (= fluorescence-activated cell sorting) is a way to measure the DNA content
of individual cells in a population. It is used to help show that Ash1p prevents binding of Swi4p to the URS2 during the first cell cycle after
release from a block and that HO mRNA and Swi4p binding are cell-cycle dependent.
Ans: Because yeast cells can be synchronized in the cell cycle, using cell-cycle regulated
genes allows transcription to be synchronized as well. This means that all of the cells are at the
same point in their temporal regulation, whereas for an unsynchronized culture
or a gene that was not cell cycle regulated
every cell would be at a different stage and all we would see would be an
average. Another way to approach this problem would be to add an inducing ligand
at a set time and follow events at the promoter as a function of time thereafter.
- What is the evidence that at this promoter the action of SWI/SNF is necessary to recruit SAGA (which includes HAT activity)?
Ans: Figure 4 shows that Swi2 (SWI/SNF) binding precedes Ada (SAGA) binding.
Figure 5 shows that SWI/SNF
binding still occurs in the SAGA deletion strains. Figure 6 shows that SAGA
is not recruited in the absence of SWI/SNF activity. Taken together, this
suggests that SWI/SNF recruits SAGA.
- The authors note that Swi5p is present on DNA only transiently but leads to long-lasting effects.
They showed this using ChIP assays on carefully synchronized cells. On page 306 they discuss the evidence
that this observation is real and not due to "epitope masking." What would epitope masking do to the ChIP assay?
If it were actually occurring, what would be the resulting hypothesis as to the role of Swi5p?
Ans: The idea of epitope masking is that if Swi5p were actually present
at the promoter but some other protein landed on top of it and blocked the epitope, then Swi5p would become invisible
to ChIP. The conclusion that Swi5p has a "hit and run" effect would be wrong: the hypothesis would be that the continued presence of
Swi5p is needed for transcription.
Biochemistry 674 course home page.
Jason Kahn's home page.