Answers to study questions for required reading.

Study questions for Watson and Crick, 1953:

  1. Does the WC structure most clearly resemble A, B, or Z form? Why?
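    Ans: B-form. The bases are perpendicular to the helix axis, the helix is right-handed with 10 residues per turn and a 3.4 Å rise per residue, and the glycosidic bonds are anti (Z-DNA alternates syn and anti).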
  2. The authors state that "the two chains (but not the bases) are related by a dyad perpendicular to the fibre axis." What is a "dyad," and what does this statement suggest about whether the chains are parallel or antiparallel? Under what circumstances would the bases in fact be related by a true dyad axis?
    Ans: A dyad is a twofold (C2) symmetry axis: a 180 degree rotation about the axis leaves the molecule indistinguishable from the original. With the minor groove at the bottom, the 5' and 3' backbone linkages project backward and forward, respectively, from the sugar on the left side of a base pair. Rotation about the pseudodyad puts the 5' linkage forward and the 3' linkage back on the other side, so the strands are antiparallel.
  3. Draw the preferred enol form of thymine (why is it preferred?). What would enol-T base pair with? Why was it important to Watson and Crick that the keto forms of the bases be preferred?
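    Ans: In the preferred enol tautomer of thymine, the N3 proton moves to O4, giving a 4-hydroxyl group. This tautomer is preferred because all three double bonds (C2=O2, N3=C4, and C5=C6) are conjugated. Its hydrogen-bonding pattern looks just like that of C, so it would base pair with G. 5-Bromodeoxyuridine is more likely than T to tautomerize in this way, which makes it mutagenic. Watson and Crick needed to be confident that they had the correct tautomeric forms, because their pairing rules depend on where the hydrogen-bond donors and acceptors are, and tautomerization moves them.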
  4. Identify a nomenclature/numbering inconsistency between the WC paper and today's labeling.
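    Ans: The numbering of the pyrimidines is different. Watson and Crick pair purine position 1 with pyrimidine position 1 and purine position 6 with pyrimidine position 6; in today's numbering the pyrimidine partners are N3 and the group at C4. Note that there is no picture of the actual base pairs: the authors expect readers to draw them out for themselves.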
  5. The authors specifically suggest that their helical structure cannot apply to RNA. Where/how do they say this?
    Ans: "It is probably impossible to build this structure with a ribose sugar in place of the deoxyribose, as the extra oxygen atom would make too close a van der Waals contact." They don't say which atom the extra oxygen would contact, but it's clear they had built real models. See "Why no B-RNA."
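The argument in question 3, that enol-T presents the same hydrogen-bonding face as C, can be checked with a toy donor/acceptor model. This is a minimal sketch, not a structural calculation: each Watson-Crick edge is reduced to three positions read from the minor-groove side to the major-groove side using standard textbook assignments, and the function name `hbonds` is made up for illustration. It ignores the C1'-C1' geometry that also rules out purine-purine and pyrimidine-pyrimidine pairs.

```python
# Watson-Crick edge of each base, read from the minor-groove side to the
# major-groove side: "D" = H-bond donor, "A" = H-bond acceptor,
# None = no polar group at that position.
PYRIMIDINES = {
    "T (keto)": ("A", "D", "A"),   # O2, N3-H, O4
    "T (enol)": ("A", "A", "D"),   # O2, N3, O4-H (4-enol tautomer)
    "C (amino)": ("A", "A", "D"),  # O2, N3, N4-H
}
PURINES = {
    "A": (None, "A", "D"),  # C2-H, N1, N6-H
    "G": ("D", "D", "A"),   # N2-H, N1-H, O6
}


def hbonds(purine, pyrimidine):
    """Count donor-acceptor H-bonds across the pair, or return None if any
    position puts two donors or two acceptors face to face (a clash)."""
    count = 0
    for p, q in zip(purine, pyrimidine):
        if p is None or q is None:
            continue  # nothing to bond with here, but no clash either
        if p == q:
            return None
        count += 1
    return count


for name, edge in PYRIMIDINES.items():
    partners = {pur: hbonds(pur_edge, edge) for pur, pur_edge in PURINES.items()}
    print(name, partners)
```

The enol-T row comes out identical to C: it clashes with A and makes three hydrogen bonds with G.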

Study questions for hybridization thermodynamics:

  1. Download the oligonucleotide hybridization spreadsheet. Learn to use Microsoft Excel if necessary. You may also want to look at the underlying math, or not.
    Ans: Not much to say here.
  2. In the spreadsheet, change ΔS° and ΔH° and observe effects on the melting curve. Change both ΔH° and ΔS° at the same time but try to keep the overall stability, roughly expressed by
    ΔG°37 = ΔH° − 310ΔS°   (310 K = 37 °C)
    constant. What happens to the sharpness of the transition as [the absolute magnitudes of] ΔH° and ΔS° increase?
    Ans: As ΔH° becomes more negative, the curve shifts to the right (higher Tm, more stable). Conversely, a more negative ΔS° shifts the curve to the left. Changing both together can leave Tm unchanged, but because the slope of ΔG° versus temperature (equal to −ΔS°) becomes steeper, the transition becomes sharper: Keq for hybridization changes more rapidly with temperature.
  3. Qualitatively guess at how accurately ΔH°, ΔS°, ΔG°37 and Tm can each be measured from [this kind of] experimental data. Give your reasoning based on the results from above.
    Ans: Because errors in ΔH° and ΔS° are compensating, it takes fairly large changes in both to make an obvious difference in the shape of the curve, so it's hard to do better than +/- 8-10% for either. On the other hand, Tm can be measured to better than 1 degree, perhaps a 0.1% error in the Kelvin temperature. The errors in ΔH° and ΔS° largely cancel in ΔG°37, so it is determined much more accurately than ΔH° or ΔS°, to about 2%. We continue to measure enthalpy and entropy changes individually because they are needed to predict stability at other temperatures.
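The melting-curve behavior in questions 2 and 3 can be reproduced with a short calculation. This is a minimal sketch assuming a two-state A + B ⇌ AB model with non-self-complementary strands at equal concentration and temperature-independent ΔH° and ΔS°; the spreadsheet may use a different formulation, and the function names are hypothetical. Rather than holding ΔG°37 fixed, it holds Tm fixed exactly by choosing ΔS° for each ΔH°, which isolates the effect on sharpness.

```python
import math

R = 1.987e-3  # gas constant in kcal/(mol*K)


def fraction_duplex(T, dH, dS, c0):
    """Fraction of strands in duplex for A + B <=> AB (two-state model),
    each strand at total concentration c0 (M). dH in kcal/mol,
    dS in kcal/(mol*K), T in kelvin."""
    K = math.exp(-(dH - T * dS) / (R * T))  # association constant, 1/M
    a = K * c0
    # Root of a*(1 - f)**2 = f lying in [0, 1], written to avoid round-off at small a
    return 2 * a / (2 * a + 1 + math.sqrt(4 * a + 1))


def melting_temperature(dH, dS, c0):
    """Tm is where f = 1/2, i.e. where K * c0 = 2."""
    return dH / (dS - R * math.log(2 / c0))


def entropy_for_tm(dH, Tm, c0):
    """The dS that puts the midpoint at Tm for a given dH."""
    return dH / Tm + R * math.log(2 / c0)


c0 = 1e-6   # 1 uM of each strand
Tm = 330.0  # target midpoint, K
for dH in (-50.0, -100.0, -150.0):
    dS = entropy_for_tm(dH, Tm, c0)
    width = fraction_duplex(Tm - 5, dH, dS, c0) - fraction_duplex(Tm + 5, dH, dS, c0)
    print(f"dH = {dH:6.1f}  dS = {dS:.4f}  Tm = {melting_temperature(dH, dS, c0):.1f} K  "
          f"f(Tm-5) - f(Tm+5) = {width:.2f}")
```

The drop in fraction duplex across a 10-degree window around Tm grows as ΔH° and ΔS° become more negative, which is the sharpening described in the answer to question 2. In this model the midpoint condition K·c0 = 2 gives Tm = ΔH°/(ΔS° − R ln(2/c0)), so Tm also depends on strand concentration.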

Study questions for Moser and Dervan, 1987:

  1. Why do we focus on these particular triple base pairs instead of any of the legion of other possibilities? (Remember what’s important about the particular Watson-Crick base pairs vs. all the others.)
    Ans: These particular triples can stack on each other in a regular helix, like the WC base pairs.
  2. Why are the triplexes studied here stabilized by low pH? Why are they stabilized by polyamines? Why is the increased stability of triplex at acidic pH not apparent in Figure 5?
    Ans: Low pH promotes the protonation of cytosine N3 needed to form the CH+:G-C triple. Polyamines help neutralize the high density of negative charge from the phosphates of three strands. In Figure 5, binding is monitored by cleavage, but the cleavage reaction itself is strongly pH-dependent, so at lower pH the extra binding that is occurring goes undetected.
  3. Figure 3 demonstrates parallel orientation of the third strand oligo-T and the homopurine strand in the duplex. What is the reasoning leading to this conclusion? How does it rule out strand displacement as a mode of binding?
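    Ans: As the Fe-EDTA group is moved from the 5' end toward the 3' end of the third strand, the position of cleavage on the poly-Y strand of the duplex moves from 3' to 5'. The third strand is therefore antiparallel to the poly-Y strand and parallel to the poly-R strand. Strand displacement would require the oligo-T to form a Watson-Crick duplex with the oligo-R strand, which would make it antiparallel to that strand.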
  4. If the third strand probe bound in the minor groove rather than the major groove, what would the DNA-EDTA 9 footprinting histogram on the bottom right of figure 4 look like, and why?
    Ans: Look at the triple-helix model in Figure 6. The phosphates nearest in space to the Fe-EDTA are offset to the 5' side on the poly-R strand and to the 3' side on the poly-Y strand. Binding of a cleavage agent in the minor groove gives an offset in the opposite direction: to the 3' side on the poly-R strand and to the 5' side on the poly-Y strand.
  5. Why do the shorter probes or mismatches in Fig. 4 cleave reasonably well at low temperature but then cleave (and presumably bind) less and less effectively as the temperature is increased?
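    Ans: They are melting off: shorter or mismatched probes form less stable triplexes, so as the temperature rises toward their Tm, less probe stays bound and less cleavage is seen.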
  6. Why has it not been possible to generalize triple-strand recognition to double-stranded targets of arbitrary sequence? In other words, why the restriction to homopurine/homopyrimidine tracts?
    Ans: They could not complete the code: C and T don't have enough hydrogen bonding groups on their "Hoogsteen" faces.
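The recognition code behind questions 2 and 6 is small enough to write down. This minimal sketch assumes only the two pyrimidine-motif triples discussed here, T:A-T and C+:G-C, with the third strand parallel to the purine strand; the dictionary and function names are hypothetical, and the cytosine pKa of 4.5 is an illustrative assumption rather than a value from the paper (the effective pKa inside a triplex can differ).

```python
# Pyrimidine-motif triple helix: in the major groove, T recognizes an A-T pair
# (T:A-T triple) and N3-protonated C recognizes a G-C pair (C+:G-C triple),
# each by Hoogsteen pairing to the purine. The third strand runs parallel
# to the purine strand of the duplex.
HOOGSTEEN_CODE = {"A": "T", "G": "C+"}


def third_strand(purine_strand):
    """Third strand for a duplex whose purine strand is given 5'->3'; the result
    is also written 5'->3' (parallel). Returns None if any position in this
    strand is a pyrimidine, because the code has no entry for T-A or C-G pairs."""
    strand = []
    for base in purine_strand.upper():
        if base not in HOOGSTEEN_CODE:
            return None
        strand.append(HOOGSTEEN_CODE[base])
    return strand


def fraction_protonated(pH, pKa=4.5):
    """Henderson-Hasselbalch fraction of cytosines protonated at N3.
    pKa = 4.5 is an illustrative assumption, not a value from the paper."""
    return 1.0 / (1.0 + 10 ** (pH - pKa))


print(third_strand("AAGAGGAA"))  # homopurine tract: every position has a code word
print(third_strand("AAGACGAA"))  # a C (i.e. a C-G pair) at one position: None
for pH in (5.0, 6.0, 7.0):
    print(f"pH {pH}: {fraction_protonated(pH):.3f} of C protonated")
```

Any pyrimidine in the strand being read returns None, which is the homopurine restriction of question 6; the protonated fraction falling between pH 5 and 7 is why low pH stabilizes the C+:G-C triple.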

Study questions for the chimp genome draft sequence, 2005:

This is a very long and complex paper, though very well-written. You do not need to understand every detail, but you are responsible for the starred (*) study questions. You need not read the Methods, but the short Discussion section is important. You may need to use Google or Wikipedia to answer some of these questions if too many terms are unfamiliar. Warning: Since this is not my field, some of the answers provided may be simplistic.

  1. *What is “sequence redundancy?” Why is it necessary to have significant redundancy in order to assemble a genome using whole-genome shotgun sequencing (WGS)?
    Ans: Sequence redundancy is the average number of times each base was sequenced. In WGS you can't choose clones ahead of time, so in order to have a reasonable chance of sequencing nearly every base at least once you need to sequence most of them more than once. This also gives more confidence in the accuracy of the sequence and the assembly.
  2. What's the difference between nucleotide-level accuracy and structural accuracy? What is the idea behind the claim that the substitution rate assessed by comparison with a BAC is about what one would expect because the BAC is a single haplotype?
    Ans: Nucleotide-level accuracy refers to the local sequence, i.e., is a particular position really a G, or might it be an A? Structural accuracy refers to the correct assembly of contigs (contiguous assemblies of shorter sequences). The BAC is derived from only one chromosome, whereas the draft sequence has contributions from both homologous chromosomes, so heterozygous sites contribute differences that are not sequencing errors.
  3. Figure 1b shows the divergence between chimp and human. How was the figure constructed, and what do the individual symbols mean?
    Ans: The graph shows the probability of observing the indicated divergence in 1-Mb segments of each chromosome. It indicates that there is substantial Mb-to-Mb variation in the amount of divergence. It would be interesting to correlate this result with the density of coding sequence on each chromosome.
  4. The authors show that sequence divergence is much more rapid at CpG sites, due to cytosine methylation (at position 5) and then deamination. What would be the product of methylation/deamination, and why does that lead to more mutation than other kinds of DNA damage?
    Ans: Deamination of 5-methylcytosine gives T, a normal DNA base that is not immediately recognizable as damage (unlike the U produced by deamination of unmethylated C). If the CpG is replicated before the resulting T·G mismatch is repaired, a C→T mutation will result.
  5. Is there a simple bottom line for the cause of the variation in divergence frequency across chromosomes?
    Ans: No. Increased divergence is correlated with location near telomeres (chromosome ends) and in dark-staining bands, but we do not know the mechanisms.
  6. *What are “indels” and how do they arise?
    Ans: Insertions or deletions, ranging from a few nucleotides to thousands of bp. Most are very small and are probably replication errors. The large ones are due to transposable elements such as endogenous retroviruses and LINE and SINE elements.
  7. Figure 7 shows that old Alu mobile elements are more likely to be found in GC-rich regions. What explanation do the authors offer?
    Ans: That they are preferentially lost from AT-rich regions.
  8. *What are “purifying selection” and “positive selection,” and how are they reflected in the Ka/Ks ratio?
    Ans: Ka/Ks is the rate of amino-acid-changing (nonsynonymous) substitutions per nonsynonymous site divided by the rate of silent (synonymous) substitutions per synonymous site. Purifying (or negative, or stabilizing) selection: changes in coding sequence are deleterious and are therefore expunged when they appear at random, yielding a low Ka/Ks. Positive selection: a gene that has changed rapidly in response to the environment has a high rate of coding-sequence change, so Ka/Ks is high or even >1. See this paper in PLoS.
  9. Table 3 shows that rare alleles at human polymorphic loci are more likely to have changes in coding sequences than common alleles or between human and chimp. The authors suggest that this reflects the genetic load of mutation. What does this mean?
    Ans: These alleles are probably less fit than the common alleles. They are constantly popping up due to mutation but seem unlikely to be fixed. Therefore most will eventually be purged by natural selection.
  10. *What’s the point of looking closely at genes that have diverged more rapidly than other genes?
    Ans: These are the ones most likely to have been under positive selection during the evolution of the human lineage, or otherwise to reflect milestones in our differentiation from the apes.
  11. What does the dramatic dip in the middle of Figure 10 mean?
    Ans: Substitutions at splice sites are very rare because they would probably destroy the protein, so they are removed by purifying selection.
  12. Figure 12 suggests that transcription factors are one class of genes that have diverged more rapidly in the evolution of humans than of chimpanzees. If true, what does this suggest about sources of phenotypic changes? Why isn’t the TF point an obvious outlier on Fig. 12?
    Ans: It suggests evolution through changes in regulation rather than in the sequences of structural genes. The TF family is large, so even a small deviation from equal Ka/Ks in the human and chimp lineages is statistically meaningful, whereas for smaller families it would not be.
  13. *On page 81, the authors try to identify which human SNPs are ancestral and which are new by comparison with the chimp as an “outgroup.” What’s the principle behind this idea?
    Ans: It’s a simple idea. The allele in humans that matches the chimp allele is very likely to be the ancestral allele.
  14. What does the slope of Figure 13 mean?
    Ans: The ancestral alleles are less likely to dominate than one might expect, claimed to be due to population bottlenecks that reduce competition between alleles. These show up more outside Africa.
  15. *How does reduced diversity relative to divergence suggest a selective sweep in human history?
    Ans: If we’re all very similar to each other but different from the chimp, that suggests that some strong selection pressure wiped out the other alleles in the human population. These regions also tend to have high-frequency derived alleles.
  16. *The discussion, and in fact most of the paper, focus on distinguishing adaptive change from neutral drift or even changes to less-fit phenotypes. Which type of change is responsible for the bulk of the observed changes? Why have maladaptive changes apparently been maintained more frequently in hominids than rodents? Is there a possible upside to this genetic load?
    Ans: Most of the changes are neutral, although this paper ignores changes in noncoding regions. We can tolerate more mutations because we have smaller populations, longer generation times, more cooperation and protection among individuals, and, recently, the ability to use technology. The upside is that we cover more “sequence space” than we otherwise would, and may have a pool of people who are especially suited to respond to new challenges. The raw material for natural selection is variation.
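For starred question 1, a standard idealization (the Lander-Waterman, or Poisson, model) shows why whole-genome shotgun sequencing needs redundancy: if reads land at random with average redundancy c, the expected fraction of bases never read is e^-c. This minimal sketch ignores repeats, cloning bias, and the overlap needed to assemble reads into contigs, and the 3 × 10^9 bp genome size is a round illustrative number, not the paper's figure.

```python
import math


def fraction_uncovered(coverage):
    """Expected fraction of bases never read when reads fall at random
    positions with mean redundancy `coverage` (Poisson model)."""
    return math.exp(-coverage)


GENOME_SIZE = 3.0e9  # illustrative round number for an ape genome, in bp

for c in (1, 2, 4, 6, 8):
    missed = GENOME_SIZE * fraction_uncovered(c)
    print(f"{c}x: {100 * (1 - fraction_uncovered(c)):.2f}% of bases read at least once, "
          f"~{missed:.1e} bases never read")
```

Even at several-fold redundancy, millions of bases are expected to be missed entirely, and a base read only once has no independent check on its accuracy.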

Study questions for Seeman et al., 1976:

  1. Why is it likely to be difficult for proteins to use all six of the major groove recognition sites W1, W2, W3, W1’, W2’, and W3’ for sequence-specific recognition?
    Ans: For any given base pair, either W2 or W3 but not the other is occupied, and small conformational changes would allow a protein to recognize either. In class, we collapsed our discussion to treat W2 and W3 as one position.
  2. Why is the Table I entry for (G-C/C-G) discrimination at the S2’ position a “(0)”? In other words, what is the basis for discrimination between the two different base pairs, and why is it likely to be difficult?
    Ans: The "(0)" means that discrimination is based only on subtle geometric features, in this case the orientation of heteroatom-hydrogen bonds: the two base pairs could in principle be told apart by the direction of a hydrogen bond. Again, we ignored this in class and treated the guanine amino group in the G-C/C-G minor groove simply as a donor. Note that in class we exaggerated the monotony of the minor groove: see Kielkopf, C.L., White, S., Szewczyk, J.W., Turner, J.M., Baird, E.E., Dervan, P.B. and Rees, D.C. (1998) A Structural Basis for Recognition of A•T and T•A Base Pairs in the Minor Groove of B-DNA. Science, 282, 111-115.
  3. Inosine, which is the same as guanine except that the 2-NH2 group is replaced by H, can be used as a probe for the groove recognized by a protein. Compare the I-C base pair to the A-U and G-C base pairs and predict the result of an experiment where G is substituted by I and the binding of either a major-groove binding protein or a minor-groove binding protein is studied.
    Ans: The I-C base pair looks like a G-C base pair from the viewpoint of the major groove but like an A-T base pair from the minor groove. So if I is substituted for a contacted G and protein binding is unaffected, the protein probably binds in the major groove; if binding is altered, the protein probably binds in the minor groove. Similarly, an I-for-A substitution will alter only major-groove recognition. For a classic case study, see Starr DB, Hawley DK, Cell, 1991, 67:1231-40. "TFIID binds in the minor groove of the TATA box."
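The logic of question 3 can be made explicit by listing the functional groups each base pair presents to each groove. This minimal sketch uses the common shorthand A (acceptor), D (donor), H (nonpolar hydrogen), and M (methyl), read from the purine side; it ignores groove geometry and the distinction between, say, A-T and T-A, and the names `GROOVES`, `looks_like`, and `predict_g_to_i` are hypothetical.

```python
# Functional groups presented to each groove, read from the purine side.
# "A" = H-bond acceptor, "D" = donor, "H" = nonpolar hydrogen, "M" = methyl.
GROOVES = {
    "A-T": {"major": ("A", "D", "A", "M"),   # A N7, A N6-H, T O4, T C5-methyl
            "minor": ("A", "H", "A")},       # A N3, A C2-H, T O2
    "G-C": {"major": ("A", "A", "D", "H"),   # G N7, G O6, C N4-H, C C5-H
            "minor": ("A", "D", "A")},       # G N3, G N2-H, C O2
    "I-C": {"major": ("A", "A", "D", "H"),   # I N7, I O6, C N4-H, C C5-H
            "minor": ("A", "H", "A")},       # I N3, I C2-H, C O2
}


def looks_like(pair, groove):
    """Natural base pairs whose pattern in `groove` matches that of `pair`."""
    target = GROOVES[pair][groove]
    return [other for other in ("A-T", "G-C") if GROOVES[other][groove] == target]


def predict_g_to_i(groove_read):
    """Expected effect of replacing a contacted G-C with I-C for a protein
    that reads `groove_read` ("major" or "minor")."""
    same = GROOVES["G-C"][groove_read] == GROOVES["I-C"][groove_read]
    return "binding largely unaffected" if same else "binding likely altered"


for groove in ("major", "minor"):
    print(f"{groove} groove: I-C looks like {looks_like('I-C', groove)} -> "
          f"{predict_g_to_i(groove)}")
```

I-C matches G-C in the major groove and A-T in the minor groove, which is the basis for the prediction in the answer above.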

Study questions for Ren et al., 2000:

  1. What is LM-PCR, the method by which IP-enriched DNA was amplified? Before you look it up, think about what it must do. Why couldn't they amplify the IP'd DNA with regular old PCR?
    Ans: The IP'd DNA is of unknown sequence, so what PCR primers would you use? LM-PCR is ligation-mediated PCR. A common linker is ligated to both ends of every fragment, and a primer complementary to the linker is used to amplify all of the fragments.
  2. What is the point of Figure 1B? What do the red and blue dashed lines mean? Why do they flare out at the bottom left? What's a P value?
    Ans: The devil is in the details in the microarray world. The point of the paper is to identify sequences that are enriched in the immunoprecipitated DNA. Such sequences are identified by comparing Cy5-labeled enriched DNA with Cy3-labeled control (genomic) DNA. Figure 1B shows that LM-PCR amplification of genomic DNA gives very similar results with the two fluorophores, so changes in Cy5 intensity can be ascribed to immunoprecipitation rather than to preferential amplification. Note that the LM-PCR signal varies over a 100-fold range: there are in fact big differences in amplification between fragments; they just are not dye-specific. The red and blue lines mark the differences in signal that would be obtained by chance with probability 10^-3 and 10^-5, respectively. The lines flare out because at lower signal we expect more noise relative to signal, so there is a larger probability of seeing substantially different signals by chance. The P-value (http://en.wikipedia.org/wiki/P-value) is "the probability of obtaining a result at least as 'impressive' as that obtained, assuming the truth of the null hypothesis that the finding was the result of chance alone. The fact that p-values are based on this assumption is crucial to their correct interpretation." Further down, that page lists many ways in which the P-value is misinterpreted.
  3. Does it strike you as odd that the differences in expression levels in Figure 2 are so much larger than the differences in occupancy levels? Similarly, GAL4 is observed to bind at only a few of its many recognition sites in the genome. What is the likely explanation for both observations?
    Ans: Figure 2 hints at many interesting questions. The expression ratios are much larger than the binding ratios, and the color scheme conceals this fact. Other proteins must be involved in expression: if the binding of one protein were all that mattered, we would expect activation to be proportional to the binding ratio. Expression is likely also affected by other transcription factors, or multiple bound GAL4 molecules may cooperatively recruit chromatin remodeling activities or the basal machinery. Downstream regulation of mRNA translation or lifetime may also be operating. The fact that GAL4 does not bind to many of its genomic consensus sites is consistent with this: its binding depends on other proteins acting cooperatively or anticooperatively.
  4. The MTH1, PCL10, and FUR4 genes could have been identified as being up-regulated by GAL4 simply on the basis of mRNA expression data. Why does the ChIP-chip method still tell us something we wouldn't have known only from expression data?
    Ans: We know that the effect of GAL4 is direct, that it binds the promoters of these genes rather than, for example, inducing synthesis of a second factor that turns these genes on.
  5. Why are all the rightmost lanes in Figure 2B blank?
    Ans: These are controls showing that the Myc antibody does not IP anything unless Myc-tagged GAL4 is present in the cell.
  6. What other types of DNA binding proteins (besides transcription factors) would be interesting to examine using this method?
    Ans: YFP (your favorite protein), but the most obvious targets are histones and components of the basal transcription machinery.
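The flaring of the significance lines in Figure 1B (question 2) follows from almost any noise model in which noise grows more slowly than signal. This minimal sketch assumes a hypothetical model, not the one used by Ren et al.: each channel's variance equals its intensity plus a constant background term, the two channels share the same true intensity under the null hypothesis, and their difference is treated as normally distributed. The function name and the background value are illustrative assumptions.

```python
import math
from statistics import NormalDist


def ratio_threshold(intensity, p_value, background_var=100.0):
    """Smallest Cy5/Cy3 ratio called significant (two-sided) at p_value when
    both channels have the same true intensity, under a HYPOTHETICAL noise model:
    each channel's variance = intensity + background_var."""
    z = NormalDist().inv_cdf(1 - p_value / 2)
    sd_diff = math.sqrt(2 * (intensity + background_var))  # SD of (Cy5 - Cy3) under the null
    return (intensity + z * sd_diff) / intensity


for intensity in (100, 1_000, 10_000, 100_000):
    print(f"intensity {intensity:>7}: ratio needed at P=1e-3 {ratio_threshold(intensity, 1e-3):.2f}, "
          f"at P=1e-5 {ratio_threshold(intensity, 1e-5):.2f}")
```

The fold change needed to reach a given P-value shrinks toward 1 as intensity rises, so curves drawn at fixed P fan out at the low-intensity end, and the stricter P = 10^-5 curve always lies outside the 10^-3 curve.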

Study questions for Naktinis et al., 1996:

  1. What is the basis of the “protein footprinting” assay used to measure interactions among core, beta, and gamma?
    Ans: Beta is labeled at the C-terminus by phosphorylation of a fused protein kinase recognition sequence. Its resistance to proteolysis is then compared in the presence and absence of the other factors; sites protected from cleavage indicate where a partner binds.
  2. What is the evidence for competitive as opposed to simultaneous binding of core and gamma to the beta sliding clamp?
    Ans: Gel filtration experiments show that beta can bind either core or gamma, but when all three are mixed, beta co-elutes with gamma, not core, and no larger ternary complex is observed.
  3. Figure 7 shows that the gamma complex can remove sliding clamps from DNA, but the evidence is that the +gamma curve changes very little with time. Why is much of the clamp still bound at the end of the reaction? What might the gel filtration profile have looked like in the presence of a large excess of linear DNA?
    Ans: The clamp is at equilibrium: presumably the clamp loader is actively loading and unloading beta clamps throughout the reaction. It would have been nice to see the reaction start from unloaded beta as well; if the system is truly at equilibrium, it should reach the same final distribution.
  4. What is the advantage to the cell in having primer-template DNA but not nicked DNA stabilize the beta-core interaction? How does this play out during the synthesis of an Okazaki fragment?
    Ans: The primer-template mimics the situation of a lagging strand polymerase during synthesis of an Okazaki fragment, and therefore strong beta-core interaction increases processivity. The nicked site models the end of an Okazaki fragment, so the handoff potentiates rapid recycling of core.
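The test proposed in the answer to question 3, that a system at equilibrium should reach the same end state from either direction, can be illustrated with a two-state kinetic model. This minimal sketch treats clamp loading and unloading as first-order reactions with hypothetical rate constants; the real reaction, with clamp loader, ATP, and DNA ends, is more complex.

```python
import math


def fraction_on_dna(t, k_on, k_off, f0):
    """Fraction of beta clamps on DNA at time t when loading (rate k_on) and
    unloading (rate k_off) are both treated as first-order, starting from f0."""
    f_eq = k_on / (k_on + k_off)
    return f_eq + (f0 - f_eq) * math.exp(-(k_on + k_off) * t)


k_on, k_off = 0.2, 0.3  # hypothetical rate constants, 1/min
for t in (0, 2, 5, 10, 20, 40):
    preloaded = fraction_on_dna(t, k_on, k_off, f0=1.0)  # clamps start on DNA
    free = fraction_on_dna(t, k_on, k_off, f0=0.0)       # clamps start free
    print(f"t = {t:2d} min  start loaded: {preloaded:.3f}  start free: {free:.3f}")
```

Both curves approach k_on/(k_on + k_off); a reaction that leveled off at different values depending on the starting point would not be at equilibrium.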

Study questions for Cosma et al., 1999:
This is a dense paper that concerns a classic system in yeast cell and molecular biology. You do not need to understand every detail.

  1. In Figure 1, what is the evidence for specific binding of Swi4p to the URS2 region? What do the different "delta" lanes in 1B mean? Why do the authors do a dilution series of the WCE (= whole cell extract) as a control?
    Ans: The evidence for binding of Swi4p to URS2 is that only one band lights up when they amplify the IP'd DNA. The delta lanes are deletion mutants: deleting genes known to be important for transcription of the HO gene also knocks out the ChIP signal. The WCE dilution series is a control showing that the amplification has not saturated and that Swi4p can be detected in unsynchronized cells.
  2. You do not need to understand all the tricks used to manipulate and monitor the yeast cell cycle, but one of the main points of the paper is the use of cell-cycle regulated genes as a good model system for following the order of events at a promoter. Why is this important? Why couldn't the authors just do their experiments with an unsynchronized population of cells? How else might one follow the order of events at a regulated promoter? In figures 2 and 3, FACS (= fluorescence-activated cell sorting) is a way to measure the DNA content of individual cells in a population. It is used to help show that Ash1p prevents binding of Swi4p to the URS2 during the first cell cycle after release from a block and that HO mRNA and Swi4p binding are cell-cycle dependent.
    Ans: The use of cell-cycle regulated genes allows transcription to be synchronized, because yeast can be synchronized. This means that all of the cells are at the same point in their temporal regulation, whereas for an unsynchronized culture, or for a gene that is not cell-cycle regulated, every cell would be at a different stage and all we would see would be an average. Another way to approach this problem would be to add an inducing ligand at a set time and follow events at the promoter as a function of time thereafter.
  3. What is the evidence that at this promoter the action of SWI/SNF is necessary to recruit SAGA (which includes HAT activity)?
    Ans: Figure 4 shows that Swi2 (SWI/SNF) binding precedes Ada (SAGA) binding. Figure 5 shows that SWI/SNF binding still occurs in the SAGA deletion strains. Figure 6 shows that SAGA is not recruited in the absence of SWI/SNF activity. Taken together, this suggests that SWI/SNF recruits SAGA.
  4. The authors note that Swi5p is present on DNA only transiently but leads to long-lasting effects. They showed this using ChIP assays on carefully synchronized cells. On page 306 they discuss the evidence that this observation is real and not due to "epitope masking." What would epitope masking do to the ChIP assay? If it were actually occurring, what would be the resulting hypothesis as to the role of Swi5p?
    Ans: The idea of epitope masking is that if Swi5p were actually present at the promoter but some other protein landed on top of it and blocked the epitope, then Swi5p would become invisible to ChIP. The conclusion that Swi5p has a "hit and run" effect would then be wrong: the hypothesis would be that the continued presence of Swi5p is needed for transcription.
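The averaging argument in question 2 can be illustrated with a toy simulation. This minimal sketch assumes hypothetical numbers (a 90-minute cycle, with the promoter occupied only from minute 20 to minute 35 of each cycle) and perfect, non-decaying synchrony, which real synchronized cultures do not maintain. The ChIP signal is taken to be the fraction of cells whose promoter is occupied.

```python
# Hypothetical cell cycle: a factor occupies the promoter only from
# minute 20 to minute 35 of each 90-minute cycle.
PERIOD = 90.0
WINDOW = (20.0, 35.0)


def bound(phase):
    """True if a cell at this cell-cycle position (minutes) has the factor bound."""
    return WINDOW[0] <= phase % PERIOD < WINDOW[1]


def chip_signal(t, phases):
    """Fraction of cells with the promoter occupied at time t, given each
    cell's cell-cycle position (minutes) at t = 0."""
    return sum(bound(p + t) for p in phases) / len(phases)


n = 900
synchronized = [0.0] * n                             # all cells released from a block together
unsynchronized = [PERIOD * i / n for i in range(n)]  # cells spread evenly around the cycle

for t in range(0, 91, 15):
    print(f"t = {t:2d} min  synchronized: {chip_signal(t, synchronized):.2f}  "
          f"unsynchronized: {chip_signal(t, unsynchronized):.3f}")
```

The synchronized population switches between 0 and 1 as the cells pass through the binding window together, while the unsynchronized population gives a nearly constant signal of about 15/90: the order of events is invisible in the average.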

Biochemistry 674 course home page.

Jason Kahn's home page.