ChIP-Seq is identical to ChIP-chip up to the last step. In ChIP-Seq, immunoprecipitated DNA fragments are prepared for sequencing and fed into a massively parallel sequencer that produces short reads. Even though the sonication step is the same as in ChIP-chip, ChIP-Seq generates multiple short reads within any given 500 bp region, thereby pinning down the location of TFBS to within 50-100 bp. A similar result can be obtained with ChIP-chip using high-density tiling arrays. The downside of ChIP-Seq is that sensitivity is proportional to cost, since sensitivity increases with the number of (expensive) parallel sequencing runs. To control for biases, ChIP-Seq experiments often use the "input" as a control. This is DNA sequenced through the same pipeline as the ChIP-Seq experiment, but omitting the immunoprecipitation step. It should therefore have the same accessibility and sequencing biases as the experimental data.
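As a rough illustration of how the input control can be used, the sketch below bins read start positions and computes a per-bin fold enrichment of ChIP reads over input reads. The function names, the 100 bp bin size, the library-size scaling and the pseudocount are all assumptions made for this example; real peak callers use more elaborate statistical models.

```python
from collections import Counter


def bin_counts(read_starts, bin_size):
    """Count reads per fixed-width bin, keyed by bin index."""
    return Counter(pos // bin_size for pos in read_starts)


def fold_enrichment(chip_starts, input_starts, bin_size=100, pseudocount=1.0):
    """Return {bin_index: ChIP/input ratio} for every bin with ChIP reads.

    Hypothetical helper: input counts are scaled to the ChIP library size,
    and a pseudocount avoids division by zero in bins with no input reads.
    """
    if not chip_starts or not input_starts:
        raise ValueError("both read lists must be non-empty")
    chip = bin_counts(chip_starts, bin_size)
    ctrl = bin_counts(input_starts, bin_size)
    # Scale the input so both libraries have the same total read count.
    scale = len(chip_starts) / len(input_starts)
    return {
        b: (chip[b] + pseudocount) / (ctrl[b] * scale + pseudocount)
        for b in chip
    }


# Toy data: most ChIP reads pile up between positions 100 and 199.
chip_reads = [105, 110, 120, 130, 150, 160, 510, 900]
input_reads = [50, 120, 300, 510, 520, 700, 900, 950]
enrichment = fold_enrichment(chip_reads, input_reads)
top_bin = max(enrichment, key=enrichment.get)
print(top_bin * 100, (top_bin + 1) * 100 - 1, enrichment[top_bin])
```

Bins whose ratio is well above 1 are candidate binding regions, whereas bins that are equally covered in the input reflect accessibility or sequencing bias rather than TF binding.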
The DNAse foot-printing method starts by focusing on a given region of interest (e.g. a promoter region) and amplifying it by PCR to obtain a large amount of sample. The TF is added first, followed by the DNAse. The mix is left to react for a short time, and gel electrophoresis is then run to compare the pattern of fragments in a control (no TF) and in the sample. If the TF has bound the sample, it will have protected a stretch of DNA (encompassing some fragments of the control), and those fragments will therefore not appear in the sample gel. The fragments can then be cut out of the gel, purified and sequenced to obtain the sequence of the protected region. This is often used to identify the binding motif of a TF for the first time. The foot-printing will typically resolve the protected region down to 50-100 bp, and the sequence can then be examined for possible TF-binding sites either by eye or using a computer search.
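The comparison between the control and sample lanes can be sketched in a few lines. The example below is an idealized illustration, not a gel-analysis method: it assumes that each band has already been converted to an integer fragment length, and the function name protected_region is hypothetical. It reports the longest run of consecutive fragment lengths that are present in the control but absent from the sample.

```python
def protected_region(control_bands, sample_bands):
    """Return (start, end) of the longest run of consecutive fragment
    lengths seen in the control lane but missing from the sample lane,
    or None if no band is missing.

    Assumes bands are given as integer fragment lengths (an idealization).
    """
    missing = sorted(set(control_bands) - set(sample_bands))
    best = None
    run_start = prev = None
    for length in missing:
        if prev is not None and length == prev + 1:
            prev = length              # extend the current run
        else:
            run_start = prev = length  # start a new run
        if best is None or prev - run_start > best[1] - best[0]:
            best = (run_start, prev)
    return best


# Toy lanes: the TF protects lengths 15-24; band 30 is missing by chance.
control = list(range(1, 41))
sample = [n for n in control if not (15 <= n <= 24) and n != 30]
print(protected_region(control, sample))
```

Taking the longest run rather than every missing band is a simple way to ignore isolated bands that are absent for reasons unrelated to TF binding.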
In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data coming from expression experiments, in which we wish to analyze the promoter regions of a set of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence “pattern” bound by the TF), by assuming that it will be overrepresented in our sample of sequences. There are different strategies to accomplish this, but the standard approaches use expectation maximization (EM), Gibbs sampling or greedy search. Popular algorithms for motif discovery are MEME, Gibbs Motif Sampler and CONSENSUS. More recently, motif discovery algorithms that make use of phylogenetic foot-printing (the idea that TF-binding sites will be conserved in the promoter sequences of the same gene in different species) have become available. These are not usually applied to complement experimental work, but can be used to provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.
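To make the greedy strategy concrete, here is a minimal sketch of a greedy motif search. It is not the CONSENSUS algorithm itself: the function names are hypothetical, the motif width k is assumed to be known in advance, each sequence is assumed to contain exactly one site, and the score is a simple count of agreements with the column consensus rather than information content.

```python
def consensus_score(kmers):
    """Sum, over alignment columns, of the count of the most common base."""
    return sum(max(col.count(b) for b in "ACGT") for col in zip(*kmers))


def greedy_motif_search(seqs, k):
    """Greedy search for one k-mer per sequence forming a conserved alignment.

    Every k-mer of the first sequence is tried as a seed; each remaining
    sequence then contributes the k-mer that best agrees with the alignment
    built so far. Returns (start_positions, score) of the best alignment.
    """
    best_score, best_positions = -1, None
    first = seqs[0]
    for seed in range(len(first) - k + 1):
        positions = [seed]
        kmers = [first[seed:seed + k]]
        for seq in seqs[1:]:
            # Choose the start whose k-mer maximizes the alignment score.
            start = max(
                range(len(seq) - k + 1),
                key=lambda i: consensus_score(kmers + [seq[i:i + k]]),
            )
            positions.append(start)
            kmers.append(seq[start:start + k])
        score = consensus_score(kmers)
        if score > best_score:
            best_score, best_positions = score, positions
    return best_positions, best_score


# Toy promoters, each carrying one copy of the planted site TTGACA.
promoters = ["ACGCTTGACAGG", "TTGACACCATGC", "GGATTGACATCC"]
positions, score = greedy_motif_search(promoters, 6)
print(positions, score)  # [4, 0, 3] 18
```

Because each choice is made greedily, the result depends on the order of the sequences and may miss the best alignment on noisier data. EM and Gibbs sampling address this by iteratively refining a probabilistic motif model.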
Regulated genes for each binding site are displayed below. Gene regulation diagrams show binding sites, both positively and negatively regulated genes, and genes with unspecified type of regulation.
For each individual site, the experimental techniques used to determine the site are also given.