Curation Information

Publication: ChIP analysis unravels an exceptionally wide distribution of DNA binding sites for the NtcA transcription factor in a heterocyst-forming cyanobacterium. Picossi S, Flores E, Herrero A. BMC Genomics 2014 Jan 13; 15(.):22 [24417914]
TF: NtcA [P0A4U6]
Reported TF sp.: Nostoc sp. PCC 7120
Reported site sp.: Nostoc sp. PCC 7120
Created by: Erill Lab
Curation notes: -
External databases: NCBI GEO (GSE51865) http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE51865.

Experimental Process

ChIP-seq with an anti-NtcA antibody was performed on Anabaena sp. PCC 7120 cells incubated in a medium lacking combined nitrogen. CysFinder was applied to the whole dataset for motif discovery.

ChIP assay conditions: Wild-type Anabaena sp. PCC 7120 cells grown in bubbled cultures with ammonium as the nitrogen source were incubated for 3 h in a medium lacking combined nitrogen, after which the cultures were treated with formaldehyde to crosslink DNA-bound proteins to the DNA. After cell lysis and DNA fragmentation, the extracts were treated with an anti-NtcA antibody to specifically immunoprecipitate the NtcA-bound DNA. The immunoprecipitated material was then incubated at 65°C to reverse the crosslinks, and the DNA was isolated. A sample of total DNA was also isolated before anti-NtcA treatment of the extracts to serve as the input control.
Cells of Anabaena sp. (also known as Nostoc sp.) strain PCC 7120 growing exponentially (3-5 μg Chl/ml) in the light (75 μE·m⁻²·s⁻¹) at 30°C in BG11₀ medium supplemented with 10 mM NaHCO3 (referred to as BG11₀C) containing 6 mM NH4Cl and 12 mM TES, and bubbled with a mixture of air + 1% CO2, were collected, washed with BG11₀C, resuspended in BG11₀C, and incubated under the same conditions for 3 h. Formaldehyde was then added to the cultures to a final concentration of 1%, and the cultures were incubated for 15 min (no aeration, occasional shaking). Glycine was added to a final concentration of 125 mM and incubation was continued for 5 min to stop the fixation reaction. The cells were then filtered, washed with cold TBS (20 mM Tris–HCl, pH 7.4, 140 mM NaCl) and collected in tubes (25 ml of culture per tube). The pellets were frozen in liquid nitrogen and stored at −20°C until use.
ChIP notes: Cells from about 150 ml of culture (6 tubes) were used for each ChIP experiment. Each pellet (about 25 ml of culture) was resuspended in 500 μl of lysis buffer (50 mM HEPES/KOH, pH 7.5, 140 mM NaCl, 1 mM EDTA, 1% Triton X-100, 0.1% sodium deoxycholate, supplemented with Mini EDTA-free protease inhibitor cocktail [Roche]). Glass beads (150 μl; acid-washed, 425-600 μm [Sigma]) were added and the cells were broken in a multivortexer at 2000 rpm for 1 h at 4°C. The cell lysates were collected by centrifugation and the extracts were sonicated to shear the DNA into fragments of about 200 bp (40 cycles of 10 s sonication and 30 s on ice, at 10% amplitude, in a Branson Digital Sonifier). After centrifugation to remove cell debris, the whole-cell extracts were stored at −20°C or used immediately for immunoprecipitation. Immunoprecipitation of DNA was carried out as described in Hanaoka and Tanaka, with some modifications. Whole-cell extracts were adjusted to 4 mg/ml of total protein with lysis buffer (500 μl total volume). A 50-μl sample was taken as the input sample, and the extracts were pre-treated with 0.6 mg of lysis-buffer-equilibrated Dynabeads Protein G (Invitrogen) to prevent non-specific binding of DNA to Protein G. Anti-NtcA antibody (or H2O for the mock sample) was added and the mixture was incubated overnight at 4°C with rotation. The extracts were then treated with 0.6 mg of Dynabeads Protein G for 2 h at 4°C with rotation. The Dynabeads were washed twice with 1.5 ml of lysis buffer (5 min, rotation) and once with 1.5 ml each of buffer 1 (lysis buffer containing 500 mM NaCl), buffer 2 (10 mM Tris–HCl, pH 8, 250 mM LiCl, 0.5% NP-40, 1 mM EDTA) and buffer 3 (10 mM Tris–HCl, pH 7.5, 1 mM EDTA, 50 mM NaCl). The Dynabeads were then resuspended in DNase-free RNase A (0.2 μg/μl in TE), incubated for 30 min at 37°C, and washed with 1.5 ml of buffer 3.
To elute the immunoprecipitated material, the Dynabeads were resuspended in 50 μl of elution buffer (50 mM Tris–HCl, pH 7.5, 10 mM EDTA, 1% SDS) and incubated at 65°C for 30 min. The elution step was repeated once and the two eluates were combined. To reverse the crosslinks, the eluted material was incubated at 65°C for 5 h. The input sample was processed in parallel (Tris–HCl, pH 7.5, EDTA and SDS were added to the same concentrations as in the ChIP sample). To remove proteins, Proteinase K was added to 0.4 μg/μl (final concentration) and the mixture was incubated for 1 h at 55°C. DNA was purified by phenol/chloroform/isoamyl alcohol (25:24:1) extraction followed by two extractions with chloroform/isoamyl alcohol (24:1). DNA was ethanol-precipitated using ammonium acetate and glycogen, and the pellet was washed twice with 70% ethanol, air-dried and resuspended in 25 μl of purified H2O. For the ChIP-Seq DNA samples, this protocol was repeated three times using cells from independent inductions, and the resulting DNA was pooled and concentrated to 25 μl. Input and ChIP DNA samples were sequenced at the Functional Genomics Core Facility of the Institute for Research in Biomedicine, Barcelona (Spain) (Herbert Auer). Next-generation sequencing was carried out using Illumina sequencing technology, and the ChIP DNA Sample Prep Kit (Illumina) was used for library preparation. Libraries were loaded at 8 pM into the flow cell using the Cluster Station running recipe V7 with the Single-Read Cluster Generation Kit v4 (all Illumina). The flow cell was loaded into the Genome Analyzer II and samples were sequenced for 120 nucleotides from a single end using the Sequencing Kit v5 and recipe v8 (all Illumina), strictly following the manufacturer's recommendations.
Illumina sequencing data were pre-processed with the standard Illumina pipeline version 1.5, and sequences were aligned to the Anabaena sp. PCC 7120 genome (http://genome.microbedb.jp/cyanobase/Anabaena) with Bowtie 0.12.5. The percentage of reads mapped to the genome was 92.3% for the input sample (high-quality [HQ] reads: 30,192,934, 64.9% of total) and 94.2% for the ChIP sample (HQ reads: 31,352,138, 68.5% of total). The results were analyzed with the Triform algorithm (Karl Kornacker). For detected double-strand peak regions, peak locations were reported as the average of the forward and reverse peak locations; z-scores were calculated according to equations (4)-(6), with C(x) replaced by the sum of the coverages at the forward and reverse peak locations; and the associated discrete p-values were adjusted for multiple testing with the Tarone-modified, distribution-free Benjamini-Yekutieli method, similar to a method recommended in Gilbert (2005). The Q value measures the statistical significance of the peak identifying the target region: for a chosen FDR threshold, the estimated false discovery rate (FDR) among the rows (peaks) whose Q value is no larger than that threshold does not exceed it. The NLQ value is defined as −log10(Q value).
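The multiple-testing step and the NLQ scale can be sketched as follows. This is a minimal illustration, not Triform's implementation: it applies the plain Benjamini-Yekutieli step-up adjustment and omits the Tarone modification for discrete p-values, and the function names `by_adjust` and `nlq` are hypothetical. The p-values in the example are made up.

```python
import math
from typing import List


def by_adjust(pvalues: List[float]) -> List[float]:
    """Plain Benjamini-Yekutieli step-up adjustment (no Tarone modification).

    Returns one adjusted Q value per input p-value, in input order.
    """
    m = len(pvalues)
    # BY correction factor c(m) = 1 + 1/2 + ... + 1/m, which keeps the
    # procedure valid under arbitrary dependence between tests.
    c_m = sum(1.0 / i for i in range(1, m + 1))
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so Q values are monotone in p.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        q = min(1.0, pvalues[idx] * m * c_m / rank)
        running_min = min(running_min, q)
        qvalues[idx] = running_min
    return qvalues


def nlq(q: float) -> float:
    """NLQ as defined in this record: -log10(Q)."""
    return -math.log10(q)


if __name__ == "__main__":
    p = [0.01, 0.04, 0.03, 0.005]  # illustrative p-values, not from this study
    q = by_adjust(p)
    print([round(x, 4) for x in q])
    print([round(nlq(x), 2) for x in q])
```

Keeping results on the NLQ scale matters here: the largest NLQ in this record (8186.60) corresponds to a Q value far too small to be represented as a double-precision float, so such peaks can only be ranked by NLQ, not by Q.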

Transcription Factor Binding Sites

GTAGTTAAAGGCAC
GTGGCCAGTGTTAC
GTAGCCTTAACTAC

Quantitative data format: NLQ, the negative log10 Q value (range: 3.71 to 8186.60)

GTAGTTAAAGGCAC 422.45
GTGGCCAGTGTTAC 1018.18
GTAGCCTTAACTAC 706.13

CTGGAGAAAATCATGTTTAACAAGAGCATTGACAAAAACCAAGCTTTAGTAAAAAAAGCTCTCGTCGCTTGTTCTTTGATTGGTATTGTTTTAAGTAGTAACATAGCTCAAGCTAGTACCCCCATTGTAGTTAAAGGCACAAAATGTCACCCTGGTTCTCCTGTAGGATTTTGTTGCCCTAACGATGGTACCTCCAGTGGCCCATATTGTCCGGCTCGTCGCATTGCTCCCAAAACTCAAAATTGAGGAAAAAATTTTTGCAGCTACTTCTG 422.45
CTGCGTCATTCAATACTTGTAACGACATTTTCCTGATCCTCCTTGATGTGTAGTGGAGGAATTGGAGTTATCCAAAAGTAATCTTTTATACAAATTTCAGAAATTTAAAAACATAAAATTCCTAATTTGCTTTTTCTATTTGGGTGGCCAGTGTTACAATGCCTTAATACTCCAAAAGATTTGATAAGATTACATTAGGGATATTATTAGGCGGCAAACCCAATCCCCAA 1018.18
ATAGAGAAGCTGATATTTCAATCTGGAGTCAACAAGAGTGCATTACCAACTCCTCAAAATACTGTATCAACACCTGCACCGATTCAACCAAATAAGGCAAGGCTATCAAGTATGGTTAATCAGTTAAACGGGGTAGCGTCATGAACAAGTTAAATAGGGTCTATGACAGCACACTACTAAGCAGTTCCAAAGTGTATCAAATTGACAGAAACCTGTATCAATACTTGTACAAGACAGGCACAATCCAAGCACCACAATACATCTTTCGACCACTAGCAAGACAGCGAAAGAAAGCAGATTTGCAACTGAATCACAAAGCTTTAACTAATCGTTGTTATG 117.84
TATTTTGTTTGATGCGGATTTAGGAGAAAATTTCTTAACGTAGTAAGCTAAATCTTTATTCATTTGCAGCCATGCGCGAATATCGATGTACACGTAACGCATTATACTTACACGACTGCATAGGTCGAGATGACCTTAGAGAACGCCAAGGTCATTATATTTGGGCAATCAACGAAGAAGCTGCATGGCAAGAAATGGCGCGGCGGTATCCA 65.36
AAGCTCAACCCTCTTTAGAGTTAGCTTACAGAAACTACAGCACATCTAGTTCACTGTCTCTAGCACTCCAAGAAGATGTGGCTGTCAGATTTGCTACTATCTTCGGTGATACTGCTTGTTTTCCTGGTGTACTATCAGCTGCTGTTGTTGATCTCATTGATGTTTTTGCTGAATCTCGTAACTTAGCTCGTTTTTGACGATATGGCTATTACATTTTA 94.69
CGAACTTTGGAGTAGGTAGCCAACCGGAAGTCATCAGGGGGAGGGATTGCCTCAGAAGCGAATGCCTCCTTGTAACGAGAAGGATATGCAGACGGAGGCACGGTGACTTGGATTGTAGAGGTGCAAGTAGCAATGTATACATCTAAAACGAGTTTCAAGACTACGGTGGAATGGAATGCTATCCCCTGGCAAAAGCTAGAAAGGAAAGTATTTAAGTTACAAAAG 62.82
TTACCCCAAGTAAAGCAAAGCTGAAGATACACGCGCAGAAGATAGGTAAATTGATTGATGAGCATAAGGCAGTACCACAATCTGATTTAATCGCTCATTTGAATCCCGTCATCCGTGGATGGACAAACTACTACTCATGTGTCGTCAGTAAACGCATATTTAACCAGGCTGATACAACATTATTTAGTCAGCTAAAAGCATGGGCAGAACATCGTCACCC 24.16
TGGACATTGCCATGATGCTAAAACGGCAGAAGATAAATGTGCTGTTACAGTTCCAGAGCTAGATGAGGATTATCTCAACCTTAATCCCTTTTAACTCTGAACTTAGAGGTGTGCATGATAAGCACCAAATCATTGAGGAGCCGTGTGAAATTAACGTTTCACGCACGGTTTTGTAGCCGAGTCAG 10.01
CCTAACTTGCCATATTCATGACTTGCCAGTAATCTCGTAAGCAAAATGCTTGGTTAAAATTCAATGATTATCGCATTGGCAAATCAGAAAGGAGGGGTAGCAAAAACTACTTCTACTATCTCTCTAGGAGGACTGCTGGCTCTTAAAGATACTGTCCTGGCTGTTGACCTTGACCCTCAAGGTAATCTGACTACAGGGCTGGGGGTGGAAGTGGCTGATGACCAAATTAGCTGCTA 35.7
CAACTAGTCAACAATCTGACAAGTTAAACAGTAAGGAAGCTAACAAGTCAGCAAGTCAGCAAGACATATCACCAACCAATCAACAATCTGACAAGTCAACCAGTCAGCAAGATGTAGCCTTAACTACCCAGGAAGCTGACAAGTCAGCAAGTCAACAAGACATATCACCAACCAATCAACAATCTGACAAGTCAACCAGTCAGCAAGATGTAGCCTTAACTACCCAGGAAGCTGACAAGTCAGCAAGTCAACAAGACATATCACCAACCAGTCAACAATCTGACAAGTCAACCAGTCAGCAAGTTAATATAGAAAAAGTTTCATTGCGT 706.13
TTGGAAGACTCTGATGCTGCCGTTACACCACGATCGCATTATTCAAGCTGGTGACAGGAGTTGAACCCGCAACCGATCGCTTACAAGGCGATTGCTCTGCCGTTGAGCCACACCAGCATTTGGTAGCGG 6.26
GACGGGAGTTGCACCCGCTCCTCCGTGCTTGAGAGGCACAGCGACTTATCTTATTTGTCCACAAGGCTACACGGGGATGATG 4.29
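Each reported site can be checked against the peak sequences listed above by exact string search on both strands. The sketch below uses a hypothetical helper, `locate_site` (not part of CollecTF), and assumes plain DNA strings; the example peak is synthetic.

```python
from typing import List, Tuple

# Maps each base to its complement; N (unknown) pairs with N.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def locate_site(peak: str, site: str) -> List[Tuple[int, str]]:
    """Return every (0-based start, strand) at which site occurs in peak.

    '+' means the site reads as given on the listed strand; '-' means its
    reverse complement does. Palindromic sites are reported on both strands.
    """
    peak, site = peak.upper(), site.upper()
    hits = []
    for strand, query in (("+", site), ("-", revcomp(site))):
        start = peak.find(query)
        while start != -1:
            hits.append((start, strand))
            start = peak.find(query, start + 1)  # allow overlapping matches
    return sorted(hits)


if __name__ == "__main__":
    site = "GTAGTTAAAGGCAC"
    toy_peak = "AAAA" + site + "TTTT" + revcomp(site) + "CC"  # synthetic
    print(locate_site(toy_peak, site))
```

Exact matching is sufficient here because each listed site is a substring of its peak sequence; degenerate or mismatched sites would need a scoring approach instead.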

Gene Regulation

Regulated genes for each binding site are displayed below. Gene regulation diagrams show binding sites and genes that are positively regulated, negatively regulated, both positively and negatively regulated, or regulated in an unspecified manner. For each individual site, the experimental techniques used to identify it are also given.

Site sequence	Experimental techniques	TF function	TF type
GTAGTTAAAGGCAC	ChIP-Seq (ECO:0006009); Motif-discovery (ECO:0005558); PSSM site search (ECO:0005659)	not specified	monomer
GTGGCCAGTGTTAC	ChIP-Seq (ECO:0006009); Motif-discovery (ECO:0005558); PSSM site search (ECO:0005659)	not specified	monomer
GTAGCCTTAACTAC	ChIP-Seq (ECO:0006009); Motif-discovery (ECO:0005558); PSSM site search (ECO:0005659)	not specified	monomer

Experimental technique details

ChIP-Seq (ECO:0006009): ChIP-Seq is identical to ChIP-chip up to the final step. In ChIP-Seq, immunoprecipitated DNA fragments are prepared for sequencing and fed into a massively parallel sequencer that produces short reads. Although the sonication step is the same as in ChIP-chip, ChIP-Seq generates multiple short reads within any given 500-bp region, thereby pinning down the location of a TF-binding site to within 50-100 bp. A similar result can be obtained with ChIP-chip using high-density tiling arrays. The downside of ChIP-Seq is that sensitivity scales with cost, since it increases with the number of (expensive) parallel sequencing runs. To control for biases, ChIP-Seq experiments often use the "input" as a control: DNA processed through the same pipeline as the ChIP-Seq sample but without the immunoprecipitation step. It should therefore have the same accessibility and sequencing biases as the experimental data.

Motif-discovery (ECO:0005558): In motif discovery, we are given a set of sequences that we suspect harbor binding sites for a given transcription factor. A typical scenario is data from expression experiments, in which we wish to analyze the promoter regions of a set of genes that are up- or down-regulated under some condition. The goal of motif discovery is to detect the transcription factor binding motif (i.e. the sequence "pattern" bound by the TF), assuming that it will be overrepresented in the sample of sequences. There are different strategies to accomplish this; standard approaches include expectation maximization (EM), Gibbs sampling and greedy search. Popular motif discovery algorithms include MEME, the Gibbs Motif Sampler and CONSENSUS. More recently, motif discovery algorithms that use phylogenetic footprinting (the idea that TF-binding sites will be conserved in the promoter sequences of the same gene in different species) have become available. These are not usually applied to complement experimental work, but can provide a starting point for it. Popular algorithms include FootPrinter and PhyloGibbs.

PSSM site search (ECO:0005659): Once the binding motif for a TF is known, this motif (which essentially defines a pattern) can be used to scan sequences for putative TF-binding sites. This is useful, for instance, when trying to identify TF-binding sites in ChIP-chip data. Searching for TF-binding sites can be done in numerous ways. The most basic method is consensus search, in which sequences are scored according to how many mismatches they have with the consensus sequence for the motif. A more elaborate approach uses regular expressions, which allow searching for more loosely defined motifs (e.g. C(C/G)AT). Common tools for this type of search include Pattern Locator and the DNA Pattern Find program of the SMS2 suite, and even some word processors. Finally, the mainstream way of searching for TF-binding sites is through position-specific scoring matrices (PSSMs), which count the occurrences of each base at each position of the motif and use the inferred frequencies to score candidate sites. Algorithms in this last category include TFSEARCH, FITOM, CONSITE, TESS and MatInspector.
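The PSSM approach described above can be sketched with the three sites reported in this record. This is a minimal illustration, not the procedure used by the curators or the original authors. It assumes a uniform background of 0.25 per base, a pseudocount of 0.25 per base and log2-odds scoring, and the names `build_pssm`, `score` and `scan` are hypothetical. With only three training sites the matrix is very crude.

```python
import math
from typing import Dict, List, Tuple

# The three NtcA sites reported in this record (aligned, 14 bp each).
SITES = ["GTAGTTAAAGGCAC", "GTGGCCAGTGTTAC", "GTAGCCTTAACTAC"]
BASES = "ACGT"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")

Column = Dict[str, float]


def build_pssm(sites: List[str], pseudocount: float = 0.25,
               background: float = 0.25) -> List[Column]:
    """Build a log2-odds PSSM: one {base: score} column per motif position."""
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("sites must be aligned and of equal length")
    total = len(sites) + 4 * pseudocount
    pssm = []
    for i in range(width):
        column = [s[i] for s in sites]
        pssm.append({
            # Smoothed base frequency relative to a uniform background.
            b: math.log2((column.count(b) + pseudocount) / total / background)
            for b in BASES
        })
    return pssm


def score(pssm: List[Column], window: str) -> float:
    """Sum the per-position log-odds scores of one window."""
    return sum(col[b] for col, b in zip(pssm, window))


def revcomp(seq: str) -> str:
    """Reverse complement of an ACGT-only string."""
    return seq.translate(_COMPLEMENT)[::-1]


def scan(pssm: List[Column], seq: str,
         threshold: float) -> List[Tuple[int, str, float]]:
    """Score every window on both strands and keep those >= threshold.

    Returns (start, strand, score); start is 0-based on the given strand.
    Windows containing non-ACGT characters (e.g. N) are skipped.
    """
    seq = seq.upper()
    width = len(pssm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if set(window) - set(BASES):
            continue
        for strand, w in (("+", window), ("-", revcomp(window))):
            s = score(pssm, w)
            if s >= threshold:
                hits.append((start, strand, s))
    return hits


if __name__ == "__main__":
    pssm = build_pssm(SITES)
    for site in SITES:
        print(site, round(score(pssm, site), 2))
```

The threshold passed to `scan` is arbitrary in this sketch; in practice it would be calibrated, for example against shuffled or background sequence.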