diff --git a/dunnart/README.md b/dunnart/README.md
index 7d105097af1069f0e6d178d741804b59df8c6231..bbfdd6c93ec2dae0bc73d7e37482622c498473c0 100644
--- a/dunnart/README.md
+++ b/dunnart/README.md
@@ -371,6 +371,14 @@ ChIP-seq Standards:
 
 # 6. phantomPeakQuals
 
+Information from: https://docs.google.com/document/d/1lG_Rd7fnYgRpSIqrIfuVlAz2dW1VaSQThzk836Db99c/edit
+
+This set of programs operates on mapped Illumina single-end read datasets in tagAlign or BAM format. Because my data is paired-end, I need to use only the forward read.
+
+A high-quality ChIP-seq experiment will produce significant clustering of enriched DNA sequence tags/reads at locations bound by the protein of interest; the expectation is that we can observe a bimodal enrichment of reads (sequence tags) on both the forward and the reverse strands.
+
+Cross-correlation analysis is done on a filtered (but not deduplicated) and subsampled BAM. The FASTQ is trimmed specially for this analysis: the Read1 FASTQ is first trimmed to 50 bp using trimfastq.py (last modified 2017/11/08, https://github.com/ENCODE-DCC/chip-seq-pipeline2/blob/master/src/trimfastq.py) and is then mapped separately as single-end (SE). Reads are filtered, but duplicates are not removed. Finally, 15 million reads are randomly sampled and used for cross-correlation analysis.
+
 ### rule phantomPeakQuals: