Commit e897205f authored by Laura Cook

minor formatting changes

parent 4fec3164
@@ -162,10 +162,8 @@ chipseq/
├── genomes/
├── results/
│   ├── bowtie2/
│   ├── fastQC/
│   ├── deepTools/
│   ├── qc/
│   ├── macs2/
│   └── phantomPeaks/
├── envs/
└── configs/
```
@@ -253,9 +251,6 @@ __Parameters__
- `-2`: pair 2
__Effective Genome Length__
We can approximate the effective genome size for a given read length using the khmer script `unique-kmers.py`. It estimates the number of unique k-mers of a specified length, which can be used to infer the total uniquely mappable genome (i.e. excluding highly repetitive regions). https://khmer.readthedocs.io/en/v2.1.1/user/scripts.html
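To make "number of unique k-mers" concrete, here is a minimal sketch that counts distinct k-mers exactly with awk on a small FASTA file. This is not how `unique-kmers.py` works: khmer uses a probabilistic HyperLogLog estimate so that a whole genome fits in memory, whereas an exact set like this would not scale to a ~3 Gb assembly. The function name `count_unique_kmers` is hypothetical, and it assumes forward-strand k-mers only (reverse complements are not merged).

```bash
# count_unique_kmers FASTA K   (hypothetical helper, small inputs only)
# Exact number of distinct K-mers across all records of a FASTA file.
# Forward strand only; k-mers spanning two records are not counted, and
# k-mers containing N are counted like any other.
count_unique_kmers() {
  local fasta=$1 k=$2
  awk -v k="$k" '
    # add every k-mer of one sequence to the global set "seen"
    function add_kmers(s,    i) {
      s = toupper(s)
      for (i = 1; i <= length(s) - k + 1; i++) seen[substr(s, i, k)] = 1
    }
    /^>/ { add_kmers(seq); seq = ""; next }  # header: flush the previous record
    { seq = seq $0 }                         # join wrapped sequence lines
    END {
      add_kmers(seq)
      n = 0
      for (kmer in seen) n++
      print n
    }
  ' "$fasta"
}
```

For example, `count_unique_kmers test_contigs.fa 150` (hypothetical file) prints a single integer, the exact counterpart of the estimate `unique-kmers.py` reports.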
@@ -277,48 +272,24 @@ Run `unique-kmers.py` on dunnart genome for read length of 150bp:
Estimated number of unique 150-mers in /Users/lauracook/../../Volumes/macOS/genomes/Scras_dunnart_assem1.0_pb-ont-illsr_flyeassem_red-rd-scfitr2_pil2xwgs2_60chr.fasta: 2740338543
Total estimated number of unique 150-mers: 2740338543
```
<details><summary>_Total estimated number of unique 150-mers: 3074798085_</summary>
<p>

| Statistic | Value |
|---|---|
| number of unique k-mers (k = 150) | 3074798085 |
| false positive rate | 0.010 |

If you have expected false positive rate to achieve:

| expected_fp | number_hashtable(Z) | size_hashtable(H) | expected_memory_usage |
|---|---|---|---|
| 0.100 | 3 | 4.928212e+09 | 1.478464e+10 |
| 0.200 | 2 | 5.187050e+09 | 1.037410e+10 |
| 0.300 | 1 | 8.620729e+09 | 8.620729e+09 |
| 0.400 | 1 | 6.019271e+09 | 6.019271e+09 |
| 0.500 | 1 | 4.435996e+09 | 4.435996e+09 |
| 0.600 | 1 | 3.355701e+09 | 3.355701e+09 |
| 0.700 | 1 | 2.553877e+09 | 2.553877e+09 |
| 0.800 | 1 | 1.910479e+09 | 1.910479e+09 |
| 0.900 | 1 | 1.335368e+09 | 1.335368e+09 |

If you have expected memory to use:

| expected_memory_usage | number_hashtable(Z) | size_hashtable(H) | expected_fp |
|---|---|---|---|
| 1.000000e+09 | 1 | 1.000000e+09 | 0.954 |
| 5.000000e+09 | 1 | 5.000000e+09 | 0.459 |
| 1.000000e+10 | 2 | 5.000000e+09 | 0.211 |
| 2.000000e+10 | 4 | 5.000000e+09 | 0.045 |
| 5.000000e+10 | 11 | 4.545455e+09 | 0.000 |
| 1.000000e+11 | 22 | 4.545455e+09 | 0.000 |
| 2.000000e+11 | 45 | 4.444444e+09 | 0.000 |
| 3.000000e+11 | 67 | 4.477612e+09 | 0.000 |
| 4.000000e+11 | 90 | 4.444444e+09 | 0.000 |
| 5.000000e+11 | 112 | 4.464286e+09 | 0.000 |
| 1.000000e+12 | 225 | 4.444444e+09 | 0.000 |
| 2.000000e+12 | 450 | 4.444444e+09 | 0.000 |
| 5.000000e+12 | 1127 | 4.436557e+09 | 0.000 |

</p>
</details>

__Indexing genome file__
Build Index
__Load modules:__
```{bash eval=FALSE}
module load gcc/8.3.0
module load bowtie2/2.3.5.1
```
__Build index__
```{bash eval=FALSE}
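# bowtie2-build takes <reference.fasta> <index_basename>; the basename below is an assumed choice
bowtie2-build /data/projects/punim0586/lecook/chip/reference_data/bowtie2/dunnart_pseudochr_vs_mSarHar1.11_v1.fasta \
    /data/projects/punim0586/lecook/chip/reference_data/bowtie2/dunnart_pseudochr_vs_mSarHar1.11_v1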
```
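Before the alignment rules run, it can be worth checking that the index is complete. A minimal sketch, assuming an index basename like the one above: `bowtie2-build` writes six files (`.1`, `.2`, `.3`, `.4`, `.rev.1`, `.rev.2`), using the `.bt2l` extension instead of `.bt2` when a large index is built. `check_bt2_index` is a hypothetical helper, not part of bowtie2.

```bash
# check_bt2_index BASENAME   (hypothetical helper)
# Returns 0 if all six bowtie2 index files exist and are non-empty for
# BASENAME, in either the small (.bt2) or large (.bt2l) variant.
check_bt2_index() {
  local base=$1 ext suffix missing
  for ext in bt2 bt2l; do
    missing=0
    for suffix in 1 2 3 4 rev.1 rev.2; do
      [ -s "${base}.${suffix}.${ext}" ] || missing=1
    done
    if [ "$missing" -eq 0 ]; then
      echo "found .${ext} index for ${base}"
      return 0
    fi
  done
  echo "incomplete or missing bowtie2 index for ${base}" >&2
  return 1
}
```

Calling it with the same basename passed to `bowtie2-build` before launching Snakemake catches a half-written index early.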
# 3. FILTERING
@@ -359,6 +330,8 @@ ChIP-seq Standards:
### rule deeptools_coverage:
Normalises coverage to reads per genomic content (RPGC, i.e. 1x coverage) and produces a coverage file.
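RPGC scaling follows the deepTools definition: the scaling factor is effective genome size / (number of mapped reads * fragment length), which is where the effective genome size estimated earlier comes in. In deepTools 3.x, `bamCoverage` applies this with `--normalizeUsing RPGC --effectiveGenomeSize <size>`. The sketch below only reproduces the arithmetic; `rpgc_scale_factor` is a hypothetical helper, and deepTools itself may adjust the read count (e.g. after filtering) before scaling.

```bash
# rpgc_scale_factor EFFECTIVE_GENOME_SIZE MAPPED_READS FRAGMENT_LENGTH
# 1x (RPGC) scaling factor = effective genome size / (mapped reads * fragment length)
rpgc_scale_factor() {
  awk -v g="$1" -v n="$2" -v l="$3" 'BEGIN {
    if (n <= 0 || l <= 0) {
      print "mapped reads and fragment length must be > 0" > "/dev/stderr"
      exit 1
    }
    printf "%.6f\n", g / (n * l)
  }'
}
```

With the 150-mer estimate above, 20 million mapped reads and a 200 bp fragment length (both made-up numbers), `rpgc_scale_factor 2740338543 20000000 200` prints `0.685085`.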
### rule deeptools_fingerprint:
@@ -382,6 +355,8 @@ Cross-correlation analysis is done on a filtered (but not-deduped) and subsample
### rule phantomPeakQuals:
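phantompeakqualtools (`run_spp.R`) already reports two cross-correlation quality metrics, NSC and RSC. As ENCODE defines them, NSC = CC(fragment length) / min(CC) and RSC = (CC(fragment length) - min(CC)) / (CC(read length) - min(CC)), where CC(read length) is the "phantom" peak. The awk sketch below recomputes both from three correlation values, e.g. to sanity-check the tool's output; `cc_quality` is a hypothetical helper name.

```bash
# cc_quality CC_FRAGMENT CC_READLENGTH CC_MIN   (hypothetical helper)
# NSC = CC(fragment) / min(CC)
# RSC = (CC(fragment) - min(CC)) / (CC(read length) - min(CC))
cc_quality() {
  awk -v cf="$1" -v cr="$2" -v cm="$3" 'BEGIN {
    if (cm == 0 || cr == cm) {
      print "min(CC) must be non-zero and differ from CC(read length)" > "/dev/stderr"
      exit 1
    }
    printf "NSC=%.3f RSC=%.3f\n", cf / cm, (cf - cm) / (cr - cm)
  }'
}
```

For example, `cc_quality 0.3 0.2 0.1` prints `NSC=3.000 RSC=2.000`.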
# 7. Call peaks (MACS2)
@@ -428,7 +403,13 @@ therefore if amount that overlaps between each replicate divided by the length o
### rule overlap_peaks_H3K27ac:
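The overlap criterion for replicates compares the amount of overlap with the length of a peak. As a minimal sketch, assuming plain BED files for two replicates, the fraction of each replicate 1 peak covered by an overlapping replicate 2 peak can be computed with awk. The function and file names are hypothetical, the threshold the rule actually applies is not reproduced here, and for real peak sets `bedtools intersect` with `-f`/`-F` is far more efficient than this all-pairs loop.

```bash
# peak_overlap_fraction REP1_BED REP2_BED   (hypothetical helper)
# For each rep1 peak overlapping a rep2 peak on the same chromosome, print:
# chrom  start  end  overlap_bp / rep1_peak_length   (only BED columns 1-3 are used)
peak_overlap_fraction() {
  awk '
    # first file read (replicate 2): store its intervals
    FILENAME == ARGV[1] { chr[++n] = $1; st[n] = $2 + 0; en[n] = $3 + 0; next }
    # second file (replicate 1): compare each peak with every stored interval
    {
      a = $2 + 0; b = $3 + 0
      for (i = 1; i <= n; i++) {
        if (chr[i] != $1) continue
        lo = (a > st[i]) ? a : st[i]
        hi = (b < en[i]) ? b : en[i]
        if (hi > lo) printf "%s\t%d\t%d\t%.3f\n", $1, a, b, (hi - lo) / (b - a)
      }
    }
  ' "$2" "$1"
}
```

A rep1 peak at `chr1:100-200` and a rep2 peak at `chr1:150-300` share 50 bp, so the reported fraction is `0.500`.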
ENCODE files:
| File format | Information contained in file | File description | Notes |
|-|-|-|-|
| bigWig | fold change over control, signal p-value | Two versions of nucleotide resolution signal coverage tracks. | The signal is expressed in two ways: as fold-over control at each position, and as a p-value to reject the null hypothesis that the signal at that location is present in the control. |
| bed and bigBed (narrowPeak) | peaks | Relaxed peak calls for each replicate individually and for both replicates' reads pooled together. | These peaks are thresholded to sample enough noise in the experiment for efficient statistical comparison of replicates in subsequent steps; as such, many false positives are expected to be present. They are not meant to be interpreted as definitive binding events, but are rather intended to be used as input for subsequent statistical comparison of replicates. |
| bed and bigBed (narrowPeak) | replicated peaks | The set of peak calls from the pooled replicates. | These peaks are either observed in both replicates, or are observed in two pseudoreplicates. Pseudoreplicates are peak sets called on half of the pooled reads, chosen at random without replacement. |
# Plot DAG
@@ -436,3 +417,8 @@ therefore if amount that overlaps between each replicate divided by the length o
```
snakemake --dag | dot -Tsvg > dag.svg
```
# Annotate peaks
Create a TxDb for use with Bioconductor packages