Commit ecf5c25b (parent 7244b64b) authored by Laura Cook: "updated"

### 1. FastQC on raw reads
```
fastqc
```
### 2. Alignment
```
bowtie2-build
```
### 3. Filtering
```
samtools sort
samtools view
picard MarkDuplicates
```
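The `samtools view` step usually decides which reads to drop with a SAM FLAG bitmask passed to `-F`, which excludes reads that have any of those bits set. The sketch below is a small hypothetical helper (`decode_sam_flag` is not part of SAMtools) that lists the bits set in a mask, using the mnemonic names printed by `samtools flags`. The value 1804 is only an assumed example of a mask often used in ENCODE-style ChIP-seq filtering. It is not taken from this pipeline.

```{bash eval=FALSE}
# Hypothetical helper (not part of SAMtools): print the name of every
# SAM FLAG bit set in a bitmask, one per line, lowest bit first.
decode_sam_flag() {
  flag=$1
  bit=1
  for name in PAIRED PROPER_PAIR UNMAP MUNMAP REVERSE MREVERSE \
              READ1 READ2 SECONDARY QCFAIL DUP SUPPLEMENTARY; do
    # Keep the name if the current bit (1, 2, 4, ...) is set in the mask.
    if [ $(( flag & bit )) -ne 0 ]; then
      echo "$name"
    fi
    bit=$(( bit * 2 ))
  done
}

# Assumed example mask: reads carrying any of these bits would be
# removed by `samtools view -F 1804`.
decode_sam_flag 1804
```

For 1804 this prints UNMAP, MUNMAP, SECONDARY, QCFAIL and DUP. These correspond to unmapped reads, reads whose mate is unmapped, secondary alignments, QC failures and duplicates. The duplicate bit is only set once duplicates have been marked, for example by `picard MarkDuplicates`.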
### 4. Alignment QC & Library Complexity
```
samtools
picard
preseq
```
### 5. deepTools
```
deepTools
```
### 7. phantomPeakQuals
```
spp
```
### 8. Call narrow peaks (MACS2)
```
macs2 callpeak
```
__Effective Genome Length__

We can approximate the effective genome size for a given read length using the `unique-kmers.py` script from the khmer package. The script estimates the number of unique k-mers of a specified length. From this count we can infer the size of the uniquely mappable genome, which excludes highly repetitive regions. https://khmer.readthedocs.io/en/v2.1.1/user/scripts.html

This approach is suggested in the deepTools documentation: https://deeptools.readthedocs.io/en/latest/content/feature/effectiveGenomeSize.html

Install the khmer program:
```{bash eval=FALSE}
pip3 install khmer
```
Run `unique-kmers.py` on the dunnart genome for a read length of 150 bp:
```{bash eval=FALSE}
/usr/local/bin/unique-kmers.py -k 150 dunnart_pseudochr_vs_mSarHar1.11_v1.fa
```
<details><summary>__Total estimated number of unique 150-mers: 3074798085__</summary>
<p>

| Statistic | Value |
|-----------|-------|
| number of unique k-mers (k = 150) | 3074798085 |
| false positive rate | 0.010 |

If you have expected false positive rate to achieve:

| expected_fp | number_hashtable(Z) | size_hashtable(H) | expected_memory_usage |
|-------------|---------------------|-------------------|-----------------------|
| 0.100 | 3 | 4.928212e+09 | 1.478464e+10 |
| 0.200 | 2 | 5.187050e+09 | 1.037410e+10 |
| 0.300 | 1 | 8.620729e+09 | 8.620729e+09 |
| 0.400 | 1 | 6.019271e+09 | 6.019271e+09 |
| 0.500 | 1 | 4.435996e+09 | 4.435996e+09 |
| 0.600 | 1 | 3.355701e+09 | 3.355701e+09 |
| 0.700 | 1 | 2.553877e+09 | 2.553877e+09 |
| 0.800 | 1 | 1.910479e+09 | 1.910479e+09 |
| 0.900 | 1 | 1.335368e+09 | 1.335368e+09 |

If you have expected memory to use:

| expected_memory_usage | number_hashtable(Z) | size_hashtable(H) | expected_fp |
|-----------------------|---------------------|-------------------|-------------|
| 1.000000e+09 | 1 | 1.000000e+09 | 0.954 |
| 5.000000e+09 | 1 | 5.000000e+09 | 0.459 |
| 1.000000e+10 | 2 | 5.000000e+09 | 0.211 |
| 2.000000e+10 | 4 | 5.000000e+09 | 0.045 |
| 5.000000e+10 | 11 | 4.545455e+09 | 0.000 |
| 1.000000e+11 | 22 | 4.545455e+09 | 0.000 |
| 2.000000e+11 | 45 | 4.444444e+09 | 0.000 |
| 3.000000e+11 | 67 | 4.477612e+09 | 0.000 |
| 4.000000e+11 | 90 | 4.444444e+09 | 0.000 |
| 5.000000e+11 | 112 | 4.464286e+09 | 0.000 |
| 1.000000e+12 | 225 | 4.444444e+09 | 0.000 |
| 2.000000e+12 | 450 | 4.444444e+09 | 0.000 |
| 5.000000e+12 | 1127 | 4.436557e+09 | 0.000 |

</p>
</details>

I will use this number as my estimate of the effective genome size.
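A toy example makes the reasoning behind this estimate clearer. When k-mers are counted, repeated copies of the same k-mer are counted only once, so repetitive sequence contributes less than its full length to the total. The sketch below counts distinct k-mers exactly in a short, made-up sequence. `count_unique_kmers` is a hypothetical helper that reads the forward strand only and ignores reverse complements. Exact counting like this would not scale to a genome of roughly 3 Gb, which is why `unique-kmers.py` reports an estimate instead.

```{bash eval=FALSE}
# Hypothetical helper: count the distinct k-mers (each counted once, however
# often it occurs) in a single-line DNA sequence, forward strand only.
count_unique_kmers() {
  seq=$1
  k=$2
  printf '%s\n' "$seq" | awk -v k="$k" '{
    n = length($0)
    # Slide a window of width k along the sequence and record each k-mer.
    for (i = 1; i <= n - k + 1; i++) seen[substr($0, i, k)] = 1
  }
  END {
    # Report how many different k-mers were recorded.
    total = 0
    for (kmer in seen) total++
    print total
  }'
}

# Toy example: the repeated "ACGT" unit yields only 4 distinct 3-mers,
# although the 12 bp sequence contains 10 3-mers in total.
count_unique_kmers ACGTACGTACGT 3
```

MACS2's `-g`/`--gsize` option accepts a plain number as well as built-in shortcuts such as `hs`. The estimate above could therefore be passed to the peak caller as `-g 3074798085`.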
### 9. Create consensus peaksets
### 10. Annotate peaks relative to gene features (HOMER)
### 11. Present QC for raw read, alignment, peak-calling in MultiQC
### 12. Plot DAG