Commit 744df195 authored by lecook

clean up

parent 37fcbe2a
......@@ -444,62 +444,6 @@ write.table(rep_summits, "H3K4me3_overlap_summits.bed", quote=FALSE, col.names=F
bedtools slop -i H3K4me3_overlap_summits.bed -b 50 -g smiCra1.chrom.sizes > H3K4me3_overlap_100bpsummits.bed
bedtools slop -i H3K27ac_overlap_summits.bed -b 50 -g smiCra1.chrom.sizes > H3K27ac_overlap_100bpsummits.bed
```
# Whole genome alignment with new chromosome level assembly
## Preparation
Spartan modules
```
module load foss
module load lastz
module load ucsc/21072020
module load perl
conda activate wga
```
### Repeat mask dunnart genome
Run RepeatModeler to identify repeat regions de novo in the dunnart genome:
```
BuildDatabase -name dunnart -engine ncbi Sminthopsis_crassicaudata_HiC.fasta
nohup RepeatModeler -database dunnart -pa 20 >& repeatmodeler.out
```
Run RepeatMasker to soft-mask repeats in the dunnart genome (repeats are converted to lowercase). Run it as a Slurm array over scaffolds to speed it up.
Create the commands for the Slurm array script: `repeatMaskerHiC.sh`
Use faSplit from the UCSC Kent tools to split the genome into scaffolds:
```
faSplit byName Sminthopsis_crassicaudata_HiC.fasta faSplit/
```
```
RepeatMasker -q -xsmall smiCra1_HiC/*.fa -default_search_engine hmmer -trf_prgm /home/lecook/.conda/envs/wga/bin/trf -hmmer_dir /home/lecook/.conda/envs/wga/bin/
```
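The per-scaffold command list for the array job could be generated along these lines. This is a minimal sketch, not the contents of `repeatMaskerHiC.sh`: the function name `make_commands` and the output file name are hypothetical, and only the `-q -xsmall` flags from the command above are reused.

```shell
# Sketch only: write one RepeatMasker command per scaffold FASTA file.
# make_commands and the file names passed to it are hypothetical.
make_commands() {
    scaffold_dir="$1"      # directory holding one .fa file per scaffold
    commands_file="$2"     # text file that receives one command per line

    : > "$commands_file"   # start from an empty file
    for fa in "$scaffold_dir"/*.fa; do
        [ -e "$fa" ] || continue   # nothing matched the glob
        printf 'RepeatMasker -q -xsmall %s\n' "$fa" >> "$commands_file"
    done
}

# Example (assumed paths):
# make_commands smiCra1_HiC repeatmasker_commands.txt
```

Each line of the resulting file is one independent task, so the number of lines gives the size of the array.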
### Create .2bit and .sizes files
```
faToTwoBit Sminthopsis_crassicaudata_HiC.fasta smiCra1_HiC.2bit
```
```
twoBitInfo smiCra1.2bit stdout | sort -k2rn > smiCra1.chrom.sizes
```
## Genome alignment with LASTZ
Create the commands for running LASTZ on all scaffolds: `lastz.sh`
The alignment was repeated with the vertebrate alignment parameters and the HoxD55 scoring matrix [1], which retrieves far more aligned sequence. For example, scaffold00002 produced 863 MB of output with the mammal parameters and 2.5 GB with the new parameters.
Run as an array on Slurm: `array_wrapper.slurm`
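
A rough sketch of how a command list and an array task could fit together is shown here. It is not the contents of `lastz.sh` or `array_wrapper.slurm`: the function names, the choice of target and query, and the `lastz_opts` placeholder are all assumptions. Fill `lastz_opts` with the scoring parameters actually used.

```shell
# Sketch only: build a LASTZ command list, then pick one line per array task.
# make_lastz_commands, pick_task and all file names are hypothetical.
make_lastz_commands() {
    target="$1"        # target genome (assumed), e.g. a .2bit or FASTA file
    query_dir="$2"     # directory with one query scaffold per .fa file
    lastz_opts="$3"    # placeholder for the scoring/sensitivity options
    out="$4"           # text file that receives one command per line

    : > "$out"
    for fa in "$query_dir"/*.fa; do
        [ -e "$fa" ] || continue
        name=$(basename "$fa" .fa)
        printf 'lastz %s %s %s --format=axt --output=%s.axt\n' \
            "$target" "$fa" "$lastz_opts" "$name" >> "$out"
    done
}

# Print the Nth command; inside a Slurm array job N would be
# $SLURM_ARRAY_TASK_ID, and the printed line would be passed to the shell.
pick_task() {
    sed -n "${1}p" "$2"
}
```

Keeping the commands in a plain text file makes it easy to rerun only the failed task numbers.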
### References
[1] Sharma V, Hiller M. Increased alignment sensitivity improves the usage of genome alignments for comparative gene annotation. Nucleic Acids Res., 45(14), 8369–8377, 2017