diff --git a/whole-genome-alignment/README.md b/whole-genome-alignment/README.md
index 459cccd43e68fc4bf37f5a02eca84bdc7a80e2d2..b10d627be8d6c9cd9a43f5eba77f16292fd59e0f 100755
--- a/whole-genome-alignment/README.md
+++ b/whole-genome-alignment/README.md
@@ -444,62 +444,6 @@ write.table(rep_summits, "H3K4me3_overlap_summits.bed", quote=FALSE, col.names=F
 bedtools slop -i H3K4me3_overlap_summits.bed -b 50 -g smiCra1.chrom.sizes > H3K4me3_overlap_100bpsummits.bed
 bedtools slop -i H3K27ac_overlap_summits.bed -b 50 -g smiCra1.chrom.sizes > H3K27ac_overlap_100bpsummits.bed
 ```
-
-# Whole genome alignment with new chromosome level assembly
-## Preparation
-
-Spartan modules
-
-```
-module load foss
-module load lastz
-module load ucsc/21072020
-module load perl
-conda activate wga
-
-```
-
-### Repeat mask dunnart genome
-
-
-Run RepeatModeler to de novo find repeat regions in the dunnart genome:
-```
-BuildDatabase -name dunnart -engine ncbi Sminthopsis_crassicaudata_HiC.fasta
-
-nohup RepeatModeler -database dunnart -pa 20 >& repeatmodeler.out
-
-```
-
-Run RepeatMasker to mask repeats in dunnart genome (makes repeats lowercase). Run as an array for scaffolds to make it quicker.
-Create commands for array slurm script: `repeatMaskerHiC.sh`
-
-Using faSplit from the UCSC Kent Tools to split into scaffolds
-
-```
-faSplit byName Sminthopsis_crassicaudata_HiC.fasta faSplit/
-```
-
-```
-RepeatMasker -q -xsmall smiCra1_HiC/*.fa -default_search_engine hmmer -trf_prgm /home/lecook/.conda/envs/wga/bin/trf -hmmer_dir /home/lecook/.conda/envs/wga/bin/
-```
-
-### Create .2bit and .sizes files
-
-```
-faToTwoBit Sminthopsis_crassicaudata_HiC.fasta smiCra1_HiC.2bit
-```
-
-```
-twoBitInfo smiCra1.2bit stdout | sort -k2rn > smiCra1.chrom.sizes
-```
-
-## Genome alignment with LASTZ
-Create commands for running lastZ for all scaffolds: `lastz.sh`
-
-Repeated with vertebrate alignment parameters and HoxD55 scoring matrix. Retrieve A LOT more aligned sequences. For example scaffold00002 with mammal parameters retrieved 863M of data, while with the new parameters it's 2.5GB.
-
-Run as an array on slurm: `array_wrapper.slurm`
-
 ### References
 [1] Sharma V, Hiller M. Increased alignment sensitivity improves the usage of genome alignments for comparative gene annotation. Nucleic Acids Res., 45(14), 8369–8377, 2017