Commit 2e6fbc31 authored by root

Update for 6/1/2021

parent 3ce0c5c1
ABAQUS/Door.odb
AdvLinux/NAMD
Beast/Dengue4.env.trees
BLAST/dbs
BLAST/rat-ests
Cufflinks/sample.bam
Delft3D/*
!Delft3D/Delft3D-test.slurm
!Delft3D/run_all_examples.sh
......@@ -14,12 +12,10 @@ FSL/intro
FSL/fmri
Gaussian/g16
Gaussian/tests
Genomics
HPCshells/NAMD
NAMD/apoa1
NAMD/NAMD_BENCHMARKS_SPARTAN
NAMD/stmv
Python/minitwitter.csv
Trimmomatic/.backup
*.fastq
*.fastq.gz
......@@ -29,6 +25,7 @@ Trimmomatic/.backup
*.tar.gz
*.sam
*.sam.gz
*.bam
*.simg
*.gz
*.fa
......@@ -41,3 +38,5 @@ Trimmomatic/.backup
*.JPG
*.PNG
*.JPEG
*.odb
*.csv
#!/bin/bash
#SBATCH --time=2:00:00
#SBATCH --ntasks=1
mkdir -p data/ref_genome
curl -L -o data/ref_genome/ecoli_rel606.fasta.gz ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/017/985/GCA_000017985.1_ASM1798v1/GCA_000017985.1_ASM1798v1_genomic.fna.gz
sleep 30
gunzip data/ref_genome/ecoli_rel606.fasta.gz
curl -L -o sub.tar.gz https://downloader.figshare.com/files/14418248
sleep 60
tar xvf sub.tar.gz
mv sub/ data/trimmed_fastq_small
mkdir -p results/sam results/bam results/bcf results/vcf
module load BWA/0.7.17-intel-2017.u2
bwa index data/ref_genome/ecoli_rel606.fasta
bwa mem data/ref_genome/ecoli_rel606.fasta data/trimmed_fastq_small/SRR2584866_1.trim.sub.fastq data/trimmed_fastq_small/SRR2584866_2.trim.sub.fastq > results/sam/SRR2584866.aligned.sam
samtools view -S -b results/sam/SRR2584866.aligned.sam > results/bam/SRR2584866.aligned.bam
samtools sort -o results/bam/SRR2584866.aligned.sorted.bam results/bam/SRR2584866.aligned.bam
#!/bin/bash
who -u >> who.txt
#!/bin/bash
#SBATCH --job-name="fastqc_sample"
#SBATCH --ntasks=1
# Set minimum acceptable walltime=hours:minutes:seconds
#SBATCH --time=0:15:00
#SBATCH --output=fastqc-%j.out
#SBATCH --error=fastqc-%j.err
# Load the environment variables for FastQC
module load fastqc/0.11.5
# The command to actually run the job
fastqc *.fastq*
# When creating filenames, any character can be used except `/`, because that is the directory delimiter.
# However, just because (almost) any character can be used doesn't mean any character should be used. Spaces in filenames, for example, are bad practice.
touch "This is a long file name"
for item in $(ls *); do echo ${item}; done
# Filenames with wildcards are not a particularly good idea either.
touch * # What are you thinking?!
rm * # Really?! You want to remove all files in your directory?
rm '*' # Safer, but shouldn't have been created in the first place.
# Best to keep to plain, old-fashioned alphanumerics. CamelCase is helpful.
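The `for item in $(ls *)` loop above splits "This is a long file name" into separate words. A minimal sketch of a loop that keeps such names intact is shown here; the function name `list_items` is hypothetical and only serves the illustration.

```shell
# Hypothetical helper: print each entry of a directory on its own line,
# keeping names that contain spaces intact.
list_items() {
    dir="${1:-.}"
    # Let the shell expand the glob and quote every use of the variable,
    # so a name with spaces stays a single word.
    for item in "$dir"/*; do
        # Skip the literal pattern when the directory is empty.
        [ -e "$item" ] || continue
        # Strip the leading directory part and print the bare name.
        printf '%s\n' "${item##*/}"
    done
}
```

Calling `list_items .` in the directory that holds the file created above should print its name on a single line rather than one word per line.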
DDWEIPDGQI TVGQRIGSGS FGTVYKGKWH GDVAVKMLNV TAPTPQQLQA
FKNEVGVLRK TRHVNILLFM GYSTKPQLAI VTQWCEGSSL YHHLHIIETK
FEMIKLIDIA RQTAQGMDYL HAKSIIHRDL KSNNIFLHED LTVKIGDFGL
ATVKSRWSGS HQFEQLSGSI LWMAPEVIRM QDKNPYSFQS DVYAFGIVLY
ELMTGQLPYS NINNRDQIIF MVGRGYLSPD LSKVRSNCPK AMKRLMAECL
KKKRDERPLF PQILASIELL ARSLPK
For those who are more familiar with PBS, there is a handy script that can do much of the conversion from PBS to Slurm.
https://github.com/bjpop/pbs2slurm
The following is a summary of common PBS commands and their Slurm equivalent, in part taken from `https://genomedk.fogbugz.com/?W6`
| User commands | PBS/Torque | Slurm |
|---|---|---|
| Job submission | `qsub <job script>` | `sbatch <job script>` |
| Job deletion | `qdel <job_id>` | `scancel <job_id>` |
| Job deletion | `qdel ALL` | `scancel -u <user>` |
| List jobs | `qstat [-u user]` | `squeue [-u user]` (`-l` for long format) |
| Job status | `qstat -f <job_id>` | `jobinfo <job_id>` |
| Job hold | `qhold <job_id>` | `scontrol hold <job_id>` |
| Job release | `qrls <job_id>` | `scontrol release <job_id>` |

| Environment | PBS/Torque | Slurm |
|---|---|---|
| Job ID | `$PBS_JOBID` | `$SLURM_JOBID` |
| Node list | `$PBS_NODEFILE` | `$SLURM_JOB_NODELIST` (new format) |
| Submit dir | `$PBS_O_WORKDIR` | `$SLURM_SUBMIT_DIR` |

| Job Specification | PBS/Torque | Slurm |
|---|---|---|
| Script directive | `#PBS` | `#SBATCH` |
| Queue | `-q <queue>` | `-p <partition>` |
| Node count | `-l nodes=<number>` | `-N <min[-max]>` |
| Cores (cpu) per node | `-l ppn=<number>` | `-c <number>` |
| Memory size | `-l mem=16384` | `--mem=16g` OR `--mem-per-cpu=2g` |
| Wall time | `-l walltime=<hh:mm:ss>` | `-t <days-hh:mm:ss>` |
| Standard output file | `-o <file_name>` | `-o <file_name>` |
| Standard error file | `-e <file_name>` | `-e <file_name>` |
| Output directory | `-o <directory>` | `-o "directory/slurm-%j.out"` |
| Event notification | `-m abe` | `--mail-type=[BEGIN, END, FAIL, REQUEUE, or ALL]` |
| Email address | `-M <address>` | `--mail-user=<address>` |
| Job name | `-N <name>` | `--job-name=<name>` |
| Job dependency | `-W depend=` | `--depend=C:<jobid>` |
| Account to charge | `-W group_list=<account>` | `--account=<account>` |
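
To make a few rows of the table above concrete, here is a rough sketch of how some of the simpler mappings could be applied mechanically with `sed`. The function name `pbs_to_slurm_sketch` is hypothetical, it handles only the handful of directives and variables shown, and it is not a substitute for the pbs2slurm script mentioned earlier.

```shell
# Hypothetical helper: read a PBS job script on standard input and rewrite
# a few directives and environment variables to the Slurm forms listed in
# the table. Anything not matched is passed through unchanged.
pbs_to_slurm_sketch() {
    sed -E \
        -e 's/^#PBS[[:space:]]+-q[[:space:]]+/#SBATCH -p /' \
        -e 's/^#PBS[[:space:]]+-N[[:space:]]+/#SBATCH --job-name=/' \
        -e 's/^#PBS[[:space:]]+-l[[:space:]]+walltime=/#SBATCH -t /' \
        -e 's/^#PBS[[:space:]]+-M[[:space:]]+/#SBATCH --mail-user=/' \
        -e 's/\$PBS_JOBID/$SLURM_JOBID/g' \
        -e 's/\$PBS_O_WORKDIR/$SLURM_SUBMIT_DIR/g'
}
```

A possible use is `pbs_to_slurm_sketch < job.pbs > job.slurm`, where both filenames are placeholders; the result should still be reviewed by hand, since resource requests such as nodes, cores, and memory do not map one-to-one.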
BioSample_s InsertSize_l LibraryLayout_s Library_Name_s LoadDate_s MBases_l MBytes_l ReleaseDate_s Run_s SRA_Sample_s Sample_Name_s Assay_Type_s AssemblyName_s BioProject_s Center_Name_s Consent_s Organism_s Platform_s SRA_Study_s g1k_analysis_group_s g1k_pop_code_s source_s strain_s
SAMN00205533 0 SINGLE CZB152 29-May-14 149 100 25-Mar-11 SRR097977 SRS167141 CZB152 WGS <not provided> PRJNA188723 MSU public Escherichia coli B str. REL606 ILLUMINA SRP004752 <not provided> <not provided> <not provided> REL606
SAMN00205558 0 SINGLE CZB154 29-May-14 627 444 25-Mar-11 SRR098026 SRS167166 CZB154 WGS <not provided> PRJNA188723 MSU public Escherichia coli B str. REL606 ILLUMINA SRP004752 <not provided> <not provided> <not provided> REL606
SAMN00205559 0 SINGLE CZB199 29-May-14 157 118 25-Mar-11 SRR098027 SRS167167 CZB199 WGS <not provided> PRJNA188723 MSU public Escherichia coli B str. REL606 ILLUMINA SRP004752 <not provided> <not provided> <not provided> REL606
SAMN00205560 0 SINGLE REL1166A 29-May-14 627 440 25-Mar-11 SRR098028 SRS167168 REL1166A WGS <not provided> PRJNA188723 MSU public Escherichia coli B str. REL606 ILLUMINA SRP004752 <not provided> <not provided> <not provided> REL606
SAMN00205561 0 SINGLE REL10979 29-May-14 140 94 25-Mar-11 SRR098029 SRS167169 REL10979 WGS <not provided> PRJNA188723 MSU public Escherichia coli B str. REL606 ILLUMINA SRP004752 <not provided> <not provided> <not provided> REL606
SAMN00205562 0 SINGLE REL10988 29-May-14 145 110 25-Mar-11 SRR098030 SRS167170 REL10988 WGS <not provided> PRJNA188723 MSU public Escherichia coli B str. REL606 ILLUMINA SRP004752 <not provided> <not provided> <not provided> REL606
SAMN00205563 0 SINGLE ZDB16 29-May-14 606 424 25-Mar-11 SRR098031 SRS167171 ZDB16 WGS <not provided> PRJNA188723 MSU public Escherichia coli B str. REL606 ILLUMINA SRP004752 <not provided> <not provided> <not provided> REL606
SAMN00205564 0 SINGLE <not provided> 29-May-14 311 257 25-Mar-11 SRR098032 SRS167172 ZDB30 WGS <not provided> PRJNA188723 MSU public Escherichia coli B str. REL606 ILLUMINA SRP004752 <not provided> <not provided> <not provided> REL606
SAMN00205564 2834 PAIRED ZDB30 29-May-14 1695 679 25-Mar-11 SRR098033 SRS167172 ZDB30 WGS <not provided> PRJNA188723 MSU public Escherichia coli B str. REL606 ILLUMINA SRP004752 <not provided> <not provided> <not provided> REL606
SAMN00205565 0 SINGLE ZDB83 29-May-14 260 162 25-Mar-11 SRR098034 SRS167173 ZDB83 WGS <not provided> PRJNA188723 MSU public Escherichia coli B str. REL606 ILLUMINA SRP004752 <not provided> <not provided> <not provided> REL606
SAMN00205566 0 SINGLE ZDB87 29-May-14 255 161 25-Mar-11 SRR098035 SRS167174 ZDB87 WGS <not provided> PRJNA188723 MSU public Escherichia coli B str. REL606 ILLUMINA SRP004752 <not provided> <not provided> <not provided> REL606
SAMN00205567 0 SINGLE ZDB96 29-May-14 126 90 25-Mar-11 SRR098036 SRS167175 ZDB96 WGS <not provided> PRJNA188723 MSU public Escherichia coli B str. REL606 ILLUMINA SRP004752 <not provided> <not provided> <not provided> REL606
SAMN00205568 0 SINGLE ZDB99 29-May-14 98 68 25-Mar-11 SRR098037 SRS167176 ZDB99 WGS <not provided> PRJNA188723 MSU public Escherichia coli B str. REL606 ILLUMINA SRP004752 <not provided> <not provided> <not provided> REL606
SAMN00205569 0 SINGLE ZDB107 29-May-14 241 142 25-Mar-11 SRR098038 SRS167177 ZDB107 WGS <not provided> PRJNA188723 MSU public Escherichia coli B str. REL606 ILLUMINA SRP004752 <not provided> <not provided> <not provided> REL606
SAMN00205570 0 SINGLE ZDB111 29-May-14 281 193 25-Mar-11 SRR098039 SRS167178 ZDB111 WGS <not provided> PRJNA188723 MSU public Escherichia coli B str. REL606 ILLUMINA SRP004752 <not provided> <not provided> <not provided> REL606
SAMN00205571 0 SINGLE ZDB143 29-May-14 653 466 25-Mar-11 SRR098040 SRS167179 ZDB143 WGS <not provided> PRJNA188723 MSU public Escherichia coli B str. REL606 ILLUMINA SRP004752 <not provided> <not provided> <not provided> REL606
SAMN00205572 0 SINGLE ZDB158 29-May-14 546 388 25-Mar-11 SRR098041 SRS167180 ZDB158 WGS <not provided> PRJNA188723 MSU public Escherichia coli B str. REL606 ILLUMINA SRP004752 <not provided> <not provided> <not provided> REL606
SAMN00205573 0 SINGLE ZDB172-SE 29-May-14 59 48 25-Mar-11 SRR098042 SRS167181 ZDB172 WGS <not provided> PRJNA188723 MSU public Escherichia coli B str. REL606 ILLUMINA SRP004752 <not provided> <not provided> <not provided> REL606
SAMN00205573 2729 PAIRED ZDB172-PE 29-May-14 1620 635 25-Mar-11 SRR098043 SRS167181 ZDB172 WGS <not provided> PRJNA188723 MSU public Escherichia coli B str. REL606 ILLUMINA SRP004752 <not provided> <not provided> <not provided> REL606
SAMN00205574 0 SINGLE ZDB199 29-May-14 646 454 25-Mar-11 SRR098044 SRS167182 ZDB199 WGS <not provided> PRJNA188723 MSU public Escherichia coli B str. REL606 ILLUMINA SRP004752 <not provided> <not provided> <not provided> REL606
SAMN00205586 0 SINGLE ZDB200 29-May-14 551 390 25-Mar-11 SRR098279 SRS167194 ZDB200 WGS <not provided> PRJNA188723 MSU public Escherichia coli B str. REL606 ILLUMINA SRP004752 <not provided> <not provided> <not provided> REL606
SAMN00205587 0 SINGLE ZDB357 29-May-14 571 407 25-Mar-11 SRR098280 SRS167195 ZDB357 WGS <not provided> PRJNA188723 MSU public Escherichia coli B str. REL606 ILLUMINA SRP004752 <not provided> <not provided> <not provided> REL606
SAMN00205588 0 SINGLE ZDB409 29-May-14 733 518 25-Mar-11 SRR098281 SRS167196 ZDB409 WGS <not provided> PRJNA188723 MSU public Escherichia coli B str. REL606 ILLUMINA SRP004752 <not provided> <not provided> <not provided> REL606
SAMN00205589 0 SINGLE ZDB429 29-May-14 443 309 25-Mar-11 SRR098282 SRS167197 ZDB429 WGS <not provided> PRJNA188723 MSU public Escherichia coli B str. REL606 ILLUMINA SRP004752 <not provided> <not provided> <not provided> REL606
SAMN00205590 0 SINGLE ZDB446 29-May-14 719 513 25-Mar-11 SRR098283 SRS167198 ZDB446 WGS <not provided> PRJNA188723 MSU public Escherichia coli B str. REL606 ILLUMINA SRP004752 <not provided> <not provided> <not provided> REL606
SAMN00205591 0 SINGLE ZDB458 29-May-14 633 447 25-Mar-11 SRR098284 SRS167199 ZDB458 WGS <not provided> PRJNA188723 MSU public Escherichia coli B str. REL606 ILLUMINA SRP004752 <not provided> <not provided> <not provided> REL606
SAMN00205592 0 SINGLE ZDB464 29-May-14 140 97 25-Mar-11 SRR098285 SRS167200 ZDB464 WGS <not provided> PRJNA188723 MSU public Escherichia coli B str. REL606 ILLUMINA SRP004752 <not provided> <not provided> <not provided> REL606
SAMN00205593 0 SINGLE ZDB467 29-May-14 714 322 25-Mar-11 SRR098286 SRS167201 ZDB467 WGS <not provided> PRJNA188723 MSU public Escherichia coli B str. REL606 ILLUMINA SRP004752 <not provided> <not provided> <not provided> REL606
SAMN00205594 0 SINGLE ZDB477 29-May-14 691 487 25-Mar-11 SRR098287 SRS167202 ZDB477 WGS <not provided> PRJNA188723 MSU public Escherichia coli B str. REL606 ILLUMINA SRP004752 <not provided> <not provided> <not provided> REL606
SAMN00205595 0 SINGLE ZDB483 29-May-14 829 593 25-Mar-11 SRR098288 SRS167203 ZDB483 WGS <not provided> PRJNA188723 MSU public Escherichia coli B str. REL606 ILLUMINA SRP004752 <not provided> <not provided> <not provided> REL606
SAMN00205596 0 SINGLE ZDB564 29-May-14 265 171 25-Mar-11 SRR098289 SRS167204 ZDB564 WGS <not provided> PRJNA188723 MSU public Escherichia coli B str. REL606 ILLUMINA SRP004752 <not provided> <not provided> <not provided> REL606
SAMN01095545 0 SINGLE ZDB285 25-Jul-12 150 106 26-Jul-12 SRR527252 SRS351858 ZDB285 WGS <not provided> PRJNA188723 MSU public Escherichia coli B str. REL606 ILLUMINA SRP004752 <not provided> <not provided> <not provided> REL606
SAMN01095546 0 SINGLE ZDB290 25-Jul-12 151 112 26-Jul-12 SRR527253 SRS351860 ZDB290 WGS <not provided> PRJNA188723 MSU public Escherichia coli B str. REL606 ILLUMINA SRP004752 <not provided> <not provided> <not provided> REL606
SAMN01095547 0 SINGLE ZDB165 25-Jul-12 155 106 26-Jul-12 SRR527254 SRS351861 ZDB165 WGS <not provided> PRJNA188723 MSU public Escherichia coli B str. REL606 ILLUMINA SRP004752 <not provided> <not provided> <not provided> REL606
SAMN01095548 0 SINGLE ZDB283 25-Jul-12 153 113 26-Jul-12 SRR527255 SRS351862 ZDB283 WGS <not provided> PRJNA188723 MSU public Escherichia coli B str. REL606 ILLUMINA SRP004752 <not provided> <not provided> <not provided> REL606
SAMN01095549 0 SINGLE ZDB294 25-Jul-12 158 112 26-Jul-12 SRR527256 SRS351863 ZDB294 WGS <not provided> PRJNA188723 MSU public Escherichia coli B str. REL606 ILLUMINA SRP004752 <not provided> <not provided> <not provided> REL606
SAMN01095550 0 SINGLE ZDB281 25-Jul-12 157 115 26-Jul-12 SRR527257 SRS351864 ZDB281 WGS <not provided> PRJNA188723 MSU public Escherichia coli B str. REL606 ILLUMINA SRP004752 <not provided> <not provided> <not provided> REL606
\ No newline at end of file
The following are some sinfo examples that you might find useful on Spartan.
`sinfo -s`
Provides summary information about the system's partitions: the partition name, whether the partition is available, walltime limits, node information (allocated, idle, out, total), and the nodelist.
`sinfo -p $partition`
Provides information about the particular partition specified. Breaks the sinfo output for that partition down into node states (drain, drng, mix, alloc, idle) and the nodes in each state. `Drain` means that the node is marked for maintenance; existing jobs will continue to run, but it will not accept new jobs.
`sinfo -a`
Similar to `sinfo -p` but for all partitions.
`sinfo -n $nodes -p $partition`
Print information only for specified nodes in specified partition; can use comma-separated values or range expression e.g., `sinfo -n spartan-bm[001-010]`.
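
As a small illustration of the allocated/idle/out/total field described for `sinfo -s`, the sketch below pulls the idle count out of each partition line. The function name `idle_nodes_per_partition` is hypothetical, and the sketch assumes the default summary layout, in which the fourth column holds the four counts separated by `/`.

```shell
# Hypothetical helper: read `sinfo -s` output on standard input and print
# each partition name followed by its number of idle nodes.
# Assumes the fourth column has the form allocated/idle/other/total.
idle_nodes_per_partition() {
    awk 'NR > 1 {
        # Split the A/I/O/T field; element 2 is the idle count.
        split($4, n, "/")
        print $1, n[2]
    }'
}
```

It is meant to be used as `sinfo -s | idle_nodes_per_partition`; if the site has customised the sinfo output format, the column number would need adjusting.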
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --nodelist=spartan-bm005
# Alternative to exclude specific nodes.
# SBATCH --exclude=spartan-bm005
echo $(hostname ) $SLURM_JOB_NAME running $SLURM_JOBID >> hostname.txt
The Slurm command `squeue` provides information about jobs in the queue. The following are some basic and useful commands.
squeue -a
This displays information on all jobs and job steps in all partitions.
squeue -A $account
This displays information for jobs according to account (i.e., group). Accepts a comma separated list.
squeue -j $jobids
Displays information for jobs according to job id. Accepts a comma separated list.
squeue -p $partition
Displays information for jobs according to partition, e.g., squeue -p gpgpu
squeue -u $users
Displays information for jobs according to users. Accepts a comma separated list.
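
These filters can be combined with squeue's `-h` (no header) and `-o` (output format) options to build quick summaries. The sketch below counts jobs per state; the function name `count_job_states` is hypothetical, and it assumes its input has one job state per line, as produced by the `%T` format field.

```shell
# Hypothetical helper: read one job state per line on standard input and
# print each distinct state with the number of jobs in it, sorted by name.
count_job_states() {
    awk '{ count[$1]++ } END { for (s in count) print s, count[s] }' | sort
}
```

For example, `squeue -u $USER -h -o %T | count_job_states` would summarise the current user's jobs by state.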
#!/bin/bash
#SBATCH --job-name="trimm_sample"
# A multithreaded application
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --time=0:15:00
module load Trimmomatic/0.36-Java-1.8.0_152
java -jar $EBROOTTRIMMOMATIC/trimmomatic-0.36.jar PE -threads 4 SRR2589044_1.fastq.gz SRR2589044_2.fastq.gz \
SRR2589044_1.trim.fastq.gz SRR2589044_1un.trim.fastq.gz \
SRR2589044_2.trim.fastq.gz SRR2589044_2un.trim.fastq.gz \
SLIDINGWINDOW:4:20 MINLEN:25 ILLUMINACLIP:NexteraPE-PE.fa:2:40:15
#!/bin/bash
#SBATCH --job-name="trimmloop_sample"
# A multithreaded application
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --time=0:15:00
module load Trimmomatic/0.36-Java-1.8.0_152
for infile in ./*_1.fastq.gz
do
base=$(basename ${infile} _1.fastq.gz)
java -jar $EBROOTTRIMMOMATIC/trimmomatic-0.36.jar PE -threads 4 ${infile} ${base}_2.fastq.gz \
${base}_1.trim.fastq.gz ${base}_1un.trim.fastq.gz \
${base}_2.trim.fastq.gz ${base}_2un.trim.fastq.gz \
SLIDINGWINDOW:4:20 MINLEN:25 ILLUMINACLIP:NexteraPE-PE.fa:2:40:15
done
#!/bin/bash
#SBATCH --time=1:00:00
#SBATCH --ntasks=1
module load BCFtools/1.6-intel-2017.u2
bcftools mpileup -O b -o results/bcf/SRR2584866_raw.bcf \
-f data/ref_genome/ecoli_rel606.fasta results/bam/SRR2584866.aligned.sorted.bam
bcftools call --ploidy 1 -m -v -o results/bcf/SRR2584866_variants.vcf results/bcf/SRR2584866_raw.bcf
module load VCFtools/0.1.15-intel-2017.u2-Perl-5.24.1
vcfutils.pl varFilter results/bcf/SRR2584866_variants.vcf > results/vcf/SRR2584866_final_variants.vcf
# Script to import, tidy, and combine multiple vcf files into a tidy data frame
# and save to csv
# Naupaka Zimmerman
# nzimmerman@usfca.edu
# February 27, 2019
# load required packages
library("vcfR")
library("plyr")
# set the path to vcf files (output from the previous pipeline)
path_to_vcf_dir <- "~/.solutions/wrangling-solutions/variant_calling_auto/results/vcf/"
# list all files in the vcf directory
vcf_file_names <- list.files(path_to_vcf_dir)
# extract sample IDs from VCF file names assuming names like
# 'SRR2584863_final_variants.vcf' where the characters before the
# first '_' are the sample ID
sample_ids <- gsub(pattern = "_.*",
replacement = "",
x = vcf_file_names)
# read in all vcf files in the directory
vcf_objects <- sapply(paste0(path_to_vcf_dir,
vcf_file_names),
read.vcfR)
# tidy all vcf files, combining data where possible
tidied_vcf_objects <- lapply(vcf_objects,
vcfR2tidy,
single_frame = TRUE,
info_types = TRUE,
format_types = TRUE)
# shorten names of list items to be just sample IDs instead of full paths
names(tidied_vcf_objects) <- sample_ids
# extract out only the first element of each list item, since the second
# element is metadata and can't easily be combined with the rest into a single
# data frame
just_vcf_data <- lapply(tidied_vcf_objects, `[[`, 1)
# combine the three list elements into a single data frame using plyr, and
# add an id column called 'sample_id', a factor vector that encodes the id of
# each sample, extracted from the shortened list item names, assigned above
combined_vcf_data <- ldply(just_vcf_data,
.id = "sample_id")
# write out the csv of this single combined data frame
write.csv(combined_vcf_data,
file = "~/r_data/combined_tidy_vcf.csv",
row.names = FALSE)