Commit 3ce0c5c1 authored by root

Update for 6/1/2021

parent 9f2c5178
ABAQUS/Door.odb
AdvLinux/NAMD
AFNI/ARzs_data.tgz
Beast/Dengue4.env.trees
BLAST/dbs
BLAST/rat-ests
Delft3D/*
!Delft3D/run_all_examples.sh
!Delft3D/run_all_examples_sp.sh
!Delft3D/sed_in_file.tcl
digits/digits.img
FreeSurfer/buckner_data
FreeSurfer/buckner_data-tutorial_subjs.tar.gz
FSL/intro
FSL/preCourse.tar.gz
FSL/fmri
Gaussian/g16
Gaussian/tests
NAMD/apoa1
NAMD/NAMD_BENCHMARKS_SPARTAN
NAMD/stmv
Python/minitwitter.csv
SAMtools/sample.sam.gz
Singularity/vsoch-hello-world-master.simg
Trimmomatic/*.gz
Trimmomatic/*.fa
Trimmomatic/.backup
*.fastq
*.fastq.gz
*.fasta
*.faa
*.tar
*.tar.gz
*.sam
*.sam.gz
*.simg
*.gz
*.fa
*.img
*.tgz
*.zip
*.jpg
*.png
*.jpeg
*.JPG
*.PNG
*.JPEG
#!/bin/bash
#SBATCH --job-name=ansys-multithread-test
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --output=output.%j.ansys-multithread-test
#SBATCH --time=01:00:00
#### SLURM 1 node, 4 processors per node Ansys test to run for 1 hour.
module load ansys_cfd/20.1
ANSYS_OPTS="-p aa_r -dir $(pwd) -b -np $SLURM_NTASKS"
time ansys201 $ANSYS_OPTS < ansys-test.in
#!/bin/bash
#SBATCH --job-name=ansys-serial-test
#SBATCH --ntasks=1
#SBATCH --output=output.%j.ansys-serial-test
#SBATCH --time=01:00:00
#### SLURM 1 processor Ansys test to run for 1 hour.
module load ansys_cfd/20.1
ANSYS_OPTS="-p aa_r -dir $(pwd) -b"
time ansys201 $ANSYS_OPTS < ansys-test.in
#!/bin/bash
# Add your project account details here.
# SBATCH --account=XXXX
#SBATCH --partition=snowy
#SBATCH --ntasks=4
#SBATCH --time=1:00:00
# Using a CUDA version of Amber and GPUs?
# You'll need to invoke the correct directives
# SBATCH --partition=gpgpu
# SBATCH --gres=gpu:p100:4
# Change this as appropriate.
# SBATCH --qos=gpgpuhpcadmin
# Invoke the old build system.
module purge
source /usr/local/module/spartan_old.sh
module load Amber/16-gompi-2017b-CUDA-mpi
srun /usr/local/easybuild/software/Amber/16-gompi-2017b-CUDA-mpi/amber16/bin/pmemd.cuda_DPFP.MPI -O -i mdin -o mdout -inf mdinfo -x mdcrd -r restrt
# Select a version of Amber
# module load Amber/16-GCC-6.2.0
# module load Amber/16-GCC-6.2.0-CUDA
# module load Amber/16-gompi-2017b-CUDA-mpi
module load AmberTools/18-spartan_intel-2017.u2
ambpdb -p 0.15_80_10_pH6.5_1ODX.top -c 0.15_80_10_pH6.5_1ODX.crd > 0.15_80_10_pH6.5_1ODX.pdb
# LL 20190805
Amber (originally Assisted Model Building with Energy Refinement) is software for performing molecular dynamics and structure prediction.
TUTORIAL NOT YET COMPLETE
The sample job is a step from the more extensive tutorial at: https://ambermd.org/tutorials/pengfei/index.htm
It generates a protein PDB file using the ambpdb program from the topology and coordinate files of the protein system.
#!/bin/bash
# To give your job a name, replace "MyJob" with an appropriate name
#SBATCH --job-name=Bowtie-test.slurm
# Run on single CPU
#SBATCH --ntasks=1
# set your minimum acceptable walltime=days-hours:minutes:seconds
#SBATCH -t 0:15:00
# Specify your email address to be notified of progress.
# SBATCH --mail-user=youremailaddress@unimelb.edu
# SBATCH --mail-type=ALL
# Load the environment variables
module purge
module load foss/2019b
module load bowtie2/2.3.5.1
module load samtools/1.9
module load bcftools/1.9
# Build an index
bowtie2-build pO157_Sakai.fasta.gz pO157_Sakai
# Map the reads, using the trimmed data from fastqc (qv)
bowtie2 -x pO157_Sakai -1 SRR957824_trimmed_R1.fastq -2 SRR957824_trimmed_R2.fastq -S SRR957824.sam
# Convert the SAM file into BAM, a compressed version of SAM that can be indexed.
samtools view -hSbo SRR957824.bam SRR957824.sam
# Sort the bam file per position in the genome and index it
samtools sort SRR957824.bam -o SRR2584857.sorted.bam
samtools index SRR2584857.sorted.bam
# Set up an interactive session with X-windows forwarding or use FastX and visualise with samtools tview
# samtools tview SRR2584857.sorted.bam pO157_Sakai.fasta.gz
# A frequent application for mapping reads is variant calling, i.e. finding positions where the reads are systematically different
# from the reference genome. Single nucleotide polymorphism (SNP)-based typing is particularly popular and used for a broad range of
# applications. For an EHEC O157 outbreak you could use it to identify the source, for instance.
samtools mpileup -uD -f pO157_Sakai.fasta.gz SRR2584857.sorted.bam | bcftools view - > variants.vcf
# In a graphical interactive session examine one of the variants with tview
# samtools tview SRR2584857.sorted.bam pO157_Sakai.fasta.gz -p 'gi|10955266|ref|NC_002128.1|:43071'
Derived from the Swedish University of Agricultural Sciences
This file contains the sequence of the pO157 plasmid from the Sakai outbreak strain of E. coli O157.
Available from: curl -O -J -L https://osf.io/rnzbe/download
The example maps a prepared read set to the reference sequence of the virulence plasmid to determine whether pO157 is present in the St Louis
outbreak strain.
#!/bin/bash
# To give your job a name, replace "MyJob" with an appropriate name
#SBATCH --job-name=Busco-test.slurm
# Run on single CPU
#SBATCH --ntasks=1
# set your minimum acceptable walltime=days-hours:minutes:seconds
#SBATCH -t 0:15:00
# Specify your email address to be notified of progress.
# SBATCH --mail-user=youremailaddress@unimelb.edu
# SBATCH --mail-type=ALL
# Load the environment variables
module purge
module load foss/2019b
module load busco/4.0.5-python-3.7.4
module load web_proxy
# You will need to create a busco configuration file. See README.md
# Then modify and uncomment the following line.
# export BUSCO_CONFIG_FILE="/path/to/myconfig.ini"
export BUSCO_CONFIG_FILE="/home/lev/Busco/my-busco.conf"
# The m_genitalium.fasta file is from final.contigs.fa generated in the MEGAHIT example.
busco -i m_genitalium.fasta -l bacteria_odb10 -o busco_genitalium -m genome
NOTA BENE: TUTORIAL IS INCOMPLETE. RUNS BUT WITH ERRORS.
Some content derived from the Swedish University of Agricultural Sciences
Busco (Benchmarking Universal Single-Copy Orthologs) can be used to find marker genes in an assembly. Marker genes are conserved
across a range of species, and finding intact conserved genes in an assembly is a good indication of its quality.
The file `m_genitalium.fasta` is from the MEGAHIT job example.
The file bacteria_odb9.tar.gz is available from:
http://busco.ezlab.org/v2/datasets/bacteria_odb9.tar.gz
Busco requires editing of a configuration file to operate. A suggested process is as follows:
mkdir Busco
cd Busco
module load foss/2019b ; module load busco/4.0.5-python-3.7.4
cp /usr/local/easybuild-2019/easybuild/software/mpi/gcc/8.3.0/openmpi/3.1.4/busco/4.0.5-python-3.7.4/config/config.ini my-busco.conf
busco_configurator.py /usr/local/easybuild-2019/easybuild/software/mpi/gcc/8.3.0/openmpi/3.1.4/busco/4.0.5-python-3.7.4/config/config.ini my-busco.conf
#include <stdio.h>
/*
 * Refactor `loop` to be a CUDA kernel. The new kernel should
 * only do the work of 1 iteration of the original loop.
 */
void loop(int N)
{
  for (int i = 0; i < N; ++i)
  {
    printf("This is iteration number %d\n", i);
  }
}
/*
 * This kernel does the work of only 1 iteration of the
 * original for loop. Indication of which "iteration" is being
 * executed by this kernel is available via `threadIdx.x`.
 */
__global__ void loop_gpu(){
  int time_loop = threadIdx.x;
  printf("This is gpu iteration number %d\n", time_loop);
}
int main()
{
  /*
   * It is the execution configuration that sets how many
   * "iterations" of the "loop" will be done. For this
   * exercise, only use 1 block of threads.
   */
  int N = 10;
  loop(N);
  loop_gpu<<<1, N>>>();
  cudaDeviceSynchronize();
}
#include <stdio.h>
__global__ void printSuccessForCorrectExecutionConfiguration()
{
if(threadIdx.x == 1023 && blockIdx.x == 255)
{
printf("Success!\n");
} else {
printf("Failure. Update the execution configuration as necessary.\n");
}
}
int main()
{
/*
* Update the execution configuration so that the kernel
* will print `"Success!"`.
*/
printSuccessForCorrectExecutionConfiguration<<<1, 1>>>();
cudaDeviceSynchronize();
}
#include <stdio.h>
/*
 * Refactor `loop` to be a CUDA kernel. The new kernel should
 * only do the work of 1 iteration of the original loop.
 */
void loop(int N)
{
  for (int i = 0; i < N; ++i)
  {
    printf("This is iteration number %d\n", i);
  }
}
/*
 * This idiomatic expression gives each thread a unique index
 * within the entire grid, so each kernel invocation does the
 * work of 1 "iteration" of the original loop.
 */
__global__ void loop_gpu(){
  int time_loop = blockIdx.x * blockDim.x + threadIdx.x;
  printf("This is gpu iteration number %d\n", time_loop);
}
int main()
{
  /*
   * For this exercise, be sure to use more than 1 block in
   * the execution configuration. Other configurations that
   * would work and meet the exercise's constraints are:
   * <<<2, 5>>>
   * <<<5, 2>>>
   */
  int N = 10;
  loop(N);
  loop_gpu<<<N, 1>>>();
  cudaDeviceSynchronize();
}
# Setup
All examples in this directory with a numerical prefix, 01-, 02- etc are from NVidia.
To run a sample CUDA job, start with an interactive job. Change the qos as appropriate.
`sinteractive --partition=gpgpu --gres=gpu:p100:4 --qos=gpgpuhpcadmin`
Invoke the old (2015-2020) build system
`source /usr/local/module/spartan_old.sh`
Load a CUDA module
`module load CUDA/8.0.44-GCC-4.9.2`
Copy the CUDA directory to home and enter:
`cd ~; cp -r /usr/local/common/CUDA . ; cd CUDA`
# Structure of CUDA Code
As with all parallel programming, start with serial code, engage in decomposition, then generate parallel code.
int main()
{
cudaDeviceSynchronize();
}
The __global__ keyword indicates that the following function will run on the GPU and can be invoked globally, which in this context means by
either the CPU or the GPU.
Often, code executed on the CPU is referred to as host code, and code running on the GPU is referred to as device code.
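As an illustration of this split (a minimal sketch only; the actual `01-hello-gpu.cu` exercise file may differ), host and device functions can sit side by side in one source file:

```c
#include <stdio.h>

// Host (CPU) function: ordinary C code.
void helloCPU()
{
    printf("Hello from the CPU.\n");
}

// Device (GPU) kernel: __global__ marks it as launchable
// from host code with an execution configuration.
__global__ void helloGPU()
{
    printf("Hello from the GPU.\n");
}

int main()
{
    helloCPU();
    helloGPU<<<1, 1>>>();    // launch 1 block of 1 thread
    cudaDeviceSynchronize(); // wait for the kernel so its printf is flushed
}
```

Without the `cudaDeviceSynchronize()` call the host program may exit before the asynchronous kernel has run.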
# Hello World from CPU and GPU
A CUDA program can run portions of the code on the CPU and portions on the GPU.
Review the non-CUDA code:
`01-hello-gpu.cu`
Compile and execute:
`nvcc 01-hello-gpu.cu -o helloCUDA -gencode arch=compute_60,code=sm_60`
`./helloCUDA`
Refactor the code to take advantage of CUDA functions, recompile, and execute:
`nvcc 01-hello-gpu-solution.cu -o helloCUDA -gencode arch=compute_60,code=sm_60`
`./helloCUDA`
Or, as an alternative, compile with `-run` at the end of the compilation line, which will run the compiled binary right away.
# Supported Gencode variations for sm and compute
What are those gencode requirements?
Below are the supported sm variations, with sample cards from each generation.
Supported on CUDA 7 and later
Fermi (CUDA 3.2 until CUDA 8) (deprecated from CUDA 9):
Turing (CUDA 10 and later)
(c.f., http://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/)
# Parallel Kernels
Review the non-CUDA code:
`01-first-parallel.cu`
Compile and execute:
`nvcc 01-first-parallel.cu -o firstCUDA -gencode arch=compute_60,code=sm_60`
`./firstCUDA`
Refactor the code to take advantage of CUDA functions, recompile, and execute.
`nvcc 01-first-parallel-solution.cu -o firstCUDA -gencode arch=compute_60,code=sm_60`
`./firstCUDA`
Modify the distribution of kernels as desired.
# Thread and Block Indices
Currently, the 01-thread-and-block-idx.cu file contains a kernel function that fails.
`nvcc 01-thread-and-block-idx.cu -o indexCUDA -gencode arch=compute_60,code=sm_60`
Refactor the code so that the index is correct, recompile, and execute.
`nvcc 01-thread-and-block-idx-solution.cu -o indexCUDA -gencode arch=compute_60,code=sm_60`
# Accelerating For Loops
Consider the non-accelerated (CPU-based) loop, compile and run.
`nvcc 01-single-block-loop.cu -o loopCUDA -gencode arch=compute_60,code=sm_60`
`./loopCUDA`
Refactor, recompile, and execute.
`nvcc 01-single-block-loop-solution.cu -o loopCUDA -gencode arch=compute_60,code=sm_60`
`./loopCUDA`
Modify the number of iterations as desired.
# Multiple Blocks of Threads
Consider the non-accelerated (CPU-based) loop, compile and run.
`nvcc 02-multi-block-loop.cu -o loop2CUDA -gencode arch=compute_60,code=sm_60`
`./loop2CUDA`
Refactor, recompile, and execute for multiple blocks.
`nvcc 02-multi-block-loop-solution.cu -o loop2CUDA -gencode arch=compute_60,code=sm_60`
`./loop2CUDA`
Note the order of returns.
# Loop with a Mismatched Execution
The program in 02-mismatched-config-loop.cu uses cudaMallocManaged to allocate memory for an integer array of 1000 elements, and
then attempts to initialize all the values in the array in parallel using a CUDA kernel function.
`nvcc 02-mismatched-config-loop.cu -o mismatchCUDA -gencode arch=compute_60,code=sm_60`
`./mismatchCUDA`
Refactor, recompile, and execute for multiple blocks.
`nvcc 02-mismatched-config-loop-solution.cu -o mismatchCUDA -gencode arch=compute_60,code=sm_60`
`./mismatchCUDA`
# Grid-Stride Loops
Grid-stride loops: the number of data elements is often greater than the number of threads in the grid. In this case, a thread
cannot process only one element, or the work will not be completed. One way to solve this problem programmatically is to use
a grid-stride loop. In a grid-stride loop, the thread's first element is still calculated by
threadIdx.x + blockIdx.x * blockDim.x, and the thread then strides forward by the total number of threads in the grid (blockDim.x
* gridDim.x) until all elements have been processed.
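A kernel following this pattern might be sketched as follows (illustrative only; `doubleElements` and its parameters are our own names, not the exercise's exact code):

```c
__global__ void doubleElements(int *a, int N)
{
    // First element for this thread, computed from its grid-wide index.
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    // Total number of threads in the grid.
    int stride = blockDim.x * gridDim.x;
    // Stride through the data until all N elements are covered,
    // even when N is larger than the number of threads.
    for (int i = idx; i < N; i += stride)
    {
        a[i] *= 2;
    }
}
```

The same kernel then works correctly for any launch configuration, since coverage no longer depends on launching exactly N threads.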
`nvcc 03-grid-stride-double.cu -o gridstrideCUDA -gencode arch=compute_60,code=sm_60`
`./gridstrideCUDA`
`nvcc 03-grid-stride-double-solution.cu -o gridstrideCUDA -gencode arch=compute_60,code=sm_60`
`./gridstrideCUDA`
# Debug with printf
Calling printf from a CUDA kernel function is no different than calling printf on CPU code. In the vector addition example, edit vec_add.cu and insert the following code after line 18:
if(threadIdx.x == 10)
printf("c[%d] = %d\n", id, c[id]);
# CUDA Error Handling
Most CUDA functions return a value of type cudaError_t, which can be used to check for errors when calling a function.
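As a sketch of the idea (the `checkCuda` helper name is our own, not part of the CUDA API):

```c
#include <stdio.h>
#include <stdlib.h>

// Hypothetical helper: print a readable message and abort on any CUDA error.
inline void checkCuda(cudaError_t err)
{
    if (err != cudaSuccess)
    {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        exit(1);
    }
}

int main()
{
    int *a;
    // Most CUDA API calls, such as cudaMallocManaged, return a cudaError_t.
    checkCuda(cudaMallocManaged(&a, 1000 * sizeof(int)));
    // Kernel launches return no value; launch errors are retrieved with
    // cudaGetLastError(), and errors during kernel execution from the
    // value returned by cudaDeviceSynchronize().
    checkCuda(cudaGetLastError());
    checkCuda(cudaDeviceSynchronize());
    checkCuda(cudaFree(a));
}
```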
#!/bin/bash
# To give your job a name, replace "MyJob" with an appropriate name
#SBATCH --job-name=FastQC-test1
# Run on single CPU
#SBATCH --ntasks=1
# set your minimum acceptable walltime=days-hours:minutes:seconds
#SBATCH -t 0:15:00
# Specify your email address to be notified of progress.
# SBATCH --mail-user=youremailaddress@unimelb.edu
# SBATCH --mail-type=ALL
# To check the quality of the sequence data we will use a tool called FastQC; load the module.
module purge
module load fastqc/0.11.9-java-11.0.2
# Run FastQC
fastqc SRR957824_500K_R1.fastq.gz SRR957824_500K_R2.fastq.gz
#!/bin/bash
# To give your job a name, replace "MyJob" with an appropriate name
#SBATCH --job-name=FastQC-test1
# Run on single CPU
#SBATCH --ntasks=1
# set your minimum acceptable walltime=days-hours:minutes:seconds
#SBATCH -t 0:15:00
# Specify your email address to be notified of progress.
# SBATCH --mail-user=youremailaddress@unimelb.edu
# SBATCH --mail-type=ALL
# To check the quality of the sequence data we will use a tool called FastQC; load the module.
module purge
module load fastqc/0.11.9-java-11.0.2
# Run FastQC
fastqc SRR957824_trimmed_R1.fastq SRR957824_trimmed_R2.fastq
Derived from the Swedish University of Agricultural Sciences
The first dataset you will be working with is from an Illumina MiSeq dataset. The sequenced organism is an enterohaemorrhagic E.
coli (EHEC) of the serotype O157, a potentially fatal gastrointestinal pathogen. The sequenced bacterium was part of an outbreak
investigation in the St. Louis area, USA in 2011. The sequencing was done as paired-end 2x150bp.
The raw data were deposited at the European Nucleotide Archive, under the accession number SRR957824. A subset of the original
dataset is used for this tutorial.
FastQC is used to check the quality of the data. There are two example job submission scripts. The first is on the raw data, the
second after running Scythe (qv) and Sickle (qv) to remove poor quality bases.
gdb gdbtest
cp gdbtest.c gdbtest2.c
gcc -Wall -g gdbtest2.c -o gdbtest2
$ ./gdbtest
$ ./gdbtest2
# There is still another bug! Can you find it? Use GDB to help.
#!/bin/bash
# To give your job a name, replace "MyJob" with an appropriate name
#SBATCH --job-name=Gretl-test.slurm
# Run on single CPU
#SBATCH --ntasks=1
# set your minimum acceptable walltime=days-hours:minutes:seconds
#SBATCH -t 0:15:00
# Load the environment variables
module purge
source /usr/local/module/spartan_old.sh
module load gretl/2018c-GCC-6.2.0-LAPACK-3.8.0-OpenBLAS-0.3.5
module load LAPACK/3.8.0-GCC-6.2.0-OpenBLAS-0.3.5
module load OpenSSL/1.0.2l-GCC-6.2.0
module load gnuplot/5.0.0-GCC-6.2.0
# Run a basic Gretl example
gretlcli -b first_ex.inp
open AWM.gdt --quiet # load data from disk
/* data transformations and visualisation */
# the "series" concept: operate on
# vectors on an element-by-element basis
# (but you also have special functions)
series y = 100 * hpfilt(ln(YER))
series u = 100 * URX
series r = STN - 100*sdiff(ln(HICP))
scatters y r u --output=display # command example with an option: graph data
/* in-house VAR */
scalar p = 2 # strong typing: a scalar is not a matrix nor a series
var p y r u # estimation command
A = $coeff # and corresponding accessor
/* by iterated OLS */
list X = y r u # the list is yet another variable type
matrix B = {} # initialize an empty matrix
# loop over the 3 var equations
# using native OLS command
# and store the estimated coefficients
loop foreach i X
ols $i const X(-1 to -p) --quiet
B ~= $coeff
endloop
/* via matrices */
# construct a matrix from series as matrix columns
# or from matrix operators/functions
matrix mY = { y, r, u }
matrix mX = 1 ~ mlag(mY, {1,2})
# and select the appropriate rows via "range" syntax
mY = mY[p+1:,]
mX = mX[p+1:,]
C1 = mX\mY # matlab-style matrix inversion
C2 = mols(mY, mX) # or native function
C3 = inv(mX'mX) * (mX'mY) # or algebraic primitives
/* or the hard, needlessly complicated way --- just to show off */
function scalar loglik(matrix param, const matrix X, const matrix Y)
# this function computes the concentrated log-likelihood
# for an unrestricted multivariate regression model
scalar n = cols(Y)
scalar k = cols(X)
scalar T = rows(Y)
matrix C = mshape(param, k, n)
matrix E = Y - X*C
matrix Sigma = E'E
return -T/2 * ln(det(Sigma))
end function
# initialize the parameters
matrix c = zeros(21,1)
# and maximize the log-likelihood via BFGS,
# printing out the results when done
mle ll = loglik(c, mX, mY)
params c
end mle
D = mshape(c, 7, 3) # reshape the results for conformability
/* print out the results */
# note: row ordering between alternatives is different
print A B C1 C2 C3 D
/* Basic Matrix operations with Gretl */
matrix a = {11, 22 ; 33, 44} # a is square 2 x 2
matrix b = {1,2,3; 3,2,1} # b is 2 x 3
matrix c = a' # c is the transpose of a
matrix d = a*b # d is a 2x3 matrix equal to a times b
matrix gina = b'd # valid: gina is 3x3
matrix lina = d + b # valid: lina is 2x3
matrix A = {2,1;0,1}
matrix B = {1,1;1,0}
matrix KP = A ** B
matrix PWR = A^3
matrix HC = A ~ B
matrix VC = A | B
print A B KP PWR HC VC
# example: OLS using matrices
# fix the sample size
scalar T = 256
# construct vector of coefficients by direct imputation
matrix beta = {1.5, 2.5, -0.5} # note: row vector
# construct the matrix of independent variables
matrix Z = mnormal(T, cols(beta)) # built-in functions
# now construct the dependent variable: note the
# usage of the "dot" and transpose operators
matrix y = {1.2} .+ Z*beta' + mnormal(T, 1)
# now do estimation
matrix X = 1 ~ Z # concatenation operator
matrix beta_hat1 = inv(X'X) * (X'y) # OLS by hand
matrix beta_hat2 = mols(y, X) # via the built-in function
matrix beta_hat3 = X\y # via matrix division
print beta_hat1 beta_hat2 beta_hat3
/* assume y and X are given T x 1 and T x k matrices */
bundle my_model = null # initialization
my_model.T = rows(X) # sample size
my_model.k = cols(X) # number of regressors
matrix e # will hold the residuals
b = mols(y, X, &e) # perform OLS via native function
s2 = meanc(e.^2) # compute variance estimator
matrix V = s2 .* invpd(X'X) # compute covariance matrix
/* now store estimated quantities into the bundle */
my_model.betahat = b
my_model.s2 = s2
my_model.vcv = V
my_model.stderr = sqrt(diag(V))
#!/bin/bash
# To give your job a name, replace "MyJob" with an appropriate name
#SBATCH --job-name=ImageMagick-test.slurm
# Run on single CPU
#SBATCH --ntasks=1
# set your minimum acceptable walltime=days-hours:minutes:seconds
#SBATCH -t 0:15:00
# Specify your email address to be notified of progress.
# SBATCH --mail-user=youremailaddress@unimelb.edu
# SBATCH --mail-type=ALL
# Load the environment variables
module purge
module load imagemagick/7.0.9-5
# Run the script!
sh watermark.sh
This is an example of using ImageMagick on the cluster. The job submission script launches the script watermark.sh.
The file, watermark.sh, contains two functions. The first separates the original jpg files into portrait and landscape directories. The second
applies a different watermark to each directory (in this example they are the same, but you get the idea).
Sample image files are from my own collection. The watermark is not.
LL202010105
#!/bin/bash
# Lev Lafayette, December 2020
separate() { # Separate original files into portrait and landscape directories
mkdir portraits; mkdir landscapes
for item in ./*.jpg
do
orient=$(identify -format '%[fx:(h>w)]' "$item")
if [ $orient -eq 1 ] ;
then
cp "$item" ./portraits
else
cp "$item" ./landscapes
fi
done
}
apply() { # Apply correct watermark to each directory
cd portraits
for item in ./*.jpg; do convert "$item" ../watermarkp.png +distort affine "0,0 0,0 %[w],%[h] %[fx:t?v.w*(u.h/v.h*0.1):s.w],%[fx:t?v.h*(u.h/v.h*0.1):s.h]" -shave 1 -gravity southeast -geometry +20+20 -composite "$item" ; done
cd ../landscapes
for item in ./*.jpg; do convert "$item" ../watermarkl.png +distort affine "0,0 0,0 %[w],%[h] %[fx:t?v.w*(u.h/v.h*0.1):s.w],%[fx:t?v.h*(u.h/v.h*0.1):s.h]" -shave 1 -gravity southeast -geometry +20+20 -composite "$item" ; done
}
main() {
separate
apply
}
main
exit
#!/bin/bash
# One task, one core, default partition, ten minutes walltime
module purge
module load matlab/2020a
matlab -nodesktop -nodisplay -nosplash < sigmoid.m
This directory includes a simple single-core MATLAB Slurm job which
generates a "Polar Plot" from the associated script.
The file 2019matlab2.slurm creates a sigmoid graph.
There is also an example, matlab-p, which demonstrates the difference of
using a single-core and multi-core loop in MATLAB.
x = 0:0.1:10;
y = smf(x,[1 8]);
plot(x,y)
xlabel('smf, P = [1 8]')
ylim([-0.05 1.05])
print -deps sigmoid.ps;
quit;
#!/bin/bash
# To give your job a name, replace "MyJob" with an appropriate name
#SBATCH --job-name=MEGAHIT-test.slurm
# Run on single CPU
#SBATCH --ntasks=1
# set your minimum acceptable walltime=days-hours:minutes:seconds
#SBATCH -t 0:15:00
# Specify your email address to be notified of progress.
# SBATCH --mail-user=youremailaddress@unimelb.edu
# SBATCH --mail-type=ALL
# Load the environment variables
module purge
module load foss/2019b
module load megahit/1.1.4-python-3.7.4
# Conduct a De-novo assembly on the bacterium.
megahit -1 ERR486840_1.fastq.gz -2 ERR486840_2.fastq.gz -o m_genitalium
# This produces the assembly. Check the report file
# m_genitalium_report/report.html
# De-novo Genome Assembly with MEGAHIT
Derived from content from the Swedish University of Agricultural Sciences
## Data collection
M. genitalium was sequenced using the MiSeq platform (2 * 150bp). The reads were deposited in the ENA Short Read Archive under the
accession ERR486840
module load web_proxy
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR486/ERR486840/ERR486840_1.fastq.gz
wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR486/ERR486840/ERR486840_2.fastq.gz
# Math Basics
Often one needs to engage in some relatively simple calculations on the Linux command line, interactively or in a script. Rather than
a powerful, fully-fledged environment such as Octave, R, Maxima, Julia, etc., or even writing one's own program, one can make use of
various Linux utilities and lightweight applications instead.
The main tools for this include shell arithmetic, the utilities `expr`, `bc`, `awk`, and `datamash`. This document includes
consideration of `expr`, `bc`, and `datamash` with `awk` covered in the RegEx directory (`/usr/local/common/RegEx`).
## Shell Arithmetic
The following examples are from the "Supercomputing with Linux" book in this series and illustrate the simple integer arithmetic used in the bash shell through loop structures.
`x=1; while [ $x -le 5 ]; do echo "While-do count up $x"; x=$(( $x + 1 )); done`
`x=5; until [ $x -le 0 ]; do echo "Until-do count down $x"; x=$(( $x - 1 )); done`
`x=1; until [ $x = 6 ]; do echo "Until-do count up $x"; x=$(( $x + 1 )); done`
The format for bash shell arithmetic is `$(( expression ))`. The shell expansion returns the result of the given expression. Some simple examples making use of a variable follow:
```
$ x=5 && echo $x # Sets x to 5.
$ echo $((x+2)) # Returns 7, x is still 5.
$ x=$((x+3)) && echo $x # Sets x to 8.
$ ((x+=3)) && echo $x # Sets x to 11.
$ echo $((x++)) && echo $x # Echo x, increment by 1, echo x again.
$ echo $((++x)) && echo $x # Increment by 1, echo x.
$ echo $((--x)) && echo $x # Decrement by 1, echo x.
$ x=$((x/4)) && echo $x # Division
$ x=$((x*4)) && echo $x # Multiplication
$ x=$((x%4)) && echo $x # Modulus
$ x=3 && ((x=x**2)) && echo $x # Set x to 3, exponent ^2, echo result.
$ echo $((x=4,y=5,z=x*y,w=z/2)) # Multiple statements must be comma-separated. Shell gives last expression.
$ echo $x $y $z $w # ... but all statements are evaluated and variables saved.
```
See the file `fibonacci.sh` for a simple example of taking in user input for shell calculations.
## Expr
The command `expr` evaluates an expression of strings or decimal integers, optionally signed, and writes to standard output; any
other operand is converted to an integer or string, depending on the operation being applied to it. Operators and operands must be
separated by spaces and operators may need to be escaped. For example;
$ expr 1 + 2 + 3 # Addition. Note separation of operators and operands
$ expr 1+2+3 # Note no separation; treated as a whole.
$ expr 100 - 9 # Subtraction
$ expr 100 / 9 # Division. Note that the result is an integer
$ expr 100 * 9 # An error; multiplication operator must be escaped.
$ expr 100 \* 9 # Multiplication; metacharacter now a literal.
$ expr 20 % 3 # Modulus i.e., 20/3 = 6 residual 2.
$ expr 29 % 30 # Modulus; when the dividend is less than the divisor, the remainder is the whole dividend.
$ expr 2 \> 10 # Greater than, escape the metacharacter, returns false (0)
$ expr 2 \< 10 # Less than, escape the metacharacter, returns true (1)
$ expr 2 = 2 # Equals. Returns true (1), thankfully.
$ expr 2 \<= 2 # Less than or equal to.
$ expr 2 \>= 2 # Greater than or equal to.
$ expr A = A # String test for equality, returns true. Ayn Rand likes this.
$ expr a = A # String test for equality, returns false. Operating system is case-sensitive.
$ expr a + b # Addition of strings generates error.
$ expr a \< b # Comparison of strings; the metacharacter must be escaped here too.
Strings are not quoted, although one may wish to do this to protect against special or metacharacters (e.g., spaces). Quoted numbers
are treated like strings. Special operators for strings include length, index, and a count of matching characters using regular
expression metacharacters.
$ expr "1 + 2" # This is a string
$ expr length "This is a long string" # Length includes spaces.
$ expr index "This is a long string" "l" # Index requires $string first, value second.
$ expr substr "This is a long string" 6 2 # Starting from character 6, provide next 2 characters.
$ expr "This is a long string" : "This is" # Count of matching characters
$ expr "This is a long string" : "This is l" # Characters must be contiguous
$ expr "This is a long string" : "^This" # RegEx metacharacters apply; `^` is implicit.
$ expr "This is a long string" : "This$" # RegEx metacharacters apply
$ expr match "This is a string" "This$" # Alternative representation, RegEx metacharacters apply.
Previous examples showed comparisons in numerical expressions. However, these values are arguments and can equally be shell
variables. The following illustrates this use, along with AND/OR evaluation.
$ value1=20; value2=30 # Set some shell variables.
$ test1=$(expr $value1 \<= $value2) # Set a variable that evaluates an expr command.
$ test2=$(expr $value1 \> $value2) # Set a variable that evaluates an expr command.
$ action=$(expr $test1 \| $test2) # Use of logical 'OR' (other option, logical AND with `&`)
$ echo $action # Provide result.
Although `expr` may seem to be quite limited, operating on integers only, it can also be used with filenames that have integer
values, or sub-values, such as in HPC job arrays. In addition, with GNU parallel, some calculations can be arranged to make use
of a multicore environment, e.g., `parallel echo {1} '$(expr {2} \* 2 )' ::: {1..2} :::
{1..2}`
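For instance, in a job array a per-task parameter can be derived from the array index with `expr` (a sketch only; in a real array job the scheduler sets `SLURM_ARRAY_TASK_ID`, and `input.dat` is an illustrative filename):

```shell
#!/bin/bash
# Illustrative: fall back to a fixed index so the sketch runs standalone;
# under Slurm the scheduler sets SLURM_ARRAY_TASK_ID for each array task.
SLURM_ARRAY_TASK_ID=${SLURM_ARRAY_TASK_ID:-3}
# Each task processes a 100-line slice of a shared input file.
start=$(expr "$SLURM_ARRAY_TASK_ID" \* 100 + 1)
end=$(expr "$start" + 99)
echo "Task $SLURM_ARRAY_TASK_ID would process lines $start-$end of input.dat"
```

With a task ID of 3 this computes the slice 301-400, since `expr` gives `\*` higher precedence than `+`.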
scale=2
print "\nA Simple Arithmetic Calculator using bc\n"
print " Enter x and y value then select an operation.\n\n"
print "x=? "; x = read()
print "y=? "; y = read()
print "Choose an operation: addition (1), subtraction (2), multiplication (3), division (4), "; op = read()
if (op == 1) print "Addition: ", x, "+", y, "=", x+y;
if (op == 2) print "Subtraction: ", x, "-", y, "=", x-y;
if (op == 3) print "Multiplication: ", x, "*", y, "=", x*y;
if (op == 4) print "Division: ", x, "/", y, "=", x/y;
print "\n\n"
quit
0.325 28.532
0.275 42.182
0.264 44.846
0.31 32.991
0.249 35.279
0.261 44.861
0.269 42.665
0.292 46.095
0.289 38.187
0.319 38.738
0.289 36.553
0.257 36.691
0.295 22.317
0.334 42.051
0.339 31.37
0.355 28.422
0.327 40.053
0.458 16.132
0.285 38.754
0.349 32.689
0.262 39.031
0.275 35.028
0.32 35.393
0.22 33.053
0.333 34.401
0.275 43.934
0.299 27.938
0.404 24.352
0.366 33.542
0.39 24.33
0.46 21.071
0.309 33.241
0.348 31.089
0.243 36.379
0.355 30.741
0.374 30.261
#!/bin/bash
# Example derived from Stack Overflow
# https://stackoverflow.com/questions/28522326/extracting-substring-in-linux-using-expr-and-regex
# Reads a line from the file views.txt
#
while read line
do
out=$(expr "$line" : ".*<li>\(.*\) views")
echo $out
done < views.txt
exit
/********************************
bc program: extensions.bc
author: Steffen Brinkmann
e-mail: s.z.s@web.de
comments: - published under the GPL
- contains functions of trigonometry, exponential functions,
functions of number theory and some mathematical constants
so far.
*********************************/
/*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* This program is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU Library General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with this program; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
*/
/***********************************************
--I-- constants
1.- pi : defined as 4*atan(1)
2.- e : defined as e(1)
--II-- trigonometry:
1.- sin(x) : returns sine of x
2.- cos(x) : returns cosine of x
3.- atan(x) : returns arc tangent of x
4.- tan(x) : returns tangent of x
5.- asin(x) : returns arc sine of x
6.- acos(x) : returns arc cosine of x
7.- cot(x) : returns cotangent of x
8.- acot(x) : returns arc cotangent of x
 9.- sec(x)              : returns secant of x
10.- cosec(x),csc(x)     : returns cosecant of x
11.- asec(x)             : returns arc secant of x
12.- acosec(x),acsc(x)   : returns arc cosecant of x
13.- sinh(x)             : returns hyperbolic sine of x
14.- cosh(x)             : returns hyperbolic cosine of x
15.- tanh(x)             : returns hyperbolic tangent of x
16.- coth(x)             : returns hyperbolic cotangent of x
17.- asinh(x)            : returns arc hyperbolic sine of x
18.- acosh(x)            : returns arc hyperbolic cosine of x
19.- atanh(x)            : returns arc hyperbolic tangent of x
20.- acoth(x)            : returns arc hyperbolic cotangent of x
21.- sech(x)             : returns hyperbolic secant of x
22.- cosech(x),csch(x)   : returns hyperbolic cosecant of x
23.- asech(x)            : returns arc hyperbolic secant of x
24.- acosech(x),acsch(x) : returns arc hyperbolic cosecant of x
--II-- exponential functions:
1.- ln(x) : returns natural logarithm of x
2.- log(x) : returns logarithm (base 10) of x
3.- lb(x),ld(x) : returns logarithm (base 2) of x
4.- pow(x,y) : returns x to the power of y
--III-- number theory:
1.- abs(n) : returns absolute value of n
2.- mod(a,b) : returns a modulo b
3.- factorize(n),fac(n) : prints primefactors of n,
returns number of primefactors
returns 0 if n is a prime number
returns -1 if n is +-1 or 0
CAUTION: 13-digit number may need 30 s
4.- factorial(n),f(n) : returns n factorial
5.- gcd(a,b) : returns the greatest common divisor of a and b
6.- lcm(a,b) : returns the least common multiple of a and b
7.- bessel(n,x) : returns the Bessel function order n of x
************************************************/
pi=4*a(1)
e=e(1)
define sin(x)
{
return (s(x))
}
define cos(x)
{
return (c(x))
}
define atan(x)
{
return (a(x))
}
define tan(x)
{
return (s(x)/c(x))
}
define asin(x)
{
if(x==1) return(pi/2)
if(x==-1) return(-pi/2)
return(a(x/sqrt(1-(x^2))))
}
define acos(x)
{
if(x==1) return(0)
if(x==-1) return(pi)
return(pi/2-a(x/sqrt(1-(x^2))))
}
define cot(x)
{
return(c(x)/s(x))
}
define acot(x)
{
return(pi/2-a(x))
}
define sec(x)
{
return(1/c(x))
}
define cosec(x)
{
return(1/s(x))
}
define csc(x)
{
return(1/s(x))
}
define asec(x)
{
return(acos(1/x))
}
define acosec(x)
{
return(asin(1/x))
}
define acsc(x)
{
return(asin(1/x))
}
define sinh(x)
{
return((e(x)-e(-x))/2)
}
define cosh(x)
{
return((e(x)+e(-x))/2)
}
define tanh(x)
{
return((e(x)-e(-x))/(e(x)+e(-x)))
}
define coth(x)
{
return((e(x)+e(-x))/(e(x)-e(-x)))
}
define asinh(x)
{
return(l(x + sqrt(x^2 + 1)))
}
define acosh(x)
{
return(l(x + sqrt(x^2 - 1)))
}
define atanh(x)
{
return((l(1 + x) - l(1 - x))/2)
}
define acoth(x)
{
return(atanh(1/x))
}
define sech(x)
{
return(1/cosh(x))
}
define cosech(x)
{
return(1/sinh(x))
}
define csch(x)
{
return(1/sinh(x))
}
define asech(x)
{
return(acosh(1/x))
}
define acosech(x)
{
return(asinh(1/x))
}
define acsch(x)
{
return(asinh(1/x))
}
/************************/
define ln(x)
{
return(l(x))
}
define log(x)
{
return(l(x)/l(10))
}
define lb(x)
{
return(l(x)/l(2))
}
define ld(x)
{
return(lb(x))
}
define pow(x,y)
{
return(e(y*l(x)))
}
/************************/
define abs(n){
if(n>=0) return(n)
return(-n)
}
define mod(a,b){
auto c,tmp_scale
tmp_scale=scale(sqrt(2))
scale=0
c=a%b
scale=tmp_scale
if(a>=0) return(c)
if(c==0) return(0)
return(c+b)
}
define fac(n)
{
auto tmp,i,factors
if(abs(n)<=1)
{
print abs(n),"\nnumber of factors: "
return(0)
}
if(abs(n)==2)
{
print 2,"\nnumber of factors: "
return(1)
}
tmp=n
while(mod(tmp,2)==0)
{
print 2," "
tmp/=2
factors+=1
}
if(prime[0]==2) /*primenumbers.bc is loaded*/
{
i=0
while((prime[i]*prime[i])<=(n+1))
{
if(mod(tmp,prime[i])==0)
{
print prime[i]," "
tmp/=prime[i]
factors+=1
}else{
i+=1
if(i>65535)
{
break
}
}
}
}
if(i>65535)
{
i=prime[65535]
}else
{
i=3
}
while((i*i)<=(n+1))
{
if(mod(tmp,i)==0)
{
print i," "
tmp/=i
factors+=1
}else{
i+=2
}
}
if(tmp!=1)
{
factors+=1
print tmp," " /*BUG: prints zeros after factor*/
}
print "\n"
print "number of factors: "
return(factors)
}
define factorize(n)
{
return (fac(n))
}
define f(n)
{
if (n <= 1) return (1);
return (f(n-1) * n);
}
define factorial(n)
{
return(f(n))
}
define gcd(m,n){
auto a,b,c,tmp_scale
a=abs(m) /* a=r[0] */
if(n==0) return(a)
b=abs(n) /* b=r[1] */
/*tmp_scale=scale(sqrt(2))*/
/*c=a%b /* c=r[2]=r[0] mod(r[1]) */
c=mod(a,b)
while(c>0){
a=b
b=c
/*(c=a%b /* c=r[j]=r[j-2] mod(r[j-1]) */
c=mod(a,b)
}
/*scale=tmp_scale*/
return(b)
}
define lcm(a,b){
auto g
g=gcd(a,b)
if(g==0) return(0)
return(abs(a*b)/g)
}
define bessel(n,x){
return(j(n,x))
}
#!/bin/bash
# Script for Fibonacci sequence
# Constants. First two numbers of the Fibonacci Series
first=0
second=1
# User input for quantity of sequence
usage(){
echo "Enter an integer to witness the Fibonacci sequence!"
read input
if ! [[ $input =~ ^[0-9]+$ ]]; then
echo "Not a number! Exiting!"
exit 1
fi
}
# Loop to display the series
calculate() {
echo "The Fibonacci series is : "
for (( i=0; i<input; i++ ))
do
echo -n "$first "
fn=$((first + second))
first=$second
second=$fn
done
echo; echo
}
main() {
usage
calculate
}
# Main function
main
exit
define sum(x,y) {
return (x+y)
}
print " Enter x and y value.\n\n"
print "x=? "; x = read()
print "y=? "; y = read()
sum(x,y)
quit
53700.681
56837.722
52350.732
50077.787
41049.346
57408.514
49376.601
46397.964
54652.833
29534.906
32085.979
57745.588
84575.437
42908.652
41363.670
42240.271
116622.236
20659.867
57560.091
42621.539
67639.112
31471.211
34488.579
32542.885
40611.379
53746.799
69357.504
28713.842
46918.227
62949.282
25040.967
36299.865
40255.120
38876.863
30750.583
36011.318
0.325
0.275
0.264
0.31
0.249
0.261
0.269
0.292
0.289
0.319
0.289
0.257
0.295
0.334
0.339
0.355
0.327
0.458
0.285
0.349
0.262
0.275
0.32
0.22
0.333
0.275
0.299
0.404
0.366
0.39
0.46
0.309
0.348
0.243
0.355
0.374
/*************************************
bc program: scientific_constants.bc
author: Steffen Brinkmann
e-mail: subcom@users.sourceforge.net
comments: - published under the GPL
- contains particle masses, basic constants, such as
speed of light in the vacuum and the gravitational
constant.
- scale is set to 100
*************************************/
scale=100
/* -- I -- particle masses: */
mp = 1.6726231*10^(-27) /* proton rest mass in kg */
mn = 1.6749286*10^(-27) /* neutron rest mass in kg */
me = 9.1093897*10^(-31) /* electron rest mass in kg */
mpev = 938.28*10^6 /* proton rest mass in eV */
mnev = 939.57*10^6 /* neutron rest mass in eV */
meev = .511*10^6 /* electron rest mass in eV */
/* -- II -- general constants: */
c = 299792458 /* speed of light in the vacuum in m/s */
h = 6.6260755*10^(-34) /* Planck constant in Js */
hbar = 1.05457266*10^(-34) /* Planck constant divided by 2*pi in Js */
kb = 1.380658*10^(-23) /* boltzmann constant in J/K */
ec = 1.60217733*10^(-19) /* elementary charge in C */
na = 6.0221367*10^23 /* avogadro number in 1/mol */
epsilon0 = 8.854187817*10^(-12) /* dielectric constant in C^2/Jm */
mu0 = 12.566370614*10^(-7) /* permeability of vacuum in T^2*m^3/J */
alpha = .00729735307 /* fine structure constant */
mub = 9.2740154*10^(-24) /* Bohr magneton J/T */
mun = 5.0507866*10^(-27) /* nuclear magneton J/T */
ge = 2.002319304386 /* free electron g factor */
gammae = 1.7608592*10^11 /* free electron gyromagnetic ratio in T/s */
mue = -9.2847701*10^(-24) /* electron magnetic moment */
gammap = 2.67515255*10^8 /* proton gyromagnetic ratio in water in T/s */
mup = 1.41060761*10^(-26) /* proton magnetic moment in J/T */
amu = 1.66057*10^(-27) /* atomic mass unit in kg */
a0 = 5.29177*10^(-11) /* Bohr radius in m */
re = 2.81792*10^(-15) /* electron radius in m */
vmol = 22.41383 /* molar volume in l/mol */
gh = 5.585 /* proton g factor (lande factor) */
grav = 6.673*10^(-11) /* gravitational constant m^3/kg*s^2 */
g = 9.80665 /* acceleration due to gravity on surface of earth in m/s^2 */
lambdac = 2.42631*10^(-12) /* compton wavelength of the electron m */
Shawn Arts 65
Marques Arts 58
Fernando Arts 78
Paul Arts 63
Walter Arts 75
Derek Arts 60
Nathaniel Arts 88
Tyreque Arts 74
Trevon Arts 74
Nathan Arts 71
Zachary Arts 80
Donovan Arts 75
Levi Arts 76
Sage Arts 55
Roberto Arts 65
William Arts 46
Nico Arts 59
Bryan Arts 68
Isaiah Arts 80
David Business 92
Leonard Business 87
Tysza Business 92
Darren Business 94
Christian Business 88
Aaron Business 83
Kerris Business 82
Dakota Business 83
Teriuse Business 94
Caleb Business 87
Juan Business 79
Andre Health-Medicine 72
Diego Health-Medicine 82
Jonathan Health-Medicine 100
Kevin Health-Medicine 100
Patrick Health-Medicine 92
D'Angelo Health-Medicine 90
Daniel Health-Medicine 91
Dilan Health-Medicine 84
Angel Health-Medicine 100
Peter Health-Medicine 86
Dalton Health-Medicine 100
Israel Health-Medicine 81
Gabriel Health-Medicine 100
Chase Social-Sciences 27
Leroy Social-Sciences 74
Jesse Social-Sciences 32
Drake Social-Sciences 76
Ja'Won Social-Sciences 37
Joel Social-Sciences 72
Darius Social-Sciences 51
David Social-Sciences 69
Williams Social-Sciences 62
Manuel Social-Sciences 61
Lance Social-Sciences 65
Drake Social-Sciences 59
Joseph Social-Sciences 61
Randy Social-Sciences 68
Justin Social-Sciences 90
Yeng Life-Sciences 39
Allen Life-Sciences 50
Brandon Life-Sciences 72
Christian Life-Sciences 67
Aaron Life-Sciences 58
Gurnam Life-Sciences 66
Anthony Life-Sciences 32
Joshua Life-Sciences 14
Nathen Life-Sciences 46
Christopher Life-Sciences 59
John Life-Sciences 70
Austin Life-Sciences 91
Antonio Engineering 88
Faison Engineering 47
Devin Engineering 92
Ignatius Engineering 83
Sonny Engineering 50
Antonio Engineering 56
Zackery Engineering 54
Joe'Quann Engineering 75
Thanh Engineering 53
Michael Engineering 39
Leonardo Engineering 78
Omar Engineering 99
Avery Engineering 51
28.532
42.182
44.846
32.991
35.279
44.861
42.665
46.095
38.187
38.738
36.553
36.691
22.317
42.051
31.37
28.422
40.053
16.132
38.754
32.689
39.031
35.028
35.393
33.053
34.401
43.934
27.938
24.352
33.542
24.33
21.071
33.241
31.089
36.379
30.741
30.261
IbcowWcMcXqY2YGJLwm45kNWV O2ot gWPtKD8Z
8GDmhaWChddMP14ZRg2UyOMtjlUFo4nm9m0QYA90Me1ibfCHpJUFS3Mp yb xzzkAuv68senShLdSPhAGUXivhBH7E i1D DXlLEKf9NU2dTwsmEMdtIhhlqv6PN9a2PzGXk6Df249 f4
F25AtcPC7LDWSxUBRFLzYg
Ro N9hCb3OQMlWSVtyv
Y <li>65,435 views</li>
g1cfKI6a8 QT2nHJ1Fcx6NcKD Z z2xtA4KmA1
nLeDjwV7ftg 6XT2FriGQQ21qJBXFswt62BxPevqKal1 B1TVz80ZjOCTKMLdjRHr83
H dJ
YeonF4R8EAXM XrpzS2C94HJ <li>6,4135 views</li> iBjknyMQxEhhf7MZP8xFAV0BgNyZ8jmBIvwWzwhTT2qGKaGnIqnV5bCuOwPLSjC9lFKbimGPVnPlzhWFyO5xo
nePAFzRnsfp4fglVYLWAXlwoqNHaANep2Z9mbsQXG5cfimo49qu7G0lFFfOz KR2Vr3vUUUcEh7o0s
7 cG7pehJoj1 0GaIefLyMOvSYMFT 4KnpsCAJ2kSWEbNCRH8sk7d7LJe3s
<li>99,435 views</li> pwUvbUHidvDFq6w
kYsOr Am47z8f15idlPuJZqC
semsFl6m <li> views</li>
HKlxbT5oiXaeL7 OOMZDOScFHKEh7lJJTap0JLbfocrIrjSJDA8pL2x12A vRjaUeosb87 8k8sEzWc3 tTuLQBH
jBZJsuB94P8mXcH2bsubbPXd9 MLT hZT7Ax4G2nhJ c0QJMR85uNCc3le9
8fsNv8 <li>65,435,22 views</li> bDhd4H6
9Lq <li>1165,435 views</li>
48U1kwbCItMu4 3owOWJiS4qhQK9aturB 9Yv84vIPHehkcoo6T jq s uL0w8cFKfAUPzvp7QsN0mFszxXxpU4JY0 Vi
jo9cNvJ
qBWbe9pbssp87CVK65awWv6BhHWE8KFgRoi81o3 nkmue6
Nyo2 <li>65,435 vie</li> On
<li>views</li>
9uJuCONsUzxHAeL4numvbi810a
yUa
#!/bin/bash
# To give your job a name, replace "MyJob" with an appropriate name
#SBATCH --job-name=Maxima-test.slurm
# Run on single CPU
#SBATCH --ntasks=1
# set your minimum acceptable walltime=days-hours:minutes:seconds
#SBATCH -t 0:15:00
# Load the environment variables
source /usr/local/module/spartan_old.sh
module load Maxima/5.41.0-GCC-6.2.0
module load gnuplot/5.0.0-GCC-6.2.0
# Run a few basic maxima examples
maxima -r "batchload(\"Maxima-test.mac\");"
#!/bin/bash
# To give your job a name, replace "MyJob" with an appropriate name
#SBATCH --job-name=Maxima-test.slurm
# Run on single CPU
#SBATCH --ntasks=1
# set your minimum acceptable walltime=days-hours:minutes:seconds
#SBATCH -t 0:15:00
# Load the environment variables
module purge
module load maxima/5.41.0
module load gnuplot/5.2.8
# Run a few basic Maxima examples
maxima -r "batchload(\"Maxima-test.mac\");"
/* Produces a multiplication table. Just like primary school! */
for y:1 thru 12 do
(s:"",
for x:1 thru 12 do
s:sconcat(s,printf(false,"~4d",x*y)),
print(s));
/* Lists can be treated as explicit vectors. */
/* Operations can be performed on lists in association with other lists. */
vectorsym: [a, b, c, d];
vectornum: [1, 2, 3, 4];
vectorsym + vectornum;
vectorsym - vectornum;
vectorsym * vectornum;
vectorsym / vectornum;
/* Produces a pretty sine wave */
/* Change directory output to something you use */
plot2d (sin(x), [x, 0, 10], [png_file, "~/Maxima/mysinplot.png"]);
/* Produce a fractal Sierpinski triangle. */
/* Change directory output to something you use */
load(fractals)$
n: 10000$
plot2d ([discrete,sierpinskiale(n)], [style,dots], [png_file, "~/Maxima/triangle.png"]);
/* Note that 0.1 does not equal 1/10th */
/* The latter real has a repeating binary representation to the precision of the system. */
/* This is what happens when you put pure mathematics into a physical system! */
rationalize (sin (0.1*x + 5.6));
#!/bin/bash
# To give your job a name, replace "MyJob" with an appropriate name
#SBATCH --job-name=multiqc-test.slurm
# Run on single CPU
#SBATCH --ntasks=1
# set your minimum acceptable walltime=days-hours:minutes:seconds
#SBATCH -t 0:15:00
# Specify your email address to be notified of progress.
# SBATCH --mail-user=youremailaddress@unimelb.edu
# SBATCH --mail-type=ALL
# Load the environment variables
module purge
module load foss/2019b
module load multiqc/1.9-python-3.7.4
# Run multiqc against reports from FastQC (qv)
multiqc SRR957824_500K_R1_fastqc.zip
Basic usage
multiqc [OPTIONS] <analysis directory>
The following from: https://multiqc.info/docs/
MultiQC is a reporting tool that parses summary statistics from results and log files generated by other bioinformatics tools.
MultiQC doesn't run other tools for you - it's designed to be placed at the end of analysis pipelines or to be run manually when
you've finished running your tools.
When you launch MultiQC, it recursively searches through any provided file paths and finds files that it recognises. It parses
relevant information from these and generates a single stand-alone HTML report file. It also saves a directory of data files with
all parsed data for further downstream use.
Example commands:
multiqc .
MultiQC will scan the specified directories and produce a report based on details found in any log files that it recognises. The
following should be self-evident.
multiqc data/sample_1*
multiqc . --ignore run_two/
multiqc --file-list my_file_list.txt
By default, MultiQC creates HTML files. If you want a PDF, load the pandoc module and run with the `--pdf` option.
module load pandoc/2.5
multiqc --pdf --file-list my_file_list.txt
Like similar software, Octave has a lot of community-generated extensions (analogous to "libraries" in R, packages in Python, or
modules in Perl).
These can be viewed at the following URL
https://octave.sourceforge.io/packages.php
To install a package in your home directory, please follow these steps.
First, start an interactive job, load the appropriate modules, and download the desired packages to your preferred directory. Note the
use of the -O option in wget so you don't end up with a horrible filename like `download.php?package=statistics-1.4.2.tar.gz`.
```
$ sinteractive --partition=snowy
$ module load octave/4.2.1
$ module load web_proxy
$ wget -O statistics-1.4.2.tar.gz https://octave.sourceforge.io/download.php?package=statistics-1.4.2.tar.gz
$ wget -O io-2.6.3.tar.gz https://octave.sourceforge.io/download.php?package=io-2.6.3.tar.gz
```
Then load Octave, and from the Octave CLI install the packages. This particular example is chosen as it includes a popular package
(statistics) and a dependency (io). The `pkg describe` command will show a description of the package and its dependencies.
```
$ octave
octave:1> pkg install io-2.6.3.tar.gz
octave:2> pkg install statistics-1.4.2.tar.gz
octave:3> pkg list
Package Name | Version | Installation directory
--------------+---------+-----------------------
io | 2.6.3 | /home/lev/octave/io-2.6.3
statistics | 1.4.2 | /home/lev/octave/statistics-1.4.2
octave:4> pkg describe -verbose io
```
See more options for installing packages at the following URL:
https://octave.org/doc/v4.4.0/Installing-and-Removing-Packages.html
From: Lev Lafayette, Sequential and Parallel Programming with C and Fortran, VPAC, 2015. https://github.com/VPAC/seqpar
A brief example derived from Erik Smistad is used here; the source code is available in the OpenCL sub-directory of the Resources
directory. The example is a simple vector addition: two equal-sized lists of numbers (say, A and B) are added together to generate a
third, equal-sized, list C. In serial code this can be achieved by a loop:
```
for(int i = 0; i < LIST_SIZE; i++) {
C[i] = A[i] + B[i];
}
```
Whilst the loop itself is simple, each iteration is executed in turn, even though the iterations are independent of one another. In
parallel, the vector additions can instead occur simultaneously. In OpenCL a kernel is required that will run on the compute device.
This is very much like a CUDA kernel, but is somewhat more complex in OpenCL. The following is just the kernel with a vector addition
function.
```
__kernel void vector_add(__global const int *A, __global const int *B, __global int *C) {
// Get the index of the current element to be processed
int i = get_global_id(0);
// Do the operation
C[i] = A[i] + B[i];
}
```
The OpenCL API is defined in the `cl.h` header file. In the host program, the platform and the devices to use in execution are
obtained as follows:
```
// Get platform and device information
cl_platform_id platform_id = NULL;
cl_device_id device_id = NULL;
cl_uint ret_num_devices;
cl_uint ret_num_platforms;
cl_int ret = clGetPlatformIDs(1, &platform_id, &ret_num_platforms);
ret = clGetDeviceIDs( platform_id, CL_DEVICE_TYPE_DEFAULT, 1,
&device_id, &ret_num_devices);
```
The host program then creates an OpenCL context and a command queue.
```
// Create an OpenCL context
cl_context context = clCreateContext( NULL, 1, &device_id, NULL, NULL, &ret);
// Create a command queue
cl_command_queue command_queue = clCreateCommandQueue(context, device_id, 0, &ret);
```
As in CUDA, memory needs to be transferred to the device. In OpenCL this is done by creating memory buffer objects.
```
// Create memory buffers on the device for each vector
cl_mem a_mem_obj = clCreateBuffer(context, CL_MEM_READ_ONLY,
LIST_SIZE * sizeof(int), NULL, &ret);
cl_mem b_mem_obj = clCreateBuffer(context, CL_MEM_READ_ONLY,
LIST_SIZE * sizeof(int), NULL, &ret);
cl_mem c_mem_obj = clCreateBuffer(context, CL_MEM_WRITE_ONLY,
LIST_SIZE * sizeof(int), NULL, &ret);
// Copy the lists A and B to their respective memory buffers
ret = clEnqueueWriteBuffer(command_queue, a_mem_obj, CL_TRUE, 0,
LIST_SIZE * sizeof(int), A, 0, NULL, NULL);
ret = clEnqueueWriteBuffer(command_queue, b_mem_obj, CL_TRUE, 0,
LIST_SIZE * sizeof(int), B, 0, NULL, NULL);
```
OpenCL then needs a program object to be created and compiled.
```
// Create a program from the kernel source
cl_program program = clCreateProgramWithSource(context, 1,
(const char **)&source_str, (const size_t *)&source_size, &ret);
// Build the program
ret = clBuildProgram(program, 1, &device_id, NULL, NULL, NULL);
```
Kernel arguments can then be created, set, and executed.
```
// Create the OpenCL kernel
cl_kernel kernel = clCreateKernel(program, "vector_add", &ret);
// Set the arguments of the kernel
ret = clSetKernelArg(kernel, 0, sizeof(cl_mem), (void *)&a_mem_obj);
ret = clSetKernelArg(kernel, 1, sizeof(cl_mem), (void *)&b_mem_obj);
ret = clSetKernelArg(kernel, 2, sizeof(cl_mem), (void *)&c_mem_obj);
// Execute the OpenCL kernel on the list
size_t global_item_size = LIST_SIZE; // Process the entire lists
size_t local_item_size = 64; // Divide work items into groups of 64
ret = clEnqueueNDRangeKernel(command_queue, kernel, 1, NULL,
&global_item_size, &local_item_size, 0, NULL, NULL);
```
The memory buffer is read from the device to the local variable, and the results displayed.
```
// Read the memory buffer C on the device to the local variable C
int *C = (int*)malloc(sizeof(int)*LIST_SIZE);
ret = clEnqueueReadBuffer(command_queue, c_mem_obj, CL_TRUE, 0,
LIST_SIZE * sizeof(int), C, 0, NULL, NULL);
// Display the result to the screen
for(i = 0; i < LIST_SIZE; i++)
printf("%d + %d = %d\n", A[i], B[i], C[i]);
```
Finally, cleaning up is required in OpenCL:
```
// Clean up
ret = clFlush(command_queue);
ret = clFinish(command_queue);
ret = clReleaseKernel(kernel);
ret = clReleaseProgram(program);
ret = clReleaseMemObject(a_mem_obj);
ret = clReleaseMemObject(b_mem_obj);
ret = clReleaseMemObject(c_mem_obj);
ret = clReleaseCommandQueue(command_queue);
ret = clReleaseContext(context);
free(A);
free(B);
free(C);
```
To make OpenCL run the kernel on the GPU the constant CL_DEVICE_TYPE_DEFAULT should be changed to CL_DEVICE_TYPE_GPU in line 43. To force it
to run on CPU set it to CL_DEVICE_TYPE_CPU.
To compile the program use the following. An implementation of OpenCL will be required for the appropriate device; NVIDIA and AMD, for
example, provide OpenCL implementations for their GPUs.
`gcc -l OpenCL vector.c -o vector`
#include <stdio.h>
#include <stdlib.h>
#ifdef __APPLE__
#include <OpenCL/opencl.h>
#else
#include <CL/cl.h>
#endif
#define MAX_SOURCE_SIZE (0x100000)
int main(void) {
// Create the two input vectors
int i;
const int LIST_SIZE = 1024;
int *A = (int*)malloc(sizeof(int)*LIST_SIZE);
int *B = (int*)malloc(sizeof(int)*LIST_SIZE);
for(i = 0; i < LIST_SIZE; i++) {
A[i] = i;
B[i] = LIST_SIZE - i;
}
// Load the kernel source code into the array source_str
FILE *fp;
char *source_str;
size_t source_size;
fp = fopen("vector_add_kernel.cl", "r");
if (!fp) {
fprintf(stderr, "Failed to load kernel.\n");
exit(1);
}
source_str = (char*)malloc(MAX_SOURCE_SIZE);
source_size = fread( source_str, 1, MAX_SOURCE_SIZE, fp);
fclose( fp );
// Get platform and device information
cl_platform_id platform_id = NULL;
cl_device_id device_id = NULL;
cl_uint ret_num_devices;
cl_uint ret_num_platforms;
cl_int ret = clGetPlatformIDs(1, &platform_id, &ret_num_platforms);
ret = clGetDeviceIDs( platform_id, CL_DEVICE_TYPE_DEFAULT, 1,
&device_id, &ret_num_devices);
// Create an OpenCL context
cl_context context = clCreateContext( NULL, 1, &device_id, NULL, NULL, &ret);
// Create a command queue
cl_command_queue command_queue = clCreateCommandQueue(context, device_id, 0, &ret);
// Create memory buffers on the device for each vector
cl_mem a_mem_obj = clCreateBuffer(context, CL_MEM_READ_ONLY,
LIST_SIZE * sizeof(int), NULL, &ret);
cl_mem b_mem_obj = clCreateBuffer(context, CL_MEM_READ_ONLY,
LIST_SIZE * sizeof(int), NULL, &ret);
cl_mem c_mem_obj = clCreateBuffer(context, CL_MEM_WRITE_ONLY,
LIST_SIZE * sizeof(int), NULL, &ret);
// Copy the lists A and B to their respective memory buffers
ret = clEnqueueWriteBuffer(command_queue, a_mem_obj, CL_TRUE, 0,
LIST_SIZE * sizeof(int), A, 0, NULL, NULL);
ret = clEnqueueWriteBuffer(command_queue, b_mem_obj, CL_TRUE, 0,
LIST_SIZE * sizeof(int), B, 0, NULL, NULL);
// Create a program from the kernel source
cl_program program = clCreateProgramWithSource(context, 1,
(const char **)&source_str, (const size_t *)&source_size, &ret);
// Build the program
ret = clBuildProgram(program, 1, &device_id, NULL, NULL, NULL);
// Create the OpenCL kernel
cl_kernel kernel = clCreateKernel(program, "vector_add", &ret);
// Set the arguments of the kernel
ret = clSetKernelArg(kernel, 0, sizeof(cl_mem), (void *)&a_mem_obj);
ret = clSetKernelArg(kernel, 1, sizeof(cl_mem), (void *)&b_mem_obj);
ret = clSetKernelArg(kernel, 2, sizeof(cl_mem), (void *)&c_mem_obj);
// Execute the OpenCL kernel on the list
size_t global_item_size = LIST_SIZE; // Process the entire lists
size_t local_item_size = 64; // Divide work items into groups of 64
ret = clEnqueueNDRangeKernel(command_queue, kernel, 1, NULL,
&global_item_size, &local_item_size, 0, NULL, NULL);
// Read the memory buffer C on the device to the local variable C
int *C = (int*)malloc(sizeof(int)*LIST_SIZE);
ret = clEnqueueReadBuffer(command_queue, c_mem_obj, CL_TRUE, 0,
LIST_SIZE * sizeof(int), C, 0, NULL, NULL);
// Display the result to the screen
for(i = 0; i < LIST_SIZE; i++)
printf("%d + %d = %d\n", A[i], B[i], C[i]);
// Clean up
ret = clFlush(command_queue);
ret = clFinish(command_queue);
ret = clReleaseKernel(kernel);
ret = clReleaseProgram(program);
ret = clReleaseMemObject(a_mem_obj);
ret = clReleaseMemObject(b_mem_obj);
ret = clReleaseMemObject(c_mem_obj);
ret = clReleaseCommandQueue(command_queue);
ret = clReleaseContext(context);
free(A);
free(B);
free(C);
return 0;
}
__kernel void vector_add(__global const int *A, __global const int *B, __global int *C) {
// Get the index of the current element to be processed
int i = get_global_id(0);
// Do the operation
C[i] = A[i] + B[i];
}
#!/bin/bash
# To give your job a name, replace "MyJob" with an appropriate name
#SBATCH --job-name=Pilon-test.slurm
# Run on single CPU
#SBATCH --ntasks=1
# set your minimum acceptable walltime=days-hours:minutes:seconds
#SBATCH -t 0:15:00
# Specify your email address to be notified of progress.
# SBATCH --mail-user=youremailaddress@unimelb.edu
# SBATCH --mail-type=ALL
# Load the environment variables
module purge
module load foss/2019b
module load bowtie2/2.3.5.1
module load samtools/1.9
module load pilon/1.23-java-11.0.2
# The m_genitalium.fasta file is from final.contigs.fa generated in the MEGAHIT example.
# Before running Pilon itself, we have to align our reads against the assembly
bowtie2-build m_genitalium.fasta m_genitalium
bowtie2 -x m_genitalium -1 ERR486840_1.fastq.gz -2 ERR486840_2.fastq.gz | samtools view -bS -o m_genitalium.bam
samtools sort m_genitalium.bam -o m_genitalium.sorted.bam
samtools index m_genitalium.sorted.bam
# Then run Pilon to correct any mismatches in the assembly and write the new improved assembly to m_genitalium_improved.fasta
pilon --genome m_genitalium.fasta --frags m_genitalium.sorted.bam --output m_genitalium_improved
Content derived from the Swedish University of Agricultural Sciences
Pilon is a software tool which can be used to automatically improve draft assemblies. It attempts to make improvements to the input
genome, including:
Single base differences
Small Indels
Larger Indels or block substitution events
Gap filling
Identification of local misassemblies, including optional opening of new gaps
Before running Pilon itself, we have to align our reads against the assembly
bowtie2-build m_genitalium.fasta m_genitalium
bowtie2 -x m_genitalium -1 ERR486840_1.fastq.gz -2 ERR486840_2.fastq.gz | \
samtools view -bS -o m_genitalium.bam
samtools sort m_genitalium.bam -o m_genitalium.sorted.bam
samtools index m_genitalium.sorted.bam
then we run Pilon
pilon --genome m_genitalium.fasta --frags m_genitalium.sorted.bam --output m_genitalium_improved
which will correct any mismatches in our assembly and write the new improved assembly to m_genitalium_improved.fasta
#!/bin/bash
# To give your job a name, replace "MyJob" with an appropriate name
#SBATCH --job-name=Prokka-test.slurm
# Run on single CPU
#SBATCH --ntasks=1
# set your minimum acceptable walltime=days-hours:minutes:seconds
#SBATCH -t 0:15:00
# Specify your email address to be notified of progress.
# SBATCH --mail-user=youremailaddress@unimelb.edu
# SBATCH --mail-type=ALL
# Load the environment variables
module purge
module load prokka/1.14.5
# Using an assembly of Mycoplasma genitalium and a protein set specific to Mycoplasma for the annotation.
# m_genitalium_improved.fasta and uniprot_mycoplasma_reviewed.faa , respectively.
# Prokka finds and annotates features (both protein coding regions and RNA genes, i.e. tRNA, rRNA) present on a sequence.
prokka --outdir annotation --kingdom Bacteria --proteins uniprot_mycoplasma_reviewed.faa m_genitalium_improved.fasta
Derived from content by the Swedish University of Agricultural Sciences
Once the Prokka job has finished, examine each of its output files.
The GFF and GBK files contain all of the information about the features annotated (in different formats.)
The .txt file contains a summary of the number of features annotated.
The .faa file contains the protein sequences of the genes annotated.
The .ffn file contains the nucleotide sequences of the genes annotated.
The annotation can be visualised with Artemis (module act).
#!/bin/bash
# To give your job a name, replace "MyJob" with an appropriate name
#SBATCH --job-name=QUAST-test.slurm
# Run on single CPU
#SBATCH --ntasks=1
# set your minimum acceptable walltime=days-hours:minutes:seconds
#SBATCH -t 0:15:00
# Specify your email address to be notified of progress.
# SBATCH --mail-user=youremailaddress@unimelb.edu
# SBATCH --mail-type=ALL
# Load the environment variables
module purge
module load foss/2019b
module load quast/5.0.2-python-3.7.4
# Evaluate the quality of the genome assembly from MEGAHIT
# The m_genitalium.fasta file is from final.contigs.fa generated in the MEGAHIT example.
quast.py m_genitalium.fasta -o m_genitalium_report
# Check the report with `less m_genitalium_report/report.txt`
Content derived from the Swedish University of Agricultural Sciences
QUAST is a software tool that evaluates the quality of genome assemblies by computing various metrics.
The file `m_genitalium.fasta` is a copy of `final.contigs.fa`, which is produced by the de novo assembly in the MEGAHIT example.
The quality report will be in m_genitalium_report in various formats.
......@@ -207,14 +207,17 @@ pow(3,5)
* Functions often carry out some processing and return the result, using the return() function. If there is no explicit return from
a function, the value of the last evaluated expression is returned. Multiple values can be returned within other objects, such as lists.
```
multi_return <- function() {electoratedata <- list("Party" = "LIB", "votes" = 19878, "electorate" = "Kew", "Year"=2010);
return(electoratedata)}
myelectorate <- multi_return()
myelectorate$Party
```
* Paying attention to environment and scope is necessary. An _environment_ can be considered as a collection of objects (functions,
variables etc.) which can be shown by the `ls()` function. The global environment is the default. Arguments inside a function are
_not_ part of the global environment; they are part of a new environment created inside the global environment, and another
environment can be created inside that.
```
KewLibVotes2010 <- 19878
# ... (intervening lines not shown in this diff) ...
print(ls())
}
SwingTPP(1.6)
```
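As a minimal sketch of how function scope works (the names here are illustrative, not from the course material): a variable assigned inside a function lives in the function's own environment and does not overwrite a global variable of the same name.

```r
x <- 10                  # defined in the global environment
f <- function() {
  x <- 99                # a new x, local to the function's environment
  x                      # the last evaluated expression is returned
}
stopifnot(f() == 99)     # the local x is returned by the function
stopifnot(x == 10)       # the global x is unchanged
```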
## Probability
* R has built-in functions for frequency distributions, means, and other descriptive statistics.

Load the MASS library and display the dataframe `painters`. The qualitative data is listed under schools (A, B, C etc.), and can be
displayed by assigning a variable `school` and applying the `table()` function, or in column format with `cbind()`. Relative frequency can be
determined with the `nrow()` function. The `digits` option can round the output for readability.
```
> library(MASS)                 # Load the MASS package
> painters                      # Display the dataframe
> help(painters)                # View what "school" means
> school = painters$School      # The painter schools
> school.freq = table(school)   # Apply the table function
> school.freq                   # Display the frequency
> cbind(school.freq)            # Apply and display as a column
> school.relfreq = school.freq / nrow(painters)
> school.relfreq
> old = options(digits=1)       # Make it more readable.
> school.relfreq
> options(old)
```
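As a quick sanity check (an illustrative addition, not part of the original material), relative frequencies computed this way must sum to one, since every painter belongs to exactly one school:

```r
library(MASS)                                       # provides the painters dataframe
school.relfreq <- table(painters$School) / nrow(painters)
stopifnot(abs(sum(school.relfreq) - 1) < 1e-12)     # relative frequencies sum to 1
```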
To find the mean composition score of school C in the painters data set, create a logical vector for the school. Create a child
data set by taking a row slice from the dataframe. Use the `mean()` function to determine the composition mean of group C, and the
`tapply()` function to find the mean composition across all schools.
```
> c_school = school == "C"
> c_painters = painters[c_school, ]
> mean(c_painters$Composition)
> tapply(painters$Composition, painters$School, mean)
```
For quantitative data, use the built-in dataframe `faithful`. To build a frequency distribution, select the duration of eruptions and
use the `range()` function to determine the range of durations. Break the range into non-overlapping sub-intervals by defining a
sequence of equal-distance break points. Classify the eruption durations according to the half-unit-length sub-intervals with `cut()`.
Compute the frequency of eruptions in each sub-interval with the `table()` function. The `nrow()` function can be used to determine
relative frequency. The cumulative frequency can be determined with the `cumsum()` function, and the cumulative relative frequency
distribution by dividing the cumulative frequencies by `nrow()`.
```
> head(faithful)
> duration = faithful$eruptions
> range(duration)
> breaks = seq(1.5, 5.5, by=0.5)
> duration.cut = cut(duration, breaks, right=FALSE) # Intervals are closed on the left, open on the right.
> duration.freq = table(duration.cut)
> duration.freq
> duration.relfreq = duration.freq / nrow(faithful)
> duration.cumfreq = cumsum(duration.freq)
> duration.cumrelfreq = duration.cumfreq / nrow(faithful)
```
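Because the break points cover the full range of durations, the last entry of the cumulative relative frequency distribution must be exactly one. A small self-contained check of that (an illustrative addition, not part of the original material):

```r
duration <- faithful$eruptions
breaks <- seq(1.5, 5.5, by = 0.5)                          # 9 break points -> 8 intervals
duration.freq <- table(cut(duration, breaks, right = FALSE))
duration.cumrelfreq <- cumsum(duration.freq) / nrow(faithful)
stopifnot(abs(tail(duration.cumrelfreq, 1) - 1) < 1e-12)   # every observation is accounted for
```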
Using the same dataset, the mean, median and quartile eruption durations can be established with the appropriate functions. A
percentile can be determined with percentage ratios within the `quantile()` function (32%, 57%, 98% in this example). The range can
be determined by subtracting the minimum from the maximum, using the appropriate functions, and likewise the interquartile range.
Variance, too, has a function, as does standard deviation, as does covariance. In this dataset, one can test the covariance of
eruption duration and waiting time to determine if there is a linear relationship, which can be quantified with the
correlation coefficient function. A test of the third central moment of eruption duration uses the `moment()` function, skewness the
`skewness()` function, and the excess kurtosis (to determine the shape of the tail) the `kurtosis()` function; these three are
provided by the `e1071` package.
```
> duration = faithful$eruptions
> mean(duration)
> median(duration)
> quantile(duration)
> quantile(duration, c(.32, .57, .98))
> max(duration) - min(duration)
> IQR(duration)
> var(duration)
> sd(duration)
> waiting = faithful$waiting
> cov(duration, waiting)
> cor(duration, waiting)
> library(e1071)  # Provides moment, skewness and kurtosis
> moment(duration, order=3, center=TRUE)
> skewness(duration)
> kurtosis(duration)
```
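Two identities tie these measures together and make a handy cross-check (an illustrative addition, not part of the original material): the interquartile range is the difference between the third and first quartiles, and the variance is the square of the standard deviation.

```r
duration <- faithful$eruptions
q <- quantile(duration)
stopifnot(abs(IQR(duration) - (q[["75%"]] - q[["25%"]])) < 1e-12)  # IQR is Q3 - Q1
stopifnot(abs(var(duration) - sd(duration)^2) < 1e-12)             # variance is sd squared
```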
R has four built-in functions for the binomial distribution: `dbinom(x, size, prob)`, `pbinom(x, size, prob)`, `qbinom(p, size, prob)`
and `rbinom(n, size, prob)`. In these descriptions, `x` is a vector of numbers, `p` a vector of probabilities, `n` is the number of
observations, `size` is the number of trials and `prob` is the probability of success of each trial.
The `dbinom()` function gives the probability mass at each point, and `pbinom()` gives the cumulative probability of an
event. The `qbinom()` function takes a probability value and gives the smallest number whose cumulative probability matches or
exceeds it, and `rbinom()` generates the required number of random values with the given probability from a given sample.
```
> x <- seq(0,50,by = 1)     # Create a sequence from 0 to 50 in steps of 1.
> y <- dbinom(x,50,0.5)     # Create the binomial distribution.
> x <- pbinom(26,51,0.5)    # Probability of getting 26 or fewer heads from 51 tosses of a coin.
> print(x)
> x <- qbinom(0.25,51,1/2)  # Number of heads at which the cumulative probability reaches 0.25 in 51 tosses.
> print(x)
> x <- rbinom(8,150,.4)     # Generate 8 random values from 150 trials with success probability 0.4.
> print(x)
```
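The relationship between `dbinom()` and `pbinom()` can be verified directly (an illustrative addition, not part of the original material): the cumulative probability up to a point equals the sum of the point masses up to that point.

```r
p_cum <- pbinom(26, 51, 0.5)          # cumulative probability of 26 or fewer successes
p_sum <- sum(dbinom(0:26, 51, 0.5))   # sum of the individual point masses up to 26
stopifnot(abs(p_cum - p_sum) < 1e-12) # the two must agree
```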
#!/bin/bash
# To give your job a name, replace "MyJob" with an appropriate name
#SBATCH --job-name=Scythe-test.slurm
# Run on single CPU
#SBATCH --ntasks=1
# set your minimum acceptable walltime=days-hours:minutes:seconds
#SBATCH -t 0:15:00
# Specify your email address to be notified of progress.
# SBATCH --mail-user=youremailaddress@unimelb.edu
# SBATCH --mail-type=ALL
# Load the environment variables
module purge
module load scythe/0.994
module load web_proxy
# Download the adapter
curl -O -J -L https://osf.io/v24pt/download
scythe -a adapters.fasta SRR957824_500K_R1.fastq.gz -o SRR957824_adapt_R1.fastq
scythe -a adapters.fasta SRR957824_500K_R2.fastq.gz -o SRR957824_adapt_R2.fastq
Content derived from the Swedish University of Agricultural Sciences
Scythe uses a Naive Bayesian approach to classify contaminant substrings in sequence reads. It considers quality information, which
can make it robust in picking out 3'-end adapters, which often include poor quality bases.
The sample job submission script takes a selection of paired reads from enterohaemorrhagic E. coli (EHEC) of the serotype O157, a
potentially fatal gastrointestinal pathogen. The sequenced bacterium was part of an outbreak investigation in the St. Louis area,
USA in 2011.
The script has two components: first, download the adapter; then run scythe on both of the paired read files.
Parsnp was designed to align the core genome of hundreds to thousands of bacterial genomes within a few minutes to a few hours. It is multi-threaded, too!
The following is a small MERS coronavirus job.