Commit 737b9070 authored by root's avatar root

Update for 3/2/2021

parent fe028630
...@@ -7,6 +7,7 @@ Delft3D/* ...@@ -7,6 +7,7 @@ Delft3D/*
!Delft3D/run_all_examples.sh !Delft3D/run_all_examples.sh
!Delft3D/run_all_examples_sp.sh !Delft3D/run_all_examples_sp.sh
!Delft3D/sed_in_file.tcl !Delft3D/sed_in_file.tcl
eazy-photoz/inputs
FreeSurfer/buckner_data FreeSurfer/buckner_data
FSL/intro FSL/intro
FSL/fmri FSL/fmri
...@@ -20,6 +21,7 @@ Trimmomatic/.backup ...@@ -20,6 +21,7 @@ Trimmomatic/.backup
*.fastq *.fastq
*.fastq.gz *.fastq.gz
*.fasta *.fasta
*.fasta.gz
*.faa *.faa
*.tar *.tar
*.tar.gz *.tar.gz
...@@ -40,3 +42,4 @@ Trimmomatic/.backup ...@@ -40,3 +42,4 @@ Trimmomatic/.backup
*.JPEG *.JPEG
*.odb *.odb
*.csv *.csv
*.data
#!/bin/bash
# To give your job a name, replace "MyJob" with an appropriate name
#SBATCH --job-name=Barrnap-test.slurm
# Run on single CPU
#SBATCH --ntasks=1
# set your minimum acceptable walltime=days-hours:minutes:seconds
#SBATCH -t 0:15:00
# Specify your email address to be notified of progress.
# SBATCH --mail-user=youreamiladdress@unimelb.edu
# SBATCH --mail-type=ALL
# Load the environment variables
module purge
module load foss/2019b
module load barrnap/0.9
# Search the draft genome for rRNA genes.
barrnap -o bin2_rrna.fa bin.2.fa
# Check the file bin2_rrna.fa for results
Example derived from the Swedish University of Agricultural Sciences.
https://www.hadriengourle.com/tutorials/
Barrnap (BAsic Rapid Ribosomal RNA Predictor) predicts the location of ribosomal RNA genes in genomes.
A draft genome is provided. A search is conducted on the genome for rRNA genes. Since these genes are usually quite conserved across
species/genera, it could give us a broad idea of the organism.
The bin, bin.2.fa, is also used for the diamond example.
[barrnap] This is barrnap 0.9
[barrnap] Written by Torsten Seemann
[barrnap] Obtained from https://github.com/tseemann/barrnap
[barrnap] Detected operating system: linux
[barrnap] Adding /usr/local/easybuild-2019/easybuild/software/mpi/gcc/8.3.0/openmpi/3.1.4/barrnap/0.9/bin/../binaries/linux to end of PATH
[barrnap] Checking for dependencies:
[barrnap] Found nhmmer - /usr/local/easybuild-2019/easybuild/software/mpi/gcc/8.3.0/openmpi/3.1.4/hmmer/3.2.1/bin/nhmmer
[barrnap] Found bedtools - /usr/local/easybuild-2019/easybuild/software/mpi/gcc/8.3.0/openmpi/3.1.4/bedtools/2.27.1/bin/bedtools
[barrnap] Will use 1 threads
[barrnap] Setting evalue cutoff to 1e-06
[barrnap] Will tag genes < 0.8 of expected length.
[barrnap] Will reject genes < 0.25 of expected length.
[barrnap] Using database: /usr/local/easybuild-2019/easybuild/software/mpi/gcc/8.3.0/openmpi/3.1.4/barrnap/0.9/bin/../db/bac.hmm
[barrnap] Scanning bin.2.fa for bac rRNA genes... please wait
[barrnap] Command: nhmmer --cpu 1 -E 1e-06 --w_length 3878 -o /dev/null --tblout /dev/stdout '/usr/local/easybuild-2019/easybuild/software/mpi/gcc/8.3.0/openmpi/3.1.4/barrnap/0.9/bin/../db/bac.hmm' 'bin.2.fa'
[barrnap] Rejecting short 68 nt predicted 16S_rRNA. Adjust via --reject option.
[barrnap] Rejecting short 205 nt predicted 23S_rRNA. Adjust via --reject option.
[barrnap] Found: 5S_rRNA k141_4428 L=110/119 240173..240282 - 5S ribosomal RNA
[barrnap] Found 1 ribosomal RNA features.
[barrnap] Sorting features and outputting GFF3...
[barrnap] Writing hit sequences to: bin2_rrna.fa
[barrnap] Running: bedtools getfasta -s -name+ -fo 'bin2_rrna.fa' -fi 'bin.2.fa' -bed '/tmp/AZo86tmMHj'
##gff-version 3
k141_4428 barrnap:0.9 rRNA 240173 240282 4.8e-15 - . Name=5S_rRNA;product=5S ribosomal RNA
index file bin.2.fa.fai not found, generating...
[barrnap] Done.
Derived from the Swedish Univeristy of Agricultural Sciences Derived from the Swedish Univeristy of Agricultural Sciences
https://www.hadriengourle.com/tutorials/
This file contains the sequence of the pO157 plasmid from the Sakai outbreak strain of E. coli O157. This file contains the sequence of the pO157 plasmid from the Sakai outbreak strain of E. coli O157.
Available from: curl -O -J -L https://osf.io/rnzbe/download Available from: curl -O -J -L https://osf.io/rnzbe/download
......
...@@ -2,6 +2,8 @@ NOTA BENE: TUTORIAL IS INCOMPLETE. RUNS BUT WITH ERRORS. ...@@ -2,6 +2,8 @@ NOTA BENE: TUTORIAL IS INCOMPLETE. RUNS BUT WITH ERRORS.
Sone content derived from the Swedish Univeristy of Agricultural Sciences Sone content derived from the Swedish Univeristy of Agricultural Sciences
https://www.hadriengourle.com/tutorials/
Busco (Benchmark Universal Single Copy Orthologs) can be used to to find marker genes in a assembly. Marker genes are conserved Busco (Benchmark Universal Single Copy Orthologs) can be used to to find marker genes in a assembly. Marker genes are conserved
across a range of species and finding intact conserved genes in the assembly would be a good indication of its quality. across a range of species and finding intact conserved genes in the assembly would be a good indication of its quality.
......
#!/bin/bash
# To give your job a name, replace "MyJob" with an appropriate name
#SBATCH --job-name=Diamond-test.slurm
# Run with four threads
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
# set your minimum acceptable walltime=days-hours:minutes:seconds
#SBATCH -t 0:15:00
# Specify your email address to be notified of progress.
# SBATCH --mail-user=youreamiladdress@unimelb.edu
# SBATCH --mail-type=ALL
# Load the environment variables
module purge
module load diamond/0.9.30
# Run diamond
diamond makedb --in uniprot_sprot.fasta.gz --db uniprot_sprot -p 4
diamond blastx -p 4 -q bin.2.fa -f 6 -d uniprot_sprot.dmnd -o bin2_diamond.txt
Diamond is a sequence aligner for protein and translated DNA searches and functions as a drop-in replacement for the NCBI BLAST
software tools. It is suitable for protein-protein search as well as DNA-protein search on short reads and longer sequences
including contigs and assemblies, providing a speedup of BLAST ranging up to x20,000.
Use diamond against the swissprot database for quickly assigning taxonomy to our contigs.
It is very possible the swissprot database is too small for finding meaningful hits for undersequenced / poorly known organisms.
#!/bin/bash
# To give your job a name, replace "MyJob" with an appropriate name
#SBATCH --job-name=Fastp-test.slurm
# Run this will multiple cores
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
# set your minimum acceptable walltime=days-hours:minutes:seconds
#SBATCH -t 0:15:00
# Specify your email address to be notified of progress.
# SBATCH --mail-user=youreamiladdress@unimelb.edu
# SBATCH --mail-type=ALL
# Load the environment variables
module purge
module load fastqc/0.11.9-java-11.0.2
module load fastp/0.20.0
module load bowtie2/2.3.5.1
# Check read quality
cd dolphin/results/
fastqc Dol1_*.fastq.gz
# Removing the adapters and trim by quality with Fastp
fastp -i Dol1_S19_L001_R1_001.fastq.gz -o Dol1_trimmed_R1.fastq \
-I Dol1_S19_L001_R2_001.fastq.gz -O Dol1_trimmed_R2.fastq \
--detect_adapter_for_pe --length_required 30 \
--cut_front --cut_tail --cut_mean_quality 10
# View the html report produced.
# Use precomputed the index files. Extract the bowtie indexes of the dolphin genome into the results directory
tar -xzvf host_genome.tar.gz
# Map sequencing reads on the dolphin genome. How many reads mapped on the dolphin genome?
bowtie2 -x host_genome/Tursiops_truncatus \
-1 Dol1_trimmed_R1.fastq -2 Dol1_trimmed_R2.fastq \
-S dol_map.sam --un-conc Dol_reads_unmapped.fastq --threads 4
Derived from content by the Swedish University of Agricultural Sciences.
https://www.hadriengourle.com/tutorials/
This example investigates metagenomics data and retrieve draft genome from an assembled metagenome.
It uses a dataset published in 2017 in a study in dolphins, where fecal samples where prepared for viral metagenomics study. The
dolphin had a self-limiting gastroenteritis of suspected viral origin.
Use FastQC to check the quality of the data, as well as fastp for trimming the bad quality part of the reads.
Bowtie2 is used for removing the host sequences by mapping/aligning on the dolphin genome.
Started analysis of Dol1_S19_L001_R1_001.fastq.gz
Approx 5% complete for Dol1_S19_L001_R1_001.fastq.gz
Approx 10% complete for Dol1_S19_L001_R1_001.fastq.gz
Approx 15% complete for Dol1_S19_L001_R1_001.fastq.gz
Approx 20% complete for Dol1_S19_L001_R1_001.fastq.gz
Approx 25% complete for Dol1_S19_L001_R1_001.fastq.gz
Approx 30% complete for Dol1_S19_L001_R1_001.fastq.gz
Approx 35% complete for Dol1_S19_L001_R1_001.fastq.gz
Approx 40% complete for Dol1_S19_L001_R1_001.fastq.gz
Approx 45% complete for Dol1_S19_L001_R1_001.fastq.gz
Approx 50% complete for Dol1_S19_L001_R1_001.fastq.gz
Approx 55% complete for Dol1_S19_L001_R1_001.fastq.gz
Approx 60% complete for Dol1_S19_L001_R1_001.fastq.gz
Approx 65% complete for Dol1_S19_L001_R1_001.fastq.gz
Approx 70% complete for Dol1_S19_L001_R1_001.fastq.gz
Approx 75% complete for Dol1_S19_L001_R1_001.fastq.gz
Approx 80% complete for Dol1_S19_L001_R1_001.fastq.gz
Approx 85% complete for Dol1_S19_L001_R1_001.fastq.gz
Approx 90% complete for Dol1_S19_L001_R1_001.fastq.gz
Approx 95% complete for Dol1_S19_L001_R1_001.fastq.gz
Analysis complete for Dol1_S19_L001_R1_001.fastq.gz
Started analysis of Dol1_S19_L001_R2_001.fastq.gz
Approx 5% complete for Dol1_S19_L001_R2_001.fastq.gz
Approx 10% complete for Dol1_S19_L001_R2_001.fastq.gz
Approx 15% complete for Dol1_S19_L001_R2_001.fastq.gz
Approx 20% complete for Dol1_S19_L001_R2_001.fastq.gz
Approx 25% complete for Dol1_S19_L001_R2_001.fastq.gz
Approx 30% complete for Dol1_S19_L001_R2_001.fastq.gz
Approx 35% complete for Dol1_S19_L001_R2_001.fastq.gz
Approx 40% complete for Dol1_S19_L001_R2_001.fastq.gz
Approx 45% complete for Dol1_S19_L001_R2_001.fastq.gz
Approx 50% complete for Dol1_S19_L001_R2_001.fastq.gz
Approx 55% complete for Dol1_S19_L001_R2_001.fastq.gz
Approx 60% complete for Dol1_S19_L001_R2_001.fastq.gz
Approx 65% complete for Dol1_S19_L001_R2_001.fastq.gz
Approx 70% complete for Dol1_S19_L001_R2_001.fastq.gz
Approx 75% complete for Dol1_S19_L001_R2_001.fastq.gz
Approx 80% complete for Dol1_S19_L001_R2_001.fastq.gz
Approx 85% complete for Dol1_S19_L001_R2_001.fastq.gz
Approx 90% complete for Dol1_S19_L001_R2_001.fastq.gz
Approx 95% complete for Dol1_S19_L001_R2_001.fastq.gz
Analysis complete for Dol1_S19_L001_R2_001.fastq.gz
Detecting adapter sequence for read1...
No adapter detected for read1
Detecting adapter sequence for read2...
No adapter detected for read2
Read1 before filtering:
total reads: 257617
total bases: 68087857
Q20 bases: 64868031(95.2711%)
Q30 bases: 59552985(87.4649%)
Read2 before filtering:
total reads: 257617
total bases: 68349702
Q20 bases: 62509996(91.4561%)
Q30 bases: 55246319(80.8289%)
Read1 after filtering:
total reads: 256037
total bases: 67566252
Q20 bases: 64538483(95.5188%)
Q30 bases: 59320846(87.7966%)
Read2 aftering filtering:
total reads: 256037
total bases: 67448539
Q20 bases: 62176728(92.184%)
Q30 bases: 55096219(81.6863%)
Filtering result:
reads passed filter: 512074
reads failed due to low quality: 3006
reads failed due to too many N: 0
reads failed due to too short: 154
reads with adapter trimmed: 27072
bases trimmed due to adapters: 472337
Duplication rate: 8.51956%
Insert size peak (evaluated by paired-end reads): 240
JSON report: fastp.json
HTML report: fastp.html
fastp -i Dol1_S19_L001_R1_001.fastq.gz -o Dol1_trimmed_R1.fastq -I Dol1_S19_L001_R2_001.fastq.gz -O Dol1_trimmed_R2.fastq --detect_adapter_for_pe --length_required 30 --cut_front --cut_tail --cut_mean_quality 10
fastp v0.20.0, time used: 17 seconds
host_genome/
host_genome/Tursiops_truncatus.rev.1.bt2
host_genome/Tursiops_truncatus.1.bt2
host_genome/Tursiops_truncatus.rev.2.bt2
host_genome/Tursiops_truncatus.3.bt2
host_genome/Tursiops_truncatus.2.bt2
host_genome/Tursiops_truncatus.4.bt2
256037 reads; of these:
256037 (100.00%) were paired; of these:
255804 (99.91%) aligned concordantly 0 times
151 (0.06%) aligned concordantly exactly 1 time
82 (0.03%) aligned concordantly >1 times
----
255804 pairs aligned concordantly 0 times; of these:
2 (0.00%) aligned discordantly 1 time
----
255802 pairs aligned 0 times concordantly or discordantly; of these:
511604 mates make up the pairs; of these:
511598 (100.00%) aligned 0 times
5 (0.00%) aligned exactly 1 time
1 (0.00%) aligned >1 times
0.09% overall alignment rate
Derived from a tutorial from the Swedish Univeristy of Agricultural Sciences. Derived from a tutorial from the Swedish Univeristy of Agricultural Sciences.
https://www.hadriengourle.com/tutorials/
Kraken is a system for assigning taxonomic labels to short DNA sequences (i.e. reads) Kraken aims to achieve high sensitivity and Kraken is a system for assigning taxonomic labels to short DNA sequences (i.e. reads) Kraken aims to achieve high sensitivity and
high speed by utilizing exact alignments of k-mers and a novel classification algorithm. high speed by utilizing exact alignments of k-mers and a novel classification algorithm.
......
...@@ -2,6 +2,8 @@ ...@@ -2,6 +2,8 @@
Derived from content from the Swedish University of Agricultural Sciences Derived from content from the Swedish University of Agricultural Sciences
https://www.hadriengourle.com/tutorials/
## Data collection ## Data collection
M. genitalium was sequenced using the MiSeq platform (2 * 150bp). The reads were deposited in the ENA Short Read Archive under the M. genitalium was sequenced using the MiSeq platform (2 * 150bp). The reads were deposited in the ENA Short Read Archive under the
......
#!/bin/bash
# To give your job a name, replace "MyJob" with an appropriate name
#SBATCH --job-name=Metagenome-test.slurm
# Run multicore
#SBATCH --ntasks-per-node=4
# Set your minimum acceptable walltime=days-hours:minutes:seconds
#SBATCH -t 0:45:00
# Specify your email address to be notified of progress.
# SBATCH --mail-user=youreamiladdress@unimelb.edu
# SBATCH --mail-type=ALL
# Load the environment variables
module purge
module load foss/2019b
module load fastqc/0.11.9-java-11.0.2
module load sickle/1.33
module load megahit/1.1.4-python-3.7.4
module load bowtie2/2.3.5.1
module load samtools/1.9
module load bcftools/1.9
module load metabat/2.12.1-python-2.7.16
module load checkm/1.1.2-python-3.7.4
# Use FastQC to check the quality of our data
cd results/
ln -s ../data/tara_reads_* .
fastqc tara_reads_*.fastq.gz
# Trim the reads using sickle
sickle pe -f tara_reads_R1.fastq.gz -r tara_reads_R2.fastq.gz -t sanger \
-o tara_trimmed_R1.fastq -p tara_trimmed_R2.fastq -s /dev/null
# Assemble with MEGAHIT
megahit -1 tara_trimmed_R1.fastq -2 tara_trimmed_R2.fastq -o tara_assembly
# Map the reads back against the assembly to get coverage information
ln -s tara_assembly/final.contigs.fa .
bowtie2-build final.contigs.fa final.contigs
bowtie2 -x final.contigs -1 tara_reads_R1.fastq.gz -2 tara_reads_R2.fastq.gz | \
samtools view -bS -o tara_to_sort.bam
samtools sort tara_to_sort.bam -o tara.bam
samtools index tara.bam
# Get the bins from metabat
runMetaBat.sh -m 1500 final.contigs.fa tara.bam
mv final.contigs.fa.metabat-bins1500 metabat
# Check the quality of the bins
checkm lineage_wf -x fa metabat checkm/
## Below has been deprecated ##
# Plot the completeness
# checkm bin_qa_plot -x fa checkm metabat plots
# Take a look at plots/bin_qa_plot.png
Material derived from the Swedish University of Agricultural Sciences.
https://www.hadriengourle.com/tutorials/
In this example several applications are used. The exmple involves how to inspect and assemble metagenomic data and retrieve draft
genomes from assembled metagenomes.
The dataset consistes of 20 bacteria selected from the Tara Ocean study that recovered 957 distinct Metagenome-assembled-genomes (or
MAGs) that were previsouly unknown.
http://ocean-microbiome.embl.de/companion.html
This diff is collapsed.
Content derived from the Swedish University of Agricultural Sciences Content derived from the Swedish University of Agricultural Sciences
https://www.hadriengourle.com/tutorials/
Pilon is a software tool which can be used to automatically improve draft assemblies. It attempts to make improvements to the input Pilon is a software tool which can be used to automatically improve draft assemblies. It attempts to make improvements to the input
genome, including: genome, including:
......
Derived from content by the Swedish University of Agricultural Sciences Derived from content by the Swedish University of Agricultural Sciences
https://www.hadriengourle.com/tutorials/
Once the Prokka job has finished, examine each of its output files. Once the Prokka job has finished, examine each of its output files.
The GFF and GBK files contain all of the information about the features annotated (in different formats.) The GFF and GBK files contain all of the information about the features annotated (in different formats.)
......
Content derived from the Swedish University of Agricultural Sciences Content derived from the Swedish University of Agricultural Sciences
https://www.hadriengourle.com/tutorials/
QUAST is a software evaluating the quality of genome assemblies by computing various metrics. QUAST is a software evaluating the quality of genome assemblies by computing various metrics.
The file `m_genitalium.fasta` is a copy of `m_genitalium.fasta` which is produced by the de novo assembly in the MEGAHIT example. The file `m_genitalium.fasta` is a copy of `m_genitalium.fasta` which is produced by the de novo assembly in the MEGAHIT example.
......
# This folder contains files for a sample, single-processor R job, which reads in the file tutorial.R, which itself calls the datasets w1.dat and trees91.csv. It is for leaf biomass of a pine tree in a high CO2 environment. With the `vanilla` option in the R command, it will create a PDF of the output. The job can be submitted with: # R README file. Last update LL20210127
# Basic Job Submission
This folder contains files for a sample, single-processor R job, which reads in the file tutorial.R, which itself calls the
datasets w1.dat and trees91.csv. It is for leaf biomass of a pine tree in a high CO2 environment. With the `vanilla` option in the
R command, it will create a PDF of the output. The job can be submitted with:
sbatch Rsample.slurm sbatch Rsample.slurm
# There is also a R-parallel script. Note this script does not work (yet). The library Rmpi has a few issues with OpenMI at the moment which we're working on. It does provide a good example of an error output for MPI :) Try modifying Rsample.slurm for these simple R scripts
mandy.r is a short R code calculating Mandelbrot set through the first 20 iterations of equation z = z2 + c plotted for different complex
constants c. This example demonstrates:
* use of community-developed external libraries (called packages), in this case caTools package
* handling of complex numbers
* multidimensional arrays of numbers used as basic data type, see variables C, Z and X.
# More Sample R Scripts
Some more sample R scripts from Prof. Andrew Turpin https://people.eng.unimelb.edu.au/aturpin/R/
log.r includes the following functions
* read.table() is used to load data from a text file.
* plot() is used to create axes and labels.
* lines() is used to plot single curves.
* t.test() is used to test if the means of the two data sets differ.
* legend() is used to manually create a legend.
* print() is used to output the graph.
useawk.r R commands are used to filter the data, and print an average every tenth point instead of printing every point.
Note some of the features on this graph.
* Two plots are added to the same set of axis.
* R is used to process the data prior to plotting.
twoxes.r is useful for plotting two curves that have the same domain but different ranges.
* plot() is used to draw plots of the graph with desired mark type and line style.
use type ='n' to suppress the drawing of axes, then
use axis to draw each axis with desired style.
* since the two y-axes are of different ranges, 2 separate plots are used to draw plots for the 1st and 2nd y-axis.
par(new=TRUE) is used so that the two plots end up in a single graph.
* each axis needs major and minor ticks, axis() is called twice to draw minor and major ticks.
* use 'pos' in y-axis (x-axis) to specify the location of y-axis (x-axis) relative to x-axis (y-axis).
* use legend() to draw legend manually.
* mtext() is used for drawing the label of the 2nd y-axis.
strings.r uses the text function to place text on a graph. The most basic form of the function is
* text(x, y, "text")
* This places the word "text" at the point (x, y).
* x, y and text can all be vectors. If vectors of different lengths are used, the shorter vectors are recycled.
ebars.r includes
* Shifting of points to left and right to separate out the points shown at each integer level.
* Use of a function (with variable number of arguments)
* Use of the arrows() function to draw error bars
bargraph.r is a simple bar graph that makes use of the transpose function t()
# Tips 'n' tricks
## "join"ing two files
f1 <- read.table("data1",col.names=c("word","freq"))
f2 <- read.table("data2",col.names=c("word","length"))
d <- merge(f1,f2,by="word")
d now has three columns word, freq, length
# Parallel R
There is an example script for running R with MPI, RMpi.
# Extensions
sbatch r-parallel.slurm There is also a LIBRARY file which describes how to access R extensions.
less xvalidate.out
# There is also a LIBRARY file which describes how to access R extensions.
...@@ -5,7 +5,7 @@ ...@@ -5,7 +5,7 @@
#SBATCH --time=1 #SBATCH --time=1
module purge module purge
/usr/local/module/spartan_old.sh source /usr/local/module/spartan_old.sh
module load R/3.5.0-spartan_gcc-6.2.0 module load R/3.5.0-spartan_gcc-6.2.0
module load Rmpi module load Rmpi
......
Host: spartan-rcg001 Rank(ID): 1 of Size: 5 on comm 1
[1] "Done"
Host: spartan-rcg001 Rank(ID): 2 of Size: 5 on comm 1
[1] "Done"
Host: spartan-rcg001 Rank(ID): 3 of Size: 5 on comm 1
[1] "Done"
Host: spartan-rcg002 Rank(ID): 4 of Size: 5 on comm 1
[1] "Done"
#postscript("log.ps")
#pdf("bargraph.pdf")
data = read.table("bargraph.data")
# Note: need to take transpose of data [ t(data) ] because of source
# table format
barplot(t(data),
beside=T,
legend.text=c("Method A", "Method B"),
ylab="Elapsed time (millisec)",
xlab="Data set")
#dev.off() # This is only needed if you use pdf/postscript in interactive mode
#
# Example of ploting R graphs with error bars
# Modelled precisely on Justin Zobel's example in jgraph
#
# Authour: andrew.turpin@rmit.edu.au
# Date: Fri Apr 20 16:38:28 EST 2007
#
# uncomment this for pdf output to file
#pdf("ebars.pdf", width=7, height=5, family="Times", paper="a4")
#postscript("ebars.ps", width=7, height=5, family="Times", paper="a4")
# A little function I wrote to draw a point at (x,y)
# and a vertical error bar from (x, y-len) to (x, y+len)
# assumes plotting axis already set up
# Note the use of ... to pass on unspecified arguments
plotPandB <- function(x, y, len, sfrac=0.01, ...) {
points(x, y, ...)
arrows(x, y, x, y+len, length=par("fin")[1] * sfrac, angle=90)
arrows(x, y, x, pmax(0, y-len), length=par("fin")[1] * sfrac, angle=90)
}
# Read in the 4 data sets into four list elements
# Only read the first 12 rows as in jz's original eg
d <- list(1:4)
for (i in 1:4)
d[[i]] <- read.table(
paste("ebars-d",i,".data",sep=""),
nrows = 12
)
#set up the axis
plot(0,0,
xlim=c(0, 12),
ylim=c(0, 0.045),
type="n",
xlab = "Level",
ylab = "Av. probability at level"
)
# draw points (x offsets as per jz's original jgraph)
# square x circle triangle
markType <- c(22, 120, 19, 24)
backGs <- c("black", "white", "gray", "white")
cols <- c("black", "gray" , "gray", "gray")
offsets <- c(-0.24, -0.08, +0.08, +0.24)
for(i in 1:4)
plotPandB(1:length(d[[i]][,1]) + offsets[i], d[[i]][,1], d[[i]][,2],
pch = markType[i],
bg = backGs[i],
col = cols[i],
sfrac = 0.005
)
# plonk on a legend
legend(8.5, 0.025,
c("Major", "Upper", "Lower", "Minor"),
pch=markType,
col = cols,
pt.bg = backGs
)
#dev.off() # This is only needed if you use pdf/postscript in interactive mode
#postscript("log.ps")
#pdf("log.pdf")
# Load second data table.
data_table_2 <- read.table("log-d2.data", col.names=c("DocLen2","NumDoc2"))
plot(data_table_2,
type="l",
xlab="Document length",
ylab="Number of documents",
log="y",
lwd=3)
# Load first data table.
data_table_1 <- read.table("log-d1.data", col.names=c("DocLen1","NumDoc1"))
lines(data_table_1, lwd="1")
legend(50,50,legend=c("Data set 1", "Data set 2"), lwd=c(1,3))
# a t.test just for fun!
t <- t.test(data_table_2$DocLen2, data_table_2$NumDoc2, paired=TRUE)
print(paste("t-test pvalue = ",t$p.value))
#dev.off() # This is only needed if you use pdf/postscript in interactive mode
library(caTools) # external package providing write.gif function
jet.colors <- colorRampPalette(c("red", "blue", "#007FFF", "cyan", "#7FFF7F",
"yellow", "#FF7F00", "red", "#7F0000"))
dx <- 1500 # define width
dy <- 1400 # define height
C <- complex(real = rep(seq(-2.2, 1.0, length.out = dx), each = dy),
imag = rep(seq(-1.2, 1.2, length.out = dy), dx))
C <- matrix(C, dy, dx) # reshape as square matrix of complex numbers
Z <- 0 # initialize Z to zero
X <- array(0, c(dy, dx, 20)) # initialize output 3D array
for (k in 1:20) { # loop with 20 iterations
Z <- Z^2 + C # the central difference equation
X[, , k] <- exp(-abs(Z)) # capture results
}
write.gif(X, "Mandelbrot.gif", col = jet.colors, delay = 100)
#postscript(family="Times", 'multi.ps', width=40)
#pdf(family="Times", 'multi.pdf', width=40)
# draw points (offsets is for color)
offsets <- c(0.0, 0.3, 0.3, 0.6, 0.6)
linethick <- c(0.5, 0.5, 0.5, 1.5, 1.5)
leg.txt <- c(
"actual p(novel)",
"p'(novel), mehtod X",
"p'(novel), method X+stdev",
"p'(novel), method X-fac",
"p'(novel), method X-fac+stdev ")
#read source file
data_table <- read.table("multi.data",
col.names=c("vx1","vy1","vy2","vy3","vy4","vy5","vy6","vy7"))
#cbind() forms matrices by binding together matrices horizontally,
#or column-wise, and rbind() vertically, or row-wise
line1node <- cbind(data_table$vx1, data_table$vy1)
#draw line1, xaxt="n" means not draw x axis now;
# lty=1 means solid line, lwd means line width
plot(line1node,
xaxt="n", yaxt="n",
xlab="Probability of novel symbol",
ylab="Number of symbol occurrences",
log = "x",