Commit 737b9070 authored by root

Update for 3/2/2021

parent fe028630
......@@ -7,6 +7,7 @@ Delft3D/*
!Delft3D/run_all_examples.sh
!Delft3D/run_all_examples_sp.sh
!Delft3D/sed_in_file.tcl
eazy-photoz/inputs
FreeSurfer/buckner_data
FSL/intro
FSL/fmri
......@@ -20,6 +21,7 @@ Trimmomatic/.backup
*.fastq
*.fastq.gz
*.fasta
*.fasta.gz
*.faa
*.tar
*.tar.gz
......@@ -40,3 +42,4 @@ Trimmomatic/.backup
*.JPEG
*.odb
*.csv
*.data
#!/bin/bash
# To give your job a name, replace "Barrnap-test.slurm" with an appropriate name
#SBATCH --job-name=Barrnap-test.slurm
# Run on single CPU
#SBATCH --ntasks=1
# set your minimum acceptable walltime=days-hours:minutes:seconds
#SBATCH -t 0:15:00
# Specify your email address to be notified of progress.
# SBATCH --mail-user=youremailaddress@unimelb.edu
# SBATCH --mail-type=ALL
# Load the environment variables
module purge
module load foss/2019b
module load barrnap/0.9
# Search the draft genome for rRNA genes.
barrnap -o bin2_rrna.fa bin.2.fa
# Check the file bin2_rrna.fa for results
Example derived from the Swedish University of Agricultural Sciences.
https://www.hadriengourle.com/tutorials/
Barrnap (BAsic Rapid Ribosomal RNA Predictor) predicts the location of ribosomal RNA genes in genomes.
A draft genome is provided, and it is searched for rRNA genes. Since these genes are usually well conserved across
species and genera, the hits can give a broad idea of which organism the genome belongs to.
The bin, bin.2.fa, is also used for the diamond example.
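
As a minimal sketch of how the search results might be inspected, the helper below tallies rRNA features per gene name from the GFF3 text that barrnap prints to standard output. The function name `summarise_rrna` is hypothetical and not part of barrnap; it assumes tab-separated GFF3 columns with a `Name=` attribute in column 9, as in the record shown in the example output.

```bash
#!/bin/bash
# Hypothetical helper (not part of barrnap): count rRNA features per gene name.
# Reads GFF3 text on stdin. Assumes tab-separated columns and a Name=...
# attribute in column 9.
summarise_rrna() {
    awk -F'\t' '
        /^#/ { next }                          # skip GFF3 header and comment lines
        $3 == "rRNA" {
            name = "unknown"
            n = split($9, attrs, ";")          # attributes are separated by ";"
            for (i = 1; i <= n; i++)
                if (attrs[i] ~ /^Name=/) name = substr(attrs[i], 6)
            count[name]++
        }
        END { for (k in count) print k "\t" count[k] }
    ' | sort
}

# Demo on the single record from the example output (tabs restored).
printf '##gff-version 3\nk141_4428\tbarrnap:0.9\trRNA\t240173\t240282\t4.8e-15\t-\t.\tName=5S_rRNA;product=5S ribosomal RNA\n' |
    summarise_rrna
```

With real data, the GFF3 could first be saved, for example `barrnap -o bin2_rrna.fa bin.2.fa > bin2_rrna.gff`, and then summarised with `summarise_rrna < bin2_rrna.gff`. The file name `bin2_rrna.gff` is an assumption made for illustration.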
[barrnap] This is barrnap 0.9
[barrnap] Written by Torsten Seemann
[barrnap] Obtained from https://github.com/tseemann/barrnap
[barrnap] Detected operating system: linux
[barrnap] Adding /usr/local/easybuild-2019/easybuild/software/mpi/gcc/8.3.0/openmpi/3.1.4/barrnap/0.9/bin/../binaries/linux to end of PATH
[barrnap] Checking for dependencies:
[barrnap] Found nhmmer - /usr/local/easybuild-2019/easybuild/software/mpi/gcc/8.3.0/openmpi/3.1.4/hmmer/3.2.1/bin/nhmmer
[barrnap] Found bedtools - /usr/local/easybuild-2019/easybuild/software/mpi/gcc/8.3.0/openmpi/3.1.4/bedtools/2.27.1/bin/bedtools
[barrnap] Will use 1 threads
[barrnap] Setting evalue cutoff to 1e-06
[barrnap] Will tag genes < 0.8 of expected length.
[barrnap] Will reject genes < 0.25 of expected length.
[barrnap] Using database: /usr/local/easybuild-2019/easybuild/software/mpi/gcc/8.3.0/openmpi/3.1.4/barrnap/0.9/bin/../db/bac.hmm
[barrnap] Scanning bin.2.fa for bac rRNA genes... please wait
[barrnap] Command: nhmmer --cpu 1 -E 1e-06 --w_length 3878 -o /dev/null --tblout /dev/stdout '/usr/local/easybuild-2019/easybuild/software/mpi/gcc/8.3.0/openmpi/3.1.4/barrnap/0.9/bin/../db/bac.hmm' 'bin.2.fa'
[barrnap] Rejecting short 68 nt predicted 16S_rRNA. Adjust via --reject option.
[barrnap] Rejecting short 205 nt predicted 23S_rRNA. Adjust via --reject option.
[barrnap] Found: 5S_rRNA k141_4428 L=110/119 240173..240282 - 5S ribosomal RNA
[barrnap] Found 1 ribosomal RNA features.
[barrnap] Sorting features and outputting GFF3...
[barrnap] Writing hit sequences to: bin2_rrna.fa
[barrnap] Running: bedtools getfasta -s -name+ -fo 'bin2_rrna.fa' -fi 'bin.2.fa' -bed '/tmp/AZo86tmMHj'
##gff-version 3
k141_4428 barrnap:0.9 rRNA 240173 240282 4.8e-15 - . Name=5S_rRNA;product=5S ribosomal RNA
index file bin.2.fa.fai not found, generating...
[barrnap] Done.
Derived from the Swedish University of Agricultural Sciences
https://www.hadriengourle.com/tutorials/
This file contains the sequence of the pO157 plasmid from the Sakai outbreak strain of E. coli O157.
Available from: curl -O -J -L https://osf.io/rnzbe/download
......
......@@ -2,6 +2,8 @@ NOTA BENE: TUTORIAL IS INCOMPLETE. RUNS BUT WITH ERRORS.
Some content derived from the Swedish University of Agricultural Sciences
https://www.hadriengourle.com/tutorials/
BUSCO (Benchmarking Universal Single-Copy Orthologs) can be used to find marker genes in an assembly. Marker genes are conserved
across a range of species, and finding intact conserved genes in the assembly is a good indication of its quality.
......
#!/bin/bash
# To give your job a name, replace "Diamond-test.slurm" with an appropriate name
#SBATCH --job-name=Diamond-test.slurm
# Run with four threads
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
# set your minimum acceptable walltime=days-hours:minutes:seconds
#SBATCH -t 0:15:00
# Specify your email address to be notified of progress.
# SBATCH --mail-user=youremailaddress@unimelb.edu
# SBATCH --mail-type=ALL
# Load the environment variables
module purge
module load diamond/0.9.30
# Run diamond
diamond makedb --in uniprot_sprot.fasta.gz --db uniprot_sprot -p 4
diamond blastx -p 4 -q bin.2.fa -f 6 -d uniprot_sprot.dmnd -o bin2_diamond.txt
Diamond is a sequence aligner for protein and translated DNA searches and functions as a drop-in replacement for the NCBI BLAST
software tools. It is suitable for protein-protein search as well as DNA-protein search on short reads and longer sequences
including contigs and assemblies, with a speedup over BLAST of up to 20,000x.
In this example, diamond is run against the Swiss-Prot database to quickly assign taxonomy to our contigs.
The Swiss-Prot database may well be too small to yield meaningful hits for undersequenced or poorly known organisms.
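
One way to read the tabular output is to keep only the highest-scoring hit per contig. The sketch below is a hypothetical helper, not part of diamond; it assumes the standard 12-column BLAST tabular layout written by `-f 6`, with the bit score in column 12.

```bash
#!/bin/bash
# Hypothetical helper (not part of diamond): keep the best hit per query.
# Assumes the 12-column tabular layout written by "-f 6":
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
best_hits() {
    # Sort by query, then by bit score (descending); keep the first row per query.
    sort -t $'\t' -k1,1 -k12,12nr "$@" |
        awk -F'\t' '!seen[$1]++ { print $1 "\t" $2 "\t" $11 "\t" $12 }'
}

# Demo on made-up rows (identifiers and scores are illustrative only).
printf 'contig_A\tprotein_X\t90.0\t100\t10\t0\t1\t300\t1\t100\t1e-50\t180.0\ncontig_A\tprotein_Y\t95.0\t120\t6\t0\t1\t360\t1\t120\t1e-70\t250.5\ncontig_B\tprotein_Z\t60.0\t80\t32\t0\t1\t240\t1\t80\t1e-10\t75.1\n' |
    best_hits
```

Applied to the file written by the job script, this would be `best_hits bin2_diamond.txt`. Each output row holds the query, the best subject, its e-value, and its bit score.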
#!/bin/bash
# To give your job a name, replace "Fastp-test.slurm" with an appropriate name
#SBATCH --job-name=Fastp-test.slurm
# Run this with multiple cores
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
# set your minimum acceptable walltime=days-hours:minutes:seconds
#SBATCH -t 0:15:00
# Specify your email address to be notified of progress.
# SBATCH --mail-user=youremailaddress@unimelb.edu
# SBATCH --mail-type=ALL
# Load the environment variables
module purge
module load fastqc/0.11.9-java-11.0.2
module load fastp/0.20.0
module load bowtie2/2.3.5.1
# Check read quality
cd dolphin/results/
fastqc Dol1_*.fastq.gz
# Remove the adapters and trim by quality with fastp
fastp -i Dol1_S19_L001_R1_001.fastq.gz -o Dol1_trimmed_R1.fastq \
-I Dol1_S19_L001_R2_001.fastq.gz -O Dol1_trimmed_R2.fastq \
--detect_adapter_for_pe --length_required 30 \
--cut_front --cut_tail --cut_mean_quality 10
# View the html report produced.
# Use the precomputed index files. Extract the bowtie2 indexes of the dolphin genome into the results directory
tar -xzvf host_genome.tar.gz
# Map sequencing reads on the dolphin genome. How many reads mapped on the dolphin genome?
bowtie2 -x host_genome/Tursiops_truncatus \
-1 Dol1_trimmed_R1.fastq -2 Dol1_trimmed_R2.fastq \
-S dol_map.sam --un-conc Dol_reads_unmapped.fastq --threads 4
Derived from content by the Swedish University of Agricultural Sciences.
https://www.hadriengourle.com/tutorials/
This example investigates metagenomics data and retrieves a draft genome from an assembled metagenome.
It uses a dataset published in 2017 from a study of dolphins, in which fecal samples were prepared for a viral metagenomics study. The
dolphin had a self-limiting gastroenteritis of suspected viral origin.
FastQC is used to check the quality of the data, and fastp to trim the low-quality parts of the reads.
Bowtie2 is used for removing the host sequences by mapping/aligning on the dolphin genome.
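
To answer how many reads mapped to the host genome, the alignment summary that Bowtie2 prints to standard error can be parsed. The following is a minimal sketch; `alignment_rate` is a hypothetical helper name, and it assumes the summary contains a line ending in "overall alignment rate", as in the example output.

```bash
#!/bin/bash
# Hypothetical helper (not part of Bowtie2): print the overall alignment rate
# as a bare number. Reads a Bowtie2 alignment summary on stdin or from files.
alignment_rate() {
    awk '/overall alignment rate/ { sub(/%/, "", $1); print $1 }' "$@"
}

# Demo on two lines copied from the example output.
printf '256037 reads; of these:\n0.09%% overall alignment rate\n' | alignment_rate
```

In a job script the summary could be captured by appending `2> bowtie2.log` to the bowtie2 command and then running `alignment_rate bowtie2.log`. The log file name is an assumption. A very low rate here would suggest that few of the reads come from the dolphin host.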
Started analysis of Dol1_S19_L001_R1_001.fastq.gz
Approx 5% complete for Dol1_S19_L001_R1_001.fastq.gz
Approx 10% complete for Dol1_S19_L001_R1_001.fastq.gz
Approx 15% complete for Dol1_S19_L001_R1_001.fastq.gz
Approx 20% complete for Dol1_S19_L001_R1_001.fastq.gz
Approx 25% complete for Dol1_S19_L001_R1_001.fastq.gz
Approx 30% complete for Dol1_S19_L001_R1_001.fastq.gz
Approx 35% complete for Dol1_S19_L001_R1_001.fastq.gz
Approx 40% complete for Dol1_S19_L001_R1_001.fastq.gz
Approx 45% complete for Dol1_S19_L001_R1_001.fastq.gz
Approx 50% complete for Dol1_S19_L001_R1_001.fastq.gz
Approx 55% complete for Dol1_S19_L001_R1_001.fastq.gz
Approx 60% complete for Dol1_S19_L001_R1_001.fastq.gz
Approx 65% complete for Dol1_S19_L001_R1_001.fastq.gz
Approx 70% complete for Dol1_S19_L001_R1_001.fastq.gz
Approx 75% complete for Dol1_S19_L001_R1_001.fastq.gz
Approx 80% complete for Dol1_S19_L001_R1_001.fastq.gz
Approx 85% complete for Dol1_S19_L001_R1_001.fastq.gz
Approx 90% complete for Dol1_S19_L001_R1_001.fastq.gz
Approx 95% complete for Dol1_S19_L001_R1_001.fastq.gz
Analysis complete for Dol1_S19_L001_R1_001.fastq.gz
Started analysis of Dol1_S19_L001_R2_001.fastq.gz
Approx 5% complete for Dol1_S19_L001_R2_001.fastq.gz
Approx 10% complete for Dol1_S19_L001_R2_001.fastq.gz
Approx 15% complete for Dol1_S19_L001_R2_001.fastq.gz
Approx 20% complete for Dol1_S19_L001_R2_001.fastq.gz
Approx 25% complete for Dol1_S19_L001_R2_001.fastq.gz
Approx 30% complete for Dol1_S19_L001_R2_001.fastq.gz
Approx 35% complete for Dol1_S19_L001_R2_001.fastq.gz
Approx 40% complete for Dol1_S19_L001_R2_001.fastq.gz
Approx 45% complete for Dol1_S19_L001_R2_001.fastq.gz
Approx 50% complete for Dol1_S19_L001_R2_001.fastq.gz
Approx 55% complete for Dol1_S19_L001_R2_001.fastq.gz
Approx 60% complete for Dol1_S19_L001_R2_001.fastq.gz
Approx 65% complete for Dol1_S19_L001_R2_001.fastq.gz
Approx 70% complete for Dol1_S19_L001_R2_001.fastq.gz
Approx 75% complete for Dol1_S19_L001_R2_001.fastq.gz
Approx 80% complete for Dol1_S19_L001_R2_001.fastq.gz
Approx 85% complete for Dol1_S19_L001_R2_001.fastq.gz
Approx 90% complete for Dol1_S19_L001_R2_001.fastq.gz
Approx 95% complete for Dol1_S19_L001_R2_001.fastq.gz
Analysis complete for Dol1_S19_L001_R2_001.fastq.gz
Detecting adapter sequence for read1...
No adapter detected for read1
Detecting adapter sequence for read2...
No adapter detected for read2
Read1 before filtering:
total reads: 257617
total bases: 68087857
Q20 bases: 64868031(95.2711%)
Q30 bases: 59552985(87.4649%)
Read2 before filtering:
total reads: 257617
total bases: 68349702
Q20 bases: 62509996(91.4561%)
Q30 bases: 55246319(80.8289%)
Read1 after filtering:
total reads: 256037
total bases: 67566252
Q20 bases: 64538483(95.5188%)
Q30 bases: 59320846(87.7966%)
Read2 aftering filtering:
total reads: 256037
total bases: 67448539
Q20 bases: 62176728(92.184%)
Q30 bases: 55096219(81.6863%)
Filtering result:
reads passed filter: 512074
reads failed due to low quality: 3006
reads failed due to too many N: 0
reads failed due to too short: 154
reads with adapter trimmed: 27072
bases trimmed due to adapters: 472337
Duplication rate: 8.51956%
Insert size peak (evaluated by paired-end reads): 240
JSON report: fastp.json
HTML report: fastp.html
fastp -i Dol1_S19_L001_R1_001.fastq.gz -o Dol1_trimmed_R1.fastq -I Dol1_S19_L001_R2_001.fastq.gz -O Dol1_trimmed_R2.fastq --detect_adapter_for_pe --length_required 30 --cut_front --cut_tail --cut_mean_quality 10
fastp v0.20.0, time used: 17 seconds
host_genome/
host_genome/Tursiops_truncatus.rev.1.bt2
host_genome/Tursiops_truncatus.1.bt2
host_genome/Tursiops_truncatus.rev.2.bt2
host_genome/Tursiops_truncatus.3.bt2
host_genome/Tursiops_truncatus.2.bt2
host_genome/Tursiops_truncatus.4.bt2
256037 reads; of these:
256037 (100.00%) were paired; of these:
255804 (99.91%) aligned concordantly 0 times
151 (0.06%) aligned concordantly exactly 1 time
82 (0.03%) aligned concordantly >1 times
----
255804 pairs aligned concordantly 0 times; of these:
2 (0.00%) aligned discordantly 1 time
----
255802 pairs aligned 0 times concordantly or discordantly; of these:
511604 mates make up the pairs; of these:
511598 (100.00%) aligned 0 times
5 (0.00%) aligned exactly 1 time
1 (0.00%) aligned >1 times
0.09% overall alignment rate
Derived from a tutorial from the Swedish University of Agricultural Sciences.
https://www.hadriengourle.com/tutorials/
Kraken is a system for assigning taxonomic labels to short DNA sequences (i.e. reads). Kraken aims to achieve high sensitivity and
high speed by utilizing exact alignments of k-mers and a novel classification algorithm.
......
......@@ -2,6 +2,8 @@
Derived from content from the Swedish University of Agricultural Sciences
https://www.hadriengourle.com/tutorials/
## Data collection
M. genitalium was sequenced using the MiSeq platform (2 * 150bp). The reads were deposited in the ENA Short Read Archive under the
......
#!/bin/bash
# To give your job a name, replace "Metagenome-test.slurm" with an appropriate name
#SBATCH --job-name=Metagenome-test.slurm
# Run multicore
#SBATCH --ntasks-per-node=4
# Set your minimum acceptable walltime=days-hours:minutes:seconds
#SBATCH -t 0:45:00
# Specify your email address to be notified of progress.
# SBATCH --mail-user=youremailaddress@unimelb.edu
# SBATCH --mail-type=ALL
# Load the environment variables
module purge
module load foss/2019b
module load fastqc/0.11.9-java-11.0.2
module load sickle/1.33
module load megahit/1.1.4-python-3.7.4
module load bowtie2/2.3.5.1
module load samtools/1.9
module load bcftools/1.9
module load metabat/2.12.1-python-2.7.16
module load checkm/1.1.2-python-3.7.4
# Use FastQC to check the quality of our data
cd results/
ln -s ../data/tara_reads_* .
fastqc tara_reads_*.fastq.gz
# Trim the reads using sickle
sickle pe -f tara_reads_R1.fastq.gz -r tara_reads_R2.fastq.gz -t sanger \
-o tara_trimmed_R1.fastq -p tara_trimmed_R2.fastq -s /dev/null
# Assemble with MEGAHIT
megahit -1 tara_trimmed_R1.fastq -2 tara_trimmed_R2.fastq -o tara_assembly
# Map the reads back against the assembly to get coverage information
ln -s tara_assembly/final.contigs.fa .
bowtie2-build final.contigs.fa final.contigs
bowtie2 -x final.contigs -1 tara_reads_R1.fastq.gz -2 tara_reads_R2.fastq.gz | \
samtools view -bS -o tara_to_sort.bam
samtools sort tara_to_sort.bam -o tara.bam
samtools index tara.bam
# Get the bins from metabat
runMetaBat.sh -m 1500 final.contigs.fa tara.bam
mv final.contigs.fa.metabat-bins1500 metabat
# Check the quality of the bins
checkm lineage_wf -x fa metabat checkm/
## Below has been deprecated ##
# Plot the completeness
# checkm bin_qa_plot -x fa checkm metabat plots
# Take a look at plots/bin_qa_plot.png
Material derived from the Swedish University of Agricultural Sciences.
https://www.hadriengourle.com/tutorials/
In this example several applications are used. The example shows how to inspect and assemble metagenomic data and retrieve draft
genomes from assembled metagenomes.
The dataset consists of 20 bacteria selected from the Tara Oceans study, which recovered 957 distinct metagenome-assembled genomes (or
MAGs) that were previously unknown.
http://ocean-microbiome.embl.de/companion.html
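
A common way to inspect an assembly is its N50: the contig length at which the contigs of that length or longer contain at least half of the assembled bases. The sketch below computes it from a FASTA file with standard tools. The function name `n50` is hypothetical, and the code is an illustration rather than the method MEGAHIT itself uses.

```bash
#!/bin/bash
# Hypothetical helper: compute N50 from FASTA given on stdin or as file arguments.
n50() {
    # Step 1: print one length per contig (sequences may span several lines).
    awk '
        /^>/ { if (len) print len; len = 0; next }
        { len += length($0) }
        END { if (len) print len }
    ' "$@" |
    # Step 2: sort lengths in descending order.
    sort -nr |
    # Step 3: walk down the list until half of the total length is covered.
    awk '
        { lens[NR] = $1; total += $1 }
        END {
            half = total / 2
            for (i = 1; i <= NR; i++) {
                run += lens[i]
                if (run >= half) { print lens[i]; exit }
            }
        }'
}

# Demo on a made-up FASTA with contig lengths 10, 8, 6, 4 and 2 (total 30).
# The two longest contigs cover 18 of 30 bases, so this prints 8.
printf '>a\nAAAAA\nAAAAA\n>b\nCCCCCCCC\n>c\nGGGGGG\n>d\nTTTT\n>e\nAA\n' | n50
```

Running `n50 final.contigs.fa` after the assembly step gives a value that can be compared with the N50 that MEGAHIT prints in its final `[STAT]` line.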
The following have been reloaded with a version change:
1) python/3.7.4 => python/2.7.16
The following have been reloaded with a version change:
1) python/2.7.16 => python/3.7.4
Started analysis of tara_reads_R1.fastq.gz
Approx 5% complete for tara_reads_R1.fastq.gz
Approx 10% complete for tara_reads_R1.fastq.gz
Approx 15% complete for tara_reads_R1.fastq.gz
Approx 20% complete for tara_reads_R1.fastq.gz
Approx 25% complete for tara_reads_R1.fastq.gz
Approx 30% complete for tara_reads_R1.fastq.gz
Approx 35% complete for tara_reads_R1.fastq.gz
Approx 40% complete for tara_reads_R1.fastq.gz
Approx 45% complete for tara_reads_R1.fastq.gz
Approx 50% complete for tara_reads_R1.fastq.gz
Approx 55% complete for tara_reads_R1.fastq.gz
Approx 60% complete for tara_reads_R1.fastq.gz
Approx 65% complete for tara_reads_R1.fastq.gz
Approx 70% complete for tara_reads_R1.fastq.gz
Approx 75% complete for tara_reads_R1.fastq.gz
Approx 80% complete for tara_reads_R1.fastq.gz
Approx 85% complete for tara_reads_R1.fastq.gz
Approx 90% complete for tara_reads_R1.fastq.gz
Approx 95% complete for tara_reads_R1.fastq.gz
Analysis complete for tara_reads_R1.fastq.gz
Started analysis of tara_reads_R2.fastq.gz
Approx 5% complete for tara_reads_R2.fastq.gz
Approx 10% complete for tara_reads_R2.fastq.gz
Approx 15% complete for tara_reads_R2.fastq.gz
Approx 20% complete for tara_reads_R2.fastq.gz
Approx 25% complete for tara_reads_R2.fastq.gz
Approx 30% complete for tara_reads_R2.fastq.gz
Approx 35% complete for tara_reads_R2.fastq.gz
Approx 40% complete for tara_reads_R2.fastq.gz
Approx 45% complete for tara_reads_R2.fastq.gz
Approx 50% complete for tara_reads_R2.fastq.gz
Approx 55% complete for tara_reads_R2.fastq.gz
Approx 60% complete for tara_reads_R2.fastq.gz
Approx 65% complete for tara_reads_R2.fastq.gz
Approx 70% complete for tara_reads_R2.fastq.gz
Approx 75% complete for tara_reads_R2.fastq.gz
Approx 80% complete for tara_reads_R2.fastq.gz
Approx 85% complete for tara_reads_R2.fastq.gz
Approx 90% complete for tara_reads_R2.fastq.gz
Approx 95% complete for tara_reads_R2.fastq.gz
Analysis complete for tara_reads_R2.fastq.gz
FastQ paired records kept: 2995072 (1497536 pairs)
FastQ single records kept: 2460 (from PE1: 2366, from PE2: 94)
FastQ paired records discarded: 0 (0 pairs)
FastQ single records discarded: 2460 (from PE1: 94, from PE2: 2366)
754.162Gb memory in total.
Using: 678.746Gb.
MEGAHIT v1.1.4
--- [Wed Jan 13 16:21:55 2021] Start assembly. Number of CPU threads 72 ---
--- [Wed Jan 13 16:21:55 2021] Available memory: 809775181824, used: 728797663641
--- [Wed Jan 13 16:21:55 2021] Converting reads to binaries ---
b' [read_lib_functions-inl.h : 209] Lib 0 (tara_trimmed_R1.fastq,tara_trimmed_R2.fastq): pe, 2995072 reads, 126 max length'
b' [utils.h : 126] Real: 2.4058\tuser: 1.6662\tsys: 0.4406\tmaxrss: 149868'
--- [Wed Jan 13 16:21:58 2021] k list: 21,29,39,59,79,99,119,141 ---
--- [Wed Jan 13 16:21:58 2021] Extracting solid (k+1)-mers for k = 21 ---
--- [Wed Jan 13 16:22:37 2021] Building graph for k = 21 ---
--- [Wed Jan 13 16:22:49 2021] Assembling contigs from SdBG for k = 21 ---
--- [Wed Jan 13 16:23:16 2021] Local assembling k = 21 ---
--- [Wed Jan 13 16:24:12 2021] Extracting iterative edges from k = 21 to 29 ---
--- [Wed Jan 13 16:24:31 2021] Building graph for k = 29 ---
--- [Wed Jan 13 16:24:39 2021] Assembling contigs from SdBG for k = 29 ---
--- [Wed Jan 13 16:25:07 2021] Local assembling k = 29 ---
--- [Wed Jan 13 16:26:02 2021] Extracting iterative edges from k = 29 to 39 ---
--- [Wed Jan 13 16:26:20 2021] Building graph for k = 39 ---
--- [Wed Jan 13 16:26:27 2021] Assembling contigs from SdBG for k = 39 ---
--- [Wed Jan 13 16:26:59 2021] Local assembling k = 39 ---
--- [Wed Jan 13 16:28:02 2021] Extracting iterative edges from k = 39 to 59 ---
--- [Wed Jan 13 16:28:19 2021] Building graph for k = 59 ---
--- [Wed Jan 13 16:28:27 2021] Assembling contigs from SdBG for k = 59 ---
--- [Wed Jan 13 16:28:54 2021] Local assembling k = 59 ---
--- [Wed Jan 13 16:29:53 2021] Extracting iterative edges from k = 59 to 79 ---
--- [Wed Jan 13 16:30:05 2021] Building graph for k = 79 ---
--- [Wed Jan 13 16:30:12 2021] Assembling contigs from SdBG for k = 79 ---
--- [Wed Jan 13 16:30:39 2021] Local assembling k = 79 ---
--- [Wed Jan 13 16:31:39 2021] Extracting iterative edges from k = 79 to 99 ---
--- [Wed Jan 13 16:31:48 2021] Building graph for k = 99 ---
--- [Wed Jan 13 16:31:54 2021] Assembling contigs from SdBG for k = 99 ---
--- [Wed Jan 13 16:32:20 2021] Local assembling k = 99 ---
--- [Wed Jan 13 16:33:19 2021] Extracting iterative edges from k = 99 to 119 ---
--- [Wed Jan 13 16:33:25 2021] Building graph for k = 119 ---
--- [Wed Jan 13 16:33:30 2021] Assembling contigs from SdBG for k = 119 ---
--- [Wed Jan 13 16:33:54 2021] Local assembling k = 119 ---
--- [Wed Jan 13 16:34:54 2021] Extracting iterative edges from k = 119 to 141 ---
--- [Wed Jan 13 16:34:55 2021] Building graph for k = 141 ---
--- [Wed Jan 13 16:35:00 2021] Assembling contigs from SdBG for k = 141 ---
--- [Wed Jan 13 16:35:28 2021] Merging to output final contigs ---
--- [STAT] 5904 contigs, total 22989006 bp, min 205 bp, max 2448420 bp, avg 3894 bp, N50 21060 bp
--- [Wed Jan 13 16:35:28 2021] ALL DONE. Time elapsed: 813.115067 seconds ---
Settings:
Output files: "final.contigs.*.bt2"
Line rate: 6 (line is 64 bytes)
Lines per side: 1 (side is 64 bytes)
Offset rate: 4 (one in 16)
FTable chars: 10
Strings: unpacked
Max bucket size: default
Max bucket size, sqrt multiplier: default
Max bucket size, len divisor: 4
Difference-cover sample period: 1024
Endianness: little
Actual local endianness: little
Sanity checking: disabled
Assertions: disabled
Random seed: 0
Sizeofs: void*:8, int:4, long:8, size_t:8
Input files DNA, FASTA:
final.contigs.fa
Building a SMALL index
Reading reference sizes
Time reading reference sizes: 00:00:00
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
Time to join reference sequences: 00:00:01
bmax according to bmaxDivN setting: 5747251
Using parameters --bmax 4310439 --dcv 1024
Doing ahead-of-time memory usage test
Passed! Constructing with these parameters: --bmax 4310439 --dcv 1024
Constructing suffix-array element generator
Building DifferenceCoverSample
Building sPrime
Building sPrimeOrder
V-Sorting samples
V-Sorting samples time: 00:00:00
Allocating rank array
Ranking v-sort output
Ranking v-sort output time: 00:00:00
Invoking Larsson-Sadakane on ranks
Invoking Larsson-Sadakane on ranks time: 00:00:00
Sanity-checking and returning
Building samples
Reserving space for 12 sample suffixes
Generating random suffixes
QSorting 12 sample offsets, eliminating duplicates
QSorting sample offsets, eliminating duplicates time: 00:00:00
Multikey QSorting 12 samples
(Using difference cover)
Multikey QSorting samples time: 00:00:00
Calculating bucket sizes
Splitting and merging
Splitting and merging time: 00:00:00
Avg bucket size: 2.2989e+07 (target: 4310438)
Converting suffix-array elements to index image
Allocating ftab, absorbFtab
Entering Ebwt loop
Getting block 1 of 1
No samples; assembling all-inclusive block
Sorting block of length 22989006 for bucket 1
(Using difference cover)
Sorting block time: 00:00:05
Returning block of 22989007 for bucket 1
Exited Ebwt loop
fchr[A]: 0
fchr[C]: 5723731
fchr[G]: 11487502
fchr[T]: 17267043
fchr[$]: 22989006
Exiting Ebwt::buildToDisk()
Returning from initFromVector
Wrote 12178208 bytes to primary EBWT file: final.contigs.1.bt2
Wrote 5747256 bytes to secondary EBWT file: final.contigs.2.bt2
Re-opening _in1 and _in2 as input streams
Returning from Ebwt constructor
Headers:
len: 22989006
bwtLen: 22989007
sz: 5747252
bwtSz: 5747252
lineRate: 6
offRate: 4
offMask: 0xfffffff0
ftabChars: 10
eftabLen: 20
eftabSz: 80
ftabLen: 1048577
ftabSz: 4194308
offsLen: 1436813
offsSz: 5747252
lineSz: 64
sideSz: 64
sideBwtSz: 48
sideBwtLen: 192
numSides: 119735
numLines: 119735
ebwtTotLen: 7663040
ebwtTotSz: 7663040
color: 0
reverse: 0
Total time for call to driver() for forward index: 00:00:08
Reading reference sizes
Time reading reference sizes: 00:00:00
Calculating joined length
Writing header
Reserving space for joined string
Joining reference sequences
Time to join reference sequences: 00:00:00
Time to reverse reference sequence: 00:00:00
bmax according to bmaxDivN setting: 5747251
Using parameters --bmax 4310439 --dcv 1024
Doing ahead-of-time memory usage test
Passed! Constructing with these parameters: --bmax 4310439 --dcv 1024
Constructing suffix-array element generator
Building DifferenceCoverSample
Building sPrime
Building sPrimeOrder
V-Sorting samples
V-Sorting samples time: 00:00:00
Allocating rank array
Ranking v-sort output
Ranking v-sort output time: 00:00:01
Invoking Larsson-Sadakane on ranks
Invoking Larsson-Sadakane on ranks time: 00:00:00
Sanity-checking and returning
Building samples
Reserving space for 12 sample suffixes
Generating random suffixes
QSorting 12 sample offsets, eliminating duplicates
QSorting sample offsets, eliminating duplicates time: 00:00:00
Multikey QSorting 12 samples
(Using difference cover)
Multikey QSorting samples time: 00:00:00
Calculating bucket sizes
Splitting and merging
Splitting and merging time: 00:00:00
Avg bucket size: 2.2989e+07 (target: 4310438)
Converting suffix-array elements to index image
Allocating ftab, absorbFtab
Entering Ebwt loop
Getting block 1 of 1
No samples; assembling all-inclusive block
Sorting block of length 22989006 for bucket 1
(Using difference cover)
Sorting block time: 00:00:04
Returning block of 22989007 for bucket 1
Exited Ebwt loop
fchr[A]: 0
fchr[C]: 5723731
fchr[G]: 11487502
fchr[T]: 17267043
fchr[$]: 22989006
Exiting Ebwt::buildToDisk()
Returning from initFromVector
Wrote 12178208 bytes to primary EBWT file: final.contigs.rev.1.bt2
Wrote 5747256 bytes to secondary EBWT file: final.contigs.rev.2.bt2
Re-opening _in1 and _in2 as input streams
Returning from Ebwt constructor
Headers:
len: 22989006
bwtLen: 22989007
sz: 5747252
bwtSz: 5747252
lineRate: 6
offRate: 4
offMask: 0xfffffff0
ftabChars: 10
eftabLen: 20
eftabSz: 80
ftabLen: 1048577
ftabSz: 4194308
offsLen: 1436813
offsSz: 5747252
lineSz: 64
sideSz: 64
sideBwtSz: 48
sideBwtLen: 192
numSides: 119735
numLines: 119735
ebwtTotLen: 7663040
ebwtTotSz: 7663040
color: 0
reverse: 1
Total time for backward call to driver() for mirror index: 00:00:07
1499996 reads; of these:
1499996 (100.00%) were paired; of these:
1067564 (71.17%) aligned concordantly 0 times
432218 (28.81%) aligned concordantly exactly 1 time
214 (0.01%) aligned concordantly >1 times
----
1067564 pairs aligned concordantly 0 times; of these:
1047682 (98.14%) aligned discordantly 1 time
----
19882 pairs aligned 0 times concordantly or discordantly; of these:
39764 mates make up the pairs; of these:
14857 (37.36%) aligned 0 times
18063 (45.43%) aligned exactly 1 time
6844 (17.21%) aligned >1 times
99.50% overall alignment rate
[bam_sort_core] merging from 1 files and 1 in-memory blocks...
Executing: 'jgi_summarize_bam_contig_depths --outputDepth final.contigs.fa.depth.txt --pairedContigs final.contigs.fa.paired.txt --minContigLength 1000 --minContigDepth 1 tara.bam' at Wed Jan 13 16:42:49 AEDT 2021
Output depth matrix to final.contigs.fa.depth.txt
Output pairedContigs lower triangle to final.contigs.fa.paired.txt
minContigLength: 1000
minContigDepth: 1
Output matrix to final.contigs.fa.depth.txt
Opening 1 bams
Consolidating headers
Allocating pairedContigs matrix: 0 MB over 1 threads
Processing bam files
Thread 0 finished: tara.bam with 2999992 reads and 2957427 readsWellMapped
Creating depth matrix file: final.contigs.fa.depth.txt
Closing most bam files
Creating pairedContigs matrix file: final.contigs.fa.paired.txt
Closing last bam file
Finished
Finished jgi_summarize_bam_contig_depths at Wed Jan 13 16:42:55 AEDT 2021
Creating depth file for metabat at Wed Jan 13 16:42:55 AEDT 2021
Executing: 'metabat2 -m 1500 --inFile final.contigs.fa --outFile final.contigs.fa.metabat-bins1500/bin --abdFile final.contigs.fa.depth.txt' at Wed Jan 13 16:42:55 AEDT 2021
MetaBAT 2 (v2.12.1) using minContig 1500, minCV 1.0, minCVSum 1.0, maxP 95%, minS 60, and maxEdges 200.
10 bins (19704068 bases in total) formed.
Finished metabat2 at Wed Jan 13 16:43:05 AEDT 2021
[2021-01-13 16:43:08] INFO: CheckM v1.1.2
[2021-01-13 16:43:08] INFO: checkm lineage_wf -x fa metabat checkm/
[2021-01-13 16:43:08] INFO: [CheckM - tree] Placing bins in reference genome tree.
[2021-01-13 16:43:09] INFO: Identifying marker genes in 10 bins with 1 threads:
Finished processing 10 of 10 (100.00%) bins.
[2021-01-13 16:44:24] INFO: Saving HMM info to file.
[2021-01-13 16:44:24] INFO: Calculating genome statistics for 10 bins with 1 threads:
Finished processing 10 of 10 (100.00%) bins.
[2021-01-13 16:44:24] INFO: Extracting marker genes to align.