Commit 9f2c5178 authored by root

Nov 4 update

parent c4b3a28a
#!/bin/bash
# Handy Extract Program
# Modified Lev Lafayette 20201006 for order and LZMA2 xz files
if [[ -f $1 ]]; then
case $1 in
*.tar.bz2) tar xvjf $1 ;;
*.tar.gz) tar xvzf $1 ;;
*.tar.xz) tar xvf $1 ;;
*.tar) tar xvf $1 ;;
*.tbz2) tar xvjf $1 ;;
*.tgz) tar xvzf $1 ;;
*.bz2) bunzip2 $1 ;;
*.gz) gunzip $1 ;;
*.rar) unrar x $1 ;;
*.zip) unzip $1 ;;
*.Z) uncompress $1 ;;
*.7z) 7z x $1 ;;
......
To run COMSOL in GUI mode:
start FastX
load the module
run the command:
comsol -3drend sw
......
Often, code executed on the CPU is referred to as host code, and code running on the GPU as device code.
To run a sample CUDA job, start with an interactive job.
sinteractive --partition=gpgpu --gres=gpu:p100:4
Load a CUDA module
......
......
sinteractive --x11=first --partition=shortgpgpu --gres=gpu:p100:1
sinteractive --x11=first --partition=deeplearn --qos=gpgpudeeplearn --gres=gpu:v100:1
sinteractive --partition=gpgpu --gres=gpu:2
# If the user is not using a Linux local machine they will need to install an X-windows client, such as Xming for MS-Windows or X11 on Mac OSX from the XQuartz project.
......
#!/bin/bash
#SBATCH --account=hpcadmin # Use a project ID that has access.
#SBATCH --partition=gpgpu
#SBATCH --gres=gpu:2
#SBATCH --time=0:10:00
#SBATCH --ntasks=2
......
......
#SBATCH --job-name namdgpu
# Which partition
#SBATCH --partition=gpgpu
# How many cores ?
#SBATCH --nodes=1
......
#!/bin/bash
# This is a sample job template for Gadi.
# Change the project to your project
#PBS -P vp61
# Standard Queue
#PBS -q normal
#PBS -l walltime=0:10:00
# PBS -l mem=5GB
# This is for local compute disk.
# PBS -l jobfs=1GB
#PBS -j oe
#PBS -l ncpus=2
# Change to working directory
#PBS -l wd
module load openmpi/4.0.2
mpiexec ./mpi-helloworld
# Basic Scheduler Commands
To submit a job use `qsub $JobName`. Job status can be determined with `qstat $JobID`, `qstat -s $JobID`, or `qstat -u $Username`. Use `qdel $JobID` to delete a job. To review a job's details use `qstat -f $JobID`.
Standard output and error streams are collected by PBSPro and saved in `<Jobname>.o<Jobid>` for standard output and `<Jobname>.e<Jobid>` for standard error.
To put a user hold on a job use `qhold $JobID`, and `qrls -h u $JobID` to release.
A job can be terminated and relaunched with `qrerun $JobID`.
A selection of jobs can be listed with, e.g., `qselect -u $Username -l ncpus.gt.$number`
# PBS Directives
`#PBS -N job_name` for job name
`#PBS -j oe` or `-j eo` to combine output and error streams; alternatively `-e $directory` or `-o $directory` for specific locations.
`#PBS -m abe` for mail when the job aborts, begins, or ends. Combine with the `-M $email` directive.
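Putting these together, a job header might look like the following (the job name and email address are illustrative values, not from the repository):

```
#PBS -N my_job
#PBS -j oe
#PBS -m abe
#PBS -M user@example.edu
```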
# Gadi Directives
Jobs must explicitly declare the file systems they access: a job that uses files under `/scratch/$project` or `/g/data/$project` must include the directive `-lstorage=scratch/$project+gdata/$project`.
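For example, a job under a hypothetical project `xy12` that reads from both filesystems would declare:

```
#PBS -lstorage=scratch/xy12+gdata/xy12
```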
# Example Scripts
# About NCI and Gadi
National Computational Infrastructure (NCI) is Australia's peak facility for computational research and is located at ANU, Canberra.
The main HPC system is Gadi, Australia's peak research supercomputer, with 9 PetaFLOPs of measured compute performance (15 PetaFLOPs theoretical). It was number 24 in the Top500 in June 2020.
# Getting An Account
Getting an account on Gadi is not as straightforward as on Spartan.
There are merit allocation schemes, collaborator schemes, a start-up scheme for new users, and an industry access scheme.
Register for an account or a new project at the MyNCI portal: `https://my.nci.org.au/`
The NCI Flagship Allocation Scheme provides for projects identified by the NCI Board as being of high impact or national strategic importance.
Main access is through the National Computational Merit Allocation Scheme (NCMAS): `https://ncmas.nci.org.au`. It includes NCI (Gadi), the Pawsey Centre (Magnus), Monash (MASSIVE), and UQ (FlashLite).
The NCI Start-up Scheme offers a much smaller compute quota and is used primarily for evaluation. Follow the 'propose a project' link on the MyNCI portal to submit a start-up proposal.
# Accessing Gadi
The hostname for Gadi is gadi.nci.org.au. As with similar systems, logins are via SSH. The command `ssh username@gadi.nci.org.au` will put the user on one of the login nodes; use `-Y` for X-Windows forwarding.
Do consider using an SSH config and/or passwordless SSH; it will make things a lot easier.
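A minimal `~/.ssh/config` entry (the username here is illustrative) saves typing the full hostname each time:

```
Host gadi
    HostName gadi.nci.org.au
    User abc123
    ForwardX11 yes
```

With this in place, `ssh gadi` is all that is needed.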
# Shell Environment
Gadi login configuration is located at `.config/gadi-login.conf`.
This can be used to change the default project and the CLI shell that Gadi initiates, which is bash by default (e.g., `SHELL /bin/tcsh`). If you try to use a shell that is not registered, it will default to bash.
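Assuming the simple key-value format implied by the `SHELL` example above, the file might read (the project ID is illustrative):

```
PROJECT vp61
SHELL /bin/bash
```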
# Modules and Scheduler
Gadi provides software through a Tcl environment modules system.
Gadi uses PBSPro for workload management, not Slurm; see the PBS_Commands file.
There are sample job submission scripts in this directory for multicore, multinode, job dependencies, and job arrays on Gadi.
#!/bin/bash
# Example script to submit two jobs in a dependency
# Directives include: `after`, `afterok`, `afternotok`, `afterany`, `before`, `beforeok`, `beforenotok`, `beforeany`
FIRST=$(qsub job1-1.pbs)
echo $FIRST
SUB1=$(echo ${FIRST##* })
SECOND=$(qsub -W depend=afterany:$SUB1 job1-2.pbs)
echo $SECOND
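The `${FIRST##* }` expansion keeps only the text after the last space, so the job ID survives even if the scheduler's reply carries a prefix. A mocked reply (the string below is illustrative, not real qsub output) shows the mechanics:

```shell
# Simulate a scheduler reply and strip everything up to the last space.
FIRST="qsub: submitted 12345.gadi-pbs"
SUB1=${FIRST##* }
echo "$SUB1"    # prints 12345.gadi-pbs
```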
#!/bin/bash
# NCI is not fond of job arrays and they are restricted on Gadi.
# Create the equivalent through multiple jobs generated with a heredoc.
# Put this in its own directory and run the herescript.
# Then submit with
# for item in {1..5}; do qsub helloworld-${item}.pbs; done
for item in {1..5}
do
cat <<- EOF > helloworld-${item}.pbs
#!/bin/bash
#PBS -P vp61
#PBS -q normal
#PBS -l walltime=0:10:00
#PBS -j oe
#PBS -l wd
#PBS -l ncpus=2
mpiexec mpihelloworld-${item}
EOF
done
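The generator can be rehearsed in a scratch directory without submitting anything; this condensed sketch (the directory path is illustrative) writes the five scripts and counts them:

```shell
# Generate five minimal job scripts, then confirm the count.
mkdir -p /tmp/array-demo && cd /tmp/array-demo
for item in {1..5}; do
  printf '#!/bin/bash\n#PBS -P vp61\nmpiexec mpihelloworld-%s\n' "$item" > helloworld-${item}.pbs
done
ls helloworld-*.pbs | wc -l    # prints 5
```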
#!/bin/bash
#PBS -N HelloWorld
#PBS -P vp61
#PBS -q normal
#PBS -l walltime=0:10:00
# PBS -l mem=5GB
# PBS -l jobfs=1GB
#PBS -j oe
#PBS -l ncpus=2
#PBS -l wd
module load openmpi/4.0.2
mpiexec ./mpi-helloworld
sleep 120
#!/bin/bash
#PBS -N HelloWorld
#PBS -P vp61
#PBS -q normal
#PBS -l walltime=0:10:00
# PBS -l mem=5GB
# PBS -l jobfs=1GB
#PBS -j oe
#PBS -l ncpus=2
#PBS -l wd
module load openmpi/4.0.2
mpiexec ./mpi-helloworld
sleep 120
#!/bin/bash
#PBS -N HelloWorld
#PBS -P vp61
#PBS -q normal
#PBS -l walltime=0:10:00
# PBS -l mem=5GB
# PBS -l jobfs=1GB
#PBS -j oe
#PBS -l ncpus=2
#PBS -l wd
module load openmpi/4.0.2
mpiexec ./mpi-helloworld
#!/bin/bash
#PBS -N HelloWorld
#PBS -P vp61
#PBS -q normal
#PBS -l walltime=0:10:00
# PBS -l mem=5GB
# PBS -l jobfs=1GB
#PBS -j oe
#PBS -l ncpus=2
#PBS -l wd
module load openmpi/4.0.2
mpiexec ./mpi-helloworld
#!/bin/bash
#PBS -q normal
#PBS -l walltime=00:30:00,ncpus=4,mem=8GB
#PBS -l jobfs=100GB
INPUT_DIR=${PBS_O_WORKDIR}
OUTPUT_DIR=/g/data/$projectid
cp -r ${INPUT_DIR} ${PBS_JOBFS}/mydata
cd ${PBS_JOBFS}/mydata
myprogramme
tar -cf ${PBS_JOBID}.tar .
cp ${PBS_JOBID}.tar $OUTPUT_DIR
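The stage-in/compute/stage-out pattern above can be rehearsed locally, with ordinary directories standing in for `${PBS_JOBFS}` and `/g/data` (all paths below are illustrative):

```shell
INPUT_DIR=/tmp/stage-demo/in
JOBFS=/tmp/stage-demo/jobfs        # stands in for ${PBS_JOBFS}
OUTPUT_DIR=/tmp/stage-demo/out     # stands in for /g/data/$projectid
mkdir -p "$INPUT_DIR" "$JOBFS" "$OUTPUT_DIR"
echo "input data" > "$INPUT_DIR/sample.txt"

cp -r "$INPUT_DIR" "$JOBFS/mydata"   # stage in to fast local disk
cd "$JOBFS/mydata"
tar -cf job.tar sample.txt           # archive the results
cp job.tar "$OUTPUT_DIR"             # stage out to persistent storage
ls "$OUTPUT_DIR"                     # prints job.tar
```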
#include <stdio.h>
#include "mpi.h"
int main( int argc, char **argv )
{
int rank, size;
MPI_Init( &argc, &argv );
MPI_Comm_size( MPI_COMM_WORLD, &size );
MPI_Comm_rank( MPI_COMM_WORLD, &rank );
printf( "Hello world from process %d of %d\n", rank, size );
MPI_Finalize();
return 0;
}
CXX=pgc++
CXXFLAGS=-fast -Minfo=all,intensity,ccff
LDFLAGS=${CXXFLAGS}
cg.x: main.o
${CXX} $^ -o $@ ${LDFLAGS}
main.o: main.cpp matrix.h matrix_functions.h vector.h vector_functions.h
.SUFFIXES: .o .cpp .h
.PHONY: clean
clean:
rm -Rf cg.x pgprof* *.o core
/*
* Copyright 2016 NVIDIA Corporation
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#include <cstdlib>
#include <cstdio>
#include <omp.h>
#include "vector.h"
#include "vector_functions.h"
#include "matrix.h"
#include "matrix_functions.h"
#define N 200
#define MAX_ITERS 100
#define TOL 1e-12
int main() {
vector x,b;
vector r,p,Ap;
matrix A;
double one=1.0, zero=0.0;
double normr, rtrans, oldtrans, p_ap_dot , alpha, beta;
int iter=0;
//create matrix
allocate_3d_poisson_matrix(A,N);
printf("Rows: %d, nnz: %d\n", A.num_rows, A.row_offsets[A.num_rows]);
allocate_vector(x,A.num_rows);
allocate_vector(Ap,A.num_rows);
allocate_vector(r,A.num_rows);
allocate_vector(p,A.num_rows);
allocate_vector(b,A.num_rows);
initialize_vector(x,100000);
initialize_vector(b,1);
waxpby(one, x, zero, x, p);
matvec(A,p,Ap);
waxpby(one, b, -one, Ap, r);
rtrans=dot(r,r);
normr=sqrt(rtrans);
double st = omp_get_wtime();
do {
if(iter==0) {
waxpby(one,r,zero,r,p);
} else {
oldtrans=rtrans;
rtrans = dot(r,r);
beta = rtrans/oldtrans;
waxpby(one,r,beta,p,p);
}
normr=sqrt(rtrans);
matvec(A,p,Ap);
p_ap_dot = dot(Ap,p);
alpha = rtrans/p_ap_dot;
waxpby(one,x,alpha,p,x);
waxpby(one,r,-alpha,Ap,r);
if(iter%10==0)
printf("Iteration: %d, Tolerance: %.4e\n", iter, normr);
iter++;
} while(iter<MAX_ITERS && normr>TOL);
double et = omp_get_wtime();
printf("Total Iterations: %d Total Time: %lfs\n", iter, (et-st));
free_vector(x);
free_vector(r);
free_vector(p);
free_vector(Ap);
free_matrix(A);
return 0;
}
/*
* Copyright 2016 NVIDIA Corporation
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#pragma once
#include<cstdlib>
struct matrix {
unsigned int num_rows;
unsigned int nnz;
unsigned int *row_offsets;
unsigned int *cols;
double *coefs;
};
void allocate_3d_poisson_matrix(matrix &A, int N) {
int num_rows=(N+1)*(N+1)*(N+1);
int nnz=27*num_rows;
A.num_rows=num_rows;
A.row_offsets=(unsigned int*)malloc((num_rows+1)*sizeof(unsigned int));
A.cols=(unsigned int*)malloc(nnz*sizeof(unsigned int));
A.coefs=(double*)malloc(nnz*sizeof(double));
int offsets[27];
double coefs[27];
int zstride=N*N;
int ystride=N;
int i=0;
for(int z=-1;z<=1;z++) {
for(int y=-1;y<=1;y++) {
for(int x=-1;x<=1;x++) {
offsets[i]=zstride*z+ystride*y+x;
if(x==0 && y==0 && z==0)
coefs[i]=27;
else
coefs[i]=-1;
i++;
}
}
}
nnz=0;
for(int i=0;i<num_rows;i++) {
A.row_offsets[i]=nnz;
for(int j=0;j<27;j++) {
int n=i+offsets[j];
if(n>=0 && n<num_rows) {
A.cols[nnz]=n;
A.coefs[nnz]=coefs[j];
nnz++;
}
}
}
A.row_offsets[num_rows]=nnz;
A.nnz=nnz;
}
void free_matrix(matrix &A) {
unsigned int *row_offsets=A.row_offsets;
unsigned int * cols=A.cols;
double * coefs=A.coefs;
free(row_offsets);
free(cols);
free(coefs);
}
/*
* Copyright 2016 NVIDIA Corporation
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#pragma once
#include "vector.h"
#include "matrix.h"
void matvec(const matrix& A, const vector& x, const vector &y) {
unsigned int num_rows=A.num_rows;
unsigned int *row_offsets=A.row_offsets;
unsigned int *cols=A.cols;
double *Acoefs=A.coefs;
double *xcoefs=x.coefs;
double *ycoefs=y.coefs;
for(int i=0;i<num_rows;i++) {
double sum=0;
int row_start=row_offsets[i];
int row_end=row_offsets[i+1];
for(int j=row_start;j<row_end;j++) {
unsigned int Acol=cols[j];
double Acoef=Acoefs[j];
double xcoef=xcoefs[Acol];
sum+=Acoef*xcoef;
}
ycoefs[i]=sum;
}
}
/*
* Copyright 2016 NVIDIA Corporation
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#pragma once
#include<cmath>
struct vector {
unsigned int n;
double *coefs;
};
void allocate_vector(vector &v, unsigned int n) {
v.n=n;
v.coefs=(double*)malloc(n*sizeof(double));
}
void free_vector(vector &v) {
free(v.coefs);
}
void initialize_vector(vector &v,double val) {
for(int i=0;i<v.n;i++)
v.coefs[i]=val;
}
/*
* Copyright 2016 NVIDIA Corporation
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
#pragma once
#include<cstdlib>
#include "vector.h"
double dot(const vector& x, const vector& y) {
double sum=0;
unsigned int n=x.n;
double *xcoefs=x.coefs;
double *ycoefs=y.coefs;
for(int i=0;i<n;i++) {
sum+=xcoefs[i]*ycoefs[i];
}
return sum;
}
void waxpby(double alpha, const vector &x, double beta, const vector &y, const vector& w) {
unsigned int n=x.n;
double *xcoefs=x.coefs;
double *ycoefs=y.coefs;
double *wcoefs=w.coefs;
for(int i=0;i<n;i++) {
wcoefs[i]=alpha*xcoefs[i]+beta*ycoefs[i];
}
}
......
Examples exercises and solutions from Pawsey Supercomputing Centre.
1. Start an interactive job. Use a project ID that has gpgpu access.
$ sinteractive --partition=gpgpu -A hpcadmin --gres=gpu:p100:4
$ cd ~ ; cp -r /usr/local/common/OpenACC .
$ source /usr/local/module/spartan_old.sh
$ module load PGI/19.10-GCC-8.3.0-2.32
2. The Importance of Profiling
Example here is from ComputeCanada.
$ cd ~/OpenACC/Profile
$ make
Check profile information.
3. Run serial code
Examples here from Pawsey Supercomputing Centre.
$ cd ~/OpenACC/Exercise/exe1
$ make
$ time ./heat_eq_serial
The output should be something like:
......
real 0m5.062s
user 0m5.051s
sys 0m0.008s
4. Identify parallel blocks.
PGI has built-in profiling tools. Nice!
......
......
$ export OMP_NUM_THREADS=8
# Compile with OpenMP directives. These examples use free-form for Fortran e.g.,
$ gcc -fopenmp helloomp1.c -o helloompc
$ gfortran -fopenmp helloomp1.f90 -o helloompf
# Execute the programs
......
# Don't do this on the head node.
# Many of these examples are from Lev Lafayette, Sequential and Parallel Programming with C and Fortran, VPAC, 2015-2016, ISBN 978-0-9943373-1-3, https://github.com/VPAC/seqpar
$ sinteractive --time=6:00:00 --ntasks=1 --cpus-per-task=8
# 2015 modules system ..
$ module purge
$ source /usr/local/module/spartan_old.sh
$ module load GCC/4.9.2
# .. or 2019 modules system
$ module purge
$ module load spartan_2019
$ module load gcc/8.3.0
# Export the number of threads desired. Note that it is most efficient to have the number of CPUs equal to the number of threads.
$ export OMP_NUM_THREADS=8
# Compile with OpenMP directives. These examples use free-form for Fortran e.g.,
$ gcc -fopenmp helloomp.c -o helloompc
$ gfortran -fopenmp helloomp.f90 -o helloompf
# Execute the programs
$ ./helloompc
$ ./helloompf
# Note that creating executables with different compilers requires a different OpenMP compiler flag. For example:
$ module load intel/2017.u2
$ icc -qopenmp helloomp.c -o helloompc
$ ifort -qopenmp helloomp.f90 -o helloompf
$ ./helloompc
$ ./helloompf