Commit dfc317f1 authored by Research Platforms's avatar Research Platforms

Update for Sep 2 2019

parent 0d726dc8
Pipeline #2159 canceled with stages
@@ -8,6 +8,7 @@ FSL/preCourse.tar.gz
FSL/fmri
Gaussian/g16
Gaussian/tests
Genomics
HPCshells/NAMD
NAMD/apoa1
NAMD/NAMD_BENCHMARKS_SPARTAN
Abaqus example modified from Lev Lafayette, "Supercomputing with Linux", Victorian Partnership for Advanced Computing, 2015.
The Abaqus FEA suite is commonly used in automotive engineering problems, using a common model data structure and integrated solver technology. As licensed software it requires a number of license tokens based on the number of cores requested, which can be calculated with the simple formula int(5 x N^0.422), where N is the number of cores. Device Analytics offers an online calculator at http://deviceanalytics.com/abaqus-token-calculator .
The case study here is a car door being propelled into a pole. This is analogous to the EURONCAP pole test, in which a car is propelled sideways into a rigid pole of diameter 254 mm at 29 km/h. While a crash event generally lasts for around 100 milliseconds, the time step of the case study has been reduced to 10 milliseconds to reduce the job time.
`Door.cae Door.inp abaqus.slurm abaqus-mpi.slurm`
The .cae file is the "complete Abaqus environment" file and the .inp file is the input deck. The output files will be Door.odb ("output database") and Door.jnl ("journal").
Submit the job using the following command: `sbatch abaqus.slurm`
The status of the job can be queried using the following command: `tail -f Door.sta`
Once the job has completed, all files, with the exception of the output database (.odb) file, can be deleted. By default, ABAQUS/CAE writes the results of the analysis to the ODB file. When one creates a step, ABAQUS/CAE generates a default output request for the step, which in the case of this analysis is Energy Output. Check the output files for the job to ensure it has run correctly.
Use the Field Output Requests Manager to request output of variables that should be written at relatively low frequencies to the output database from the entire model or from a large portion of the model. The History Output Requests Manager is used to request output of variables that should be written to the output database at a high frequency from a small portion of the model; for example, the displacement of a single node.
The results will be visualised using ABAQUS/CAE. Note that ABAQUS/Viewer is a subset of ABAQUS/CAE containing only the post-processing capabilities of the Visualization module; the procedure discussed here also applies to ABAQUS/Viewer. Copy the files to your local machine and run Abaqus CAE there; do not do this on Trifid itself if at all possible. Abaqus should be installed on your desktop machine for ease of visualisation.
It is almost always better to conduct computationally intensive tasks on the cluster, and visualisation locally.
From the local command: `abaqus cae`
The following procedure is used to open the ODB file:
* Select [Open Database] in the Session Start window.
* The Open Database dialog will appear. Select Output Database from the File Filter dropdown menu.
* Select Door.odb and click [OK].
By default, ABAQUS/CAE will plot the undeformed shape with exterior edges visible. For clarity (if the mesh density is high) it may be necessary to make feature edges visible. The following procedure is used:
* Select [Common Plot Options] in the Toolbox Area.
* In the Basic Tab, check Feature edges in the Visible Edges section.
* Select [OK]. The door assembly undeformed shape plot is shown in the following figure. Both exterior edges and feature edges are shown.
The following procedure can be used to plot the crash model's deformed shape:
* Select [Plot Deformed Shape] in the Toolbox area. By default, the final step is displayed. It should be noted that the Deformation Scale Factor is 1 by default in explicit analyses.
* Select [Animate: Time History] to animate the crash event. The frame rate can be adjusted by clicking [Animation Options] and moving the slider in the Player tab to the desired speed.
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --time=0:05:00
#SBATCH --gres=abaqus:5
module load ABAQUS/6.14.2-linux-x86_64
# Run the job 'Door'
abaqus job=Door
Description: ABINIT is a package whose main program allows one to find the total energy, charge density and electronic structure of systems made of electrons and nuclei (molecules and periodic solids) within Density Functional Theory (DFT), using pseudopotentials and a planewave or wavelet basis. - Homepage: http://www.abinit.org/
Sample job script based on: https://www.abinit.org/sites/default/files/last/tutorial/generated_files/lesson_base1.html
TUTORIAL IS UNDER DEVELOPMENT
#!/bin/bash
#SBATCH --partition=physical
#SBATCH --time=1:00:00
#SBATCH --ntasks=8
module load ABINIT/8.0.8b-intel-2016.u3
abinit < tbase1_x.files >& log
# H2 molecule in a big box
#
# In this input file, the location of the information on this or that line
# is not important : a keyword is located by the parser, and the related
# information should follow.
# The "#" symbol indicates the beginning of a comment : the remaining
# of the line will be skipped.
#Definition of the unit cell
acell 10 10 10 # The keyword "acell" refers to the
# lengths of the primitive vectors (in Bohr)
#rprim 1 0 0 0 1 0 0 0 1 # This line, defining orthogonal primitive vectors,
# is commented, because it is precisely the default value of rprim
#Definition of the atom types
ntypat 1 # There is only one type of atom
znucl 1 # The keyword "znucl" refers to the atomic number of the
# possible type(s) of atom. The pseudopotential(s)
# mentioned in the "files" file must correspond
# to the type(s) of atom. Here, the only type is Hydrogen.
#Definition of the atoms
natom 2 # There are two atoms
typat 1 1 # They both are of type 1, that is, Hydrogen
xcart # This keyword indicates that the location of the atoms
# will follow, one triplet of number for each atom
-0.7 0.0 0.0 # Triplet giving the cartesian coordinates of atom 1, in Bohr
0.7 0.0 0.0 # Triplet giving the cartesian coordinates of atom 2, in Bohr
#Definition of the planewave basis set
ecut 10.0 # Maximal plane-wave kinetic energy cut-off, in Hartree
#Definition of the k-point grid
kptopt 0 # Enter the k points manually
nkpt 1 # Only one k point is needed for isolated system,
# taken by default to be 0.0 0.0 0.0
#Definition of the SCF procedure
nstep 10 # Maximal number of SCF cycles
toldfe 1.0d-6 # Will stop when, twice in a row, the difference
# between two consecutive evaluations of total energy
# differ by less than toldfe (in Hartree)
# This value is way too large for most realistic studies of materials
diemac 2.0 # Although this is not mandatory, it is worth to
# precondition the SCF cycle. The model dielectric
# function used as the standard preconditioner
# is described in the "dielng" input variable section.
# Here, we follow the prescriptions for molecules
# in a big box
## After modifying the following section, one might need to regenerate the pickle database with runtests.py -r
#%%<BEGIN TEST_INFO>
#%% [setup]
#%% executable = abinit
#%% [files]
#%% files_to_test =
#%% tbase1_1.out, tolnlines= 0, tolabs= 0.000e+00, tolrel= 0.000e+00
#%% psp_files = 01h.pspgth
#%% [paral_info]
#%% max_nprocs = 1
#%% [extra_info]
#%% authors = Unknown
#%% keywords =
#%% description = H2 molecule in a big box
#%%<END TEST_INFO>
tbase1_1.in
tbase1_1.out
in_tbase1
out_tbase1
tmp_tbase1
# LL 20190805
Amber (originally Assisted Model Building with Energy Refinement) is software for performing molecular dynamics and structure prediction.
TUTORIAL NOT YET COMPLETE
#!/bin/bash
# Add your project account details here.
# SBATCH --account=XXXX
#SBATCH --partition=gpgpu
#SBATCH --ntasks=4
#SBATCH --time=1:00:00
module load Amber/16-gompi-2017b-CUDA-mpi
mpiexec /usr/local/easybuild/software/Amber/16-gompi-2017b-CUDA-mpi/amber16/bin/pmemd.cuda_DPFP.MPI -O -i mdin -o mdout -inf mdinfo -x mdcrd -r restrt
#include <stdio.h>
void init(int *a, int N)
{
int i;
for (i = 0; i < N; ++i)
{
a[i] = i;
}
}
__global__
void doubleElements(int *a, int N)
{
int idx = blockIdx.x * blockDim.x + threadIdx.x;
int stride = gridDim.x * blockDim.x;
/*
* The previous code (now commented out) attempted
* to access an element outside the range of `a`.
*/
// for (int i = idx; i < N + stride; i += stride)
for (int i = idx; i < N; i += stride)
{
a[i] *= 2;
}
}
bool checkElementsAreDoubled(int *a, int N)
{
int i;
for (i = 0; i < N; ++i)
{
if (a[i] != i*2) return false;
}
return true;
}
int main()
{
int N = 10000;
int *a;
size_t size = N * sizeof(int);
cudaMallocManaged(&a, size);
init(a, N);
/*
* The previous code (now commented out) attempted to launch
* the kernel with more than the maximum number of threads per
* block, which is 1024.
*/
size_t threads_per_block = 1024;
/* size_t threads_per_block = 2048; */
size_t number_of_blocks = 32;
cudaError_t syncErr, asyncErr;
doubleElements<<<number_of_blocks, threads_per_block>>>(a, N);
/*
* Catch errors for both the kernel launch above and any
* errors that occur during the asynchronous `doubleElements`
* kernel execution.
*/
syncErr = cudaGetLastError();
asyncErr = cudaDeviceSynchronize();
/*
* Print errors should they exist.
*/
if (syncErr != cudaSuccess) printf("Error: %s\n", cudaGetErrorString(syncErr));
if (asyncErr != cudaSuccess) printf("Error: %s\n", cudaGetErrorString(asyncErr));
bool areDoubled = checkElementsAreDoubled(a, N);
printf("All elements were doubled? %s\n", areDoubled ? "TRUE" : "FALSE");
cudaFree(a);
}
#include <stdio.h>
void init(int *a, int N)
{
int i;
for (i = 0; i < N; ++i)
{
a[i] = i;
}
}
__global__
void doubleElements(int *a, int N)
{
int idx = blockIdx.x * blockDim.x + threadIdx.x;
int stride = gridDim.x * blockDim.x;
for (int i = idx; i < N + stride; i += stride)
{
a[i] *= 2;
}
}
bool checkElementsAreDoubled(int *a, int N)
{
int i;
for (i = 0; i < N; ++i)
{
if (a[i] != i*2) return false;
}
return true;
}
int main()
{
int N = 10000;
int *a;
size_t size = N * sizeof(int);
cudaMallocManaged(&a, size);
init(a, N);
size_t threads_per_block = 2048;
size_t number_of_blocks = 32;
cudaError_t syncErr, asyncErr;
doubleElements<<<number_of_blocks, threads_per_block>>>(a, N);
/*
* Catch errors for both the kernel launch above and any
* errors that occur during the asynchronous `doubleElements`
* kernel execution.
*/
syncErr = cudaGetLastError();
asyncErr = cudaDeviceSynchronize();
/*
* Print errors should they exist.
*/
if (syncErr != cudaSuccess) printf("Error: %s\n", cudaGetErrorString(syncErr));
if (asyncErr != cudaSuccess) printf("Error: %s\n", cudaGetErrorString(asyncErr));
bool areDoubled = checkElementsAreDoubled(a, N);
printf("All elements were doubled? %s\n", areDoubled ? "TRUE" : "FALSE");
cudaFree(a);
}
#include <stdio.h>
void init(int *a, int N)
{
int i;
for (i = 0; i < N; ++i)
{
a[i] = i;
}
}
__global__
void doubleElements(int *a, int N)
{
int idx = blockIdx.x * blockDim.x + threadIdx.x;
int stride = gridDim.x * blockDim.x;
for (int i = idx; i < N + stride; i += stride)
{
a[i] *= 2;
}
}
bool checkElementsAreDoubled(int *a, int N)
{
int i;
for (i = 0; i < N; ++i)
{
if (a[i] != i*2) return false;
}
return true;
}
int main()
{
/*
* Add error handling to this source code to learn what errors
* exist, and then correct them. Googling error messages may be
* of service if actions for resolving them are not clear to you.
*/
int N = 10000;
int *a;
size_t size = N * sizeof(int);
cudaMallocManaged(&a, size);
init(a, N);
size_t threads_per_block = 2048;
size_t number_of_blocks = 32;
doubleElements<<<number_of_blocks, threads_per_block>>>(a, N);
cudaDeviceSynchronize();
bool areDoubled = checkElementsAreDoubled(a, N);
printf("All elements were doubled? %s\n", areDoubled ? "TRUE" : "FALSE");
cudaFree(a);
}
#include <stdio.h>
/*
* Refactor firstParallel so that it can run on the GPU.
*/
__global__ void firstParallel()
{
printf("This should be running in parallel.\n");
}
int main()
{
/*
* Refactor this call to firstParallel to execute in parallel
* on the GPU.
*/
firstParallel<<<5,5>>>();
/*
* Some code is needed below so that the CPU will wait
* for the GPU kernels to complete before proceeding.
*/
cudaDeviceSynchronize();
}
#include <stdio.h>
void init(int *a, int N)
{
int i;
for (i = 0; i < N; ++i)
{
a[i] = i;
}
}
__global__
void doubleElements(int *a, int N)
{
int i;
i = blockIdx.x * blockDim.x + threadIdx.x;
if (i < N)
{
a[i] *= 2;
}
}
bool checkElementsAreDoubled(int *a, int N)
{
int i;
for (i = 0; i < N; ++i)
{
if (a[i] != i*2) return false;
}
return true;
}
int main()
{
int N = 1000;
int *a;
size_t size = N * sizeof(int);
/*
* Use `cudaMallocManaged` to allocate pointer `a` available
* on both the host and the device.
*/
cudaMallocManaged(&a, size);
init(a, N);
size_t threads_per_block = 256;
size_t number_of_blocks = (N + threads_per_block - 1) / threads_per_block;
doubleElements<<<number_of_blocks, threads_per_block>>>(a, N);
cudaDeviceSynchronize();
bool areDoubled = checkElementsAreDoubled(a, N);
printf("All elements were doubled? %s\n", areDoubled ? "TRUE" : "FALSE");
/*
* Use `cudaFree` to free memory allocated
* with `cudaMallocManaged`.
*/
cudaFree(a);
}
#include <stdio.h>
/*
* Refactor firstParallel so that it can run on the GPU.
*/
__global__ void firstParallel()
{
printf("This should be running in parallel.\n");
}
int main()
{
/*
* Refactor this call to firstParallel to execute in parallel
* on the GPU.
*/
firstParallel<<<5,5>>>();
/*
* Some code is needed below so that the CPU will wait
* for the GPU kernels to complete before proceeding.
*/
cudaDeviceSynchronize();
}
#include <stdio.h>
/*
* Refactor firstParallel so that it can run on the GPU.
*/
void firstParallel()
{
printf("This should be running in parallel.\n");
}
int main()
{
/*
* Refactor this call to firstParallel to execute in parallel
* on the GPU.
*/
firstParallel();
/*
* Some code is needed below so that the CPU will wait
* for the GPU kernels to complete before proceeding.
*/
}
#include <stdio.h>
void helloCPU()
{
printf("Hello from the CPU.\n");
}
/*
* The addition of `__global__` signifies that this function
* should be launched on the GPU.
*/
__global__ void helloGPU()
{
printf("Hello from the GPU.\n");
}
int main()
{
helloCPU();
/*
* Adding an execution configuration with the <<<...>>> syntax
* will launch this function as a kernel on the GPU.
*/
helloGPU<<<1, 1>>>();
/*
* `cudaDeviceSynchronize` will block the CPU stream until
* all GPU kernels have completed.
*/
cudaDeviceSynchronize();
}
#include <stdio.h>
void helloCPU()
{
printf("Hello from the CPU.\n");
}
/*
* Refactor the `helloGPU` definition to be a kernel
* that can be launched on the GPU. Update its message
* to read "Hello from the GPU!"
*/
void helloGPU()
{
printf("Hello also from the CPU.\n");
}
int main()
{
helloCPU();
/*
* Refactor this call to `helloGPU` so that it launches
* as a kernel on the GPU.
*/
helloGPU();
/*
* Add code below to synchronize on the completion of the
* `helloGPU` kernel completion before continuing the CPU
* thread.
*/
}
#include <stdio.h>
/*
* Notice the absence of the previously expected argument `N`.
*/
__global__ void loop()
{
/*
* This kernel does the work of only 1 iteration
* of the original for loop. Indication of which
* "iteration" is being executed by this kernel is
* still available via `threadIdx.x`.
*/
printf("This is iteration number %d\n", threadIdx.x);
}
int main()
{
/*
* It is the execution context that sets how many "iterations"
* of the "loop" will be done.
*/
loop<<<1, 10>>>();
cudaDeviceSynchronize();
}
#include <stdio.h>
/*
* Refactor `loop` to be a CUDA Kernel. The new kernel should
* only do the work of 1 iteration of the original loop.
*/
void loop(int N)
{
for (int i = 0; i < N; ++i)
{
printf("This is iteration number %d\n", i);
}
}
int main()
{
/*
* When refactoring `loop` to launch as a kernel, be sure
* to use the execution configuration to control how many
* "iterations" to perform.
*
* For this exercise, only use 1 block of threads.
*/
int N = 10;
loop(N);
}
#include <stdio.h>
__global__ void printSuccessForCorrectExecutionConfiguration()
{
if(threadIdx.x == 1023 && blockIdx.x == 255)
{
printf("Success!\n");
}
}
int main()
{
/*
* This is one possible execution context that will make
* the kernel launch print its success message.
*/
printSuccessForCorrectExecutionConfiguration<<<256, 1024>>>();
/*
* Don't forget kernel execution is asynchronous and you must
* sync on its completion.
*/
cudaDeviceSynchronize();
}
#include <stdio.h>
__global__ void initializeElementsTo(int initialValue, int *a, int N)
{
int i = threadIdx.x + blockIdx.x * blockDim.x;
if (i < N)
{
a[i] = initialValue;
}
}
int main()
{
/*
* Do not modify `N`.
*/
int N = 1000;
int *a;
size_t size = N * sizeof(int);
cudaMallocManaged(&a, size);
/*
* Assume we have reason to want the number of threads
* fixed at `256`: do not modify `threads_per_block`.
*/
size_t threads_per_block = 256;
/*
* The following is idiomatic CUDA to make sure there are at
* least as many threads in the grid as there are `N` elements.
*/
size_t number_of_blocks = (N + threads_per_block - 1) / threads_per_block;
int initialValue = 6;
initializeElementsTo<<<number_of_blocks, threads_per_block>>>(initialValue, a, N);
cudaDeviceSynchronize();
/*
* Check to make sure all values in `a` were initialized.
*/
for (int i = 0; i < N; ++i)
{
if(a[i] != initialValue)
{
printf("FAILURE: target value: %d\t a[%d]: %d\n", initialValue, i, a[i]);
exit(1);
}
}
printf("SUCCESS!\n");
cudaFree(a);
}
#include <stdio.h>
/*
* Currently, `initializeElementsTo`, if executed in a thread whose
* `i` is calculated to be greater than `N`, will try to access a value
* outside the range of `a`.
*
* Refactor the kernel definition to prevent out-of-range accesses.
*/
__global__ void initializeElementsTo(int initialValue, int *a, int N)
{
int i = threadIdx.x + blockIdx.x * blockDim.x;
a[i] = initialValue;
}
int main()
{
/*
* Do not modify `N`.
*/
int N = 1000;
int *a;
size_t size = N * sizeof(int);
cudaMallocManaged(&a, size);
/*
* Assume we have reason to want the number of threads
* fixed at `256`: do not modify `threads_per_block`.
*/
size_t threads_per_block = 256;
/*
* Assign a value to `number_of_blocks` that will
* allow for a working execution configuration given
* the fixed values for `N` and `threads_per_block`.
*/
size_t number_of_blocks;
int initialValue = 6;
initializeElementsTo<<<number_of_blocks, threads_per_block>>>(initialValue, a, N);
cudaDeviceSynchronize();
/*
* Check to make sure all values in `a` were initialized.
*/
for (int i = 0; i < N; ++i)
{
if(a[i] != initialValue)
{
printf("FAILURE: target value: %d\t a[%d]: %d\n", initialValue, i, a[i]);
exit(1);
}
}
printf("SUCCESS!\n");
cudaFree(a);
}
#include <stdio.h>
__global__ void loop()
{
/*
* This idiomatic expression gives each thread
* a unique index within the entire grid.
*/
int i = blockIdx.x * blockDim.x + threadIdx.x;
printf("%d\n", i);
}
int main()
{
/*
* Additional execution configurations that would
* work and meet the exercise's constraints are:
*
* <<<5, 2>>>
* <<<10, 1>>>
*/
loop<<<2, 5>>>();
cudaDeviceSynchronize();
}
#include <stdio.h>
/*
* Refactor `loop` to be a CUDA Kernel. The new kernel should
* only do the work of 1 iteration of the original loop.
*/
void loop(int N)
{
for (int i = 0; i < N; ++i)
{
printf("This is iteration number %d\n", i);
}
}
int main()
{
/*
* When refactoring `loop` to launch as a kernel, be sure
* to use the execution configuration to control how many
* "iterations" to perform.
*
* For this exercise, be sure to use more than 1 block in
* the execution configuration.
*/
int N = 10;
loop(N);
}
#include <stdio.h>
void init(int *a, int N)
{
int i;
for (i = 0; i < N; ++i)
{
a[i] = i;
}
}
__global__
void doubleElements(int *a, int N)
{
/*
* Use a grid-stride loop so each thread does work
* on more than one element in the array.
*/
int idx = blockIdx.x * blockDim.x + threadIdx.x;
int stride = gridDim.x * blockDim.x;
for (int i = idx; i < N; i += stride)
{
a[i] *= 2;
}
}
bool checkElementsAreDoubled(int *a, int N)
{
int i;
for (i = 0; i < N; ++i)
{
if (a[i] != i*2) return false;
}
return true;
}
int main()
{
int N = 10000;
int *a;
size_t size = N * sizeof(int);
cudaMallocManaged(&a, size);
init(a, N);
size_t threads_per_block = 256;
size_t number_of_blocks = 32;
doubleElements<<<number_of_blocks, threads_per_block>>>(a, N);
cudaDeviceSynchronize();
bool areDoubled = checkElementsAreDoubled(a, N);
printf("All elements were doubled? %s\n", areDoubled ? "TRUE" : "FALSE");
cudaFree(a);
}
#include <stdio.h>
void init(int *a, int N)
{
int i;
for (i = 0; i < N; ++i)
{
a[i] = i;
}
}
__global__
void doubleElements(int *a, int N)
{
/*
* Use a grid-stride loop so each thread does work
* on more than one element in the array.
*/
int idx = blockIdx.x * blockDim.x + threadIdx.x;
int stride = gridDim.x * blockDim.x;
for (int i = idx; i < N; i += stride)
{
a[i] *= 2;
}
}
bool checkElementsAreDoubled(int *a, int N)
{
int i;
for (i = 0; i < N; ++i)
{
if (a[i] != i*2) return false;
}
return true;
}
int main()
{
int N = 10000;
int *a;
size_t size = N * sizeof(int);
cudaMallocManaged(&a, size);
init(a, N);
size_t threads_per_block = 256;
size_t number_of_blocks = 32;
doubleElements<<<number_of_blocks, threads_per_block>>>(a, N);
cudaDeviceSynchronize();
bool areDoubled = checkElementsAreDoubled(a, N);
printf("All elements were doubled? %s\n", areDoubled ? "TRUE" : "FALSE");
cudaFree(a);
}
#include <stdio.h>
void init(int *a, int N)
{
int i;
for (i = 0; i < N; ++i)
{
a[i] = i;
}
}
/*
* In the current application, `N` is larger than the grid.
* Refactor this kernel to use a grid-stride loop in order that
* each parallel thread work on more than one element of the array.
*/
__global__
void doubleElements(int *a, int N)
{
int i;
i = blockIdx.x * blockDim.x + threadIdx.x;
if (i < N)
{
a[i] *= 2;
}
}
bool checkElementsAreDoubled(int *a, int N)
{
int i;
for (i = 0; i < N; ++i)
{
if (a[i] != i*2) return false;
}
return true;
}
int main()
{
/*
* `N` is greater than the size of the grid (see below).
*/
int N = 10000;
int *a;
size_t size = N * sizeof(int);
cudaMallocManaged(&a, size);
init(a, N);
/*
* The size of this grid is 256*32 = 8192.
*/
size_t threads_per_block = 256;
size_t number_of_blocks = 32;
doubleElements<<<number_of_blocks, threads_per_block>>>(a, N);
cudaDeviceSynchronize();
bool areDoubled = checkElementsAreDoubled(a, N);
printf("All elements were doubled? %s\n", areDoubled ? "TRUE" : "FALSE");
cudaFree(a);
}
# Structure of CUDA Code
As with all parallel programming, start with serial code, engage in decomposition, then generate parallel code.
General CPU/GPU code with CUDA will look like:
void CPUFunction()
{
printf("This function is defined to run on the CPU.\n");
}
__global__ void GPUFunction()
{
printf("This function is defined to run on the GPU.\n");
}
int main()
{
CPUFunction();
GPUFunction<<<1, 1>>>();
cudaDeviceSynchronize();
}
The __global__ keyword indicates that the following function will run on the GPU and can be invoked globally, which in this context means by either the CPU or the GPU.
Often, code executed on the CPU is referred to as host code, and code running on the GPU is referred to as device code.
# Compiling a Sample GPU Job
To run a sample CUDA job start with interactive job.
sinteractive --partition=gpgputest -A hpcadmingpgpu --gres=gpu:p100:4
Load a CUDA module
`module load CUDA/8.0.44-GCC-4.9.2`
To compile 01-hello-gpu-solution.cu, run:
`nvcc 01-hello-gpu-solution.cu -o helloCUDA -gencode arch=compute_60,code=sm_60`
Execute the generated helloCUDA running:
`./helloCUDA`
Alternatively, append `-run` to the compilation line, which will run the compiled binary immediately.
# Examples
All examples with a numerical prefix (01-, 02-, etc.) are from NVIDIA.
# Debug with printf
Calling printf from a CUDA kernel function is no different from calling printf in CPU code. In the vector addition example, edit vec_add.cu and insert the following code after line 18:
if(threadIdx.x == 10)
printf("c[%d] = %d\n", id, c[id]);
# Supported Gencode variations for sm and compute
Below are the supported sm variations and sample cards from that generation
Supported on CUDA 7 and later
Fermi (CUDA 3.2 until CUDA 8) (deprecated from CUDA 9):
SM20 or SM_20, compute_20 – Older cards such as GeForce 400, 500, 600, GT-630
Kepler (CUDA 5 and later):
SM30 or SM_30, compute_30 – Kepler architecture (generic – Tesla K40/K80, GeForce 700, GT-730)
Adds support for unified memory programming
SM35 or SM_35, compute_35 – More specific Tesla K40
Adds support for dynamic parallelism. Shows no real benefit over SM30 in my experience.
SM37 or SM_37, compute_37 – More specific Tesla K80
Adds a few more registers. Shows no real benefit over SM30 in my experience
Maxwell (CUDA 6 and later):
SM50 or SM_50, compute_50 – Tesla/Quadro M series
SM52 or SM_52, compute_52 – Quadro M6000 , GeForce 900, GTX-970, GTX-980, GTX Titan X
SM53 or SM_53, compute_53 – Tegra (Jetson) TX1 / Tegra X1
Pascal (CUDA 8 and later)
SM60 or SM_60, compute_60 – GP100/Tesla P100 – DGX-1 (Generic Pascal)
SM61 or SM_61, compute_61 – GTX 1080, GTX 1070, GTX 1060, GTX 1050, GTX 1030, Titan Xp, Tesla P40, Tesla P4, Discrete GPU on the NVIDIA Drive PX2
SM62 or SM_62, compute_62 – Integrated GPU on the NVIDIA Drive PX2, Tegra (Jetson) TX2
Volta (CUDA 9 and later)
SM70 or SM_70, compute_70 – Tesla V100, GTX 1180 (GV104)
SM71 or SM_71, compute_71 – probably not implemented
SM72 or SM_72, compute_72 – currently unknown
Turing (CUDA 10 and later)
SM75 or SM_75, compute_75 – RTX 2080, Titan RTX, Quadro RTX 8000
(c.f., http://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/)
# CUDA Error Handling
Most CUDA functions return a value of type cudaError_t, which can be used to check for errors when calling a function.
e.g.,
```
cudaError_t err;
err = cudaMallocManaged(&a, N);
// Assume the existence of `a` and `N`.
if (err != cudaSuccess)
// `cudaSuccess` is provided by CUDA.
{
printf("Error: %s\n", cudaGetErrorString(err)); // `cudaGetErrorString` is provided by CUDA.
}
```
#include <stdio.h>
#include <assert.h>
inline cudaError_t checkCuda(cudaError_t result)
{
if (result != cudaSuccess) {
fprintf(stderr, "CUDA Runtime Error: %s\n", cudaGetErrorString(result));
assert(result == cudaSuccess);
}
return result;
}
int main()
{
/*
* The function can be wrapped around any call returning
* a value of type `cudaError_t`.
*/
checkCuda( cudaDeviceSynchronize() );
}
#include <stdio.h>
#include <omp.h>
void vadd2(int n, float * a, float * b, float * c)
{
#pragma omp target map(to:n,a[0:n],b[0:n]) map(from:c[0:n])
#pragma omp teams distribute parallel for simd
for(int i = 0; i < n; i++)
c[i] = a[i] + b[i];
}
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
// CUDA kernel. Each thread takes care of one element of c
__global__ void vecAdd(double *a, double *b, double *c, int n)
{
// Get our global thread ID
int id = blockIdx.x*blockDim.x+threadIdx.x;
// Make sure we do not go out of bounds
if (id < n)
c[id] = a[id] + b[id];
}
int main( int argc, char* argv[] )
{
// Size of vectors
int n = 100000;
// Host input vectors
double *h_a;
double *h_b;
//Host output vector
double *h_c;
// Device input vectors
double *d_a;
double *d_b;
//Device output vector
double *d_c;
// Size, in bytes, of each vector
size_t bytes = n*sizeof(double);
// Allocate memory for each vector on host
h_a = (double*)malloc(bytes);
h_b = (double*)malloc(bytes);
h_c = (double*)malloc(bytes);
// Allocate memory for each vector on GPU
cudaMalloc(&d_a, bytes);
cudaMalloc(&d_b, bytes);
cudaMalloc(&d_c, bytes);
int i;
// Initialize vectors on host
for( i = 0; i < n; i++ ) {
h_a[i] = sin(i)*sin(i);
h_b[i] = cos(i)*cos(i);
}
// Copy host vectors to device
cudaMemcpy( d_a, h_a, bytes, cudaMemcpyHostToDevice);
cudaMemcpy( d_b, h_b, bytes, cudaMemcpyHostToDevice);
int blockSize, gridSize;
// Number of threads in each thread block
blockSize = 1024;
// Number of thread blocks in grid
gridSize = (int)ceil((float)n/blockSize);
// Execute the kernel
vecAdd<<<gridSize, blockSize>>>(d_a, d_b, d_c, n);
// Copy array back to host
cudaMemcpy( h_c, d_c, bytes, cudaMemcpyDeviceToHost );
// Sum up vector c and print result divided by n, this should equal 1 within error
double sum = 0;
for(i=0; i<n; i++)
sum += h_c[i];
printf("final result: %f\n", sum/(double)n);
// Release device memory
cudaFree(d_a);
cudaFree(d_b);
cudaFree(d_c);
// Release host memory
free(h_a);
free(h_b);
free(h_c);
return 0;
}
/* File: vec_add.cu
* Purpose: Implement vector addition on a gpu using cuda
*
* Compile: nvcc [-g] [-G] -o vec_add vec_add.cu
* Run: ./vec_add
*/
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>
#include <math.h>
__global__ void Vec_add(float x[], float y[], float z[], int n) {
int thread_id = threadIdx.x;
if (thread_id < n){
z[thread_id] = x[thread_id] + y[thread_id];
}
}
int main(int argc, char* argv[]) {
int n, m;
float *h_x, *h_y, *h_z;
float *d_x, *d_y, *d_z;
size_t size;
/* Define vector length */
n = 1000;
m = 20;
size = n*sizeof(float);
// Allocate memory for the vectors on host memory.
h_x = (float*) malloc(size);
h_y = (float*) malloc(size);
h_z = (float*) malloc(size);
for (int i = 0; i < n; i++) {
h_x[i] = i+1;
h_y[i] = n-i;
}
// Print original vectors.
printf("h_x = ");
for (int i = 0; i < m; i++){
printf("%.1f ", h_x[i]);
}
printf("\n\n");
printf("h_y = ");
for (int i = 0; i < m; i++){
printf("%.1f ", h_y[i]);
}
printf("\n\n");
/* Allocate vectors in device memory */
cudaMalloc(&d_x, size);
cudaMalloc(&d_y, size);
cudaMalloc(&d_z, size);
/* Copy vectors from host memory to device memory */
cudaMemcpy(d_x, h_x, size, cudaMemcpyHostToDevice);
cudaMemcpy(d_y, h_y, size, cudaMemcpyHostToDevice);
/* Kernel Call */
Vec_add<<<1,1000>>>(d_x, d_y, d_z, n);
cudaDeviceSynchronize();  /* cudaThreadSynchronize() is deprecated */
cudaMemcpy(h_z, d_z, size, cudaMemcpyDeviceToHost);
printf("The sum is: \n");
for (int i = 0; i < m; i++){
printf("%.1f ", h_z[i]);
}
printf("\n");
/* Free device memory */
cudaFree(d_x);
cudaFree(d_y);
cudaFree(d_z);
/* Free host memory */
free(h_x);
free(h_y);
free(h_z);
return 0;
} /* main */
#include <iostream>
#include <stdio.h>
#define checkCudaError(status) { \
if(status != cudaSuccess) { \
std::cout << "CUDA Error " << __FILE__ << ", " << __LINE__ \
<< ": " << cudaGetErrorString(status) << "\n"; \
exit(-1); \
} \
}
__global__ void vecAdd(int * a, int * b, int * c, int size) {
//ADD CODE HERE
int i = threadIdx.x;
int j = blockIdx.x*blockDim.x;
printf("I am in: %d, %d\n", i , j);
c[i + j] = a[i + j] + b[i + j];
}
int main() {
//checkCudaError(cudaSetDevice(1));
int device;
checkCudaError(cudaGetDevice(&device));
cudaDeviceProp prop;
checkCudaError(cudaGetDeviceProperties(&prop, device));
std::cout << "Device " << device << ": " << prop.name << "\n";
std::cout << "GPU Cores: " << prop.multiProcessorCount << "\n";
std::cout << "Compute Capability: " << prop.major << "." << prop.minor << "\n";
const int GRID_SIZE = 16;
const int CTA_SIZE = 128;
const int size = GRID_SIZE * CTA_SIZE;
int * a, * b, * c;
int * dev_a, * dev_b, * dev_c;
a = (int *) malloc (sizeof(int) * size);
b = (int *) malloc (sizeof(int) * size);
c = (int *) malloc (sizeof(int) * size);
if(!a || !b || !c) {
std::cout << "Error: out of memory\n";
exit(-1);
}
for(int i = 0; i < size; i++) {
a[i] = i;
b[i] = i+1;
}
memset(c, 0, sizeof(int) * size);
checkCudaError(cudaMalloc(&dev_a, sizeof(int) * size));
checkCudaError(cudaMalloc(&dev_b, sizeof(int) * size));
checkCudaError(cudaMalloc(&dev_c, sizeof(int) * size));
checkCudaError(cudaMemcpy(dev_a, a, sizeof(int) * size, cudaMemcpyHostToDevice));
checkCudaError(cudaMemcpy(dev_b, b, sizeof(int) * size, cudaMemcpyHostToDevice));
checkCudaError(cudaMemset(dev_c, 0, sizeof(int) * size));
vecAdd<<<GRID_SIZE, CTA_SIZE>>>(dev_a, dev_b, dev_c, size);
checkCudaError(cudaDeviceSynchronize());
checkCudaError(cudaMemcpy(c, dev_c, sizeof(int) * size, cudaMemcpyDeviceToHost));
for(int i = 0; i < size; i++) {
// std::cout << i << ": " << c[i] << "\n";
if(c[i] != i*2+1) {
std::cout << "Error: c[" << i << "] != " <<
i*2+1 << "\n";
exit(-1);
}
}
std::cout << "Pass\n";
}
# This is a directory for memory and program debugging and profiling.
# Launch an interactive job for these examples!
sinteractive --partition=physical --ntasks=2 --time=1:00:00
# Valgrind
# The test program valgrindtest.c is from Punit Guron. In this example the memory allocated to the pointer 'ptr' is never freed by the program.
# Load the module and compile with debugging symbols.
module load Valgrind/3.13.0-goolf-2015a
gcc -Wall -g valgrindtest.c -o valgrindtest
valgrind --leak-check=full ./valgrindtest 2> valgrind.out
# GDB
# Compile with debugging symbols. A good compiler will give a warning here; then run the program.
gcc -Wall -g gdbtest.c -o gdbtest
$ ./gdbtest
Enter the number: 3
The factorial of 3 is 0
# Load the GDB module e.g.,
module load GDB/7.8.2-goolf-2015a
# Launch GDB, set up a break point in the code, and execute
gdb gdbtest
..
(gdb) break 10
(gdb) run
(gdb) print j
# Basic commands in GDB
# run = run a program until it ends, receives a signal, or hits a breakpoint. Use Ctrl-C to stop
# break = set a breakpoint, by line number, function name, etc. (shortcut b)
# list = list the code above and below where the program stopped (shortcut l)
# continue = resume execution of the program where it stopped (shortcut c)
# print = print a variable (shortcut p)
# next, step = after a signal or breakpoint, use next and step to
# continue the program line-by-line.
# NB: next will go 'over' a function call to the next line of code,
# step will go 'into' the function call (shortcut s)
#
# Variables can be temporarily modified with the `set` command
# e.g., set j=1
# The code will hit the breakpoint where you can interrogate the variables.
# Testing the variable 'j' will show it has not been initialised.
# Create a new file, initialise j to 1, and test again.
cp gdbtest.c gdbtest2.c
gcc -Wall -g gdbtest2.c -o gdbtest2
$ ./gdbtest2
# There is still another bug! Can you find it? Use GDB to help.
# Once you have fixed the second bug, use diff and patch to fix the original.
# The -u option produces unified diff output, with context from both files.
diff -u gdbtest.c gdbtest2.c > gdbpatch.patch
# The patch command applies the modifications recorded in the patch file
# to the original source. Test the original again!
patch gdbtest.c gdbpatch.patch
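As a self-contained sketch of the same diff/patch round trip, using throwaway files in a temporary directory rather than the gdbtest sources:

```shell
cd "$(mktemp -d)"
printf 'int j;\n' > orig.c           # stand-in for the buggy original
printf 'int j = 1;\n' > fixed.c      # stand-in for the corrected copy
diff -u orig.c fixed.c > fix.patch   # unified diff: context plus -/+ lines
patch orig.c fix.patch               # apply the recorded changes to orig.c
diff -q orig.c fixed.c && echo "files now identical"
```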
# For Gprof, instrumentation code is inserted with the `-pg` option when
# compiled.
#
# GPROF output consists of two parts: the flat profile and the call graph.
# The flat profile gives the total execution time spent in each function.
# The textual call graph shows, for each function,
# (a) who called it (parents) and (b) who it called (child subroutines).
#
# Sample program from Himanshu Arora, published on The Geek Stuff
# Compile, run the executable.
# Run the gprof tool. Various output options are available.
gcc -Wall -pg test_gprof.c test_gprof_new.c -o test_gprof
./test_gprof
gprof test_gprof gmon.out > analysis.txt
# For parallel applications each parallel process can be given its own
# output file, using the undocumented environment variable GMON_OUT_PREFIX.
# Then run the parallel application as normal.
# Each process will create a gmon.out file suffixed with its process ID.
# View the gmon.out files as one:
export GMON_OUT_PREFIX=gmon.out
mpicc -Wall -pg mpi-debug.c -o mpi-debug
srun -n2 mpi-debug
gprof mpi-debug gmon.out.*
# Last update 20190416 LL
# include <stdio.h>
int main()
{
int i, num, j;
printf ("Enter the number: ");
scanf ("%d", &num );
for (i=1; i<num; i++)
j=j*i;
printf("The factorial of %d is %d\n",num,j);
}
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>
int main(int argc, char *argv[])
{
int myid, numprocs;
int tag, source, destination, count;
int buffer;
MPI_Status status;
MPI_Init(&argc,&argv);
MPI_Comm_size(MPI_COMM_WORLD,&numprocs);
MPI_Comm_rank(MPI_COMM_WORLD,&myid);
tag=1;
source=0;
destination=1;
count=1;
if(myid == source){
printf( "I am the root 0 process of the group (total %d).\n", numprocs );
buffer=1729;
MPI_Send(&buffer,count,MPI_INT,destination,tag,MPI_COMM_WORLD);
printf("processor %d sent %d\n",myid,buffer);
}
if(myid == destination){
printf( "I am a subsidiary process %d of the group (total %d).\n", myid, numprocs );
MPI_Recv(&buffer,count,MPI_INT,source,tag,MPI_COMM_WORLD,&status);
printf("processor %d received %d\n",myid,buffer);
}
MPI_Finalize();
}
program sendrecv
include "mpif.h"
integer myid, ierr,numprocs
integer tag,source,destination,count
integer buffer
integer status(MPI_STATUS_SIZE)
call MPI_INIT( ierr )
call MPI_COMM_RANK( MPI_COMM_WORLD, myid, ierr )
call MPI_COMM_SIZE( MPI_COMM_WORLD, numprocs, ierr )
tag=1
source=0
destination=1
count=1
if(myid .eq. source)then
print*, 'I am the root 0 process of the group (total', &
& numprocs, ').'
buffer=1729
Call MPI_Send(buffer, count, MPI_INTEGER,destination,&
tag, MPI_COMM_WORLD, ierr)
write(*,*)"processor ",myid," sent ",buffer
endif
if(myid .eq. destination)then
print*, 'I am a subsidiary process', myid, &
& ' of the group (total ', numProcs, ').'
Call MPI_Recv(buffer, count, MPI_INTEGER,source,&
tag, MPI_COMM_WORLD, status,ierr)
write(*,*)"processor ",myid," received ",buffer
endif
call MPI_FINALIZE(ierr)
stop
end
//test_gprof.c
#include<stdio.h>
void new_func1(void);
void func1(void)
{
printf("\n Inside func1 \n");
unsigned int i = 0; /* unsigned avoids signed overflow in the busy loop */
for(;i<0xffffffff;i++);
new_func1();
return;
}
static void func2(void)
{
printf("\n Inside func2 \n");
unsigned int i = 0; /* unsigned avoids signed overflow in the busy loop */
for(;i<0xffffffaa;i++);
return;
}
int main(void)
{
printf("\n Inside main()\n");
int i = 0;
for(;i<0xffffff;i++);
func1();
func2();
return 0;
}
//test_gprof_new.c
#include<stdio.h>
void new_func1(void)
{
printf("\n Inside new_func1()\n");
unsigned int i = 0; /* unsigned avoids signed overflow in the busy loop */
for(;i<0xffffffee;i++);
return;
}
#include<stdio.h>
#include<string.h>
#include<stdlib.h>
int main(void)
{
char *ptr = (char*)malloc(10);
memset(ptr, 0, 10);
strncpy(ptr, "Linux", strlen("Linux"));
printf("\n ptr = [%s]\n", ptr);
ptr[0] = 'a';
printf("\n ptr = [%s]\n", ptr);
return 0;
}
#!/bin/bash
#SBATCH -p cloud
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH -t 0:05:00
module load FSL/5.0.9-centos6_64
# FSL needs to be sourced
source $FSLDIR/etc/fslconf/fsl.sh
time bet /usr/local/common/FSL/intro/structural.nii.gz test1FSL -f 0.1
We have a FreePascal compiler on Spartan!
`module load fpc/3.0.4`
However, you will also need an fpc.cfg (for the command-line compiler) and an fp.cfg (for the IDE). These include PATHs to the various units etc.
These are all included in this directory, along with a simple "Hello World" program.
Compile with
`fpc hello.pas`
# Automatically created file, don't edit.
#IFDEF NORMAL
-TLinux
-Mfpc
-Sg
-Si
-Os
-CpATHLON64
-OpCOREI
-Fu/usr/local/easybuild/software/fpc/3.0.4/lib/fpc/3.0.4/units/x86_64-linux/*
-g-
-p-
-b-
#ENDIF
#IFDEF DEBUG
-TLinux
-Mfpc
-Sg
-Si
-Sa
-Cr
-Ci
-Co
-CR
-Os
-CpATHLON64
-OpCOREI
-g
-p-
-b-
#ENDIF
#IFDEF RELEASE
-TLinux
-Mfpc
-Sg
-Si
-CX
-Os
-CpATHLON64
-OpCOREI
-XS
-g-
-p-
-b-
#ENDIF
[Compile]
CompileMode=NORMAL
[Help]
Files=""
[Editor]
DefaultTabSize=8
DefaultIndentSize=1
DefaultFlags=20599
DefaultSaveExt=.pas
[Highlight]
Exts="*.pas;*.pp;*.inc;*.dpr;*.lpr"
NeedsTabs="make*;make*.*;fpcmake.loc"
[SourcePath]
SourceList=""
[Mouse]
DoubleDelay=8
ReverseButtons=0
AltClickAction=6
CtrlClickAction=1
[Keyboard]
EditKeys=microsoft
[Search]
FindFlags=4
[Preferences]
DesktopFileFlags=209
CenterCurrentLineWhileDebugging=1
AutoSaveFlags=6
MiscOptions=6
DesktopLocation=1
[Misc]
ShowReadme=0
[Files]
OpenExts="*.pas;*.pp;*.inc;*.dpr;*.lpr"
PrinterDevice=prn
#
# Config file generated by fpcmkcfg on 7-8-19 - 17:17:03
# Example fpc.cfg for Free Pascal Compiler
#
# ----------------------
# Defines (preprocessor)
# ----------------------
#
# nested #IFNDEF, #IFDEF, #ENDIF, #ELSE, #DEFINE, #UNDEF are allowed
#
# -d is the same as #DEFINE
# -u is the same as #UNDEF
#
#
# Some examples (for switches see below, and the -? helppages)
#
# Try compiling with the -dRELEASE or -dDEBUG on the commandline
#
# For a release compile with optimizes and strip debuginfo
#IFDEF RELEASE
-O2
-Xs
#WRITE Compiling Release Version
#ENDIF
# For a debug version compile with debuginfo and all codegeneration checks on
#IFDEF DEBUG
-gl
-Crtoi
#WRITE Compiling Debug Version
#ENDIF
# assembling
#ifdef darwin
# use pipes instead of temporary files for assembling
-ap
# path to Xcode 4.3+ utilities (no problem if it doesn't exist)
-FD/Applications/Xcode.app/Contents/Developer/usr/bin
#endif
# ----------------
# Parsing switches
# ----------------
# Pascal language mode
# -Mfpc free pascal dialect (default)
# -Mobjfpc switch some Delphi 2 extensions on
# -Mdelphi tries to be Delphi compatible
# -Mtp tries to be TP/BP 7.0 compatible
# -Mgpc tries to be gpc compatible
# -Mmacpas tries to be compatible to the macintosh pascal dialects
#
# Turn on Object Pascal extensions by default
#-Mobjfpc
# Assembler reader mode
# -Rdefault use default assembler
# -Ratt read AT&T style assembler
# -Rintel read Intel style assembler
#
# All assembler blocks are AT&T styled by default
#-Ratt
# Semantic checking
# -S2 same as -Mobjfpc
# -Sc supports operators like C (*=,+=,/= and -=)
# -Sa include assertion code.
# -Sd same as -Mdelphi
# -Se<x> error options. <x> is a combination of the following:
# <n> : compiler stops after <n> errors (default is 1)
# w : compiler stops also after warnings
# n : compiler stops also after notes
# h : compiler stops also after hints
# -Sg allow LABEL and GOTO
# -Sh Use ansistrings
# -Si support C++ styled INLINE
# -Sk load fpcylix unit
# -SI<x> set interface style to <x>
# -SIcom COM compatible interface (default)
# -SIcorba CORBA compatible interface
# -Sm support macros like C (global)
# -So same as -Mtp
# -Sp same as -Mgpc
# -Ss constructor name must be init (destructor must be done)
# -Sx enable exception keywords (default in Delphi/ObjFPC modes)
#
# Allow goto, inline, C-operators, C-vars
-Sgic
# ---------------
# Code generation
# ---------------
# Uncomment the next line if you always want static/dynamic units by default
# (can be overruled with -CD, -CS at the commandline)
#-CS
#-CD
# Set the default heapsize to 8Mb
#-Ch8000000
# Set default codegeneration checks (iocheck, overflow, range, stack)
#-Ci
#-Co
#-Cr
#-Ct
# Optimizer switches
# -Os generate smaller code
# -Oa=N set alignment to N
# -O1 level 1 optimizations (quick optimizations, debuggable)
# -O2 level 2 optimizations (-O1 + optimizations which make debugging more difficult)
# -O3 level 3 optimizations (-O2 + optimizations which also may make the program slower rather than faster)
# -Oo<x> switch on optimalization x. See fpc -i for possible values
# -OoNO<x> switch off optimalization x. See fpc -i for possible values
# -Op<x> set target cpu for optimizing, see fpc -i for possible values
#ifdef darwin
#ifdef cpui386
-Cppentiumm
-Oppentiumm
#endif
#endif
# -----------------------
# Set Filenames and Paths
# -----------------------
# Both slashes and backslashes are allowed in paths
# path to the messagefile, not necessary anymore but can be used to override
# the default language
#-Fr/msg/errore.msg
#-Fr/msg/errorn.msg
#-Fr/msg/errores.msg
#-Fr/msg/errord.msg
#-Fr/msg/errorr.msg
# search path for unicode binary files (FPC 2.x does not know this switch)
#ifndef VER2
-FM/unicode/
#endif
# searchpath for units and other system dependent things
-Fu/usr/local/easybuild/software/fpc/3.0.4/lib/fpc/3.0.4/units/$fpctarget
-Fu/usr/local/easybuild/software/fpc/3.0.4/lib/fpc/3.0.4/units/$fpctarget/*
-Fu/usr/local/easybuild/software/fpc/3.0.4/lib/fpc/3.0.4/units/$fpctarget/rtl
#ifdef cpui8086
-Fu/usr/local/easybuild/software/fpc/3.0.4/lib/fpc/3.0.4/units/$fpctarget/$fpcsubarch-$fpcmemorymodel
-Fu/usr/local/easybuild/software/fpc/3.0.4/lib/fpc/3.0.4/units/$fpctarget/$fpcsubarch-$fpcmemorymodel/*
-Fu/usr/local/easybuild/software/fpc/3.0.4/lib/fpc/3.0.4/units/$fpctarget/$fpcsubarch-$fpcmemorymodel/rtl
#endif
#IFDEF FPCAPACHE_1_3
-Fu/usr/local/easybuild/software/fpc/3.0.4/lib/fpc/3.0.4/units/$fpctarget/httpd13/
#ELSE
#IFDEF FPCAPACHE_2_0
-Fu/usr/local/easybuild/software/fpc/3.0.4/lib/fpc/3.0.4/units/$fpctarget/httpd20
#ELSE
-Fu/usr/local/easybuild/software/fpc/3.0.4/lib/fpc/3.0.4/units/$fpctarget/httpd22
#ENDIF
#ENDIF
# searchpath for fppkg user-specific packages
-Fu~/.fppkg/lib/fpc/$fpcversion/units/$FPCTARGET/*
# path to the gcclib
#ifdef cpui386
-Fl/usr/lib/gcc/x86_64-redhat-linux/4.8.5/32
#endif
#ifdef cpux86_64
-Fl/usr/lib/gcc/x86_64-redhat-linux/4.8.5
#endif
# searchpath for libraries
#-Fl/lib
#-Fl/lib;/usr/lib
-Fl/usr/local/easybuild/software/fpc/3.0.4/lib/$FPCTARGET
# searchpath for tools
-FD/usr/local/easybuild/software/fpc/3.0.4/bin/$FPCTARGET
#IFNDEF CPUI386
#IFNDEF CPUAMD64
#DEFINE NEEDCROSSBINUTILS
#ENDIF
#ENDIF
#IFNDEF Linux
#DEFINE NEEDCROSSBINUTILS
#ENDIF
# never need cross-prefix when targeting the JVM
# (no native compiler, always cross-compiling)
#ifdef cpujvm
#undef NEEDCROSSBINUTILS
#endif
# for android cross-prefix is set by compiler
#ifdef android
#undef NEEDCROSSBINUTILS
#endif
# never need cross-prefix when targeting the i8086
# (no native compiler, always cross-compiling)
#ifdef cpui8086
#undef NEEDCROSSBINUTILS
#endif
# never need cross-prefix when targeting the i8086
# (no native compiler, always cross-compiling)
#ifdef cpujvm
#undef NEEDCROSSBINUTILS
#endif
# binutils prefix for cross compiling
#IFDEF FPC_CROSSCOMPILING
#IFDEF NEEDCROSSBINUTILS
-XP$FPCTARGET-
#ENDIF
#ENDIF
# -------------
# Linking
# -------------
# generate always debugging information for GDB (slows down the compiling
# process)
# -gc generate checks for pointers
# -gd use dbx
# -gg use gsym
# -gh use heap trace unit (for memory leak debugging)
# -gl use line info unit to show more info for backtraces
# -gv generates programs tracable with valgrind
# -gw generate dwarf debugging info
#
# Enable debuginfo and use the line info unit by default
#-gl
# always pass an option to the linker
#-k-s
# Always strip debuginfo from the executable
-Xs
# Always use smartlinking on i8086, because the system unit exceeds the 64kb
# code limit
#ifdef cpui8086
-CX
-XX
#endif
# -------------
# Miscellaneous
# -------------
# Write always a nice FPC logo ;)
-l
# Verbosity
# e : Show errors (default) d : Show debug info
# w : Show warnings u : Show unit info
# n : Show notes t : Show tried/used files
# h : Show hints s : Show time stamps
# i : Show general info q : Show message numbers
# l : Show linenumbers c : Show conditionals
# a : Show everything 0 : Show nothing (except errors)
# b : Write file names messages r : Rhide/GCC compatibility mode
# with full path x : Executable info (Win32 only)
# v : write fpcdebug.txt with p : Write tree.log with parse tree
# lots of debugging info
#
# Display Info, Warnings and Notes
-viwn
# If you don't want so much verbosity use
#-vw
program HelloWorld;
begin
writeln('Hello World');
end.
# This directory contains sample Slurm scripts for the use of GPUs on Spartan. The main difference between submitting a standard Slurm job and a job that makes use of GPUs is the additional parameters in the Slurm script. A user will need to specify that a GPU partition is being used and, in addition, make a generic resource (GRES) request for the quantity of GPUs required.
#SBATCH --partition gpu
#SBATCH --gres=gpu
# One can also select instead, if you have access:
#
#SBATCH --partition gpgpu
#SBATCH --account=test # Use a project ID that has access.
#SBATCH --qos=gpgpu # Note that this qos may differ if you are from a non-UoM institution
#SBATCH --gres=gpu:1
# For example, if you wish to access up to four GPUs in a single job use:
#SBATCH --gres=gpu:4
# Note that this requests any type of GPGPU; we have different GPGPU models installed. The model can be specified but doesn't need to be.
#
# For example if you submit a job that says `--gres=gpu:1` for 1 GPU or `--gres=gpu:2` for 2 GPUs per task then that can be satisfied by either type
# but if you need a specific type (say P100) then you need to submit with `--gres=gpu:p100` and if you need 2 per task then you would do `--gres=gpu:p100:2`.
# Similarly, for the MSE deeplearn partition (9 nodes, 4 Nvidia V100s):
#SBATCH --partition=deeplearn
#SBATCH --qos=gpgpudeeplearn
#SBATCH --gres=gpu:1
# Or pass these options directly to sbatch on the command line:
sbatch -q gpgpudeeplearn -p deeplearn
# Do you have access to this partition? Check with `scontrol`
scontrol show partition deeplearn
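Putting the options above together, a minimal GPU batch script might look like the following sketch. The partition, QoS, GRES, and module names here are assumptions; they must match a combination your project actually has access to (check with scontrol as above):

```shell
#!/bin/bash
# Hypothetical minimal GPU job script; adjust partition/qos/gres
# to a combination your account can actually use.
#SBATCH --partition=gpgpu
#SBATCH --qos=gpgpu
#SBATCH --gres=gpu:p100:1
#SBATCH --time=0:10:00
module load CUDA    # assumed module name; check `module avail`
nvidia-smi          # confirm the allocated GPU is visible to the job
```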
# Derived from:
# https://stackoverflow.com/questions/7663343/simplest-possible-example-to-show-gpu-outperform-cpu-using-cuda
real 0m22.516s
user 0m19.933s
sys 0m0.004s
[lev@spartan ~]$ sinteractive --time=0:30:0 --partition=gpgpu --gres=gpu:1
srun: job 1191798 queued and waiting for resources
srun: job 1191798 has been allocated resources
[lev@spartan-gpgpu005 ~]$ nvcc ferrari.cu -o ferrari_gpu
[lev@spartan-gpgpu005 ~]$ time ./ferrari_gpu
Enter an index: 33
data[33] = 0.000000
real 0m1.112s
user 0m0.001s
sys 0m0.015s
[lev@spartan-gpgpu005 GPU]$
# Enjoy submitting 1044 Gaussian jobs!
#
# The g16 directory contains source and binaries for Gaussian, which we
# have a site-wide license for University of Melbourne researchers. The sources
# must not be distributed on non-University systems.
#
01 WS-NAME PIC A(30).
01 WS-INCOME PIC S9(7)V9(2).
01 WS-EXPENSES PIC S9(7)V9(2).
01 WS-PROFIT PIC S9(7)V9(2).
01 WS-ASSETS PIC S9(10).
01 WS-LIABILITIES PIC S9(10).
01 WS-EQUITY PIC S9(10).
01 WS-SOLVENCY PIC S9(2)V9(2).
We have GnuCOBOL on Spartan!
GnuCOBOL is a free implementation of the COBOL compiler. Best of all, it's a transpiler that translates COBOL into C.
Which means parallel COBOL is possible!
Various example programs from Lev Lafayette's talk to Linux Users of Victoria,
GnuCOBOL: A Gnu Life for an Old Workhorse, July 2016
http://levlafayette.com/files/2016cobol.pdf
Here are various tests:
`module load gnucobol/3.0-rc1-GCC-6.2.0`
`cobc -Wall -x -free hello.cob -o hello-world`
./hello-world
cobc -Wall -m -free hello.cob
cobc -Wall -C -free hello.cob
cobc -x shortest.cob
./shortest
cobc -x hello-trad.cob
./hello-trad
cobc -Wall -x -free luv.cob
./luv
cobc -Wall -free -x literals.cob
./literals
cobc -Wall -free -x posmov1.cob
./posmov1
cobc -Wall -free -x posmov2.cob
./posmov2
cobc -Wall -free -x redefines.cob
./redefines
cobc -Wall -free -x renames.cob
./renames
cobc -Wall -free -x posmov3.cob
./posmov3
cobc -Wall -free -x posmov4.cob
cobc -Wall -free -x class.cob
./posmov4
./class
cobc -Wall -free -x evaluate.cob
./evaluate
IDENTIFICATION DIVISION.
PROGRAM-ID. Classchecks.
DATA DIVISION.
WORKING-STORAGE SECTION.
01 WS-NUM1 PIC X(9) VALUE 'ABCD '.
01 WS-NUM2 PIC 9(9) VALUE 123456789.
PROCEDURE DIVISION.
A000-FIRST-PARA.
IF WS-NUM1 IS ALPHABETIC THEN
DISPLAY 'WS-NUM1 IS ALPHABETIC'.
IF WS-NUM1 IS NUMERIC THEN
DISPLAY 'WS-NUM1 IS NUMERIC'.
IF WS-NUM2 IS NUMERIC THEN
DISPLAY 'WS-NUM2 IS NUMERIC'.
STOP RUN.
IDENTIFICATION DIVISION.
PROGRAM-ID. EvaluateTest.
DATA DIVISION.
WORKING-STORAGE SECTION.
01 WS-A PIC 9 VALUE 0.
PROCEDURE DIVISION.
MOVE 3 TO WS-A.
EVALUATE TRUE
WHEN WS-A > 2
DISPLAY 'WS-A GREATER THAN 2'
WHEN WS-A < 0
DISPLAY 'WS-A LESS THAN 0'
WHEN OTHER
DISPLAY 'INVALID VALUE OF WS-A'
END-EVALUATE.
STOP RUN.
000100* HELLO.COB GnuCOBOL FAQ example
000200 IDENTIFICATION DIVISION.
000300 PROGRAM-ID. hello.
000400 PROCEDURE DIVISION.
000500 DISPLAY "Hello, world".
000600 STOP RUN.
*> Hello World Program
IDENTIFICATION DIVISION.
PROGRAM-ID. hello.
PROCEDURE DIVISION.
DISPLAY "Hello World!".
STOP RUN.
*> Literals Program
IDENTIFICATION DIVISION.
PROGRAM-ID. Literal.
PROCEDURE DIVISION.
DISPLAY "Hello World!".
DISPLAY 'Hello World!'.
DISPLAY "This isn't invalid!".
DISPLAY 'This isn"t invalid, either!'.
DISPLAY +33.3333
DISPLAY 33.333
DISPLAY -33.333
DISPLAY 33
STOP RUN.
IDENTIFICATION DIVISION.
PROGRAM-ID. All_About_Divisions.
DATA DIVISION.
WORKING-STORAGE SECTION.
01 WS-NAME PIC A(30).
01 WS-TITLE PIC A(30).
01 WS-ID PIC 9(8) VALUE 20160716.
PROCEDURE DIVISION.
A000-FIRST-PARA.
DISPLAY 'Linux Users of Victoria Beginners Workshop'.
MOVE 'Lev Lafayette' TO WS-NAME.
MOVE 'GnuCOBOL' TO WS-TITLE.
DISPLAY "Today's presenter is : "WS-NAME.
DISPLAY "Today's presentation is : "WS-TITLE.
DISPLAY "Today's date is : " WS-ID.
STOP RUN.
IDENTIFICATION DIVISION.
PROGRAM-ID. Financial_Position_and_Movement.
DATA DIVISION.
WORKING-STORAGE SECTION.
01 WS-NAME PIC A(30) VALUE 'Hedgehog Enterprises'.
01 WS-INCOME PIC 9(7)V9(2) VALUE 752326.55.
01 WS-EXPENSES PIC 9(7)V9(2) VALUE 721322.45.
01 WS-ASSETS PIC 9(10) VALUE 5271917.
01 WS-LIABILITIES PIC 9(10) VALUE 123677.
PROCEDURE DIVISION.
DISPLAY "STATEMENT OF INCOME, EXPENSES, ASSETS AND LIABILITIES".
DISPLAY "NAME : "WS-NAME.
DISPLAY "INCOME : "WS-INCOME.
DISPLAY "EXPENSES : "WS-EXPENSES.
DISPLAY "ASSETS: "WS-ASSETS.
DISPLAY "LIABILITIES : "WS-LIABILITIES.
STOP RUN.
IDENTIFICATION DIVISION.
PROGRAM-ID. Financial_Position_and_Movement.
DATA DIVISION.
WORKING-STORAGE SECTION.
01 WS-NAME PIC A(30).
01 WS-INCOME PIC S9(7)V9(2).
01 WS-EXPENSES PIC S9(7)V9(2).
01 WS-PROFIT PIC S9(7)V9(2).
01 WS-ASSETS PIC S9(10).
01 WS-LIABILITIES PIC S9(10).
01 WS-EQUITY PIC S9(10).
01 WS-SOLVENCY PIC S9(2)V9(2).
PROCEDURE DIVISION.
A000-FIRST-PARA.
INITIALIZE WS-NAME, WS-INCOME, WS-EXPENSES, WS-PROFIT, WS-ASSETS, WS-LIABILITIES, WS-EQUITY, WS-SOLVENCY.
DISPLAY "Enter the following in order; income, expenses, assets, and liabilities".
MOVE "Hedgehog Enterprises" TO WS-NAME.
ACCEPT WS-INCOME.
ACCEPT WS-EXPENSES.
ACCEPT WS-ASSETS.
ACCEPT WS-LIABILITIES.
DISPLAY "STATEMENT OF INCOME, EXPENSES, ASSETS AND LIABILITIES".
DISPLAY "NAME : "WS-NAME.
DISPLAY "INCOME : "WS-INCOME.
DISPLAY "EXPENSES : "WS-EXPENSES.
SUBTRACT WS-EXPENSES FROM WS-INCOME GIVING WS-PROFIT.
DISPLAY "PROFIT AND LOSS - Statement of Movement : "WS-PROFIT.
DISPLAY "ASSETS: "WS-ASSETS.
DISPLAY "LIABILITIES : "WS-LIABILITIES.
SUBTRACT WS-LIABILITIES FROM WS-ASSETS GIVING WS-EQUITY.
DISPLAY "EQUITY - Statement of Position : "WS-EQUITY.
COMPUTE WS-SOLVENCY= WS-EQUITY/WS-LIABILITIES.
DISPLAY "SOLVENCY : "WS-SOLVENCY.
STOP RUN.
IDENTIFICATION DIVISION.
PROGRAM-ID. Financial_Position_and_Movement.
DATA DIVISION.
WORKING-STORAGE SECTION.
COPY FINANCIALS.
PROCEDURE DIVISION.
A000-FIRST-PARA.
INITIALIZE WS-NAME, WS-INCOME, WS-EXPENSES, WS-PROFIT, WS-ASSETS, WS-LIABILITIES, WS-EQUITY, WS-SOLVENCY.
DISPLAY "Enter the following in order; income, expenses, assets, and liabilities".
MOVE "Hedgehog Enterprises" TO WS-NAME.
ACCEPT WS-INCOME.
ACCEPT WS-EXPENSES.
ACCEPT WS-ASSETS.
ACCEPT WS-LIABILITIES.
DISPLAY "STATEMENT OF INCOME, EXPENSES, ASSETS AND LIABILITIES".
DISPLAY "NAME : "WS-NAME.
DISPLAY "INCOME : "WS-INCOME.
DISPLAY "EXPENSES : "WS-EXPENSES.
SUBTRACT WS-EXPENSES FROM WS-INCOME GIVING WS-PROFIT.
DISPLAY "PROFIT AND LOSS - Statement of Movement : "WS-PROFIT.
DISPLAY "ASSETS: "WS-ASSETS.
DISPLAY "LIABILITIES : "WS-LIABILITIES.
SUBTRACT WS-LIABILITIES FROM WS-ASSETS GIVING WS-EQUITY.
DISPLAY "EQUITY - Statement of Position : "WS-EQUITY.
COMPUTE WS-SOLVENCY= WS-EQUITY/WS-LIABILITIES.
DISPLAY "SOLVENCY : "WS-SOLVENCY.
STOP RUN.
IDENTIFICATION DIVISION.
PROGRAM-ID. Financial_Position_and_Movement.
DATA DIVISION.
WORKING-STORAGE SECTION.
COPY FINANCIALS.
PROCEDURE DIVISION.
A000-FIRST-PARA.
INITIALIZE WS-NAME, WS-INCOME, WS-EXPENSES, WS-PROFIT, WS-ASSETS, WS-LIABILITIES, WS-EQUITY, WS-SOLVENCY.
DISPLAY "Enter the following in order; income, expenses, assets, and liabilities".
MOVE "Hedgehog Enterprises" TO WS-NAME.
ACCEPT WS-INCOME.
ACCEPT WS-EXPENSES.
ACCEPT WS-ASSETS.
ACCEPT WS-LIABILITIES.
DISPLAY "STATEMENT OF INCOME, EXPENSES, ASSETS AND LIABILITIES".
DISPLAY "NAME : "WS-NAME.
DISPLAY "INCOME : "WS-INCOME.
DISPLAY "EXPENSES : "WS-EXPENSES.
SUBTRACT WS-EXPENSES FROM WS-INCOME GIVING WS-PROFIT.
DISPLAY "PROFIT AND LOSS - Statement of Movement : "WS-PROFIT.
DISPLAY "ASSETS: "WS-ASSETS.
DISPLAY "LIABILITIES : "WS-LIABILITIES.
SUBTRACT WS-LIABILITIES FROM WS-ASSETS GIVING WS-EQUITY.
DISPLAY "EQUITY - Statement of Position : "WS-EQUITY.
COMPUTE WS-SOLVENCY= WS-EQUITY/WS-LIABILITIES.
DISPLAY "SOLVENCY : "WS-SOLVENCY.
IF WS-SOLVENCY < 0.25 THEN
DISPLAY "The company should check its solvency"
ELSE
DISPLAY "The company's solvency is acceptable".
STOP RUN.
*> Redefines Program
IDENTIFICATION DIVISION.
PROGRAM-ID. Redefines.
DATA DIVISION.
WORKING-STORAGE SECTION.
01 WS-DESCRIPTION.
05 WS-DATE1 VALUE '20160615'.
10 WS-YEAR PIC X(4).
10 WS-MONTH PIC X(2).
10 WS-DATE PIC X(2).
05 WS-DATE2 REDEFINES WS-DATE1 PIC 9(8).
PROCEDURE DIVISION.
DISPLAY "WS-DATE1 : "WS-DATE1.
DISPLAY "WS-DATE2 : "WS-DATE2.
STOP RUN.
IDENTIFICATION DIVISION.
PROGRAM-ID. WS-Renames.
DATA DIVISION.
WORKING-STORAGE SECTION.
01 WS-DESCRIPTION.
05 WS-NUM.
10 WS-NUM1 PIC 9(2) VALUE 20.
10 WS-NUM2 PIC 9(2) VALUE 56.
05 WS-CHAR.
10 WS-CHAR1 PIC X(2) VALUE 'AA'.
10 WS-CHAR2 PIC X(2) VALUE 'BB'.
66 WS-RENAME RENAMES WS-NUM2 THRU WS-CHAR2.
PROCEDURE DIVISION.
DISPLAY "WS-RENAME : " WS-RENAME.
STOP RUN.
program-id.h.procedure division.display "Hello, world!".
# Gurobi is licensed software. The following needs to be included in a user's .bash_profile in order for it to access tokens.
export GRB_LICENSE_FILE=/usr/local/easybuild/software/Gurobi/gurobi.lic
#!/bin/bash
#SBATCH -p cloud
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
module load Gurobi/7.0.1
export GRB_LICENSE_FILE=/usr/local/easybuild/software/Gurobi/gurobi.lic
time gurobi_cl misc07.mps
AWK examples, from Supercomputing with Linux, Lev Lafayette, VPAC, 2015
awk '$7=="A" { ++count } END { print count }' simple1.txt
awk '{sum+=$7} END {print sum}' simple2.txt
awk '{ for(i=1; i<=NF;i++) j+=$i; print j; j=0 }' simple3.txt
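These can be tried without the sample files. As a self-contained sketch, the column-sum one-liner applied to three lines of inline data (the field values are made up for illustration):

```shell
# Sum the seventh column of some inline sample data,
# analogous to the simple2.txt example above.
printf '%s\n' '1 2 3 4 5 6 10' '1 2 3 4 5 6 20' '1 2 3 4 5 6 12' \
  | awk '{sum+=$7} END {print sum}'    # prints 42
```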
LIMIT=19 # Upper limit
echo
echo "Printing Numbers 1 through 20 (but breaks loop at 3)."
count=0
while [ "$count" -le "$LIMIT" ]
do
count=$(($count+1))
if [ "$count" -gt 2 ]
then
break # Skip entire rest of loop.
fi
echo -n "$count "
done
echo; echo; echo
exit 0
LIMIT=19 # Upper limit
echo
echo "Printing Numbers 1 through 20 (but not 3 and 11)."
count=0
while [ $count -le "$LIMIT" ]
do
count=$(($count+1))
if [ "$count" -eq 3 ] || [ "$count" -eq 11 ] # Excludes 3 and 11.
then
continue # Skip rest of this particular loop iteration.
fi
echo -n "$count " # This will not execute for 3 and 11.
done
echo; echo; echo
exit 0
#!/bin/bash
# Search for email addresses in file, extract, turn into csv with designated file name
# Constants
INPUT=${1}
OUTPUT=${2}
# Filecheck Subroutine
filecheck() {
if [ ! $INPUT -o ! $OUTPUT ]; then
echo "Input file not found, or output file not specified. Exiting script."
exit 0
fi
}
# Search and Sort Subroutine
searchsort() {
grep --only-matching -E '[.[:alnum:]]+@[.[:alnum:]]+' $INPUT > $OUTPUT
sed -i 's/$/,/g' $OUTPUT
sort -u $OUTPUT -o $OUTPUT
sed -i '{:q;N;s/\n/ /g;t q}' $OUTPUT
}
# View and Print Subroutine
viewprint() {
echo "Data file extracted to" $OUTPUT
read -t5 -n1 -r -p "Press any key to see the list, sorted and with unique record"
if [ $? -eq 0 ]; then
if [ $? -eq 0 ]; then
exit
fi
less $OUTPUT | \
# Output file piped through sort and uniq.
# Show that line extension still works with comments.
sort | uniq
}
main() {
filecheck
searchsort
viewprint
}
# Main function
main
exit
#!/bin/bash
subroutineA() {
codeblock
}
subroutineB() {
codeblock
}
main() {
subroutineA
subroutineB
}
main
exit
Hello ${1} ${2}
# Capture value returned by last command
echo The name has this many characters $?
exit
# The following are simple examples of a "for" loop.
# You may require specific software installed e.g., ffmpeg, ImageMagick, LibreOffice, Calibre.
# Note the use of command substitution by using $(command); sometimes you will find the use of backticks instead (e.g., for i in * ; do mv $i `echo $i | tr "A-Z" "a-z"` ; done); this is not recommended.
for item in ./*.mp3 ; do ffmpeg -i "${item}" "${item/%mp3/ogg}" ; done
for item in ./*.jpeg ; do convert "$item" "${item%.*}.png" ; done
for item in ./*; do convert "$item" -define jpeg:extent=512kb "${item%.*}.jpg" ; done
for item in ./*.doc ; do /usr/bin/soffice --headless --convert-to-pdf "$item" ; done
for item in ./*.pdf ; do ebook-convert "$item" "${item}.mobi" ; done
# Loops can be applied in a step-wise manner.
$ cd ~/Genomics/shell_data
$ for filename in *.fastq
> do
> head -n 2 ${filename} >> seq_info.txt
> done
# Basename in a loop.
# Basename is removing a uniform part of a name from a list of files.
# In this case remove the .fastq extension and echo the output.
$ for filename in *.fastq
> do
> name=$(basename ${filename} .fastq)
> echo ${name}
> done
# What would happen if backticks were used instead of $() for shell substitution? What if someone mistook the backticks for single quotes?
for item in ./* ; do mv $item $(echo $item | tr "A-Z" "a-z") ; done
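A sketch of the answer: backticks behave the same for a single level of substitution, but they do not nest without escaping, and they are easily mistaken for single quotes, which suppress substitution entirely:

```shell
# $() nests cleanly:
echo "$(echo outer-$(echo inner))"   # prints outer-inner
# With backticks, the inner pair must be escaped: `echo outer-\`echo inner\``
# Single quotes perform no substitution at all:
echo '$(echo inner)'                 # prints the literal text $(echo inner)
```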
# What's wrong with spaces in filenames?
touch "This is a long file name"
for item in $(ls ./*); do echo ${item}; done
# The following example removes spaces from filenames. The quoting is designed to prevent word-splitting and wildcard expansion, but remember that a `mv` command will overwrite existing files that have the same name.
for item in ./*; do mv "$item" "$(echo "$item" | tr -d " ")"; done
# Finally a few simple examples of loops with conditional tests.
x=1; while [ $x -le 5 ]; do echo "While-do count up $x"; x=$(( $x + 1 )); done
x=5; until [ $x -le 0 ]; do echo "Until-do count down $x"; x=$(( $x - 1 )); done
x=1; until [ $x = 6 ]; do echo "Until-do count up $x"; x=$(( $x + 1 )); done
# A while loop that reads in data from a file and runs a command on that data.
# This is what we used to originally set quotas on home and project directories.
# The 'read' command reads one line from standard input or a specified file.
while read line; do sleep 5; ./setquota.sh $line; done < quotalist.txt
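The same pattern can be sketched self-contained, with echo standing in for the setquota.sh script and a throwaway data file:

```shell
# Build a throwaway quota list, then read it back line by line.
cd "$(mktemp -d)"
printf '%s\n' 'alice 10' 'bob 20' > quotalist.txt
while read -r line; do echo "would set quota: $line"; done < quotalist.txt
```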
# when searching for lines that contain a particular sequence in a file (e.g., from grep), reading those lines for processing can be accomplished with the something like the following:
grep sequence datafile.dat | while read -r line ; do
echo "Processing $line"
# Processing code #
done
# Curly braces are used to encapsulate statements or variables with {} or ${}
var # Set a variable
$var # Invoke the variable
${var}bar # Invoke the variable, append "bar".
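The braces matter as soon as the variable is followed immediately by more text:

```shell
var="foo"
echo "${var}bar"   # prints foobar: the braces delimit the variable name
echo "$varbar"     # prints an empty line: the shell looks up 'varbar'
```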
# Example of determining jobs running on a set of nodes.
for host in "spartan-rc"{001..010}; do squeue -w $host; done
A C C T A G T
C A A A G T A
C A T T A C C
A G T A C A A
1 2 3 4 5 6 7 8 9
2 3 4 5 6 7 8 9 10
3 4 5 6 7 8 9 11 12
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
#!/bin/bash
# Prevents use of Control-C to prematurely end important script.
# User can override if they're really, really sure.
ctrlc_count=0
function test_ctrlc()
{
let ctrlc_count++
echo
if [[ $ctrlc_count == 1 ]]; then
echo "Ctrl-C prevented unless you're sure."
elif [[ $ctrlc_count == 2 ]]; then
echo "Really sure?"
elif [[ $ctrlc_count == 3 ]]; then
echo "Really, really sure?"
else
echo "OK, you're really, really sure..."
exit
fi
}
trap test_ctrlc SIGINT
while true
do
echo "This is a sleeping loop. The loop that keeps on sleeping on."
sleep 2
done
exit
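A related common use of trap (a minimal sketch; the TMPFILE name is illustrative) is cleaning up a temporary file on any exit, including one triggered by Ctrl-C:

```shell
#!/bin/bash
# Remove the temporary file however the script terminates.
TMPFILE=$(mktemp)
trap 'rm -f "$TMPFILE"' EXIT
echo "working data" > "$TMPFILE"
# ... work with $TMPFILE; it is removed automatically on exit ...
```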
#!/bin/bash
# This is an abstract example of things that could go wrong!
#SBATCH --output=/home/example/data/output_%j.out
for file in /home/example/data/*
do
sbatch application ${file}
done
sinteractive --nodes=1 --ntasks-per-node=2 --time=0:10:0
# Example interactive job that specifies the cloud partition with X-windows forwarding, after logging in with secure X-windows forwarding. Note that X-windows forwarding is not generally recommended; try to do the compute on Spartan and the visualisation locally. If one absolutely has to visualise from Spartan, however, the following can be used.
ssh <username>@spartan.hpc.unimelb.edu.au -Y
ssh <username>@spartan.hpc.unimelb.edu.au -X
sinteractive -p cloud --x11=first
xclock
# If you are running interactive jobs on GPU partitions you have to include the appropriate partition and QOS options.
sinteractive --x11=first --partition=shortgpgpu --gres=gpu:p100:1
sinteractive --x11=first --partition=deeplearn --qos=gpgpudeeplearn --gres=gpu:v100:1
# If the user's local machine is not running Linux they will need to install an X-windows server, such as Xming for MS-Windows or XQuartz for macOS.
# If you need to download files whilst on an interactive job you must use the University proxy.
export http_proxy=http://wwwproxy.unimelb.edu.au:8000
export https_proxy=$http_proxy
export ftp_proxy=$http_proxy
The file `gattaca.txt` is used for diff examples in the Introductory course and
The file `default.slurm` uses all the default values for slurm on this system; cloud partition, one node, one task, one cpu-per-task, no mail, jobid as job name, ten minute walltime, etc.
The file `specific.slurm` runs on a specific node. The list may be specified as a comma-separated list of hosts, a range of hosts (host[1-5,7,...] for example), or a filename.
The file `filenames.md` gives some examples about filenaming conventions in UNIX-like systems.
#!/bin/bash
MYGROUP="$*"
if [ -z "$MYGROUP" ]; then MYGROUP=$(groups); fi
for group in $MYGROUP
do
if [[ $group == "gaussian" ]] || [[ $group == "rmit" ]] || [[ $group == "deakin" ]] || [[ $group == "vu" ]]; then continue; fi
QUOTASET=1;
PROJDIR="/data/cephfs/${group}"
CURQUOTABYTES=$(getfattr -n ceph.quota.max_bytes ${PROJDIR} --only-values --absolute-names 2>/dev/null)
if [ $? -ne 0 ]; then QUOTASET=0; fi
CURQUOTA=$((CURQUOTABYTES/(1000*1000*1000)))
CURUSAGEBYTES=$(ls -ldH ${PROJDIR} | awk '{print $5}')
CURUSAGE=$((CURUSAGEBYTES/(1000*1000*1000)))
if [ $QUOTASET -eq 1 ]
then
echo $group has used ${CURUSAGE}GB out of ${CURQUOTA}GB in ${PROJDIR}
else
echo $group has used ${CURUSAGE}GB in ${PROJDIR}. Currently no quota is set.
fi
done
#!/bin/bash
# Enter username to check; default is current user.
MYUSER=${1:-$USER}
ls -lhd --si ${MYUSER} | awk '{print $3 "," $5}'