Commit 0d726dc8 authored by Research Platforms

Add Tensorflow and Digits

parent 72479b77
## TensorFlow Benchmark Example
This example runs the TensorFlow benchmarks (for v1.8) on the Spartan GPGPU partition. By default it uses ResNet-50, a batch size of 64, and a whole node (4 GPUs and 24 CPUs), but these settings can be varied as needed.
As of 18 July 2018, this particular configuration was achieving about 730 images/second across 4 GPUs.
benchmarks @ 3b90c14f
Subproject commit 3b90c14fb2bf02ca5d27c188aee878663229a0a7
```
#!/bin/bash
#SBATCH --nodes 1
#SBATCH --partition gpgpu
#SBATCH --gres=gpu:p100:4
#SBATCH --time 01:00:00
#SBATCH --cpus-per-task=24

module load Tensorflow/1.8.0-intel-2017.u2-GCC-6.2.0-CUDA9-Python-3.5.2-GPU

cd benchmarks/scripts/tf_cnn_benchmarks
python tf_cnn_benchmarks.py --num_gpus=4 --batch_size=64 --model=resnet50 --variable_update=parameter_server
```
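The flags in the last line are the ones to vary. As a hedged sketch (assuming the `inception3` model name and flag spellings used by the tf_cnn_benchmarks suite in the tensorflow/benchmarks repository), an alternative invocation with a different model and a smaller batch size might look like:

```shell
# Sketch only: same benchmark script, different model and batch size.
# Flag names are those used by tf_cnn_benchmarks (tensorflow/benchmarks).
python tf_cnn_benchmarks.py \
    --num_gpus=4 \
    --batch_size=32 \
    --model=inception3 \
    --variable_update=parameter_server
```

Throughput (images/second) varies with the model and batch size, so re-check it after any change.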
## TensorFlow Example
This is a very simple example which shows how to use TensorFlow with the Spartan GPGPU partition. It requests a single CPU and NVidia P100 GPU, multiplies together two small matrices on the GPU, and prints the result. It will also print a little debug info showing that the calculation is being performed on the GPU (rather than CPU).
It can be submitted with the command `sbatch tensor_flow.slurm`.
You'll need access to the GPGPU partition before this example will work; see the Spartan documentation for details.
N.B. If you belong to multiple projects and the default one doesn't have access to the gpgpu partition, you might have to explicitly specify the project with `sbatch -A <project name> tensor_flow.slurm`.
This example is based on:
# Based on
```python
import tensorflow as tf

# Creates a graph -- force it to run on the GPU
with tf.device('/gpu:0'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)

# Creates a session with log_device_placement set to True,
# so the log shows which device each op runs on.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))

# Runs the op and prints the result.
print(sess.run(c))
```
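As a sanity check on the output, the same product can be computed with plain NumPy on the CPU (a minimal sketch; NumPy is assumed to be available, but it isn't required by the TensorFlow example itself):

```python
import numpy as np

# The same 2x3 and 3x2 matrices as in the TensorFlow example.
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0]).reshape(2, 3)
b = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0]).reshape(3, 2)

# Matrix product; should match what the GPU computes.
print(a @ b)  # [[22. 28.] [49. 64.]]
```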
```
#!/bin/bash
#SBATCH --nodes 1
#SBATCH --partition gpgpu
#SBATCH --gres=gpu:p100:1
#SBATCH --time 00:05:00
#SBATCH --cpus-per-task=1

module load Tensorflow/1.8.0-intel-2017.u2-GCC-6.2.0-CUDA9-Python-3.5.2-GPU
```
## DIGITS Spartan Example
DIGITS is a deep-learning package from Nvidia with a web-based GUI. This example shows you how to run it on Spartan.
1. As DIGITS makes use of GPUs, you'll first need access to our GPGPU partition. See:
2. Submit the job using `sbatch digits.slurm`. The example uses a whole node with 4 GPUs, with a wall time of 2 hours, but you can adjust to suit your needs.
3. Check the job status using `squeue -u your_username`. Once it starts (which might take some time if the queue is busy), take note of the node your job is running on, e.g. `spartan-gpgpu025`.
4. As this is an interactive web application, we need to open an SSH tunnel to the compute node so we can interact with it. You can do this with: `ssh -vNL 5000:spartan-gpgpu025:5000 your_username@spartan.hpc.unimelb.edu.au`
5. Navigate to `localhost:5000` in your browser, and start playing with DIGITS.
N.B. DIGITS runs in a container, which means the filesystem available to DIGITS differs from that of the host (i.e. Spartan). Your home directory (i.e. `/home/your_username`) is mapped across, however, so you can access your training data and models from there.
```
#!/bin/bash
#SBATCH --nodes 1
#SBATCH --cpus-per-task=12
#SBATCH --partition gpgpu
#SBATCH --gres=gpu:4
#SBATCH --time 02:00:00

module load Singularity
singularity exec --nv -B /tmp:/jobs -B /tmp:/scratch digits.img bash -c "export DIGITS_JOBS_DIR=/jobs && python -m digits"
```
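The `-B` flags above bind host directories into the container, and `DIGITS_JOBS_DIR` tells DIGITS where to keep its job state. Since `/tmp` is local to the compute node, a hedged variant (a sketch, assuming a hypothetical `digits-jobs` directory under your home directory) keeps that state somewhere persistent instead:

```shell
# Sketch only: bind a directory under $HOME (hypothetical path) as the
# DIGITS jobs directory, so job state survives beyond the node's /tmp.
mkdir -p $HOME/digits-jobs
singularity exec --nv -B $HOME/digits-jobs:/jobs -B /tmp:/scratch \
    digits.img bash -c "export DIGITS_JOBS_DIR=/jobs && python -m digits"
```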