Preparing a GCP VM 

In this tutorial we prepare a virtual machine on the Google Cloud Platform (GCP) ready to run GX. It is assumed that the reader already has an account on GCP and is familiar with creating a virtual machine instance.

Contents

Preparing a GCP VM

Creating the Virtual Machine 

This tutorial, the provided Makefiles/Makefile.gcpa2 and the benchmarks in benchmarks/ have been tested with GCP virtual machines created from scratch with the following specifications:

Machine configuration:
- Machine type (single GPU): a2-highgpu-1g: 12 vCPU, 6 core, 85 GB memory
- Machine type (dual GPU): a2-highgpu-2g: 24 vCPU, 12 core, 170 GB memory
- Architecture: x86
- GPU type: NVIDIA A100 40GB
- Display device: none
Boot disk:

Operating system: Ubuntu

Version: Ubuntu 22.04 LTS, x86/64, amd64 jammy image

Boot disk type: Standard persitent disk or faster

Size: At least 50 GB

Although not tested yet, this tutorial could probably work on a different virtual machine configurations, including on H100-based machine, with minimal tweaks.

Installing Packages 

NVIDIA HPC SDK 

We are using version 23.11 for this tutorial. The makefile and the suggested modification to various configuation files in this tutorial are highly dependant on the selected version. See https://developer.nvidia.com/hpc-sdk-downloads for additional details. Run the following commands one by one in your shell:

$ curl https://developer.download.nvidia.com/hpc-sdk/ubuntu/DEB-GPG-KEY-NVIDIA-HPC-SDK | sudo gpg --dearmor -o  /usr/share/keyrings/nvidia-hpcsdk-archive-keyring.gpg
$ echo 'deb [signed-by=/usr/share/keyrings/nvidia-hpcsdk-archive-keyring.gpg] https://developer.download.nvidia.com/hpc-sdk/ubuntu/amd64 /' | sudo tee /etc/apt/sources.list.d/nvhpc.list
$ sudo apt-get update -y
$ sudo apt-get install -y nvhpc-23-11

NVIDIA Drivers 

We are using the recommended drivers. Run the following commands one by one in your shell:

$ sudo apt install -y make
$ sudo apt install -y ubuntu-drivers-common
$ sudo ubuntu-drivers autoinstall

Verify that drivers are correctly installed. You should see a table with the detected GPU configuration with the following command:

$ sudo nvidia-smi

If you have multiple GPUs, you can verify that their interconnection is correctly usign NVLink. You should see NVxy bettween GPUs by running the following command:

$ sudo nvidia-smi topo -m

HDF5 

Using the parallel version. Run the following command:

$ sudo apt install -y libhdf5-mpi-dev

NetCDF-4 

Using the parallel version. Run the following command:

$ sudo apt install -y libnetcdf-mpi-dev

GSL 

Run the following command:

$ sudo apt install -y libgsl-dev

MPI 

The NVIDIA HPC SDK already includes a version of OpenMPI. So no need to install an extra one.

Python Environnment 

We will use Anaconda to prepare a Python environment suitable for GX.

Verify the latest version of Anaconda for Ubuntu x86 at https://repo.anaconda.com/archive/. Assuming aconda3-2024.02-1-Linux-x86_64.sh, from your home directory ~/ run the following command:

$  wget https://repo.anaconda.com/archive/Anaconda3-2024.02-1-Linux-x86_64.sh

And then install the package with:

$ bash Anaconda3-2024.02-1-Linux-x86_64.sh

Follow instruction to accept the license. Use the default install directory. Answer yes when suggested to modify your profile (last question).

Prepare the Python environnement with the commands:

$ source .bashrc
$ conda create -n gxenv python=3.11 numpy matplotlib scipy netCDF4 sphinx sphinx_rtd_theme

And then activate the environment with:

$ conda activate gxenv

Final Configuration 

Add the following lines to you .bashrc file.

#For GX and NVIDIA HPC SDK
export LD_LIBRARY_PATH=/opt/nvidia/hpc_sdk/Linux_x86_64/2023/math_libs/lib64
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/nvidia/hpc_sdk/Linux_x86_64/2023/comm_libs/nccl/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/nvidia/hpc_sdk/Linux_x86_64/2023/cuda/lib64
PATH=$PATH:/opt/nvidia/hpc_sdk/Linux_x86_64/2023/comm_libs/mpi/bin
export GK_SYSTEM='gcpa2'
conda activate gxenv

If not already existing, create the following configuration file ~/.config/matplotlib/matplotlibrc. Add the lines

#backend as default one does not seems to load
backend : TkAgg

Completing Setup 

Reboot the virtual machine with:

$ sudo reboot

Once rebooted, you can reconnect.

Compiling and Running GX 

Get GX source code with:

$ git clone https://bitbucket.org/gyrokinetics/gx

From within gx directory, build with:

$ make

To run gx on dual GPU, you will have to use the command:

$ mpiexec -n 2 path\gx