Home

Training

Training is now available!

Sign up for training at northeastern.gosignmeup.com

Then select courses under “Research Computing”

Introduction

Introduction to Using the Discovery Cluster at NU
(Our thanks to Paul Whitford for sharing this.)

Open a ServiceNow ticket for questions and requests.

Note on what NOT to do:

Please do not run ANY jobs on the login nodes!  That is not what those nodes are for.  Please use srun to allocate an interactive node or sbatch to run jobs in batch mode on a compute node.

If you run jobs on the login nodes, your access to the cluster may be denied, because doing so causes a serious disruption for the entire research community.

Thank you for your understanding.

Instead of using salloc and then ssh, try:

srun --pty /bin/bash

This will log you directly into a compute node.
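
As a minimal sketch, the same interactive request can also name a partition, time limit, and memory; the partition and resource values below are illustrative examples, not required defaults:

    # Request an interactive shell on a compute node (all values are examples).
    # -p general       : partition to run in (see the partition table below)
    # --time=01:00:00  : one-hour wall-clock limit
    # --mem=4G         : 4 GB of memory for the session
    srun -p general --time=01:00:00 --mem=4G --pty /bin/bash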

Technical Guide

New Discovery guide (last updated 9/25/2018)

  • Prior to logging in to the new cluster, comment out all references to the old cluster in your .bashrc, including:

    1. Automatic module loading
    2. Environment settings that point to anything in /shared/apps/…, including but not limited to:

      • PATH
      • LD_LIBRARY_PATH
      • MANPATH
      • LICPATH
    3. This will ensure that your environment is not polluted by settings that point to incompatible utilities.  A sketch of such .bashrc edits appears after this list.
    4. ssh to login.discovery.neu.edu
    5. The login and job-submission nodes will now be called login-00.discovery.neu.edu and login-01.discovery.neu.edu.  These replace discovery2 and discovery4.
    6. Round-robin DNS will allow you to use login.discovery.neu.edu for logging in.
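
    A minimal sketch of what those .bashrc edits might look like; the module name and /shared/apps paths below are hypothetical examples of old-cluster settings, not an exact copy of any real file:

      # Old-cluster settings commented out before logging in to the new Discovery.
      # The module name and /shared/apps paths are hypothetical examples.
      #module load example-compiler-module
      #export PATH=/shared/apps/example-tool/bin:$PATH
      #export LD_LIBRARY_PATH=/shared/apps/example-tool/lib:$LD_LIBRARY_PATH
      #export MANPATH=/shared/apps/example-tool/man:$MANPATH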
  • Partitions on new discovery

 

Old name | New name | Description | res/tres | Features
ser-par-10g-2, ser-par-10g-3, ser-par-10g-4, ht-10g, largemem-10g, interactive-10g | general, largemem | A mix of Lenovo and Dell multi-core nodes | cpu, mem | dell, lenovo
ser-par-10g-5 | fullnode | 56-core machine | cpu, mem | n/a
par-gpu-3 | multigpu | Multi-GPU nodes | gpu, mem | n/a
hadoop-10g | hadoop | Hadoop | n/a | n/a
interactive-ib, parallel-ib | infiniband | InfiniBand nodes | n/a | n/a
test_phi | phi | Nodes with a Xeon Phi processor | n/a | n/a
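
  For example, a partition can be selected by its new name when submitting a job (the script name below is a placeholder):

    # Show the available partitions and their current state.
    sinfo
    # Submit a batch script to the general partition (script name is an example).
    sbatch -p general myjob.sbatch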

  • Submitting jobs
    1. Basic syntax (sbatch, srun) will
      be the same.
    2. Specific resources, such as GPU count, CPU count, memory, and licenses, can be requested using gres (generic resources), tres (trackable resources), and features.  These allow the user to request that specific components be available on the node that will run their job.  The presence of those components is defined in the node definitions within slurm.conf.

      1. In order to use GPUs, the number of GPUs must be requested (a sample batch script appears after this list):
        • interactive with N gpus: srun -p gpu --gres=gpu:N --pty /bin/bash
        • batch with N gpus: sbatch -p gpu --gres=gpu:N sbatchscript
    3. It will not be necessary to ssh -X to the nodes; Slurm will provide X11 forwarding.
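
    A minimal sketch of a batch script that requests GPUs with --gres; the partition, resource values, and program name are placeholders, not required settings:

      #!/bin/bash
      #SBATCH -p gpu                 # partition (example; pick the one appropriate for your job)
      #SBATCH --gres=gpu:2           # request 2 GPUs
      #SBATCH --cpus-per-task=4      # request 4 CPU cores
      #SBATCH --mem=16G              # request 16 GB of memory
      #SBATCH --time=04:00:00        # four-hour wall-clock limit
      #SBATCH -o myjob.%j.out        # write output to myjob.<jobid>.out (name is an example)

      ./my_gpu_program               # placeholder for your own executable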
  • Transferring data
    1. xfer-00.discovery.neu.edu will be used for transferring data to and from the cluster.
    2. Transfers can be done in the same manner as on the login nodes: scp, rsync, FileZilla, WinSCP (examples below).
    3. It will also host a Squid server, aliased squid.discovery.neu.edu, for internet access from the compute nodes.
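
    As illustrative examples (the username and paths are placeholders), data can be copied through the transfer node like this:

      # Copy a file to your home directory on the cluster (username and paths are examples).
      scp results.tar.gz username@xfer-00.discovery.neu.edu:/home/username/
      # Sync a local directory to scratch, resuming partial transfers.
      rsync -avP ./dataset/ username@xfer-00.discovery.neu.edu:/scratch/username/dataset/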
  • Software
    1. Software resides in /shared/centos7
    2. Legacy software will continue to be available in /shared/apps and can be made accessible by loading the “legacy” module (example below).
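
    A short sketch, assuming the module setup described above:

      # List the modules available under /shared/centos7.
      module avail
      # Make the legacy /shared/apps software tree accessible, per the note above.
      module load legacy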
  • Storage
    1. Your old home directory under /home will be the same on the new discovery.
    2. GPFS is still used for scratch, with a minor change:
    3. /gss_gpfs_scratch is now mounted as /scratch.  All the data that existed on /gss_gpfs_scratch should now be available on /scratch.
    4. Please replace /gss_gpfs_scratch with /scratch in all of your sbatch scripts, otherwise your jobs will fail (a one-line sketch for doing this follows this list).
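
    A minimal sketch of one way to update existing scripts in place; the *.sbatch glob is an example, and sed -i.bak keeps a .bak backup copy of each file:

      # Replace the old scratch path with the new one in all matching sbatch scripts.
      sed -i.bak 's#/gss_gpfs_scratch#/scratch#g' *.sbatch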
  • More SLURM-specific information
    can be found here: https://slurm.schedmd.com/