
Migration Schedule

Last updated 8/13/2018

Partition          New name   # nodes   Migration start date
-----------------  ---------  --------  --------------------
ser-par-10g-4      default    184       8/20-21/2018
par-gpu-2          gpu        16        8/22/2018
ser-par-10g-3      default    48        8/23/2018
ser-par-10g-2      default    30        8/23/2018
ht-10g             default    4         8/24/2018
largemem-10g       default    4         8/24/2018
interactive-10g    default    3         8/24/2018
par-gpu            gpu        32        8/25/2018
hadoop-10g         hadoop     3         8/25/2018

Things you need to know about job submission during this transition:

  • Once the nodes of the partitions scheduled to be moved are put in drain state, the following will happen:

o   Currently running jobs will run to completion.

o   No new jobs will be launched.

o   Jobs submitted after the partition is in drain state will never launch on the old Discovery cluster.

  • Once the partition nodes are in drain state, please cancel all your pending jobs and relaunch them on the new Discovery (see the example below).
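
For example, pending jobs can be listed and cancelled with standard SLURM commands (run as your own user on the old cluster):

    squeue -u $USER --state=PENDING     # list your pending jobs
    scancel -u $USER --state=PENDING    # cancel all of your pending jobs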

 

It is highly advisable that you start familiarizing yourself with the new Discovery prior to this migration.

 

New Guide

New Discovery guide: last updated 8/13/2018

  • Prior to logging in to the new cluster, comment out all references to the old cluster in your .bashrc (see the sketch after this list), including:
    1. Automatic module loading
    2. Environment settings pointing to anything in /shared/apps/…, including but not limited to:
      • PATH
      • LD_LIBRARY_PATH
      • MANPATH
      • LICPATH
    3. This ensures that your environment is not polluted by settings that point to incompatible utilities.
    4. ssh to login.discovery.neu.edu.
    5. The login and job submittal nodes are now called login-00.discovery.neu.edu and login-01.discovery.neu.edu. These replace discovery2 and discovery4.
    6. Round-robin DNS allows you to use login.discovery.neu.edu for logging in.
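
For example, the old-cluster lines in a .bashrc might be commented out as below. The module name and tool paths are illustrative placeholders, not actual Discovery software:

    # module load some-old-module                             # automatic module loading on the old cluster
    # export PATH=/shared/apps/some-tool/bin:$PATH            # old software tree
    # export LD_LIBRARY_PATH=/shared/apps/some-tool/lib:$LD_LIBRARY_PATH
    # export MANPATH=/shared/apps/some-tool/man:$MANPATH
    # export LICPATH=/shared/apps/some-tool/license.dat       # license file location
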
  • Partitions on the new Discovery (see the table and example below):

 

Old name                         New name     Description                        Res/TRES   Features
-------------------------------  -----------  ---------------------------------  ---------  ----------------------
ser-par-10g-2, ser-par-10g-3,    general      A mix of Lenovo and Dell           cpu, mem   dell, lenovo, largemem
ser-par-10g-4, ht-10g,                        multi-core nodes
largemem-10g, interactive-10g
ser-par-10g-5                    fullnode     56-core machines                   cpu, mem   n/a
par-gpu-3                        multigpu     Multi-GPU nodes                    gpu, mem   n/a
hadoop-10g                       hadoop       Hadoop                             n/a        n/a
interactive-ib, parallel-ib      infiniband   InfiniBand nodes                   n/a        n/a
test_phi                         phi          Nodes with a Xeon Phi processor    n/a        n/a
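
Once logged in to the new cluster, the available partitions and their node counts can be checked with sinfo, for example (the partition name is illustrative):

    sinfo -s               # one line per partition, with aggregate node counts and states
    sinfo -p multigpu      # detailed node states for a single partition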

 

  • Submitting jobs
    1. Basic syntax (sbatch, srun) will be the same.
    2. Specific resources, such as GPU count, CPU count, memory, and licenses, can be requested using GRES (generic resources), TRES (trackable resources), and features. These allow the user to request specific components to be available on the node that will run their job. The presence of those components is defined in the node definitions within slurm.conf (see the sketch below).
    3. It will not be necessary to ssh -X to the nodes, as SLURM will provide X11 forwarding.
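
A minimal sketch of a batch script using these requests; the partition, resource counts, and program name are illustrative placeholders, not a prescribed configuration:

    #!/bin/bash
    #SBATCH --partition=multigpu     # new partition name from the table above
    #SBATCH --nodes=1
    #SBATCH --ntasks=1
    #SBATCH --gres=gpu:1             # generic resource: request one GPU on the allocated node
    #SBATCH --mem=8G                 # memory is tracked as a TRES
    #SBATCH --time=01:00:00

    srun ./my_program                # my_program is a placeholder for your executable

For interactive graphical work, SLURM's built-in X11 forwarding can be used in place of ssh -X, for example:

    srun --x11 --pty /bin/bash       # interactive shell on a compute node with X11 forwarding
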
  • Transferring data
    1. xfer-00.discovery.neu.edu will be used for transferring data to and from the cluster (see the examples below).
    2. Transfers can be done in the same manner as on the login nodes: scp, rsync, FileZilla, WinSCP.
    3. It will also host a squid proxy server, aliased squid.discovery.neu.edu, for internet access from the compute nodes.
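
For example (the username and paths below are placeholders):

    # Pull results from scratch on the cluster down to your workstation
    rsync -avz username@xfer-00.discovery.neu.edu:/scratch/username/results/ ./results/

    # Push an input file from your workstation to your home directory on the cluster
    scp input.dat username@xfer-00.discovery.neu.edu:~/
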
  • Software
    1. Software resides in /shared/centos7.
    2. Legacy software will continue to be available in /shared/apps and can be made accessible by loading the “legacy” module (see the example below).
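
A sketch of how this might look in a login session:

    module avail           # should list the software built for CentOS 7 under /shared/centos7
    module load legacy     # adds the legacy /shared/apps software to your environment
    module avail           # the legacy modules should now be listed as well
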
  • Storage
    1. Your old home directory under /home will be the same on the new Discovery.
    2. GPFS is still used for scratch, with one minor change:
    3. /gss_gpfs_scratch is now mounted as /scratch. All the data that existed on /gss_gpfs_scratch should now be available on /scratch.
    4. Please replace /gss_gpfs_scratch with /scratch in all of your sbatch scripts, otherwise your jobs will fail (see the sketch after this list).
  • More SLURM-specific information can be found at https://slurm.schedmd.com/
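
To update existing job scripts in place, a one-line substitution such as the following can be used; the *.sbatch filename pattern is only an example, so adjust it to match your own scripts (the .bak suffix keeps a backup copy of each file):

    # Replace the old scratch path with the new mount point in all matching scripts
    sed -i.bak 's|/gss_gpfs_scratch|/scratch|g' *.sbatch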

Migration Info

8/14/2018

Dear Individual PI Partition owners,

The Research Computing team is preparing the migration of the individual PI partitions to the new Discovery cluster.  The new Discovery cluster has significant improvements in both functional and operational capabilities.

Changes / Features

  • SLURM updated to the current release (slurm-17.11.6)
  • Node resources will be individually allocatable via generic resources and trackable resources (GRES and TRES), e.g. GPU, Phi, memory
  • Base OS updated to CentOS 7.5 from CentOS 6.x
  • Software (latest versions) has been compiled for CentOS 7.5 and is available at /shared/centos7
  • Currently available software in /shared/apps can be made available by adding the ‘legacy’ module (module add legacy)
  • Login node change, new nodes: login-00.discovery.neu.edu, login-01.discovery.neu.edu
  • X11 forwarding for jobs is supported by SLURM
  • Users will not be able to ssh to nodes unless they have a SLURM allocation on the node in question
  • A file transfer node is available for transferring files to and from the Discovery cluster

We need to hear from you to schedule this phase of the Discovery cluster migration.  Please let us know if you have any factors limiting the migration of your partitions.  RC’s goal is to complete this process by the end of September.  By meeting this deadline, RC will be able to focus its efforts on the new Discovery rather than supporting the legacy cluster.  Also, please review the new Discovery guide above for cluster changes and access information.

What you need to know about job submission during this transition:

  • Once the nodes of the partitions scheduled to be moved are put in drain state, the following will happen:

o   Currently running jobs will run to completion.

o   No new jobs will be launched.

o   Jobs submitted after the partition is in drain state will never launch on the old Discovery cluster.

  • Once the partition nodes are in drain state, please cancel all your pending jobs and relaunch them on the new Discovery.

It is highly advisable that you start familiarizing yourself with the new Discovery prior to this migration.

The general partition migration schedule can be found in the Migration Schedule section above.

 

The RC team