Computation Tutorials

Getting Access to JHU-MARCC (pdf download)

Congratulations on your approved PARADIM proposal. This tutorial explains how to get access to the JHU-MARCC high-performance computing center with an approved PARADIM proposal. If you do not have an approved project and are interested in submitting a two-page proposal, please use the link: https://www.paradim.org/project proposals.

For technical assistance with your calculations, consult the PARADIM theory tutorials or the PARADIM computation support forum, or reach out to our PARADIM theory staff.

Step 1: MARCC Account Request

Please fill out and submit the online application form using the web link below to request a MARCC account:

http://www.marcc.jhu.edu/request-access/request-an-account/


A PARADIM-specific sample form is shown below:

First Name – yourfirstname

Last Name – yourlastname

JHED ID or University Directory ID - youruserid@youruniversity.edu

University – JHU (if your university is not listed)

Contact Email – youruserid@youruniversity.edu

Phone – yourphonenumber

Name of your sponsor (Faculty member) – Tyrel McQueen

Email for sponsor – tmcquee2@jhu.edu


Comments related to your request – To request an account under the PARADIM allocation

Password – yourtemporarypassword

MARCC staff will set up an account for you and send you a username and a link to create a permanent password.

 

Step 2: Set up MARCC Account Password

Use the link provided in the email you received after the MARCC account was created, or reset your password at https://password.marcc.jhu.edu/?action=sendtoken

 

 

Enter your information:

Login - emaileduserid@jhu.edu (this was emailed to you after the account was created)

Mail – youruserid@youruniversity.edu (same email address used in the account request)

Check your email for confirmation.
 

Step 3: Two-Factor Authentication

MARCC requires two-factor authentication using Google Authenticator.

Follow the instructions on MARCC’s website for setup
http://www.marcc.jhu.edu/getting-started/connecting-to-marcc/#multifactor
or simply retrieve the code if you already know how to use it
https://password.marcc.jhu.edu/?action=qrretrieve

Set up two-factor authentication on your smartphone by scanning the QR code or entering the key manually.


Step 4: Log in to MARCC Account

Now that your user account has been created on MARCC:

Your login should be youremaileduserid@jhu.edu

You can connect to MARCC with the command: ssh youremaileduserid@jhu.edu@gateway2.marcc.jhu.edu
(If you do not know how to ssh from your computer see the summer school tutorial.)

When asked for the verification code: type in the six-digit number (no blank space) from Google Authenticator.

Type in the password that you have set up.

The two most common reasons for failed login attempts:

1) You did not wait for a new verification code in between attempts. Each code is only valid once.

2) The codes are generated based on the time, so if your phone clock is even a few minutes off, it can cause issues.
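The second reason follows from how time-based codes are computed. The following is a minimal sketch, assuming the Google Authenticator defaults (RFC 6238 TOTP with HMAC-SHA1, 30-second time steps, six digits). The secret is the public RFC test key, not a MARCC key, and the function name totp is only illustrative. The code depends on nothing but the shared secret and the current 30-second window, so a phone clock that is off by a few minutes produces a code for a different window than the one the server checks.

```python
import hashlib
import hmac
import struct


def totp(secret: bytes, unix_time: int, step: int = 30, digits: int = 6) -> str:
    """Return the time-based one-time code for a given Unix time (RFC 6238, SHA-1)."""
    counter = unix_time // step  # index of the current time window
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation as defined in RFC 4226
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


secret = b"12345678901234567890"  # RFC 6238 test secret, for illustration only
server_time = 59                  # time on the server, in seconds
phone_time = server_time + 180    # a phone whose clock runs 3 minutes fast

print("server expects:", totp(secret, server_time))
print("phone shows:   ", totp(secret, phone_time))
```

The two printed codes differ, which is why a login attempt fails when the clocks disagree by more than the server tolerates.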

 

MARCC regularly hosts training sessions for new users. Please register at:

https://www.marcc.jhu.edu/training/intro-sessions/

For information about tutorials, tools, tricks, and tips that new MARCC users may find helpful, please visit: https://www.marcc.jhu.edu/training/tutorial-series/

 

Running Jobs at Computational Facilities (pdf download)

This tutorial discusses 1) how to submit jobs at JHU-MARCC with a sample job submission file, 2) how to run interactive jobs at JHU-MARCC, 3) useful commands related to job submission, 4) useful Linux commands, and 5) how to submit jobs at XSEDE with a sample job submission file.

1. How to Submit Jobs at JHU-MARCC

MARCC policy states that all users must submit jobs to the scheduler for processing. Interactive use of login nodes for job processing is not allowed.

MARCC uses:

  • The SLURM resource manager to handle resource scheduling and job submission.
  • Partitions (separate job queues) to divide jobs by type: sequential/shared computing, parallel, GPU, and large-memory jobs.

For the complete list of partitions available for users please visit:

https://www.marcc.jhu.edu/getting-started/running-jobs/

If you need further information or assistance, please contact MARCC support at

marcc-help@marcc.jhu.edu

or

reach us via the PARADIM computation support forum:

http://forums.paradim.org/forums/forum/theory-forum-cau/
 

Sample job submission scripts

A sample script to run a Quantum Espresso job in the parallel partition, using 24 cores on a single node with 5000 MB of memory per core, would look like this:

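#!/bin/bash -l
# Request 24 MPI tasks on one node of the parallel partition for 30 minutes,
# with 5000 MB of memory per core.
#SBATCH --job-name=myjob-1
#SBATCH --time=00:30:00
#SBATCH --partition=parallel
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=24
#SBATCH --mem-per-cpu=5000MB

# Run pw.x on 24 cores, reading silicon.in and writing silicon.out
mpirun -np 24 pw.x < silicon.in > silicon.out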

Jobs are usually submitted as a script file using the sbatch command:

$ sbatch my-script
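When scanning core counts or wall times, it can be convenient to generate the script rather than edit it by hand. The following is a minimal sketch, not a MARCC tool: the function name make_job_script and its parameters are hypothetical, and it only reproduces the directives of the sample script above. It also makes explicit that --mem-per-cpu is multiplied by the number of tasks, and that there must be no space after the "=" in a directive.

```python
def make_job_script(job_name, ntasks, mem_per_cpu_mb, walltime="00:30:00",
                    partition="parallel", infile="silicon.in",
                    outfile="silicon.out"):
    """Return the text of a single-node SLURM script for a pw.x run (hypothetical helper)."""
    lines = [
        "#!/bin/bash -l",
        f"#SBATCH --job-name={job_name}",  # no space after '='
        f"#SBATCH --time={walltime}",
        f"#SBATCH --partition={partition}",
        "#SBATCH --nodes=1",
        f"#SBATCH --ntasks-per-node={ntasks}",
        f"#SBATCH --mem-per-cpu={mem_per_cpu_mb}MB",
        "",
        f"mpirun -np {ntasks} pw.x < {infile} > {outfile}",
    ]
    return "\n".join(lines) + "\n"


script = make_job_script("myjob-1", ntasks=24, mem_per_cpu_mb=5000)
total_mem_mb = 24 * 5000  # memory per core times number of cores
print(script)
print(f"Total memory requested: {total_mem_mb} MB")
```

The returned text can be written to a file such as my-script and submitted with sbatch as shown above.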

 

2. How to Run Interactive Jobs at JHU-MARCC

Users who need to interact with their codes while these are running can request an interactive session. This will submit a request to the queuing system that will allow interactive access to the node.

If you would like an interactive session, you can use the following command

$ interact -p parallel -n 24 -c 1 -t 60 -m 5G

Here we are requesting 24 CPUs, since -n 24 is the number of tasks and -c 1 is the number of cores per task. We are asking for a 60-minute session (-t 60) with a total memory of 5 GB (-m 5G).

This command opens a session where we will be able to execute pw.x directly from the command line without a job submission script.

$ mpirun -n 24 pw.x < silicon.in > silicon.out


3. Useful Commands for Job Submission

$ sbatch my-script              submit a job script
$ squeue                        list all jobs
$ sqme                          list all jobs belonging to the current user
$ squeue -u [userid]            list jobs by user
$ squeue -j [job-id]            check job status
$ scancel [job-id]              delete a job
$ scontrol hold [job-id]        hold a job
$ scontrol release [job-id]     release a held job
$ sacct                         show finished jobs

*Users can still use Torque commands such as qsub, qdel, and qstat.
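These commands can also be driven from a script. As a hedged sketch: sbatch normally confirms a successful submission with a line of the form "Submitted batch job <id>". The helper names parse_job_id and submit are hypothetical, and submit assumes that sbatch is on the PATH, that is, that it is run on a MARCC login node.

```python
import re
import subprocess


def parse_job_id(sbatch_output: str) -> int:
    """Extract the numeric job ID from sbatch's confirmation message."""
    match = re.search(r"Submitted batch job (\d+)", sbatch_output)
    if match is None:
        raise ValueError(f"unexpected sbatch output: {sbatch_output!r}")
    return int(match.group(1))


def submit(script_path: str) -> int:
    """Submit a job script with sbatch and return its job ID (requires SLURM)."""
    result = subprocess.run(["sbatch", script_path], capture_output=True,
                            text=True, check=True)
    return parse_job_id(result.stdout)


# Parsing a sample confirmation message; no scheduler is needed for this line.
print(parse_job_id("Submitted batch job 123456"))
```

The returned ID is the [job-id] expected by scancel and the scontrol commands listed above.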

 

4. Useful Linux Commands

  • mkdir – make directories
    Usage: mkdir [OPTION] DIRECTORY…
    eg. mkdir paradim
     
  • cd – change directories
    Usage: cd [DIRECTORY]
    eg. cd paradim
  • ls – list directory contents
    Usage: ls [OPTION]… [FILE]…
    eg. ls, ls -l, ls paradim
     
  • pwd – print name of current working directory
    Usage: pwd
     
  • vim – Vi Improved, a programmer’s text editor
    Usage: vim [OPTION] [file]…
    eg. vim myscript.txt
     
  • cp – copy files and directories
    Usage: cp [OPTION]… SOURCE DEST
    eg. cp myscript.txt myscript_duplicate.txt, cp -r directory1 directory2
     
  • mv – rename/move files
    Usage: mv [OPTION]… SOURCE DESTINATION
    eg. mv myscript.txt directory
    eg. mv myoldscript.txt mynewscript.txt
     
  • rm – remove files or directories
    Usage: rm [OPTION]… FILE…
    eg. rm myoldscript.txt, rm -rf directory
     
  • find – search for files in a directory hierarchy
    Usage: find [OPTION] [path] [pattern]
    eg. find myscript.txt, find -name myscript.txt
     
  • history – prints recently used commands
    Usage: history
     
  • ps – report a snapshot of the current processes
    Usage: ps [OPTION]
    eg. ps, ps -el
     
  • kill – to kill a process
    Usage: kill [OPTION] pid
    eg. kill -9 2275


5. How to Submit Jobs at XSEDE

Allocations on the Extreme Science and Engineering Discovery Environment (XSEDE) are available for collaborative research only. Please contact the PARADIM theory staff (listed at /people) to get access to XSEDE allocations.

If you already have an allocation with XSEDE, the following is a sample job script for submitting jobs.

#!/bin/bash

#SBATCH -J myMPI                # job name

#SBATCH -o myMPI.o%j            # output and error file name (%j expands to jobID)

#SBATCH -n 32                   # total number of mpi tasks requested

#SBATCH -p development          # queue (partition) -- normal, development, etc.

#SBATCH -t 01:30:00             # run time (hh:mm:ss) - 1.5 hours

#SBATCH --mail-user=username@tacc.utexas.edu

#SBATCH --mail-type=begin       # email me when the job starts

#SBATCH --mail-type=end         # email me when the job finishes

ibrun ./pw.x                   # run the MPI executable named pw.x

 

Quantum Espresso (QE) is an integrated suite of Open-Source computer codes for electronic-structure calculations and materials modeling. It is based on density-functional theory, plane waves, and pseudopotentials. In this set of tutorials, you will learn how to run the essential calculations based on density functional theory as implemented in QE on JHU-MARCC, a shared computing facility. You can practice running calculations in various listed topics starting with a test run.

QE is an Open Source distribution. The primary references of the QE code are the articles:

P. Giannozzi et al., J. Phys.: Condens. Matter 21, 395502 (2009)

P. Giannozzi et al., J. Phys.: Condens. Matter 29, 465901 (2017)
 

 

  • How to Configure (pdf download)
  • Input and Convergence Parameters (pdf download)
  • Test Run (pdf download)
    In this tutorial we will prepare a simple job and execute it on JHU-MARCC. The goal is to make sure that we have configured Quantum Espresso properly and everything runs well. We will run a simple total energy calculation (scf calculation) for silicon in the diamond structure. We will also learn how to use the command line to create an input file and a job submission file to submit our job to the queue. 
  • Convergence Parameters of DFT Calculations (pdf download)
    In this tutorial we will explore two important convergence parameters of DFT calculations, the planewave kinetic energy cutoff ecutwfc, and the Brillouin zone sampling k-points. As an example, for silicon, we study how the total energy, the number of planewaves, and the timing vary as a function of the planewaves cutoff ecutwfc, and how the total energy of silicon varies with the number of k-points. As an exercise, we also explore the scaling of DFT calculations as a function of system size.
  • Equilibrium Structure of a Diatomic Molecule and a Bulk Crystal (pdf download)
    Among all possible structures, the equilibrium structure at zero temperature and zero pressure is found by minimizing the DFT total energy. In this tutorial we will learn the concept of calculating the equilibrium structure. We will calculate 1) the equilibrium structure and the binding energy of a diatomic molecule using the Cl2 molecule as an example, 2) the equilibrium structure and the cohesive energy of a bulk crystal using silicon, and 3) the equilibrium lattice parameter of silicon, diamond, and graphite.
  • Automatic Optimization of Crystal Structure and Elastic Constant (pdf download)
    In this tutorial we will 1) automatically optimize the atomic coordinates by using calculation type relax and 2) automatically optimize the unit cell by using calculation type vc-relax. We will then familiarize ourselves with calculation of bulk modulus and the elastic constant, using diamond as a test case. As an exercise, we will set up a new calculation on SrTiO3 starting with a simple input file for the material. To find the initial geometry for the unit cell and atomic coordinates, we search the Materials Project Database.
  • Phonon Dispersion (pdf download)
    In this tutorial we will learn how to calculate 1) the vibrational frequencies of a diatomic molecule, Cl2, 2) phonon dispersion relations of diamond, GaAs, and SrTiO3, and 3) LO-TO splitting, IR activity, and low-frequency dielectric constants of a polar semiconductor, GaAs.
  • Band Structure and UV/VIS Spectra (pdf download)
    In this tutorial we will explore how to 1) calculate the band structure and visualize the wavefunctions corresponding to selected Kohn-Sham eigenvalues of silicon, and 2) calculate the band structure and the corresponding optical absorption spectrum (UV/Vis spectra) of GaAs to obtain the imaginary part of the dielectric function, ε2(ω), which is related to the optical absorption coefficient κ(ω).
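The convergence tutorial listed above amounts to repeating the same scf calculation while changing one parameter at a time. A minimal sketch of generating pw.x inputs for a cutoff scan is shown below. The lattice parameter, k-point grid, pseudopotential file name, directory paths, and cutoff values are illustrative assumptions rather than values taken from the PARADIM tutorial files, and make_input is a hypothetical helper.

```python
# Template for a silicon scf input; only ecutwfc and the k-point grid vary.
TEMPLATE = """&control
  calculation = 'scf'
  prefix = 'silicon'
  pseudo_dir = './'
  outdir = './tmp'
/
&system
  ibrav = 2
  celldm(1) = 10.28
  nat = 2
  ntyp = 1
  ecutwfc = {ecutwfc}
/
&electrons
/
ATOMIC_SPECIES
  Si 28.086 Si.pz-vbc.UPF
ATOMIC_POSITIONS alat
  Si 0.00 0.00 0.00
  Si 0.25 0.25 0.25
K_POINTS automatic
  {nk} {nk} {nk} 1 1 1
"""


def make_input(ecutwfc, nk=4):
    """Return pw.x input text for silicon at a given cutoff (Ry) and k-point grid."""
    return TEMPLATE.format(ecutwfc=ecutwfc, nk=nk)


# One input per cutoff value; the file names are arbitrary.
inputs = {f"silicon_ecut{ecut}.in": make_input(ecut) for ecut in (10, 20, 30, 40)}
for name in inputs:
    print(name)
```

Each text can be written to its file name and run with pw.x in turn; the total energy reported in each output is then compared across cutoffs.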

 

YAMBO is a code for Many-Body calculations in solid state and molecular physics to accurately determine excited states. YAMBO relies on the Kohn-Sham wavefunctions generated by DFT codes such as Quantum Espresso (QE). In this set of tutorials, you will first learn how to configure YAMBO with QE on JHU-MARCC and then you can practice running a range of calculations covering various topics. Tutorials include ground state calculation, file conversion, quasiparticle GW band structure calculation, and the calculation of optical absorption spectra using the Bethe-Salpeter equation (BSE).

The YAMBO code was originally developed in the Condensed Matter Theoretical Group of the Physics Department at the University of Rome "Tor Vergata" by Andrea Marini.

The primary reference for YAMBO code is the article:

Yambo: an ab initio tool for excited state calculations, Andrea Marini, Conor Hogan, Myrta Grüning, Daniele Varsano, Comp. Phys. Comm. 180, 1392 (2009).

 

  • How to Install (pdf download)
  • Ground State Calculation as Starting Point for YAMBO (pdf download)
    The initial step is to generate the ground state wavefunction for the proposed system using Quantum Espresso. In this tutorial we are going to use single layer MoS2 as an example material for our QE ground state calculation. We discuss the specific input parameters used for both self-consistency (SCF) and non-self-consistency (NSCF) runs to obtain the ground state wavefunction, which we will then use for YAMBO calculations. Please go to the Quantum Espresso Tutorial if you need further information about QE.
  • File Conversion from QE to YAMBO (pdf download)
    In this tutorial we are going to learn how to convert QE-generated wavefunctions into YAMBO-readable format. These wavefunctions will be the basis for the calculation of excited states in the following step.
  • GW Band Structure (pdf download)
    DFT methods usually underestimate the band gap. In this tutorial we are going to learn how to calculate the quasiparticle correction to the band gap using the GW approximation implemented in YAMBO. We are going to use the command line to generate the input file for the GW calculation. We are also going to discuss the input parameters that require convergence tests during the GW calculation.
  • Postprocessing of the Quasiparticle Energies to Obtain the GW Band Structure (pdf download)
    Once the GW calculation is completed, we are going to use the yambo post-processor ypp, along with a set of commands, to plot the GW band structure.
  • Calculation of Optical Properties with YAMBO (pdf download)
    In this tutorial we are going to calculate the optical absorption spectrum using the GW-BSE method implemented in YAMBO. We are going to calculate the absorption spectrum including the quasiparticle correction for single layer MoS2. Before starting this tutorial, you must first complete the GW tutorial since this calculation depends on previously calculated corrected quasiparticle energies.

 

The SciServer PARADIM Data Collective (PDC) Cloud is a collaborative research platform for large-scale data-driven science. SciServer includes tools and services that enable researchers to work with terabytes or petabytes of scientific data without downloading any large datasets. PARADIM users can run density functional theory (DFT) simulations and analyze microscopy data via built-in Jupyter notebook recipes. The focus of this tutorial is to show how users can log in to the SciServer computing environment and launch Jupyter notebooks. For users new to Jupyter notebooks, this tutorial shows how to create and test your first notebook. Through a series of notebook examples available on SciServer, you can then explore the capabilities of Jupyter notebooks, with particular emphasis on topics relevant to PARADIM’s materials research.
 

  • How to Login to SciServer
    In this tutorial you will learn how to access the PDC Cloud through an internet browser, how to create a new account, and how to gain access to specific PARADIM and SciServer resources.
  1. Go to: http://pdc.paradim.org. (Chrome internet browser is highly recommended)
  2. Click login and create a new account. Once logged in, you will be taken to the following Dashboard page:

    SciServer Dashboard
  3. Email pdc@jhu.edu to be added to the PARADIM user group and gain access to PARADIM specific resources. 

    You will receive a reply email within 24 hours inviting you to the PARADIM group. The next time you log in after being added to the PARADIM group, your SciServer Dashboard will list the invitation in the “Groups” app (circled in red in the screenshot above).
  4. Click “Groups” and accept the invitation.
     
  • How to Create and Launch a Container
  1. Log in to your SciServer account and start the Compute app (note that “Compute” and “Compute Jobs” are different; you need to open “Compute”, circled in blue in the screenshot below).

    SciServer Dashboard
  2. In Compute, use the green button to Create a computing container. 

    Be sure to:

    a. type in a name for your container
    b. select the “PARADIM” Compute Image from the drop-down menu
    c. check the “PARADIM Data Collective” volume to add it to your container

    In a few seconds you will return to the “Compute” screen and you can see your running container. 
     
  3. Once your container is made, click the container to open a new browser tab accessing your compute container (If you are returning to a stopped container, you will have to click the green arrowhead to restart it).

    The container mounts into a new browser tab showing the mounted volumes (paradim_data, storage, and temporary). The storage and temporary volumes are for your personal work. They will be available in any other SciServer container you make (they go with your account).  The paradim_data storage is for data or work you want available to the whole PARADIM group. 

    Your storage volume is only 10 GB but is backed up and is permanent.  Your temporary volume is part of 80 TB of storage shared between users.  Temporary storage has no guarantee of long-term storage but can be used for weeks or even months at a time. You will be warned before it is cleaned up and given time to move anything you have stored there.  The paradim_data volume is about 300 TB storage for sharing materials data.  This is where experiment results will be shared.
     
  • How to Open a Notebook
  1. You can now make a notebook.  To make a new notebook use the “New” menu on the right-hand side of the browser as shown here:

    jupyter
  2. Choose “Python 3” and a new browser tab will open with a blank notebook. 
  3. In the first cell type “# My First Notebook” (without the quotes)
  4. Change the cell type to “markdown” and hit shift-enter.

    Your notebook will look like this (the red arrow shows you the drop-down menu to make the cell "markdown" instead of code):

    jupyter notebook

    Notice that the kernel is Python 3 (shown on the right side) and that there are tools and menus you can look through and use.
  5. Enter some Python code in the second cell and execute it. 

    Try 1+1 or try print ("All I dream about is reciprocal space.")
  6. Save your notebook and continue following the tutorial to explore a range of SciServer notebook examples.

    Please contact David Elbert (elbert@jhu.edu) or Nick Carey (ncarey4@jhu.edu) for any questions related to SciServer.
  • Your First SciServer Tutorial
  1. Go back to the “Home” tab by clicking on the “Jupyter” logo, browse to the paradim_data volume and locate the “example>pdc_example_notebooks” folder.  Open it.
  2. Open the “ReadMePlease.ipynb” notebook for basic information about the example notebooks.  Everything in the examples directory is read-only.  If you want your own copies of the notebooks to play with, follow the directions in ReadMePlease.ipynb to copy the example notebooks to your own persistent storage. Explore the Jupyter window’s menu bars and tools.  These include editing tools and the ability to change or restart the computational kernel.
  3. Close the notebook by using the File>Close and Halt in the Jupyter menubar.
  4. Open “Example 1a Intro Notebook Basics".  This notebook is a general overview of using Jupyter.  If you’re new to Jupyter, read through the notebook and follow examples to edit cells and make menu choices.
  5. Open “Example 1b Intro Running Code” to learn how to execute cells and run active code.  Double-click in the first code cell (it has a gray background and contains the Python code “a = 10”).  Press Shift-enter to execute the cell.  Execute the next cell (Shift-enter) to use the Python print function to print back the value of a.  Work through the other cells in Example 1b to get a feel for mixing execution and text in a notebook.
  6. Go back to the “Home” tab and go to the Jupyter tab labeled “Running.”  There you will find which notebooks and terminals you have open and running.  You can go back to them by clicking on them, or you can shut them down here.  N.B. If you shut down a notebook from this tab, you cannot tell what was saved.  This is a really useful tab for getting back to something that you left running the last time you were on the Data Science Cloud (DSC).
     
  • Example Notebook Tutorials Available on SciServer
    Once you have followed the above steps successfully, you are ready to explore some further notebook tutorials:
  1. Open the Example 5 notebook.  Click in each cell to see the markdown (formatted text cells) and execute the calculations from top to bottom.  There are four important things to learn in this notebook:

    a.  Jupyter has two modes: command and edit.  Command mode is for moving around the notebook, while edit mode lets you modify the contents of the cells. Esc puts you in command mode.  Enter puts you back in edit mode once you have selected a cell.

    b.  You manually pick whether a cell is code (executable) or markdown (formatted text).  There is a drop-down menu to do this (you can pick up keystroke shortcuts later, but in command mode M changes a cell to markdown and Y changes it to code).

    c.  Cells need to be executed when you finish with your input.  That means markdown (text) cells, too.  Execute any cell with shift-enter.

    d.  You can use these notebooks to combine complex formatting, materials calculations, and something like a scanned image to fully investigate problems.
  2. Open the Example 4 notebook.  N.B. This notebook requires an updated key from the Materials Project. Go to https://www.materialsproject.org/ and log in to the dashboard to get an API key.  To use the Materials Project APIs, you need to get a new key every day.

    a.  Replace the API key in quotes of the first code cell in the Example 4 notebook with a fresh key.

    b.  Execute the cells in order to retrieve and display data from the Materials Project.  The Materials Project is a data repository that has already calculated a range of properties for common materials. You can pull information directly into your notebooks to avoid duplicating that effort.
  3. Open the Example 6 notebook.  This notebook takes advantage of the preloaded Mantid environment in the MEDE-DSC.  Mantid is a data analysis and visualization package created for neutron and muon scattering results from beamlines. 

    a.  Read the text and execute the code in sequence.  This notebook reads in a neutron data file from the ISIS Neutron Source Facility near Oxford.

    b.  Using Mantid calls, the cells plot the raw data as well as showing a smoothing function.

    Mantid is the analysis package of choice at the Spallation Neutron Source (SNS) at Oak Ridge.
     
  • Jupyter Keyboard Shortcuts

    Frequently used:
    Esc-goes to Command Mode where arrow keys let you navigate
    Enter-goes to edit mode where you can type in cells

    In Command Mode:
    A-inserts new cell above
    B-inserts new cell below
    M-changes current cell to Markdown
    Y-changes current cell to code
    DD-(hit D twice) deletes the current cell
    Shift-Tab- will show the Docstring for code object you just typed
    ?-Typing ? before a command and evaluating it will show the Docstring
    Ctrl+Shift+hyphen will split the current cell into two at your cursor
    Esc+F to find and replace in code
    Esc+O to toggle cell output

    Selecting Multiple Cells:
    Shift-J or Shift-down- selects the next cell down
    Shift-K or Shift-up selects the next cell above
    Shift-M merges selected cells

    You can delete/copy/cut/paste multiple selected cells

    Multicursor support like Sublime.  Click and drag the mouse while holding down Alt.

  • Additional Resources and References
  1. Graphing Tools

    matplotlib is the de facto standard.  It’s activated with %matplotlib inline. Here’s a Dataquest Matplotlib Tutorial.  Inline it can be a little slow because it is rendered on the server side, but it’s easy and well known.

    Seaborn is built over matplotlib and makes building more attractive plots easier. Just by importing Seaborn, your matplotlib plots are made ‘prettier’ without any code modification.

    mpld3 provides an alternative renderer (using d3) for matplotlib code. Quite nice, though incomplete.

    bokeh is a better option for building interactive plots.

    plot.ly can generate nice plots - this used to be a paid service only but was recently open sourced.

    Altair is a relatively new declarative visualization library for Python. It’s easy to use and makes great-looking plots; however, the ability to customize those plots is not nearly as powerful as in matplotlib.
  2. Further Readings

    Numerical Python: A Practical Techniques Approach for Industry, Robert Johansson, 2015, available online through JHU Libraries

    Python: Pocket Primer, Oswald Campesato, 2012, available online through JHU Libraries

    Data Wrangling with Python, Jacqueline Kazil and Katherine Jarmul

    Online Python Basics:
    http://www.mantidproject.org/Introduction_To_Python (Exercises 1, 2,…)
    http://cs231n.github.io/python-numpy-tutorial/
    https://engineering.ucsb.edu/~shell/che210d/numpy.pdf