NSF 2D Materials Data Framework Training Workshop

JHU data workshop group

November 11-15, 2018
Baltimore, MD

This Materials Data Training Workshop was sponsored by the NSF and designed for graduate students and post-docs from research teams recently awarded NSF-2D Materials Data Framework Data Supplements.  This four-day workshop was organized by the Platform for the Accelerated Realization, Analysis, and Discovery of Interface Materials (PARADIM), an NSF Materials Innovation Platform (MIP), in partnership with the NIST Office of Data and Informatics.  

The workshop mission was to provide hands-on training to develop data-intensive knowledge and skills for DMR-2D research groups.  

Dates: November 11-15, 2018.  Location: Johns Hopkins University’s Mt. Washington Conference Center, Baltimore, MD. 

Details: We are in the midst of a data revolution. The confluence of information-rich measurement techniques and the computing capabilities to store and analyze that information is rapidly changing how data is collected, distributed, analyzed, and interpreted. The Materials Genome Initiative and the NSF Materials Innovation Platforms are designed to tap into this revolution as applied to materials. This was the first in a series of data workshops building on the data supplements recently awarded to multiple NSF DMR teams. It provided a series of training activities in data science for students and post-docs, with implications for workforce development.

  Specific curricular goals were for participants to be able to:

  • Set up and navigate within a Python environment, with emphasis on PARADIM MIP applications
  • Use Jupyter notebooks for data analysis and presentation 
  • Understand Python coding for control flow, data frames, plotting methods and basic statistics
  • Access public materials datasets and MIP data through APIs
  • Use the notebook interface for data mining and manipulation
  • Use version control for their code and analysis (via GitHub)

Specific topics included:

  1. Bash shell
  2. The basics of Python 
  3. Python packages and scripting for data analysis
  4. Introduction to databases and SQL scripting
  5. Git version control and the use of GitHub
  6. Introduction to materials-domain Python packages
  7. Use of APIs for access to materials datasets
  8. Basics of data mining, wrangling, and visualization

Course resources link

Agenda/Speakers:

2D NSF Data Framework Workshop Agenda

Sunday

7:00 – 9:00pm: Introduction and Computer Shakedown (Staff)

Monday

7:30am: Breakfast
8:30am – noon: Terminal Shell/Bash (Chandler Becker, NIST)
Noon – 1:00pm: Lunch
1:00 – 4:30pm: Python (Daniel Wheeler, NIST)
6:00 – 8:00pm: Dinner

Tuesday

7:30am: Breakfast
8:30am – noon: Version control (Git and GitHub) (Jonathan Guyer)
Noon – 1:00pm: Lunch
1:00 – 4:30pm: Databases (SQL and more) (Gretchen Greene)
4:55pm: Board bus to Bloomberg Center (PARADIM tour)
8:15pm: Board bus to return to Conference Center

Wednesday

7:30am: Breakfast
8:30 – 9:30am: Misc. Morning Notes, covering the SciServer interface and environments, Conda, notebooks and metadata, GitHub vs. Git and the GitHub desktop app, and Jupyter Magics (David Elbert, Johns Hopkins University)
9:30am – noon: Introduction to SMB6/Group Problem Solving; Introduction to Data Curation and Materials APIs (Nick Carey)
Noon – 1:00pm: Lunch
1:00 – 3:00pm: Group photo; SMB6/Group Problem Solving, continued
3:00 – 4:00pm: Accelerated Materials Discovery & Characterization with Quantum and Machine Learning Approaches (Kamal Choudhary)
4:00 – 4:30pm: Lightning talks

  • Kuanchen (Kevin) Xiong, Lehigh University
  • Anne Marie Tan, University of Florida
    • Interactive electronic structure plotting tool: https://github.com/henniggroup/electronic-structure-visualization
    • You can download some scripts for making interactive band structure and density of states plots from this GitHub repo. You will need to install some additional Python packages for the plotting (details are in the README).
    • I have provided some sample files in the testdata/ directory which you can play around with. (They are not necessarily accurate calculations but are good enough for visualization purposes.)
    • It is currently only set up to plot from local data, and only from VASP output files (vasprun.xml). I am working on getting it to plot directly from a provided mpid or similar. We are also interested in having it be able to read output files from other DFT codes; if you have output from your favourite DFT software that you would like to be able to plot using our tool, please send me an email and we can figure out how to get it into an appropriate format.
  • Franz Utermohlen, Ohio State University
  • Pedram Tavadze, West Virginia University
  • Mert Sengul, Pennsylvania State University
 

Thursday

7:30am: Breakfast
8:30 – 10:00am: Materials API
10:00am – noon: Data Mining and Wrangling
Noon – 1:00pm: Lunch
1:00 – 3:00pm: Notebook Integration with Atomists and Final Discussion
     
We gratefully acknowledge funding support from the National Science Foundation’s Division of Materials Research (Award #1853842). Dr. Eva Campo of the NSF provided the leadership and impetus for this workshop. Dr. Lisa Lewis, the AAAS Science and Technology Fellow at the NSF, provided additional help, insight and encouragement. Claudia Johnson, NSF Contractor, has provided administrative support throughout the planning process. We are also grateful for critical organizational support and hands-on assistance provided by a team from NIST including Chandler Becker, Daniel Wheeler, Gretchen Greene, Jonathan Guyer, and Kamal Choudhary.
 

JHU students have been a central part of the team with particular help from Nick Carey and the HEMI Data Rabble (Ali Rachidi, Connor Krill, and Alex Laubscher).
