NSF 2D Materials Data Framework Training Workshop
November 11-15, 2018
This Materials Data Training Workshop was sponsored by the NSF and designed for graduate students and post-docs from research teams recently awarded NSF-2D Materials Data Framework Data Supplements. This four-day workshop was organized by the Platform for the Accelerated Realization, Analysis, and Discovery of Interface Materials (PARADIM), an NSF Materials Innovation Platform (MIP), in partnership with the NIST Office of Data and Informatics.
The workshop mission was to provide hands-on training to develop data-intensive knowledge and skills for DMR-2D research groups.
Dates: November 11-15, 2018. Location: Johns Hopkins University’s Mt. Washington Conference Center, Baltimore, MD.
Details: We are in the midst of a data revolution. The confluence of information-rich measurement techniques and the computing capabilities to store and analyze information is rapidly changing how data are collected, distributed, analyzed, and interpreted. The Materials Genome Initiative and the NSF Materials Innovation Platforms are designed to tap into this revolution as applied to materials. This was the first in a series of data workshops that build upon the data supplements recently awarded to multiple NSF DMR teams. It provided training activities in data science for students and post-docs, with significant implications for workforce development.
Specific curricular goals were for participants to be able to:
- Set up and navigate within a Python environment, with emphasis on PARADIM MIP applications
- Use Jupyter notebooks for data analysis and presentation
- Understand Python coding for control flow, data frames, plotting methods and basic statistics
- Access public materials datasets and MIP data through APIs
- Use the notebook interface for data mining and manipulation
- Use version control for their code and analysis (via GitHub)
Specific topics included:
- Bash shell
- The basics of Python
- Python packages and scripting for data analysis
- Introduction to databases and SQL scripting
- Git version control and the use of GitHub
- Introduction to Materials Domain Python packages
- Use of APIs for access to Materials datasets
- Basics of data mining, wrangling, and visualization
|7:00 – 9:00pm||Introduction and Computer Shakedown||Staff|
|8:30 am – noon||Terminal Shell/Bash||Chandler Becker, NIST|
|1:00 – 4:30pm||Python||Daniel Wheeler, NIST|
|6:00 – 8:00pm||Dinner|
|8:30 am – noon||Version control (Git and GitHub)||Jonathan Guyer|
|1:00 – 4:30pm||Databases (SQL and more)||Gretchen Greene|
|4:55pm||Board bus to Bloomberg Center (PARADIM tour)|
|8:15pm||Board bus to return to Conference Center|
|8:30 – 9:30 am||Misc. Morning Notes (SciServer interface and environments, Conda, notebooks and metadata, GitHub vs Git and the GitHub desktop app, Jupyter Magics)||David Elbert, Johns Hopkins University|
|9:30 am – noon||Introduction to SMB6/Group Problem Solving; Introduction to Data Curation and Materials APIs||Nick Carey|
|1:00 – 3:00 pm||Group photo, SMB6/Group Problem Solving continued|
|3:00 – 4:00 pm||Accelerated Materials Discovery & Characterization with Quantum and Machine Learning Approaches||Kamal Choudhary|
|4:00 – 4:30 pm||
It is currently only set up to plot from local data, and only from VASP output files (vasprun.xml). I am working on getting it to plot directly from a provided mpid or similar. We are also interested in having it be able to read output files from other DFT codes; if you have output from your favourite DFT software that you would like to be able to plot using our tool, please send me an email and we can figure out how to get it into an appropriate format.
|8:30 – 10:00 am||Materials API|
|10:00 am – noon||Data Mining and Wrangling|
|1:00 – 3:00 pm||Notebook Integration with Atomists and Final Discussion|
We gratefully acknowledge funding support from the National Science Foundation’s Division of Materials Research (Award #1853842). Dr. Eva Campo of the NSF provided the leadership and impetus for this workshop. Dr. Lisa Lewis, the AAAS Science and Technology Fellow at the NSF, provided additional help, insight and encouragement. Claudia Johnson, NSF Contractor, has provided administrative support throughout the planning process. We are also grateful for critical organizational support and hands-on assistance provided by a team from NIST including Chandler Becker, Daniel Wheeler, Gretchen Greene, Jonathan Guyer, and Kamal Choudhary.
JHU students have been a central part of the team with particular help from Nick Carey and the HEMI Data Rabble (Ali Rachidi, Connor Krill, and Alex Laubscher).