
PARADIM Data Collective—A Collaborative Discovery Platform

The PARADIM and NanoHub 2D data framework (2DDF) supplement was critical to creating the infrastructure that now underpins the MIP’s data and data-driven efforts. That infrastructure includes centralized hosting, a rich semantic data model, a high-performance computational platform providing modern data-science tools, API access, a mature Python library for remote access to NanoHub tools, FAIR-compliant data releases, and training materials. It has also facilitated machine learning applications and the first streaming-data platform for materials data with end-to-end data encryption.

Figure 1: Goals and building blocks for the development of a collaborative data investigation platform that brings all aspects of data analysis to the Materials Innovation Platform (MIP) and its users.

Goals:

  • Collaborative data investigation platform
  • Integrated, data-centric workforce development
  • “Bring the analysis to the MIP”
Built On:
  • SciServer Platform (NSF DIBB)
  • Containerized Compute
  • Integrated SQL Server
  • Custom Python environments
  • NanoHub Remote
Results:
  • Python-based data wrangling, visualization, and analysis
  • Machine Learning development and deployment
  • Jupyter notebook training materials
  • Secure, automated data ingress


https://www.paradim.org/publications/data_sets
https://www.paradim.org/toolbox/datatools


Embracing the vision of placing MIPs at the center of the 2DDF, PARADIM took a leadership role and partnered with NIST to create the first 2DDF Training Workshop. The workshop covered fundamental data topics over 4.5 days and was facilitated by PARADIM infrastructure. PARADIM’s leadership role in materials data now reaches beyond the 2DDF and has been central to the recent creation of the Materials Research Data Alliance (MaRDA, https://www.marda-alliance.org). MaRDA focuses on the high-priority MGI strategic goal of building a network of materials data stakeholders to identify community needs, and of bringing together infrastructure providers to leverage their strengths while advancing better capabilities and stronger integration of materials data resources.

 

 

Goals:

  • Create FAIR Materials Data infrastructure
  • Provide MIP with FAIR compliance for their data 

Results:

  • FAIR components implemented
    • Findable:
      • Digital Object Identifier (DOI)
      • DataCite Metadata
      • www.paradim.org browsing
    • Accessible:
      • Permanent landing page
      • Data retrievable by DOI
    • Interoperable:
      • Open file formats
      • Instrument standard formats
    • Reusable:
      • Explicit license (CC BY-NC-ND 4.0)
  • Three levels of compliance:
    1. Public
    2. Public and citable
    3. Public, citable, and highly curated


 

FAIR Data Tools

Figure 2: The FAIR principles (findable, accessible, interoperable, and reusable) for materials data lead to three levels of compliance that users can choose from when sharing the data they generate.
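
The three compliance levels can be read as a simple decision rule: a data set is public by default, becomes citable once a DOI is minted, and reaches the top level when it is also curated. The following is a minimal Python sketch of that rule. The `DatasetRecord` class and its `doi` and `curated` fields are hypothetical names chosen for illustration and do not reflect PARADIM's actual data model.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class ComplianceLevel(IntEnum):
    """The three sharing levels named in Figure 2."""
    PUBLIC = 1
    PUBLIC_CITABLE = 2
    PUBLIC_CITABLE_CURATED = 3


@dataclass
class DatasetRecord:
    # Hypothetical fields for illustration only; not PARADIM's schema.
    title: str
    doi: Optional[str] = None   # set once a DOI has been minted
    curated: bool = False       # True if analysis codes or insights are attached


def compliance_level(record: DatasetRecord) -> ComplianceLevel:
    """Map a record to its compliance level.

    Assumption: curation only counts toward level 3 when the data set
    is also citable, because each level builds on the previous one.
    """
    if record.doi is None:
        return ComplianceLevel.PUBLIC
    if record.curated:
        return ComplianceLevel.PUBLIC_CITABLE_CURATED
    return ComplianceLevel.PUBLIC_CITABLE
```

Using an `IntEnum` keeps the levels ordered, so a check such as `compliance_level(record) >= ComplianceLevel.PUBLIC_CITABLE` expresses "at least citable" directly.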

 

What Has Been Achieved:

As we develop better, more meaningful ways to open materials data, PARADIM users can share the data they generate in three ways:

  1. Publicly available through browsing at https://www.paradim.org/publications/data_sets. Example: https://data.paradim.org/176
  2. Publicly available and citable, through PARADIM’s minting of Digital Object Identifiers (DOIs) that provide DataCite schema-compliant metadata and permanent landing pages. Example: https://doi.org/10.34863/1lk1-pd01
  3. Publicly available, citable, and highly curated, which adds richer information such as associated analysis codes or insights as befits the study. Example: https://doi.org/10.34863/g4wa-0j57
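
Because the citable levels rest on DOIs with DataCite metadata, that metadata can in principle be requested programmatically through DOI content negotiation. The following is a minimal Python sketch, not a PARADIM-provided tool. It assumes that the DOI resolver honors an `Accept` header and that `application/vnd.datacite.datacite+json` is the media type for DataCite JSON. Both should be checked against the current DataCite documentation. The function names are illustrative.

```python
import json
import urllib.request

DOI_RESOLVER = "https://doi.org/"
# Assumed media type for DataCite JSON via content negotiation.
DATACITE_JSON = "application/vnd.datacite.datacite+json"


def build_metadata_request(doi: str) -> urllib.request.Request:
    """Build (but do not send) a request for a DOI's DataCite metadata."""
    doi = doi.strip()
    # Accept either a bare DOI or a full https://doi.org/ URL.
    if doi.lower().startswith(DOI_RESOLVER):
        doi = doi[len(DOI_RESOLVER):]
    return urllib.request.Request(
        DOI_RESOLVER + doi,
        headers={"Accept": DATACITE_JSON},
    )


def fetch_metadata(doi: str, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON body (requires network access)."""
    request = build_metadata_request(doi)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

A call such as `fetch_metadata("10.34863/1lk1-pd01")` would then return the metadata record as a dictionary, provided the resolver responds as assumed. Opening the same DOI in a browser leads to the permanent landing page instead.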

Importance of the Achievement: FAIR compliance for materials data is still evolving, with a need for improvements in metadata richness, standards for interoperability, and integration with publishers. Despite these challenges, PARADIM’s 2DDF supplement work provides meaningful FAIR compliance through DataCite DOIs, use of open or standard formats, and a data policy that defaults to open data with an explicit Creative Commons license.

 

Research Highlight