Research Data Management

The numerical and experimental investigation techniques within the Transregio (TRR) 287 “BULK REACTION” cover a broad range of research areas, leading to a strong heterogeneity of research data and research software. Linking physical and mathematical descriptions, the resulting computational models and the corresponding experimental results is a central aspect of the TRR 287. As a consequence, research data management (RDM) is of enormous importance to enable a systematic, objective and reproducible validation and verification of data through rigorous comparisons and evaluations. It also provides the basis for making the results of the TRR publicly available.

One of the primary objectives of the TRR 287 is to build a durable and accessible infrastructure of organised, quality-assured and well-described knowledge in the form of research data and research software. The RDM strategy of TRR 287 is based on three pillars: knowledge management, central versioning of software code, and a central storage system. As far as possible, the RDM services are integrated into the work and research environment of the scientists, who can find adequate support at every point of the Data Life Cycle.

To facilitate internal communication and collaboration, a Mattermost server, an open-source, self-hostable online chat service, has been established. Knowledge management software serves as the central knowledge base for project, knowledge (e.g. descriptions of experimental setups) and data documentation. As one of the most important components of the knowledge base, the data management plans (DMPs) of the TRR’s projects act as living documents that gather the most important information on planned and produced research data and software. To ensure high-quality and up-to-date content, the data management plans are reviewed by the project leaders. GitLab is used for sustainable software development. It offers an issue tracking system with a Kanban board and a system for continuous integration and continuous delivery (CI/CD), as well as many other useful functionalities for professional software development. Unique measurement and simulation datasets, which are the basis of publications and often of fundamental importance for the TRR 287, will be stored in a Long Term Storage (LTS). To allow for easy reuse, each dataset is combined with detailed information on the simulation or experiment in a template-based markdown file. All datasets get a unique identifier (UID) and are enhanced with descriptive metadata. UIDs and metadata are stored as a database in the knowledge management system. In summary, the services described above form a network of data, information and knowledge.
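The dataset bookkeeping described above (a UID, descriptive metadata and a template-based markdown file per dataset) could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the text does not specify the UID scheme, the metadata fields or the template, so the use of random UUIDs, the `DatasetRecord` class, its field names and the `README_TEMPLATE` layout are all hypothetical.

```python
import uuid
from dataclasses import dataclass, field
from string import Template

# Hypothetical markdown template; the actual TRR 287 template is not shown
# in the text and will differ.
README_TEMPLATE = Template(
    "# $title\n"
    "\n"
    "- UID: $uid\n"
    "- Project: $project\n"
    "- Type: $kind\n"
    "\n"
    "## Description\n"
    "\n"
    "$description\n"
)


@dataclass
class DatasetRecord:
    """One archived dataset with illustrative (assumed) metadata fields."""

    title: str
    project: str
    kind: str  # e.g. "simulation" or "experiment"
    description: str
    # Assumption: a random UUID stands in for the unspecified UID scheme.
    uid: str = field(default_factory=lambda: str(uuid.uuid4()))

    def to_markdown(self) -> str:
        """Render the markdown file stored alongside the data."""
        return README_TEMPLATE.substitute(
            title=self.title,
            uid=self.uid,
            project=self.project,
            kind=self.kind,
            description=self.description,
        )

    def to_index_entry(self) -> dict:
        """Return the UID and metadata as they might be listed in the knowledge base."""
        return {
            "uid": self.uid,
            "title": self.title,
            "project": self.project,
            "kind": self.kind,
        }


record = DatasetRecord(
    title="Example dataset",
    project="Placeholder project",
    kind="simulation",
    description="Placeholder description of the simulation setup.",
)
readme_text = record.to_markdown()
index_entry = record.to_index_entry()
```

Here `readme_text` would be written next to the data in the long-term storage, while `index_entry` would be registered in the knowledge base, so that the same UID links the stored dataset to its description.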

RDM activities at the TRR 287 are supplemented by a data steward (50% position in the central IT.SERVICES). This role serves two main functions:

  • Firstly, the data steward is responsible for awareness campaigns and RDM training and acts as the central RDM support within TRR 287.

  • Secondly, the data steward serves as a link between the university’s Research Data Services teams and the TRR 287 to ensure long-term compatibility of institutional strategies and TRR’s needs regarding data management and its underlying infrastructure.

Network of data

    Workshop about archiving and sharing research data on the 8th of September 2022

The Code and Data sprint was a two-day event and took place in Magdeburg.

This CDC continued the work of the first CDC in Berlin, and further actions were undertaken to develop RDM in the CRC287. The doctoral researchers updated the data management plans of their projects and prepared example datasets. The participants also took a more in-depth look at Confluence, the central data management platform. Group discussions with experts helped to answer urgent questions regarding RDM. In addition, the projects carried out several steps of an internal review process to optimize the preparation and documentation of their datasets. Lastly, a pilot group for RDM was formed to support the CRC's data steward and their colleagues.

Archiving and sharing of research data is a key issue for large scientific collaborations like the CRC287.

Challenges include

  • Numerical and experimental investigations cover a broad range of research disciplines, leading to a strong heterogeneity of research data

  • Physical and mathematical descriptions and the resulting computational models have to be linked with high-resolution experiments

  • Research data management is enormously important for promoting the necessary internal exchange of research data

During the workshop, the topics of archiving, sharing, metadata, data documentation and long-term storage were covered in detail. Furthermore, the doctoral researchers worked through several exercises on the handling of research data.

    Workshop about data management and reproducible science on the 6th of July 2022