Research Data

The numerical and experimental investigation techniques within the Transregio (TRR) 287 “BULK REACTION” cover a broad range of research areas, leading to a strong heterogeneity of research workflows, data and software. The datasets generated by the different methods are correspondingly diverse, ranging from camera images and magnetic resonance imaging to data from computational simulations and the associated software stacks. Linking physical and mathematical descriptions, the resulting computational models and the corresponding experimental results is a central aspect of the TRR 287. Research data management (RDM) is therefore essential to enable a systematic, objective and reproducible validation and verification of data through rigorous comparisons and evaluations. It also provides the basis for making the results of the TRR publicly available.

One of the primary objectives is to build a durable and accessible infrastructure of organised, quality-assured and well-described knowledge in the form of research data and research software. The RDM strategy of TRR 287 is based on four pillars: knowledge management, central versioning of software code, documentation and tracking of experiments, and a central storage system in combination with an institutional repository. The RDM services are integrated into the work and research environment of the scientists as far as possible, so that the scientists can find adequate support at every point of the Data Life Cycle.

Figure 1: Domain model of services for RDM in BULK REACTION

Knowledge management software is used as a central knowledge base for project, knowledge and data documentation. As one of the most important components of the knowledge base, the data management plans (DMPs) of the TRR’s projects act as living documents that gather the most important information on planned and produced research data and software. To ensure high-quality, up-to-date content, the data management plans are reviewed by the project leaders. GitLab is used for sustainable software development. Based on the experience gained during Funding Period 1 (FP1), the INF project will accompany the systematic deployment of Electronic Lab Notebooks during the second funding period to document the planning, execution and evaluation of experiments.

Unique measurement and simulation datasets, which form the basis of publications and are of fundamental importance for the TRR, will be stored in a Long Term Storage (LTS). To allow easy reuse, each dataset is combined with detailed information on the simulation or experiment in a template-based markdown file. All datasets receive a unique identifier (UID) and are enriched with descriptive metadata. UIDs and metadata are stored in the knowledge management system, which serves as a database.

Figure 2: Long Term Storage – Ingest process of datasets
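
The ingest steps described above (assign a UID, attach descriptive metadata, write a template-based markdown description, record UID and metadata in the knowledge base) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the TRR's actual tooling: the class `DatasetRecord`, its field names, the markdown layout, the use of random UUIDs as UIDs, and the in-memory dictionary standing in for the knowledge management system are all hypothetical.

```python
import uuid
from dataclasses import asdict, dataclass, field
from datetime import date


@dataclass
class DatasetRecord:
    """Descriptive metadata for one dataset (all field names are hypothetical)."""
    title: str
    project: str
    kind: str          # e.g. "simulation" or "experiment"
    description: str
    created: date
    # Assumption: a random UUID serves as the unique identifier (UID).
    uid: str = field(default_factory=lambda: str(uuid.uuid4()))


def render_markdown(record: DatasetRecord) -> str:
    """Fill a simple (hypothetical) markdown template with the record's metadata."""
    lines = [
        f"# {record.title}",
        "",
        f"- UID: {record.uid}",
        f"- Project: {record.project}",
        f"- Type: {record.kind}",
        f"- Created: {record.created.isoformat()}",
        "",
        "## Description",
        "",
        record.description,
        "",
    ]
    return "\n".join(lines)


def register(record: DatasetRecord, index: dict) -> None:
    """Store UID and metadata in a dict that stands in for the knowledge base."""
    if record.uid in index:
        raise ValueError(f"UID already registered: {record.uid}")
    index[record.uid] = asdict(record)


if __name__ == "__main__":
    # Hypothetical example values, for illustration only.
    example = DatasetRecord(
        title="Example run",
        project="example-project",
        kind="simulation",
        description="Placeholder description of the setup and parameters.",
        created=date(2024, 1, 1),
    )
    knowledge_base = {}
    register(example, knowledge_base)
    print(render_markdown(example))
```

In such a sketch the markdown file would travel with the dataset into the LTS, while the index entry makes the dataset findable by its UID.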

For the public release of selected reference datasets from the TRR, the repository system ReSeeD will be used for the community-specific public BULK-REACTION database. ReSeeD supports easy-to-use ingest processes for research data and thus enables sustainable research data storage in accordance with the DFG’s recommendations for safeguarding good scientific practice. It provides comprehensive support for storing, structuring, archiving, and publishing research data.

As an important step towards the systematization and intensification of the RDM activities, two Data Stewards, one at OVGU and one at RUB, will accompany the TRR and work on the development and implementation of tailored hardware and software solutions for the RDM requirements of BULK-REACTION. They will be supported by a pilot group of young scientists working internally for the TRR. To facilitate further dissemination, the members of the pilot group will act as multipliers among their colleagues.

The Code and Data Sprint was a two-day event that took place in Magdeburg.

This CDC built on the first CDC in Berlin, and further steps were taken to develop RDM in the CRC287. The doctoral researchers updated the data management plans of their projects and prepared example datasets. Participants also explored the central data management platform Confluence in greater depth. Group discussions with experts helped to answer urgent questions regarding RDM. In addition, the projects carried out several steps of an internal review process to optimize the preparation and documentation of their datasets. Lastly, a pilot group for RDM was formed to support the CRC's data steward and their colleagues.

Archiving and sharing of research data is a key issue for large scientific collaborations such as the CRC287.

Challenges include

  • Numerical and experimental investigations cover
    a broad range of research disciplines, leading to
    strong heterogeneity of research data
  • Linking physical and mathematical descriptions and
    the resulting computational models with
    high-resolution experiments
  • Research data management is enormously important
    for promoting the necessary internal exchange of
    research data

During the workshop, the topics of archiving, sharing, metadata, data documentation and long-term storage were covered in detail. Furthermore, the doctoral researchers worked through several exercises on the handling of research data.

Dr. Andreas Schramm

The workshop served to explain the basic principles of research data management, discuss existing guidelines and present available services. The main topics of the workshop were

– Raising awareness for RDM

– Data Life Cycle

– Discussion of existing guidelines

– Data organization

– Data documentation

– Metadata

– Available services for RDM in TRR 287

Dr. Andreas Schramm

Basic functionalities of the Confluence software were presented in the workshop. The focus of the hands-on training was on

– Creation of a project page

– Creation of a template-based protocol

– Preparation of one's own data management plan (DMP) using the existing template

Prof. Dr. Francesca di Mare

Jun.-Prof. Dr. Christian Lessig

The workshop served to explain the basic principles of research data management and to discuss existing guidelines. In addition, the templates developed so far, such as the one for the data management plan, were presented.