publications
publications by categories in reversed chronological order. generated by jekyll-scholar.
2024
- dtool and dserver: A flexible ecosystem for findable data. Johannes L. Hörmann, Luis Yanes, Ashwin Vazhappilly, and 5 more authors. PLOS ONE, Jun 2024
Making data FAIR (findable, accessible, interoperable, reusable) has become the recurring theme behind many research data management efforts. dtool is a lightweight data management tool that packages metadata with immutable data to promote accessibility, interoperability, and reproducibility. Each dataset is self-contained and does not require metadata to be stored in a centralised system. This decentralised approach means that finding datasets can be difficult. dtool’s lookup server, dserver for short, is defined by a REST API and makes dtool datasets findable, hence rendering the dtool ecosystem fit for a FAIR data management world. The simplicity, modularity, accessibility and API-based standardisation of dtool and dserver distinguish them from other solutions and enable them to serve as a common denominator for cross-disciplinary research data management. The dtool ecosystem bridges the gap between standardisation-free data management by individuals and FAIR platform solutions with rigid metadata requirements.
2022
- Lightweight research data management with dtool: a use case. Johannes L. Hörmann and Lars Pastewka. In Proceedings of the 7th bwHPC Symposium, Nov 2022
With the dtool data management framework, we adhere to the FAIR principles – findability, accessibility, interoperability, reusability – beginning at an early stage of the data lifecycle without introducing overwhelming administrative overhead. We show how the use of dtool has been implemented within IMTEK Simulation. In particular, we make data accessible, interoperable and reusable by packaging data and descriptive metadata in dtool datasets. The dtool lookup server makes data on a group-wide S3 object storage repository findable. The dtool ecosystem has proven applicable both for manual research data management and for the rapid generation of thousands of datasets in automated workflows.
2019
- Lightweight data management with dtool. Tjelvar S. G. Olsson and Matthew Hartley. PeerJ, Mar 2019
The explosion in volumes and types of data has led to substantial challenges in data management. These challenges are often faced by front-line researchers who are already dealing with rapidly changing technologies and have limited time to devote to data management. There are good high-level guidelines for managing and processing scientific data. However, there is a lack of simple, practical tools to implement these guidelines. This is particularly problematic in a highly distributed research environment where needs differ substantially from group to group, centralised solutions are difficult to implement, and storage technologies change rapidly. To meet these challenges we have developed dtool, a command line tool for managing data. The tool packages data and metadata into a unified whole, which we call a dataset. The dataset provides consistency checking and the ability to access metadata for both the whole dataset and individual files. The tool can store these datasets on several different storage systems, including a traditional file system, object stores (S3 and Azure) and iRODS. It includes an application programming interface that can be used to incorporate it into existing pipelines and workflows. The tool has provided substantial process, cost, and peace-of-mind benefits to our data management practices and we want to share these benefits. The tool is open source and freely available online at http://dtool.readthedocs.io.