Integrated Data Analysis Pipelines for Large-Scale Data Management, HPC and Machine Learning.

Most of my research at the IT University of Copenhagen, including Delilah, is part of the DAPHNE project.

The DAPHNE project aims to create an open-source infrastructure for integrated data analysis pipelines that span data management and processing, HPC, and ML training and scoring. It seeks to improve productivity and eliminate performance bottlenecks by developing appropriate APIs, a domain-specific language (DSL), and scheduling strategies. The project will evaluate its technological results on real-world use cases and datasets and create a new benchmark to quantify progress against the state of the art. DAPHNE is carried out by a consortium of academic and industrial partners. Its work packages run in parallel and feed into each other, while dedicated project management, dissemination, and exploitation work packages are meant to ensure efficient execution and widespread adoption.

I mainly contribute to work package 6, which focuses on integrating computational storage into DAPHNE. Delilah, a proof-of-concept eBPF-based computational storage processor, is one of the main deliverables of work package 6 and will serve as the baseline integration of computational storage in DAPHNE.