publications | Niclas Hedam

2023

DaMoN

Delilah: eBPF-offload on Computational Storage

Niclas Hedam, Morten Tychsen Clausen, Philippe Bonnet, Sangjin Lee, and Ken Friis Larsen

In 19th International Workshop on Data Management on New Hardware (DaMoN ’23), 2023

The idea of pushing computation to storage devices has been explored for decades, without widespread adoption so far. The definition of Computational Programs namespaces in NVMe (TP 4091) might be a breakthrough. The proposal defines device-specific programs, that are installed statically, and downloadable programs, offloaded from a host at run-time using eBPF. In this paper, we present the design and implementation of Delilah, the first public description of an actual computational storage device supporting eBPF-based code offload. We conduct experiments to evaluate the overhead of eBPF function execution in Delilah, and to explore design options. This study constitutes a baseline for future work.
DAPHNE

D6.3 Prototype and overview of data path optimizations and placement

Marcus Paradies, Philippe Bonnet, Constantin Pestka, Alexander Krause, and Niclas Hedam

2023

Report and prototype of used data path optimization techniques and automatic data placement in hybrid memory and storage configurations. This deliverable describes the second version of the demonstrator of the Delilah computational storage prototype. The report also describes data path optimizations and placement in the context of the DAPHNE storage subsystem, including hardware acceleration on the data path.

2022

CIDR

DAPHNE: An Open and Extensible System Infrastructure for Integrated Data Analysis Pipelines

Patrick Damme, Marius Birkenbach, Constantinos Bitsakos, Matthias Boehm, Philippe Bonnet, and 37 more authors

In 12th Annual Conference on Innovative Data Systems Research (CIDR ’22), Jan 2022

Integrated data analysis (IDA) pipelines—that combine data management (DM) and query processing, high-performance computing (HPC), and machine learning (ML) training and scoring—become increasingly common in practice. Interestingly, systems of these areas share many compilation and runtime techniques, and the used—increasingly heterogeneous—hardware infrastructure converges as well. Yet, the programming paradigms, cluster resource management, data formats and representations, as well as execution strategies differ substantially. DAPHNE is an open and extensible system infrastructure for such IDA pipelines, including language abstractions, compilation and runtime techniques, multi-level scheduling, hardware (HW) accelerators, and computational storage for increasing productivity and eliminating unnecessary overheads. In this paper, we make a case for IDA pipelines, describe the overall DAPHNE system architecture, its key components, and the design of a vectorized execution engine for computational storage, HW accelerators, as well as local and distributed operations. Preliminary experiments that compare DAPHNE with MonetDB, Pandas, DuckDB, and TensorFlow show promising results.
DAPHNE

D6.2 Prototype and Overview of Managed Storage Tiers and Near-Data Processing

Philippe Bonnet, Marcus Paradies, and Niclas Hedam

Jan 2022

Report on state-of-the-art techniques for computational storage, near-data processing, and potential side effects in the context of the I/O hierarchy, as well as an overview of automatically determining the capabilities of a storage configuration.

2021

eBPF - From a Programmer’s Perspective

Niclas Hedam

Mar 2021

Version Number: 3

eBPF allows software developers to write programs that are executed in the kernel without requiring recompilation and system restart. These programs can collect critical performance metrics when a kernel function is invoked. In this paper, we will describe and discuss the architecture of eBPF using libbpf as well as the core components of it. We will look at key differences between eBPF programs and typical user-space C programs. Lastly, we will look into some real-world use-cases of eBPF. We will, however, not discuss performance numbers or formal proofs. This paper is merely a summary of countless hours of reading through eBPF textbooks, blog posts, eBPF samples and kernel code.

2020

CIDR

Open-Channel SSD (What is it Good For)

Ivan Luiz Picoli, Niclas Hedam, Pınar Tözün, and Philippe Bonnet

In 10th Annual Conference on Innovative Data Systems Research (CIDR ’20), Jan 2020

Open-Channel SSDs are storage devices that let hosts take full control over data placement and I/O scheduling. In recent years, they have gained acceptance in data centers (e.g., Alibaba) and for computational storage (e.g., Pliops). Open-Channel SSDs require a host-based Flash Translation Layer (FTL) that manages the physical address space they expose. Open-source FTLs are now available for Open-Channel SSDs, providing either a generic yet tunable block device interface (e.g., pblk, SPDK, OX-Block), or application-specific FTLs developed for a specific data system (e.g., LightLSM, OX-ELEOS). In this paper, we share our experience developing three of those FTLs in the context of the OX controller. We position Open-Channel SSDs in the SSD landscape and discuss their relevance for data systems. In particular, we argue that Open-Channel SSDs cannot be considered as a uniform class of devices. Our main contribution is a description of the key design decisions we took in OX related to Open-Channel SSDs. We reflect on lessons learned and propose hints for the co-design of data systems and Open-Channel SSDs.
ADMS

Hash-Based Authentication Revisited in the Age of High-Performance Computers

Niclas Hedam, Jakob Mollerup, and Pınar Tözün

In 11th International Workshop on Accelerating Analytics and Data Management Systems (ADMS ’20), Aug 2020

Hash-based authentication is a widespread technique for protecting passwords in many modern software systems including databases. A hashing function is a one-way mathematical function that is used in various security contexts in this domain. In this paper, we revisit three popular hashing algorithms (MD5, SHA-1, and NTLM) that are considered weak or insecure. More specifically, we explore the performance of the hashing algorithms on different hardware platforms, from expensive high-end GPUs found in data centers and high-performance computing centers to relatively cheaper consumer-grade ones found in the homes of end-users. In parallel, we observe the behavior of different hardware platforms. Our results re-emphasize that despite their theoretical strength, the practical utilization of widely used hashing algorithms is highly insecure in many real-world scenarios; i.e., cracking a password of length 6 takes less than 6 seconds using a consumer-grade GPU.