AI4Flex.Data: AI-Driven Cross-Engine Optimization of Concurrent Workloads (SNF)

Modern data systems are fragmented, making efficient and portable workloads difficult. A unified abstraction is missing. AI4Flex.Data decouples workload definition from execution, uses AI for engine selection, and enables flexible, optimized multi-engine processing.

Description

Modern data ecosystems rely on a growing number of specialized processing engines, each optimized for a narrow class of workloads. This fragmentation forces developers to repeatedly adapt or rewrite workloads, creates operational silos, and leads to inefficient execution when heterogeneous tasks (such as database queries combined with machine‑learning operations) must be processed within a single system. As data volumes and workload complexity rise, these limitations increasingly hinder performance, portability, and cost efficiency.

Currently, there is no unified abstraction that allows a workload to be expressed once and executed flexibly across multiple engines. Although existing systems share fundamental internal components, current optimizers operate only within single‑engine boundaries and cannot dynamically allocate operations to the engines where they perform best. As a result, organizations remain locked into specific vendors, face long and costly migration cycles, and cannot exploit the large performance differences between systems.

In this project, we design and implement AI4Flex.Data, a virtualization layer that decouples workload specification from execution. It translates workloads from interfaces such as SQL, Spark, text-to-SQL, or visual no‑code tools into a common intermediate representation, enabling seamless portability. AI‑driven cost and performance models then assign each operator to the most suitable engine and determine optimal provisioning strategies. The project develops learned models capable of generalizing to unseen workloads and engines, and introduces a synthetic workload generation pipeline to support their training. Ultimately, AI4Flex.Data provides a systematic framework for evaluating how learned cost models perform in real multi‑engine optimization scenarios.
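To make the idea of per-operator engine assignment concrete, the following minimal sketch may help. It is not the project's implementation. It assumes a workload that has already been lowered to a linear chain of operators, a static lookup table standing in for the learned cost models, and a flat penalty for moving data between engines. The operator kinds, the engine names (`dbms`, `spark`, `ml_runtime`), all cost values, and the function `assign_engines` are hypothetical and chosen purely for illustration.

```python
# Hypothetical cost estimates (arbitrary units) per operator kind and engine.
# In the project these would come from learned cost models; here they are
# hard-coded assumptions for illustration only.
COST = {
    "scan":  {"dbms": 1.0,  "spark": 3.0,  "ml_runtime": 8.0},
    "join":  {"dbms": 2.0,  "spark": 4.0,  "ml_runtime": 9.0},
    "train": {"dbms": 20.0, "spark": 10.0, "ml_runtime": 3.0},
}

# Assumed flat cost of handing intermediate data from one engine to another.
TRANSFER_COST = 2.0


def assign_engines(plan, cost=COST, transfer=TRANSFER_COST):
    """Pick one engine per operator in a linear plan, minimizing total cost.

    Uses dynamic programming over the chain: for each operator and engine,
    keep the cheapest way to reach that point, including transfer penalties.
    Returns (total_cost, list_of_engines).
    """
    if not plan:
        return 0.0, []

    engines = list(cost[plan[0]])
    # best[e] = (cheapest cost of the plan so far ending on engine e, path)
    best = {e: (cost[plan[0]][e], [e]) for e in engines}

    for op in plan[1:]:
        nxt = {}
        for e in engines:
            # Cheapest predecessor, paying a transfer fee if the engine changes.
            prev_cost, prev_path = min(
                ((c + (0.0 if p == e else transfer), path)
                 for p, (c, path) in best.items()),
                key=lambda t: t[0],
            )
            nxt[e] = (prev_cost + cost[op][e], prev_path + [e])
        best = nxt

    return min(best.values(), key=lambda t: t[0])


total, engines = assign_engines(["scan", "join", "train"])
print(total, engines)  # 8.0 ['dbms', 'dbms', 'ml_runtime']
```

With these assumed numbers, the relational operators stay on the database engine and only the training step moves to the ML runtime, because the transfer penalty is smaller than the saving on that step. A real optimizer would work on operator graphs rather than chains and would also have to account for provisioning decisions, which this sketch leaves out.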