Hybrid layouts are widely used to store processed data in highly distributed Big Data systems for ad-hoc analysis. These systems process data on a cluster of computers by creating multiple tasks. Typically, they create tasks based on the total size of the table rather than on the amount of data the query actually reads. Moreover, always relying on the default configuration can heavily degrade performance. We therefore propose a cost-based framework that uses a multi-objective approach to decide the number of tasks and executors for a given query based on its reading size.

Environment Settings
Data Statistics
Workload Statistics
All possible configurations with Pareto frontier
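
As a rough illustration of the last item, the sketch below enumerates candidate configurations from a query's reading size and keeps only the non-dominated ones. It is a minimal sketch under stated assumptions, not the framework's actual cost model: the two objectives (estimated time and estimated resource cost), the constants SECONDS_PER_MB and TASK_OVERHEAD_S, and the names Config, enumerate_configs, and pareto_frontier are all hypothetical.

```python
from itertools import product
from math import ceil
from typing import List, NamedTuple

# Hypothetical constants for a toy cost model (not taken from the source).
SECONDS_PER_MB = 0.01   # assumed time to scan one MB within a task
TASK_OVERHEAD_S = 0.5   # assumed fixed start-up overhead per task


class Config(NamedTuple):
    tasks: int
    executors: int
    time: float   # estimated execution time (assumed objective 1)
    cost: float   # estimated executor-seconds (assumed objective 2)


def enumerate_configs(reading_size_mb: float,
                      split_sizes_mb: List[float],
                      executor_counts: List[int]) -> List[Config]:
    """Build one candidate per (split size, executor count) pair.

    The number of tasks is derived from the query's reading size,
    not from the total size of the table.
    """
    configs = []
    for split_mb, executors in product(split_sizes_mb, executor_counts):
        tasks = ceil(reading_size_mb / split_mb)
        waves = ceil(tasks / executors)  # rounds of tasks run in parallel
        est_time = waves * (split_mb * SECONDS_PER_MB + TASK_OVERHEAD_S)
        est_cost = executors * est_time
        configs.append(Config(tasks, executors, est_time, est_cost))
    return configs


def pareto_frontier(configs: List[Config]) -> List[Config]:
    """Keep configurations that no other configuration dominates.

    A configuration is dominated if another one is no worse in both
    objectives and strictly better in at least one.
    """
    frontier = []
    for c in configs:
        dominated = any(
            o.time <= c.time and o.cost <= c.cost
            and (o.time < c.time or o.cost < c.cost)
            for o in configs
        )
        if not dominated:
            frontier.append(c)
    return frontier


if __name__ == "__main__":
    candidates = enumerate_configs(
        reading_size_mb=1024,
        split_sizes_mb=[32, 64, 128, 256],
        executor_counts=[2, 4, 8],
    )
    for cfg in pareto_frontier(candidates):
        print(cfg)
```

The quadratic dominance check is adequate for a small configuration space; a larger space would likely call for sorting by one objective first. Choosing a single configuration from the frontier remains a separate decision that this sketch does not cover.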