"How can (worst-case optimal) joins be so interesting?"
Speaker: Asst. Prof. Semih Salihoğlu
Date: 12 December 2018 (Wednesday)
Time: 15:30 - 17:00
Location: İdris Yamantürk Conference Hall (1304), Faculty of Electrical and Electronics Engineering
Worst-case optimality is perhaps the weakest notion of optimality for algorithms. A recent surprising theoretical development in databases has been the realization that the traditional join algorithms, which are based on binary joins, are not even worst-case optimal. Upon this realization, several surprisingly simple join algorithms have been developed that are provably worst-case optimal. Unlike traditional algorithms, which join subsets of tables at a time, worst-case optimal join algorithms perform the join one attribute (or column) at a time. This talk gives an overview of several lines of work that my colleagues and I have been doing on worst-case optimal join algorithms, focusing on their application to subgraph queries. I will cover work from both distributed and serial settings. In the distributed setting, worst-case optimality is a yardstick for two costs of an algorithm: (i) the load, i.e., the amount of data per machine; and (ii) the total communication. Both load and communication complexity trade off against the number of rounds an algorithm runs. I will describe an algorithm that achieves worst-case optimality in total communication, and its performance on subgraph queries. It is an open theoretical problem to design constant-round algorithms with worst-case optimal load. In the serial setting, I will describe the optimizer of a prototype graph database called Graphflow that we are building at the University of Waterloo. Graphflow's optimizer for subgraph queries mixes worst-case optimal join-style column-at-a-time processing seamlessly with traditional binary joins.

Short Biography:
Semih Salihoğlu is an Assistant Professor at the University of Waterloo. His research focuses on graph databases, distributed systems for processing graphs, and algorithms and theories for distributed evaluation of database queries. He holds a PhD from Stanford University and is a recipient of the 2018 VLDB best paper award.