Multi-Model Databases and Tightly Integrated Polystores

Half-day tutorial at CIKM 2018, Friday, 26 October 2018

Jiaheng Lu

University of Helsinki, Finland

Jiaheng Lu is an Associate Professor at the University of Helsinki, Finland. His main research interests lie in big data management and database systems, specifically in the challenge of efficiently processing data from massive real-life data repositories and the Web. He has published more than eighty journal and conference papers, as well as several books on XML, Hadoop and NoSQL databases. His book on Hadoop was one of the top-10 best-selling books in the computer software category in China in 2013. He has frequently served as a PC member for conferences including SIGMOD, VLDB, ICDE, EDBT, and CIKM.

Bogdan Cautis

University of Paris-Sud, France

Bogdan Cautis has been a Professor in the Department of Computer Science at the University of Paris-Sud, France, since 2013. He obtained his PhD degree from the University of Paris-Sud and INRIA in 2007 and was an Associate Professor at Telecom ParisTech, Paris, between 2007 and 2013. His current research interests lie in the broad area of data management and information retrieval, including social data management and database theory. He has frequently served as a PC member for conferences including SIGMOD, VLDB, ICDE, and EDBT.

Irena Holubová

Charles University, Czech Republic

Irena Holubová is an Associate Professor at Charles University, Prague, Czech Republic, where she received her Ph.D. degree in 2007. Her current main research interests include big data management and NoSQL databases, big data generators and benchmarking, evolution and change management of database applications, analysis of real-world data, and schema inference. She has published more than 80 conference and journal papers, and her work has received 4 awards. She has also published 2 books on XML and NoSQL databases.

Abstract

As more businesses realize that data, in all forms and sizes, is critical to making the best possible decisions, we see the continued growth of systems that support massive volumes of non-relational or unstructured data. One of the most challenging issues in the era of big data is the “Variety” of the data. Data may be presented in various types and formats (structured, semi-structured and unstructured) and produced by different sources, and hence natively follows different models. Currently, there are two general solutions for managing multi-model data: a single integrated multi-model database system, or a tightly integrated middleware over multiple single-model data stores. In this tutorial, we review and compare these two approaches, giving insights into their advantages, trade-offs, and research opportunities. In particular, we dive into four key technological aspects of both kinds of systems, namely (1) the theoretical foundations of multi-model data management, (2) storage strategies for multi-model data, (3) query languages across models, and (4) query evaluation and its optimization. We provide a performance comparison of the two approaches and discuss related open problems and remaining challenges.

Detailed Outline

Section

Topics

Motivation and Multiple Model Examples

E-commerce application scenario
Concepts of multi-model databases and tightly integrated polystores
Examples of OrientDB and ArangoDB

Theoretical Foundations

Category theory
Associative array and algebra

Multi-Model Data Storage

Classification of multi-model systems, timeline
Supported data models
Examples of storing in distinct classes of multi-model systems

Multi-Model Data Query Languages

Classification of approaches to querying in multi-model databases
Examples of querying in distinct classes of multi-model systems

Multi-Model Data Query Processing

Query optimization strategies in distinct classes of multi-model systems

Overview on Tightly Integrated Polystores

Taxonomy and general framework
Recent reference systems and comparison

Query Processing in Tightly Integrated Polystores

Query processing overview
Optimization strategies and materialization in the reference systems

Advanced Aspects of Tightly Integrated Polystores

Self-tuning, data placement / transfer
Transactions in polystore systems

Comparison of Multi-Model Databases and Tightly Integrated Polystores

Tradeoffs and key differences in features and design
Application scenarios

Open Problems and Challenges

Query processing and optimization in multi-model databases
Schema design and optimization in multi-model databases
Evolution management in multi-model databases
Extensibility in multi-model databases
Research issues ahead on tightly integrated polystores

Link to External Resources

Tutorial resources