Jiaheng Lu is an Associate Professor at the University of Helsinki, Finland. His main research interests lie in big data management and database systems, specifically the challenge of efficiently processing data from real-life, massive data repositories and the Web. He has published more than eighty journal and conference papers, as well as several books on XML, Hadoop, and NoSQL databases. His book on Hadoop was one of the top-10 best-selling books in the computer software category in China in 2013. He has frequently served as a PC member for conferences such as SIGMOD, VLDB, ICDE, EDBT, and CIKM.
Bogdan Cautis has been a Professor in the Department of Computer Science at the University of Paris-Sud, France, since 2013. He obtained his PhD from the University of Paris-Sud and INRIA in 2007 and was an Associate Professor at Telecom ParisTech, Paris, from 2007 to 2013. His current research interests lie in the broad area of data management and information retrieval, including social data management and database theory. He has frequently served as a PC member for conferences such as SIGMOD, VLDB, ICDE, and EDBT.
Irena Holubová is an Associate Professor at Charles University, Prague, Czech Republic, where she received her Ph.D. in 2007. Her current main research interests include big data management and NoSQL databases, big data generators and benchmarking, evolution and change management of database applications, analysis of real-world data, and schema inference. She has published more than 80 conference and journal papers, and her work has received four awards. She has also published two books on XML and NoSQL databases.
As more businesses realize that data of all forms and sizes is critical to making the best possible decisions, we see continued growth in systems that support massive volumes of non-relational or unstructured data. One of the most challenging issues in the era of big data is the "Variety" of the data: it may come in various types and formats (structured, semi-structured, and unstructured), be produced by different sources, and hence natively follow different data models. Currently, there are two general approaches to managing multi-model data: a single integrated multi-model database system, or a tightly integrated middleware layer over multiple single-model data stores. In this tutorial, we review and compare these two approaches, giving insights into their advantages, trade-offs, and research opportunities. In particular, we dive into four key technical aspects of both kinds of systems, namely (1) the theoretical foundation of multi-model data management, (2) storage strategies for multi-model data, (3) query languages across models, and (4) query evaluation and optimization. We compare the performance of the two approaches and discuss related open problems and remaining challenges.