From Big Data to Big Information and Big Knowledge: The Case of Earth Observation Data

Half-day tutorial at CIKM 2018, Friday, 26 October 2018

Konstantina Bereta

University of Athens, Greece

Konstantina Bereta is a Research Associate in the Dept. of Informatics and Telecommunications, National and Kapodistrian University of Athens, and holds a BSc and an MSc from the same department. She is also a PhD candidate under the supervision of Prof. Manolis Koubarakis (expected date of graduation: Fall 2018). She has worked as a scientific programmer and research associate in several EU FP7 projects. Her research interests are in the areas of spatiotemporal databases, the Semantic Web and cloud computing.

Stefan Manegold

CWI and Leiden University, Netherlands

Stefan Manegold leads the Database Architectures group at CWI and is a Professor at Leiden University. He is a nationally and internationally recognized expert in system-oriented database research. He is particularly known for his pioneering work on hardware-conscious database technology, and for disseminating his research via the open-source columnar analytical database management system MonetDB, which is widely used in academia and business. Dr. Manegold’s research is focused on bridging the gap between database architectures and demanding application areas, such as large-scale data analytics (Big Data), data-intensive scientific discovery (eScience), and the Semantic Web. His expertise comprises database architectures, query processing algorithms, and data management technology, with a particular focus on hardware-conscious algorithms and data structures, query optimization, scalability, performance, benchmarking and testing.

Manolis Koubarakis

University of Athens, Greece

Manolis Koubarakis is a Professor in the Dept. of Informatics and Telecommunications, National and Kapodistrian University of Athens. He previously held positions at the Dept. of Electronic and Computer Engineering, Technical University of Crete (Assistant and Associate Professor), the Dept. of Informatics, University of Athens (Visiting Researcher), the Dept. of Computation, UMIST (now University of Manchester) (Lecturer) and the Dept. of Computing, Imperial College, London (Research Associate). He has published more than 180 widely cited papers in the areas of Artificial Intelligence (especially Knowledge Representation), Databases, Semantic Web and Linked Data. In 2015, he was elected Fellow of the European Association of Artificial Intelligence (EurAI). He has served on the program committees of various international conferences and workshops, and he has organized various international events. He has attracted more than 6M Euros in funding from the European Commission, the Greek General Secretariat for Research and Technology, the European Space Agency and industry sources.

George Stamoulis

University of Athens, Greece

George Stamoulis is a Research Associate in the Dept. of Informatics and Telecommunications, National and Kapodistrian University of Athens, and a PhD candidate under the supervision of Prof. Koubarakis. He holds a BSc and an MSc from the same department. His research interests are in the areas of the Semantic Web, data visualization and integration, and user interfaces.

Begüm Demir

Technische Universität Berlin, Germany

Begüm Demir is a Professor and Chair of the Remote Sensing Image Analysis (RSiM) group at the Faculty of Electrical Engineering and Computer Science, Technische Universität Berlin (TU Berlin), Germany. Before joining TU Berlin, she was an Assistant Professor at the Department of Computer Science and Information Engineering, University of Trento, Italy, from 2013 to 2017, and became an Associate Professor in the same department in 2017. Her main research interests include machine learning and big data management with applications to remote sensing image analysis. She received an ERC Starting Grant for the project “BigEarth - Accurate and Scalable Processing of Big Data in Earth Observation” in 2017 and the IEEE Geoscience and Remote Sensing Society Early Career Award in 2018. She has been a senior member of IEEE since 2016.


Particularly important and rich sources of open and free big geospatial data are the Earth observation programs of various countries, such as the Landsat program of the US and the Copernicus programme of the European Union. Earth observation data is a paradigmatic case of big data, and the same is true for the information and knowledge extracted from it. Earth observation data (satellite images and in-situ data) and the information and knowledge extracted from it can be utilized in many applications with financial and environmental impact in areas such as emergency management, climate change, agriculture and security. This potential has not been fully realized up to now, because Earth observation data and the information extracted from it are “hidden” in various archives operated by NASA, ESA and national space agencies. Therefore, a user who would like to develop an application needs to search these archives, discover the needed data and information, and integrate it into the application.

In this tutorial we show how to “break these silos open” by publishing their data as RDF, enabling its discovery by modern search engines, interlinking it with other relevant data, and making it freely available on the Web to enable the easy development of geospatial applications. We present a complete data science pipeline that starts with Earth observation datasets in various formats that are made freely available in the archives of space agencies like ESA and NASA, and ends with the deployment of an interactive visual application that uses Earth observation data together with other collateral data (e.g., open government data, closed enterprise data, model data etc.) using linked data technologies. The tutorial will give in-depth coverage of the techniques, systems and applications of linked Earth observation data developed by the presenters in the last 8 years in the context of 5 European projects. Related work by other researchers will also be covered in depth. Finally, open problems and directions for future research in this area will be discussed.
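
To make the first step of the pipeline concrete, here is a minimal sketch of how the metadata of a single Earth observation product could be published as RDF with a GeoSPARQL geometry. It is an illustration under stated assumptions, not the presenters' tooling: the `http://example.org/eo/` namespace, the `Product` class, the `acquiredAt` property, the product identifier and the footprint are all hypothetical. Only the GeoSPARQL terms `geo:hasGeometry`, `geo:asWKT` and `geo:wktLiteral` come from the OGC standard.

```python
# Sketch: serialize one EO product's metadata as N-Triples (standard library only).
GEO = "http://www.opengis.net/ont/geosparql#"            # OGC GeoSPARQL vocabulary
XSD = "http://www.w3.org/2001/XMLSchema#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/eo/"                            # hypothetical namespace


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value, datatype):
    """Build a typed literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"^^<{datatype}>'


def product_to_ntriples(product_id, acquired, footprint_wkt):
    """Return N-Triples lines describing a product and its footprint geometry."""
    product = iri(EX + "product/" + product_id)
    geometry = iri(EX + "geometry/" + product_id)
    triples = [
        (product, iri(RDF_TYPE), iri(EX + "Product")),                 # hypothetical class
        (product, iri(EX + "acquiredAt"), literal(acquired, XSD + "dateTime")),  # hypothetical property
        (product, iri(GEO + "hasGeometry"), geometry),
        (geometry, iri(GEO + "asWKT"), literal(footprint_wkt, GEO + "wktLiteral")),
    ]
    return [f"{s} {p} {o} ." for s, p, o in triples]


if __name__ == "__main__":
    # Made-up identifier, timestamp and footprint, for illustration only.
    lines = product_to_ntriples(
        "EXAMPLE_001",
        "2018-10-26T09:00:00Z",
        "POLYGON((23.5 37.8, 24.0 37.8, 24.0 38.2, 23.5 38.2, 23.5 37.8))",
    )
    print("\n".join(lines))
```

Triples of this form can be loaded into a geospatial RDF store and queried with GeoSPARQL, which is what allows product footprints to be interlinked with other geospatial datasets.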

Detailed Outline

Satellite data
Copernicus data as a paradigmatic case of big data
A data science pipeline for linked EO data
Database Techniques for Satellite Data
The DBMSs MonetDB/SciQL, Paradigm4/SciDB and rasdaman
Knowledge Discovery from Satellite Images
Pattern recognition and machine learning techniques for knowledge extraction from satellite images
RDF and SPARQL Extensions for Geospatial and Temporal Data
Geospatial and temporal ontologies
The data model stRDF
The query languages stSPARQL and GeoSPARQL
Incomplete information and the RDFi model
Spatiotemporal RDF Stores
The spatiotemporal RDF store Strabon
The OBDA system Ontop-spatial
Comparison of Strabon and Ontop-spatial with other geospatial and temporal RDF stores
Interlinking Geospatial and Temporal RDF Data
Geospatial entity resolution
Discovering geospatial and temporal relationships with the tools Silk and Radon
Searching, Browsing, Exploring and Visualizing Remote Sensing Data and Linked Spatiotemporal Data
Content-based retrieval techniques for satellite image archives
The tool Sextant
Building environmental applications with Sextant and big linked data sources

Link to External Resources

Tutorial resources