The field of data analytics encompasses techniques, algorithms and tools for inspecting data collections in order to extract patterns, generalizations and other useful information. The success and effectiveness of such analysis depend on numerous challenges related to the data itself, the nature of the analytics tasks, and the computing environment over which the analysis is performed. These challenges have given rise to many diverse programming models, execution engines and data stores for large-scale data management. While all of these systems have enjoyed great success, they showcase their advantages only on a limited subset of applications and data types: graph-processing engines, for instance, restrict the freedom of the computation at each node (or part of a graph) and fail to fully exploit possible parallelism.

In addition, modern analytics workflows are tremendously complex: data sources are heterogeneous and distributed; tasks may be long- or short-running and entail different execution details depending on the user's role and expertise; such tasks may range from simple or complex data operations and queries to algorithmic processing, such as data mining, text retrieval and data annotation; and the analysis may require multiple query engines.

To harvest the benefits of the plethora of data and compute engines, programming models, libraries and tools available, we need coordinated, adaptive and integrative efforts to collectively tap their potential. This central goal is the focus of this workshop. These efforts include the definition of versatile programming models, engine performance modeling and monitoring, extended planning and optimization algorithms, deployment and execution on multiple engines, and workflow management and visualization techniques for complex analytics queries over large, heterogeneous, irregular or unstructured data in diverse compute environments.

Workshop focus and related topics

The goal of the 1st International Workshop on Multi Engine Data Analytics (MEDAL) is to bring together researchers and practitioners from both academia and industry to explore, discuss and possibly redefine the state of the art in big data analytics: the modeling, methods and tools applied across algorithms and computing infrastructures, as well as the use cases and applications of big data analytics over multi-engine environments. Concretely, the workshop is expected to provide insight into:

This workshop solicits original research on fundamental aspects of big data analytics, as well as on the design, implementation and evaluation of novel tools, methods and applications for optimizing big data workflows (in part or as a whole). We note that contributions may span a wide range of topics, including (but not limited to):

Workshop Program

EDBT/ICDT and Workshops Program Overview

09:00 - 10:30 Session 1

10:30 - 11:00 Coffee Break

11:00 - 12:30 Session 2 - Keynote Talks

11:00 - 11:55 Keynote 1: Big Data Management and Scalable Data Science: Key Challenges and (Some) Solutions, Prof. Dr. Volker Markl, TU Berlin
11:55 - 12:30 Keynote 2: Enabling Cross-Platform Applications with Rheem, Dr. Jorge Quiané-Ruiz, QCRI

12:30 - 14:00 Lunch Break
14:00 - 15:30 Session 3

15:30 - 16:00 Coffee Break
16:00 - 17:30 Session 4

Keynote Talks

Keynote 1

Big Data Management and Scalable Data Science: Key Challenges and (Some) Solutions
Prof. Dr. Volker Markl

The shortage of qualified data scientists is effectively preventing Big Data from fully realizing its potential to deliver insight and provide value for scientists, business analysts, and society as a whole. Data science draws on a broad range of advanced concepts from the mathematical, statistical, and computer sciences, in addition to requiring knowledge of an application domain. Solely teaching these diverse skills will not enable us to exploit, on a broad scale, the power of predictive and prescriptive models for huge, heterogeneous, and high-velocity data. Instead, we will have to simplify the tasks a data scientist needs to perform, bringing technology to the rescue: for example, by developing novel ways for the specification, automatic parallelization, optimization, and efficient execution of deep data analysis workflows. This will require us to integrate concepts from data management systems, scalable processing, and machine learning, in order to build widely usable and scalable data analysis systems. In this talk, I will present some of our research results towards this goal, including the Apache Flink open-source big data analytics system, concepts for the scalable processing of iterative data analysis programs, and ideas on enabling optimistic fault tolerance.

Volker Markl is a Full Professor and Chair of the Database Systems and Information Management (DIMA) group at the Technische Universität Berlin (TU Berlin) and also holds a position as an adjunct full professor at the University of Toronto. He is director of the research group "Intelligent Analysis of Mass Data" at DFKI, the German Research Center for Artificial Intelligence, and director of the Berlin Big Data Center, a collaborative research center bringing together research groups in the areas of distributed systems, scalable data processing, text mining, networking, and machine learning, with applications in several areas such as healthcare, logistics, Industrie 4.0, and information marketplaces.
Earlier in his career, Dr. Markl led a research group at FORWISS, the Bavarian Research Center for Knowledge-based Systems in Munich, Germany, and was a Research Staff Member and Project Leader at the IBM Almaden Research Center in San Jose, California, USA. His research interests include new hardware architectures for information management, scalable processing and optimization of declarative data analysis programs, and scalable data science, including graph and text mining and scalable machine learning.
Volker Markl has presented over 200 invited talks in numerous industrial settings and at major conferences and research institutions worldwide. He has authored and published more than 100 research papers at world-class scientific venues. He has been speaker and principal investigator of the Stratosphere collaborative research unit funded by the German Research Foundation (DFG), which resulted in numerous top-tier publications as well as the Apache Flink big data analytics system. Dr. Markl currently serves as the secretary of the VLDB Endowment and was elected one of Germany's leading "digital minds" (Digitale Köpfe) by the German Informatics Society (GI).

Keynote 2

Enabling Cross-Platform Applications with Rheem
Dr. Jorge Quiané-Ruiz

The world is fast moving towards a data-driven society where data is the most valuable asset. Organizations need to perform very diverse analytic tasks using various data processing platforms. In doing so, they face many challenges; mainly, platform dependence, poor interoperability, and poor performance when using multiple platforms. In this talk, I will present Rheem, our vision for big data analytics over diverse data processing platforms. Rheem provides a three-layer data processing and storage abstraction to achieve both platform independence and interoperability across multiple platforms. I will discuss how Rheem allows for cross-platform applications. In particular, I will present a machine learning and a data cleaning application and show how applications can leverage the platform-independence and cross-platform execution features of Rheem to boost performance. I will conclude with a discussion on the multiple research challenges that we need to address to achieve our vision.

Jorge-Arnulfo Quiané-Ruiz has been a Scientist at the Qatar Computing Research Institute (QCRI) since October 2012. His research interests include cross-platform data management, big data analytics, and big data profiling. Before joining QCRI, Jorge was a research associate at Saarland University for three years. He did his Ph.D. in Computer Science at INRIA and the University of Nantes, France, and obtained his degree in September 2008. He received an M.Sc. in Computer Science with a specialty in Networks and Distributed Systems from Joseph Fourier University, Grenoble, France, in July 2004. He obtained, with highest honors, an M.Sc. in Computer Science from the National Polytechnic Institute, Mexico, in August 2003.


Workshop co-chairs:

Technical program committee (tentative):

Important Dates


MEDAL will be a full-day event, organized and themed around the ASAP FP7 EU-funded project, which tackles the problem of complex analytical tasks over multi-engine environments that require integrated profiling, modeling, planning and scheduling functions.
Papers will be submitted as PDF files, using the ACM SIG Proceedings double-column template, with the following page limits:
8 pages for full submissions
4 pages for short/visionary paper submissions
2 pages for demo/tutorial submissions

Submission Site
All submissions will be handled electronically via EasyChair. The link for the submission page is: