Workshop on Cloud Databases (CloudDB)

In conjunction with the 49th International Conference on Very Large Data Bases (VLDB)

Vancouver

August 28, 2023

Workshop Overview

Cloud providers and database vendors are investing heavily in the development of competitive cloud database offerings, with the goal of providing optimal performance in a cost-effective way for cloud customers. The database community has already contributed significantly towards developing cloud-native OLTP and OLAP databases. Cloud computing pools an abundance of resources and offers them in a pay-as-you-go model. This unique computational environment and business model raise various research challenges, which call for new research into resource disaggregation, serverless database services, and data movement across multiple cloud providers. Additionally, several existing database research topics require re-evaluation, such as:

  • Multitenancy

    How to manage and efficiently make use of cloud resources (CPU, memory, and network/storage I/O) to support multiple tenants with different SLA requirements.

  • Autonomous Databases

    How to automate database tuning and physical design (e.g., data compression, range partitioning scheme, buffer management policy) based on dynamic cloud workloads.

  • Resource Usage Prediction

    How to accurately predict the resource usage of workloads in order to manage clusters of database resources for different workload types (e.g., TP and AP).

  • Query Optimizer for Cloud Databases

    How to leverage cloud workloads and resources to design better query optimizers (cardinality estimation, cost model, plan enumeration).

  • Disaggregation

    How to leverage different cache layers (e.g., local cache vs. ephemeral storage pool) to accelerate queries.

  • CloudDBA

    How to assist customers in monitoring and optimizing cloud database performance, and in quickly identifying and fixing failures when they occur.

Our workshop aims to bring together researchers and practitioners from both academia and industry to discuss these challenges as well as possible directions to tackle them. Specifically, the workshop has three objectives. Firstly, it provides a platform for researchers to present their latest research results in the area of cloud databases. Secondly, it provides an opportunity for practitioners to assess the results and provide feedback. Thirdly, and most importantly, it helps researchers and practitioners to build connections and explore potential collaboration.

Topics of Interest

The suggested topics of interest include, but are not limited to:
  • Disaggregation
  • Transaction and Recovery
  • Query Optimizer
  • Serverless Database Services
  • Multitenancy
  • Autonomous Databases
  • CloudDBA
  • Security
  • HTAP

Agenda

Time

Topic

Speakers

07:30 - 08:30

Light Breakfast - Junior & Pavilion Foyers

-

08:30 - 09:30

Keynote 1: Modern Cloud DBMSs Vindicate Age-Old Work on Shared Disks DBMSs!

C. Mohan (HKBU, THU, IBM)

09:30 - 10:00

QFilter: Towards a Fine-Grained Access Control for Aggregation Query Processing over Secret Shared Data

Meghdad Mirabi (Technical University of Darmstadt)
Carsten Binnig (TU Darmstadt)

10:00 - 10:30

Coffee Break - Junior & Pavilion Foyers

-

10:30 - 11:30

Keynote 2: Towards Practical, Scalable and Private Management of Cloud Data

Amr El Abbadi (UCSB)

11:30 - 12:00

QuEST: Fast, Expressive, and Cheap Analytics for Distributed Traces Using Cloud Storage

Jessica Berg (New York University)
Muhammad Haseeb (New York University)
Haiming Chen (New York University)
Yaojia Ju (New York University)
Anirudh Sivaraman Kaushalram (New York University)
Ravi Netravali (Princeton University)
Srinivas Narayana (Rutgers University)

12:00 - 13:30

Lunch - Junior & Pavilion Foyers

-

13:30 - 14:15

Keynote 3: Towards ML-augmented Cloud Database Systems

Carsten Binnig (TU Darmstadt, Google)

14:15 - 15:00

Keynote 4: Cloud Native OLTP Database - The path to true cloud native

Ronen Grosman (Huawei)

15:00 - 15:30

Coffee Break - Junior & Pavilion Foyers

-

15:30 - 17:00

Panel: Cloud-native Databases: Challenges and Opportunities

Host: Sanjay Krishnan
Panelists: C. Mohan, Amr El Abbadi, Carsten Binnig, Ronen Grosman

Keynotes

Keynote 1
Modern Cloud DBMSs Vindicate Age-Old Work on Shared Disks DBMSs!

C. Mohan

Distinguished Professor of Science (Hong Kong Baptist University, China)
Distinguished Visiting Professor (Tsinghua University, China)
Retired IBM Fellow (IBM Research, USA)

Abstract: Over three decades ago, when the database research community was enamored of shared-nothing database management systems (DBMSs), some of us were focused on DBMSs based on the shared disks (SD) architecture. While my own work involved IBM’s DB2 on the mainframe, earlier SD product work had been done by DEC, IBM (with IMS), Oracle and a couple of Japanese vendors. The research community did not much appreciate our SD work, even though IBM and Oracle have been quite successful with their SD relational DBMS products. With the emergence of the public cloud, many classical on-premises DBMSs have been ported to the cloud, and new DBMSs have been developed from scratch for the cloud environment. A dominant characteristic of cloud DBMSs is that they embrace the SD architecture, because the cloud environment architecturally separates compute nodes from storage nodes (also called disaggregated storage) to gain several advantages. I feel that these recent developments vindicate our age-old SD work!

In this talk, I will first introduce traditional (non-cloud) parallel and distributed database systems, covering concepts such as SQL and NoSQL systems, data replication, distributed and parallel query processing, and data recovery after different types of failures. Then, I will discuss how the emergence of the (public) cloud has introduced new requirements on parallel and distributed database systems, and how these requirements have necessitated fundamental changes to their architectures, including embracing at least some of the SD ideas. I will illustrate these developments by discussing the details of several cloud DBMSs.
Bio: Dr. C. Mohan is currently a Distinguished Professor of Science at Hong Kong Baptist University (since 2023), a Distinguished Visiting Professor at Tsinghua University in China (since 2016), and a member of the inaugural Board of Governors of the new Indian university Digital University Kerala (since 2021). He retired in June 2020 from being an IBM Fellow at the IBM Almaden Research Center in Silicon Valley. He was an IBM researcher for 38.5 years in the database, blockchain, AI and related areas, impacting numerous IBM and non-IBM products, the research and academic communities, and standards, especially through his invention of the well-known ARIES family of database locking and recovery algorithms and the Presumed Abort distributed commit protocol. This IBM (1997-2020), ACM (2002-) and IEEE (2002-) Fellow also served as the IBM India Chief Scientist (2006-2009). In addition to receiving the ACM SIGMOD Edgar F. Codd Innovations Award (1996), the VLDB 10 Year Best Paper Award (1999) and numerous IBM awards, Mohan was elected to the United States and Indian National Academies of Engineering (2009) and named an IBM Master Inventor (1997). This Distinguished Alumnus of IIT Madras (1977) received his PhD at the University of Texas at Austin (1981). He is an inventor on 50 patents. In recent years, he has focused on Blockchain, AI, Big Data and Cloud technologies (https://bit.ly/sigBcP, https://bit.ly/CMoTalks). Since 2017, he has been an evangelist of permissioned blockchains and a myth buster of permissionless blockchains. During the first half of 2021, Mohan was the Shaw Visiting Professor at the National University of Singapore. In 2019, he became an Honorary Advisor to the Tamil Nadu e-Governance Agency. In 2020, he joined the Advisory Board of the Kerala Blockchain Academy. Mohan has served on the advisory board of IEEE Spectrum and on numerous conference and journal boards. During most of 2022, he was a consultant at Google with the title of Visiting Researcher, and in 2020 he was a consultant to the Microsoft Data Team. Mohan is a frequent speaker in North America, Europe and Asia and has given talks in 43 countries. He is highly active on social media and has a large network of followers. More information can be found on his Wikipedia page at https://bit.ly/CMwIkP and his homepage at https://bit.ly/CMoDUK.

Keynote 2
Towards Practical, Scalable and Private Management of Cloud Data

Amr El Abbadi

Professor of Computer Science (UCSB)

Abstract: Due to the widespread use of cloud applications, searching for data on cloud servers is ubiquitous. However, accessing data stored on a cloud server raises severe privacy concerns owing to numerous attacks and data breaches. Much research has focused on preserving the privacy of data stored in the cloud using various advanced cryptographic techniques. Our goal in this talk is to demonstrate how private access to data can become a practical reality in the near future. We focus on supporting oblivious queries, thus hiding any associated access patterns on both private and public data. For private data, ORAM (Oblivious RAM) is one of the most popular approaches for supporting oblivious access to encrypted data. However, most existing ORAM datastores are not fault-tolerant, and hence an application may lose all of its data when failures occur. To achieve fault tolerance, we propose QuORAM, the first datastore to provide oblivious access and fault-tolerant data storage using a quorum-based replication protocol. For public data, PIR (Private Information Retrieval) is the main mechanism proposed in recent years. However, current PIR proposals are inefficient, especially for large data sets, and they require the server to treat the data as an array of elements that clients retrieve by index. This latter restriction limits the use of PIR in many practical settings, especially key-value stores, where the client may be interested in a particular key but does not know the exact location of the data at the server. In this talk, we will discuss recent efforts to overcome these limitations using Fully Homomorphic Encryption (FHE) to improve the performance, scalability and expressiveness of privacy-preserving queries on public data.
Bio: Amr El Abbadi is a Professor of Computer Science at the University of California, Santa Barbara (UCSB). He received his B.Eng. from Alexandria University, Egypt, and his Ph.D. from Cornell University. His research interests are in fault-tolerant distributed systems and databases, focusing recently on cloud data management, blockchain-based systems and privacy concerns. Prof. El Abbadi is an ACM Fellow, AAAS Fellow, and IEEE Fellow. He was Chair of the Computer Science Department at UCSB from 2007 to 2011 and served as Associate Graduate Dean at UCSB from 2021 to 2023. He has served as a journal editor for several database journals, including The VLDB Journal, IEEE Transactions on Computers and The Computer Journal. He has been Program Chair for multiple database and distributed systems conferences, most recently SIGMOD 2022. He currently serves on the executive committee of the IEEE Technical Committee on Data Engineering (TCDE) and was a board member of the VLDB Endowment from 2002 to 2008. In 2007, Prof. El Abbadi received the UCSB Senate Outstanding Mentorship Award for his excellence in mentoring graduate students. In 2013, his student Sudipto Das received the SIGMOD Jim Gray Doctoral Dissertation Award. Prof. El Abbadi is also a co-recipient of the Test of Time Award at EDBT/ICDT 2015. He has published over 350 articles in databases and distributed systems and has supervised over 40 PhD students.

Keynote 3
Towards ML-augmented Cloud Database Systems

Carsten Binnig

Full Professor (TU Darmstadt)
Visiting Researcher (Google Systems Research Group)

Abstract: Database Management Systems (DBMSs) are the backbone for managing large volumes of data efficiently in the cloud. To provide high performance, many of the most complex DBMS components, such as query optimizers or schedulers, must solve non-trivial problems. To tackle such problems, recent work has outlined a new direction of so-called learned DBMSs, where core parts of DBMSs are replaced by machine learning (ML) models, which have been shown to provide significant performance benefits. However, a major drawback of current approaches to learned DBMS components is that they not only incur very high overhead for training an ML model to replace a DBMS component, but this overhead also recurs, which renders these approaches far from practical for cloud DBMSs. In the first part of the talk, I will present our vision for tackling these issues. In the second part, I will outline very recent work on LLM-augmented DBMSs that extends DBMSs with new capabilities, such as seamless querying of multimodal data composed of tables, text, and images.
Bio: Carsten Binnig is a Full Professor in the Computer Science department at TU Darmstadt and a Visiting Researcher at the Google Systems Research Group. Carsten received his Ph.D. from the University of Heidelberg in 2008. Afterwards, he spent time as a postdoctoral researcher in the Systems Group at ETH Zurich and at SAP working on in-memory databases. Currently, his research focuses on the design of scalable data systems on modern hardware as well as machine learning for scalable data systems. His work has received a Google Faculty Award, as well as multiple best paper and best demo awards.

Keynote 4
Cloud Native OLTP Database - The path to true cloud native

Ronen Grosman

Distinguished Engineer and Chief Architect, Gauss Canada Database Research Team (Huawei)

Abstract: As database systems have moved to the cloud, they have progressively taken on more and more cloud-native characteristics. They have moved from either lift-and-shift managed hosting or functionally very limited systems to more full-featured, built-for-cloud systems. The most recent generation of cloud-native database systems displays many of the characteristics you would expect from any cloud-native system (serverless, low cost, self-optimizing, etc.). However, especially for full-featured OLTP systems, a number of gaps still exist in scalability, performance, or completeness of functionality. In this talk, I will discuss what it means to be a truly cloud-native database, how existing systems have moved closer to this ideal, and what remains to be done to reach a truly cloud-native database system.
Bio: Ronen is the chief architect of the Gauss Canada research team, focusing on next-generation database systems, in particular cloud-native OLTP, storage engine design and system scale-out. Throughout his career, Ronen has also been the architect of numerous other OLTP- and HTAP-focused database systems, such as Db2 pureScale and Db2 Event Store, and has filed over 25 patents.

Important Dates

The important dates are listed below:
  • Paper submission

    May 31st, 2023

  • Notification of acceptance

    June 30th, 2023

  • Camera-ready submission

    July 20th, 2023

  • Workshop

    August 28th, 2023

Submission

https://cmt3.research.microsoft.com/CloudDB2023

Submission Instructions

We accept both long papers (up to 12 pages, plus unlimited pages for references) and short papers (up to 6 pages, plus unlimited pages for references).

Submissions must be formatted using the standard VLDB template, available at: http://vldb.org/pvldb/vol16-formatting/

Submissions will be reviewed in a single-blind manner. Each submission must include all author names and affiliations. We will use CMT’s conflict management system. All authors of a submission must declare their conflicts of interest (CoI) before the paper submission deadline. Papers with incorrect or incomplete CoI information as of the submission deadline will be subject to desk rejection.