The Data and Distributed Systems track at GOTO Chicago 2016 was one of our most popular! Below are videos of the conference sessions with fantastic speakers discussing the verification of distributed systems, building such systems at scale, applications in the world of stream processing and more.
Building a Distributed Build System at Google Scale
with Aysylu Greenberg, Engineer at Google
It’s hard to imagine a modern developer workflow without a sufficiently advanced build system: Make, Gradle, Maven, Rake, and many others. In this talk, we’ll discuss the evolution of build systems that leads to distributed build systems, like Google’s BuildRabbit. Then, we’ll dive into how we can build a scalable system that is fast and resilient, with examples from Google. We’ll conclude with a discussion of the general challenges of migrating systems from one architecture to another.
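BuildRabbit’s internals aren’t covered here, but a core idea behind scaling any distributed build system is caching build actions by the exact content of their inputs, so identical work is never redone on any machine. Below is a minimal, hypothetical sketch of that idea in Python (the names `action_key` and `BuildCache` are illustrative, not part of any real build tool):

```python
import hashlib

def action_key(command, input_files):
    """Derive a cache key from the build command and the exact bytes of its inputs."""
    h = hashlib.sha256(command.encode())
    for name in sorted(input_files):  # sort so the key is order-independent
        h.update(name.encode())
        h.update(input_files[name])
    return h.hexdigest()

class BuildCache:
    """Toy content-addressed cache: identical (command, inputs) pairs are built once."""
    def __init__(self):
        self.store = {}
        self.hits = 0

    def build(self, command, input_files, run):
        key = action_key(command, input_files)
        if key in self.store:
            self.hits += 1          # cache hit: skip the (expensive) build action
            return self.store[key]
        result = run()              # cache miss: actually run the build action
        self.store[key] = result
        return result
```

In a distributed setting, the same keyed store is shared across many workers, which is what lets thousands of machines cooperate on one build without repeating each other’s work.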
The Verification of a Distributed System
with Caitie McCaffrey, Distributed Systems Diva at Twitter
Distributed Systems are difficult to build and test for two main reasons: partial failure & asynchrony. These two realities of distributed systems must be addressed to create a correct system, and oftentimes the resulting systems have a high degree of complexity. Because of this complexity, testing and verifying these systems is critically important. In this talk we will discuss strategies for proving a system is correct, like formal methods, and less rigorous methods of testing that can help increase our confidence that our systems are doing the right thing.
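Formal methods like TLA+ sit at the heavy end of that spectrum; at the lighter end is property-based testing, where an invariant is checked against many randomly generated inputs rather than a handful of hand-picked cases. As a minimal stdlib-only sketch of the idea (real tools such as QuickCheck or Hypothesis add shrinking and smarter generation):

```python
import random

def merge_sorted(a, b):
    """System under test: merge two sorted lists into one sorted list."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] <= b[j]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    return out + a[i:] + b[j:]

def check_merge_property(trials=1000, seed=0):
    """Property: for any sorted inputs, the merge preserves all elements
    and the result equals a full sort of their concatenation."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = sorted(rng.randint(0, 9) for _ in range(rng.randint(0, 8)))
        b = sorted(rng.randint(0, 9) for _ in range(rng.randint(0, 8)))
        assert merge_sorted(a, b) == sorted(a + b), (a, b)
    return True
```

A single property exercised over a thousand random inputs tends to find the edge cases (empty lists, duplicates, unequal lengths) that example-based tests miss.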
Applications in the Emerging World of Stream Processing
with Neha Narkhede, Co-Founder and CTO at Confluent and Co-Creator of Apache Kafka
Modern businesses have data at their core, and this data is changing continuously. How can we harness this torrent of continuously changing data in real time? The answer is stream processing, paired with the ability to collect large amounts of data in real time. The most obvious advantage of stream processing is the ability to move many analytical or reporting processes into real time. However, the excitement around stream processing goes well beyond faster analytics or reporting. What stream processing really enables is the ability to build a company’s business logic and applications around data that was previously available only in batch fashion in the data warehouse — and to do so continuously rather than once a day.
This presentation will give a brief introduction to Apache Kafka and describe its use as a platform for streaming data. It will explain how Kafka serves as a foundation for both streaming data pipelines and applications that consume and process real-time data streams. It will introduce some of the newer components of Kafka that help make this possible, including Kafka Connect, a framework for capturing continuous data streams, and Kafka Streams, a lightweight stream processing library for use in microservices. Finally, it will describe the lessons learned by companies like LinkedIn in building massive streaming data architectures.
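Kafka Streams itself is a Java DSL, but the core idea — updating an aggregate incrementally as each record arrives, instead of recomputing it in a nightly batch — is language-neutral. A toy Python sketch of that idea, counting words over a potentially unbounded stream of lines:

```python
from collections import defaultdict

def word_counts(lines):
    """Toy continuous aggregation: consume a stream of lines and emit an
    updated (word, count) pair for every word as it arrives — the streaming
    analogue of a once-a-day batch word count."""
    counts = defaultdict(int)
    for line in lines:          # `lines` may be infinite; processing is incremental
        for word in line.split():
            counts[word] += 1
            yield word, counts[word]
```

Because the generator yields a fresh result per input record, downstream consumers always see the latest counts; in Kafka Streams the same pattern appears as a changelog stream backing a continuously updated table.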
Providing Flexible Database Consistency Levels with Manhattan at Twitter
with Boaz Avital, Tech Lead for Core Storage at Twitter
Manhattan is Twitter’s primary distributed key-value store. This talk will cover the general architecture of the storage system, along with the motivations, tradeoffs and benefits of highly-available eventual consistency. You will learn about some of the challenges presented by the proliferation of use cases in a multitenant environment. We will then discuss the practicalities and surprises in evolving an eventually consistent database to provide strongly consistent capabilities.
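Manhattan’s internal reconciliation scheme isn’t detailed here, but a common building block for highly-available eventual consistency is last-writer-wins: each replica accepts writes locally (staying available during partitions) and replicas converge by keeping the write with the highest timestamp. A minimal, hypothetical sketch:

```python
class LWWRegister:
    """Toy last-writer-wins register. Real systems also break timestamp
    ties deterministically (e.g., by replica id) to guarantee convergence."""
    def __init__(self):
        self.value, self.ts = None, 0

    def write(self, value, ts):
        if ts > self.ts:            # ignore writes older than what we hold
            self.value, self.ts = value, ts

    def merge(self, other):
        self.write(other.value, other.ts)

# Two replicas accept writes independently while partitioned:
a, b = LWWRegister(), LWWRegister()
a.write("v1", ts=1)
b.write("v2", ts=2)

# Anti-entropy: exchanging state in either order converges on the later write.
a.merge(b)
b.merge(a)
# both replicas now hold "v2"
```

Strong consistency, by contrast, requires coordinating the write order up front (e.g., via a quorum or consensus round), which is exactly the evolution the talk explores.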
Declarative, Secure, Convergent Edge Computation: An Approach for the Internet of Things
with Christopher Meiklejohn, PhD Student at Université catholique de Louvain
Consistency is hard and coordination is expensive. As we move into the world of connected ‘Internet of Things’ style applications, or large-scale mobile applications, devices have less power, periods of limited connectivity, and operate over unreliable asynchronous networks. This poses a problem with shared state: how do we handle concurrent operations over shared state, while clients are offline, and ensure that values converge to a desirable result without making the system unavailable?
We look at a new programming model, called Lasp. This programming model combines distributed convergent data structures with a dataflow execution model designed for distribution over large-scale applications. This model supports arbitrary placement of processing nodes, enabling the user to author applications that can be distributed across data centers and pushed to the edge. In this talk, we will focus on the design of the language and show a series of sample applications.
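Lasp is implemented in Erlang and builds on conflict-free replicated data types (CRDTs), whose merge operations are commutative, associative, and idempotent — so replicas converge no matter how their updates are ordered or duplicated. As an illustrative sketch (not Lasp’s API), here is the simplest CRDT, a grow-only counter, in Python:

```python
class GCounter:
    """Grow-only counter CRDT: each replica increments only its own slot,
    and merge takes the per-replica maximum. Merges commute and are
    idempotent, so offline replicas converge once they exchange state."""
    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}

    def increment(self, n=1):
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def merge(self, other):
        for rid, c in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), c)

    def value(self):
        return sum(self.counts.values())
```

This is what makes the model attractive for edge and IoT settings: a device can keep incrementing while disconnected, and any later gossip with any replica moves the system toward the same value.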