Some systems just can’t go down. The speakers in the Always On track at GOTO Chicago 2016 have hands-on experience with the challenges of running such systems, and their talks include case studies from Netflix and Uber.
Watch the videos (and download the slides) from the conference sessions in the Always On track at GOTO Chicago 2016 below.
What is Rugged all about?
with Matt Konda, Founder of Jemurai
Stability Patterns & Antipatterns
with Michael T. Nygard, Author of Release It!
Once you hit Release 1.0, your system will be living in the real world. It has to survive everything the messy, noisy real world can throw at it: from flash mobs to Twitter. Once the public starts beating on your system, it has to survive–without you.
Did you know that just having your database behind a firewall can bring down your system? You’ll learn about that and the other biggest risks to your system, and how to counter them with stability design patterns. We’ll talk about the best way to define the term “availability” and why the textbooks get it all wrong.
In this session, you will learn why the path to success begins with a failure-oriented mindset. I’ll share war stories about antipatterns that have caused and accelerated millions of dollars’ worth of system failures, along with some of my scars, so that you can avoid them.
Chaos & Intuition Engineering at Netflix
with Casey Rosenthal, Engineering Manager for the Traffic Team and the Chaos Team at Netflix
Most systems at scale are initially optimized for one of the following: performance, availability, or fault tolerance. Netflix chooses to embrace development velocity. I will talk about the impact that this has on availability, and then specifically how my teams use Chaos Engineering and Intuition Engineering to navigate one of the largest scale deployments on the Internet.
What I Wish I Had Known Before Scaling Uber to 1000 Services
with Matt Ranney, Chief Systems Architect at Uber
To keep up with Uber’s growth, we’ve embraced microservices in a big way. This has led to an explosion of new services, crossing over 1,000 production services in early March 2016. Along the way we’ve learned a lot, and if we had to do it all over again, we’d do some things differently. If you are earlier along on your personal microservices journey than we are, then this talk may save you from having to learn some things the hard way.
Resilient Predictive Data Pipelines
with Siddharth “Sid” Anand, Data Architect at Agari Inc.
Big Data companies (e.g. LinkedIn, Facebook, Google, and Twitter) have historically built custom data pipelines over bare metal in custom-designed data centers. In order to meet strict requirements on data security, fault-tolerance, cost control, job scalability, and uptime, they need to closely manage their core technology. Like serving systems (e.g. web application servers and OLTP databases) that need to be up 24×7 to display content to users, data pipelines need to be up and running in order to pick the most engaging and up-to-date content to display. In other words, updated ranking models, new content recommendations, and the like are what make data pipelines an integral part of an end user’s web experience. We call these predictive data pipelines since their output is source data marked with ranking or classification data. At the heart of these systems lies Airflow, Airbnb’s thriving open-source workflow scheduler. Come to this talk to learn how Agari leverages Airflow and other best practices from both Cloud (AWS SNS, SQS, Kinesis, Auto-Scaling, S3, Lambda, etc.) and Big Data (Spark, Airflow, Avro, etc.) to build its fault-tolerant predictive data pipeline.