We are surrounded by tons of data. Our software collects massive amounts of data that need to be put into a form we can work with. How do other companies deal with their data? What are the best practices, tools, and lesser-known approaches?
The Data track at GOTO Berlin 2016 includes sessions with speakers seeking to answer these questions. Watch the videos from the Data track below.
Conflict Resolution for Eventual Consistency
with Martin Kleppmann, Researcher at University of Cambridge
What do collaborative editors like Google Docs, the calendar app on your phone, and multi-datacenter database clusters have in common?
Answer: They all need to cope with network interruptions, and still work offline. They all allow state to be updated concurrently in several different places, and asynchronously propagate changes to other nodes. If data is concurrently changed on different nodes, you get conflicts that need to be resolved.
There are different approaches to handling those conflicts: some systems let the user resolve them manually; some systems choose one version as the winner and throw away the other versions; and some systems try to merge concurrent updates automatically. For example, Google Docs uses an algorithm called Operational Transform (OT) to perform this merge, while Riak uses Conflict-free Replicated Data Types (CRDTs) to achieve a similar result.
In this talk we will explore these algorithms for automatic merging. They start out quite simple, but as we shall see, they soon become fascinatingly mind-bending once you start trying to do more ambitious things. For example, if you wanted to write your own spreadsheet app or graphics software that allows several users to edit the same document concurrently, how would you go about doing that?
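To make the CRDT idea concrete, here is a minimal sketch of one of the simplest state-based CRDTs, a grow-only counter (G-Counter). This is an illustrative toy, not Riak's implementation: each replica increments only its own slot, and merging takes the element-wise maximum, so replicas converge no matter in which order updates and merges arrive.

```python
# Illustrative G-Counter CRDT (toy sketch, not Riak's implementation).
# Each replica increments its own slot; merge takes the element-wise
# maximum, which is commutative, associative, and idempotent.

class GCounter:
    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica_id -> count contributed by that replica

    def increment(self, n=1):
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Element-wise max means merging twice, or in any order,
        # always yields the same state: conflicts resolve themselves.
        for rid, c in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), c)

# Two replicas update concurrently while disconnected, then sync.
a, b = GCounter("a"), GCounter("b")
a.increment(3)
b.increment(2)
a.merge(b)
b.merge(a)
print(a.value(), b.value())  # both converge to 5
```

Counters are the easy case; as the talk points out, merging text edits or spreadsheet cells automatically is where things get mind-bending.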
Building a culture of Experimentation at Spotify
with Ben Dressler, Experimentation Lead at Spotify
Running an experiment is trivial: Make a change and see what happens.
Running experiments at scale, however, is a different story.
It is not trivial to simultaneously run hundreds of experiments across 100 million users. It’s not trivial to cover dozens of platforms and markets while staying on top of the technical and methodological complexities. And it’s also not trivial to build a culture of curiosity where people are happy to expose their ideas to a reality check and are not afraid to see them fail.
This talk will touch on Spotify’s approach to scaling Experimentation and lessons learned along the way.
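As a rough illustration of the per-experiment arithmetic (my own sketch with made-up numbers, not Spotify's tooling), a single A/B experiment often comes down to comparing conversion rates with something like a two-proportion z-test — the hard part the talk addresses is doing this hundreds of times, across platforms, without fooling yourself.

```python
# Toy sketch of evaluating one A/B experiment: a two-sided
# two-proportion z-test on conversion counts. Numbers are invented.
from math import sqrt, erf

def two_proportion_z(conv_a, n_a, conv_b, n_b):
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p = (conv_a + conv_b) / (n_a + n_b)            # pooled rate under H0
    se = sqrt(p * (1 - p) * (1 / n_a + 1 / n_b))   # standard error
    z = (p_b - p_a) / se
    # Two-sided p-value from the normal CDF, Phi(x) = 0.5*(1 + erf(x/sqrt(2)))
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Control: 480/10,000 converted; treatment: 560/10,000 converted.
z, p = two_proportion_z(conv_a=480, n_a=10_000, conv_b=560, n_b=10_000)
print(round(z, 2), round(p, 4))
```

With hundreds of such tests running concurrently, naive use of a 0.05 threshold guarantees false positives — one of the methodological complexities the abstract alludes to.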
Enabling the Panama Papers Investigations with Open Source Tools
with Michael Hunger, GraphAddict at Neo4j
The biggest leak in journalistic history has not only been mind-blowing for everyone but also challenging for the team of journalists and developers at the International Consortium of Investigative Journalists (ICIJ). With more than 11 million documents totaling 2.6 TB of information, it is truly impressive that a small team of 3 developers could support more than 400 journalists in a year's worth of investigative work.
This became possible through the efficient use of open source technology for scanning and extracting text and metadata from the documents. The biggest difference, though, was made by the power of a graph database to connect the people, companies, and accounts revealed in the investigation.
Especially for the non-technical journalists, the ability to unearth all those connections "was like magic". Collaborating on research, they benefited from each other's work and watched the bigger picture grow more interesting every day.
In this talk I detail the process and the technologies used by the journalists for their investigative work, including Apache Solr, Apache Tika, and Neo4j. Then I focus on their work with Neo4j, the data model they developed, and the types of queries and interactions that helped them grow their understanding. We discuss how tools for visual graph exploration and search enable even non-technical users to benefit from working with large amounts of connected data.
Using the officially published dataset of 3.4M records we demonstrate how new insights into your existing, disconnected data are just one graph query away.
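The kind of question such a graph query answers can be sketched in a few lines of plain Python. The entities and relationships below are invented for illustration (they are not from the ICIJ dataset), and the breadth-first search plays the role of a shortest-path graph query: how are two entities connected?

```python
# Toy sketch of a "how are these connected?" graph query.
# Entities and edges are invented; in Neo4j this would be a
# shortest-path query over officers, companies, and intermediaries.
from collections import deque

edges = [
    ("Officer A", "Shell Co 1"),
    ("Shell Co 1", "Intermediary X"),
    ("Intermediary X", "Shell Co 2"),
    ("Officer B", "Shell Co 2"),
]

def neighbors(node):
    for a, b in edges:
        if a == node:
            yield b
        if b == node:
            yield a

def shortest_path(start, goal):
    # Breadth-first search: first path to reach the goal is shortest.
    seen, queue = {start}, deque([[start]])
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in neighbors(path[-1]):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])

print(shortest_path("Officer A", "Officer B"))
```

On millions of records, expressing this declaratively against an indexed graph store is what makes the difference between "like magic" and an overnight batch job.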
Mining Repository Data to Debug Software Development Teams
with Elmar Juergens, Consultant at CQSE GmbH
If the team architecture and the technical architecture do not fit together, problems arise. Both architectures evolve, however, often causing misalignment. How can we notice such mismatches and react in time?
In this talk, I present modern analysis techniques that mine data from different software artifacts (e.g. lightweight architecture specifications, code and trace files) and version control systems. They reveal problems in the code or design that result from communication problems in the development team.
I present the analysis techniques using examples from open source and commercial software systems. I also outline both best practices and limitations in employing such analyses that I collected over 10 years.
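One repository-mining idea of this kind can be sketched briefly. The example below is my own toy illustration, not the speaker's tooling: files that frequently change together in the same commits may indicate hidden coupling, and when such a pair crosses a module or team boundary, that can be a symptom of the architectural mismatch described above.

```python
# Toy co-change analysis: count how often file pairs are touched in the
# same commit. Input mimics what you might parse from `git log --name-only`;
# paths and commits are invented for illustration.
from collections import Counter
from itertools import combinations

commits = [
    ["billing/invoice.py", "ui/checkout.js"],
    ["billing/invoice.py", "ui/checkout.js", "docs/notes.md"],
    ["billing/tax.py"],
    ["billing/invoice.py", "ui/checkout.js"],
]

co_change = Counter()
for files in commits:
    # Sort so each unordered pair is counted under one canonical key.
    for pair in combinations(sorted(files), 2):
        co_change[pair] += 1

# The most co-changed pair crosses the billing/ui boundary - a hint
# that the team and technical architectures may be misaligned there.
print(co_change.most_common(1))
```

Real analyses of this sort combine such version-control signals with architecture specifications and trace files, as the talk describes.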
Monitor your containers with the Elastic Stack
with Monica Sarbu, Software Engineer at Beats at Elastic
Containers, as well as orchestration systems like Kubernetes, are quickly gaining popularity as the preferred tools for deploying and running microservices. While easier to deploy and isolate, containerized applications create new challenges for logging and monitoring systems.
One popular solution for logging and monitoring is the Elastic Stack composed of Elasticsearch, Logstash, Kibana, and Beats. This talk shows you how to use the Elastic Stack, and in particular the Beats lightweight shippers, to collect logs and metrics from your containers.
The session includes details about how to:
- fetch the logs of the containers with Filebeat
- collect container metrics with Metricbeat
- monitor the network traffic exchanged between containers with Packetbeat
- automatically discover metadata from Docker containers
- visualize the collected data with predefined Kibana dashboards
- scale Logstash deployments
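For a flavor of what the first and fourth points look like in practice, here is a minimal configuration sketch. It is an assumption-laden illustration, not from the talk: the `container` input type and the `add_docker_metadata` processor exist in Filebeat, but exact option names and defaults vary between Beats versions, so treat this as a starting point rather than a working config.

```yaml
# Hedged sketch of a Filebeat config for container logs (options may
# differ by Beats version; verify against the docs for your release).
filebeat.inputs:
  - type: container
    paths:
      - /var/lib/docker/containers/*/*.log

processors:
  # Enrich each event with container name, image, and labels.
  - add_docker_metadata: ~

output.elasticsearch:
  hosts: ["localhost:9200"]
```

Metricbeat and Packetbeat follow the same pattern: a lightweight shipper per host, enriched with container metadata, feeding Elasticsearch directly or via Logstash.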