A Beginner’s Guide to Leveraging Agile Data Science

By MathCo Team
November 22, 2020

Great data products employ data and machine learning as fundamental tools to serve the user’s needs. They can also provide organizations with a ‘data moat’ that gives them a significant competitive advantage.

Building a great data product, however, requires a marriage of the product-and-business perspective with the tech-and-data perspective. This means that data science is not an exercise in isolation; it moves in lockstep with engineering and product design as a cross-functional collaboration.

We can no longer expect teams to invest a lot of time in validating a technical solution before validating product-market fit. This calls for an iterative approach in which a series of MVPs is released, and each one is evaluated both on its algorithmic performance and accuracy and on whether it solves the problems that matter most to users.

In that regard, the lifecycle of developing a data product is not wildly different from that of a regular software product, given that most software products employ data in one way or another to deliver solutions anyway.

Common reasons why projects fail:

The CHAOS report released by the Standish Group in 2015 reported that about two-thirds of all software projects fail to deliver on their intended outcomes.

There are several reasons why projects can fail. Some common ones are:

  • Project requirements are not clear; we do not know what ‘Done’ looks like.
  • Deadlines are estimated and set to unrealistic values.
  • Project priorities change during development.
  • There is a lack of communication among the different project stakeholders.
  • The project does not have a robust quality control process.
  • Project managers are expected to deliver on multiple outcomes simultaneously.

Nurturing an Agile mindset:

Agile is not a step-by-step program that one can follow to its end and then claim to have ‘done’ Agile, nor is it a set of tools that one can use to ‘be Agile’. There is no Agile checkbox that one can tick on a goal sheet.

Agile is a mindset, a different way of thinking about work. It is the pursuit of never-ending improvement in getting work done. Agile gives us methods to deliver increments of completed work in a timely, predictable fashion, iteratively improving the product over the course of its lifetime.

What is Agile Data Science?

Agile data science follows a similar iterative process to deliver on data science outcomes. Instead of starting with user requirements, we start with a hypothesis. Based on that hypothesis, we process the data to arrive at an analysis outcome and generate insights, which in turn lead to a new hypothesis.
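The hypothesis-to-insight loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Hypothesis` class, the `iterate` function, the toy `sessions` data and both example hypotheses are hypothetical names invented here, not part of any particular framework or of the original process.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, float]


@dataclass
class Hypothesis:
    """A hypothetical container for one pass through the loop."""
    statement: str
    # Analysis step: reduce the data to a number we can reason about.
    analyse: Callable[[List[Record]], float]
    # Insight step: read the result and propose the next hypothesis,
    # or return None when there is nothing further to test.
    next_hypothesis: Callable[[float], Optional["Hypothesis"]]


def iterate(data: List[Record], start: Hypothesis,
            max_iterations: int = 5) -> List[Tuple[str, float]]:
    """Run hypothesis -> analysis -> insight -> new hypothesis."""
    log: List[Tuple[str, float]] = []
    current: Optional[Hypothesis] = start
    while current is not None and len(log) < max_iterations:
        result = current.analyse(data)
        log.append((current.statement, result))  # ship the intermediate output
        current = current.next_hypothesis(result)
    return log


# Toy data (made up for illustration): sessions with and without a discount.
sessions = [
    {"discount": 1, "spend": 30.0},
    {"discount": 0, "spend": 20.0},
    {"discount": 1, "spend": 50.0},
    {"discount": 0, "spend": 40.0},
]

overall = Hypothesis(
    statement="Average spend per session is above 30",
    analyse=lambda rows: mean(r["spend"] for r in rows),
    next_hypothesis=lambda avg: None,  # stop here in this sketch
)

discount_lift = Hypothesis(
    statement="Discounted sessions spend more than non-discounted ones",
    analyse=lambda rows: (
        mean(r["spend"] for r in rows if r["discount"] == 1)
        - mean(r["spend"] for r in rows if r["discount"] == 0)
    ),
    # A positive lift suggests a follow-up question; otherwise stop.
    next_hypothesis=lambda lift: overall if lift > 0 else None,
)

log = iterate(sessions, discount_lift)
for statement, result in log:
    print(f"{statement}: {result}")
```

Each entry in `log` is an intermediate output that could be shared with the rest of the team, whether or not the hypothesis held up.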

This iterative approach with agile data science helps us build great data products as everything in the process can proceed simultaneously. The data engineering, data science, DevOps and product engineering teams – all move together as a cohesive unit.

This means that data science teams must iterate over methods and models to build one that is ready to be productized. We become comfortable with shipping intermediate output, even if the iteration resulted in a failed experiment. We acknowledge that there is a multitude of paths to be taken to build a great data product.

These paths run through the various stages that the data goes through, such as ETL (extract-transform-load) processes, data munging, statistical techniques, security audits, explorations, modelling, machine learning, integration, and deployment as a product. Agile data science provides a way to find the critical path to success from the bottom up.

Agile frameworks:

The agility in ‘Agile’ refers to the lightweight methods used to deliver project outcomes. Agile frameworks offer the scaffolding to run an Agile workflow. These frameworks provide a combination of Agile methods and team structures to start working with an Agile mindset. Two of the most common Agile frameworks are Scrum and Kanban.

Scrum: incorporates small, self-directed teams working in sprints, each of which ends with the delivery of a working product increment. Roles include a product owner, who sets priorities and is accountable for results, and a scrum master, who facilitates the process and helps the team remove obstacles.

Kanban: was developed in lean manufacturing to manage and improve work by balancing demands with available capacity and improving the handling of bottlenecks. The Kanban board is a tool to visualize the work so it can be managed more effectively.
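To make the idea of balancing demand with available capacity concrete, here is a minimal sketch of a Kanban board that enforces work-in-progress (WIP) limits. The `KanbanBoard` class, its method names, the column names and the limit of two cards are all assumptions made for this example, not a reference to any specific tool.

```python
from typing import Dict, List, Optional


class KanbanBoard:
    """A hypothetical board whose columns may carry a WIP limit."""

    def __init__(self, wip_limits: Dict[str, Optional[int]]):
        # Column name -> maximum number of cards (None means unlimited).
        self.wip_limits = dict(wip_limits)
        self.columns: Dict[str, List[str]] = {name: [] for name in wip_limits}
        self.backlog = next(iter(wip_limits))  # first column receives new work

    def has_capacity(self, column: str) -> bool:
        limit = self.wip_limits[column]
        return limit is None or len(self.columns[column]) < limit

    def add(self, card: str) -> None:
        self.columns[self.backlog].append(card)

    def pull(self, card: str, target: str) -> bool:
        """Move a card into `target` only if that column has capacity."""
        if not self.has_capacity(target):
            return False  # bottleneck: finish something before starting more
        for cards in self.columns.values():
            if card in cards:
                cards.remove(card)
                self.columns[target].append(card)
                return True
        raise KeyError(f"unknown card: {card}")


board = KanbanBoard({"To do": None, "In progress": 2, "Done": None})
tasks = ["clean data", "train model", "build API"]
for task in tasks:
    board.add(task)

# The third pull is refused because "In progress" is already at its limit.
moved = [board.pull(task, "In progress") for task in tasks]

# Finishing one card frees capacity for the waiting one.
board.pull("clean data", "Done")
board.pull("build API", "In progress")
print(board.columns)
```

The refused pull is the point of the exercise: the limit makes the bottleneck visible on the board instead of letting unfinished work pile up silently.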

To sum it all up, using Agile frameworks is not the goal of Agile. Outcomes are. There are significant challenges in changing the way we think about work, and anyone considering Agile should be aware of them. Agile is a team effort; no single team member can be Agile on their own.

It must be done as a team. Agile requires a commitment to change the work mindset and the way we manage teams and their outcomes but promises great rewards in terms of delivery, clarity, transparency and learning for its practitioners.