
Three Questions Towards a Successful Data Science Project

When is a data science project successful? Most articles you will find about data science projects cover how to do machine learning. Don’t get me wrong, this is an important part of being a data scientist. But to generate actual business value there is more to do…

Image: laptop computer on a glass-top table. Photo by Carlos Muza on Unsplash.

Successful Data Science Projects

As a data scientist, you may not be responsible for the success of the project. However, when no actual business value is generated, the business will probably come to you for an explanation. That means that, as a data scientist, you can only be as successful as the business value generated through your project.

That brings me to the question: what makes a data science project successful? In my experience, the business is happy when every phase of the project is completed, we are ‘in production’, and we are generating business impact.

Three Main Phases

Projects can be organized in many different ways, yet all successful data science projects will somehow follow three main phases:

  1. Business Needs
  2. Machine Learning
  3. Operations

So how can you tell if you are on the right track? I ask myself three questions a few times during the project to check.


Three Questions

During my data science projects I have kept myself sharp by asking the following three questions. They have truly helped me to see the big picture, especially at times when I was deep in the details.

  1. What is the concrete business need?
  2. How do my machine learning choices affect the business impact and operations?
  3. How is business changed after implementation?

Let’s dive into each of these…


What is the concrete business need?

This is the first question and perhaps the most important one. Because let’s be honest, why would I start a project without a concrete business need? If there is none, it doesn’t make sense to invest time and money.

Why are we doing this?

There are several possible answers, such as increasing profit, saving costs or increasing customer satisfaction. I have found it even better to be concrete: “We want to ensure that our sales reps save time so that they can spend more time with potential customers, which will drive a higher profit”.

Can I explain the situation and context?

What does the current process look like? What is the IT landscape, and what are the data availability and quality? And are there perhaps any trigger events that make us deal with this topic right now?

Do I understand the business requirements?

There may be specific needs or wishes from the end-user. There could be specific constraints within which you have to operate. Regulations such as GDPR may be relevant to check. And lastly, I need to speak to the business stakeholders and truly understand their needs.

The typical pitfalls

Now it may seem quite clear that these things need to be covered. In practice, we often find teams that don’t spend enough time on the business need and get into trouble later on in the project. They deliver something that was not required in the first place, and interest in the project is lost.

Sometimes business stakeholders and data scientists are in the same room under the impression that they are on the same page, yet later on it turns out they were not. To avoid this, bring somebody to the table who understands both sides, such as an analytics translator.

Lastly, teams often fail to test feasibility. Before diving into machine learning, it is helpful to understand the feasibility of implementation. Is IT ready? Will there be enough resources available? We may not know at the start, but we need to keep an eye on this!


How do my machine learning choices affect the business impact and operations?

At any time, I should be able to make the link between the machine learning and the business needs on one side, and operations on the other.

Am I doing the right analysis?

You may think, “that’s clear, isn’t it?”. Unfortunately, that is not always the case. Will my model do what the business requires? This can be related to the objective or loss function. As an example, when making predictions, we could ask our business stakeholder whether one large error is worse than several small errors that add up to the same amount. The answer helps to choose between root mean squared error (RMSE), which penalizes large errors more heavily, and mean absolute error (MAE), which weighs all errors in proportion to their size.
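To make the RMSE versus MAE choice tangible, here is a minimal sketch in plain Python. The numbers and the names `model_a` and `model_b` are made up for illustration: both hypothetical models have the same total absolute error, but one spreads it over many small misses and the other concentrates it in a single large miss.

```python
import math

def mae(actual, predicted):
    """Mean absolute error: every unit of error counts the same."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error: squaring makes large errors count more."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(actual))

# Hypothetical data: four true values and two sets of predictions.
actual = [100.0, 100.0, 100.0, 100.0]
model_a = [95.0, 105.0, 95.0, 105.0]    # four small errors of 5
model_b = [100.0, 100.0, 100.0, 120.0]  # one large error of 20

for name, predicted in [("model_a", model_a), ("model_b", model_b)]:
    print(name, "MAE:", mae(actual, predicted), "RMSE:", rmse(actual, predicted))
```

Both models have an MAE of 5, yet the RMSE of `model_b` is twice that of `model_a`. If the business says the single big miss hurts more, RMSE reflects that; if only the total error matters, MAE is probably the better fit.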

Will spending more time on optimizing actually improve the business result?

For example, I could ask myself whether 0.5% more accuracy will actually make a difference in our case. Of course it can, but asking the question does no harm.

If my model takes a few minutes to make a prediction, this is fine in some cases and not in others. If it is not, perhaps I should first implement the solution and check that it brings business value before optimizing my code.
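The accuracy question can be turned into rough arithmetic. The sketch below is only a back-of-the-envelope helper under strong assumptions: the function names, the idea that every additional correct prediction is worth a fixed amount, and all the figures are hypothetical and would need to come from the business stakeholders.

```python
def extra_value_per_year(accuracy_gain, decisions_per_year, value_per_correct_decision):
    """Rough yearly value of an accuracy gain (assumes a fixed value per correct decision)."""
    return accuracy_gain * decisions_per_year * value_per_correct_decision

def worth_optimizing(accuracy_gain, decisions_per_year,
                     value_per_correct_decision, optimization_cost):
    """True if the assumed yearly gain exceeds the assumed cost of the extra work."""
    gain = extra_value_per_year(accuracy_gain, decisions_per_year,
                                value_per_correct_decision)
    return gain > optimization_cost

# Hypothetical figures: 0.5% more accuracy, 20 (in any currency) per correct
# decision, and 5,000 for the extra optimization work.
small_case = worth_optimizing(0.005, 10_000, 20.0, 5_000.0)
large_case = worth_optimizing(0.005, 1_000_000, 20.0, 5_000.0)
print("10,000 decisions per year:", small_case)
print("1,000,000 decisions per year:", large_case)
```

With these made-up numbers, the same 0.5% is not worth the effort at ten thousand decisions per year but clearly is at one million. The point is not the formula itself but that the answer depends on the business context.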

Is my model sexy or goal-oriented?

I know it’s cool to build a Neural Network or a Random Forest. Do we really need it though? The question is whether a simpler model will do the job. A logistic or linear regression often works, and it is certainly easier to explain to the business.
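One way to make “will a simpler model do the job?” operational is to pick the simplest candidate whose score is close enough to the best one. The sketch below assumes hypothetical validation scores, a hand-assigned complexity rank, and a tolerance agreed with the business; none of these come from a real project.

```python
def pick_simplest_adequate(candidates, tolerance):
    """Return the name of the simplest model within `tolerance` of the best score.

    candidates: list of (name, complexity_rank, score) tuples, where a lower
    rank means a simpler model and a higher score means a better model.
    """
    best_score = max(score for _, _, score in candidates)
    adequate = [c for c in candidates if best_score - c[2] <= tolerance]
    return min(adequate, key=lambda c: c[1])[0]

# Hypothetical validation scores for three candidate models.
candidates = [
    ("logistic regression", 1, 0.86),
    ("random forest", 2, 0.87),
    ("neural network", 3, 0.875),
]

print(pick_simplest_adequate(candidates, tolerance=0.02))
print(pick_simplest_adequate(candidates, tolerance=0.001))
```

With a tolerance of 0.02 the logistic regression is selected; only when almost no loss in score is acceptable does the neural network win. How large the tolerance should be is a business question, not a modeling one.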

The typical pitfalls

Most data scientists don’t spend enough time on understanding how their modeling connects to the business goals. Data scientists (believe me, I’ve been there) tend to get onto this train of trying out fancy things without focusing on creating a better business result.


How is business changed after implementation?

Implementing your solution is the only way to create an actual business impact. For this you need both the right IT landscape and business processes. Without implementation, no business impact.

Is my organization ready to implement the technical changes?

This is about skills, budget, maintenance possibilities as well as dealing with legacy systems and bureaucratic processes.

What are the concrete and actionable results?

What is the exact procedure that causes the actual business impact? A dashboard that nobody reads or acts upon creates no actual business impact.

Is the business ready for this change?

Are my business stakeholders open to this change, and do they have the time to deal with it? If my business stakeholders do not understand the solution or are not data literate enough to accept it, it will be difficult to create change. Acceptance is key to creating business impact.

Who will work with the changes and which processes will be affected?

I get a picture of where changes will take place by drawing up the processes before and after and listing the people who will be affected. I need to speak to them upfront to find out about any roadblocks. This helps me to get a good understanding of what needs to be done.

The typical pitfalls

Not clarifying the implementation of your project may lead to finding knockout (K.O.) criteria down the line. Teams sometimes fail to clarify IT readiness early enough and find it difficult to get IT on board. Change management is important too: just ‘putting it in production’ may not be sufficient, and training might be appropriate. Lastly, if you don’t prepare for operations, maintenance and services, what will happen when your model needs to change or something breaks?


Summary

Any successful data science project will somehow run through the three phases of business needs, machine learning, and operations to generate actual business value.

There are three questions I ask myself when working on a data science project. “What is the concrete business need?”, “How do my machine learning choices affect the business impact and operations?”, and “How is business changed after implementation?”

Answering these questions at different times during my projects has allowed me to keep an eye on the big picture and focus on creating true business value.

About me: I am an AI Management Consultant with a mission to make data scientists happy (again) and to help organizations generate business value with AI.
