The Importance of a Proper Data Culture

The basis of AI, machine learning, or any type of analytics is being a data-driven organization

Beginning with AI means you need a proper data culture to start with. AI is not magic, despite what many may still think. Before even thinking of AI, the data needs to be in order. You need documentation, policies, and most importantly a proper data culture. How to achieve this? Continue reading…

Afke Schouten in conversation with Assaad Moawad

This is the first in a series of interviews with practitioners in the field about generating business value with AI. Assaad co-founded the company DataThings in Luxembourg, which is all about translating data into actionable insights. He believes that data and data culture can help you better understand your business.

The core technology of DataThings is a Temporal Many-World Graph database. In a nutshell, this defines a graph storage and processing framework for all data that is in motion, be it communication flows, social networks, smart grids, etc. This is actually a great database to use for organizational network analysis.

In our conversation, we speak about Assaad’s experience working with clients, and his view on what is needed in the field.

Afke: “We have spoken about this so often, but once more: can you tell the difference between clients that are just starting with AI and clients that are more advanced?”

Assaad: “When we come in with new clients, the usual expectation is that AI is a magic genius. Advanced clients have themselves gone through the realization that this is a utopia. They now understand the investment that is needed to create a proper data infrastructure before even starting with AI.

New clients expect AI to behave like a magic hat: they expect that you can put in a mix of uncleaned data and get a rabbit out.”

Afke: “That doesn’t sound uncommon to me. A lot of work needs to happen to get rid of this rabbit-from-a-hat expectation. What advice would you give to people who are in this situation? What are the key success factors?”

Assaad: “First and foremost, a proper data culture is needed. Part of this is a unified data policy in big companies. Many big companies have sub-teams, and each team uses a different technology or format for data storage, so the cost of aggregating the data will be high. Data documentation is very important too, because with time the people who collected the data leave the company, and nobody can understand anymore what the data is about. The same goes for units of measurement: in an industrial context, if they are not documented or not uniform, it’s hard to use the data.”

Assaad: “A proper data culture, a unified data policy, data documentation, and consistent terminology are what I would recommend.”

“To give you an example, the challenge of working in a multicultural environment like Luxembourg is that it gets reflected in the data; you probably know the same from Switzerland. We find several languages, abbreviations, data formats, data schemas, and terminologies within the same dataset, each coming from a different culture (French/German/English…). That is why consistent terminology, language, schema, format, and units are so important.”
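To make the point about consistent terminology and units concrete, here is a minimal sketch of mapping multilingual field names and mixed units onto one documented convention. The field names, the canonical name `temperature_c`, and the function `normalize_reading` are hypothetical and chosen only for illustration; they are not taken from DataThings or from the interview.

```python
# Hypothetical mapping from language-specific field names to one canonical name.
CANONICAL_FIELDS = {
    "temperature": "temperature_c",   # English and French (without accent)
    "température": "temperature_c",   # French
    "temperatur": "temperature_c",    # German
    "temp": "temperature_c",          # common abbreviation
}

# Converters from each documented unit to degrees Celsius.
TO_CELSIUS = {
    "c": lambda v: v,
    "f": lambda v: (v - 32) * 5 / 9,
    "k": lambda v: v - 273.15,
}


def normalize_reading(field, value, unit):
    """Return (canonical_field, value_in_celsius) or raise on undocumented input."""
    field_key = field.strip().lower()
    if field_key not in CANONICAL_FIELDS:
        raise KeyError(f"undocumented field name: {field!r}")
    unit_key = unit.strip().lower().lstrip("°")
    if unit_key not in TO_CELSIUS:
        raise KeyError(f"undocumented unit: {unit!r}")
    return CANONICAL_FIELDS[field_key], round(TO_CELSIUS[unit_key](value), 2)


if __name__ == "__main__":
    print(normalize_reading("Température", 68.0, "°F"))
    print(normalize_reading("Temperatur", 293.15, "K"))
```

Raising an error on anything undocumented, rather than guessing, is a deliberate choice in this sketch: it forces the missing term or unit to be added to the shared documentation instead of silently ending up in the dataset.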

Afke: “I can relate to that. Without a proper data basis, it will be difficult to get value out of your data. What would you recommend? How does one get started?”

Assaad: “Investing in the data infrastructure is very important, especially when speed and throughput in the data processing pipeline are needed. Our temporal graph database can process about 400,000 values/second. Older technologies only do 10,000 values/second. For example, in the banking sector, if you have 1 billion transactions, it’s the difference between 41 minutes and 28 hours.

Also, investing in hardware is important; AI is hungry for processing power, GPUs, and memory. When you have a fast database, you can iterate several times and test several models while wasting less of the data scientists’ time (spent just waiting for model training), using fewer servers, and reducing infrastructure costs. Tons of benefits. GPUs are very important for image processing or for very big datasets. They can speed up machine learning training by 10-20x.

Investing in the right software is just as important to get the most out of the hardware. That’s why we’re developing a technology dedicated to AI at large scale.”

Assaad: “Invest in your data infrastructure, hardware, and software.”
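The throughput comparison Assaad quotes can be checked with a few lines of arithmetic. The sketch below assumes that each transaction counts as a single value and that throughput stays constant for the whole run; both are simplifications made for illustration, and the function name `processing_seconds` is hypothetical.

```python
def processing_seconds(n_values, values_per_second):
    """Time in seconds to process n_values at a constant throughput."""
    return n_values / values_per_second


N_TRANSACTIONS = 1_000_000_000  # 1 billion, assuming one value per transaction

fast = processing_seconds(N_TRANSACTIONS, 400_000)  # 2,500 seconds
slow = processing_seconds(N_TRANSACTIONS, 10_000)   # 100,000 seconds

print(f"400,000 values/s: {fast / 60:.1f} minutes")  # 41.7 minutes
print(f" 10,000 values/s: {slow / 3600:.1f} hours")  # 27.8 hours
print(f"speed-up: {slow / fast:.0f}x")               # 40x
```

Under these assumptions the figures line up with the ones in the interview: roughly 41 minutes versus roughly 28 hours, a factor of 40.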

Afke: “You and I have spoken about unhappy data scientists in the past, what do you think are the main reasons that practitioners in the field are frustrated?”

Assaad: “Few people like data cleaning; it’s a tedious and time-consuming process. The same holds for data aggregation from different sources. We end up spending half of the project time writing importers and exporters to connect to all the different formats within a company.

Furthermore, data analytics by itself is not enough for a final product; it needs to be integrated into a full software environment. Many stakeholders expect it’s the data scientist’s job to do everything: from data cleaning to modeling, to storage, to analytics, to visualization, to software orchestration (Docker containers), to running in production. But that is actually the job of a full IT team. It is important to staff the team with different profiles and skills, which is what we aspire to in our projects.”

Assaad: “Many stakeholders expect it’s the data scientist’s job to do everything, but that is actually the job of a full IT team.”

Afke: “What would you suggest companies do about this?”

Assaad: “Learn about the topic of AI, implement data culture policies, and be ready to invest in the proper infrastructure.

I like to use the analogy of constructing a building: you invest first in the infrastructure, before building a fancy-looking wall. AI is just the fancy wall; behind it, there is a lot of data infrastructure to be put in place.”

“No magic, no free lunch, no shortcuts.”

Afke: “And what would you suggest practitioners do about this?”

Assaad: “Be patient, be curious, learn about the different topics and the software pipeline, and work on fixing the problem of data cleaning at the source: at the collection and database level.”
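One way to read "fixing data cleaning at the source" is to reject or flag malformed records at collection time, before they reach the database. The following is a minimal sketch under that reading; the record layout (`sensor_id`, `timestamp`, `value`, `unit`), the allowed units, and the function `validate_record` are all hypothetical and only illustrate the idea.

```python
from datetime import datetime

# Hypothetical schema: field name -> expected Python type.
REQUIRED_FIELDS = {"sensor_id": str, "timestamp": str, "value": float, "unit": str}
# Hypothetical list of documented units.
ALLOWED_UNITS = {"kWh", "V", "A"}


def validate_record(record):
    """Return a list of problems found in one incoming record (empty if clean)."""
    problems = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in record:
            problems.append(f"missing field: {name}")
        elif not isinstance(record[name], expected_type):
            problems.append(f"wrong type for {name}")
    # Timestamps must parse as ISO 8601 so every team stores them the same way.
    if isinstance(record.get("timestamp"), str):
        try:
            datetime.fromisoformat(record["timestamp"])
        except ValueError:
            problems.append("timestamp is not ISO 8601")
    # Units must come from the documented list, spelled exactly as documented.
    if isinstance(record.get("unit"), str) and record["unit"] not in ALLOWED_UNITS:
        problems.append(f"unknown unit: {record['unit']}")
    return problems


if __name__ == "__main__":
    good = {"sensor_id": "s1", "timestamp": "2020-01-02T12:00:00", "value": 12.5, "unit": "kWh"}
    bad = {"sensor_id": "s1", "timestamp": "01/02/2020", "value": "12,5", "unit": "kwh"}
    print(validate_record(good))
    print(validate_record(bad))
```

Returning a list of problems instead of raising on the first one is a design choice in this sketch: it lets the collecting system report everything that is wrong with a record in one pass.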

In summary, you need a proper data culture. For companies starting out, the advice is to invest in your data infrastructure, invest in hardware, and invest in software. Educating yourself and setting the right expectations for the team is important as well. For data scientists: yes, data cleaning is part of the job. Let’s make data engineering sexy!

Thank you, Assaad, for this interesting conversation and your tips for those who want to get started with AI. I wish you all the best with DataThings. Do you want to read more from Assaad? Check out his Medium posts or the blog of DataThings.

About me: I am an AI Management Consultant and Director of Studies for “AI Management” at a local business school. I am on a mission to help organizations generate business value with AI and create an environment in which Data Scientists can thrive.