How to Efficiently Keep your Data Clean to Drive Performance


Data cleansing is a form of data management to ensure that the information your business stores is accurate, complete, formatted, unique, relevant, and up-to-date. How often you clean your data and the methods used can vary depending on the business or industry but no matter what, an effective data cleansing strategy has the capability of being a core driver in business performance.

In a digitally dominated world, we rely on data more and more. This could mean sending emails to the correct address or ensuring the right amount is on an invoice or, in a more innovative sense, telling our customers what products they might like or what movies to watch and music to listen to. Data is becoming the central force behind an enormous amount of business activity.

This article looks at why clean data is so important across several different industries and the potential risks of neglecting data governance altogether.

1. Why should data be cleansed?

It is thought that anything up to 70% of data could be out of date after just one year, resulting in missed sales opportunities or misguided marketing activity. IBM has suggested that in 2016, bad data took $3 trillion out of the US economy, and spending on redundant, obsolete, or trivial data in the UK costs about £435k per year.

[Image: old data (Royal Mail)]

Beyond the marketing and sales opportunities, bad data means you risk being blacklisted by email providers if you don’t have a verification process, face a higher chance of complaints if you continually send poor or wrong information or, in the worst case, suffer financial implications if you are not able to align all your sources.

The reason for data cleansing ultimately depends on your industry, your budget, and your objective in doing so.

[Image: dirty lead data]

Other sources put the rate of data decay differently, saying that only 80% of data is still valid after each year (source: Finelyfetted). If we consider that around 0.9% of the population dies each year, it is quite a big turn-off to customers if you are known to keep sending communications to people who have passed away.
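To see how quickly that adds up, here is a minimal Python sketch. It assumes the 80% figure compounds at the same rate every year, which the source does not confirm, and the function name is made up for illustration.

```python
def valid_fraction(annual_valid_rate: float, years: int) -> float:
    """Fraction of records still valid after `years`.

    Assumption: the same share of records stays valid each year,
    so the effect compounds.
    """
    return annual_valid_rate ** years


# Share of a database still valid if it is never cleansed.
for years in range(1, 4):
    print(years, round(valid_fraction(0.80, years), 3))
# 1 0.8
# 2 0.64
# 3 0.512
```

Under that assumption, roughly half of an untouched database would be stale after three years.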

2. The data cleansing process

Prior to beginning your data cleansing, a proper data audit should be conducted as each business may have different data quality issues that should be addressed.

Imagine you are in a meeting with the CEO, COO, and CFO looking at revenue for the previous month. The CEO thinks you made $10 million, the COO believes it is $11 million and the CFO thinks it was $9.5 million. In theory, they are all correct, but the problem has stemmed from them all using different data sources and metrics to get their numbers. Confusion like this can quickly cause mass hysteria in a business which is why definitions are highly important before even beginning a cleansing strategy.

Once the data sources are aligned, you can start reviewing the best methods for cleansing, as you have a far clearer view of the potential problems. For example, the audit might show that you are missing lots of email addresses, so you need an email verification system, or that customers’ dates of birth are stored in different formats, so you need to focus on validation.
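A very basic audit of this kind can be sketched in a few lines of Python. This is only an illustration: the field names and sample records are invented, and every value is assumed to be a string or None.

```python
import re
from collections import Counter


def audit_column(records, field):
    """Count missing values and the distinct formats ("shapes") of the rest."""
    missing = 0
    shapes = Counter()
    for record in records:
        value = (record.get(field) or "").strip()
        if not value:
            missing += 1
            continue
        # Replace every digit with 9 and every letter with A to expose the format.
        shape = re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))
        shapes[shape] += 1
    return {"missing": missing, "shapes": dict(shapes)}


# Hypothetical customer records, for illustration only.
records = [
    {"email": "a@example.com", "dob": "1990-01-31"},
    {"email": "", "dob": "31/01/1990"},
    {"email": None, "dob": "1985-07-04"},
]

print(audit_column(records, "email"))  # two of three emails are missing
print(audit_column(records, "dob"))    # two different date formats in use
```

The email result points towards verification, while the mix of date shapes points towards validation.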

StrategicDB offers a free data audit report designed to help clients work out which metrics should be normalized or which should be benchmarked. The image below shows how a data audit report can help you begin your data cleansing journey.

[Image: data audit report]

You can also use the likes of SolarWinds’ database observability platform to ensure optimal database performance. This will monitor the health of your database over time, giving you metrics on which to base your data-cleansing efforts and empowering you to be proactive in identifying and fixing flaws that might otherwise go unnoticed for a long time.

3. Overview of data cleansing methods

A. Email verification

Ensuring you hold the correct email addresses for your customer base is of the utmost importance. This isn’t just about being able to contact customers: continually mailing incorrect addresses could cause a business to be blacklisted, which would be a major blocker.

Most digital businesses will deploy email verification systems that check whether the data entered is correct, to avoid typos and spam. The best option is to have this running in real time behind your website to filter out bad-quality data straight away; if that is not possible, a full cleanse each quarter as a minimum is probably a good idea.

As well as email, many businesses follow similar procedures for checking cell phone details or mailing addresses before the customer is able to register. A common method is sending an SMS with a code that the customer must enter to verify their details before placing an order.
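As a rough illustration of the first line of defence, the Python sketch below checks only whether an address is plausibly formed. It is not a full verification system, since it cannot tell whether a mailbox actually exists, and the pattern and function names are assumptions made for this example.

```python
import re

# Deliberately simple pattern: text, one "@", then a domain containing a dot.
# No spaces or extra "@" signs are allowed anywhere.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def looks_like_email(address: str) -> bool:
    """Return True if the address passes a basic syntax check."""
    return bool(EMAIL_PATTERN.match(address.strip()))


def split_by_validity(addresses):
    """Separate addresses into (plausible, rejected) lists."""
    plausible, rejected = [], []
    for address in addresses:
        (plausible if looks_like_email(address) else rejected).append(address)
    return plausible, rejected


plausible, rejected = split_by_validity(
    ["jane@example.com", "jane@example", "bob@@example.org"]
)
print(plausible)  # ['jane@example.com']
print(rejected)   # ['jane@example', 'bob@@example.org']
```

The same function could run on a sign-up form in real time or over a whole list during a quarterly cleanse.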

B. Data validation

Like email verification, data validation makes sure that any piece of data coming into your business is correct. For example, are dates of birth formatted correctly, and are customers giving their correct cell phone numbers?

Front-end sites can validate all this information and it is the best way to cleanse data before it even comes into your database. This might work in the form of input masks which force the customer to enter data in a specific way rather than it being free-form text.
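Where data has already arrived in mixed formats, the same rules can be applied after the fact. The Python sketch below accepts a date of birth in a few formats and stores it in one. The list of accepted formats is an assumption for this example, and slash or dash dates are assumed to be day-first.

```python
from datetime import date, datetime
from typing import Optional

# Assumed input formats; day-first for the slash and dash variants.
ACCEPTED_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d-%m-%Y")


def normalize_date_of_birth(raw: str, today: Optional[date] = None) -> Optional[str]:
    """Return the date as YYYY-MM-DD, or None if it cannot be trusted."""
    today = today or date.today()
    for fmt in ACCEPTED_FORMATS:
        try:
            parsed = datetime.strptime(raw.strip(), fmt).date()
        except ValueError:
            continue  # Not this format; try the next one.
        # A date of birth in the future is rejected as invalid.
        return None if parsed > today else parsed.isoformat()
    return None


print(normalize_date_of_birth("31/01/1990"))  # 1990-01-31
print(normalize_date_of_birth("1990-13-01"))  # None (there is no month 13)
```

Returning None rather than guessing means bad values can be flagged for follow-up instead of being stored silently.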

C. De-duplication

Finding duplicates in your data and removing or merging them is very important. If a customer is able to have two records, they may start getting duplicate emails, phone calls, letters, or text messages. This is both a poor customer experience and an unnecessary business cost.

De-duplication may not always be 100% accurate but you can do the best job possible. It is quite common for customers to use multiple email addresses for example and picking up on that is not always simple when searching for duplicate accounts.

However, minimizing duplicate records by checking details on the front end or having rules built in for certain flags can be an excellent cost-saving exercise that ultimately improves customer experience.
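One simple rule of this kind is to treat records as duplicates when their email addresses match after trimming spaces and ignoring case. The Python sketch below merges such records. The field names are invented for illustration and, as noted above, it will not catch a customer who uses two different email addresses.

```python
def match_key(record):
    """Build a comparison key from a normalized email address."""
    return record["email"].strip().lower()


def merge_duplicates(records):
    """Keep one record per key, filling gaps from later duplicates."""
    merged = {}
    for record in records:
        key = match_key(record)
        if key not in merged:
            # First time this customer is seen: keep a copy with a clean email.
            merged[key] = dict(record, email=key)
            continue
        # Duplicate: only fill in fields the kept record is missing.
        for field, value in record.items():
            if not merged[key].get(field) and value:
                merged[key][field] = value
    return list(merged.values())


# Hypothetical customer records, for illustration only.
customers = [
    {"email": "Jane@Example.com", "phone": ""},
    {"email": "jane@example.com ", "phone": "555-0100"},
    {"email": "bob@example.com", "phone": "555-0101"},
]

for customer in merge_duplicates(customers):
    print(customer)
# {'email': 'jane@example.com', 'phone': '555-0100'}
# {'email': 'bob@example.com', 'phone': '555-0101'}
```

Merging rather than deleting keeps any details that only appear on the duplicate record.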

D. Data standardization and normalization

Data standardization and normalization are management practices for the optimization and streamlining of your company data.

Normalization takes your data and stores it in more logical columns, akin to a relational database. Everything is put into a central location, often referred to as a “single source of truth” so that all departments are using the same metrics all of the time. In simple terms, your data reads the same across every single database record. Having data in this logical order also makes it faster and more efficient to use.
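As a small illustration of the idea, the Python sketch below takes flat order rows, where customer details are repeated on every row, and splits them into a customer table and an order table. All field names and values are invented for this example, and when details conflict the last row seen simply wins, which a real process would want to handle more carefully.

```python
def normalize_orders(flat_rows):
    """Split flat rows into one customer table and one order table."""
    customers = {}
    orders = []
    for row in flat_rows:
        customer_id = row["customer_id"]
        # Customer details are stored once, keyed by ID (last row wins).
        customers[customer_id] = {
            "name": row["customer_name"],
            "email": row["customer_email"],
        }
        # Orders keep only a reference to the customer.
        orders.append({
            "order_id": row["order_id"],
            "customer_id": customer_id,
            "amount": row["amount"],
        })
    return customers, orders


# Hypothetical export, for illustration only.
flat_rows = [
    {"order_id": 1, "customer_id": "C1", "customer_name": "Jane",
     "customer_email": "jane@example.com", "amount": 25.0},
    {"order_id": 2, "customer_id": "C1", "customer_name": "Jane",
     "customer_email": "jane@example.com", "amount": 40.0},
]

customer_table, order_table = normalize_orders(flat_rows)
print(len(customer_table), len(order_table))  # 1 2
```

Because each customer now lives in one place, an updated email address only has to be changed once.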

Standardization is a way of taking disparate datasets and putting them on the same scale for more accurate analysis using averages and standard deviations. A good example of this is with seasonal businesses. Say you sell ice cream at an average of $420 per day with a standard deviation of $50, but in the summer you sell $520 per day. To standardize the summer value, you would calculate (520 - 420) / 50 = 2. If you sold $600 in a day, it would be (600 - 420) / 50 = 3.6. This turns large values into a standard format for more accurate data analysis.
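The calculation in the ice cream example is commonly called a z-score: subtract the average, then divide by the standard deviation. A minimal Python sketch using the figures above (the function name is just for illustration):

```python
def standardize(value: float, mean: float, std_dev: float) -> float:
    """Return how many standard deviations `value` sits from the mean."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (value - mean) / std_dev


print(standardize(520, 420, 50))  # 2.0
print(standardize(600, 420, 50))  # 3.6
```

Note that the subtraction has to happen before the division, which is why the brackets matter when writing the formula out by hand.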

4. How often should data be cleansed?

There isn’t really a definitive answer to how often data should be cleansed but in an ideal world, you’d like everything to happen in real time, certainly in retail.

Much of the time, you can get email finder or front-end tools to manage email verification, postcode validation, or de-duplication virtually in real time, and if data errors are costing your business a lot of money this could be the right thing to invest in. If your business is working with Big Data, poor data quality can have a large impact on campaign performance and ROI. However, some smaller businesses could still be working from spreadsheets, in which case an annual cleanse may be sufficient to keep records up to date.

The best solution here is to base the regularity of your data cleansing on how much not doing so might be costing you as a business.

5. Creating a data quality dashboard – cleansing as an asset

Arguably one of the most difficult parts of data cleansing is finding out whether there is a problem and, if so, what the problem is. A data quality dashboard goes beyond that: it is set up to show the cost of poor data management, allowing senior business leaders to see the impact of data governance on their goals.

Taking a practical example, let’s imagine you are getting several customer returns meaning the company couriers keep bringing packages back to be checked and redistributed. The root cause of the returns is that you don’t have an automated address finder on your business website so the customer is being asked to manually enter their details each time.

The returns you are getting back are due to human errors in typing addresses and could be resolved if you invested in an automated address finder but the Board is not willing to sign off the investment in such technology.

The data quality dashboard will break down the cost of the errors and show how they impact business income and efficiency. It will highlight the need for investment and give a cost/benefit view for better strategic decision-making. With this cost/benefit view available, data starts to be treated as an asset, in a ledger-type sense.
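The arithmetic behind such a cost/benefit view can be very simple. The Python sketch below works through the address finder example. Every figure in it is invented purely for illustration (number of returns, cost per return, price of the tool, and the share of returns it would prevent), so treat it as a template rather than a benchmark.

```python
def annual_cost_of_errors(returns_per_month: int, cost_per_return: float) -> float:
    """Yearly cost of handling returned packages."""
    return returns_per_month * cost_per_return * 12


def payback_months(investment, returns_per_month, cost_per_return, share_prevented):
    """Months until the saving from fewer returns covers the investment."""
    monthly_saving = returns_per_month * cost_per_return * share_prevented
    if monthly_saving <= 0:
        return None  # The tool would never pay for itself.
    return investment / monthly_saving


# Hypothetical figures, for illustration only.
RETURNS_PER_MONTH = 200    # packages returned because of address errors
COST_PER_RETURN = 15.0     # courier, checking, and redistribution, in dollars
INVESTMENT = 12_000.0      # assumed price of an automated address finder
SHARE_PREVENTED = 0.8      # assumed share of returns the tool would prevent

print("Annual cost of errors:", annual_cost_of_errors(RETURNS_PER_MONTH, COST_PER_RETURN))
print("Payback in months:", payback_months(
    INVESTMENT, RETURNS_PER_MONTH, COST_PER_RETURN, SHARE_PREVENTED))
```

Putting the cost of the errors next to the price of the fix is what turns a data quality problem into a decision the Board can sign off.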

[Image: data quality dashboard]

As Big Data becomes even bigger, a strong data cleansing strategy has the potential to drive business performance by creating an environment of trustworthy data, improving the efficiencies that impact bottom-line profit. As of July 2017, 90% of the world’s data had been created in the previous 2 years. Whilst businesses continue to explore new methods of working with this data, cleansing only becomes a greater challenge if it isn’t fully managed.

Sales and marketing teams run the risk of higher costs, poor customer experience, lower customer loyalty, and the potential for blacklisting if they don’t drive data-cleansing strategies in the business. An article in the Harvard Business Review has pointed to bad data costing the US over $3 trillion per year, with a major cause being sales teams working with erroneous prospect or customer information and service teams wasting time dealing with incorrect orders.

Without a cleansing strategy, it is no wonder that senior business leaders fail to trust what they see and rely on gut feelings rather than the data they are provided, as marketing or sales forecasts prove to be inaccurate.

Data cleansing puts the quality in so that you get quality back out.

This post was submitted by a TNS expert.
