Message to businesses: Now is the time to spring clean your data

If company data has inconsistencies or errors, then chances are results will be flawed.

View of London, from the Shard. Image by Tim Sandle

Data cleaning refers to the process of detecting and correcting corrupt or inaccurate records from a record set, table, or database. This is a practice intended to identify incomplete, incorrect, inaccurate or irrelevant parts of data.

Once identified, businesses should move to replace, modify, or delete the incorrect data. The problem is that too many companies put off the exercise due to the level of resources required.
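To make the replace, modify, or delete step concrete, here is a minimal Python sketch. The records, field names, alias table, and the `clean_record` helper are all invented for illustration; they are assumptions, not part of any tool or method described by Palmer or Tamr.

```python
from typing import Optional

# Hypothetical customer records; field names and values are invented for illustration.
records = [
    {"id": 1, "email": "ann@example.com", "country": "UK"},
    {"id": 2, "email": "bob[at]example.com", "country": "United Kingdom"},
    {"id": 3, "email": "", "country": "uk"},
    {"id": 4, "email": "dee@example.com", "country": "FR"},
]

# Assumed alias table used to bring inconsistent country values into one form.
COUNTRY_ALIASES = {"uk": "UK", "united kingdom": "UK", "fr": "FR"}


def clean_record(record: dict) -> Optional[dict]:
    """Return a corrected copy of the record, or None if it should be deleted."""
    # Replace: repair a known bad pattern in the email field.
    email = record["email"].strip().replace("[at]", "@")
    if "@" not in email:
        # Delete: the record is incomplete and cannot be repaired here.
        return None
    # Modify: normalize the country value, leaving unknown values untouched.
    country = COUNTRY_ALIASES.get(record["country"].strip().lower(), record["country"])
    return {"id": record["id"], "email": email, "country": country}


cleaned = [c for c in (clean_record(r) for r in records) if c is not None]
```

In this sketch one record is repaired, two have their country value standardized, and the record with no email is dropped. A real exercise would involve far more fields and checks, which is where the resource cost comes from.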

According to Andy Palmer, co-founder and CEO of Tamr, businesses need to place increased focus on the cleaning of data. Palmer explains to Digital Journal why now is the time for businesses to review their data storage approaches.

This is because, Palmer points out: “Data mastering and cleaning have always been challenging for many organizations. Now that organizations are trying to use their data as a strategic asset, they are finding that mastering their data is the most time-consuming and least-rewarding task for data scientists and data engineers.”

Data cleansing can be performed interactively with data wrangling tools, or as batch processing through scripting. However, do these conventional approaches work in the most effective way?

Palmer thinks the old approaches are unlikely to succeed and that data cleaning requires different tactics. He notes: “Traditional master data management with rules has become untenable. Because of the sheer volume and variety of data from different sources, by the time you figure out the thousands of rules needed, a new data source is introduced and invalidates the rules.”
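A small Python sketch shows the kind of brittleness Palmer describes. The date formats, sample values, and the `parse_date` helper are hypothetical and chosen only to illustrate the point; real master data management rules are far more numerous and varied.

```python
from datetime import datetime

# Hypothetical rule set: the date formats known from the sources seen so far.
KNOWN_FORMATS = ["%Y-%m-%d", "%d/%m/%Y"]


def parse_date(value: str):
    """Try each hand-written rule in turn; return None when no rule matches."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue
    return None


existing_sources = ["2024-04-24", "24/04/2024"]
new_source = ["April 24, 2024"]  # a format none of the rules anticipated

parsed_existing = [parse_date(v) for v in existing_sources]
parsed_new = [parse_date(v) for v in new_source]

# Values the current rules cannot handle, and which would need yet another rule.
unmatched = [v for v, d in zip(new_source, parsed_new) if d is None]
```

The two existing sources resolve to the same date, but the value from the new source falls through every rule. Each new source can require another rule, which is the maintenance burden Palmer argues becomes untenable at scale.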

As to the optimal method, Palmer says: “Human guided machine learning is the only way that today’s organizations can solve data mastering problems to deliver the comprehensive, high quality data necessary to answer important business questions in a timely, accurate, and scalable manner.”

This makes good business sense: if data has inconsistencies or errors, then chances are results will be flawed. The consequence is that when business decisions are based on those results, there is a significant chance of getting things wrong.

In terms of the advantages, Palmer summarizes these as: “Benefits include the ease of integrating multiple data sources, higher accuracy, and much less manual effort. Having clean data will ultimately increase overall productivity, allowing for the highest quality information in your decision-making.”

Written By

Dr. Tim Sandle is Digital Journal's Editor-at-Large for science news. Tim specializes in science, technology, environmental, business, and health journalism. He is also a practising microbiologist and an author, with an interest in history, politics and current affairs.
