Message to businesses: Now is the time to spring clean your data

If company data has inconsistencies or errors, then chances are results will be flawed.

View of London, from the Shard. Image by Tim Sandle

Data cleaning refers to the process of detecting and correcting corrupt or inaccurate records from a record set, table, or database. This is a practice intended to identify incomplete, incorrect, inaccurate or irrelevant parts of data.

Once identified, businesses should move to replace, modify, or delete the incorrect data. The problem is that too many companies put off the exercise due to the level of resources required.
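
To make the detect-then-correct cycle concrete, here is a minimal sketch in Python. The customer records, field names and validation rules are hypothetical and chosen only for illustration. The sketch shows the three corrective actions described above: modifying inconsistent values, replacing invalid ones with an explicit "missing" marker, and deleting empty or duplicate records.

```python
import re

# Hypothetical records: field names and values are illustrative assumptions.
records = [
    {"id": 1, "name": "Acme Ltd", "email": "sales@acme.example", "country": "UK"},
    {"id": 2, "name": "  acme ltd ", "email": "sales@acme.example", "country": "United Kingdom"},
    {"id": 3, "name": "Globex", "email": "not-an-email", "country": "US"},
    {"id": 4, "name": "", "email": "", "country": ""},
]

# Assumed validation rule and alias table; a real business would define its own.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
COUNTRY_ALIASES = {"uk": "UK", "united kingdom": "UK", "us": "US", "united states": "US"}


def clean(rows):
    """Return a cleaned copy of rows; the input list is not mutated."""
    cleaned, seen_emails = [], set()
    for row in rows:
        name = row["name"].strip()
        # Delete: a record with neither a name nor an email is unusable.
        if not name and not row["email"]:
            continue
        # Modify: normalise name casing and map country aliases to one code.
        country_key = row["country"].strip().lower()
        row = dict(row, name=name.title(),
                   country=COUNTRY_ALIASES.get(country_key, row["country"]))
        # Replace: an invalid email becomes None, flagging it as missing, not wrong.
        if not EMAIL_RE.match(row["email"]):
            row["email"] = None
        # Delete: drop later duplicates that share an email address.
        if row["email"] is not None:
            if row["email"] in seen_emails:
                continue
            seen_emails.add(row["email"])
        cleaned.append(row)
    return cleaned


result = clean(records)
print([r["id"] for r in result])  # [1, 3]
```

Record 2 is dropped as a duplicate of record 1, record 4 is deleted as empty, and record 3 is kept but with its malformed email flagged. Each of these rules had to be written by hand, which is the cost Palmer goes on to describe.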

According to Andy Palmer, co-founder and CEO of Tamr, businesses need to place increased focus on cleaning their data. Palmer explains to Digital Journal why now is the time for businesses to review their data storage approaches.

As Palmer points out: “Data mastering and cleaning have always been challenging for many organizations. Now that organizations are trying to use their data as a strategic asset, they are finding that mastering their data is the most time-consuming and least-rewarding task for data scientists and data engineers.”

Data cleansing can be performed interactively with data wrangling tools, or as batch processing through scripting. But are these conventional approaches the most effective?

Palmer thinks the old approaches are unlikely to succeed and that data cleaning requires different tactics. He notes: “Traditional master data management with rules has become untenable. Because of the sheer volume and variety of data from different sources, by the time you figure out the thousands of rules needed, a new data source is introduced and invalidates the rules.”
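
A toy example helps show why hand-written rules age so quickly. The Python sketch below is purely illustrative: the supplier names and the two normalisation rules are hypothetical, and it is not Tamr's method. The rules collapse company-name variants from the existing sources into a single master key, but a new source using other legal-entity suffixes immediately falls through.

```python
import re

# Hypothetical hand-written rules: strip common legal-entity suffixes.
RULES = [
    (re.compile(r"\b(incorporated|inc\.?)$"), ""),
    (re.compile(r"\b(limited|ltd\.?)$"), ""),
]


def master_key(raw):
    """Map a raw company name to a matching key using the fixed rule list."""
    key = raw.strip().lower()
    for pattern, replacement in RULES:
        key = pattern.sub(replacement, key)
    # Keep only letters and digits so punctuation and spacing do not matter.
    return re.sub(r"[^a-z0-9]", "", key)


# Names from the sources the rules were originally written for.
known = {master_key(n) for n in ["Acme Inc.", "ACME Incorporated", "Acme Ltd"]}

# A newly introduced source with conventions the rules never anticipated.
new_source = ["Acme, Inc", "Acme GmbH", "Acme Corp."]
unmatched = [n for n in new_source if master_key(n) not in known]
print(unmatched)  # ['Acme GmbH', 'Acme Corp.']
```

Each unmatched name needs yet another rule, and at the scale of thousands of rules and many sources this is the treadmill Palmer describes.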

As to the optimal method, Palmer says: “Human guided machine learning is the only way that today’s organizations can solve data mastering problems to deliver the comprehensive, high quality data necessary to answer important business questions in a timely, accurate, and scalable manner.”

This makes good business sense: if data has inconsistencies or errors, then chances are the results will be flawed. The consequence is that business decisions based on those insights carry a significant chance of being wrong.

In terms of the advantages, Palmer summarizes these as: “Benefits include the ease of integrating multiple data sources, higher accuracy, and much less manual effort. Having clean data will ultimately increase overall productivity, allowing for the highest quality information in your decision-making.”

Written By

Dr. Tim Sandle is Digital Journal's Editor-at-Large for science news. Tim specializes in science, technology, environmental, business, and health journalism. He is additionally a practising microbiologist; and an author. He is also interested in history, politics and current affairs.
