
Tech & Science

New software speeds big-data analysis 100-fold

Scientists from the Massachusetts Institute of Technology have created a new system that automatically produces code optimized for sparse data, leading to a massive speedup in big-data analysis.

This development is likely to appeal to many businesses and universities. Analyzing big data allows researchers and business professionals to make better-informed and faster decisions using data that was hitherto inaccessible or unusable. By deploying advanced analytics techniques such as text analytics, machine learning, predictive analytics, data mining, statistics, and natural language processing, businesses can discover patterns in untapped data sources and gain new insights.

Analyzing 'big' data is complex. Take an e-commerce site like Amazon as an example: suppose Amazon wished to match each of its customers against every product it sells, using a "1" for each product a given customer bought and a "0" otherwise. The outcome would be an incredibly large data set made up mostly of zeroes. This is what's called 'sparse data.'
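The customer-against-product table described above can be sketched in a few lines. The customers, products, and purchases here are illustrative assumptions, not data from the article:

```python
# Sketch of a tiny customer-by-product purchase matrix: a "1" where a
# customer bought a product, a "0" otherwise. At Amazon's scale this
# grid would be vast and almost entirely zeros, i.e. sparse data.
customers = ["alice", "bob", "carol"]
products = ["book", "lamp", "mug", "pen"]

# Dense representation: one 0/1 entry for every (customer, product) pair.
dense = [[0, 1, 0, 0],   # alice bought the lamp
         [0, 0, 0, 1],   # bob bought the pen
         [1, 0, 0, 0]]   # carol bought the book

total_cells = len(customers) * len(products)   # every pair gets a cell
nonzero = sum(sum(row) for row in dense)       # actual purchases
print(total_cells, nonzero)
```

Even in this toy example, only 3 of 12 cells are nonzero; in a real retailer's data the zero fraction is vastly higher.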

When such data is analysed, algorithms perform long series of additions and multiplications by zero. These steps are wasted computation: they use up a lot of computing power and take a relatively long time.
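The waste is easy to see in a dot product, the building block of many analytic algorithms. This is a minimal sketch, not code from the MIT system:

```python
# Why zeros waste work: a dense dot product multiplies every entry,
# while a sparse one touches only the nonzeros.
row = [0, 1, 0, 0, 0, 3, 0, 0]   # one mostly-zero row of a data set
vec = [2, 5, 1, 4, 7, 1, 9, 6]

# Dense: 8 multiplications, most of them by zero.
dense_ops = len(row)
dense_result = sum(a * b for a, b in zip(row, vec))

# Sparse: keep only (index, value) pairs for nonzeros; 2 multiplications.
sparse_row = [(i, v) for i, v in enumerate(row) if v != 0]
sparse_ops = len(sparse_row)
sparse_result = sum(v * vec[i] for i, v in sparse_row)

print(dense_result, sparse_result, dense_ops, sparse_ops)
```

Both versions give the same answer, but the sparse one does a quarter of the multiplications here; at big-data scale the gap becomes enormous.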

The latest research from MIT is based around a new system that automatically produces code optimized for sparse data. This improved code leads to a 100-fold speedup over existing, non-optimized software packages.

The system to do this is termed Taco (which stands for tensor algebra compiler). A tensor is a higher-dimensional analogue of a matrix (a two-dimensional grid of data). The efficiency comes from the mathematical manipulation of tensors (tensor algebra), and for this to run efficiently, each sequence of tensor operations requires its own "kernel" (a computational template).
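To give a flavour of what a "kernel" for one specific tensor operation might look like, here is a hand-written sketch of sparse matrix-vector multiplication. This is an illustration of the concept only, not code generated by Taco:

```python
# A specialized "kernel" for one tensor operation: y = A @ x, where the
# matrix A is stored as nonzeros only (per row, a list of
# (column, value) pairs). Zero entries are never visited.
def spmv_kernel(rows, x):
    """Sparse matrix-vector multiply over nonzero entries only."""
    return [sum(v * x[j] for j, v in row) for row in rows]

A = [[(1, 2.0)],            # row 0: a single nonzero at column 1
     [],                    # row 1: all zeros, no work at all
     [(0, 1.0), (2, 3.0)]]  # row 2: nonzeros at columns 0 and 2
x = [1.0, 10.0, 100.0]
print(spmv_kernel(A, x))
```

A different operation (say, adding two sparse tensors, or a three-way tensor product) would need a differently structured kernel, which is why writing them by hand is laborious and why automatic generation pays off.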

Taco generates these kernels automatically. To run Taco, a programmer specifies the size of a tensor and whether it is full or sparse, together with the location of the data file. When running, Taco uses an efficient indexing scheme to store only the nonzero values of sparse tensors.
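One widely used indexing scheme of this kind is compressed sparse row (CSR) storage. The article does not say which scheme Taco uses; this sketch just illustrates the idea of storing only the nonzeros plus small index arrays:

```python
# Compressed sparse row (CSR) storage: keep the nonzero values, their
# column indices, and per-row offsets; drop all the zeros.
dense = [[0, 2, 0],
         [0, 0, 0],
         [1, 0, 3]]

values, col_index, row_ptr = [], [], [0]
for row in dense:
    for j, v in enumerate(row):
        if v != 0:
            values.append(v)   # nonzero value
            col_index.append(j)  # which column it sits in
    row_ptr.append(len(values))  # where each row's nonzeros end

print(values, col_index, row_ptr)
```

Here a 9-cell matrix is reduced to 3 stored values plus short index arrays; the sparser the data, the greater the saving.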

With zero entries included, a tensor from Amazon would typically take up 107 exabytes of data. Under the Taco compression scheme it takes up only 13 gigabytes, which any computer can analyse far faster.
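The scale of such savings follows from simple arithmetic. The figures below are illustrative assumptions for a back-of-envelope estimate, not Amazon's actual numbers:

```python
# Back-of-envelope: dense storage pays for every cell, sparse storage
# only for the nonzeros (plus index overhead). All figures assumed.
n_customers = 1_000_000
n_products = 100_000
bytes_per_entry = 1

# Dense: one byte for every (customer, product) cell, zeros included.
dense_bytes = n_customers * n_products * bytes_per_entry

# Sparse: assume 50 million purchases (nonzeros), each stored as a
# value plus roughly 8 bytes of index overhead.
purchases = 50_000_000
sparse_bytes = purchases * (bytes_per_entry + 8)

print(dense_bytes // sparse_bytes)  # how many times larger dense is
```

Under these assumptions the dense form is a couple of hundred times larger; with real retail sparsity (far fewer purchases per customer-product pair) the ratio grows much further, which is how exabytes can shrink to gigabytes.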

Written By

Dr. Tim Sandle is Digital Journal's Editor-at-Large for science news. Tim specializes in science, technology, environmental, business, and health journalism. He is additionally a practising microbiologist and an author, and is also interested in history, politics and current affairs.
