
New software speeds big-data analysis 100-fold

By Tim Sandle, Nov 3, 2017, in Technology
A new system for performing 'tensor algebra' offers a 100-fold speedup over previous software packages used for big-data analysis.
Scientists from the Massachusetts Institute of Technology have created a system that automatically produces code optimized for sparse data, leading to a massive speedup in big-data analysis.
This development is likely to appeal to many businesses and universities. Analyzing big data allows researchers and business professionals to make faster, better-informed decisions using data that was hitherto inaccessible or unusable. By deploying advanced analytics techniques such as text analytics, machine learning, predictive analytics, data mining, statistics, and natural language processing, businesses can discover patterns in untapped data sources and gain new insights.
Analyzing 'big' data is complex. Take an e-commerce site like Amazon as an example: suppose Amazon wished to match each of its customers against every product it sells, using a "1" for each product a given customer bought and a "0" otherwise. The outcome would be an enormous data set made up mostly of zeroes. This is what's called 'sparse data.'
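
To make the sparsity concrete, here is a minimal Python sketch of such a customer-by-product matrix (illustrative only, with made-up names and data; this is not Taco code):

    # A tiny customer-by-product purchase matrix. Real versions have
    # millions of rows and columns, so almost every entry is zero --
    # this is what makes the data "sparse".
    customers = ["alice", "bob", "carol"]
    products = ["book", "lamp", "mug", "pen"]

    # 1 = the customer bought the product, 0 = they did not.
    purchases = [
        [1, 0, 0, 0],  # alice bought only the book
        [0, 0, 1, 0],  # bob bought only the mug
        [0, 0, 0, 0],  # carol bought nothing
    ]

    total = len(customers) * len(products)
    nonzero = sum(v for row in purchases for v in row)
    print(f"{nonzero} of {total} entries are nonzero")  # prints: 2 of 12 entries are nonzero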
When such data is analyzed, analytic algorithms end up performing vast numbers of additions and multiplications by zero. These steps are wasted computation: they consume a great deal of computing power and take a relatively long time.
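
The waste is easy to see in code. Below is a sketch of an ordinary dense matrix-vector multiplication in Python, a stand-in for the kinds of operations such algorithms perform (not anything from the MIT system): every entry is visited, so on sparse data most iterations multiply by zero.

    # Dense matrix-vector multiply: visits every entry, so when the
    # matrix is mostly zeros, most of the work contributes nothing.
    def dense_matvec(matrix, vector):
        result = [0.0] * len(matrix)
        for i, row in enumerate(matrix):
            for j, value in enumerate(row):
                result[i] += value * vector[j]  # wasted work when value == 0
        return result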
The latest MIT research centers on this new system, which automatically produces code optimized for sparse data. The generated code delivers a 100-fold speedup over existing, non-optimized software packages.
The system is called Taco, which stands for "tensor algebra compiler." A tensor is a higher-dimensional analogue of a matrix (a two-dimensional grid of numbers). The efficiency gains come from the mathematical manipulation of tensors, known as tensor algebra. For this to run efficiently, each sequence of tensor operations has traditionally required its own "kernel," a computational template.
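
To illustrate why each expression needs its own kernel, here is a hedged Python sketch (illustrative only, not code Taco generates): the loop structure depends on the whole expression, and fusing a compound expression into a single kernel avoids materializing intermediate results.

    # Kernel for C = A + B (elementwise sum of two matrices).
    def add_kernel(A, B):
        return [[a + b for a, b in zip(row_a, row_b)]
                for row_a, row_b in zip(A, B)]

    # Kernel for y = (A + B) @ x: a single fused loop nest, rather
    # than calling add_kernel and then a separate matrix-vector
    # kernel. Fusing avoids building the intermediate matrix A + B.
    def fused_kernel(A, B, x):
        return [sum((a + b) * xj for a, b, xj in zip(row_a, row_b, x))
                for row_a, row_b in zip(A, B)]

Writing such kernels by hand for every combination of operations quickly becomes impractical, which is the gap Taco fills by generating them automatically.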
Tensor operations of this kind underpin much of big-data analytics, including machine learning. To run Taco, a programmer specifies the size of a tensor, whether it is full or sparse, and the location of the file holding its data. Taco then uses an efficient indexing scheme to store only the nonzero values of sparse tensors.
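
Taco's actual format hierarchy is more elaborate, but the general idea of indexing only the nonzeros can be sketched in a few lines of Python (a compressed-sparse-row style layout, shown here as an assumed illustration rather than Taco's real scheme):

    # Store only nonzero values, with companion arrays recording
    # where each value sits (its column, and where each row starts).
    def to_csr(matrix):
        values, col_index, row_start = [], [], [0]
        for row in matrix:
            for j, v in enumerate(row):
                if v != 0:
                    values.append(v)
                    col_index.append(j)
            row_start.append(len(values))
        return values, col_index, row_start

    # A matrix-vector multiply over this layout touches only the
    # stored nonzeros, skipping every multiply-by-zero that the
    # dense loop above would perform.
    def csr_matvec(values, col_index, row_start, x):
        y = [0.0] * (len(row_start) - 1)
        for i in range(len(y)):
            for k in range(row_start[i], row_start[i + 1]):
                y[i] += values[k] * x[col_index[k]]
        return y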
With the zero entries included, the Amazon tensor described above would take up roughly 107 exabytes of storage. Taco's compression scheme shrinks that to just 13 gigabytes, a volume that is far faster for any computer to analyze.
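
A quick back-of-the-envelope calculation, using only the figures quoted above, shows the scale of that reduction:

    # Rough compression factor implied by the quoted figures.
    exabyte = 10**18   # bytes
    gigabyte = 10**9   # bytes
    dense = 107 * exabyte       # tensor with zeros stored explicitly
    compressed = 13 * gigabyte  # after Taco's compression scheme
    print(f"reduction factor: ~{dense / compressed:.1e}")  # ~8.2e+09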