Using big data to predict the future

Posted Dec 2, 2018 by Tim Sandle
Advances in machine learning and big data analytics are helping researchers, and ultimately companies, to make predictions about future trends by analyzing patterns. A new application looks at medicine.
File photo: Open Data Day, Washington D.C., 2015.
Joshua Tauberer (CC BY 2.0)
Researchers from the University of Córdoba have been exploring how large volumes of data can be organized, analyzed and cross-referenced to predict certain patterns. This forms part of the process commonly referred to as 'big data analytics'. The review focused on predicting responses in specific settings, such as medical treatments, operational improvements for smart buildings, and even the behavior of the Sun. Each prediction process is based on the input of key variables.
In assessing how effective big data analysis is, the researchers set out to improve models that are intended to predict several variables simultaneously from the same set of input variables. The aim was to find ways to reduce the amount of data necessary for the forecast to be accurate, thereby speeding up the data analysis process.
Central to the optimization process is filtering out background ‘noise’ and eliminating those variables that are not significant to the purpose of the analysis.
The researchers developed a new technique that tells the person responsible for the analysis which examples are required, so that any forecast made is not only reliable but tuned to deliver the most accurate result.
The technique is a type of multi-output regression model. Such models are generally grouped into problem transformation methods and algorithm adaptation methods, and they require estimating multiple sets of parameters, one for each output variable.
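To make the idea concrete, here is a minimal sketch (not the paper's method) of multi-output regression: with ordinary least squares, fitting several targets from the same inputs amounts to estimating one column of coefficients per output variable. All names and data below are invented for illustration.

```python
# Minimal multi-output regression sketch using NumPy only.
# Each column of the coefficient matrix corresponds to one output variable.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                      # 200 examples, 5 input variables
true_W = rng.normal(size=(5, 3))                   # 3 output variables to predict
Y = X @ true_W + 0.1 * rng.normal(size=(200, 3))   # targets with a little noise

# Least squares solves for all three outputs at once:
# W_hat has shape (5, 3) -- a separate parameter set for each target.
W_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(W_hat.shape)  # (5, 3)
```

With enough examples, the estimated coefficients recover the true ones closely; the practical question the researchers address is how few examples can be kept while preserving that accuracy.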
As an illustration, the research group looked at a method that predicts several parameters connected to soil quality. This was based on a set of data variables such as the types of crops planted, tillage (preparation of the land) and the use and types of pesticides. By applying the new model, the amount of input data required to deliver a prediction about crop growth was reduced.
In all, eighteen different databases were examined, and by applying the new approach the researchers were able to reduce the amount of information by 80 percent without affecting the predictive performance. This meant that only a fifth of the original data was needed, and far faster responses could be delivered.
Commenting on the study, lead researcher Dr. Sebastian Ventura said: “When you are dealing with a large volume of data, there are two solutions. You either increase computer performance, which is very expensive, or you reduce the quantity of information needed for the process to be done properly.”
The research has been published in the journal Integrated Computer-Aided Engineering. The research paper is called “An ensemble-based method for the selection of instances in the multi-target regression problem.”