Data continues to accumulate at an exponential rate, presenting challenges for businesses of all types, and for their Information Technology departments in particular. AI itself has driven a large increase in the data storage that businesses require, with data volumes expected to grow by 122 percent by 2026. Consequently, storing, managing, and tagging data to ensure it is of sufficient quality for use in AI models is becoming ever more challenging.
This is reflected in the Hitachi Vantara State of Data Infrastructure Survey, which polled 1,200 IT leaders and C-suite executives from large organizations across 15 markets. The report found that every company in the study has adopted AI in some capacity to help manage the data surge.
Of these companies, 76 percent have progressed beyond limited adoption of AI use cases, and 37 percent say AI is already critical to their business. In terms of how the technology has been absorbed, 70 percent of firms are implementing AI and then testing and improving as they go, while only 5 percent say they use sandboxes to test AI experiments before implementing them.
As to why so many firms are fast-tracking AI implementation as a top priority, the answer appears to be competitive pressure: 85 percent of executives expressed concern about losing competitive ground without rapid AI adoption.
As for the form this AI takes, 61 percent of large organizations are focused on developing general, larger LLMs rather than smaller specialized models. This is despite large-scale models being far hungrier to train, consuming up to 100 times more power than smaller ones.
The higher-risk strategy of implementing in ‘real time’, without offline evaluation, is surprising. There is concern that this approach risks poisoning AI models, eroding users’ trust in AI as a tool, and opening the door to new security vulnerabilities. This method of adoption has also revealed gaps in data governance and flagged problems relating to sustainability.
If these problems can be adequately addressed, data infrastructure and data management can play an important role in improving overall data quality and driving positive AI outcomes.
The survey notes a disconnect with proper data management: only 38 percent of respondents say data is available when they need it the majority of the time. Even fewer (33 percent) say the majority of their AI models’ outputs are accurate, and four in five (80 percent) say the majority of their data is unstructured, which poses greater risk as data volumes rise.
Despite this, relatively few companies are taking steps to improve their data quality: nearly half (47 percent) do not tag data for visualization, only 37 percent are enhancing training data quality to explain AI outputs, and more than a quarter (26 percent) fail to review datasets for quality.