A new query language for scanning and interrogating petabyte-scale unstructured data lakes has been developed: the Datadobi Query Language (DQL). Such technology is needed because, according to IDC, there will be about 175 zettabytes of data worldwide by 2025, up from 64.2 zettabytes in 2020.
This vast quantity of data tallies with the finding that 95 percent of businesses cite managing unstructured data as a problem for their business. One reason is digital transformation, which has increased the amount of unstructured data within networks.
Unstructured data refers to information that is not arranged according to a pre-set data model or schema and therefore cannot be stored in a traditional relational database. Examples include video, audio, and image files, as well as log files, sensor readings, and social media posts.
To address this concern, the query language can scan large file systems containing billions of files. The aim is to aid businesses in drawing meaningful inferences from unstructured data. Each of these scans produces huge lists of file paths and their metadata in a format that permits performant and storage-efficient handling, analysis, and comparison of the files to enhance unstructured data management.
Dissecting the composition of data sets containing billions of files requires data reduction and aggregation. Datadobi initially developed DQL to enhance its file system assessment service, using it internally to optimize and organize data lakes.
This led to the wider roll-out of DQL, enabling others to manage their multi-petabyte data lakes. Use cases include identifying cold data sets (data that is infrequently accessed); old data sets; data sets owned by a specific user or group; and shares, exports, or directory trees that are homogeneous (by age, access pattern, owner, or file type) and can therefore be handled as a single data set.
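The grouping step above, finding directory trees whose files are uniformly cold or uniformly owned, can be illustrated with a small Python sketch. The `classify` helper and its record format are hypothetical, chosen for clarity; DQL's actual query syntax and engine are not public in this level of detail.

```python
import time
from collections import defaultdict

SECONDS_PER_DAY = 86400


def classify(records, cold_days=365, now=None):
    """Bucket file records by top-level directory and flag buckets
    whose files were all last accessed more than `cold_days` ago.

    `records` is an iterable of (path, atime, uid) tuples, e.g. from
    a prior metadata scan. Hypothetical helper, not DQL itself.
    """
    now = now or time.time()
    buckets = defaultdict(list)
    for path, atime, uid in records:
        # Group by the first path component under the root.
        top = path.split("/", 2)[1] if path.startswith("/") else path.split("/", 1)[0]
        buckets[top].append((atime, uid))
    summary = {}
    for top, entries in buckets.items():
        all_cold = all(now - atime > cold_days * SECONDS_PER_DAY
                       for atime, _uid in entries)
        owners = {uid for _atime, uid in entries}
        summary[top] = {"cold": all_cold, "owners": owners,
                        "file_count": len(entries)}
    return summary
```

A tree flagged as entirely cold and single-owner is exactly the kind of homogeneous subset an administrator can then archive or migrate as one unit.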
There is also the capability to move data, analyze the file metadata of large data lakes, and simplify how administrators examine their data and identify logical subsets of it.
Given that the volume of data is only expected to grow over the next few years, IT administrators need to develop or purchase data management solutions that can transform data into digestible material.
