
New database query language developed for super-fast searching

Unstructured data contains irregularities and ambiguities that make it difficult to interpret using traditional programs.

Many algorithms designed for criminal justice were meant to eliminate bias. — Image: © AFP

A new query language for scanning and interrogating petabyte-scale unstructured data lakes has been developed: the Datadobi Query Language (DQL). Such technology is needed because, according to IDC, there will be about 175 zettabytes of data worldwide by 2025, up from 64.2 zettabytes in 2020.

This vast quantity of data tallies with the finding that 95 percent of businesses cite managing unstructured data as a problem. One reason is digital transformation, which has increased the amount of unstructured data within networks.

Unstructured data refers to information that is not arranged according to a pre-set data model or schema, and therefore cannot be stored in a traditional relational database. Examples of unstructured data include video, audio, and image files, as well as log files, sensor data, and social media posts.

To address this concern, the query language can scan large file systems containing billions of files. The aim is to aid businesses in drawing meaningful inferences from unstructured data. Each of these scans produces huge lists of file paths and their metadata in a format that permits performant and storage-efficient handling, analysis, and comparison of the files to enhance unstructured data management.
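To make that concrete, here is a minimal Python sketch of the kind of scan such tooling performs. This is not Datadobi's actual implementation; the CSV layout and field names are assumptions for illustration. It walks a file tree and emits one compact metadata record per file:

```python
# Hypothetical sketch: walk a file tree and write one metadata record
# per file (path, size, timestamps, owner) for later analysis.
import csv
import os
import sys

def scan_tree(root: str, out_path: str) -> None:
    """Walk `root` and record path, size, mtime, atime, and uid per file."""
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["path", "size_bytes", "mtime", "atime", "uid"])
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    st = os.stat(path, follow_symlinks=False)
                except OSError:
                    continue  # file vanished or is unreadable; skip it
                writer.writerow([path, st.st_size, int(st.st_mtime),
                                 int(st.st_atime), st.st_uid])

if __name__ == "__main__":
    scan_tree(sys.argv[1], "scan.csv")
```

A production scanner would parallelize the walk and use a far denser on-disk format than CSV, but the shape of the output, one row of path plus metadata per file, is the same.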

Dissecting the composition of the data requires reduction and aggregation across sets of billions of files. Datadobi initially developed DQL to enhance its file system assessment service and to optimize and organize data lakes internally.
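The reduction step can be sketched the same way. The example below assumes the hypothetical scan.csv produced above and collapses per-file records into a small per-owner, per-extension summary a person can actually inspect:

```python
# Hypothetical sketch: aggregate per-file scan records into a compact
# summary keyed by (owner uid, file extension).
import csv
import os
from collections import defaultdict

def summarize(scan_csv: str):
    # (uid, extension) -> [file_count, total_bytes]
    totals = defaultdict(lambda: [0, 0])
    with open(scan_csv, newline="") as f:
        for row in csv.DictReader(f):
            ext = os.path.splitext(row["path"])[1].lower() or "<none>"
            key = (row["uid"], ext)
            totals[key][0] += 1
            totals[key][1] += int(row["size_bytes"])
    return totals

if __name__ == "__main__":
    for (uid, ext), (count, size) in sorted(summarize("scan.csv").items()):
        print(f"uid={uid} ext={ext}: {count} files, {size} bytes")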

This led to the wider rollout of DQL to enable others to handle their multi-petabyte data lakes. Use cases include identifying cold data sets (data that is infrequently accessed); old data sets; data sets owned by a specific user or group; and shares, exports, or directory trees that are homogeneous (by coldness, age, owner, or file types) and can therefore be handled as one data set.
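Two of those queries, finding cold files and finding directory trees that are uniformly cold, can be sketched against the same hypothetical scan.csv. The two-year threshold below is an assumption, not part of DQL:

```python
# Hypothetical sketch: flag files not read in ~2 years as "cold", and
# report directories whose files are all cold, so each such tree can be
# treated as one data set.
import csv
import os
import time
from collections import defaultdict

COLD_AFTER = 2 * 365 * 24 * 3600  # ~2 years in seconds; assumed threshold

def cold_directories(scan_csv: str):
    now = time.time()
    # directory -> [total_files, cold_files]
    per_dir = defaultdict(lambda: [0, 0])
    with open(scan_csv, newline="") as f:
        for row in csv.DictReader(f):
            d = os.path.dirname(row["path"])
            per_dir[d][0] += 1
            if now - int(row["atime"]) > COLD_AFTER:
                per_dir[d][1] += 1
    # A directory is homogeneous (all cold) when every file in it is cold.
    return [d for d, (total, cold) in per_dir.items() if total and total == cold]

if __name__ == "__main__":
    for d in cold_directories("scan.csv"):
        print("cold directory:", d)
```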

There is also the capability to move data, analyze the file metadata of large data lakes, and simplify how administrators view their data and identify logical subsets of it.

Given that the volume of data is only expected to grow over the next few years, IT administrators need to develop or purchase data management solutions that can transform data into digestible material.

Written By

Dr. Tim Sandle is Digital Journal's Editor-at-Large for science news. Tim specializes in science, technology, environmental, and health journalism. He is additionally a practising microbiologist and an author. He is also interested in history, politics and current affairs.
