Globally, organizations spent $39.2 billion on cloud databases in 2021. Yet, despite this growth, most organizations still do not hold a majority of their enterprise data in a modern cloud data warehouse.
Monte Carlo’s survey of more than 200 data leaders at the 2022 Snowflake Summit revealed that less than 25 percent of business data lives within a cloud system. For advocates of cloud solutions, this means that organizations are not realizing the flexibility, scalability, and security benefits that cloud computing can provide.
Making that change means challenging those who prefer on-premises data storage. The risks of staying put are real: at Equifax, buggy code on a legacy server led to the miscalculation of credit scores for millions of customers.
One cloud advocate is Lior Gavish, co-founder and CTO of Monte Carlo. Gavish considers the main trends that are likely to impact the broader data engineering and analytics space in 2023.
Data contracts
According to Gavish, data contracts are designed to prevent the data quality issues that arise when upstream, data-generating services change unexpectedly. As he notes: “Data contracts are very much en vogue. Why? First, because software engineers unknowingly create ramifications via updates that affect the downstream data pipeline; and second, because the rise of data modelling gives data engineers the option to deliver data into the warehouse pre-modelled. 2023 will see broader data contract adoption as practitioners attempt to apply these frameworks.”
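In practice, a contract can be as simple as an agreed, versioned schema that the producing service validates against before emitting data. The sketch below shows one minimal way to express this in Python with Pydantic; the event name, fields, and publish step are illustrative assumptions, not a description of any particular vendor's implementation.

```python
from datetime import datetime
from pydantic import BaseModel, ValidationError


class OrderCreated(BaseModel):
    """Contract for a hypothetical order_created event.

    Downstream pipelines depend on these exact fields and types, so any
    change should go through a contract review rather than a silent
    service update.
    """
    order_id: str
    customer_id: str
    amount_usd: float
    created_at: datetime


def publish(message: str) -> None:
    # Stand-in for a real message bus or CDC feed.
    print(message)


def emit_order_created(payload: dict) -> None:
    try:
        # Validate at the producer, so schema drift is caught before the
        # data ever reaches the warehouse.
        event = OrderCreated(**payload)
    except ValidationError as err:
        raise RuntimeError(f"order_created violates its contract: {err}") from err
    publish(event.model_dump_json())  # Pydantic v2 serialization
```

Run against a payload missing customer_id, the validation fails at the service boundary instead of surfacing days later as a broken dashboard.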
Data monetization
This area is driven by economic pressures. Gavish notes: “In lean times, data teams have more pressure than ever to align their efforts with the bottom line. Data monetization is a mechanism for data teams to directly tie themselves to revenue. It also allows for the addition of data insights and reporting to products, a differentiator within an increasingly competitive marketplace.”
Infrastructure as code
A newer service approach is centered on infrastructure. Gavish states: “Modern data operations require hyper-scalable cloud infrastructures, but constantly provisioning and maintaining these services can be tedious and time-consuming. Infrastructure as code allows data engineers to create more seamless data pipeline infrastructure that is easier to provision, deprovision, and modify – critical when budgets are tight and headcount is limited.”
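As a concrete illustration, infrastructure-as-code tools such as Pulumi let engineers declare cloud resources in an ordinary programming language. The sketch below assumes the pulumi and pulumi_aws Python packages plus configured AWS credentials; the bucket and tag names are hypothetical.

```python
import pulumi
import pulumi_aws as aws

# Declare a storage bucket for raw pipeline data. Running `pulumi up`
# provisions it; `pulumi destroy` deprovisions it just as repeatably,
# which matters when budgets are tight.
raw_bucket = aws.s3.Bucket(
    "raw-events",
    tags={"team": "data-eng", "env": "dev"},
)

# Expose the generated bucket name so downstream pipeline configs can
# reference it instead of hard-coding a value.
pulumi.export("raw_bucket_name", raw_bucket.id)
```

Because the definition lives in version control, modifying the pipeline's infrastructure becomes a code review rather than a manual console session.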
Data reliability engineering
Data needs to be reliable to meet business needs and to keep clients onside. Gavish considers: “All too often, bad data is first discovered by stakeholders downstream in dashboards and reports instead of in the pipeline, or even earlier. Since data is rarely ever in its ideal, perfectly reliable state, data teams are hiring data reliability engineers to put the tooling (like data observability platforms and data testing) and processes (like CI/CD) in place to ensure that when issues happen, they’re quickly resolved and their impact is conveyed to those who need to know before your CFO finds out.”
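The data tests Gavish mentions are often simple checks run on a schedule or in CI/CD against warehouse tables. Below is a minimal sketch in Python, using the standard library's sqlite3 so it is self-contained; the orders table, its ISO-8601 UTC created_at timestamps, and the alert thresholds are all illustrative assumptions.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

MAX_STALENESS = timedelta(hours=6)  # alert if no new rows in 6 hours
MAX_NULL_RATE = 0.01                # alert if >1% of customer_ids are NULL


def check_orders_table(conn: sqlite3.Connection) -> list[str]:
    """Return a list of failure messages; an empty list means the table passed."""
    failures = []

    # Freshness: has the pipeline landed data recently? Assumes created_at
    # is stored as an ISO-8601 string with a UTC offset.
    (latest,) = conn.execute("SELECT MAX(created_at) FROM orders").fetchone()
    if latest is None or (
        datetime.now(timezone.utc) - datetime.fromisoformat(latest) > MAX_STALENESS
    ):
        failures.append(f"orders is stale: latest row at {latest}")

    # Completeness: are key columns populated?
    total, nulls = conn.execute(
        "SELECT COUNT(*),"
        " SUM(CASE WHEN customer_id IS NULL THEN 1 ELSE 0 END) FROM orders"
    ).fetchone()
    if total and (nulls or 0) / total > MAX_NULL_RATE:
        failures.append(f"customer_id null rate {nulls}/{total} exceeds threshold")

    return failures
```

Wired into a scheduler or CI/CD pipeline, a failing check can page the data team before a stale or incomplete table ever reaches a report.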
