"Our database requires ZERO maintenance, it just works!" If anyone is telling you this, they don't know what they're talking about. Discover the opportunities waiting for you when you see past the "zero management" hoax.
Hi, I’m Jared Hillam.
Just last week I attended a cloud database conference, and the message from the host was that their databases required zero management. “It just works,” he said. OK, anybody telling you there is zero management doesn't know what they’re talking about. Is there less management than 8 years ago? Yes, but there are new opportunities to take advantage of, and you don’t get there with zero management. So in this video I want to discuss how we design to take advantage of these opportunities.
First we’ve got to get the data into the cloud database. There are a number of vendors that handle that challenge quite gracefully. But once the data lands in the cloud, it comes in a form that is… well, a mess. This is just what happens when triggers fire in real time. So you can have late-arriving data, updates that aren’t really updates, inefficient sorting, etc. In other words, raw data from source replication is gonna be quite raw. This isn’t a product problem, it’s just the result of live replication. So to address that, data engineers create a Landing Zone that shields users from that messiness. Intricity has built up a deep portfolio of scripts that handle the data in the Landing Zone. This is an area that is not for human hands. Once the data in the Landing Zone is processed, it automatically drops into the data lake. Now I’m not going to get into new buzzwords for the data lake. The basic purpose of the data lake is to create an application-pristine version of the replicated sources, and to act as the foundation for the entire remaining architecture. Data Science, Aggregated Analytics, Operational Reporting, Lab Environments, and more all pull from the Data Lake.
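To make the Landing Zone idea concrete, here is a minimal sketch of the kind of cleanup it performs. This is not Intricity's actual scripting; the `ChangeRecord` shape, the `version` sequence number, and both function names are assumptions made up for illustration. It keeps only the newest record per key (so late-arriving records can't overwrite newer ones) and then drops "updates" that don't change anything the lake already holds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeRecord:
    key: str        # primary key of the source row
    version: int    # hypothetical sequence number assigned by the source
    payload: tuple  # the row's column values


def clean_landing_zone(records):
    """Return the newest record per key, regardless of arrival order."""
    latest = {}
    for rec in records:
        current = latest.get(rec.key)
        # A late-arriving record has a lower version and is ignored.
        if current is None or rec.version > current.version:
            latest[rec.key] = rec
    return sorted(latest.values(), key=lambda r: r.key)


def changes_for_lake(cleaned, lake):
    """Keep only records whose payload differs from the lake's current row."""
    return [rec for rec in cleaned if lake.get(rec.key) != rec.payload]


# Raw feed as it might land: out of order, with a no-op update.
raw = [
    ChangeRecord("order-1", 2, ("shipped",)),
    ChangeRecord("order-1", 1, ("new",)),   # arrives late
    ChangeRecord("order-2", 1, ("new",)),
    ChangeRecord("order-2", 2, ("new",)),   # an update that changes nothing
]
lake = {"order-2": ("new",)}  # what the lake already holds, keyed by row key

cleaned = clean_landing_zone(raw)
to_apply = changes_for_lake(cleaned, lake)
```

Here only `order-1` in its shipped state would be written to the lake; the late record and the no-op update never reach anyone downstream.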
Now remember that the application data is represented as it exists in the application. So its tables and content are in their transaction processing form. For businesses to be able to conduct Aggregated Analytics, the data needs to be represented in a form that is logical to the business. So for example, conformed Dates, conformed Stores, conformed Accounts, conformed Employees, conformed Orders, etc. This conformity of the dimensions allows free-form analysis of the business's metrics. But to generate that conformed analysis, the last thing you want to do is wire up a BI tool directly to the clunky transaction data. Doing so tightly couples you to that single BI vendor, making it impossible for other tooling to investigate the data.
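As a rough illustration of what "conformed" means, the sketch below maps store identifiers from two source applications onto one shared store key, so a single metric can be rolled up across both. The source names, store codes, and the idea of matching on store name are all assumptions for the example; real conformance logic is usually far more involved.

```python
# Hypothetical store identifiers as they appear in two source applications.
POS_STORES = {"S-001": "Downtown", "S-002": "Airport"}
ECOM_STORES = {"dt": "Downtown", "ap": "Airport"}


def build_conformed_stores(*sources):
    """Assign one surrogate key per distinct store name across all sources."""
    store_keys = {}  # store name -> conformed surrogate key
    lookup = {}      # (source name, native id) -> conformed surrogate key
    for source_name, stores in sources:
        for native_id, store_name in stores.items():
            # Reuse the key if this store was already seen in another source.
            key = store_keys.setdefault(store_name, len(store_keys) + 1)
            lookup[(source_name, native_id)] = key
    return store_keys, lookup


store_keys, lookup = build_conformed_stores(
    ("pos", POS_STORES), ("ecom", ECOM_STORES)
)

# Sales rows still carry each application's native store id.
sales = [("pos", "S-001", 120.0), ("ecom", "dt", 80.0), ("ecom", "ap", 50.0)]

revenue_by_store = {}
for source, native_id, amount in sales:
    key = lookup[(source, native_id)]
    revenue_by_store[key] = revenue_by_store.get(key, 0.0) + amount
```

Because the mapping lives in the data layer rather than inside one BI tool, any tool that queries the conformed key gets the same answer.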
The correct place to house that logic is the same cloud database our lake is using.
There are two ways to go about building this logic up in a database. Each has advantages and disadvantages.
One way to build up that logic is to have it surface at runtime directly from the application-pristine data from the lake. The logic from that surfacing exists in in-memory structures which are executed each time a query is run if there is new data. The advantage of this approach is that queries reach down to the very latest record. The disadvantage of this approach is that processing can be many levels of logic deep, and expensive if you’re operationalizing thousands of analytics with highly volatile data.
Another way to build up that logic is to have the final form of the data’s transformation be persisted physically as a separate set of tables. This approach’s advantages and disadvantages are the inverse of the in-memory method. The cost of supplying queries to a large landscape of users is highly optimized. However, there is a lag while all the persisted data is generated, so queries may not reach the very latest record.
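The trade-off between the two approaches can be sketched in a few lines. The example below uses Python's built-in `sqlite3` module purely as a stand-in for a cloud database, with a plain SQL view standing in for the runtime approach; the table and view names are invented for illustration. The view always reflects the newest lake record, while the persisted table stays stale until its refresh job runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE lake_orders (order_id INTEGER PRIMARY KEY, store TEXT, amount REAL);
    INSERT INTO lake_orders VALUES (1, 'Downtown', 100.0), (2, 'Airport', 40.0);

    -- Approach 1: the logic is evaluated each time a query runs.
    CREATE VIEW v_sales_by_store AS
        SELECT store, SUM(amount) AS total FROM lake_orders GROUP BY store;

    -- Approach 2: the result of the logic is persisted as a physical table.
    CREATE TABLE t_sales_by_store AS
        SELECT store, SUM(amount) AS total FROM lake_orders GROUP BY store;
""")


def totals(relation):
    """Read store totals from either the view or the persisted table."""
    rows = conn.execute(f"SELECT store, total FROM {relation} ORDER BY store")
    return dict(rows.fetchall())


def refresh_persisted():
    """Rebuild the persisted table from the lake (the batch processing step)."""
    conn.executescript("""
        DELETE FROM t_sales_by_store;
        INSERT INTO t_sales_by_store
            SELECT store, SUM(amount) FROM lake_orders GROUP BY store;
    """)


# A new order lands in the lake after both structures were created.
conn.execute("INSERT INTO lake_orders VALUES (3, 'Airport', 60.0)")

view_totals = totals("v_sales_by_store")           # includes the new order
table_before_refresh = totals("t_sales_by_store")  # still the old totals

refresh_persisted()
table_after_refresh = totals("t_sales_by_store")   # now matches the view
```

The view pays the aggregation cost on every query; the table pays it once per refresh and then serves cheap reads, at the price of the gap between refreshes.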
In practice, organizations typically persist a data warehouse for cost reasons, and sometimes out of a concern that the query complexity for conformed aggregations at runtime might be too difficult to pick apart if everything is virtual. Having said that, there is a sexiness to having everything happen in real time.
If you’d like to learn more about some of the advantages and disadvantages of virtual queries I’ve linked a past grid that points those out, and if you’d like to connect with an Intricity Specialist, I’ve included a link for that as well.