
What is a Database Schema?

Jared Hillam

December 3, 2019

The other day somebody asked me, "What is a database schema?" This is a great question, and one that I think deserves its own video. I touch on it in other videos, but it's a topic that needs its own attention.

In essence, the schema represents any kind of structure that we're defining around the data. This includes tables, views, fields, relationships, packages, procedures, indexes, functions, types, sequences, materialized views, queues, triggers, synonyms, database links, directories, XML schemas, and other elements. Without that structure, it's easy to get lost in the data.
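To make that concrete, here is a minimal sketch using Python's built-in sqlite3 module; the table, index, and view names are invented for illustration, but each statement defines one of the schema elements listed above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A table: named, typed columns the data must conform to
conn.execute("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        region      TEXT
    )
""")

# An index: a structure that speeds lookups on a chosen column
conn.execute("CREATE INDEX idx_customer_region ON customer (region)")

# A view: a saved query that presents the data in a defined shape
conn.execute("""
    CREATE VIEW customers_by_region AS
    SELECT region, COUNT(*) AS customer_count
    FROM customer
    GROUP BY region
""")
```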

Imagine, for example, that I gave you access to a folder housing a bunch of data from different sources, all garbled together. Imagine trying to make heads or tails of such data. With a lot of serious time and effort, you could eventually arrive at some kind of structure for understanding it. That structure, and its corresponding data relationships, is what we call the schema of the database. How that structure gets applied is something each database vendor does differently. For instance, there are unstructured data stores which apply a schema only when the data gets read. In other words, the data lives in its garbled state, and we apply the structure in the query code. This is called schema on read. On the other hand, there are databases which enforce structure as a condition before data gets written. This is called schema on write.
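Here is a small illustration of the difference, again sketched in Python with sqlite3; the JSON records and field names are made up. In the first half the structure lives in the reading code; in the second half the database enforces it before a row can land:

```python
import json
import sqlite3

# --- Schema on read: raw, untyped lines; structure is applied by the query code ---
raw_lines = [
    '{"customer": "Acme", "amount": "150.00"}',
    '{"customer": "Globex", "amount": "75.50"}',
]

# The "schema" lives here, in the reader: we decide the fields and types now.
orders = [
    {"customer": rec["customer"], "amount": float(rec["amount"])}
    for rec in map(json.loads, raw_lines)
]

# --- Schema on write: the database enforces structure before data is stored ---
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT NOT NULL, amount REAL NOT NULL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(o["customer"], o["amount"]) for o in orders],
)
```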

The way a schema is designed can drive different behaviors in a database. For example, if a database schema is designed as a series of normalized tables connected by primary and foreign keys, then it is likely designed for reading and writing individual records, which is ideal for transactional applications like Salesforce.com.
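A minimal sketch of that transactional shape, with hypothetical account and contact tables standing in for a CRM-style design:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Normalized design: each entity in its own table, linked by keys.
conn.execute("""
    CREATE TABLE account (
        account_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL
    )
""")
conn.execute("""
    CREATE TABLE contact (
        contact_id INTEGER PRIMARY KEY,
        account_id INTEGER NOT NULL REFERENCES account(account_id),
        email      TEXT NOT NULL
    )
""")

# Reading or writing one record touches only the rows it needs.
conn.execute("INSERT INTO account VALUES (1, 'Acme')")
conn.execute("INSERT INTO contact VALUES (10, 1, 'pat@acme.example')")
```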

However, if a schema has a central fact table connected to keys supplied by surrounding dimension tables, then it's likely designed to make read output highly efficient. This kind of schema, called a star schema, is ideal for high-scale information delivery.
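And a contrasting sketch of a star schema, with an invented fact_sales table at the center and two small dimension tables supplying its keys:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Dimension tables supply the keys...
conn.execute("CREATE TABLE dim_date    (date_id    INTEGER PRIMARY KEY, calendar_date TEXT)")
conn.execute("CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name  TEXT)")

# ...and the central fact table holds those keys plus the measures.
conn.execute("""
    CREATE TABLE fact_sales (
        date_id    INTEGER REFERENCES dim_date(date_id),
        product_id INTEGER REFERENCES dim_product(product_id),
        units_sold INTEGER,
        revenue    REAL
    )
""")

# Reads scan one wide fact table and join out to small dimensions,
# which is what makes large aggregations efficient.
rows = conn.execute("""
    SELECT p.product_name, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    GROUP BY p.product_name
""").fetchall()
```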

When we're interacting with data, the schema becomes an important topic. A good schema can mean the difference between a query lasting a few seconds and one lasting many hours. And this isn't something we can cheat our way around; the Big Data craze proved that lesson to the entire market. Many organizations jumped into Hadoop hoping for magic, but consuming oceans of data with no conformed schema just made for painfully slow analytics. To explore this concept further, you may want to take a look at the analogy at the beginning of my video titled "What is NoSQL?"

Back when the Hadoop craze was at its peak, and the emperor was still walking naked without anybody noticing, we had a new customer ask, "Why are we doing all this data modeling? Isn't data modeling an old concept?" To which our Solution Architect replied, "I agree that data modeling is a long-tested concept, but it works. Sort of like pi: why do we still use pi? Isn't that an old method of calculating a circle?" In other words, the concept of engineering an analytically focused schema will remain an important part of being able to feed information to a body of business users. Not only does it lower processing time and cost, but it also reduces the code complexity of SQL queries. Incidentally, that customer is today another happy and referenceable Intricity account.

Generating a data warehouse schema is a process that requires quite a bit of iteration. Usually it starts with a series of interviews, which provide inputs into some documentation assets. Those assets ultimately lead to an early trial schema, which we call a conceptual data model. This model can act as a backbone for estimating development costs for the Data Warehousing project.

Once a project is given the green light for development, that early model will be iterated on multiple times. Often it will undergo the scrutiny of a room full of people attempting to ask questions of the model that it can't answer. This stress-testing strengthens the model as a representation of the business's fundamental building blocks for analytics. Once the model is vetted, it will then be physicalized into a database schema, to which the source data will be loaded and conformed. Intricity specializes in this strategic process of defining future-state Data Warehouses and executing on their deployment. I have an earlier video which walks through the data warehouse strategy in detail. If you would like to see it, click on the link in the video description, and of course if you would like to talk with an Intricity specialist about your situation, I've included a link for that as well.

Intricity 101 Video: https://youtu.be/3BZz8R7mqu0

Talk with a Specialist: https://www.intricity.com/intricity101/
