Over the last 10 years, we’ve seen a massive expansion of what I call Data Brokerage Companies: companies that do nothing more than sell data to other organizations. Today, Data Brokers are an important part of the fabric of the data management space. For example, data from 3rd parties is often used to enrich existing internal records with things like addresses, company names, product names, prices, you name it. But getting data flowing between the broker and the customer can be really difficult. Databases have traditionally been a back-office thing, so sharing data out of a database has required a lot of front-office trickery through REST and managed file transfers.
We’ve experienced this multiple times with various clients. Most recently, we were at a pharmaceutical company that received a file every week. They used a managed file transfer solution to onboard the file, but it took roughly 12 hours to load the data from the file into a database. The pharmaceutical company only needed about 20% of the data in the file; the rest was either redundant or simply of no interest to them. This required them to hire a team of internal development resources just to babysit the process.
Now, there ARE ways of controlling what data gets transferred from a vendor, but they require the vendor to do a lot of work to build a web services interface that allows their customers to make specific data requests. The problem with web services like REST is that they work best with small snippets of data; if you need to transfer a lot of data, you’re back to establishing managed file transfer solutions.
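To make that concrete, here is a minimal sketch of what pulling a large dataset over a vendor’s REST API tends to look like. The endpoint, page size, and pagination scheme are all hypothetical; the point is that a bulk transfer turns into thousands of small round trips.

```python
import requests

# Hypothetical vendor endpoint and page size; real broker APIs vary widely.
BASE_URL = "https://api.example-broker.com/v1/records"
PAGE_SIZE = 1000

def fetch_all_records(api_key: str) -> list[dict]:
    """Pull a large dataset page by page over REST.

    Each round trip returns only a small snippet of data, so a
    multi-million-row transfer becomes thousands of HTTP calls.
    """
    records, page = [], 1
    while True:
        resp = requests.get(
            BASE_URL,
            params={"page": page, "page_size": PAGE_SIZE},
            headers={"Authorization": f"Bearer {api_key}"},
            timeout=30,
        )
        resp.raise_for_status()
        batch = resp.json()
        if not batch:          # empty page means we've reached the end
            break
        records.extend(batch)
        page += 1
    return records
```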
Then there are the data protocols themselves. We’ve seen all sorts of protocols, from EDI to XML to now JSON, AVRO, and Parquet. The basic goal of these protocols is to put identifiers on what the data is so that the person on the receiving end can do something useful with it. Ultimately, most organizations want this data to end up somewhere they can run a SQL query against it.
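As a rough illustration, here is what that last mile often looks like with a self-describing format like Parquet. The file name, columns, and the SQLite staging database are assumptions made for the sketch.

```python
import sqlite3
import pyarrow.parquet as pq

# Hypothetical broker extract; file name and columns are illustrative.
table = pq.read_table("vendor_extract.parquet")

# The format carries its own identifiers: column names and types
# travel with the data, unlike a bare fixed-width or CSV file.
print(table.schema)

# Most teams still land the data somewhere they can run SQL against it.
conn = sqlite3.connect("staging.db")
table.to_pandas().to_sql("vendor_extract", conn, if_exists="replace", index=False)
print(conn.execute("SELECT COUNT(*) FROM vendor_extract").fetchall())
```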
As with so much else, the advent of cloud computing is changing the paradigm of the Data Syndication Industry.
Now that we have database vendors that live in the cloud, we’re beginning to see the ability to share data straight from the cloud database itself. That means that, without a clunky custom web services framework, customers can just plug into the cloud database through a secured ODBC connection and start running SQL queries. And of course, these solutions are built from the ground up to deliver the data securely.
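Here is a minimal sketch of what that could look like from the customer’s side. The DSN, credentials, and table names are all hypothetical; the idea is that the “transfer” is just a SQL query over a secured ODBC connection.

```python
import pyodbc

# Hypothetical DSN and credentials for a cloud data warehouse that
# exposes a vendor's shared dataset; all names are placeholders.
conn = pyodbc.connect("DSN=BrokerShare;UID=analyst;PWD=********", autocommit=True)

# Pull only the slice of the shared data we actually need,
# instead of onboarding an entire weekly file.
query = """
    SELECT account_id, company_name, postal_code
    FROM shared_db.public.company_master
    WHERE country = 'US'
      AND last_updated >= '2023-01-01'
"""
for row in conn.cursor().execute(query):
    print(row.account_id, row.company_name)
```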
I believe this is going to completely change the bulk data syndication market. Not only will the data queries run faster, but customers will have the full power of writing SQL to get the exact data they need.
What's even more exciting is that if customers already have their own data sets in the cloud, they cut out yet another distance the data has to traverse.
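For example, enrichment can then become a single in-place join rather than a file transfer. The connection details and table names below are placeholders continuing the hypothetical setup above.

```python
import pyodbc

# Hypothetical setup: both the vendor's shared dataset and our own CRM
# tables live in the same cloud warehouse, so enrichment is one join in
# place -- no file ever leaves the platform. All names are placeholders.
conn = pyodbc.connect("DSN=BrokerShare;UID=analyst;PWD=********")

enrichment_query = """
    SELECT c.customer_id,
           c.company_name,
           v.industry_code,
           v.employee_count
    FROM our_db.crm.customers AS c
    JOIN shared_db.public.company_master AS v
      ON c.duns_number = v.duns_number
"""
for row in conn.cursor().execute(enrichment_query):
    print(row.customer_id, row.industry_code)
```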
Oh, and who pays for the cost of running those queries? That’s flexible as well, because it can be billed to whichever party is running that compute.
What this means is that pretty much any company out there can now share data with very little effort. I’ve included a whitepaper about data sharing, which you'll find here. You can also reach out to Intricity and talk with a specialist about how this could be used within your organization.