It’s often the first problem you solve when moving to the cloud: Your enterprise is using dozens, sometimes hundreds, of heterogeneous databases, and now you need to bind them together into hundreds of virtual views of the data in the cloud.
What’s good about this is that you don’t need to migrate to new databases, or even move the data from where it’s currently hosted. After all, there may be applications that depend on that data, and the last thing you want to do is store redundant data.
So, you federate. That gives you logical centralization of data without having to change where the data is physically stored, cloud or not.
But not so fast. There are roadblocks to consider. Here are my top two.
First, performance. You can certainly mix data from an object-based database, a relational database, and even unstructured data, using a centralized, virtualized, metadata-driven view. But your ability to run real-time queries on that data, in a reasonable amount of time, is another story.
The dirty little secret about federated database systems (cloud or not) is that unless you’re willing to spend the time it takes to optimize the use of the virtual database, performance issues are likely to pop up that make the use of a federated database, well, useless. By the way, putting the federated database in the cloud won’t help you, even if you add more virtual storage and compute to try to brute-force the performance.
The reason is that so much has to happen in the background just to get the data in place from many different database sources. These issues are typically fixed by working out a good federated database design, tuning the database, and placing limits on how many physical databases can be involved in a single pattern of access. I’ve found that the limit is typically four or five.
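To make that last limit concrete, here is a minimal Python sketch of a design-time check that flags any virtual view drawing on too many physical databases. Everything in it is hypothetical: the `MAX_SOURCES` cap, the `sources_over_limit` function, and the view and database names are assumptions for illustration, not part of any federation product. The cap of five simply reflects the four-or-five rule of thumb above.

```python
# Hypothetical design-time guard: flag virtual views that touch too many
# physical databases. All names here are illustrative assumptions.
MAX_SOURCES = 5  # assumed cap, taken from the four-or-five rule of thumb


def sources_over_limit(views, max_sources=MAX_SOURCES):
    """Return {view_name: distinct_source_count} for views above the cap."""
    over = {}
    for name, sources in views.items():
        count = len(set(sources))  # count each physical database once
        if count > max_sources:
            over[name] = count
    return over


# Made-up view definitions: view name -> physical databases it reads from.
views = {
    "customer_360": ["crm_pg", "billing_oracle", "support_mongo"],
    "global_sales": [
        "erp_db2", "crm_pg", "billing_oracle",
        "web_mysql", "pos_sqlserver", "dw_teradata",
    ],
}

for view, count in sources_over_limit(views).items():
    print(f"{view}: {count} physical databases, consider splitting this view")
```

A check like this could run whenever a view definition changes, so an overly wide view is caught before it turns into a slow query in production.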
Second, security. I’m pretty sure that most federated databases running in the cloud have a vulnerability that can be exploited now, and most enterprises that own the data don’t know it.
The cause is the same one behind the performance problems: There are so many moving parts that it’s difficult to make sure all data, access points, metadata, etc., are locked down but at the same time easily accessible.
While your systems using federated databases may encrypt data at rest, they often do not encrypt data in flight. Or, if you do encrypt data in flight, you likely aren’t encrypting data at rest. Or, there’s a direct path to the physical database that bypasses the federated database architecture and the security it provides.
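Those three gaps are easy to check for once you have an inventory of the physical databases behind the federation layer. The following is a rough Python sketch of such an audit. The `DataSource` record, its field names, and the sample entries are all assumptions made for illustration; in practice the values would have to come from your own configuration or inventory data.

```python
from dataclasses import dataclass


@dataclass
class DataSource:
    """Hypothetical inventory record for one physical database."""
    name: str
    encrypted_at_rest: bool
    encrypted_in_flight: bool
    direct_access_open: bool  # reachable without the federation layer


def audit(source):
    """Return a list of findings for a single data source."""
    findings = []
    if not source.encrypted_at_rest:
        findings.append("data at rest is not encrypted")
    if not source.encrypted_in_flight:
        findings.append("data in flight is not encrypted")
    if source.direct_access_open:
        findings.append("direct path bypasses the federation layer")
    return findings


def audit_all(sources):
    """Return {source_name: findings} for sources with at least one gap."""
    report = {}
    for source in sources:
        findings = audit(source)
        if findings:
            report[source.name] = findings
    return report


# Made-up inventory entries for illustration only.
sources = [
    DataSource("orders_pg", True, False, False),
    DataSource("hr_oracle", True, True, True),
    DataSource("catalog_mongo", True, True, False),
]

for name, findings in audit_all(sources).items():
    print(f"{name}: {'; '.join(findings)}")
```

The point of the sketch is that each physical database is checked on all three counts together, since fixing only one of them leaves the others exploitable.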
To date, I’ve not seen a federated database with sound centralized security that works at both the virtual and physical database layers. So get busy plugging those holes!