Best Practices for Deploying Data Lakes
POST WRITTEN BY: JOHN GRAY, InterVision CTO
August 9, 2019
Although the term is still relatively new, data lakes have recently gained recognition among IT teams as data increasingly becomes a foundation of modern business. Conceived as a way to reduce data sprawl and data silos, data lakes emerged from the field of data warehousing, which targeted the frustrations IT encountered when trying to create an organized repository of strategic datasets on which to base key business decisions. Uses of that data range from analytics that help a business better understand customer needs to artificial intelligence that addresses real-time challenges.
Data lakes, in many ways, are an evolution of data warehousing. Many data warehouse projects failed: They were too costly, took too long, and achieved only a small subset of their original goals. With data changing and growing so rapidly, the need to get value out of data quickly has grown ever more pressing. Nobody can afford to spend months or years analyzing and modeling data for business use. By the time the data is usable in a data warehouse, the business needs have changed.
In a similar vein to data warehouses, data marts emerged to hold data with a specific use or data cataloged by a certain characteristic (a marketing department's data, for example). Data marts have been more successful because the usage of the data is better understood, and the results can be delivered more quickly. However, the compartmentalized nature of data marts has made them less useful to businesses that have massive amounts of data and that need to use that data cross-functionally and across several parties…