Creating Data Products in a Data Mesh, Data Lake or Lakehouse for Use in Analytics – How to Industrialise and Speed Up Data Engineering in Analytical Environments. 2-Day Virtual Course
Speaker: Mike Ferguson
3-4 March 2022
This course has already taken place
LIVE STREAMING - £995 + VAT (£199) = £1,194
All public courses are available as in-house training. Contact us for more information.
Overview
In most companies today, analytical systems are centralised and siloed, with data integration occurring in each system and inconsistent data being made available across all of them. To address these issues, new data architectures like Data Mesh and Data Lakehouse have emerged.
This 2-day class examines the strengths and weaknesses of data lakes, data mesh and data lakehouses, and how decentralised teams can create trusted, compliant, reusable data products for others to consume and analyse to drive value.
Learning Objectives
Attendees will learn about:
- Strengths and weaknesses of centralised data architectures used in analytics
- What is a Data Mesh, a Data Lake and a Data Lakehouse? What benefits do they offer?
- The critical importance of a data catalog, business glossary and data fabric software
- An implementation methodology to rapidly produce ready-made, trusted, reusable data products using DataOps pipelines
- How to govern data quality, privacy, access security, versioning, and lifecycle of data products in a shared analytical environment
Course Outline
What Is A Data Mesh, A Data Lake And A Data Lakehouse? Why Use Them?
- Data complexity and the growth in data sources
- Centralised analytical data architectures and their pros & cons
- Introducing Data Mesh, its principles and how it works
- What is a data product?
- Is federated data governance possible?
- Decentralised development of data products
- Pros and cons of Data Mesh and how it impacts your current IT organisation
- Introducing Data Lakehouse and its pros and cons
- Requirements to implement a Data Mesh or Data Lakehouse
- Key technologies needed: Data Fabric, Data Catalogs, Data Marketplace
- Vendor software offerings in the market
Methodologies For Creating Data Products
- Creating a program office
- Decentralised development of data products in a Data Mesh, Data Lake or Lakehouse
- The special and critical case of master data
- A best practice step-by-step methodology for building reusable data products
- Applying DataOps development practices to data product development
Using A Business Glossary To Define Data Products
- Why is a common vocabulary relevant?
- Data catalogs and the business glossary
- Vendors in the Data Catalog market
- Roles, responsibilities, and processes needed to manage a business glossary
- Jumpstarting a business glossary with a data concept model
- Defining semantically linked data products using glossary terms
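As a flavour of what the course means by semantically linked data products, the sketch below describes a data product using terms drawn from a shared business glossary rather than free text. The glossary entries, product name and helper function are illustrative assumptions, not material from the course itself.

```python
# Sketch: linking a data product's description to a business glossary so its
# meaning comes from the common vocabulary. All names here are illustrative.
GLOSSARY = {
    "Customer": "A person or organisation that purchases goods or services.",
    "Order": "A request by a Customer to purchase products.",
}

data_product = {
    "name": "customer_orders",
    "terms": ["Customer", "Order"],  # glossary terms, not ad-hoc descriptions
}

def definitions(product, glossary):
    """Resolve a product's glossary terms to their agreed definitions."""
    return {term: glossary[term] for term in product["terms"]}

meanings = definitions(data_product, GLOSSARY)
```

Because products reference glossary terms rather than restating definitions, two teams using the term "Customer" are guaranteed to mean the same thing.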
Standardising Development And Operations In A Data Mesh, Data Lake Or Lakehouse
- The importance of a program office
- Implementing Data Mesh on a single cloud versus a hybrid, multi-cloud environment
- Implementing a Data Lake or Lakehouse
- Standardising the domain implementation process – ingest, process, persist, serve
- Selecting data fabric software for building data products
- Step-by-step data product development:
  - Data source registration
  - Automated data discovery, profiling, sensitive data detection, governance classification, lineage extraction and cataloguing
  - Data ingestion
  - Global and domain policy creation for federated governance of classified data
  - Data product pipeline development
  - Data product publishing for consumption
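The standardised domain implementation process above (ingest, process, persist, serve) can be sketched as a minimal pipeline. The stage functions, field names and quality rule below are illustrative assumptions for the sketch, not a prescribed implementation from the course.

```python
# Minimal sketch of the ingest -> process -> persist -> serve stages.
# Field names and the quality rule are illustrative assumptions.
import json
from pathlib import Path

def ingest(raw_records):
    """Ingest: accept raw records from a registered data source."""
    return list(raw_records)

def process(records):
    """Process: standardise fields and drop records failing a quality check."""
    cleaned = []
    for r in records:
        if r.get("customer_id") is None:
            continue  # simple data-quality rule: key field must be present
        cleaned.append({"customer_id": r["customer_id"],
                        "country": str(r.get("country", "")).upper()})
    return cleaned

def persist(records, path):
    """Persist: write the processed data product to shared storage."""
    Path(path).write_text(json.dumps(records))
    return path

def serve(path):
    """Serve: read the published data product back for a consumer."""
    return json.loads(Path(path).read_text())

raw = [{"customer_id": 1, "country": "uk"}, {"country": "fr"}]
product = serve(persist(process(ingest(raw)), "customers.json"))
```

Keeping each stage as a separate, named step is what makes the process repeatable across domain teams.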
Building DataOps Pipelines To Create Multi-Purpose Data Products
- Designing component-based DataOps pipelines to produce data products
- Using CI/CD to accelerate development, testing and deployment
- Designing in sensitive data protection
- Processing streaming data and unstructured data in a pipeline
- Generating data pipelines using Data Warehouse Automation tools
- The Enterprise Data Marketplace – enabling consumers to shop for data products
- Serving up trusted data products for use in multiple analytical systems and in MDM
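To give a feel for the component-based pipelines this section covers, the sketch below composes small, individually testable steps, with sensitive-data protection (PII masking) designed in as one of the components. The component names and masking rule are assumptions made for the sketch.

```python
# Sketch of a component-based DataOps pipeline: small reusable steps,
# each testable in isolation (which is what makes CI/CD practical),
# composed into one pipeline. Names and rules are illustrative.
from functools import reduce

def mask_email(record):
    """Sensitive-data protection designed in: mask PII before publishing."""
    if "email" in record:
        user, _, domain = record["email"].partition("@")
        record = {**record, "email": user[:1] + "***@" + domain}
    return record

def add_quality_flag(record):
    """A data-quality component: flag records that pass a basic check."""
    return {**record, "valid": record.get("amount", 0) >= 0}

def pipeline(records, components):
    """Run every record through each component in order."""
    return [reduce(lambda r, step: step(r), components, rec) for rec in records]

out = pipeline([{"email": "alice@example.com", "amount": 10}],
               [mask_email, add_quality_flag])
```

Because each component is a pure function, it can have its own unit tests in a CI/CD pipeline and be reused across many data products.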
Implementing Federated Data Governance To Produce And Use Compliant Data Products
- Implementing federated data governance across a hybrid, multi-cloud distributed data landscape
- Understanding compliance obligations
- Global versus local data governance policy types when creating a Data Mesh, a Data Lake or a Data Lakehouse
- Using the data catalog for automated data profiling, quality scoring and sensitive data type classification
- Defining and attaching policies to classified data in a data catalog
- Protecting sensitive data in data product development for data privacy compliance
- Governing data product version management
- Creating sharable master data products and reference data products for MDM and RDM
- Governing consumer access to data products containing sensitive data
- Preventing accidental oversharing of sensitive data products using DLP
- Governing data retention of data products in-line with compliance and legal holds
- Monitoring and data stewarding to ensure policy enforcement
- Technologies to help govern data across a distributed data landscape
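The idea of attaching policies to classified data in a catalog, then enforcing them at consumption time, can be illustrated with a small sketch. The classifications, roles and policy actions below are assumptions for the example, not a specific product's model.

```python
# Sketch: policies attached to governance classifications, enforced when a
# consumer requests a data product. All classifications, roles and actions
# here are illustrative assumptions.
POLICIES = {
    "PII": {"allowed_roles": {"data_steward"}, "action": "mask"},
    "PUBLIC": {"allowed_roles": {"data_steward", "analyst"}, "action": "allow"},
}

CATALOG = {  # column -> classification, as a data catalog might record it
    "email": "PII",
    "country": "PUBLIC",
}

def serve_row(row, role):
    """Apply the policy attached to each column's classification."""
    out = {}
    for col, value in row.items():
        policy = POLICIES[CATALOG[col]]
        if role in policy["allowed_roles"]:
            out[col] = value
        elif policy["action"] == "mask":
            out[col] = "***"
    return out

visible = serve_row({"email": "a@b.com", "country": "UK"}, "analyst")
```

The point of federating this is that global policies (like the PII rule) are set once, while domains attach classifications locally; enforcement then follows the data wherever it is served.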
Who It's For
- Chief Data Officers
- Data Architects
- Business Data Analysts
- Data Scientists
- ETL Developers
- Data Governance Professionals
It assumes a basic understanding of data management, data architecture, data integration, data catalogs, data lakes and data governance.
Speaker
Mike Ferguson
Managing Director
Intelligent Business Strategies
Mike Ferguson is Managing Director of Intelligent Business Strategies Limited. As an independent analyst and consultant, he specialises in data management and analytics. With over 40 years of IT experience, Mike has consulted for dozens of companies. He has spoken at events all over the world and written numerous articles. Mike is Chairman of Big Data LDN – the fastest growing Big Data conference in Europe. Formerly he was a principal and co-founder of Codd and Date Europe Limited – the inventors of the Relational Model, a Chief Architect at Teradata on the Teradata DBMS, and European Managing Director of Database Associates. He teaches popular master classes on Big Data Fundamentals, Modern Data Architecture, Data Governance of a Distributed Data Landscape, Data Warehouse Modernisation, Migrating to a Cloud Data Warehouse, Master Data Management and Machine Learning and Advanced Analytics. Follow Mike on Twitter @mikeferguson1.
Fees
- 2 days
- LIVE STREAMING - £995 + VAT (£199) = £1,194
Group Booking Discounts
Delegates | Discount
---|---
2-3 | 10%
4-5 | 20%
6+ | 25%