21 December, 2022 | Irina Steenbeek


Hear more from Irina at our Master Data Management & Data Governance Conference on 9 – 12 May 2023.

Data lineage initiatives are on the agenda of many data management and business professionals. However, some of these initiatives are far from successful. I have worked on data lineage for six years, and over that time I have seen the same challenges recur as companies go through their data lineage journey. I’ve shared my experience in numerous publications, including the book “Data Lineage from a Business Perspective.” In this blog, I want to summarize some “lessons learned,” or “golden rules,” that every company dealing with data lineage must consider.

Let’s start with the definition of data lineage we will use in this article:

Data lineage is a description of data movements and transformations at various abstraction levels along data chains, and relationships between data at these levels. A data lineage initiative should include three phases:

  • Scope
  • Implement
  • Use

I will share one “golden rule” for each phase. Of course, much more advice could be given, but let’s concentrate on these three most significant “golden rules.”

Phase 1: Scope your data lineage initiative

The key goal of this phase is to identify a feasible scope. A data lineage initiative in any form is time- and resource-consuming. Therefore, it must be feasible for the company in the long term.

Preparing the scope involves considering the following factors:

  1. Business drivers

A company should have substantial reasons to start this initiative. Various factors in the internal and external environment can act as business drivers. However, a company should concentrate on only one or two business drivers that have the most significant impact on its business.

  2. Sponsors

Data lineage should become a strategic initiative for the company. C-suite members must understand the importance of this initiative to the company and be ready to finance it in the long term.

  3. Metamodel of data lineage

Data lineage is a much more complex concept than people think. You can document data lineage at different levels of abstraction. You should also think about metadata lineage or data value lineage. Therefore, the metamodel to be implemented must fit the company’s needs.

  4. A company’s readiness

The implementation of data lineage requires multiple other data management capabilities. A company should have a data management function mature enough for data lineage.

  5. An “enterprise” scope

Different business drivers require different business units to be involved in the initiative. For example, the list of business units for a data lineage initiative to comply with data privacy regulations differs greatly from the list for a finance digital transformation.

  6. Critical chains and data

Even if a company has already limited the scope, it can make it more feasible still by identifying only the data and data chains critical for a specific driver.

  7. Data lineage scope

You can limit the scope of a data lineage initiative by breaking long data chains into sectors and documenting data lineage at one level of abstraction. You can also minimize the number of data lineage objects to be documented. For example, you can choose a sector that includes 3-4 applications, document only physical data lineage at table and column levels, and skip business rules.
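To make the idea of a narrow scope more tangible, here is a minimal sketch in Python of what “physical data lineage at table and column levels” for one small sector could look like. Everything in it is an assumption for illustration: the application, table, and column names are invented, and a real initiative would keep this information in a dedicated tool or repository rather than in a script.

```python
from collections import defaultdict

# Hypothetical physical lineage for one sector of three applications.
# Each edge reads "source column feeds target column".
# Column names follow an assumed "application.table.column" pattern.
EDGES = [
    ("crm.customers.id", "staging.customers.customer_id"),
    ("crm.customers.country", "staging.customers.country_code"),
    ("staging.customers.customer_id", "dwh.dim_customer.customer_key"),
    ("staging.customers.country_code", "dwh.dim_customer.country_code"),
]


def build_upstream_index(edges):
    """Map every target column to the set of columns that feed it directly."""
    index = defaultdict(set)
    for source, target in edges:
        index[target].add(source)
    return index


def trace_upstream(column, index):
    """Return every column that directly or indirectly feeds `column`."""
    seen = set()
    stack = [column]
    while stack:
        current = stack.pop()
        for parent in index.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def applications_in_scope(edges):
    """List the applications touched by the documented edges."""
    return sorted({column.split(".")[0] for edge in edges for column in edge})


index = build_upstream_index(EDGES)
origins = trace_upstream("dwh.dim_customer.customer_key", index)
```

Note what the sketch leaves out on purpose: there are no business rules and no conceptual or logical levels, only column-to-column movements inside the chosen sector. That is the kind of limitation that keeps the scope feasible.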

Phase 1 should shape your initiative to a feasible scope.

Phase 2: Implement data lineage


I often hear the same question from professionals: “Which software do we need to buy to start our metadata management / data lineage / … initiative?” This type of question is “the road to hell” that “is paved with good intentions.”

To choose the required method and solution, a company should first undertake the following steps.

  1. Define requirements

The chosen scope has already defined the list of stakeholders. Different stakeholders could have quite diverse requirements. A company must spend enough time to understand these requirements and translate them into “data lineage” terms.

  2. Choose an approach and method of documentation

The defined scope of data lineage will have a big impact on the required approach and method of implementation. For example, a large company can opt for a decentralized or hybrid approach to implementation. A smaller company can manage the implementation by using a centralized approach. Various options exist. A company must choose the one that best fits its purpose.

  3. Choose an appropriate software solution

Only after thorough preparation should a company start looking at the available options. Many providers offer tools, and open-source data lineage tools exist as well. The most important thing is to choose the one that fits the company’s needs, requirements, and resources.

  4. Perform documentation

A company must understand that, regardless of the chosen method, the implementation could be time- and resource-consuming. Furthermore, data lineage reflects the dynamic nature of data and its transformations: it changes constantly. Therefore, data lineage maintenance is more important than its initial documentation.
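The maintenance point can be illustrated with a small Python sketch that compares what has been documented with what is currently observed in the landscape. This is only an illustration under stated assumptions: the two sets of edges are invented, and how the “observed” edges would be collected (for example, by scanning ETL code or database logs) is not specified here and depends on the chosen method and tool.

```python
# Hypothetical snapshots of column-level lineage edges (source, target).
# "documented" is what the lineage repository says; "observed" is what an
# assumed scan of the current landscape would return.
documented = {
    ("crm.customers.id", "dwh.dim_customer.customer_key"),
    ("crm.customers.country", "dwh.dim_customer.country_code"),
}
observed = {
    ("crm.customers.id", "dwh.dim_customer.customer_key"),
    ("crm.customers.segment", "dwh.dim_customer.segment"),
}


def diff_lineage(documented_edges, observed_edges):
    """Report edges that are documented but gone, and present but undocumented."""
    documented_edges = set(documented_edges)
    observed_edges = set(observed_edges)
    return {
        "stale": sorted(documented_edges - observed_edges),
        "undocumented": sorted(observed_edges - documented_edges),
    }


changes = diff_lineage(documented, observed)
for kind, edges in changes.items():
    for source, target in edges:
        print(f"{kind}: {source} -> {target}")
```

A check like this, run regularly, is one possible way to notice that the documentation has drifted from reality before business users start relying on outdated lineage.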

Phase 3: Use data lineage


Data lineage implementation is not the biggest challenge. The biggest challenge is to bring its results into “business as usual” operations. Data lineage itself is not complex; rather, it exposes the complexity of an application landscape, and that complexity is very difficult for business users to understand. Data lineage outcomes are not easy to understand or work with, and they may require technical skills that business users don’t have. At this stage, you might discover that business users’ expectations are far from reality. That is why a company should educate business users and gather their requirements at the earliest possible moment of a data lineage initiative.

About the author:
Dr. Irina Steenbeek is a well-known expert in implementing Data Management (DM) Frameworks and Data Lineage and assessing DM maturity. Her 12 years of data management experience have led her to develop the “Orange” Data Management Framework, which several large international companies have successfully implemented.

Irina is a celebrated international speaker and author of several books, multiple white papers, and blogs. She has shared her approach and implementation experience by publishing The “Orange” Data Management Framework, The Data Management Toolkit, The Data Management Cookbook, and Data Lineage from a Business Perspective.

Irina is also the founder of Data Crossroads, a coaching, training, and consulting services enterprise in data management. 

To inquire about Irina’s training or coaching, or about her participating in your company webinar or event, please email irina.steenbeek@datacrossroads.nl or book a free 30-min session at https://datacrossroads.nl/free-strategy-session/
