February 8, 2020
- Posted by: admin
What is Data Lineage?
According to Wikipedia: “Data lineage includes the data origin, what happens to it and where it moves over time. Data lineage gives visibility while greatly simplifying the ability to trace errors back to the root cause in a data analytics process.”
As such, Data Lineage is a map of the movement of your data as it flows between the place it’s created, the places it’s stored and the places it’s used.
Why Should I Care About Data Lineage?
Data Lineage is about Trust, Speed, and Cost Savings. If you have an end-to-end map of where your data comes from and where it goes, you can trust that data and make better decisions with it. You can also use the map to rid the organisation of redundancy and waste, and to ensure sensitive or private data is adequately protected.
Who Needs Data Lineage?
Here are the three main reasons why Data Lineage will benefit you:
Data Lineage Drives up Trust in Data:
Most organisations have critical data (typically housed in external or key internal reports) that they use to make forward-looking projections to their key investors or regulators. Without the ability to trace these reporting figures back to their source, the ability to trust the figures is diminished.
A good analogy here is that of a rare Artistic Masterpiece – how much is it worth? If the artwork has good ownership lineage and documentation from the artist to the current owner, the value is greater than if it’s a potential masterpiece discovered in someone’s attic or basement with no paperwork to prove its origin. The same is true of your data – if you know where it comes from, how it got to your report, and the quality of the data used along each step of the journey, it has substantially more value and trust for decision making.
Data Lineage Improves Operational Speed and Agility:
On the flip side of the coin, many organisations need to make changes to data in their source systems, but are concerned about what impact this might have downstream to other reporting systems.
Without a strong map of where this data element goes, and how it’s used and transformed by the rest of the business, you may not be able to make the change confidently. The only way to know for sure is to investigate: a time-consuming process if you do so by hand, but a very simple one if you’ve got an automated tool scanning the lineages across your systems and updating them whenever a system change is detected.
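As a rough illustration of what such an impact check involves, the sketch below treats lineage as a list of "flows from, flows to" pairs and walks it downstream from the element you want to change. The data set names and the `downstream_of` helper are hypothetical, made up for this example; a real lineage tool would harvest these edges for you.

```python
from collections import deque

# Hypothetical lineage edges: each pair means "data flows from source to target".
LINEAGE_EDGES = [
    ("crm.customers", "staging.customers"),
    ("staging.customers", "warehouse.dim_customer"),
    ("warehouse.dim_customer", "reports.quarterly_revenue"),
    ("warehouse.dim_customer", "reports.churn_dashboard"),
    ("billing.invoices", "reports.quarterly_revenue"),
]


def downstream_of(node, edges):
    """Return every data set reachable from `node` by following lineage edges."""
    # Index the edges so each source maps to its direct targets.
    children = {}
    for source, target in edges:
        children.setdefault(source, []).append(target)

    # Breadth-first walk; `seen` also protects against cycles in the lineage.
    seen = set()
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for target in children.get(current, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Everything that could be affected by a change to the CRM customer table.
impacted = downstream_of("crm.customers", LINEAGE_EDGES)
print(sorted(impacted))
```

In this toy example a change to `crm.customers` touches both reports, while a change to `billing.invoices` touches only one of them.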
Data Lineage Delivers Cost Savings:
The example above already points to a cost saving: if it takes less staff time to understand an impact, it should cost less to decide whether or not to proceed with the change. In addition, many organisations find they have a large legacy IT footprint, with multiple redundant sources of the same data. Using Data Lineage, you can understand which sources are valuable and used, and which are redundant or obsolete. Switching off the obsolete systems delivers an immediate cost saving.
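One simple way to picture the search for obsolete sources is sketched below: start from the places where data is actually consumed, walk the lineage upstream, and flag any system that never feeds a consumption point. The system names, the `FEEDS` list and the `unused_systems` helper are all assumptions for illustration, not the output of any particular tool.

```python
# Hypothetical inventory of systems and lineage edges ("feeds into").
SYSTEMS = ["crm", "legacy_crm", "billing", "warehouse", "reporting"]
FEEDS = [
    ("crm", "warehouse"),
    ("billing", "warehouse"),
    ("warehouse", "reporting"),
]
# Assumed: the systems where people actually use the data.
CONSUMPTION_POINTS = {"reporting"}


def unused_systems(systems, feeds, consumption_points):
    """Return systems from which no consumption point can be reached."""
    # Index the edges so each target maps to its direct sources.
    upstream = {}
    for source, target in feeds:
        upstream.setdefault(target, []).append(source)

    # Walk upstream from every consumption point, marking systems as useful.
    useful = set(consumption_points)
    stack = list(consumption_points)
    while stack:
        current = stack.pop()
        for source in upstream.get(current, []):
            if source not in useful:
                useful.add(source)
                stack.append(source)

    # Anything never reached is a candidate for review, in inventory order.
    return [name for name in systems if name not in useful]


print(unused_systems(SYSTEMS, FEEDS, CONSUMPTION_POINTS))
```

A system flagged this way is only a candidate for decommissioning; it may still be used in ways the lineage map has not captured.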
How Can I Get Data Lineage in my Organisation?
You could begin by mapping the Data Lineages manually in spreadsheets. At a high level, this is relatively simple: you design a map of the systems that house your key data and the connecting transformations that move data from system to system. The challenge comes as you “peel the onion” and map data at the database, schema, table or column level, and try to trace it through the multiple transformations you have in your business.
If this is done manually, the data discovery process can require months of human involvement to discover cross-system data relationships, derive transformation logic, assess data consistency, and identify exceptions.
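To give a feel for what a manual, column-level mapping looks like, here is a minimal sketch that reads a spreadsheet-style export and traces a report column back to where it originates. The column names, the CSV layout and the `trace_to_origins` helper are hypothetical; a real mapping would have many thousands of rows, which is where the manual effort comes from.

```python
import csv
import io

# Hypothetical spreadsheet export: one row per column-to-column movement.
MAPPING_CSV = """source_column,target_column,transformation
crm.customers.first_name,warehouse.dim_customer.full_name,concatenate
crm.customers.last_name,warehouse.dim_customer.full_name,concatenate
warehouse.dim_customer.full_name,reports.top_accounts.customer,copy
billing.invoices.amount,reports.top_accounts.revenue,sum
"""


def load_mappings(text):
    """Parse the CSV text into a list of dictionaries, one per mapping row."""
    return list(csv.DictReader(io.StringIO(text)))


def trace_to_origins(column, mappings):
    """Walk upstream from `column` and return the columns with no recorded source."""
    # Index the rows so each target column maps to its direct source columns.
    parents = {}
    for row in mappings:
        parents.setdefault(row["target_column"], []).append(row["source_column"])

    origins, seen, stack = set(), set(), [column]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; also guards against circular mappings
        seen.add(current)
        sources = parents.get(current, [])
        if not sources:
            origins.add(current)  # nothing feeds this column, so it is an origin
        stack.extend(sources)
    return origins


mappings = load_mappings(MAPPING_CSV)
print(sorted(trace_to_origins("reports.top_accounts.customer", mappings)))
```

Even in this four-row example, answering "where does this report column come from?" means following two hops and a concatenation; every extra system and transformation adds rows that someone has to discover and keep up to date.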
Fortunately, there are options to automate the process: metadata scanners can automatically harvest lineage from Source Systems, ETL transformations and underlying code such as COBOL, Java, .NET, PL/1, SQL, Hibernate and Spark, and stitch it all together into a usable document that is a living, breathing representation of the Data Flows across your Enterprise.
What Can Cognopia Do For Me?
Our team are experts in the Data Lineage space, and can quickly help you understand more about this important topic. Whilst this document presents the benefits of Data Lineage, you do not need Data Lineage for all your data. Cognopia can help you identify the areas where you’ll unlock major business value, improving Trust across your Data value chain, removing duplication and inefficiency in your IT architecture, saving money and making your decision making more agile and more effective.
Talk to us before embarking on a Lineage project – we can advise on the right approach, tie the use-case to underlying business benefits, and recommend, provide and implement Industry-Leading Lineage software for those that need it.