How to Choose a Data Catalog: The Ultimate Guide for Business Leaders

Has your Data Team asked you to choose a Data Catalog, and now you’re wondering “What is a Data Catalog?” and “How will a Data Catalog help our business?”

Or have you hired a Consultant to help you become “data-driven”, they’ve recommended buying a data catalog, and now you’re wondering how to choose the best data catalog for your firm?

If so, this guide was written for you.

Why we wrote this Guide

In 2024 we ran extensive research to answer the question “What causes a Business Leader to say ‘Today’s the day we buy a Data Catalog’?”.

We found:

  1. Business Leaders are often confused by technical terminology, worried they’re buying a solution they don’t need
  2. Data Leaders struggle to connect the Data Catalog features to Business outcomes that matter, and;
  3. Choosing a Data Catalog is HARD WORK – with many selection processes stalling or stopping without a beneficial outcome

Read on to find out how to choose a Data Catalog for your business successfully.

How to Understand the Jargon When Choosing a Data Catalog

Our first research finding was that Business Leaders struggle to understand the jargon and language used on Data Catalog vendor websites. As such, this guide is written in plain English.

If you’ve struggled to understand the jargon, you’re not alone. Here’s a quote from a CEO we interviewed:

Well, my CDO, when he came to me, he said, “We really need to take this data catalog seriously”. And what I was picturing in my head is, was more like a catalog, like, I don't know if you get what I'm trying to say, like a traditional catalog, like the one you see in the magazines, or stores, I'm like, what the hell is this?

The CEO imagined a Catalog like you’d find in a store, full of pictures of products to buy. Whilst he felt embarrassed by a lack of knowledge, this view is a good baseline to understand what a Data Catalog can do for your data.

Imagine you walk into a store that has NOT catalogued its products, and there is no resource to understand what products they have, what features and benefits these products provide, and whether they’re in stock for you to purchase:

  • How will you know if they have a product to meet your needs?
  • Who knows what the products can be used for and what they’ll do for you?
  • Will you feel confident making a purchase, or concerned that you’re buying something you don’t need?

This is the situation your data team finds themselves in when shopping for data to help you achieve your business goals. Without a Data Catalog, they struggle to know:

  • What data we have available and where it is stored
  • What the data we have discovered can be used for (legally and ethically)
  • How to understand the features of the data, to understand whether it’s fit for purpose, and;
  • Whether this data is the latest version, and whether it is the right quality for our needs

You use a Shopping Catalog to make better purchase decisions, more quickly.

Data teams use a Data Catalog to find good data that can deliver the business analysis you need, so you can make better business decisions.

What is a Data Catalog?

A product catalog in a store lists all the items for sale, describes each product, tells you where to find it, and provides prices.

Similarly, a Data Catalog in an organisation lists all the data resources available, provides descriptions to help you understand what each dataset contains, shows where the data is stored, and includes other useful information.

This helps everyone in the company easily find and use the data they need, in the same way a product catalog helps shoppers find what they are looking for in a store.

Without a Data Catalog, you have to search for data across multiple sources and business silos, you need to spend time trying to understand what this data is and what you are allowed to use it for, and you need to invest time to make sure the data meets the needs of the project you’re working on. This means:

  • Slower response times when asking for new reports or insights
  • Additional manpower costs, as data teams are less efficient than they could be
  • Risks of inconsistent reports because different teams found and used different data
  • Risk of using data in ways that breach regulations like GDPR, HIPAA or CCPA (with subsequent fines and loss of customer trust)

What are some key Data Catalog features?

In order to understand these tools in more detail, you’ll need to learn a few additional technical terms and jargon. The technical details can be complex, but the concepts are relatively simple.

Metadata Management and Data Catalogs

First – what is data? In our Shopping Product Catalog, data might be the Price, the Colour, or the Weight of an item. We need this data to decide whether we want a $10 Blue Widget that weighs 5 tons or whether we prefer a $5 Red Widget that weighs 3 tons. The data describes features that we are interested in learning more about.

  • Learn more about data here

Metadata describes the data itself so we can understand and use the data appropriately.

  • Think of it like the column headings in an Excel spreadsheet – if we have a column without a heading with the data: “red”, “blue”, “orange” etc, we can guess that this is a list of colours but we don’t know what these colours mean.
  • We don’t know what the colours mean because we lack context – what is this data describing?
  • If the column heading says “Widget Colour” we can now use this to understand what the data is describing and make better decisions

Metadata here is the label “Widget Colour” which enables us to accurately find, understand and use this data.

A Data Catalog allows us to provide this additional context to the data we have in our organisation. Metadata is collected and created to help others understand what the data means, how it can be used, and who is responsible for its upkeep.
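
For readers who like to see the idea in concrete form, here is a minimal Python sketch of what a single catalog entry might hold for the "Widget Colour" column. It is an illustration only: the field names (description, owner, allowed_uses) and their values are assumptions made for this example, not the layout of any particular Data Catalog product.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnMetadata:
    """Context that turns a bare column of values into usable data."""
    name: str          # the label, e.g. "Widget Colour"
    description: str   # what the values mean
    owner: str         # who is responsible for upkeep (hypothetical)
    allowed_uses: list[str] = field(default_factory=list)


# Raw data without a heading: we can guess these are colours, nothing more.
values = ["red", "blue", "orange"]

# The same data with metadata attached (all values are illustrative).
widget_colour = ColumnMetadata(
    name="Widget Colour",
    description="Colour of the widget as sold",
    owner="Product team",
    allowed_uses=["sales reporting", "stock planning"],
)


def describe(meta: ColumnMetadata, sample: list[str]) -> str:
    """Return a one-line, human-readable catalog entry."""
    examples = ", ".join(sample)
    return f"{meta.name} (owner: {meta.owner}): {meta.description}; e.g. {examples}"


print(describe(widget_colour, values))
```

The point of the sketch is that the values themselves never change; only the context around them does, and that context is what someone in your business has to write down and keep current.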

  • Learn more about Metadata here

How to Choose a Data Catalog - is Metadata Management important?

Metadata Management is a foundational use case for Data Catalogs.

Key decisions and gotchas:

  1. Does your organisation have a formal metadata management process in place today?
    • If not, you might want to start small before buying an expensive tool to automate a process that you do not perform yet
  2. How easy is it to import metadata from your existing IT systems into the Data Catalog?
    • Which systems do you really need to track and manage? The more you bring in, the more expensive and the harder to manage – set scope carefully
  3. Who will manage the metadata once it is inside the Data Catalog?
    • This is an active process – it requires resources and time investment from the subject matter experts in your business
    • Unless you make time to populate the metadata and maintain it, the Data Catalog will become stale and no value will be delivered
    • Note that the subject matter experts are likely to be in high-demand in your business already – can you afford to free up enough of their time to perform this new role?

Data Governance and Data Catalogs

Your business has rules including:

  • What time you expect employees to start and end work
  • How much people are paid in different roles
  • Who is responsible and who is accountable for e.g. sales performance and revenue growth
  • Whether staff can work from home or must show up in person

These rules are a form of governance that determines what people can and cannot do. You have escalation paths if more junior staff disagree and you need someone with decision-making power to call the shots.

Data also requires rules and standards, so all staff understand how you expect them to create, store, consume or delete data. 

  • Note: even if you don’t formally have a “Data Governance function” these rules will exist informally – e.g. sales staff might be encouraged to “update the close dates in the CRM” weekly so your sales forecasts are accurate. 

A Data Catalog makes it easier to determine whether the rules are being followed or not. You can give people roles to make decisions about data (e.g. should Employee X be allowed access to this financial report?) and collaborate to improve documentation (the metadata management we described above).

  • Note: A Data Catalog IS NOT necessary in order to govern your data. It is perfectly acceptable to establish rules and use manual solutions such as this Excel template – so do not buy a Catalog in the hope it will govern your data for you

How to Choose a Data Catalog - is Data Governance important?

In our research, many firms were purchasing a tool for their Data Team to use to document data without investing in a formal Data Governance program first.

Others were purchasing a Data Catalog to make it more likely that business users would want to help with this process, at the same time as appointing people to be “Data Stewards” and “Data Owners” (these are formal roles for business users that are commonly part of a Data Governance rollout).

Key decisions and gotchas:

  1. If you have no formal Data Governance program and no plans to invest in one, your data documentation will likely be incomplete as data teams lack the specific domain expertise
    • For example, who decides what counts as a “Customer” in your business?
    • Is that a decision you want to make as the business leadership, or are you comfortable for your data team to decide this on your behalf?
  2. If you have no formal Data Governance program and you are about to appoint staff to governance roles, consider delaying the selection of your Data Catalog until these staff have learned what to do, how to do it, and what they need from a solution
    • Many firms try buying a solution that is “easy to use” in the hope that they will get better business adoption – this seldom works
    • Until staff have experience managing metadata they will not know which features to prioritise and what capabilities they need
    • If you are worried about a lack of business engagement this indicates you have a bigger problem – no one sees the value in data governance – and a data catalog will not fix this

Data Lineage and Data Catalogs

Imagine a “Mom and Pop” store – it has one single location, and the owner has a good understanding of what products they have, can quickly find them, and can ship those products to customers knowing the products are in top condition.

Now imagine that this “Mom and Pop store” grows to the size of Amazon.com – and has millions of products in thousands of locations around the world. One single person could never keep track of this, so we have inventory systems that tell us what’s in stock and help us deliver it to the customers on time.

Data lineage does the same thing for data.

As a small business, you probably don’t need Data Lineage – when Cognopia started we didn’t even have a CRM, we just used a Google sheet.

Once your business scales, more applications are added and it becomes harder to know where the data in your report came from, and therefore whether it’s trustworthy and up-to-date or not. Often there are multiple reports with inconsistent data, because different teams pull reports from different systems at different times.

Data lineage allows us to understand where our data comes from and where it goes to. This is important so we can:

  • Make sure data in reports is accurate, up-to-date, and reliable for the decisions we’re making
  • Make changes to source systems with confidence, because we know whether these changes will break important downstream systems, processes and reports
  • Know which reports are used so we can remove data sources that are unnecessary and cost money to maintain
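
To show the "make changes with confidence" idea, here is a small Python sketch. It treats lineage as a map of which system feeds which, then lists everything that could be affected if one source changes. The system and report names are invented for illustration, and real Data Catalogs usually harvest this map automatically rather than having it typed in by hand.

```python
from collections import deque

# Each key feeds data into the assets listed against it (all names are made up).
feeds = {
    "crm": ["sales_warehouse"],
    "billing": ["sales_warehouse", "finance_report"],
    "sales_warehouse": ["sales_forecast", "board_dashboard"],
    "sales_forecast": [],
    "board_dashboard": [],
    "finance_report": [],
}


def downstream(source: str, graph: dict[str, list[str]]) -> list[str]:
    """List every asset that could break if `source` changes."""
    seen: set[str] = set()
    queue = deque(graph.get(source, []))
    while queue:
        asset = queue.popleft()
        if asset not in seen:
            seen.add(asset)
            # Follow the data onwards to whatever this asset feeds.
            queue.extend(graph.get(asset, []))
    return sorted(seen)


print(downstream("crm", feeds))
# ['board_dashboard', 'sales_forecast', 'sales_warehouse']
```

An asset that appears in nobody's downstream list and feeds nothing itself is a candidate for retirement, which is where the cost saving comes from.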

How to Choose a Data Catalog - is Data Lineage important?

Data Lineage is an important feature when your business has scaled (either organically or by acquisition) and when you are trying to reduce costs and improve efficiency for the data team.

It is also important when you are trying to reduce inconsistencies between reports, and to demonstrate compliance with regulations (you know where sensitive data is stored, where it goes, who has access, and how to safely purge this data when it is no longer required). 

Key decisions and gotchas:

  1. Your Data/IT team will be keen to automatically harvest metadata and data lineage from as many data sources as possible – so they have a comprehensive view of the data estate
    • The more data sources, the more expensive it will be – so strike a balance between comprehensive coverage and cost control
    • Some data sources change very infrequently – ask whether you need to automatically scan these sources or whether a manual workaround is sufficient
  2. Ensure there is a focus on proving the value of Data Lineage rather than simply documenting it
    • Ask your team to find and remove duplicate data sources or reports to reduce costs
    • Aim to reduce the time it takes to make business changes to source systems safely and securely
    • Ensure you know why you are documenting lineage and put a process in place to establish its value
    • Example ROI calculations can be learned here

Data Quality and Data Catalogs

Data Quality is commonly misunderstood. Simply put, high-quality data is data that meets business needs. For example:

  • A list of customer records with no email address would be LOW QUALITY if we intend to run an email marketing campaign
  • The same list of customer records with no email address could be HIGH QUALITY if it is used in-store to personalise product recommendations

Data Governance sets the rules and standards so we understand whether data is fit-for-purpose or not. A Data Catalog can show whether data meets these rules or standards, or whether it is currently under the quality standards we have set. As such, it allows us to make better decisions because we understand whether data meets our business decision needs or whether we are “flying blind”.
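
The "fit-for-purpose" idea can be sketched in a few lines of Python. The example below scores the same customer list against two different uses. The records, the purpose names and the rule of "every required field must be filled in" are assumptions made for illustration; your own Data Governance rules would define what each purpose really needs.

```python
# The same customer list, with no email addresses on file (illustrative data).
customers = [
    {"name": "Ana", "email": None, "loyalty_id": "L-001"},
    {"name": "Ben", "email": None, "loyalty_id": "L-002"},
    {"name": "Cy", "email": None, "loyalty_id": "L-003"},
]

# Hypothetical rules: which fields each use of the data needs filled in.
required_fields = {
    "email_campaign": ["email"],
    "in_store_recommendations": ["loyalty_id"],
}


def fitness(records: list[dict], purpose: str) -> float:
    """Share of records (0.0 to 1.0) that have every field the purpose needs."""
    needed = required_fields[purpose]
    usable = sum(all(record.get(f) for f in needed) for record in records)
    return usable / len(records)


for purpose in required_fields:
    print(f"{purpose}: {fitness(customers, purpose):.0%} of records usable")
# email_campaign: 0% of records usable
# in_store_recommendations: 100% of records usable
```

Nothing about the data changed between the two scores. Only the purpose changed, which is why quality has to be measured against a stated business need.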

  • Note: not all Data Catalogs come with Data Quality tools included. Some organisations opt to select multiple “best-of-breed” solutions and integrate them, others aim to buy from a “one-stop-shop”. 

How to Choose a Data Catalog - is Data Quality important?

Clearly, one reason to invest in a Data Catalog is the need for access to high-quality data, to set and enforce data standards (so people know when data reaches this quality level) and provide a mechanism to certify data and reports that have met this quality standard.

But does your Data Catalog need to have capabilities to measure Data Quality? Or is it OK to use another tool to measure Data Quality and simply display the results in the Data Catalog?

Key questions and gotchas:

  1. Do you have an existing Data Quality solution?
    • If so, are you satisfied with it or will you use this as an opportunity to find a better tool?
    • If not, do you want to buy a single solution from a single vendor (“one throat to choke”) or are you aiming to get “best-of-breed”?

Note: if your organisation has limited experience with Data Quality solutions you may want to choose a single vendor or the cheapest solution so you can develop capabilities to use more advanced tools without high cost or risk

Other Data Catalog Related Jargon

The above list is written as a beginner’s guide. There are many other technical terms your data/IT team might reference. Here are a few elements with simple definitions, key questions and gotchas:

Data Catalog vs Master Data Management

Master Data is data that defines your most important business relationships – Customers, Suppliers, Products and Employees are all so important to your business that you need to pay special attention to data about these relationships.

  • You need to focus on this IF your business struggles with inconsistent data across departments – usually talked about as the need for “a single version of the truth”
  • Dedicated software solutions exist to manage Master Data, so you can ensure everyone accessing data about these key relationships has the same data at any one time

Assume you are aiming to have a Single View of your Customer (often called a “Customer 360” which is one type of Master Data project):

  • The Data Catalog helps you identify where all the sources of customer data are today
  • A Master Data Management solution then merges these data sources to provide a single “Golden Record” so all users receive the same view of the most important Customer data 
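
As a rough illustration of the merge step, the Python sketch below combines two records for the same customer into one "Golden Record". The merge rule used here (keep the most recently updated non-empty value for each field) is an assumption chosen for simplicity; real Master Data Management tools let you configure these rules, and they also have to work out which records belong to the same customer in the first place.

```python
from datetime import date

# Hypothetical records for the same customer held in two systems.
records = [
    {"source": "crm", "updated": date(2024, 3, 1),
     "name": "A. Tan", "email": "a.tan@example.com", "phone": None},
    {"source": "billing", "updated": date(2024, 6, 15),
     "name": "Alice Tan", "email": None, "phone": "555-0100"},
]


def golden_record(rows: list[dict]) -> dict:
    """Merge rows field by field, keeping the newest non-empty value."""
    merged = {}
    for row in sorted(rows, key=lambda r: r["updated"]):  # oldest first
        for key, value in row.items():
            if key in ("source", "updated"):
                continue  # bookkeeping fields are not part of the result
            if value is not None:
                merged[key] = value  # newer rows overwrite older ones
    return merged


print(golden_record(records))
# {'name': 'Alice Tan', 'email': 'a.tan@example.com', 'phone': '555-0100'}
```

Notice that the result takes the name and phone number from billing but keeps the email from the CRM, because billing had no email to offer.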

Decide whether you need one or both tools by understanding the problem you must solve and the budget you have.

Data Catalog vs Data Warehouse

A Data Warehouse is a single system where you can store and analyse information from multiple systems across your organisation. Data Warehouses are typically established for specific analytic needs, and may also be launched to provide “a single version of the truth”.

  • Building a Data Warehouse is challenging; a Data Catalog can make the build easier because you know where the trusted sources of data are before you begin
  • Adopting a Data Catalog alongside a Data Warehouse project adds time (because we need to populate 2 systems rather than one)
    • This approach has the advantage of “seeding” the Data Catalog and ensuring the effort spent finding data for the Warehouse is not “wasted” because you are left with documentation and shared knowledge for future projects/changes
    • If you take this approach and you fall behind schedule, resist the urge to stick to your roadmap deadlines by cutting corners and skipping the documentation step, as you will waste money twice

Data Catalog vs Data Lake

Unlike a Data Warehouse, a Data Lake allows you to store more varied data types. Data Lake projects are usually kicked off to enable the organisation to unlock value from data analytics and data science programs – where the goal is to uncover new insights from data beyond its original purpose.

  • Data Lake projects often fail to deliver value because the data in the lake lacks context (metadata) and Data Scientists are unable to understand the data available to them
  • Adopting a Data Catalog alongside a Data Lake project adds time (because we need to populate 2 systems rather than one)
    • This approach has the advantage of “seeding” the Data Catalog and ensuring the effort spent finding data for the Lake is not “wasted” because you are left with documentation and shared knowledge for future projects/changes
    • If you take this approach and you fall behind schedule, resist the urge to stick to your roadmap deadlines by cutting corners and skipping the documentation step, as you will waste money twice

Many firms will skip the documentation step because they have set arbitrary timelines to unlock value from their Data Lake, and the work to document data rather than load it ASAP is seen as “holding us back” rather than “enabling project success”. 

  • You would be wise to set more realistic timelines and invest in documenting data – Data Scientists are expensive resources and Data Lake projects fail at an alarmingly high rate. Consider this an insurance policy.

What are the Business Benefits of a Data Catalog?

Rather than asking “What value can a Data Catalog bring to me?”, our research asked the question “What causes a Business Leader to say ‘today’s the day we buy a data catalog’?”

  • i.e. In which situations will a business leader see more business benefits than costs for adopting a new tool?
  • The situation your business is in changes how much you value various features and impacts how you justify budgets

Whilst Data Catalog vendor websites try to list every business benefit, our research discovered there are only 4 situations in which a Business Leader will invest money:

  1. “Help me comply”: this context focuses ONLY on the compliance capabilities of Data Catalog tools, aiming to demonstrate that the organisation meets regulatory data requirements to “check the box” 
  2. “Help me follow the plan”: in this situation a Business Leader is being advised by an external consultant to help become “Data Driven”. The Consultant’s plan includes buying a Data Catalog, and the selection process is set in motion
  3. “Help me be more efficient”: your business has grown and now you are seeing waste from poor data practices. It takes too long to get analytic insights, your data team are working overtime/adding project cost, and management meetings waste time arguing about inconsistent reports
  4. “Help me keep growing”: your business has used data analytics and insights to grow revenues and market share, however this growth trajectory is unsustainable because there is too much new data for you to manage. You need to protect this growth at all costs

We will now explore each situation so you can identify the context your business is in today, and determine which features and benefits of a Data Catalog are important to consider in your selection process.

Data Catalog Business Benefits: "Help me comply"

How closely does this statement align with your current business situation?

When there are data regulations in force, and we struggle to win business because we are not fully compliant, help me quickly and easily check the compliance box so I can get back to business-as-usual

This is a pure compliance play. Data Catalogs are always used to help organisations meet their data compliance objectives, but this specific context is only interested in the value of checking a compliance box.

Points to note:

  1. Data regulations on their own DO NOT cause people in this context to purchase – most firms respond to a new regulation with workarounds or add-ons to existing systems
  2. Investment becomes more likely if a competitor or comparable firm is publicly fined or exposed for breaching this regulation
  3. Investment becomes inevitable if customers demand compliance with the regulation to continue doing business with your firm

You are in the "Help me Comply" context if

Do these statements match your business situation today?

  • When a compliance regulation is in force AND we see its negative impact on our business or operations
  • When we are worried our poor data practices will cause us to lose customers or prospects
  • When someone in the organisation steps up and takes accountability for solving it
  • So we can minimise the risk of failing a regulatory requirement and losing customer trust
  • So we have the quickest path between agreeing to solve the problem and getting on with solving the problem
  • So we can improve our financial performance and maintain a competitive edge in the market
Help me comply is more about
  • Regulatory failings – specifically losing customers or losing customer trust as a result of a data security breach
  • Speed – once you commit to becoming compliant, you want to go as fast as possible to check the box
  • Specific regulations that your customers want you to comply with
  • Seeing another firm in your market on the front pages for cyber issues and not wanting to join them
  • Technology and IT leading the selection
  • Getting rid of manual processes
Help me comply is less about
  • Solving a specific business problem – you ARE likely to look for a problem or benefit to tie this to beyond pure compliance, but this is not the trigger for your selection
  • Adopting a Data Platform or becoming a data-driven organisation
  • Being led by consultants – if anything you will want to take the lead on this
  • Adding capabilities to support a Data Governance Function or providing a tool to help Data Stewards (you probably do not have these roles in place formally)
How to Choose a Data Catalog when you're in "Help me comply"

Here are the value tradeoffs you will make if you are in this situation:

  • Price is more important than buying a comprehensive or complex tool
  • A cheap solution that can be piloted quickly will be valued over a comprehensive solution with full features
  • You will select the Catalog for your business unit rather than waiting for an Enterprise solution to be rolled out
  • You will fund this from existing IT budgets, rather than creating a formal business case for a big investment
  • You will value vendors with local support and good self-service documentation over vendors with full features or high placements on analyst reports
  • Compliance is more important than ROI – and as such you will have a small budget available to spend
Alternatives to choosing a Data Catalog when you're in "Help me comply"

See, GDPR had already come in, but nobody gave a damn because our data protection officer, our data protection commissioner really didn't take any scalps. They were getting caught up with Meta and TikTok and all this kind of stuff. So they didn't really go after a [small fry like us]

Just because your business enters this situation does not mean you will invest in a Data Catalog. You may also consider:

  • Training and Workplace Certification about the importance of data security and protection
  • Continuing with workarounds and manual processes

Your motivation in this context is low. It is principally driven by C-Suite or Board Members worrying “Can our business survive if we are exposed in the news/fined for non-compliance?” – whilst this sounds scary, many senior leaders believe their organisation has the risk covered and therefore investment is a grudge/insurance purchase rather than an investment in growth or expansion.

What risk is there in choosing a Data Catalog in the "Help me comply" context?

The risk in this context is typically low – investments are largely in the time taken to make a selection decision, and in the effort spent by your in-house staff to ensure you are able to get certified as compliant with the regulation.

Given the fact budgets for this purchase context are low, the risk is low too. Aim for short contracts (1-2 years) to further reduce the risk of being locked into a solution if it fails to help you comply.

Data Catalog Business Benefits: "Help me follow the plan"

How closely does this statement align with your current business situation?

When our C-Suite wants to be data driven and a vendor has sold us a roadmap, help me organise my data so I can support a range of data-driven use cases

These selection processes often begin with a change in the C-Suite – often a new CEO has been appointed and they are looking for ways to drive growth and business expansion. 

A trusted consultant (often a Strategy Consulting partner or Big 4 vendor) will be hired to help. They will run a Data Maturity assessment and determine that your business is less data-driven than comparison companies, and tell you that, if you catch up, you’ll unlock $$$ hundreds of millions (depending on your business size) in incremental value IF you follow their plan to become data-driven.

Points to note:

  1. The focus in this situation is more about Data Use Cases – your consulting partner will present a list of use cases and your management or board will cherry pick some of these use cases to adopt
  2. You are very likely to appoint a Chief Data Officer – probably for the first time – and you will look first at internal candidates rather than hiring an external expert (because you are already following the plan of your trusted consulting partner)
  3. The Data Catalog will just be one of many investments and the business case is likely to be more about the Data Use Cases and less about the value of a Data Catalog

You are in the "Help me Follow the Plan" context if

Do these statements match your business situation today?

  • When the business has been stagnating and we want to use data as an opportunity for growth
  • When a partner we trust paints a roadmap we can believe in
  • So I can enhance data comprehension and provide effective user support across the organisation
  • So I can easily access and manage all data in one central, user-friendly location
  • So we can harness data insights to drive business growth, enhance customer experiences, and innovate in our products and services
  • So I can rely on a consistent and trustworthy single source of truth for all data
  • So we have a path to follow from a vendor we can trust
  • So I can trust that this vendor will be a good partner
Help me follow the plan is more about
  • Top-down CEO direction to change – this is a big shift and will be a major program for the organisation
  • Following a roadmap from a vendor or partner
  • Adopting Data Platforms/Data Lakes and centralising all the data for further analysis
  • Establishing and supporting Data Science teams and enabling new use-cases for data
  • A different business model where Data may be monetised for the first time
  • Having someone to hold your hand along this journey and telling you what to do, so your staff can learn how to do it themselves
Help me follow the plan is less about
  • Efficiency in data operations to reduce the effort cost of existing data analysis processes, unless this is a use-case the partner has provided
  • Collaboration between teams or sharing of insights, unless this is a use-case the partner has provided
  • Solving a specific, known problem – this has not grown organically within your Data or Business team, it is a move into “new territory”
  • The Data Catalog itself – in many cases the choice of Data Catalog you’ll end up with is “just part of the stack” that the partner recommends.
  • Supporting existing Data Governance processes – it is highly likely you’re implementing a Data Governance Framework at the same time as hiring data scientists, data engineers and other “new roles”. 
How to Choose a Data Catalog when you're in "Help me follow the plan"

Here are the value tradeoffs you will make if you are in this situation:

  • You will cut corners on an RFP process or avoid one entirely if it means you make this decision more quickly (because you do not want to be seen to hold back the plan)
  • Data Catalog features are less important than whether the Data Catalog fits the overall data stack you are implementing (you are not performing Data Cataloging today, so do not know which features you need)
  • You will value a vendor you trust more than the Data Catalog features or how they work (because you’re following a plan and have limited experience, trusting the vendor to deliver is most important)
Alternatives to choosing a Data Catalog when you're in "Help me follow the plan"

No, no, no RFPs. Uh, there was only one short document that basically explained what should be available in a data catalog and other data tools in order to develop the data roadmap.

In this case it is more likely that the competition is for the overall Data Platform/Data Stack, and you will select whichever Data Catalog fits within this stack.

Your budget will be pre-allocated to purchase each component in the plan, usually based on indicative costs from the partner who provided your plan. This has a few implications:

  • You ought to be up-front with vendors about your budget, because you are unlikely to be able to change the budget if you discover a Data Catalog that you want but cannot afford
    • Doing so would slow down the plan, and given the fact no-one has used a Data Catalog yet they are unlikely to see material differences between different solutions at different price points
  • You will feel compelled to spend the budget in the time-frame, which negatively impacts your ability to run a thorough selection process

The emotions driving your decision are likely:

  • Completing the steps that are laid out in your roadmap
  • Trust – you need to trust your vendor and be confident they can keep up with your aggressive roadmap timelines
  • Being seen as a valuable part of the overall process and not holding back progress in the plan
What risk is there in choosing a Data Catalog in the "Help me follow the plan" context?

Of all the situations we researched, this context has the greatest risk of failure and wasted spend. In many cases these programs lasted for 24-36 months before being abandoned as “Zombie assets”.

What goes wrong and what can you do to prevent it?

  1. As a business leader, realise the time pressure you create for “following the plan” makes it more likely you’ll make a selection mistake – be flexible about the selection process to allow proper due diligence
  2. Consider NOT purchasing a Data Catalog, or using a solution that has low up-front costs
    • Your team has little or no experience with these tools. Would you buy a Ferrari as the first car for your teenage son, or would you buy him something cheap, safe and reliable?
  3. Challenge your partner firm that is providing the plan to retain “skin in the game” – they have led you to this place, it is only fair that they take ownership and accountability for these recommendations

Aim to spend small in order to learn and reduce uncertainty, rather than investing big in new technologies and capabilities that you currently lack experience with.

Data Catalog Business Benefits: "Help me be more efficient"

How closely does this statement align with your current business situation?

When we have outgrown our data landscape and we're struggling with inefficiency, waste and rework, help me get a single pane of glass for our data so I can easily access and manage all data in one central, user-friendly location

These selection processes are more likely to come from someone with an Operational viewpoint – a COO role or from an Enterprise Architect within the IT team.

The major driver is efficiency – or specifically, the fact that your firm’s data is inconsistent and difficult to trace, which leads to waste and re-work for the data team as well as delays or miscommunications for the business teams. 

The main value drivers in this context are:

  • Reducing time wasted in meetings of senior executives because data is inconsistent or misleading
  • Reducing the cost to deliver data to the business – headcount in the data team is growing, overtime is all the time, and these costs are eating into profit margins
  • Costs for external contractors or consultants that work with your data or need access have ballooned and you need to get them under control 
  • Costs to comply with regulations are growing and you want a better way to keep track of data assets so you can comply without so much manual effort

You are in the "Help me be more efficient" context if:

Do these statements match your business situation today?

  • When the business has grown faster than the data team can keep up
  • When our technical team struggles to know what data we have and where it is
  • When we cross into a new financial period and have money to invest
  • When a compliance regulation is in force AND we see its negative impact on our business or operations
  • When we are worried our poor data practices will cause us to lose customers or prospects
  • When someone in the organisation steps up and takes accountability for solving this problem
  • So I can easily access and manage all data in one central, user-friendly location
  • So I can rely on a consistent and trustworthy single source of truth for all data
  • So I can automate data processes to increase efficiency, save time, reduce manual effort, and proactively manage data changes
  • So we can improve our financial performance and maintain a competitive edge in the market
  • So I can trust that this vendor will be a good partner
  • So we have a place to collaborate on our data to help reduce data silos

Note: this context has a lot more drivers behind it than “Help me comply” and “Help me follow the plan” – this is because your organisation is solving a known problem that many stakeholders have experienced.

  • This context is timed to match corporate financial budgeting and planning – budget is set aside to solve a known problem
Help me be more efficient is more about
  • Increasing the efficiency of data teams or those that are using data in the business
  • Reducing the time it takes to find the data needed to deliver an outcome
  • Gaining control over a sprawling set of data assets
  • Having a “single pane of glass” for the data
  • Being able to prioritise data improvement work and fit it into a purchasing decision (planning and ensuring it is high enough priority to get funded)
  • Reducing the risk of data by having more visibility across the data estate
  • Thoroughly researching the market and identifying the best fit solution for the problem
  • Selecting as a group or panel rather than driven by any single individual
Help me be more efficient is less about
  • Driving business growth – cost control to maintain or improve profit margins is more important
  • Responding to external customer pressure for better data (although being more efficient serving customers is important)
  • Data use-cases or being data-driven
  • Time-to-value – there’s less urgency because the problem needs to be well defined and solutions thoroughly researched
  • Following a roadmap or plan from a partner – this is led internally 
How to Choose a Data Catalog when you're in "Help me be more efficient"

This context is the most likely to run a lengthy, formal RFP process and will likely involve detailed workshops and Pilot/POC processes to demonstrate that the tool will actually deliver against the promises vendors make.

Here are the value tradeoffs you will make if you are in this situation:

  • You will take longer in the selection process to ensure you get the best tool you can afford at the best price
  • You will value connectivity and integration to your existing data landscape over specific product features
  • You will value vendor reputation and trust over product features – likely building the selection list through Gartner/Forrester research and word-of-mouth recommendations
  • Because you are aiming to reduce costs, the solution cost is more important than its perceived quality – “good enough is good enough”
Alternatives to choosing a Data Catalog when you're in "Help me be more efficient"

Well, it's quite easy to calculate the time horizon for the ROI. I think within a year you'll have had the money back anyway. Uh, so we calculated, we were actually going to save 300 hours per month, essentially per team. And if you're looking at how much you pay that for any single member of the team, essentially per month it is a lot. If you look at what we pay people like contractors that we work with, say contractors to develop a particular system for us. I know how much we pay per day for contractors, so. So, yeah, within a year, you would have had a return on investment anyway.
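The payback reasoning in this quote can be sketched as simple arithmetic. The 300 hours saved per month comes from the interviewee; the hourly rate and the annual Data Catalog cost below are hypothetical placeholders, so replace them with your own figures before drawing any conclusion.

```python
def payback_months(annual_cost, hours_saved_per_month, hourly_rate):
    """Estimate how many months of time savings cover one year of cost.

    Returns None when there is no saving, because the spend never pays back.
    """
    monthly_saving = hours_saved_per_month * hourly_rate
    if monthly_saving <= 0:
        return None
    return annual_cost / monthly_saving


# 300 hours per month is the figure quoted by the interviewee.
# The cost and the hourly rate are assumptions for illustration only.
ASSUMED_ANNUAL_COST = 100_000
ASSUMED_HOURLY_RATE = 50

months = payback_months(ASSUMED_ANNUAL_COST, 300, ASSUMED_HOURLY_RATE)
print(f"Estimated payback: {months:.1f} months")
```

If the result is well under 12 months, the "within a year" claim holds for your numbers. If it is not, your CFO's challenge of "Why now?" becomes much harder to answer.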

The competition in this case is more staff headcount, more contractors and more consultants to come and help deliver the analytics and compliance requirements. 

These projects also compete for resources with other CapEx investments – if the business environment changes during the selection process they may be put on hold.

Your CFO or finance/budget team will challenge “Why are we investing in this technology?” and “Why are we investing now?” – and will push back for the cheapest option (and often they will win this fight).

The status quo is the biggest competitor when you are in this context, and if you want to proceed you need a mechanism to move this from “Important but not urgent” into “Urgent and important” to unlock budgets.

What risk is there in choosing a Data Catalog in the "Help me be more efficient" context?

If you are in this business situation the biggest risk is wasting time, money and effort running a selection process that ends in “no decision”.

As noted above, the status quo is a powerful competitor, as efficiency improvements are always “nice to have” and seldom “must have”. In our research, the “final straw” to change also needed:

  • A positive business environment (bumper profits/a good year) so money was available to invest
  • An event triggering the move from important to urgent – e.g. a new regulation or a client issue caused by poor data
  • Another project that fails or struggles where the cause was poor data efficiency

So if “no decision” is the default decision, why is this the biggest risk?

  1. Some Data Catalog vendors have started offering price structures with no up-front fees – and these will seem attractive to your data team, who want to solve the problem they’ve spent so much time exploring
  2. These solutions may end up costing more over the long-run than purchasing a solution directly
  3. Your team has run an extensive software selection process with far more rigour than any other context – however their final choice is based purely on price, not on functions or features
  4. If “no decision” is the end state, you have wasted resources trying to buy a solution and ended up at square one
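To test whether a "no up-front fee" price structure really costs more over the long run, compare cumulative spend year by year. This is a minimal sketch only: every figure below is hypothetical, and real vendor pricing may include tiers, usage charges or price rises that this simple model ignores.

```python
def cumulative_cost(upfront, annual_fee, years):
    """Total spend after a number of years for one pricing structure."""
    return upfront + annual_fee * years


def first_year_subscription_costs_more(subscription_annual, purchase_upfront,
                                       purchase_annual, horizon=10):
    """Return the first year in which the no-up-front-fee option has cost
    more in total than the direct purchase, or None within the horizon."""
    for year in range(1, horizon + 1):
        subscription_total = cumulative_cost(0, subscription_annual, year)
        purchase_total = cumulative_cost(purchase_upfront, purchase_annual, year)
        if subscription_total > purchase_total:
            return year
    return None


# Hypothetical figures: 60,000 per year with no up-front fee, versus
# 100,000 up front plus 20,000 per year for a direct purchase.
crossover = first_year_subscription_costs_more(60_000, 100_000, 20_000)
print(f"Subscription becomes the more expensive option in year {crossover}")
```

Run this with the quotes you actually receive, over the period you expect to keep the tool. A short crossover is a signal to look past the attractive entry price.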

Data Catalog Business Benefits: "Help me keep growing"

How closely does this statement align with your current business situation?

When our business has grown and we struggle with manual, inefficient data processes, help us automate, collaborate and trust our data so we can continue our growth trajectory

This context is firmly driven by the business leaders – the CEO, the CMO, or anyone charged with maintaining or increasing growth for the business.

Typically, the business has experienced a growth spurt that is a result of using data – for example, more accurate targeting of customer marketing enabling a surge in new customer wins.

This additional growth overwhelms the data team and their processes and systems can no longer keep up. As a result, continued growth is put at risk as the data team cannot continue to provide the insights that were used to get us where we are today.

You are in the "Help me keep growing" context if:

Do these statements match your business situation today?

  • When the business has grown faster than the data team can keep up
  • When our data team is inefficient and can no longer meet the needs of the business
  • When our technical team struggles to know what data we have and where it is
  • When business users are unable to find, access or use the data and reports they need
  • When we are worried our poor data practices will cause us to lose customers or prospects
  • So we have a place to collaborate on our data to help reduce data silos
  • So I can automate data processes to increase efficiency, save time, reduce manual effort, and proactively manage data changes
  • So we have the quickest path between agreeing to solve the problem and getting on with solving the problem
  • So we can improve our financial performance and maintain a competitive edge in the market
  • So I can enhance data comprehension and provide effective user support across the organisation

Note: Like “Help me be more efficient”, this context has a lot of driving motivations and you do not need all of them. The main driving force here is the data team saying “we are stuck” and “we cannot support the growth plan you’re outlining because we cannot deliver you the data you need to get there”

  • This context has the highest urgency and is the most likely to spend money by taking it from existing budgets this calendar year
Help me keep growing is more about
  • Being used to using data to drive growth, then seeing the negative impact of the growth on the data team’s ability to execute (too much data and complexity)
  • Being able to collaborate effectively across the business with the confidence that you’re making decisions based on consistent data
  • Having manual, time-consuming data processes that add risk of misunderstanding and slow the business down
  • Running the risk of not being able to acquire more customers or keep current customers happy if we mis-handle their data
  • Data Engineering teams pushing a need to change so they can keep providing strategic insights vs working on operational data challenges
  • Data Lineage and being able to see where all the new data is coming from so we can improve the reports and analysis on which it’s based
  • FAST implementations – this has been a festering problem, and now it’s gotten high enough up the food chain to someone who wants it fixed yesterday
  • The CEO finally “getting it”, learning that this is needed, then driving for the best solution possible
Help me keep growing is less about
  • Efficiency for the sake of reducing costs or headcount – in this case the drive is for efficient delivery of data so that the data team can keep fuelling business growth
  • Regulations or Risks around data – except the risk of exposing poor data practices to customers
  • Data platforms, vendor-directed roadmaps, or specific data use cases
  • Reducing costs or saving time to increase profit margins; this is all about increasing revenues
  • The price of the solution – this group of buyers are happy to spend if they believe it solves their problem
How to Choose a Data Catalog when you're in "Help me keep growing"

Of all the contexts in which people choose Data Catalogs, this is the one where the selection process will be the quickest. This is because growth is at risk, and there is a need to ensure the organisation does not “lose its eyes”.

Typically in this context, your CDO or Data Leadership will already have a tool in mind, and your role as a Business Leader is to validate and ensure you get the best value for money. 

The value tradeoffs you’ll make are:

  • You will create budget in this calendar year (often repurposing existing IT spend) rather than waiting for the next budget cycle (because you expect the continued growth to fund the purchase in year 2 onwards)
  • You will want “the best solution with the most features” rather than the cheapest solution
  • You will go for the tool that is more intuitive for a business user to use vs a cheaper solution, because you want business teams to collaborate effectively around your data and this is the tool to enable that collaboration
  • You will want something “easy to use and install” (so you can move fast because the learning curve is shallow) and you will pay more to get it
  • Vendors promising the shortest implementation will win vs those with richer features or lower costs
Alternatives to choosing a Data Catalog when you're in "Help me keep growing"

I am of the opinion that I think the only time where we waste time to get a solution is when we're not fully convinced that it's the right choice for us. So we just try to put in the budget of the current year, instead of pulling the budget to the next year. And since we are fully convinced that this solution is the right one for us, after all the adequate research has been done, we just go with it. No time to waste, no time to pull it to the next year.

There is little competition to purchasing a Data Catalog in this context – by the time you get into this situation you have already exhausted all the options to grow using additional headcount and contractors, and your Head of Data has probably been clamouring for this tool for 6+ months. 

You could keep trying with manual processes and workarounds, and your CFO/Procurement organisation will push back against spending money and specifically against spending money NOW. These objections get bulldozed by the growth demands of the business, and often you’ll end up shopping for the most expensive solution. 

What risk is there in choosing a Data Catalog in the "Help me keep growing" context?

The biggest risk here is that the selection process is short and the people driving it care more about getting to their outcome (continued growth) than they do about learning the in-depth details of the solution being selected.

Interviewees in this context expressed the greatest confidence that their selection process was rigorous. However, many relied on simple demonstrations and POC processes that showed canned features of the product they selected.

Your business users have little understanding of the domain or jargon used, but once they are convinced that a Data Catalog is THE solution to their problem, they will move mountains to ensure it is selected.

Those choosing the tool can be misled into believing that a simple interface with clean inputs and few clicks to achieve an outcome will be “easy to use”. This is because these users are not performing the tasks that are being demonstrated today, and therefore lack the experience to know which features matter and which are for show only.

Choosing a Data Catalog - ensure you have the right Data Catalog Implementation Plan

So, you know what a Data Catalog is, you’ve learned the jargon, and you know which of the 4 selection contexts your organisation is in. 

Before you select a tool, ensure you have the right implementation plan mapped out. Remember this is not just about installing a new software tool: you will be changing business processes and changing the roles of the people working in your organisation.

Because the implementation plan is a function of your existing IT landscape, culture and the product you have selected, this section will focus only on a few critical tips for the context you are in.

Data Catalog Implementation Plan: Help me comply

In a compliance mindset:

  1. Minimise the scope to the smallest achievable outcome that delivers compliance
  2. Engage business users early, as you will be changing their business processes and roles (and they will not want this to happen)
  3. Communicate that this is non-negotiable; it is unlikely anyone wants to change so focus on the fact you’re making this as painless as possible
  4. Identify at least one key “win” that you can use to claim the project delivered business value

Your first attempt is likely to be a pilot or extended POC. If you succeed then there may be a follow up rollout, so communication is critical to ensure people don’t think “this is over” before you have fully solved your compliance obligations.

Data Catalog Implementation Plan: Help me follow the plan

This context carries the highest risk of failure. Because it has been driven top-down (use cases the C-Suite/Board care about) and not based on problems faced by the “D-Suite” (the people who do the work in your firm), there is a real risk that the Data Catalog is rejected before it has a chance to succeed.

The Implementation plan will be totally focused on the Use Cases that your consulting partner has provided. To minimise risk:

  1. Ask yourself “Whose role will be most impacted by implementing the Data Catalog?”
    • Are there any benefits to this person or are you asking them to do more work?
    • Do these users have any experience working with a Data Catalog? If not, consider Excel or manual work to kick things off
    • Are you actually able to get subject matter expertise to document your data, or have you assigned this to the IT team?
  2. Did the plan telling you to buy a Data Catalog come from the same vendor you have purchased the Data Catalog from?
    • If yes, ensure they are heavily involved in the implementation, with milestone payments linked to the achievement of use case objectives
    • If no, challenge your partner that provided the plan telling you to buy a Data Catalog to have skin in the game for the implementation. Link the payments for the plan to the outcomes you attain from the use cases in the plan.
      • If the vendor is so confident in their use cases then they should be happy to capture some upside and share in the downside risk
  3. Do not involve third party implementation providers (even if this is corporate policy and you have a preferred vendor) – ensure the vendor of the Data Catalog supplies the best available staff to support your implementation
  4. Remember that time is NOT on your side – as such prioritise tools that are easy to install in your environment, have out-of-the-box features for all known requirements, and only implement the features you need for the use-cases
  5. Focus on one use case that has the highest chance of success – one where metadata management is a core enabler, the amount of change to current business processes is lowest, and where data sources for this use case are best understood today
    • You will also need to “play ball” and support other use cases, but this is a situation where you need at least one win on the board

If you have a Change Management function then this is the ideal opportunity to engage them. You need all the help you can get to get the D-Suite on your side and keep them there for the duration of this project.

Data Catalog Implementation Plan: Help me be more efficient

This context has lower risk as it tends to follow a lengthy selection process and is aligned with budget and planning cycles. As such, you will have conducted cost/benefit analysis and thoroughly tested each solution to ensure it integrates with your existing IT landscape and provides the features you need.

Still, there are some key considerations when creating an implementation plan for the Data Catalog in this context:

  • Because of the desire to have “a single pane of glass” across all enterprise data, there is a risk that your Data Catalog Implementation Plan will want to connect to every data source; this can be an expensive mistake
  • If your project aims to automatically harvest metadata from data sources, consider:
    • How frequently do these data sources change? If they change infrequently then save money by avoiding automatic scanning and instead add a process step to update the catalog when the source is updated
    • Do you really need to connect automatically to this data source, or can you find a manual workaround?
    • Who will document the metadata you harvest, and what value will that bring to the business? There is a real risk of “boiling the ocean” so focus only on data that is actively being used or changed
  • Make sure you have a clear understanding of your end state – these projects can continue way past the point of delivering value
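The questions above amount to a simple triage of each data source. Here is a minimal sketch of that triage. The `DataSource` fields and the threshold of 12 changes per year are hypothetical choices for illustration, not a recommendation from our research, so tune them to your own landscape.

```python
from dataclasses import dataclass


@dataclass
class DataSource:
    name: str
    changes_per_year: int   # how often the structure of the source changes
    actively_used: bool     # is the data being used or changed today?


def harvest_approach(source, auto_scan_threshold=12):
    """Suggest how to capture metadata for one data source.

    The threshold is an assumed cut-off: sources that change less often
    are cheaper to update by hand as a step in the change process.
    """
    if not source.actively_used:
        return "out of scope for now"
    if source.changes_per_year >= auto_scan_threshold:
        return "automatic scanning"
    return "manual update when the source changes"


# Hypothetical sources, for illustration only.
sources = [
    DataSource("CRM", changes_per_year=24, actively_used=True),
    DataSource("Finance ledger", changes_per_year=2, actively_used=True),
    DataSource("Legacy archive", changes_per_year=0, actively_used=False),
]

for source in sources:
    print(f"{source.name}: {harvest_approach(source)}")
```

Listing your sources this way before you talk to vendors makes it clear how many automatic connections you actually need to pay for.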

Data Catalog Implementation Plan: Help me keep growing

The main risk in this context is that your selection process was fast, so you are likely to have selected a tool that looks good and where the vendor promised the implementation would be quick. Data Catalog projects are never quick, but you do need to move fast – so limit the scope to only the data necessary to support your growth ambitions.

  • Unless a data source contains data that is used to maintain growth or stores data that is a result of your growth, leave it out of scope
  • If your chosen Data Catalog lacks an out-of-the-box integration with a data source, use manual workarounds to capture this metadata. 
  • Invest in training and awareness to ensure business end users understand this project – if you are reading this guide then you know what it’s like to struggle with Data Catalog jargon.
    • Use simple, business friendly language and do not promise that this solution will solve all their data problems – many projects fail because they over promise and under-deliver
  • Rely heavily on vendor resources to “teach you how to fish” with the new tool – your data organisation is likely to be swamped trying to maintain growth so you need additional hands to help get this implemented with minimal disruption to your business 
  • Ensure you support business users through this transition and encourage them to collaborate with one another – this is an opportunity to improve your ways of working with data, and your business leadership is critical to its success

Choosing a Data Catalog - unbiased, honest opinion service

Hopefully this guide has helped you understand what data catalogs are, how they are used, and the context(s) in which a business will fund their purchase.

Cognopia has been actively engaged in the Data Catalog market since 2016, when we acted as a reseller for many of the market leading tools at that time. Since 2019 we have stepped back from selling these tools in an effort to educate and inform data teams and their business leaders.

If this guide has left you with more questions and you’d like our guidance, please get in touch.

Help for Business Leaders considering a Data Catalog

If your Data team are asking you to fund a Data Catalog purchase you want to ensure you make an informed decision. You also want to ensure you understand the opportunity, cost and risk of this decision and need someone that can challenge your Data Leadership so you’re certain that this is money well spent.

We can act as the guide by your side to help you understand the problem and ensure you only purchase a tool that adds value.

Please note: our advice is independent of vendors and we will provide our honest opinion of the best path ahead. This is not a service to rubber stamp a decision that has already been made or endorse a completed selection process.

Help for Data Leaders considering a Data Catalog

Are you struggling to gain traction and need help to persuade your Business Leaders to make the purchase? If so, consider sharing this guide with them. 

You might also be mid-way through a selection process and need help to work out which context your business is in and what to do about it. 

Lastly, you might have a Data Catalog installed with low end-user adoption. If so, you’re probably wondering whether to renew the license or cut your losses, or how to increase engagement from your end users. 

If you need more help, get in touch.

Please note: our advice is independent of vendors and we will provide our honest opinion of the best path ahead. This is not a service to rubber stamp a decision that has already been made or endorse a completed selection process.

Help for Data Catalog vendors looking to improve their products, service or traction in the market

The Data Catalog market is brutal, with long sales cycles, high competition and pressure on prices. Our research can help you navigate this market, beat the competition, and onboard customers that love your product and company.

Talk to us if:

  • Your growth has stagnated and you need a better way to understand your customers and market
  • Recent product improvements or launches have failed to make the impact you hoped for
  • Customers are churning to the competition and you don’t know why
  • Your product roadmap is overly influenced by a few key customers or by copying features from other vendors, or;
  • You feel out-of-touch with customers and need an external perspective to hear their voice and improve your business model

We can share our existing research or run a bespoke study to understand what’s holding your business back.

Please note: our advice is independent and we will provide our honest opinion of the best path ahead. This is not a service to rubber stamp product or sales decisions that have already been made or endorse an existing Go-To-Market approach.

Help for vendors of other Data Management products

You don’t sell Data Catalogs, but you’d like to understand what causes your customers to say “today’s the day I’m buying {your product category}” and how to position your business to capture as much market share as possible.

We can run a custom research project on your behalf to find out what your market needs and wants so you blast past the competition. 
