Using Semarchy to Master Data Management

10 Jul, 2023 | 6-minute read

What does master data mean? What is master data management?

Master data is a key organizational data asset: the current, up-to-date information the business needs about its core entities, for example customer details and the products a company sells. The trusted, consolidated version of a master data record is often called a golden record or master record. Master data establishes a standard definition for business-critical information, and this information is shared across the enterprise with one purpose: to support better business decisions.

Master data management (MDM), on the other hand, is one of the many disciplines of data management: the core process of centrally managing critical business information. Every company uses multiple applications, has multiple data sources and destinations, and runs different systems across its departments. This leads to the main problem: different versions of the same data in different systems. When one version is updated, the others are not, so the systems drift out of sync. We can therefore define master data management as the continuous process of maintaining a single, current, and correct version of this information that the organization can rely on to make decisions.

Semarchy xDM

Semarchy xDM is an all-in-one Intelligent Data Hub platform. With Semarchy, you can identify discrepancies and areas of improvement in preparation for the data governance process. It lets you design business applications that help manage core data assets for any domain. Behind the business views, it uses matching rules and algorithms to identify duplicate data, computes a confidence score, and suggests suspicious merges for further review. It integrates with the rest of the information system through a REST API and synchronizes the data with other applications. In short, Semarchy is a powerful MDM platform that helps the business organize its data at a high level.

(Figure 1: Source – Semarchy)

Types of data in Semarchy

(Figure 2: Source – Semarchy)

When we look at an entity in Semarchy, we are not looking at a table stored in the database: an entity is not the same as a database table. The three most important types of data in Semarchy are the following:

  • Master data is the key business information that supports transactions. As it arrives from the source systems, master data is typically not of high quality: it is scattered, duplicated, and not truly managed.
  • Golden data is a cleansed, de-duplicated, consolidated, and validated version of the original master data.
  • Error data – errors raised when records pushed into the hub violate the data quality rules defined in the model.

Semarchy model

Every model in Semarchy starts with an entity. Entities support all the features needed for a data hub, including matching, merging, and data historization. Every entity is composed of attributes, starting with the ID. Attributes can be of simple types or complex types, such as lists of values. For example, in the diagram below, the entity called Customer has multiple attributes. A model usually starts with one core entity and is then expanded with more entities and the relationships between them. There are three types of entities, Basic, Fuzzy, and ID Matched, each with its own features:

  • Basic Entity: no matching rule can be used to match the records. Defining a basic entity means we assume that the records come from a single source.
  • Fuzzy Entity: records can be matched and merged. When records come from different sources, we can define a matching rule that merges them into one golden record.
  • ID Matched Entity: records come from different data sources, share the same ID, and are matched using that ID.
(Figure 3: Model)
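The difference between Basic and ID Matched entities comes down to how source records are grouped into golden records. The following is a minimal Python sketch of that idea, not Semarchy code; the `SourceRecord` fields, the source names, and the `group_records` function are hypothetical names chosen for illustration. Fuzzy entities are left out because they depend on a matching rule.

```python
from dataclasses import dataclass


@dataclass
class SourceRecord:
    # Hypothetical fields, for illustration only.
    source: str     # application that published the record, e.g. "CRM"
    source_id: str  # identifier of the record in that application
    first_name: str
    last_name: str


def group_records(records: list[SourceRecord], entity_type: str) -> list[list[SourceRecord]]:
    """Return groups of source records; each group would become one golden record."""
    if entity_type == "basic":
        # Single source, no matching: every record becomes its own golden record.
        return [[record] for record in records]
    if entity_type == "id_matched":
        # Records from any source that share the same ID end up in the same group.
        groups: dict[str, list[SourceRecord]] = {}
        for record in records:
            groups.setdefault(record.source_id, []).append(record)
        return list(groups.values())
    # Fuzzy entities need a matching rule, which this sketch does not model.
    raise ValueError(f"unsupported entity type: {entity_type}")


records = [
    SourceRecord("CRM", "C-42", "Ana", "Smith"),
    SourceRecord("ERP", "C-42", "Ana", "Smith"),
    SourceRecord("CRM", "C-7", "John", "Doe"),
]
print(len(group_records(records, "basic")))       # 3 golden records
print(len(group_records(records, "id_matched")))  # 2 golden records
```

With a Basic entity, the two "C-42" records stay separate; with an ID Matched entity, their shared ID merges them.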

Semarchy Information Quality

Data quality is at the core of the xDM model, through enrichment and validation rules. These rules ensure a high level of quality and consistency for all the data managed in the hub. Enrichers and validations are declared using SemQL expressions, and more than a hundred functions are available. Below is a simple example of an enricher; in this case, it calculates the customer's years.

(Figure 4: Enricher)

Let us see how this example looks in the front-end application and what is shown in this field.

(Figure 5: Enricher)
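The actual enricher is a SemQL expression, shown only in Figure 4. As a rough equivalent, here is a Python sketch of the computation, assuming that "the customer's years" means the number of whole years elapsed since a date attribute, such as a hypothetical `DateOfBirth` (for the customer's age) or a customer-since date (for tenure). The function name and attribute names are illustrative and not part of Semarchy.

```python
from datetime import date
from typing import Optional


def whole_years_since(start: Optional[date], today: Optional[date] = None) -> Optional[int]:
    """Number of full years between start and today, or None if start is missing."""
    if start is None:
        return None  # missing input: nothing to compute
    if today is None:
        today = date.today()
    years = today.year - start.year
    # Subtract one year if the anniversary has not been reached yet this year.
    if (today.month, today.day) < (start.month, start.day):
        years -= 1
    return years


# Hypothetical customer record being enriched (attribute names are assumptions).
customer = {"FirstName": "Ana", "LastName": "Smith", "DateOfBirth": date(1990, 8, 15)}
customer["Years"] = whole_years_since(customer["DateOfBirth"], today=date(2023, 7, 10))
print(customer["Years"])  # 32
```

Passing `today` explicitly keeps the result reproducible; an enricher running in the hub would use the current date instead.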

When it comes to validation, rules enforce adherence to data quality standards and appropriate standardization. In this example, the validation rule guarantees that the customer’s first name and last name are never a single letter or an empty value.

(Figure 6: Validation rule)

Let us see how this validation helps when we insert new data. The invalid fields are marked in red, and we cannot proceed until we fill them with correct data.

(Figure 7: Validation rules)
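Outside of SemQL, the same check can be sketched in a few lines of Python. The `validate_customer` function and the `FirstName`/`LastName` keys are hypothetical; in xDM, the rule blocks the form as shown above, and records pushed into the hub that fail it are reported as error data rather than returned as a list of strings.

```python
def validate_customer(customer: dict) -> list[str]:
    """Return the validation errors for a customer record; an empty list means it is valid."""
    errors = []
    for field in ("FirstName", "LastName"):
        # Treat a missing key or None like an empty string, ignoring surrounding spaces.
        value = (customer.get(field) or "").strip()
        # Reject empty names and single-character names such as "J".
        if len(value) < 2:
            errors.append(f"{field} must have at least two characters")
    return errors


print(validate_customer({"FirstName": "Ana", "LastName": "Smith"}))  # []
print(validate_customer({"FirstName": "J", "LastName": "Doe"}))
# ['FirstName must have at least two characters']
```

Collecting all errors instead of stopping at the first one mirrors the form behavior, where every invalid field is highlighted at once.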

Data Consistency Using Steppers, Workflows, and Action Sets

A stepper is a sequence of wizard-like steps that walks the user through authoring a group of related records. Workflows allow users to collaborate on data authoring: each task in a workflow is processed by a user who modifies and reviews data using a stepper.

Action sets are groups of actions that appear together in a business view menu to modify one or more records. For example, creating or editing a customer involves multiple steps; once all three steps are completed, the data is inserted into the database. The create and edit actions that open these forms are grouped together in an action set.

(Figure 8: Stepper and action set)
(Figure 9: Workflow)

Matching rules

One of the most powerful features in Semarchy is matching rules. What is a matching rule, how does it help, and how can it improve data quality? Two kinds of matching can be defined:

  • ID matching – used in ID Matched entities (and also available in Fuzzy entities). It matches records based on their ID.
  • Fuzzy matching – finds duplicates based on the rule or algorithm defined in the matching rule. This kind of matching is used in Fuzzy entities.

Let’s check the following data example:

(Figure 10: Example of data)

When we define a matching rule, we can, for example, match these two records by first and last name. We assign this rule a confidence score of one hundred, and the rule requires the records to be exactly the same before they are merged into one.

This is an example of an exact match:

(Figure 11: Example of data)
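In plain terms, an exact match rule on first and last name groups together the records whose names are identical. The Python sketch below illustrates this with hypothetical records and field names (the "ERP" source is invented for the example); it is not how xDM executes matching internally.

```python
def exact_match_groups(records: list[dict]) -> list[list[dict]]:
    """Group records whose first and last names are exactly identical."""
    groups: dict[tuple[str, str], list[dict]] = {}
    for record in records:
        # Match key: both names must be equal character for character.
        key = (record["FirstName"], record["LastName"])
        groups.setdefault(key, []).append(record)
    return list(groups.values())


# Hypothetical source records.
records = [
    {"Source": "Salesforce Global", "FirstName": "John", "LastName": "Smith", "Email": "john.smith@example.com"},
    {"Source": "ERP", "FirstName": "John", "LastName": "Smith", "Email": "j.smith@example.com"},
    {"Source": "ERP", "FirstName": "Jon", "LastName": "Smith", "Email": "jon.smith@example.com"},
]

for group in exact_match_groups(records):
    print([record["Source"] for record in group])
# ['Salesforce Global', 'ERP']
# ['ERP']
```

The "Jon" record stays on its own: an exact rule cannot catch typos, which is the gap fuzzy matching addresses.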

Once the matching process is complete, a single golden record is created based on the matching rule. But when two records are merged, which data ends up in the golden record? This is where survivorship rules come into play: they determine which value prevails, or “survives”, over the others.

Take the example of two different email addresses: we need to decide which one will appear in the golden record. Below is an illustration of a survivorship rule with a consolidation strategy. In this case, records from Salesforce are given priority, so the golden record contains the email address from Salesforce Global.

By defining survivorship rules and consolidation strategies, we ensure that the most relevant and reliable data sources contribute to the formation of the golden record.

(Figure 12: Survivorship rule)
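The preferred-source strategy can be sketched in a few lines of Python. This is an illustrative model of source-priority consolidation, not Semarchy's implementation; the source names, the priority list, the function names, and the choice to skip empty values are all assumptions made for the example.

```python
from typing import Optional

# Hypothetical ranking of publishers, most trusted first.
SOURCE_PRIORITY = ["Salesforce Global", "ERP"]


def survive(group: list[dict], attribute: str, priority: list[str]) -> Optional[str]:
    """Pick the value of `attribute` from the highest-priority source that provides one."""
    rank = {source: position for position, source in enumerate(priority)}
    # Assumption: empty values never survive, so a lower-priority source can fill the gap.
    candidates = [record for record in group if record.get(attribute)]
    if not candidates:
        return None
    # Sources missing from the priority list rank after every listed source.
    best = min(candidates, key=lambda record: rank.get(record["Source"], len(priority)))
    return best[attribute]


def build_golden_record(group: list[dict], attributes: list[str]) -> dict:
    """Consolidate a group of matched records into one golden record."""
    return {attribute: survive(group, attribute, SOURCE_PRIORITY) for attribute in attributes}


matched = [
    {"Source": "ERP", "FirstName": "John", "LastName": "Smith", "Email": "j.smith@example.com"},
    {"Source": "Salesforce Global", "FirstName": "John", "LastName": "Smith", "Email": "john.smith@example.com"},
]
golden = build_golden_record(matched, ["FirstName", "LastName", "Email"])
print(golden["Email"])  # john.smith@example.com
```

Consolidating attribute by attribute means each field can come from a different source, which is why survivorship is defined per attribute rather than per record.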

When we want to match records based on their email address, we can use fuzzy matching. Fuzzy matching calculates how similar two records are; in this example, if the similarity of the email addresses exceeds 70%, the records are merged into a single golden record.

By applying fuzzy matching to the email address, we can consolidate records that closely resemble each other, which streamlines the creation of accurate and comprehensive golden records.

(Figure 13: Example of data)
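To get a feel for a similarity threshold, here is a Python sketch that uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure. xDM match rules rely on their own SemQL functions and algorithms, so actual scores will differ; the 0.70 threshold mirrors the 70% in the example, and the function names and email addresses are hypothetical.

```python
from difflib import SequenceMatcher

SIMILARITY_THRESHOLD = 0.70  # mirrors the 70% threshold in the example


def email_similarity(a: str, b: str) -> float:
    """Similarity between two email addresses, from 0.0 (different) to 1.0 (identical)."""
    # Compare case-insensitively and ignore surrounding whitespace.
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def is_fuzzy_match(record_a: dict, record_b: dict, threshold: float = SIMILARITY_THRESHOLD) -> bool:
    """Match two records when their email addresses are similar enough."""
    return email_similarity(record_a["Email"], record_b["Email"]) > threshold


a = {"FirstName": "John", "Email": "john.smith@example.com"}
b = {"FirstName": "Jon", "Email": "jon.smith@example.com"}
c = {"FirstName": "Bob", "Email": "bob@acme.io"}
print(is_fuzzy_match(a, b))  # True: the addresses differ by a single character
print(is_fuzzy_match(a, c))  # False
```

Unlike the exact rule on names, this catches the "John"/"Jon" pair, at the cost of needing a threshold that is strict enough to avoid false merges.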

Conclusion

Master data management plays a critical role in business operations because it ensures data quality and consistency. By synchronizing all versions of the data with the master data, it keeps information uniform and accurate across different sources. This systematic approach makes tasks such as marketing campaigns, sales conversations, and production planning more efficient, leading to better outcomes and streamlined operations.