Master Data Management (MDM) and the Semantic Web

Ray Wang of the Altimeter Group, “the software insider”, can be counted on to provide interesting food for thought. Last week, his topic was changes to the Master Data Management (MDM) market: “Focus on Outcomes Drives Push for Value.” According to Wikipedia, MDM includes the “set of processes and tools that consistently defines and manages the non-transactional data entities of an organization (also called reference data).”

Rationalizing data definitions across large organizations has been a difficult problem for many years, and it keeps growing as the volume of data grows. Hence the need for more effective methods to manage metadata: the descriptions and logic behind the data being used. MDM is needed to ensure that everyone is reporting on the same concepts.

The core government financial “master data” in Government Resource Planning (GRP) is located in the Chart of Accounts (COA). The COA holds information about the organization, budget codes, accounting codes, programs, projects, activities, objectives and statistics. MDM in government becomes more challenging once procurement (vendors), revenue (customers), taxation (taxpayers) and civil service (employee) information is added. The advent of performance management highlights the need to rationalize data across multiple systems: governments need consistent data definitions for reports and dashboards.
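To make the idea of rationalizing data across systems concrete, here is a minimal Python sketch. It assumes a hypothetical master list of COA program codes and two hypothetical source systems (`budget_system`, `payroll_system`) that label the same programs differently. A crosswalk maps each local code to the master code, so a report totals amounts per master concept rather than per local label. All codes, names, and amounts are invented for illustration; this is not how any particular GRP product stores its COA.

```python
from collections import defaultdict

# Hypothetical master data: the authoritative COA program codes and names.
MASTER_PROGRAMS = {
    "P100": "Primary Education",
    "P200": "Public Health",
}

# Hypothetical crosswalks: each source system's local code -> master code.
CROSSWALK = {
    "budget_system": {"EDU-PRI": "P100", "HLTH": "P200"},
    "payroll_system": {"0100": "P100", "0200": "P200"},
}


def to_master(system, local_code):
    """Translate a local program code into the master COA code.

    Raises KeyError when a code has no mapping, so unmapped data is
    flagged instead of silently dropped from reports.
    """
    master = CROSSWALK[system][local_code]
    if master not in MASTER_PROGRAMS:
        raise KeyError(f"{master} is not a master program code")
    return master


def totals_by_program(records):
    """Sum amounts per master program across all source systems."""
    totals = defaultdict(float)
    for system, local_code, amount in records:
        totals[to_master(system, local_code)] += amount
    return dict(totals)


if __name__ == "__main__":
    records = [
        ("budget_system", "EDU-PRI", 500.0),
        ("payroll_system", "0100", 250.0),
        ("budget_system", "HLTH", 300.0),
    ]
    for code, total in sorted(totals_by_program(records).items()):
        print(code, MASTER_PROGRAMS[code], total)
```

Raising an error on unmapped codes is a deliberate choice in this sketch: a dashboard that quietly omits unmatched rows reports on different concepts than one that does not, which is exactly what MDM tries to prevent.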

I had just returned from a meetup at the International Semantic Web Conference when I read Ray’s piece on MDM. The semantic web is sometimes referred to as “Web 3.0.” (I won’t get into the debate about how “semantic” the semantic web is, or whether it is “Web 3.0.”) Semantic technology has moved from the academic world to the business world. It can be used to classify both structured and unstructured data, and it can reach into the “deep web” by integrating with databases. It occurred to me that this technology represents the future of MDM.

To expand on my comments on Ray’s blog:

1. Vertical: Semantic technology is ideal for building vertical taxonomies. Machine learning has been most effective when applied to single domains. (This is changing as the technology improves to handle multiple domains.)
2. Structured and Unstructured: Semantic technology is designed to leverage both structured and unstructured content. It can pull concepts and identifiers directly from unstructured data. It can also reveal unexpected patterns in structured data because it is not limited to an explicit relational database structure.
3. Data in the cloud: Semantic technology can use web content and “linked data” from external systems. Current search technology indexes web pages; semantic web technologies can also pull data from databases. And there does not need to be a single source of data: this is the advantage of “linked data”, which enables multiple servers to expose information.
4. Styles: Semantic technology tends to focus on business concepts rather than the physical layer. (At the same time, it supports data rationalization at the physical layer.) Users need information presented as concepts to discover important facts; otherwise, they need to be database experts.
5. Governance: It might be possible to leverage semantic web technologies for governance: trapping improper uses of classifications and identifying facts that could change classifications. It can also reduce the burden of ensuring that data is classified in a particular fashion.
6. Social networks: Semantic technology is being used today to analyze customer reaction on social network sites to gauge opinions. Semantic technologies can help determine whether a blog post or series of Tweets refers directly to your organization or not.
7. All data types: Semantic technology extends well into all text-related content. There is also some work on integrating other media. This technology is helping to break free of the arbitrary containers for data (documents, videos, databases, etc.).
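Several of the points above (vertical taxonomies, linked data from more than one source, governance checks, and pulling concepts out of text) can be illustrated with a toy model. The following is a minimal Python sketch, assuming data is held as RDF-style (subject, predicate, object) triples in plain Python sets rather than in a real triple store. The `ex:` URIs, the sample concepts, and the two “servers” are hypothetical; the predicate names only borrow from the W3C SKOS vocabulary. A production system would more likely use an RDF library and SPARQL queries.

```python
from collections import defaultdict

# Predicate names borrowed from the W3C SKOS vocabulary; "ex:" is a
# hypothetical namespace used only for this illustration.
BROADER = "skos:broader"
PREF_LABEL = "skos:prefLabel"
CLASSIFIED_AS = "ex:classifiedAs"

# Hypothetical "server A": a small vertical taxonomy for public health.
source_a = {
    ("ex:Vaccination", BROADER, "ex:PublicHealth"),
    ("ex:PublicHealth", BROADER, "ex:Health"),
    ("ex:Vaccination", PREF_LABEL, "vaccination"),
    ("ex:PublicHealth", PREF_LABEL, "public health"),
}

# Hypothetical "server B": procurement data that reuses the same URIs.
source_b = {
    ("ex:Contract42", CLASSIFIED_AS, "ex:Vaccination"),
    ("ex:Contract43", CLASSIFIED_AS, "ex:Roads"),  # concept not in taxonomy
}


def merge(*sources):
    """Combine triples from several sources; shared URIs link the data."""
    graph = set()
    for source in sources:
        graph |= source
    return graph


def ancestors(graph, concept):
    """Return every broader concept, following skos:broader transitively."""
    parents = defaultdict(set)
    for s, p, o in graph:
        if p == BROADER:
            parents[s].add(o)
    seen, stack = set(), [concept]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def items_under(graph, concept):
    """Items classified as `concept` or as any narrower concept."""
    return sorted(
        s for s, p, o in graph
        if p == CLASSIFIED_AS and (o == concept or concept in ancestors(graph, o))
    )


def invalid_classifications(graph):
    """Governance check: flag items classified under an unknown concept."""
    known = set()
    for s, p, o in graph:
        if p == BROADER:
            known.update((s, o))
        elif p == PREF_LABEL:
            known.add(s)
    return sorted(
        (s, o) for s, p, o in graph if p == CLASSIFIED_AS and o not in known
    )


def tag_text(graph, text):
    """Find concepts in free text by naive matching of preferred labels."""
    lowered = text.lower()
    return sorted(s for s, p, o in graph if p == PREF_LABEL and o in lowered)


if __name__ == "__main__":
    graph = merge(source_a, source_b)
    print(sorted(ancestors(graph, "ex:Vaccination")))
    print(items_under(graph, "ex:Health"))
    print(invalid_classifications(graph))
    print(tag_text(graph, "Public health office launches vaccination drive"))
```

In this sketch, `ex:Contract42` comes from one source and the taxonomy from another, yet it rolls up to `ex:Health` because both sources use the same URI for `ex:Vaccination`; `ex:Contract43` is flagged because `ex:Roads` is not a known concept. Substring matching on labels is far cruder than real entity extraction; it only shows where concepts pulled from text would plug into the same graph.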