Interoperability and how to sustain it
Vocabulary and concepts for the age of analytics
One critical aspect of any digital transformation is interoperability. Components, devices and systems must interoperate; otherwise, organizations will continue spending precious resources on costly, ineffective and brittle data searching, preparation and aggregation functions, and the real value of analysis and automation will remain unrealized.
In other words, the ability to locate, understand, access and trust data is a key enabler of digital transformation. In this context, interoperability is the ability of systems (including organizations) to exchange and use information without knowledge of the characteristics or inner workings of the collaborating systems or organizations.
Further, we observe, by convention, “levels” of interoperability, wherein each level increases interoperability in a network or community. The suggestion is that greater interoperability leads to greater autonomy. For our purposes, the salient levels are standards-based, semantic and sustained interoperability.
Standards-based interoperability includes dedicated reference models covering many business areas and related application activities. From the design phase to production and commercialization, standards are developed to enable organizations to exchange information based on common models.
Semantic interoperability is the ability of computer systems to exchange data with unambiguous, machine understandable meaning. Semantic interoperability is required to enable machine computable logic, inferencing, knowledge discovery, and data federation among information systems.
In other words, despite standards for data formats and structures, information may not always be understood by recipients. Explicit knowledge can be encoded, but tacit knowledge requires human interactions. Semantic interoperability adds semantic annotations and knowledge enrichment to address these issues. Ontologies represent the contemporary approach to implement knowledge enrichment and reach semantic interoperability.
Sustained interoperability maintains network harmonization. As an area of research, this involves theory related to complex adaptive systems (CAS). Organizations and networks must adapt to survive. Change is constant. Models and semantics change, which can break the harmony of the network, introducing a new dimension to interoperability.
When one network member adapts to a new requirement, the change creates a ripple that propagates through the network, and the network begins experiencing interoperability problems. One model for sustaining interoperability includes a monitoring system that detects actions that break the network harmony. Upon discovery of such an event, an intelligence integration layer interprets the change and devises a strategy to adapt to it. A decision-support system then assesses the strategy and decides on a course of action, using the communications layer to notify the network of the action. In this way, the network evolves and harmony is restored.
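The monitor, interpret, decide and notify cycle described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not a reference implementation of the model: the function names (detect_changes, propose_strategy, decide, notify, restore_harmony), the representation of each member's model as a set of terms, and the one-for-one rename heuristic are all hypothetical and introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """A deviation between what the network expects and what a member publishes."""
    member: str
    removed: frozenset
    added: frozenset


def detect_changes(expected, observed):
    """Monitoring system: compare each member's published terms with the agreed ones."""
    changes = []
    for member, terms in observed.items():
        agreed = expected.get(member, set())
        removed, added = agreed - terms, terms - agreed
        if removed or added:
            changes.append(Change(member, frozenset(removed), frozenset(added)))
    return changes


def propose_strategy(change):
    """Intelligence integration layer: interpret the change and devise a strategy.

    Hypothetical heuristic: one term removed plus one term added is read as a rename.
    """
    if len(change.removed) == 1 and len(change.added) == 1:
        (old,) = change.removed
        (new,) = change.added
        return {"action": "map", "old": old, "new": new}
    return {"action": "escalate"}


def decide(strategy):
    """Decision-support system: accept only strategies that can be applied automatically."""
    return strategy["action"] == "map"


def notify(members, change, strategy):
    """Communications layer: tell every other member how to read the new term."""
    return [
        f"{m}: treat '{strategy['new']}' from {change.member} as '{strategy['old']}'"
        for m in members
        if m != change.member
    ]


def restore_harmony(expected, observed):
    """Run one pass of the monitor, interpret, decide, notify cycle."""
    messages = []
    for change in detect_changes(expected, observed):
        strategy = propose_strategy(change)
        if decide(strategy):
            messages.extend(notify(sorted(expected), change, strategy))
    return messages


# Hypothetical network: the supplier renames "partNumber" to "itemId".
expected = {"plant": {"partNumber"}, "supplier": {"partNumber"}}
observed = {"plant": {"partNumber"}, "supplier": {"itemId"}}
messages = restore_harmony(expected, observed)
```

In this toy run the supplier's rename is detected and the plant is told how to map the new term. A real system would need far richer change interpretation than a one-for-one rename, and anything it cannot interpret would be escalated to people.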
We believe that semantic interoperability is the key enabler for digital transformation. But how do we achieve semantic interoperability? It has long been recognized that interoperability benefits when content is understandable and available in a machine processable form, and it is widely agreed that ontologies will play a key role in providing the enabling infrastructure to support this goal.
Ontology is the key
In the broadest sense, ontology is the study of the nature of existence, beings and their relations. In information science, ontology provides a means to create unambiguous knowledge. “An” ontology is a formal specification of the concepts, types, properties and interrelationships of entities within a domain of the real world. Ontologies give humans and machines an unambiguous context, or meaning, for data, and so ensure a common understanding of information. In practice, ontologies describe and link disparate and complex data. Important architectural considerations of ontologies include the following.
- Ontologies enable reuse of foundational concepts in (upper) ontologies that are domain independent and can be used across domains.
- Modularity of ontologies allows separation and recombination of different parts of an ontology depending on specific needs, instead of creating a single common ontology.
- Extensibility of ontologies allows further growth of the ontology for the purpose of specific applications.
- Maintainability of ontologies facilitates the process of identifying and correcting defects, accommodates new requirements, and copes with changes in an ontology.
- Ontologies enable separation of design and implementation concerns, so they are flexible to changes in specific implementation technologies.
Notably, informal ontologies may lead to ambiguities. Systems based on informal ontologies are more error-prone than systems based on formal ontologies. Formal ontologies allow automated reasoning and consistency checking. Formal ontologies span from taxonomies of concepts related by subsumption relationships to complete representations of concepts related by complex relationships. Formal ontologies include axioms to constrain their intended concept interpretations.
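To make the contrast concrete, the sketch below treats a tiny formal ontology as a taxonomy of subsumption relationships plus one disjointness axiom, and runs a naive consistency check over it. The class names, the individuals and the helper functions are hypothetical, chosen only for illustration; production reasoners for description logics are far more sophisticated than this.

```python
def ancestors(cls, subclass_of):
    """Return every class that subsumes `cls`, following subclass links transitively."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def inconsistencies(types_of, subclass_of, disjoint):
    """Report individuals that fall under two classes declared disjoint."""
    problems = []
    for individual, types in sorted(types_of.items()):
        closure = set(types)
        for t in types:
            closure |= ancestors(t, subclass_of)
        for a, b in disjoint:
            if a in closure and b in closure:
                problems.append((individual, a, b))
    return problems


# Hypothetical taxonomy: each class maps to its direct superclasses.
subclass_of = {
    "CentrifugalPump": ["Pump"],
    "Pump": ["Equipment"],
    "Operator": ["Person"],
}
# Axiom constraining interpretation: nothing is both Equipment and a Person.
disjoint = [("Equipment", "Person")]
# Asserted types of two hypothetical individuals; the second is a modeling error.
types_of = {
    "p101": {"CentrifugalPump"},
    "x7": {"CentrifugalPump", "Operator"},
}

problems = inconsistencies(types_of, subclass_of, disjoint)
```

An informal glossary would silently accept the second individual. With the axiom stated explicitly, even this naive check flags it.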
We require a language to create standard and shareable ontologies. When one models a portion of the real world, i.e., some domain of interest, a conceptualization exists in one’s mind. This is based on the concepts existing in the domain and their salient relationships. An ontology language provides a mechanism to represent the concepts. The entire domain specification is expressed in the language. Thus, an ontology is an explicit specification of a conceptualization of some domain.
So how do we arrive at a standard ontology language?
In the 1990s, there was a recognition that languages such as HTML and XML were insufficient for knowledge representation. HTML is oriented to rendering information in a human friendly presentation. XML provides a platform-independent data exchange model. Neither was designed to convey what the data mean.
In 1999, the European Union sponsored development of the Ontology Inference Layer (OIL). Note that the acronym is sometimes expanded as “Ontology Interchange Language.” OIL was based on the strong formal foundations of Description Logics, namely SHIQ. OIL was compatible with a very lightweight model called Resource Description Framework Schema (RDFS), which the W3C had first published as a working draft in 1998 and which became a W3C Recommendation in 2004.
In 2000, the Defense Advanced Research Projects Agency (DARPA) initiated the DARPA Agent Markup Language (DAML) project. DAML was to serve as the foundation for the next generation of the Web, which would increasingly utilize “smart” agents and programs. One goal was to reduce the heavy reliance on human interpretation of data. DAML extended XML, RDF and RDFS to support machine understandability. DAML included “some” strong formal foundations of Description Logics but focused more on pragmatic application.
Circa 2001, groups from the US and the EU collaborated to merge DAML and OIL, the result of which was known as DAML+OIL. DAML+OIL provided formal semantics that support machine and human understandability. This new language also provided axiomatization, or inference rules to expand reasoning services, which enabled machine operationalization.
In 2004, the World Wide Web Consortium (W3C) derived the Web Ontology Language (OWL) from DAML+OIL and published it as a “standard” knowledge representation language for authoring ontologies. The initial OWL specification featured three “species” of OWL: OWL Lite, OWL DL, and OWL Full, each providing increasing expressiveness and sophistication. In 2009, the W3C released OWL 2 which articulated different versions of OWL tailored to different reasoning requirements and application areas. The latest W3C OWL 2 recommendation is dated 11 December 2012.
This entire evolution can be aptly characterized as “making the data intelligent instead of the software.” Since the data is “common” to all software processes — and the areas within digital transformation — we can more effectively realize interoperable and autonomous systems.
Standardizing the enabler
Subsequent to the initial release of OWL, the W3C articulated a set of standards and methods under the “Semantic Web” label. In this construct, the primary standards that adopters and vendors implement to create machine understandable, rich contextualized knowledge include Resource Description Framework (RDF), RDF Schema (RDFS), Web Ontology Language (OWL), and SPARQL Protocol And RDF Query Language (SPARQL).
RDF provides the means to create, store and exchange semantic data. RDF data forms a directed, labeled graph: nodes are resources or literal values, and each edge is a named property that links one node to another. The graph is not required to be acyclic. RDFS is a set of classes with certain properties that build on RDF to provide basic elements for the description of concepts in the RDF data. OWL builds on RDFS to add significantly more constructs to specify or model domains or applications. SPARQL provides the means to query semantic data, including from distributed sources. The “Protocol” portion of SPARQL standardizes the means to publish and communicate with semantic data services, known, in general, as “SPARQL endpoints.” RDF, RDFS and OWL content is expressed in a simple, three-element structure (subject, predicate, object), commonly called a “triple,” statement or fact. Existing data sources may be represented as triples. Triples from otherwise disparate data sources may be linked to create a universal and machine understandable “data fabric.”
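As a rough illustration of triples and of linking disparate sources, the sketch below models each source as a plain Python set of (subject, predicate, object) tuples, merges two sources by set union, and answers a question with a two-pattern match in the spirit of a SPARQL basic graph pattern. It deliberately uses no RDF library. The ex: identifiers, the sample facts and the helper names (match, country_of_equipment) are hypothetical.

```python
def match(triples, s=None, p=None, o=None):
    """Return the triples that fit a pattern; None is a wildcard, like a SPARQL variable."""
    return sorted(
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Two hypothetical, independently maintained sources, each expressed as triples.
maintenance = {
    ("ex:pump101", "rdf:type", "ex:Pump"),
    ("ex:pump101", "ex:lastServiced", "2023-04-01"),
}
inventory = {
    ("ex:pump101", "ex:locatedIn", "ex:plantA"),
    ("ex:plantA", "ex:country", "DE"),
}

# Because both sources use the same identifier for the pump, linking them is a set union.
fabric = maintenance | inventory


def country_of_equipment(triples, equipment):
    """Join two patterns: equipment -> site, then site -> country."""
    return sorted(
        country
        for (_, _, site) in match(triples, s=equipment, p="ex:locatedIn")
        for (_, _, country) in match(triples, s=site, p="ex:country")
    )


countries = country_of_equipment(fabric, "ex:pump101")
```

Neither source alone can say which country the serviced pump is in. The merged graph can, because the shared identifier ex:pump101 connects the two sets of facts.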
In addition, Semantic Web standards enable machine reasoning services that infer new facts from existing facts; that is, semantic technologies make implicit data explicit. Semantic Web standards allow human and machine data consumers to know unambiguously what data mean.
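A minimal sketch of how a reasoner makes implicit facts explicit: the function below repeatedly applies two rules, modeled on the RDFS entailment rules for rdfs:subClassOf (subclass transitivity and type inheritance), until no new triples appear. The ex: identifiers and the function name infer are hypothetical, and real reasoners implement many more rules far more efficiently.

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def infer(triples):
    """Forward-chain two subclass rules until a fixed point is reached."""
    facts = set(triples)
    while True:
        new = set()
        for (a, p1, b) in facts:
            for (c, p2, d) in facts:
                if b != c:
                    continue
                if p1 == SUBCLASS and p2 == SUBCLASS:
                    # A subClassOf B, B subClassOf D  =>  A subClassOf D
                    new.add((a, SUBCLASS, d))
                elif p1 == TYPE and p2 == SUBCLASS:
                    # x type B, B subClassOf D  =>  x type D
                    new.add((a, TYPE, d))
        if new <= facts:
            return facts
        facts |= new


# Hypothetical asserted facts.
asserted = {
    ("ex:pump101", TYPE, "ex:CentrifugalPump"),
    ("ex:CentrifugalPump", SUBCLASS, "ex:Pump"),
    ("ex:Pump", SUBCLASS, "ex:Equipment"),
}

inferred = infer(asserted)
derived = inferred - asserted
```

Nobody asserted that ex:pump101 is a piece of equipment, yet the fact follows from the stated ones and appears in the derived set.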
Semantic Web standards create machine understandable context through a standard and repeatable methodology. Just as the Web is distributed and decentralized, so are data sources that employ semantic web technologies. The idea is for data producers to publish machine understandable content that software and human consumers can discover and consume in a reliable and repeatable manner. The concepts in service-oriented architecture are germane: as the ecosystem grows, so does the need for standardized ways to publish, find and invoke semantic data and services. Unlike conventional data standards, ontologies need not be centrally managed. Ontologies grow, evolve and adapt over time as adoption increases. The superior ontologies naturally become more popular and gain traction. Because ontology is based on existence of beings and their relationships, terminology in information systems tends toward alignment.
The need for “top down” or “highly coordinated” planned and implemented data architectures is diminished because the model is decentralized, distributed and based on formal ontology that is designed to achieve semantic interoperability in a federated manner. Although the topic is beyond the scope of this paper, it is worth noting that ontology-based approaches assume that “one never has all the facts,” commonly called the open-world assumption. Previous approaches did not make this assumption, which resulted in inflexible designs wherein requirements had to be known in the design phase.
It is worth noting that semantic technologies are intended for machine-to-machine interactions. Of course, applications that leverage semantic technologies may be user-facing. But the larger vision is to enable a “Machine Web,” one that understands correctly and operates more autonomously.