This article is also available in German.
Europe possesses enormous public datasets, yet their economic and societal value has consistently fallen short of expectations. The causes are fragmented data portals, incompatible interfaces, and increasing dependencies on non-European platforms – all factors that hinder genuine innovation.
In parallel, new data spaces are emerging in industry, for example within the framework of Gaia-X, Catena-X, or the International Data Spaces (IDS) reference model. Unlike traditional open data portals, these data spaces enable secure, structured, and cross-industry exchange of sensitive information – based on standardized contracts, identity, and access mechanisms. Each participant retains control over their data while creating a trustworthy ecosystem for collaborative value creation.
As a result, two extensive but largely separate data worlds exist today:
- freely available open data holdings from the public sector
- domain-specific industry data spaces with high granularity and clear semantic standards
Both areas contain vast knowledge – yet as long as there is no connecting, easily usable bridge, this potential remains untapped. In this article, you will learn how artificial intelligence (AI) and the Model Context Protocol (MCP) together pave the way from open data to open knowledge – and how you as a decision-maker can advance seamless networking of public and industrial datasets without endangering your company’s digital sovereignty.
Open Data & Data Spaces: Accessible Data, but Little Benefit
The Paradox of the European Open Data Strategy
Open data stands for the principle of providing administrative and research data openly, machine-readable, and without legal or technical access barriers. This concept is by no means new: as early as 2003, the PSI Directive (Public Sector Information) committed EU member states to make publicly funded data as accessible as possible. In 2019, the framework was strengthened again with the Open Data Directive – with the goal of boosting innovation, transparency, and economic growth. As a result, authorities provide a wide variety of datasets, from geodata and weather information to budget figures and traffic data – with the aspiration of enabling new services and insights from them.
However, reality looks different: there is a significant gap between aspiration and actual added value.
Open data is abundant, but its societal impact remains marginal.
Despite considerable investment, only a fraction of this potential is being exploited. In Germany, for example, there are over 500 open data portals – yet each works with its own metadata structures, formats, and interfaces. For developers, this means complex integration, inconsistent formats, and hard-to-follow documentation that make usage difficult. On average, a portal counts fewer than 100 accesses per month; some datasets are virtually never used. Since 2010, over 250 million euros have been invested in such infrastructures, yet fewer than five percent of the provided data finds productive use.
The result: expensive isolated solutions, unnecessary duplication of work, and frustration among all stakeholders. Additionally, dependency on global cloud providers grows – particularly for analysis, hosting, or processing services. This, in turn, contradicts the goal of strengthening Europe’s digital sovereignty.
In the following sections, this article explains how modern technologies like artificial intelligence (AI) and the Model Context Protocol (MCP) can help overcome these blockages and bring the vision of open knowledge within reach.
Open Data from Industry: Data Spaces
While open data portals primarily target the general public, so-called data spaces emerged in industry – especially since 2015 with the International Data Spaces (IDS) initiative by the Fraunhofer Institute. Their goal is to enable a trustworthy, decentralized data marketplace where companies can exchange sensitive information without losing sovereignty over this data. Projects like Gaia-X or its industry-specific implementations (Catena-X for the automotive industry, Manufacturing-X for production, Agricultural Data Space, etc.) build on this concept.
Key features of a data space:
- Sovereignty: Each party retains complete control over access rights, usage purposes, and processing rules (so-called Data Usage Policies).
- Federation instead of central storage: Data remains physically with the originating company; only defined segments are exchanged via standardized connectors.
- Semantic interoperability: Common ontologies, such as the Asset Administration Shell (AAS) as a digital twin wrapper, ensure that data is described in a machine-understandable way.
- Trust anchors: Identity and certification services verify participants, allowing even business-critical or confidential data to be shared securely.
This enables digital twins to be built along the entire value chain – from raw material procurement to recycling – providing real-time information about the condition, usage, and carbon footprint of a product. Machine suppliers, logistics service providers, suppliers, and operators thus receive a common yet fine-grained situational picture without having to disclose their proprietary databases.
Data spaces thus complement the public open data ecosystem with high-resolution, domain-specific troves of knowledge. When both worlds are connected and made easily accessible to AI systems such as large language models, a solid foundation for truly data-driven innovation emerges – from preventive maintenance and resilient supply chains to sustainable product cycles.
Digital Sovereignty: More Than Server Locations
Digital sovereignty means control over data, infrastructure, and value creation. Three misconceptions stand in the way of genuine sovereignty:
- Infrastructure fixation – servers in Frankfurt ≠ sovereignty when the algorithm comes from California.
- Portal centralism – one portal for everything fails due to federal reality and data sovereignty.
- Download culture – open CSV files are not user-centric services.
Only when data is contextualized, accessible, and processable does value emerge. Wouldn’t it be easier, then, simply to migrate the open data portals into data spaces?
Data space architectures – whether Gaia-X, Catena-X, Manufacturing-X, or the cross-sectional IDS reference – promise the holy grail of data sovereignty. Technically, they deliver:
- federated connectors that enforce fine-grained usage rules,
- identity services that authenticate and certify partners,
- policy languages in which “use yes, pass on no” can be expressed in machine-readable form (see the sketch after this list),
- domain-specific semantic models (e.g. Asset Administration Shell) that enable unambiguous interpretation.
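To make the policy point tangible, here is a minimal sketch of how such a rule might look in code – an illustrative Python rendering in the spirit of ODRL-style policy languages, not the exact schema of any particular connector. The field names, the dataset URN, and the `is_allowed` helper are assumptions for illustration only.

```python
# Minimal sketch of a machine-readable usage policy, loosely modeled on
# ODRL-style permission/prohibition structures; field names are illustrative,
# not the schema of any specific data space implementation.
from datetime import date

policy = {
    "target": "urn:dataset:material-footprint-2025",   # illustrative identifier
    "permissions": [
        {"action": "use",
         "constraint": {"purpose": "co2-accounting", "valid_until": "2026-12-31"}},
    ],
    "prohibitions": [
        {"action": "distribute"},   # "use yes, pass on no"
    ],
}

def is_allowed(policy: dict, action: str, purpose: str, on: date) -> bool:
    """Evaluate a request against the policy: explicit prohibitions win,
    then a matching permission with satisfied constraints is required."""
    if any(p["action"] == action for p in policy["prohibitions"]):
        return False
    for perm in policy["permissions"]:
        if perm["action"] != action:
            continue
        c = perm.get("constraint", {})
        if c.get("purpose") not in (None, purpose):
            continue
        if "valid_until" in c and on > date.fromisoformat(c["valid_until"]):
            continue
        return True
    return False

print(is_allowed(policy, "use", "co2-accounting", date(2025, 6, 1)))         # True
print(is_allowed(policy, "distribute", "co2-accounting", date(2025, 6, 1)))  # False
```

In a real data space, a connector would evaluate policies of this kind before releasing any data segment to another participant.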
Thus, the question “Who may see which data under which conditions?” is now well answered. However, the much more important question remains open: “And for what purpose?”
- Utility Gap: In many pilot projects, the added value ends with “being able to provide.” Data may move sovereignly from connector A to connector B, but hardly anyone incorporates it into productive applications like predictive maintenance or CO₂ accounting. The result is a veritable “empty-shelves effect”: elaborately sorted shelves, but hardly any marketable products.
- Semantic fragmentation: Although standards exist, each industry – often each consortium – models additional classes and properties. A seemingly simple term like “batch” can mean different things in chemistry, food production, or pharmaceuticals. Integration therefore still costs manual mapping efforts.
- Missing “last-mile services”: Data spaces specify transport and governance, but not searchability, visualization, or decision support. Without these services, the data stream remains abstract – comparable to highways without exits.
In short: Data spaces lay a secure pipeline, but the water still has to be treated before it is drinkable. When AI-based services such as LLMs and lightweight protocols such as MCP make data automatically discoverable, semantically harmonized, and translatable into natural language, the “sovereignty-value creation gap” closes. Then genuinely usable knowledge emerges from sovereignly shared raw data – from supply chain resilience and digital twins to the circular economy.
From Open Data and Data Spaces to a Federated Knowledge Architecture
This section describes a concrete use case that shows how, through the combination of open data and data spaces using AI and MCP, a sovereign architectural pattern – the Federated Knowledge Architecture (FKA) – can emerge that creates genuine added value. Using the example of the planned construction of a manufacturing hall, it illustrates how this architectural pattern bridges open and domain-specific data spaces, enabling innovative knowledge landscapes.
Federated Knowledge Architecture and System
Federated Knowledge Architecture (FKA) refers to an architectural pattern in which distributed knowledge and data services are federated via MCP. It connects open data sources and domain-specific data spaces into a sovereign knowledge layer for AI-supported analysis – with clear governance and without central data storage.
The Federated Knowledge System (FKS) is the concrete implementation of the FKA in an organization or ecosystem – including MCP servers, LLM orchestration, and domain adapters.
Complexity in Modern Construction Projects
Planning and building a modern manufacturing facility is inherently complex. It involves pulling in data from a wide range of sources: public open data portals on soil quality, drinking water protection zones, and flood risks, as well as scientific and ecological considerations for sustainable construction to minimize environmental impact. At the same time, industry-specific data spaces need to be tapped to identify low-emission materials—whether construction materials or electrical components. Ideally, sustainability metrics should be calculated across the full lifecycle of the facility, not just during construction.
Today, this process is largely manual: consulting with multiple agencies, filling out forms, and navigating time-consuming coordination loops. Each step introduces potential delays and errors.
A New Approach: MCP and AI for Smarter Building
This is where the concept of a Federated Knowledge Architecture comes in. Using the Model Context Protocol (MCP) as a federated data translator—combined with locally hosted Large Language Models (LLMs)—this architectural pattern simplifies complexity and unifies access to all relevant data through a single intelligent layer.
How it works:
- Automated Open Data Access: Public datasets—e.g. environmental restrictions or water protection areas—are accessed programmatically. MCP enables these sources to be queried by AI systems.
- Seamless Integration of Industry Data Spaces: Standardized interfaces connect to sector-specific datasets, including carbon footprints, circular economy metrics, and material attributes.
- Contextualization via Lightweight LLMs: A compact, locally hosted LLM (e.g. Mistral, LLaMA) aggregates and contextualizes the data to generate meaningful recommendations—for material selection, permitting, or environmental documentation.
- Document Generation: Building permits, environmental assessments, and funding applications can be automatically generated based on the aggregated data.
The Outcome: Rather than contacting agencies individually or relying on expert input, a project manager could submit a query like: “How do I design an environmentally optimized factory in a drinking water protection zone?” The system would return relevant regulations, environmental assessments, and supporting documentation—within seconds. These resources remain accessible throughout the entire construction lifecycle and can be continuously updated as the project evolves.
Technically, this setup relies on local LLMs enriched in real time via MCP servers, connected to public and private data sources. No migration of legacy systems is required, making implementation straightforward.
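To illustrate how lightweight such a bridge can be, the following is a minimal sketch of an MCP server that exposes one open-data lookup as a tool. It assumes the official MCP Python SDK (the `mcp` package) and the `httpx` HTTP client; the endpoint URL, parameter names, and tool name are placeholders, not a real portal API.

```python
# Minimal sketch of an MCP server exposing one open-data lookup as a tool.
# Assumes the official MCP Python SDK ("mcp" package) and httpx; the portal
# URL and its response format are placeholders, not a real API.
import httpx
from mcp.server.fastmcp import FastMCP

server = FastMCP("open-data-bridge")

OPEN_DATA_ENDPOINT = "https://example.org/api/water-protection-zones"  # placeholder

@server.tool()
def water_protection_zones(latitude: float, longitude: float) -> str:
    """Return drinking water protection zones near a building site."""
    response = httpx.get(
        OPEN_DATA_ENDPOINT,
        params={"lat": latitude, "lon": longitude},
        timeout=10.0,
    )
    response.raise_for_status()
    # Hand the raw payload to the LLM; interpretation happens in the prompt.
    return response.text

if __name__ == "__main__":
    server.run()  # serves the tool over stdio for an MCP-capable client
```

An MCP-capable client – or the orchestration layer of a Federated Knowledge System – can discover this tool and let the LLM decide when to call it.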
Data-Driven Construction as a Business Enabler
This approach fundamentally transforms how industrial building projects are planned and executed:
- Reduced Time-to-Decision: What once took weeks can now be validated within hours or even minutes.
- Sustainable by Design: Environmental metrics like carbon footprint and resource efficiency are integrated from the start.
- Lower Costs: Automation and the reduction of duplicate effort increase efficiency—particularly valuable for SMEs.
- Competitive Advantage: Faster, data-backed decisions give organizations an edge in fast-moving markets.
This is more than a technical innovation—it’s a blueprint for AI-augmented infrastructure that leverages Open Data and Data Spaces to move from fragmented silos to actionable Open Knowledge.
Anatomy of a Federated Knowledge Architecture
What does such an architecture look like in practice? The diagram below outlines a representative setup: it shows how LLMs, MCP servers, and data sources interact to deliver actionable insights to project stakeholders. For clarity, not all data sources from the example are shown. The MCP servers illustrated can be deployed either within the organization’s own Federated Knowledge Architecture or in an external system, such as one operated by an Open Data Portal provider.
| # | Description |
|---|---|
| 1 | A project manager submits a request for a specific building type, including parameters such as location, lifecycle, and production use. The request may also originate from a project planning tool. |
| 2 | The Federated Knowledge System receives the request and: 1. retrieves available MCP features from active MCP servers, 2. builds a composite prompt using the request and available features, 3. instructs the LLM to call additional MCP features if further data is needed (a minimal client-side sketch of steps 2–4 follows the table). |
| 3 | The LLM analyzes the prompt, identifies missing data, and requests the client to execute a specific MCP feature. Once all needed data is returned, it generates a final response. |
| 4 | The LLM client invokes the requested MCP feature and sends the result back to the LLM. |
| 5 | MCP servers implement the features by querying relevant data sources. Servers may run locally or be operated by external data providers. |
| 6 | The final response, enriched with contextualized data, is returned to the project manager. |
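The sketch below illustrates the client side of steps 2–4 under simplifying assumptions: it uses the official MCP Python SDK, the server script name is a placeholder, and `ask_llm()` is a stub standing in for the real LLM orchestration (which would pass the discovered tools to the model and loop until it produces a final answer).

```python
# Sketch of the client-side loop from the table above (steps 2-4): discover the
# tools offered by an MCP server, hand them to the LLM, and execute the tool
# call the LLM requests. Assumes the official MCP Python SDK; the server script
# name and the ask_llm() stub are placeholders for the real orchestration layer.
import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

server_params = StdioServerParameters(
    command="python", args=["open_data_server.py"]  # placeholder MCP server
)

def ask_llm(prompt: str, tools: list) -> dict:
    """Placeholder for the LLM call: a real implementation would pass the tool
    descriptions to a (locally hosted) model and return its tool-call request
    or final answer."""
    return {"tool": "water_protection_zones",
            "arguments": {"latitude": 52.52, "longitude": 13.40}}

async def handle_request(user_request: str) -> None:
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()                   # step 2: connect to the server
            tools = (await session.list_tools()).tools   # step 2: discover MCP features
            decision = ask_llm(user_request, tools)      # step 3: LLM picks a tool
            result = await session.call_tool(            # step 4: execute the call
                decision["tool"], arguments=decision["arguments"]
            )
            print(result.content)  # fed back to the LLM for the final response

asyncio.run(handle_request("Factory site in a drinking water protection zone?"))
```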
Getting Started: Building a Federated Knowledge System
The example above demonstrates how different systems can interoperate effectively. The more MCP servers are connected, the richer and more complete the responses become. This spans use cases from generating summaries and recommending actions to pre-filling applications—or even controlling infrastructure components directly. The potential of this architectural pattern is significant: it enables organizations to turn existing data into actionable innovation. But how do you make it real?
Before implementing a Federated Knowledge Architecture, you need to define its core structure—specifically, how the MCP server and LLM will fit into your overall system landscape.
In early-stage projects, chat-based interfaces offer a lightweight and flexible way to get started. They allow teams to quickly prototype with LLMs, experiment with prompts, and develop MCP features tailored to specific domains. If needed, you can integrate a custom MCP client as a plugin within the chat environment.
As your proof of concept matures, and the complexity of domain logic grows, a structured user interface becomes more practical. Advanced use cases often require collecting more detailed inputs—something that’s much easier to manage through structured fields than free-text prompts. Structured interfaces also help scope your system appropriately. A construction-focused knowledge architecture doesn’t need access to zoological data—even if some environmental datasets touch on those areas.
Most organizations already have what they need: existing data silos, many of which are accessible via APIs. Even a basic database can be connected to an MCP server with minimal effort. The open-source MCP community continues to release ready-to-use integrations. For example, APIs described using OpenAPI can be automatically translated into MCP features. Prebuilt adapters also exist for many common databases.
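As a minimal illustration of such a bridge, the sketch below exposes a single read-only query from a hypothetical SQLite database (`materials.db` with a `materials` table) as an MCP tool; the table and column names are assumptions, and a production setup would add access controls and proper connection handling.

```python
# Sketch of wiring an existing database into an MCP server: one read-only query
# exposed as a tool. Assumes the official MCP Python SDK and Python's built-in
# sqlite3 module; the database file, table, and columns are illustrative.
import json
import sqlite3
from mcp.server.fastmcp import FastMCP

server = FastMCP("materials-db-bridge")

@server.tool()
def low_emission_materials(max_co2_kg_per_unit: float) -> str:
    """List construction materials below a CO2-per-unit threshold."""
    conn = sqlite3.connect("materials.db")          # placeholder database file
    conn.row_factory = sqlite3.Row
    rows = conn.execute(
        "SELECT name, co2_kg_per_unit FROM materials WHERE co2_kg_per_unit <= ?",
        (max_co2_kg_per_unit,),
    ).fetchall()
    conn.close()
    return json.dumps([dict(row) for row in rows])  # raw result, interpreted by the LLM

if __name__ == "__main__":
    server.run()
```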
In its initial form, an MCP server doesn’t need to do much. It simply acts as a bridge between the LLM and the data source. The logic for processing and interpreting the data lives in the prompt itself and is executed by the LLM. Over time, as more use cases are implemented, your MCP features can evolve to handle more complex logic.
Selecting an LLM isn’t just about accuracy or benchmark scores. Deployment strategy, data governance, and regulatory fit all play a role. Here are key considerations:
Local vs. Cloud-Hosted: Running an LLM locally gives you full control and data sovereignty, but it requires significant compute resources. For many teams, a hosted LLM is faster to set up and more cost-effective—especially in the early stages of a project.
EU-Based vs. US-Based Hosting: If you’re opting for a hosted model, consider whether the LLM needs to be EU-compliant (e.g., Mistral). US-hosted models often lead in maturity and capabilities, but data residency and compliance may be harder to guarantee. For text-heavy contextual use cases, smaller EU-hosted models may be entirely sufficient.
What About gpt-oss (Open-Weight Model)?: The release of gpt-oss-120b and gpt-oss-20b, both available under Apache 2.0 licensing, has shifted the equation. Local deployment is now more affordable, more powerful, and more flexible. A Federated Knowledge Architecture can remain model-agnostic: switching to an open-weight model like gpt-oss is as simple as changing the model reference in your MCP layer—no architectural changes required. That said, governance remains critical. Prompt logging, evaluation pipelines, and access controls should be in place from the start.
Your LLM should be integrated in a way that allows for easy replacement down the line. That’s not just smart architecture—it’s a necessity. The LLM ecosystem is evolving fast, and models will almost certainly need to be swapped out over time. Reasons include cost optimization, improved performance, changing business needs, or new regulatory requirements.
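One way to keep that flexibility is to treat the endpoint and model name as pure configuration. The sketch below assumes the `openai` client package and an OpenAI-compatible endpoint – a format many local runtimes also expose – with URLs and model names as illustrative placeholders.

```python
# Sketch of keeping the model swappable: the orchestration layer reads endpoint
# and model name from configuration and talks to any OpenAI-compatible API,
# whether a hosted service or a locally served open-weight model. The "openai"
# client package is assumed; URLs and model names are illustrative.
import os
from openai import OpenAI

# Switching models means changing configuration, not architecture.
LLM_BASE_URL = os.getenv("LLM_BASE_URL", "http://localhost:11434/v1")  # e.g. a local runtime
LLM_MODEL = os.getenv("LLM_MODEL", "gpt-oss:20b")                      # or a hosted EU model
LLM_API_KEY = os.getenv("LLM_API_KEY", "not-needed-locally")

client = OpenAI(base_url=LLM_BASE_URL, api_key=LLM_API_KEY)

def contextualize(question: str, context: str) -> str:
    """Ask the configured model to answer a question against MCP-provided context."""
    response = client.chat.completions.create(
        model=LLM_MODEL,
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"{question}\n\nContext:\n{context}"},
        ],
    )
    return response.choices[0].message.content
```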
Once your technical foundation is in place, the most important step is: get started. Thanks to years of investment in Big Data, Open Data, and Data Spaces, most enterprises and public-sector organizations already have data silos in place. Many of them are API-ready and just waiting to be connected. Chances are, there’s already a perfect starting point in your environment.