This article is also available in German

This article is part of a series

  • Part 1: The Sovereignty Trap: Between Tiananmen and Trump
  • Part 2: Think Locally: On-Premise LLMs as Drivers of Competitive Advantage (this article)

This scenario became reality in May 2025, when a US federal judge ordered OpenAI to stop deleting ChatGPT output data and to preserve all output logs instead.[1] For users, this means any sensitive information shared with OpenAI’s systems could remain stored indefinitely, creating potential vulnerabilities for professionals in sectors like law, healthcare, or finance. The order affects hundreds of millions of users globally, demonstrating how quickly external dependencies can turn from strategic assets into uncontrollable liabilities. It illustrates why AI sovereignty has moved from theoretical concern to board-level risk.

Europe’s Narrow Window of Opportunity

When ChatGPT arrived in November 2022, it did more than answer questions—it put artificial intelligence on everyone’s lips. Its underlying large language models let AI tools converse with human-like fluency. ChatGPT shook up the software industry despite its penchant for ‘hallucinations’—the industry’s euphemism for inventing facts. From startup pitches to enterprise updates, every new product now boasts some form of AI.

US and Chinese LLM providers dominate the market for AI technology, which raises challenges for EU firms.[2] The quality of an AI model’s output depends on the quantity and quality of the data fed in as input. Privacy-minded businesses frown upon the prospect of uploading their sensitive corporate data or intellectual property to servers outside their jurisdiction. EU regulation such as GDPR further constrains transfers of personal data.[3]

Germany’s Aleph Alpha illustrates the challenges European LLM developers face when trying to compete with overseas providers. The firm raised over €500m in funding but pivoted away from model development to AI consulting services in 2024.[4] As CEO Jonas Andrulis explained to Bloomberg: “Just having a European LLM is not sufficient as a business model”.

Despite the hurdles, European LLM providers offer a competitive advantage that EU organisations cannot afford to ignore. France’s Mistral caters to the continent’s privacy-conscious clientele. Its models run on the firm’s European infrastructure, alleviating the compliance headaches of shipping sensitive data overseas. Moreover, its training sets include extensive multilingual European materials, which lets its models produce more culturally aligned output and command the bloc’s linguistic diversity better, including in translation. Lastly, because the providers are domiciled in Europe, they fall under the EU’s environmental legislation, enabling customers to expect sustainability reporting and thus make informed choices when running these energy-hungry models.[5]

The Hidden Price of Hosted AI

While the AI services hosted by European providers do address a number of concerns of businesses worried about dependence on overseas firms, certain risks remain unhedged. Consider the financial angle. LLM providers have been struggling with the costs of keeping their models running. As reported by The Economist, a magazine, OpenAI lost an estimated $5bn in 2024, with no profitability in sight. The cost of running more complex models, such as OpenAI’s o3, which expend copious processing power to generate their answers, shortens the runway. The LLM firms’ ability to fund their growth by continually recruiting new investors is bound to wane, and with it their ability to provide services at a loss. Should they decide to shift the costs to their clients, the hitherto affordable hosted AI tools might become a significant contributor to customers’ cloud-services bills.[6]

European businesses have been valued customers of US digital platforms. In a report from April 2025, Asterès, a consultancy, reckons that EU firms spend ca. €264bn annually on cloud services from American providers.[7] The authors put the sum at 1.5% of European GDP, comparable to the EU’s total energy-import bill, estimated at €376bn in 2024.[8]

Open Weights, Open Doors

Open-weights LLMs offer a compelling alternative. OpenAI and Anthropic keep their models as closely guarded secrets: the models’ weights—billions of parameters that determine how neural networks process and generate text—are not public. Open-weights models, by contrast, can be downloaded and run on hardware that the model’s maker does not control.

LLM providers release their open-weights models under varying terms and conditions. Meta, Facebook’s parent firm, initially restricted early versions of LLaMA to scientific or private purposes. The United Arab Emirates’ Technology Innovation Institute has released its Falcon models under permissive terms, enabling nearly unlimited commercial usage. Mistral has published open-weights models under both restrictive, academia-focussed licences and laissez-faire ones.

Local deployment of open-weights models enables organisations to maintain data within their own infrastructure while retaining control over model versions and updates. Leading open-weights models like Meta’s Llama 4 perform competitively in many benchmarks, though deployment requires additional technical expertise and infrastructure investment.

Organisations can leverage powerful hosted models for development and prompt optimisation, then deploy these refined workflows on local infrastructure for production scenarios where data sovereignty is critical. This approach balances competitive performance with regulatory compliance and strategic control.

The availability of models’ weights enables new usage scenarios that are not possible in the case of hosted alternatives. Consider three aspects: the ability to host the model on the hardware of your choice; amenability to customisation and specialisation to particular business needs; and full control over the flow of sensitive data the models process.

Hosting AI on Your Terms

Start with the flexibility of hosting. Open-weights models, unlike, say, the hosted alternatives provided by OpenAI, are compatible with both enterprise servers and commodity laptops. The hardware platform must meet certain requirements, e.g. GPUs with a sufficient amount of memory. Once those are satisfied, firms are free to run models on the platforms of their choice, subject to the licences that govern the models’ usage.

That enables a wide choice of environments in which businesses might install their AI tools, ranging from servers—whether in the cloud or in the cellars of corporate headquarters—to developers’ workstations and laptops. Recent hardware advances—e.g. Apple’s ARM-based systems—and improvements in memory efficiency thanks to techniques such as quantisation—i.e. compressing the models’ weights—make LLMs compatible with a wide spectrum of platforms.
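
How little ceremony such a deployment requires can be shown in a few lines of Python. The sketch below uses the open-source llama-cpp-python bindings; the model file, its quantisation level, and the parameter values are illustrative assumptions rather than recommendations:

    from llama_cpp import Llama

    # Load a quantised open-weights model from local disk; the file name and
    # quantisation level (Q4_K_M) are placeholders for whichever GGUF-format
    # model your licence permits.
    llm = Llama(
        model_path="./models/mistral-7b-instruct.Q4_K_M.gguf",
        n_ctx=4096,       # context window; larger values need more memory
        n_gpu_layers=-1,  # offload all layers to a GPU, if one is present
    )

    reply = llm.create_chat_completion(
        messages=[{"role": "user",
                   "content": "Summarise our Q3 roadmap in three bullet points."}],
        max_tokens=256,
    )
    print(reply["choices"][0]["message"]["content"])  # nothing has left the machine

Any open-weights model distributed in the GGUF format, under a licence that permits the intended use, can stand in for the file named here.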

Now consider the second aspect. Open-weights models are amenable to “fine-tuning”, a process that enables their users to align them further with particular needs. Tuning can customise a model to enhance it with domain-specific specialist knowledge, adjust it to cultural norms or corporate nomenclature, and embed business-specific facts or data. While more expensive than modifying prompts, fine-tuning can lead to models specialised for the needs of particular business domains. Furthermore, it might address risks caused by biases embedded in models by their creators (see Robert Glaser’s article in this issue).
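
For a rough sense of what fine-tuning involves in practice, the sketch below applies LoRA, a parameter-efficient technique that trains small adapter matrices while the base weights stay frozen, using the Hugging Face transformers, datasets, and peft libraries. The model name, the one-line corpus, and the hyperparameters are assumptions chosen purely for illustration:

    from datasets import Dataset
    from peft import LoraConfig, get_peft_model
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    base = "mistralai/Mistral-7B-v0.1"  # assumption: any open-weights causal LM
    tokenizer = AutoTokenizer.from_pretrained(base)
    tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained(base)

    # Attach small trainable LoRA adapters; the billions of base weights stay
    # frozen, which keeps memory and compute demands modest.
    model = get_peft_model(model, LoraConfig(
        r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"],
        task_type="CAUSAL_LM"))

    # A one-line toy corpus standing in for corporate nomenclature or documents.
    corpus = Dataset.from_dict(
        {"text": ["Internally, the billing system is code-named 'Aquila'."]})
    tokenized = corpus.map(
        lambda row: tokenizer(row["text"], truncation=True, max_length=512),
        remove_columns=["text"])

    Trainer(
        model=model,
        args=TrainingArguments(output_dir="tuned",
                               per_device_train_batch_size=1,
                               num_train_epochs=1),
        train_dataset=tokenized,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    ).train()
    model.save_pretrained("tuned/lora-adapter")  # adapter weights stay on-premises

Because only the small adapter is trained and saved, hardware demands stay modest, and neither the corpus nor the resulting weights ever leave the premises.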

Thirdly, running open-weights LLMs locally eliminates an entire dimension of risk involved with hosted AI tools. All data, be it intellectual property, personally identifiable information, or financial records, remain in the corporate network and on corporate hardware throughout the operation. In the case of LLMs running on software engineers’ devices, the entire interaction with the AI tool happens on the device and no data leave the workstation. That strengthens the case for local LLMs in the light of constraints set by GDPR and the complementary EU AI Act.[9] The May 2025 court order requiring OpenAI to preserve user conversations—despite privacy laws and user deletion requests—illustrates the jurisdictional risk that local deployment eliminates.

Users in fields with strict compliance requirements—e.g. DORA or NIS-2—might find the air-gapped quality of local LLMs compelling. Precise control over deployment, updates, and versioning of both the models and the supplementary data sets used for tuning enables stricter monitoring, control, and auditing. That delivers compliance without sacrificing the ability to innovate and experiment.

Moreover, the usage of local LLMs changes the cost structure. Instead of paying perpetual fees for hosted models, businesses deploying their own AI tools can run them on their own, appropriately sized hardware. In the case of developer workstations, laptops with the right capabilities might already be in the inventory. Early adopters will have time to develop in-house expertise in running and potentially fine-tuning their LLMs, reducing the risk of lock-in to providers of hosted solutions.

The True Cost of Going Local

Locally deployed models do come with a new set of costs that prospective users must take into account. While perpetual fees shrink, the upfront expense for hardware in the case of in-house deployment, along with the cost of training qualified staff to operate and maintain the new technology, piles up quickly.

As of May 2025, Apple laptops with ARM chips powerful enough to run local models start around €2,000. NVIDIA GPUs that can be fitted into software developers’ workstations start at €1,500. Hourly rental of a top-tier NVIDIA H100 GPU runs €2–7, adding up to ca. €1,400–5,000 per month of continuous usage. High demand has sustained the retail unit price at €20,000–30,000. Buyers also need to take into account operational expenses around energy and maintenance.
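
A back-of-the-envelope check of those rental figures, using only the ranges quoted above:

    # Continuous monthly usage of a rented H100 at the quoted EUR 2-7 per hour.
    hours_per_month = 24 * 30  # ca. 720 hours
    low, high = 2, 7           # EUR per H100-hour
    print(hours_per_month * low, hours_per_month * high)  # 1440 5040 -> ca. EUR 1,400-5,000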

Instead of going for one of the extremes, firms will benefit from exploring deployments that combine the strengths of on-site hardware and rented capacity. A shared cluster of smaller GPUs and laptops will likely provide sufficient processing power for typical daily demands. Expensive computational tasks, such as the periodic fine-tuning of new models, are likely best executed on a powerful chip rented from a domestic provider. This saves costs by eliminating underutilised resources.

Models hosted on-premises require qualified and sought-after employees to configure, maintain, and troubleshoot the deployed LLMs. On top of that, businesses must incorporate the costs of IT security and of training the staff who will interact with the models. Firms looking to hedge that risk can enter partnerships that allow them to tap into the resources of experts in the field. Consider BNP Paribas, a bank, which entered an agreement with Mistral to help introduce LLM tools into the strictly regulated financial domain[10], or GovTech Campus Deutschland, a nonprofit, which collaborates with tech partners to build AI platforms in the state of Baden-Württemberg.[11]

A Three-Step Plan for Digital Sovereignty

The OpenAI preservation order of May 2025 demonstrates that digital sovereignty concerns are not theoretical—they represent immediate business risks that European firms can no longer afford to ignore. Forward-thinking organisations should adopt a three-step plan. Start with low-risk pilots that deliver immediate value. Build expertise through broader experimentation. Finally, turn local AI into competitive advantage.

Pilot projects should focus on proven tools and uncontroversial problems. Automated meeting assistance, for example, requires about €5,000 in hardware while delivering immediate productivity gains. Tools such as whisper.cpp allow voice-to-text transcription of the proceedings. A local LLM can consume the transcribed minutes and, on the fly, turn them into summaries, extract action items, and draft agendas for future appointments. All the processing happens in the room where the meeting takes place; no data leave it.
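
A minimal sketch of such a pipeline in Python, assuming a local whisper.cpp build and a model served through llama-cpp-python; the binary name, model files, and paths are placeholders for your own setup:

    import subprocess
    from llama_cpp import Llama

    # 1) Transcribe the recording with whisper.cpp; the binary name varies by
    #    build ("main" in older versions, "whisper-cli" in newer ones).
    #    -otxt writes the transcript to meeting.wav.txt next to the input.
    subprocess.run(["./whisper-cli", "-m", "models/ggml-base.en.bin",
                    "-f", "meeting.wav", "-otxt"], check=True)
    transcript = open("meeting.wav.txt", encoding="utf-8").read()

    # 2) Turn the transcript into minutes with a local open-weights model.
    llm = Llama(model_path="./models/mistral-7b-instruct.Q4_K_M.gguf", n_ctx=8192)
    reply = llm.create_chat_completion(messages=[{
        "role": "user",
        "content": "Summarise this meeting and list the action items:\n" + transcript,
    }])
    print(reply["choices"][0]["message"]["content"])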

Success opens doors to bigger challenges. Think about problems LLMs have been shown to excel at, such as document summarisation and identifying semantic similarity. Consider methods such as retrieval-augmented generation, which allows your LLM to pull relevant information from other sources, e.g. your corporate wiki, proprietary technical documentation, or CRM system; a sketch follows below. Use those capabilities to build automated pipelines for processing incoming correspondence and documentation for archiving and search purposes. Consider how an LLM could accelerate responses to RFPs, taking advantage of your existing database of bids and tenders, all the while keeping the sensitive data in-house. Encourage your teams to experiment with various models and quantisation levels.
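
To make retrieval-augmented generation concrete, the sketch below embeds a handful of documents, retrieves the best match for a question, and hands it to a local model as context. The library choices (sentence-transformers, llama-cpp-python), the toy documents, and the model paths are assumptions; a production system would typically use a vector database:

    import numpy as np
    from llama_cpp import Llama
    from sentence_transformers import SentenceTransformer

    embedder = SentenceTransformer("all-MiniLM-L6-v2")  # small local embedding model
    docs = [
        "Wiki: customer invoices are archived under /finance/2024.",
        "CRM note: Acme Corp renewed its support contract in March 2024.",
    ]
    doc_vecs = embedder.encode(docs, normalize_embeddings=True)

    def retrieve(query, k=1):
        """Return the k documents whose embeddings best match the query."""
        q = embedder.encode([query], normalize_embeddings=True)[0]
        scores = doc_vecs @ q  # cosine similarity, as vectors are normalised
        return [docs[i] for i in np.argsort(scores)[::-1][:k]]

    llm = Llama(model_path="./models/mistral-7b-instruct.Q4_K_M.gguf", n_ctx=4096)
    question = "When did Acme Corp renew its contract?"
    context = "\n".join(retrieve(question))
    reply = llm.create_chat_completion(messages=[{
        "role": "user",
        "content": f"Context:\n{context}\n\nQuestion: {question}",
    }])
    print(reply["choices"][0]["message"]["content"])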

Proven technology becomes strategic leverage. At this stage, your teams should feel confident enough to reach for cutting-edge tools. That allows your organisation to take on more ambitious challenges and use LLMs to unlock decades of institutional knowledge. An air-gapped due-diligence AI tool will help you navigate the risks of M&A or regulatory filing preparations while keeping the documentation in your secure infrastructure. Introduce an onboarding assistant that helps new software developers navigate the complex history and architecture of your enterprise software portfolio. Deploy a local LLM fine-tuned on GDPR, DORA, NIS-2, and internal compliance documentation to proactively address regulatory requirements without exposing sensitive information to external providers.

Conclusion: Beyond Compliance

Move decisively: weeks for pilots, months for expansion, quarters for transformation. Encourage early mistakes—they teach valuable lessons about model performance and organisational readiness. Increase your maturity and stability expectations as the transformation progresses. Conclude each step with a review of data governance implications and ROI metrics, e.g. time saved on meeting administration, manual document processing, and onboarding new employees. In an era where data determine competitive advantage, organisations that master local AI deployment go beyond regulatory compliance—they take control of their digital future.

  1. Martin Steiger, Court Order: OpenAI May No Longer Delete User Conversations with ChatGPT, 18 May 2025.

  2. Europe's Cloud Customers Eyeing Exit from US Hyperscalers, The Register, 17 April 2025.

  3. European Data Protection Board, International Data Transfers, accessed May 2025.

  4. Mark Bergen, The Rise and Pivot of Germany's One-Time AI Champion, Bloomberg, 5 September 2024.

  5. Directorate-General for Energy, Commission Adopts EU-Wide Scheme for the Assessment of the Sustainability of Data Centers, 15 March 2024.

  6. Will OpenAI Ever Make Real Money?, The Economist, 15 May 2025.

  7. Asterès, Technological Dependence on US Cloud Software: An Estimate of the Economic Consequences in Europe, April 2025.

  8. Eurostat, Imports of Energy Products into the EU Declined in 2024, 21 March 2025.

  9. Bommasani et al., Foundation Models under the EU AI Act, Stanford Center for Research on Foundation Models, 2024.

  10. BNP Paribas and Mistral AI, Partnership Agreement Covering All Mistral AI Models, 10 July 2024.

  11. GovTech Campus Germany, STACKIT and Aleph Alpha Create a Platform for AI Applications for the German Administration, 25 July 2024.