AI

Our LLMaaS Gateway (Large Language Models as a Service) provides high-performance access to a curated selection of current open-weight language models. Inference runs entirely on our Swiss-hosted GPU infrastructure — your prompts, embeddings, and generated responses never leave Switzerland.

Available Models

Currently available in production via the gateway:

Top Models

MiniMax-M2.7
Deepseek v3.2
Qwen3.6-35B-A3B
Gemma4

Other available Models

apertus-70b
apertus-8b
bge-reranker
deepseekr1-670b
gpt-oss-120b
kimi-k2
llama4-maverick
qwen3-vl-235b
qwen3-embedding-4b
qwen3-reranker-4b
voxtral-4b-tts-2603
whisper-large-v3-turbo

More top-tier models are in the evaluation phase and will be added soon. All models are addressed using the same provider/model format (e.g. ew/minimax27), so switching models is typically a one-line change.

OpenAI-Compatible API

The gateway exposes an OpenAI-compatible REST interface — existing code using the OpenAI SDK (Python, Node, Go, …) can be pointed at our endpoints with no changes to application logic:

POST /v1/chat/completions — chat and reasoning requests, including streaming and tool calling
POST /v1/embeddings — vector embeddings for RAG, semantic search, classification
POST /v1/rerank — re-ranking of search results for higher hit quality
GET /v1/models — list of all currently available models

→ Full interface specification in the API Reference.

Virtual Keys & Governance

The gateway supports virtual keys (prefix sk-bf-...) for fine-grained access control, model routing, and per-team / per-project / per-use-case usage tracking. Self-service management of virtual keys will soon be available in the Cloud Service Portal — until then, keys are issued on request through our support team.

Typical Use Cases

RAG pipelines — document search with embeddings + rerank, context-aware answer generation
Code assistance — internal developer tooling, code review, and refactoring suggestions
Classification & extraction — structured data extraction from emails, reports, tickets
Agents & automation — tool-calling-enabled workflows with controlled write access
Multilingual content — translation and localisation with a focus on German-speaking markets

Early Adopter Access

Would you like to evaluate LLMaaS now for internal pilot projects? The gateway is currently being opened gradually to selected early adopters.

Request access

Contact our support team to receive credentials, an API key, and tailored model recommendations for your use case.