Skip to content

AI

Our LLMaaS Gateway (Large Language Models as a Service) provides high-performance access to a curated selection of current open-weight language models. Inference runs entirely on our Swiss-hosted GPU infrastructure — your prompts, embeddings, and generated responses never leave Switzerland.

Available Models

Currently available in production via the gateway:

Top Models

  • MiniMax-M2.7
  • Deepseek v3.2
  • Qwen3.6-35B-A3B
  • Gemma4

Other available Models

  • apertus-70b
  • apertus-8b
  • bge-reranker
  • deepseekr1-670b
  • gpt-oss-120b
  • kimi-k2
  • llama4-maverick
  • qwen3-vl-235b
  • qwen3-embedding-4b
  • qwen3-reranker-4b
  • voxtral-4b-tts-2603
  • whisper-large-v3-turbo

More top-tier models are in the evaluation phase and will be added soon. All models are addressed using the same provider/model format (e.g. ew/minimax27), so switching models is typically a one-line change.

OpenAI-Compatible API

The gateway exposes an OpenAI-compatible REST interface — existing code using the OpenAI SDK (Python, Node, Go, …) can be pointed at our endpoints with no changes to application logic:

  • POST /v1/chat/completions — chat and reasoning requests, including streaming and tool calling
  • POST /v1/embeddings — vector embeddings for RAG, semantic search, classification
  • POST /v1/rerank — re-ranking of search results for higher hit quality
  • GET /v1/models — list of all currently available models

→ Full interface specification in the API Reference.

Virtual Keys & Governance

The gateway supports virtual keys (prefix sk-bf-...) for fine-grained access control, model routing, and per-team / per-project / per-use-case usage tracking. Self-service management of virtual keys will soon be available in the Cloud Service Portal — until then, keys are issued on request through our support team.

Typical Use Cases

  • RAG pipelines — document search with embeddings + rerank, context-aware answer generation
  • Code assistance — internal developer tooling, code review, and refactoring suggestions
  • Classification & extraction — structured data extraction from emails, reports, tickets
  • Agents & automation — tool-calling-enabled workflows with controlled write access
  • Multilingual content — translation and localisation with a focus on German-speaking markets

Early Adopter Access

Would you like to evaluate LLMaaS now for internal pilot projects? The gateway is currently being opened gradually to selected early adopters.

Request access

Contact our support team to receive credentials, an API key, and tailored model recommendations for your use case.