📄️ OpenAI
LiteLLM supports OpenAI Chat + Embedding calls.
📄️ OpenAI (Text Completion)
LiteLLM supports OpenAI text completion models
📄️ OpenAI-Compatible Endpoints
To call models hosted behind an OpenAI proxy, make 2 changes:
📄️ Azure OpenAI
API Keys, Params
📄️ Azure AI Studio
LiteLLM supports all models on Azure AI Studio
📄️ VertexAI [Anthropic, Gemini, Model Garden]
vertex_ai/ route
📄️ Gemini - Google AI Studio
Pre-requisites
📄️ Anthropic
LiteLLM supports all Anthropic models.
📄️ AWS Sagemaker
LiteLLM supports all SageMaker Hugging Face JumpStart models
📄️ AWS Bedrock
Anthropic, Amazon Titan, and AI21 LLMs are supported on Bedrock
📄️ LiteLLM Proxy (LLM Gateway)
LiteLLM provides a self-hosted proxy server (AI Gateway) to call all the LLMs in the OpenAI format
📄️ Mistral AI API
https://docs.mistral.ai/api/
📄️ Codestral API [Mistral AI]
Codestral is available in select code-completion plugins but can also be queried directly. See the documentation for more details.
📄️ Cohere
API KEYS
📄️ Anyscale
https://app.endpoints.anyscale.com/
📄️ Huggingface
LiteLLM supports the following types of Hugging Face models:
📄️ 🆕 Databricks
LiteLLM supports all models on Databricks
📄️ IBM watsonx.ai
LiteLLM supports all IBM watsonx.ai foundation models and embeddings.
📄️ Predibase
LiteLLM supports all models on Predibase
📄️ Nvidia NIM
https://docs.api.nvidia.com/nim/reference/
📄️ Cerebras
https://inference-docs.cerebras.ai/api-reference/chat-completions
📄️ Volcano Engine (Volcengine)
https://www.volcengine.com/docs/82379/1263482
📄️ Triton Inference Server
LiteLLM supports Embedding Models on Triton Inference Servers
📄️ Ollama
LiteLLM supports all models from Ollama
📄️ Perplexity AI (pplx-api)
https://www.perplexity.ai
📄️ FriendliAI
https://suite.friendli.ai/
📄️ Groq
https://groq.com/
📄️ 🆕 Github
https://github.com/marketplace/models
📄️ Deepseek
https://deepseek.com/
📄️ Fireworks AI
https://fireworks.ai/
📄️ Clarifai
Anthropic, OpenAI, Mistral, Llama, and Gemini LLMs are supported on Clarifai.
📄️ VLLM
LiteLLM supports all models on VLLM.
📄️ Xinference [Xorbits Inference]
https://inference.readthedocs.io/en/latest/index.html
📄️ Cloudflare Workers AI
https://developers.cloudflare.com/workers-ai/models/text-generation/
📄️ DeepInfra
https://deepinfra.com/
📄️ AI21
LiteLLM supports j2-light, j2-mid and j2-ultra from AI21
📄️ NLP Cloud
LiteLLM supports all LLMs on NLP Cloud.
📄️ Replicate
LiteLLM supports all models on Replicate
📄️ Together AI
LiteLLM supports all models on Together AI.
📄️ Voyage AI
https://docs.voyageai.com/embeddings/
📄️ Aleph Alpha
LiteLLM supports all models from Aleph Alpha.
📄️ Baseten
LiteLLM supports any Text-Gen-Interface models on Baseten.
📄️ OpenRouter
LiteLLM supports all the text / chat / vision models from OpenRouter
📄️ PaLM API - Google
Warning: The PaLM API is being decommissioned by Google. It is scheduled to be decommissioned in October 2024. Please upgrade to the Gemini API or the Vertex AI API.
📄️ Custom API Server (Custom Format)
Call your custom torch-serve / internal LLM APIs via LiteLLM
📄️ Petals
Petals: https://github.com/bigscience-workshop/petals