This guide walks you through running a gRPC inference server with Friendli Container and interacting with it through the friendli-client SDK.

Prerequisites

Install friendli-client to use the gRPC client SDK. The gRPC client requires version 1.4.1 or higher:

pip install "friendli-client>=1.4.1"
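One way to confirm the installed version is through the standard library's importlib.metadata (nothing here is specific to friendli-client beyond its package name):

from importlib.metadata import version

# Prints the installed friendli-client version; it should be 1.4.1 or higher.
print(version("friendli-client"))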

Starting the Friendli Container with gRPC

You can run the Friendli Container as a gRPC server for completions by adding the --grpc true option to the command arguments. The server supports response-streaming gRPC, and you can send requests using the friendli-client SDK. To start the Friendli Container with gRPC support, use the following command:

# Fill in the values of the following variables.
export HF_MODEL_NAME=""  # Hugging Face model name (e.g., "meta-llama/Meta-Llama-3-8B-Instruct")
export FRIENDLI_CONTAINER_SECRET=""  # Friendli container secret
export FRIENDLI_CONTAINER_IMAGE=""  # Friendli container image (e.g., "registry.friendli.ai/trial")
export GPU_ENUMERATION=""  # GPUs (e.g., '"device=0,1"')

docker run \
  --gpus $GPU_ENUMERATION \
  -p 8000:8000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  -e FRIENDLI_CONTAINER_SECRET=$FRIENDLI_CONTAINER_SECRET \
  $FRIENDLI_CONTAINER_IMAGE \
    --hf-model-name $HF_MODEL_NAME \
    --grpc true \
    [LAUNCH_OPTIONS]

You can change the server port with the --web-server-port argument; if you do, update the -p port mapping in the docker run command to match.

Sending Requests with the Client SDK

Here is how to use the friendli-client SDK to interact with the gRPC server. This example assumes that the gRPC server is running on 0.0.0.0:8000.
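The sketch below sends a streaming completion request with the synchronous client. The Friendli(base_url=..., use_grpc=True) constructor and the streaming completions.create call reflect the friendli-client API as of version 1.4.x; treat the exact parameter names as assumptions and verify them against your installed version.

from friendli import Friendli

# Connect to the gRPC server started above.
client = Friendli(base_url="0.0.0.0:8000", use_grpc=True)

# The gRPC server is response-streaming, so request a stream.
stream = client.completions.create(
    prompt="Explain what gRPC is.",
    stream=True,
)

# Print generated text chunks as they arrive.
for chunk in stream:
    print(chunk.text, end="", flush=True)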

Properly Closing the Client

By default, the library closes the underlying HTTP and gRPC connections when the client is garbage-collected. You can close the Friendli or AsyncFriendli client manually with the .close() method, or use a context manager to ensure the connection is closed when exiting a with block.
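As a sketch under the same API assumptions as the example above, both closing styles for the synchronous client look like this:

from friendli import Friendli

# Option 1: close the client explicitly once you are done with it.
client = Friendli(base_url="0.0.0.0:8000", use_grpc=True)
try:
    stream = client.completions.create(prompt="Hello!", stream=True)
    for chunk in stream:
        print(chunk.text, end="", flush=True)
finally:
    client.close()

# Option 2: a with block closes the connection automatically on exit.
with Friendli(base_url="0.0.0.0:8000", use_grpc=True) as client:
    stream = client.completions.create(prompt="Hello!", stream=True)
    for chunk in stream:
        print(chunk.text, end="", flush=True)

The asynchronous client offers the same guarantee through async with (again assuming the AsyncFriendli constructor mirrors the synchronous one):

import asyncio
from friendli import AsyncFriendli

client = AsyncFriendli(base_url="0.0.0.0:8000", use_grpc=True)

async def main():
    # Exiting the async context manager closes the gRPC connection.
    async with client:
        stream = await client.completions.create(prompt="Hello!", stream=True)
        async for chunk in stream:
            print(chunk.text, end="", flush=True)

asyncio.run(main())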