
friendli model convert


friendli model convert [OPTIONS]


Convert a Hugging Face model checkpoint to the Friendli format.

A checkpoint in the Hugging Face format cannot be served directly; it must first be converted to the Friendli format. The conversion process copies the original checkpoint and transforms it into a checkpoint in the Friendli format (*.h5).


The friendli model convert command is available only when the package is installed with pip install "friendli-client[mllib]".
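
For example, a basic conversion might look like the following. The model name and output path are illustrative, not prescriptive; any supported Hugging Face checkpoint name or local path works:

```sh
# Install the package with the mllib extras, then convert a checkpoint.
# meta-llama/Llama-2-7b-hf is used purely as an example model name.
pip install "friendli-client[mllib]"
friendli model convert \
  --model-name-or-path meta-llama/Llama-2-7b-hf \
  --output-dir ./llama-2-7b-friendli \
  --data-type fp16
```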

Apply quantization

To quantize the model as part of the conversion, pass the --quantize option. You can customize the quantization by describing the configuration in a YAML file and passing its path to the --quant-config-file option. When --quantize is used without --quant-config-file, the following default configuration is applied.

# Default quantization configuration
mode: awq
device: cuda:0
seed: 42
offload: true
calibration_dataset:
  path_or_name: lambada
  format: json
  split: validation
  lookup_column_name: text
  num_samples: 128
  max_length: 512
awq_args:
  quant_bit: 4
  quant_group_size: 64

  • mode: Quantization scheme to apply. Defaults to "awq".
  • device: Device to run the quantization process. Defaults to "cuda:0".
  • seed: Random seed. Defaults to 42.
  • offload: When enabled, this option significantly reduces GPU memory usage by offloading model layers onto CPU RAM. Defaults to true.
  • calibration_dataset
    • path_or_name: Path or name of the dataset. Datasets from either the Hugging Face Datasets Hub or local file system can be used. Defaults to "lambada".
    • format: Format of datasets. Defaults to "json".
    • split: Which split of the data to load. Defaults to "validation".
    • lookup_column_name: The name of a column in the dataset to be used as calibration inputs. Defaults to "text".
    • num_samples: The number of dataset samples to use for calibration. Note that the dataset will be shuffled before sampling. Defaults to 128.
    • max_length: The maximum length of a calibration input sequence. Defaults to 512.
  • awq_args (Fill in this field only for "awq" mode)
    • quant_bit: Bit width of the integers used to represent weights. Possible values are 4 or 8. Defaults to 4.
    • quant_group_size: Group size of quantized matrices. 64 is the only supported value at this time. Defaults to 64.
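
As a sketch, a custom configuration passed via --quant-config-file only needs to set the fields you want to change from the defaults above. The local dataset path and sample count below are illustrative:

```yaml
# Example custom quantization configuration (values are illustrative)
mode: awq
device: cuda:0
seed: 42
offload: true                      # offload layers to CPU RAM to reduce GPU memory
calibration_dataset:
  path_or_name: ./calibration.jsonl  # hypothetical local dataset file
  format: json
  split: train
  lookup_column_name: text
  num_samples: 256
  max_length: 512
awq_args:
  quant_bit: 4
  quant_group_size: 64             # 64 is the only supported value at this time
```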

If you encounter OOM issues when running with AWQ, try enabling the offload option.


If you set percentile to 100 in the quantization configuration file, the quantization range will be determined by the maximum absolute values of the activation tensors.


Currently, AWQ is the only supported quantization scheme.


AWQ is supported only for models whose architecture is one of the following:

  • GPTNeoXForCausalLM
  • GPTJForCausalLM
  • LlamaForCausalLM
  • MPTForCausalLM


| Option | Type | Description | Default |
| --- | --- | --- | --- |
| --model-name-or-path, -m | TEXT | Hugging Face pretrained model name or path to the saved model checkpoint. | - |
| --output-dir, -o | TEXT | Directory path to save the converted checkpoint and related configuration files. Three files are created in the directory: model.h5 (or model.safetensors), tokenizer.json, and attr.yaml. The model.h5 or model.safetensors file is the converted checkpoint; its name can be changed with the --output-model-filename option. The tokenizer.json file is the Friendli-compatible tokenizer file, which should be uploaded along with the checkpoint file to tokenize the model input and output. The attr.yaml file is the checkpoint attribute file, used when uploading the converted model to Friendli; its name can be set with the --output-attr-filename option. | - |
| --data-type, -dt | CHOICE: [bf16, fp16, fp32, int8, int4] | The data type of the converted checkpoint. | - |
| --cache-dir | TEXT | Directory for downloading the checkpoint. | None |
| --dry-run | BOOLEAN | Only check conversion availability. | False |
| --output-model-filename | TEXT | Name of the converted checkpoint file. The default is model.h5 when --output-ckpt-file-type is hdf5, or model.safetensors when it is safetensors. | None |
| --output-ckpt-file-type | CHOICE: [hdf5, safetensors] | File format of the converted checkpoint file. | hdf5 |
| --output-attr-filename | TEXT | Name of the checkpoint attribute file. | attr.yaml |
| --quantize | BOOLEAN | Quantize the model before conversion. | False |
| --quant-config-file | FILENAME | Path to the quantization configuration file. | None |
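
For instance, to check conversion availability first and then emit a safetensors checkpoint, the options above could be combined as follows (the local checkpoint path is illustrative):

```sh
# Check whether the conversion is possible without writing any files.
friendli model convert \
  --model-name-or-path ./my-local-checkpoint \
  --output-dir ./converted \
  --data-type bf16 \
  --dry-run

# Convert for real, writing model.safetensors instead of model.h5.
friendli model convert \
  --model-name-or-path ./my-local-checkpoint \
  --output-dir ./converted \
  --data-type bf16 \
  --output-ckpt-file-type safetensors
```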