
Extra: Fine-tuning the OpenAI OSS Model

This content has been tested on real hardware (a 32 GB vGPU)

If you find a bug, please open an issue or contact the author

Even better, submit a PR with the fix directly


Please Note

During testing, unsloth/gpt-oss-20b-unsloth-bnb-4bit appeared to fine-tune and merge successfully, but conversion to GGUF failed. Testing unsloth/gpt-oss-20b produced the error 'GptOssTopKRouter' object has no attribute 'weight'. This appears to be a widespread issue; many others have hit it during fine-tuning as well. Please give the Unsloth and OpenAI teams some time to fix it; once it is resolved, this document and the code will be updated immediately.


Fine-tuning the OSS Model

Because OSS was released only recently, its fine-tuning setup does not appear to be interchangeable with the Qwen one. It’s also best to use the latest versions of unsloth, torch, transformers, etc.

For reference, see Unsloth’s own write-up of their gpt-oss fine-tuning experience

Quick Fine-tuning Guide

It’s recommended to use a new virtual environment, kept separate from your Qwen fine-tuning environment

Please ensure you have torch>=2.8.0 and triton>=3.4.0, and make sure unsloth and unsloth_zoo are at their latest versions

The original requirements.txt only supports unsloth up to version 2025.8.1, which cannot fine-tune OSS.

Run the following before installing the dependencies:

bash
pip install "unsloth_zoo[base] @ git+https://github.com/unslothai/unsloth-zoo" "unsloth[base] @ git+https://github.com/unslothai/unsloth" torchvision bitsandbytes git+https://github.com/huggingface/transformers git+https://github.com/triton-lang/triton.git@main#subdirectory=python/triton_kernels

Install dependencies

bash
pip install -r requirements_oss.txt
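
After both install steps, you can verify that the versions meet the requirements above (torch>=2.8.0, triton>=3.4.0, and up-to-date unsloth and unsloth_zoo). A quick check, assuming the packages are installed under these exact names:

bash
python3 -c "import importlib.metadata as m; [print(p, m.version(p)) for p in ('torch', 'triton', 'transformers', 'unsloth', 'unsloth_zoo')]"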

Note: This model requires OpenAI Harmony format training data for fine-tuning

Use chatml_to_harmony.py to convert ChatML format training data to Harmony format:

bash
python3 chatml_to_harmony.py --input training_data.jsonl --output training_data_harmony.txt
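
If you don’t already have ChatML data on hand, here is a minimal sketch of what the converter’s input might look like; the field names ("messages", "role", "content") are an assumption and may differ from what chatml_to_harmony.py actually expects:

bash
# Hypothetical two-example ChatML file; adjust the schema to match your converter
cat > training_data.jsonl <<'EOF'
{"messages": [{"role": "user", "content": "What is LoRA?"}, {"role": "assistant", "content": "LoRA is a parameter-efficient fine-tuning method."}]}
{"messages": [{"role": "user", "content": "What does QLoRA add?"}, {"role": "assistant", "content": "QLoRA combines LoRA with 4-bit quantization of the base model."}]}
EOF
python3 chatml_to_harmony.py --input training_data.jsonl --output training_data_harmony.txt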

Download the model

bash
huggingface-cli download unsloth/gpt-oss-20b-BF16 --local-dir gpt-oss-20b

If you don’t have huggingface-cli, install it first:

bash
pip install huggingface-hub

If you need a mirror site, run:

bash
export HF_ENDPOINT=https://hf-mirror.com
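
The two steps can also be combined by setting the mirror endpoint only for the download command:

bash
HF_ENDPOINT=https://hf-mirror.com huggingface-cli download unsloth/gpt-oss-20b-BF16 --local-dir gpt-oss-20b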

Start Fine-tuning

bash
python3 run_finetune_oss.py

| Parameter | Type | Default Value | Optional Values | Description |
| --- | --- | --- | --- | --- |
| --repo_id | str | unsloth/gpt-oss-20b-unsloth-bnb-4bit | - | HF repository ID |
| --local_dir | str | gpt-oss-20b-unsloth-bnb-4bit | - | Local model directory |
| --use_unsloth | str | false | true, false | Whether to use unsloth |
| --use_qlora | str | true | true, false | Whether to use QLoRA |
| --data_path | str | training_data.jsonl | - | Training data path |
| --eval_data_path | str / None | None | - | Evaluation data path |
| --max_samples | str / None | None | - | Max number of training samples |
| --max_eval_samples | str / None | None | - | Max number of evaluation samples |
| --model_max_length | str | 2048 | - | Max sequence length |
| --output_dir | str | finetune/models/qwen3-30b-a3b-qlora | - | Output directory |
| --seed | str | 42 | - | Random seed |
| --per_device_train_batch_size | str | 1 | - | Training batch size per device |
| --per_device_eval_batch_size | str | 1 | - | Evaluation batch size per device |
| --gradient_accumulation_steps | str | 16 | - | Gradient accumulation steps |
| --learning_rate | str | 2e-4 | - | Learning rate |
| --num_train_epochs | str | 3 | - | Number of training epochs |
| --max_steps | str | -1 | - | Max steps (-1 for unlimited) |
| --lora_r | str | 16 | - | LoRA rank |
| --lora_alpha | str | 32 | - | LoRA alpha |
| --lora_dropout | str | 0.05 | - | LoRA dropout rate |
| --target_modules | str | q_proj,k_proj,v_proj,o_proj,gate_proj,up_proj,down_proj | - | LoRA target modules |
| --weight_decay | str | 0.0 | - | Weight decay |
| --moe_enable | str | false | true, false | Enable MoE |
| --moe_lora_scope | str | expert_only | expert_only, router_only, all | LoRA injection scope for MoE |
| --moe_expert_patterns | str | experts.ffn.(gate_proj\|up_proj\|down_proj),layers.[0-9]+.mlp.experts.[0-9]+.(w1\|w2\|w3) | - | Expert linear layer patterns (regex) |
| --moe_router_patterns | str | router.(gate\|dense) | - | Router/gating layer patterns (regex) |
| --moe_max_experts_lora | str | -1 | - | Max LoRA experts per layer |
| --moe_dry_run | str | false | true, false | Only print matched modules and exit |
| --load_precision | str | fp16 | int8, int4, fp16 | Model load precision |
| --use_flash_attention_2 | str | false | true, false | Enable FlashAttention2 |
| --logging_steps | str | 1 | - | Logging step interval |
| --eval_steps | str | 50 | - | Evaluation step interval |
| --save_steps | str | 200 | - | Model save step interval |
| --save_total_limit | str | 2 | - | Max saved model count |
| --warmup_ratio | str | 0.05 | - | Warmup ratio |
| --lr_scheduler_type | str | cosine | - | LR scheduler type |
| --resume_from_checkpoint | str / None | None | - | Resume from checkpoint |
| --no-gradient_checkpointing | flag | False | - | Disable gradient checkpointing |
| --no-merge_and_save | flag | False | - | Do not merge and save model |
| --fp16 | str | true | true, false | Use fp16 |
| --optim | str | adamw_torch_fused | - | Optimizer |
| --dataloader_pin_memory | str | false | true, false | Pin dataloader memory |
| --dataloader_num_workers | str | 0 | - | Number of dataloader workers |
| --dataloader_prefetch_factor | str | 2 | - | Dataloader prefetch factor |
| --use_gradient_checkpointing | str | true | true, false, unsloth | Gradient checkpointing setting |
| --full_finetuning | str | false | true, false | Enable full fine-tuning |
| --data_format | str | harmony | harmony, jsonl | Data format |

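If you plan to enable the MoE-specific options, --moe_dry_run can be used to preview which expert/router modules the patterns match before committing to a full run. A hedged example, reusing the data and local model directory from the example below together with the defaults from the table above:

bash
python3 run_finetune_oss.py --local_dir gpt-oss-20b-4bit --data_path ./harmony_small.txt --data_format harmony --moe_enable true --moe_lora_scope expert_only --moe_dry_run true
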
Below is an example for fine-tuning gpt-oss-20b-unsloth-bnb-4bit:

bash
python3 run_finetune_oss.py --output_dir /root/autodl-fs/gpt-oss-20b-unsloth-bnb-4bit --local_dir gpt-oss-20b-4bit --data_path ./harmony_small.txt --eval_data_path ./harmony_small_eval.txt --use_qlora true --lora_dropout 0.05 --num_train_epochs 8 --per_device_train_batch_size 4 --per_device_eval_batch_size 4 --gradient_accumulation_steps 8 --learning_rate 2e-5 --lr_scheduler_type cosine --logging_steps 5 --eval_steps 40 --save_steps 200 --warmup_ratio 0.05 --dataloader_num_workers 16 --fp16 true --use_unsloth true --no-gradient_checkpointing --dataloader_prefetch_factor 4 --load_precision int4 --data_format harmony
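
If a run is interrupted, --resume_from_checkpoint continues training from a saved checkpoint; the checkpoint path below is illustrative and depends on your output_dir and save_steps:

bash
python3 run_finetune_oss.py --output_dir /root/autodl-fs/gpt-oss-20b-unsloth-bnb-4bit --local_dir gpt-oss-20b-4bit --data_path ./harmony_small.txt --data_format harmony --use_qlora true --use_unsloth true --load_precision int4 --fp16 true --resume_from_checkpoint /root/autodl-fs/gpt-oss-20b-unsloth-bnb-4bit/checkpoint-200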