Compile llama.cpp
The following three steps depend on having
llama.cpp
compiled.
bash
git clone https://github.com/ggerganov/llama.cpp --depth 1
cd llama.cpp
mkdir build && cd build
cmake .. -DLLAMA_BUILD_EXAMPLES=ON -DLLAMA_NATIVE=ON
cmake --build . --config Release
Convert HuggingFace Weights to GGUF
Command format:
bash
python3 ./convert_hf_to_gguf.py <HF_model_path> --outfile <output_GGUF_path> --outtype <precision_type>
<HF_model_path>
: Path to the HuggingFace format model directory (usually after fine-tuning or downloading).<output_GGUF_path>
: Path where the converted.gguf
model will be saved.<precision_type>
: Precision type —'f32'
,'f16'
,'bf16'
,'q8_0'
,'tq1_0'
,'tq2_0'
,'auto'
.
Example:
bash
python3 convert_hf_to_gguf.py /root/autodl-tmp/finetune/models/qwen3-8b-qlora/merged --outfile /root/autodl-fs/qwen3-8b-fp16-agent.gguf --outtype f16
Quantize the Model
Command format:
bash
./build/bin/llama-quantize <input_GGUF_path> <output_GGUF_path> <quantization_level>
<input_GGUF_path>
: Path to the unquantized.gguf
file.<output_GGUF_path>
: Path where the quantized.gguf
file will be saved.<quantization_level>
: For exampleQ4_0
,Q4_K_M
,Q8_0
, etc., depending on your needs and hardware.
Example:
bash
./build/bin/llama-quantize \
/root/autodl-fs/qwen3-8b-fp16-agent.gguf \
/root/autodl-fs/qwen3-8b-q8_0-agent.gguf \
Q8_0
7. Run Model Test
Command format:
bash
./build/bin/llama-run <GGUF_model_path>
<GGUF_model_path>
: Path to the GGUF model you want to test (either original or quantized).
Example:
bash
./build/bin/llama-run /root/autodl-fs/qwen3-8b-fp16-agent.gguf
8. High-Speed File Download from Server
You can download directly from your server provider’s storage without powering on (saving costs).
Or, using lftp
:
Command format:
bash
lftp -u {username},{password} -p {port} sftp://{server_address} -e "set xfer:clobber true; pget -n {threads} {server_file_path} -o {local_file_name_or_path}; bye"
pget
: Enables parallel download.-n
: Number of threads (recommend 64+, even 256 for better performance).
Example:
bash
lftp -u root,askdjiwhakjd -p 27391 sftp://yourserver.com -e "set xfer:clobber true; pget -n 256 /root/autodl-fs/qwen3-8b-fp16-agent.gguf -o qwen3-8b-fp16-agent.gguf; bye"