Project Introduction
This is a digital avatar project. The core idea is to use your QQ C2C chat records as a dataset to fine-tune a large language model, so that the model reproduces your personal expression style and chat patterns as faithfully as possible.
This project includes a complete tutorial, covering:
- Decrypting and processing QQ databases
- Cleaning and converting chat data
- QLoRA fine-tuning workflow
- Testing and using fine-tuned models
- Accelerating training with Unsloth! (a minimal fine-tuning sketch follows this list)
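The repository's own scripts drive the actual workflow; the snippet below is only a minimal sketch of what QLoRA fine-tuning with Unsloth on a Qwen3 base model generally looks like. The model name, dataset path, and hyperparameters are illustrative assumptions rather than this project's real configuration, and newer `trl` versions may expect these arguments in an `SFTConfig` instead of `TrainingArguments`.

```python
# Minimal QLoRA + Unsloth sketch (illustrative only; model name, dataset path,
# and hyperparameters are assumptions, not this repo's actual settings).
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Load the base model in 4-bit (the "Q" in QLoRA): the frozen weights are
# quantized, and only small LoRA adapter matrices get trained.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen3-8B",   # assumption: any Unsloth-supported Qwen3 checkpoint
    max_seq_length=2048,
    load_in_4bit=True,
)

# Attach LoRA adapters to the attention and MLP projections.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

# Hypothetical dataset: a JSONL file whose "text" field already contains
# chat-template-formatted conversations produced by the data-cleaning step.
dataset = load_dataset("json", data_files="training_data.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        output_dir="outputs",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        num_train_epochs=3,
        learning_rate=2e-4,
        logging_steps=10,
        bf16=True,  # the 4090 supports bf16; use fp16 on older cards
    ),
)
trainer.train()
```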
I know there are already quite a few similar projects out there, but maybe my tutorial, workflow, and code implementation can offer you something different or spark new ideas. If you find it useful, feel free to give it a star — it’ll make me happy!
The project still has its shortcomings:
- I’m not sure what they are for now
- (If you run into issues, feel free to open an Issue)
- But it can already fine-tune Qwen3-8B at FP8 precision on a 24 GB 4090 (tested and working; a quick sanity-check sketch follows below)
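As a quick way to check that a fine-tune actually took, a sketch like the following loads the saved LoRA adapter and generates a single reply. The adapter directory, prompt, and generation settings here are assumptions; the project's own test script may do this differently.

```python
# Minimal sanity-check sketch for a fine-tuned adapter (paths and prompt are
# assumptions; the repo's own test script may differ).
from unsloth import FastLanguageModel

# Load the saved LoRA adapter on top of its 4-bit base model.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="outputs",        # assumption: directory where the adapter was saved
    max_seq_length=2048,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # switch Unsloth into its faster inference mode

# Ask one question and see whether the reply sounds like you.
messages = [{"role": "user", "content": "Are you free this weekend?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```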
If you also want to create your own digital persona, give it a try!
—— X: @qqqqqf5
Project Version
V 0.1.1
Good News
- This version of `Qlora_qwen3.py` has been tested on a 4090 (using `generate_training_data_llm.py` + `run_finetune.py`)
- Data cleaning has also been tested on the current version
TODO
- [Completed but untested] Add support for OSS models (and MXFP4? That needs 50-series compute, and I still can't test it myself)
Challenges:
- MoE models
- Non-Qwen series models
- My 3080 doesn't seem able to run local tests (for either fine-tuning or MXFP4). It's actually not that hard; I've just been busy with other projects these past few days.
Changelog
It's all in the commit messages; I don't feel like writing it out here.