[May 29] Init doc
Available LLM Fine-Tuning Frameworks#
- LLaMA-Factory
- xtuner
- unsloth
Brief Introduction#
- Llama_Factory offers the most fine-tuning methods, with many from the latest academic papers, including LongLora, etc.; the latest framework includes Unsloth.
- Xtuner provides relatively rich documentation and many optimization techniques, but the fine-tuning technology is somewhat limited, offering only basic Lora and QLora.
- Unsloth offers decent documentation but also provides only a small number of fine-tuning options.
- If your needs are simple, such as fine-tuning a short dialogue instruction dataset (like alpaca) on a general model like Llama3 with 24G of GPU memory, any of the above libraries can be used.
General Steps#
Creating the Dataset#
Datasets can generally be divided into two types based on format: alpaca and sharegpt.
According to fine-tuning type, they can be divided into Supervised Fine-Tuning Dataset and Pretraining Dataset, with the former used for instruction fine-tuning dialogue purposes and the latter for incremental pre-training.
For methods of creating datasets, you can refer to LLaMA-Factory/data/README.md at main · hiyouga/LLaMA-Factory.
Choosing Fine-Tuning Techniques#
The most basic fine-tuning method is Lora; if you want to use less GPU memory, you can use QLora, where Q means Quantized.
If there are long sequence requirements but only limited GPU memory, consider Unsloth + Flash Attention 2.
Llama_factory offers a wide variety of fine-tuning techniques to choose from.
Following the Framework's Documentation#
Common Fine-Tuning Techniques#
- RoPE Scaling
- It supports fine-tuning of arbitrary lengths; for example, Llama3 is pre-trained only at 8K length, but it can be fine-tuned at any length using this.
- FlashAttention
- Reduces training time and GPU memory usage.
Solutions to encountered problems:
- Issues in the repo