Introduction to LLMs Fine-tuning

[May 29] Init doc

Available LLM Fine-Tuning Frameworks#

LLaMA-Factory
xtuner
unsloth

Brief Introduction#

Llama_Factory offers the most fine-tuning methods, with many from the latest academic papers, including LongLora, etc.; the latest framework includes Unsloth.
Xtuner provides relatively rich documentation and many optimization techniques, but the fine-tuning technology is somewhat limited, offering only basic Lora and QLora.
Unsloth offers decent documentation but also provides only a small number of fine-tuning options.
If your needs are simple, such as fine-tuning a short dialogue instruction dataset (like alpaca) on a general model like Llama3 with 24G of GPU memory, any of the above libraries can be used.

General Steps#

Creating the Dataset#

Datasets can generally be divided into two types based on format: alpaca and sharegpt.
According to fine-tuning type, they can be divided into Supervised Fine-Tuning Dataset and Pretraining Dataset, with the former used for instruction fine-tuning dialogue purposes and the latter for incremental pre-training.
For methods of creating datasets, you can refer to LLaMA-Factory/data/README.md at main · hiyouga/LLaMA-Factory.

Choosing Fine-Tuning Techniques#

The most basic fine-tuning method is Lora; if you want to use less GPU memory, you can use QLora, where Q means Quantized.
If there are long sequence requirements but only limited GPU memory, consider Unsloth + Flash Attention 2.
Llama_factory offers a wide variety of fine-tuning techniques to choose from.

Following the Framework's Documentation#

Common Fine-Tuning Techniques#

RoPE Scaling
- It supports fine-tuning of arbitrary lengths; for example, Llama3 is pre-trained only at 8K length, but it can be fine-tuned at any length using this.
FlashAttention
- Reduces training time and GPU memory usage.

Solutions to encountered problems:

Google

Issues in the repo