Note that the flash-attn library is installed in this image and the Qwen model will use it automatically.
However, if the host machine's GPU is not compatible with the library, an exception like the following is raised during training:

    FlashAttention only supports Ampere GPUs or newer.

So if the --flash_attn flag is not set, an additional patch to the Qwen model's config is needed to change the default value of use_flash_attn from "auto" to False.
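A minimal sketch of such a patch, assuming the model is loaded through transformers with trust_remote_code enabled (the checkpoint id "Qwen/Qwen-1_8B" is illustrative, and the flash_attn variable stands in for the --flash_attn flag):

    from transformers import AutoConfig, AutoModelForCausalLM

    flash_attn = False  # mirrors the absence of the --flash_attn flag

    config = AutoConfig.from_pretrained("Qwen/Qwen-1_8B", trust_remote_code=True)
    if getattr(config, "model_type", None) == "qwen":
        # Qwen's config ships with use_flash_attn="auto"; forcing it to False
        # avoids the exception above on pre-Ampere GPUs.
        setattr(config, "use_flash_attn", flash_attn)

    model = AutoModelForCausalLM.from_pretrained(
        "Qwen/Qwen-1_8B", config=config, trust_remote_code=True
    )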
Former-commit-id: e75407febdec086f2bdca723a7f69a92b3b1d63f
Modify the installation method of the extra Python library.
Utilize the host machine's shared memory to improve training performance.
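For example, the container can be granted a larger /dev/shm at startup (the size and image name below are illustrative; --shm-size is a standard docker run option):

    # Enlarge /dev/shm so PyTorch DataLoader workers can exchange tensors
    # through shared memory; <image> is a placeholder for this image's tag.
    docker run --gpus all --shm-size=16g <image>

Alternatively, --ipc=host shares the host's entire shared-memory segment with the container.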
Former-commit-id: 6a5693d11d065f6e75c8cdd8b5ed962eb520953c
Already tested with the Qwen-1.8B model and the alpaca_data_zh dataset. Some Python libraries were added to the Dockerfile in response to exception messages raised during the test procedure.
Former-commit-id: 3d911ae713b901d6680a9f9ac82569cc5878f820