Mirror of https://github.com/hiyouga/LLaMA-Factory.git (synced 2025-11-04 18:02:19 +08:00)

Commit 728dfb1be7 (parent e49f7f1afe)
Former-commit-id: 26c6bfd21de06cc56be9a58e2ef69045ea70cc14
```diff
@@ -14,11 +14,11 @@
 ## Changelog
 
-[23/09/27] We supported **$S^2$-Attn** proposed by [LongLoRA](https://github.com/dvlab-research/LongLoRA). Try `--shift_attn` argument to enable shift short attention.
+[23/09/27] We supported **$S^2$-Attn** proposed by [LongLoRA](https://github.com/dvlab-research/LongLoRA) for the LLaMA models. Try `--shift_attn` argument to enable shift short attention.
 
 [23/09/23] We integrated MMLU, C-Eval and CMMLU benchmarks in this repo. See [this example](#evaluation) to evaluate your models.
 
-[23/09/10] We supported using **[FlashAttention](https://github.com/Dao-AILab/flash-attention)** for the LLaMA models. Try `--flash_attn` argument to enable FlashAttention-2 if you are using RTX4090, A100 or H100 GPUs.
+[23/09/10] We supported using **[FlashAttention-2](https://github.com/Dao-AILab/flash-attention)** for the LLaMA models. Try `--flash_attn` argument to enable FlashAttention-2 if you are using RTX4090, A100 or H100 GPUs.
 
 [23/08/18] We supported **resuming training**, upgrade `transformers` to `4.31.0` to enjoy this feature.
```
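For context, `--flash_attn` and `--shift_attn` are plain boolean command-line switches. A minimal, hypothetical launch sketch is shown below; the `src/train_bash.py` entry point, the model, the dataset, and all other arguments are assumptions for illustration and are not part of this diff.

```python
# Hypothetical launcher sketch: builds a fine-tuning command that enables the
# two flags announced in the changelog entries above. The entry point and all
# other arguments are assumptions for illustration only.
import subprocess

cmd = [
    "python", "src/train_bash.py",                        # assumed training entry point
    "--stage", "sft",                                     # assumed supervised fine-tuning stage
    "--model_name_or_path", "meta-llama/Llama-2-7b-hf",   # assumed LLaMA checkpoint
    "--dataset", "alpaca_gpt4_en",                        # assumed dataset name
    "--finetuning_type", "lora",
    "--output_dir", "outputs/llama2-7b-lora",
    "--flash_attn",                                       # enable FlashAttention-2 (RTX4090/A100/H100)
    "--shift_attn",                                       # enable S^2-Attn (shift short attention)
]
subprocess.run(cmd, check=True)
```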
```diff
@@ -14,11 +14,11 @@
 ## Changelog
 
-[23/09/27] We supported **$S^2$-Attn** proposed by [LongLoRA](https://github.com/dvlab-research/LongLoRA). Please use the `--shift_attn` argument to enable this feature.
+[23/09/27] We supported **$S^2$-Attn** proposed by [LongLoRA](https://github.com/dvlab-research/LongLoRA) for the LLaMA models. Please use the `--shift_attn` argument to enable this feature.
 
 [23/09/23] We integrated the MMLU, C-Eval and CMMLU benchmarks into this repo. See [this example](#模型评估) for usage.
 
-[23/09/10] We supported **[FlashAttention](https://github.com/Dao-AILab/flash-attention)** for the LLaMA models. If you are using RTX4090, A100 or H100 GPUs, please use the `--flash_attn` argument to enable FlashAttention-2 (experimental feature).
+[23/09/10] We supported **[FlashAttention-2](https://github.com/Dao-AILab/flash-attention)** for the LLaMA models. If you are using RTX4090, A100 or H100 GPUs, please use the `--flash_attn` argument to enable FlashAttention-2 (experimental feature).
 
 [23/08/18] We supported **resuming training**; please upgrade `transformers` to `4.31.0` to enable this feature.
```
```diff
@@ -160,4 +160,8 @@ class CMMLU(datasets.GeneratorBasedBuilder):
    def _generate_examples(self, filepath):
        df = pd.read_csv(filepath, header=0, index_col=0, encoding="utf-8")
        for i, instance in enumerate(df.to_dict(orient="records")):
            question = instance.pop("Question", "")
            answer = instance.pop("Answer", "")
            instance["question"] = question
            instance["answer"] = answer
            yield i, instance
```
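The loader above pops the capitalized `Question`/`Answer` columns and re-inserts them under lowercase keys, leaving the remaining columns (presumably the multiple-choice options) untouched. A self-contained sketch of the same record transformation, using invented sample rows rather than a real CMMLU CSV, behaves like this:

```python
# Standalone sketch of the record transformation done by _generate_examples above.
# The sample rows are invented for illustration; real CMMLU files have one row per
# question with choice columns plus "Question" and "Answer".
import pandas as pd

df = pd.DataFrame(
    [
        {"Question": "1 + 1 = ?", "A": "1", "B": "2", "C": "3", "D": "4", "Answer": "B"},
        {"Question": "2 * 3 = ?", "A": "5", "B": "6", "C": "7", "D": "8", "Answer": "B"},
    ]
)

for i, instance in enumerate(df.to_dict(orient="records")):
    # Same renaming as the loader: "Question"/"Answer" -> "question"/"answer".
    instance["question"] = instance.pop("Question", "")
    instance["answer"] = instance.pop("Answer", "")
    print(i, instance)
# 0 {'A': '1', 'B': '2', 'C': '3', 'D': '4', 'question': '1 + 1 = ?', 'answer': 'B'}
# 1 {'A': '5', 'B': '6', 'C': '7', 'D': '8', 'question': '2 * 3 = ?', 'answer': 'B'}
```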
```diff
@@ -51,7 +51,9 @@ SUPPORTED_MODELS = {
    "InternLM-7B-Chat": "internlm/internlm-chat-7b",
    "InternLM-20B-Chat": "internlm/internlm-chat-20b",
    "Qwen-7B": "Qwen/Qwen-7B",
    "Qwen-14B": "Qwen/Qwen-14B",
    "Qwen-7B-Chat": "Qwen/Qwen-7B-Chat",
    "Qwen-14B-Chat": "Qwen/Qwen-14B-Chat",
    "XVERSE-13B": "xverse/XVERSE-13B",
    "XVERSE-13B-Chat": "xverse/XVERSE-13B-Chat",
    "ChatGLM2-6B-Chat": "THUDM/chatglm2-6b",
```
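`SUPPORTED_MODELS` maps a display name to its Hugging Face Hub repository id. A minimal sketch of resolving such a name to a downloadable repo id follows; the `resolve_model_path` helper is hypothetical and not part of this diff, and the dict literal only repeats entries visible in the hunk above.

```python
# Minimal sketch: look up a display name in a SUPPORTED_MODELS-style mapping.
# resolve_model_path is a hypothetical helper for illustration only.
SUPPORTED_MODELS = {
    "Qwen-7B": "Qwen/Qwen-7B",
    "Qwen-14B": "Qwen/Qwen-14B",
    "Qwen-7B-Chat": "Qwen/Qwen-7B-Chat",
    "Qwen-14B-Chat": "Qwen/Qwen-14B-Chat",
}


def resolve_model_path(name: str) -> str:
    """Return the Hub repo id for a known display name, or raise a clear error."""
    try:
        return SUPPORTED_MODELS[name]
    except KeyError as exc:
        raise ValueError(f"Unknown model: {name!r}. Known: {sorted(SUPPORTED_MODELS)}") from exc


print(resolve_model_path("Qwen-14B-Chat"))  # -> Qwen/Qwen-14B-Chat
```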