In single-card GPT2 large-model training with memory oversubscription, at batch_size=2 the GMEM performance improvement over Nvidia UVM is less than 60%

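【Environment】
6.4.0-8.0.0.16.oe2309.aarch64
【Steps to reproduce】 Please describe the specific steps
Train gpt2 with batch_size=2
【Actual result】 Please describe the faulty result and its impact
The GMEM performance improvement over Nvidia UVM is less than 60%
【Other relevant attachments】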

Hi zhangxiaofeng-melody, welcome to the openEuler Community.
I'm the Bot here serving you. You can find the instructions on how to interact with me at Here.
If you have any questions, please contact the SIG: Kernel, and any of the maintainers.

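With the 1.3B model in graph/single-operator mode, batch_size=2 does not trigger memory oversubscription, and performance is as expected.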
