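【Environment】
6.4.0-8.0.0.16.oe2309.aarch64
【Steps to reproduce】 Describe the exact steps
Train GPT2 with batch_size=4 and let it run through all steps.
【Actual result】 Describe the faulty result and its impact
The system panics when training finishes.
【Other relevant attachments】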
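This issue could not be reproduced on several other machines. After analysis, its impact is considered controllable.
The cause lies in how VMA synchronization is handled: huge pages were not enabled, so huge pages and regular pages ended up coexisting, which led to problems in use.
The issue has been resolved.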
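For anyone trying to compare machines where the panic does and does not reproduce, the huge-page state of each host can be inspected. The thread does not say which huge-page mechanism is involved, so the following is only a minimal sketch that assumes the standard Linux `/proc/meminfo` fields are relevant; the helper name `parse_hugepage_fields` is hypothetical.

```python
def parse_hugepage_fields(meminfo_text):
    """Extract huge-page related fields from /proc/meminfo-style text.

    Assumption: the standard HugePages_* / AnonHugePages counters are the
    relevant ones; the issue thread does not confirm this.
    """
    wanted = ("HugePages_Total", "HugePages_Free", "Hugepagesize", "AnonHugePages")
    fields = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        key = key.strip()
        if sep and key in wanted and rest.split():
            # Values look like "2048 kB" or "0"; keep only the leading integer.
            fields[key] = int(rest.split()[0])
    return fields
```

On a Linux host it could be called as `parse_hugepage_fields(open("/proc/meminfo").read())`, and the resulting dictionaries compared between the affected machine and the ones that do not reproduce the problem.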
2023-09-25 18:57:19,128 - mindformers - INFO - .........Build Callbacks For Train..........
2023-09-25 18:57:19,129 - mindformers - INFO - .........Build Callbacks for Train From Config..........
2023-09-25 18:57:19,130 - mindformers - INFO - .........Build Running Wrapper From Config For Train..........
2023-09-25 18:57:19,130 - mindformers - INFO - .........Build Model Wrapper for Train From Config..........
2023-09-25 18:57:19,141 - mindformers - INFO - .........Starting Init Train Model..........
2023-09-25 18:57:19,142 - mindformers - INFO - .........Starting Training Model..........
2023-09-25 18:57:19,142 - mindformers - INFO - .........Model Compiling, Please Wait a Moment...........
2023-09-25 20:02:17,571 - mindformers - INFO - Epoch:[ 1/ 2], step:[ 591/ 591], loss:[6.594/6.594], time:3890915.021 ms, lr:5.5152283e-05, overflow cond: False, loss_scale: 16384.0
2023-09-25 20:02:51,406 - mindformers - INFO - Epoch time: 3932263.041 ms, per step time: 6653.575 ms, avg loss: 6.594
2023-09-25 21:07:08,583 - mindformers - INFO - Epoch:[ 2/ 2], step:[ 591/ 591], loss:[6.702/6.702], time:3857161.517 ms, lr:1.0152285e-05, overflow cond: False, loss_scale: 16384.0
2023-09-25 21:07:41,340 - mindformers - INFO - Epoch time: 3889930.756 ms, per step time: 6581.947 ms, avg loss: 6.702
2023-09-25 21:07:41,346 - mindformers - INFO - .........Training Over!.............
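The log shows both epochs completing all 591 steps before "Training Over", which matches the report that the panic happens only once training finishes. As a quick sanity check of the timing figures, the reported per-step time can be recomputed from the epoch time and the step count. This is a minimal sketch; the function name `summarize` and the regular expressions are illustrative and were written against the log lines shown above, not taken from mindformers.

```python
import re

# Matches the per-epoch summary lines in the log above.
EPOCH_RE = re.compile(
    r"Epoch time: (?P<epoch_ms>[\d.]+) ms, per step time: (?P<step_ms>[\d.]+) ms"
)
# Matches the "step:[ 591/ 591]" field to recover the steps per epoch.
STEP_RE = re.compile(r"step:\[\s*(?P<cur>\d+)/\s*(?P<total>\d+)\]")


def summarize(log_text):
    """Return (epoch_ms, reported_step_ms, recomputed_step_ms) per epoch."""
    total_steps = None
    rows = []
    for line in log_text.splitlines():
        m = STEP_RE.search(line)
        if m:
            total_steps = int(m.group("total"))
        m = EPOCH_RE.search(line)
        if m and total_steps:
            epoch_ms = float(m.group("epoch_ms"))
            step_ms = float(m.group("step_ms"))
            # Recompute the per-step time as epoch time / number of steps.
            rows.append((epoch_ms, step_ms, epoch_ms / total_steps))
    return rows
```

Feeding the pasted log to `summarize` gives one tuple per epoch; the recomputed value should agree with the reported per-step time to within rounding.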