name | about | labels |
---|---|---|
Bug Report | Use this template for reporting a bug | kind/bug |
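ResNet50 model repository: https://gitee.com/mindspore/models/tree/master/official/cv/ResNet
The ResNet50-ImageNet network was trained in PyNative mode on 8 devices (8p) in an Ascend 910 environment. Training performance differs too much between arm and x86; please locate the root cause.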
Hardware Environment: Ascend
/device Ascend
Software Environment (Mandatory):
-- MindSpore version (e.g., 1.7.0.Bxxx) :
-- Python version (e.g., Python 3.7.5) :
-- OS platform and distribution (e.g., Linux Ubuntu 16.04):
-- GCC/Compiler version (if compiled from source):
run package: HiAI/HISI_C29/20230301
MindSpore version: r2.0.0.B180_master_20230309002957_8b868e8a
Execute Mode (Mandatory): PyNative
/mode pynative
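Test repository path: solution_test/cases/02network/00cv/resnet50/pynative
Test case:
test_ms_resnet50_imagenet_pynative_train_infer_910_8p_0001.py
Run the following steps on the arm and x86 environments respectively.
Expected result: ResNet50-ImageNet trains successfully in the 910 environment, with little performance difference between arm and x86.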
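Network | Dataset | Architecture | Performance (ms/step) | Performance (FPS) | Mode |
---|---|---|---|---|---|
resnet50 | ImageNet2012 | arm | 353.053689 | 5800 | pynative |
resnet50 | ImageNet2012 | x86 | 428.799385 | 4776 | pynative |
resnet50 | ImageNet2012 | arm | 225.601191 | 18155 | graph |
resnet50 | ImageNet2012 | x86 | 326.748022 | 12535 | graph |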
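As a quick consistency check on the table above, step time and FPS should multiply to the number of samples processed per step. The sketch below assumes that FPS is the aggregate throughput across all 8 devices and that ms/step is the wall-clock time of one global step; the issue states neither explicitly, and the helper names are illustrative only.

```python
# Rows copied from the table above: (architecture, mode, ms/step, FPS).
ROWS = [
    ("arm", "pynative", 353.053689, 5800),
    ("x86", "pynative", 428.799385, 4776),
    ("arm", "graph", 225.601191, 18155),
    ("x86", "graph", 326.748022, 12535),
]


def implied_samples_per_step(ms_per_step: float, fps: float) -> float:
    """Samples processed per step, assuming FPS = samples / step time."""
    return fps * ms_per_step / 1000.0


def slowdown(ms_a: float, ms_b: float) -> float:
    """Ratio of two step times; a value above 1 means the first is slower."""
    return ms_a / ms_b


if __name__ == "__main__":
    for arch, mode, ms, fps in ROWS:
        samples = implied_samples_per_step(ms, fps)
        print(f"{arch:>3} {mode:<8} ~{samples:.0f} samples/step")
    print(f"pynative x86/arm step-time ratio: {slowdown(428.799385, 353.053689):.3f}")
    print(f"graph    x86/arm step-time ratio: {slowdown(326.748022, 225.601191):.3f}")
```

Under these assumptions the PyNative rows imply about 2048 samples per step and the graph rows about 4096, and the x86 step time is roughly 1.21 times (PyNative) and 1.45 times (graph) the arm step time.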
Assigning to Xiao Tianci (肖天赐).
Please assign maintainer to check this issue.
@zhongjicheng
Please add labels (comp or sig); a labeled Pull Request is pushed directly to the responsible owner for review. You can also visit https://gitee.com/mindspore/community/blob/master/sigs/dx/docs/labels.md to find more labels.
For example, if you are submitting code for the data component, you can comment:
//comp/data
You can also invite the data SIG to review the code:
//sig/data
You can additionally mark the type of this PR, for example a bugfix or a feature request:
//kind/bug or //kind/feature
Congratulations, you have learned how to add labels with commands. Go ahead and add your labels in the comments below!
ARM:
X86:
X86:
2023-03-16 17:13:46,777:INFO:epoch: [1/90] loss: 5.355258, epoch time: 207.561 s, per step time: 665.260 ms
epoch time: 207561.62285804749, per step time: 665.2616117245112
2023-03-16 17:15:31,480:INFO:epoch: [2/90] loss: 4.376903, epoch time: 104.702 s, per step time: 335.584 ms
epoch time: 104702.94141769409, per step time: 335.5863506977375
2023-03-16 17:17:15,987:INFO:epoch: [3/90] loss: 3.895827, epoch time: 104.507 s, per step time: 334.958 ms
epoch time: 104507.56478309631, per step time: 334.9601435355651
2023-03-16 17:19:00,174:INFO:epoch: [4/90] loss: 3.552855, epoch time: 104.186 s, per step time: 333.931 ms
epoch time: 104187.00623512268, per step time: 333.9327122920599
2023-03-16 17:21:10,958:INFO:epoch: [5/90] loss: 3.402023, epoch time: 130.783 s, per step time: 419.175 ms
epoch time: 130782.88888931274, per step time: 419.1759259272844
2023-03-16 17:22:55,375:INFO:epoch: [6/90] loss: 3.243053, epoch time: 104.417 s, per step time: 334.669 ms
epoch time: 104417.30642318726, per step time: 334.670853920472
2023-03-16 17:24:39,220:INFO:epoch: [7/90] loss: 3.097744, epoch time: 103.844 s, per step time: 332.835 ms
epoch time: 103844.88248825073, per step time: 332.83616182131647
2023-03-16 17:26:23,350:INFO:epoch: [8/90] loss: 3.023839, epoch time: 104.129 s, per step time: 333.748 ms
epoch time: 104130.01346588135, per step time: 333.7500431598761
2023-03-16 17:28:07,850:INFO:epoch: [9/90] loss: 2.981352, epoch time: 104.500 s, per step time: 334.935 ms
epoch time: 104500.08916854858, per step time: 334.9361832325275
2023-03-16 17:29:52,694:INFO:epoch: [10/90] loss: 2.861949, epoch time: 104.843 s, per step time: 336.036 ms
epoch time: 104843.44387054443, per step time: 336.0366790722578
2023-03-16 17:31:36,638:INFO:epoch: [11/90] loss: 2.832540, epoch time: 103.944 s, per step time: 333.155 ms
epoch time: 103944.82946395874, per step time: 333.1565046921755
2023-03-16 17:33:20,641:INFO:epoch: [12/90] loss: 2.981315, epoch time: 104.002 s, per step time: 333.340 ms
epoch time: 104002.53438949585, per step time: 333.34145637658924
2023-03-16 17:35:04,616:INFO:epoch: [13/90] loss: 3.020468, epoch time: 103.974 s, per step time: 333.251 ms
epoch time: 103974.94506835938, per step time: 333.2530290652544
2023-03-16 17:36:48,442:INFO:epoch: [14/90] loss: 2.818917, epoch time: 103.825 s, per step time: 332.774 ms
epoch time: 103825.89030265808, per step time: 332.77528943159643
ARM:
2023-03-16 17:21:01,044:INFO:epoch: [1/90] loss: 5.336744, epoch time: 221.602 s, per step time: 710.264 ms
epoch time: 221603.19828987122, per step time: 710.2666611854846
2023-03-16 17:22:56,450:INFO:epoch: [2/90] loss: 4.481668, epoch time: 115.405 s, per step time: 369.887 ms
epoch time: 115405.68733215332, per step time: 369.8900235004914
2023-03-16 17:24:47,316:INFO:epoch: [3/90] loss: 3.887445, epoch time: 110.865 s, per step time: 355.338 ms
epoch time: 110866.11890792847, per step time: 355.3401247048989
2023-03-16 17:26:37,867:INFO:epoch: [4/90] loss: 3.555289, epoch time: 110.550 s, per step time: 354.327 ms
epoch time: 110550.67467689514, per step time: 354.329085502869
2023-03-16 17:29:20,074:INFO:epoch: [5/90] loss: 3.385780, epoch time: 162.206 s, per step time: 519.890 ms
epoch time: 162206.50339126587, per step time: 519.8926390745701
2023-03-16 17:31:11,138:INFO:epoch: [6/90] loss: 3.249865, epoch time: 111.064 s, per step time: 355.973 ms
epoch time: 111064.30864334106, per step time: 355.97534821583673
2023-03-16 17:33:00,693:INFO:epoch: [7/90] loss: 3.122056, epoch time: 109.554 s, per step time: 351.136 ms
epoch time: 109555.08518218994, per step time: 351.1380935326601
2023-03-16 17:34:52,209:INFO:epoch: [8/90] loss: 3.040946, epoch time: 111.515 s, per step time: 357.421 ms
epoch time: 111516.05272293091, per step time: 357.42324590682983
2023-03-16 17:36:41,979:INFO:epoch: [9/90] loss: 3.053561, epoch time: 109.769 s, per step time: 351.824 ms
epoch time: 109769.82259750366, per step time: 351.8263544791784
2023-03-16 17:38:31,735:INFO:epoch: [10/90] loss: 2.841896, epoch time: 109.755 s, per step time: 351.778 ms
epoch time: 109755.4886341095, per step time: 351.7804122888125
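The two logs above can be compared with a short script. The following is a minimal sketch, not part of the test suite: it parses the `INFO` lines, drops the first epoch as warm-up (an assumption, based on epoch 1 being roughly twice as slow in both logs), and reports the median per-step time. Function names are hypothetical, and only the first four epochs of each log are embedded to keep the example short.

```python
import re
from statistics import median

# Excerpts copied verbatim from the logs above.
X86_LOG = """\
2023-03-16 17:13:46,777:INFO:epoch: [1/90] loss: 5.355258, epoch time: 207.561 s, per step time: 665.260 ms
epoch time: 207561.62285804749, per step time: 665.2616117245112
2023-03-16 17:15:31,480:INFO:epoch: [2/90] loss: 4.376903, epoch time: 104.702 s, per step time: 335.584 ms
2023-03-16 17:17:15,987:INFO:epoch: [3/90] loss: 3.895827, epoch time: 104.507 s, per step time: 334.958 ms
2023-03-16 17:19:00,174:INFO:epoch: [4/90] loss: 3.552855, epoch time: 104.186 s, per step time: 333.931 ms
"""

ARM_LOG = """\
2023-03-16 17:21:01,044:INFO:epoch: [1/90] loss: 5.336744, epoch time: 221.602 s, per step time: 710.264 ms
2023-03-16 17:22:56,450:INFO:epoch: [2/90] loss: 4.481668, epoch time: 115.405 s, per step time: 369.887 ms
2023-03-16 17:24:47,316:INFO:epoch: [3/90] loss: 3.887445, epoch time: 110.865 s, per step time: 355.338 ms
2023-03-16 17:26:37,867:INFO:epoch: [4/90] loss: 3.555289, epoch time: 110.550 s, per step time: 354.327 ms
"""

# Matches only the formatted INFO lines; the raw "epoch time: ..." lines
# carry no "epoch: [n/N]" prefix and are skipped.
_STEP_RE = re.compile(r"epoch: \[(\d+)/\d+\].*?per step time: ([\d.]+) ms")


def parse_step_times(log_text):
    """Return a list of (epoch, per_step_ms) tuples found in the log."""
    return [(int(e), float(ms)) for e, ms in _STEP_RE.findall(log_text)]


def steady_state_ms(log_text, warmup_epochs=1):
    """Median per-step time after discarding the warm-up epochs."""
    times = [ms for epoch, ms in parse_step_times(log_text) if epoch > warmup_epochs]
    return median(times)


if __name__ == "__main__":
    x86 = steady_state_ms(X86_LOG)
    arm = steady_state_ms(ARM_LOG)
    print(f"x86: {x86:.3f} ms/step, arm: {arm:.3f} ms/step, arm/x86: {arm / x86:.3f}")
```

The median is used instead of the mean so that isolated slow epochs, such as epoch 5 in both full logs, do not dominate the comparison.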
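The performance problem occurs only in the 111.101 environment. Comparing the performance of running data processing alone:
111.101:
112.32:
The average per-step performance shows no difference, so the problem is not caused by data processing.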
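A data-processing-only measurement of this kind can be sketched as follows. This is an illustrative harness, not the script used in the issue: it times how long each batch takes to arrive from an iterator with no network attached. `fake_pipeline` is a hypothetical stand-in; in a real run the iterable would presumably be the MindSpore dataset iterator (for example the one returned by `create_dict_iterator()`).

```python
import time
from typing import Iterable, List


def time_iterator(batches: Iterable, max_steps: int = 100) -> List[float]:
    """Return the fetch time in ms of each batch, for up to max_steps batches."""
    times_ms = []
    it = iter(batches)
    for _ in range(max_steps):
        start = time.perf_counter()
        try:
            next(it)
        except StopIteration:
            break  # the pipeline ran out of data before max_steps
        times_ms.append((time.perf_counter() - start) * 1000.0)
    return times_ms


def fake_pipeline(num_batches: int, delay_s: float):
    """Hypothetical stand-in for a dataset iterator; sleeps to mimic decode cost."""
    for i in range(num_batches):
        time.sleep(delay_s)
        yield i


if __name__ == "__main__":
    t = time_iterator(fake_pipeline(20, 0.005), max_steps=10)
    print(f"steps: {len(t)}, mean fetch time: {sum(t) / len(t):.2f} ms")
```

Running the same harness on both machines and comparing the mean fetch time isolates the data pipeline from operator execution.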
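In the 111.101 environment, reducing the number of steps by setting the num_samples parameter of ImageFolderDataset brings performance up to the target.
This is identified as an environment problem.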
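In PYNATIVE mode, a single data-iteration step takes under 200 ms and a single network-compute step takes about 300 ms, so the data-processing time is hidden asynchronously in the gaps between iterations. The difference in overall end-to-end time is therefore not caused by data processing. Transferring to the dynamic-graph (PyNative) component for further investigation.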
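The overlap argument can be illustrated with a small deterministic model. This is a simplified sketch, assuming a producer that prepares batches back to back into an unbounded prefetch queue and a consumer that starts each step as soon as both the batch and the device are free. The 200 ms and 300 ms figures are the rounded values quoted above; the function name is hypothetical.

```python
def simulate_avg_step_ms(data_ms: float, compute_ms: float, steps: int) -> float:
    """Average wall-clock time per step with asynchronous data prefetch.

    Batch i becomes ready at (i + 1) * data_ms; the compute for step i
    starts once batch i is ready and step i - 1 has finished.
    """
    finish = 0.0
    for i in range(steps):
        ready = (i + 1) * data_ms
        start = max(ready, finish)
        finish = start + compute_ms
    return finish / steps


if __name__ == "__main__":
    # Data faster than compute: step time converges to the compute time.
    print(simulate_avg_step_ms(200.0, 300.0, 100))
    # Data slower than compute: the pipeline becomes data-bound.
    print(simulate_avg_step_ms(400.0, 300.0, 100))
```

In this model the steady-state step time is the larger of the two costs rather than their sum, so a data step of under 200 ms does not show up in the total as long as compute takes about 300 ms.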
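Building on Tianci's conclusion, I ran profiler experiments.
101 profiler
32 profiler
Operators execute more slowly on 101; the environment may be affecting something.
101, single device, graph mode
101, single device, graph mode
101, single device, pynative
32, single device, pynative
Graph mode shows no difference; most likely operators execute slowly on 101 under pynative.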
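@chujinjin please continue the investigation.
Other CV networks on this machine show the same behavior, consistent with the profiling results above: BN and Conv operators run slower.
2023/4/3 CCB conclusion: this issue is downgraded to a notice ticket; performance will be tracked after the system is fully reinstalled.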