[openEuler 22.03] Introduce x86 assembler accelerated implementation for SM4 algorithm

Status: Completed
Type: Requirement (Owner)
Created: 2021-09-15 20:11

https://lwn.net/Articles/863574/

Introduce x86 assembler accelerated implementation for SM4 algorithm
From:	 	Tianjia Zhang <tianjia.zhang-AT-linux.alibaba.com>
To:	 	Herbert Xu <herbert-AT-gondor.apana.org.au>, "David S. Miller" <davem-AT-davemloft.net>, Eric Biggers <ebiggers-AT-google.com>, Eric Biggers <ebiggers-AT-kernel.org>, Gilad Ben-Yossef <gilad-AT-benyossef.com>, Ard Biesheuvel <ardb-AT-kernel.org>, "Markku-Juhani O . Saarinen" <mjos-AT-iki.fi>, Jussi Kivilinna <jussi.kivilinna-AT-iki.fi>, Catalin Marinas <catalin.marinas-AT-arm.com>, Will Deacon <will-AT-kernel.org>, Thomas Gleixner <tglx-AT-linutronix.de>, Ingo Molnar <mingo-AT-redhat.com>, Borislav Petkov <bp-AT-alien8.de>, "H. Peter Anvin" <hpa-AT-zytor.com>, x86-AT-kernel.org, linux-crypto-AT-vger.kernel.org, linux-arm-kernel-AT-lists.infradead.org, linux-kernel-AT-vger.kernel.org, Jia Zhang <zhang.jia-AT-linux.alibaba.com>, "YiLin . Li" <YiLin.Li-AT-linux.alibaba.com>
Subject:	 	[PATCH v3 0/4] Introduce x86 assembler accelerated implementation for SM4 algorithm
Date:	 	Tue, 20 Jul 2021 11:46:38 +0800
Message-ID:	 	<20210720034642.19230-1-tianjia.zhang@linux.alibaba.com>
Cc:	 	Tianjia Zhang <tianjia.zhang-AT-linux.alibaba.com>
This patchset extracts the public SM4 algorithm into a separate library
and adjusts the arm64 accelerated SM4 implementation to use that
library. It then introduces an accelerated implementation using the
AES-NI/AVX instruction sets on x86_64.

This optimization supports four SM4 modes: ECB, CBC, CFB, and CTR.
Since CBC and CFB encryption cannot process multiple blocks in
parallel, the speedup for those two operations is negligible. All
selftests pass.
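
To illustrate why the CBC and CFB encryption rows in the table below barely change while the decryption rows speed up, here is a minimal Python sketch of CBC chaining. It is not taken from the patchset: the 4-round Feistel pair `toy_encrypt`/`toy_decrypt` is a hypothetical stand-in for SM4 (it shares only the 16-byte block size), and every function name here is an assumption made for illustration.

```python
import hashlib

BLOCK = 16  # SM4 also uses 16-byte blocks; everything else here is a toy


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _round(key: bytes, i: int, half: bytes) -> bytes:
    # Hypothetical round function built from SHA-256; NOT the SM4 round.
    return hashlib.sha256(key + bytes([i]) + half).digest()[:BLOCK // 2]


def toy_encrypt(key: bytes, block: bytes) -> bytes:
    # 4-round Feistel network standing in for the SM4 block cipher.
    l, r = block[:8], block[8:]
    for i in range(4):
        l, r = r, xor(l, _round(key, i, r))
    return l + r


def toy_decrypt(key: bytes, block: bytes) -> bytes:
    l, r = block[:8], block[8:]
    for i in reversed(range(4)):
        l, r = xor(r, _round(key, i, l)), l
    return l + r


def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    # Each block needs the previous *ciphertext* block, which only exists
    # after the previous cipher call returns: the loop is inherently serial.
    prev, out = iv, []
    for i in range(0, len(plaintext), BLOCK):
        prev = toy_encrypt(key, xor(plaintext[i:i + BLOCK], prev))
        out.append(prev)
    return b"".join(out)


def cbc_decrypt_block(key: bytes, iv: bytes, ciphertext: bytes, n: int) -> bytes:
    # Block n depends only on ciphertext blocks n and n-1, both already
    # known, so any number of blocks can be processed side by side.
    cur = ciphertext[n * BLOCK:(n + 1) * BLOCK]
    prev = iv if n == 0 else ciphertext[(n - 1) * BLOCK:n * BLOCK]
    return xor(toy_decrypt(key, cur), prev)


def cbc_decrypt_any_order(key: bytes, iv: bytes, ciphertext: bytes, order) -> bytes:
    # Decrypt blocks in an arbitrary order to show they are independent.
    blocks = {n: cbc_decrypt_block(key, iv, ciphertext, n) for n in order}
    return b"".join(blocks[n] for n in sorted(blocks))
```

Decrypting the blocks in a shuffled order still recovers the plaintext, which is the property a multi-block SIMD implementation can exploit. CFB has the same shape: encryption feeds each ciphertext block into the next cipher call, while decryption only reads ciphertext that is already available.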

The main algorithm implementation comes from SM4 AES-NI work by
libgcrypt and Markku-Juhani O. Saarinen at:
https://github.com/mjosaarinen/sm4ni

Benchmarks were run on an Intel Xeon Cascade Lake; the data comes from
tcrypt modes 218 and 518. The columns are block sizes in bytes, and the
throughput is given in Mb/s:

sm4-generic   |    16      64     128     256    1024    1420    4096
      ECB enc | 40.99   46.50   48.05   48.41   49.20   49.25   49.28
      ECB dec | 41.07   46.99   48.15   48.67   49.20   49.25   49.29
      CBC enc | 37.71   45.28   46.77   47.60   48.32   48.37   48.40
      CBC dec | 36.48   44.82   46.43   47.45   48.23   48.30   48.36
      CFB enc | 37.94   44.84   46.12   46.94   47.57   47.46   47.68
      CFB dec | 37.50   42.84   43.74   44.37   44.85   44.80   44.96
      CTR enc | 39.20   45.63   46.75   47.49   48.09   47.85   48.08
      CTR dec | 39.64   45.70   46.72   47.47   47.98   47.88   48.06
sm4-aesni-avx
      ECB enc | 33.75  134.47  221.64  243.43  264.05  251.58  258.13
      ECB dec | 34.02  134.92  223.11  245.14  264.12  251.04  258.33
      CBC enc | 38.85   46.18   47.67   48.34   49.00   48.96   49.14
      CBC dec | 33.54  131.29  223.88  245.27  265.50  252.41  263.78
      CFB enc | 38.70   46.10   47.58   48.29   49.01   48.94   49.19
      CFB dec | 32.79  128.40  223.23  244.87  265.77  253.31  262.79
      CTR enc | 32.58  122.23  220.29  241.16  259.57  248.32  256.69
      CTR dec | 32.81  122.47  218.99  241.54  258.42  248.58  256.61
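
As a rough reading aid, the following Python sketch computes the per-block-size speedup of sm4-aesni-avx over sm4-generic for a few rows of the table above. The numbers are copied from the table; the dictionary layout and the `speedup` helper are assumptions made for illustration and are not part of the patchset or of tcrypt.

```python
# Block sizes (bytes) and throughput (Mb/s) copied from the table above.
BLOCK_SIZES = [16, 64, 128, 256, 1024, 1420, 4096]

GENERIC = {
    "ECB enc": [40.99, 46.50, 48.05, 48.41, 49.20, 49.25, 49.28],
    "CBC enc": [37.71, 45.28, 46.77, 47.60, 48.32, 48.37, 48.40],
    "CBC dec": [36.48, 44.82, 46.43, 47.45, 48.23, 48.30, 48.36],
    "CTR enc": [39.20, 45.63, 46.75, 47.49, 48.09, 47.85, 48.08],
}

AESNI_AVX = {
    "ECB enc": [33.75, 134.47, 221.64, 243.43, 264.05, 251.58, 258.13],
    "CBC enc": [38.85, 46.18, 47.67, 48.34, 49.00, 48.96, 49.14],
    "CBC dec": [33.54, 131.29, 223.88, 245.27, 265.50, 252.41, 263.78],
    "CTR enc": [32.58, 122.23, 220.29, 241.16, 259.57, 248.32, 256.69],
}


def speedup(row: str) -> list:
    """Return accelerated/generic throughput ratios, one per block size."""
    return [round(a / g, 2) for a, g in zip(AESNI_AVX[row], GENERIC[row])]


if __name__ == "__main__":
    for row in GENERIC:
        pairs = ", ".join(
            f"{size}B: {ratio}x" for size, ratio in zip(BLOCK_SIZES, speedup(row))
        )
        print(f"{row}: {pairs}")
```

On these figures the parallelizable operations (ECB, CBC decryption, CTR) reach a little over 5x at 4096-byte blocks, CBC encryption stays at about 1x, and single 16-byte blocks are slightly slower than sm4-generic.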

---
v3 changes:
  * Remove the single-block algorithm, which does not greatly improve performance
  * Remove the accelerated SM4 key expansion, which is not performance-critical
  * Fix the warning on arm64/sm4-ce

v2 changes:
  * SM4 library functions use "sm4_" prefix instead of "crypto_" prefix
  * sm4-aesni-avx supports accelerated implementation of four specific modes
  * tcrypt benchmark supports sm4-aesni-avx
  * Address other review comments


Tianjia Zhang (4):
  crypto: sm4 - create SM4 library based on sm4 generic code
  crypto: arm64/sm4-ce - Make dependent on sm4 library instead of
    sm4-generic
  crypto: x86/sm4 - add AES-NI/AVX/x86_64 implementation
  crypto: tcrypt - add the asynchronous speed test for SM4

 arch/arm64/crypto/Kconfig              |   2 +-
 arch/arm64/crypto/sm4-ce-glue.c        |  20 +-
 arch/x86/crypto/Makefile               |   3 +
 arch/x86/crypto/sm4-aesni-avx-asm_64.S | 589 +++++++++++++++++++++++++
 arch/x86/crypto/sm4_aesni_avx_glue.c   | 459 +++++++++++++++++++
 crypto/Kconfig                         |  22 +
 crypto/sm4_generic.c                   | 180 +-------
 crypto/tcrypt.c                        |  26 +-
 include/crypto/sm4.h                   |  25 +-
 lib/crypto/Kconfig                     |   3 +
 lib/crypto/Makefile                    |   3 +
 lib/crypto/sm4.c                       | 176 ++++++++
 12 files changed, 1330 insertions(+), 178 deletions(-)
 create mode 100644 arch/x86/crypto/sm4-aesni-avx-asm_64.S
 create mode 100644 arch/x86/crypto/sm4_aesni_avx_glue.c
 create mode 100644 lib/crypto/sm4.c

-- 
2.19.1.3.ge56e4f7

Comments (4)

Xie XiuQi created the requirement
Xie XiuQi set the associated repository to openEuler/kernel
openeuler-ci-bot added the sig/Kernel label

Hi xiexiuqi, welcome to the openEuler Community.
I'm the Bot here serving you. You can find the instructions on how to interact with me at
https://gitee.com/openeuler/community/blob/master/en/sig-infrastructure/command.md.
If you have any questions, please contact the SIG: Kernel, and any of the maintainers: @Xie XiuQi, @YangYingliang, @成坚 (CHENG Jian).

Xie XiuQi changed the title
Xie XiuQi changed the description
Xie XiuQi added the severity/minor label
Xie XiuQi removed the severity/minor label

Submitted patches:
crypto: sm4 - create SM4 library based on sm4 generic code
commit id: 2b31277af577b1b2da62c3ad7d3315b422869102
crypto: arm64/sm4-ce - Make dependent on sm4 library instead of sm4-generic
commit id: c59de48e125c6d49a8abd165e388ca57bfe37b17
crypto: x86/sm4 - add AES-NI/AVX/x86_64 implementation
commit id: a7ee22ee1445c7fdb00ab80116bb9710ca86a860
crypto: tcrypt - add the asynchronous speed test for SM4
commit id: a7fc80bb22eb0f13791ee4f70484e88316cc2a24


Test environment:
CPU: AMD Ryzen 5 PRO 4650U with Radeon Graphics, 2.10 GHz
Cores: 4
Memory: 4 GB
OS: openEuler 21.03
Kernel: OLK-5.10
Test duration: 2 s

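Test results:

[Figure 1: sm4-generic results; image not preserved]
[Figure 2: results with AVX acceleration; image not preserved]

The results are broadly consistent with those of the original patch.

Figure 1 shows sm4-generic; Figure 2 shows the results after AVX acceleration.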
