- SMB3 file transfer with encryption will be 4-6 times faster with Samba 4.12
- Samba is getting out of the crypto business
- GnuTLS >= 3.4.7 required
- FIPS compliance ahead
More or less since its beginning, Samba has implemented the cryptography it needs to talk to Windows on its own. One reason is that Windows either didn’t follow the standards or used ciphers nobody else really used. This is changing right now!
Samba already used GnuTLS when it was available, and GnuTLS is already a requirement if you build the Samba AD with MIT Kerberos. So to get out of the crypto business, we decided to use GnuTLS as our crypto library.
With Samba 4.11 we took the first step towards GnuTLS and required GnuTLS 3.2. With Samba 4.12 the requirement will be at least GnuTLS 3.4.7. The reason is that we require AEAD support for AES-CCM and AES-GCM, and 3.4.7 is already the minimum when building Samba AD with MIT Kerberos. This also allowed us to delete a lot of code!
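A minimum-version requirement like this comes down to a numeric, component-by-component comparison of version strings. The following is a minimal sketch in plain C and is not how Samba's build system performs the check; `parse_version` and `version_at_least` are hypothetical helper names introduced only for illustration. In real code that links against GnuTLS, the library's own `gnutls_check_version()` would be the more natural choice.

```c
#include <stdio.h>

/* Parse "major.minor.patch" into three ints.
 * Returns 0 on success, -1 if the string is malformed. */
static int parse_version(const char *s, int v[3])
{
    return sscanf(s, "%d.%d.%d", &v[0], &v[1], &v[2]) == 3 ? 0 : -1;
}

/* Hypothetical helper: returns 1 if `have` >= `need`, 0 if it is lower,
 * and -1 if either string cannot be parsed. */
int version_at_least(const char *have, const char *need)
{
    int h[3], n[3];

    if (parse_version(have, h) != 0 || parse_version(need, n) != 0)
        return -1;

    /* Compare numerically, most significant component first,
     * so that "3.10.0" correctly ranks above "3.4.7". */
    for (int i = 0; i < 3; i++) {
        if (h[i] != n[i])
            return h[i] > n[i];
    }
    return 1; /* all components equal */
}
```

For example, `version_at_least("3.6.10", "3.4.7")` evaluates to 1, while `version_at_least("3.2.0", "3.4.7")` evaluates to 0.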
Using GnuTLS lets us use hardware acceleration for the most important ciphers. We already had that to some degree, but it didn’t work very well, at least not with AES-GCM. GnuTLS also supports ARM and other platforms.
With AES-GCM we gained a 50x speedup by switching to GnuTLS, because Samba’s own crypto implementation was so slow. That slowness is why Samba preferred AES-CCM when establishing connections in the past.
So comparing Samba’s hardware-accelerated AES-CCM implementation to Samba with AES-GCM from GnuTLS, we are now twice as fast at copying files.
Steve French, the kernel CIFS maintainer, started to support AES-GCM too. He was able to confirm the numbers when running against a Samba file server. AES-GCM support in mount.cifs shipped with Linux 5.3!
Performance with GnuTLS 3.6.10
Internally, Samba uses io vectors to handle data packets. To encrypt an io vector, I had to bring it into a form that GnuTLS can consume. That meant allocating memory, doing a lot of memory copies, handing the buffer to GnuTLS, and copying the encrypted data back into the vectors. This is expensive!
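The copy-in, copy-out pattern described here can be sketched in plain C with POSIX `struct iovec`. This is an illustration of where the cost comes from and not Samba's actual code: `iov_total_len`, `iov_flatten` and `iov_scatter` are hypothetical names, and the encryption call that would sit between the two copies is left out.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

/* Total number of payload bytes across all vector elements. */
size_t iov_total_len(const struct iovec *iov, int iovcnt)
{
    size_t total = 0;
    for (int i = 0; i < iovcnt; i++)
        total += iov[i].iov_len;
    return total;
}

/* Copy 1: gather every element into one freshly allocated contiguous
 * buffer. The caller frees the result. Returns NULL if malloc fails. */
unsigned char *iov_flatten(const struct iovec *iov, int iovcnt, size_t *out_len)
{
    size_t total = iov_total_len(iov, iovcnt);
    unsigned char *buf = malloc(total ? total : 1);
    size_t off = 0;

    if (buf == NULL)
        return NULL;

    for (int i = 0; i < iovcnt; i++) {
        if (iov[i].iov_len == 0)
            continue; /* skip empty elements, their base may be NULL */
        memcpy(buf + off, iov[i].iov_base, iov[i].iov_len);
        off += iov[i].iov_len;
    }
    *out_len = total;
    return buf;
}

/* Copy 2: scatter the (by now encrypted) contiguous buffer back into
 * the original vector elements. */
void iov_scatter(const unsigned char *buf, const struct iovec *iov, int iovcnt)
{
    size_t off = 0;

    for (int i = 0; i < iovcnt; i++) {
        if (iov[i].iov_len == 0)
            continue;
        memcpy(iov[i].iov_base, buf + off, iov[i].iov_len);
        off += iov[i].iov_len;
    }
}
```

Every packet pays for one allocation and two full passes over the payload on top of the encryption itself, which is the overhead a vector-aware crypto function can avoid.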
To avoid that bottleneck, I requested GnuTLS functions that can deal with io vectors directly. Daiki Ueno implemented them in GnuTLS 3.6.10. However, when I started to use those functions I discovered a bug, which will hopefully be fixed in a new release soon.
I did some local tests with an updated GnuTLS package. The new functions provide another 1.6x to 2x speedup!
The patch should make Samba 4.12 even more awesome. With 4.12 you should turn on SMB3 encryption in your configurations.
Using a proper crypto library allows us to provide FIPS compliance. If the system is set to FIPS mode, certain algorithms such as RC4 and MD5 won’t work. This means NTLM and many other parts don’t work. However, we will be able to provide a version that works with Kerberos and modern crypto, but it means all the legacy support for older Windows versions is lost in FIPS mode.
Bringing Samba into a state where it works well in FIPS mode may take longer and won’t be in the next release.
Thanks to the GnuTLS Team for their help and support!
Samba with GnuTLS 3.6.10 using gnutls_aead_cipher_(en|de)crypt

time bin/smbclient //krikkit/test -Uasn%secret -mSMB3 -e -c 'put 4GB.bin; quit'
putting file 4GB.bin as \4GB.bin (457843.5 kb/s) (average 457843.5 kb/s)
real 0m10.054s
user 0m3.604s
sys 0m4.923s

time bin/smbclient //krikkit/test -Uasn%secret -mSMB3 -e -c 'get 4GB.bin /dev/null; quit'
getting file \4GB.bin of size 4294967296 as /dev/null (620000.6 KiloBytes/sec) (average 620000.6 KiloBytes/sec)
real 0m7.425s
user 0m2.840s
sys 0m3.128s

Samba with GnuTLS 3.6.10 using gnutls_aead_cipher_(en|de)cryptv2 (note the v for vector!)

time bin/smbclient //krikkit/test -Uasn%secret -mSMB3 -e -c 'put 4GB.bin; quit'
putting file 4GB.bin as \4GB.bin (692700.9 kb/s) (average 692700.9 kb/s)
real 0m6.761s
user 0m2.492s
sys 0m2.841s

time bin/smbclient //krikkit/test -Uasn%secret -mSMB3 -e -c 'get 4GB.bin /dev/null; quit'
getting file \4GB.bin of size 4294967296 as /dev/null (1293739.6 KiloBytes/sec) (average 1293739.7 KiloBytes/sec)
real 0m3.934s
user 0m1.907s
sys 0m0.558s
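
The wall-clock (`real`) times in these runs can be turned into speedup factors with a simple ratio. A minimal sketch in C, where the function name `speedup` is purely illustrative and not part of Samba or GnuTLS:

```c
/* Speedup of a new run over an old run, computed from wall-clock
 * seconds. A result above 1.0 means the new run was faster. */
double speedup(double old_seconds, double new_seconds)
{
    return old_seconds / new_seconds;
}
```

Applied to the `real` times listed for this single set of runs, `speedup(10.054, 6.761)` gives roughly 1.5 for the upload and `speedup(7.425, 3.934)` gives roughly 1.9 for the download.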