Post by Kyle Marek
Mr. Marek,
I think you may be missing the point that this is about 2.5 and the
RESTCONF interface, not any kind of VPN.
Post by Kyle Marek
I became aware of this after reading the follow up post.
Yes, there are constant time implementations of AES, they’re quite
https://www.netgate.com/blog/more-on-aes-ni.html
Post by Kyle Marek
Read the whole thing, please, and please remember that this was our
attempt to explain what is coming for future pfSense, well before it would
occur.
Post by Kyle Marek
There is a whole rewrite that needs to occur for 2.5. All the PHP goes
away, and, as we did with the 2.3 -> 2.4 transition (which eliminated
support for 32-bit Intel), when we promised to continue to release 2.3
images for 32-bit Intel for at least a year past the date of 2.4.0-RELEASE,
we are also on record as supporting the 2.4 series for at least a year after
the 2.5.0-RELEASE.
Post by Kyle Marek
As I understand, pfSense uses OpenSSL to implement these functions that
utilize AES-NI. Is slow bulk throughput the only reason why OpenSSL's
software implementations are not being allowed?
So many people want to make this about Netgate attempting to sell more
appliances. This is not true, and anyone looking critically at the
assertion would see the fallacy of it. I will attempt to outline why.
Post by Kyle Marek
It’s now early 2018, and, unknown to us (or anyone else in the FreeBSD
community) before December last year, Meltdown and Spectre are here. While
the appliance model of pfSense is, as far as we can tell, unaffected by
these (unless you load software from strange places), we’re committed to
fixing them anyway. This will include support for 32-bit Intel on the 2.3
series as FreeBSD (our upstream) implements and releases same.
Post by Kyle Marek
And, none too subtly, the Spectre attacks are (non-crypto) cache-timing
attacks. Point-in-fact, the AES cache-timing attack that I referenced last
May is, indeed, referenced on the first page of the Spectre paper.
Post by Kyle Marek
https://spectreattack.com/spectre.pdf
I understand that Netgate offers support for non-Netgate hardware.
True, but the “support” I’m talking about here is that we continue to
maintain, build and test new releases of 2.3 and 2.4 for a period of time.
These are available to everyone, without charge.
Post by Kyle Marek
What did anyone running 2.3 on a 32-bit Intel or AMD CPU pay Netgate
for this continued support?
Post by Kyle Marek
Nothing.
So assume that a miracle occurs, and a year from now we have a
2.5.0-RELEASE on 15-Feb-2019. This would mean that the 2.4 series of
pfSense software would continue to be supported until at least 15-Feb-2020.
Post by Kyle Marek
What did anyone running 2.4 on a 64-bit Intel or AMD CPU that doesn’t
implement AES-NI pay Netgate for this continued support?
Post by Kyle Marek
Again, nothing.
I'm failing to see why any additional effort is needed to support
non-AES-NI AES implementations considering OpenSSL is implementing it.
If AES-NI is not available, OpenSSL will either use Vector Permutation AES
(VPAES https://www.shiftleft.org/papers/vector_aes/vector_aes.pdf) or
Bit-sliced AES (BSAES https://cryptojedi.org/papers/aesbs-20090616.pdf),
provided the SSSE3 instruction set extension is available. SSSE3 was first
introduced in 2006, so there is a fair chance that this will be available
in most computers used. Both of these techniques avoid data- and
key-dependent branches and memory references, and therefore are immune to
known timing attacks. VPAES is used for CBC encrypt, ECB and "obscure"
modes like OFB, CFB, while BSAES is used for CBC decrypt, CTR and XTS.
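The selection described in the paragraph above can be sketched as a small
lookup. This is only an illustration of that description, not OpenSSL's
actual dispatch code: the function name `aes_path`, the mode labels, and
the Linux-style CPU flag names `aes` and `ssse3` are assumptions made for
the example.

```python
# Illustration of the selection described in the post above.
# NOT OpenSSL's real dispatch logic; names and labels are hypothetical.

VPAES_MODES = {"CBC-encrypt", "ECB", "OFB", "CFB"}
BSAES_MODES = {"CBC-decrypt", "CTR", "XTS"}


def aes_path(cpu_flags, mode):
    """Return which AES implementation the post says would be used."""
    flags = {flag.lower() for flag in cpu_flags}
    if "aes" in flags:  # AES-NI present (flag name as in Linux /proc/cpuinfo)
        return "AES-NI"
    if "ssse3" in flags:
        if mode in VPAES_MODES:
            return "VPAES"
        if mode in BSAES_MODES:
            return "BSAES"
    # Anything else is not covered by the description in the post.
    return "not described in the post"


if __name__ == "__main__":
    for mode in ("CBC-encrypt", "CTR"):
        print(mode, "->", aes_path({"sse2", "ssse3"}, mode))
```

Passing a flag set that contains `aes` returns "AES-NI" for every mode,
which is the case the 2.5 requirement is about.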
The bit sliced (constant-time) implementation in OpenSSL could be used,
but the GUI model with RESTCONF is very (very) different. Except for the
various “monitoring” widgets and graphs, a web browser running against
today’s pfSense is all but silent until something like an “Apply” button is
pushed. With RESTCONF, things are much more “chatty”.
This means that there is more load on the box to keep things encrypted.
The bit sliced implementation in OpenSSL is slow, especially on older
processors. I’ve run it on a J1900, and it’s glacial.
As I explained in the blog post, we’re going to move 2.5 to the RESTCONF
interface. We don’t have the resources to carry both the historic PHP and
RESTCONF interfaces forward.
Post by Kyle Marek
Now remember that 2.5 is unlikely to occur by 15-Feb-2019, and thus 2.4
will continue to be supported beyond 15-Feb-2020. Were we to get a
2.5.0-RELEASE by 1 May 2019, 2.4.x would be supported until 1 May 2020, and
this is three years after the initial announcement that 2.5 would require a
CPU with AES-NI (or other hardware crypto offload). I’ll note that ARMv8
CPUs have instructions similar to AES-NI, and that the ARM appliances
released by Netgate have crypto offload available.
Post by Kyle Marek
So if the goal was to somehow coerce people into buying new appliances,
it’s not working until at least then, and even then, all that occurs if you
choose to remain on 2.4 is that some bugs won’t be fixed.
Post by Kyle Marek
So your “shame” Mr. Marek, while noted, is, in my view, specious.
I’ve documented the cache-timing attacks possible against AES-GCM, and you
haven’t countered these.
Post by Kyle Marek
I’ve explained (on the forum and elsewhere) that the bit sliced AES
implementation in OpenSSL is too slow, and you haven’t countered this,
either. Warning: some implementations look fast until you realize that
they’re only fast on large (say 2048 byte) blocks, and that they don’t
“scale down” (a 576 byte payload takes exactly the same amount of time.)
Post by Kyle Marek
I’ll note that one can make a bit sliced AES implementation go faster
with AVX instructions, but then one also has AES-NI, so the point is moot.
Post by Kyle Marek
So any *shame*, Mr. Marek, would be if I knowingly and willingly put the
security of the pfSense community at risk lest I be attacked.
Post by Kyle Marek
To be clear, this has not occurred.
I apologize for my comments of shaming. I was under the impression that
this was a meritless artificial limitation rather than any kind of
support burden. However, I still don't understand why the existing
software solutions are insufficient in any way besides throughput.
If they’re fast, they’re problematic from a security standpoint.
On a Westmere (so one generation forward from your Xeons), AES GCM
performs at 3.54 cycles/byte.
Compare this with 10.68 cycles/byte for bitsliced AES GCM with table
lookups (not secure) or
21.99 cycles/byte without table lookups (much more difficult to mount a
side-channel attack) as implemented in OpenSSL.
On a V4 Xeon, AES-GCM runs at 0.77 cycles/byte, and on the newest Xeon
‘Scalable’ cores, it runs at 0.65 cycles/byte.
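To put the cycles/byte figures above in more familiar terms, here is a
rough conversion to single-core throughput. The 2.0 GHz clock is an
assumed round number, not a figure from this thread, and the CPUs
mentioned run at different clocks, so the ratios between the entries are
more meaningful than the absolute values. The name `gbps_per_core` is
made up for the example.

```python
# Convert cycles/byte into approximate single-core throughput.
# ASSUMED_CLOCK_HZ is an assumption for illustration, not a measured value.

ASSUMED_CLOCK_HZ = 2.0e9

# Figures quoted in the post above (cycles per byte).
CYCLES_PER_BYTE = {
    "AES GCM, Westmere": 3.54,
    "bitsliced AES GCM, with table lookups": 10.68,
    "bitsliced AES GCM, without table lookups": 21.99,
    "AES-GCM, V4 Xeon": 0.77,
    "AES-GCM, Xeon Scalable": 0.65,
}


def gbps_per_core(cycles_per_byte, clock_hz=ASSUMED_CLOCK_HZ):
    """Bytes/second = clock / (cycles/byte); times 8 for bits, /1e9 for Gbps."""
    return clock_hz / cycles_per_byte * 8 / 1e9


if __name__ == "__main__":
    for name, cpb in CYCLES_PER_BYTE.items():
        print(f"{name}: {gbps_per_core(cpb):.2f} Gbps at the assumed clock")
```

Whatever clock is assumed, 21.99 cycles/byte against 3.54 cycles/byte is
a slowdown of roughly 6x on the same core.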
Since we just mentioned “really new hardware”… and yes, it’s off the
subject of OpenSSL, but possibly of some interest:
With TNSR (the DPDK re-write) on AWS we’re doing 4.59 gbps IPsec
(AES-GCM-128) or 4.58 gbps IPsec (AES-CBC-128 + HMAC-SHA1) between a pair
of C5.large instances (so those same Xeon ’Scalable’ cores), using a single
core. These instances have a maximum 5 gbps single stream ‘cap’. The
traffic generator hosts used iperf3 to send traffic. Tests were run both
with a single stream and with 4 streams.
The traffic generator hosts (outside the tunnel) had their MTU adjusted
down to 1500 (from the default value of 9001). Tests with iperf3 were
invoked with a flag that set the TCP MSS to 1375 in order to ensure that
each segment sent would not exceed 1500 bytes once the encapsulation
overhead (ESP header, initialization vector, padding, integrity check
value, outer IP header) is added.
Raw (no IPsec) throughput using iperf3 was 4.79 gbps. The measurements
taken by iperf3 use the amount of data sent in the TCP payload to calculate
throughput. The 32 byte TCP header (standard 20 byte TCP header plus 10
bytes for optional field containing timestamps and 2 bytes to pad optional
fields to a multiple of 4 bytes) and 20 byte IP header on each packet are
not included in the calculation. If 52 bytes from each 1500 byte packet are
considered overhead that is not included in the measurement, the maximum
result that iperf3 could achieve on a 5 gbps link would be approximately
4.83 gbps.
Additional overhead from ESP includes 20 bytes for an outer IP header, 8
bytes for an ESP header, 2 bytes for padding length & next header type, 16
(AES-CBC) or 8 (AES-GCM) bytes for an initialization vector, and 12
(HMAC-SHA1) or 16 (AES-GCM) bytes for an integrity check value. The total
extra overhead is 58 bytes (AES-CBC HMAC-SHA1) or 54 bytes (AES-GCM). Thus,
the maximum measurement possible using iperf3 on a 5 Gbps link is 4.63 gbps
for AES-CBC-128 HMAC-SHA1 and 4.65 gbps for AES-GCM-128 ICV16.
Net-net, it’s probably faster than that, since we’re obviously hitting the
Amazon-imposed bandwidth limit. Between a pair of i7-6950s (so Broadwell
cores) we see 13.7 gbps (single-stream) AES-GCM-128 and 7.42 gbps
AES-CBC-128 + HMAC-SHA1 (again, single-stream). Adding our CPIC QAT card
gets us to 32.68/32.73 gbps respectively.
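The iperf3 ceiling arithmetic in the paragraphs above can be reproduced
with a few lines of Python. This is a sketch that follows the same
simplification as the text: every packet is taken to be exactly 1500
bytes on the wire, and no ESP padding beyond the 2-byte trailer is
counted. The names used are made up for the example.

```python
# Reproduce the maximum iperf3-visible throughput on a 5 Gbps capped link.
# Byte counts are the ones listed in the post; names are hypothetical.

LINK_GBPS = 5.0
PACKET_BYTES = 1500
TCP_IP_HEADERS = 32 + 20  # TCP header with timestamps + inner IP header

# outer IP + ESP header + pad length/next header + IV + ICV
ESP_OVERHEAD = {
    "no IPsec": 0,
    "AES-CBC-128 HMAC-SHA1": 20 + 8 + 2 + 16 + 12,
    "AES-GCM-128 ICV16": 20 + 8 + 2 + 8 + 16,
}


def max_goodput_gbps(extra_overhead):
    """Share of each packet that is TCP payload, scaled to the link cap."""
    payload = PACKET_BYTES - TCP_IP_HEADERS - extra_overhead
    return LINK_GBPS * payload / PACKET_BYTES


if __name__ == "__main__":
    for name, overhead in ESP_OVERHEAD.items():
        print(f"{name}: {overhead} bytes extra, "
              f"{max_goodput_gbps(overhead):.2f} Gbps max")
```

Rounded to two decimals, this gives the 4.83, 4.63 and 4.65 Gbps ceilings
quoted above, which is why the measured 4.58 and 4.59 Gbps results sit so
close to the instance cap.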
Post by Kyle Marek
I cannot counter the attack possibility, but I would like to ask: is
this unsolvable without hardware acceleration?
It has a lot to do with what one might consider “acceptable” performance
of the web gui.
Post by Kyle Marek
I side with Mr. Parker here. How long does a project have to wait
before demanding certain features for future revisions, assuming it gives
adequate warning to the existing and future users of that project? I’ll
note that you didn’t answer his question.
Post by Kyle Marek
I never answered the question because I did not think the answer or the
question was relevant. Until today, it was my understanding that AES-NI
was simply to improve throughput of applications utilizing AES. I had
previously not been presented with anything to indicate that it helps
with any security issues such as the timing attacks discussed here.
"With AES you either design, test, and verify a bitslice software
implementation, (giving up a lot of performance in the process), leverage
hardware offloads, or leave the resulting system open to several known
attacks. We have selected the “leverage hardware offloads” path. The other
two options are either unthinkable, or involve a lot of effort for
diminishing returns.”
I’ve listed the performance of the various implementations in OpenSSL
above.
Post by Kyle Marek
However, to address the question in some way, I do agree that features
like this should be taken advantage of as much as possible. However,
unlike other advances such as x86 to x86_64, AES-NI does not create any
new functionality that did not already exist. Until the security
benefits have been presented, I did not see any use case where AES-NI
would be necessary over the software implementation.
I would like to point out that AES-NI is not "in everything" since 2008
as previously indicated. While I understand these may not fall under the
"all major x64 processors" category, Intel has launched CPUs without
AES-NI within the past couple of years.
It’s true that not everything Intel and AMD have released in the last
decade has AES-NI.
Post by Kyle Marek
https://ark.intel.com/Search/FeatureFilter?productType=processors&AESTech=false&BornOnDate=Q4%2716
Post by Kyle Marek
And, finally, Mr. Volotinen called it exactly. We announced this in
May last year, so that those buying hardware in the now would know about
the future requirements. Anyone buying hardware now can make an informed
decision as to if they want to buy or otherwise obtain a platform for
pfSense that supports AES-NI, or not. Either will work as long as they are
running a 2.4.x release of pfSense, and, as above, 2.4 has a plan that
includes support until, at least, 2020.
Post by Kyle Marek
This is acceptable. It also just sucks, and I understand it must be
faced.
This is, however, beyond just replacing some networking equipment, as I
have to replace my primary VM host due to CPU replacements supporting
AES-NI not existing. Before knowing that the AES-NI requirement was to
address the timing attack, I felt as if I have to pay for new hardware
due to Netgate not "wanting" non-AES-NI AES implementations being
utilized. Until this, I have not exactly had software support issues
with even this aging hardware.
Nor do you now. It’s only (at least) a year after the release of 2.5 that
we’ll stop supporting 2.4, and then it’s a matter of when a security issue
or other bug that is important enough for you to switch gets addressed in 2.5
but not in 2.4 might occur (gosh that’s an awful sentence, Jim).
Post by Kyle Marek
I understand that a lot of people are effectively threatening to switch
to OpnSense due to this, but I fear that I will *have to* if I can't
replace my hardware by the time support for software AES ends entirely.
People should run what suits their purpose best. Perhaps someone else
will fork pfSense and continue the 2.4 train on a different track. That’s
the beauty of open source software.
Post by Kyle Marek
https://ark.intel.com/Search/FeatureFilter?productType=processors&SocketsSupported=LGA771&AESTech=true
Post by Kyle Marek
I thank you for addressing this with me. I appreciate your conduct with
me despite my comment.
Sure thing. I also appreciate your response here.
Thanks,
Jim
Post by Kyle Marek
Jim
Post by Kyle Marek
I think you're missing the point that software support exists; pfSense
supports software AES *now*, and this is being removed. New technology
is cool; things not working anymore is not.
Anyway, what are other projects such as the TLS libraries doing
about this? Is hardware acceleration really the only solution?
Post by Walter Parker
Well, both Intel and AMD started shipping the AES-NI instructions 8
years ago...
How long does a project need to wait before it can require a feature
found on all major x64 processors? Waiting 8-9 years seems reasonable
to me.
Given the fact that the project is only supporting 64-bit and suggests
using a modern processor, this requirement should be a non-issue for
most users.
The only place where the AES-NI instructions are not found is in a
small number of embedded/dev boards using older Celeron processors.
Walter
Post by Kyle Marek
This is silly. I shouldn't have to replace my hardware to support a
feature I will not use...
I shame Netgate for such an artificial limitation...
Thank you for the information.
Post by Eero Volotinen
https://www.netgate.com/blog/pfsense-2-5-and-aes-ni.html so we are talking
about 2.5 not 3.x ?
"While we’re not revealing the extent of our plans, we do want to
give
loads
Edition
AES-NI. On
ARM-based systems, the additional load from AES operations will be
offloaded to on-die cryptographic accelerators, such as the one
found on
v8 CPUs
include instructions like AES-NI
<https://www.arm.com/files/downloads/ARMv8_Architecture.pdf> that
can be
platforms."
aes-ni
Eero
Post by Edwin Pers
Volotinen
Sent: Thursday, February 15, 2018 12:14 PM
Cc: pfSense Support and Discussion Mailing List <
Subject: Re: [pfSense] Configs or hardware?
Well. Next version of pfsense (2.5) will not install into hardware
that
Well Said.
Thank you for sharing the numbers.