lyra

Hi Daniel,

Post by Daniel CegieÅka
Do you have any opinion on lyra vs scrypt? Lyra is compared to scrypt
as a safer alternative.
http://lyra-kdf.net/
http://lyra-leocalm.rhcloud.com/Presentation.pdf

Your question is more appropriate for the PHC discussion list, where
folks are already discussing Lyra. Since the PHC discussion list was
created, I mostly limited my use of the crypt-dev list for recording
ideas/decisions regarding our own KDF / password hashing scheme design.
Yes, there were some recent exceptions to that, but mostly from threads
that were started here before the PHC discussion list was setup.

Since you asked in here anyway, here are some quick comments:

0. Lyra is a very welcome upcoming PHC submission.

1. Disclaimer: I don't fully understand Lyra yet and am not yet
convinced that what's claimed is actually achieved.

2. The claimed higher than scrypt's memory usage for same running time
would be very nice. Whether it is actually achieved, I am not sure.
Where would it come from? scrypt spends half its time on filling
memory, and it uses Salsa20/8 for that. Is it claimed that BLAKE2 is
somehow a lot faster than Salsa20/8? I'd expect these to be roughly
similar speed. Maybe the speedup the authors observed was due to
implementation detail (comparison against less-than-optimal scrypt
implementation and build) rather than due to the design of Lyra.

3. The claimed TMTO resistance would also be very nice, although it's
arguably even nicer when it's optional (like we have in the current
development escrypt tree), because there are some defender's uses for
scrypt's (deliberate) TMTO-friendliness too.

4. The summary table on slide 28/36 (page 31 in the PDF) gives different
meaning to R for scrypt vs. Lyra. To compare the "Intermediate states"
and "Memory-free" times for scrypt vs. Lyra, they first need to be
normalized for same "Sequential (Default)" running time. Let's use Rs
to refer to scrypt's R, and Rl to Lyra's. We have Rl=Rs/T, so that
O(Rl*T) would match scrypt's O(Rs) (normalization for same running time).
For memory-free running times, this gives us O(Rs^2) for scrypt and
O((Rs/T)^(T+1)) for Lyra. This is still reasonably good for Lyra, but
not as good as the slide makes it appear at first.

5. It appears that higher T, as required for Lyra's TMTO resistance,
would have an undesirable side-effect of reducing Lyra's memory usage.
In fact, from that same summary table the "Sequential (Default)" memory
usage for scrypt is O(Rs), but for Lyra it's only O(Rl*C), which means
O(Rs/T*C). The area-time cost of attacks decreases accordingly.
I think this is not an acceptable tradeoff in KDF design; it's no good
to pay with so much lower memory usage merely for higher TMTO resistance.

6. Including the C multiplier for Lyra, but not the r multiplier for
scrypt may be misleading. There's not really a difference here, but the
table makes it appear as if Lyra had an extra multiplier on the memory
usage.

7. The table somehow does not list "Intermediate states" Memory and Time
for scrypt, even though such states do exist for scrypt, and are
arguably of more practical relevance (for both scrypt and Lyra) than the
extreme memory-free case. It'd be nice to compare those.

8. Other TMTO-discouraging changes relative to scrypt had been proposed.
It'd be nice to see how Lyra compares (not trivial to do).

9. I noticed a few weird things on the slides, such as the GPU attack
example ignoring memory bandwidth, even though they also cite a specific
memory bandwidth figure on a preceding slide. They use 2688 "cores",
6 GB memory, 250 GB/s bandwidth, then imagine a KDF using 0.5 MB in 2 ms
and assume that all 2688 "cores" would compute it at 100% efficiency -
but merely filling the 0.5 MB in 2 ms (not even reading any of that data
back) imposes a limit of 1000 "cores" in full use max (or more "cores"
in less than full use, spending most of their time waiting), since it
needs 250 MB/s/core. This also ignores the high latency of off-chip
memory, and needing to run multiple instances of the KDF per "core" in
order to hide instruction latencies (even without any use of off-chip
memory), but that's sort of OK if they want to arrive at a theoretical
highest possible speed (for a simplified model of the GPU rather than
for the real thing) rather than a realistic speed. And indeed this
assumes no TMTO (which may actually help put more "cores" to use, etc).
(And I put the "cores" in quotes because they are not exactly that,
which may also matter in practice.) OK, I am nitpicking. They're
making their point fine anyway.

10. The claim of (irrelevant) "known vulnerabilities" in SHA-1 and
Salsa20/8 looks like marketing/FUD. I assume that was not intentional.
Better wording would be along the lines of: "some prospective users of
KDFs may not be qualified to determine that SHA-1 and Salsa20/8 are
perfectly OK for such use despite of known attacks, yet may be
concerned; for those users, we chose to use BLAKE2 instead". Even that
would still be weird, though, because those type of users would possibly
be unhappy about BLAKE2 not being NIST-approved. Besides, scrypt uses
SHA-256 and not SHA-1. SHA-256 is NIST-approved and currently among the
recommended algorithms. For scrypt's cryptographic security, it is
sufficient that SHA-256 is secure, whereas scrypt's use of Salsa20/8 is
"non-cryptographic" (its properties are important for scrypt's area-time
cost of attacks, but not for its crypto security per se).

11. Disclaimer: I might have made some errors in the comments above,
too. ;-)

12. The above comments apply to Presentation.pdf with SHA-256 of:
0088c41c99104cd6c0317c3e97ac36ae4e304f07a5255cb6d0da6caf49012663
and modification date of:
/CreationDate (D:20140107172630-02'00')
(I think it might be changing as I am not seeing some of what folks on
the PHC list criticized. This is OK; I just felt the need to refer to a
specific revision for that reason.)

Alexander

Solar Designer

2014-01-13 06:41:01 UTC

Post by Solar Designer
2. The claimed higher than scrypt's memory usage for same running time
would be very nice. Whether it is actually achieved, I am not sure.
Where would it come from? scrypt spends half its time on filling
memory, and it uses Salsa20/8 for that. Is it claimed that BLAKE2 is
somehow a lot faster than Salsa20/8? I'd expect these to be roughly
similar speed. Maybe the speedup the authors observed was due to
implementation detail (comparison against less-than-optimal scrypt
implementation and build) rather than due to the design of Lyra.

The paper http://eprint.iacr.org/2014/030.pdf says on page 21 that
scrypt's -nosse was compared against Lyra, because there was no
SSE2-enabled implementation of Lyra. This does sound like a fair
comparison (although a comparison of two SSE2-enabled versions would be
of more practical relevance). The result is unexpected (by me) and not
explained (by paper authors), though. Why would non-vectorized BLAKE2
(12 rounds?) perform so much faster than non-vectorized Salsa20/8?
I think more testing is needed to figure out what went wrong with this
test (leading to a weird conclusion). ;-)

Post by Solar Designer
5. It appears that higher T, as required for Lyra's TMTO resistance,

Actually, Lyra might (and probably does) have some TMTO resistance
relative to scrypt even at T=1. It looks similar to scrypt with Anthony
Ferrara's write into V added - an approach I posted about in here
earlier. (I say "might" rather than "does" only because I haven't
figured out and considered all of Lyra's peculiarities yet.)

In fact, this results in the parallelized variant of Lyra introduced on
page 15 having lower area-time than escrypt with p>1 (Lyra gives each
thread its own read-write region, whereas escrypt quickly moves on to
having all threads read from the same shared region). It's the same KDF
design tradeoff I posted about in here recently, and Lyra resolved it in
favor of lower area-time with higher TMTO resistance, whereas in escrypt
we mostly choose higher area-time with lower TMTO resistance instead.

Post by Solar Designer
would have an undesirable side-effect of reducing Lyra's memory usage.
In fact, from that same summary table the "Sequential (Default)" memory
usage for scrypt is O(Rs), but for Lyra it's only O(Rl*C), which means
O(Rs/T*C). The area-time cost of attacks decreases accordingly.
I think this is not an acceptable tradeoff in KDF design; it's no good
to pay with so much lower memory usage merely for higher TMTO resistance.

I still think so, but I'd like to add that with low T (lower than what
Lyra authors recommend), it's not that bad. With low T, we can estimate
memory usage as Rs/2 for scrypt and Rs/(T+1) for Lyra (for same running
time), so e.g. for T=2, Lyra's is 2/3 of scrypt's. This reduction in
memory usage is not great (unless the defender doesn't want to use more
memory anyway), but it is within consideration. At T=3, Lyra's memory
usage is 1/2 of scrypt's, which I think is usually undesirable despite
of the TMTO resistance benefits - again, unless the defender doesn't
want to use more memory anyway.

T=5 recommended in the Lyra paper feels excessive and non-optimal. It
is based on the benchmark against scrypt (both for non-optimal builds)
rather than on what would optimally increase attack costs on Lyra.

To summarize, Lyra may be OK in this respect at T=1 and T=2, and at
higher T in the special case when we want a long running time and
deliberately don't want to use more memory (accept having much lower
area-time cost of attacks than we could have for same running time).

Alexander

Daniel Cegiełka

2014-01-13 09:20:28 UTC

Thank you Alexander - your summary is very valuable. I think I lost an
interesting discussion on this topic (PHC - but I already have a
subscription).

Daniel

Marcos Simplicio

2014-01-14 14:49:39 UTC

Hi there.

I saw the discussion below and thought I could offer some pointers on
Lyra's design (please see http://eprint.iacr.org/2014/030 or the official
publication at http://link.springer.com/journal/13389 for all details).

Post by Daniel CegieÅka
Do you have any opinion on lyra vs scrypt? Lyra is compared to scrypt
as a safer alternative.
http://lyra-kdf.net/
http://lyra-leocalm.rhcloud.com/Presentation.pdf

Please let us know if you find any flaw in our security claims! :-)

The reason is that we adopt an Alred-like approach, using 1-round Blake2b
instead of 10-round Blake2b. Since we are interested in diffusion rather
than cryptographic resistance to inversion or collisions, this seems to be
a good approach for KDFs. After all, the attacker has no control over the
inputs provided to the hash... We are open to other people's opinions on
the matter, though :)

Post by Solar Designer
3. The claimed TMTO resistance would also be very nice, although it's
arguably even nicer when it's optional (like we have in the current
development escrypt tree), because there are some defender's uses for
scrypt's (deliberate) TMTO-friendliness too.

I'm not sure whether this solves your point or not, but the TMTO
resistance is configurable/optional: T is fully configurable; if you want
the same memory usage with a lower T, but wants to raise the processing
time, just use 2-round Blake2b (or higher).

Post by Solar Designer
4. The summary table on slide 28/36 (page 31 in the PDF) gives different
meaning to R for scrypt vs. Lyra. To compare the "Intermediate states"
and "Memory-free" times for scrypt vs. Lyra, they first need to be
normalized for same "Sequential (Default)" running time. Let's use Rs
to refer to scrypt's R, and Rl to Lyra's. We have Rl=Rs/T, so that
O(Rl*T) would match scrypt's O(Rs) (normalization for same running time).
For memory-free running times, this gives us O(Rs^2) for scrypt and
O((Rs/T)^(T+1)) for Lyra. This is still reasonably good for Lyra, but
not as good as the slide makes it appear at first.

Normalization is indeed a complicating factor when comparing KDFs, and we
are working on make ours clearer. At first sight (but I haven't check all
details), we actually have something like Rl/C = Rs/r. Maybe I'm wrong by
a constant factor, but I'm pretty sure there is no T in this equation
because T has nothing to do with memory usage...

Post by Solar Designer
5. It appears that higher T, as required for Lyra's TMTO resistance,
would have an undesirable side-effect of reducing Lyra's memory usage.
In fact, from that same summary table the "Sequential (Default)" memory
usage for scrypt is O(Rs), but for Lyra it's only O(Rl*C), which means
O(Rs/T*C). The area-time cost of attacks decreases accordingly.
I think this is not an acceptable tradeoff in KDF design; it's no good
to pay with so much lower memory usage merely for higher TMTO resistance.

I'm not sure I followed your reasoning correctly, but I can assure that
this O(Rs/T*C) is incorrect, since a higher T will always (1) leads to a
higher memory usage in attacks and (2) does not influence the memory usage
for legitimate users (only processing time). An attacker can either store
sponge states (which increase in number with T, until it reaches T*R) or
store nothing and simply recompute sponge states when they are required
(which places T at the exponent, leading to the Rl^{T+1} processing cost).

Post by Solar Designer
6. Including the C multiplier for Lyra, but not the r multiplier for
scrypt may be misleading. There's not really a difference here, but the
table makes it appear as if Lyra had an extra multiplier on the memory
usage.
7. The table somehow does not list "Intermediate states" Memory and Time
for scrypt, even though such states do exist for scrypt, and are
arguably of more practical relevance (for both scrypt and Lyra) than the
extreme memory-free case. It'd be nice to compare those.

Indeed... This comparison is missing and may be leading to the confusion
seen in items 5. and 6... We will make sure to work on that in new
versions of the manuscript...

Post by Solar Designer
8. Other TMTO-discouraging changes relative to scrypt had been proposed.
It'd be nice to see how Lyra compares (not trivial to do).

Probably such comparisons will be part of PHC discussions, but since there
are so many ideas an contestants (some of them still unknown), this will
require a task-force rather than a single team :)

Post by Solar Designer
9. I noticed a few weird things on the slides, such as the GPU attack
example ignoring memory bandwidth, even though they also cite a specific
memory bandwidth figure on a preceding slide. They use 2688 "cores",
6 GB memory, 250 GB/s bandwidth, then imagine a KDF using 0.5 MB in 2 ms
and assume that all 2688 "cores" would compute it at 100% efficiency -
but merely filling the 0.5 MB in 2 ms (not even reading any of that data
back) imposes a limit of 1000 "cores" in full use max (or more "cores"
in less than full use, spending most of their time waiting), since it
needs 250 MB/s/core. This also ignores the high latency of off-chip
memory, and needing to run multiple instances of the KDF per "core" in
order to hide instruction latencies (even without any use of off-chip
memory), but that's sort of OK if they want to arrive at a theoretical
highest possible speed (for a simplified model of the GPU rather than
for the real thing) rather than a realistic speed. And indeed this
assumes no TMTO (which may actually help put more "cores" to use, etc).
(And I put the "cores" in quotes because they are not exactly that,
which may also matter in practice.) OK, I am nitpicking. They're
making their point fine anyway.
10. The claim of (irrelevant) "known vulnerabilities" in SHA-1 and
Salsa20/8 looks like marketing/FUD. I assume that was not intentional.
Better wording would be along the lines of: "some prospective users of
KDFs may not be qualified to determine that SHA-1 and Salsa20/8 are
perfectly OK for such use despite of known attacks, yet may be
concerned; for those users, we chose to use BLAKE2 instead". Even that
would still be weird, though, because those type of users would possibly
be unhappy about BLAKE2 not being NIST-approved. Besides, scrypt uses
SHA-256 and not SHA-1. SHA-256 is NIST-approved and currently among the
recommended algorithms. For scrypt's cryptographic security, it is
sufficient that SHA-256 is secure, whereas scrypt's use of Salsa20/8 is
"non-cryptographic" (its properties are important for scrypt's area-time
cost of attacks, but not for its crypto security per se).

Well, the problem with insecure primitives is that they are (or should?)
likely to be discontinued in future releases of crypto libraries. Hence,
depending on them is not the best policy. In the article this is much
clearer that in the presentation (I hope this removes the impression of
FUD, especially because scrypt's security claims do not depend on
Salsa20/8...)

And we never claimed that SHA-1 is used in scrypt, but in PBKDF only. ;)

Post by Solar Designer
11. Disclaimer: I might have made some errors in the comments above,
too. ;-)
0088c41c99104cd6c0317c3e97ac36ae4e304f07a5255cb6d0da6caf49012663
/CreationDate (D:20140107172630-02'00')
(I think it might be changing as I am not seeing some of what folks on
the PHC list criticized. This is OK; I just felt the need to refer to a
specific revision for that reason.)
Alexander

Solar Designer

2014-01-21 16:32:15 UTC

Hi Marcos,

Thank you for your comments, and I'm sorry for my delayed response.

Post by Marcos Simplicio
I saw the discussion below and thought I could offer some pointers on
Lyra's design (please see http://eprint.iacr.org/2014/030 or the official
publication at http://link.springer.com/journal/13389 for all details).

While I wrote the message you're replying to based solely on your
presentation slides (not the paper), I've since found the paper as well,
and added some comments/corrections here:

http://www.openwall.com/lists/crypt-dev/2014/01/13/1

Post by Marcos Simplicio
Please let us know if you find any flaw in our security claims! :-)

The lack of normalization to same running time in your comparisons

This explains it, and the approach makes sense to me. We're also likely
to move to something quicker than Salsa20/8 in escrypt.

Unfortunately, this makes normalization to same running time ambiguous,
or you may do it in two different ways: for actual performance achieved
on a real-world system and thus considering the move from Salsa20/8 to
1-round Blake2b, and separately for a theoretical system that computes
either of these primitives with the same latency. The latter comparison
is relevant in showing the advantage (or lack thereof) of your other
changes, relative to scrypt. I am mostly interested in this theoretical
comparison, because it'd reflect the less trivial ways in which Lyra
differs from scrypt.

As to "the attacker has no control over the inputs" and not needing
"cryptographic resistance to inversion or collisions", these statements
are only partially valid (in some contexts and to some extent). The
attacker definitely does have control over new inputs when attempting a
collision against previously computed hashes, and such collisions, if
possible for the entire KDF, would ease practically relevant attacks.
Then, short cycles are undesirable (need to be very unlikely), since
when they do occur the attacker is able to compute the KDF quicker, and
this is related to having some collision resistance (just not to the
same extent normally expected from full-round-count crypto hashes).

You have writes into M in the Wandering phase, regardless of the value
of T (although with a lower T there are fewer such writes). These
writes introduce some TMTO unfriendliness as compared to scrypt, even at
T=1. This is both good (when TMTO unfriendliness is desired) and bad
(because it is not optional and not always desired).

I'm not sure I followed your reasoning correctly, but I can assure that
this O(Rs/T*C) is incorrect, since a higher T will always (1) leads to a
higher memory usage in attacks

Always? I think not. There's no point in using more memory than the
defender's, so once T is high enough that an attacker finds it optimal
not to use TMTO, further T increases are of no help to the defender (on
the contrary, they help the attacker - see below).

Further, it might be the case that T=1 is optimal, in that it'd be
enough to make it unreasonable for likely attackers to use TMTO. As I
mentioned above, you do have some TMTO discouragement even at T=1.

Post by Marcos Simplicio
and (2) does not influence the memory usage
for legitimate users (only processing time).

Exactly, and that's the problem!

"Does not influence the memory usage" and increases "only processing
time" implies that it reduces memory usage at same processing time.

Post by Marcos Simplicio
An attacker can either store
sponge states (which increase in number with T, until it reaches T*R) or

Why would any attacker, under any scenario want to store more than R
sponge states (including the overwritten ones)? I think they would not,
and you confirmed that above in "does not influence the memory usage"
(even though you were referring to the defender's).

Or by "store" do you mean the number of store operations, rather than
the number of storage locations?

Post by Marcos Simplicio
store nothing and simply recompute sponge states when they are required
(which places T at the exponent,

Yes.

Post by Marcos Simplicio
leading to the Rl^{T+1} processing cost).

I'm not sure this exact formula is correct, because the Rows Loop
updates random rows, so some rows get updated more than T times (and
some fewer than T times, or none at all). For the rows that get updated
more than T times, you have a higher exponent there, and they probably
dominate the total running time. Thus, I think Rl^{T+1} is an
under-estimate.

Overall, I think the values of T recommended in your paper are too high.
I think potentially optimal values are in the range of
slightly-higher-than-0 (Bill chose 1% for NOELKDF, which means T=0.01,
albeit along with revised memory filling loop to have TMTO resistance
via it as well) to 3, and they don't even have to be whole numbers (if
you join the T and R loops into one, turning T into a multiplier for the
iteration count). As Lyra is currently defined, T=1 may well be optimal.

Post by Marcos Simplicio
Well, the problem with insecure primitives is that they are (or should?)
likely to be discontinued in future releases of crypto libraries. Hence,
depending on them is not the best policy.

OK. In scrypt's case, though, Salsa20/8 core generally has to be custom
code bundled with the scrypt implementation anyway, and SHA-256 is still
considered secure at this time, with no intent to discontinue it.

Post by Marcos Simplicio
In the article this is much
clearer that in the presentation (I hope this removes the impression of
FUD, especially because scrypt's security claims do not depend on
Salsa20/8...)

OK.

Post by Marcos Simplicio
And we never claimed that SHA-1 is used in scrypt, but in PBKDF only. ;)

Oh, that's not how I read the slide. "The hash function SHA-1 adopted
by the PBKDF2 algorithm and the hash function Salsa20/8 adopted by the
Scrypt algorithm have known vulnerabilities" reads as implying the use
of PBKDF2 in scrypt itself (because there is, in fact, such use). Yes,
PBKDF2 is commonly used with SHA-1, but newer uses of PBKDF2 generally
use it along with SHA-256 or SHA-512. I suggest that you split this
bullet point in two to avoid confusion, and clarify that PBKDF2 is not
defined as being based on SHA-1 (rather, PBKDF2 is defined as a
higher-level algorithm that can use any underlying crypto hash function).

I hope this helps, and thanks again for your comments on mine.

Alexander

Marcos Simplicio

2014-01-23 22:37:04 UTC

Hi, Alexander.

Post by Solar Designer
Hi Marcos,
Thank you for your comments, and I'm sorry for my delayed response.

My thanks to you too. It is always good to have feedback and new inputs on
our algorithms. :)

BTW, I'm currently on vacation and my wife is insisting in keeping me away
from the Internet, so, whenever she succeeds, I'm likely to be late in my
responses too...

While I wrote the message you're replying to based solely on your
presentation slides (not the paper), I've since found the paper as well,
http://www.openwall.com/lists/crypt-dev/2014/01/13/1

Post by Marcos Simplicio
Please let us know if you find any flaw in our security claims! :-)

The lack of normalization to same running time in your comparisons

Same running time considering that we use the same underlying hash (as
below), right? We will try to take this into account on our new version of
Lyra (ongoing work...)

In the article we went with the first approach, although indeed the lack
of SSE2 optimizations lead to a less practical comparison. We are now more
focused on implementation issues, so I hope this gets solved soon.

A theoretical comparison would probably be focused on the Wandering's
phase read+write approach, which increases resistance to TMTO. However,
only combined with the fast computation of hashes that it makes sense,
since then one can (1) raise T to a achieve high TMTO resistance while not
spending too much processing time, or (2) with a reasonably small T,
achieve a reasonably good TMTO resistance and low processing time. In
scrypt, if we increase the size of the loop in which random memory
positions are read, this does not lead to an increased TMTO.

Post by Solar Designer
As to "the attacker has no control over the inputs" and not needing
"cryptographic resistance to inversion or collisions", these statements
are only partially valid (in some contexts and to some extent). The
attacker definitely does have control over new inputs when attempting a
collision against previously computed hashes, and such collisions, if
possible for the entire KDF, would ease practically relevant attacks.
Then, short cycles are undesirable (need to be very unlikely), since
when they do occur the attacker is able to compute the KDF quicker, and
this is related to having some collision resistance (just not to the
same extent normally expected from full-round-count crypto hashes).

OK, the attacker can control the password (and possibly salt) passed to
the sponge at the beginning of the Setup phase, but the control ends
there: after that input is processed with a full-round Blake2b, the
sponge's internal state (normally 256-512 bits) is highly random unless
Blake2b itself has security vulnerabilities and, thus, cannot be modeled
as a random oracle; similarly, the memory matrix's cells computed during
Setup are determined by that password and cannot be changed later by the
attacker (well, unless he/she is not interested in getting the correct
derived key...)

The point here is that, if the attacker could change these values at will,
then creating collisions should not be so hard, as 1-round Blake2b is not
expected to be collision-resistant, but creating such collisions by
changing rows or the sponge state is useless for attacks against KDFs (at
least as far as I can tell).

If collisions do occur by chance, then I agree that the whole KDF can be
computed much faster than expected. That is exactly reason why we believe
a reduced-round hash is a good primitive to work with: after several
rounds, these collisions come basically from the birthday paradox and can,
thus, be avoided with a reasonable choice of the state lengths.

Hum... you make me think of a scenario in which the memory usage is
already so small that the attacker is unlikely to need to explore TMTO. Is
that what you mean? Are there other scenarios? Seriously, I have not
thought of that until I read your phrase :)

Post by Solar Designer
4. The summary table on slide 28/36 (page 31 in the PDF) gives

different

Post by Solar Designer
meaning to R for scrypt vs. Lyra. To compare the "Intermediate

states"

Post by Solar Designer
and "Memory-free" times for scrypt vs. Lyra, they first need to be
normalized for same "Sequential (Default)" running time. Let's use Rs
to refer to scrypt's R, and Rl to Lyra's. We have Rl=Rs/T, so that
O(Rl*T) would match scrypt's O(Rs) (normalization for same running

time).

Post by Solar Designer
For memory-free running times, this gives us O(Rs^2) for scrypt and
O((Rs/T)^(T+1)) for Lyra. This is still reasonably good for Lyra, but
not as good as the slide makes it appear at first.

memory

Post by Solar Designer
usage for scrypt is O(Rs), but for Lyra it's only O(Rl*C), which means
O(Rs/T*C). The area-time cost of attacks decreases accordingly.
I think this is not an acceptable tradeoff in KDF design; it's no good
to pay with so much lower memory usage merely for higher TMTO

resistance.
I'm not sure I followed your reasoning correctly, but I can assure that
this O(Rs/T*C) is incorrect, since a higher T will always (1) leads to a
higher memory usage in attacks

Always? I think not. There's no point in using more memory than the
defender's, so once T is high enough that an attacker finds it optimal
not to use TMTO, further T increases are of no help to the defender (on
the contrary, they help the attacker - see below).
Further, it might be the case that T=1 is optimal, in that it'd be
enough to make it unreasonable for likely attackers to use TMTO. As I
mentioned above, you do have some TMTO discouragement even at T=1.

OK, I think I need some rephrasing here: a higher T will always lead to a
less attractive TMTO. Of course, attacks that do not explore TMTO will not
use more memory if one raises T (it will only affect processing time). If
the user thinks O(N^2) or O(N^3) is enough, then indeed T=1 or T=2 should
be enough. This makes me think of the "optional TMTO resistance" discussed
above.

Post by Marcos Simplicio
and (2) does not influence the memory usage
for legitimate users (only processing time).

Exactly, and that's the problem!
"Does not influence the memory usage" and increases "only processing
time" implies that it reduces memory usage at same processing time.

I fully agree with the implication. On the other hand, it appears to me
that in a normalization the value of T would appear in the denominator
only if Lyra was slower than scrypt: if we fix the processing time, than
we get the same memory usage with scrypt and Lyra with some T > 1. In
other words, fixing T=2 would still lead to higher memory usage, not to a
lower memory usage as implied by the "O(Rs/T*C)" normalization.

Post by Marcos Simplicio
An attacker can either store
sponge states (which increase in number with T, until it reaches T*R) or

Because sponge states are usually smaller than rows of the memory matrix.
In our tests, for example, a row takes 64*512 bits (i.e., C*bitrate),
while a state takes 1024 bits (the sponge's width). If all sponge states
are stored, then the algorithm can be run with a less unfriendly TMTO.

Also, the "does not influence the memory usage" I mentioned refers to a
legitimate user, not to attackers. The higher T, the higher will be the
memory usage of attacks for achieving a same processing time (this is a
direct result of TMTO unfriendliness). Or are you proposing a
normalization both concerning attackers and defenders?

Post by Solar Designer
Or by "store" do you mean the number of store operations, rather than
the number of storage locations?

Nope. I mean "number of storage locations" (i.e., memory usage). See above
the reason.

Post by Marcos Simplicio
store nothing and simply recompute sponge states when they are required
(which places T at the exponent,

Yes.

Post by Marcos Simplicio
leading to the Rl^{T+1} processing cost).

I'm not sure this exact formula is correct, because the Rows Loop
updates random rows, so some rows get updated more than T times (and
some fewer than T times, or none at all). For the rows that get updated
more than T times, you have a higher exponent there, and they probably
dominate the total running time. Thus, I think Rl^{T+1} is an
under-estimate.
Overall, I think the values of T recommended in your paper are too high.
I think potentially optimal values are in the range of
slightly-higher-than-0 (Bill chose 1% for NOELKDF, which means T=0.01,
albeit along with revised memory filling loop to have TMTO resistance
via it as well) to 3, and they don't even have to be whole numbers (if
you join the T and R loops into one, turning T into a multiplier for the
iteration count). As Lyra is currently defined, T=1 may well be optimal.

I fully agree that our analysis is quite conservative, which is good from
a security perspective but definitely not so much for performance. Your
T*R approach is indeed interesting and just a small tweak: we decided not
to provide it in the algorithm just because it would make the analysis
more cumbersome, but maybe we were too picky...