Discussion:
Password Scrambling
Christian Forler
2013-01-18 21:13:30 UTC
Permalink
Hello crypt-dev,

I'm a (junior) research assistant in the field of symmetric crypto,
especially provable security.

Why am I here?

Last year I met Jacob Sparre Andersen. He made me aware of the fact
that a subset of the BSD community is looking for a new password
scrambling algorithm. The algorithm currently in use does not seem to
provide adequate resistance against GPU-based password recovery
attacks.

After that, I discussed this topic within our research group
(http://www.uni-weimar.de/cms/medien/mediensicherheit/home.html). Since
then we have had a couple of nifty ideas. Actually, we have just started
to write a paper about this stuff.

This week at ESC'13 Jean-Philippe told me that there is some
activity around this topic within the security community, and that there
will be an upcoming password scrambler competition. So, I'm here. :-)

Short story:
I'm here thanks to JP and some accidental coincidences.


Anyway! In the next couple of weeks, we will write an academic paper
introducing a new password scrambler (key derivation function). After
that, I will try to supply you with an abbreviated version of our
extended abstract, if desired.


Best regards,
Christian
Colin Percival
2013-01-18 23:54:00 UTC
Hi Christian,
Post by Christian Forler
Anyway! In the next couple of weeks, we will write an academic paper
introducing a new password scrambler (key derivation function). After
that, I will try to supply you with an abbreviated version of our
extended abstract, if desired.
I'd be happy to see this. I assume you're familiar with my work on scrypt.

Colin Percival
Christian Forler
2013-01-19 08:27:33 UTC
On 19.01.2013 00:54, Colin Percival wrote:
Post by Colin Percival
Hi Christian,
Post by Christian Forler
Anyway! In the next couple of weeks, we will write an academic paper
introducing a new password scrambler (key derivation function). After
that, I will try to supply you with an abbreviated version of our
extended abstract, if desired.
I'd be happy to see this. I assume you're familiar with my work on scrypt.
Of course, I'm familiar with scrypt. You did a great job. Your idea of
using a memory-hard algorithm was beautiful.

For us scrypt was a great start, and you can bet that we will discuss
scrypt in our upcoming paper.

BTW I have two questions regarding scrypt.
1) Why use two different crypto primitives, i.e., Salsa20 (MFcrypt)
and SHA-1 (PBKDF2), instead of one?
2) Why is PBKDF2 called twice and not once?


Nevertheless, IMHO scrypt is superior to all other common password
scrambling algorithms like md5crypt, crypt, PBKDF1/2, bcrypt, etc.



Best regards,
Christian
Colin Percival
2013-01-19 08:39:08 UTC
Post by Christian Forler
On 19.01.2013 00:54, Colin Percival wrote:
Post by Colin Percival
Post by Christian Forler
Anyway! In the next couple of weeks, we will write an academic paper
introducing a new password scrambler (key derivation function). After
that, I will try to supply you with an abbreviated version of our
extended abstract, if desired.
I'd be happy to see this. I assume you're familiar with my work on scrypt.
Of course, I'm familiar with scrypt. You did a great job. Your idea of
using a memory-hard algorithm was beautiful.
Note that it needs to be *sequential* memory-hard, not just memory-hard -- it's
easy to construct functions which need a lot of RAM to compute, but much harder
to construct functions which require a lot of RAM *and* cannot be sped up by
using O(N) CPUs. In hardware, of course, extra CPUs are "free".
Post by Christian Forler
For us scrypt was a great start, and you can bet that we will discuss
scrypt in our upcoming paper.
BTW I have two questions regarding scrypt.
1) Why use two different crypto primitives, i.e., Salsa20 (MFcrypt)
and SHA-1 (PBKDF2), instead of one?
I used salsa20/8 in the sequential memory-hard component because it gave
the best strength against hardware attack -- because it's both fast in
software and slow(ish) in hardware.

I used PBKDF2-SHA256 in the "wrapper" because it's a standard and well
trusted construction.
Post by Christian Forler
2) Why is PBKDF2 called twice and not once?
Because I wanted to allow scrypt to take arbitrary input and output sizes,
while having the sequential memory-hard component work with a fixed block
size. I don't use PBKDF2 for any computational hardness; rather, it's a
safe way to spread entropy around.
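The structure Colin describes can be sketched as follows. This is a toy illustration only, not scrypt itself: `toy_memhard_mix` is a hypothetical stand-in for the real sequential memory-hard SMix/ROMix core, and the parameters are illustrative. It shows how the two PBKDF2 calls wrap a fixed-block-size core while allowing arbitrary input and output sizes:

```python
import hashlib

def toy_memhard_mix(block: bytes, n: int = 1 << 10) -> bytes:
    # Stand-in for scrypt's SMix: fill memory sequentially, then read it
    # back in a data-dependent order so the whole array must be kept.
    v, x = [], block
    for _ in range(n):
        v.append(x)
        x = hashlib.sha256(x).digest()
    for _ in range(n):
        j = int.from_bytes(x[:4], "little") % n
        x = hashlib.sha256(x + v[j]).digest()
    return x

def toy_scrypt(password: bytes, salt: bytes, dklen: int = 64) -> bytes:
    # PBKDF2 call #1: compress arbitrary-size inputs into one fixed block.
    b = hashlib.pbkdf2_hmac("sha256", password, salt, 1, 32)
    # The sequential memory-hard core only ever sees that fixed block size.
    b = toy_memhard_mix(b)
    # PBKDF2 call #2: expand to an arbitrary output length; one iteration,
    # since it spreads entropy rather than adding computational hardness.
    return hashlib.pbkdf2_hmac("sha256", password, b, 1, dklen)
```

Note that both PBKDF2 calls use a single iteration: all the cost lives in the middle stage, exactly as described above.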
Post by Christian Forler
Nevertheless, IMHO scrypt is superior to all other common password
scrambling algorithms like md5crypt, crypt, PBKDF1/2, bcrypt, etc.
I wouldn't have bothered if it wasn't. ;-)

Colin Percival
Christian Forler
2013-01-19 09:18:59 UTC
Post by Colin Percival
Post by Christian Forler
Of course, I'm familiar with scrypt. You did a great job. Your idea of
using a memory-hard algorithm was beautiful.
Note that it needs to be *sequential* memory-hard, not just memory-hard -- it's
easy to construct functions which need a lot of RAM to compute, but much harder
to construct functions which require a lot of RAM *and* cannot be sped up by
using O(N) CPUs. In hardware, of course, extra CPUs are "free".
In the password recovery scenario, an adversary A tries to compute as
many password candidates as possible in a given time t.

Let's say A has access to multiple CPUs. Then there are two extreme cases:

1) A can test one candidate per CPU in parallel or
2) A can parallelize the KDF function and test the candidates in a
sequential order.


In both cases, I would assume that the number of computed password
candidates is similar apart from a constant factor c. Or is this not the case?


Best regards,
Christian
Colin Percival
2013-01-19 09:39:27 UTC
Post by Christian Forler
1) A can test one candidate per CPU in parallel or
2) A can parallelize the KDF function and test the candidates in a
sequential order.
In both cases, I would assume that the number of computed password
candidates is similar apart from a constant factor c. Or is this not the case?
If he has enough RAM, yes. But case (1) requires N times as much RAM as
case (2), where N is the number of CPUs, since each CPU needs its own
memory rather than all the CPUs sharing. Since the point of memory-hard
functions is to maximize the area-time cost of the RAM used, this makes
a big difference in the overall cost.
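The point can be illustrated with toy numbers (illustrative values only, not scrypt's real parameters): extra CPUs raise throughput, but each one needs its own copy of the memory, so the area-time price per password guess never drops.

```python
MEM = 1 << 20   # RAM one KDF instance needs (toy value)
T   = 1 << 20   # sequential running time of one hash (toy value)

for n_cpus in (1, 4, 64):
    ram  = n_cpus * MEM              # case (1): every CPU has its own arrays
    wall = T                         # all candidates finish in parallel
    per_candidate = ram * wall // n_cpus
    assert per_candidate == MEM * T  # area-time per guess is unchanged
```

This is why the sequential property matters: if a single hash could be computed in time T/N on N cores sharing one memory, the area-time cost per guess would shrink by N.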

Colin Percival
Christian Forler
2013-09-02 16:58:30 UTC
On 18.01.2013 22:13, Christian Forler wrote:
[...]
Post by Christian Forler
Anyway! In the next couple of weeks, we will write an academic paper
introducing a new password scrambler (key derivation function). After
that, I will try to supply you with an abbreviated version of our
extended abstract, if desired.
Sorry for the long delay, but this work took a little longer than
expected. Nevertheless, we came up with Catena, a new memory-hard
password scrambler based on the bit reversal function. A detailed
description of our scheme is available on eprint
(http://eprint.iacr.org/2013/525).
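For readers unfamiliar with the building block: the bit reversal function permutes indices by mirroring their g-bit binary representation. A minimal sketch (the actual graph construction built on it is in the paper):

```python
def bit_reverse(i: int, g: int) -> int:
    # Mirror the g-bit binary representation of i, e.g. for g = 3:
    # 1 = 0b001 -> 0b100 = 4, and 6 = 0b110 -> 0b011 = 3.
    r = 0
    for _ in range(g):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r
```

The permutation is an involution (applying it twice gives the identity), and it maps nearby indices far apart, which is what makes the resulting graph hard to pebble with little memory.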

We hope you enjoy this work, and we look forward to fruitful
discussions, comments, and criticism (on this mailing list).

Kind regards,
Christian
Jeremi Gosney
2013-09-02 17:05:42 UTC
Post by Christian Forler
Nevertheless, we came up with Catena, a new memory-hard
password scrambler based on the bit reversal function. A detailed
description of our scheme is available on eprint
(http://eprint.iacr.org/2013/525).
Any plans to submit this to the PHC (https://password-hashing.net) ?
Christian Forler
2013-09-02 17:16:22 UTC
Post by Jeremi Gosney
Post by Christian Forler
Nevertheless, we came up with Catena, a new memory-hard
password scrambler based on the bit reversal function. A detailed
description of our scheme is available on eprint
(http://eprint.iacr.org/2013/525).
Any plans to submit this to the PHC (https://password-hashing.net) ?
Yes, we plan to submit Catena, or an improved version of it, to the PHC.


So long,
Christian
CodesInChaos
2013-09-03 18:00:58 UTC
Post by Christian Forler
Nevertheless, we came up with Catena, a new memory-hard
password scrambler based on the bit reversal function. A detailed
description of our scheme is available on eprint
(http://eprint.iacr.org/2013/525).
Doesn't the standard "store every 1/sqrt(n)th value and recompute the rest
using parallel cores" attack using a parallel computer break this?

In phase 1 you output values in the order 1 to n.
In phase 2 you iterate the values in some other, but fixed order.

Now an attacker uses two different machines for those two phases:

Phase 1: run phase 1 normally, but store only every sqrt(n)-th value, in
the natural order of phase 2. This step runs in time n and
needs memory sqrt(n), for a total cost of n^1.5.
Phase 2: iterate in the standard phase 2 order. You can recompute elements
you didn't store in time sqrt(n), but since you can use sqrt(n) parallel
cores and memory access is predictable, those elements are available
instantly when the single sequential iterator needs them. This runs in
time n and needs sqrt(n) space.

Total cost is n^1.5. But that contradicts your security claim (and proof)
that space*time is n^2 on a parallel random access machine, i.e., that the
function is sequentially memory-hard. Why doesn't this attack work?
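The checkpoint-and-recompute part of this attack is easy to demonstrate on a toy sequential chain (a single-core sketch; the parallel version's point is that the up-to-k-1 recomputation hashes per lookup can be done ahead of time on spare cores):

```python
import hashlib

def H(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

def phase1_full(seed: bytes, n: int) -> list:
    # Honest evaluation: store all n chain values (memory n).
    v, x = [], seed
    for _ in range(n):
        v.append(x)
        x = H(x)
    return v

def phase1_checkpoints(seed: bytes, n: int, k: int) -> dict:
    # Attacker: store only every k-th value (k ~ sqrt(n) -> memory sqrt(n)).
    cp, x = {}, seed
    for i in range(n):
        if i % k == 0:
            cp[i] = x
        x = H(x)
    return cp

def lookup(cp: dict, i: int, k: int) -> bytes:
    # Rebuild v[i] from the checkpoint below it: at most k-1 extra hashes.
    base = (i // k) * k
    x = cp[base]
    for _ in range(i - base):
        x = H(x)
    return x
```

With k = sqrt(n) the attacker holds sqrt(n) checkpoints yet can serve any v[i] on demand, which is exactly what lets phase 2 run at full speed with reduced memory.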
Solar Designer
2013-12-24 00:49:58 UTC
Christian (one and/or the other), all -
Post by CodesInChaos
Post by Christian Forler
Nevertheless, we came up with Catena, a new memory-hard
password scrambler based on the bit reversal function. A detailed
description of our scheme is available on eprint
(http://eprint.iacr.org/2013/525).
Doesn't the standard "store every 1/sqrt(n)th value and recompute the rest
using parallel cores" attack using a parallel computer break this?
What's the current understanding on this? I think the attack does work
against Catena, although I only skimmed over the paper. Does this mean
some of the proofs in the paper are flawed?

Not indexing by secret is very nice, but I have yet to see a O(N^2) KDF
that avoids indexing by secret. With typical/expected uses of scrypt
(and bcrypt) the problem happens to be mitigated by the fact that the
salt is normally not known to an attacker unless the hash is also known.
In either case (salt not known or salt+hash are both known), the
side-channel attack is almost irrelevant. It can be used to infer which
user account a password is being tested against, but nothing about the
password and the hash.
Post by CodesInChaos
In phase 1 you output values in the order 1 to n.
In phase 2 you iterate the values in some other, but fixed order.
Phase 1: run phase 1 normally, but store only every sqrt(n)-th value, in
the natural order of phase 2. This step runs in time n and
needs memory sqrt(n), for a total cost of n^1.5.
Phase 2: iterate in the standard phase 2 order. You can recompute elements
you didn't store in time sqrt(n), but since you can use sqrt(n) parallel
cores and memory access is predictable, those elements are available
instantly when the single sequential iterator needs them. This runs in
time n and needs sqrt(n) space.
Total cost is n^1.5. But that contradicts your security claim (and proof)
that space*time is n^2 on a parallel random access machine, i.e., that the
function is sequentially memory-hard. Why doesn't this attack work?
Other assorted feedback on Catena (again, based on having skimmed over
the paper only):

Of the two properties "marketed" as unique to Catena - Server Relief and
Client-Independent Updates - I see most value in a sub-property of the
latter, namely being able to increase not only processing cost, but also
memory cost, without knowing the plaintext password. However, I do not
(yet) fully understand how and whether this sub-property is achieved.
I think reviewing and experimenting with an actual implementation might
help me understand and evaluate this in less time than (re-)reading the
paper for real would. Is an implementation publicly available? (In any
case, it's not long until it must be, for PHC.)

I had very briefly commented on this desirable property here:

http://lists.randombit.net/pipermail/cryptography/2012-November/003451.html

"A much trickier task: support upgrades to a higher memory cost for the
already-computed iterations. Sounds impossible at first? Not quite.
This would probably require initial use of some secret component
(allowing for a lower-memory shortcut) and then dropping it at upgrade
time."

Does Catena drop a previously-available part of the salt+hash for the
memory cost upgrades? Does it achieve the memory cost increase "for the
already-computed iterations", or maybe only for added iterations?
Or is the wording I had used somehow not applicable to Catena?

As to running time upgrades, they're very nice to have, but are
implementable on top of any existing KDF by rolling its output into
another invocation of it. The advantage of having it built-in (and the
disadvantage of not having that in most other KDFs) is only in being
able to say simply that "we use [e.g.] Catena" rather than "we use
[e.g.] scrypt with its output passed back to input n times, as encoded
along with each hash in a custom way in our database".

A disadvantage of having the running time upgrades built-in, and not as
an additional optional outermost loop (like described above in the
around-scrypt example), is that one of the following must be true:

1. The inner workings of the KDF - the part that provides its area-time
cost for attackers - must use crypto strong enough to represent the
cryptographic strength of the entire KDF. This no longer can be some
weaker "non-cryptographic" data mixing (whatever works best in area-time
terms), or we're no longer able to easily and convincingly show that our
KDF as a whole is not less cryptographically secure than e.g.
PBKDF2-HMAC-SHA-*. As a result, we might have to compromise in terms of
the achieved area-time cost (albeit only by a constant factor).

2. The KDF must have some let's say sequence points, at which
data-mixing is interleaved with strong crypto. These will be potential
cost upgrade points. If we're not going to record them along with each
hash (or encrypted filesystem, etc.), separately for each cost upgrade
actually performed, then we need to have them pre-programmed into the
KDF, to occur at regular intervals. This will likely reduce the KDF's
memory cost (the area component in the area-time cost for attacker).

Additionally:

3. At least with straightforward approaches, the memory cost is also
reduced by having to compact the internal state to match the KDF output
size (or less) at the sequence points mentioned above. This may be
avoided in the way I had hinted at in the cryptography list posting I
referenced above (discard portions of secret inputs when upgrading).
Does Catena use this approach? Actually, the paper sounds like it does.
Does this also happen to avoid the unfortunate choice between drawbacks
1 and 2 above, or does one of them apply to Catena (I guess it'd be
number 1 if so)?

As to the proposed Server Relief feature and algorithm, it is also
implementable on top of any other password hashing scheme, by wrapping
it in a fast hash (and this had been suggested on some occasions).
If custom processing or protocol like that can be implemented for a
given deployment, then chances are that the same server could also offer
e.g. SCRAM, which would also protect against replay (and would not make
use of the slow hash having Server Relief as a standard feature). So I
am not sure if there's much value in having this property built into the
hash. OK to have it for free, but implementing it is relevant to the
drawbacks 1 vs. 2 above.

Finally, the paper appears to use the word stretching to refer to both
iterations and throwing salt bits away. IIRC, the Kelsey et al. paper
(reference [14] in the Catena paper) established the term stretching to
refer only to the iterations, and another term - strengthening - to
refer to dropping of salt bits. Also, I previously saw the term pepper
as referring to a possible site-specific local parameter (site salt),
whereas the Catena paper uses it differently. It could be preferable to
stick to the way these terms had been used historically.

I am sorry if I sound too negative. I actually appreciate the effort
very much, and it might influence my own work. It's just that I think
feedback on what might be drawbacks can be most helpful.

I am also sorry for not having read the paper thoroughly yet. I thought
that it could help if I go ahead and comment anyway, instead of delaying
this to possibly after the PHC submission deadline (as I don't expect to
look into this much more closely until we're done with our own submission).

Thanks,

Alexander
Christian Forler
2013-12-24 10:52:46 UTC
Post by Solar Designer
Christian (one or/and the other), all -
Post by CodesInChaos
Post by Christian Forler
Nevertheless, we came up with Catena, a new memory-hard
password scrambler based on the bit reversal function. A detailed
description of our scheme is available on eprint
(http://eprint.iacr.org/2013/525).
Doesn't the standard "store every 1/sqrt(n)th value and recompute the rest
using parallel cores" attack using a parallel computer break this?
What's the current understanding on this? I think the attack does work
against Catena, although I only skimmed over the paper. Does this mean
some of the proofs in the paper are flawed?
Yes, the attack is working, but it does not invalidate any of our
security proofs.

In the meantime we have improved our design. The time-memory trade-off
of our current unpublished version has T = O(G^4/S^3).

This means that with G^1.5 cores you need G^2.5 operations to
compute a hash with G^(1/2) memory. For G=2^20 you need
about one billion (2^30) cores and 2^50 operations to compute it with
2^10 memory.
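These numbers can be sanity-checked against the stated bound T = O(G^4/S^3), dropping constants:

```python
g = 20
G = 1 << g                 # full memory: 2^20 units
S = 1 << (g // 2)          # reduced memory: G^(1/2) = 2^10 units
T = G**4 // S**3           # operations under the trade-off T = G^4 / S^3
cores = 1 << (3 * g // 2)  # G^1.5 parallel cores

assert T == 1 << 50        # 2^50 operations
assert cores == 1 << 30    # about one billion cores
```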
Post by Solar Designer
Not indexing by secret is very nice, but I have yet to see a O(N^2) KDF
that avoids indexing by secret. With typical/expected uses of scrypt
(and bcrypt) the problem happens to be mitigated by the fact that the
salt is normally not known to an attacker unless the hash is also known.
I'm confused. %-/
IMHO the hash and the salt are public knowledge, or at least should be
treated as public knowledge.
Post by Solar Designer
In either case (salt not known or salt+hash are both known), the
side-channel attack is almost irrelevant. It can be used to infer which
user account a password is being tested against, but nothing about the
password and the hash.
We are currently working on a basic proof-of-concept implementation of a
password sieve for scrypt, exploiting its cache-timing
vulnerability. It might look irrelevant to you, but the history of
side-channel attacks has shown that most of them become useful in the
long run. Just saying.


[square-root attack]
Indeed, this attack works. This is why we will soon release an
updated version with T(g) = O(G^4/S(g)^3).
Post by Solar Designer
Other assorted feedback on Catena (again, based on having skimmed over
Of the two properties "marketed" as unique to Catena - Server Relief and
Client-Independent Updates - I see most value in a sub-property of the
latter, namely being able to increase not only processing cost, but also
memory cost, without knowing the plaintext password.
We claim that Catena is the first scheme that has built-in support for
both server relief and client-independent updates.
Post by Solar Designer
However, I do not
(yet) fully understand how and whether this sub-property is achieved.
I think reviewing and experimenting with an actual implementation might
help me understand and evaluate this in less time than (re-)reading the
paper for real would. Is an implementation publicly available? (In any
case, it's not long until it must be, for PHC.)
There is a good chance that we will publish our new version including a
reference implementation in the next couple of days.
Post by Solar Designer
http://lists.randombit.net/pipermail/cryptography/2012-November/003451.html
"A much trickier task: support upgrades to a higher memory cost for the
already-computed iterations. Sounds impossible at first? Not quite.
This would probably require initial use of some secret component
(allowing for a lower-memory shortcut) and then dropping it at upgrade
time."
Sounds like a CI-update. :-)
Post by Solar Designer
Does Catena drop a previously-available part of the salt+hash for the
memory cost upgrades? Does it achieve the memory cost increase "for the
already-computed iterations", or maybe only for added iterations?
Only for the added ones. But to compute the additional iteration you
need about double the effort (memory and time) of
computing all the other iterations together. The cost per
iteration doubles: to compute the i-th round you need O(2^i) memory and
O(2^i) time.
Post by Solar Designer
Or is the wording I had used somehow not applicable to Catena?
For a CI-update from cost=g to cost=g+j you need at least the old hash
h_g = Catena_g(salt, pwd, ...) and O(2^(g+j)) memory plus O(2^(g+j)) time
to compute h_{g+j} = CI-Update(h_g, g, g+j) = Catena_{g+j}(salt,
pwd, ...). Therefore, neither salt nor pwd is required.
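The mechanism can be sketched with a toy cost-doubling construction. This is a hypothetical stand-in, not Catena's actual rounds: `level` simply does 2^i sequential hash calls, so the final level dominates the total cost and an upgrade needs only the old hash:

```python
import hashlib

def level(i: int, x: bytes) -> bytes:
    # Toy stand-in for one garlic level: 2^i sequential hash calls.
    for j in range(1 << i):
        x = hashlib.sha256(i.to_bytes(2, "big") +
                           j.to_bytes(8, "big") + x).digest()
    return x

def toy_catena(pwd: bytes, salt: bytes, g: int) -> bytes:
    # Full evaluation: levels 1..g, each twice as costly as the previous.
    x = hashlib.sha256(pwd + salt).digest()
    for i in range(1, g + 1):
        x = level(i, x)
    return x

def ci_update(h_g: bytes, g: int, g_new: int) -> bytes:
    # Client-independent update: run only the missing levels;
    # neither pwd nor salt is needed.
    x = h_g
    for i in range(g + 1, g_new + 1):
        x = level(i, x)
    return x
```

A server holding only h_g can thus raise the stored cost parameter at will, and a legitimate client recomputing from the password lands on the same value.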
Post by Solar Designer
As to the proposed Server Relief feature and algorithm, it is also
implementable on top of any other password hashing scheme, by wrapping
it in a fast hash (and this had been suggested on some occasions).
Sure, and this is quite cool. :-)
Post by Solar Designer
If custom processing or protocol like that can be implemented for a
given deployment, then chances are that the same server could also offer
e.g. SCRAM, which would also protect against replay (and would not make
use of the slow hash having Server Relief as a standard feature). So I
am not sure if there's much value in having this property built into the
hash. OK to have it for free, but implementing it is relevant to the
drawbacks 1 vs. 2 above.
In Catena it is part of the specification; we mention it explicitly.
There is a gap between what is specified and what is merely possible.
Post by Solar Designer
Finally, the paper appears to use the word stretching to refer to both
iterations and throwing salt bits away. IIRC, the Kelsey et al. paper
(reference [14] in the Catena paper) established the term stretching to
refer only to the iterations, and another term - strengthening - to
refer to dropping of salt bits.
"Key stretching is about adding a specific number of bits to a keysearch
or bruteforce attack of some kind." ([14], Chapter 2)

Iterations is only one way to implement key stretching.
Post by Solar Designer
Also, I previously saw the term pepper as referring to a possible
site-specific local parameter (site salt), whereas the Catena paper
uses it differently. It could be preferable to stick to the way
these terms had been used historically.
"As advertized, our scheme is very simple. We deploy two salts, one
public and one secret. The public salt is exactly the same as the
current one." ([19], Chapter 3)

In our paper, we use "pepper" to denote a secret salt. If you really mind,
we can replace "pepper" with "secret salt".
Post by Solar Designer
I am sorry if I sound too negative.
It is fine by me.
Post by Solar Designer
I actually appreciate the effort very much, and it might influence my
own work.
Cool. Do not forget to cite us. :-)
Post by Solar Designer
It's just that I think feedback on what might be drawbacks can be
most helpful.
Indeed it is.
Post by Solar Designer
I am also sorry for not having read the paper thoroughly yet.
Do not worry. Save your time for the improved version. :-)
Post by Solar Designer
I thought that it could help if I go ahead and comment anyway, instead
of delaying this to possibly after the PHC submission deadline (as I
don't expect to look into this much more closely until we're done
with our own submission).
I wish you success for your submission. May the best scheme win. :-)
We are eager to see all the other designs. We hope that there are tons
of cool ideas.

BTW, sorry for my brief responses. I'm currently sick at home. :-(
Maybe I'll give you an extended version in a couple of days. :)

Best regards,
Christian
Solar Designer
2013-12-26 23:27:59 UTC
Christian,

Thank you for your prompt response! The brevity is OK.
Post by Christian Forler
In the meantime we have improved our design. The time-memory trade-off
of our current unpublished version has T = O(G^4/S^3).
This means that with G^1.5 cores you need G^2.5 operations to
compute a hash with G^(1/2) memory. For G=2^20 you need
about one billion (2^30) cores and 2^50 operations to compute it with
2^10 memory.
This is very nice. Is this something you're able to prove (that no
better TMTO exists)?
Post by Christian Forler
Post by Solar Designer
Not indexing by secret is very nice, but I have yet to see a O(N^2) KDF
that avoids indexing by secret. With typical/expected uses of scrypt
(and bcrypt) the problem happens to be mitigated by the fact that the
salt is normally not known to an attacker unless the hash is also known.
I'm confused. %-/
IMHO the hash and the salt are public knowledge, or at least should be
treated as public knowledge.
In one of the relevant threat models (perhaps the most relevant one),
yes. But not in all of them.

Let's talk about this one:

When the hash and the salt are known, offline attacks on them are
possible. The remaining unknowns are the password and the internal
state at an earlier point in hash computation (before the final hash is
produced). A timing attack may then be useful to determine this
internal state in order to speed up offline password cracking (allowing
for early reject). If we determine a few bits of internal state as of e.g.
1/10th of the computation of the hash of the correct password, we've
made offline password cracking almost 10 times faster (and we've
probably reduced the area-time cost of attack by more than 10x, since
the algorithm's memory usage is lower early on).

So I agree that there's some relevance of this attack. A mitigating
factor (possibly insufficient) is that the attack will have to be
passive - on authentication attempts made with the correct password.
The attacker (usually) can't trigger such authentication attempts at
will (although there are exceptions to that).

Basically, hashing schemes with indexing by secret may be vulnerable to
local timing attacks (from another operating system account or VM on the
same machine) after the hashes have leaked. Such timing attacks may be
used to assist in offline password cracking.

Thinking along these lines, we arrive at a new mitigation idea (I have
not seen it mentioned before[*]): start indexing by secret e.g. half way
through computation of a hash. Then the attack will allow for a 2x
speedup only (speaking in terms of time rather than area-time for now).
In fact, this is already the case in scrypt. :-)

[*] This mitigation idea is similar to the one I had mentioned here,
albeit in different context:
http://www.openwall.com/presentations/ZeroNights2012-New-In-Password-Hashing/mgp00016.html

In area-time terms, unfortunately, some internal state leaking from scrypt
right after SMix's first loop may result in O(N) attack cost, despite
the area-time cost of that loop (if fully implemented) being O(N^1.5).
That's because the attacker wouldn't need to store V_i if the second
loop is bypassed (if we get early-reject after the first few iterations
of the second loop).

So, yes, things are not great, and there's room for improvement here.
Simply introducing a third loop that would read V elements in a
non-secret order, to be run inbetween SMix's existing two loops, would
keep the attack cost at O(N^1.5).
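The proposed third loop can be sketched on a toy SMix-like core (illustrative only; the hash and indexing are stand-ins, not scrypt's actual operations). The inserted pass reads V in a fixed, public order, so timing it reveals nothing password-dependent, yet an attacker who hopes to early-reject before the secret loop still has to store V for it:

```python
import hashlib

def H(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()

def smix_with_public_pass(block: bytes, n: int) -> bytes:
    v, x = [], block
    for _ in range(n):            # loop 1: fill V sequentially
        v.append(x)
        x = H(x)
    for i in range(n):            # inserted loop: fixed, non-secret read
        x = H(x + v[n - 1 - i])   # order still forces V to be kept around
    for _ in range(n):            # loop 2: original secret-dependent indexing
        j = int.from_bytes(x[:4], "little") % n
        x = H(x + v[j])
    return x
```

Reading V in reverse order is one simple choice of public order; any fixed order that defeats cheap recomputation of the needed elements would serve the same purpose.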

If we couldn't do better than O(N^1.5) without indexing by secret
anyway, this would possibly be the best solution. But it sounds like
you managed to do better.
Post by Christian Forler
Post by Solar Designer
In either case (salt not known or salt+hash are both known), the
side-channel attack is almost irrelevant. It can be used to infer which
user account a password is being tested against, but nothing about the
password and the hash.
We are currently working on a basic proof-of-concept implementation of a
password sieve for scrypt, exploiting its cache-timing
vulnerability. It might look irrelevant to you, but the history of
side-channel attacks has shown that most of them become useful in the
long run. Just saying.
I admit I had made an overly bold statement above, thinking that
realistically someone with a copy of the hashes would work on them
solely offline (like it's been happening so far, despite e.g. bcrypt
theoretically allowing for similar timing attacks) and not bother
passively timing authentication attempts on the server (and possibly
having too few per account for the attack duration). I would be
surprised to see someone mount that attack in the wild - not because it
is impossible (it can be demo'ed, and it's great that you're going to do
just that) - but because it is probably usually impractical, because of
a combination of reasons and constraints.

If we can deal with the attack without compromising on other properties
of the hashing scheme, this is definitely worth considering. It sounds
like you may have achieved just that. :-)

For example, I oppose secret-dependent branching for that reason
(including e.g. in a discussion around SunMD5 years ago) - we can
achieve our desirable properties without that, so let's not take the
risk of having this extra side-channel.

However, if we have to compromise on real-world relevant security
properties in order to mitigate an attack that is unlikely to ever be
used in the wild and with actual benefit to the attacker (or loss for
the defender), then I am not sure this is worth it. I wouldn't
currently trade O(N^2) with indexing by secret for O(N^1.5) without, at
least not unconditionally (supporting this as an option may be OK).
Post by Christian Forler
We claim that Catena is the first scheme that has built-in support for
both server relief and client-independent updates.
This may well be true.

A possible improvement upon your server relief: allow for recovery from
a compromise of a sniffed hash by computing 2 (and so on) final hash
iterations on the server, and accordingly fewer on the client (the server
admin would bump this number when needed). This is essentially S/Key.
Unfortunately, this reduces the memory-hard portion.
Post by Christian Forler
Post by Solar Designer
Does Catena drop a previously-available part of the salt+hash for the
memory cost upgrades? Does it achieve the memory cost increase "for the
already-computed iterations", or maybe only for added iterations?
Only for the added ones. But to compute the additional iteration you
need about double the effort (memory and time) of
computing all the other iterations together. The cost per
iteration doubles: to compute the i-th round you need O(2^i) memory and
O(2^i) time.
Great, but this is implementable on top of e.g. scrypt, with similar
properties, except that the original password and salt are required due
to how scrypt is defined at high level (the PBKDF2 steps). Right?
Post by Christian Forler
There is a gap between specified and possible.
I agree.
Post by Christian Forler
Post by Solar Designer
Finally, the paper appears to use the word stretching to refer to both
iterations and throwing salt bits away. IIRC, the Kelsey et al. paper
(reference [14] in the Catena paper) established the term stretching to
refer only to the iterations, and another term - strengthening - to
refer to dropping of salt bits.
"Key stretching is about adding a specific number of bits to a keysearch
or bruteforce attack of some kind." ([14], Chapter 2)
Iterations is only one way to implement key stretching.
OK, I may have recalled incorrectly (I did not re-check now). I was
under the impression that after that paper some folks started using the
terms stretching vs. strengthening to refer to these different approaches
(and stretching won).
Post by Christian Forler
Post by Solar Designer
Also, I previously saw the term pepper as referring to a possible
site-specific local parameter (site salt), whereas the Catena paper
uses it differently. It could be preferable to stick to the way
these terms had been used historically.
"As advertized, our scheme is very simple. We deploy two salts, one
public and one secret. The public salt is exactly the same as the
current one." ([19], Chapter 3)
In our paper, we use "pepper" to denote a secret salt. If you really
mind, we can replace "pepper" with "secret salt".
No, "pepper" is fine. Your use is not even all that different.

I use the term "local parameter" for the site-specific salt, so there's
no confusion here.
Post by Christian Forler
We are eager to see all the other designs. We hope that there are tons
of cool ideas.
Yeah. I am even wondering if we should maybe propose a change to the
PHC timeline, introducing an extra round of submissions so that there's
a chance for all to reuse ideas from others' submissions (with due
credit). "Tweaks" are going to be allowed, but in some cases this could
be more than mere tweaks.
Post by Christian Forler
I'm currently sick at home. :-(
Get well soon!

Thanks again,

Alexander
Christian Forler
2013-12-31 14:38:04 UTC
Permalink
Post by Solar Designer
Post by Christian Forler
In the meantime we have improved our design. The time-memory trade-off
of our current unpublished version has T = O(G^4/S^3).
This means that with G^1.5 cores you need G^2.5 operations to
compute a hash with G^{1/2} memory. For G=2^20 you need
about one billion (2^30) cores and 2^50 operations to compute it with
2^10 memory.
This is very nice. Is this something you're able to prove (that no
better TMTO exists)?
Yes, we have a pebble-game proof. BTW, our new design is based on two
security parameters: 1) the garlic value g, which reflects the
memory consumption of the algorithm (i.e., about G=2^g units of memory
are needed to compute Catena with a minimal amount of operations), and
2) the depth d, which influences both the total amount of operations
and the TMTO.

For a fixed depth d, we claim that our new password scrambler Catena-d
(e.g., Catena-1 is specified in the current eprint version) has a TMTO of
about T = O( G^{d+1} / S^d ). This conjecture is already proven for
d={1,2,3}. It is quite easy to extend the proof to d=4, but
unfortunately we have no generic proof (for arbitrary values of d) yet;
this is still an open research problem.
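Plugging numbers into the claimed trade-off T = O(G^{d+1}/S^d) reproduces the figures quoted earlier in the thread (constants ignored; helper name mine):

```python
def tmto_time(G: int, S: int, d: int) -> int:
    """Claimed time to compute Catena-d with S instead of G units of
    memory: T = G**(d+1) / S**d (constants ignored)."""
    return G ** (d + 1) // S ** d

G = 2 ** 20
# Full memory: minimal time, T = G.
assert tmto_time(G, G, 3) == 2 ** 20
# Squeeze memory down to S = G^(1/2) at depth 3: T = G^2.5 = 2^50.
assert tmto_time(G, 2 ** 10, 3) == 2 ** 50
# Every halving of S multiplies T by 2^d, i.e. by 8 at depth 3.
assert tmto_time(G, G // 2, 3) == 8 * tmto_time(G, G, 3)
```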
Post by Solar Designer
Post by Christian Forler
I'm confused. %-/
IMHO the hash and the salt are public knowledge, or at least should be
treated as public knowledge.
In one of the relevant threat models (perhaps the most relevant one),
yes. But not in all of them.
[...]


OK, but it is sufficient to show that a password hashing scheme X is
secure against adversaries knowing the salt, since then X is also
(implicitly) secure against adversaries that do not know the salt.

Reason: assume we have an adversary A which does not know the salt but
can efficiently reconstruct the password from a hash. Then it is quite
easy to construct a successful adversary B which does know the salt.
B's strategy: 1) invoke A with the hash, 2) return the
result of A.

Therefore, we should focus on the class of the mightiest adversaries
(i.e., off-line adversaries that know everything besides the password).
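The reduction is literally a two-line wrapper; in code (types and names are mine, purely illustrative):

```python
from typing import Callable, Optional

# A salt-ignorant adversary maps a hash to a recovered password (or None).
SaltIgnorant = Callable[[bytes], Optional[bytes]]
# A salt-knowing adversary also receives the salt.
SaltKnowing = Callable[[bytes, bytes], Optional[bytes]]

def make_B(A: SaltIgnorant) -> SaltKnowing:
    """Turn a salt-ignorant adversary A into a salt-knowing adversary B
    that succeeds whenever A does: B simply ignores its extra knowledge.
    Hence security against salt-knowing adversaries implies security
    against salt-ignorant ones."""
    def B(pwd_hash: bytes, salt: bytes) -> Optional[bytes]:
        return A(pwd_hash)  # 1) invoke A with the hash, 2) return A's result
    return B
```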
Post by Solar Designer
Post by Christian Forler
Post by Solar Designer
In either case (salt not known or salt+hash are both known), the
side-channel attack is almost irrelevant. It can be used to infer which
user account a password is being tested against, but nothing about the
password and the hash.
We are currently working on a basic proof-of-concept implementation of
a password sieve for scrypt, exploiting its cache-timing
vulnerability. To you this attack might look irrelevant, but the
history of side-channel attacks has shown that many of them become
useful in the long run. Just saying.
I admit I had made an overly bold statement above, thinking that
realistically someone with a copy of the hashes would work on them
solely offline (like it's been happening so far, despite e.g. bcrypt
theoretically allowing for similar timing attacks) and not bother
passively timing authentication attempts on the server (and possibly
having too few of them per account for the attack duration). I would be
surprised to see someone mount that attack in the wild - not because it
is impossible (it can be demo'ed, and it's great that you're going to do
just that) - but because it is probably usually impractical, due to
a combination of reasons and constraints.
Sure, there is a very good chance that this attack will never be
mounted in the wild. But this is true for the vast majority of published
cryptanalytic attacks. :-)

It is all about the properties that a next-generation password scrambler
should (or should not) have, and the PHC is the best stage to discuss
this. Actually, one of the major design goals for Catena was to build a
scheme where the memory-access pattern does not depend on the password.
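The distinction can be illustrated with two toy mixing loops (neither is the real scrypt or Catena round function): in the first, the index read from memory is derived from secret state, which cache timing can leak; in the second, indices follow a fixed bit-reversal permutation of the loop counter, identical for every password.

```python
import hashlib

def H(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def mix_secret_dependent(v: list, x: bytes, rounds: int) -> bytes:
    """scrypt-like: which word of v is read depends on the secret state x,
    so the memory-access trace varies with the password."""
    for _ in range(rounds):
        j = int.from_bytes(x[:4], "little") % len(v)  # secret-dependent index
        x = H(x + v[j])
    return x

def bit_reverse(i: int, bits: int) -> int:
    return int(format(i, f"0{bits}b")[::-1], 2)

def mix_fixed_pattern(v: list, x: bytes, bits: int) -> bytes:
    """Catena-like: indices depend only on the public loop counter, so
    the access trace is the same regardless of the password."""
    for i in range(1 << bits):
        x = H(x + v[bit_reverse(i, bits)])  # public, password-independent
    return x
```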
Post by Solar Designer
If we can deal with the attack without compromising on other properties
of the hashing scheme, this is definitely worth considering. It sounds
like you may have achieved just that. :-)
We have compromised on the sequential-memory-hardness property. I think
it cannot be achieved with a static memory-access pattern.
Post by Solar Designer
For example, I oppose secret-dependent branching for that reason
(including e.g. in a discussion around SunMD5 years ago) - we can
achieve our desirable properties without that, so let's not take the
risk of having this extra side-channel.
However, if we have to compromise on real-world relevant security
properties in order to mitigate an attack that is unlikely to ever be
used in the wild and with actual benefit to the attacker (or loss for
the defender), then I am not sure this is worth it. I wouldn't
currently trade O(N^2) with indexing by secret for O(N^1.5) without, at
least not unconditionally (supporting this as an option may be OK).
I agree. Actually, this is the reason why we upgraded our old design.
Post by Solar Designer
Post by Christian Forler
We claim that Catena is the first scheme that has built-in support for
both server relief and client-independent updates.
This may well be true.
A possible improvement upon your server relief: allow for recovery from
a compromise of a sniffed hash by computing two (then three, and so on)
final hash iterations on the server, and accordingly fewer on the client
(the server admin would bump this number when needed). This is
essentially S/Key. Unfortunately, this reduces the memory-hard portion.
Sure, you can do this; it is a kind of generalization. But I doubt that
our PHC submission will support this behavior.
Post by Solar Designer
Post by Christian Forler
Post by Solar Designer
Does Catena drop a previously-available part of the salt+hash for the
memory cost upgrades? Does it achieve the memory cost increase "for the
already-computed iterations", or maybe only for added iterations?
Only for the added ones. But to compute the additional iteration you
need about twice the effort (memory and time) as for computing all
the other iterations together. The cost per iteration doubles: to
compute the i-th round you need O(2^i) memory and O(2^i) time.
Great, but this is implementable on top of e.g. scrypt, with similar
properties, except that the original password and salt are required due
to how scrypt is defined at high level (the PBKDF2 steps). Right?
I'm not quite sure, sorry. We were more focused on "tweaking" the
awesome ROMix algorithm with regard to our design goals than on
improving scrypt.


[...]
Post by Solar Designer
Post by Christian Forler
In our paper, we use "pepper" to denote a secret salt. If you really
mind, we can replace "pepper" with "secret salt".
No, "pepper" is fine. Your use is not even all that different.
I use the term "local parameter" for the site-specific salt, so there's
no confusion here.
OK. Fine.
Post by Solar Designer
Post by Christian Forler
We are eager to see all the other designs. We hope that there are tons
of cool ideas.
Yeah. I am even wondering if we should maybe propose a change to the
PHC timeline, introducing an extra round of submissions so that there's
a chance for all to reuse ideas from others' submissions (with due
credit). "Tweaks" are going to be allowed, but in some cases this could
be more than mere tweaks.
Yes, you should propose introducing an extra round to the PHC, since
the timeline is quite tight. Besides, I really do like the idea of an
additional round.

Best regards,
Christian
Stephen Touset
2013-12-29 20:38:01 UTC
Permalink
Post by Christian Forler
Post by Solar Designer
Christian (one or/and the other), all -
Post by CodesInChaos
Post by Christian Forler
Nevertheless, we came up with Catena, a new memory-hard
password scrambler based on the bit reversal function. A detailed
description of our scheme is available on eprint
(http://eprint.iacr.org/2013/525).
Doesn't the standard "store every 1/sqrt(n)th value and recompute the rest
using parallel cores" attack using a parallel computer break this?
What's the current understanding on this? I think the attack does work
against Catena, although I only skimmed over the paper. Does this mean
some of the proofs in the paper are flawed?
Yes, the attack is working, but it does not invalidate any of our
security proofs.
I believe this is because your security proofs only prove that Catena is a memory-hard algorithm and *not* a sequentially memory-hard function. Your proof (at least as of the time I read it) involved a pebble game, but not with a ruleset extended to reflect multiple cores.

Correct?


Stephen Touset
***@touset.org
Christian Forler
2013-12-31 14:58:48 UTC
Permalink
Post by Stephen Touset
Post by Christian Forler
Post by Solar Designer
Post by CodesInChaos
Doesn't the standard "store every 1/sqrt(n)th value and
recompute the rest using parallel cores" attack using a
parallel computer break this?
What's the current understanding on this? I think the attack
does work against Catena, although I only skimmed over the paper.
Does this mean some of the proofs in the paper are flawed?
Yes, the attack is working, but it does not invalidate any of our
security proofs.
I believe this is because your security proofs only prove that Catena
is a memory-hard algorithm and *not* a sequentially memory-hard
function. Your proof (at least as of the time I read it) involved a
pebble game, but not with a ruleset extended to reflect multiple
cores.
Correct?
Yes, Catena is *NOT* sequentially memory-hard.

It is quite easy to extend the results of our pebble-game proof to
reflect multiple cores. Let c be the number of cores; then the
multi-core TMTO using c cores is given by the following equation:

(1/c S^a) * (c T) = N^b.

For example, considering the current eprint version of Catena, we have
a=1, b=2, and therefore (1/c * S) (cT) = N^2.

For an adversary with c=N^1/2 cores and S=N^1/2 pebbles we have
T = N^1.5. This TMTO is quite good for the adversary. Therefore, we
redesigned Catena. The PHC submission will have a much worse TMTO.
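As a sanity check on the arithmetic (helper name mine): in the stated equation the factors of c cancel, so with a=1, b=2 the time is T = N^2/S regardless of the core count.

```python
def multicore_time(N: int, S: int, c: int, a: int = 1, b: int = 2) -> int:
    """Solve (S**a / c) * (c * T) = N**b for T: the c's cancel, so adding
    cores does not reduce the time-memory product here."""
    return N ** b // S ** a

N = 2 ** 20
S = c = 2 ** 10  # S = c = N^(1/2), as in the example above
T = multicore_time(N, S, c)
assert T == 2 ** 30                  # T = N^1.5
assert (S // c) * (c * T) == N ** 2  # the equation holds
```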

Best regards,
Christian

CodesInChaos
2013-12-30 02:08:13 UTC
Permalink
The attack as I originally envisioned it doesn't work. Value
recalculation produces a tree which can be recomputed on the fly, but
that needs n, not sqrt(n), cores, making it useless. I'm leaning
towards considering Catena secure, but I will need to think more about it.

Some other notes:

* It's important to take amortization into account. An attacker who
computes multiple hashes in parallel (with the same or different salts)
can drop the average cost for some schemes. I don't think Catena is one
of them, but the security definition doesn't seem to consider this. A
pathological example would be sha2(scrypt(salt) | pass)

* Constants matter. Choose software-efficient primitives. The Catena
update scheme reduces performance and security as well. Perhaps one
should specify the levels to compute as a list of indices, so one can
skip some. Or hardcode a factor of 4 instead of 2.
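The pathological scheme in the first note can be demonstrated directly. Here `expensive` is a deliberately slow stand-in for scrypt; because it depends only on the salt, an attacker testing many candidate passwords pays its cost exactly once:

```python
import hashlib
from typing import Optional

def expensive(data: bytes) -> bytes:
    """Slow stand-in for scrypt(salt); note it never sees the password."""
    x = data
    for _ in range(100_000):
        x = hashlib.sha256(x).digest()
    return x

def bad_scheme(salt: bytes, password: bytes) -> bytes:
    # sha2(scrypt(salt) | pass): the costly part ignores the password.
    return hashlib.sha256(expensive(salt) + password).digest()

def crack(salt: bytes, target: bytes, guesses: list) -> Optional[bytes]:
    pre = expensive(salt)  # paid once, amortized over every guess
    for pw in guesses:
        if hashlib.sha256(pre + pw).digest() == target:
            return pw      # each additional guess costs one cheap hash
    return None

target = bad_scheme(b"salt", b"hunter2")
assert crack(b"salt", target, [b"123456", b"letmein", b"hunter2"]) == b"hunter2"
```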
Stephen Touset
2013-12-30 04:28:28 UTC
Permalink
Post by CodesInChaos
The attack as I originally envisioned it doesn't work. Value
recalculation produces a tree which can be recomputed on the fly, but
that needs n not sqrt(n) cores, making it useless. I'm leaning towards
considering catena secure, but will need to think more about it.
Does this mean it actually satisfies the requirements of a sequentially memory-hard function, then? I’d be extremely interested in seeing the reasoning you used to come to the conclusion that it’s *not* susceptible to a standard TMTO.


Stephen Touset
***@touset.org