This book was distributed at the FIDO Authenticate conference in 2024. Its intended format was as a PDF, which you can find here.
The following is the contents of the PDF converted to HTML.
2024-12-23
Passwords are rubbish.
If you’re reading this book then hopefully you’re already on board with this idea, but let’s recap anyway.
The typical practice with passwords is to remember a few different ones and re-use them widely. (Password managers support generating random passwords, but most people don’t use them.) Sites must store hashes of these passwords to recognize them, but most passwords have too little entropy to resist brute-forcing when the hashes leak. (The website haveibeenpwned.com now has records of about 13.5 billion accounts that have been found in account database leaks from nearly 800 websites.)
When a password database leaks, not only can any successfully cracked passwords be used immediately to sign in to that site but, because of password re-use, those users’ accounts on other sites may also be compromised.
Next, because users typically remember their passwords, they can be tricked into entering them on lookalike websites. These “phishing” attacks are common, effective, and can have global implications when used to interfere in elections.
Lastly, passwords can leak from many other parts of the software stack. Facebook inadvertently logged hundreds of millions of passwords over many years, and Javascript-injection attacks can exfiltrate anything entered on a site, including passwords.
This book is about using public key signature schemes to try and build a better system of authentication. These schemes have names like ECDSA, RSA, and ML-DSA. They vary in how large their outputs are, how fast they operate, and whether they’re resistant to (still theoretical) large quantum computers. In this book we’ll consider them only in the abstract—the construction of public key signature schemes is a deep and fascinating topic, but we will cover none of it here.
In the abstract, a public key signature scheme provides three operations:
- A generate operation that takes some random bits and returns two byte strings that we will call a public key and a private key.
- A sign operation that takes a private key and an arbitrary byte string (called the “message”), and produces another byte string that we will call a “signature”.
- A verify operation that takes a public key, a message, and a claimed signature, and returns whether that signature was generated by the sign operation with the corresponding private key.
The names of these operations have obviously been chosen to suggest analogs in the real world, but that is a double-edged sword. These are not handwritten marks on a piece of paper; they are only defined by the operations above. For example, there can be multiple valid “signature” values by a public key for a given message. But while these subtleties are important in some contexts, like cryptocurrencies, they aren’t for us.
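To make the three operations concrete, here is a minimal sketch using Node.js's built-in crypto module with ECDSA over P-256. The function names simply mirror the abstract operations; the choice of curve and hash is an assumption for illustration, not something the design depends on.

```javascript
const crypto = require("node:crypto");

// generate: returns a fresh public key and private key.
function generate() {
  return crypto.generateKeyPairSync("ec", { namedCurve: "P-256" });
}

// sign: takes a private key and a message, returns a signature.
function sign(privateKey, message) {
  return crypto.sign("sha256", Buffer.from(message), privateKey);
}

// verify: takes a public key, a message, and a claimed signature.
function verify(publicKey, message, signature) {
  return crypto.verify("sha256", Buffer.from(message), publicKey, signature);
}

const { publicKey, privateKey } = generate();
const signature = sign(privateKey, "hello");
console.log(verify(publicKey, "hello", signature));   // true
console.log(verify(publicKey, "goodbye", signature)); // false
```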
To be useful, a public key signature scheme must have properties like the following:
Those properties are vaguely defined. (Indeed, the second implies the first.) Readers are welcome to research the formal definition of security (“EUF-CMA”) to better understand the topic, but a rough understanding is all you need in this context.
Let’s sketch out a toy example of an authentication scheme using public key signatures.
In this sketch, rather than registering with a website by creating a
username and password, people will create a username and then their
computer will run the generate
operation, record the
private key, and submit the public key and username to the site.
In order to sign in, people will enter their username. Then their
computer will run the sign
operation with their private key
on the message “let me in”. It’ll send the username and signature to the
website. Lastly, the website uses the public key that it recorded when
the user signed up, and runs the verify
operation on that,
the message “let me in”, and the submitted signature. If the signature
is valid then the user is signed in.
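The toy scheme can be sketched in a few lines. This is only an illustration: the in-memory serverAccounts map and the function names are hypothetical stand-ins for a real site and a real client.

```javascript
const crypto = require("node:crypto");

const MESSAGE = Buffer.from("let me in");
const serverAccounts = new Map(); // username -> public key (hypothetical store)

// Registration: the user's computer keeps the private key and
// submits only the username and public key to the site.
function register(username) {
  const { publicKey, privateKey } =
    crypto.generateKeyPairSync("ec", { namedCurve: "P-256" });
  serverAccounts.set(username, publicKey);
  return privateKey;
}

// Sign-in, on the user's computer: sign the fixed message.
function clientSignIn(privateKey) {
  return crypto.sign("sha256", MESSAGE, privateKey);
}

// Sign-in, on the site: verify with the public key recorded at sign-up.
function serverSignIn(username, signature) {
  const publicKey = serverAccounts.get(username);
  if (!publicKey) return false;
  return crypto.verify("sha256", MESSAGE, publicKey, signature);
}

const alicesKey = register("alice");
console.log(serverSignIn("alice", clientSignIn(alicesKey))); // true
```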
We have immediately solved the problem of database leaks because now websites only need to record public keys, not password hashes. A public key can’t be used to generate a signature, only validate one. So, unlike passwords, when they leak, they can’t be used to sign in to that site (or any other).
But we haven’t solved the phishing problem because people might submit a signature value to the wrong site. Also, if an attacker learns a single signature value, they can use it to sign in as the user. Lastly, we breezed over where private keys are stored and how many there are.
Don’t worry, we’ll solve all of those issues with more realistic designs. Firstly, let’s deal with phishing.
Phishing involves an attacker proxying a user’s sign-in information. The victim mistakenly signs in to a fake website, and then that information can be reused with the real website. The toy scheme above was vulnerable to phishing because the message, “let me in”, was the same for all sites and so a signature was valid everywhere.
So our first design tweak is to change the signed message to name the intended destination. Let’s start using JSON for the message too.
Now, when a user signs in, their computer runs the sign operation with the private key. But rather than using the message “let me in”, we’ll make it {"origin": "https://example.com"}.
The site needs to run the verify operation on the resulting signature, and the verify operation requires the message as an input, so we’ll also have the user’s computer send the message along with the signature and username.
Now consider what happens when the user clicks a malicious link and tries to sign in to exampl3.com (an evil phishing website). The user’s computer will sign the message {"origin": "https://exampl3.com"}. When the phishing site proxies the sign-in details to the real website, the signature will be valid, but the real website can notice that the user was attempting to sign in to a different website and reject it. (Computers, unlike humans, can reliably notice single-character differences in URLs.)
The phishing site can’t change the message because the real site will reject the signature when verify checks it, and they can’t update the signature because they don’t have the private key.
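Here is a rough sketch of that argument in code. The function names are hypothetical; the important assumption is that the origin is filled in by the user's computer, based on the site it is really talking to, and not by the page.

```javascript
const crypto = require("node:crypto");

// User's computer: sign a message naming the site actually being visited.
function clientSign(privateKey, origin) {
  const message = Buffer.from(JSON.stringify({ origin }));
  return { message, signature: crypto.sign("sha256", message, privateKey) };
}

// Real site: the signature must be valid AND the message must name this site.
function serverVerify(publicKey, expectedOrigin, { message, signature }) {
  if (!crypto.verify("sha256", message, publicKey, signature)) return false;
  return JSON.parse(message.toString("utf8")).origin === expectedOrigin;
}

const { publicKey, privateKey } =
  crypto.generateKeyPairSync("ec", { namedCurve: "P-256" });
const REAL = "https://example.com";

const genuine = clientSign(privateKey, REAL);
const phished = clientSign(privateKey, "https://exampl3.com");
// The phishing site swaps in the "right" message but cannot update the signature.
const tampered = { message: genuine.message, signature: phished.signature };

console.log(serverVerify(publicKey, REAL, genuine));  // true
console.log(serverVerify(publicKey, REAL, phished));  // false: wrong origin
console.log(serverVerify(publicKey, REAL, tampered)); // false: bad signature
```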
So phishing has been solved with a small tweak, but we’re still left with the problem that if a signature value for the real site leaks then that can be used to sign in as a user. While, unlike password hashes, signature values don’t need to be stored, we’ve seen that similar values can be exfiltrated via Javascript-injection and inadvertent logging.
To solve phishing, we made the message specific to a given site. To solve this problem, we’ll make the message specific to an authentication attempt.
So our next tweak to the design will be that, when a user tries to sign in, the site will send a large random challenge to the user’s computer, to be included in the signed message. Each time a user attempts to sign in, the random challenge will be different (with extremely high probability).
So a signed message will now look like this:
{"origin": "https://example.com", "challenge": "8065afbaa4faee78123ad2061bc78df3"}
Now if a signature gets logged or exfiltrated by malicious JavaScript, it quickly becomes useless. A signature is only valid for a specific message, but the challenge (and thus the message) will be different in the future.
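A sketch of the challenge mechanism follows. The outstanding set is a hypothetical in-memory store of challenges that have been issued but not yet used; a real site would tie each challenge to a session and expire it.

```javascript
const crypto = require("node:crypto");

const ORIGIN = "https://example.com";
const outstanding = new Set(); // issued, not-yet-used challenges (hypothetical store)

// Site: issue a large random challenge for this sign-in attempt.
function issueChallenge() {
  const challenge = crypto.randomBytes(16).toString("hex");
  outstanding.add(challenge);
  return challenge;
}

// User's computer: sign a message naming both the site and the challenge.
function clientSign(privateKey, origin, challenge) {
  const message = Buffer.from(JSON.stringify({ origin, challenge }));
  return { message, signature: crypto.sign("sha256", message, privateKey) };
}

// Site: accept each challenge at most once.
function serverVerify(publicKey, { message, signature }) {
  if (!crypto.verify("sha256", message, publicKey, signature)) return false;
  const { origin, challenge } = JSON.parse(message.toString("utf8"));
  if (origin !== ORIGIN) return false;
  return outstanding.delete(challenge); // true only if it was outstanding
}

const { publicKey, privateKey } =
  crypto.generateKeyPairSync("ec", { namedCurve: "P-256" });
const attempt = clientSign(privateKey, ORIGIN, issueChallenge());
console.log(serverVerify(publicKey, attempt)); // true
console.log(serverVerify(publicKey, attempt)); // false: a replayed signature
```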
We still need to consider how many keys a person has and where the private keys are stored.
A simple answer would be that each person has a single public key and uses it across all sites and apps, but the obvious problem with that is that it is a unique tracking value for that person, and people don’t want to be linked across all their sites and apps.
For now, we will say that each website or app gets its own set of keys. Things are more complicated than this in practice, but we will cover these complexities in future chapters.
Unlike a password, the value of a private key never needs to be sent anywhere to be used. So for maximum security, we’ll start by generating and keeping private keys in dedicated hardware, usually connected via USB. This hardware can be designed to be resistant even to a degree of physical attack. Later in the book we’ll see how this can be relaxed so that these authentication schemes can be usable in a consumer context.
The next chapter will dive into the concrete and cover the nitty-gritty of the first implementation of this design, but we should keep in mind the limits of any authentication system:
In a digital context, people are always acting through their computer. While we talk about authenticating a user, the thing that directly gains authority as a result of authenticating is that user’s computer. So if that computer is controlled by an attacker, the authentication system is moot. Tackling the authentication problem does not solve all security issues, but many security issues are authentication problems, so better authentication systems are a necessary part of fixing the world.
WebAuthn, the subject of this book, is such a system.
The first embodiment of the broad design that we sketched out in the previous chapter was the Universal Second Factor (U2F) system from the FIDO Alliance. The FIDO Alliance is a consortium of companies, all of whom care about the problems of online authentication, and they developed a pair of standards to try and solve phishing by adding a public-key authentication system as a second factor to sign-ins. (I.e. in addition to a password.)
The first of these standards, called CTAP1, defines a protocol between computers and dedicated devices called security keys that perform the generate and sign operations. The second defines a Javascript API so that websites can make use of them.
At this point, the U2F Javascript API is thoroughly obsolete and is not worth covering even for historical reasons. But millions of U2F security keys were produced over the years and, while security keys now use the more modern CTAP2 protocol, CTAP1 is very simple, is still supported, and is worth understanding because it contains the core elements of everything that follows. So we’ll discuss CTAP1 in some detail in this chapter.
CTAP1 only includes two commands: one that implements the generate operation and another that implements sign. To cause a CTAP1 security key to run generate, a computer sends a command consisting of the following bytes, concatenated together: (There’s quite a long list of fields here, but don’t worry. Each will be explained.)
(This message format comes from a smart card format called an application protocol data unit or APDU. This chapter will include the needed details without further reference to APDUs.)
The response consists of the following, concatenated together:
The number of values going back and forth should be somewhat
surprising in light of the previous chapter. There, the
generate
operation was defined as requiring random bits as
the sole input, and as producing a private key (which we expect the
security key to store) plus a public key (which it should report to the
computer). In U2F, the random bits come from within the security key so
there are two unexpected inputs (“client data” and “relying party ID”)
and three unexpected outputs (the credential ID, a certificate, and a
signature).
An ID is the most obvious addition: we need some way to refer to the generated keys. Credential IDs are generated by security keys for that purpose and can be assumed to be globally unique because they must be at least 16 bytes long and contain at least 100 bits of entropy. Given that the ID length in the response is only eight bits, a U2F credential ID is, at most, 255 bytes long.
We have already mentioned that we want to avoid reusing public keys
too widely for privacy reasons. Thus, to stop sites from sharing public
keys and tracking users with them, U2F allows an arbitrary value called
the relying party ID (RP ID) to be associated with each public key. (A
relying party is any entity that does authentication, i.e. a website in
the examples that we’ve been using.) The same relying party ID must be
sent whenever the sign
operation is invoked, therefore
public keys cannot be used outside the context in which they were
intended.
Typically, an RP ID is a domain name, like example.com. So even if anothersite.com knows the ID for an example.com credential, it can’t use it because the browser will specify that the RP ID is anothersite.com at signing time, and so the security key will reject the request. (See chapter 5 for more about RP IDs.)
The security key checks the RP ID, and not the browser, because security keys are assumed to move with the user, between different computers. So a browser on a specific computer or phone might not have been involved in creating the credential that is now being used.
The term “user presence” appears in the description of the request flags. This refers to the idea that an operation should only be performed when a human physically touches the security key.
Most security keys will have a capacitive sensor: a metal band or disc that can recognize a touch by the change in capacitance it causes. Requiring user presence (a physical touch) for each operation stops any malware on the computer from making requests to the security key in the background.
Since security keys typically don’t have screens, when users touch them they don’t know what operation is being requested and have to trust that the computer is requesting a legitimate operation. So the benefit of requiring user presence is modest. But the principle of requiring a user interaction for every operation has become a core part of the WebAuthn ecosystem that developed from U2F, and thus of everything covered in this book.
U2F splits the concept into “user presence required” and “user presence consumed”. The former requires that the security key have been recently touched and the latter resets that flag. But this split didn’t survive into later versions of the standards which instead specify when the flag is reset rather than deferring to the request.
To explain the remaining values, we’ll have to briefly cover the topic of attestation.
Suppose you’re a company worried about the security of your employees signing in to your corporate systems. In that case, you may want the private keys that they’re using to do so to be stored in a specific type of security key. (Probably the type of security key you issued them for this purpose.) You may even want to ensure that the security key they use is the specific one that was inventory-tracked from the factory and assigned to them.
Attestation is designed to solve these problems.
The concept is that the security key has a private key installed in
the factory. This private key is not used for signing in, but rather to
prove that future generate
operations were performed in a
known model of security key, or in a specific security key.
The “client data” hash, and the certificate and signature outputs, are all part of this process. They are covered in detail in chapter 7, but most uses of security keys don’t deal with attestation, and so we’ll ignore these fields for now.
generate operation
The easiest way to trigger a generate operation is in a web browser. We’ll use the following snippet of JavaScript to ask the browser to generate a public and private key, and we’ll have the browser do that with a U2F security key so that we can inspect the request and the response. (This will be our first example of using the WebAuthn Javascript API, which will be covered extensively in chapter 4. We could also have used the APIs on Android or iOS, which are covered in chapter 11.)
navigator.credentials.create({
  publicKey: {
    // Unused in this example.
    challenge: new Uint8Array([0]).buffer,
    // Boilerplate required values.
    pubKeyCredParams: [{
      type: "public-key",
      alg: -7,
    }],
    // The relying party ID.
    rp: {name: window.location.host},
    // Required values that are inapplicable in U2F.
    user: {
      id: new Uint8Array(1),
      name: "user",
      displayName: "",
    },
  },
}).then(console.log, console.log);
Here is the message sent by the browser, to the U2F security key, broken down into the same parts as listed above.
And here’s the response, similarly broken down:
(Note that the UP flag was set in the request so you know that I had to touch the security key before it would generate that response.)
Now that we have performed a generate
operation, the
next step is to perform a sign
operation with the private
key that has just been generated.
sign operation
Again, we’ll foreshadow future chapters by using a snippet of the WebAuthn Javascript API to ask a browser to perform a sign operation with a U2F security key.
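// Helper assumed by this snippet: decodes a hex string into an ArrayBuffer.
function hexStringToArrayBuffer(hex) {
  const bytes = new Uint8Array(hex.length / 2);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
  }
  return bytes.buffer;
}

navigator.credentials.get({
  publicKey: {
    allowCredentials: [{
      type: 'public-key',
      transports: ['usb'],
      id: hexStringToArrayBuffer(
        "a28874e5d0e17b6796ed14b60447278a" +
        "c544e6b8dec18e54ccb178afb797e21e" +
        "e54a0cf264741b6cf4f8f89f41d12fff" +
        "18aafeff82ee318225c5339fd3fedb2a"),
    }],
    challenge: new Uint8Array([0,1,2,3,4,5,6,7,
                               8,9,10,11,12,13,14,15]).buffer,
  },
}).then(console.log, console.log);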
Note the appearance of the credential ID from the previous response in this request, so that the security key knows which private key to sign with.
The request format is an APDU again and so follows a similar structure. This time we’ll dive directly into the concrete request that the browser sent and explain the structure as we go:
- 00 (all CTAP1 commands start with a zero byte)
- 02 (the command code for sign)
- 03 (a flags byte with two flags set: 0x1 for “user presence required” and 0x2 for “user presence consumed”.)
- 00 (an unused flags byte)
- 000081 (the 24-bit length of the following data: 129 bytes)
- a438...c659 (the SHA-256 hash of the “client data”, described below.)
- a379...1947 (the SHA-256 hash of the relying party ID, example.com. Note that this is the same value that was sent when generating the credential, otherwise the request would be rejected.)
- 40 (the length of the credential ID; 64 bytes.)
- a288...db2a (the credential ID. The same value that was returned from the generate operation, and given in the Javascript request to the browser.)
- 0000 (indicating that the maximum response length is supported).
The response to a sign request is the simplest message so far:
- 01 (a flags byte with one flag set: 0x01 for “user presence”, covered later)
- 00000004 (a 32-bit signature counter, also covered later)
- 3045...0e2d (the signature itself).
Recall that, in the previous chapter, we considered a series of shapes for the signed message. Firstly, we tried using a constant value (“let me in”). Next we realized that the message should include the site’s origin to prevent phishing, so it became a JSON object: {"origin": "https://example.com"}. Then we realized that it should contain a random challenge value from the site to make it unique to an authentication attempt. This JSON is called the client data because it comes from the client (i.e. computer), rather than from the security key. It’s hashed and combined with the authenticator data (which comes from the security key) to form the message that is ultimately signed.
The client data generated by a modern browser has evolved slightly and, for the Javascript request above, looks like this: (With whitespace and line breaks added for clarity.)
{"type": "webauthn.get",
"challenge": "AAECAwQFBgcICQoLDA0ODw",
"origin": "https://example.com",
"crossOrigin": false
}
We’ve seen the origin and challenge fields before, although note that the challenge is encoded with base64url, rather than the more common base64 encoding. The type field is just good security hygiene: you never want to allow ambiguity, so it’s good to be explicit about the intended meaning of all messages. The crossOrigin field communicates whether the authentication was done within an iframe inside another site.
(Browsers may add more fields in the future, so always parse such JSON rather than assuming that the challenge can be inserted into a template.)
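With the client data in hand, the sign request that was broken down above can be assembled mechanically. The following is a sketch only: buildSignRequest is a hypothetical helper, and the client data string and credential ID used in the example are placeholders rather than the real values from the capture.

```javascript
const crypto = require("node:crypto");

const sha256 = (data) => crypto.createHash("sha256").update(data).digest();

// Builds the bytes of a CTAP1 sign request, field by field.
function buildSignRequest(clientDataJSON, rpId, credentialId, flags = 0x03) {
  if (credentialId.length > 255) {
    throw new RangeError("credential ID length must fit in one byte");
  }
  const data = Buffer.concat([
    sha256(clientDataJSON),             // hash of the client data
    sha256(rpId),                       // hash of the relying party ID
    Buffer.from([credentialId.length]), // length of the credential ID
    credentialId,
  ]);
  const header = Buffer.from([
    0x00,  // all CTAP1 commands start with a zero byte
    0x02,  // command code for sign
    flags, // 0x1 "user presence required" | 0x2 "user presence consumed"
    0x00,  // unused flags byte
    (data.length >> 16) & 0xff, (data.length >> 8) & 0xff, data.length & 0xff,
  ]);
  // The trailing 0000 indicates that the maximum response length is supported.
  return Buffer.concat([header, data, Buffer.from([0x00, 0x00])]);
}

const credentialId = Buffer.alloc(64, 0xa2); // placeholder credential ID
const request = buildSignRequest('{"type":"webauthn.get"}', "example.com", credentialId);
console.log(request.subarray(0, 7).toString("hex")); // 00020300000081
```

The 24-bit length comes out as 129 because the data is two 32-byte hashes, one length byte, and a 64-byte credential ID.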
The authenticator data is the concatenation of the following:
- The SHA-256 hash of the relying party ID.
- The flags byte from the sign response.
- The signature counter from the sign response.
Then the signed message, i.e. the final value given to the sign operation, is the authenticator data followed by the SHA-256 hash of the client data.
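As a sketch, the sign response can be split into its fields and the signed message rebuilt from them. This assumes the layout commonly used for U2F (a 32-byte RP ID hash, one flags byte, and a big-endian 32-bit counter) and that any transport-level status bytes have already been stripped from the response. The function names are hypothetical.

```javascript
const crypto = require("node:crypto");

const sha256 = (data) => crypto.createHash("sha256").update(data).digest();

// Splits a sign response into flags, counter, and signature.
function parseSignResponse(response) {
  if (response.length < 6) throw new RangeError("response too short");
  return {
    flags: response[0],
    counter: response.readUInt32BE(1),
    signature: response.subarray(5),
  };
}

// Authenticator data: RP ID hash, then the flags byte, then the counter.
function authenticatorData(rpId, { flags, counter }) {
  const tail = Buffer.alloc(5);
  tail[0] = flags;
  tail.writeUInt32BE(counter, 1);
  return Buffer.concat([sha256(rpId), tail]);
}

// Signed message: authenticator data followed by the client data hash.
function signedMessage(authData, clientDataJSON) {
  return Buffer.concat([authData, sha256(clientDataJSON)]);
}

// Truncated stand-in for a real response: flags 01, counter 4, then signature bytes.
const parsed = parseSignResponse(Buffer.from("0100000004304502", "hex"));
const message = signedMessage(authenticatorData("example.com", parsed), "{}");
console.log(parsed.flags, parsed.counter, message.length); // 1 4 69
```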
We now have all the pieces to implement the public-key authentication scheme sketched out in the previous chapter:
When an account is first registered, we have a Javascript snippet that will cause the browser to send a request to a U2F security key to generate a public and private key. The server can store the credential ID and public key from the response and associate them with the new account. Then, when that account tries to sign in, the server generates a random challenge and makes a Javascript call with that challenge and the credential ID. The browser will send a request to a U2F security key to sign a message containing that challenge (and other values) and return the resulting client data, authenticator data, and signature.
The server can now:
- Run the verify operation with its stored public key and the response from the browser.
- Check that the type, origin, crossOrigin, and challenge fields are as expected.
This is still a simplistic example and there are more steps in a full implementation. (For example, unless the user provides a password as well then it would be possible to take a security key from someone’s desk and sign in as them!) But you can now see the core ideas in action, all the way from a server, through a browser, to concrete messages sent over USB to a security key.
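Those server-side checks might look roughly like the following. This is a sketch under assumptions, not a complete implementation: verifySignIn and simulateSignIn are hypothetical names, the signature is assumed to be ECDSA over P-256 with SHA-256, and a software key pair stands in for the security key so that the example can run on its own.

```javascript
const crypto = require("node:crypto");

const sha256 = (data) => crypto.createHash("sha256").update(data).digest();

// `expected` is what the server stored or issued; `response` came from the browser.
function verifySignIn(expected, response) {
  const { clientDataJSON, authenticatorData, signature } = response;

  // Run verify over the authenticator data followed by the client data hash.
  const message = Buffer.concat([authenticatorData, sha256(clientDataJSON)]);
  if (!crypto.verify("sha256", message, expected.publicKey, signature)) return false;

  // Parse the client data (rather than templating it) and check each field.
  let clientData;
  try {
    clientData = JSON.parse(clientDataJSON.toString("utf8"));
  } catch {
    return false;
  }
  if (clientData === null || typeof clientData !== "object") return false;
  const challenge = Buffer.from(String(clientData.challenge), "base64url");
  return clientData.type === "webauthn.get" &&
    clientData.origin === expected.origin &&
    clientData.crossOrigin !== true &&
    challenge.equals(expected.challenge);
}

// Stand-ins for the security key and browser, for demonstration only.
const { publicKey, privateKey } =
  crypto.generateKeyPairSync("ec", { namedCurve: "P-256" });
function simulateSignIn(origin, challengeBytes) {
  const clientDataJSON = Buffer.from(JSON.stringify({
    type: "webauthn.get",
    challenge: challengeBytes.toString("base64url"),
    origin,
    crossOrigin: false,
  }));
  const authenticatorData =
    Buffer.concat([sha256("example.com"), Buffer.from([0x01, 0, 0, 0, 4])]);
  const signature = crypto.sign("sha256",
    Buffer.concat([authenticatorData, sha256(clientDataJSON)]), privateKey);
  return { clientDataJSON, authenticatorData, signature };
}

const challenge = crypto.randomBytes(16);
const expected = { publicKey, origin: "https://example.com", challenge };
console.log(verifySignIn(expected, simulateSignIn("https://example.com", challenge))); // true
console.log(verifySignIn(expected, simulateSignIn("https://exampl3.com", challenge))); // false
```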
Nearly all U2F security keys are designed in a manner that avoids the need to actually store the private keys, thus they can generate an unlimited number of credentials with only a constant amount of onboard storage. The exact details are private to the security key but go something like this:
Either at the factory or on first use, the security key will generate and store a random symmetric key. We’ll call this the “root secret”. Then, when the security key is asked to generate a public and private key, it chooses a random seed value and encrypts it with the root secret. Next, it authenticates the encrypted seed, along with the hash of the relying party ID and any other pertinent values, using an algorithm like HMAC.
Together, the encrypted seed and the HMAC form the credential ID. Then the security key generates a public key and private key from the seed, discards the private key, and returns the credential ID and public key to the computer.
When asked to do a sign operation, the security key splits the credential ID into an encrypted seed and an HMAC value. It checks whether the HMAC is correct for the relying party ID hash and the encrypted seed. If not, then either this credential ID came from a different security key or the relying party ID is incorrect. In either case, it returns an error.
But, if the HMAC is correct, it decrypts the seed using its root
secret and can then derive the same private key as it did before. Then
it performs the sign
operation with that private key and
returns the resulting signature.
At no point did the security key need to store anything other than the root secret, and there is only a single root secret for all credentials, so the storage requirements are constant. The state necessary for a credential is kept in the credential ID, and that is stored on the server.
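Since the exact details are private to each security key, the following is only one plausible sketch of the idea. Every choice in it is an assumption: AES-256-CTR for encrypting the seed, HMAC-SHA-256 for authentication, subkeys derived from the root secret by label, and the seed used directly as a P-256 private scalar. Signing is omitted; the point is that the same key can be recovered from the credential ID and nothing else.

```javascript
const crypto = require("node:crypto");

const sha256 = (data) => crypto.createHash("sha256").update(data).digest();

// The only stored state: generated once, at the factory or on first use.
const rootSecret = crypto.randomBytes(32);
const subkey = (label) =>
  crypto.createHmac("sha256", rootSecret).update(label).digest();
const encKey = subkey("encrypt");
const macKey = subkey("authenticate");

const mac = (encryptedSeed, rpIdHash) =>
  crypto.createHmac("sha256", macKey).update(encryptedSeed).update(rpIdHash).digest();

// Stand-in for key derivation: treat the seed as a P-256 private scalar.
function keyFromSeed(seed) {
  const key = crypto.createECDH("prime256v1");
  key.setPrivateKey(seed); // throws if the seed is not a valid scalar
  return key;
}

function generate(rpIdHash) {
  let seed, key;
  for (;;) { // retry in the very unlikely case of an invalid scalar
    seed = crypto.randomBytes(32);
    try { key = keyFromSeed(seed); break; } catch { /* pick another seed */ }
  }
  const iv = crypto.randomBytes(16);
  const cipher = crypto.createCipheriv("aes-256-ctr", encKey, iv);
  const encryptedSeed = Buffer.concat([iv, cipher.update(seed), cipher.final()]);
  // Credential ID = encrypted seed followed by the HMAC. Nothing is stored.
  const credentialId = Buffer.concat([encryptedSeed, mac(encryptedSeed, rpIdHash)]);
  return { credentialId, publicKey: key.getPublicKey() };
}

// Returns the recovered key, or null if the ID or RP ID is wrong for this device.
function recoverKey(credentialId, rpIdHash) {
  if (credentialId.length !== 80) return null;
  const encryptedSeed = credentialId.subarray(0, 48);
  const tag = credentialId.subarray(48);
  if (!crypto.timingSafeEqual(tag, mac(encryptedSeed, rpIdHash))) return null;
  const decipher =
    crypto.createDecipheriv("aes-256-ctr", encKey, encryptedSeed.subarray(0, 16));
  const seed = Buffer.concat([decipher.update(encryptedSeed.subarray(16)), decipher.final()]);
  return keyFromSeed(seed);
}

const rp = sha256("example.com");
const { credentialId, publicKey } = generate(rp);
console.log(recoverKey(credentialId, rp).getPublicKey().equals(publicKey)); // true
console.log(recoverKey(credentialId, sha256("anothersite.com")));           // null
```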
There are a couple of consequences of this style of design:
First, a credential can never be used without knowing the credential ID. In chapter 3 we’ll discuss “discoverable” credentials that change this assumption but, in U2F, a credential ID is always needed.
Second, there’s no way to delete an individual credential from a security key because the credential isn’t stored on the security key, it’s “stored” in the ID, on the server. Instead, U2F security keys will typically support a reset command that generates a fresh root secret. Once the root secret has been changed, all previous credential IDs are invalidated and will appear to be IDs generated by a different security key.
In U2F this reset functionality, if it exists for a particular model of security key, is a vendor-specific command. It gets standardized in CTAP2.
The rest of this chapter contains details that can be skipped on a first reading.
The last unexplained field is the “signature counter” that is
returned from sign
operations. In the example above, it was
four, but that doesn’t mean that I discarded three responses while
trying to capture that example.
Signature counters are optional for the security key to implement, although the majority of security keys do implement them. If there’s no counter then the value in the response is always zero. But once a security key has produced a non-zero value then it has to ensure that the counter, for all future signatures from that credential, strictly increases.
The motivation for having a counter is that it might allow websites to detect when a security key has been cloned. Cloning a security key is supposed to be very difficult but, if you assume someone managed to do it (probably destroying it in the process), then one could create a working replica which could be slipped back into the possession of the legitimate user, leaving them unaware that anything has happened. At this point the attacker, who presumably also created a replica for themselves, can create signatures as easily as the legitimate user.
If all that has happened, then the signature counter might uncover it. Unless the attacker can know exactly when the legitimate user has used their security key, and thus incremented the counter, then eventually either they or the real user will create a signature where the counter didn’t increase.
If the website noticed this, it could sound the alarm. At a minimum, the security key in question should be replaced. Ideally, the account and security key would be investigated carefully for signs of compromise.
This is a rather far-fetched scenario. Generally, when signature counters are checked by a site, any error is treated as a transient authentication failure. But that defeats the point: the user or attacker will simply try again after the signature counter has naturally incremented, and then it’ll work. The user will simply come to think of their security key as a bit worn out and will learn that sometimes it requires a couple of attempts to work.
On the other hand, many security keys only have a single, global signature counter, and this allows different websites to correlate the use of the same security key between them. That is, the current counter value of your security key is somewhat identifying and can be combined with information about how often it increases. (Better security keys will implement more granular signature counters.)
Signature counters are also incompatible with syncing private keys between computers (see chapter 3) and thus are not implemented in an increasing fraction of cases.
So signature counters might be useful in the most extreme cases if carefully implemented and coupled with a robust incident response process, but they can otherwise be ignored.
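For sites that do decide to check counters, the rule described above is small. This sketch uses a hypothetical checkSignatureCounter function and leaves open what a site should do when the check fails, since that is the hard part.

```javascript
// storedCounter: last value seen for this credential (0 if none yet).
// receivedCounter: value from the latest sign response.
function checkSignatureCounter(storedCounter, receivedCounter) {
  // A security key without a counter always reports zero.
  if (storedCounter === 0 && receivedCounter === 0) {
    return { ok: true, newCounter: 0 };
  }
  // Otherwise the counter must strictly increase.
  if (receivedCounter > storedCounter) {
    return { ok: true, newCounter: receivedCounter };
  }
  // Same or lower value: possibly a cloned security key.
  return { ok: false, newCounter: storedCounter };
}

console.log(checkSignatureCounter(3, 4).ok); // true
console.log(checkSignatureCounter(4, 4).ok); // false
```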
The computers (or phones) that security keys are used with are called platforms and U2F is a bit more complex for platforms than suggested in the sketch above. There can be multiple security keys plugged into a laptop, and a WebAuthn request can list many credential IDs. The browser has to find the right credential ID for the right security key.
Also, U2F was designed so that security keys could be implemented in a Java-based framework that did not allow requests to block. But requests cannot complete until user-presence is satisfied, i.e. until someone touches the sensor on the security key.
These factors mean that browsers have to poll U2F security keys. Two protocol features facilitate this:
First, whenever an operation cannot be completed because the security key is waiting for a touch, the request will immediately result in a special error code (“test of user presence required”). The security key will usually blink an indicator for a short while after this.
Second, the flags byte in sign
requests can be set to
0x07 (“check only”). This causes the request to always fail but with one
of two distinct error codes. If the credential ID and relying party ID
hash are valid for the security key, then it returns the same “test of
user presence required” error as before. (Whether the security key has
been touched recently or not!) Otherwise it returns a different error
(“bad key handle”).
Browsers combine these two features with the following algorithm for handling U2F security keys:
For generate requests, the request is sent to all security keys. The security keys will fail the request because they haven’t been touched, so the requests are repeated every few hundred milliseconds until the user touches one of the security keys to select it.
For sign requests, each credential ID is sent to each security key in “check only” mode until either a valid credential ID is found, or until all credential IDs have been tried with a given security key. If a valid credential ID is found for a security key then a stream of sign requests is sent, waiting for a touch. If a security key doesn’t recognize any IDs then either a message is displayed on the screen, or else a stream of generate requests is sent, waiting for a touch. These generate requests are sent just to cause the security key to blink its indicator, and to see whether the security key is touched; any resulting public key is discarded. If the security key is touched, an error is displayed to the user because that security key isn’t going to work.
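The sign half of that algorithm can be sketched against a mock device. Everything here is hypothetical: MockSecurityKey, the string status values, and the function names exist only for illustration, and the sketch leaves out the delay between polling rounds and the generate requests sent to security keys that recognize no ID.

```javascript
// Stand-ins for the results named above.
const OK = "ok";
const TOUCH_NEEDED = "test of user presence required";
const BAD_HANDLE = "bad key handle";

class MockSecurityKey {
  constructor(knownIds) {
    this.knownIds = new Set(knownIds);
    this.touched = false;
  }
  sign(credentialId, { checkOnly = false } = {}) {
    if (!this.knownIds.has(credentialId)) return BAD_HANDLE;
    // Check-only requests always fail, whether touched or not.
    if (checkOnly || !this.touched) return TOUCH_NEEDED;
    this.touched = false; // user presence is consumed
    return OK;
  }
}

// For each security key, find the first credential ID that it recognizes.
function matchCredentials(keys, credentialIds) {
  const matches = new Map();
  for (const key of keys) {
    const id = credentialIds.find(
      (candidate) => key.sign(candidate, { checkOnly: true }) === TOUCH_NEEDED);
    if (id !== undefined) matches.set(key, id);
  }
  return matches;
}

// One polling round: returns the security key that signed, or null.
function pollOnce(matches) {
  for (const [key, id] of matches) {
    if (key.sign(id) === OK) return key;
  }
  return null;
}

const keyA = new MockSecurityKey(["id-a"]);
const keyB = new MockSecurityKey([]);
const matches = matchCredentials([keyA, keyB], ["id-x", "id-a"]);
console.log(pollOnce(matches) === null); // true: nobody has touched a key yet
keyA.touched = true;
console.log(pollOnce(matches) === keyA); // true
```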
A fully working spec also needs to define how security keys are discovered on the USB bus and how the U2F messages are encoded for transmission.
Security keys advertise themselves as USB Human Interface Devices. HID devices can advertise a number of usage pages that describe broadly what kind of device they are. For example, usage page one covers keyboards, mice, joysticks, and other common input devices. Security keys advertise the usage page 0xf1d0 (“FIDO”, get it?) to identify themselves.
HID devices communicate using short report messages and FIDO defines how longer messages are fragmented and reassembled so that they can be sent as a series of reports.
To deal with the issue of multiple applications trying to talk to the same security key at the same time, and potentially interleaving their streams of report messages, every report starts with a channel identifier. Applications get an identifier by sending a 64-bit nonce and watching for a channel allocation report from the device that echoes that nonce.
Thus, in theory, a security key can handle multiple concurrent communication streams with different applications on the computer. In practice, since security keys are embedded devices, they often only support a single active channel and so an application requesting a new channel will disable the previously active one. However, this mechanism still allows communications to be cleanly broken off rather than having fragments from different applications interleaved, producing unpredictable results.
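To give a feel for the fragmentation, here is a sketch of splitting a message into 64-byte reports and joining it back up. The field sizes are assumptions based on my understanding of the FIDO HID framing (a 4-byte channel identifier on every report, a command byte and 16-bit length on the first report, and a sequence number on the rest) and should be checked against the specification before being relied upon. The value of CMD_MSG is likewise an assumption.

```javascript
const REPORT_SIZE = 64;
const CMD_MSG = 0x83; // assumed command byte for an encapsulated CTAP1 message

// Splits a message into fixed-size reports for one channel.
function fragment(channelId, command, message) {
  const init = Buffer.alloc(REPORT_SIZE);
  init.writeUInt32BE(channelId, 0);      // every report starts with the channel ID
  init[4] = command;
  init.writeUInt16BE(message.length, 5); // total length of the message
  const first = message.subarray(0, REPORT_SIZE - 7);
  first.copy(init, 7);
  const packets = [init];

  let offset = first.length;
  for (let seq = 0; offset < message.length; seq++) {
    if (seq > 0x7f) throw new RangeError("message too long");
    const cont = Buffer.alloc(REPORT_SIZE);
    cont.writeUInt32BE(channelId, 0);
    cont[4] = seq;                       // continuation sequence number
    const chunk = message.subarray(offset, offset + REPORT_SIZE - 5);
    chunk.copy(cont, 5);
    offset += chunk.length;
    packets.push(cont);
  }
  return packets;
}

// Reverses fragment(), dropping the zero padding in the last report.
function reassemble(packets) {
  const length = packets[0].readUInt16BE(5);
  const parts = [packets[0].subarray(7)];
  for (const packet of packets.slice(1)) parts.push(packet.subarray(5));
  return Buffer.concat(parts).subarray(0, length);
}

const message = Buffer.alloc(138, 0xab); // same size as the sign request above
const packets = fragment(0x01020304, CMD_MSG, message);
console.log(packets.length);                         // 3
console.log(reassemble(packets).equals(message));    // true
```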
Security keys often also support communication over NFC, so that a security key can simply be held near the top of a mobile phone to be used. While the underlying technology and framing differs completely between USB and NFC, it is ultimately just another way of exchanging messages and so the underlying transport can be switched out without affecting any of the higher-level protocol.
A Bluetooth Low Energy (BLE) transport was also defined, but this requires the security key to have its own power source. The primary motivation was to use security keys with iPhones before they had support for NFC security keys, but now that the issue has been resolved, BLE security keys are very rarely seen. (This transport is unrelated to the one used between computers and phones, which is covered in chapter 10.)
The standards described in the previous chapter are sufficient to enable the classic 2nd-factor pattern of entering a username, then a password, and then tapping a security key. Many companies have deployed this sort of system to their employees to very great effect. Authenticating employees with just a password in this day and age is bordering on negligence and, unlike code-based second factors (whether delivered over SMS or from an app on a phone), security keys aren’t phishable.
But a password is still necessary. Otherwise, the security key alone would be sufficient to authenticate, and mislaid security keys would be a significant concern.
U2F also doesn’t solve the problem of usernames. Remembering usernames is a bother that password managers help solve today. So, if public key authentication is going to find a broader audience, it needs to solve this too. Thus we want to be able to ask a security key what accounts have been registered for a particular site, and for it to store the corresponding usernames.
The U2F protocol outlined in the previous chapter cannot support this. It assumes that the credential ID is already known before a sign operation can be performed, and thus the account must be known before the security key can be used. Indeed, the vast majority of U2F security keys do not store any per-credential data, as described earlier, and so it’s impossible for them to work with any other pattern of interaction.
So security keys, and the protocol for communicating with them, had to evolve. Within the FIDO Alliance this was broadly done under the umbrella term “FIDO2”, which covered both a new security key protocol (CTAP2) and a new Javascript API (WebAuthn), which we’ve already seen examples of.
The first major new concept in FIDO2 is discoverable credentials.
Credentials on a U2F security key, which we’ll now call non-discoverable credentials, conceptually consist of three values: the credential ID, the private key, and the relying party ID. Since U2F security keys generally don’t store any per-credential state, they can be thought of as an infinite bucket of such credentials.
A security key that stores discoverable credentials can be thought of as a database table. The first two columns of this table form the primary key, and those columns are the relying party ID and the user ID. A user ID is a new concept for discoverable credentials: an identifier for an account chosen by the site. It is not (and should not be) the username, for reasons that will be explained later. Instead, it is better to think of it as the “user UUID”.
Since these two columns are a primary key, that implies that the security key stores at most one credential for any given pair of relying party ID and user ID. So if you create a second discoverable credential that has the same RP ID and user ID as an existing one, then the existing one is overwritten.
The third and fourth columns of this conceptual table are familiar: the credential ID and the private key. Discoverable credentials still have credential IDs and can be specified by them just like non-discoverable credentials. But the crux is that it’s also possible to ask for some credential for a given RP ID, without specifying any credential IDs. This is the key distinction between discoverable and non-discoverable credentials, and the one that lets them act like a username as well as an authentication factor.
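The table model can be sketched in a few lines. This is only a conceptual illustration: the DiscoverableCredentialTable class and its method names are invented for this example, user IDs are written as strings for brevity, and no real authenticator is claimed to be structured this way.

```javascript
// Conceptual sketch of the discoverable-credential table; all names here
// are hypothetical and exist only for illustration.
class DiscoverableCredentialTable {
  constructor() {
    this.rows = new Map();
  }

  // The primary key is the pair (relying party ID, user ID).
  static primaryKey(rpId, userId) {
    return JSON.stringify([rpId, userId]);
  }

  // Creating a credential with an existing (rpId, userId) pair overwrites
  // the previous row.
  create({ rpId, userId, credentialId, privateKey }) {
    const key = DiscoverableCredentialTable.primaryKey(rpId, userId);
    this.rows.set(key, { rpId, userId, credentialId, privateKey });
  }

  // Discovery: list the credentials for a site without any credential IDs.
  findByRpId(rpId) {
    return [...this.rows.values()].filter((row) => row.rpId === rpId);
  }

  // Discoverable credentials can still be selected by credential ID.
  findByCredentialId(rpId, credentialId) {
    return this.findByRpId(rpId).find(
      (row) => row.credentialId === credentialId
    );
  }
}
```

Calling create twice with the same RP ID and user ID leaves a single row, which is the overwrite behavior described above.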
There are many more columns in the table. Of course there’s a username (in fact, two), and an increasing number of extensions require more fields to be added to the credential row. These values will be covered in chapters 4 and 9.
Since discoverable credentials require per-credential storage on a security key, many security keys support creating both discoverable and non-discoverable credentials and have a limit on the number of discoverable credentials.
Discoverable credentials solve the problem of usernames, but we would still need a password to avoid risks from mislaid security keys. In fact, discoverable credentials make that problem even more acute: you wouldn’t even have to guess whose security key it was, it would tell you!
U2F introduced the concept of user presence—that some human was physically present. Coupled with discoverable credentials comes the stronger notion of user verification—that the correct human is physically present. The mechanism for establishing this varies. Some security keys have fingerprint readers, a few security keys have a physical PIN pad on the device, but most security keys use a PIN that is entered on the computer.
The implied contract with the security key is that it will maintain a user verification chain: if user verification is performed with a PIN, then that PIN may be changed, but the old PIN has to be presented to do so. If the PIN is lost and needs to be reset, then all the credentials must be deleted to do so. Different security keys may differ in the details, but each should maintain such a chain of verification.
While the term “PIN” is used, these PINs can be alphanumeric, so you could also call them passwords. But since FIDO2 was supposed to be replacing passwords, calling them passwords would have been awkward, and so they were called PINs. But the difference is more than just a name: these PINs are never sent over the network, can never be disclosed in a database leak, and, if they’re being used with a security key, then the security key can enforce hardware-based guess limits.
It also became apparent with U2F that many people wanted to keep their security keys plugged into their computer all the time. In response, a form factor of security key that mostly fits inside a USB port, and leaves only a small lump on the outside of the computer, became quite popular.
But this was driven by more than the realization that a USB port was a useful place to keep a security key where it wouldn’t get lost.
Malware that steals cookies is a serious concern because, no matter how strong the initial authentication is, if malware can steal the results of that authentication, you still have a security problem. So enterprises often ask their users to re-authenticate with their security keys, perhaps daily. Since a hardware-bound private key cannot be stolen by malware, this establishes that the active session is still legitimate.
But many computers already have a Trusted Platform Module or similar device within them that can store a hardware-bound private key and sign with it. So why not use those as opposed to having a security key permanently inserted? Since the computers or phones that security keys are used with are called platforms, these are called platform authenticators.
Traditional platform authenticators cannot help you with signing in on a different computer, but they can provide proof that an active session hasn’t been stolen by malware and, when coupled with user verification, they can provide evidence that the correct human is still behind the keyboard. They are also far more common than security keys since no extra hardware has to be purchased.
(From this point, we’ll start to use the more generic term authenticator, rather than security key, unless specifically speaking about the latter type of device.)
In the next chapter we’ll cover the Javascript API that was developed as part of the broader FIDO2 effort in order to expose these concepts: WebAuthn. The APIs for Android, iOS, macOS, and Windows are strongly shaped by it and, collectively, they form the WebAuthn family of APIs. So even if you never touch Javascript, you need to understand the core of WebAuthn. While the syntax of these different APIs varies, they’ll generally use the same terminology, and they all produce compatible outputs.
Since (traditional) platform authenticators can’t be used to sign in on other devices, FIDO2 originally assumed that all users had security keys. Otherwise losing or reinstalling a laptop would leave a user without any usable credentials.
Security keys are great and should be deployed with gusto in all enterprise and government environments that require strong authentication. But it’s unlikely that regular people are going to adopt them and, even if they did, security keys work much better when there’s a helpdesk that can be a backstop after the dog has chewed on one.
If the benefits of FIDO2 are going to be enjoyed more widely, credentials are going to have to be more usable, and that means syncing them. In this book, thus far, there has been a firm link between a credential and a single physical device that created and holds the private key. That device might be a security key, or it might be a platform authenticator, but either way you can point to where the private key is stored and it never moves. Syncing changes that, in good ways (people can recover from losing a device), and bad (a sync account could be compromised).
Wider deployment also needs to avoid overly technical terms; “WebAuthn credential” doesn’t sound very friendly. Thus the term passkey was invented by Apple, but they nicely let everyone use it.
A passkey is a synced, discoverable WebAuthn credential. Or, when that’s awkward, sometimes it’s just a discoverable credential. But this book will stick to the former definition. In general, passkeys are an attempt to take WebAuthn outside the enterprise.
The passkey ecosystem consists of providers, provider APIs, passkey APIs, and the hybrid transport. We’ll cover the hybrid transport in chapter 10. It’s the fallback for when syncing doesn’t bring a passkey to where you need it: you can pull out your phone, scan a QR code, and send a signature to a computer.
Passkey providers are services that store and sync passkeys (which are, remember, just another name for discoverable WebAuthn credentials). Within the Apple ecosystem, iCloud Keychain is the most common provider. On Android, you’ll commonly find Google Password Manager or Samsung Pass. Most other password managers are also passkey providers, such as 1Password, Dashlane, and Bitwarden.
Provider APIs are the way that passkey providers register with an operating system. We won’t be covering these APIs in detail in this book, but the Apple platforms and Android both provide APIs where providers can register. Then providers can offer passkeys to applications that want them.
The flip side of the provider APIs is the passkey APIs. These are covered in detail in chapter 11. They provide applications with access to registered providers so that they can request that passkeys be created, and can request signatures from them. These APIs are all based on the original FIDO2 API: WebAuthn.
When credentials are synced, the sync account (rather than a device) is considered to be the authenticator. So when the model of discoverable credentials says that only one credential with a given pair of RP ID and user ID exists within an authenticator, that applies to the sync account, not to each individual device that is syncing.
For example, if a credential is created in iCloud Keychain on a Mac for a given site and user ID, and then another credential is created on an iPhone, signed into the same iCloud account, with matching RP ID and user ID values, then the latter will overwrite the former because the sync account is the same, and the invariant applies to the whole account.
WebAuthn (short for “Web Authentication”) is a web API that lets a site interact with U2F-era security keys (see chapter 2), but also take advantage of all the new concepts in FIDO2, as outlined in the previous chapter.
The data formats and structures in WebAuthn are also strongly reflected in the platform APIs that exist for apps running on the Apple family of platforms, on Android, and on Windows. So even non-Web developers will need to understand the concepts from this chapter.
WebAuthn is integrated with the W3C Credential Management API, which lives in the navigator.credentials namespace. You don’t need to know this API to use WebAuthn and it’s not covered here. But you’ll notice that the APIs are structured to support other credential types too. That’s not an accident: both passwords and federated credentials can be used via Credential Management.
Creating a credential looks like this:
const promise = navigator.credentials.create({
  publicKey: creationOptions,
});
The resulting promise either resolves with a representation of the newly created credential, or else an error. But first we’ll look in detail at the options that control what is created and where it’ll be stored.
The creationOptions above is a PublicKeyCredentialCreationOptions dictionary and it contains the following members:
dictionary PublicKeyCredentialCreationOptions {
  // Selecting the public key signature scheme to use.
  required sequence<PublicKeyCredentialParameters> pubKeyCredParams;
  // Controlling the location and type of the new credential.
  AuthenticatorSelectionCriteria authenticatorSelection;
  sequence<PublicKeyCredentialDescriptor> excludeCredentials = [];
  // Metadata stored with the credential.
  required PublicKeyCredentialRpEntity rp;
  required PublicKeyCredentialUserEntity user;
  // Guiding the UI.
  unsigned long timeout;
  sequence<DOMString> hints = [];
  // Extension features, covered in a later chapter.
  AuthenticationExtensionsClientInputs extensions;
  // Fields related to attestation; covered in a later chapter.
  required BufferSource challenge;
  DOMString attestation = "none";
  sequence<DOMString> attestationFormats = [];
};
Signature schemes are identified by integers assigned by the IETF’s COSE group. A site specifies the set of schemes that it accepts (in preference order) by listing them in pubKeyCredParams. Since the values assigned by the COSE group leave some parameters unspecified, WebAuthn additionally defines those parameters for the common schemes.
ECDSA with P-256 and SHA-256 is the dominant scheme, by far, in the WebAuthn ecosystem. (If you recall from chapter 2, it’s the only scheme that was supported in U2F.) It has been given identifier -7. (Yes, a negative number.)
Many PCs have a TPM that only supports the older RSA PKCS#1 v1.5 standard and so, to be compatible with them, sites are advised to support that scheme too. RSA PKCS#1 v1.5 uses the identifier -257 and also specifies SHA-256 as the hash function.
So most sites should set pubKeyCredParams to the following:
[{type: "public-key", alg: -7}, {type: "public-key", alg: -257}]
Some other, less common schemes are -8 for Ed25519 and -35 for ECDSA with P-384 and SHA-384. Ed25519 is superior to ECDSA P-256 but the difference is often not sufficient to outweigh how common ECDSA is. ECDSA with P-384 can be used in order to meet certain regulatory requirements but, given that the primary risks are implementation issues, it is likely less secure in practice. In the future ML-DSA (previously called Dilithium) is likely to replace ECDSA due to concerns about quantum attacks, but that transition is still some years in the future.
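If a site wants to keep these identifiers in one place, a small lookup table may help. The sketch below uses only the four identifiers mentioned in this section; the SCHEME_NAMES and toPubKeyCredParams names are made up for the example.

```javascript
// The identifiers discussed above, mapped to human-readable names.
// Hypothetical helper names; only the numeric identifiers come from the text.
const SCHEME_NAMES = new Map([
  [-7, "ECDSA with P-256 and SHA-256"],
  [-257, "RSA PKCS#1 v1.5 with SHA-256"],
  [-8, "Ed25519"],
  [-35, "ECDSA with P-384 and SHA-384"],
]);

// Builds a pubKeyCredParams list, preserving the caller's preference order.
function toPubKeyCredParams(algs) {
  return algs.map((alg) => {
    if (!SCHEME_NAMES.has(alg)) {
      throw new RangeError(`unknown signature scheme: ${alg}`);
    }
    return { type: "public-key", alg };
  });
}
```

For example, toPubKeyCredParams([-7, -257]) produces the list recommended above for most sites.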
The authenticatorSelection field is, itself, another dictionary with the following fields:
dictionary AuthenticatorSelectionCriteria {
  // Controlling whether the credential is discoverable.
  DOMString residentKey;
  boolean requireResidentKey = false;
  // Controlling whether user verification is performed.
  DOMString userVerification = "preferred";
  // Controlling which types of authenticators can store the
  // credential.
  DOMString authenticatorAttachment;
};
The first two fields control whether the credential must be discoverable. But why are there two of them, and why are they named “resident”, not “discoverable”?
Historically, discoverable credentials were first called resident credentials so this term will pop up in several places in the protocol. It is inaccurate, however. Resident credentials are those where the authenticator keeps state about the credential. Discoverable credentials are always resident, but non-discoverable credentials can be resident too. The important point is whether the credential follows the model of discoverable credentials and can be used without knowing the credential ID, not whether the authenticator operates statelessly.
So, despite the naming, these fields control whether the credential must be discoverable.
requireResidentKey was defined first but is just a boolean. In order to allow sites to express that they prefer, but do not require, a discoverable credential it was necessary to add a second field, residentKey. A site must pick one of the following scenarios in order to set these two fields:
- Set requireResidentKey to false. (The resulting credential may still be discoverable, but discoverable credentials work perfectly well with credential IDs. This is the default if neither field is set.)
- Set requireResidentKey to true. The resulting credential will be discoverable. (But will still work when credential IDs are given.)
- Set residentKey to "preferred". (It’s possible to optionally take advantage of discoverable credentials with conditional UI, see chapter 8.) Whether the credential is known to be discoverable is reported via the credProps extension (see here).
Note that there’s no way to require a non-discoverable credential. Created credentials can always be discoverable and these controls only allow a site to insist that they must be.
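The three scenarios can be captured in a small helper. This is a sketch under the assumption that a site names its own preference with one of three strings; those strings and the discoverabilityFields name are hypothetical and not part of WebAuthn.

```javascript
// Maps a site's own preference to the two WebAuthn fields. The preference
// labels ("not-needed", "required", "preferred") are invented for this sketch.
function discoverabilityFields(preference) {
  switch (preference) {
    case "not-needed":
      // The credential may still turn out to be discoverable.
      return { requireResidentKey: false };
    case "required":
      return { requireResidentKey: true };
    case "preferred":
      return { residentKey: "preferred" };
    default:
      throw new TypeError(`unknown preference: ${preference}`);
  }
}

// Example: merge the result into authenticatorSelection.
const authenticatorSelection = {
  userVerification: "discouraged",
  ...discoverabilityFields("preferred"),
};
```

Note that none of the branches can demand a non-discoverable credential, matching the point made above.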
See here for a discussion of the concept of user verification. The userVerification field can be set to one of the strings discouraged, preferred, or required, and the default is preferred. Creating a credential with user verification ensures that the capability will be available when getting a signature. It also marks the start of a user verification chain: each future signature from the credential with the UV flag set must have collected some proof of user verification, and that proof must chain back to the proof collected at creation time. For example, if user verification is established with a PIN, then changing that PIN must require the old PIN. If a fingerprint is used, then enrolling a new fingerprint must require an old fingerprint or PIN, etc.
As with residentKey, setting userVerification to preferred means that the site would make use of user verification if performed, but will accept a credential without it. But how “preferred” is it? For example, should a PIN be set up on a security key in order to create a credential with user verification, or should a PIN only be used if it happens to already be set up? That’s up to the platform. Windows tends towards a stronger interpretation of “preferred” than other platforms and will, indeed, set up a PIN on a security key in response to a request that prefers user verification.
When it comes to platform authenticators and passkeys, the interpretation of the userVerification value varies by provider, and might change over time. Here are the cases when user verification will be performed today, for three common passkey providers:
iCloud Keychain
| | Discouraged | Preferred | Required |
|---|---|---|---|
| Biometrics available | ✓ | ✓ | ✓ |
| Biometrics not available | | | ✓ |
Google Password Manager (desktop)
| | Discouraged | Preferred | Required |
|---|---|---|---|
| Biometrics available | | ✓ | ✓ |
| Biometrics not available | | | ✓ |
Windows Hello
| | Discouraged | Preferred | Required |
|---|---|---|---|
| Biometrics available | ✓ | ✓ | ✓ |
| Biometrics not available | ✓ | ✓ | ✓ |
For simple sites, the presence or absence of user verification will often make no difference: if the WebAuthn signature is valid, the user is signed in and, if not, they aren’t. If user verification makes no difference, the parameter should be set to discouraged.
More complex sites might base the sign-in decision on a risk analysis, or might subject sign-ins without user verification to additional challenges. In this case, the parameter should be set to preferred since doing user verification has value to the user.
Some sites might decide to only ever accept sign-ins that include user verification. In this case, set the parameter to required.
The authenticatorAttachment field can be left undefined, set to platform, or set to cross-platform. This controls which types of authenticators can be used.
Setting this to platform means that no external authenticators can be used. So security keys and external mobile devices are not applicable. For example, if a site offers to register “this computer” in its UI then it might set this field to platform. The resulting platform UI will skip or remove other options.
Setting this to cross-platform means that only authenticators that can be moved between devices are applicable, i.e. security keys and external mobile devices. If a company website knows that users have been issued security keys to use with company resources, it might set this to cross-platform.
Leaving this field undefined, the default, permits all authenticators, but the platform UI will generally default to a platform authenticator if available. If a site offers to register a passkey in its account settings, it might thus leave this value undefined.
Another way in which applicable authenticators can be controlled is by setting excludeCredentials in the top-level dictionary. This should contain the credential IDs of all currently known credentials for the user. It communicates that those credentials should not be overwritten, and that any external authenticator with one of those credentials should not be registered again.
Consider a user who is attempting to register a second security key on an account that uses the 2nd-factor flow. Their security key will likely create a stateless, non-discoverable credential and thus no existing credential can be overwritten. But it’s completely useless to register two credentials on the same security key: two credentials will appear in the user’s account, but there’s only a single security key that contains them both. It would be confusing.
By listing all existing credential IDs in excludeCredentials, the site can instruct the platform to refuse to register a security key that already holds one of them. If the user tries to do so, they should see an error message and the platform will usually allow them to try again with a different security key.
Next, consider a platform authenticator that always creates discoverable credentials. Recall that the model of discoverable credentials only allows at most one credential with a given RP ID and user ID to exist in an authenticator. So, if the user were to register a second credential it would overwrite the first. There would be two credentials listed on the site, but one would no longer exist in the platform authenticator. This would also be confusing for the user.
In the case of a platform authenticator, however, the platform will do something different if a credential in excludeCredentials already exists. The creation process will appear to succeed from the point of view of the user, but the platform will return InvalidStateError to the website. This special error code is unique to this situation and the website can choose to show an error. (E.g. “You’ve already registered this computer”.) The site can also choose to ignore this error: the user wanted the platform registered as an authenticator and it already is; no need to complicate things with an error message.
Note that the excludeCredentials field is not just a list of credential IDs. It’s written like this:
[
  {type: "public-key", id: /* credential ID in ArrayBuffer */},
  /* more such elements ... */
]
There is no type other than “public-key”; that’s extensibility that was never used. The id is binary and binary values in WebAuthn, of which there are a lot, are passed in ArrayBuffer and BufferSource objects. This is awkward because these objects don’t convert to/from JSON automatically. There is light on the horizon in the form of native JSON support in WebAuthn (see here) but this cannot be relied on to be present in current browsers. So for now, when receiving messages from a server to make WebAuthn calls, any binary data has to be handled specially.
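One way to do that special handling is sketched below. It assumes, purely for illustration, that the server sends credential IDs as base64url strings inside a JSON message; the base64urlToBuffer and toExcludeCredentials names are invented for this example.

```javascript
// Decodes a base64url string (no padding) into an ArrayBuffer.
function base64urlToBuffer(text) {
  const base64 = text.replace(/-/g, "+").replace(/_/g, "/");
  const padded = base64 + "=".repeat((4 - (base64.length % 4)) % 4);
  const binary = atob(padded);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i !== binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes.buffer;
}

// Turns a list of base64url credential IDs (assumed server format) into the
// structure that excludeCredentials expects.
function toExcludeCredentials(encodedIds) {
  return encodedIds.map((encoded) => ({
    type: "public-key",
    id: base64urlToBuffer(encoded),
  }));
}
```

The result can be passed directly as the excludeCredentials member of the creation options.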
The rp (relying party) and user fields in the top-level options dictionary contain metadata that is stored with the credential. The rp field has this structure:
dictionary PublicKeyCredentialRpEntity {
  DOMString id;
  required DOMString name;
};
This id field specifies the relying party ID. This topic is complex enough that it’s covered separately in chapter 5. If omitted, it defaults to the domain of the current origin.
The name field is required, but currently never used. It was intended to be a human-friendly name for the site, e.g. “ACME Corporation”. If there’s no obvious value to use, just pass the empty string as this field will probably be unused forever.
The user dictionary looks like this:
dictionary PublicKeyCredentialUserEntity {
  required BufferSource id;
  required DOMString name;
  required DOMString displayName;
};
The id field here is the user ID. Recall from the discussion of the discoverable credential model that an authenticator stores at most one credential with a given pair of RP ID and user ID values. The user ID is not the username and users will never see this value. It is an opaque, binary identifier for an account that can be up to 64 bytes long. However, since the user ID is treated as less sensitive than a username by security keys, it should not be equal to the username. It cannot be empty either. It’s recommended to be a large random value (e.g. a UUID) that is stored by the site for each account.
(In this book we’ll always call this value the user ID. But it’s also sometimes called the “user handle”, including in the WebAuthn API later on! These are two names for the same thing.)
The name field is the username. This is a human-readable string that usually uniquely identifies an account. It might be an email address. This value will be displayed in account selectors and management UIs so the user can understand which account a credential corresponds to.
The displayName field is for a more “friendly” name for the account. For example, a site may require an account to have a unique username, but allow users to configure a (potentially non-unique) name that they will appear as on the site. That second name would appear in this field. This field cannot be omitted but, if there is no obvious value to put in it, set it to the empty string.
Unlike the username, the display name is not always shown in account selectors and management UI, depending on the platform. Apple’s platforms, for example, do not show display names.
The platform will show UI to help guide the user in creating the requested credential and there are a couple of ways in which the UI can be guided.
Firstly, there’s the timeout parameter. If a site will reject a credential that took too long to create (because it’s concerned that the user has walked away and that someone else could be behind the keyboard) it can set a timeout, in milliseconds. But it can take a while for people to create credentials! They may have to dig a security key out from the back of a drawer. Because of this, platforms are likely to silently increase small timeouts to some minimum value. So a timeout of less than five minutes (the recommended default) may be rounded up.
Secondly, there is the hints parameter. This is a catch-all for expressing non-binding requests to the platform. It is a list of strings and currently three are defined:
- security-key suggests that the platform should show a security key-focused UI. Enterprise cases where security keys have been issued to employees should set this.
- client-device suggests that the platform should focus the UI on using a platform authenticator.
- hybrid suggests that the platform should expect the user to use an external mobile device to complete the request. (See chapter 10.)
The first two of these somewhat duplicate the authenticatorAttachment field, described above. Unlike that field, they do not forbid the use of other types of authenticators. However, hints are not yet supported by all platforms and so cannot be assumed to have any effect.
Attestation and extensions get their own chapters and are not covered here, but the challenge parameter, which is unused during creation except for attestation, is required. You can set it to new Uint8Array([0]).buffer in all non-attestation cases.
A consumer site, prompting to create a first passkey with the local device:
(A website would have checked if a local authenticator exists first. See here.)
let promise = navigator.credentials.create({
  publicKey: {
    pubKeyCredParams: [
      {type: "public-key", alg: -7},
      {type: "public-key", alg: -257},
    ],
    authenticatorSelection: {
      authenticatorAttachment: "platform",
      userVerification: "discouraged",
      requireResidentKey: true,
    },
    rp: {id: "example.com", name: ""},
    user: {
      id: userId,
      name: "jsmith",
      displayName: "John Smith",
    },
    hints: ["client-device"],
    challenge: new Uint8Array([0]).buffer,
  },
});
A consumer site, after a user clicks “Add passkey” in their account settings:
let promise = navigator.credentials.create({
  publicKey: {
    pubKeyCredParams: [
      {type: "public-key", alg: -7},
      {type: "public-key", alg: -257},
    ],
    authenticatorSelection: {
      userVerification: "discouraged",
      requireResidentKey: true,
    },
    rp: {id: "example.com", name: ""},
    user: {
      id: userId,
      name: "jsmith",
      displayName: "John Smith",
    },
    hints: ["client-device"],
    challenge: new Uint8Array([0]).buffer,
  },
});
When the Promise returned by navigator.credentials.create settles, it might be rejected with an error. There are only three buckets of errors that need to be considered: InvalidStateError, a programming error, and everything else.
InvalidStateError arises when the user attempts to use a platform authenticator that already holds one of the credentials listed in excludeCredentials. Sites may wish to report an error to the user in this case (“This device is already registered”) or they may validly conclude that the user wanted the local device registered, and it is registered, therefore that’s not really an error.
If there’s an error in the structure of the options (e.g. a required field is missing), or the site is attempting to use an RP ID that it cannot use (see chapter 5), then a descriptive error will be returned. But this is a bug in the Javascript that should be resolved during development.
For privacy reasons, all other errors are essentially indistinguishable. They will often be of type NotAllowedError, and the error message may contain more details. However the error messages are not stable and should not be used for anything other than logging and debugging.
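The three buckets might be sorted out as in the sketch below. The bucket labels and the classifyCreateError name are invented for this example, and treating TypeError and SecurityError as the signs of a programming error is an assumption about how browsers typically report malformed options and unusable RP IDs.

```javascript
// Sorts an error from navigator.credentials.create into one of the three
// buckets. Labels and the choice of "programming error" names are assumptions.
function classifyCreateError(error) {
  const name = error && error.name;
  if (name === "InvalidStateError") {
    // A platform authenticator already holds an excluded credential.
    return "already-registered";
  }
  if (name === "TypeError" || name === "SecurityError") {
    // Likely a bug in the options or the RP ID; fix during development.
    return "programming-error";
  }
  // Everything else, often NotAllowedError. Log it, but do not parse it.
  return "other";
}

// Hypothetical usage with a promise from navigator.credentials.create:
//   promise.catch((error) => {
//     if (classifyCreateError(error) === "already-registered") { /* ... */ }
//   });
```

Only the error name is inspected, since the messages are not stable.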
Assuming that promise resolves successfully, it returns a PublicKeyCredential. Let’s take a look at what that contains.
interface PublicKeyCredential {
  // The credential ID.
  USVString id;
  ArrayBuffer rawId;
  // The type of authenticator used.
  DOMString? authenticatorAttachment;
  // More about the credential.
  AuthenticatorAttestationResponse response;
  // See extensions chapter.
  AuthenticationExtensionsClientOutputs getClientExtensionResults();
}
Firstly, you get the credential ID. Twice! id contains the credential ID as a string. Then rawId contains it as an ArrayBuffer.
Where binary data is encoded in a string, WebAuthn always uses the base64url variant of base64. This replaces the + and / characters with - and _ and doesn’t include any = padding characters at the end. So the id field is the base64url encoding of rawId.
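The encoding rule above can be written out directly. This is a small sketch using the standard btoa function; the bufferToBase64url and idMatchesRawId names are made up for the example.

```javascript
// Encodes an ArrayBuffer as base64url: "-" and "_" replace "+" and "/",
// and trailing "=" padding is dropped.
function bufferToBase64url(buffer) {
  let binary = "";
  for (const byte of new Uint8Array(buffer)) {
    binary += String.fromCharCode(byte);
  }
  return btoa(binary)
    .replace(/\+/g, "-")
    .replace(/\//g, "_")
    .replace(/=+$/, "");
}

// Checks the relationship described above between id and rawId.
function idMatchesRawId(credential) {
  return credential.id === bufferToBase64url(credential.rawId);
}
```

A site that sends the credential ID to its server as JSON can therefore use the id string as is, without encoding rawId itself.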
Next, authenticatorAttachment will probably contain either platform or cross-platform depending on the type of authenticator used. (The platform can omit this field if it doesn’t know, and beware that it’s always possible that future versions of WebAuthn will define new values.)
You should be expecting at least a public key too in order to be able to use this credential, so let’s look inside response:
interface AuthenticatorAttestationResponse {
  // The public key
  COSEAlgorithmIdentifier getPublicKeyAlgorithm();
  ArrayBuffer? getPublicKey();
  // Information from the authenticator.
  sequence<DOMString> getTransports();
  ArrayBuffer getAuthenticatorData();
  // Only used for attestation.
  ArrayBuffer clientDataJSON;
  ArrayBuffer attestationObject;
};
getTransports returns a list of strings that the site needs to store if it’ll ever request this credential by ID. This will be covered later, in the section about getting signatures from credentials.
Next there’s getAuthenticatorData. This returns a binary blob that comes directly from the authenticator. It’s returned both when creating a credential and when getting a signature from one and is covered in more detail later.
The public key signature scheme of the new credential is returned by getPublicKeyAlgorithm. It uses the same identifiers as pubKeyCredParams in the options and the value must be one of the algorithms that was listed there. (See here.)
getPublicKey returns the public key itself in Subject Public Key Info (SPKI) format. This is a commonly used format for public keys, but there are so many public key formats that they get a chapter to themselves. (Chapter 13.) SPKI format can be passed to, at least, java.security.spec.X509EncodedKeySpec in Java, System.Security.Cryptography.ECDsa.ImportSubjectPublicKeyInfo in .NET, and crypto/x509.ParsePKIXPublicKey in Go.
(For more obscure public key schemes, the platform might not know how to convert the public key and so getPublicKey can return null. That doesn’t apply to any of the common formats discussed here but, if you’re using such a format, you’ll have to extract the public key from the authenticator data; see chapter 7.)
There are more options in WebAuthn than in the U2F protocol! But we have finally reached the core values that we expected to get when creating a credential: the credential ID and its public key. Think back to the introduction and recall that the other major operation is getting a signature, where we expect to provide some credential IDs (or not, with discoverable credentials) and a challenge value, and get back a signature and the message that was signed, which should contain the challenge and contextual information to prevent phishing.
Keep that big-picture view in mind as we dive into the second (and final) major operation in WebAuthn.
Getting signatures from credentials is also performed through the Credential Management API and looks like this:
const promise = navigator.credentials.get({publicKey: options});
That’s exactly the same as creating a credential except that, rather than create, the operation is get. Again, a dictionary of options is passed in:
dictionary PublicKeyCredentialRequestOptions {
// Core parameters for getting a signature.
required BufferSource challenge;
sequence<PublicKeyCredentialDescriptor> allowCredentials = [];
// The relying party ID.
USVString rpId;
// Whether to perform user verification.
DOMString userVerification = "preferred";
// Controlling the platform UI.
unsigned long timeout;
sequence<DOMString> hints = [];
// Extension features.
AuthenticationExtensionsClientInputs extensions;
};
The first option is hopefully familiar from the previous chapters. The challenge makes the signature specific to a request so that it can’t be used again. It should be a large, random value generated at the server and stored there, temporarily, to check against when the signature is received.
The allowCredentials field is the list of credential IDs or, if you’re using discoverable credentials, it can be empty. If you do wish to pass a list of IDs, they’re not passed directly but rather as a list of objects that look like this:
[{type: "public-key",
transports: ["hybrid", "internal"],
id: /* ArrayBuffer */,
}]
It’s structured just like excludeCredentials except for the additional transports list. That list of transports should be copied verbatim from the result of calling getTransports when creating a credential. (See above.)
The rpId is the relying party ID. It must match the value used when the credential was created. See chapter 5 about picking a relying party ID.
The userVerification field is either discouraged, preferred, or required. This has the same meaning as when the credential was created (see here). Note that requiring user verification when it wasn’t performed at registration time might not work: the credential might have been created on an authenticator that doesn’t support it. So this value will usually be the same as when the credential was created.
The timeout and hints parameters also have the same meaning as during creation, and the extensions get their own chapter. And with that, we’re done. Handling the response is more complex, however…
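Since the challenge and credential IDs must be passed as binary buffers, but usually arrive from the server inside JSON, some conversion is needed before calling get. The following is only a sketch: the shape of serverJson (with challenge, rpId, userVerification, and credentials members carrying base64url strings) is an assumption made for this example, as are the function names.

```javascript
// Decode a base64url string into a Uint8Array.
function fromBase64Url(text) {
  const base64 = text.replace(/-/g, "+").replace(/_/g, "/");
  const padded = base64 + "=".repeat((4 - (base64.length % 4)) % 4);
  return Uint8Array.from(atob(padded), (c) => c.charCodeAt(0));
}

// Turn a (hypothetical) JSON message from the server into the options
// dictionary that navigator.credentials.get expects.
function buildRequestOptions(serverJson) {
  return {
    challenge: fromBase64Url(serverJson.challenge),
    rpId: serverJson.rpId,
    userVerification: serverJson.userVerification ?? "preferred",
    // An empty list means that discoverable credentials are used.
    allowCredentials: (serverJson.credentials ?? []).map((cred) => ({
      type: "public-key",
      id: fromBase64Url(cred.id),
      // Copied verbatim from what getTransports returned at creation.
      transports: cred.transports,
    })),
  };
}
```

The result would then be passed as navigator.credentials.get({publicKey: buildRequestOptions(json)}).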
The result of the promise from get might, of course, be an error. There are no particular error cases that you need to handle, so all errors can be treated as generic failures.
Otherwise the response to a signature request is also a PublicKeyCredential object, but the type of the response field within it is different. Here’s a recap of the structure, which was discussed here.
interface PublicKeyCredential {
// The credential ID.
USVString id;
ArrayBuffer rawId;
// The type of authenticator used.
DOMString? authenticatorAttachment;
// More about the assertion.
AuthenticatorAssertionResponse response;
AuthenticationExtensionsClientOutputs getClientExtensionResults();
}
This structure contains the credential ID (twice, again) which lets you know which credential was used. If the allowCredentials list was empty, this tells you who the purported user was. But, even if it wasn’t, allowCredentials can contain multiple credentials and so this tells you which to validate against.
The rest of the new values are contained in the response field:
interface AuthenticatorAssertionResponse {
ArrayBuffer? userHandle;
ArrayBuffer signature;
ArrayBuffer authenticatorData;
// Part of the signed message.
ArrayBuffer clientDataJSON;
};
The name of this structure is the first time that we’ve come across the term assertion. Until now we’ve always called the thing generated from the private key a signature. The term assertion refers to the whole response, as defined here, and the rest of the text will only use the word signature when specifically referring to that field.
An assertion contains several fields. First there’s a field called userHandle. Above, it was mentioned that the user ID is sometimes called the user handle, and here it is! The value unfortunately has different names in the two places that it appears in WebAuthn, but it is the user.id value that was set at creation time.
Note that it’s optional. There was no user ID field in the U2F protocol when we covered it, so U2F security keys cannot possibly store this value. In fact, non-discoverable credentials aren’t required to store it, only discoverable ones. In either case, this field should only be used in very particular situations:
The credential ID is the best identifier for a credential, but it’s randomly generated by the authenticator. Some sites cannot support looking up an account based on an identifier like that and must generate the identifiers themselves. The user ID can serve that need since it’s specified by the site but, in order to ensure that it’s always returned, such a site must require discoverable credentials. (See also here.)
The next field is the signature. This must be validated by the public key signature scheme used by this credential. But, to do so, you need to construct the message that was signed. That’s where the authenticator data comes in.
The authenticator data is so named because it comes directly from the authenticator. Since it’s included in the signed message, it must be exposed directly by WebAuthn: any transformation would change the bits and cause the signature verification to fail. So here you have to deal with a binary format; no more Javascript objects.
You might notice that looks very similar to the signed message format in the U2F protocol. That is no accident! Since this data cannot be modified, in order to be backwards compatible the format has only been added to since U2F. (Otherwise U2F security keys wouldn’t be interoperable with WebAuthn.)
The flags are named with two-character abbreviations, as shown in the diagram above:
UP: User Presence. (Discussed previously.)
UV: User Verification. (Discussed previously.)
BE: Backup Eligible. Indicates that the credential can be backed up (i.e. is a passkey).
BS: Backup State. Indicates that the credential has been backed up. In practice, passkey providers just set both BE and BS flags at all times. In theory, if syncing was paused, then you could see BE set without BS, indicating that syncing was pending.
AT: Attested credential data. Indicates that an “attested credential data” structure follows. This will always be set when creating a credential and never set when getting an assertion. (See chapter 7.)
ED: Extension Data. Indicates that a CBOR map of extension data follows. If both AT and ED are set, then the extension data always comes second.
We are finally at the point where we can validate the signature! Since WebAuthn is backwards compatible with U2F security keys, the signed message must be compatible with what U2F did, so it’s the concatenation of the authenticator data with the hash of the clientDataJSON.
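To make the binary format a little more concrete, here is a sketch of pulling apart the fixed-size start of the authenticator data and of building the signed message. It assumes the layout from the WebAuthn specification: a 32-byte SHA-256 hash of the RP ID, one flags byte, then a 4-byte big-endian signature counter. The function names are invented for this example, and the SHA-256 hash of clientDataJSON is assumed to have been computed elsewhere.

```javascript
// Bit masks for the flags byte, using the abbreviations from the text.
const FLAGS = { UP: 0x01, UV: 0x04, BE: 0x08, BS: 0x10, AT: 0x40, ED: 0x80 };

// Parse the fixed 37-byte prefix of the authenticator data. Anything
// after that (attested credential data, extensions) is ignored here.
function parseAuthenticatorData(authData) {
  const bytes = new Uint8Array(authData);
  if (bytes.length < 37) {
    throw new Error("authenticator data is too short");
  }
  const flags = {};
  for (const [name, mask] of Object.entries(FLAGS)) {
    flags[name] = (bytes[32] & mask) !== 0;
  }
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return {
    rpIdHash: bytes.slice(0, 32),
    flags,
    signCount: view.getUint32(33, false), // big-endian
  };
}

// The signed message is the authenticator data followed by the hash of
// the client data JSON, with no separator or length prefix.
function signedMessage(authData, clientDataHash) {
  const a = new Uint8Array(authData);
  const h = new Uint8Array(clientDataHash);
  const message = new Uint8Array(a.length + h.length);
  message.set(a, 0);
  message.set(h, a.length);
  return message;
}
```

The output of signedMessage, the signature, and the stored public key are what get handed to the signature verification function of whichever crypto library the server uses.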
There are many checks that the server should make on the signed response in order to be secure. These are covered in chapter 12. But congratulations, you now understand the core of WebAuthn.
We saw an example of the client data in the U2F chapter. It appears as the clientDataJSON in the assertion response. Let’s look at it again and discuss each value in more detail:
{"type": "webauthn.get",
"challenge": "AAECAwQFBgcICQoLDA0ODw",
"origin": "https://example.com",
"crossOrigin": false
}
type
It’s good security hygiene to ensure that any signed message is unambiguous about what it is. Many security issues have been caused by inducing two parties in a system to have divergent views about what is going on. So the client data uses the type field to specify the context in which it should be interpreted.
challenge
This field is the base64url encoding of the challenge specified in the request. As discussed in the introduction, this ensures that the signature is specific to an authentication request and doesn’t devolve into what is effectively a password for the account that could be reused later.
origin
The origin field specifies the entity that requested the signature, here a URL. This prevents phishing by ensuring that the signature is specific to the requester and cannot be proxied by a malicious site. But the requester is not always a website. Mobile apps can also make WebAuthn requests using platform APIs covered in chapter 11.
For Android apps this field will contain android:apk-key-hash: followed by the base64url encoding of the SHA-256 hash of the APK signing certificate. For iOS/iPadOS, the origin will contain the requested RP ID with https:// prefixed. This is a mistake! It’s probably too late to change it now but, because of this, it’s not possible to distinguish between requests from apps and requests from websites with the Apple APIs.
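A server that expects requests from an Android app needs to know the exact origin string to compare against. As a sketch, the following converts a certificate fingerprint written as colon-separated hex (the form printed by tools such as keytool) into the origin described above. The function name is made up for this example.

```javascript
// Build the expected client data origin for an Android app from the
// SHA-256 fingerprint of its signing certificate, e.g. "ab:cd:12:...".
function androidOriginFromFingerprint(fingerprint) {
  const hex = fingerprint.replace(/:/g, "");
  if (!/^[0-9a-fA-F]{64}$/.test(hex)) {
    throw new Error("expected a SHA-256 fingerprint (32 bytes of hex)");
  }
  let binary = "";
  for (let i = 0; i < hex.length; i += 2) {
    binary += String.fromCharCode(parseInt(hex.slice(i, i + 2), 16));
  }
  // base64url: "-" and "_" instead of "+" and "/", and no padding.
  const encoded = btoa(binary)
    .replace(/\+/g, "-")
    .replace(/\//g, "_")
    .replace(/=+$/, "");
  return "android:apk-key-hash:" + encoded;
}
```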
crossOrigin
This specifies whether the request came from an iframe that is not same-origin with all its parent frames. This is obviously specific to the web, and most sign-in pages will use the frame-ancestors directive in a Content-Security-Policy header to prevent ever being shown in an iframe. But, if you need to make WebAuthn requests from an iframe, see here.
topOrigin
If crossOrigin is true, this contains the origin of the top-level frame, so that the server can see where the iframe was embedded.
It’s important to note that these fields are not exclusive: more fields have been added to WebAuthn over time. So a server-side validator must be able to handle unknown fields in the JSON. But some validators do not want to have the complexity of a full JSON parser. For them, WebAuthn does guarantee some additional structure:
The fields type, challenge, origin, and crossOrigin are guaranteed to appear and guaranteed to appear in that order without any spaces or newlines between the JSON tokens. If crossOrigin is true then the next field is guaranteed to be topOrigin, again without any spaces or newlines between tokens. All strings in this prefix are guaranteed to be minimally escaped.
Any implementation taking advantage of this should carefully follow these steps.
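For servers that are happy to use a full JSON parser, the basic checks on the client data could be sketched as below. This is not the complete list of checks from chapter 12, and the shape of the expected argument (type, challenge, and a list of acceptable origins) is an assumption made for this example.

```javascript
// Check the client data from an assertion against what the server
// expects. Unknown fields are ignored, since more may be added to
// WebAuthn over time.
function checkClientData(clientDataJSON, expected) {
  const text = new TextDecoder("utf-8", { fatal: true }).decode(clientDataJSON);
  const clientData = JSON.parse(text);
  if (clientData.type !== expected.type) {
    throw new Error("unexpected type");
  }
  // The challenge is compared in its base64url form.
  if (clientData.challenge !== expected.challenge) {
    throw new Error("challenge mismatch");
  }
  if (!expected.origins.includes(clientData.origin)) {
    throw new Error("unexpected origin");
  }
  // This sketch assumes a site that never signs in from an iframe.
  if (clientData.crossOrigin === true) {
    throw new Error("cross-origin request not expected");
  }
  return clientData;
}
```

For an assertion, expected.type would be webauthn.get and expected.challenge would be the value that the server stored when it issued the request.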
A site using discoverable credentials:
let promise = navigator.credentials.get({
  publicKey: {
    challenge: new Uint8Array([
      // Must be a cryptographically-random number sent
      // by the server
      0x79, 0x50, 0x68, 0x71, 0xDA, 0xEE, 0xEE, 0xB9,
      0x94, 0xC3, 0xC2, 0x15, 0x67, 0x65, 0x26, 0x22,
      0xE3, 0xF3, 0xAB, 0x3B, 0x78, 0x2E, 0xD5, 0x6F,
      0x81, 0x26, 0xE2, 0xA6, 0x01, 0x7D, 0x74, 0x50,
    ]).buffer,
  },
});
An enterprise site, requesting that an employee authenticate with their issued security key:
let promise = navigator.credentials.get({
  publicKey: {
    allowCredentials: [{
      type: "public-key",
      id: new Uint8Array([
        0x94, 0x38, 0x2b, 0x37, 0xbf, 0x38, 0xc0, 0x05,
        0x9a, 0xbd, 0x16, 0x09, 0xdd, 0xf5, 0xd7, 0x0c,
      ]).buffer,
      transports: ["usb"],
    }],
    challenge: new Uint8Array([
      // Must be a cryptographically-random number sent
      // by the server
      0x79, 0x50, 0x68, 0x71, 0xDA, 0xEE, 0xEE, 0xB9,
      0x94, 0xC3, 0xC2, 0x15, 0x67, 0x65, 0x26, 0x22,
      0xE3, 0xF3, 0xAB, 0x3B, 0x78, 0x2E, 0xD5, 0x6F,
      0x81, 0x26, 0xE2, 0xA6, 0x01, 0x7D, 0x74, 0x50,
    ]).buffer,
  },
});
Keep in mind that WebAuthn is only about authentication. If you authenticate yourself on a machine that is controlled by malware, then the malware has all the same authority that you do. At best, some sites require frequent reauthentication with WebAuthn so that the malware can’t exfiltrate long-lived cookies and has to remain active on the compromised machine. That increases the attacker’s costs, but is no silver bullet.
In the introduction we mentioned that Javascript injected into a site could steal passwords. WebAuthn certainly stops passwords from being stolen by removing the need to enter them. But, similar to the malware case, if the malicious Javascript controls the origin context in the browser, it can make authenticated HTTP requests just as if it were the user. It’s certainly a lot less convenient for the attacker than exfiltrating a password, but an attack is still possible.
Lastly, many sites add WebAuthn as an authentication method alongside a password. But, as long as the password is still a valid way to sign in, a phishing site can try to get the user to enter it by pretending that WebAuthn isn’t working. The ultimate goal must be to remove passwords from accounts. The “backup state” flag in the authenticator data reports whether a credential has been backed up. If an account has a backed-up credential, and the user has a history of successfully using it, consider prompting them to disable their password.
It’s still the case that passwords cause a lot of problems and we should try to address them. WebAuthn is by far the best prospect in that direction and the world would be much better off if it succeeds. But magical thinking only leads to disappointment, and WebAuthn isn’t magic.
As discussed in the previous chapter, relying party IDs (RP IDs) identify sites and prevent the use of credentials between them.
This happens to result in some degree of phishing protection, but this mechanism is not designed to prevent phishing. Phishing is stopped by the origin field in the client data, which goes into the signed message. RP IDs, on the other hand, are:
RP IDs are “domain shaped” rather than “URL shaped”. E.g. example.com is an RP ID, but https://example.com is not.
(The U2F Javascript API, which was mentioned in chapter 2, did use URL-shaped values as RP IDs, which it called AppIDs. That has an impact on WebAuthn in the form of the appid and appidExclude extensions, which are covered here. But otherwise AppIDs are irrelevant now.)
Every operation has an RP ID associated with it, whether it’s using the WebAuthn API or one of the platform APIs. While an API may pick a default RP ID if one isn’t specified, it’s always possible to request a specific RP ID, and so every API defines a way of validating whether a website or app is allowed to request a specific RP ID.
For WebAuthn, the default RP ID is the domain of the current origin. So for https://example.com the default RP ID is example.com. WebAuthn allows an origin to claim any RP ID that can be formed by discarding zero or more “labels” from the left of its domain name until it hits an effective TLD. A label in a domain name is a substring delimited by periods. So for www.example.com, the labels are www, example, and com. (Plus the empty label at the end to represent the root, but we’ll ignore that.)
An effective TLD is an effective top-level domain. That is either a top-level domain, like com, or a domain that acts like a top-level domain and is listed in the Public Suffix List, like co.uk.
So say that the current origin is https://www.example.co.uk: it can specify an RP ID of www.example.co.uk (discarding zero labels) or example.co.uk (discarding one label), but not co.uk because that’s an effective TLD. It also couldn’t specify an RP ID of example.com because that’s a different site: that domain cannot be formed simply by discarding labels from the left of www.example.co.uk.
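The label-discarding rule can be sketched in a few lines. Note the assumptions: the EFFECTIVE_TLDS set below is a tiny, hand-picked stand-in for the real Public Suffix List (which also has wildcard and exception rules that are ignored here), and the function name is invented for this example.

```javascript
// A hypothetical, minimal stand-in for the Public Suffix List.
const EFFECTIVE_TLDS = new Set(["com", "org", "uk", "co.uk"]);

// List the RP IDs that an origin with the given hostname may claim:
// discard labels from the left until an effective TLD is reached.
function allowedRpIds(hostname, effectiveTlds = EFFECTIVE_TLDS) {
  const labels = hostname.toLowerCase().split(".");
  const rpIds = [];
  for (let i = 0; i < labels.length; i++) {
    const candidate = labels.slice(i).join(".");
    // A bare final label of a multi-label name is treated as a TLD
    // even if it's missing from the set.
    const isBareTld = i === labels.length - 1 && labels.length > 1;
    if (effectiveTlds.has(candidate) || isBareTld) {
      break;
    }
    rpIds.push(candidate);
  }
  return rpIds;
}

console.log(allowedRpIds("www.example.co.uk"));
// prints [ 'www.example.co.uk', 'example.co.uk' ]
```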
Note: There is no relationship between similar-looking RP IDs beyond WebAuthn’s rules for which origins can claim which RP IDs. So www.example.co.uk is as different to example.co.uk as to example.org. Recall that RP IDs are hashed before being sent to U2F security keys, enforcing that all internal structure is irrelevant.
Android checks whether a given app is allowed to use an RP ID by treating the RP ID as a domain name and fetching https://example.com/.well-known/assetlinks.json. This needs to be a JSON file in the Digital Asset Links format, which broadly attempts to describe the relations between different entities.
Note: In fact, it’s incorrect to say that “Android” checks this because the check is actually performed by the passkey provider handling the request. Google Password Manager, which ships as part of Play Services, uses assetlinks.json and has defined the norm on that platform, but, technically, other password managers could decide to do something different.
The assetlinks.json file needs to contain JSON similar to the following:
[
{
"relation" : [
"delegate_permission/common.handle_all_urls",
"delegate_permission/common.get_login_creds"
],
"target" : {
"namespace" : "android_app",
"package_name" : "PACKAGE_NAME",
"sha256_cert_fingerprints" : [
"APK_CERTIFICATE_FINGERPRINT"
]
}
}
]
The PACKAGE_NAME placeholder should be replaced with the name of the Android package, e.g. com.example.myapp. The APK_CERTIFICATE_FINGERPRINT should be replaced with a hex-with-colons encoded, SHA-256 hash of the APK signing certificate. You can get this value from an APK with:
keytool -list -printcert -jarfile app.apk
Or you can run ./gradlew signingReport in your build directory. Either way, the fingerprint value must separate each pair of hex digits with a colon (like ab:cd:12:34:...).
This JSON object should be replicated for each app that needs access to the RP ID. When doing so, consider that this is declaring a significant level of trust for each listed app. Not only can they use the RP ID, but saved passwords may be shared between these apps and the website too.
Currently, the only relation needed is delegate_permission/common.handle_all_urls, but Google hopes to transition to the more specific delegate_permission/common.get_login_creds relation in the future and currently documents that apps should list both.
The assetlinks.json file is currently not allowed to be behind a redirect. So, if https://example.com just serves redirects to https://www.example.com then that won’t work.
To debug issues, run:
curl 'https://digitalassetlinks.googleapis.com/v1/assetlinks:check?source.web.site=https://example.com&relation=delegate_permission/common.handle_all_urls&target.android_app.package_name=PACKAGE_NAME&target.android_app.certificate.sha256_fingerprint=APK_CERTIFICATE_FINGERPRINT'
Remember to replace PACKAGE_NAME and APK_CERTIFICATE_FINGERPRINT with the same values as in the JSON above.
To see an example of an assetlinks.json file, try fetching it for a commonly-known site. E.g. https://amazon.com/.well-known/assetlinks.json.
The Apple platforms use a similar system of .well-known files in order to decide which apps are allowed to use a given RP ID. For Apple devices, the file is https://example.com/.well-known/apple-app-site-association and it should include something like this:
{
"webcredentials": {
"apps": [ "T7AYYU7S6A.example.com.YourApp" ]
}
}
The first label of the app identifier (T7AYYU7S6A in the example) is your “Team ID”. You can find this under “Membership details” on https://developer.apple.com/account. The rest is the “bundle identifier” for your app, which is set when the project is created.
Requests for this file will come from Apple’s servers by default but, for testing, it’s possible to enable “Associated Domains Development” in the Developer menu on iOS. (But only, it seems, on devices. The option doesn’t appear in the simulator.) If you want to see what Apple’s servers see for your domain, fetch https://app-site-association.cdn-apple.com/a/v1/example.com.
An app also needs to list webcredentials:example.com as a domain in the “Associated Domains” capability.
If you change your site association file, delete the app and reinstall to ensure that any caches are updated on the device.
Browsers have to be special. Consider what would happen if you browsed to https://example.com on a phone and tried to use a passkey to sign in. Using just the rules specified above, example.com would have to authorize every possible browser app to use its RP ID.
This would be untenable, so browsers have to be trusted to act as any RP ID. There are processes run by both Apple and Google to recognize trusted browser apps for this purpose.
Note: As with assetlinks.json, the set of recognized browsers on Android is just a norm established by Google Password Manager, which publishes its list of trusted browsers at https://gstatic.com/gpm-passkeys-privileged-apps/apps.json. Other password managers on Android tend to follow this norm.
It’s important to carefully consider RP IDs from the outset. Let’s take the example of https://www.example.co.uk. That site might happily be creating credentials with its default RP ID (www.example.co.uk) but later decide that it wants to move all sign-in activity to an isolated origin, https://accounts.example.co.uk. But none of its credentials could be used from that origin. It could discard a label from the left to form example.co.uk, but the rules don’t allow any labels to be prepended, so www.example.co.uk isn’t a valid RP ID for that origin. The site would have needed to create credentials with an RP ID of example.co.uk from the outset.
But the rule is not to always use the most general RP ID possible. Going back to our example, if https://usercontent.example.co.uk existed to host uploaded content, then pages on that origin could create credentials with an RP ID of example.co.uk. We can assume that accounts.example.co.uk is checking the origin of any assertions, so usercontent.example.co.uk can’t use its ability to set an RP ID of example.co.uk to generate valid signatures, but it can still try to get the user to create new credentials which could overwrite the legitimate ones. It can also get any secrets associated with the credentials because it can assert them. (See here.)
All this means that the choice of RP ID needs to be considered carefully at the beginning of any deployment.
As described above, you have a lot more freedom with the native APIs than you do on the web. With the native APIs, you can nominate many apps to be able to use your RP ID, but the rules on the web don’t allow any other website, even with permission, to use another site’s RP ID.
Not all websites fit into that structure. Some are spread across country code top-level domains and exist as example.com, example.de, example.in, etc. But with the rules above, none of those country-specific instances of the site could share credentials. Sometimes two different brands are intimately linked, but have separate websites, like Hilton and DoubleTree.
Probably the best option in this case is to use a single origin to handle all sign-ins, and use a federation protocol like OpenID Connect on the related sites. But not all websites can do that, thus the RP ID rules for the web were relaxed somewhat with the introduction of related origins. When a WebAuthn request specifies an RP ID that would not be permitted under the rules above, browsers that implement related origins will attempt to fetch a document hosted at the following URL.
https://example.com/.well-known/webauthn
If it exists, and has the MIME type application/json, then it is parsed as JSON and can contain something like the following.
{
"origins": [
"https://example.com",
"https://example.co.uk",
"https://example.in",
"https://www.example.in",
"https://otherbrand.com"
]
}
If the origin that made the WebAuthn request is listed as one of the permitted origins in that document, then the request will be allowed to continue. However, there are limits on how many origins can be listed like that:
For each listed origin, its eTLD+1 label is extracted from its domain name. The eTLD+1 label is the rightmost label that is not part of the effective TLD. For the example origins above:
Origin | eTLD+1 label |
---|---|
https://example.com | example |
https://example.co.uk | example |
https://example.in | example |
https://www.example.in | example |
https://otherbrand.com | otherbrand |
The maximum number of distinct eTLD+1 labels currently permitted by browsers is five. So for sites that are spread across many country-code top-level domains, all of those domains only count as one label. But if you have many different brands, then you could quickly hit the limit.
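The label counting can be sketched as follows. As before, the EFFECTIVE_TLDS set is a small, hypothetical stand-in for the Public Suffix List, and the function names are invented for this example. It illustrates the rule described above rather than any browser’s actual implementation.

```javascript
// A hypothetical, minimal stand-in for the Public Suffix List.
const EFFECTIVE_TLDS = new Set(["com", "in", "uk", "co.uk"]);

// Find the rightmost label that is not part of the effective TLD.
function eTldPlusOneLabel(origin, effectiveTlds = EFFECTIVE_TLDS) {
  const labels = new URL(origin).hostname.split(".");
  // Look for the longest suffix that is an effective TLD; the label
  // immediately to its left is the one we want.
  for (let i = 1; i < labels.length; i++) {
    if (effectiveTlds.has(labels.slice(i).join("."))) {
      return labels[i - 1];
    }
  }
  throw new Error("no effective TLD found for " + origin);
}

// Collect the distinct labels across a list of origins.
function distinctLabels(origins) {
  return new Set(origins.map((origin) => eTldPlusOneLabel(origin)));
}

const labels = distinctLabels([
  "https://example.com",
  "https://example.co.uk",
  "https://example.in",
  "https://www.example.in",
  "https://otherbrand.com",
]);
console.log(labels.size); // prints 2
```

So the five origins from the example document only use two of the permitted labels.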
Chapter 2 covered the protocol for communicating with U2F security keys, but that protocol doesn’t support discoverable credentials, user verification, or many other features of WebAuthn. So a new protocol for communicating with security keys was needed: CTAP2. (Computer To Authenticator Protocol Two; the U2F protocol can be considered to be CTAP1.)
CTAP2 is a significantly more complex protocol which involves the computer and security key exchanging messages encoded in a format called CBOR. This use of CBOR breaks through into WebAuthn in several places so, even if you’re going to skip most of this chapter, a familiarity with CBOR will be valuable.
CBOR officially stands for “Concise Binary Object Representation”, but the lead author’s name is C. Bormann, which might have had more to do with it. It is described in RFC 8949 and is a MsgPack-inspired format in the family of “binary JSONs”.
CTAP2 (and thus WebAuthn) only uses a subset of CBOR, which will be described here. CBOR used in CTAP2 and WebAuthn must conform to this subset, so beware of using generic CBOR libraries for encoding as they may not stay within it.
Each CBOR value (called a data item) starts with a leading byte. The most-significant three bits specify the major type from zero to seven. The value of the five least-significant bits specifies how to calculate the argument:
The argument is then interpreted depending on the value of the major type:
Major type 7: the argument must be 20 to represent a false value or 21 to represent true. All other values are invalid in CTAP2.
There are some additional rules applied to ensure that the CBOR is canonically encoded, i.e. that a given CBOR message has exactly one valid encoding:
This canonical form is the same as the Core Deterministic Encoding Requirements from the CBOR RFC, since only integers and strings appear as map keys in CTAP2, but it is not the same as the “Canonical CBOR” section of the older RFC 7049.
Note: Since security keys are either impossible or very difficult to update, and often under significant code-size pressure, using canonical encodings minimizes interoperability problems.
In the underlying CBOR data model, integers are signed, 65-bit values. However, since support for 65-bit integers is rare in programming languages, treating integers as signed 64-bit values is always sufficient in CTAP2.
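To illustrate the leading byte and argument, and why negative integers push the range to 65 bits, here is a sketch of decoding the head of a data item. It follows RFC 8949: a five-bit value below 24 is the argument itself, and 24 to 27 mean that the argument is in the following 1, 2, 4, or 8 bytes, big-endian. The function names are invented for this example, it doesn’t enforce the canonical-encoding rules, and it uses BigInt so that 64-bit arguments are exact.

```javascript
// Decode the leading byte and argument of the CBOR data item that
// starts at `offset`.
function decodeHead(bytes, offset = 0) {
  if (offset >= bytes.length) {
    throw new Error("truncated CBOR");
  }
  const lead = bytes[offset];
  const majorType = lead >> 5; // most-significant three bits
  const info = lead & 0x1f; // least-significant five bits
  if (info < 24) {
    return { majorType, argument: BigInt(info), headLength: 1 };
  }
  if (info > 27) {
    throw new Error("reserved or indefinite-length encoding");
  }
  const size = 1 << (info - 24); // 1, 2, 4, or 8 bytes
  if (offset + 1 + size > bytes.length) {
    throw new Error("truncated CBOR");
  }
  let argument = 0n;
  for (let i = 0; i < size; i++) {
    argument = (argument << 8n) | BigInt(bytes[offset + 1 + i]);
  }
  return { majorType, argument, headLength: 1 + size };
}

// Major type 0 is a non-negative integer and major type 1 is a
// negative integer with value -1 - argument.
function decodeInteger(bytes, offset = 0) {
  const { majorType, argument } = decodeHead(bytes, offset);
  if (majorType === 0) return argument;
  if (majorType === 1) return -1n - argument;
  throw new Error("not an integer");
}
```

For example, the single byte 0x26 decodes to -7, and a negative integer with a 64-bit argument of all ones decodes to -2^64, which doesn’t fit in a signed 64-bit value.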
One point about UTF-8 strings is worth keeping in mind. Security keys are embedded devices with limited storage and CPU. WebAuthn allows sites to specify arbitrary strings for things like user names, which security keys have to store (at least for discoverable credentials). Since storage is limited, security keys are allowed to truncate overly-long strings. However, when they do that truncation, they often do so after a fixed number of bytes, which may fall in the middle of a multi-byte UTF-8 sequence. Thus, when that string is returned in response to a future operation, the UTF-8 will be invalid. Platforms have to make accommodations for this because that makes the CBOR from security keys technically invalid and standard CBOR parsers will reject it.
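A small sketch shows the problem. Truncating after a fixed number of bytes can cut a multi-byte sequence in half, while backing up to a character boundary first keeps the result valid. The function names are made up for this example, and it says nothing about what any particular security key actually does.

```javascript
// Truncate after a fixed number of bytes, as described in the text.
function truncateNaively(text, maxBytes) {
  return new TextEncoder().encode(text).slice(0, maxBytes);
}

// Truncate without splitting a multi-byte UTF-8 sequence.
function truncateAtCharacterBoundary(text, maxBytes) {
  const bytes = new TextEncoder().encode(text);
  if (bytes.length <= maxBytes) {
    return bytes;
  }
  let end = maxBytes;
  // Continuation bytes look like 0b10xxxxxx. If the first dropped byte
  // is one of them, back up to the start of that sequence.
  while (end > 0 && (bytes[end] & 0xc0) === 0x80) {
    end--;
  }
  return bytes.slice(0, end);
}

// A strict decoder rejects invalid UTF-8 rather than repairing it.
function isValidUtf8(bytes) {
  try {
    new TextDecoder("utf-8", { fatal: true }).decode(bytes);
    return true;
  } catch {
    return false;
  }
}

const name = "Zo\u00eb"; // the final character takes two bytes
console.log(isValidUtf8(truncateNaively(name, 3))); // prints false
console.log(isValidUtf8(truncateAtCharacterBoundary(name, 3))); // prints true
```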
Commands in CTAP2 consist of a single byte to identify the command, followed by a CBOR map with all the parameters. Responses consist of a single-byte response code which is either non-zero, representing an error, or zero for success. Successful responses are optionally followed by a CBOR map with details of the response.
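That framing is simple enough to sketch directly. The function names below are invented for this example, the CBOR parameters are assumed to have been encoded elsewhere, and the transport-level framing (e.g. over USB) is out of scope.

```javascript
// A command is one command byte followed by an optional CBOR map.
function encodeCommand(commandByte, cborParameters = new Uint8Array(0)) {
  const out = new Uint8Array(1 + cborParameters.length);
  out[0] = commandByte;
  out.set(cborParameters, 1);
  return out;
}

// A response is one status byte (zero for success) optionally followed
// by a CBOR map.
function splitResponse(response) {
  if (response.length === 0) {
    throw new Error("empty response");
  }
  return {
    status: response[0],
    ok: response[0] === 0,
    cbor: response.slice(1),
  };
}

// Command number four takes no parameters, so it's a single byte.
const getInfo = encodeCommand(0x04);
console.log(getInfo.length); // prints 1
```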
Let’s look at the first command sent to a CTAP2 security key: authenticatorGetInfo. It’s command number four, and it doesn’t have any parameters, so the whole command is a single byte with value 0x04. The response will consist of a zero byte (to indicate success) followed by a CBOR map. Let’s look at the contents of that map (translated into CBOR’s diagnostic notation) to get a feel for the protocol.
(There’s a lot here, you don’t need to understand everything. It’s just to give a sketch of all that’s contained in CTAP2.)
{
# Supported protocol revisions
1: ["FIDO_2_0", "FIDO_2_1_PRE", "FIDO_2_1"],
# Supported extensions.
2: [
"credProtect",
"hmac-secret",
"largeBlobKey",
"credBlob",
"minPinLength"
],
# The AAGUID of the security key. This identifies the make & model
# of security key.
3: h'D8522D9F575B486688A9BA99FA02F35B',
4: {
# Supports discoverable credentials.
"rk": true,
# Supports user presence tests.
"up": true,
# Supports user verification, but it's not configured.
"uv": false,
# Is not a platform authenticator.
"plat": false,
# Not actually an official option!
"uvToken": true,
# The security key will do user verification for all operations.
"alwaysUv": true,
# Supports listing and deleting discoverable credentials.
"credMgmt": true,
# Supports the `authenticatorConfig` CTAP2 command.
"authnrCfg": true,
# Has a biometric sensor, but it's not configured.
"bioEnroll": false,
# Supports PIN entry on the computer.
"clientPin": true,
# Supports the largeBlob extension.
"largeBlobs": true,
# Supports tokenized user verification.
"pinUvAuthToken": true,
# Supports configuring the minimum PIN length.
"setMinPINLength": true,
# Doesn't support creating credentials without user verification.
"makeCredUvNotRqd": false,
# Whether the authenticator supports older versions of a couple
# of commands.
"credentialMgmtPreview": true,
"userVerificationMgmtPreview": false
},
# Maximum command size, in bytes.
5: 1200,
# User verification token protocol versions, in preference order.
6: [2, 1],
# Maximum number of credential IDs that can be included in a
# single command
7: 8,
# The maximum length of a credential ID from this security key.
8: 128,
# The list of supported transports.
9: ["usb"],
# The list of supported signature schemes: ECDSA with P-256
# and Ed25519.
10: [
{"alg": -7, "type": "public-key"},
{"alg": -8, "type": "public-key"}
],
# Maximum largeBlob array.
11: 1024,
# The current PIN doesn't have to be changed immediately.
12: false,
# The minimum PIN length.
13: 4,
# Firmware version.
14: 328966,
# Maximum length of a credBlob value.
15: 32,
# Number of RP IDs that can be configured to receive the minimum
# PIN length.
16: 1,
# Number of biometric attempts that can fail before falling back
# to using a PIN.
17: 3,
# The type of biometric sensor used: a fingerprint reader
18: 2,
# How many more discoverable credentials can the security key store.
20: 25
}
A few patterns of the protocol are demonstrated here: the top-level
keys in the CBOR maps are integers for compactness, but strings can be
used to identify things too. Sometimes WebAuthn Javascript structures
are transliterated into CBOR: key 10 clearly mirrors the
pubKeyCredParams
structure which should be familiar from
chapter 4. Lastly, it’s obviously a lot more complex than U2F!
We won’t be covering every corner of CTAP2 in this book; instead we’ll be focusing on some of the higher-level concepts. The FIDO Alliance publishes the CTAP2 specification if you want all the details.
If you want to see the authenticatorGetInfo response for a given CTAP2 security key, do an operation with it in Chrome on macOS or Linux and then open chrome://device-log. It’ll be logged there along with other details of the request.
User verification is one of the headline features of CTAP2. But performing a single WebAuthn operation can require many CTAP2 commands, in the same way that it required many U2F messages. So the platform has to ask the security key to verify the user, and the security key returns a secret value to the platform to represent that verification, which the platform uses as an HMAC key to authenticate all the commands to which that verification applies. The platform is then trusted to discard the token (although there are limits, as we’ll see).
There are broadly two forms of user verification supported in CTAP2. Either the user verification is built into the security key itself (with a fingerprint reader or a PIN pad), or user verification is done by entering a PIN on the computer and sending it to the security key to be checked. The CTAP2 spec refers to the computer as the “client” and thus the latter pattern is called client PIN.
The former is more secure because it eliminates the risk of the PIN being captured while it is entered on the computer. But a client PIN allows for security keys to be simpler and cheaper.
CTAP2 does not expose PINs and user verification tokens as plaintext in the protocol. While that’s not a huge concern for USB-connected security keys, security keys can also work via NFC. So the computer and security key perform an elliptic-curve Diffie–Hellman (ECDH) key agreement whenever UV is used in order to mutually calculate a secret key to encrypt any PINs and tokens with.
The PIN protocol (which, despite the name, is also used for built-in user verification methods to protect the resulting token) specifies the key agreement, encryption, and authentication primitives to use. There are only two defined by CTAP2 and the second is just a tweak of the first to make it easier to certify under NIST’s FIPS 140 programs. The second version is now required to be implemented by security keys and so will slowly replace the first. That’s what will be described here.
In the authenticatorGetInfo
response above, both
clientPin
and pinUvAuthToken
are true, which
means that the security key has a PIN set and supports getting a token
to represent a user verification. To get this token, the platform would
do the following:
1. Send an authenticatorClientPIN command to get an ephemeral ECDH value from the security key.
2. Send another authenticatorClientPIN command with its own ECDH value and 16 bytes of the hashed PIN, encrypted with the mutual shared secret from the ECDH calculation.
The security key decrypts the PIN hash and compares it against the correct value. The security key can enforce a maximum number of attempts but, if it’s satisfied, it encrypts a random value, called a PIN/UV Authentication Token (PUAT), and returns it to the platform.
Now that the platform has the PUAT, it can use it as an HMAC key to authenticate future commands. When it requested the PUAT, the platform had to specify what type of commands it was planning to use it for, and the relying party ID for all those commands. Thus, even if the PUAT leaks, its scope is limited.
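As a rough sketch of two pieces of this exchange, here is the truncated PIN hash and the HMAC that authenticates a command, using only Python's standard library. It assumes, as in the CTAP2 specification, that the PIN is hashed with SHA-256 and that HMAC-SHA-256 is the authentication primitive. The ECDH agreement and the encryption are left out, the function names are illustrative, and the PUAT and message are made-up values.

```python
import hashlib
import hmac
import os


def pin_hash_prefix(pin: str) -> bytes:
    """First 16 bytes of the SHA-256 hash of the UTF-8 encoded PIN."""
    return hashlib.sha256(pin.encode("utf-8")).digest()[:16]


def authenticate_command(puat: bytes, message: bytes) -> bytes:
    """HMAC-SHA-256 tag over a command-specific message, keyed by the PUAT."""
    return hmac.new(puat, message, hashlib.sha256).digest()


def check_command(puat: bytes, message: bytes, tag: bytes) -> bool:
    """The security key's side: recompute the tag and compare in constant time."""
    return hmac.compare_digest(authenticate_command(puat, message), tag)


# Illustrative values only: a real PUAT is chosen by the security key.
puat = os.urandom(32)
message = hashlib.sha256(b"example client data").digest()
tag = authenticate_command(puat, message)
```

A platform would attach the tag to the command it is sending; the security key, which knows the PUAT that it issued, runs the equivalent of check_command before acting on it.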
That doesn’t make a lot of difference if UV is implemented with client PIN because a misbehaving platform could just save the PIN itself, and obtain as many PUATs as it needed. But when user verification is built into the security key, it does limit the abilities of misbehaving platforms a little.
Getting a PUAT from a security key that has a built-in method of user verification, like a fingerprint reader, looks very similar, but there’s no encrypted PIN in the request. In contrast with U2F, CTAP2 security keys don’t have to immediately respond to commands and so can take their time, flashing and waiting for the user to present their fingerprint.
However, security keys with fingerprint readers will often have a client PIN too because fingerprint readers can be temperamental.
Having covered these operations in detail in chapter 4, there is not much more that needs to be said about them here, because the CTAP2 commands for these operations simply transliterate most of the WebAuthn structures into CBOR.
The complexity of implementing these operations in CTAP2 comes from the fact that the size of the commands can exceed the message buffer of the security key. If you look at the example authenticatorGetInfo result above, key 5 specifies the maximum number of bytes in a command, but the excludeCredentials list in a create() operation, or the allowCredentials list in a get(), can be arbitrarily long. While it’s unusual, it’s valid to have 50 credential IDs, each of which is 100 bytes. That would exceed the message buffer of nearly any security key.
The authenticatorGetInfo result specifies the maximum length of a credential ID that the security key will generate (in key 8). So any credential IDs longer than this can be discarded by the platform when interacting with this security key, potentially reducing the problem, but the core issue remains.
So, after doing that filtering, the platform needs to split up the
list of credential IDs into batches such that no batch causes a command
to be too long. It will then probe for credential IDs with these
batches, very much like the U2F protocol, until it finds a batch where
one of the credential IDs was recognized by the security key. Each of
these probe messages sets the up
(user presence) flag to
false, so that no physical interaction with the security key is
needed.
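The filtering and batching just described might look something like the following Python sketch. The three limits correspond to keys 8, 7, and 5 of the authenticatorGetInfo response. The two overhead figures are assumptions made for illustration: a real platform would measure the size of the encoded command rather than estimate it.

```python
from typing import List


def batch_credential_ids(
    credential_ids: List[bytes],
    max_id_length: int,          # key 8 of the getInfo response
    max_ids_per_list: int,       # key 7
    max_message_size: int,       # key 5
    fixed_overhead: int = 256,   # assumed size of the rest of the command
    per_id_overhead: int = 32,   # assumed CBOR framing per credential ID
) -> List[List[bytes]]:
    """Drop IDs this security key cannot have made, then split into batches."""
    usable = [c for c in credential_ids if len(c) <= max_id_length]
    budget = max_message_size - fixed_overhead

    batches: List[List[bytes]] = []
    current: List[bytes] = []
    current_size = 0
    for cred_id in usable:
        cost = len(cred_id) + per_id_overhead
        full = len(current) >= max_ids_per_list or current_size + cost > budget
        if current and full:
            # Start a new batch rather than overflow the command.
            batches.append(current)
            current, current_size = [], 0
        current.append(cred_id)
        current_size += cost
    if current:
        batches.append(current)
    return batches


# Fifty 100-byte IDs plus one that is too long for this security key.
ids = [bytes([i]) * 100 for i in range(50)] + [b"x" * 200]
batches = batch_credential_ids(ids, max_id_length=128,
                               max_ids_per_list=8, max_message_size=1200)
```

Each batch would then be sent as a probe with the up flag set to false, stopping at the first batch in which the security key recognizes a credential ID.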
Things get particularly complicated when user verification interacts
with the appId
extension (see here)
but these are details that only platform implementers need to concern
themselves with, and so we won’t cover them here.
CTAP2 also includes a number of commands that aren’t directly used for implementing WebAuthn operations, but it’s still useful to be aware of them.
Security keys support being reset. This erases all credentials and configuration, and rotates any root secret for stateless credentials such that they are invalidated. For obvious reasons, this is a dangerous command! Thus, security keys don’t always support this command over NFC if they also support USB and, over USB, this command is only valid within a few seconds of inserting the security key.
CTAP2 allows the platform to drive fingerprint enrollment and to manage fingerprint templates. The enrollment process is like that for any fingerprint reader: you have to press your finger repeatedly on the sensor until it has gathered enough data to compute a template that it can later match against. The security key can return a series of status messages (“finger too far to the left”, “finger wasn’t pressed for long enough”, etc) for the platform to report back to the user.
The authenticatorBioEnrollment
command also allows the
platform to list and delete any existing templates, each of which can
have a “friendly name” set so that the user can identify them.
The authenticatorCredentialManagement command lets the platform list the RP IDs of all the discoverable credentials on a security key. Then, for a given RP ID, it lets the platform list each discoverable credential recorded, including the user information. Discoverable credentials can be deleted, and the user information can be updated.
The
authenticatorConfig
command allows a number of
miscellaneous features to be controlled:
If a company has distributed security keys to its employees in order to protect authentication to its corporate resources, it might want to know if those employees are actually using them and not others. It might even want to know that they are using the precise security key that was inventory tracked and assigned to them. That is what attestation is for.
Nearly all security keys ship from the factory with an attestation private key included within them. Unlike every other private key that we have dealt with so far, it is not generated on demand and it is not specific to any given credential; it is global to the security key. When you create a credential in these security keys, they will sign over the resulting public key with their attestation private key to show that the newly generated key was generated within that security key.
This creates a tension: if the attestation private key were specific to the security key then all credentials generated in that security key could be correlated across RPs. But if the same private key were used in lots of security keys then the potential would exist for an attacker to extract it from another security key of the same model, and so the security of the attestation private key would be reduced.
FIDO balanced this tension by requiring that attestation private keys must be used over a batch of at least 100 000 security keys. So they do not uniquely identify an individual security key, they instead identify a large batch of them and so convey only the make and model of the security key. Correspondingly, this means that if you have the right (expensive) equipment, you could buy a security key from that batch and extract the attestation private key. (And some security keys don’t sell well enough to ever have a full 100 000-unit batch and, for those, the privacy is proportionally limited.)
For cases where this level of attestation is insufficient, and the attestation private key really needs to be specific to a security key, the concept of “enterprise attestation” was created, which allows this in a restricted fashion.
WebAuthn does not provide attestation by default; it’s assumed that most sites will not want it. To request attestation when making a credential, set the attestation parameter in the PublicKeyCredentialCreationOptions dictionary to direct.
Just like a regular signature in WebAuthn, you want to bind the
signature from the attestation private key to the specific creation,
thus it needs a challenge. Therefore, when doing attestation, set the
challenge
parameter in that dictionary to a large random
value from the server.
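On the server side, that might be sketched in Python as below. The function name, the returned dictionary, and the use of unpadded base64url to carry the challenge to the page are all assumptions for illustration, and the other required members of PublicKeyCredentialCreationOptions (such as rp, user, and pubKeyCredParams) are omitted.

```python
import base64
import json
import secrets


def new_attestation_request() -> dict:
    """Generate a fresh challenge and the attestation-related options."""
    challenge = secrets.token_bytes(32)
    encoded = base64.urlsafe_b64encode(challenge).rstrip(b"=").decode("ascii")
    return {
        # Kept by the server to check against the signed clientDataJSON.
        "challenge": challenge,
        # Sent to the page, which decodes the challenge and calls create().
        "options_json": json.dumps({
            "attestation": "direct",
            "challenge": encoded,
        }),
    }
```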
Once the credential has been successfully created, you may have received an attestation. To find out, you need to look at the response.attestationObject field of the resulting PublicKeyCredential.
Since this field can contain data that comes directly from the security key, and is signed by the attestation private key, we are leaving the realm of JavaScript. It is a CBOR map in the CTAP2 subset of CBOR (see chapter 6). It contains the attestation information but also the credential public key and the authenticator data (see here).
Having the public key inside attestationObject may sound duplicative because chapter 4 already covered how to get the public key. But the server cannot use the getPublicKey, getAuthenticatorData, and getPublicKeyAlgorithm methods when checking attestation. Those methods are convenient because the platform will do the parsing and conversion for you, but the attestation private key doesn’t sign the nicely parsed and converted values, it signs the raw data from the security key. And so an implementation that wants to check attestation must only trust the raw data in attestationObject.
The CBOR map will contain at least the following keys:
Key | Value |
---|---|
fmt |
A string that specifies the type of attestation provided. If this is
none , then no attestation has been provided. |
authData |
The authenticator data, a byte string. |
attStmt |
The attestation information itself. |
The authenticator data here is the same authenticator data that is returned when getting a regular signature (see here) except that it will have the AT flag set, indicating that “ATtested credential data” follows the signature counter. This new data has the following format:
The AAGUID is a random, 16-byte value that identifies a model of security key and is used when validating the attestation. The variable-length credential ID is as explained in chapter 2, but the public key is in the obscure COSE format. (See chapter 13 for details of public key formats and how to convert between them.) There’s no length prefix for the public key and the attested credential data may be followed by extension data if the ED flag is set. So, in order to find the end of the COSE key and thus the start of the extension data, you have to parse the CBOR map.
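To make that layout concrete, here is a minimal Python sketch that splits authenticator data into its fixed-position fields. The byte offsets follow the WebAuthn specification, the sample bytes are the authenticator data from the worked example later in this chapter, and the function and dictionary key names are illustrative. It deliberately stops short of parsing the COSE key: everything after the credential ID is returned as one undivided remainder.

```python
import struct

FLAG_UP, FLAG_UV, FLAG_AT, FLAG_ED = 0x01, 0x04, 0x40, 0x80


def parse_authenticator_data(auth_data: bytes) -> dict:
    """Split authenticator data into its fixed-layout fields."""
    if len(auth_data) < 37:
        raise ValueError("authenticator data is too short")
    (counter,) = struct.unpack(">I", auth_data[33:37])
    result = {
        "rp_id_hash": auth_data[:32],
        "flags": auth_data[32],
        "counter": counter,
    }
    if result["flags"] & FLAG_AT:
        if len(auth_data) < 55:
            raise ValueError("attested credential data is truncated")
        (cred_id_len,) = struct.unpack(">H", auth_data[53:55])
        cred_id_end = 55 + cred_id_len
        if cred_id_end > len(auth_data):
            raise ValueError("credential ID overruns the data")
        result["aaguid"] = auth_data[37:53]
        result["credential_id"] = auth_data[55:cred_id_end]
        # COSE public key, then extension data if the ED flag is set.
        # Finding the boundary between them needs a CBOR parser.
        result["remainder"] = auth_data[cred_id_end:]
    return result


example = bytes.fromhex(
    "f95bc73828ee210f9fd3bbe72d97908013b0a3759e9aea3d0ae318766cd2e1ad"
    "45"
    "0000001f"
    "f8a011f38c0a4d15800617111f9edc7d"
    "0040"
    "3429904107e65bf06f19fd8fa55b4bda"
    "04ede99c1a6994c6bc315252cc6940bf"
    "aeb0c7c62dc88214fc52cb7105aa33da"
    "7b480da9012c36853d4179f159c9348c"
    "a5010203262001215820bc767fb6069f"
    "fd51dbd04916030ec23399e72eefab22"
    "352f29906621351dc83122582066c21d"
    "877c48527407f891ba9611ba85eed1b9"
    "b00164daf2f0a67c39038d771f"
)
parsed = parse_authenticator_data(example)
```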
When checking an attestation, you also need to extract the clientDataJSON field from the PublicKeyCredential’s response field. This JSON is similar to what you have seen before except that the type will now be webauthn.create. When not checking attestation there’s no point looking at this field because, if an attacker was trying to do something nefarious, they could just update it. But it is covered by the attestation signature so it’s meaningful to check it in the same way as when processing a regular credential signature.
Recall from chapter 2 that when a U2F security key creates a credential, it returns an X.509 certificate and a signature. That certificate contains the attestation public key and the signature is made by the attestation private key.
Any attestations from a U2F security key will appear in WebAuthn with
a fmt
of fido-u2f
because the platform will
convert them. Seeing this format thus indicates that you have to
validate a U2F attestation. The attStmt
in this case will
be a CBOR map (in CTAP2 format, as always) with the following keys:
Key | Value |
---|---|
x5c |
The X.509 certificate as a byte string |
sig |
A P-256 ECDSA signature in ASN.1 DER format |
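The data signed by the attestation private key for this format is the following, concatenated:
1. A single zero byte.
2. The SHA-256 hash of the RP ID, taken from the authenticator data.
3. The SHA-256 hash of the clientDataJSON value.
4. The credential ID.
5. The credential public key as a 65-byte uncompressed X9.62 point.
The X.509 certificate contains the attestation public key, but how do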
you trust it? One answer is that, if your company is purchasing a large
number of security keys to distribute to its employees, it can simply
ask the vendor of those security keys for their root certificate. The
attestation certificate in x5c
should be signed by that
root certificate to show that it is authentic. The FIDO Alliance also
provides a repository of information about certified security keys,
indexed by AAGUID, which is covered in the next section.
The U2F protocol, however, doesn’t include any AAGUIDs, so the AAGUID
for a fido-u2f
attestation will always be zero and some
bespoke configuration will be needed to know the attestation roots for
validation. That’s fixed with CTAP2 authenticators which use the
packed
attestation format.
The significant changes with packed attestation are the following:
- The fmt is packed.
- The x5c key in attStmt now contains a CBOR array of one or more byte strings. The first is the attestation certificate and the remainder (if any) are additional X.509 certificates that form a certificate chain to the root.
- There is an alg key in the attStmt that contains the COSE algorithm ID of the signature scheme used to produce the signature.
- The signed data is the authenticator data followed by the SHA-256 hash of clientDataJSON, matching the form used for regular credential signatures.
CTAP2 security keys will produce attestations in this form. Let’s do a worked example with one.
After creating a credential with attestation set to direct, we inspect response.attestationObject in the resulting PublicKeyCredential and decode it as the CTAP2 subset of CBOR (see chapter 6) to get:
{
"fmt": "packed",
"attStmt": {
"alg": -7, # ECDSA with P-256 and SHA-256
"sig": h'30460221...',
"x5c": [h'308202be30...']
},
"authData": h'f95bc73828...'
}
The authenticator data from authData
(see
here) breaks down as:
# SHA-256 hash of the RP ID
f95bc73828ee210f9fd3bbe72d97908013b0a3759e9aea3d0ae318766cd2e1ad
# Flags: AT, UV, and UP set.
45
# Signature counter
0000001f
# Attested credential data
# AAGUID
f8a011f38c0a4d15800617111f9edc7d
# Credential ID length
0040
# Credential ID
3429904107e65bf06f19fd8fa55b4bda
04ede99c1a6994c6bc315252cc6940bf
aeb0c7c62dc88214fc52cb7105aa33da
7b480da9012c36853d4179f159c9348c
# Public key in COSE format
a5010203262001215820bc767fb6069f
fd51dbd04916030ec23399e72eefab22
352f29906621351dc83122582066c21d
877c48527407f891ba9611ba85eed1b9
b00164daf2f0a67c39038d771f
While packed attestation can contain multiple X.509 certificates to form a chain to the root, this attestation only includes a single certificate, which contains the following:
Certificate:
Data:
Version: 3 (0x2)
Serial Number: 1955003842 (0x7486fdc2)
Signature Algorithm: sha256WithRSAEncryption
Issuer: CN=Yubico U2F Root CA Serial 457200631
Validity
Not Before: Aug 1 00:00:00 2014 GMT
Not After : Sep 4 00:00:00 2050 GMT
Subject: C=SE, O=Yubico AB, OU=Authenticator Attestation,
CN=Yubico U2F EE Serial 1955003842
Subject Public Key Info:
Public Key Algorithm: id-ecPublicKey
Public-Key: (256 bit)
pub:
04:95:5d:f3:ad:f7:24:7d:31:75:ef:fd:9c:c4:f3:
1a:4e:87:8e:ba:e1:81:09:56:61:50:fb:38:8b:2e:
5f:65:27:bf:57:40:9a:a5:81:a5:0d:0a:c5:2f:18:
44:5c:0a:13:54:8a:13:53:c8:a4:e5:9a:70:4e:52:
3b:c0:4d:eb:ed
ASN1 OID: prime256v1
NIST CURVE: P-256
X509v3 extensions:
1.3.6.1.4.1.41482.2:
1.3.6.1.4.1.41482.1.1
1.3.6.1.4.1.45724.2.1.1:
...
1.3.6.1.4.1.45724.1.1.4:
.......M........}
X509v3 Basic Constraints: critical
CA:FALSE
Signature Algorithm: sha256WithRSAEncryption
31:5c:48:80:e6:9a:52:7e:38:66:89:bd:69:fd:0a:a8:6f:49:
...
The AAGUID (f8a011f3-8c0a-4d15-8006-17111f9edc7d) from the authenticator data can be used to look the security key up in the FIDO Alliance’s metadata. The set of metadata can be downloaded by fetching https://mds.fidoalliance.org/, which results in a signed JWT. (Information about the metadata service can be found at https://fidoalliance.org/metadata/.)
curl https://mds.fidoalliance.org/ | \
cut -d. -f2 | \
python3 -c "import sys, base64;
s = sys.stdin.read().strip();
# JWT segments omit base64 padding, so restore it before decoding.
sys.stdout.buffer.write(
base64.urlsafe_b64decode(s + '=' * (-len(s) % 4)))" | \
jq '.entries[] |
select(.aaguid == "f8a011f3-8c0a-4d15-8006-17111f9edc7d")'
Note: While the data is being downloaded using HTTPS, this one-liner doesn’t validate the JWT signature. You may wish to do that.
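The same lookup can be sketched as a pair of Python functions, which is closer to what a server would do. The function names are illustrative. Like the shell pipeline, this decodes the JWT payload without validating the signature, so it is only a starting point. It restores the base64 padding that JWT segments omit before decoding.

```python
import base64
import json
from typing import Optional


def jwt_payload(jwt: str) -> dict:
    """Decode the middle segment of a JWT WITHOUT checking its signature."""
    segment = jwt.strip().split(".")[1]
    padded = segment + "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def find_entry(metadata: dict, aaguid: str) -> Optional[dict]:
    """Return the metadata entry for an AAGUID, or None if there isn't one."""
    for entry in metadata.get("entries", []):
        if entry.get("aaguid") == aaguid:
            return entry
    return None
```

With the downloaded JWT in a string, find_entry(jwt_payload(jwt), "f8a011f3-8c0a-4d15-8006-17111f9edc7d") would return the record for the security key in this example, if the metadata contains one.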
Within the JSON record is an attestationRootCertificates
field that contains the attestation root certificate for this device.
Now we can check that the attestation certificate is valid using
standard X.509 libraries. For illustration we’ll use the OpenSSL
command-line tool:
openssl verify -CAfile root-certificate.pem attestation-certificate.pem
attestation-certificate.pem: OK
The public key from the attestation certificate can be used to verify
the attestation signature by appending the SHA-256 hash of
clientDataJSON
to the authenticator data. This provides
evidence that the COSE-format public key in the authenticator data
really was generated inside a security key of the attested make &
model. A real implementation wouldn’t use command line tools, of course,
and should perform each step of the validation
procedures from the specification.
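The byte strings that the two attestation formats in this chapter sign can be sketched in Python as follows. The function names are illustrative, and the fido-u2f layout here follows the verification procedure in the WebAuthn specification, which expects the credential public key converted from COSE to a 65-byte uncompressed point. Checking the signature itself needs an ECDSA library and is left out, as is everything else in the specification's validation procedures.

```python
import hashlib
import json


def packed_signed_data(auth_data: bytes, client_data_json: bytes) -> bytes:
    """Authenticator data followed by the SHA-256 hash of clientDataJSON."""
    return auth_data + hashlib.sha256(client_data_json).digest()


def u2f_signed_data(rp_id_hash: bytes, client_data_json: bytes,
                    credential_id: bytes, public_key_x962: bytes) -> bytes:
    """The concatenation signed in a fido-u2f attestation."""
    if len(rp_id_hash) != 32:
        raise ValueError("RP ID hash must be 32 bytes")
    if len(public_key_x962) != 65 or public_key_x962[0] != 0x04:
        raise ValueError("expected a 65-byte uncompressed X9.62 point")
    return (b"\x00"
            + rp_id_hash
            + hashlib.sha256(client_data_json).digest()
            + credential_id
            + public_key_x962)


def is_creation_client_data(client_data_json: bytes) -> bool:
    """True if the client data's type is webauthn.create."""
    return json.loads(client_data_json).get("type") == "webauthn.create"
```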
Note: CTAP2 security keys often also still support the U2F protocol,
and will produce fido-u2f
attestations if used via that
protocol. Since early CTAP2 security keys could not create credentials
without doing user verification once a PIN had been configured, attempts
to create non-discoverable credentials without user verification may
still use the U2F protocol even though a security key supports
CTAP2.
You might also come across a packed attestation that is missing the
x5c
key completely. That is a variant of packed attestation
called a self-attestation.
Regular credential creation in WebAuthn does not involve the newly created private key signing anything. Thus, when a credential is created, there’s no proof that the entity submitting the new credential actually holds the private key, and thus it’s possible to submit someone else’s public key as your own.
This does not obviously cause any problems in WebAuthn but the
standard solution is a “self-signature” and self-attestation provides
this. It is a packed attestation, as described above, where there is no
x5c
key in the attestation statement, and the public key
used to validate the attestation signature is the newly-created
credential public key. The signature shows that the private key was
involved in the creation.
However, self-attestations are not really attestations! They claim nothing about how the private key was generated nor where it is stored. They simply reuse some of the mechanisms. Because of this, self-attestations are rarely encountered.
There are several other, less common attestation formats. If you are
building a WebAuthn deployment and come across an unexpected attestation
fmt
, you may be able to find details in this
section of the WebAuthn spec. Hopefully now that you understand the
broad shape of attestation, the specification will be easier to
understand.
None of the attestations above identify an individual security key, otherwise it would be possible to track a specific security key as it was used on different websites and in apps. However, if you are a company purchasing security keys for your employees, you might legitimately want to be able to individually track these security keys.
There is one straightforward way to do that, which is to pre-create credentials on the security keys before distribution. This is a viable solution in many cases but it does not always work so, for the exceptions, CTAP2 and WebAuthn define a concept of enterprise attestation, where the attestation certificate individually identifies a specific security key for inventory tracking purposes.
If this simply replaced the standard attestation certificate, that would cause all of the privacy problems outlined above. So an enterprise attestation certificate is in addition to a regular attestation certificate and requests have to be authorized to use it. Security keys with enterprise attestation must also be specially purchased from the vendor.
There are two ways that enterprise attestation can be authorized for a specific request:
In both cases the WebAuthn create request has to request enterprise attestation by setting the attestation parameter to enterprise. If the security key has an enterprise attestation certificate, and if the request meets at least one of the two requirements above, then the enterprise attestation certificate will be returned (and the corresponding attestation private key will be used to sign the attestation).
When a request specifies enterprise attestation, the platform will consider whether its configuration specifies that the relying party ID from the request is authorized to receive it. The mechanism of this configuration is specific to the platform but, for Chrome/Edge, the policy is called SecurityKeyPermitAttestation.
If, by whatever mechanism the platform uses, the relying party ID is authorized, then the platform will send an extra ep parameter with the CTAP2 authenticatorMakeCredential command, and will give it the value 2. This informs the security key that the platform believes that the request is authorized to use any enterprise attestation certificate that the security key may have configured.
Otherwise, the platform sends the value 1, which informs the security key that enterprise attestation has been requested, but the platform policy doesn’t authorize it. Still, the security key itself may recognize the relying party ID and decide to use enterprise attestation.
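The platform's side of that decision is small enough to sketch in a few lines of Python. The constant and function names are made up for illustration, and the set of permitted RP IDs stands in for whatever policy mechanism the platform really uses.

```python
from typing import Iterable, Optional

# Illustrative names for the two values of the ep parameter.
EP_REQUESTED_NOT_AUTHORIZED = 1
EP_AUTHORIZED_BY_PLATFORM = 2


def ep_parameter(attestation: str, rp_id: str,
                 permitted_rp_ids: Iterable[str]) -> Optional[int]:
    """Value of ep to send with authenticatorMakeCredential, if any."""
    if attestation != "enterprise":
        # Enterprise attestation wasn't requested, so ep is omitted.
        return None
    if rp_id in set(permitted_rp_ids):
        return EP_AUTHORIZED_BY_PLATFORM
    return EP_REQUESTED_NOT_AUTHORIZED
```

Either way, the security key has the final say over whether an enterprise attestation is returned.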
The security key is always free to decide that it doesn’t want to return enterprise attestation for any request. The feature is disabled by default and must be explicitly enabled with an authenticatorConfig command after purchase, and again after each reset (see here). The presence of the ep field in the authenticatorGetInfo response indicates that enterprise attestation is supported by a security key, and its value indicates whether it is currently enabled.
The enterprise attestation signature signs the authenticator data.
When considering the output of extensions (see chapter 9), a server may
need to extract extension outputs directly from the authenticator data
where possible, rather than use the browser’s reflection of them in the
PublicKeyCredential
object, in order to ensure that it’s
getting authentic extension results. This is most applicable to the
minPinLength
extension (see
here).
Chapter 4 covered the core of WebAuthn, which is reflected in the platform-specific APIs detailed in chapter 11. However there are several aspects of WebAuthn that are specific to operating on the web. This chapter covers those parts.
Nothing works unless WebAuthn is available in the browser. While WebAuthn is very widely supported now, there are still contexts (such as WebViews) where support might be lacking. To check whether WebAuthn is available, test for the existence of window.PublicKeyCredential:
if (!window.PublicKeyCredential) {
// WebAuthn is not available in this context.
}
If WebAuthn is available then sites can silently detect whether a platform authenticator that supports user verification exists:
const promise = window.PublicKeyCredential.
    isUserVerifyingPlatformAuthenticatorAvailable();
promise.then((hasUVPlatformAuthenticator) => ..., (error) => ...);
Sites would typically check this before proactively prompting users to create a credential. Users may wish to use security keys, so the option to create a credential should still appear in the account settings, but it is unwise to try and upsell the user unless this promise resolves with true.
Hopefully there will come a day when WebAuthn dominates the authentication landscape and sites just have a sign-in button that starts a WebAuthn flow. But today that is not the case.
Today, most users recognize a pair of text boxes as the way that they start a sign-in process, and they may well be used to their password manager auto-filling their username and password into them. Conditional UI is a way for that autofill to also include WebAuthn credentials.
The way it works is that a page can make a
navigator.credentials.get
call and pass
mediation: "conditional"
in the top-level dictionary. (So
at the same level as publicKey
, not inside the assertion
options.) That will cause the request not to show a modal UI, but
instead the returned promise will hang around, unresolved.
const abortController = new AbortController();
const promise = navigator.credentials.get({
  // The `requestOptions` are the same as for a regular request.
  publicKey: requestOptions,
  mediation: 'conditional',
  signal: abortController.signal,
});
// `promise` may not ever resolve but, if it does, it'll return a
// `PublicKeyCredential`, just like a regular `get()` call.
The contract with the browser is that it may present WebAuthn as an
option to the user in whatever unobtrusive manner that it wishes.
Currently, that means that credentials can be offered in autofill for
fields that have webauthn
as the final autocomplete
token:
<input type="text" name="username" autocomplete="username webauthn">
If the user selects a WebAuthn credential from the browser’s autofill menu then any needed user verification will be completed and the promise from the conditional request will be resolved. The Javascript for the page is then responsible for sending the assertion to the server and getting the user signed in.
Before starting a conditional request, check that the browser supports it; otherwise the mediation parameter will be ignored and the call will trigger a modal UI:
if (PublicKeyCredential.isConditionalMediationAvailable) {
  const promise = PublicKeyCredential.
      isConditionalMediationAvailable();
  promise.then((isSupported) => ..., (error) => ...);
}
A conditional request should be made as soon as possible after page load because credentials won’t appear in autofill unless a conditional request is pending to receive the resulting credential.
A page may well use conditional UI and also have a “Sign in with passkey” button that triggers a modal request. However, only one WebAuthn request can be outstanding at any given time. So if the conditional request is hanging, waiting for a possible credential from autofill, trying to make a regular request in response to the user clicking on the button will fail immediately.
Instead, conditional requests need to be aborted before the modal
request can be started. In the example above, the conditional request
also took a signal
parameter. This allows an AbortController
to abort a conditional request by calling
abortController.abort()
and waiting for the promise from
the conditional request to fail. Then another WebAuthn request can be
started.
If the modal request fails, then you need to restart the conditional request; otherwise credentials won’t appear in autofill after the user clicks the button.
Conditional UI makes generating the
challenge
parameter in the request more, well, challenging.
This is also the case with the
preferImmediatelyAvailableCredentials
option to the
platform APIs that we’ll cover in chapter 11.
Recall that the properties that we want of a challenge are that:
The ideal way to meet these requirements is to generate a random challenge at the server, record it, and check against it when the signature is received. But conditional requests want to be started as soon as possible after page load. Any delay to fetch a challenge is a problem. This is also the case with mobile apps that want to show the option to sign in as soon as the app is opened.
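The server-side approach of generating, recording, and checking a challenge might be sketched in Python like this. The class name, the in-memory dictionary, and the five-minute default lifetime are assumptions for illustration; a real deployment would tie each challenge to a session and use shared storage.

```python
import secrets
import time
from typing import Dict, Optional


class ChallengeStore:
    """Issues random challenges and accepts each one at most once."""

    def __init__(self, max_age_seconds: float = 300.0):
        self._max_age = max_age_seconds
        self._issued: Dict[bytes, float] = {}

    def issue(self, now: Optional[float] = None) -> bytes:
        """Generate a challenge and record when it was issued."""
        challenge = secrets.token_bytes(32)
        self._issued[challenge] = time.time() if now is None else now
        return challenge

    def consume(self, challenge: bytes, now: Optional[float] = None) -> bool:
        """True if the challenge was issued here, is unused, and is recent."""
        issued_at = self._issued.pop(challenge, None)
        if issued_at is None:
            return False  # Unknown, or already used once.
        current = time.time() if now is None else now
        return current - issued_at <= self._max_age
```

Removing the challenge on first use is what prevents replay; the age limit bounds how long the server has to keep each record.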
With conditional UI, the challenge can be dynamically embedded in the page contents to avoid making a separate request for it. But this isn’t applicable when opening an app because that is a purely local operation.
Conditional UI challenges also have a second tension (which the app case doesn’t) because the signature from a conditional request could come days later. How long does the server have to store session information for the possibility that a page load is still in a tab somewhere, waiting for the user to sign in?
The best answer for these issues is probably a feature that doesn’t
exist at the time of writing but which has been proposed: a
challengeUrl
parameter as an alternative to the
challenge
parameter. This would specify a URL from which a
challenge can be downloaded at the point that one is needed. For
conditional UI, the challenge wouldn’t be downloaded until the user has
selected a credential and the signing operation is ready to happen. For
apps, fetching the challenge could happen concurrently with showing UI
to the user, thus minimizing latency.
An alternative that is often suggested is encrypted timestamps; i.e. distribute a stateless service around the world (to minimize latency) that encrypts and returns a timestamp on demand to serve as a challenge. When validating signatures, the timestamp can be decrypted and checked to be reasonably recent.
On the plus side, this avoids needing storage and can make latency acceptable. But there are several drawbacks. Firstly, it limits replay but doesn’t prevent it. The extent that replays are possible is bounded by how old a received challenge is allowed to be. If it’s just a few minutes, perhaps you deem that acceptable, but larger windows are progressively more concerning. Thus any conditional UI requests need to be restarted frequently.
Second, it would also obviously be bad if the encryption key leaked. Public-key cryptography doesn’t help here because, in order for future valid challenges to be unpredictable, it mustn’t be possible for an attacker to generate a valid encrypted timestamp. Thus public-key cryptography shouldn’t be used because public keys are assumed to be public, and making non-standard assumptions of cryptosystems is usually disastrous.
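To make the trade-offs concrete, here is a minimal sketch of such a stateless challenge service, written for Node.js. The function names, the five-minute window, and the challenge layout (nonce, then ciphertext, then tag) are all assumptions made for illustration. It uses a symmetric AES-256-GCM key because, as noted above, an attacker must not be able to produce a valid encrypted timestamp.

```javascript
const nodeCrypto = require('node:crypto');

// Hypothetical secret shared by every instance of the challenge service.
const CHALLENGE_KEY = nodeCrypto.randomBytes(32);
// Assumed replay window: challenges older than this are rejected.
const MAX_AGE_MS = 5 * 60 * 1000;

// Returns nonce || ciphertext || tag, an AES-256-GCM encrypted timestamp.
function issueChallenge(key, nowMs = Date.now()) {
  const nonce = nodeCrypto.randomBytes(12);
  const plaintext = Buffer.alloc(8);
  plaintext.writeBigUInt64BE(BigInt(nowMs));
  const cipher = nodeCrypto.createCipheriv('aes-256-gcm', key, nonce);
  const ciphertext = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return Buffer.concat([nonce, ciphertext, cipher.getAuthTag()]);
}

// Returns true only if the challenge decrypts correctly and is recent enough.
function checkChallenge(key, challenge, nowMs = Date.now()) {
  if (challenge.length !== 12 + 8 + 16) return false;
  const nonce = challenge.subarray(0, 12);
  const ciphertext = challenge.subarray(12, 20);
  const tag = challenge.subarray(20);
  const decipher = nodeCrypto.createDecipheriv('aes-256-gcm', key, nonce);
  decipher.setAuthTag(tag);
  let plaintext;
  try {
    plaintext = Buffer.concat([decipher.update(ciphertext), decipher.final()]);
  } catch {
    return false; // Forged, corrupted, or encrypted under a different key.
  }
  const issuedMs = Number(plaintext.readBigUInt64BE());
  return issuedMs <= nowMs && nowMs - issuedMs <= MAX_AGE_MS;
}
```

Note that nothing here stops the same challenge from being accepted twice inside the window. That is the replay limitation described above.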
If a user is already using a password manager to store a password for
a site it would be great if that password manager would just start
storing a passkey instead. That’s why browsers increasingly support
setting mediation: "conditional"
on create
calls. This requests that a passkey be automatically created,
i.e. without the user having to confirm or present any biometrics.
Before attempting this, check whether the browser supports conditional create calls:
if (window.PublicKeyCredential &&
    window.PublicKeyCredential.getClientCapabilities) {
  window.PublicKeyCredential.getClientCapabilities().then(
    (capabilities) => {
      if (capabilities.conditionalCreate ?? false) {
        attemptConditionalCreate();
      }
    });
}
Since conditional creation doesn’t involve any confirmation UI, the
user presence and user verification bits in the resulting authenticator
data will both be false. Thus, the request should set
authenticatorSelection.userVerification
to
discouraged
.
A site can technically attempt to conditionally create a passkey at any time. However, passkey providers will impose requirements before doing so. They will likely require that they already have an account with the same username to confirm that the user is happy storing that credential. They may also require that the password has been recently filled to try and confirm that any saved password is valid. Because of these requirements, the correct time to attempt a conditional create is immediately after a user has signed in using a password.
If the conditional creation attempt is rejected, the promise will
fail immediately with a NotAllowedError
.
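Putting those pieces together, a conditional create might be wrapped as sketched below. The helper names `toConditionalCreateOptions` and `tryConditionalCreate` are hypothetical, and the `credentials` argument is expected to be `navigator.credentials` in a browser. Passing it in is only a choice that keeps the sketch easy to exercise outside one.

```javascript
// Returns a copy of the given creation options adjusted for a conditional
// create: mediation is "conditional" and user verification is discouraged.
function toConditionalCreateOptions(publicKey) {
  return {
    mediation: 'conditional',
    publicKey: {
      ...publicKey,
      authenticatorSelection: {
        ...publicKey.authenticatorSelection,
        userVerification: 'discouraged',
      },
    },
  };
}

// Attempts a conditional create. Resolves to the new credential, or to null
// if the attempt was rejected with a NotAllowedError.
async function tryConditionalCreate(credentials, publicKey) {
  try {
    return await credentials.create(toConditionalCreateOptions(publicKey));
  } catch (err) {
    if (err.name === 'NotAllowedError') return null;
    throw err;
  }
}
```

Treating a rejection as a non-event seems reasonable here because the user never asked for a passkey to be created, so there is nothing useful to show them.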
(Conditional creation may also be available in platform APIs. See chapter 11.)
WebAuthn works without fuss in iframes that are same-origin with the
main frame. But when people ask about iframes, they usually mean
cross-origin iframes. WebAuthn
get
requests do work in cross-origin iframes, but the
parent frame has to grant permission for the iframe to make that call.
To do so, use the permissions policy framework:
<iframe src="..." allow="publickey-credentials-get">
Chromium-based browsers also allow create
calls within
cross-origin iframes. Again, the parent frame has to grant permission,
this time with the publickey-credentials-create
permission.
At the time of writing, Safari does not allow this, however.
Ideally, WebAuthn requests would be created in your backend and sent to the frontend to be performed by the browser. But because WebAuthn requests and responses contain ArrayBuffers, which can’t be expressed in JSON, this is not as easy as it should be.
Thus dedicated JSON conversion functions were added to WebAuthn. These functions are aware of the WebAuthn structures and, wherever an ArrayBuffer is needed, accept a base64url-encoded string instead.
Here’s an example WebAuthn registration converted to JSON by taking all ArrayBuffers and base64url encoding them:
createJSON = `
{
"challenge": "cmFuZG9tIGNoYWxsZW5nZQ",
"rp": { "id": "example.com", "name": "example.com" },
"user": {
"id": "dXNlcmlk",
"name": "name",
"displayName": "displayName"
},
"pubKeyCredParams": [ {"type": "public-key", "alg": -7} ],
"authenticatorSelection": {
"requireResidentKey": true,
"authenticatorAttachment": "platform"
}
}`;
A browser with support for these JSON functions will accept the following to trigger a creation request:
navigator.credentials.create({
  publicKey: window.PublicKeyCredential.parseCreationOptionsFromJSON(
    JSON.parse(createJSON)),
}).then(console.log, console.log);
The static parseCreationOptionsFromJSON
method on
window.PublicKeyCredential
takes a parsed JSON object and
converts it to a WebAuthn creation request. Similarly,
PublicKeyCredential.parseRequestOptionsFromJSON
also exists
for assertion requests.
Once a promise has resolved with a PublicKeyCredential
object, there’s also a function to convert it to JSON for sending back
to the server:
JSON.stringify(pubKeyCred.toJSON());
That results in a JSON string where, again, all ArrayBuffers have been encoded as base64url strings.
To test whether a browser supports these functions, do:
if (window.PublicKeyCredential.parseCreationOptionsFromJSON) {
// JSON functions supported.
}
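Where those functions are missing, or on a JavaScript backend that needs to produce and consume the same strings, the base64url conversion itself is small. The following is a sketch of just that conversion, with hypothetical helper names. It is not a replacement for the built-in functions, which also know which fields of each WebAuthn structure need converting.

```javascript
// Encodes an ArrayBuffer or typed array as unpadded base64url.
function toBase64Url(data) {
  const bytes = data instanceof ArrayBuffer
      ? new Uint8Array(data)
      : new Uint8Array(data.buffer, data.byteOffset, data.byteLength);
  let binary = '';
  for (const b of bytes) binary += String.fromCharCode(b);
  return btoa(binary)
      .replace(/\+/g, '-')
      .replace(/\//g, '_')
      .replace(/=+$/, '');
}

// Decodes an unpadded base64url string into an ArrayBuffer.
function fromBase64Url(text) {
  const base64 = text.replace(/-/g, '+').replace(/_/g, '/');
  const padded = base64 + '='.repeat((4 - (base64.length % 4)) % 4);
  const binary = atob(padded);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  return bytes.buffer;
}
```

For instance, the `challenge` value in the registration example above is the encoding of the ASCII bytes of “random challenge”, and the user `id` decodes to “userid”.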
These JSON formats are also used by the Android platform APIs (see here) and so it’s possible for a backend to generate WebAuthn JSON objects that will transparently work for both web and Android clients.
WebAuthn involves interacting with security keys or local biometric sensors. These are all things that make automated testing very challenging. However, browsers can implement a virtual authenticator—a fake WebAuthn authenticator that skips showing any UI, but which creates credentials and generates signatures like any other.
Chromium-based browsers can add a virtual authenticator to a tab for when different types of security keys aren’t available for manual testing, or for when a platform authenticator isn’t available (although that’s rare these days). To do so, find “WebAuthn” under “More tools” in the developer tools, check the option to enable the environment, select the configuration of the virtual authenticator, and click “Add”. WebAuthn requests will now use the virtual authenticator and the state of the authenticator appears in the developer tools pane.
It’s also possible to do completely automated tests with a virtual authenticator by configuring it using WebDriver. See the WebDriver section of the WebAuthn specification for more details.
Recall from chapter 4 that WebAuthn accepts an
extensions
parameter when creating or asserting a
credential. These parameters are dictionaries mapping extension names to
extension-specific inputs. When an operation completes successfully, the
resulting PublicKeyCredential
has a
getClientExtensionResults
method that returns a dictionary
mapping those same names to the extension’s outputs.
This extension mechanism allows a number of non-core features to be cleanly added to WebAuthn and several of them are covered in this chapter, along with descriptions of how they are implemented for security keys.
To save reading this whole chapter just to discover whether any of these extensions are useful to you, here’s a quick summary of what each does:
Extension | Use |
---|---|
credProps | Learning whether a newly created credential is discoverable or not |
PRF | Getting secret keys for encrypting data |
credProtect | Setting a minimum security bar for credentials kept on security keys |
credBlob | Storing 32 bytes of data with a credential |
minPinLength | Checking that company PIN-length requirements are being enforced |
largeBlob | Storing certificates for offline operation of a security key |
appId / appIdExclude | Backwards compatibility with credentials created via the U2F web API |
Extensions are processed by the platform or browser and some of them,
like credProps,
don’t involve any explicit work by the
authenticator or security key. For others, the extension is forwarded
entirely to the authenticator for processing. There is also a third
class of extensions where the authenticator is involved in the
processing of the extension, but the platform also has to do work.
In all cases, the results are returned via
getClientExtensionResults
. Even for extensions that are
entirely processed by the authenticator, the platform will generally
take the authenticator’s output and rewrite it into a JavaScript object
to include there.
If you are using attestation (see chapter 7) then you need to
consider that the platform’s output is not signed by the attestation
private key. So for each extension used, you need to decide whether your
implementation will extract the extension output from the returned
authenticator data instead—which is signed. This is not always possible,
for example with the prf
and largeBlob
extensions, but sometimes it’s nonsensical not to, for example with the
minPinLength
extension.
Note: Attestation is not commonly used and most implementations do not need to worry about this.
Recall from chapter 4 that the
authenticatorSelection
field of the creation parameters
lets you specify residentKey
as preferred
.
(And recall that a “resident key” is a historical name for a
discoverable credential.)
If you configure that, credProps
is the way that you
learn whether a discoverable credential was actually created or not. To
request this information, set this extension in the creation
options:
options.extensions = {credProps: {}};
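Then there are three possible outcomes in a successful response: the credProps output is missing entirely, meaning that the platform doesn’t support the extension; credProps is present but has no rk member, meaning that the platform doesn’t know whether the authenticator created a discoverable credential; or rk is present and is true or false accordingly.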
These possibilities can be extracted from the resulting
PublicKeyCredential
object like this:
const extensionOutputs = credential.getClientExtensionResults();
const hasCredProps = 'credProps' in extensionOutputs;
if (!hasCredProps) {
// Platform doesn't support the extension.
return "probablynot";
}
const propertyReported = 'rk' in extensionOutputs.credProps;
if (!propertyReported) {
// The platform doesn't know whether the authenticator used creates
// discoverable credentials or not.
return "probablynot";
}
return extensionOutputs.credProps.rk ? "yes" : "no";
A pseudo-random function (PRF) is a cryptographic abstraction that approximates a random oracle. A random oracle is a function that takes an arbitrary byte-string input and produces a fixed-sized output. It works like this:
Conceptually, the oracle contains a table mapping inputs to outputs that starts off empty. Each time the oracle is evaluated, it looks in its table to see whether the input has been seen before. If so, it returns the corresponding output from its table. Otherwise, it generates an output uniformly at random, records it in its table, and returns it.
As an example, we start off with an oracle with an empty table, and we evaluate it on the input “apples”. The table is empty, so “apples” isn’t in it, and so the oracle generates a random output, records it, and returns it.
Next, we evaluate it on the input “bananas” and the same thing happens. (Although the output, being randomly generated, will be different with very high probability.)
Finally, we evaluate it on “apples” again. This time “apples” is in its table, and so it returns the same output as it did the first time we evaluated it.
If you built such a function, that would be an ideal random oracle. But storing the table is problematic so, instead, PRFs approximate a random oracle by using functions like HMAC-SHA256. They are computationally indistinguishable from a random oracle but, by using a hash function, do not require storing all the input and output pairs.
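The difference between the two can be sketched in a few lines of Node.js. `makeRandomOracle` and `makePrf` are names invented for this illustration: the first keeps the ever-growing table described above, while the second replaces the table with a single secret key and HMAC-SHA256.

```javascript
const nodeCrypto = require('node:crypto');

// An idealized random oracle: it remembers every input it has ever seen.
function makeRandomOracle() {
  const table = new Map();
  return (input) => {
    if (!table.has(input)) {
      table.set(input, nodeCrypto.randomBytes(32));
    }
    return table.get(input);
  };
}

// A PRF: the only state is a secret key, so nothing grows with use.
function makePrf(secretKey) {
  return (input) =>
      nodeCrypto.createHmac('sha256', secretKey).update(input).digest();
}

const oracle = makeRandomOracle();
const prf = makePrf(nodeCrypto.randomBytes(32));

// Both return the same 32 bytes whenever they are given the same input.
const oracleIsConsistent = oracle('apples').equals(oracle('apples'));
const prfIsConsistent = prf('apples').equals(prf('apples'));
```

Without the secret key, the outputs of `prf` should look just as random as those of `oracle`.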
The result of all this theory is that attaching a PRF to a credential allows you to derive an unlimited number of secret keys from it. These secret keys can be used for whatever you want, but encrypting data is the most common use.
The prf
extension to WebAuthn lets you attach a PRF to
each credential. The PRF is credential-specific and can only be
evaluated when the credential is created or asserted. During each
operation the PRF can be evaluated on up to two inputs in order to
support key rotation.
To use the prf
extension with a credential, it should be
requested at creation time. (It is possible for authenticators to
support PRF evaluation at assertion time even when it wasn’t configured
at creation time, but this is authenticator-specific behavior and can’t
be depended upon.)
To request PRF support during credential creation, just create an empty extension:
options.extensions = {prf: {}};
In the resulting PublicKeyCredential
, see whether PRF
was supported:
const extensionOutputs = credential.getClientExtensionResults();
const hasPrf = 'prf' in extensionOutputs &&
    extensionOutputs.prf.enabled === true;
Unlike capabilities such as user verification, there is no way to express that the PRF extension is required, so the possibility that an authenticator doesn’t support it must always be handled.
It’s also possible to opportunistically evaluate the newly-created PRF during the creation operation. (Because security keys can’t support this, this is not guaranteed to succeed.)
options.extensions = {prf: {
  eval: {
    first: new Uint8Array([1,2,3,4]),
    second: new Uint8Array([5,6,7,8]),
  },
}};
The PRF can be evaluated at two inputs, as shown above, but
second
is optional if you don’t need that ability.
The outputs will always be 32 bytes long and are in the
results
member of the extension outputs if supported:
const extensionOutputs = credential.getClientExtensionResults();
const hasPrf = 'prf' in extensionOutputs &&
    extensionOutputs.prf.enabled === true;
const hasOutputs = hasPrf && 'results' in extensionOutputs.prf;
if (hasOutputs) {
const output1 = extensionOutputs.prf.results.first;
const output2 = extensionOutputs.prf.results.second;
}
The same code works at assertion time except that there will always
be PRF results if hasPrf
was true at creation time.
But at assertion time, there might be multiple credential IDs listed
in allowCredentials
and the PRF of each may need to be
evaluated at different inputs depending on which credential was used. If
that’s the case, evalByCredential
can be set instead of
eval
.
options.extensions = {prf: {
  evalByCredential: {
    "Y3JlZGVudGlhbElEMQ": {
      first: new Uint8Array([1,2,3,4]),
      second: new Uint8Array([5,6,7,8]),
    },
    "b3RoZXJJRA": {
      first: new Uint8Array([4,3,2,1]),
      second: new Uint8Array([8,7,6,5]),
    },
  },
}};
The keys in evalByCredential
are base64url-encoded
credential IDs. It can only be used if allowCredentials
is
non-empty and every credential ID listed in
evalByCredential
must be present in
allowCredentials
. If evalByCredential
and
eval
are both set then the former will be used for any
credential ID listed in it, with eval
used as the fallback
for any other credentials.
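Since those rules are easy to get wrong by hand, a site might build the extension with a small helper that enforces them. The following is a sketch with hypothetical names. It assumes that credential IDs are held as `Uint8Array`s and that per-credential inputs arrive as a list of `{id, first, second}` objects.

```javascript
// Encodes a Uint8Array as unpadded base64url.
function toBase64Url(bytes) {
  let binary = '';
  for (const b of bytes) binary += String.fromCharCode(b);
  return btoa(binary)
      .replace(/\+/g, '-')
      .replace(/\//g, '_')
      .replace(/=+$/, '');
}

// Builds the `extensions` value for an assertion request.
//   allowCredentials:    the same list that goes in the request
//   perCredentialInputs: [{id, first, second?}, ...]
//   fallback:            optional {first, second?} used for any other ID
function buildPrfExtension(allowCredentials, perCredentialInputs, fallback) {
  const allowed = new Set(allowCredentials.map((c) => toBase64Url(c.id)));
  const evalByCredential = {};
  for (const {id, ...inputs} of perCredentialInputs) {
    const key = toBase64Url(id);
    if (!allowed.has(key)) {
      throw new Error(`credential ${key} is not in allowCredentials`);
    }
    evalByCredential[key] = inputs;
  }
  const prf = {evalByCredential};
  if (fallback) prf.eval = fallback;
  return {prf};
}
```

Checking membership up front means that a mistake surfaces as a clear error in your own code rather than as a rejected WebAuthn request.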
As a general rule, one shouldn’t use the same secret key for multiple purposes. So, if you’re using a PRF output to encrypt data with AES-GCM, and then you later switch to a different algorithm, you should use a different key obtained from a different PRF input.
However, that doesn’t exclude the possibility that a service uses the
same PRF input globally for all users. For example, you could always use
new TextEncoder().encode("user data encryption key").buffer
as the sole PRF input.
A worry with that design would be that, if an attacker were ever to be able to request an assertion with your RP ID, then they could get that secret key for a specific credential. Thus a step up from that design is to have random, per-user PRF inputs that an attacker would struggle to obtain: even if the attacker were able to request an assertion somehow, they would not know what input to use to get the secret key.
A further step up is to be continually rotating the secret key, which is why it’s possible to evaluate the PRF at two different inputs each time.
In this design, in addition to a per-user PRF input, each account also has a second, random PRF input that is “pending”. The PRF is then evaluated for both inputs and the data can be decrypted with the secret key from the first evaluation and then re-encrypted with the secret key from the second. Then the second PRF input becomes the primary one and the server generates a new “pending” input for the next rotation.
In this case, even if the server’s database of PRF inputs leaked, it would rapidly and automatically become out of date.
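One rotation step of that design could look like the sketch below, written with Node’s crypto module for brevity. The account record layout (`sealedData`, `currentInput`, `pendingInput`) and the function names are assumptions for illustration, as is the choice of AES-256-GCM with the 32-byte PRF outputs used directly as keys.

```javascript
const nodeCrypto = require('node:crypto');

// Encrypts with AES-256-GCM, returning nonce || ciphertext || tag.
function encrypt(key, plaintext) {
  const nonce = nodeCrypto.randomBytes(12);
  const cipher = nodeCrypto.createCipheriv('aes-256-gcm', key, nonce);
  const body = Buffer.concat([cipher.update(plaintext), cipher.final()]);
  return Buffer.concat([nonce, body, cipher.getAuthTag()]);
}

// Reverses encrypt(). Throws if the key is wrong or the data was modified.
function decrypt(key, sealed) {
  const nonce = sealed.subarray(0, 12);
  const body = sealed.subarray(12, sealed.length - 16);
  const decipher = nodeCrypto.createDecipheriv('aes-256-gcm', key, nonce);
  decipher.setAuthTag(sealed.subarray(sealed.length - 16));
  return Buffer.concat([decipher.update(body), decipher.final()]);
}

// Performs one rotation. `prfResults.first` must be the PRF output for
// account.currentInput and `prfResults.second` the output for
// account.pendingInput. Returns the updated account record.
function rotate(account, prfResults) {
  const oldKey = Buffer.from(prfResults.first);
  const newKey = Buffer.from(prfResults.second);
  const plaintext = decrypt(oldKey, account.sealedData);
  return {
    sealedData: encrypt(newKey, plaintext),
    currentInput: account.pendingInput,       // pending becomes primary
    pendingInput: nodeCrypto.randomBytes(32), // fresh input for next time
  };
}
```

In a real deployment the decryption and re-encryption would happen on the client, where the PRF outputs are available, and only the PRF inputs and the ciphertext would be kept by the server.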
The decision of which style of PRF evaluation makes sense for a given service has to be made in light of the security needs of that service. Each additional step adds a meaningful amount of complexity and so there’s no uniformly applicable advice about which is appropriate.
PRF inputs from WebAuthn, and WebAuthn-like APIs, are prefixed with
the string "WebAuthn PRF"
, followed by a zero byte, and
hashed with SHA-256 before being used. This means that there are two
layers of “access” to the PRF. An API that allows PRF inputs to be
specified without hashing has more authority than WebAuthn, because it
can evaluate the PRF at all the inputs that WebAuthn can, plus it can
evaluate the PRF at inputs that WebAuthn cannot express.
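The transformation just described is simple enough to write down. This sketch uses Node’s crypto module, and `hashPrfInput` is a name made up here. It shows what the platform does to each WebAuthn PRF input before the underlying PRF is evaluated.

```javascript
const nodeCrypto = require('node:crypto');

// Computes SHA-256("WebAuthn PRF" || 0x00 || input), which is the value
// that is actually used as the PRF input for a WebAuthn request.
function hashPrfInput(input) {
  return nodeCrypto.createHash('sha256')
      .update('WebAuthn PRF', 'utf8')
      .update(Buffer.from([0]))
      .update(Buffer.from(input))
      .digest();
}

const hashedInput = hashPrfInput(new Uint8Array([1, 2, 3, 4]));
```

An API that skips this step can pass any 32 bytes it likes, including values that are not the SHA-256 hash of anything with that prefix, which is why it has strictly more authority.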
PRF inputs sent over CTAP2 and hybrid (the protocol used between
computers and phones) are already hashed. So applications that have
direct CTAP2 access can choose to use PRF inputs that are inexpressible
in WebAuthn. But this also causes problems when CTAP2 requests are sent
over hybrid and need to be translated back into WebAuthn-like requests,
because the hashing of PRF inputs cannot be undone. Thus credential
providers on Android might see a prfAlreadyHashed
extension. See here.
The prf
extension is implemented for security keys by
using a different extension: hmac-secret
; security keys
don’t implement prf
directly.
When creating a credential, the hmac-secret
extension
only takes a boolean parameter to specify whether HMAC support is
requested or not. Because of this it’s not possible to evaluate any PRFs
at creation time when using a security key, and that’s why creation time
evaluation is optional in WebAuthn.
At assertion time, in order to protect data when it is on the USB bus, or transmitted to an NFC security key, the PRF inputs and results are encrypted using the same Elliptic Curve Diffie–Hellman derived key as described in the section on user verification. (See here.)
Those hmac-secret
inputs simply specify the input(s) for
evaluation, i.e. there is no equivalent of WebAuthn’s
evalByCredential
field. So to implement
evalByCredential
with security keys, platforms have to
probe the possible credentials from allowCredentials
to
find a match, and only then select the PRF inputs for the final
authenticatorGetAssertion
command.
The hmac-secret
extension also defines that two PRFs are
created per credential. Which PRF is used is determined automatically
based on whether user verification was done for a request. These dual
PRFs are not exposed through WebAuthn, however. Rather, WebAuthn defines
that when two PRFs are present in an authenticator, the user
verification PRF must always be used. Thus, if a WebAuthn request
includes PRF evaluation, user verification will be done with security
keys if they support it, irrespective of the
userVerification
parameter in the WebAuthn request
itself.
Because the prf
extension is implemented this way on
security keys, you’re actually able to see hmac-secret
appearing in the authenticator data when getting an assertion. We can
take a look at the authenticator data from a security key operation
after requesting a PRF evaluation:
# Hash of RP ID
26bd7278be463761f1faa1b10ab4c4f82670269c410c726a1fd6e05855e19b46
# Flags: ED + UV + UP
85
# Signature counter
00000023
# Because the ED flag is set, the remainder is extension data.
# This is CBOR-encoded.
a16b686d61632d7365637265745840a2
0f1e5cd69d341c5e98fe1f2e90834a10
d1df55f835f45e69f2e53650bc3c579a
08d3919308582656a4658c876d1521f6
d703a63a55db81ad9c64b94808a454
The CBOR-encoded extension data decodes as
{"hmac-secret": h'A20F1E5CD69D341C5E9...'}
. The payload is
64 bytes long because two PRF inputs were sent, and two 32-byte outputs
thus require 64 bytes. The result is encrypted with an ephemeral key
that isn’t accessible outside of the platform/browser, so neither
JavaScript nor the server can learn anything about the PRF results from
this. It’s mentioned here to connect different concepts rather than for
any practical use.
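For readers who want to check the breakdown above, the fixed-size header of the authenticator data can be pulled apart with a few lines of JavaScript. This is only a sketch: the function names are invented here, and it does not handle attested credential data, which would sit before the extensions when the AT flag (0x40) is set.

```javascript
// Converts a hex string into a Uint8Array.
function hexToBytes(hex) {
  return Uint8Array.from(hex.match(/../g), (pair) => parseInt(pair, 16));
}

// Splits authenticator data into its fixed header fields and the remainder.
function parseAuthenticatorData(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const flags = bytes[32];
  return {
    rpIdHash: bytes.subarray(0, 32),
    userPresent: (flags & 0x01) !== 0,
    userVerified: (flags & 0x04) !== 0,
    hasAttestedCredentialData: (flags & 0x40) !== 0,
    hasExtensionData: (flags & 0x80) !== 0,
    signCount: view.getUint32(33), // DataView reads big-endian by default
    rest: bytes.subarray(37),
  };
}

// The example authenticator data from above, joined into one string.
const exampleHex =
    '26bd7278be463761f1faa1b10ab4c4f82670269c410c726a1fd6e05855e19b46' +
    '85' +
    '00000023' +
    'a16b686d61632d7365637265745840a2' +
    '0f1e5cd69d341c5e98fe1f2e90834a10' +
    'd1df55f835f45e69f2e53650bc3c579a' +
    '08d3919308582656a4658c876d1521f6' +
    'd703a63a55db81ad9c64b94808a454';

const parsed = parseAuthenticatorData(hexToBytes(exampleHex));
```

Here `parsed.signCount` is 35 (0x23), and `parsed.rest` holds the 79 bytes of CBOR: a 15-byte prefix naming `hmac-secret` and giving the length, followed by the 64-byte encrypted payload.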
Security keys can get lost. If someone picks up your lost security key in the car park, how much can they learn about you?
They have physical access to the security key, so they can send
authenticatorGetAssertion
commands to it and query for
discoverable credentials associated with any RP ID. Should security keys
better protect this information?
Perhaps, but maybe you also want to use a security key like an access card, where you tap it on an NFC reader and a door unlocks for you. That inherently requires that the security key disclose the existence of credentials to any NFC reader that gets close to it.
The credProtect
extension exists to let the privacy
level of a credential be specified at creation time. It only applies to
security keys, since phones and laptops have screen locks to protect the
information on them already.
Note: Security keys may disclose the existence of a discoverable credential, its credential ID, and its user ID without user verification being performed. But they never disclose the user name or user display name without it. That’s why WebAuthn says that the user ID should not contain identifiable information.
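There are three levels of credential protection: “userVerificationOptional”, where the credential can be discovered and asserted without user verification; “userVerificationOptionalWithCredentialIDList”, where, without user verification, the credential can only be used by a caller that already knows its credential ID; and “userVerificationRequired”, where the credential can only be discovered or used after user verification.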
If you want to use a credential like an access card then it must use “userVerificationOptional”. Otherwise, you might want a more restrictive value. Security keys can enforce a higher level of privacy for all credentials, but most don’t.
Tucking this away into an extension is a little obscure and so some platforms will set a higher default. Chromium (and thus Chrome and Edge) will default to at least “userVerificationOptionalWithCredentialIDList” whenever a WebAuthn creation request requires or prefers a discoverable credential.
Additionally, if a discoverable credential is required, and
userVerification
is set to preferred
(which is
the default in WebAuthn), then Chromium will set the credProtect level
to “userVerificationRequired”. This protects users when a site accepts
both user-verified and non-verified assertions. That’s a reasonable
thing to do in the context of a platform authenticator where access is
protected by a screen lock, even if user verification isn’t done for a
specific assertion. But with a security key, a discoverable credential
with optional user verification may mean that finding a lost security
key grants immediate access to the owner’s accounts.
We can observe this happening because the security key will echo the
credProtect
setting in the authenticator data. When
inspecting the authenticator data from a creation request made in Edge
with requireResidentKey
set, we see the ED (Extension Data)
flag set. After skipping over the attested credential data, the
extensions decode as the following CBOR map:
{"credProtect": 3}
. So Edge has set this extension for us,
based on the parameters. The value “3” corresponds to
“userVerificationRequired” as, over CTAP2, the three protection levels
are simply numbered 1, 2, and 3.
A WebAuthn request can override these defaults if it wishes:
options.extensions = {
  credentialProtectionPolicy: "userVerificationOptional",
  enforceCredentialProtectionPolicy: false,
};
The enforceCredentialProtectionPolicy
can be set to
true
to require that a credential only be created if the
specified protection policy can be implemented. So it excludes security
keys that don’t implement the credProtect
extension. (For
this purpose, platform authenticators are always considered to be
sufficiently privacy-preserving.)
After setting the extensions shown above, and doing a creation in
Edge, the resulting extensions in the authenticator data decode as
{"credProtect": 1}
. So indeed, the explicit extension
overrode the defaults.
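If you inspect authenticator data like this on a server, a small lookup table turns the CTAP2 numbers back into the WebAuthn policy names. The table below reflects the numbering described above, and the names `CRED_PROTECT_LEVELS` and `credProtectPolicyName` are made up for this sketch.

```javascript
// CTAP2 credProtect levels and the WebAuthn policy names they correspond to.
const CRED_PROTECT_LEVELS = {
  1: 'userVerificationOptional',
  2: 'userVerificationOptionalWithCredentialIDList',
  3: 'userVerificationRequired',
};

// Returns the policy name for a credProtect level from authenticator data,
// or null if the level is not one of the three defined values.
function credProtectPolicyName(level) {
  return CRED_PROTECT_LEVELS[level] ?? null;
}
```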
The credBlob
extension allows at least 32 bytes of
arbitrary data to be stored with a credential. The data is set once,
when the credential is created, and can be read when it’s asserted. At
the time of writing no platform authenticators implement this, only some
security keys.
To attempt to set a blob, just set the extension to an
ArrayBuffer
containing the blob’s data:
options.extensions = {
  credBlob: new TextEncoder().encode("credBlob contents").buffer,
};
Note: If the contents of the blob are sensitive, combine this
extension with credProtect
to ensure that user verification
is required for the credential to be asserted.
Just because this extension was set doesn’t mean that the security key supports storing a blob, so the extension results have to be checked after a successful creation to see whether the blob was stored:
const extensionOutputs = credential.getClientExtensionResults();
const credBlobStored =
'credBlob' in extensionOutputs &&
extensionOutputs.credBlob === true;
Later, when asserting the credential, the credBlob data can be requested:
options.extensions = { getCredBlob: true };
And the contents, if any, will be found in the extension outputs:
const extensionOutputs = credential.getClientExtensionResults();
let credBlob = 'getCredBlob' in extensionOutputs ?
    (new TextDecoder().decode(extensionOutputs.getCredBlob)) :
    undefined;
(TextEncoder
and TextDecoder
are used here
only for illustration. The contents of a credBlob can be arbitrary
binary data.)
However, this extension is not suitable for storing secret keys
because the contents end up in the authenticator data, which has to be
sent to the server for the assertion signature to be validated. If we
look at the authenticator data that resulted from the assertion request
above, the extensions within decode as
{"credBlob": h'63726564426C6F6220636F6E74656E7473'}
, and
that hex string is the UTF-8 encoding of “credBlob contents”. Not very
secret! To store secret keys, use the PRF extension (see
here).
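To see just how readable that is, the hex string from the authenticator data can be decoded in a couple of lines. This is a minimal sketch, and `hexToBytes` is a helper written for it rather than a built-in.

```javascript
// Converts a hex string into a Uint8Array.
function hexToBytes(hex) {
  return Uint8Array.from(hex.match(/../g), (pair) => parseInt(pair, 16));
}

// The byte string from the credBlob entry in the authenticator data above.
const credBlobBytes = hexToBytes('63726564426C6F6220636F6E74656E7473');
const credBlobText = new TextDecoder().decode(credBlobBytes);
```

Anyone who can observe the authenticator data, including the server and any script on the page, can do the same.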
This extension requests that a security key report its configured minimum PIN length. In an enterprise environment, this can be used to enforce that a minimum length policy is in effect for all created credentials.
The minimum PIN length can be configured on some security keys using
the authenticatorConfig
CTAP2 command (see
here). The minimum can be raised but
cannot be reduced without resetting the security key, which erases all
credentials. So the minimum length reported during credential creation
will be in effect for the lifetime of that credential.
To request that the minimum PIN length be reported during a
credential creation, just set the minPinLength
extension:
options.extensions = { minPinLength: true };
The minimum will only be reported if the RP ID has been previously
configured via the authenticatorConfig
command. The result
can be found in the extensions block of the authenticator data under the
key minPinLength
. The result must be taken from the
authenticator data because this extension typically has to be combined
with attestation (see chapter 7) to meet compliance requirements, and
only the authenticator data is signed by the attestation private
key.
Sometimes the entity that creates a credential is not the entity that verifies assertions from it, and communication between those two entities may be difficult or impossible. The classic case for this is air-gapped systems, where authority to access them may be issued centrally, but has to be checked by a system that cannot communicate with that central authority.
Typically the central authority will sign a public key with a certificate. The certificate specifies what the holder of the corresponding private key is permitted to access. Such a private key can be stored in a security key, but where are we going to put the certificate? That’s what the largeBlob extension is for.
Keep in mind that while this extension does have the word “large” in the name, security keys are embedded devices and the term is relative. The amount of storage available on a security key that supports this extension is only guaranteed to be 1024 bytes.
A largeBlob can only be read and written during assertions, meaning that it cannot be written during creation. But you don’t get the public key to put in the certificate until the creation is complete, so that shouldn’t be a constraint.
You won’t see largeBlob
appearing in the authenticator
data extensions. Given the size, the implementation in security keys is
a bit more complex than that.
Security keys that support largeBlob expose a single storage extent to the platform. That storage can only be read or written completely, albeit in a streaming fashion to satisfy message buffer limits. The largeBlob values for every credential on the security key have to fit within that storage and the platforms perform a read/update/write pattern in order to update any part of it.
The storage extent is formatted as a CBOR array by the platforms.
Each element of the array is a CBOR map that contains a compressed and
encrypted largeBlob value. The encryption key is returned by the
authenticator when a credential is asserted, thus, while platforms can
read the security key’s storage at will, they cannot learn the contents
of a largeBlob without its corresponding encryption key, which requires
them to have successfully asserted the credential. If
credProtect
is set on the credential (see
here) then the platform can be required to
know the credential ID, or to complete user verification, before that
occurs.
The largeBlob storage does not include any credential IDs because doing so would render credProtect’s “userVerificationOptionalWithCredentialIDList” policy moot. Instead platforms trial-decrypt each element of the CBOR array in order to find the correct largeBlob entry for a credential once they have the key.
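The trial-decryption idea can be sketched as follows, using Node’s crypto and zlib modules. Treat the details as assumptions made for illustration: the entry layout (`nonce`, `ciphertext`, `origSize`), the use of AES-256-GCM and raw DEFLATE, and the function names are simplified stand-ins, and the real CTAP2 encoding has specifics that are not reproduced here.

```javascript
const nodeCrypto = require('node:crypto');
const zlib = require('node:zlib');

// Compresses and encrypts a value into a (simplified, hypothetical) entry.
function sealEntry(key, value) {
  const nonce = nodeCrypto.randomBytes(12);
  const cipher = nodeCrypto.createCipheriv('aes-256-gcm', key, nonce);
  const ciphertext = Buffer.concat([
    cipher.update(zlib.deflateRawSync(value)),
    cipher.final(),
    cipher.getAuthTag(),
  ]);
  return {nonce, ciphertext, origSize: value.length};
}

// Tries the key against every entry and returns the first value that
// decrypts and decompresses correctly, or null if none does.
function findLargeBlob(key, entries) {
  for (const entry of entries) {
    const split = entry.ciphertext.length - 16;
    const decipher =
        nodeCrypto.createDecipheriv('aes-256-gcm', key, entry.nonce);
    decipher.setAuthTag(entry.ciphertext.subarray(split));
    try {
      const compressed = Buffer.concat([
        decipher.update(entry.ciphertext.subarray(0, split)),
        decipher.final(),
      ]);
      const value = zlib.inflateRawSync(compressed);
      if (value.length === entry.origSize) return value;
    } catch {
      // Authentication failed: this entry belongs to another credential.
    }
  }
  return null;
}
```

Because an authenticated cipher is used, a failed decryption is a reliable signal that an entry belongs to some other credential, so no identifying label needs to be stored next to it.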
Platforms can, however, always see the number of largeBlobs stored by a security key, and their compressed and uncompressed sizes.
Before WebAuthn existed there was a U2F web API. Firefox supported it and Chrome shipped a hidden, internal extension by which it could be polyfilled. It has long been deprecated but had meaningful usage while it existed. It had an equivalent of relying party IDs called AppIDs, but AppIDs were origins, rather than domain names.
When WebAuthn was introduced, sites that had used this U2F API faced
a problem. All the credentials that they had registered were associated
with AppIDs, like https://example.com
. But WebAuthn used RP
IDs, like example.com
. Those two strings are different and
so security keys would consider them distinct, forever making all
AppID-based credentials inaccessible to WebAuthn. That would block a
transition to WebAuthn.
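Security keys only ever see a SHA-256 hash of the AppID or RP ID, so the mismatch is easy to demonstrate. This sketch uses Node’s crypto module, and the helper name `sha256Hex` is invented for it.

```javascript
const nodeCrypto = require('node:crypto');

// Returns the SHA-256 hash of a string as lowercase hex.
function sha256Hex(text) {
  return nodeCrypto.createHash('sha256').update(text, 'utf8').digest('hex');
}

// What a security key was given for a U2F registration ...
const appIdHash = sha256Hex('https://example.com');
// ... and what it is given for a WebAuthn request from the same site.
const rpIdHash = sha256Hex('example.com');

const sameIdentity = appIdHash === rpIdHash; // false
```

To the security key the two hashes are unrelated, so a credential registered under one is simply not found under the other.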
The appId
extension exists so that sites can express
that they may also have credentials registered to an AppID. The value of
the extension is the AppID that they used, and platforms have to
validate that they can request that AppID, similar to how they have to
validate RP IDs.
When doing an assertion that contains an appId
extension, platforms will check for any credentials that match an ID
from the allowCredentials
list, and then check all those
IDs again using the given AppID.
This gets complex when doing user verification with a PUAT (see
chapter 6) because using a PUAT will bind it to the RP ID or AppID
requested. Naively, platforms would have to request two PUATs, and get
the user to complete user verification twice. However, any credential
created with an AppID must have been created with the old U2F web API.
That API didn’t support user verification and didn’t support
credProtect
, therefore any credential with an AppID can be
probed without user verification. So platforms actually probe for each
credential ID with the AppID first, and then probe for each ID again
with an RP ID and with the PUAT.
The appIdExclude
extension has the same shape but
applies at creation time. It specifies that one or more of the
credential IDs in excludeCredentials
may have been created
with an AppID rather than an RP ID. The processing complexity is
similar.
At this time, these extensions are only of historical interest because they are only useful if a site needs compatibility with credentials created with the old U2F web API. The set of sites that ever used that API is small, and no new site will ever have had an AppID. But these extensions are mentioned here because they appear in modern API references, and so they should be explained.
Security keys are great, but realistically regular people are not going to carry one around with them. But people do carry around their phones. Wouldn’t it be great if a phone could work like a security key?
That’s what the hybrid transport is for. It allows phones or tablets to communicate with laptops and desktops in a way that shows that they are physically proximal. The evidence of proximity is the key factor here: there are plenty of “sign in by scanning this QR code” schemes in use today that simply work across the internet, and they’re phishable. An attacker can request a sign-in QR code from the legitimate site, display it on their own site and, if they can persuade a user to scan it with their phone, that user will be authorizing the attacker’s computer.
In contrast, a security key requires that a computer be physically connected to its USB connector, or within NFC range, thus no remote attack is possible. (Unless the computer itself is compromised.)
Sadly, NFC is rare on laptops, but Bluetooth support is common. Could we have the phone and laptop communicate via Bluetooth to ensure proximity? Unfortunately this didn’t turn out to be practical. Measurements of this scheme in the real world showed an unacceptably high rate of Bluetooth communication failures. But broadcasting a single Bluetooth message from the phone to establish proximity, and running the rest of the communication over the internet, did work acceptably well. This scheme was called “hybrid” (Bluetooth + internet).
Note: During development, this transport was called caBLE, for Cloud Assisted Bluetooth Low Energy. That was a cute name, but calling a wireless protocol “caBLE” caused some confusion and thus it was changed. But you might still see the old name hanging around as caBLE reached version 2.1 before being renamed.
The device making a WebAuthn request, usually a laptop, will display a QR code if it wants to use the hybrid transport. That QR code contains a CBOR map that contains the following values:
If the user scans the QR code and triggers the operation on the phone, then the phone connects to its tunnel service. This is some service on the internet that is willing to relay messages between laptops and phones as part of this protocol. It’s run by whoever is implementing the phone side of the protocol, so iPhones use a tunnel service operated by Apple and Android phones (at least if they’re using Google’s Play Services) use one operated by Google. The phone asks the tunnel service to wait for a connection from the laptop, which will be identified by a long, random ID.
Then the phone starts broadcasting a Bluetooth Low Energy (BLE) advert to tell the laptop that it’s ready. Adverts are small messages used by BLE devices to advertise their supported services. Importantly, they can be sent and received without needing to do a Bluetooth pairing between the devices. The BLE advert sent by the phone advertises a service number assigned to the FIDO Alliance (0xfff9) and the advert can include a small (20-byte), service-specific payload.
Twenty bytes is not a lot! For this protocol it’s split into 16 bytes of encrypted message and four bytes of authentication tag. Both are keyed based on the secret key that was in the QR code and thus is shared by the two devices. The authentication tag means that two different hybrid transactions happening within BLE range of each other are very unlikely to interfere because the tag will only be valid for one of them.
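As a rough illustration of that layout, here is a sketch that splits a 20-byte payload and compares the tag. It assumes that the 16 encrypted bytes come first and the four tag bytes last, and it leaves out the keyed computation of the expected tag entirely. Both function names are hypothetical.

```typescript
// Split a 20-byte hybrid BLE advert payload into its two parts.
// Assumption: the 16 encrypted bytes come first, followed by the 4-byte tag.
function splitAdvert(payload: Uint8Array): { ciphertext: Uint8Array; tag: Uint8Array } {
  if (payload.length !== 20) {
    throw new Error("expected 20 bytes, got " + payload.length);
  }
  return { ciphertext: payload.slice(0, 16), tag: payload.slice(16, 20) };
}

// Compare the received tag with the tag computed from the QR secret.
// Computing `expectedTag` (a keyed MAC) is out of scope for this sketch.
// The loop avoids returning early so that timing does not depend on
// where the first mismatch occurs.
function tagMatches(received: Uint8Array, expectedTag: Uint8Array): boolean {
  if (received.length !== expectedTag.length) return false;
  let diff = 0;
  for (let i = 0; i < received.length; i++) {
    diff |= received[i] ^ expectedTag[i];
  }
  return diff === 0;
}
```

A laptop that sees an advert whose tag does not match simply ignores it, since it most likely belongs to some other transaction nearby.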
As soon as the laptop displays a QR code, it starts listening for a matching BLE advert. When it receives one with a correct authentication tag, it decrypts it. The resulting 16 bytes specify the domain of the phone’s tunnel server and the laptop connects to it. By knowing the contents of the QR code and the BLE advert, the laptop can calculate the random ID that the phone told the tunnel server to expect and now the tunnel server can relay messages between the two devices.
Note: A domain name often won’t fit in a 16-byte message and so the tunnel server’s domain isn’t included directly. Rather a 16-bit field is used. The first 256 values specify pre-defined tunnel server domains (of which only two values have been defined so far). The remaining values are hashed to generate a random-looking domain name. So, to set up your own tunnel service, either get its name into the FIDO Alliance specification as a pre-defined name, or else register one of the random-looking domains generated by the hash function.
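The two ranges of that 16-bit field can be sketched as below. The type and function names are hypothetical, and the hash that turns the larger values into a domain name is deliberately not shown because this chapter doesn’t specify it.

```typescript
// Hypothetical result type: either an index into the list of pre-defined
// tunnel server domains, or a value that must be hashed into a domain name.
type TunnelDomainRef =
  | { kind: "predefined"; index: number }
  | { kind: "hashed"; value: number };

function classifyTunnelField(field: number): TunnelDomainRef {
  if (Math.floor(field) !== field || field < 0 || field > 0xffff) {
    throw new RangeError("tunnel field must fit in 16 bits");
  }
  // The first 256 values name pre-defined domains; the rest are hashed.
  if (field < 256) {
    return { kind: "predefined", index: field };
  }
  return { kind: "hashed", value: field };
}
```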
We don’t want the tunnel server to be able to see the contents of any of the messages, thus the two devices run a cryptographic handshake. During this protocol the laptop proves to the phone that it holds the private key corresponding to the public key in the QR code, and that it received the BLE advert and thus is in Bluetooth range. Also, the phone proves to the laptop that it knows the secret key from the QR code.
The two devices can now exchange encrypted messages. The phone is
convinced that the device on the other end of the connection received a
BLE message that it broadcast, and thus is physically close. CTAP2 is
used between the two devices and, in order to save a round-trip, the
phone preemptively sends the result of the
authenticatorGetInfo
command (see
here) so that the laptop can immediately send an
authenticatorMakeCredential
or
authenticatorGetAssertion
command. The messages are padded
up to the next multiple of 32 bytes to reduce the amount that a tunnel
server could learn from seeing the lengths of the messages
exchanged.
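To make the padding concrete, here is a minimal sketch. The rounding to a multiple of 32 bytes comes from the text above, but the encoding is an assumption made for illustration: zero bytes are appended and the final byte records how many were added. The function names are hypothetical.

```typescript
const PADDING_GRANULARITY = 32;

// Pad a message so that its length is a multiple of 32 bytes.
// Assumption for illustration: zero bytes are appended and the final byte
// records how many zero bytes were added, so the receiver can strip them.
function padMessage(message: Uint8Array): Uint8Array {
  // Always leave room for the trailing count byte.
  const paddedLength =
    (Math.floor(message.length / PADDING_GRANULARITY) + 1) * PADDING_GRANULARITY;
  const padded = new Uint8Array(paddedLength); // zero-filled
  padded.set(message, 0);
  padded[paddedLength - 1] = paddedLength - message.length - 1;
  return padded;
}

// Reverse of padMessage: read the count byte and drop the padding.
function unpadMessage(padded: Uint8Array): Uint8Array {
  if (padded.length === 0 || padded.length % PADDING_GRANULARITY !== 0) {
    throw new Error("padded length must be a positive multiple of 32");
  }
  const zeros = padded[padded.length - 1];
  if (zeros > padded.length - 1) {
    throw new Error("invalid padding count");
  }
  return padded.slice(0, padded.length - 1 - zeros);
}
```

In this sketch a message that is already a multiple of 32 bytes grows by a full block, because the count byte always needs room. Either way, an observer only learns the length rounded up to a 32-byte boundary.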
This protocol is a practical way for a phone to act as an authenticator, but it’s worth knowing about the limits of its threat model.
The protocol ensures that the device connected to the phone was able to learn the contents of a BLE advert that the phone broadcast. So one option for an attacker is to have a BLE receiver with internet access physically close to the victim, for example hidden in a busy coffee shop. The attacker can then email a QR code to the victim and, if the victim scans it, the BLE receiver can play the part of the laptop, proxy a challenge from an important site, and send a signature request to the phone. Hopefully the victim notices the UI on their phone and declines to authorize the operation, but some small fraction of people may fall for it.
Simpler still, an attacker could hide a BLE receiver somewhere near a poster with a QR code claiming “Free WiFi!” Again, the BLE receiver would need internet access, but could proxy a challenge and hope that the user ignores all the messages and signs in.
In both cases, from the phone’s perspective the transaction is indistinguishable from a legitimate sign-in request. If these attacks start happening in the real world, there are a couple of defenses. Firstly, the messaging on the phone can be sharpened to make people less likely to misunderstand what they’re authorizing. Secondly, the QR code already includes a timestamp and phones could start enforcing that it’s current. This will have false negatives because clocks are not always accurate, but it would force attackers to produce a fresh QR code—a “Free WiFi” poster would no longer suffice.
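The timestamp defense could look something like the following sketch. All of the parameters, including the maximum age and the clock-skew allowance, are illustrative assumptions rather than values from any specification.

```typescript
// Decide whether a QR code's timestamp is recent enough to accept.
// All parameters and thresholds here are illustrative assumptions.
function isQrTimestampFresh(
  qrTimestampSeconds: number,
  nowSeconds: number,
  maxAgeSeconds: number,
  allowedClockSkewSeconds: number
): boolean {
  const age = nowSeconds - qrTimestampSeconds;
  // A slightly negative age is tolerated because clocks are not always
  // accurate, which is also why some legitimate codes may be rejected.
  return (
    age >= -allowedClockSkewSeconds &&
    age <= maxAgeSeconds + allowedClockSkewSeconds
  );
}
```

A printed poster fails this check as soon as it is older than the permitted age, so an attacker would need a display that keeps generating fresh codes.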
While the hybrid transport isn’t perfect, these attacks are far more difficult and far less scalable than phishing attacks.
There is another part of this protocol that allows scanning the QR code to be skipped if the phone and laptop have interacted previously.
During a QR-initiated connection, the phone can optionally send information to the laptop that will allow the laptop to contact the phone again in the future. This includes a public key for the phone and an identifier that the laptop can send to the phone’s tunnel server in order to request a connection to that phone.
Note: Since the phone picks which tunnel server will be used, we can assume that the tunnel server knows some private way to contact the phone. So these connections are always triggered by the laptop connecting to the phone’s tunnel server. They are not triggered by the laptop broadcasting any kind of Bluetooth message.
Later, when the laptop wants to contact the phone again, it connects to the tunnel server and sends the phone’s identifier. Using that, the tunnel server figures out which phone to contact and establishes a connection with it, forwarding a small message from the laptop as it does so. But, even though there is a pre-established relationship between the phone and laptop, we always want to establish proximity for security reasons. So the phone starts broadcasting a BLE advert.
Both parties to the transaction already have a connection to the tunnel server, so the BLE advert doesn’t need to include the tunnel server’s domain again, it just needs to include random data so that it’s unpredictable. Once the laptop receives the BLE advert, it starts the cryptographic handshake over the already established tunnel and proves receipt to the phone. Now the devices once again have an encrypted tunnel established between them and no QR scanning was necessary.
While CTAP2 is used over the encrypted tunnel, it is a slight variant of the protocol. As discussed in chapter 6, a single WebAuthn request may generate many CTAP2 commands: in order to respect the memory limits of security keys, lists of credential IDs might need to be batched, and extensions such as largeBlob are implemented using a sequence of commands to manage the storage on the security key.
But smartphones do not have the tight resource limits of an embedded device. They also don’t want to disclose any information without the user authorizing it. So, when working over a hybrid connection, the phone really wants to receive a single CTAP2 command that contains everything.
A phone also doesn’t want to produce a PUAT to represent user verification (see here) because it wants to display the request to the user before asking the user for biometrics. At best, the phone would return a dummy PUAT without actually collecting user verification from the user and then do the actual user verification after processing the main command. But all that would consume round trips which, given that a hybrid connection runs over the internet, could cause a significant delay.
Thus the flavor of CTAP2 used over hybrid does not do any PUAT exchange, nor does it do any batching of credential IDs. Also, there is a special flavor of the largeBlob extension used over hybrid which looks just like the WebAuthn extension (but expressed in CBOR), and which avoids the many round-trips used to implement largeBlob on a security key. There is also a special version of the PRF extension, again looking like the WebAuthn version, where the evaluation points for all possible credentials are sent.
The PRF extension has another problem when going over hybrid. If you recall, the evaluation points are hashed before being sent to security keys, so the evaluation points sent over hybrid are already hashed. But when the request is received at the phone, the platform APIs typically expect unhashed inputs and would normally hash them again, which would result in the wrong value. But it is not possible for the phone to unhash the inputs as hash functions are, by their very nature, irreversible.
So, on Android, an extension called prfAlreadyHashed is synthesized for requests received over hybrid. It has the same shape as the regular PRF extension, but its evaluation points are already hashed.
These tweaks to CTAP2 suggest that it was probably not the correct protocol to use over hybrid. Instead, JSON-encoded WebAuthn requests and responses should have been used. A future revision of the protocol may thus change this.
Much of this book focuses on WebAuthn as implemented in browsers—its original context. But Android, iOS, macOS, and Windows all implement WebAuthn-inspired APIs that produce compatible signed messages. These are the platform APIs.
To use the WebAuthn-like API on Apple platforms, start by importing AuthenticationServices. (This book assumes that you’re using Swift. If you’re using Objective-C then everything is the same, modulo syntax, and you’re probably very used to translating from Swift at this point.)
Apple’s API is divided between one set of classes for handling
security keys and another for handling credentials on the local device
or over the hybrid transport. They have overlapping, but distinct, sets
of parameters. If you want to handle both in a given request (which will
often be the case) then you can pass one request of each type to the ultimate ASAuthorizationController.
To create credentials on the local device (or on another phone via scanning a QR code), start with:
let provider = ASAuthorizationPlatformPublicKeyCredentialProvider(
  relyingPartyIdentifier: "example.com")
let request = provider.createCredentialRegistrationRequest(
  challenge: Data([0]), // fine unless attestation is used
  name: "user.name",
  userID: Data("user.id".utf8))

let controller = ASAuthorizationController.init(
  authorizationRequests: [request])
controller.delegate = self
controller.presentationContextProvider = self
controller.performRequests()
(Make sure that you’ve read chapter 5, have configured the associated
domains for your RP ID, and have set the webcredentials
entitlement for your project.)
The names in the API mirror the WebAuthn parameters and so should be immediately clear. If not, see chapter 4.
Note: There’s no displayName
parameter. Apple platforms
only support display names for security key requests and never show them
in their UI.
In iOS 18, the request
object has a
requestStyle
property that can be set to
.conditional
to request that the creation happen silently.
This will only succeed soon after the user has filled a password from a
password manager into your app, but it lets you easily upgrade users
from passwords to passkeys. (See
here.)
There are also largeBlob
and prf
properties
to enable support for those extensions. See chapter 9.
You might notice that there’s no excludeList
mentioned
so far. The Apple API doesn’t support setting an exclude list, except
for security key requests. Thus it’s not possible to avoid overwriting
existing credentials! Perhaps Apple will address this in a future
revision but currently you’ll need to think carefully about the
implications of this.
As a common pattern across all the calls documented here,
controller
is passed two delegates (which can be the same
object). The presentationContextProvider
answers a
presentationAnchor
message which provides the
UIWindow
for the app:
class ExamplePresentationDelegateClass: NSObject,
    ASAuthorizationControllerPresentationContextProviding {
  var anchor: ASPresentationAnchor?
  ...
  func presentationAnchor(for controller: ASAuthorizationController)
      -> ASPresentationAnchor {
    return anchor!
  }
}
The delegate
object handles success and failure
callbacks from the controller:
class ExampleDelegateClass: NSObject,
    ASAuthorizationControllerDelegate {
  ...
  func authorizationController(
    controller: ASAuthorizationController,
    didCompleteWithAuthorization authorization: ASAuthorization
  ) {
    switch authorization.credential {
    case let registration as
        ASAuthorizationPublicKeyCredentialRegistration:
      let credID = registration.credentialID
      let clientDataJSON = registration.rawClientDataJSON
      let attestationObject = registration.rawAttestationObject!
      // Build the log line first so the string literal stays on one line.
      let summary = [credID, clientDataJSON, attestationObject]
        .map { $0.base64EncodedString() }
        .joined(separator: " ")
      Self.logger.log("success: \(summary)")
    default:
      Self.logger.error("unknown ASAuthorization type received")
    }
  }

  func authorizationController(
    controller: ASAuthorizationController,
    didCompleteWithError error: Error
  ) {
    Self.logger.error("failed: \(error)")
  }
}
Unfortunately, unlike with WebAuthn and the Android APIs, the authenticator data and public key aren’t exposed directly. Instead you must parse the CBOR in the attestation structure and extract the public key and authenticator data from the raw contents. See chapter 7 for details on how to do this, or seek a wrapper library that makes iOS more friendly in this respect.
Let’s have a look at the authenticator data returned after creating a credential in iCloud Keychain, the default passkey provider on Apple platforms. We’ll break down the hex-encoded data with comments:
# The hash of the RP ID
26bd7278be463761f1faa1b10ab4c4f82670269c410c726a1fd6e05855e19b46
# Flags: AT + BE + BS + UV + UP
5d
# Signature counter; always zero
00000000
# Attested credential data
# The AAGUID of iCloud Keychain
fbfc3007154e4ecc8c0b6e020557d7bd
# Credential ID length: 20 bytes
0014
# Credential ID
df46b51df21331fb23bbfa3e9622ae9fc92fc9ea
# Public key in COSE format.
a501020326200121582071f5ce7ba3e4
4960ddc7f7026e708fc98a835039aa58
97f0d3c80373f3759d542258209e5f26
ec5c054c5841ac4b331bd79b196f006b
7a75e8e5ad585947b4edd3a5ac
Since iCloud Keychain syncs credentials, the BE (Backup Eligible) and BS (Backup State) flags are both set. The signature counter is always zero because it would be implausible to synchronize a signature counter between devices. iCloud Keychain sets a distinct AAGUID so that it can be identified in account management UIs. (See here.)
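To show how those fixed-width fields line up, here is a sketch of a parser applied to the sample bytes above. It assumes that you have already extracted the authenticator data from the attestation object, and it leaves the COSE public key undecoded because that needs a CBOR parser. The helper and type names are hypothetical.

```typescript
// Hex helpers so that the sample above can be pasted in directly.
function hexToBytes(hex: string): Uint8Array {
  const out = new Uint8Array(hex.length / 2);
  for (let i = 0; i < out.length; i++) {
    out[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
  }
  return out;
}

function bytesToHex(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i++) {
    const h = bytes[i].toString(16);
    out += h.length === 1 ? "0" + h : h;
  }
  return out;
}

interface ParsedAuthData {
  rpIdHash: Uint8Array;
  flags: { up: boolean; uv: boolean; be: boolean; bs: boolean; at: boolean; ed: boolean };
  signCount: number;
  // The remaining fields are only present when the AT flag is set.
  aaguid?: Uint8Array;
  credentialId?: Uint8Array;
  // COSE public key followed by any extension data; decoding needs CBOR.
  rest?: Uint8Array;
}

function parseAuthenticatorData(data: Uint8Array): ParsedAuthData {
  if (data.length < 37) throw new Error("authenticator data too short");
  const flagsByte = data[32];
  const flags = {
    up: (flagsByte & 0x01) !== 0,
    uv: (flagsByte & 0x04) !== 0,
    be: (flagsByte & 0x08) !== 0,
    bs: (flagsByte & 0x10) !== 0,
    at: (flagsByte & 0x40) !== 0,
    ed: (flagsByte & 0x80) !== 0,
  };
  // Big-endian, unsigned 32-bit signature counter.
  const signCount =
    ((data[33] << 24) | (data[34] << 16) | (data[35] << 8) | data[36]) >>> 0;
  const parsed: ParsedAuthData = { rpIdHash: data.slice(0, 32), flags, signCount };
  if (flags.at) {
    if (data.length < 55) throw new Error("attested credential data truncated");
    const credIdLength = (data[53] << 8) | data[54];
    if (data.length < 55 + credIdLength) throw new Error("credential ID truncated");
    parsed.aaguid = data.slice(37, 53);
    parsed.credentialId = data.slice(55, 55 + credIdLength);
    parsed.rest = data.slice(55 + credIdLength);
  }
  return parsed;
}

// The sample bytes from above, joined into one string.
const sampleAuthData = hexToBytes(
  "26bd7278be463761f1faa1b10ab4c4f82670269c410c726a1fd6e05855e19b46" +
  "5d" +
  "00000000" +
  "fbfc3007154e4ecc8c0b6e020557d7bd" +
  "0014" +
  "df46b51df21331fb23bbfa3e9622ae9fc92fc9ea" +
  "a501020326200121582071f5ce7ba3e4" +
  "4960ddc7f7026e708fc98a835039aa58" +
  "97f0d3c80373f3759d542258209e5f26" +
  "ec5c054c5841ac4b331bd79b196f006b" +
  "7a75e8e5ad585947b4edd3a5ac"
);
const parsedSample = parseAuthenticatorData(sampleAuthData);
```

The flags byte 0x5d is 0x01 + 0x04 + 0x08 + 0x10 + 0x40, which is how the parser arrives at UP, UV, BE, BS, and AT all being set.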
To support creating a credential on a security key, add a security key credential provider too:
let provider =
  ASAuthorizationSecurityKeyPublicKeyCredentialProvider(
    relyingPartyIdentifier: "example.com")
let skRequest = provider.createCredentialRegistrationRequest(
  challenge: Data([0]),
  displayName: "user.displayName",
  name: "user.name",
  userID: Data("user.id".utf8))
skRequest.credentialParameters = [
  ASAuthorizationPublicKeyCredentialParameters.init(
    algorithm: ASCOSEAlgorithmIdentifier.ES256)
]
let controller = ASAuthorizationController.init(
  authorizationRequests: [request, skRequest])
controller.delegate = self
controller.presentationContextProvider = self
controller.performRequests()
This time displayName and credentialParameters are required to be set. Other properties that can be set are excludedCredentials, residentKeyPreference, attestationPreference, and userVerificationPreference. Although the names are slightly different, the meaning of all of these is the same as in WebAuthn.
Most of the time you’ll want to support creating a credential both
locally and on a security key and thus will create requests from both
types of provider and pass both requests when initializing the ASAuthorizationController.
When requesting a signature, the classes are again split between using security keys or using a local credential / showing a QR code.
func assertCredentialOnPlatform(anchor: ASPresentationAnchor) {
  self.authenticationAnchor = anchor
  let provider = ASAuthorizationPlatformPublicKeyCredentialProvider(
    relyingPartyIdentifier: "example.com")
  let request = provider.createCredentialAssertionRequest(
    challenge: Data("SHOULDBERANDOMVALUEFROMSERVER!".utf8))

  let controller = ASAuthorizationController.init(
    authorizationRequests: [request])
  controller.delegate = self
  controller.presentationContextProvider = self
  controller.performRequests()
}
The properties allowedCredentials, prf, and largeBlob are available on the request object, with the same meanings as in WebAuthn. Obviously the challenge value should be random and come from the server. The fixed value used in this example is only for illustration and a fixed value should never be used in real code.
The same pair of delegates is used and results are returned via a different subclass of ASAuthorization:
func authorizationController(
  controller: ASAuthorizationController,
  didCompleteWithAuthorization authorization: ASAuthorization
) {
  switch authorization.credential {
  case let assertion as ASAuthorizationPublicKeyCredentialAssertion:
    let credID = assertion.credentialID
    let userID = assertion.userID
    let clientDataJSON = assertion.rawClientDataJSON
    let signature = assertion.signature
    let authenticatorData = assertion.rawAuthenticatorData
    // Build the log line first so the string literal stays on one line.
    let summary = [credID, clientDataJSON, signature!, authenticatorData!]
      .map { $0.base64EncodedString() }
      .joined(separator: " ")
    Self.logger.log("success: \(summary)")
  default:
    Self.logger.error("unknown ASAuthorization type received")
  }
}
Recall from chapter 5 that, for Apple platforms, the client data’s
origin
is always a web origin (taken from the RP ID) even
when called from an app. For example, here’s the client data JSON from
the sample request above:
{
"type": "webauthn.get",
"challenge": "Y2hhbGxlbmdl",
"origin": "https://example.com"
}
You might have expected the origin to be something like ios:T7AYYU7S6A.com.YourApp, but Apple platforms don’t let you distinguish between apps and web origins!
The performRequests
call also has a useful option to
control whether UI is shown or not:
controller.performRequests(
  options: .preferImmediatelyAvailableCredentials)
When preferImmediatelyAvailableCredentials is given, requests that don’t have any matching local credentials will fail immediately and no UI will be shown. So this option lets you prompt for a passkey only if one exists. When a request fails for this reason, the error passed to authorizationController(controller:didCompleteWithError:) has code ASAuthorizationErrorCanceled and domain ASAuthorizationErrorDomain. That is the same error that you get if the user declines to use a passkey that does exist, although you can tell the two apart based on how fast the error occurs. (And based on the error message, although you can’t assume that will be stable.)
To accept a signature from a security key, as you generally should, a second request object is needed:
self.authenticationAnchor = anchor
let provider =
  ASAuthorizationSecurityKeyPublicKeyCredentialProvider(
    relyingPartyIdentifier: "example.com")
let skRequest = provider.createCredentialAssertionRequest(
  challenge: Data("SHOULDBERANDOMVALUEFROMSERVER".utf8))
// Omit setting this property to request a discoverable credential.
skRequest.allowedCredentials = [
  ASAuthorizationSecurityKeyPublicKeyCredentialDescriptor(
    credentialID: Data(base64Encoded: "AM==")!,
    transports: ["usb"])
]
let controller = ASAuthorizationController.init(
  authorizationRequests: [request, skRequest])
controller.delegate = self
controller.presentationContextProvider = self
controller.performRequests()
To use WebAuthn on Android, start by importing these libraries:
implementation("androidx.credentials:credentials:1.2.2")
implementation(
"androidx.credentials:credentials-play-services-auth:1.2.2")
(Although keep in mind that there’s likely a newer version of them that you should use by the time you’re reading this.)
Don’t forget to configure the assetlinks.json
for your
RP ID. See chapter 5.
Rather than having an API that mirrors WebAuthn, Android just uses JSON-encoded WebAuthn structures with any ArrayBuffers encoded using base64url. This is the same format that window.PublicKeyCredential.parseCreationOptionsFromJSON takes on the web, allowing a backend to generate this form of request and have it be easily consumed by web and Android frontends. (See here.)
val request = CreatePublicKeyCredentialRequest(
  requestJson = """
    {
      "challenge": "cmFuZG9tIGNoYWxsZW5nZQ",
      "rp": { "id": "example.com", "name": "example.com" },
      "user": {
        "id": "dXNlcmlk",
        "name": "name",
        "displayName": "displayName"
      },
      "pubKeyCredParams": [ {"type": "public-key", "alg": -7} ],
      "authenticatorSelection": {
        "requireResidentKey": true,
        "authenticatorAttachment": "platform"
      }
    }""",
)
val credentialManager = CredentialManager.create(requireContext())
coroutineScope.launch {
  try {
    val result = credentialManager.createCredential(
      context = requireActivity(),
      request = request,
    )
    when (result) {
      is CreatePublicKeyCredentialResponse -> {
        // This is a JSON-encoded response. See below for
        // an example.
        println(result.registrationResponseJson)
      }
      else -> {
        // Unknown response type.
      }
    }
  } catch (e: CreateCredentialException) {
    when (e) {
      is CreatePublicKeyCredentialDomException -> {
        when (e.domError) {
          is InvalidStateError -> {
            // Credential already exists
          }
          // Other error
        }
      }
      else -> {
        // Other error
      }
    }
  }
}
The only non-boilerplate above is the request JSON, and it should be familiar to you from chapter 4. The responses are also just JSON-encoded WebAuthn structures:
{
"rawId": "uGZDSrbiPsPDJ1gv1ebluA",
"id": "uGZDSrbiPsPDJ1gv1ebluA",
"authenticatorAttachment": "platform",
"type": "public-key",
"response": {
"clientDataJSON": "eyJ0eXBlIjoi...",
"attestationObject": "o2NmbXRkbm...",
"transports": ["internal", "hybrid"],
"authenticatorData": "Jr1yeL5GN2Hx-qGxCrTE-CZwJpxBDHJqH9bgWFXhm...",
"publicKeyAlgorithm": -7,
"publicKey": "MFkwEwYHKoZ..."
},
"clientExtensionResults": {
"credProps": {"rk": true}
}
}
The first two lines of the JSON might look odd but, if you recall from the WebAuthn chapter, the id field of WebAuthn’s PublicKeyCredential dictionary is the base64url-encoded credential ID, while the rawId field is the credential ID as an ArrayBuffer. Since, in the JSON form, ArrayBuffers are base64url-encoded, the JSON does indeed end up with two copies of the same value!
But otherwise there’s nothing new that you need to learn here due to the adherence to regular WebAuthn. Also note that the getPublicKey, getPublicKeyAlgorithm, and getAuthenticatorData helpers in WebAuthn have been turned into fields in the above JSON. So the public key is directly available in the more useful SPKI format.
Note: the JSON result is the same as
PublicKeyCredential.toJSON
will produce on the web.
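On the server, pulling the useful fields out of that JSON might look like the following sketch. It only decodes fields and checks that id and rawId agree. It does not validate the client data, the origin, or anything else, and the function and type names are hypothetical.

```typescript
// Decode unpadded base64url into bytes.
function base64urlDecode(text: string): Uint8Array {
  const alphabet =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";
  if (text.length % 4 === 1) throw new Error("invalid base64url length");
  const out: number[] = [];
  let buffer = 0;
  let bits = 0;
  for (let i = 0; i < text.length; i++) {
    const value = alphabet.indexOf(text.charAt(i));
    if (value < 0) throw new Error("invalid base64url character");
    // Keep only the low bits; at most 13 are ever pending.
    buffer = ((buffer << 6) | value) & 0xffff;
    bits += 6;
    if (bits >= 8) {
      bits -= 8;
      out.push((buffer >> bits) & 0xff);
    }
  }
  return Uint8Array.from(out);
}

// Hypothetical shape for the values a server would go on to store.
interface RegistrationFields {
  credentialId: Uint8Array;
  authenticatorData: Uint8Array;
  publicKeySpki: Uint8Array;
  publicKeyAlgorithm: number;
}

function readRegistrationResponse(json: string): RegistrationFields {
  const parsed = JSON.parse(json);
  if (parsed.type !== "public-key") throw new Error("unexpected type");
  // In the JSON form, id and rawId are two copies of the same value.
  if (typeof parsed.id !== "string" || parsed.id !== parsed.rawId) {
    throw new Error("id and rawId differ");
  }
  const response = parsed.response;
  if (
    !response ||
    typeof response.authenticatorData !== "string" ||
    typeof response.publicKey !== "string" ||
    typeof response.publicKeyAlgorithm !== "number"
  ) {
    throw new Error("missing response fields");
  }
  return {
    credentialId: base64urlDecode(parsed.id),
    authenticatorData: base64urlDecode(response.authenticatorData),
    publicKeySpki: base64urlDecode(response.publicKey),
    publicKeyAlgorithm: response.publicKeyAlgorithm,
  };
}
```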
Long fields have been elided in the example above, but here’s the decoded clientDataJSON:
{
"type": "webauthn.create",
"challenge": "cmFuZG9tIGNoYWxsZW5nZQ",
"origin": "android:apk-key-hash:wGsazqR2MsDW-DBK0TJQqBlYUK2MD59aPxzt5rl5Bsc",
"androidPackageName": "com.example.webauthn"
}
Note that the origin reflects that the caller was an app, not a website. The app is identified by the hash of the signing certificate but the package name is also available. Your server will need to be updated to recognize the app as legitimate.
(The value after apk-key-hash is the same SHA-256 signing-certificate hash as you put in your assetlinks.json file, but base64url-encoded rather than hex-encoded.)
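A server that already knows the hex fingerprint could derive the origin string it should expect with something like this sketch. The function names are hypothetical, and it assumes the fingerprint is given as hex, optionally separated by colons.

```typescript
const BASE64URL_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

// Encode bytes as unpadded base64url.
function base64urlEncode(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const b0 = bytes[i];
    const b1 = i + 1 < bytes.length ? bytes[i + 1] : 0;
    const b2 = i + 2 < bytes.length ? bytes[i + 2] : 0;
    out += BASE64URL_ALPHABET.charAt(b0 >> 2);
    out += BASE64URL_ALPHABET.charAt(((b0 & 0x03) << 4) | (b1 >> 4));
    if (i + 1 < bytes.length) {
      out += BASE64URL_ALPHABET.charAt(((b1 & 0x0f) << 2) | (b2 >> 6));
    }
    if (i + 2 < bytes.length) {
      out += BASE64URL_ALPHABET.charAt(b2 & 0x3f);
    }
  }
  return out;
}

// Convert a hex fingerprint (colons allowed) into the origin string that
// Android puts in the client data.
function androidOriginFromFingerprint(fingerprint: string): string {
  const hex = fingerprint.replace(/:/g, "");
  if (hex.length % 2 !== 0 || !/^[0-9a-fA-F]*$/.test(hex)) {
    throw new Error("invalid hex fingerprint");
  }
  const bytes = new Uint8Array(hex.length / 2);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
  }
  return "android:apk-key-hash:" + base64urlEncode(bytes);
}

// Accept either the web origin or an origin derived from a known app
// signing-certificate fingerprint.
function isExpectedOrigin(
  clientOrigin: string,
  webOrigin: string,
  fingerprints: string[]
): boolean {
  if (clientOrigin === webOrigin) return true;
  for (let i = 0; i < fingerprints.length; i++) {
    if (clientOrigin === androidOriginFromFingerprint(fingerprints[i])) {
      return true;
    }
  }
  return false;
}
```

Computing the expected origin from the same fingerprint list that feeds assetlinks.json keeps the two from drifting apart.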
Getting an assertion looks very similar, just with some different classes. Again, the request and response are just JSON-encoded WebAuthn structures:
val request = GetPublicKeyCredentialOption(
  requestJson = """
    {
      "challenge": "cmFuZG9tIGNoYWxsZW5nZQ",
      "rpId": "example.com"
    }
  """)
val credentialManager = CredentialManager.create(requireContext())
coroutineScope.launch {
  try {
    val result = credentialManager.getCredential(
      context = requireActivity(),
      request = GetCredentialRequest(listOf(request)),
    )
    when (val cred = result.credential) {
      is PublicKeyCredential -> {
        println(cred.authenticationResponseJson)
      }
      else -> {
        // Unknown response type.
      }
    }
  } catch (e: GetCredentialException) {
    println(e)
  }
}
Just for reference, here’s the resulting JSON. Again, none of the
fields should be surprising. (The clientDataJSON
has a
similar form to the one shown above.)
{
"rawId":"uGZDSrbiPsPDJ1gv1ebluA",
"id":"uGZDSrbiPsPDJ1gv1ebluA",
"authenticatorAttachment":"platform",
"type":"public-key",
"response":{
"clientDataJSON":"eyJ0eXBlIjoid2ViYXV...",
"authenticatorData":"Jr1yeL5GN2Hx-qGxCrTE-CZw...",
"signature":"MEUCIQDi0o8OOtUJQDKtFLBMU_Cnuycd...",
"userHandle":"dXNlcmlk"
},
"clientExtensionResults":{}
}
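Like the Apple platform API, Android supports a preferImmediatelyAvailableCredentials argument, which will cause the operation to return immediately if there are no local credentials available. (In recent versions of the library this argument is set on GetCredentialRequest rather than on GetPublicKeyCredentialOption, so check the version that you’re using.)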
Although non-browser applications use WebAuthn much less frequently on Windows than on mobile platforms, Windows does provide an API for it. We won’t dive into it in detail, but it is well explained in the header file provided by Microsoft.
Unlike the mobile platforms, there is no need to configure any files on your server to authorize the use of an RP ID, as the Windows API trusts applications to assert any RP ID.
Windows also enjoys the most complete security key support of any of the platform APIs, although the platform authenticator, Windows Hello, does not sync credentials at the time of writing.
This chapter will cover some of the details of a server-side implementation of WebAuthn. This helps you understand what’s going on, but doing everything yourself is not necessarily the best choice: when implementing the server side you may well be best served by using an existing library for WebAuthn support. However, I do claim that it’s perfectly practical to build support yourself if you wish.
This chapter will assume that you’re building a flow based on discoverable credentials because that’s the most common option. Some changes are needed if you want to build a purely 2nd-factor flow but, if you’ve read chapter 4, you should be well positioned to make those tweaks.
First, a brief checklist of basics to take care of:
example.com.
Presumably you already have a database table of users. Perhaps it contains salted and hashed passwords for authentication, and maybe phone numbers for SMS OTP. Passkeys aren’t just another column because, while an account can only have one password, an account can have multiple passkeys. Don’t make the mistake of limiting accounts to a single passkey!
So you need to have a separate table for passkeys with a foreign-key relation to the primary key used to identify accounts:
CREATE TABLE passkeys (
  cred_id BLOB PRIMARY KEY,
  username STRING NOT NULL,
  public_key_spki BLOB,
  backed_up BOOLEAN,
  /* You may also want creation_time, last_used_time, and perhaps
     aaguid columns. */
  FOREIGN KEY(username) REFERENCES users(username));
Recall from here that the user.id
in a credential creation request should be an opaque identifier for an
account, and from here that security keys
don’t consider this value to be sensitive, and thus it shouldn’t be possible
to identify the user from this value. You may already have a user ID in
your system, but are you sure that it doesn’t leak out anywhere else? If
so, perhaps it’s a reasonable value to use as the user ID. But otherwise
it might be safer to generate a fresh identifier for just this
purpose:
ALTER TABLE users ADD COLUMN passkey_id blob DEFAULT(randomblob(16));
/* The CASE expression causes the function to be non-constant. */
UPDATE users SET passkey_id=hex(randomblob(CASE rowid WHEN 0
THEN 16
ELSE 16 END));
(This SQL is just an example. You’ll need to adjust it for your specific environment.)
These SQL snippets assume that your table of passkeys is keyed by the credential ID, and that your user IDs are random. These are the correct choices for the majority of sites.
However, there are some cases where your users are split up. Perhaps as the result of an acquisition or the merging of disparate systems, you might need to know in which universe an account lives in order to be able to efficiently look it up. But when you get a WebAuthn assertion that contains a credential ID, the ID is random: you don’t know from which universe it came.
In this case, you can use structured user ID values. Because an assertion from a discoverable credential also returns the user ID, and that’s a value chosen by the server, you can encode whatever universe information you need in it.
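As a minimal sketch of such a structure, the user ID below carries a one-byte universe tag followed by 16 random bytes. The layout, sizes, and function names are illustrative assumptions, and the random bytes are passed in so that you can use whatever secure source your environment provides.

```typescript
// Build a user ID whose first byte names the "universe" an account lives in.
// The one-byte tag and 16 random bytes are illustrative choices.
function encodeUserId(universe: number, randomPart: Uint8Array): Uint8Array {
  if (Math.floor(universe) !== universe || universe < 0 || universe > 255) {
    throw new RangeError("universe must fit in one byte");
  }
  if (randomPart.length !== 16) {
    throw new Error("expected 16 random bytes");
  }
  const userId = new Uint8Array(17);
  userId[0] = universe;
  userId.set(randomPart, 1);
  return userId;
}

// Recover the universe tag from a user ID returned in an assertion.
function decodeUserId(
  userId: Uint8Array
): { universe: number; randomPart: Uint8Array } {
  if (userId.length !== 17) {
    throw new Error("unexpected user ID length");
  }
  return { universe: userId[0], randomPart: userId.slice(1) };
}
```

The tag reveals which universe an account is in but nothing about the user, which fits with the earlier advice that the user ID shouldn’t identify anyone.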
We won’t develop this possibility any further as it’s mentioned only for the handful of people who will find this hint useful.
When a user signs in with a password, you might want them to create a passkey on the local device for easier sign-in next time. First, check that a local platform authenticator exists and that the browser or mobile platform supports passkeys / conditional UI. (See here.)
Next, try conditionally creating a passkey; see here. (Conditional creation may also be available via platform APIs.) If this doesn’t work, you may want to prompt the user to create a passkey and use the traditional modal UI flow.
In both the conditional and modal cases, the server will need to send creation parameters to the client. Below the parameters are represented as a Javascript object for the purposes of exposition but note that both Android and, increasingly, the web support accepting JSON-encoded requests. (See here.) Unfortunately, the iOS API does not accept JSON, and so either your backend will need to produce a different style of output for any iOS apps, or else you’ll need to implement a converter in the app from JSON to the native API on that platform.
var createOptions : CredentialCreationOptions = {
  publicKey: {
    rp: {
      // The RP ID.
      id: "example.com",
      // This field is required to be set to something but is not
      // currently used by any implementations.
      name: "",
    },
    user: {
      // `userIdBase64` is the passkey_id field from the users table,
      // base64-encoded.
      id: Uint8Array.from(atob(userIdBase64), c => c.charCodeAt(0)),
      // `username` is the username field from the users table.
      name: username,
      // `displayName` can be a more human name for the user, or
      // just leave it blank.
      displayName: "",
    },
    // This lists the ids of the user's existing credentials. I.e.
    //   SELECT cred_id FROM passkeys WHERE username = ?
    // and supply the resulting list of values, base64-encoded, as
    // existingCredentialIdsBase64 here.
    excludeCredentials: existingCredentialIdsBase64.map(id => {
      return {
        type: "public-key",
        id: Uint8Array.from(atob(id), c => c.charCodeAt(0)),
      };
    }),
    // Boilerplate that advertises support for P-256 ECDSA and RSA
    // PKCS#1v1.5. Supporting these key types results in universal
    // coverage so far.
    pubKeyCredParams: [{
      type: "public-key",
      alg: -7
    }, {
      type: "public-key",
      alg: -257
    }],
    // Unused during registrations, except when doing attestation.
    // (But don't do this during sign-in!)
    challenge: new Uint8Array([0]),
    authenticatorSelection: {
      authenticatorAttachment: "platform",
      requireResidentKey: true,
    },
    // Five minutes.
    timeout: 300000,
  }
};
navigator.credentials.create(createOptions).then(
  handleCreation, handleCreationError);
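For platforms that accept JSON-encoded requests, the same parameters could be produced roughly as follows. This is a sketch rather than a definitive encoding: the function names are invented here, and it assumes that the binary fields are carried as unpadded base64url strings, as in the JSON forms defined by WebAuthn. On the web, an object like this can be handed to PublicKeyCredential.parseCreationOptionsFromJSON() where the browser supports it.

```javascript
// Converts standard base64 to the unpadded base64url used in the JSON forms.
function base64ToBase64url(b64) {
  return b64.replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

// Builds the creation parameters as a JSON-serializable object. The inputs
// are the same server-side values as in the Javascript object example.
function creationOptionsJSON(userIdBase64, username, existingCredentialIdsBase64) {
  return {
    rp: { id: "example.com", name: "" },
    user: {
      id: base64ToBase64url(userIdBase64),
      name: username,
      displayName: "",
    },
    excludeCredentials: existingCredentialIdsBase64.map(id => ({
      type: "public-key",
      id: base64ToBase64url(id),
    })),
    pubKeyCredParams: [
      { type: "public-key", alg: -7 },
      { type: "public-key", alg: -257 },
    ],
    // A single zero byte, base64url-encoded.
    challenge: "AA",
    authenticatorSelection: {
      authenticatorAttachment: "platform",
      requireResidentKey: true,
    },
    timeout: 300000,
  };
}
```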
When the promise from navigator.credentials.create resolves successfully, you have a newly created passkey! Now you have to ensure that it gets recorded by the server.
The promise will result in a PublicKeyCredential object, the response field of which is an AuthenticatorAttestationResponse. Call getAuthenticatorData() and getPublicKey() on response and send those ArrayBuffers to the server. (These fields also exist in the JSON output from the Android platform APIs but, sadly, not on iOS where they would have to be polyfilled.)
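A minimal sketch of collecting those two values on the web might look like this. The helper names and the output field names are assumptions; how the result is then sent to the server (for example, as the JSON body of a POST) is left open.

```javascript
// Base64-encodes an ArrayBuffer so that it can travel in a JSON body.
function bufferToBase64(buf) {
  let binary = "";
  for (const byte of new Uint8Array(buf)) {
    binary += String.fromCharCode(byte);
  }
  return btoa(binary);
}

// Extracts the values that the server needs from a newly created
// PublicKeyCredential.
function encodeCreationResult(cred) {
  const response = cred.response;
  const publicKey = response.getPublicKey();
  // getPublicKey() returns null if the browser doesn't understand the
  // key's algorithm.
  if (publicKey === null) {
    throw new Error("browser could not export the public key");
  }
  return {
    authenticatorData: bufferToBase64(response.getAuthenticatorData()),
    publicKeySpki: bufferToBase64(publicKey),
  };
}
```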
At the server, we want to insert a row into the passkeys table for this user. The authenticator data is a fairly simple, binary format (see here). Offset 32 contains the flags byte. Sanity check that bit 6 is set and then extract:
- Bit 4 of the flags byte, as the value for backed_up. (I.e. (authData[32] >> 4) & 1.)
- The credential ID, as the value for cred_id. (Its length is the big-endian, 16-bit value at offset 53, and the ID itself starts at offset 55.)
The ArrayBuffer that came from getPublicKey() is the value for public_key_spki. That should be all the values needed to insert the row.
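The parsing described above could be sketched as follows. The offsets follow the authenticator data layout (a 32-byte RP ID hash, the flags byte, a 4-byte counter, the 16-byte AAGUID, then a 2-byte credential ID length); the function name and the shape of the return value are assumptions.

```javascript
// Parses the parts of registration authenticator data that are needed for
// the passkeys table. `authData` is a Uint8Array (or Node.js Buffer).
function parseRegistrationAuthData(authData) {
  if (authData.length < 55) {
    throw new Error("authenticator data too short");
  }
  const flags = authData[32];
  // Bit 6 indicates that attested credential data is present.
  if (((flags >> 6) & 1) !== 1) {
    throw new Error("attested credential data missing");
  }
  // Big-endian, 16-bit credential ID length at offset 53.
  const credIdLen = (authData[53] << 8) | authData[54];
  if (authData.length < 55 + credIdLen) {
    throw new Error("credential ID truncated");
  }
  return {
    backedUp: ((flags >> 4) & 1) === 1,
    credId: authData.slice(55, 55 + credIdLen),
  };
}
```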
Note: Neither the user presence nor user verification bits are checked above. This works for conditional creation (see here) but might not be right for every deployment.
The promise from create() might also result in an exception. InvalidStateError is special and means that a passkey already exists for the local device. This is not an error, and no error will have been shown to the user. They’ll have seen a UI just like they were registering a passkey but the server doesn’t need to update anything.
NotAllowedError means that the user canceled the operation. Other exceptions mean that something more unexpected happened.
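One way to structure the rejection handler is to map the exception name onto an outcome first and decide what to show afterwards. The outcome strings below are invented for this sketch.

```javascript
// Classifies an exception from navigator.credentials.create().
function classifyCreationError(err) {
  switch (err && err.name) {
    case "InvalidStateError":
      // A passkey already exists on this device; nothing to record.
      return "already-registered";
    case "NotAllowedError":
      return "canceled";
    default:
      return "unexpected";
  }
}
```

A handleCreationError function could then, say, ignore the first two outcomes and log the third.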
The WebAuthn-family APIs on mobile platforms will have similarly structured errors. See chapter 11.
See chapter 8 for details on using conditional UI for signing in on the web, and chapter 11 for details of invoking the APIs on mobile platforms to do something similar.
One thing that all of these APIs will need is a challenge value. See here about picking challenge values. Otherwise, assuming that you’re using discoverable credentials, there aren’t any other inputs. (There couldn’t be because, at this point, you don’t know who the user is!)
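For the web API, a request with only a challenge might be built like this. The function name is an assumption, as is the choice to pass the RP ID explicitly rather than rely on the default.

```javascript
// Builds the options for a sign-in with discoverable credentials.
// `challenge` is a Uint8Array chosen by the server for this sign-in.
function buildSignInOptions(challenge) {
  return {
    publicKey: {
      challenge,
      rpId: "example.com",
      // An empty list means that any discoverable credential for this
      // RP ID may be used.
      allowCredentials: [],
    },
  };
}
```

The result would be passed to navigator.credentials.get(), with mediation: "conditional" added at the top level when using conditional UI.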
A successful response from all these APIs will include the:
- credential ID (rawId in WebAuthn’s result),
- authenticator data,
- client data JSON,
- signature.
Those values should be sent to the server for validation. At the server, first look up the passkey: SELECT username, public_key_spki, backed_up FROM passkeys WHERE cred_id = ? and give the credential ID value for matching.
The cred_id column is a primary key, so there can either be zero or one matching row(s). If there are zero rows then the user is signing in with a passkey that the server doesn’t know about (perhaps they deleted it). This is an error: reject the sign-in.
Otherwise, the server now knows the claimed username and public key. To validate the signature you’ll need to construct the signed data and parse the public key. The public_key_spki values from the database are stored in SubjectPublicKeyInfo format and most languages will have some way to ingest them. See chapter 13.
Your language’s crypto library should provide a function that takes a signature and some signed data and tells you whether that signature is valid for a given public key. For the signed data, calculate the SHA-256 hash of the client data JSON and append it to the contents of the authenticator data. If the signature isn’t valid, reject the sign-in.
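As an illustration, here is how that might look on a Node.js server using the built-in crypto module. The function name is an assumption. The sketch relies on both advertised key types (P-256 ECDSA and RSA PKCS#1 v1.5) signing with SHA-256, and on Node.js expecting DER-encoded ECDSA signatures by default, which is the encoding that WebAuthn assertions use.

```javascript
const nodeCrypto = require("node:crypto");

// Returns true if `signature` is valid for this assertion. All arguments
// are Buffers or Uint8Arrays; `publicKeySpki` is the stored public_key_spki.
function verifyAssertionSignature(publicKeySpki, authData, clientDataJSON, signature) {
  const publicKey = nodeCrypto.createPublicKey({
    key: Buffer.from(publicKeySpki),
    format: "der",
    type: "spki",
  });
  const clientDataHash =
    nodeCrypto.createHash("sha256").update(clientDataJSON).digest();
  // The signed data is the authenticator data followed by the hash.
  const signedData = Buffer.concat([Buffer.from(authData), clientDataHash]);
  return nodeCrypto.verify("sha256", signedData, publicKey, signature);
}
```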
But there are still a number of things that you need to check!
Parse the client data as UTF-8 JSON and check that:
- The type member is webauthn.get.
- The challenge member is equal to the base64url encoding of the challenge that the server gave for this sign-in.
- The origin member is equal to your site’s sign-in origin (e.g. a string like “https://www.example.com”), or is a recognised Android app.
- The crossOrigin member, if present, is false.
There’s more! Take the authenticatorData and check that:
- (authData[32] & 1) == 1. This is the user presence bit that indicates that a user approved the signature.
If all those checks work out, then sign in the user whose passkey it was. E.g. set a cookie and respond to the running Javascript so that it can update the page.
If the stored value of backed_up is not equal to (authData[32] >> 4) & 1 then update that in the database.
Note: The user verification bit isn’t checked above, but some sites might want to require user verification.
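The client data and user presence checks for a sign-in could be gathered into one function along these lines. The names are assumptions, as is the idea of passing in a list of acceptable origin strings (which could include the origin values of any recognised Android apps); like the text, this sketch does not check user verification.

```javascript
// Encodes bytes as unpadded base64url, as used for the challenge member.
function toBase64url(bytes) {
  let binary = "";
  for (const byte of bytes) {
    binary += String.fromCharCode(byte);
  }
  return btoa(binary).replace(/\+/g, "-").replace(/\//g, "_").replace(/=+$/, "");
}

// Returns null if all checks pass, otherwise a string describing the failure.
function checkAssertion(clientDataJSON, authData, expectedChallenge, allowedOrigins) {
  let clientData;
  try {
    const text = new TextDecoder("utf-8", { fatal: true }).decode(clientDataJSON);
    clientData = JSON.parse(text);
  } catch (e) {
    return "client data is not valid UTF-8 JSON";
  }
  if (typeof clientData !== "object" || clientData === null) {
    return "client data is not an object";
  }
  if (clientData.type !== "webauthn.get") return "wrong type";
  if (clientData.challenge !== toBase64url(expectedChallenge)) return "wrong challenge";
  if (!allowedOrigins.includes(clientData.origin)) return "unexpected origin";
  if ("crossOrigin" in clientData && clientData.crossOrigin !== false) {
    return "cross-origin request";
  }
  // Bit 0 of the flags byte is the user presence bit.
  if (authData.length < 37 || (authData[32] & 1) !== 1) {
    return "user presence not set";
  }
  return null;
}
```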
Once a user is using passkeys to sign in, great! But if they were upgraded from a password then that password is hanging around on the account, doing nothing useful yet creating risk. It would be good to ask the user about removing the password.
Doing this is reasonable if the account has a backed-up passkey. I.e. if SELECT 1 FROM passkeys WHERE username = ? AND backed_up = TRUE returns results. A site might consider prompting the user to remove the password on an account when they sign in with a passkey and have a backed-up one registered.
If you’ve used passkeys with any sites, then you’ll have noticed that they tend to list registered passkeys in their account settings, let users name each one, show the last used time, and let them be individually removed. If you want to do this then you’ll need to add more columns to the passkeys table that we sketched above. You might also want to record the AAGUID in order to automatically show where a passkey is stored.
Recall from here that the AAGUID is the first 16 bytes of the attested credential data that is inside the authenticator data when creating a credential. It will generally reveal which passkey provider created a given credential. (If it was a passkey provider and not a security key, that is.) If the AAGUID is not the all-zero value, then it might be one of the following values:
ID | Name |
---|---|
08987058-cadc-4b81-b6e1-30de50dcbe96 | Windows Hello |
0ea242b4-43c4-4a1b-8b17-dd6d0b6baec6 | Keeper |
17290f1e-c212-34d0-1423-365d729f09d9 | Thales PIN iOS SDK |
39a5647e-1853-446c-a1f6-a79bae9f5bc7 | IDmelon |
50726f74-6f6e-5061-7373-50726f746f6e | Proton Pass |
531126d6-e717-415c-9320-3d9aa6981239 | Dashlane |
53414d53-554e-4700-0000-000000000000 | Samsung Pass |
6028b017-b1d4-4c02-b4b3-afcdafc96bb2 | Windows Hello |
66a0ccb3-bd6a-191f-ee06-e375c50b9846 | Thales Bio iOS SDK |
771b48fd-d3d4-4f74-9232-fc157ab0507a | Edge on Mac |
8836336a-f590-0921-301d-46427531eee6 | Thales Bio Android SDK |
891494da-2c90-4d31-a9cd-4eab0aed1309 | Sésame |
9ddd1817-af5a-4672-a2b9-3e3dd95000a9 | Windows Hello |
adce0002-35bc-c60a-648b-0b25f1f05503 | Chrome on Mac |
b5397666-4885-aa6b-cebf-e52262a439a2 | Chromium Browser |
b84e4048-15dc-4dd0-8640-f4f60813c8af | NordPass |
bada5566-a7aa-401f-bd96-45619a55120d | 1Password |
cc45f64e-52a2-451b-831a-4edd8022a202 | ToothPic Passkey Provider |
cd69adb5-3c7a-deb9-3177-6800ea6cb72a | Thales PIN Android SDK |
d548826e-79b4-db40-a3d8-11116f7e8349 | Bitwarden |
dd4ec289-e01d-41c9-bb89-70fa845d4bf2 | iCloud Keychain (Managed) |
ea9b8d66-4d01-1d21-3ce4-b6b48cb575d4 | Google Password Manager |
f3809540-7f14-49c1-a8b3-8f813b225541 | Enpass |
fbfc3007-154e-4ecc-8c0b-6e020557d7bd | iCloud Keychain |
fdb141b2-5d84-443e-8a35-4698c205a502 | KeePassXC |
(This comes from https://github.com/passkeydeveloper/passkey-authenticator-aaguids, which may be more up to date by the time that you read this, and also contains icons for many of the providers.)
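A sketch of pulling the AAGUID out of registration authenticator data and mapping it to a provider name follows. It assumes that the attested credential data starts at offset 37, directly after the RP ID hash, flags byte, and counter. Only three entries from the table are included here, and the function names are made up.

```javascript
// A few entries from the table above; a real deployment would load the
// full, current list.
const KNOWN_AAGUIDS = new Map([
  ["ea9b8d66-4d01-1d21-3ce4-b6b48cb575d4", "Google Password Manager"],
  ["fbfc3007-154e-4ecc-8c0b-6e020557d7bd", "iCloud Keychain"],
  ["bada5566-a7aa-401f-bd96-45619a55120d", "1Password"],
]);

// Formats the AAGUID from registration authenticator data as a UUID string.
function aaguidOf(authData) {
  if (authData.length < 53) {
    throw new Error("authenticator data too short");
  }
  const hex = Array.from(
    authData.slice(37, 53), b => b.toString(16).padStart(2, "0")).join("");
  return [
    hex.slice(0, 8), hex.slice(8, 12), hex.slice(12, 16),
    hex.slice(16, 20), hex.slice(20),
  ].join("-");
}

// Returns a provider name, or null for unknown (or all-zero) AAGUIDs.
function providerNameOf(authData) {
  return KNOWN_AAGUIDS.get(aaguidOf(authData)) ?? null;
}
```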
For simpler sites it’s also perfectly valid to avoid all that complexity and have a “reset passkeys” button (like a “reset password” button). It would prompt for a new passkey registration (with no excludeCredentials listed), delete all other passkeys, and invalidate all other active sessions for the user. Unfortunately the deleted passkeys would still exist on the client side. Work is underway to add an API (called the signal API) that would let sites inform platforms during subsequent sign-ins that other passkeys have been invalidated, but it’s not ready at the time of writing.
There are several different public key formats that you’ll encounter around WebAuthn. This chapter will help you to recognize and handle them. Overwhelmingly, WebAuthn uses a signature scheme called ECDSA P-256, so we’ll discuss the different public key formats for this scheme.
ECDSA public keys are (x, y) coordinate pairs. The coordinates in P-256 are 256-bit numbers. The simplest public key format zero-pads the coordinates so that they’re each 32-byte, big-endian values, sticks them together, and prepends an 0x04 byte. This is X9.62 format.
You can recognize it because these public keys are always 65 bytes long and they start with an 0x04 byte. Although not commonly encountered directly in WebAuthn, raw X9.62 keys are found within the next format discussed.
Note that, technically, this is the uncompressed X9.62 format. Since the coordinate values are related by an equation, you can derive the y value given the x value, although the square root operation means you obtain a value that is either equal to y or -y.
Because of this, there’s also compressed X9.62 format where only the x coordinate is given and the leading byte is 0x02 or 0x03, depending on which of the two y values from the square root operation is the correct one. However, this is extremely rare in general. In WebAuthn, it only appears inside the QR code of a hybrid connection (see chapter 10) and hybrid is a feature implemented only by the platform.
X9.62 works great, but it’s nice for public key formats to be able to describe which signature scheme they apply to. There are other elliptic curves with 256-bit coordinates and it would be nice if the public keys for all these schemes weren’t mutually ambiguous.
So a SubjectPublicKeyInfo (SPKI) wraps a public key in ASN.1 that identifies a signature scheme. This is the format used inside of X.509 certificates, and so is quite widely supported. It’s also the format that WebAuthn’s getPublicKey() returns.
You can recognize it because it starts with an 0x30 byte. (Although that’s common to all ASN.1-based formats.) You can convert from X9.62 format to SPKI by prepending the following bytes:
3059301306072a8648ce3d020106082a8648ce3d030107034200
Those bytes include the algorithm identifier for ECDSA P-256 and the needed prefix such that the X9.62 bytes can follow, without any suffix. However, the format is sufficiently flexible that it would be unwise to try matching and removing that prefix to convert in the other direction. Since SPKI is widely supported, that shouldn’t be necessary anyway.
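The conversion from X9.62 to SPKI is then a matter of concatenation, as in this sketch (the function and constant names are assumptions):

```javascript
// The prefix given above, covering everything in a P-256 SPKI that comes
// before the X9.62 bytes.
const P256_SPKI_PREFIX_HEX =
  "3059301306072a8648ce3d020106082a8648ce3d030107034200";

// Wraps an uncompressed X9.62 P-256 public key in a SubjectPublicKeyInfo.
function x962ToSpki(x962) {
  if (x962.length !== 65 || x962[0] !== 0x04) {
    throw new Error("not an uncompressed P-256 X9.62 key");
  }
  const prefix = Uint8Array.from(
    P256_SPKI_PREFIX_HEX.match(/../g), h => parseInt(h, 16));
  const spki = new Uint8Array(prefix.length + x962.length);
  spki.set(prefix, 0);
  spki.set(x962, prefix.length);
  return spki;
}
```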
If you’re looking for functions in your favorite language to parse an SPKI, see java.security.spec.X509EncodedKeySpec in Java, System.Security.Cryptography.ECDsa.ImportSubjectPublicKeyInfo in .NET, or crypto/x509.ParsePKIXPublicKey in Go.
COSE is the “CBOR Object Signing and Encryption” framework and it’s the format used by public keys in the authenticator data. If you’re doing attestation (see chapter 7) or dealing with the Apple API (see chapter 11) then you’ll have to process this format. Unfortunately, it’s rare and is not commonly supported by general cryptographic libraries.
As the name suggests, it’s encoded with CBOR and, in the case of WebAuthn, that’ll be the CTAP2 subset of CBOR (see chapter 6). Technically speaking you can read RFC 8152 and figure out how to process a COSE public key, but the RFC is not straightforward. You’re better off looking at an example:
{
  1: 2,    # key type = elliptic curve
  3: -7,   # alg = ECDSA P-256
  -1: 1,   # curve = P-256
  # x and y coordinates
  -2: h'950F7AF17D9E...',
  -3: h'7A6B0654742C...',
}
The x and y coordinates in the CBOR are zero-padded and so, for P-256, must always be 32 bytes long.
You can recognize these keys because they’ll start with 0xa5 (for a CBOR map with five entries) and, as mentioned, they appear in the authenticator data. In order to convert to X9.62 format you can parse the CBOR, check that keys 1, 3, and -1 are present with the expected values, check that keys -2 and -3 are present with 32-byte values, then concatenate an 0x04 byte and the x and y values. Once you have X9.62 format, see above for how to convert to SPKI format if you need to.
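The conversion could be sketched as below. This is deliberately not a general CBOR parser: it assumes a five-entry map whose keys and integer values each fit in a single byte and whose coordinates are 32-byte byte strings, which is what a P-256 key looks like. The function name is an assumption, and any bytes after the key are ignored.

```javascript
// Converts a COSE-encoded P-256 public key to uncompressed X9.62 format.
function coseP256ToX962(cose) {
  let pos = 0;
  const next = () => {
    if (pos >= cose.length) throw new Error("truncated COSE key");
    return cose[pos++];
  };
  // Reads an integer in the range -24..23, which CBOR encodes in one byte.
  const readSmallInt = () => {
    const b = next();
    const major = b >> 5;
    const info = b & 0x1f;
    if (major > 1 || info > 23) throw new Error("unsupported CBOR item");
    return major === 0 ? info : -1 - info;
  };
  // Reads a 32-byte byte string, which has the header bytes 0x58 0x20.
  const readCoordinate = () => {
    if (next() !== 0x58 || next() !== 0x20) {
      throw new Error("expected a 32-byte string");
    }
    if (pos + 32 > cose.length) throw new Error("truncated COSE key");
    const out = cose.slice(pos, pos + 32);
    pos += 32;
    return out;
  };

  if (next() !== 0xa5) throw new Error("expected a five-entry map");
  const ints = new Map();
  const coords = new Map();
  for (let i = 0; i < 5; i++) {
    const key = readSmallInt();
    if (key === -2 || key === -3) {
      coords.set(key, readCoordinate());
    } else {
      ints.set(key, readSmallInt());
    }
  }
  if (ints.get(1) !== 2 || ints.get(3) !== -7 || ints.get(-1) !== 1) {
    throw new Error("not an ECDSA P-256 key");
  }
  if (!coords.has(-2) || !coords.has(-3)) {
    throw new Error("missing coordinate");
  }
  const x962 = new Uint8Array(65);
  x962[0] = 0x04;
  x962.set(coords.get(-2), 1);
  x962.set(coords.get(-3), 33);
  return x962;
}
```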