ImperialViolet

Playing with the Certificate Transparency pilot log (01 Aug 2013)

I've written about Certificate Transparency several times in the past, but now there's actually something to play with! (And, to be clear, this is entirely thanks to several of my colleagues, not me!) So I thought it would be neat to step through some simple requests of the pilot log.

I'm going to be using Go for this, because I like it. But it's substantially just JSON and HTTP and one could do it in any language.

In order to query a log you need to know its URL prefix and public key. Here's a structure to represent that.

// Log represents a public log.
type Log struct {
	Root string
	Key *ecdsa.PublicKey
}

For the pilot log, the URL prefix is https://ct.googleapis.com/pilot and here's the public key:

-----BEGIN PUBLIC KEY-----
MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEfahLEimAoz2t01p3uMziiLOl/fHT
DM0YDOhBRuiBARsV4UvxG2LdNgoIGLrtCzWE0J5APC2em4JlvR8EEEFMoA==
-----END PUBLIC KEY-----

Since it's a little obscure, here's the code to parse a public key like that. (This code, and much of the code below, is missing error checking because I'm just playing around.)

	block, _ := pem.Decode([]byte(pilotKeyPEM))
	key, _ := x509.ParsePKIXPublicKey(block.Bytes)
	pilotKey = key.(*ecdsa.PublicKey)
	pilotLog = &Log{Root: "https://ct.googleapis.com/pilot", Key: pilotKey}
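
For anything beyond playing around, the same steps with the error checking put back might look like the sketch below. parseLogKey is a name made up for this example and the error messages are arbitrary; the calls themselves are the same ones as above.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/x509"
	"encoding/pem"
	"errors"
	"fmt"
)

// pilotKeyPEM is the pilot log's public key as given above.
const pilotKeyPEM = `-----BEGIN PUBLIC KEY-----
MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEfahLEimAoz2t01p3uMziiLOl/fHT
DM0YDOhBRuiBARsV4UvxG2LdNgoIGLrtCzWE0J5APC2em4JlvR8EEEFMoA==
-----END PUBLIC KEY-----
`

// parseLogKey is a hypothetical helper: it decodes a PEM-encoded
// SubjectPublicKeyInfo and insists that it holds an ECDSA key.
func parseLogKey(pemData string) (*ecdsa.PublicKey, error) {
	block, _ := pem.Decode([]byte(pemData))
	if block == nil {
		return nil, errors.New("ct: no PEM block found")
	}
	if block.Type != "PUBLIC KEY" {
		return nil, errors.New("ct: unexpected PEM type " + block.Type)
	}
	key, err := x509.ParsePKIXPublicKey(block.Bytes)
	if err != nil {
		return nil, err
	}
	ecKey, ok := key.(*ecdsa.PublicKey)
	if !ok {
		return nil, errors.New("ct: not an ECDSA public key")
	}
	return ecKey, nil
}

func main() {
	key, err := parseLogKey(pilotKeyPEM)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("curve:", key.Curve.Params().Name)
}
```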

The log is a tree, so the first thing that we want to do with a log is to get its head. This is detailed in section 4.3 of RFC 6962: we make an HTTP GET to get-sth and it returns a JSON blob.

Any HTTP client can do that, so give it a go with curl on the command line:

$ curl https://ct.googleapis.com/pilot/ct/v1/get-sth 2>>/dev/null | less

You should get a JSON blob that, with a bit of formatting, looks like this:

{
	"tree_size": 1979426,
	"timestamp": 1368891548960,
	"sha256_root_hash": "8UkrV2kjoLcZ5fP0xxVtpsSsWAnvcV8aPv39vh96J2o=",
	"tree_head_signature": "BAMARjBEAiAc95/ONhz2vQsULrISlLumvpo..."
}

In Go, we'll parse that into a structure like this:

// Head contains a signed tree head.
type Head struct {
	Size uint64 `json:"tree_size"`
	Time time.Time `json:"-"`
	Hash []byte `json:"sha256_root_hash"`
	Signature []byte `json:"tree_head_signature"`
	Timestamp uint64 `json:"timestamp"`
}
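
One detail that the struct tags hide: encoding/json treats a []byte field as a base64 string, so the root hash comes out already decoded. Here's a small sketch of that using the sample response from above. The sthFields type and decodeSTH are names invented for the example, and the signature is left out because the sample above truncates it.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// sthJSON is the sample get-sth response from above, minus the
// (truncated) signature field.
const sthJSON = `{
	"tree_size": 1979426,
	"timestamp": 1368891548960,
	"sha256_root_hash": "8UkrV2kjoLcZ5fP0xxVtpsSsWAnvcV8aPv39vh96J2o="
}`

// sthFields is a hypothetical, cut-down version of Head holding only
// the fields present in sthJSON.
type sthFields struct {
	Size      uint64 `json:"tree_size"`
	Timestamp uint64 `json:"timestamp"`
	Hash      []byte `json:"sha256_root_hash"`
}

// decodeSTH parses the JSON; the []byte field is base64-decoded by
// encoding/json itself.
func decodeSTH(data []byte) (*sthFields, error) {
	var f sthFields
	if err := json.Unmarshal(data, &f); err != nil {
		return nil, err
	}
	return &f, nil
}

func main() {
	f, err := decodeSTH([]byte(sthJSON))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("size=%d hash=%x (%d bytes)\n", f.Size, f.Hash, len(f.Hash))
}
```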

And here's some code to make the HTTP request, check the signature and return such a structure:

func (log *Log) GetHead() (*Head, error) {
	// See https://tools.ietf.org/html/rfc6962#section-4.3
	resp, err := http.Get(log.Root + "/ct/v1/get-sth")
	if err != nil {
		return nil, err
	}

	defer resp.Body.Close()
	if resp.StatusCode != 200 {
		return nil, errors.New("ct: error from server")
	}
	if resp.ContentLength == 0 {
		return nil, errors.New("ct: body unexpectedly missing")
	}
	if resp.ContentLength > 1<<16 {
		return nil, errors.New("ct: body too large")
	}
	data, err := ioutil.ReadAll(resp.Body)
	if err != nil {
		return nil, err
	}
	
	head := new(Head)
	if err := json.Unmarshal(data, head); err != nil {
		return nil, err
	}

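	// The timestamp is in milliseconds, but time.Unix takes seconds and nanoseconds.
	head.Time = time.Unix(int64(head.Timestamp/1000), int64(head.Timestamp%1000)*int64(time.Millisecond))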

	// See https://tools.ietf.org/html/rfc5246#section-4.7
	if len(head.Signature) < 4 {
		return nil, errors.New("ct: signature truncated")
	}
	if head.Signature[0] != hashSHA256 {
		return nil, errors.New("ct: unknown hash function")
	}
	if head.Signature[1] != sigECDSA {
		return nil, errors.New("ct: unknown signature algorithm")
	}

	signatureBytes := head.Signature[4:]
	var sig struct {
		R, S *big.Int
	}

	if signatureBytes, err = asn1.Unmarshal(signatureBytes, &sig); err != nil {
		return nil, errors.New("ct: failed to parse signature: " + err.Error())
	}
	if len(signatureBytes) > 0 {
		return nil, errors.New("ct: trailing garbage after signature")
	}
	
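	// See https://tools.ietf.org/html/rfc6962#section-3.5
	if len(head.Hash) != sha256.Size {
		return nil, errors.New("ct: root hash has the wrong length")
	}
	signed := make([]byte, 2+8+8+sha256.Size)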
	x := signed
	x[0] = logVersion
	x[1] = treeHash
	x = x[2:]
	binary.BigEndian.PutUint64(x, head.Timestamp)
	x = x[8:]
	binary.BigEndian.PutUint64(x, head.Size)
	x = x[8:]
	copy(x, head.Hash)

	h := sha256.New()
	h.Write(signed)
	digest := h.Sum(nil)

	if !ecdsa.Verify(log.Key, digest, sig.R, sig.S) {
		return nil, errors.New("ct: signature verification failed")
	}

	return head, nil
}
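
GetHead uses four constants that aren't shown, and it steps over the two-byte length in the signature without looking at it. Here's a sketch of both. The values are the enum values from RFC 5246 (section 7.4.1.4.1) and RFC 6962; splitDigitallySigned is a name made up for this example, and the payload bytes in main are placeholders rather than a real signature.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const (
	hashSHA256 = 4 // HashAlgorithm sha256 in RFC 5246
	sigECDSA   = 3 // SignatureAlgorithm ecdsa in RFC 5246
	logVersion = 0 // Version v1 in RFC 6962
	treeHash   = 1 // SignatureType tree_hash in RFC 6962
)

// splitDigitallySigned is a hypothetical helper that splits a TLS
// DigitallySigned structure into its hash algorithm, signature
// algorithm and signature bytes, checking the 16-bit length on the way.
func splitDigitallySigned(b []byte) (hashAlg, sigAlg byte, sig []byte, err error) {
	if len(b) < 4 {
		return 0, 0, nil, errors.New("ct: signature truncated")
	}
	n := int(binary.BigEndian.Uint16(b[2:4]))
	if n != len(b)-4 {
		return 0, 0, nil, errors.New("ct: signature length mismatch")
	}
	return b[0], b[1], b[4:], nil
}

func main() {
	// 0x04, 0x03 and a big-endian length, followed by placeholder bytes.
	hashAlg, sigAlg, sig, err := splitDigitallySigned([]byte{4, 3, 0, 2, 0xaa, 0xbb})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(hashAlg == hashSHA256, sigAlg == sigECDSA, len(sig))
}
```

The "BAMA" at the start of the sample tree_head_signature is the base64 of exactly those first bytes: 0x04, 0x03 and the high byte of the length.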

Running GetHead against the current pilot log gives the same information as we got with curl: 1979426 certificates at Sat May 18 11:39:08 EDT 2013, although now the date will be newer and there will be more entries.
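
The timestamp in the log is in milliseconds since the Unix epoch, so it needs a little care to turn into a time.Time. A minimal sketch of the conversion, using the sample value from the curl output above (logTime and formatLogTime are names invented for the example):

```go
package main

import (
	"fmt"
	"time"
)

// logTime converts a log timestamp (milliseconds since the Unix epoch)
// into a time.Time in UTC. time.Unix wants seconds and nanoseconds, so
// the millisecond remainder has to be scaled up.
func logTime(ms uint64) time.Time {
	return time.Unix(int64(ms/1000), int64(ms%1000)*int64(time.Millisecond)).UTC()
}

// formatLogTime renders the timestamp as RFC 3339, dropping the
// fractional seconds.
func formatLogTime(ms uint64) string {
	return logTime(ms).Format(time.RFC3339)
}

func main() {
	fmt.Println(formatLogTime(1368891548960))
}
```

For the sample value that works out to 2013-05-18T15:39:08Z, which is the 11:39:08 EDT quoted above.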

As you can see, the log has been seeded with some certificates gathered from a scan of the public Internet. Since this data is probably more recent than the EFF's Observatory data, it might be of interest and anyone can download it using the documented interface. Again, we can try a dummy request in curl:

$ curl 'https://ct.googleapis.com/pilot/ct/v1/get-entries?start=0&end=0' 2>>/dev/null | less

You should see a result that starts with {"entries":[{"leaf_input":"AAAAAAE9p. You can request log entries in batches and even download everything if you wish. My toy code simply writes a single large file containing the certificates, gzipped and concatenated with a length prefix.
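
The leaf_input is a base64-encoded MerkleTreeLeaf from section 3.4 of RFC 6962: a version byte, a leaf type byte, an eight-byte timestamp, a two-byte entry type and then, for a plain X.509 entry, the certificate with a three-byte length. (That's why it starts with AAAAAAE9: two zero bytes and then the top of a millisecond timestamp.) Here's a rough sketch of pulling it apart. parseLeafInput, the leaf type and exampleLeaf are names invented for this example, precertificate entries are not handled, and the sample "certificate" is just three placeholder bytes.

```go
package main

import (
	"encoding/base64"
	"encoding/binary"
	"errors"
	"fmt"
)

// leaf holds the parts of a MerkleTreeLeaf that this sketch extracts.
type leaf struct {
	Timestamp uint64 // milliseconds since the Unix epoch
	EntryType uint16 // 0 = x509_entry, 1 = precert_entry
	Cert      []byte // DER certificate; only set for x509_entry
}

// parseLeafInput decodes a leaf_input value from get-entries.
func parseLeafInput(b64 string) (*leaf, error) {
	data, err := base64.StdEncoding.DecodeString(b64)
	if err != nil {
		return nil, err
	}
	if len(data) < 12 {
		return nil, errors.New("ct: leaf truncated")
	}
	if data[0] != 0 || data[1] != 0 {
		return nil, errors.New("ct: unknown version or leaf type")
	}
	l := &leaf{
		Timestamp: binary.BigEndian.Uint64(data[2:10]),
		EntryType: binary.BigEndian.Uint16(data[10:12]),
	}
	if l.EntryType != 0 {
		return l, nil // precertificates are left unparsed in this sketch
	}
	rest := data[12:]
	if len(rest) < 3 {
		return nil, errors.New("ct: certificate length truncated")
	}
	n := int(rest[0])<<16 | int(rest[1])<<8 | int(rest[2])
	rest = rest[3:]
	if len(rest) < n {
		return nil, errors.New("ct: certificate truncated")
	}
	l.Cert = rest[:n]
	return l, nil
}

// exampleLeaf builds a synthetic leaf_input with placeholder contents.
func exampleLeaf() string {
	raw := make([]byte, 12, 20) // version, leaf type and entry type stay zero
	binary.BigEndian.PutUint64(raw[2:], 1368891548960)
	raw = append(raw, 0, 0, 3)  // 24-bit certificate length
	raw = append(raw, "der"...) // placeholder, not a real certificate
	raw = append(raw, 0, 0)     // empty extensions
	return base64.StdEncoding.EncodeToString(raw)
}

func main() {
	l, err := parseLeafInput(exampleLeaf())
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(l.Timestamp, l.EntryType, len(l.Cert))
}
```

With a real entry, l.Cert could be handed to x509.ParseCertificate.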

My toy code is go gettable from https://github.com/agl/certificatetransparency. In the tools directory is ct-sync, which will incrementally download the pilot log, and ct-map, which demonstrates how to look for interesting things by extracting the certificates from a local mirror file. Both will use multiple cores if possible, so be sure to set GOMAXPROCS. (This is very helpful when calculating the hash of the tree since the code doesn't do that incrementally.)
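
For reference, the tree hash itself is defined in section 2.1 of RFC 6962 and is short enough to sketch here. This is a naive, single-threaded, recursive version written for illustration, not the code from the repository, and the function names are made up. The leaves would be the decoded leaf_input values.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// leafHash is SHA-256(0x00 || leaf).
func leafHash(leaf []byte) [32]byte {
	h := sha256.New()
	h.Write([]byte{0x00})
	h.Write(leaf)
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out
}

// nodeHash is SHA-256(0x01 || left || right).
func nodeHash(left, right [32]byte) [32]byte {
	h := sha256.New()
	h.Write([]byte{0x01})
	h.Write(left[:])
	h.Write(right[:])
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out
}

// merkleTreeHash computes the root hash over the given leaves. The
// list is split at the largest power of two strictly less than its
// length and each half is hashed recursively.
func merkleTreeHash(leaves [][]byte) [32]byte {
	switch n := len(leaves); n {
	case 0:
		return sha256.Sum256(nil) // the hash of the empty string
	case 1:
		return leafHash(leaves[0])
	default:
		k := 1
		for k*2 < n {
			k *= 2
		}
		return nodeHash(merkleTreeHash(leaves[:k]), merkleTreeHash(leaves[k:]))
	}
}

func main() {
	leaves := [][]byte{[]byte("a"), []byte("b"), []byte("c")}
	fmt.Printf("%x\n", merkleTreeHash(leaves))
}
```

The 0x00 and 0x01 prefixes keep leaf hashes and interior node hashes in separate domains, so a leaf can't be passed off as a subtree.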

The project's official code is at https://code.google.com/p/certificate-transparency/. See the README and the main site.

And since we have the tools to look for interesting things, I spent a couple of minutes doing so:

I wrote a quick script to find ‘TURKTRUST’ certificates - leaf certificates that are CA certificates. The log only contains certificates that are valid by a fairly promiscuous root set in order to prevent spam, so we don't have to worry about self-signed certificates giving us false positives. As ever, the Korean Government CA is good for highlighting malpractice, with 40 certificates that mistakenly have the CA bit set (and are currently valid). This issue was reported by a colleague of mine last year. Thankfully, the issuing certificate has a path length constraint that should prevent this issue from being exploited (as long as your X.509 validation checks these sorts of details). The root CA is also only present in the Microsoft root set as far as I know.

A different Korean root CA (KISA) can't encode ASN.1 correctly and is issuing certificates with negative serial numbers because ASN.1 INTEGERs are negative if the most significant bit is one.
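
To see what goes wrong, here's a small sketch using encoding/asn1. An INTEGER is two's complement, so a positive value whose top bit is set needs a leading zero byte; leave it off and the same bytes decode as a negative number. decodeInteger is a name made up for the example.

```go
package main

import (
	"encoding/asn1"
	"errors"
	"fmt"
	"math/big"
)

// decodeInteger parses a single DER-encoded ASN.1 INTEGER.
func decodeInteger(der []byte) (*big.Int, error) {
	var n *big.Int
	rest, err := asn1.Unmarshal(der, &n)
	if err != nil {
		return nil, err
	}
	if len(rest) != 0 {
		return nil, errors.New("trailing data after INTEGER")
	}
	return n, nil
}

func main() {
	// Tag 0x02 (INTEGER), length 1, content 0x80: the top bit is set,
	// so this is -128 rather than 128.
	wrong, err := decodeInteger([]byte{0x02, 0x01, 0x80})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// The correct encoding of 128 has a leading zero byte.
	right, err := decodeInteger([]byte{0x02, 0x02, 0x00, 0x80})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(wrong, right)
}
```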

Another CA is issuing certificates with 8-byte IP addresses (I've let them know).

The ct-map.go in the repository is set up to find publicly valid certificates with names that end in .corp (which will soon not get an HTTPS indication in Chrome).
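
The name check itself is about as simple as these things get. The sketch below is an illustration of the idea rather than the code from ct-map.go, and hasCorpName is a made-up name. The input would be the names from a parsed certificate, such as the DNSNames field of an x509.Certificate.

```go
package main

import (
	"fmt"
	"strings"
)

// hasCorpName reports whether any of the names ends in .corp, ignoring
// case and a trailing dot.
func hasCorpName(names []string) bool {
	for _, name := range names {
		name = strings.ToLower(strings.TrimSuffix(name, "."))
		if strings.HasSuffix(name, ".corp") {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(hasCorpName([]string{"www.example.com", "mail.example.corp"}))
	fmt.Println(hasCorpName([]string{"www.example.com", "corp.example.com"}))
}
```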

Anyway, at the moment the log is just pre-seeded with some public certificates. The real value of CT comes when we have strong assurances that all valid certificates are in the log. That will take some time, but we're working towards it.