Get some in-depth insights in our work at Bitahoy.

Dissecting QUIC in python

• Tristan Hermanns

Why QUIC?

Big companies like Google and Meta are rapidly switching to HTTP/3. As HTTP/3 is built on top of QUIC, the protocol is becoming increasingly prevalent on the web. In Bitahoy Content Blocking, domain-based filtering is currently performed on DNS and TLS traffic. We wanted to extend our capabilities by additionally parsing QUIC traffic.

Bitahoy Content Blocking

Our most straightforward approach for content blocking is pDNS or DNS hijacking. Here, all DNS queries and responses are intercepted and checked to see whether the requested domain should be blocked. If that is the case, we reply with an empty record (i.e., for A records, the value will be 0.0.0.0).

However, we cannot extract the domain when the DNS request is encrypted. This happens, for example, if DNS-over-HTTPS (DoH) or DNS-over-TLS (DoT) is used. To give you a rough estimate, for some of our customers, plain DNS accounts for less than 10% of the domains we observe. Due to the rising adoption of DoH (e.g., in Firefox and Android), it is crucial not to rely on DNS alone for content blocking.

In the case of only having TLS traffic, our inspection and blocking occur during the actual TLS handshake. To identify the domain of the web server the client is connecting to, we extract the Server Name Indication (SNI) field from the TLS Client Hello. If the server name is not considered benign, we block the TLS connection.
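To illustrate what the SNI lookup involves, here is a minimal, standard-library-only sketch of reading the server name from a Client Hello. It is not our scapy-based implementation; the function name `extract_sni` is hypothetical, and the input is assumed to be the raw handshake message without the 5-byte TLS record header.

```python
import struct


def extract_sni(client_hello: bytes):
    """Return the SNI host name from a TLS Client Hello handshake message.

    Hypothetical helper: expects the handshake message itself (type byte
    0x01 first), i.e. without the TLS record header. Returns None if the
    message is not a Client Hello, is malformed, or carries no SNI.
    """
    try:
        if client_hello[0] != 0x01:  # handshake type 1 = client_hello
            return None
        pos = 4                        # msg_type (1) + length (3)
        pos += 2 + 32                  # legacy_version + random
        pos += 1 + client_hello[pos]   # session_id
        (suites_len,) = struct.unpack_from("!H", client_hello, pos)
        pos += 2 + suites_len          # cipher_suites
        pos += 1 + client_hello[pos]   # compression_methods
        (ext_total,) = struct.unpack_from("!H", client_hello, pos)
        pos += 2
        end = pos + ext_total
        while pos + 4 <= end:
            ext_type, ext_len = struct.unpack_from("!HH", client_hello, pos)
            pos += 4
            if ext_type == 0:  # server_name extension
                # list length (2), name_type (1), name length (2), name
                name_type = client_hello[pos + 2]
                (name_len,) = struct.unpack_from("!H", client_hello, pos + 3)
                if name_type != 0:  # 0 = host_name
                    return None
                name = client_hello[pos + 5:pos + 5 + name_len]
                return name.decode("ascii")
            pos += ext_len
    except (IndexError, struct.error, UnicodeDecodeError):
        return None
    return None
```

The returned name can then be compared against the blocklist. Malformed or truncated input yields `None` rather than an exception, so the caller can treat "no SNI" and "unparsable" the same way.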

Yet, this only works for connections that do not use HTTP/3, because HTTP/3 runs over UDP, QUIC, and TLS instead of TCP and TLS. But because QUIC (v1) still uses the TLS handshake internally, we can dissect the Initial packet and extract the SNI from it. However, the extraction is not trivial, as the Initial packet’s payload (and also parts of the header) are “encrypted” with keys derived from a static salt and the Destination Connection ID (specified in RFC 9001). We also cannot rely on scapy, as it does not yet support QUIC.

First approach

First, we tried to use pyshark, a Python API for tshark/Wireshark, to dissect QUIC packets. However, we quickly realized that parsing a packet with this approach takes quite a long time (> 1 sec). Besides risking a timeout, this delay often causes the browser to resend the request, so the packet queue fills up quickly.

Second approach

Instead, we built a library to decrypt and parse relevant QUIC packets. With that goal in mind, we perform three successive steps:

  1. Header Decryption: Encrypted Packet → Packet Number + Encrypted Payload
  2. Payload Decryption: Packet Number + Encrypted Payload → Decrypted Payload
  3. Client Hello Extraction: Decrypted Payload → TLS Client Hello

In the end, this approach provides the TLS Client Hello, from which we can extract the SNI using scapy.

We used the following resources to understand the structure and encryption of QUIC:

Header Protection

The QUIC Initial packet looks like this:

[Figure: QUIC Initial Packet]

The Token Length and Remainder Length fields (the latter covering Packet Number + Payload + Authentication Tag) are provided in Variable-Length Integer Encoding (VLIE). VLIE uses the two most significant bits of the first byte to indicate whether 1, 2, 4, or 8 bytes are used to represent the actual value. With this information, every unprotected field up to the Packet Number can already be extracted.
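As a minimal sketch of this step (not our library's actual code), the following decodes a variable-length integer and walks the long header up to the Packet Number. The names `decode_varint` and `parse_initial_header` are hypothetical, and the sketch assumes a QUIC v1 Initial packet at the start of the UDP payload.

```python
def decode_varint(data: bytes, offset: int = 0):
    """Decode one QUIC variable-length integer; return (value, next_offset)."""
    first = data[offset]
    length = 1 << (first >> 6)  # prefix bits 00, 01, 10, 11 -> 1, 2, 4, 8 bytes
    if offset + length > len(data):
        raise ValueError("truncated variable-length integer")
    value = first & 0x3F  # drop the two prefix bits
    for byte in data[offset + 1:offset + length]:
        value = (value << 8) | byte
    return value, offset + length


def parse_initial_header(packet: bytes) -> dict:
    """Parse the unprotected fields of a QUIC v1 Initial packet header."""
    first = packet[0]
    # Long header bit must be set; packet type bits 00 mean Initial in v1.
    if not first & 0x80 or (first & 0x30) != 0x00:
        raise ValueError("not a long-header Initial packet")
    version = int.from_bytes(packet[1:5], "big")
    pos = 5
    dcid_len = packet[pos]
    dcid = packet[pos + 1:pos + 1 + dcid_len]
    pos += 1 + dcid_len
    scid_len = packet[pos]
    scid = packet[pos + 1:pos + 1 + scid_len]
    pos += 1 + scid_len
    token_len, pos = decode_varint(packet, pos)
    token = packet[pos:pos + token_len]
    pos += token_len
    remainder_length, pos = decode_varint(packet, pos)
    return {
        "version": version,
        "dcid": dcid,
        "scid": scid,
        "token": token,
        "remainder_length": remainder_length,
        "pn_offset": pos,  # first byte of the (still protected) Packet Number
    }
```

The returned `pn_offset` is where the protected Packet Number begins, and `dcid` is the input for the key derivation.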

The Packet Number is needed as input for the payload decryption, but both the Packet Number and the Packet Number Length (PNL) are encrypted using QUIC Header Protection. We decrypt the PNL using the following code snippet:

# First byte contains packet number length
first_byte = raw_quic_packet[0] ^ (mask[0] & 0x0f)
pnl = (first_byte & 0x03) + 1

As we can see, the variable mask is needed for this step. It is calculated by encrypting a part of the “Protected Payload”, namely the “Sampled Part” (16 bytes).

This encryption is done with the following setup:

from scapy.layers.tls.crypto.hkdf import TLS13_HKDF
from Crypto.Cipher import AES

# static constant: the Initial salt for QUIC v1 (RFC 9001)
salt = bytes.fromhex("38762cf7f55934b34d179ae6a4c80cadccbb7f0a")

tls_hkdf = TLS13_HKDF("sha256")

# dcid: Destination Connection ID taken from the packet header
prk = tls_hkdf.extract(salt, dcid)
client_secret = tls_hkdf.expand_label(prk, b"client in", b"", 32)
hp_key = tls_hkdf.expand_label(client_secret, b"quic hp", b"", 16)

# the 16-byte sample starts 4 bytes after the start of the Packet Number field
sample_payload = raw_quic_packet[pn_offset + 4:pn_offset + 20]

header_encryptor = AES.new(hp_key, AES.MODE_ECB)
mask = header_encryptor.encrypt(sample_payload)

Notice that the only variable part is the DCID, as all other arguments are given in RFC 9001.
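To make the derivation explicit, here is a sketch of the same key schedule using only the Python standard library instead of scapy's `TLS13_HKDF`. The helper names and the returned dictionary keys are hypothetical. The label layout (length, `"tls13 "` prefix, empty context) follows the TLS 1.3 HKDF-Expand-Label construction that RFC 9001 reuses.

```python
import hashlib
import hmac
import struct

# Initial salt for QUIC v1, as given in RFC 9001
INITIAL_SALT_V1 = bytes.fromhex("38762cf7f55934b34d179ae6a4c80cadccbb7f0a")


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """HKDF-Extract with SHA-256."""
    return hmac.new(salt, ikm, hashlib.sha256).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """HKDF-Expand with SHA-256."""
    output = b""
    block = b""
    counter = 1
    while len(output) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        output += block
        counter += 1
    return output[:length]


def hkdf_expand_label(secret: bytes, label: bytes, context: bytes, length: int) -> bytes:
    """TLS 1.3 HKDF-Expand-Label: the label is prefixed with 'tls13 '."""
    full_label = b"tls13 " + label
    info = (
        struct.pack("!H", length)
        + bytes([len(full_label)]) + full_label
        + bytes([len(context)]) + context
    )
    return hkdf_expand(secret, info, length)


def derive_client_initial_keys(dcid: bytes) -> dict:
    """Derive the client-side Initial keys from the DCID alone."""
    prk = hkdf_extract(INITIAL_SALT_V1, dcid)
    client_secret = hkdf_expand_label(prk, b"client in", b"", 32)
    return {
        "client_secret": client_secret,
        "hp_key": hkdf_expand_label(client_secret, b"quic hp", b"", 16),
        "pp_key": hkdf_expand_label(client_secret, b"quic key", b"", 16),
        "pre_iv": hkdf_expand_label(client_secret, b"quic iv", b"", 12),
    }
```

The output can be checked against the worked example in RFC 9001, Appendix A, which lists the expected secrets for a sample DCID.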

Once its length is known, the Packet Number can be recovered by XORing its encrypted bytes with the corresponding bytes of the mask (at most the 2nd to 5th byte):

import operator

encrypted_pn = raw_quic_packet[pn_offset:pn_offset+pnl]
pn = bytes(map(operator.xor, encrypted_pn, mask[1:pnl + 1]))
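Putting the header steps together, a minimal sketch could look as follows. It assumes the 5-byte `mask` has already been computed from the sample; the function names are hypothetical. It also rebuilds the unprotected header, which is needed later as associated data.

```python
def sample_for_mask(packet: bytes, pn_offset: int) -> bytes:
    """Return the 16-byte sample used to compute the header protection mask.

    The sample starts 4 bytes after the start of the Packet Number field,
    i.e. as if the Packet Number had its maximum length of 4 bytes.
    """
    return packet[pn_offset + 4:pn_offset + 20]


def remove_header_protection(packet: bytes, pn_offset: int, mask: bytes):
    """Undo header protection on a long-header packet.

    Returns (pnl, pn, header), where header is the unprotected header up to
    and including the decrypted Packet Number.
    """
    # Long headers protect only the low 4 bits of the first byte.
    first_byte = packet[0] ^ (mask[0] & 0x0F)
    pnl = (first_byte & 0x03) + 1
    encrypted_pn = packet[pn_offset:pn_offset + pnl]
    pn = bytes(a ^ b for a, b in zip(encrypted_pn, mask[1:pnl + 1]))
    header = bytes([first_byte]) + packet[1:pn_offset] + pn
    return pnl, pn, header
```

Note that the PNL has to be recovered before the Packet Number, because the number of mask bytes to apply depends on it.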

Payload decryption

To decrypt the payload, QUIC also uses AES, but this time in GCM mode. The key and IV are derived from the variable client_secret calculated before, and the IV is additionally XORed with the packet number:

pp_key = tls_hkdf.expand_label(client_secret, b"quic key", b"", 16)

pre_iv = tls_hkdf.expand_label(client_secret, b"quic iv", b"", 12)
iv = (int.from_bytes(pre_iv, "big") ^ int.from_bytes(pn, "big")).to_bytes(12, "big")

payload_encryptor = AES.new(pp_key, AES.MODE_GCM, nonce=iv)

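AES-GCM is an AEAD (Authenticated Encryption with Associated Data) cipher, which allows authentication of both the encrypted payload and the unencrypted header (with header protection removed). The algorithm creates a 16-byte authentication tag that is attached to the end of the payload.

# header: unprotected header bytes up to and including the decrypted Packet Number
payload_encryptor.update(header)
# raises ValueError if the authentication tag does not match
payload = payload_encryptor.decrypt_and_verify(protected_payload, auth_tag)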

Note that the payload length cannot simply be inferred from the packet length, as additional packets can be appended to the same datagram (e.g., 0-RTT). Instead, it is calculated from the Remainder Length field.
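The split described above can be sketched as follows. The function name `split_remainder` is hypothetical; `pn_offset`, `pnl`, and `remainder_length` are assumed to come from the header parsing and header decryption steps.

```python
AUTH_TAG_LEN = 16  # AES-GCM tag length used by QUIC


def split_remainder(packet: bytes, pn_offset: int, pnl: int, remainder_length: int):
    """Split a datagram into ciphertext, authentication tag, and trailing bytes.

    remainder_length counts Packet Number + Payload + Authentication Tag,
    so the Initial packet ends at pn_offset + remainder_length, regardless
    of how long the datagram is.
    """
    end = pn_offset + remainder_length
    if end > len(packet) or remainder_length < pnl + AUTH_TAG_LEN:
        raise ValueError("Remainder Length does not fit the datagram")
    protected_payload = packet[pn_offset + pnl:end - AUTH_TAG_LEN]
    auth_tag = packet[end - AUTH_TAG_LEN:end]
    trailing = packet[end:]  # further coalesced packets, if any
    return protected_payload, auth_tag, trailing
```

Anything in `trailing` belongs to a following packet in the same datagram and must not be passed to the Initial packet's decryption.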

Client Hello Extraction

To extract the TLS Client Hello from the now-decrypted payload, we first need to understand the payload's structure. The payload consists of multiple “frames”, each starting with a byte that indicates its type (e.g., PING, STREAM, ACK). The Client Hello is provided through the CRYPTO frame(s) in the Initial packet. These frames have an offset and a length, and the content can be split up into multiple parts (Chrome, for example, makes heavy use of this). To recreate the original structure, we parse every frame in the payload and recombine all CRYPTO frames in the given order. The result is the TLS Client Hello.
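A minimal sketch of this reassembly is shown below. It is not our library's code: the function names are hypothetical, only the frame types that may appear in an Initial packet's first flight are handled, and the sketch assumes that "the given order" means placing each CRYPTO chunk at its stated offset.

```python
def decode_varint(data: bytes, pos: int):
    """Decode one QUIC variable-length integer; return (value, next_pos)."""
    length = 1 << (data[pos] >> 6)
    if pos + length > len(data):
        raise ValueError("truncated variable-length integer")
    raw = int.from_bytes(data[pos:pos + length], "big")
    return raw & ((1 << (8 * length - 2)) - 1), pos + length


def extract_crypto_stream(payload: bytes) -> bytes:
    """Collect all CRYPTO frames of a decrypted payload into one byte string."""
    chunks = {}
    pos = 0
    while pos < len(payload):
        frame_type = payload[pos]  # these types all fit into a single byte
        pos += 1
        if frame_type in (0x00, 0x01):  # PADDING, PING: no body
            continue
        if frame_type in (0x02, 0x03):  # ACK (0x03 carries ECN counts)
            _, pos = decode_varint(payload, pos)            # largest acknowledged
            _, pos = decode_varint(payload, pos)            # ack delay
            range_count, pos = decode_varint(payload, pos)
            _, pos = decode_varint(payload, pos)            # first ack range
            for _ in range(2 * range_count):                # gap + range length
                _, pos = decode_varint(payload, pos)
            if frame_type == 0x03:
                for _ in range(3):
                    _, pos = decode_varint(payload, pos)
            continue
        if frame_type == 0x06:  # CRYPTO: offset, length, data
            offset, pos = decode_varint(payload, pos)
            length, pos = decode_varint(payload, pos)
            chunks[offset] = payload[pos:pos + length]
            pos += length
            continue
        raise ValueError(f"unexpected frame type 0x{frame_type:02x}")

    stream = bytearray()
    for offset in sorted(chunks):
        if offset != len(stream):
            break  # gap or overlap: the rest may arrive in a later packet
        stream += chunks[offset]
    return bytes(stream)
```

If the Client Hello spans more than one Initial packet, the result is only a prefix, so a caller would need to keep the chunks and merge them with those from later packets.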

If the requested domain is on the blocklist, we block the request by simply overwriting the authentication tag at the end of the payload with 16 A’s. The server can then no longer authenticate the packet and discards it.
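As a sketch of this blocking step, the function below replaces the tag in a copy of the datagram. The name `corrupt_auth_tag` is hypothetical, and `pn_offset` and `remainder_length` are assumed to come from the header parsing described earlier. How the modified datagram is forwarded is outside the scope of this example.

```python
AUTH_TAG_LEN = 16  # length of the AES-GCM authentication tag


def corrupt_auth_tag(packet: bytes, pn_offset: int, remainder_length: int) -> bytes:
    """Return a copy of the datagram whose Initial packet has an invalid tag.

    The tag occupies the last 16 bytes of the Initial packet, which ends at
    pn_offset + remainder_length (not necessarily at the end of the datagram).
    """
    end = pn_offset + remainder_length
    if end > len(packet) or remainder_length < AUTH_TAG_LEN:
        raise ValueError("Remainder Length does not fit the datagram")
    return packet[:end - AUTH_TAG_LEN] + b"A" * AUTH_TAG_LEN + packet[end:]
```

The datagram keeps its original length, so no length fields need to be adjusted.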

Takeaway

To extend Bitahoy Content Blocking, we implemented our own dissector & decryptor for QUIC.

Our approach performs three successive steps:

  1. Header Decryption
  2. Payload Decryption
  3. Client Hello Extraction

The first two steps mainly require understanding the respective decryption pipeline, whereas the last step rebuilds the TLS Client Hello from the resulting decrypted payload.