hugo-relie/content/en/posts/dns-challenge.md at 0b411ff4ee05267083c0fb0b0ab766d59e7b2c4a

Melora Hugues 0b411ff4ee


2024-09-08 15:38:13 +02:00


The problem of having a self-hosted infrastructure

I've been maintaining a personal homelab and self-hosted infrastructure for a few years now, but one of the most infuriating things when starting such a project is the dreaded "Warning: Potential Security Risk Ahead" page that appears when you're using a self-signed certificate, or the warning you get when entering a password on a website or app that is served over plain HTTP.

While acceptable if you're alone on your own infrastructure or dev environment, this poses several issues in many other contexts:

It is not acceptable to publicly expose a website presenting this issue
It's not advisable to say "hey look, I know that your browser gives you a big red warning, but it's okay, you can just accept" to friends/family/etc. It's just a very bad habit to have
After a while, it really starts to get on your nerves

Thankfully a free solution for that, which is well known by now, has existed for almost ten (10) years now: Let's Encrypt and the ACME protocol.

{{< callout type="note" >}} I promise this is not yet another Let's Encrypt tutorial... Well it is, but for a more specific use-case {{< /callout >}}

The Let's Encrypt solution

What is Let's Encrypt

Let's Encrypt is a nonprofit certificate authority founded in November 2014. Its main goal was to provide an easy and free way to obtain a TLS certificate in order to make it easy to use HTTPS everywhere.

The ACME protocol developed by Let's Encrypt is an automated verification system aiming at doing the following:

verifying that you own the domain for which you want a certificate
creating and registering that certificate
delivering the certificate to you

Most client implementations also have an automated renewal system, further reducing the workload for sysadmins.

The current specification for the ACME protocol proposes two (2) types of challenges to prove ownership and control over a domain: the HTTP-01 and DNS-01 challenges.

{{< callout type="note" >}} Actually there are two (2) others: TLS-SNI-01 which is now disabled, and TLS-ALPN-01 which is only aimed at a very specific category of users, which we will ignore here. {{< /callout >}}

The common solution: HTTP challenge

The HTTP-01 challenge is the most common type of ACME challenge, and will satisfy most use-cases.

For this challenge, we need the following elements:

A domain name and a record for that domain in a public DNS server (it can be a self-hosted DNS server, our provider's, etc)
Access to a server with a public IP that can be publicly reached

When performing this type of challenge, the following happens (in a very simplified way):

1. The ACME client will ask the Let's Encrypt API to start a challenge
2. In return, it will get a token
3. It will then either start a standalone server, or edit the configuration of our current web server (nginx, apache, etc) to serve a file containing the token and a fingerprint of our account key.
4. Let's Encrypt will try to resolve our domain test.example.com.
5. If resolution works, then it will check the URL http://test.example.com/.well-known/acme-challenge/<TOKEN>, and verify that the file from step 3 is served with the correct content.

If everything works as expected, then the ACME client can download the certificate (the private key is generated locally by the client and never leaves the machine), and we can configure our reverse proxy or server to use this valid certificate, all is well.
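To make the "file containing the token and a fingerprint of our account key" a bit more concrete, here is a minimal sketch of how that content (the key authorization) is built according to RFC 8555 and RFC 7638. The JWK values and the token are made up for illustration, and the function names are my own; a real client such as certbot does all of this for you.

```python
import base64
import hashlib
import json

# Hypothetical account key (public part only) and token, for illustration.
# The JWK must contain only the required members for its key type.
EXAMPLE_JWK = {"kty": "RSA", "e": "AQAB", "n": "not-a-real-modulus"}
EXAMPLE_TOKEN = "example-token"


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as used throughout ACME."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def jwk_thumbprint(jwk: dict) -> str:
    """SHA-256 over the JWK serialized with sorted keys and no whitespace."""
    canonical = json.dumps(jwk, sort_keys=True, separators=(",", ":"))
    return b64url(hashlib.sha256(canonical.encode("utf-8")).digest())


def key_authorization(token: str, jwk: dict) -> str:
    """Content of the file served for the HTTP-01 challenge."""
    return f"{token}.{jwk_thumbprint(jwk)}"


def challenge_path(token: str) -> str:
    """Path that Let's Encrypt fetches over plain HTTP on port 80."""
    return f"/.well-known/acme-challenge/{token}"


if __name__ == "__main__":
    print(challenge_path(EXAMPLE_TOKEN))
    print(key_authorization(EXAMPLE_TOKEN, EXAMPLE_JWK))
```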

{{< callout type="help" >}} Okay, but my app contains my accounts, or my proxmox management interface, and I don't really want to make it public, so how does it work here? {{< /callout >}}

Well it doesn't. For this type of challenge to work, the application server must be public. For this challenge we need to prove that we have control over the application that uses the target domain (even if we don't control the domain itself). But the DNS-01 challenge bypasses this limitation.

When it's not enough: the DNS challenge

As we saw in the previous section, sometimes, for various reasons, the application server is in a private zone. It must only be reachable from inside a private network, but we might still want to be able to use a free Let's Encrypt certificate.

For this purpose, the DNS-01 challenge is based on proving that one has control over the DNS server itself, instead of the application server.

For this type of challenge, the following elements are needed:

A public DNS server we have control over (can be a self-hosted server, or your DNS provider)
An ACME client (usually it would be on the application server), which doesn't need to be public

Then, the challenge is done the following way:

1. The ACME client will ask the Let's Encrypt API to start a challenge.
2. In return, it will get a token.
3. The client then creates a TXT record at _acme-challenge.test.example.com, whose value is derived from the token and the account key.
4. Let's Encrypt will try to resolve the expected TXT record, and verify that the content is correct.

If the verification succeeds, we can download our certificate, just like with the other type of challenge.
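As an illustration of what "derived from the token and the account key" means, here is a small sketch of the TXT record computation described in RFC 8555: the record value is the unpadded base64url encoding of the SHA-256 digest of the key authorization (the token, a dot, and the account key thumbprint). The function names and the sample key authorization are hypothetical.

```python
import base64
import hashlib


def dns01_txt_value(key_authorization: str) -> str:
    """Value to publish in the TXT record for a DNS-01 challenge."""
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def challenge_record_name(domain: str) -> str:
    """Name of the TXT record that Let's Encrypt will query."""
    # A wildcard such as *.example.com is validated on the base domain.
    if domain.startswith("*."):
        domain = domain[2:]
    return f"_acme-challenge.{domain}"


if __name__ == "__main__":
    # "example-token.example-thumbprint" is a made-up key authorization.
    print(challenge_record_name("test.example.com"))
    print(dns01_txt_value("example-token.example-thumbprint"))
```

Note that nothing in this computation depends on the application server: only the DNS zone needs to be writable by the client.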

It's important to note that at no point in time did Let's Encrypt have access to the application server itself, because this challenge involves proving that we control the domain, not that we control the destination of that domain.

If I'm trying to obtain a valid certificate for my Proxmox interface, this is the way I would want to go, because it would allow me to have a valid certificate, despite my server not being public at all. So let's see how it works in practice.

DNS challenge in practice

For this example, I will try to obtain a certificate for my own domain test.internal.example.com. As this name suggests, it is an internal domain and should not be publicly reachable, so this means I'm going to use a DNS challenge. I don't really want to use my DNS provider API for this, so I'm going to use a self-hosted bind server for that.

{{< callout type="note" >}} The rest of this "guide" will be based on a deployment for a bind9 server. It can be adapted to any other type of deployment, but all the configuration snippets are based on bind9. Let's Encrypt has relevant documentation for other hosting providers. {{< /callout >}}

Configuring the DNS server

The first step is configuring the DNS server. For this, I'll just use a bind server installed from my usual package manager.

# example on Debian 12
sudo apt install bind9

Most of the configuration happens in the /etc/bind directory, mostly in /etc/bind/named.conf.local

root@dns-server: ls /etc/bind/
bind.keys  db.127  db.empty  named.conf         named.conf.local    rndc.key
db.0    db.255  db.local  named.conf.default-zones  named.conf.options  zones.rfc1918

Let's declare a first zone, for internal.example.com. Add the following config to /etc/bind/named.conf.local

zone "internal.example.com." IN {
  type master;
  file "/var/lib/bind/internal.example.com.zone";
};

This simply declares a new zone which is described in the file /var/lib/bind/internal.example.com.zone

Let's now create the zone itself. A DNS zone has a base structure that we must follow

$ORIGIN .
$TTL 7200 ; 2 hours
internal.example.com IN SOA ns.internal.example.com. admin.example.com. (
                            2024070301 ; serial
                            3600       ; refresh (1 hour)
                            600        ; retry (10 minutes)
                            86400      ; expire (1 day)
                            600        ; minimum (10 minutes)
                            )
                    NS ns.internal.example.com.

$ORIGIN internal.example.com.
ns      A     1.2.3.4
test    A     192.168.1.2

This file declares a zone internal.example.com whose primary name server is ns.internal.example.com. It also sets the zone parameters (time to live for the records, and the current serial for the zone config).

Finally, two (2) A records are created, associating the name ns.internal.example.com to the IP address 1.2.3.4, and test.internal.example.com (the domain for which we want a certificate) to a local IP address 192.168.1.2.
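The serial in the example (2024070301) follows the common YYYYMMDDnn convention, and it has to increase every time the zone file is edited by hand. As a small aside, here is a sketch of how the next serial could be computed under that convention; the helper name is my own, and this only matters for manual edits of the file.

```python
from datetime import date


def next_serial(current: int, today: date) -> int:
    """Return the next zone serial using the YYYYMMDDnn convention."""
    base = int(today.strftime("%Y%m%d")) * 100
    if current < base:
        # First change of the day: start at revision 01.
        return base + 1
    # Already changed today (or serial is ahead): just increment.
    # Past revision 99 this spills into the next date, which stays valid
    # because the only hard requirement is that the serial increases.
    return current + 1


if __name__ == "__main__":
    print(next_serial(2024070301, date(2024, 7, 3)))
    print(next_serial(2024070301, date(2024, 7, 4)))
```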

A simple systemctl restart bind9 would be enough to apply the modification, but we still have one thing to do, which is allowing remote modifications to the zone.

Enabling remote DNS zone modification

To allow remote modification of our DNS zone, we are going to use TSIG, which stands for Transaction Signature. It's a way to authenticate server-to-server operations that edit a DNS zone, and is preferred over access control based on IP addresses.

Let's start with creating a key using the command tsig-keygen <keyname>

➜ tsig-keygen letsencrypt
key "letsencrypt" {
 algorithm hmac-sha256;
 secret "oK6SqKRvGNXHyNyIEy3hijQ1pclreZw4Vn5v+Q4rTLs=";
};

This creates a key with the given name using the default algorithm (which is hmac-sha256). The entire output of this command is actually a code block that you can add to your bind9 configuration.
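To show what this block actually contains, here is a rough Python equivalent: the secret is just random bytes encoded in base64. This is a sketch, not a replacement for tsig-keygen; the 32-byte size is an assumption that matches the length of the example secret above once decoded, and the function names are hypothetical.

```python
import base64
import secrets


def generate_tsig_secret(num_bytes: int = 32) -> str:
    """Random shared secret, base64-encoded like in a bind9 key block."""
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


def render_key_block(name: str, secret: str, algorithm: str = "hmac-sha256") -> str:
    """Render a key statement that can be pasted into named.conf.local."""
    return (
        f'key "{name}" {{\n'
        f"  algorithm {algorithm};\n"
        f'  secret "{secret}";\n'
        f"}};\n"
    )


if __name__ == "__main__":
    # The example secret from this post decodes to 32 bytes (256 bits).
    example = "oK6SqKRvGNXHyNyIEy3hijQ1pclreZw4Vn5v+Q4rTLs="
    print(len(base64.b64decode(example)))
    print(render_key_block("letsencrypt", generate_tsig_secret()))
```

Whatever tool generates the key, the algorithm declared here must be the same one the client uses later, otherwise the signature check fails.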

Finally, using update-policy, allow this key to be used to update the zone.

update-policy {
  grant letsencrypt. zonesub txt;
};

{{< callout type="note" >}} Doing so allows anyone holding this key to update any TXT record in our zone. In fact, we only need to update _acme-challenge.test.internal.example.com, as seen in the DNS challenge description.

If we want a tighter restriction, then we can use the following configuration instead

update-policy {
  grant letsencrypt. name _acme-challenge.test.internal.example.com. txt;
};
{{< /callout >}}

This means our entire named.conf.local would become something like this

key "letsencrypt" {
 algorithm hmac-sha256;
 secret "oK6SqKRvGNXHyNyIEy3hijQ1pclreZw4Vn5v+Q4rTLs=";
};

zone "internal.example.com." IN {
  type master;
  file "/var/lib/bind/internal.example.com.zone";
  update-policy {
    grant letsencrypt. zonesub txt;
  };
};
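To illustrate the difference between the zonesub and name rule types used in update-policy, here is a toy model in Python. It is only a mental model under simplifying assumptions (one rule, one allowed record type), not how bind9 implements it, and the function names are hypothetical.

```python
def normalize(name: str) -> str:
    """Lowercase a DNS name and make sure it ends with exactly one dot."""
    return name.lower().rstrip(".") + "."


def is_update_allowed(rule_type, zone, record_name, record_type,
                      rule_name=None, allowed_types=("TXT",)):
    """Toy check: would a grant rule accept an update of this record?"""
    if record_type.upper() not in allowed_types:
        return False
    record = normalize(record_name)
    if rule_type == "name":
        # "name" only matches the exact name given in the rule.
        return record == normalize(rule_name)
    if rule_type == "zonesub":
        # "zonesub" matches the zone apex and everything below it.
        apex = normalize(zone)
        return record == apex or record.endswith("." + apex)
    raise ValueError(f"unsupported rule type: {rule_type}")


if __name__ == "__main__":
    zone = "internal.example.com."
    target = "_acme-challenge.test.internal.example.com."
    other = "_acme-challenge.other.internal.example.com."
    print(is_update_allowed("zonesub", zone, other, "TXT"))
    print(is_update_allowed("name", zone, other, "TXT", rule_name=target))
```

With zonesub, the key can write any TXT record in the zone; with name, only the single challenge record.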

{{< callout type="warning" >}} Be very cautious about the . at the end of the zone name and the key name, they are easy to miss, and forgetting them will cause issues that would be hard to detect. {{< /callout >}}

With that being done, you can restart the DNS server and everything is ready server side, the only remaining thing to do would be the DNS challenge itself.

Performing the challenge

We start by installing certbot with the RFC2136 plugin (to perform the DNS challenge).

apt install python3-certbot-dns-rfc2136

It's handled using a .ini configuration file, let's put it in /etc/certbot/credentials.ini

dns_rfc2136_server = <your_dns_ip>
dns_rfc2136_port = 53
dns_rfc2136_name = letsencrypt.
dns_rfc2136_secret = oK6SqKRvGNXHyNyIEy3hijQ1pclreZw4Vn5v+Q4rTLs=
dns_rfc2136_algorithm = HMAC-SHA256
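Since this file contains the TSIG secret, it should only be readable by its owner (certbot complains about credentials files with loose permissions). Here is a minimal sketch that renders the file and writes it with mode 0600; the helper names are my own, and the algorithm has to match the one declared in the key block on the DNS server (hmac-sha256 for the key generated with tsig-keygen above).

```python
import os


def render_credentials(server: str, key_name: str, secret: str,
                       algorithm: str = "HMAC-SHA256", port: int = 53) -> str:
    """Build the content of the certbot dns-rfc2136 credentials file."""
    lines = [
        f"dns_rfc2136_server = {server}",
        f"dns_rfc2136_port = {port}",
        f"dns_rfc2136_name = {key_name}",
        f"dns_rfc2136_secret = {secret}",
        f"dns_rfc2136_algorithm = {algorithm}",
    ]
    return "\n".join(lines) + "\n"


def write_credentials(path: str, content: str) -> None:
    """Write the file so that only its owner can read or modify it."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(content)
    # Also tighten permissions if the file already existed.
    os.chmod(path, 0o600)
```

For example, `write_credentials("/etc/certbot/credentials.ini", render_credentials("<your_dns_ip>", "letsencrypt.", "<secret>"))` would produce the file shown above.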

Finally, run the challenge using certbot (if it's the first time you're using certbot on that machine, it might ask for an email to handle admin stuff).

root@toolbox:~: certbot certonly --dns-rfc2136 --dns-rfc2136-credentials /etc/certbot/credentials.ini -d 'test.internal.example.com'

Saving debug log to /var/log/letsencrypt/letsencrypt.log
Requesting a certificate for test.internal.example.com
Waiting 60 seconds for DNS changes to propagate

Successfully received certificate.
Certificate is saved at: /etc/letsencrypt/live/test.internal.example.com/fullchain.pem
Key is saved at:         /etc/letsencrypt/live/test.internal.example.com/privkey.pem
This certificate expires on 2024-09-30.
These files will be updated when the certificate renews.
Certbot has set up a scheduled task to automatically renew this certificate in the background.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
If you like Certbot, please consider supporting our work by:
 * Donating to ISRG / Let's Encrypt:   https://letsencrypt.org/donate
 * Donating to EFF:                    https://eff.org/donate-le
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

And that's done, we have a certificate, and at no point in time did we need to actually expose our application to the outside world.

{{< callout type="warning" >}} We used the certonly subcommand of certbot here, which means that when it renews the certificates, certbot will only obtain the new certificates and store them on disk, and nothing more. If we use a reverse proxy like nginx, we also need to reload the service in order to load the new certificates when they are renewed, as certbot will not do it itself in this mode (its --deploy-hook option exists for this purpose). {{< /callout >}}
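One way to handle that reload is a deploy hook, which certbot runs after each successful renewal with the RENEWED_LINEAGE environment variable pointing at the certificate directory. Here is a minimal sketch of such a hook in Python, assuming nginx managed by systemd; the script itself and its function names are hypothetical.

```python
import os
import subprocess
import sys


def build_reload_command(service: str = "nginx") -> list:
    """Command used to make the service pick up the new certificate."""
    return ["systemctl", "reload", service]


def main(environ=os.environ, run=subprocess.run) -> int:
    """Reload the service if we were called by certbot after a renewal."""
    lineage = environ.get("RENEWED_LINEAGE")
    if not lineage:
        print("RENEWED_LINEAGE is not set, nothing to do", file=sys.stderr)
        return 1
    print(f"Certificate renewed in {lineage}, reloading nginx")
    run(build_reload_command(), check=True)
    return 0

# When saved as a real hook script, end the file with:
#   raise SystemExit(main())
```

The script path would then be passed to certbot with --deploy-hook, either on the command line or in the renewal configuration.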

Now because I like to go way too far, I can propose two (2) improvements to this setup:

Using ACL in addition to the TSIG key to secure operations on the DNS server
Using a second DNS server only locally accessible for your private records, and using the public server to only perform challenges

Bonus 1: adding a second layer of authentication to connect to the DNS

In our setup, we used TSIG to secure our access to the DNS server, meaning that having the key is necessary to perform the operations. If we are paranoid, or if we want to do a little bit more, then we could add a second layer of authentication based on Access Control Lists (ACLs).

ACLs make it possible to filter allowed operations based on several characteristics, such as IP address, TSIG key, or subnet. In our case, we will use an IPv4 subnet from inside a WireGuard tunnel between the application servers (DNS clients) and the DNS server. It could be any form of tunnel, but WireGuard is easy to configure and perfect for point-to-point tunnels such as what we are doing here.

Wireguard configuration

First, let's create the Wireguard tunnel.

We start by creating two wireguard key pairs, which can be done this way

# Install wireguard tools
apt install wireguard-tools

# Create the keypair
wg genkey | tee privatekey | wg pubkey > publickey

Private key is in the privatekey file, and public key in the publickey file.

Then we can create the server configuration: create a file /etc/wireguard/wg0.conf on the DNS server (this is where wg-quick looks for it).

[Interface]
PrivateKey = <server_private_key>
Address = 192.168.42.1/24
ListenPort = 51820

[Peer]
PublicKey = <client_public_key>
AllowedIPs = 192.168.42.0/24

Then on the client side you can do the same

[Interface]
PrivateKey = <client_private_key>
Address = 192.168.42.2/24

[Peer]
PublicKey = <server_public_key>
Endpoint = <dns_public_ip>:51820
AllowedIPs = 192.168.42.1/32

Then you can start the tunnel on both sides using wg-quick up wg0, and check that IP connectivity works by pinging the server from the client

root@toolbox:~ ping 192.168.42.1
PING 192.168.42.1 (192.168.42.1) 56(84) bytes of data.
64 bytes from 192.168.42.1: icmp_seq=1 ttl=64 time=19.2 ms
64 bytes from 192.168.42.1: icmp_seq=2 ttl=64 time=8.25 ms

Basically, we created a new network 192.168.42.0/24 which links the DNS server and our client, and we can restrict modification to the DNS zone to force them to be from inside the virtual network, instead of allowing them from anywhere.

{{< callout type="note" >}} The ACL that we are going to use here can have many other purposes, such as hiding some domains, or serving different versions of a zone depending on the origin of the client. This is not our topic of concern here though. {{< /callout >}}

DNS configuration

Using ACLs, we are going to split the DNS zone into several views based on the source IP. Basically our goal is to say that

Users coming from inside our wireguard network 192.168.42.0/24 can modify DNS records in our zone using the TSIG key defined earlier.
Users coming from any other IP can read the DNS zone, but nothing else, so they can't update it, even using the correct key.

ACLs can be defined inside named.conf.local using the following syntax.

acl local {
  127.0.0.0/8;
  192.168.42.0/24;
};

This means that local addresses, and addresses coming from our wireguard network will be considered as local and can be referenced as such in the rest of the configuration.

Then, a view can be created like this:

view "internal" {
  match-clients { local; };
  zone "internal.example.com." IN {
    type master;
    file "/var/lib/bind/internal.example.com.zone";
    update-policy {
      grant letsencrypt. zonesub txt;
    };
  };
};

Basically this means that the view internal is only used for clients that match the local ACL defined above. In this view we define the zone internal.example.com, which is the zone we defined earlier.

We also need to declare the zone for non-local users who wouldn't match the local ACL. It's important to note that you cannot use the same zone file twice in different zones, so we cannot define the public view exactly the same way. Our public view will be defined the following way:

view "public" {
  zone "internal.example.com." IN {
    in-view internal;
  };
};

This way, in the public view, we define the internal.example.com zone, and we define this zone as being inside the internal view. This way, we will serve the exact same DNS zone whatever the origin, but the update policy only applies to users from local addresses, and they will be the only ones able to edit the zone.
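To make the view selection more concrete: bind9 goes through the views in the order they are declared and uses the first one whose match-clients accepts the source address, and a view without match-clients accepts everyone. The following sketch models just that selection for IPv4 clients; it is a simplified illustration with hypothetical names, not bind9's actual logic.

```python
import ipaddress

# Views in declaration order; the "public" view has no match-clients,
# which is modeled here as "any IPv4 address".
VIEWS = [
    ("internal", ["127.0.0.0/8", "192.168.42.0/24"]),
    ("public", ["0.0.0.0/0"]),
]


def select_view(client_ip: str, views=VIEWS):
    """Return the name of the first view matching the client address."""
    address = ipaddress.ip_address(client_ip)
    for name, networks in views:
        if any(address in ipaddress.ip_network(net) for net in networks):
            return name
    return None


if __name__ == "__main__":
    print(select_view("192.168.42.2"))  # client inside the tunnel
    print(select_view("8.8.8.8"))       # anyone else
```

The order matters: if the public view were declared first, it would catch every client and the internal view would never be used.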

In summary, our named.conf.local file should now look like this.

acl local {
  127.0.0.0/8;
  192.168.42.0/24;
};

key "letsencrypt." {
  algorithm hmac-sha256;
  secret "oK6SqKRvGNXHyNyIEy3hijQ1pclreZw4Vn5v+Q4rTLs=";
};

view "internal" {
  match-clients { local; };
  zone "internal.example.com." IN {
    type master;
    file "/var/lib/bind/internal.example.com.zone";
    update-policy {
      grant letsencrypt. zonesub txt;
    };
  };
};

view "public" {
  zone "internal.example.com." IN {
    in-view internal;
  };
};

And now, without any additional change needed, we have a second layer of authentication for the DNS zone updates. We can go a little further and make sure that the private IPs themselves are hidden from the outside.

Bonus 2: completely hiding our private domains from outside

In this post, we implemented our own DNS server (or we used the one from our provider) in order to resolve internal private hosts, and perform DNS challenges for those hosts in order to obtain SSL certificates. But this is not entirely satisfying.

For example, we have the following record in our DNS zone:

test    A     192.168.1.2

This means that running host test.internal.example.com (or dig, or any other DNS query tool) will return 192.168.1.2, whether you're using your internal DNS, or Google's, or any other server. This is not great: this IP is private and should not have any meaning outside of your network. While there probably wouldn't be any impact, publicly revealing that you have a private host named test on an internal domain, along with its IP address (and thus part of your internal infrastructure), isn't ideal, especially if you have 10 hosts instead of only one.

For this reason we could use two (2) DNS servers with a different purpose:

A server inside the private network which would resolve the private hosts
A server outside the private network, which is only used for the challenges

Indeed, inside our network, we don't really need to be publicly reachable, but we need name resolution on our local hosts. In the same way, Let's Encrypt doesn't need any A record to perform DNS challenges, it only needs a TXT record, so each server can have its own specific role.

Basically, what we need is the following:

a publicly reachable DNS server (the one from the previous parts of this post), that will have:
- only its own NS records
- the TSIG key and rules to update the zone
- optionally, the VPN tunnel
- the TXT record to perform the DNS challenges
a private DNS on your local infrastructure, that will have
- all the A (and other types of) DNS records for your internal infrastructure
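As a quick way to decide which records belong on which server, here is a sketch that sorts address records by whether they point to a private IP. The record list mirrors the example zone from this post; the function name is hypothetical, and the private server additionally needs its own ns record with a local address.

```python
import ipaddress

# (name, type, value) tuples taken from the example zone above.
RECORDS = [
    ("ns", "A", "1.2.3.4"),
    ("test", "A", "192.168.1.2"),
]


def split_records(records):
    """Separate records that are safe to publish from private ones."""
    public, private = [], []
    for name, rtype, value in records:
        is_address = rtype in ("A", "AAAA")
        if is_address and ipaddress.ip_address(value).is_private:
            private.append((name, rtype, value))
        else:
            public.append((name, rtype, value))
    return public, private


if __name__ == "__main__":
    public, private = split_records(RECORDS)
    print("public zone:", public)
    print("private zone:", private)
```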

Let's split the previous configuration (I'll use the one from the Bonus 1 section as an example).

Private DNS server

On the private DNS server, the only thing we need is our local internal.example.com zone definition, so our named.conf.local should look like this

zone "internal.example.com." IN {
  type master;
  file "/var/lib/bind/internal.example.com.zone";
  allow-update { none; };
};

And our zone definition would look like this

$ORIGIN .
$TTL 7200 ; 2 hours
internal.example.com IN SOA ns.internal.example.com. admin.example.com. (
                            2024070301 ; serial
                            3600       ; refresh (1 hour)
                            600        ; retry (10 minutes)
                            86400      ; expire (1 day)
                            600        ; minimum (10 minutes)
                            )
                    NS ns.internal.example.com.

$ORIGIN internal.example.com.
ns      A     192.168.1.1
test    A     192.168.1.2

This server should be set as DNS in our DHCP configuration (or in the client configuration if we don't use DHCP).

Public DNS server

For the public DNS server, we don't need private A records, we just need the configuration necessary to update the public zone, so our named.conf.local file should look like this (it's the exact same configuration as before)

acl local {
  127.0.0.0/8;
  192.168.42.0/24;
};

key "letsencrypt." {
  algorithm hmac-sha256;
  secret "oK6SqKRvGNXHyNyIEy3hijQ1pclreZw4Vn5v+Q4rTLs=";
};

view "internal" {
  match-clients { local; };
  zone "internal.example.com." IN {
    type master;
    file "/var/lib/bind/internal.example.com.zone";
    update-policy {
      grant letsencrypt. zonesub txt;
    };
  };
};

view "public" {
  zone "internal.example.com." IN {
    in-view internal;
  };
};

The zone file should be the following (we only removed the private A record, the rest is the same as before).

$ORIGIN .
$TTL 7200 ; 2 hours
internal.example.com IN SOA ns.internal.example.com. admin.example.com. (
                            2024070301 ; serial
                            3600       ; refresh (1 hour)
                            600        ; retry (10 minutes)
                            86400      ; expire (1 day)
                            600        ; minimum (10 minutes)
                            )
                    NS ns.internal.example.com.

$ORIGIN internal.example.com.
ns      A     1.2.3.4

Testing the configuration

Once the two servers are up and running and everything is configured, we can test that everything works as expected by performing DNS queries with host, dig, etc. on our private records and our NS record.

# Trying to resolve our domain from inside our private infra returns the expected IP
~ …
➜ host test.internal.example.com
Using domain server:
Name: 192.168.1.1
Address: 192.168.1.1#53
Aliases:

test.internal.example.com has address 192.168.1.2

# Trying to resolve our domain using a public DNS server (here Google)
# fails since it doesn't exist outside our network
~ …
➜ host test.internal.example.com 8.8.8.8
Using domain server:
Name: 8.8.8.8
Address: 8.8.8.8#53
Aliases:

Host test.internal.example.com not found: 3(NXDOMAIN)

Final words

While this method, including the small adjustments and improvements, is a bit more involved than ignoring the issue and using only HTTP challenges, once the infrastructure is in place it becomes very easy to use and to set up, and makes for a very clean infrastructure.

It is also the only way to obtain a wildcard certificate *.internal.example.com for example that would allow using a single certificate for all the services inside an infrastructure.

I would argue that a setup of this type is well suited to homelabs or small businesses that have a private infrastructure, but don't want to go through the trouble of setting up an entire PKI (Public Key Infrastructure).
