Ads blocking with OpenBSD unbound(8)

    

The Internet is full of Ads and Trackers. One way to avoid them is simply not to reach the stinky servers that host them. This can be partially achieved with a local DNS resolver.

This article is a reboot of both the 2019 Blocking Ads using unbound on OpenBSD and Storing unbound logs into InfluxDB posts, hopefully improved.

Introduction

DNS Ads blocking is fairly simple: when you are about to make an Internet request to a server known to host Ads and Trackers, you just don’t!

This requires you to set up and maintain a smart DNS server. You also have to tell your devices (smartphones, tablets, computers …) to use it. Under the hood, the DNS server tells your devices that the domain names they’re looking for don’t exist.

Ready-to-use solutions exist: Pi-hole and AdGuard Home are well-known ones. uBlock Origin works differently but relies on the same idea to protect your privacy: detect bad resources and don’t let you go there.

Here, the bad domain names come from some of the same sources those projects use.

Ingredients needed for this recipe:

The DNS server

Looks like unbound(8) came in with OpenBSD 5.2.

Anyway, v1.15.0 is now available stock in OpenBSD 7.1/amd64.

Sourcing Ads and Trackers lists

I’m using a combination of sources that are used by Pi-hole, AdGuard Home and uBlock. I wrote a simple shell script that parses the lists and turns them into a format that unbound(8) understands:

# cat /home/scripts/unbound-adhosts
#!/bin/sh

PATH="/bin:/sbin:/usr/bin:/usr/sbin"

_tmp="$(mktemp)"        # Temp file to use while parsing
_out="/var/unbound/etc/unbound-adhosts.conf"  # Unbound formatted zone file

# AdGuard Home
function adguardhome {
  # AdGuard DNS filter
  _src="https://adguardteam.github.io/AdGuardSDNSFilter/Filters/filter.txt"
  ftp -MVo - "$_src" | \
  sed -nre 's/^\|\|([a-zA-Z0-9\_\-\.]+)\^$/local-zone: "\1" always_nxdomain/p'

  # AdAway default blocklist
  _src="https://adaway.org/hosts.txt"
  ftp -MVo - "$_src" | \
  awk '/^127.0.0.1 / { print "local-zone: \"" $2 "\" always_nxdomain" }'
}

# From Pi-hole
function stevenblack {
  _src="https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts"
  ftp -MVo - "$_src" | \
  awk '/^0.0.0.0 / { print "local-zone: \"" $2 "\" always_nxdomain" }'
}

# StopForumSpam, toxic domains
function stopforumspam {
  _src="https://www.stopforumspam.com/downloads/toxic_domains_whole.txt"
  ftp -MVo - "$_src" | \
  awk '{ print "local-zone: \"" $1 "\" always_nxdomain" }'
}

# uBlock Origin
function ublockorigin {
  # Malicious Domains Unbound Blocklist
  _src="https://malware-filter.gitlab.io/malware-filter/urlhaus-filter-unbound.conf"
  ftp -MVo - "$_src" | grep '^local-zone: '

  # Peter Lowe's Ad and tracking server list
  _src="https://pgl.yoyo.org/adservers/serverlist.php?showintro=0;hostformat=hosts"
  ftp -MVo - "$_src" | \
  awk '/^127.0.0.1 / { print "local-zone: \"" $2 "\" always_nxdomain" }'

  # AdGuard Français
  _src="https://raw.githubusercontent.com/AdguardTeam/AdguardFilters/master/FrenchFilter/sections/adservers.txt"
  ftp -MVo - "$_src" | \
  sed -nre 's/^\|\|([a-zA-Z0-9\_\-\.]+)\^.*$/local-zone: "\1" always_nxdomain/p'
}

# Grab and format the data
adguardhome >> "$_tmp"
stevenblack >> "$_tmp"
stopforumspam >> "$_tmp"
ublockorigin >> "$_tmp"

# Clean entries
sed -re 's/\.\" always/" always/' "$_tmp" | \
egrep -v "\"(t.co)\"" | \
sort -u -o "$_tmp"
chmod 0644 "$_tmp"

# Take action if required
diff -q "$_out" "$_tmp" 1>/dev/null
case $? in
  0)  rm "$_tmp" && exit 0;;
  1)
    mv "$_tmp" "$_out" && \
    doas -u _unbound unbound-checkconf 1>/dev/null && \
    exec doas -u _unbound unbound-control reload 1>/dev/null
    ;;
  *)  echo "$0: something bad happened!"; exit 1;;
esac

exit 0
#EOF
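
The conversions done by the script can be tried offline, without downloading anything. The following is a minimal sketch, not the script itself: the function names and the sample entries are made up for illustration, and the AdGuard pattern follows the looser of the two sed expressions used above (it accepts trailing modifiers).

```sh
#!/bin/sh
# Sketch: convert hosts-style and AdGuard-style entries to unbound(8)
# local-zone lines, reading from stdin instead of the network.

# hosts format: "<sink-ip> <domain>"; $1 is the sink address to match.
hosts_to_unbound() {
  awk -v ip="$1" '$1 == ip && NF >= 2 {
    print "local-zone: \"" $2 "\" always_nxdomain" }'
}

# AdGuard format: "||domain^", optionally followed by modifiers.
adguard_to_unbound() {
  sed -nE 's/^\|\|([a-zA-Z0-9_.-]+)\^.*$/local-zone: "\1" always_nxdomain/p'
}

# Same clean-up idea as the script: drop a trailing dot, then de-duplicate.
clean_zones() {
  sed -E 's/\." always/" always/' | sort -u
}

# Sample entries (invented for this example).
{
  printf '0.0.0.0 ads.example.com\n0.0.0.0 ads.example.com\n# comment\n' |
    hosts_to_unbound 0.0.0.0
  printf '||tracker.example.net^\n||cdn.example.org^$third-party\n! note\n' |
    adguard_to_unbound
} | clean_zones
```

Comment lines and duplicates disappear, and each remaining domain becomes one local-zone directive.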

Cron regularly synchronizes the list content with a dedicated unbound(8) zone file:

# crontab -l
(...)
# Update DNS block list
0~5 */6 * * * -s /home/scripts/unbound-adhosts
(...)

The zone file content can now be used by unbound(8).
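
Before handing a freshly generated file to unbound(8), a cheap format check can catch a broken download. This is only a sketch and not part of the script above: check_adhosts is a hypothetical helper, and it assumes every entry uses the always_nxdomain action, which may not hold for every upstream list.

```sh
#!/bin/sh
# Sketch: refuse a generated file if any line is not a well-formed
# 'local-zone: "<name>" always_nxdomain' directive.
check_adhosts() {
  # grep -v selects the offending lines; -q turns that into an exit status.
  if grep -Evq '^local-zone: "[A-Za-z0-9_.-]+" always_nxdomain$' "$1"; then
    echo "$1: unexpected line(s) found" >&2
    return 1
  fi
  return 0
}

# Possible use, before moving the temporary file into place:
#   check_adhosts "$_tmp" || exit 1
```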

Configuration

Enable statistics, configure logs, include the Ads/Trackers FQDN zone file:

# cat /var/unbound/etc/unbound.conf
(...)
statistics-cumulative: yes
extended-statistics: yes
(...)
use-syslog: yes
log-queries: no
log-replies: yes
log-local-actions: yes
(...)
include: /var/unbound/etc/unbound-adhosts.conf
(...)

Then apply the new unbound(8) configuration:

# rcctl restart unbound

From now on, each time a client requests DNS resolution for a bad domain, it gets an NXDOMAIN answer and unbound(8) does not process the query any further.
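
One way to check the behaviour from a client is to look at the status in the dig(1) answer header. The helper below is a small sketch; dns_status is a made-up name, and the resolver address and domain in the usage comment are placeholders.

```sh
#!/bin/sh
# Sketch: extract the response code from dig(1) output read on stdin.
dns_status() {
  sed -n 's/^;; ->>HEADER<<- .*status: \([A-Z]*\),.*/\1/p'
}

# Usage against a live resolver (address and name are placeholders):
#   dig @192.168.0.1 ads.example.com A | dns_status
# Offline demonstration with a header line shaped like dig's:
printf ';; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 4242\n' | dns_status
```

A blocked name should report NXDOMAIN, while a name that is not in the list should report NOERROR.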

The usage data

The logs and metrics end in InfluxDB so that I can render a pretty dashboard. There’s nothing special to do on the InfluxDB side. Simply create the database(s) and send data to it/those.

Collect the metrics

A shell script parses unbound statistics and writes them into a dedicated InfluxDB measurement:

# cat /home/scripts/collectd-unbound
#!/bin/sh
#
# CollectD Exec unbound(8) stats
# Configure "extended-statistics: yes"
#

PATH="/bin:/sbin:/usr/bin:/usr/sbin"

HOSTNAME="${COLLECTD_HOSTNAME:-$(hostname -s)}"
INTERVAL="${COLLECTD_INTERVAL:-10}"

while sleep "$INTERVAL"; do
  doas -u _unbound unbound-control stats_noreset | \
  egrep -v "^(histogram\.|time\.now|time\.elapsed)" | \
  sed -re "s;^([^=]+)=([0-9\.]+);PUTVAL $HOSTNAME/exec-unbound/gauge-\1 interval=$INTERVAL N:\2;"

  awk -v h=$HOSTNAME -v i=$INTERVAL \
  'END { print "PUTVAL " h "/exec-unbound/gauge-num.adhosts interval=" i " N:" FNR }' \
  /var/unbound/etc/unbound-adhosts.conf
done

exit 0
#EOF

# cat /etc/doas.conf
(...)
permit nopass _collectd as _unbound cmd unbound-control
(...)

# cat /etc/collectd.conf
(...)
<Plugin exec>
  Exec _collectd "/home/scripts/collectd-unbound"
</Plugin>
(...)

# rcctl restart collectd
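
To see what the collectd script emits without touching a running unbound(8), the same filter and substitution can be applied to a few hand-written statistics lines. This is a sketch: stats_to_putval is a hypothetical wrapper, and the host name, interval and sample values are invented.

```sh
#!/bin/sh
# Sketch: turn "name=value" statistics into collectd PUTVAL lines.
# $1 = host name, $2 = interval in seconds (both illustrative).
stats_to_putval() {
  grep -Ev '^(histogram\.|time\.now|time\.elapsed)' |
  sed -E "s;^([^=]+)=([0-9.]+);PUTVAL $1/exec-unbound/gauge-\1 interval=$2 N:\2;"
}

# Sample lines shaped like "unbound-control stats_noreset" output.
printf 'total.num.queries=1234\ntime.now=1664730181.660\nhistogram.000000.000000.to.000000.000001=0\n' |
  stats_to_putval gw 10
```

Only the first sample line survives the filter; the time and histogram entries are dropped, as in the script.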

In InfluxDB, the data will look like this:

> SELECT * FROM "exec_value" WHERE "instance"='unbound' ORDER BY DESC LIMIT 10
name: exec_value
time                           host    instance type  type_instance                 value
----                           ----    -------- ----  -------------                 -----
2022-10-02T17:03:01.66013246Z  openbsd unbound  gauge num.query.authzone.down       0
2022-10-02T17:03:01.660101373Z openbsd unbound  gauge num.query.authzone.up         0
2022-10-02T17:03:01.660069948Z openbsd unbound  gauge key.cache.count               4030
2022-10-02T17:03:01.660033432Z openbsd unbound  gauge infra.cache.count             491
2022-10-02T17:03:01.659930095Z openbsd unbound  gauge rrset.cache.count             37499
2022-10-02T17:03:01.659893329Z openbsd unbound  gauge msg.cache.count               108713
2022-10-02T17:03:01.659857007Z openbsd unbound  gauge unwanted.replies              9
2022-10-02T17:03:01.659820476Z openbsd unbound  gauge unwanted.queries              0
2022-10-02T17:03:01.659784111Z openbsd unbound  gauge num.query.aggressive.NXDOMAIN 882
2022-10-02T17:03:01.659747595Z openbsd unbound  gauge num.query.aggressive.NOERROR  256

Parse the logs

OpenBSD syslogd(8) can send selected logs to an external program. I wrote an awk(1) script that receives the logs from syslogd, parses and formats them as InfluxDB line protocol, and calls curl(1) to actually store the data.

Authentication is configured on my InfluxDB instance, so curl(1) has to use a login/password to store the data. But with the “--user” flag, anyone can see the credentials using ps(1). So I’m using a separate configuration file for curl(1), passed with “-K”.

# cat /home/scripts/unbound-logs2influxdb
#!/usr/bin/awk -f
BEGIN {
  # Build an associative array (_ptr[ip]=hostname) of known DNS clients.
  _fs = FS; FS = "[\"   ]+"          # Dirty hack to parse unbound logs.

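  _zone = "/var/unbound/etc/unbound-tumfatig.conf"    # Local zone file.
  _ptr["127.0.0.1"] = "localhost"
  while ((getline < _zone) > 0) {      # Stop on EOF or unreadable file.
    if ($0 ~ /^local-data-ptr:/) {                     # only parse PTR.
      split($3, _fqdn, "\."); _ptr[$2] = _fqdn[1]
    }
  }
  close(_zone)                 # Close the file itself, not the record.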

  FS = _fs        # Rollback dirty hack.
}
$3 == "unbound:" && $5 == "info:" {      # Only parse unbound info logs.
  if($7 == "static") {                  # Local zone: authoritative DNS.
    split($8, _client, "@")                   # Client format is IP@PORT
    if (_ptr[_client[1]] == "") { _host = "<unknown>" }     # If no PTR.
    else { _host = _ptr[_client[1]] }

    _rec = "unbound_static,host=" $2 ",name=" $9
    _rec = _rec ",type=" $10 ",class=" $11
    _rec = _rec ",clientip=" _client[1]
    _rec = _rec ",client=" _host " matched=1i"
  } else if($7 == "always_nxdomain") {          # Local zone: AD blocks.
    split($8, _client, "@")                   # Client format is IP@PORT
    if (_ptr[_client[1]] == "") { _host = "<unknown>" }     # If no PTR.
    else { _host = _ptr[_client[1]] }

    _rec = "unbound_adblock,host=" $2 ",name=" $6
    _rec = _rec ",type=" $10",class=" $11
    _rec = _rec ",clientip=" _client[1]
    _rec = _rec ",client=" _host " matched=1i"
  } else if(NF == 13) {                    # DNS queries have 13 fields.
    if (_ptr[$6] == "") {
      _host = "<unknown>"                  # Set hostname to '<unknown>'
    } else { _host = _ptr[$6] }         # if no PTR exists in zone file.

    _rec = "unbound_queries,host=" $2 ",name=" $7 ",clientip=" $6
    _rec = _rec ",client=" _host ",type=" $8 ",class=" $9
    _rec = _rec ",return_code=" $10 ",from_cache=" $12
    _rec = _rec " time_to_resolve=" $11 ",response_size=" $13 "i"
  } else { next }            # Unknown format: don't resend a stale record.

  # Build Influxdb protocol line using curl
  _cmd = "/usr/local/bin/curl -s -XPOST "
  _cmd = _cmd "-K /home/scripts/unbound-logs2influxdb.conf "
  _cmd = _cmd "--data-binary \"" _rec "\""

  # Run the curl command = Insert data in InfluxDB
  system(_cmd)
}

# cat /home/scripts/unbound-logs2influxdb.conf
# InfluxDB credentials
url = "https://influxdb_host:8086/write?db=db_name&precision=s"
user = "db_user:db_pass"
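
The record sent to InfluxDB is one line of line protocol: a measurement name, comma-separated tags, a space, then the fields. The awk script inserts values as-is; if a tag value could ever contain a space, a comma or an equals sign, it would need backslash-escaping. The sketch below illustrates that with hypothetical helper names and sample values; it is not what the script above does.

```sh
#!/bin/sh
# Sketch: escape a tag value for InfluxDB line protocol, where commas,
# equals signs and spaces must be backslash-escaped.
lp_escape() {
  printf '%s' "$1" | sed 's/[,= ]/\\&/g'
}

# Build one unbound_adblock record from illustrative values.
# $1=host $2=name $3=type $4=clientip $5=client
adblock_record() {
  printf 'unbound_adblock,host=%s,name=%s,type=%s,class=IN,clientip=%s,client=%s matched=1i\n' \
    "$(lp_escape "$1")" "$(lp_escape "$2")" "$(lp_escape "$3")" \
    "$(lp_escape "$4")" "$(lp_escape "$5")"
}

adblock_record gw s.youtube.com. A 192.0.0.16 'ThinkPad de Joel'
```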

The script is run by syslogd(8) and the configuration file contains credentials. So both files require special care regarding permissions and ownership:

# ls -alh /home/scripts/unbound-logs2influxdb*
-rwxr-x---  1 root  _syslogd   1.9K Oct  2 16:04 /home/scripts/unbound-logs2influxdb*
-rw-r-----  1 root  _syslogd   505B Sep 29 00:51 /home/scripts/unbound-logs2influxdb.conf

syslogd(8) has a special configuration so that unbound(8) logs, and only those, are sent to and parsed by the script:

# cat /etc/syslog.conf
(...)
!!unbound
*.* |/home/scripts/unbound-logs2influxdb
!*
(...)

# rcctl restart syslogd

20221014 UPDATE: I’m running syslogd(8) with the -Z flag for historical reasons. If you don’t, the awk script will have to be modified to match field numbers. Thanks @MattPovey2 for the note.
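
If the field numbers have to be adapted, printing each field of a real log line next to its awk index makes the mapping obvious. A minimal sketch follows; show_fields is a made-up helper, and the sample line is an invented example in the classic syslog layout, to be replaced with a line from your own syslogd(8).

```sh
#!/bin/sh
# Sketch: print each whitespace-separated field of a log line with its
# awk field number, to find where "unbound:" and "info:" land.
show_fields() {
  awk '{ for (i = 1; i <= NF; i++) printf "$%d = %s\n", i, $i }'
}

# Made-up sample line; feed a real one instead.
echo 'Oct  2 22:14:24 gw unbound: [1234:0] info: 192.0.2.16 example.org. A IN' |
  show_fields
```

In this sample, "unbound:" shows up as $5 rather than the $3 the script tests for, which is the kind of shift to compensate for.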

The parsed logs can now be queried from influxdb:

> SELECT * FROM "unbound_adblock" ORDER BY DESC LIMIT 5
name: unbound_adblock
time                 class client           clientip   host    matched name                      type
----                 ----- ------           --------   ----    ------- ----                      ----
2022-10-02T22:14:24Z IN    ThinkPad-de-Joel 192.0.0.16 unbound 1       s.youtube.com.            A
2022-10-02T22:13:35Z IN    -                192.0.0.12 unbound 1       www.googleadservices.com. HTTPS
2022-10-02T22:13:35Z IN    -                192.0.0.12 unbound 1       www.googleadservices.com. A
2022-10-02T22:13:34Z IN    -                192.0.0.12 unbound 1       s.youtube.com.            HTTPS
2022-10-02T22:13:34Z IN    -                192.0.0.12 unbound 1       s.youtube.com.            A

The dashboard

Doing things is great but checking what you’re doing is better. You could regularly run influxdb commands and even parse results and send emails. But you can also set up a moootiful Web page with Grafana.

Extra - DNS performance

For the impatient and/or curious, it is possible to benchmark unbound(8) using commonly used domain names. Grab and parse the Top 10 million domains list (based on Open PageRank data) so that it can be used by dnsperf(1).

# pkg_add dnsperf

# ftp https://www.domcop.com/files/top/top10milliondomains.csv.zip
Trying 94.130.193.220...
Requesting https://www.domcop.com/files/top/top10milliondomains.csv.zip
100% |************************************************************|  112 MB  00:09
117800727 bytes received in 9.77 seconds (11.49 MB/s)

# unzip top10milliondomains.csv.zip

# awk -F '[",]' '{ if($5 != "Domain") { print $5 " A" };          \
  if($5 ~/^[a-k]/) { print $5 " MX" }; if(FNR == 100000) exit }'   \
  top10milliondomains.csv > top100k.txt
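
The awk one-liner above depends on the CSV layout, so it may help to try the same logic on a few rows first. This is a sketch under assumptions: the column layout ("rank","domain","score") is inferred from the field index used above, the sample rows are invented, and csv_to_dnsperf is a hypothetical name.

```sh
#!/bin/sh
# Sketch: build dnsperf(1) input ("<name> <type>" per line) from CSV rows
# shaped like "rank","domain","score" (layout assumed, see above).
csv_to_dnsperf() {
  awk -F '[",]' '$5 != "Domain" && $5 != "" {
    print $5 " A"                        # one A query per domain
    if ($5 ~ /^[a-k]/) print $5 " MX"    # plus MX for a subset of names
  }'
}

printf '"Rank","Domain","Open Page Rank"\n"1","example.com","10.00"\n"2","test.example","9.50"\n' |
  csv_to_dnsperf
```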

# dnsperf -s 192.168.0.1 -c 5 -d top100k.txt
Statistics:

  Queries sent:         145937
  Queries completed:    145821 (99.92%)
  Queries lost:         116 (0.08%)

  Response codes:       NOERROR 137226 (94.11%), SERVFAIL 278 (0.19%), NXDOMAIN 8317 (5.70%)
  Average packet size:  request 33, response 79
  Run time (s):         236.612169
  Queries per second:   616.286984

  Average Latency (s):  0.155683 (min 0.000112, max 4.990909)
  Latency StdDev (s):   0.300890

You can see that unbound(8) replies but is a bit underpowered: not all queries were served, and collectd seemed to have difficulty getting some of the stats under such load.

Looking at the logs, warnings popped out:

warning: cannot increase max open fds from 512 to 4152
warning: continuing with less udp ports: 460
warning: increase ulimit or decrease threads, ports in config to remove this warning

This means my unbound configuration is not tuned properly for such a load. In real conditions, I’m way below 8 req/s, so it’s OK for me.

And that’s all for now!