rpz

Response policy zone (RPZ) file generator.
git clone https://git.philomathiclife.com/repos/rpz

# rpz

`rpz` consists of a binary crate and a [library crate](https://docs.rs/rpz/latest/rpz).
The binary crate, `rpz`, is an application that downloads, parses, and transforms ad-(un)block files from
URLs and local file paths into a [response policy zone (RPZ)](https://en.wikipedia.org/wiki/Response_policy_zone)
file. This RPZ file can be consumed by a DNS server that supports such files
(e.g., [Unbound](https://nlnetlabs.nl/projects/unbound/about/)).

## rpz in action

In this example it is assumed [`unbound.conf(5)`](https://unbound.docs.nlnetlabs.nl/en/latest/manpages/unbound.conf.html) is properly configured
and has `name` and `zonefile` in the `rpz` section set to `.` and `/var/unbound/db/rpz` respectively in addition to `control-enable` set to `true`
in the `remote-control` section.

```bash
[zack@laptop ~]$ cat<<EOF>/usr/local/etc/rpz/config
> timeout = 15
> rpz = "/var/unbound/db/rpz"
> local_dir = "/usr/local/etc/rpz/"
> adblock = [
      "https://raw.githubusercontent.com/AdguardTeam/AdguardFilters/master/BaseFilter/sections/adservers.txt",
      "https://raw.githubusercontent.com/AdguardTeam/AdguardFilters/master/BaseFilter/sections/adservers_firstparty.txt",
      "https://raw.githubusercontent.com/AdguardTeam/AdguardFilters/master/MobileFilter/sections/adservers.txt",
      "https://raw.githubusercontent.com/AdguardTeam/AdguardFilters/master/SpywareFilter/sections/mobile.txt",
      "https://raw.githubusercontent.com/AdguardTeam/AdguardFilters/master/SpywareFilter/sections/tracking_servers.txt",
      "https://raw.githubusercontent.com/AdguardTeam/AdguardFilters/master/SpywareFilter/sections/tracking_servers_firstparty.txt",
      "https://raw.githubusercontent.com/easylist/easylist/master/easylist/easylist_adservers.txt",
      "https://raw.githubusercontent.com/easylist/easylist/master/easylist/easylist_thirdparty.txt",
      "https://raw.githubusercontent.com/easylist/easylist/master/easyprivacy/easyprivacy_thirdparty.txt",
      "https://raw.githubusercontent.com/easylist/easylist/master/easyprivacy/easyprivacy_trackingservers.txt",
      "https://malware-filter.gitlab.io/malware-filter/urlhaus-filter-agh.txt"
    ]
domain = ["https://www.stopforumspam.com/downloads/toxic_domains_whole.txt"]
hosts = ["https://raw.githubusercontent.com/AdAway/adaway.github.io/master/hosts.txt", "https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts"]
wildcard = ["https://pgl.yoyo.org/adservers/serverlist.php?hostformat=adblock&showintro=0&mimetype=plaintext"]
> EOF
[zack@laptop ~]$ cat /usr/local/etc/rpz/unblock/domain/unbound
dpm.demdex.net # ESPN app on PS5 needs this.
[zack@laptop ~]$ rpz -f /usr/local/etc/rpz/config
unblock count written: 1
block count written: 271559
total lines written: 271560
domains parsed: 254147
comments parsed: 6629
blanks parsed: 4519
parsing errors: 24624
[zack@laptop ~]$ head -1 /var/unbound/db/rpz
dpm.demdex.net CNAME rpz-passthru.
[zack@laptop ~]$ tail -6 /var/unbound/db/rpz
stats.zone-telechargement CNAME .
*.stats.zone-telechargement CNAME .
5wh.co.zw CNAME .
www.5wh.co.zw CNAME .
pandi.co.zw CNAME .
www.pandi.co.zw CNAME .
[zack@laptop ~]$ unbound-control -q auth_zone_reload . && unbound-control -q flush_zone . && unbound-control -q flush_negative
```

## Ad-(un)block file format and encoding

All ad-(un)block files must be valid UTF-8; however, for a given domain, each label must only contain 1–63 Unicode scalar values from the set:
`!`, `$`, `&`, `'`, `(`, `)`, `+`, `,`, `-`, `0`–`9`, `;`, `=`, `_`, `` ` ``, `A`–`Z`, `a`–`z`, `{`, `}`, and `~`. Labels must be delimited
by `.`. Domains in the file must be delimited by a line feed or carriage return and line feed. A domain must be less than 254 characters in length
including the `.` label separator. Domains are treated as case-insensitive with uppercase letters treated as lowercase. Domains must not be an
IPv4 address.
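
The rules above can be sketched as a small check. This is only an illustration and not how `rpz` or `ascii_domain` implement it: the names `is_allowed` and `normalize_domain` are hypothetical, a trailing `.` is not handled, and IPv4 detection is assumed to mean "parses as a dotted-quad address".

```rust
use std::net::Ipv4Addr;

/// Returns `true` for the characters a label may contain (the set listed above).
fn is_allowed(c: char) -> bool {
    c.is_ascii_alphanumeric() || "!$&'()+,-;=_`{}~".contains(c)
}

/// Hypothetical helper: returns the lowercased domain when `s` satisfies the
/// rules above, and `None` otherwise.
fn normalize_domain(s: &str) -> Option<String> {
    // Fewer than 254 characters, counting the `.` separators.
    if s.is_empty() || s.len() >= 254 {
        return None;
    }
    // Every label must hold 1 to 63 characters from the allowed set.
    let labels_ok = s
        .split('.')
        .all(|label| (1..=63).contains(&label.len()) && label.chars().all(is_allowed));
    if !labels_ok {
        return None;
    }
    // Assumption: "IPv4 address" means anything that parses as a dotted quad.
    if s.parse::<Ipv4Addr>().is_ok() {
        return None;
    }
    // Uppercase letters are treated as lowercase.
    Some(s.to_ascii_lowercase())
}
```

For example, `normalize_domain("Example.COM")` yields `Some("example.com")`, while `127.0.0.1` and `a..b` are rejected.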

### Adblock-style

Domain constructed from an [Adblock-style rule](https://adguard-dns.io/kb/general/dns-filtering-syntax/#adblock-style-syntax)
with the requirement that the rule conforms to the following extended regex:

`^<ws>*(\|\|)?<ws>*<domain><ws>*\^?<ws>*$`

where `<domain>` conforms to a valid [`Domain`](https://docs.rs/ascii_domain/latest/ascii_domain/dom/struct.Domain.html) based on
[`ASCII_FIREFOX`](https://docs.rs/ascii_domain/latest/ascii_domain/char_set/constant.ASCII_FIREFOX.html) with the added requirements
that the TLD is either all letters or at least length five and begins with `xn--` and does not contain `$`, and `<ws>` is any sequence of [ASCII whitespace](https://infra.spec.whatwg.org/#ascii-whitespace).

Lines that begin with `||` cause all subdomains to be blocked (i.e., the domain itself and all proper subdomains); without
`||`, only the specific domain is blocked.

Because these files are processed conservatively, one is encouraged to still use an application-level
ad blocker (e.g., [uBlock Origin](https://ublockorigin.com/)). Adblock-style files often contain paths as well as
additional information (e.g., “third-party”) that requires application-level information to process correctly;
`rpz` counts such entries as “parsing errors”.
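
A rough sketch of how a single Adblock-style line could be split according to the regex above. The function name `parse_adblock` is hypothetical, and the domain check is a deliberately loose placeholder (any non-empty run of printable ASCII without `$`, `/`, `^`, `|`, or `*`) rather than the `ASCII_FIREFOX` and TLD rules the real parser enforces.

```rust
/// ASCII whitespace as defined by the WHATWG Infra standard.
fn is_ws(c: char) -> bool {
    matches!(c, '\t' | '\n' | '\x0C' | '\r' | ' ')
}

/// Hypothetical helper: returns `(domain, blocks_subdomains)` for a rule that
/// fits the shape `<ws>*(||)?<ws>*<domain><ws>*^?<ws>*`, else `None`.
fn parse_adblock(line: &str) -> Option<(String, bool)> {
    let rule = line.trim_matches(is_ws);
    // A leading `||` means the domain and all proper subdomains are covered.
    let (rest, subdomains) = match rule.strip_prefix("||") {
        Some(r) => (r.trim_start_matches(is_ws), true),
        None => (rule, false),
    };
    // An optional trailing `^` separator is allowed.
    let domain = rest.strip_suffix('^').unwrap_or(rest).trim_end_matches(is_ws);
    // Placeholder validation only; paths and `$` modifiers are rejected,
    // which is what turns such entries into parsing errors.
    let plausible = !domain.is_empty()
        && domain
            .chars()
            .all(|c| c.is_ascii_graphic() && !matches!(c, '$' | '/' | '^' | '|' | '*'));
    plausible.then(|| (domain.to_ascii_lowercase(), subdomains))
}
```

With this sketch, `||Ads.example.com^` gives `("ads.example.com", true)`, whereas `||example.com^$third-party` is rejected because of the trailing modifier.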

### Domain-style

Domain constructed from a [domains-only rule](https://adguard-dns.io/kb/general/dns-filtering-syntax/#domains-only-syntax)
with the requirement that the rule conforms to the following extended regex:

`^<ws>*<domain><ws>*(#.*)?$`

where `<domain>` conforms to a valid `Domain` based on `ASCII_FIREFOX`, the TLD is either all letters or at least length five and begins with `xn--`, and `<ws>` is any sequence of ASCII whitespace.

Domains only represent themselves (i.e., proper subdomains will not be blocked).

### Hosts-style

Domain constructed from a [`hosts(5)`-style rule](https://adguard-dns.io/kb/general/dns-filtering-syntax/#etc-hosts-syntax)
with the requirement that the rule conforms to the following extended regex:

`^<ws>*<ip><ws>+<domain><ws>*(#.*)?$`

where `<domain>` conforms to a valid `Domain` based on `ASCII_FIREFOX`, the TLD is either all letters or at least length five and begins with `xn--`, `<ws>` is any sequence of ASCII whitespace, and `<ip>` is one of the following:

`::`, `::1`, `0.0.0.0`, or `127.0.0.1`.

Domains only represent themselves (i.e., proper subdomains will not be blocked).
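
As an illustration of the hosts-style shape, the sketch below accepts exactly one of the four listed addresses followed by exactly one name and an optional `#` comment. The name `parse_hosts` is hypothetical, and validation of the domain itself (character set, lengths, TLD rule) is left out for brevity.

```rust
/// Hypothetical helper: extracts the lowercased domain from a hosts-style
/// line, or returns `None` when the line does not fit the expected shape.
fn parse_hosts(line: &str) -> Option<String> {
    // Everything from the first `#` onward is a comment.
    let rule = line.split('#').next().unwrap_or("");
    // Fields are separated by ASCII whitespace.
    let mut fields = rule.split_ascii_whitespace();
    let ip = fields.next()?;
    let domain = fields.next()?;
    // Only one name per line, and only the four listed addresses, are accepted.
    if fields.next().is_some() || !matches!(ip, "::" | "::1" | "0.0.0.0" | "127.0.0.1") {
        return None;
    }
    // Domain validation is omitted in this sketch.
    Some(domain.to_ascii_lowercase())
}
```

Under these assumptions, `0.0.0.0 Tracker.example.net # ads` yields `tracker.example.net`, while a line with any other address or with several names is treated as an error.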

### Wildcard-style

Domain constructed from a [wildcard domain rule](https://pgl.yoyo.org/adservers/serverlist.php?hostformat=adblock&showintro=0&mimetype=plaintext)
with the requirement that the rule conforms to the following extended regex:

`^<ws>*(\*\.)?<domain><ws>*(#.*)?$`

where `<domain>` conforms to a valid `Domain` based on `ASCII_FIREFOX`, the TLD is either all letters or at least length five and begins with `xn--`, and `<ws>` is any sequence of ASCII whitespace.

If the rule begins with `*.`, then `<domain>` must have length less than 252 and all proper subdomains are blocked; this
does _not_ include the domain itself. Otherwise, only `<domain>` is blocked.
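
The wildcard handling can be sketched in the same spirit. The name `parse_wildcard` is hypothetical, the returned flag means "proper subdomains only", and the domain check is reduced to the length limits stated above (under 254 normally, under 252 with a `*.` prefix).

```rust
/// Hypothetical helper: returns `(domain, proper_subdomains_only)` for a
/// wildcard-style line, or `None` when the line does not fit the shape.
fn parse_wildcard(line: &str) -> Option<(String, bool)> {
    // Drop the optional trailing comment, then expect exactly one field.
    let rule = line.split('#').next().unwrap_or("");
    let mut fields = rule.split_ascii_whitespace();
    let field = fields.next()?;
    if fields.next().is_some() {
        return None;
    }
    // A leading `*.` selects all proper subdomains but not the domain itself.
    let (domain, proper_subdomains) = match field.strip_prefix("*.") {
        Some(d) => (d, true),
        None => (field, false),
    };
    // Length limits from the text; other domain validation is omitted here.
    let max_len = if proper_subdomains { 251 } else { 253 };
    if domain.is_empty() || domain.len() > max_len || domain.contains('*') {
        return None;
    }
    Some((domain.to_ascii_lowercase(), proper_subdomains))
}
```

For instance, `*.example.com` maps to `("example.com", true)` and `example.com # note` maps to `("example.com", false)`.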

## Config file

Either `-` or the absolute path to the TOML config file must be passed via the `-f`/`--file` CLI option. If `-` is passed, then `stdin` will be read. The
format of this file must conform to the following:

```bash
timeout = <timeout_in_seconds>
rpz = <absolute_file_path_to_the_RPZ_file_to_be_written>
local_dir = <absolute_file_path_to_the_directory_containing_local_files>
adblock = [<HTTP(S)_URLs>]
domain = [<HTTP(S)_URLs>]
hosts = [<HTTP(S)_URLs>]
wildcard = [<HTTP(S)_URLs>]
```

If the `rpz` key is not specified, then the RPZ file will be written to `stdout`. If `local_dir` is specified, its `block/` and `unblock/` subdirectories are searched; and within each of those subdirectories,
`adblock/`, `domain/`, `hosts/`, and `wildcard/` subdirectories are searched for files, which are parsed according to the directory they are in. It is not
an error if any of the directories do not exist.
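
To make the directory layout concrete, the sketch below lists the eight subdirectories that would be searched under a given `local_dir`. The function name `local_subdirs` is hypothetical; it only builds paths and does not touch the file system.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: every `<local_dir>/<action>/<style>` directory that
/// is searched for local files.
fn local_subdirs(local_dir: &Path) -> Vec<PathBuf> {
    let mut dirs = Vec::with_capacity(8);
    for action in ["block", "unblock"] {
        for style in ["adblock", "domain", "hosts", "wildcard"] {
            // Files found here are parsed according to `style`.
            dirs.push(local_dir.join(action).join(style));
        }
    }
    dirs
}
```

With `local_dir` set to `/usr/local/etc/rpz/`, the list includes `/usr/local/etc/rpz/unblock/domain`, which is where the `unbound` file from the earlier example lives.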

When any of the array keys are specified, URLs must be unique across all arrays. The files these URLs
point to are interpreted as block files (i.e., unblock files are only allowed on the local file system).

The `timeout` corresponds to the maximum number of _seconds_ allowed for an HTTP(S) file to be downloaded.
If it does not exist or has a value of 0, then a timeout of one hour will be used. If the value specified exceeds one hour,
then it will be reduced to one hour.
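
The timeout rule amounts to a default plus an upper bound. A minimal sketch, assuming the value has already been read from the config as an optional integer (the names `MAX_TIMEOUT_SECS` and `effective_timeout` are hypothetical):

```rust
use std::time::Duration;

/// One hour, which serves as both the default and the upper bound.
const MAX_TIMEOUT_SECS: u64 = 3600;

/// Hypothetical helper: maps the configured `timeout` to the duration used.
fn effective_timeout(configured: Option<u64>) -> Duration {
    let secs = match configured {
        // A missing key and a value of 0 both fall back to one hour.
        None | Some(0) => MAX_TIMEOUT_SECS,
        // Larger values are capped at one hour.
        Some(s) => s.min(MAX_TIMEOUT_SECS),
    };
    Duration::from_secs(secs)
}
```

So `timeout = 15` stays at 15 seconds, while `timeout = 7200` behaves like 3600.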

## RPZ file

Unless `stdout` is the destination, a temporary RPZ file is written in the same location as the `rpz` value in the config file except with `tmp` appended to the name. Upon success, this file
is renamed to the `rpz` value in the config file. The file contains the minimum number of lines possible, with unblock entries taking precedence
over block entries.

In the event there are no block entries or the temp file already exists, the program will abort.
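
The write-then-rename behavior can be sketched with the standard library. This is an assumption-laden illustration rather than the actual implementation: `temp_path` and `write_rpz` are hypothetical names, and "`tmp` appended to the name" is read literally (so `/var/unbound/db/rpz` becomes `/var/unbound/db/rpztmp`).

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Hypothetical helper: the destination path with `tmp` appended to its name.
fn temp_path(dest: &Path) -> PathBuf {
    let mut name = dest.as_os_str().to_os_string();
    name.push("tmp");
    PathBuf::from(name)
}

/// Hypothetical helper: writes `contents` to the temp file, then renames it
/// over `dest` so readers never observe a partially written RPZ file.
fn write_rpz(dest: &Path, contents: &str) -> io::Result<()> {
    let tmp = temp_path(dest);
    // `create_new` fails with `AlreadyExists` when the temp file is present,
    // mirroring the "temp file already exists" abort condition.
    let mut file = OpenOptions::new().write(true).create_new(true).open(&tmp)?;
    file.write_all(contents.as_bytes())?;
    file.sync_all()?;
    fs::rename(&tmp, dest)
}
```

Refusing to reuse an existing temp file is one simple way to avoid two concurrent runs clobbering each other's output.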

## Options

When `rpz` is passed `-V`/`--version`, the version of `rpz` will be printed to `stdout`. When passed `-h`/`--help`,
information about the program and its options will be printed to `stdout`. When passed `-f`/`--file` along with
`-` or the absolute path to the TOML config file, `rpz` will run normally, printing summary information to `stdout`
upon completion. One can additionally pass `-q`/`--quiet` along with `-f`/`--file` to suppress the summary
information. When `-v`/`--verbose` is passed along with `-f`/`--file`, itemized summary information for each input file,
including the kinds and counts of errors, will be printed to `stdout` in addition to the normal summary information.

### Example

If `www.example.com`, `*.example.com`, and `foo.com` are to be blocked while `foo.example.com` and `||foo.com` are to be unblocked, the RPZ file would look like the following:

```bash
foo.example.com CNAME rpz-passthru.
*.example.com CNAME .
```

Upon success, the numbers of unblock, block, and total lines written are printed to `stdout`, along with
the total number of domains, comments, blanks, and parsing errors.
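
The two record shapes seen in the example output can be produced by a tiny formatter. This sketch covers only the line format; the `Action` enum and `rpz_line` function are hypothetical names, and the logic that decides which lines are needed (minimization and unblock precedence) is not shown.

```rust
/// Whether an owner name is blocked or explicitly allowed.
enum Action {
    Block,
    Unblock,
}

/// Hypothetical helper: formats one RPZ line for `owner`, which may be a
/// plain domain or a `*.`-prefixed wildcard.
fn rpz_line(owner: &str, action: Action) -> String {
    let target = match action {
        // `CNAME .` is the form used for blocked names in the example.
        Action::Block => ".",
        // `CNAME rpz-passthru.` is the form used for unblocked names.
        Action::Unblock => "rpz-passthru.",
    };
    format!("{} CNAME {}", owner, target)
}
```

Calling `rpz_line("foo.example.com", Action::Unblock)` and `rpz_line("*.example.com", Action::Block)` reproduces the two lines of the example above.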

## Errors

Parsing errors are ignored; all other errors are written to `stderr` before the program aborts.

## License

Licensed under either of

* Apache License, Version 2.0 ([LICENSE-APACHE](LICENSE-APACHE) or http://www.apache.org/licenses/LICENSE-2.0).
* MIT license ([LICENSE-MIT](LICENSE-MIT) or http://opensource.org/licenses/MIT).

at your option.

## Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you,
as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

### Status

This package is actively maintained.

The crates are only tested on the `x86_64-unknown-linux-gnu` and `x86_64-unknown-openbsd` targets, but
they should work on most platforms.

Nightly `rustc` is required. Once `BTreeMap` [cursors are stabilized](https://github.com/rust-lang/rust/issues/107540), stable `rustc` will work.
On OpenBSD-stable, one can use the `rust` port as long as `RUSTC_BOOTSTRAP` is `export`ed with a value of `1` before invoking
`cargo build --all-features --release` or `cargo install --all-features rpz`.