rpz

Response policy zone (RPZ) file generator.
git clone https://git.philomathiclife.com/repos/rpz

# `rpz`

[<img alt="git" src="https://git.philomathiclife.com/badges/rpz.svg" height="20">](https://git.philomathiclife.com/rpz/log.html)
[<img alt="crates.io" src="https://img.shields.io/crates/v/rpz.svg?style=for-the-badge&color=fc8d62&logo=rust" height="20">](https://crates.io/crates/rpz)
[<img alt="docs.rs" src="https://img.shields.io/badge/docs.rs-rpz-66c2a5?style=for-the-badge&labelColor=555555&logo=docs.rs" height="20">](https://docs.rs/rpz/latest/rpz/)

`rpz` consists of a binary crate and [library crate](https://docs.rs/rpz/latest/rpz).
The binary crate, `rpz`, is an application that downloads, parses, and transforms ad-(un)block files from
URLs and local file paths into a [response policy zone (RPZ)](https://en.wikipedia.org/wiki/Response_policy_zone)
file. This RPZ file can be consumed by a DNS server that supports such files
(e.g., [Unbound](https://nlnetlabs.nl/projects/unbound/about/)).

## rpz in action

In this example it is assumed [`unbound.conf(5)`](https://unbound.docs.nlnetlabs.nl/en/latest/manpages/unbound.conf.html) is properly configured
and has `name` and `zonefile` in the `rpz` section set to `.` and `/var/unbound/db/rpz` respectively in addition to `control-enable` set to `true`
in the `remote-control` section.

```bash
[zack@laptop ~]$ cat <<EOF >/usr/local/etc/rpz/config
> timeout = 15
> rpz = "/var/unbound/db/rpz"
> local_dir = "/usr/local/etc/rpz/"
> adblock = [
>     "https://raw.githubusercontent.com/AdguardTeam/AdguardFilters/master/BaseFilter/sections/adservers.txt",
>     "https://raw.githubusercontent.com/AdguardTeam/AdguardFilters/master/BaseFilter/sections/adservers_firstparty.txt",
>     "https://raw.githubusercontent.com/AdguardTeam/AdguardFilters/master/MobileFilter/sections/adservers.txt",
>     "https://raw.githubusercontent.com/AdguardTeam/AdguardFilters/master/SpywareFilter/sections/mobile.txt",
>     "https://raw.githubusercontent.com/AdguardTeam/AdguardFilters/master/SpywareFilter/sections/tracking_servers.txt",
>     "https://raw.githubusercontent.com/AdguardTeam/AdguardFilters/master/SpywareFilter/sections/tracking_servers_firstparty.txt",
>     "https://raw.githubusercontent.com/easylist/easylist/master/easylist/easylist_adservers.txt",
>     "https://raw.githubusercontent.com/easylist/easylist/master/easylist/easylist_thirdparty.txt",
>     "https://raw.githubusercontent.com/easylist/easylist/master/easyprivacy/easyprivacy_thirdparty.txt",
>     "https://raw.githubusercontent.com/easylist/easylist/master/easyprivacy/easyprivacy_trackingservers.txt",
>     "https://malware-filter.gitlab.io/malware-filter/urlhaus-filter-agh.txt"
> ]
> domain = ["https://www.stopforumspam.com/downloads/toxic_domains_whole.txt"]
> hosts = ["https://raw.githubusercontent.com/AdAway/adaway.github.io/master/hosts.txt", "https://raw.githubusercontent.com/StevenBlack/hosts/master/hosts"]
> wildcard = ["https://pgl.yoyo.org/adservers/serverlist.php?hostformat=adblock&showintro=0&mimetype=plaintext"]
> EOF
[zack@laptop ~]$ cat /usr/local/etc/rpz/unblock/domain/unbound
dpm.demdex.net # ESPN app on PS5 needs this.
[zack@laptop ~]$ rpz -f /usr/local/etc/rpz/config
unblock count written: 1
block count written: 271559
total lines written: 271560
domains parsed: 254147
comments parsed: 6629
blanks parsed: 4519
parsing errors: 24624
[zack@laptop ~]$ head -1 /var/unbound/db/rpz
dpm.demdex.net CNAME rpz-passthru.
[zack@laptop ~]$ tail -6 /var/unbound/db/rpz
stats.zone-telechargement CNAME .
*.stats.zone-telechargement CNAME .
5wh.co.zw CNAME .
www.5wh.co.zw CNAME .
pandi.co.zw CNAME .
www.pandi.co.zw CNAME .
[zack@laptop ~]$ unbound-control -q auth_zone_reload . && unbound-control -q flush_zone . && unbound-control -q flush_negative
```

## Ad-(un)block file format and encoding

All ad-(un)block files must be valid UTF-8; however, for a given domain, each label must contain only 1–63 Unicode scalar values from the set:
`!`, `$`, `&`, `'`, `(`, `)`, `+`, `,`, `-`, `0`–`9`, `;`, `=`, `_`, `` ` ``, `A`–`Z`, `a`–`z`, `{`, `}`, and `~`. Labels must be delimited
by `.`. Domains in the file must be delimited by a line feed or a carriage return followed by a line feed. A domain must be less than 254 characters in length
including the `.` label separators. Domains are case-insensitive, with uppercase letters treated as lowercase. Domains must not be an
IPv4 address.
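
The rules above can be sketched as a small standalone check. This is only an illustration of the constraints listed in this section, not the code `rpz` uses (which relies on the `ascii_domain` crate); the function names `is_label_char` and `normalize_domain` are hypothetical, and the sketch assumes a trailing `.` is not accepted.

```rust
use std::net::Ipv4Addr;

/// Returns `true` if `c` belongs to the label character set listed above.
fn is_label_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || "!$&'()+,-;=_`{}~".contains(c)
}

/// Hypothetical helper: applies the rules from this section and returns the
/// lowercased domain, or `None` if any rule is violated.
fn normalize_domain(input: &str) -> Option<String> {
    // Fewer than 254 characters, including the `.` separators.
    if input.is_empty() || input.len() >= 254 {
        return None;
    }
    // Domains must not be an IPv4 address.
    if input.parse::<Ipv4Addr>().is_ok() {
        return None;
    }
    // Every label has 1-63 characters, all from the allowed set.
    let labels_ok = input
        .split('.')
        .all(|label| (1..=63).contains(&label.len()) && label.chars().all(is_label_char));
    // Uppercase letters are treated as lowercase.
    labels_ok.then(|| input.to_ascii_lowercase())
}
```

Under these assumptions, `normalize_domain("Example.COM")` yields `Some("example.com")`, while `127.0.0.1` and `a..b` are rejected.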

### Adblock-style

Domain constructed from an [Adblock-style rule](https://adguard-dns.io/kb/general/dns-filtering-syntax/#adblock-style-syntax)
with the requirement that the rule conforms to the following extended regex:

`^<ws>*(\|\|)?<ws>*<domain><ws>*\^?<ws>*$`

where `<domain>` conforms to a valid [`Domain`](https://docs.rs/ascii_domain/latest/ascii_domain/dom/struct.Domain.html) based on
[`ASCII_FIREFOX`](https://docs.rs/ascii_domain/latest/ascii_domain/char_set/constant.ASCII_FIREFOX.html) with the added requirements
that the domain does not contain `$` and that the TLD is either all letters or is at least five characters long and begins with `xn--`, and `<ws>` is any sequence of [ASCII whitespace](https://infra.spec.whatwg.org/#ascii-whitespace).

Lines that begin with `||` cause all subdomains to be blocked (i.e., the domain itself and all proper subdomains); without
`||`, only the specific domain is blocked.
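
As a rough illustration of the regex shape above, the following standalone sketch splits a line into its scope and candidate domain. The names `Scope`, `is_domain_char`, and `parse_adblock_line` are hypothetical and are not part of the `rpz` library; the sketch only checks the character set (without `$`) and leaves label-length and TLD checks to a real `Domain` validator.

```rust
/// Whether a rule covers only the domain or the domain plus all subdomains.
#[derive(Debug, PartialEq, Eq)]
enum Scope {
    DomainOnly,
    DomainAndSubdomains,
}

fn is_ws(c: char) -> bool {
    c.is_ascii_whitespace()
}

/// Label characters from the section above, minus `$`, plus the `.` separator.
fn is_domain_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || ".!&'()+,-;=_`{}~".contains(c)
}

/// Hypothetical helper: returns the scope and candidate domain, or `None`
/// when the line does not fit `^<ws>*(\|\|)?<ws>*<domain><ws>*\^?<ws>*$`.
fn parse_adblock_line(line: &str) -> Option<(Scope, &str)> {
    let rule = line.trim_matches(is_ws);
    // An optional leading `||` widens the rule to all subdomains.
    let (scope, rest) = match rule.strip_prefix("||") {
        Some(rest) => (Scope::DomainAndSubdomains, rest.trim_start_matches(is_ws)),
        None => (Scope::DomainOnly, rule),
    };
    // An optional trailing `^` separator is dropped.
    let candidate = rest.strip_suffix('^').unwrap_or(rest).trim_end_matches(is_ws);
    if candidate.is_empty() || !candidate.chars().all(is_domain_char) {
        return None;
    }
    Some((scope, candidate))
}
```

With this sketch, `||example.com^` maps to the domain-and-subdomains scope, whereas `||example.com^$third-party` and `||example.com/ads` are rejected, which is the kind of line reported as a parsing error.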

Because these files are processed conservatively, one is encouraged to still use an application-level
ad blocker (e.g., [uBlock Origin](https://ublockorigin.com/)). Adblock-style files often contain paths as well as
additional information (e.g., “third-party”) that requires application-level information to process correctly; such
entries are considered “parsing errors” by `rpz`.

### Domain-style

Domain constructed from a [domains-only rule](https://adguard-dns.io/kb/general/dns-filtering-syntax/#domains-only-syntax)
with the requirement that the rule conforms to the following extended regex:

`^<ws>*<domain><ws>*(#.*)?$`

where `<domain>` conforms to a valid `Domain` based on `ASCII_FIREFOX`, the TLD is either all letters or is at least five characters long and begins with `xn--`, and `<ws>` is any sequence of ASCII whitespace.

Domains only represent themselves (i.e., proper subdomains will not be blocked).

### Hosts-style

Domain constructed from a [`hosts(5)`-style rule](https://adguard-dns.io/kb/general/dns-filtering-syntax/#etc-hosts-syntax)
with the requirement that the rule conforms to the following extended regex:

`^<ws>*<ip><ws>+<domain><ws>*(#.*)?$`

where `<domain>` conforms to a valid `Domain` based on `ASCII_FIREFOX`, the TLD is either all letters or is at least five characters long and begins with `xn--`, `<ws>` is any sequence of ASCII whitespace, and `<ip>` is one of the following:

`::`, `::1`, `0.0.0.0`, or `127.0.0.1`.

Domains only represent themselves (i.e., proper subdomains will not be blocked).
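
A minimal standalone sketch of the hosts-style shape is shown below. The function `parse_hosts_line` is a hypothetical name, not an `rpz` API, and the sketch only handles the IP, whitespace, and comment parts of the regex; the returned candidate would still need `Domain` validation.

```rust
/// Hypothetical helper: extracts the candidate domain from a `hosts(5)`-style
/// line, or returns `None` when the line does not fit the regex shape.
fn parse_hosts_line(line: &str) -> Option<&str> {
    // Drop a trailing `# comment`, if any (`#` is not a valid label character).
    let rule = line.split('#').next().unwrap_or("");
    let mut fields = rule.split_ascii_whitespace();
    let ip = fields.next()?;
    let domain = fields.next()?;
    // Only these four addresses are accepted, and nothing may follow the domain.
    let ip_ok = matches!(ip, "::" | "::1" | "0.0.0.0" | "127.0.0.1");
    (ip_ok && fields.next().is_none()).then_some(domain)
}
```

For example, `0.0.0.0 ads.example.com # tracker` yields `ads.example.com`, while a line using `192.168.1.1` or listing two host names yields `None`.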

### Wildcard-style

Domain constructed from a [wildcard domain rule](https://pgl.yoyo.org/adservers/serverlist.php?hostformat=adblock&showintro=0&mimetype=plaintext)
with the requirement that the rule conforms to the following extended regex:

`^<ws>*(\*\.)?<domain><ws>*(#.*)?$`

where `<domain>` conforms to a valid `Domain` based on `ASCII_FIREFOX`, the TLD is either all letters or is at least five characters long and begins with `xn--`, and `<ws>` is any sequence of ASCII whitespace.

If `domain` begins with `*.`, then `domain` must have length less than 252 and all proper subdomains are blocked; this
does _not_ include the domain itself. Otherwise, only the `domain` is blocked.
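
The wildcard shape can be sketched in the same way. Here `parse_wildcard_line` is a hypothetical name, and the length limit is applied to the part after `*.`, which is one reading of the rule above and should be treated as an assumption; full `Domain` validation is again left out.

```rust
/// Hypothetical helper: returns `(covers_proper_subdomains, candidate_domain)`,
/// or `None` when the line does not fit the regex shape.
fn parse_wildcard_line(line: &str) -> Option<(bool, &str)> {
    // Drop a trailing `# comment` and surrounding ASCII whitespace.
    let rule = line
        .split('#')
        .next()
        .unwrap_or("")
        .trim_matches(|c: char| c.is_ascii_whitespace());
    let (wildcard, domain) = match rule.strip_prefix("*.") {
        Some(rest) => (true, rest),
        None => (false, rule),
    };
    // Assumption: under 252 characters after `*.`, otherwise under 254.
    let max_len = if wildcard { 251 } else { 253 };
    let ok = !domain.is_empty()
        && domain.len() <= max_len
        && !domain.contains(|c: char| c.is_ascii_whitespace() || c == '*');
    ok.then_some((wildcard, domain))
}
```

A result of `(true, "example.com")` means only proper subdomains such as `www.example.com` are covered, not `example.com` itself.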

## Config file

Either `-` or the absolute path to the TOML config file must be passed via the `-f`/`--file` CLI option. If `-` is passed, then `stdin` will be read. The
format of this file must conform to the following:

```bash
timeout = <timeout_in_seconds>
rpz = <absolute_file_path_to_the_RPZ_file_to_be_written>
local_dir = <absolute_file_path_to_the_directory_containing_local_files>
adblock = [<HTTP(S)_URLs>]
domain = [<HTTP(S)_URLs>]
hosts = [<HTTP(S)_URLs>]
wildcard = [<HTTP(S)_URLs>]
```

If the `rpz` key is not specified, then the RPZ file will be written to `stdout`. If `local_dir` is specified, its `block/` and `unblock/` subdirectories are searched; within each of those,
the `adblock/`, `domain/`, `hosts/`, and `wildcard/` subdirectories are searched for files, which are parsed according to the directory they are in. It is not
an error if any of the directories do not exist.
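
The directory layout described above can be enumerated with a short sketch. The function name `local_subdirs` is hypothetical and only illustrates which paths are implied by a given `local_dir`; it does not touch the file system.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: lists the eight subdirectories of `local_dir`
/// that are searched for local block and unblock files.
fn local_subdirs(local_dir: &Path) -> Vec<PathBuf> {
    let mut dirs = Vec::with_capacity(8);
    for action in ["block", "unblock"] {
        for format in ["adblock", "domain", "hosts", "wildcard"] {
            dirs.push(local_dir.join(action).join(format));
        }
    }
    dirs
}
```

For `local_dir = "/usr/local/etc/rpz/"`, this includes `/usr/local/etc/rpz/unblock/domain`, the directory holding the `unbound` file in the earlier session.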

When any of the array keys (`adblock`, `domain`, `hosts`, and `wildcard`) are specified, URLs must be unique across all arrays. The files these URLs
point to are interpreted as block files (i.e., unblock files are only allowed on the local file system).

The `timeout` corresponds to the maximum number of _seconds_ allowed for an HTTP(S) file to be downloaded.
If the key is not specified or has a value of 0, then a timeout of one hour will be used. If the value specified exceeds one hour,
then it will be truncated to one hour.
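
The timeout rule amounts to a default plus a clamp, as in this small sketch. The names `MAX_TIMEOUT_SECS` and `effective_timeout` are hypothetical and chosen for illustration only.

```rust
use std::time::Duration;

/// One hour, the default and the upper bound described above.
const MAX_TIMEOUT_SECS: u64 = 3600;

/// Hypothetical helper: maps the optional `timeout` config value to the
/// duration that would actually be used.
fn effective_timeout(configured: Option<u64>) -> Duration {
    let secs = match configured {
        // Missing or zero means the one-hour default.
        None | Some(0) => MAX_TIMEOUT_SECS,
        // Anything above one hour is truncated to one hour.
        Some(secs) => secs.min(MAX_TIMEOUT_SECS),
    };
    Duration::from_secs(secs)
}
```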

## RPZ file

Unless `stdout` is the destination, a temporary RPZ file is written in the same location as the `rpz` value in the config file except with `tmp` appended to the name. Upon success, this file
is renamed to the `rpz` value in the config file. The file contains the minimum number of lines possible, with unblock entries taking precedence
over block entries.

If there are no block entries or the temp file already exists, the program will abort.
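
The write-then-rename behavior can be approximated with the standard library alone. This is a sketch under the assumptions stated in this section, not the actual implementation; `tmp_path` and `write_rpz` are hypothetical names, and the sketch does not clean up the temp file if writing fails.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Hypothetical helper: `rpz_path` with `tmp` appended to the file name.
fn tmp_path(rpz_path: &Path) -> PathBuf {
    let mut name = rpz_path.as_os_str().to_os_string();
    name.push("tmp");
    PathBuf::from(name)
}

/// Hypothetical helper: writes `contents` to the temp file, then renames it
/// to `rpz_path` so readers never observe a partially written file.
fn write_rpz(rpz_path: &Path, contents: &str) -> io::Result<()> {
    let tmp = tmp_path(rpz_path);
    // `create_new` fails with `AlreadyExists` if the temp file is already there.
    let mut file = OpenOptions::new().write(true).create_new(true).open(&tmp)?;
    file.write_all(contents.as_bytes())?;
    file.sync_all()?;
    fs::rename(&tmp, rpz_path)
}
```

With `rpz = "/var/unbound/db/rpz"`, the temp file in this sketch would be `/var/unbound/db/rpztmp`.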

## Options

When `rpz` is passed `-V`/`--version`, the version of `rpz` will be printed to `stdout`. When passed `-h`/`--help`,
information about the program and its options will be printed to `stdout`. When passed `-f`/`--file` along with
`-` or the absolute path to the TOML config file, `rpz` will run normally printing summary information to `stdout`
upon completion. One can additionally pass `-q`/`--quiet` along with `-f`/`--file` in order to suppress summary
information from being printed to `stdout`. When `-v`/`--verbose` is passed along with `-f`/`--file`, in addition to
the normal summary information being printed to `stdout`, itemized summary information for each input file
including the kinds of errors and counts of errors will be printed to `stdout`.

### Example

If `www.example.com`, `*.example.com`, and `foo.com` are to be blocked while `foo.example.com` and `||foo.com` are to be unblocked, the RPZ file would look like the following:

```bash
foo.example.com CNAME rpz-passthru.
*.example.com CNAME .
```
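
The lines in this example use two RPZ actions: `CNAME .` (answer with NXDOMAIN) for block entries and `CNAME rpz-passthru.` for unblock entries. The sketch below shows how a single entry could be rendered into such lines; the `Action` and `Scope` enums and `rpz_lines` are hypothetical names, and the sketch does not attempt the de-duplication and precedence handling that reduces the output to the minimum number of lines.

```rust
/// What the DNS server should do for a matching name.
enum Action {
    Block,
    Unblock,
}

/// Which names an entry covers.
enum Scope {
    DomainOnly,
    ProperSubdomains,
    DomainAndSubdomains,
}

/// Hypothetical helper: renders one entry as RPZ zone-file lines.
fn rpz_lines(domain: &str, scope: Scope, action: Action) -> Vec<String> {
    let target = match action {
        Action::Block => ".",               // NXDOMAIN
        Action::Unblock => "rpz-passthru.", // exempt from the policy
    };
    let exact = format!("{domain} CNAME {target}");
    let wildcard = format!("*.{domain} CNAME {target}");
    match scope {
        Scope::DomainOnly => vec![exact],
        Scope::ProperSubdomains => vec![wildcard],
        Scope::DomainAndSubdomains => vec![exact, wildcard],
    }
}
```

An entry covering a domain and all of its subdomains therefore produces two lines, which matches pairs such as `stats.zone-telechargement` and `*.stats.zone-telechargement` in the earlier `tail` output.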

Upon success, the number of unblock, block, and total lines written is printed to `stdout`, along with
the total number of domains, comments, blanks, and parsing errors.

## Errors

Parsing errors are ignored; all other errors are written to `stderr` before the program aborts.

## Minimum Supported Rust Version (MSRV)

This will frequently be updated to be the same as stable. Specifically, any time stable is updated and that
update has "useful" features or compilation no longer succeeds (e.g., due to new compiler lints), then MSRV
will be updated.

MSRV changes will correspond to a SemVer minor version bump.

## SemVer Policy

* All on-by-default features of this library are covered by SemVer
* MSRV is considered exempt from SemVer as noted above

## License

Licensed under either of

* Apache License, Version 2.0 ([LICENSE-APACHE](https://www.apache.org/licenses/LICENSE-2.0))
* MIT license ([LICENSE-MIT](https://opensource.org/licenses/MIT))

at your option.

## Contribution

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you,
as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

Before any PR is sent, `cargo clippy` and `cargo t` should be run for both `--no-default-features` and
`--all-features`. Additionally `RUSTDOCFLAGS="--cfg docsrs" cargo +nightly doc --all-features` should be run to
ensure documentation can be built.

### Status

This package is actively maintained.

The crates are only tested on the `x86_64-unknown-linux-gnu` and `x86_64-unknown-openbsd` targets, but
they should work on most platforms.

Nightly `rustc` is required. Once `BTreeMap` [cursors are stabilized](https://github.com/rust-lang/rust/issues/107540), stable `rustc` will work.
On OpenBSD-stable, one can use the `rust` port as long as `RUSTC_BOOTSTRAP` is `export`ed with a value of `1` before invoking
`cargo build --all-features --release` or `cargo install --all-features rpz`.