Packaging Searx - Part 1

In this N-part blog post series, I'll show you the exact process of packaging Searx, a metasearch engine. Here's an excerpt from Searx's readme to shed a bit of light on what we'll be packaging.

Searx is a free internet metasearch engine which aggregates results from more than 70 search services. Users are neither tracked nor profiled. Additionally, searx can be used over Tor for online anonymity.

So if you're a privacy nerd, or just want to ensure Google doesn't know what you're cooking tonight, read on and you'll learn how Searx works from a system administrator's and packager's perspective.

Searx is already packaged in nixpkgs, but for the sake of this blog post, let's pretend it isn't. I'll go over everything I check and verify, and everything I do when packaging. So I'll quit mumbling and start Nix-ing!

Discovery

First, we need to find the upstream repo we'll be working with. That may sound simple enough, and in the case of Searx it luckily is, but it can also be challenging; it all depends on how well known the project is and how unique its name is. My recommendation is to use a search engine, in this case searching for searx git, which gets us https://github.com/searx/searx.

Now that we have a link to the repo, we need to identify the language and, for some languages, the build system. There are several ways to do this; one is to just look at the root of the repo for a few recognizable files. I'll leave an incomplete table below.

| files / directories               | language / build system |
|-----------------------------------+--------------------------|
| Cargo.toml, Cargo.lock            | Rust - cargo             |
| requirements.txt, setup.py        | Python 2/3               |
| CMakeLists.txt                    | C, C++ - cmake           |
| meson.build                       | C, C++ - meson           |
| composer.json, composer.lock      | PHP - composer           |
| package.json, package-lock.json   | Node - npm               |
| package.json, yarn.lock           | Node - yarn              |
| *.cabal, stack.yaml, package.yaml | Haskell - stack/cabal    |

I won't list the tools to use when packaging these different languages, because the recommended set changes often and I'd have to keep this blog post up to date :). It's easy enough to search for them, though; generally, search for <package-manager>2nix.

dream2nix is a new and shiny thing; I personally haven't used it yet and won't use it in this blog post, but do keep it in mind and check whether it's relevant to your project the next time you're packaging.

Looking at the repository, we see a requirements.txt and a setup.py. The first is valuable because it should give us a list of all the Python packages we need; the second we need to keep in mind, since it contains arbitrary custom Python that we may need to inspect and fix.

  certifi==2022.5.18.1
  babel==2.9.1
  flask-babel==2.0.0
  flask==2.1.1
  jinja2==3.1.2
  lxml==4.9.0
  pygments==2.8.0
  python-dateutil==2.8.2
  pyyaml==6.0
  httpx[http2]==0.23.0
  Brotli==1.0.9
  uvloop==0.16.0; python_version >= '3.7'
  uvloop==0.14.0; python_version < '3.7'
  httpx-socks[asyncio]==0.7.4
  langdetect==1.0.9
  setproctitle==1.2.2

It's also worth looking at the Dockerfile and any Makefile, Justfile, or scripts folder. Here we have a Dockerfile and also a Makefile, lucky! Let's start with the Dockerfile; I'll pick out only the important bits.

  FROM alpine:3.15

A pretty crucial piece of information here: we now know the distro the container uses, so we can discern the environment a bit, and we know that Searx will happily run on musl libc.

  ENTRYPOINT ["/sbin/tini","--","/usr/local/searx/dockerfiles/docker-entrypoint.sh"]

Here we see where we should look for the startup script.

  ENV INSTANCE_NAME=searx \
      AUTOCOMPLETE= \
      BASE_URL= \
      MORTY_KEY= \
      MORTY_URL= \
      SEARX_SETTINGS_PATH=/etc/searx/settings.yml \
      UWSGI_SETTINGS_PATH=/etc/searx/uwsgi.ini

Here we have an incomplete list of environment variables we can pass into the Docker container. It's important to later notice where they're handled: in the scripting, or in the actual program itself?

  apk add --no-cache -t build-dependencies \
    build-base \
    py3-setuptools \
    python3-dev \
    libffi-dev \
    libxslt-dev \
    libxml2-dev \
    openssl-dev \
    tar \
    git \

Here we see a list of packages installed with apk, but you (and me too, actually) may not know what -t build-dependencies does. It's best to look at the manpage for apk add, so search for apk-add man. According to https://www.mankier.com/8/apk-add, -t adds a virtual package with the dependencies listed on the command line and then installs that package. So we end up with one package, build-dependencies, pulling in the set of packages we need at build time.

  apk add --no-cache \
    ca-certificates \
    su-exec \
    python3 \
    py3-pip \
    libxml2 \
    libxslt \
    openssl \
    tini \
    uwsgi \
    uwsgi-python3 \
    brotli \

Next we have the list of packages needed at runtime. This one is really important to remember, since we may have to add these in a special way later. You'll see what I mean.

  pip3 install --upgrade pip wheel setuptools \

Then it upgrades pip, wheel, and setuptools. I personally had to look up what wheel is, but searching the Alpine Linux packages yields no results, so let's just ignore it for now; if it doesn't come up later, it's not important.

  pip3 install --no-cache -r requirements.txt \

Second to last, it installs the packages specified in requirements.txt, as expected.

  apk del build-dependencies \
  && rm -rf /root/.cache

And lastly it does some cleanup, which is interesting, because I expected those build dependencies to be used later by some custom Searx native component, but I guess it makes sense that they're not.

  COPY searx ./searx
  COPY dockerfiles ./dockerfiles

We now see where that startup script comes from.

  RUN /usr/bin/python3 -m compileall -q searx; \
      touch -c --date=@${TIMESTAMP_SETTINGS} searx/settings.yml; \
      touch -c --date=@${TIMESTAMP_UWSGI} dockerfiles/uwsgi.ini; \
      if [ ! -z $VERSION_GITCOMMIT ]; then\
        echo "VERSION_STRING = VERSION_STRING + \"-$VERSION_GITCOMMIT\"" >> /usr/local/searx/searx/version.py; \
      fi; \
      find /usr/local/searx/searx/static -a \( -name '*.html' -o -name '*.css' -o -name '*.js' \
      -o -name '*.svg' -o -name '*.ttf' -o -name '*.eot' \) \
      -type f -exec gzip -9 -k {} \+ -exec brotli --best {} \+

This is a complicated little beast. We see searx/settings.yml, dockerfiles/uwsgi.ini, and /usr/local/searx/searx/version.py, and we also see that it byte-compiles all the Python files, but nixpkgs will take care of that for us. Interestingly, it also compresses all the static assets: the find command looks for all files ending in .html, .css, .js, .svg, .ttf, and .eot, then runs gzip -9 -k and brotli --best on them (here I again had to search for what brotli is; it turns out to be a compression format).
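If we ever want these same pre-compressed assets in our Nix package, a rough sketch of a postInstall fragment for the derivation we'll write later could look like this. It's purely an assumption at this point: the site-packages path is a guess, and gzip and brotli would have to be added to nativeBuildInputs.

  {
    # hypothetical fragment to be merged into our future searx derivation
    postInstall = ''
      find $out/lib/python*/site-packages/searx/static -type f \
        \( -name '*.html' -o -name '*.css' -o -name '*.js' \
           -o -name '*.svg' -o -name '*.ttf' -o -name '*.eot' \) \
        -exec gzip -9 -k {} \; -exec brotli --best {} \;
    '';
  }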

That's all from the Dockerfile. Now we need to look at the script it calls.

docker-entrypoint.sh script

  printf "\nEnvironment variables:\n\n"
  printf "  INSTANCE_NAME settings.yml : general.instance_name\n"
  printf "  AUTOCOMPLETE  settings.yml : search.autocomplete\n"
  printf "  BASE_URL      settings.yml : server.base_url\n"
  printf "  MORTY_URL     settings.yml : result_proxy.url\n"
  printf "  MORTY_KEY     settings.yml : result_proxy.key\n"
  printf "  BIND_ADDRESS  uwsgi bind to the specified TCP socket using HTTP protocol. Default value: \"${DEFAULT_BIND_ADDRESS}\"\n"

That's a nice little rundown of the supported configuration options, and it also tells us that Searx is configured with settings.yml. This knowledge will come in handy when we're writing the NixOS module for Searx.
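Since we now know Searx reads settings.yml, here's a rough, hypothetical preview of how a NixOS module could generate that file from Nix attributes, mirroring the keys the entrypoint script lists (general.instance_name, search.autocomplete, server.base_url). All names and values here are made up; the real module comes later in this series.

  { pkgs, ... }:
  let
    settingsFormat = pkgs.formats.yaml { };
    # turn a Nix attribute set into a YAML file in the store
    searxSettings = settingsFormat.generate "settings.yml" {
      general.instance_name = "my-searx";
      search.autocomplete = "duckduckgo";
      server.base_url = "https://searx.example.com/";
    };
  in
  {
    # expose it where the container expects it: /etc/searx/settings.yml
    environment.etc."searx/settings.yml".source = searxSettings;
  }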

  # update settings.yml
  sed -i -e "s|base_url : False|base_url : ${BASE_URL}|g" \
     -e "s/instance_name : \"searx\"/instance_name : \"${INSTANCE_NAME}\"/g" \
     -e "s/autocomplete : \"\"/autocomplete : \"${AUTOCOMPLETE}\"/g" \
     -e "s/ultrasecretkey/$(openssl rand -hex 32)/g" \
     "${CONF}"

This command confirms that we are indeed dealing with a settings.yml.

  sed -i -e "s/image_proxy : False/image_proxy : True/g" \
              "${CONF}"
  cat >> "${CONF}" <<-EOF

  # Morty configuration
  result_proxy:
     url : ${MORTY_URL}
     key : !!binary "${MORTY_KEY}"
  EOF

This bit is interesting. I initially thought that the script updates the existing config with new values, but the code block above would mean that on every restart a new result_proxy block gets appended. So it must instead take a default config, write your settings into it, and replace the current one with the result.

It's common to realize things like this along the way; it's unusual to get all assumptions right initially. As you go further into the package, you'll naturally stumble upon issues caused by your assumptions. Just make sure you keep track of what you know and what you merely assume.

  if [ -f "${CONF}" ]; then
      if [ "${REF_CONF}" -nt "${CONF}" ]; then
          # There is a new version
          if [ $FORCE_CONF_UPDATE -ne 0 ]; then
              # Replace the current configuration
              printf '⚠️  Automaticaly update %s to the new version\n' "${CONF}"
              if [ ! -f "${OLD_CONF}" ]; then
                  printf 'The previous configuration is saved to %s\n' "${OLD_CONF}"
                  mv "${CONF}" "${OLD_CONF}"
              fi
              cp "${REF_CONF}" "${CONF}"
              $PATCH_REF_CONF "${CONF}"
          else
              # Keep the current configuration
              printf '⚠️  Check new version %s to make sure searx is working properly\n' "${NEW_CONF}"
              cp "${REF_CONF}" "${NEW_CONF}"
              $PATCH_REF_CONF "${NEW_CONF}"
          fi
      else
          printf 'Use existing %s\n' "${CONF}"
      fi
  else
      printf 'Create %s\n' "${CONF}"
      cp "${REF_CONF}" "${CONF}"
      $PATCH_REF_CONF "${CONF}"
  fi

When you encounter such an ugly piece of code, you don't need to understand it fully; the general gist of it is more than enough. At a glance we see that configuration is based on a reference config, which gets patched to produce the final config.

  # make sure there are uwsgi settings
  update_conf ${FORCE_CONF_UPDATE} "${UWSGI_SETTINGS_PATH}" "/usr/local/searx/dockerfiles/uwsgi.ini" "patch_uwsgi_settings"

  # make sure there are searx settings
  update_conf "${FORCE_CONF_UPDATE}" "${SEARX_SETTINGS_PATH}" "/usr/local/searx/searx/settings.yml" "patch_searx_settings"

Looking at the call sites, we see both the reference config file paths and the functions used for patching.

  patch_uwsgi_settings() {
      CONF="$1"

      # Nothing
  }

Interestingly, the uwsgi config doesn't get patched at all, so the reference one should be fine in most cases.

  exec su-exec searx:searx uwsgi --master --http-socket "${BIND_ADDRESS}" "${UWSGI_SETTINGS_PATH}"

And finally we see the command used to actually launch Searx.
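Just to anchor expectations for part 2, here's a hedged sketch of how that launch command might translate into a systemd service in the NixOS module. Everything here is an assumption: the plugin override, the socket address, and the config path will all have to be verified once we actually write the module.

  { pkgs, ... }:
  let
    # assumption: the stock uwsgi package needs its Python plugin enabled
    # to run a Python app; we'll verify this override pattern later
    uwsgi = pkgs.uwsgi.override { plugins = [ "python3" ]; };
  in
  {
    systemd.services.searx = {
      wantedBy = [ "multi-user.target" ];
      serviceConfig = {
        # let systemd handle the user instead of su-exec
        DynamicUser = true;
        ExecStart = "${uwsgi}/bin/uwsgi --master --http-socket 127.0.0.1:8888 /etc/searx/uwsgi.ini";
      };
    };
  }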

What is uwsgi

I once again had to look this up. According to Wikipedia, it's similar to CGI, if you're familiar with that. If not, well, it's used to allow web servers like Nginx to serve arbitrary scripts written in arbitrary languages. So: client -> Nginx -> uwsgi -> Python backend.

Aren't we missing a full webserver?

uWSGI natively speaks HTTP, FastCGI, SCGI and its specific protocol named “uwsgi”

No, uwsgi can serve as a lightweight webserver on its own. So ideally, in the NixOS module, we'd support all methods, HTTP, FastCGI, SCGI, and uwsgi, but that's something to worry about later.
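As a tiny teaser, an option for picking the serving method could look something like the following; the option name serveVia is made up and the module itself doesn't exist yet.

  { lib, ... }:
  {
    options.services.searx.serveVia = lib.mkOption {
      # "http" lets uwsgi act as the web server on its own,
      # the other protocols expect a frontend like Nginx in front of it
      type = lib.types.enum [ "http" "fastcgi" "scgi" "uwsgi" ];
      default = "http";
      description = "Which protocol uwsgi should speak on its listening socket.";
    };
  }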

Packaging

Now that we know all there is to know from the Docker image and related files, we can start writing Nix expressions. First, let's quickly create a new repository. We'll do it as a flake first; it's easier, and if done right it can easily be ported to nixpkgs later.

  git init searx-nix
  {
    inputs.nixpkgs.url = "github:NixOS/nixpkgs";

    outputs =
      {
        self,
        nixpkgs
      }:
      let
        supportedSystems = [ "x86_64-linux" ];
        forAllSystems' = nixpkgs.lib.genAttrs;
        forAllSystems = forAllSystems' supportedSystems;

        pkgsForSystem =
          system:
          import nixpkgs { inherit system; };
      in
        {
          packages = forAllSystems
            (system:
              let
                pkgs = pkgsForSystem system;
              in
                {
                  default = pkgs.callPackage ./searx.nix {};
                }
            );
        };
  }

We then create a tiny flake.nix. The cruft around it is generic and not really important; the important bit is pkgs.callPackage ./searx.nix {}, which ensures that our actual package doesn't care whether it's in a flake or not.
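For example, the very same searx.nix works from a plain, non-flake setup without any changes:

  # classic nix-build style usage, no flake involved
  let
    pkgs = import <nixpkgs> { };
  in
    pkgs.callPackage ./searx.nix { }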

Looking up nixpkgs python gets us to the nixpkgs manual (the information is in both the official one and ryantm's rendering, but the latter is better since it isn't one huge HTML page): ryantm's nixpkgs manual.

  { lib, python3 }:

  python3.pkgs.buildPythonApplication rec {
    pname = "luigi";
    version = "2.7.9";

    src = python3.pkgs.fetchPypi {
      inherit pname version;
      sha256 = "035w8gqql36zlan0xjrzz9j4lh9hs0qrsgnbyw07qs7lnkvbdv9x";
    };

    propagatedBuildInputs = with python3.pkgs; [ tornado python-daemon ];

    meta = with lib; {
      ...
    };
  }

As an example we're given a derivation for luigi. I don't know, and don't need to know, what luigi is; to speed up packaging, it's important to ignore irrelevant information rather than go researching it.

Based on the example derivation we can build our own. Instead of python3.pkgs.fetchPypi we're going to use fetchFromGitHub as that's more universal and easier to work with.

  {
    lib,
    python3,
    fetchFromGitHub
  }:
  with lib;
  let
    pname = "searx";
    version = "1.0.0";
  in
  python3.pkgs.buildPythonApplication {
    inherit pname version;

    src = fetchFromGitHub {
      rev = version;
      repo = pname;
      owner = pname;
      # If you update the version, you need to switch back to ~lib.fakeSha256~ and copy the new hash
      sha256 = "sha256-sIJ+QXwUdsRIpg6ffUS3ItQvrFy0kmtI8whaiR7qEz4="; # lib.fakeSha256;
    };

    postPatch = ''
      sed -i 's/==.*$//' requirements.txt
    '';

    # tests try to connect to network
    doCheck = false;

    pythonImportsCheck = [ "searx" ];

    # Since Python is weird, we need to put any dependencies we know of here
    # and not into ~buildInputs~ or ~nativeBuildInputs~ as one might expect.
    # As a starting point, just copy everything from ~requirements.txt~ and
    # hope for the best.
    propagatedBuildInputs = with python3.pkgs;
      [
        certifi
        babel
        flask-babel
        flask
        jinja2
        lxml
        pygments
        python-dateutil
        pyyaml
        # httpx[http2]
        httpx
        brotli
        # uvloop==0.16.0; python_version >= '3.7'
        # uvloop==0.14.0; python_version < '3.7'
        uvloop
        # httpx-socks[asyncio]
        httpx-socks
        langdetect
        setproctitle

        # sometimes the packages in ~requirements.txt~ may not be enough, so if something is missing, just add it
        requests
      ];

    meta = with lib; {
      # You'll fill this in later when upstreaming to nixpkgs
    };
  }

Let me just clarify a few things.

  {
    cmake,
    gnumake,
    gcc
  }:

That pattern works because Nix has a special builtin that lets one inspect a function's formal arguments, getting back a set of all its argument names. callPackage then uses that set to call the function with the packages you requested.

  {
    deps =
      [
        "cmake"
        "gnumake"
        "gcc"
      ];
    fn =
      {
        cmake,
        gnumake,
        gcc
      }:
      # the actual package definition would go here
      null;
  }

The above would also work, but we like conciseness.
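To make that concrete, here's a minimal toy re-implementation of the idea. myCallPackage is made up for illustration; the real callPackage in nixpkgs additionally handles default arguments and overriding.

  let
    pkgs = import <nixpkgs> { };

    # read the function's formal arguments, pick the matching attributes
    # out of pkgs, apply any manual overrides, and call the function
    myCallPackage = fn: overrides:
      let
        wanted = builtins.functionArgs fn;  # { cmake = false; gnumake = false; gcc = false; }
        args = builtins.intersectAttrs wanted pkgs // overrides;
      in
        fn args;
  in
    myCallPackage ({ cmake, gnumake, gcc }: [ cmake gnumake gcc ]) { }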

Lastly, you may ask what's up with lib.fakeSha256. Well, it evaluates to sha256-AAAAAAAAAAAAAAAAAAAAA= (I didn't count the number of A's, so that's probably wrong), which stands for "I don't know yet". The point is that when Nix downloads the source code and checks the hash, it won't match, so Nix prints out both the one you gave it and the one it calculated. You can then replace lib.fakeSha256 with the actual hash.

At this point I looked at the already existing derivation, because I was curious.

  # tests try to connect to network
  doCheck = false;

  pythonImportsCheck = [ "searx" ];

  postPatch = ''
    sed -i 's/==.*$//' requirements.txt
  '';

The doCheck = false is there by experimentation. I didn't know what pythonImportsCheck = [ "searx" ] does, so I looked around: I first went to nixpkgs, clicked on Go to file, searched for python, and then went to pkgs/top-level/python-packages.nix. Inspecting the file, on line 41 I found the definition of buildPythonApplication.

  buildPythonPackage = makeOverridablePythonPackage (lib.makeOverridable (callPackage ../development/interpreters/python/mk-python-derivation.nix {
    inherit namePrefix;     # We want Python libraries to be named like e.g. "python3.6-${name}"
    inherit toPythonModule; # Libraries provide modules
  }));

This points to a file called mk-python-derivation.nix, so again, Go to file. mk-python-derivation.nix tells us a lot, but still not what pythonImportsCheck does; it's only mentioned in connection with pythonImportsCheckHook, which prompted me to look for said hook. Going to the containing directory and into hooks/python-imports-check-hook.sh, we can satiate our curiosity.

Lastly, the postPatch = ''...'' is used to patch out the version constraints in requirements.txt; leaving them in seems to cause an error at build time (presumably because the exact pinned versions don't match the ones packaged in nixpkgs).

With all these things, we get a successful build.

In the next blog post we'll start with the NixOS module by first trying to actually get a full launch of Searx. Till then!