Signed-off-by: Magic_RB <magic_rb@redalder.org>
Packaging Searx - Part 1
In this N-part blog post series, I'll show you the exact process of packaging Searx, a metasearch engine. Here's an excerpt from Searx's readme to shine a bit of light on what we'll be packaging.
Searx is a free internet metasearch engine which aggregates results from more than 70 search services. Users are neither tracked nor profiled. Additionally, searx can be used over Tor for online anonymity.
So if you're a privacy nerd or want to ensure Google doesn't know what you're cooking tonight, read on and you'll learn how Searx works from a system administrator's and packager's perspective.
Searx is already packaged in nixpkgs, but for the sake of this blog post, let's pretend it isn't. I'll go over all the things I check, verify and all the things I do when packaging. So I'll quit mumbling and start Nix-ing!
Discovery
First, it's imperative that we find the upstream repo we'll be working with. It may sound simple enough, and in the case of Searx it luckily is, but it can also be challenging; it all depends on how well known the project is and how unique its name is. My recommendation is to use a search engine and search for "searx git" in this case, which gets us https://github.com/searx/searx.
Now that we have a link to the repo, we need to identify the language and, for some languages, the build system. There are several ways to do this; one is to look at the root of the repo for a few recognizable files. I'll leave an incomplete table below.
files / directories | language / build system |
---|---|
Cargo.toml, Cargo.lock | Rust - cargo |
requirements.txt, setup.py | Python 2/3 |
CMakeLists.txt | C, C++ - cmake |
meson.build | C, C++ - meson |
composer.json, composer.lock | PHP - composer |
package.json, package-lock.json | Node - npm |
package.json, yarn.lock | Node - yarn |
*.cabal, stack.yaml, package.yaml | Haskell - stack/cabal |
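The table can be turned into a quick check. The following is a minimal sketch, assuming a checked-out source tree; `detect_build_system` is a hypothetical helper name I made up for illustration, not part of any Nix tooling, and it only knows the marker files listed above.

```shell
#!/bin/sh
# Guess the language / build system of a source tree from marker files
# in its root. detect_build_system is a hypothetical helper.
detect_build_system() {
    dir="${1:-.}"
    found=""
    [ -f "$dir/Cargo.toml" ] && found="$found rust-cargo"
    { [ -f "$dir/setup.py" ] || [ -f "$dir/requirements.txt" ]; } && found="$found python"
    [ -f "$dir/CMakeLists.txt" ] && found="$found cmake"
    [ -f "$dir/meson.build" ] && found="$found meson"
    [ -f "$dir/composer.json" ] && found="$found php-composer"
    if [ -f "$dir/package.json" ]; then
        # yarn.lock distinguishes yarn projects from plain npm ones
        if [ -f "$dir/yarn.lock" ]; then
            found="$found node-yarn"
        else
            found="$found node-npm"
        fi
    fi
    for f in "$dir"/*.cabal "$dir/stack.yaml"; do
        if [ -f "$f" ]; then
            found="$found haskell"
            break
        fi
    done
    # Print the matches without the leading space
    echo "${found# }"
}
```

Calling `detect_build_system path/to/checkout` prints a space-separated list of guesses, which is empty if none of the marker files are present.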
I won't list tools to use when packaging these different languages, because the recommended set changes often and I'd have to keep this blog post up to date :), but it's easy enough to search for them. Generally, searching for <package-manager>2nix is a good start.
dream2nix is a new and shiny thing. I personally haven't used it yet and won't use it in this blog post, but do keep it in mind and check whether it's relevant to your project the next time you're packaging.
Looking at the repository we see a requirements.txt and a setup.py. The first one is valuable because it should list all the Python packages we need. The second we need to keep in mind, since it contains custom, arbitrary Python that we may need to inspect and fix.
certifi==2022.5.18.1
babel==2.9.1
flask-babel==2.0.0
flask==2.1.1
jinja2==3.1.2
lxml==4.9.0
pygments==2.8.0
python-dateutil==2.8.2
pyyaml==6.0
httpx[http2]==0.23.0
Brotli==1.0.9
uvloop==0.16.0; python_version >= '3.7'
uvloop==0.14.0; python_version < '3.7'
httpx-socks[asyncio]==0.7.4
langdetect==1.0.9
setproctitle==1.2.2
It's also worth looking at the Dockerfile and any Makefile, Justfile, or scripts folder. Here we have a Dockerfile and also a Makefile, lucky! Let's start with the Dockerfile; I'll pick out the important bits only.
FROM alpine:3.15
A pretty crucial piece of information here: we now know the distro the container uses, so we can discern the environment a bit, and we know that Searx will happily run on musl libc.
ENTRYPOINT ["/sbin/tini","--","/usr/local/searx/dockerfiles/docker-entrypoint.sh"]
Here we see where we should look for the startup script.
ENV INSTANCE_NAME=searx \
AUTOCOMPLETE= \
BASE_URL= \
MORTY_KEY= \
MORTY_URL= \
SEARX_SETTINGS_PATH=/etc/searx/settings.yml \
UWSGI_SETTINGS_PATH=/etc/searx/uwsgi.ini
Here we have an incomplete list of arguments we can pass into the Docker container. It's important to notice later where they're handled: in the scripting, or in the actual program itself?
apk add --no-cache -t build-dependencies \
build-base \
py3-setuptools \
python3-dev \
libffi-dev \
libxslt-dev \
libxml2-dev \
openssl-dev \
tar \
git \
Here we see a list of packages installed with apk, but you (and I, actually) may not know what -t build-dependencies does. It's best to look at the manpage for apk add, so search for "apk-add man". According to https://www.mankier.com/8/apk-add, -t adds a virtual package with the dependencies listed on the command line and then installs that package. So we have one package, build-dependencies, containing the set of packages we need at build time.
apk add --no-cache \
ca-certificates \
su-exec \
python3 \
py3-pip \
libxml2 \
libxslt \
openssl \
tini \
uwsgi \
uwsgi-python3 \
brotli \
Next we have a list of packages needed at runtime. This one is really important to remember, since we may have to add these in a special way later. You'll see what I mean.
pip3 install --upgrade pip wheel setuptools \
Then it upgrades pip, wheel, and setuptools. I personally had to look up what wheel is, but looking at Alpine Linux packages yields no results, so let's just ignore it for now. If it doesn't come up later, it's not important.
pip3 install --no-cache -r requirements.txt \
Second to last, it installs the packages specified in requirements.txt, as expected.
apk del build-dependencies \
&& rm -rf /root/.cache
And lastly it does some cleanup. Which is interesting, because I expected those dependencies to be used later by some custom searx native component, but I guess it makes sense they're not.
COPY searx ./searx
COPY dockerfiles ./dockerfiles
We now see where that startup script comes from.
RUN /usr/bin/python3 -m compileall -q searx; \
touch -c --date=@${TIMESTAMP_SETTINGS} searx/settings.yml; \
touch -c --date=@${TIMESTAMP_UWSGI} dockerfiles/uwsgi.ini; \
if [ ! -z $VERSION_GITCOMMIT ]; then\
echo "VERSION_STRING = VERSION_STRING + \"-$VERSION_GITCOMMIT\"" >> /usr/local/searx/searx/version.py; \
fi; \
find /usr/local/searx/searx/static -a \( -name '*.html' -o -name '*.css' -o -name '*.js' \
-o -name '*.svg' -o -name '*.ttf' -o -name '*.eot' \) \
-type f -exec gzip -9 -k {} \+ -exec brotli --best {} \+
This is a complicated little beast. We see searx/settings.yml, dockerfiles/uwsgi.ini and /usr/local/searx/searx/version.py. We also see that it compiles all the Python files, but that will be taken care of by nixpkgs. Interestingly, it also pre-compresses all the static assets. The find command looks for all files ending in .html, .css, .js, .svg, .ttf and .eot, then runs gzip -9 -k and brotli --best on them. (Here I again had to search for what brotli is; it looks to be a compression scheme.)
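To get a feel for what that find invocation leaves behind, here is a small sketch that runs the gzip half of it on a throwaway directory. `compress_assets` is a hypothetical helper name. I left out brotli because it may not be installed, and I added -f so the command can be rerun without gzip complaining about existing .gz files; neither change comes from the Dockerfile.

```shell
#!/bin/sh
# Pre-compress static assets next to the originals, gzip only.
# compress_assets is a hypothetical helper; -f is my addition.
compress_assets() {
    find "$1" \( -name '*.html' -o -name '*.css' -o -name '*.js' \
              -o -name '*.svg' -o -name '*.ttf' -o -name '*.eot' \) \
        -type f -exec gzip -9 -k -f {} +
}

# Try it on a temporary directory with one asset and one unrelated file
demo=$(mktemp -d)
printf 'body { color: red; }\n' > "$demo/style.css"
printf 'not an asset\n' > "$demo/notes.txt"
compress_assets "$demo"
ls "$demo"
rm -rf "$demo"
```

Thanks to -k the original style.css stays in place and a style.css.gz appears beside it, while notes.txt is left alone because its extension isn't in the list.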
That's all from the Dockerfile. Now we need to look at the script it calls.
docker-entrypoint.sh script
printf "\nEnvironment variables:\n\n"
printf " INSTANCE_NAME settings.yml : general.instance_name\n"
printf " AUTOCOMPLETE settings.yml : search.autocomplete\n"
printf " BASE_URL settings.yml : server.base_url\n"
printf " MORTY_URL settings.yml : result_proxy.url\n"
printf " MORTY_KEY settings.yml : result_proxy.key\n"
printf " BIND_ADDRESS uwsgi bind to the specified TCP socket using HTTP protocol. Default value: \"${DEFAULT_BIND_ADDRESS}\"\n"
That's a nice little rundown of the supported configuration options. It also tells us that Searx is configured with settings.yml; this knowledge will come in handy when we're writing the NixOS module for Searx.
# update settings.yml
sed -i -e "s|base_url : False|base_url : ${BASE_URL}|g" \
-e "s/instance_name : \"searx\"/instance_name : \"${INSTANCE_NAME}\"/g" \
-e "s/autocomplete : \"\"/autocomplete : \"${AUTOCOMPLETE}\"/g" \
-e "s/ultrasecretkey/$(openssl rand -hex 32)/g" \
"${CONF}"
This command confirms that we are in fact dealing with a settings.yml.
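To see those substitutions in action without a container, here is a sketch that applies them to a tiny stand-in file. The three-line config is an assumption modelled on the patterns in the sed command, not the real settings.yml; `patch_settings` is a hypothetical name; and I replaced openssl rand -hex 32 with od on /dev/urandom so the sketch doesn't need openssl. It relies on sed -i as found in GNU sed and busybox.

```shell
#!/bin/sh
# Apply the entrypoint's substitutions to a config file.
# patch_settings is a hypothetical helper; it reads INSTANCE_NAME and BASE_URL.
patch_settings() {
    # Stand-in for `openssl rand -hex 32`: 32 random bytes as 64 hex characters
    secret=$(head -c 32 /dev/urandom | od -An -tx1 | tr -d ' \n')
    sed -i -e "s|base_url : False|base_url : ${BASE_URL}|g" \
           -e "s/instance_name : \"searx\"/instance_name : \"${INSTANCE_NAME}\"/g" \
           -e "s/ultrasecretkey/${secret}/g" \
           "$1"
}

# A made-up miniature of the reference config
conf=$(mktemp)
cat > "$conf" <<'EOF'
instance_name : "searx"
base_url : False
secret_key : "ultrasecretkey"
EOF

INSTANCE_NAME="my-searx"
BASE_URL="https://search.example.org/"
patch_settings "$conf"
cat "$conf"
rm -f "$conf"
```

Note that the substitutions only match the untouched defaults, so running them a second time on an already patched file changes nothing.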
sed -i -e "s/image_proxy : False/image_proxy : True/g" \
"${CONF}"
cat >> "${CONF}" <<-EOF
# Morty configuration
result_proxy:
url : ${MORTY_URL}
key : !!binary "${MORTY_KEY}"
EOF
This bit is interesting. I initially thought that the script updates the existing config with new values, but the code block above would mean that a new result_proxy block gets appended on every restart. That means it must take a default config, write your settings into it and replace the current one with the result.
It's common to realize things like this. It's unusual to get all assumptions right initially, but as you go further into the package, you'll naturally stumble upon issues caused by your assumptions. Just make sure you remember what you know and what you assume.
if [ -f "${CONF}" ]; then
if [ "${REF_CONF}" -nt "${CONF}" ]; then
# There is a new version
if [ $FORCE_CONF_UPDATE -ne 0 ]; then
# Replace the current configuration
printf '⚠️ Automaticaly update %s to the new version\n' "${CONF}"
if [ ! -f "${OLD_CONF}" ]; then
printf 'The previous configuration is saved to %s\n' "${OLD_CONF}"
mv "${CONF}" "${OLD_CONF}"
fi
cp "${REF_CONF}" "${CONF}"
$PATCH_REF_CONF "${CONF}"
else
# Keep the current configuration
printf '⚠️ Check new version %s to make sure searx is working properly\n' "${NEW_CONF}"
cp "${REF_CONF}" "${NEW_CONF}"
$PATCH_REF_CONF "${NEW_CONF}"
fi
else
printf 'Use existing %s\n' "${CONF}"
fi
else
printf 'Create %s\n' "${CONF}"
cp "${REF_CONF}" "${CONF}"
$PATCH_REF_CONF "${CONF}"
fi
When you encounter such an ugly piece of code, you don't need to understand it fully; the general gist is more than enough. At a glance we see that the final config is produced by taking a reference config and patching it.
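Boiled down, the gist looks roughly like the sketch below. This is my simplification, not the upstream function: `update_conf_sketch` is a hypothetical name, and the FORCE_CONF_UPDATE switch and the backup to OLD_CONF are left out, so a newer reference config always wins here.

```shell
#!/bin/sh
# update_conf_sketch REF CONF PATCH_CMD
# Create CONF from REF if it is missing, refresh it if REF is newer,
# otherwise keep it. Prints which of the three cases applied.
update_conf_sketch() {
    ref="$1"
    conf="$2"
    patch_cmd="$3"
    if [ ! -f "$conf" ]; then
        echo "create"
    elif [ "$ref" -nt "$conf" ]; then
        echo "update"
    else
        echo "keep"
        return 0
    fi
    # Start from the reference config, then apply the user's settings on top
    cp "$ref" "$conf"
    "$patch_cmd" "$conf"
}
```

The third argument is the name of any command or shell function that takes the config path, which mirrors how the real script passes patch_searx_settings and patch_uwsgi_settings around.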
# make sure there are uwsgi settings
update_conf ${FORCE_CONF_UPDATE} "${UWSGI_SETTINGS_PATH}" "/usr/local/searx/dockerfiles/uwsgi.ini" "patch_uwsgi_settings"
# make sure there are searx settings
update_conf "${FORCE_CONF_UPDATE}" "${SEARX_SETTINGS_PATH}" "/usr/local/searx/searx/settings.yml" "patch_searx_settings"
Looking at the call sites, we see both the reference config file paths and the functions used for patching.
patch_uwsgi_settings() {
CONF="$1"
# Nothing
}
Interestingly, the uwsgi config doesn't get patched, so the reference one should be fine in most cases.
exec su-exec searx:searx uwsgi --master --http-socket "${BIND_ADDRESS}" "${UWSGI_SETTINGS_PATH}"
And finally we see the command used to actually launch Searx.
What is uwsgi
I once again had to look this up. According to Wikipedia, it's similar to CGI, if you're familiar with that. If not, well, it's used to allow web servers like Nginx to serve arbitrary scripts in arbitrary languages. So: client -> Nginx -> uwsgi -> Python backend.
Aren't we missing a full webserver?
uWSGI natively speaks HTTP, FastCGI, SCGI and its specific protocol named “uwsgi”
No, uwsgi can serve as a lightweight webserver. So ideally, in the NixOS module we'd support all methods (HTTP, FastCGI, SCGI and uwsgi), but that's something to worry about later.
Packaging
Now that we know all there is to know from the Docker image and related files, we can start writing Nix expressions. First, let's quickly create a new repository. We'll start with a Flake; it's easier and can be readily ported to nixpkgs if done right.
git init searx-nix
{
inputs.nixpkgs.url = "github:NixOS/nixpkgs";
outputs =
{
self,
nixpkgs
}:
let
supportedSystems = [ "x86_64-linux" ];
forAllSystems' = nixpkgs.lib.genAttrs;
forAllSystems = forAllSystems' supportedSystems;
pkgsForSystem =
system:
import nixpkgs { inherit system; };
in
{
packages = forAllSystems
(system:
let
pkgs = pkgsForSystem system;
in
{
default = pkgs.callPackage ./searx.nix {};
}
);
};
}
We then create a tiny flake.nix. The cruft around it is generic and not really important; the important bit is
pkgs.callPackage ./searx.nix {}
Searching for "nixpkgs python" gets us to the nixpkgs manual. The information is in both the official manual and ryantm's nixpkgs manual, but the latter is better since it isn't one huge HTML page.
{ lib, python3 }:
python3.pkgs.buildPythonApplication rec {
pname = "luigi";
version = "2.7.9";
src = python3.pkgs.fetchPypi {
inherit pname version;
sha256 = "035w8gqql36zlan0xjrzz9j4lh9hs0qrsgnbyw07qs7lnkvbdv9x";
};
propagatedBuildInputs = with python3.pkgs; [ tornado python-daemon ];
meta = with lib; {
...
};
}
As an example we're given a derivation for luigi. I don't know and don't need to know what luigi is. To speed up packaging, it's important to ignore irrelevant information rather than research it.
Based on the example derivation we can build our own. Instead of python3.pkgs.fetchPypi we're going to use fetchFromGitHub, as that's more universal and easier to work with.
{
lib,
python3,
fetchFromGitHub
}:
with lib;
let
pname = "searx";
version = "1.0.0";
in
python3.pkgs.buildPythonApplication {
inherit pname version;
src = fetchFromGitHub {
rev = version;
repo = pname;
owner = pname;
# If you update the version, you need to switch back to ~lib.fakeSha256~ and copy the new hash
sha256 = "sha256-sIJ+QXwUdsRIpg6ffUS3ItQvrFy0kmtI8whaiR7qEz4="; # lib.fakeSha256;
};
postPatch = ''
sed -i 's/==.*$//' requirements.txt
'';
# tests try to connect to network
doCheck = false;
pythonImportsCheck = [ "searx" ];
# Since Python is weird, we need to put any dependencies we know of here
# and not into ~buildInputs~ or ~nativeBuildInputs~ as one might expect.
# As a starting point, just copy everything from ~requirements.txt~ and
# hope for the best.
propagatedBuildInputs = with python3.pkgs;
[
certifi
babel
flask-babel
flask
jinja2
lxml
pygments
python-dateutil
pyyaml
# httpx[http2]
httpx
brotli
# uvloop==0.16.0; python_version >= '3.7'
# uvloop==0.14.0; python_version < '3.7'
uvloop
# httpx-socks[asyncio]
httpx-socks
langdetect
setproctitle
# sometimes the packages in ~requirements.txt~ may not be enough, so if something is missing, just add it
requests
];
meta = with lib; {
# You'll fill this in later when upstreaming to nixpkgs
};
}
Let me just clarify a few things.
{
cmake,
gnumake,
gcc
}:
That pattern works because Nix has a special builtin (builtins.functionArgs) which allows one to inspect a function and get the names of all its arguments. callPackage then uses those names to call the function with the packages you requested.
{
deps =
[
"cmake"
"gnumake"
"gcc"
];
fn =
{
cmake,
gnumake,
gcc
}:
# the derivation itself would go here
null;
}
The above would also work, but we like conciseness.
Lastly, you may ask what's up with lib.fakeSha256. It's a placeholder hash consisting entirely of zeros (written in SRI form, as in lib.fakeHash, that's sha256- followed by a long run of A characters), which stands for "I don't know yet". The point is that when Nix downloads the source code and checks the hash, it won't match, so it prints both the one you gave it and the one it calculated. You can then replace lib.fakeSha256 with the actual hash.
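If you're wondering where the run of A characters in such a placeholder comes from: a SHA-256 digest is 32 bytes long, and 32 zero bytes encoded as base64 are all As plus one padding character. A quick sketch, assuming head and base64 from coreutils or busybox; `fake_sri_hash` is a hypothetical helper name:

```shell
#!/bin/sh
# Build an all-zero SHA-256 placeholder in SRI notation:
# 32 zero bytes, base64-encoded, prefixed with "sha256-".
fake_sri_hash() {
    printf 'sha256-%s\n' "$(head -c 32 /dev/zero | base64)"
}

fake_sri_hash
# prints sha256- followed by 43 A characters and one =
```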
At this point I looked at the already existing derivation, because I was curious.
# tests try to connect to network
doCheck = false;
pythonImportsCheck = [ "searx" ];
postPatch = ''
sed -i 's/==.*$//' requirements.txt
'';
The doCheck = false is there by experimentation. I didn't know what pythonImportsCheck = [ "searx" ] does, so I opened nixpkgs on GitHub, clicked Go to file, searched for python and then went to pkgs/top-level/python-packages.nix. Inspecting the file, on line 41 I found the definition of buildPythonApplication.
buildPythonPackage = makeOverridablePythonPackage (lib.makeOverridable (callPackage ../development/interpreters/python/mk-python-derivation.nix {
inherit namePrefix; # We want Python libraries to be named like e.g. "python3.6-${name}"
inherit toPythonModule; # Libraries provide modules
}));
This points to a file called mk-python-derivation.nix, so again, Go to file. mk-python-derivation.nix tells us a lot, but still not what pythonImportsCheck does. It's only mentioned as pythonImportsCheckHook, which prompted me to look for said hook. Going to the containing directory and into hooks/python-imports-check-hook.sh, we can satiate our curiosity.
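Lastly, the postPatch = ''...'' strips the exact version pins from requirements.txt, so the build accepts whichever versions of those packages nixpkgs provides.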
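To see what the sed expression in postPatch does to requirements.txt, here it is applied to three lines taken from the file. `strip_pins` is a hypothetical wrapper name; the expression itself is the one from the derivation.

```shell
#!/bin/sh
# Remove "==<version>" and everything after it from each requirement line.
strip_pins() {
    sed 's/==.*$//'
}

printf '%s\n' \
    'flask==2.1.1' \
    'httpx[http2]==0.23.0' \
    "uvloop==0.16.0; python_version >= '3.7'" | strip_pins
# prints:
# flask
# httpx[http2]
# uvloop
```

Note that the environment marker after the uvloop pin disappears as well, since the expression deletes everything from == to the end of the line.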
With all these things, we get a successful build.
In the next blog post we'll start with the NixOS module by first trying to actually get a full launch of Searx. Till then!