There is a race condition
in the new parallelized OCR code.
The race condition became "active" in commit
819d304a39 (Use futures for OCR parallelization),
however, the underlying bug already slipped in with commit
e6ea13f4ea (User proper `Path` instead of `str` in OCR code).
The OCR module applies tesseract to at most three variants
of the screenshot: the original one, and two variants that
are created by a preprocessing step (with ImageMagick).
The preprocessing step needs an output filename
that is used to write the preprocessed image file.
The "Path" commit broke the way the output file is named:
The code still attempts to append a ".negative" to *one*
of the preprocessed output files, but the method
`.with_suffix` is not suitable for that purpose:
Later on, ".png" is also added with `.with_suffix`,
*replacing* the ".negative" and thereby yielding
*the same* output filename for both preprocessed files.
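
The collision can be reproduced with plain `pathlib`. This is only a
minimal sketch: the screenshot path below is made up for illustration
and the variable names are not taken from the OCR module.

```python
from pathlib import Path

# Hypothetical screenshot path, for illustration only.
screenshot = Path("/tmp/ocr/screenshot.ppm")

# with_suffix() replaces the last suffix instead of appending one.
positive = screenshot.with_suffix(".png")
# ".negative" is set first, then replaced again by ".png".
negative = screenshot.with_suffix(".negative").with_suffix(".png")

# Both variants end up with the very same output filename.
collides = positive == negative
print(positive.name, negative.name, collides)
```
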
Without parallelization, this doesn't hurt;
preprocessed files are simply created and analyzed in order.
But the parallelization commit
makes these two tasks run in parallel
(plus the third task that analyses the original screenshot,
but that does not cause any further harm here):
* Task 1: preprocess (non-negative), then tesseract the output
* Task 2: preprocess (negative), then tesseract the output
Both tasks use the same filename and thus the same file for the
preprocessed image that is generated, then used by tesseract.
This often creates a garbage file since both
preprocessing steps write to that one file at the same time.
Tesseract consequently fails and
complains about bad data in its input file.
The commit at hand simply fixes the file naming
by adding ".negative.png" or ".positive.png"
to the filename for the preprocessed image.
This ensures both threads no longer hurt each
other's data and can now coexist in peace.
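
One way to build such distinct names is to append the ending to the
full file name instead of going through `.with_suffix`. The sketch
below assumes a hypothetical helper `preprocessed_path` and a made-up
screenshot path; the actual code in the commit may look different.

```python
from pathlib import Path


def preprocessed_path(image: Path, negate: bool) -> Path:
    """Hypothetical helper: append a distinct ending to the file name."""
    ending = ".negative.png" if negate else ".positive.png"
    # with_name() swaps the whole final path component, so the
    # existing name is kept and the ending is appended to it.
    return image.with_name(image.name + ending)


# Hypothetical screenshot path, for illustration only.
screenshot = Path("/tmp/ocr/screenshot.ppm")
positive = preprocessed_path(screenshot, negate=False)
negative = preprocessed_path(screenshot, negate=True)
print(positive.name, negative.name)
```

With separate output files, the two preprocessing tasks no longer
write to the same path.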
See https://discourse.nixos.org/t/i-cannot-for-the-life-of-me-find-the-package-that-has-pg-config/66244/4
I decided against doing this in its own nixpkgs manual: the line
to draw is quite blurry already (e.g. we have documented our package
removal policy in here as well) and having to check two manuals for a
single subsystem feels pretty annoying to me.
The relevant part - where to find pg_config - is written at the top. I
decided to give a bit more context about the way our packaging works
since I realized a few times now that I don't remember all the details
about the problems we had in the past and having to look up individual
commit messages for that isn't very productive.