There is a race condition
in the new paralleized OCR code.
The race condition got "active" in commit
819d304a39 (Use futures for OCR parallelization),
however, the underlying bug already slipped in with commit
e6ea13f4ea (User proper `Path` instead of `str` in OCR code).
The OCR module applies tesseract to at most three variants
of the screenshot: the original one, and two variants that
are created by a preprocessing step (with ImageMagick).
The preprocessing step needs an output filename
that is used to write the preprocessed image file.
The "Path" commit broke the way the output file is named:
The code still attempts to append a ".negative" to *one*
of the preprocessed output files, but the method
`.with_suffix` is not suitable for that purpose:
Lateron, ".png" is also added with `.with_suffix`,
*replacing* the ".negative" and thereby yielding the
*the same* output filename for both preprocessed files.
Without parallelization, this doesn't hurt;
preprocessed files are simply created and analyzed in order.
But the parallelization commit
causes that these two tasks now run in parallel
(plus the third task that analyses the original screensshot,
but that does not cause any further harm here):
* Task 1: preprocess (non-negative), then tesseract the output
* Task 2: preprocess (negative), then tesseract the output
Both tasks use the same filename and thus the same file for the
preprocessed image that is generated, then used by tesseract.
This often creates a garbage file since both
preprocessings write that one file at the same time.
Tesseract consequently fails and
complains about bad data in its input file.
The commit at hand simply fixes the file naming
by adding ".negative.png" or ".positive.png"
to the filename for the preprocessed image.
This ensures both threads no longer hurt each
other's data and can now coexist in peace.
At work we have the use-case that several people connect to a large
Linux box to run tests and debug those interactively.
All tests write their state into a global `/tmp` -- e.g. the vde1 socket
and the VMs' state. This leads to conflicts when multiple people are
doing this.
This change tries to use XDG_RUNTIME_DIR before using Python's detection
of a global temp directory: when connecting, this requires a working
user session, but then we get working directories per user. This is
preferable over doing something like `mktemp -d` per run since that
would break use-cases where you want to keep the VMs' state across
multiple sessions (`--keep-vm-state`).
This patch adds a NixOS test for Limine on BIOS systems. It also fixes
some formatting in `nixos/lib/make-disk-image.nix`.
Signed-off-by: John Titor <50095635+JohnRTitor@users.noreply.github.com>
This patch adds a new partition layout type `"legacy+boot` which will
produce MBR images with a separate FAT32 boot partition. This is
necessary for adding a NixOS test for Limine on BIOS systems, as Limine
can't read ext4 partitions.
Right now it wrongly seems as if you can set
`sshBackdoor.enable = true;` for each test and not only for debugging
purposes.
This is wrong however since you'd need to pass /dev/vhost-vsock into the
sandbox for this (which is also a prerequisite for #392117).
To make that clear, two things were changed:
* add a warning to the manual to communicate this.
* exit both interactive and non-interactive driver early if
/dev/vhost-vsock is missing and the ssh backdoor is enabled.
If that's the case, we pass a CLI flag to the driver already in the
interactive case. This change also sets the flag for the
non-interactive case.
That way we also get a better error if somebody tries to enable this
on a system that doesn't support that.
This commit fixes a small typo in the documentation for the
`asDropinIfExists` option and clarifies the comment.
Signed-off-by: squat <lserven@gmail.com>
I'm a little annoyed at myself that I only realized this _after_ #392030
got merged. But I realized that if something else is using AF_VSOCK or
you simply have another interactive test running (e.g. by another user
on a larger builder), starting up VMs in the driver fails with
qemu-system-x86_64: -device vhost-vsock-pci,guest-cid=3: vhost-vsock: unable to set guest cid: Address already in use
Multi-user setups are broken anyways because you usually don't have
permissions to remove the VM state from another user and thus starting
the driver fails with
PermissionError: [Errno 13] Permission denied: PosixPath('/tmp/vm-state-machine')
but this is something you can work around at least.
I was considering to generate random offsets, but that's not feasible
given we need to know the numbers at eval time to inject them into the
QEMU args. Also, while we could do this via the test-driver, we should
also probe if the vsock numbers are unused making the code even more
complex for a use-case I consider rather uncommon.
Hence the solution is to do
sshBackdoor.vsockOffset = 23542;
when encountering conflicts.
With this it's possible to trivially SSH into running machines from the
test-driver. This is especially useful when running VM tests
interactively on a remote system.
This is based on `systemd-ssh-proxy(1)`, so there's no need to configure
any additional networking on the host-side.
Suggested-by: Ryan Lahfa <masterancpp@gmail.com>
Format all Nix files using the officially approved formatter,
making the CI check introduced in the previous commit succeed:
nix-build ci -A fmt.check
This is the next step of the of the [implementation](https://github.com/NixOS/nixfmt/issues/153)
of the accepted [RFC 166](https://github.com/NixOS/rfcs/pull/166).
This commit will lead to merge conflicts for a number of PRs,
up to an estimated ~1100 (~33%) among the PRs with activity in the past 2
months, but that should be lower than what it would be without the previous
[partial treewide format](https://github.com/NixOS/nixpkgs/pull/322537).
Merge conflicts caused by this commit can now automatically be resolved while rebasing using the
[auto-rebase script](8616af08d9/maintainers/scripts/auto-rebase).
If you run into any problems regarding any of this, please reach out to the
[formatting team](https://nixos.org/community/teams/formatting/) by
pinging @NixOS/nix-formatting.
We're getting 2x5 darwin VM jobs that aren't schedulable
on our current Hydra.nixos.org, which makes them hang around
and delay advancing of all `nixpkgs-*` channels.
To me that's quite an annoying effect, as it can be like an extra day
of additional delay without any benefit that I can really perceive.
(unless someone like me keeps manually cancelling the jobs all the time)
Their last commit is from March 28th 2019, about 6 years ago.
This PR is in no way intended to diminish Daniels's accomplishments, and they're welcome to just say so if they'd prefer this PR not to be merged. Also, even if it's merged, of course they're always welcome to return to activity and be added back. The intent of this PR is to give more realistic expectations around the maintainership of these packages, and to invite others to step up for maintainership if they rely on those packages.
If this is merged, they should probably also be removed from the list
of committers for the time being.
Closes#180089
I realized that the previous commits relying on `sys.exit` for dealing
with `MachineError`/`RequestedAssertionFailed` exit the interactive
session which is kinda bad.
This patch uses the ipython driver: it seems to have equivalent features
such as auto-completion and doesn't stop on SystemExit being raised.
This also fixes other places where this happened such as things calling
`log.error` on the CompositeLogger.
After a discussion with tfc, we agreed that we need a distinction
between errors where the user isn't at fault (e.g. OCR failing - now
called `MachineError`) and errors where the test actually failed (now
called `RequestedAssertionFailed`).
Both get special treatment from the error handler, i.e. a `!!!` prefix
to make it easier to spot visually.
However, only `RequestedAssertionFailed` gets the shortening of the
traceback, `MachineError` exceptions may be something to report and
maintainers usually want to see the full trace.
Suggested-by: Jacek Galowicz <jacek@galowicz.de>
When doing `machine.succeed(...)` or something similar, it's now clear
that the command `...` was issued on `machine`.
Essentially, this results in the following diff in the log:
-(finished: waiting for unit default.target, in 13.47 seconds)
+machine: (finished: waiting for unit default.target, in 13.47 seconds)
(finished: subtest: foobar text lorem ipsum, in 13.47 seconds)