This update has been submitted for testing by mreynolds.
This update's test gating status has been changed to 'waiting'.
This update's test gating status has been changed to 'ignored'.
A FreeIPA test failed. It looks like the web UI errors out immediately after logging in as a regular domain user (though it works OK in a previous test step when logged in as the administrator). I'll try and look into this in more detail in a bit.
This update has been obsoleted.
Note: same failure on prod and staging, so it doesn't look like a flake.
So, I fiddled with openQA a bit and got logs from the server. Note that I did get one successful run of the test, so it seems the bug doesn't happen every time. I now have 4 fails to 1 pass, though.
Here's the /var/log tarball from the server. It seems that the web UI error is due to the directory server not being available, which is kinda what I expected. The journal shows these errors for dirsrv@DOMAIN-LOCAL.service:
Note the failed web UI access attempt happens just five seconds after the service decides to shut down. So there's a timing element here; presumably that's why the earlier web UI access as admin works.
Oh, yikes, I just noticed it's actually crashing, isn't it?
It doesn't seem like we caught a core dump, though, unfortunately - I do have the openQA tests set up to try and capture any core dump caught by coredumpctl or abrt, but it doesn't seem to have found one. I'll have to poke it some more tomorrow and see if I can get a hold of the dump.
It's definitely crashing, and I think I know which issue it might be, but we do need a core file to verify. Ideally, if you could get us the stack trace from the system after the crash, that would be easier: attach gdb to the core file and run "thread apply all bt full". Or attach gdb to ns-slapd before the test is run and catch the crash live (then run "thread apply all bt full").
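For anyone scripting this in a test harness, here is a rough sketch of the two gdb invocations described above, written as a small Python helper that only builds the command lines. The helper name gdb_backtrace_argv, the binary path /usr/sbin/ns-slapd, the core file name and the PID are assumptions for illustration and are not taken from the actual openQA tests. The live variant assumes that "continue" returns once the process stops on the fatal signal; gdb may also stop earlier on other signals, so treat it as a starting point only.

```python
import shlex

# The gdb command requested in the comment above.
BT_COMMAND = "thread apply all bt full"


def gdb_backtrace_argv(pid=None, core=None, binary="/usr/sbin/ns-slapd"):
    """Build a gdb command line that dumps a full backtrace of all threads.

    Pass exactly one of:
      pid  - attach to a running ns-slapd and wait for it to stop
      core - inspect an existing core file (hypothetical path)
    The default binary path is an assumption; adjust it for the system.
    """
    if (pid is None) == (core is None):
        raise ValueError("pass exactly one of pid or core")

    if pid is not None:
        # Live case: attach, let the server keep running, then print the
        # backtrace once gdb regains control (for example on SIGSEGV).
        return ["gdb", "-batch", "-p", str(pid),
                "-ex", "continue", "-ex", BT_COMMAND]

    # Post-mortem case: load the binary together with its core file.
    return ["gdb", "-batch", "-ex", BT_COMMAND, binary, core]


if __name__ == "__main__":
    # Print shell-ready command lines; 1234 and core.ns-slapd are placeholders.
    print(shlex.join(gdb_backtrace_argv(pid=1234)))
    print(shlex.join(gdb_backtrace_argv(core="core.ns-slapd")))
```

The helper returns an argument list rather than running gdb, so the harness can log the exact command and redirect the output to a file that is uploaded with the other logs.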
Current status: @mreynolds did a scratch build with a potential fix, but it still crashed. I am now knee-deep in hacking up the test to try and attach gdb to ns-slapd and get a backtrace out of it.
OK, so it got weirdly harder to reproduce the crash - I had a bunch of passed tests with both the original update and the scratch build - but I finally got a crash with a full backtrace:
That's with the later scratch build, 37740210.