We recently completed Endless Orange Week, which was a week for all
employees to explore a topic of their choosing and expand their skills.
I chose to explore openQA. This wasn’t a totally new
topic as we’d previously had an openQA setup that we eventually shut
down for various reasons. There were several reasons I wanted to look
into it again, though:
I’m a relatively recent convert to the idea that you should have
automated tests for everything you possibly can. Sometimes that means
investing significant effort in constructing a proper test
environment, but the payoff from being able to catch issues before you
release to the world is worth it. A novel concept, I know.
The infrastructure we set up before was difficult for us to maintain.
We were using Fedora’s packages on an Equinix bare
metal server since openQA runs tests in VMs. In theory
this is fine, but there were two problems. First, we’re a Debian shop
(both for the OS and our infrastructure), and maintaining the
equivalent tooling to support a Fedora host was cumbersome. Second,
Fedora is not a supported OS on Equinix, so we were
using their custom iPXE process. This involved
manually walking through Anaconda over a remote serial console, which
is… not fun. I’m sure there’s a better way to do all of this, but
ultimately my life would be better if I could make this setup less
snowflake and more cattle.
Certain parts of our OS are currently only tested manually and would
really benefit from automated testing. Of particular interest to me is
the OS upgrading process using
OSTree. I’ve invested
significant time at Endless on both the client and server side of this
process and have been responsible for releasing upgrade failures into
the wild at least once. To put it mildly, I’d prefer that we not do
Even though I was involved in the previous openQA infrastructure, I
didn’t really get involved in the tests and wanted to better understand how
they worked. As a corollary, I’d heard from the people who
were involved that it was difficult to keep the tests up to date, so I
was interested to try it myself and see where the friction was.
In a word, meh. If the goal was to get automated tests for our OS going
again, then it was mostly a fail. I ended up spending almost the entire
time working on the infrastructure. While this was personally satisfying
since I was able to overcome some obstacles from the previous setup, I
essentially got back to the same functionality we had before. Which is
to say that I didn’t get to any of the upgrade tests I was interested in
having. I’ll detail a few of the things I did work on below.
Google Cloud Platform
As mentioned at the beginning, the previous setup was on a bare metal
server that was difficult to provision. Since most of our infrastructure
is in AWS, I’d normally use that, but they don’t support nested VMs
and the price for one of their provisionable bare metal EC2 instances
was a bit much for what I was doing.
Enter Google Cloud Platform (GCP). Their VMs do support nested
virtualization. Like AWS, it’s well supported in most ops tools (for me
this is Terraform and Packer). I was able to
reuse most of our tooling basically as is to get a GCE VM going. The
exception being that the Debian GCE base image doesn’t include
Once it was running and I enabled nested virtualization,
I was able to run a simple QEMU image with KVM. Yay.
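A quick way to confirm that nested virtualization is actually visible inside the guest is to look for the hardware virtualization CPU flags and the KVM device node. The following is a minimal sketch, assuming a Linux guest on x86; the function names are made up for illustration and are not part of any openQA or GCP tooling.

```python
import os


def has_virt_flags(cpuinfo_text):
    """Return True if any CPU 'flags' line lists vmx (Intel) or svm (AMD)."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags = set(value.split())
            if "vmx" in flags or "svm" in flags:
                return True
    return False


def kvm_usable(cpuinfo_path="/proc/cpuinfo", kvm_path="/dev/kvm"):
    """Check that the CPU flags are exposed and the KVM device is accessible."""
    try:
        with open(cpuinfo_path) as f:
            cpuinfo_text = f.read()
    except OSError:
        return False
    # QEMU needs read/write access to the device node to use KVM.
    return has_virt_flags(cpuinfo_text) and os.access(kvm_path, os.R_OK | os.W_OK)


if __name__ == "__main__":
    print("KVM usable:", kvm_usable())
```

If this reports that KVM is not usable, QEMU can typically still run with pure emulation, but that is usually far too slow for desktop tests.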
Although openQA is in Debian unstable, I wanted to try
using containers since they insulate the application
from the host. The openQA worker is fairly simple, but my experience
running webapps from Debian packages has not been great. Furthermore,
while the workers have the requirement of running VMs, the webui could
later be split out to a more proper container manager.
I chose to use the
docker-compose method with openSUSE’s openQA
containers. It mostly worked very nicely, although there’s always fun
bootstrapping new containers and providing configuration to them. I sent
a couple fixes upstream that they were kind enough to
In openQA you can do screen matching using “needles”. Endless
OS is a desktop OS, so we do want to test that what’s showing up on the
screen is actually what we want. In our previous openQA configuration we
did have the tests and needles being automatically pulled from our
repo, but we didn’t have the proper configuration to
create and push commits from the webui.
After figuring out how to get an SSH key into the container and some
subsequent hair pulling, I was able to update a
needle from the webui. Previously this was a significant
roadblock for our developers, who had to manually edit
the needle and make a PR from it. Yay again.
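For readers who haven’t seen one, a needle is a PNG screenshot paired with a JSON file describing the areas to match. Below is a rough sketch of a sanity check for that JSON, assuming the commonly documented fields (`area`, `tags`, and per-area `xpos`, `ypos`, `width`, `height`, `type`). The `check_needle` helper and the sample needle are hypothetical and not taken from the Endless test repo.

```python
import json

# Assumption: these are the per-area keys and area types described in the
# openQA documentation; adjust if your openQA version differs.
REQUIRED_AREA_KEYS = {"xpos", "ypos", "width", "height", "type"}
AREA_TYPES = {"match", "ocr", "exclude"}


def check_needle(needle_json):
    """Return a list of problems found in a needle's JSON text (empty if none)."""
    try:
        needle = json.loads(needle_json)
    except json.JSONDecodeError as err:
        return [f"invalid JSON: {err}"]
    if not isinstance(needle, dict):
        return ["top level is not an object"]

    problems = []
    if not needle.get("tags"):
        problems.append("needle has no tags")
    areas = needle.get("area") or []
    if not areas:
        problems.append("needle has no areas")
    for i, area in enumerate(areas):
        if not isinstance(area, dict):
            problems.append(f"area {i} is not an object")
            continue
        missing = REQUIRED_AREA_KEYS - area.keys()
        if missing:
            problems.append(f"area {i} is missing {sorted(missing)}")
        elif area["type"] not in AREA_TYPES:
            problems.append(f"area {i} has unknown type {area['type']!r}")
    return problems


# Hypothetical needle: one match area tagged "desktop".
sample = json.dumps({
    "area": [{"xpos": 0, "ypos": 0, "width": 100, "height": 40, "type": "match"}],
    "tags": ["desktop"],
    "properties": [],
})
print(check_needle(sample))
```

A check like this could run in CI on the needles repo, so that hand-edited needles fail early instead of at test time.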
Previously openQA required OpenID 2.0 for user
authentication. I ended up spending a significant amount of time setting
up an identity provider just for openQA. Not fun.
Since then, openQA has gained support for OAuth2, which
is great. I didn’t get a chance to test it out, but I worked on
supporting Google as the OAuth2 provider out of the box.
Since we use Google Workspace, Google authentication
is our preferred method and I’ve spent quite a bit of time in the weeds
with OAuth2 and OIDC. I hope to find a little time to spin that up and
test it so I can send it upstream.
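To give a feel for what a Google OAuth2 login involves, here is a minimal sketch of building the authorization URL for the first step of the authorization code flow. It is not how openQA implements it. The client ID, the redirect URI, and the `build_auth_url` helper are all placeholders I made up for illustration; only the endpoint and parameter names come from Google's OAuth2 documentation.

```python
import secrets
from urllib.parse import urlencode

GOOGLE_AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"


def build_auth_url(client_id, redirect_uri, hosted_domain=None, state=None):
    """Return (url, state) for the first step of the authorization code flow."""
    # The state value must be stored in the user's session and compared on
    # the callback to protect against CSRF.
    state = state or secrets.token_urlsafe(16)
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "response_type": "code",
        "scope": "openid email profile",
        "state": state,
    }
    if hosted_domain:
        # "hd" only hints at the Workspace domain in the account chooser.
        # The server still has to verify the domain in the returned ID token.
        params["hd"] = hosted_domain
    return GOOGLE_AUTH_ENDPOINT + "?" + urlencode(params), state


if __name__ == "__main__":
    # Placeholder values, not a real client or deployment.
    url, state = build_auth_url(
        "example-client-id",
        "https://openqa.example.com/callback",
        hosted_domain="example.com",
    )
    print(url)
```

The rest of the flow (exchanging the returned code for tokens and validating the ID token) is where most of the real work is, and it is left out here.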
Where to now?
I’m not sure. I still believe that we should have automated testing for
Endless OS and that openQA can do that for us. I think the work I did
here would provide a solid foundation for us to start from again. There
would still need to be a significant investment in the tests themselves,