<h1>TL;DR: We are building some awesome technical automation to deliver applications - but how does the intended audience actually use it?</h1>
Whoa! That’s a mouthful, even for a quasi-abstract, but stick with me: this one is all about “OK, now what?”. The time has come to get real with the CODE-RADE project, after many, many months of laying down foundations. The basic premise of the project is to demolish the barrier to entry for new applications, by
- Judicious use of web technologies
- Heavy use of automation
Basically, we want a user or research group to propose an application that they want, literally by sending a pull request, have some fancy robots take over from there, then a few days later have the user submitting jobs to the grid with their new shiny.
Delivering applications should be as simple as pushing a few commits to a repo; executing that application in a workflow should be as simple as writing a script that says not much more than
<code># Add the AAROC deployed apps module
module add deploy
# Add myApp and its dependencies
module add myApp
# SCIENCE!
. . . my_workflow . . .
</code>
Step 4 - Profit!
This is most likely attributable to a South Park episode, but since this is the internet, who knows anymore!
Unless you’ve been living on Mars until just this moment, or are reading this blog post from the New Horizons probe, you probably know the Profit!!! meme.
If not, here’s how it works:
You have a goal, and you have a vague idea of how to reach it. The first two steps are easy, then some magic happens, and finally: PROFIT!!!
I propose that this is usually the case for researchers - they know their application better than the infrastructure operators do, and they usually know how to build it and execute it. The infrastructure operators, however, know how to deliver applications to their users - they might have some local customisations at their sites for how they deal with users, parallelism, file systems etc. In this analogy, the user would know the first two steps of their goal-oriented path:
- Compile the application against its respective dependencies
- Check that the application executes correctly
while the operator knows the other side of that model, and can ensure that the application is in the right path with the right permissions, etc.
A bit of background.
Before we get into the steps that a user would need to take right now to deploy their application to the grid, let’s take a brief look into the past.
No-one said it would be easy…
Getting applications onto the infrastructure was one of the main blocking aspects of SAGrid, and most national distributed computing platforms. Without being able to run their code or workflows in a reasonable time, users and entire communities get discouraged and - in some cases - turn to home-brew solutions. However, the model we had in place assumed that there was an impermeable barrier between the user and the infrastructure, with the “Software Manager” playing the role of “priest”, interceding for the user to the infrastructure’s operations team. Essentially, this created a huge bottleneck, because special jobs had to be created, to be submitted site-by-site, in order to install the applications. What is more, functional checks were rarely made on the deployed applications, and updates to the packages were not guaranteed - they needed to be manually updated.
No-one said it should be this hard
Wow, looking back, that was quite dumb, for lack of a better word. Yes, we had nothing else to work with, and yes, everyone else was doing it, but that doesn’t mean we can’t make it less stupid! We wanted to re-design the process so that it was more user-driven and parallel than before, with as few bottlenecks as possible. Essentially, we wanted to make the impermeable wall permeable, by brokering some trust between user and operator. All the user had to do was show the operator how the application had to be built; all the operator had to do was ensure that trusted applications were always available. Since most of the support effort previously went into ensuring that applications were compiled and linked correctly, and into tracking down runtime errors, we wanted to eliminate or drastically reduce the occurrence of these. The operators would be saying “OK, prove to me that you can execute in this environment without doing bad things”.
The Two-Way Mirror
What if all the testing, checking and execution could be automated in some way? A tireless robot could realistically test all desired configurations of an application, and could do so in an unbiased way - such that both the operator and the user could refer to the same build and say “yep, looks good”. This is a kind of “two-way mirror”, where the user can make verifiable statements about the application, and the operator can make verifiable statements about whether that application will execute on the remote side of the infrastructure. In the middle there is some opaqueness to both sides, but there is enough transparency to engender trust.
The Jenkins magic
This is the basis of the trust brokerage, and we’ve implemented it with Jenkins. The details of the Continuous Integration setup for AAROC have been described in various previous posts on this blog and in other slide decks; here we want to focus on the user side of the process. What does a user have to do, step by step, in order to get their application to a point where it’s executable?
Your move, user…
Let’s start by putting the ball in the user’s court - we expect the user to prepare at least two scripts, and to place them in a change-controlled repository. As a default approach, this repo would be on GitHub, containing:
- a README telling us what the application is and what dependencies it needs
- a build.sh script which only compiles the code and produces binary and library files (this is sometimes trivial, in the case of precompiled code)
- a check-build.sh script which provides a functional test of the code, as well as creating a modulefile for the application
Fork the demo repo
Fork the My-First-Deploy repo to get an idea of what’s going on; there’s a nice README in there which explains how things work and where to get help.
Mount the CVMFS repo.
All of the applications that have already been tested are already in the CVMFS repo, which you can easily mount on your laptop or local cluster.
We’ve even got a nice playbook that you can use to make your machine a CVMFS client. This will give you direct access (via FUSE) to the existing repo, and allow you to build code against the libraries therein.
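If you don’t use the playbook, the manual client setup looks roughly like the sketch below. The repository name here is a placeholder, not the real AAROC repo, and this assumes the CVMFS packages are already available to yum:

```shell
# Install the CVMFS client (CentOS; packages come from the CERN yum repo)
sudo yum install -y cvmfs

# One-off setup: configures autofs so repos appear under /cvmfs
sudo cvmfs_config setup

# Tell the client which repository to mount (name is a placeholder)
cat <<EOF | sudo tee /etc/cvmfs/default.local
CVMFS_REPOSITORIES=apps.example.org
CVMFS_HTTP_PROXY=DIRECT
EOF

# Check that the repo mounts; it appears under /cvmfs via FUSE
cvmfs_config probe
ls /cvmfs/apps.example.org
```

The playbook automates exactly these steps, so most users should never need to type them.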
Make sure it builds
Your next task is to make the application build on something that you know - something “Linux”, that is. Your main aim is to prove to the operators that the application will run, and they are operating a CentOS-based infrastructure. Some sites deploy Debian worker nodes too, which is why we have the Debian build slave - so, basically, if you can write a script that compiles your application against the libraries in CVMFS (and only those), it will pass the first phase of testing. At this point, you can send a pull request and we can create the job to test the code on Jenkins. By convention, the script should be called build.sh.
From here on out, every commit will trigger a Jenkins build.
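A minimal build.sh might look like the sketch below. The application name, install layout and module names are assumptions for illustration; the real requirements are simply that the script fails loudly and compiles against CVMFS-provided libraries only. To keep the sketch self-contained and runnable, it installs a trivial stand-in binary instead of actually compiling anything:

```shell
#!/bin/bash
# build.sh - hypothetical sketch; app name, layout and module names are assumptions
set -e  # any failing command fails the Jenkins build

# Dependencies must come from the CVMFS repo, loaded as modules, e.g.:
#   module add gcc
#   module add gsl

PREFIX="$PWD/install/myApp/1.0"
mkdir -p "$PREFIX/bin" "$PREFIX/lib"

# Compile and install only - functional tests belong in check-build.sh.
# For a real application this step would be something like:
#   ./configure --prefix="$PREFIX" && make && make install
# Here a trivial stand-in "application" keeps the sketch runnable end to end:
printf '#!/bin/bash\necho "myApp 1.0"\n' > "$PREFIX/bin/myApp"
chmod +x "$PREFIX/bin/myApp"

echo "build: installed to $PREFIX"
```

Installing into a versioned prefix like this keeps each build isolated, which is what lets several versions coexist in the repo later.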
Make sure it runs
Once you’ve made it compile, the next step is to prove that it will not give runtime errors. This means that a modulefile needs to be created for the application, and a short script (by convention called check-build.sh) is executed to load the modulefile for the application and run functional tests. These tests have to be provided by you, the requestor, since you’re the expert, after all!
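By way of illustration, a check-build.sh could generate the modulefile and then run the functional test, along these lines. The app name, version and paths are assumptions, and the actual functional test is yours to write:

```shell
#!/bin/bash
# check-build.sh - hypothetical sketch; app name, version and paths are assumptions
set -e
APP=myApp
VERSION=1.0
PREFIX="$PWD/install/$APP/$VERSION"   # where build.sh installed the artifacts (assumed)

# 1. Write a modulefile so that users can later 'module add myApp'
mkdir -p "modules/$APP"
cat > "modules/$APP/$VERSION" <<EOF
#%Module1.0
## Modulefile for $APP $VERSION, generated by check-build.sh
prepend-path PATH $PREFIX/bin
prepend-path LD_LIBRARY_PATH $PREFIX/lib
EOF

# 2. Functional test: run the application on known input and check the result.
#    Replace this with your own test - you are the expert on what "correct" means:
#   "$PREFIX/bin/$APP" --version | grep -q "$VERSION"

echo "check-build: modulefile written to modules/$APP/$VERSION"
```

The modulefile is what ties the two halves of the workflow together: it is what the `module add myApp` line in the job script at the top of this post resolves to.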
Green Lights! Ansible! Transaction!
Once the check-build phase has successfully completed, a manual promotion is needed. We want this to be one of the few human steps in the workflow - we have to maintain some form of human interaction, mostly for quality and accountability reasons. Once a check has been made on the artifact, and perhaps a few messages have been exchanged between developer and expert, the application can be pulled into the CVMFS repository.
This involves putting the CVMFS repo into a “transaction” - a write-enabled mode in which updates of the repo can be made - and publishing a new version of the repo. All of this happens “behind the curtains”, so to speak, from the user’s point of view. All that they should see is a message somewhere that says something like
A new version of the repository has been published in the dev repository.
Changelog: “New application xyz added after successful build, requested by so-and-so”
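For the curious, the publication step on the repository server looks roughly like this. The repository name is a placeholder, and in the real workflow these commands are driven by automation (hence the “Ansible!” above) rather than typed by hand:

```shell
# On the CVMFS stratum-0 server - repository name is a placeholder
cvmfs_server transaction apps.example.org      # open the write-enabled mode

# Copy the Jenkins-built artifacts into the repo (paths assumed)
cp -r /srv/artifacts/myApp /cvmfs/apps.example.org/

cvmfs_server publish apps.example.org          # sign and publish the new revision
# If something went wrong before publishing, roll back instead:
#   cvmfs_server abort apps.example.org
```

Clients then pick up the new revision automatically on their next catalog refresh, which is why no action is needed at the sites.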
Now, the application is out there “in the wild” - albeit in a testing repository. Certain sites, and the users themselves, can test the application in a wider sense than the narrow infrastructure-focussed tests that Jenkins runs. After a brief period of testing, and consensus from the production sites, we can move the application into the production repository.
This intermediate step can of course be considered optional, and skipping it could speed up deployment in most cases, but it is useful to have it in the workflow, to ensure that sites have informed consensus on what they are executing. Since the repo is mounted permanently, no action is taken on the site operator’s behalf in order to deliver the application - this is a feature.
Go wild !
So, at this point, the application is executable out there in the world. All we need to do is keep track of who is mounting the repo - which is usually published in the site-BDII. If updates to the application are made, they are user-triggered, and the operations team is consistently kept in the loop thanks to the continuous integration system. New versions of the application can be developed, and continuously tested, until arriving again at that most beautiful of all sights:
Continuous delivery - that’s where we want to be.
Bonus points to anyone who can mention the songs referred to in this post in the comments below.
This is a companion discussion topic for the original entry at http://www.africa-grid.org//blog/2015/07/27/CODE-RADE-Real-World/