minossoft: Deployment on the GRID

Introduction

In this document, where GRID-specific tools are mentioned, the LCG ones are chosen. I am not sure how widely these apply at sites without direct LHC involvement but, in any case, most of what follows would apply regardless of these specifics.

The essence of the GRID is distributed computing: a user logs onto a UI (User Interface) and there submits a job to the GRID. A Resource Broker determines a suitable CE (Computing Element) on which to run the job and forwards the job to it. There, data is retrieved from a neighbouring SE (Storage Element) and the job runs. The job output is returned to the SE and the log files are returned to the UI.

This process raises a number of fundamental questions, but the only ones to be addressed here are how to determine which CEs are suitable for a job and how to deploy our software remotely.

At least at present, it is possible to cheat, or at the very least not follow the spirit of the GRID, for both these questions. Such cheats are hardly long-term solutions, so I shall dismiss them.

Determining which CEs are suitable for a job is in fact quite straightforward. Once the software has been installed and validated, the installer publishes a software tag using the tool lcg-ManageVOTag, either directly or indirectly. The GRID IS (Information Service) ensures that this information is made available to Resource Brokers.

The job submitter includes in the job's JDL (Job Description Language) file the requirement that the CE has the required software tag, and the Resource Broker does the rest.
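As an illustration of such a requirement, here is a minimal sketch of a helper that writes a JDL file. It assumes the usual LCG convention that published tags appear in the CE attribute GlueHostApplicationSoftwareRunTimeEnvironment; the function name write_jdl, the output file names and any tag passed to it are hypothetical and only there for the example.

```shell
#!/bin/bash
# write_jdl <script> <software_tag>
# Prints a minimal JDL description that restricts the job to CEs which
# have published <software_tag>. The helper name, the job.out/job.err
# names and the tag itself are illustrative assumptions.
write_jdl() {
    local script=$1 tag=$2
    cat <<EOF
Executable    = "$script";
StdOutput     = "job.out";
StdError      = "job.err";
InputSandbox  = {"$script"};
OutputSandbox = {"job.out", "job.err"};
Requirements  = Member("$tag", other.GlueHostApplicationSoftwareRunTimeEnvironment);
EOF
}
```

It might be used as `write_jdl run_job.sh VO-minos-some-tag > job.jdl`, where the tag is whatever name the installer actually published.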

Remotely deploying our software is the topic for the remainder of this document.

The Deployment Model

In what follows I will assume that we will not be deploying the Labyrinth. It is likely that deploying that would present more difficulties than the rest of our software (Robert, do you want to comment?). Also, it is the past; it will eventually be replaced by the new C++ MC code. Should the future not get here fast enough, it should be possible to extend whatever system we develop to include it.

By Hand or Automated?

As has been stated above, we could install interactively, but we choose to follow a more sustainable approach and consider an automated system.

Source or Binary?

I believe that the machines at RAL, which are the first target for deployment, could well be binary compatible with the front-end machines and, if so, we could just install binaries. Still, that too isn't a good long-term approach: we have been bitten by the "near" binary compatible site in the past! We should go for a build from source.

CVS or Tar?

We could bootstrap on the target machine and load most of the code via CVS. However, since builds don't evolve, CVS would only be used to deliver a frozen version of the sources. My view is that, in this case, CVS just provides another failure mode and it would be better to prepare tar balls so that any CVS problems occur locally and not remotely. Also, it means that we have only one system for both our code and third party libraries such as Libsigc++ and unixODBC.
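A minimal sketch of the local packaging step, assuming the frozen sources have already been checked out or exported into a directory on the local machine. The function name make_tarball and the convention of naming the tar ball after the source directory are my own assumptions.

```shell
#!/bin/bash
# make_tarball <source_dir> <out_dir>
# Packs a frozen source tree into <out_dir>/<name>.tar.gz, where <name> is
# the last component of <source_dir> (an assumed naming convention).
make_tarball() {
    local src=$1 out=$2
    local name
    name=$(basename "$src")
    mkdir -p "$out" || return 1
    # -C keeps only the top-level directory name inside the archive
    tar czf "$out/$name.tar.gz" -C "$(dirname "$src")" "$name" || return 1
    # read the archive back so a corrupt tar ball is caught locally
    tar tzf "$out/$name.tar.gz" > /dev/null
}
```

Any failure, whether in CVS or in packing, then shows up before anything is submitted to a remote site.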

Package/Library Sharing: Yes or No?

As of now, sites where multiple versions of our code are installed typically involve significant code and binary sharing. It's certainly possible to adopt the same model on the GRID, but managing the interdependencies remotely, and maintaining them as we add and remove versions of the code, is a complication.

An alternative approach is simply to build everything from scratch for each software tag and follow a simple directory structure e.g.:-

  $VO_MINOS_SW_DIR/tag/minossoft (in LCG $VO_<expt>_SW_DIR is the top-level software dir)
                      /minos_packs

This has a number of advantages. The additional processing time for a full build should be insignificant compared to the time used in production, so the only real disadvantage is the additional disk space used. Just looking at the size of the libraries as an indicator shows that minossoft (~0.4GB) is roughly twice the size of all the other libraries put together. Further, what little experience I have suggests that disk space allocation for software isn't mean (~50GB). So, for now, I propose to keep things simple and treat each tag as completely independent. Even if we eventually have to go over to a system that saves space by sharing, the tools to assemble and install tar files will still be needed.
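To make the independence concrete, here is a small sketch of how per-tag directories could be created and removed. It assumes only that VO_MINOS_SW_DIR is set at the site; the function names are hypothetical.

```shell
#!/bin/bash
# tag_root <tag>: print the directory that holds everything for one tag.
tag_root() {
    echo "${VO_MINOS_SW_DIR:?VO_MINOS_SW_DIR is not set}/$1"
}

# create_tag <tag>: make the empty skeleton for a new, independent build.
create_tag() {
    local root
    [ -n "$1" ] || return 1
    root=$(tag_root "$1") || return 1
    mkdir -p "$root/minossoft" "$root/minos_packs"
}

# remove_tag <tag>: nothing is shared between tags, so removal is a single
# recursive delete. An empty tag is refused so that the whole software
# area can never be removed by accident.
remove_tag() {
    local root
    [ -n "$1" ] || return 1
    root=$(tag_root "$1") || return 1
    rm -rf "$root"
}
```

Because no tag refers to files belonging to another, removing one tag cannot break any other installed build.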

Proposed Model

The recommended way to run software installation jobs is to submit standard scripts to the GRID in just the same way as for normal production jobs. The document Experiment Software Installation in LCG-2 gives a nice little example of such an installation script used by DTEAM:-

#!/bin/bash
export TAR_LOC=`pwd`   # this is the temporary directory that is supposed to
                       # hold the steering script and the tarballs

wget http://grid-deployment.web.cern.ch/grid-deployment/eis/docs/lcg_util-client.tar.gz
                       # In this case the script doesn't need a tarball from the grid;
                       # it fetches from the WEB, which requires OUTBOUND connectivity

cd $VO_DTEAM_SW_DIR    # software installation root directory
mkdir lcg_utils-4.5
cd lcg_utils-4.5
echo "running the command : tar xzvf $TAR_LOC/lcg_util-client.tar.gz"
tar xzvf $TAR_LOC/lcg_util-client.tar.gz
status=$?              # save tar's exit status before the test overwrites $?
if [ $status -ne 0 ]; then   # failure?
  exit $status
fi

It suggests a simple model: we have a web-visible directory containing all the tar balls needed to build any version we require of our software, with an additional tar ball of installation scripts. Then the script we submit to the GRID simply identifies the task to be done, e.g. a particular build to install at some CE. Once on that CE it uses wget to retrieve the installation tools and launches the main installer tool. This in turn, in the case of an installation, consults some configuration table to determine what tar balls are required, loads them using wget and then installs.
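A minimal sketch of the submitted script under this model. The tools tar ball name install_tools.tar.gz, the entry point install_tools/installer.sh and both function names are assumptions made for the example. As a convenience not described above, a repository given as a local path is copied rather than fetched, so the same script can be tried without outbound connectivity.

```shell
#!/bin/bash
# fetch <repository> <file>: bring one file into the current directory.
# http(s) repositories go through wget; anything else is treated as an
# absolute local path and copied (an addition for local testing).
fetch() {
    local repo=$1 name=$2
    case $repo in
        http://*|https://*) wget -q "$repo/$name" ;;
        *)                  cp "$repo/$name" . ;;
    esac
}

# bootstrap <repository> <task> <tag>
# Retrieves the installation tools into a scratch directory and hands the
# task (e.g. install) and the tag over to the main installer.
bootstrap() {
    local repo=$1 task=$2 tag=$3
    local work status
    work=$(mktemp -d) || return 1
    (
        cd "$work" || exit 1
        fetch "$repo" install_tools.tar.gz || exit 1
        tar xzf install_tools.tar.gz || exit 1
        bash install_tools/installer.sh "$task" "$tag"
    )
    status=$?          # exit status of the installer, or of a failed step
    rm -rf "$work"     # the scratch area is not needed afterwards
    return $status
}
```

The submitted job then reduces to a single call such as `bootstrap <repository URL> install <tag>`, with everything else living in the tools tar ball.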

Other Applications

Although the current context is the GRID, the tools could have application whenever we want to run production jobs but not development (as that requires CVS). The obvious example is a farm. There is no reason why the same tools cannot be used here too, making a farm manager's life a little easier.

The Deployment Tools

Tar ball Preparation

Third party libraries are available as source tar balls but we need tools to create:-

Installation

There are 3 deployment tasks. If we adopt the model that each build is independent, then the last of these is trivial and simply involves the removal of an entire directory tree. To deal with the first two tasks we need an installer system whose components include Library Installer Scripts, an Installation Driver and a Build Configuration Table.

By employing a standard naming convention for Library Installer Scripts, the Installation Driver can be largely data-driven by the Build Configuration Table. It can use a simple optimisation: when asked to install, for each library it first asks the Library Installer Script to validate, and only if that fails does it then request first an installation and then a validation. In this way, if the build process fails at some point then, after fixing the appropriate tar ball, the process can run again and will only build libraries as required.
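The validate-first loop could be sketched as follows. The table format (one "tag library" pair per line, in build order), the script naming convention install_<library>.sh and the validate/install sub-commands are all assumptions for illustration, not a settled design.

```shell
#!/bin/bash
# install_build <table> <scripts_dir> <tag>
# For every library listed for <tag> in the table: validate first and
# install only when validation fails, then validate the new installation.
install_build() {
    local table=$1 scripts=$2 tag=$3
    local t lib script
    # The table is read on descriptor 3 so that the installer scripts
    # keep their normal standard input.
    while read -r t lib <&3; do
        case $t in ''|'#'*) continue ;; esac    # skip blanks and comments
        [ "$t" = "$tag" ] || continue           # entry for another tag
        script="$scripts/install_$lib.sh"
        if bash "$script" validate "$tag"; then
            echo "$lib: already valid, skipping"
            continue
        fi
        bash "$script" install "$tag" && bash "$script" validate "$tag" || {
            echo "$lib: build failed" >&2
            return 1
        }
        echo "$lib: installed"
    done 3< "$table"
}
```

Running it a second time after a failure repeats only the cheap validations for libraries that are already in place, and resumes building at the library that failed.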
n west (APC)
Last modified: Wed Mar 15 13:07:12 GMT 2006