RSD: Remote Software Deployment

- a system for the automated installation of MINOS offline software

Last modified: Thu Apr 12 14:20:52 BST 2007

Nick West


Overview

RSD is a system to simplify installation of the MINOS offline code on remote machines. It is specifically aimed at the GRID environment, but it could be used in any situation where only a frozen version of the code needs to be installed (i.e. no CVS updating), for example on farms.

RSD comes complete with a simple tool to help take the tedium out of submitting jobs to the GRID.

The diagram below illustrates the key features.

ASSEMBLE

The assemble command is used to prepare component source tar files, including one for RSD itself, and to place them in a web-visible directory. They don't have to be assembled in any particular order, or indeed from any particular site. However, the RSD tools have to be installed, and to prepare the minossoft tar the site must also have minossoft installed.

LAUNCH

The launch command has to be issued at a site which is able to submit jobs that will run where the software is to be installed. For GRID work this means that the site has to be a User Interface. The RSD tools need to be installed, but it isn't necessary to have minossoft installed.

To install on a remote site, the user specifies the target machine and the software application to be installed. RSD assembles a script containing this information together with the URL of the web directory, and submits it.

INSTALL

When the launch script runs on the remote machine it uses the web URL to download a copy of RSD and then runs RSD with the install command. This looks up the requested software application in a configuration table and from this deduces the component libraries to be installed. For each of these libraries in turn it downloads the library from the web and then looks up and runs the appropriate installation script.

RSD supports recursive installation, that is to say an application's list of components can include other supporting applications that are to be installed, if necessary, before installing its own component libraries.
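
The recursive expansion can be pictured with a short shell sketch. This is not RSD's own code: components_of is a hypothetical stand-in for a lookup in the configuration table (whose real format is not described here), and demo_app, demo_support and demo_core are invented names. It only prints the order in which libraries would be installed, with supporting applications expanded first.

```sh
#!/bin/sh
# Hypothetical stand-in for the configuration table: print the components of
# an application, one per line, as "<kind> <id>" where kind is "app" or "lib".
components_of() {
  case "$1" in
    demo_app:1.0)     printf '%s\n' "app demo_support:2.0" "lib demo_core:1.0" ;;
    demo_support:2.0) printf '%s\n' "lib libsigc++:1.2.5" ;;
  esac
}

# Print library IDs in installation order. The body is a subshell, so each
# recursion level gets its own copies of the loop variables.
install_order() (
  # First pass: supporting applications, expanded recursively.
  components_of "$1" | while read -r kind id; do
    if [ "$kind" = app ]; then install_order "$id"; fi
  done
  # Second pass: this application's own component libraries.
  components_of "$1" | while read -r kind id; do
    if [ "$kind" = lib ]; then printf '%s\n' "$id"; fi
  done
)

install_order demo_app:1.0
```

Running it prints libsigc++:1.2.5 and then demo_core:1.0. The sketch does not skip supporting applications that are already installed.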

If the process completes successfully then the application is considered to be installed. If the installation of any library or supporting application fails, installation terminates. In either case all the installation logs are returned to the launch site for examination. After fixing any problems the installation job can be run again; because RSD attempts a validation before installing each library, any libraries that were successfully installed before will not be reinstalled.

If running in a GRID environment, RSD uses lcg-ManageVOTag to maintain software tags.

REMOVE

The launch command can also be used to submit a remove command to remove an application that is no longer needed at a target site. Its supporting applications are not removed as they may be required by other applications; to remove them the launch command must specify them directly.


User Manual

Syntax

  {global options} command {command options} {command args}


Global Options

  --debug=<level>       Set debug to required level (1, 2 or 3) [Default:0]
  --download_method={cp|wget}
                        Select tar file download method 
                          wget [Default]
                          cp  (useful when running RSD locally without web visible dirs)
  --http_proxy=<proxy>  To select a non-standard proxy for wget when installing
  --log_file=<file>     Write log output to this file. [Default:/dev/stdout]       
  --ticket_dir=<dir>    Specify an alternative ticket directory
  --upload_method={cp|scp}
                        Select assembled tar file upload method 
                          scp [Default]
                          cp  (useful when running RSD locally without web visible dirs)
  --web_url=<url>       Specify an alternative web URL.
  --work_dir=<dir>      Write temporary data to this directory. [Default:./]
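
As a rough illustration of what the --download_method, --http_proxy and --upload_method choices amount to, here is a shell sketch. The function names and argument order are hypothetical and do not describe RSD's internals; only the wget, cp and scp invocations themselves are standard. wget takes its proxy from the http_proxy environment variable.

```sh
#!/bin/sh
# fetch_tar <wget|cp> <source URL or path> <destination> [proxy]
fetch_tar() {
  method=$1 src=$2 dest=$3 proxy=$4
  case "$method" in
    wget)
      if [ -n "$proxy" ]; then
        # The prefix assignment applies to this wget invocation only.
        http_proxy=$proxy wget -q -O "$dest" "$src"
      else
        wget -q -O "$dest" "$src"
      fi ;;
    cp) cp "$src" "$dest" ;;
    *)  echo "unknown download method: $method" >&2; return 2 ;;
  esac
}

# upload_tar <scp|cp> <tar file> <web directory> [user@host]
upload_tar() {
  method=$1 tarfile=$2 web_dir=$3 account=$4
  case "$method" in
    scp) scp "$tarfile" "$account:$web_dir/" ;;
    cp)  cp "$tarfile" "$web_dir/" ;;
    *)   echo "unknown upload method: $method" >&2; return 2 ;;
  esac
}
```

The cp branches correspond to running RSD locally without web-visible directories, where the "web" directory is simply a local path.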


Commands


assemble -- Assemble a tarball

assemble <library>:<version>
    Options:-
      --web_account=user@host    (not used if  --upload_method=cp)
      --web_dir=directory
      --use_local                 ROOT only: Use current ROOTSYS
The assemble command is used to build a tar file (normally of source code) for a selected version of a library and upload it to the web-visible directory. By default uploading uses the scp command; the --web_account option controls the remote account and host used, and --web_dir the directory.

In the case of ROOT, the option --use_local can be used to create a tar from the currently defined ROOTSYS, which is useful when an "off-tag" version is required.

There is a Naming Convention for libraries and versions (see Naming Conventions), and the configuration file

  build_config_table.dat
lists the libraries and supporting applications that are required for each application.


help -- Display brief help

This basically does what you would expect.


install -- Install a complete set of libraries

  install  <top-dir> <application>:<version>
    Options:-
      --force               Force reloading of tar file and skip pre-install validating
      --install_log=<file>  Collect all the installation logs and return a single .tar.gz in <file>
      --validate_only       Don't install, just validate.
The install command steers the installation of the specified version of the selected application into the specified top-level directory. This directory must be given in absolute, NOT relative, form because RSD doesn't normally run in the same working directory as its user. For a list of applications RSD can install, see the configuration file:-
  build_config_table.dat

By default, before installing each library, RSD runs a validation check and skips installation if it passes. This saves time if attempting to complete an installation that had earlier failed part way through. The --force option can be used to force all libraries to reinstall from scratch. The option is not propagated through to any supporting application.
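
The validate-then-install loop described above can be sketched as follows. validate_lib and install_lib are hypothetical hooks standing in for running a library's install script in "validating" or "installing" mode (see RSD_MODE under Install-Specific Environmental Variables); the demo hooks simply pretend that libsigc++:1.2.5 is already installed and that every install succeeds.

```sh
#!/bin/sh
# install_libraries <force:yes|no> <library_id>...
# Validate each library first and skip it if it passes (unless forced);
# stop at the first library that fails to install.
install_libraries() {
  force=$1; shift
  for lib in "$@"; do
    if [ "$force" != yes ] && validate_lib "$lib"; then
      echo "skip $lib"
      continue
    fi
    if ! install_lib "$lib"; then
      echo "FAILED $lib"
      return 1
    fi
    echo "installed $lib"
  done
}

# Demo hooks (hypothetical): only libsigc++:1.2.5 validates; installs succeed.
validate_lib() { [ "$1" = "libsigc++:1.2.5" ]; }
install_lib()  { return 0; }
```

Passing yes as the first argument plays the role of --force: validation is skipped and every library is reinstalled.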

Normally you don't run the install command directly. Instead you run the launch command, which runs the install command on a remote machine and supplies the --install_log option so that all the individual library log files are returned as a single gzipped tar file.
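
Collecting the per-library logs into one archive could look like the sketch below. It assumes the logs live in install/logs/ under the application's top directory, as described in Directory Structure; collect_logs is a hypothetical name, and RSD's own implementation may differ.

```sh
#!/bin/sh
# collect_logs <top-dir> <output .tar.gz>
# Pack every per-library log below <top-dir>/install/logs into a single
# gzipped tar file; paths inside the archive start with "logs/".
collect_logs() {
  top_dir=$1 out=$2
  tar czf "$out" -C "$top_dir/install" logs
}
```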

The --validate_only option is useful if you just want to check that an application is O.K., but don't want to install, even if it isn't. This option is propagated through to any supporting application.


job -- Commands to manage job submission and output retrieval

Strictly, the job command isn't part of RSD at all, but internally RSD needed a method to submit jobs to the GRID, and the services it requires have been extended and made available as a command. For more details see Commands to Manage GRID Job Submission and Output Retrieval.


launch -- Launch job to install (or remove) a complete set of libraries

  launch <target>  <application>:<version>
    <target> is one of:-
      self:<sw_dir>      Install locally using sw_dir as top level directory
                         e.g. self:/data/minos/minos2/west/rsd_tests
      lcg:<remote-site>  Install on LCG remote site
                         e.g. lcg:lcgce01.gridpp.rl.ac.uk:2119/jobmanager-lcgpbs-minosL
    Options:-
      --install_global_option=<install option>     Pass global option to installer
                                                   e.g.  --install_global_option=--debug=1
      --install_command_option=<install option>    Pass command option to installer
                                                   e.g.  --install_command_option=--force
      --remove                                     Run remove job rather than install job
The launch command is used to generate a job to run the install (or remove) command on a target machine. The <target> argument determines the location of the target machine. For testing purposes it can designate a self target, specifying the top-level directory under which to install. Currently the only other type of target is an LCG GRID host, in which case remote-site must correspond to the GlueCEUniqueID of the machine.
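
A minimal shell sketch (not RSD's Perl driver) of how a <target> string splits into its type and value. Only the first colon is significant, since LCG CE identifiers contain colons themselves. The output format is invented, and applying the install command's absolute-path rule to self targets is an assumption.

```sh
#!/bin/sh
# parse_target <target>  ->  prints "<kind> <value>", fails on bad targets
parse_target() {
  kind=${1%%:*}    # text before the first colon
  value=${1#*:}    # text after the first colon (may itself contain colons)
  case "$kind" in
    self)
      # Assumed: the self top-level directory must be absolute, like <top-dir>.
      case "$value" in
        /*) echo "self $value" ;;
        *)  echo "self target needs an absolute path: $value" >&2; return 1 ;;
      esac ;;
    lcg) echo "lcg $value" ;;
    *)   echo "unknown target type: $kind" >&2; return 1 ;;
  esac
}
```

For example, parse_target lcg:lcgce01.gridpp.rl.ac.uk:2119/jobmanager-lcgpbs-minosL keeps the whole CE identifier, colon and port included, as the value.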

The --install_global_option and --install_command_option can be used to add global and command arguments respectively to the install command.

The --remove option is used to completely remove all the constituent libraries of an application. There are no safety checks: you have been warned!! Supporting applications are not removed.


remove -- Deinstall a complete set of libraries

  remove  <top-dir> <application>:<version>
    Options:-
      --install_log=<file>  Collect all the deinstallation logs and return a single .tar.gz in <file>
De-install a complete set of libraries on the target site below <top-dir>. As with install, you don't normally issue this command directly but rather indirectly, when you use the launch --remove command. Supporting applications are not removed.


test -- Run component test

You should leave this alone!

This command is used to run tests on RSD itself. Currently there is only one valid command argument:-

   test replace_sw_tag


Testing

The configuration file
  build_config_table.dat
includes a trivial application, test_rsd, that is used for testing RSD itself.


Maintenance

Installing

  1. Check out the code.
  2. Define the environment variable RSD_HOME to point to the top level directory.

    It is also convenient to define an rsd alias (csh) or function (sh/bash) that runs:-

      perl -w $RSD_HOME/driver/rsd.pm
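
For example, in sh/bash the definitions might look like this; the checkout location $HOME/rsd is only an assumption. csh users would instead use setenv RSD_HOME <dir> and alias rsd 'perl -w $RSD_HOME/driver/rsd.pm'.

```sh
# Hypothetical checkout location: adjust to where the code was checked out.
RSD_HOME=$HOME/rsd
export RSD_HOME

# Function equivalent of the csh alias; "$@" passes all arguments through.
rsd() {
  perl -w "$RSD_HOME/driver/rsd.pm" "$@"
}
```

After this, rsd help runs the help command.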
    


Extending

Adding New Libraries

Adding a new library XXX requires the creation of two new scripts:-
  $RSD_HOME/libs_and_builds/assemble_XXX.sh
  $RSD_HOME/libs_and_builds/install_XXX.sh 
These scripts have to handle all the versions of library XXX. For examples, look at the scripts in
  $RSD_HOME/libs_and_builds
and for more information on writing these scripts see the Library Script API.


Adding New Applications and Builds

To add a new application or build, update:-
  $RSD_HOME/libs_and_builds/build_config_table.dat
Then review the individual library assemble and install scripts to ensure that they can handle any new versions.
  $RSD_HOME/libs_and_builds/assemble_XXX.sh 
  $RSD_HOME/libs_and_builds/install_XXX.sh  


Implementation Notes

Naming Conventions

The core concepts are libraries and applications, and both are qualified by a version. A library or application name is often combined with its version to form a Library ID or Application ID. In the RSD API they are combined using a colon, e.g. libsigc++:1.2.5 or minossoft:R.1.20-build_1 (the colon was chosen to ensure easy parsing), but for file names the separator is normally the more conventional minus sign, e.g. libsigc++-1.2.5. There is an exception: for some reason the dcap tar file is formed using an underscore, e.g. dcap_v2_36_f0506_Linux+2.4, so RSD has to be programmed to know about exceptions like this.
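
The colon-to-minus mapping can be sketched as a small shell function. The name id_to_file_stem is hypothetical and this is not RSD's own code. Only the first colon is significant, and exceptions such as the dcap tar file are deliberately not handled.

```sh
#!/bin/sh
# id_to_file_stem <name>:<version>  ->  <name>-<version>
id_to_file_stem() {
  name=${1%%:*}     # everything before the first colon
  version=${1#*:}   # everything after the first colon
  printf '%s-%s\n' "$name" "$version"
}
```

For example, id_to_file_stem minossoft:R.1.20-build_1 prints minossoft-R.1.20-build_1.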


Directory Structure

RSD uses two directory trees, which are organised as follows:-

  Environment   Directory                    Notes
  variable
  
  HOME         .../                          Holds global log files
                                               install(or remove)_<application_id>_rsd.log      (RSD output)
                                               install(or remove)_<application_id>_rsd.log.err  (stderr - LCG only)
                                               install_<application_id>_install_logs.tar.gz (individual install logs)
  
  
  RSD_SW_DIR  .../
                 application name/               i.e. minossoft/
  RSD_TOP_DIR       application version/         e.g. R.1.20-build_1/ Holds: installed_libraries
  SRT_DIST             internal/             
  INSTALLATION         external/             External libraries and the field map files (bfieldmap/)
                       install/
                          logs/              Holds: <library_id>.log,  <library_id>_wget.log
                          scripts/           Holds: <library_id>.sh,  <library_id>.status
                          tars/              Holds: <library_id>.tar.gz
  RSD_HOME          rsd_home/                The RSD code
                      driver/                Generic (independent of supported software specifics) driver
                      doc/                   Documentation
                      libs_and_builds/       Software specific scripts to configure and build
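
Putting the naming convention and the installation tree together, the per-library paths could be derived as in the sketch below. lib_paths is a hypothetical helper, and it assumes the minus-separated form is used for <library_id> in these file names. On the GRID the software root would be $VO_MINOS_SW_DIR (see GRID Site Specifics).

```sh
#!/bin/sh
# lib_paths <sw_dir> <application_id> <library_id>
# Print the log, status and tar paths of one library in the installation tree.
lib_paths() {
  sw_dir=$1 app_id=$2 lib_id=$3
  top_dir=$sw_dir/${app_id%%:*}/${app_id#*:}   # i.e. RSD_TOP_DIR
  stem=${lib_id%%:*}-${lib_id#*:}              # assumed file-name form of the ID
  echo "$top_dir/install/logs/$stem.log"
  echo "$top_dir/install/scripts/$stem.status"
  echo "$top_dir/install/tars/$stem.tar.gz"
}
```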
  

Notes

For completeness, as this section includes file naming conventions, the script RSD generates when launching a job to run RSD on a remote machine is named:-
  launch_[install|remove]_<application_id>.sh


Global Driver Variables

Global driver variables are defined in
  $RSD_HOME/driver/initialise_globals.pm


Library Script API

Each time an assemble or install script is invoked, the following environment variables will be defined:-

  Variable                  Meaning
  RSD_DEBUG                 Debug flag. Set to 1 for debug
  RSD_LIBRARY_VERSION       Library version
  RSD_LIB_STATUS_FILE       File used to communicate a one-line exit message from the script
  RSD_LOG_FILE              RSD (top level) log file
  RSD_PRINT                 Print utility used to print to RSD_LOG_FILE in order to
                            maintain a consistent format with RSD driver entries
  RSD_RETURN                Utility to be sourced to return status. It takes 2 args:-
                              1) Return code (0 = success)
                              2) Message
                            Example: . $RSD_RETURN 1 "Failed to build library"
  RSD_RUN                   Utility to be sourced to execute a single script line.
                            It checks the result of execution and quits if it fails.
                            Example: . $RSD_RUN tar xzf $RSD_TAR_FILE
  RSD_SITE_NAME             Generic site name. One of:-
                              "ral_ce"
                              "ral_ui"
                              "oxford"
                              "laptop_nick"
  RSD_TAR_FILE              Tar file
  RSD_WORK_DIR              Empty work directory for use if required

To provide good diagnostics, all scripts are expected to:-

These responsibilities can make writing scripts rather tedious so helper scripts have been provided.

Just look at

  $RSD_HOME/libs_and_builds
for examples using these helper scripts.


Assemble-Specific Environmental Variables

Each time an assemble script is invoked, the following additional environment variables will be defined:-

  Variable                  Meaning
  RSD_USE_LOCAL             =1 if --use_local specified, =0 otherwise.
                            Used when assembling ROOT to take ROOTSYS rather than CVS.


Install-Specific Environmental Variables

Each time an install script is invoked, the following additional environment variables will be defined:-

  Variable                  Meaning
  HOME                      Login directory
  RSD_HOME                  Top level directory for RSD tools
  RSD_LIB_INSTALL_LOG_FILE  Library install log (create if installing, check if validating)
  RSD_MODE                  One of "installing" or "validating"
  RSD_TOP_DIR               The top level directory to install under.
                            It will be of the form: ../<application-name>/<application-version>/


GRID Site Specifics

RSD assumes as little as possible about remote GRID sites and tries to use only resources (files and environment variables) that constitute the standard GRID setup. This section lists the resources it needs. It also lists the Computing Elements it is currently working on and the mapping it uses to get from the WN (Worker Node) host name to the RSD_SITE_NAME, which is used, if necessary, to hardwire information where the standard setup is inadequate.

  GRID Resource             Use
  $VO_MINOS_SW_DIR          Top-level directory for software installation
                            (see Directory Structure)
  $HOME/.BrokerInfo         Used to get the CE name
  $EDG_WL_LOG_DESTINATION   Used to get the CE name if unable to find .BrokerInfo

  Site                      WN names                  RSD_SITE_NAME
  RAL Tier 1                *.rl.ac.uk                ral_tier1
  RAL Tier 2                *.pp.rl.ac.uk             ral_tier2

For the WN name mapping to RSD_SITE_NAME see set_site.pm
For site-specific hardwired settings see set_local_config.pm
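
The WN-name mapping in the table can be sketched with a shell case statement; the real mapping lives in set_site.pm, so this is only an illustration. Pattern order matters: *.pp.rl.ac.uk must be tested before *.rl.ac.uk, otherwise Tier 2 nodes would be classified as Tier 1. Returning failure for unknown hosts is an assumption.

```sh
#!/bin/sh
# site_name_for_host <worker-node host name>  ->  RSD_SITE_NAME
site_name_for_host() {
  case "$1" in
    *.pp.rl.ac.uk) echo ral_tier2 ;;   # more specific pattern first
    *.rl.ac.uk)    echo ral_tier1 ;;
    *)             return 1 ;;         # assumed: unknown hosts are an error
  esac
}
```

Note that lcgce01.gridpp.rl.ac.uk maps to ral_tier1, because "gridpp" does not end in ".pp".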


Commands to Manage GRID Job Submission and Output Retrieval


Introduction

Job submission and subsequent output retrieval is a fairly tedious affair, as is demonstrated by the introductory guide Submitting Jobs to the Grid.

Quite apart from learning JDL in order to create a job script, you have to create and keep track of a temporary file containing the job ID. After submission this file is used to check on the job status and finally to retrieve the output to a temporary directory, after which you still have to move the files to the directory of your choosing. The same temporary file can be used for multiple jobs, at the price of having to specify the entry you require each time you want access. When submitting multiple jobs you have the unsatisfactory choice of either using the same file for them all and then having to pick the entry you want, or creating multiple files and remembering all their names.

To simplify this, the RSD job command works as follows.

  1. When you use the submit command to submit a job, RSD creates a ticket file in a special directory set aside for tickets. This directory is site specific; to see what the local one is, just run RSD without any arguments and look at the first few lines of output. Having everyone use this directory isn't very satisfactory, and tickets can only be read by the user who wrote them, so you are recommended to create your own subdirectory, using your login user name as the directory name. If RSD finds such a directory for you it will use it instead. Alternatively you can use the Global Option
      --ticket_dir=<dir>    Specify an alternative ticket directory
    
    and in this way create separate "pools" of tickets for each type of job you submit.

    RSD creates a ticket file whose name by default is based on your JDL file (although you can select some other name), appending a serial number if necessary to ensure it is unique. The ticket file serves as the temporary file used by the edg-job commands, but also records additional information, for example when the job was submitted and where the output should finally be delivered.

    If the job requirements are relatively simple, you can supply a .sh or .csh script and have RSD create a temporary file containing the JDL for you.

  2. Once a job is submitted, you can check on active tickets using the list command with a ticket name, which can be wild-carded and defaults to "*", so that every ticket is listed.

  3. When a job has finished the retrieve command is used to retrieve the output. Again the selected ticket name can be wild-carded and even "*" is safe; RSD will only attempt to retrieve suitable tickets i.e. ones that have finished and have output ready to return. RSD returns the output to a directory of your choice with a default based on the ticket name. Again a serial number is added to the directory to ensure its uniqueness.

  4. After retrieval the ticket remains active until it is removed using the dismiss command which records information about the ticket in a ticket_history file so that you can go back and look to see what jobs you ran in the past, and where their output was returned to.
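
The "base name plus serial number" rule used for both ticket files and retrieval directories can be sketched as follows. unique_name is a hypothetical helper, and the _<n> suffix format is an assumption rather than necessarily the one RSD uses.

```sh
#!/bin/sh
# unique_name <directory> <JDL or script file>
# Start from the file name without directory or extension; if that name is
# already taken in <directory>, append _1, _2, ... until it is unique.
unique_name() {
  dir=$1
  base=$(basename "$2")
  base=${base%.*}
  name=$base
  n=1
  while [ -e "$dir/$name" ]; do
    name=${base}_$n
    n=$((n + 1))
  done
  echo "$name"
}
```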


The subcommand syntax

    job {command options} <subcommand> {<subcommand-args>}
    Options:-
      --output_dir=<dir>
      --verbose
      --arguments=<space-separated-list>
      --input_sandbox=<space-separated-input-file-list>
      --output_sandbox=<space-separated-output-file-list>
      --separate_stderror
The options are only relevant to some subcommands and are ignored otherwise.


The submit subcommand

    job submit   <target> <file.jdl> or <file.sh> or <file.csh> {<ticket-name>}
    <target> is one of:-
      lcg:<remote-site>  Submit to LCG remote site (<remote-site> may be empty, see below)
    <file.jdl>          Is a Job Description Language file
    <file.sh>           Is a bourne/bash script (JDL file created automatically)
    <file.csh>          Is a C shell script (JDL file created automatically)
Use the submit command to run a specified JDL file on a remote site. If a bourne/bash/C shell script is supplied a temporary JDL file is created. If a ticket name isn't supplied, one based on the JDL/script file is used, but in either case it is made unique, by appending a serial number if required.

If the --output_dir option is given it will become the default final output parent directory.

The --arguments, --input_sandbox, --output_sandbox and --separate_stderror options are only used when constructing a JDL file with which to submit a supplied Bourne/bash/C shell script. Even then they are all optional. All but --separate_stderror can take a space-separated list of values, but if there is more than one value the list must be enclosed in quotes. For example:-

  --arguments=abc
  --input_sandbox="inputfile1 inputfile2"
  --output_sandbox="outputfile1 outputfile2 outputfile3"
The --arguments option is used to define the JDL Arguments attribute.

The --input_sandbox option is used to extend the JDL InputSandbox attribute which automatically includes the supplied script file.

The --output_sandbox option is used to extend the JDL OutputSandbox attribute which automatically includes the stdout and, if required, stderr files.

The --separate_stderror option is used to direct error output to a separate file, by default it is merged in with the standard output.

When creating a JDL file, execution can be constrained to a specific Computing Element by selecting it in the <target>. For example:-

  lcg:heplnx201.pp.rl.ac.uk
would submit the job to the RAL Tier 2 (in fact even heplnx201 would work as it just checks that the site GlueCEUniqueID contains this string) while
  lcg: 
would place no constraints on where the job should run.
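
The JDL that submitting a .sh or .csh script needs can be sketched as below. The attribute names (Executable, Arguments, StdOutput, StdError, InputSandbox, OutputSandbox, Requirements) and the RegExp(..., other.GlueCEUniqueID) requirement are standard EDG/LCG JDL. However, make_jdl, its argument order and the job.out/job.err file names are assumptions, and RSD's generated JDL may well differ. Merging stderr by giving StdError the same file as StdOutput is also an assumption.

```sh
#!/bin/sh
# quote_list word...  ->  "word1","word2",...   (the body of a JDL list)
quote_list() {
  out=
  for w in "$@"; do
    out="$out${out:+,}\"$w\""
  done
  printf '%s' "$out"
}

# make_jdl <script> <arguments> <extra inputs> <extra outputs> <yes|no> <ce-substring>
# Print a JDL description for running <script>. Extra input/output lists are
# space separated; stderr gets its own file only if the 5th argument is yes.
make_jdl() {
  script=$1 args=$2 inputs=$3 outputs=$4 sep=$5 ce=$6
  echo "Executable = \"$(basename "$script")\";"
  if [ -n "$args" ]; then echo "Arguments = \"$args\";"; fi
  echo "StdOutput = \"job.out\";"
  if [ "$sep" = yes ]; then
    echo "StdError = \"job.err\";"
    std_files="job.out job.err"
  else
    echo "StdError = \"job.out\";"   # assumed way of merging stderr into stdout
    std_files="job.out"
  fi
  # $inputs, $outputs and $std_files are unquoted on purpose so that they
  # split on spaces into separate list items.
  echo "InputSandbox = {$(quote_list "$script" $inputs)};"
  echo "OutputSandbox = {$(quote_list $std_files $outputs)};"
  # Constrain the CE only when a (sub)string of its GlueCEUniqueID was given.
  if [ -n "$ce" ]; then
    echo "Requirements = RegExp(\"$ce\", other.GlueCEUniqueID);"
  fi
}
```

With an empty CE string, as for a bare lcg: target, no Requirements line is produced, so the job may run anywhere.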


The list subcommand

    job list     {<ticket-name>}
Use the list command to list active tickets. The ticket-name can contain one or more wild-card "*" characters. If omitted, the ticket-name defaults to "*".

Use the --verbose option to give details of each ticket.
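
Wild-carded ticket selection maps naturally onto shell globbing, as in the sketch below. It assumes one file per ticket in the ticket directory; list_tickets is a hypothetical name, and the --verbose details are not reproduced.

```sh
#!/bin/sh
# list_tickets <ticket-dir> [pattern]
# Print the names of ticket files matching a pattern that may contain "*";
# with no pattern every ticket is listed.
list_tickets() {
  dir=$1
  pattern=${2:-*}
  # $pattern is unquoted so that the shell expands the wild cards.
  for t in "$dir"/$pattern; do
    if [ -f "$t" ]; then basename "$t"; fi
  done
}
```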


The retrieve subcommand

    job retrieve <ticket-name>
Use the retrieve command on jobs that have completed and have output ready to return. Use the --output_dir option to select the final output parent directory, overriding the default supplied when the job was submitted. RSD will create a unique directory, based on the ticket name, in the parent directory and move all output files to it.

The ticket-name can contain one or more wild-card "*" characters; any ticket that hasn't got output ready to move will be ignored with a warning.


The dismiss subcommand

    job dismiss  <ticket-name> 
After job output retrieval, the ticket remains active until it is dismissed, at which time the contents of the ticket file are entered into the
  ticket_history.txt
file and the ticket file is deleted.

The ticket-name can contain one or more wild-card "*" characters; any ticket that isn't ready to be dismissed will be ignored with a warning.
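
The dismiss step can be sketched as "append to history, then delete", skipping tickets that are not ready. dismiss_tickets and is_ready_to_dismiss are hypothetical, and the readiness test relies on an invented ticket format (a line status=retrieved); the real ticket contents are not described here. The history file is passed separately so that a "*" pattern cannot match it.

```sh
#!/bin/sh
# dismiss_tickets <ticket-dir> <history-file> <pattern>
# Append each ready ticket's contents to the history file, then delete it;
# tickets that are not ready are skipped with a warning.
dismiss_tickets() {
  dir=$1 hist_file=$2 pattern=$3
  for t in "$dir"/$pattern; do
    [ -f "$t" ] || continue
    if ! is_ready_to_dismiss "$t"; then
      echo "warning: $(basename "$t") is not ready to be dismissed" >&2
      continue
    fi
    cat "$t" >> "$hist_file" && rm -f "$t"
  done
}

# Hypothetical readiness test based on an invented ticket format.
is_ready_to_dismiss() {
  grep -qx 'status=retrieved' "$1"
}
```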

