Last modified: Thu Apr 12 14:20:52 BST 2007
Nick West
RSD comes complete with a simple tool to help take the tedium out of running jobs on the GRID.
The diagram below illustrates the key features.
To install on some remote site the user specifies the target machine and the software application to be installed. RSD assembles a script with this information together with the URL of the web directory and submits it.
RSD supports recursive installation, that is to say an application's list of components can include other supporting applications that are to be installed, if necessary, before installing its own component libraries.
If the process completes successfully then the application is considered to be installed. If the installation of any library or supporting application fails, installation terminates. In either case all the installation logs are returned to the launch site for examination. After fixing any problems the installation job can be run again and, because RSD attempts a validation before installing each library, any libraries that were successfully installed before will not be reinstalled.
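The retry behaviour described above can be sketched in a few lines of shell. This is only an illustration, not RSD's driver code: the function names `validate_lib`, `install_lib` and `install_all`, the `DEMO_DIR` scratch area, the `FAIL_LIBS` variable and the marker files are all hypothetical stand-ins for the real validation and installation steps.

```shell
#!/bin/sh
# Hypothetical sketch: marker files stand in for a real validation check.
# DEMO_DIR is an assumed scratch area, not an RSD variable.
DEMO_DIR=${DEMO_DIR:-/tmp/rsd_demo.$$}

# "Validation" just checks for a marker left by an earlier successful install.
validate_lib() {
    [ -f "$DEMO_DIR/$1.installed" ]
}

# "Installation" fails for any library named in FAIL_LIBS (simulated error).
install_lib() {
    case " $FAIL_LIBS " in
        *" $1 "*) return 1 ;;
    esac
    mkdir -p "$DEMO_DIR" && touch "$DEMO_DIR/$1.installed"
}

# Install libraries in order: skip those that validate, stop at first failure.
install_all() {
    for lib in "$@"; do
        if validate_lib "$lib"; then
            echo "skip $lib"
            continue
        fi
        if install_lib "$lib"; then
            echo "installed $lib"
        else
            echo "failed $lib"
            return 1
        fi
    done
}
```

Running `install_all` a second time after a failure only redoes the libraries that did not install, which is the time saving the text describes.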
If running in a GRID environment, RSD uses lcg-ManageVOTag to maintain software tags.
{global options} command {command options} {command args}
--debug=<level> Set debug to required level (1, 2 or 3) [Default:0]
--download_method={cp|wget}
Select tar file download method
wget [Default]
cp (useful when running RSD locally without web visible dirs)
--http_proxy=<proxy> To select a non-standard proxy for wget when installing
--log_file=<file> Write log output to this file. [Default:/dev/stdout]
--ticket_dir=<dir> Specify an alternative ticket directory
--upload_method={cp|scp}
Select assembled tar file upload method
scp [Default]
cp (useful when running RSD locally without web visible dirs)
--web_url=<url> Specify an alternative web URL.
--work_dir=<dir> Write temporary data to this directory. [Default:./]
assemble <library>:<version>
Options:-
--web_account=user@host (not used if --upload_method=cp)
--web_dir=directory
--use_local ROOT only: Use current ROOTSYS
The assemble command is used to build a tar file (normally of source
code) for a selected version of a library and upload it onto the web
visible directory. By default, uploading uses the scp command. The
--web_account option can be used to control the remote account and
host used and --web_dir the directory.
In the case of ROOT, the option --use_local can be used to create a tar file from the currently defined ROOTSYS, which is useful when an "off-tag" version is required.
There is a Naming Convention for libraries and versions, and the configuration file
build_config_table.dat lists the libraries and supporting applications that are required for each application.
install <top-dir> <application>:<version>
Options:-
--force Force reloading of tar file and skip pre-install validating
--install_log=<file> Collect all the installation logs and return a single .tar.gz in <file>
--validate_only Don't install, just validate.
The install command steers the installation of a specified version of a
selected application into the specified top level directory, which must
be given in absolute, NOT relative, form as RSD doesn't normally use
the same working directory as its user. For a list of applications RSD
can install see the configuration file build_config_table.dat.
By default, before installing each library, RSD runs a validation check and skips installation if it passes. This saves time if attempting to complete an installation that had earlier failed part way through. The --force option can be used to force all libraries to reinstall from scratch. The option is not propagated through to any supporting application.
Normally you don't run the install command directly but instead run the launch command, which runs the install command on a remote machine. In that case RSD supplies an --install_log option so that all the individual library log files are returned as a single gzipped tar file.
The --validate_only option is useful if you just want to check that an application is O.K., but don't want to install, even if it isn't. This option is propagated through to any supporting application.
launch <target> <application>:<version>
<target> is one of:-
self:<sw_dir> Install locally using sw_dir as top level directory
e.g. self:/data/minos/minos2/west/rsd_tests
lcg:<remote-site> Install on LCG remote site
e.g. lcg:lcgce01.gridpp.rl.ac.uk:2119/jobmanager-lcgpbs-minosL
Options:-
--install_global_option=<install option> Pass global option to installer
e.g. --install_global_option=--debug=1
--install_command_option=<install option> Pass command option to installer
e.g. --install_command_option=--force
--remove Run remove job rather than install job
The launch command is used to generate a job to run the install (or
remove) command on a target machine. The <target> argument
determines the location of the target machine. For testing purposes it
can designate a self target and specify the top-level directory under
which to install. Currently the only other type of target is an LCG GRID
host, and the remote-site must correspond to the GlueCEUniqueID of the machine.
The --install_global_option and --install_command_option options can be used to add global and command arguments respectively to the install command.
The --remove option is used to completely remove all the constituent libraries of an application. There are no safety checks, you have been warned!! Supporting applications are not removed.
remove <top-dir> <application>:<version>
Options:-
--install_log=<file> Collect all the deinstallation logs and return a single .tar.gz in <file>
De-install a complete set of libraries on the target site
below <top-dir>. As with install,
you don't normally issue this command directly but rather
indirectly when you use the launch --remove
command. Supporting applications are not removed.
This command is used to run tests on RSD itself. Currently, there is only one valid command argument.
test replace_sw_tag
build_config_table.dat includes a trivial application, test_rsd, that is used for testing RSD itself.
It is also convenient to define an rsd alias (csh) or subroutine (sh/bash) as:-
perl -w $RSD_HOME/driver/rsd.pm
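As a minimal sketch of the sh/bash form, the driver invocation above can be wrapped in a function that forwards all of its arguments. It assumes RSD_HOME is already set in the environment. The csh alias is shown only as a comment because it is not sh syntax.

```shell
# sh/bash: define rsd as a function that passes every argument to the driver.
rsd() {
    perl -w "$RSD_HOME/driver/rsd.pm" "$@"
}

# csh/tcsh equivalent (for example in ~/.cshrc):
#   alias rsd 'perl -w $RSD_HOME/driver/rsd.pm'
```

With this in place, a command such as `rsd launch <target> <application>:<version>` runs the driver with those arguments.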
$RSD_HOME/libs_and_builds/assemble_XXX.sh
$RSD_HOME/libs_and_builds/install_XXX.sh
These scripts have to handle all the versions of library XXX. For examples, look at the scripts in $RSD_HOME/libs_and_builds and for more information on writing these scripts see the Library_Script API.
$RSD_HOME/libs_and_builds/build_config_table.dat
Review the individual library assemble and install scripts to ensure that they can handle any new versions.
$RSD_HOME/libs_and_builds/assemble_XXX.sh $RSD_HOME/libs_and_builds/install_XXX.sh
| Environment variable | Directory | Notes |
|---|---|---|
| HOME | .../ | Holds global log files: install(or remove)_<application_id>_rsd.log (RSD output); install(or remove)_<application_id>_rsd.log.err (stderr - LCG only); install_<application_id>_install_logs.tar.gz (individual install logs) |
| RSD_SW_DIR | .../ | |
| | application name/ | i.e. minossoft/ |
| RSD_TOP_DIR | application version/ | e.g. R.1.20-build_1/ Holds: installed_libraries |
| SRT_DIST | internal/ | |
| INSTALLATION | external/ | External libraries and the field map files (bfieldmap/) |
| | install/ | |
| | logs/ | Holds: <library_id>.log, <library_id>_wget.log |
| | scripts/ | Holds: <library_id>.sh, <library_id>.status |
| | tars/ | Holds: <library_id>.tar.gz |
| RSD_HOME | rsd_home/ | The RSD code |
| | driver | Generic (independent of supported software specifics) driver |
| | doc/ | Documentation |
| | libs_and_builds/ | Software specific scripts to configure and build |
launch_[install|remove]_<application_id>.sh
$RSD_HOME/driver/initialise_globals.pm
| Variable | Meaning |
|---|---|
| RSD_DEBUG | Debug flag. Set to 1 for debug |
| RSD_LIBRARY_VERSION | Library version |
| RSD_LIB_STATUS_FILE | File used to communicate one-line exit message from script |
| RSD_LOG_FILE | RSD (top level) log file |
| RSD_PRINT | Print utility used to print to RSD_LOG_FILE in order to maintain a consistent format with RSD driver entries. |
| RSD_RETURN | Utility to be sourced to return status. It takes 2 args:- 1) Return code (0 = success) 2) Message. Example: . $RSD_RETURN 1 "Failed to build library" |
| RSD_RUN | Utility to be sourced to execute a single script line. The utility checks the result of execution and quits if it fails. Example: . $RSD_RUN tar xzf $RSD_TAR_FILE |
| RSD_SITE_NAME | Generic site name. One of:- "ral_ce" "ral_ui" "oxford" "laptop_nick" |
| RSD_TAR_FILE | Tar file |
| RSD_WORK_DIR | Empty work directory for use if required |
To provide good diagnostics, all scripts are expected to:-
Example
$RSD_PRINT "..Checking for msrt..."
Example
. $RSD_RETURN 0 "Ran minossoft tests O.K."
Example
. $RSD_RUN loon -b -q
$RSD_PRINT "..Checking odbc. library exists"
. $RSD_RUN ls -l lib/libodbc*.so
The second example shows a way to check that files exist - it's a good idea to precede these with a RSD_PRINT explaining the purpose.
Just look at $RSD_HOME/libs_and_builds for examples using these helper scripts.
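To show how the three helpers fit together, here is a minimal sketch of an install script body for an imaginary library XXX. It is not taken from the RSD distribution: the function name `install_xxx` and the file lib/libxxx.so are hypothetical, and it assumes the shell is bash, where arguments given after a sourced file name are passed on to that file. The body is wrapped in a subshell function so that an exit inside a sourced helper ends only the function when the sketch is tried outside RSD.

```shell
#!/bin/bash
# Hypothetical install script body for a library "XXX".
# Under RSD the RSD_* variables are supplied by the driver.
install_xxx() (
    # Work in the empty scratch directory provided for the script.
    cd "$RSD_WORK_DIR" || exit 1

    # Say what is about to happen, then run it under RSD_RUN so that
    # a failure stops the script.
    $RSD_PRINT "..Unpacking $RSD_TAR_FILE"
    . $RSD_RUN tar xzf $RSD_TAR_FILE

    # Check that the expected file exists (libxxx.so is an assumed name).
    $RSD_PRINT "..Checking libxxx library exists"
    . $RSD_RUN ls -l lib/libxxx.so

    # Report success with a one-line message.
    . $RSD_RETURN 0 "Installed XXX O.K."
)
```

A real script would also need to handle every supported version (RSD_LIBRARY_VERSION) and both values of RSD_MODE; the scripts in $RSD_HOME/libs_and_builds remain the authoritative examples.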
| Variable | Meaning |
|---|---|
| RSD_USE_LOCAL | =1 if --use_local specified, =0 otherwise. Used when assembling ROOT to take ROOTSYS rather than CVS. |
| Variable | Meaning |
|---|---|
| HOME | Log in directory |
| RSD_HOME | Top level directory for RSD tools |
| RSD_LIB_INSTALL_LOG_FILE | Library install log (create if installing, check if validating) |
| RSD_MODE | One of "installing" or "validating" |
| RSD_TOP_DIR | The top level directory to install under. It will be of the form: ../<application-name>/<application-version>/ |
| GRID Resource | Use |
|---|---|
| $VO_MINOS_SW_DIR | Top-level directory for software installation. See Directory Structure |
| $HOME/.BrokerInfo | Used to get the CE name. |
| $EDG_WL_LOG_DESTINATION | Used to get the CE name if unable to find .BrokerInfo. |
| Site | WN names | RSD_SITE_NAME |
|---|---|---|
| RAL Tier 1 | *.rl.ac.uk | ral_tier1 |
| RAL Tier 2 | *.pp.rl.ac.uk | ral_tier2 |
For the WN name mapping to RSD_SITE_NAME see
set_site.pm
For site-specific hardwired settings see
set_local_config.pm
Quite apart from learning JDL in order to create a job script, you have to create and keep track of a temporary file containing the job ID, which is used after job submission to check on the job status and finally to retrieve output to a temporary directory; you then have to move the files again to the directory of your choosing. When submitting multiple jobs you have the unsatisfactory choice of either using the same file for them all and then having to pick the entry you want each time, or creating multiple files and remembering all their names.
To simplify this, the RSD job command works as follows.
You can use the global option
--ticket_dir=<dir> Specify an alternative ticket directory
and in this way create separate "pools" of tickets for each type of job you submit.
RSD creates a ticket file whose name by default is based on your JDL file (although you can select some other name). If necessary it appends a serial number to ensure it is unique. The ticket file serves as the temporary file to be used by the edg-job commands but also records additional information, for example when the job was submitted and where the output should finally be delivered.
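The idea of making a ticket name unique by appending a serial number can be sketched as below. This is an illustration, not RSD's implementation: the function name `unique_ticket_name`, the ".ticket" file suffix and the "_N" serial format are all assumptions.

```shell
# Hypothetical sketch: choose a ticket name not yet present in the ticket
# directory, appending _1, _2, ... when the base name is already taken.
unique_ticket_name() {
    dir=$1
    base=$(basename "$2")
    base=${base%.*}              # strip the .jdl/.sh/.csh extension
    name=$base
    n=0
    while [ -e "$dir/$name.ticket" ]; do
        n=$((n + 1))
        name="${base}_$n"
    done
    echo "$name"
}
```

For example, `unique_ticket_name "$ticket_dir" myjob.jdl` prints `myjob` when no such ticket exists, and `myjob_1` once a `myjob` ticket is present.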
If the job requirements are relatively simple, you can supply a .sh or .csh script and have RSD create a temporary file containing the JDL for you.
job {command options} <subcommand> {<subcommand-args>}
Options:-
--output_dir=<dir>
--verbose
--arguments=<space-separated-list>
--input_sandbox=<space-separated-input-file-list>
--output_sandbox=<space-separated-output-file-list>
--separate_stderror
The options are only relevant for some subcommands and are ignored
otherwise.
job submit <target> <file.jdl> or <file.sh> or <file.csh> {<ticket-name>}
<target> is one of:-
lcg:remote-site Install on LCG remote site
<file.jdl> Is a Job Description Language file
<file.sh> Is a bourne/bash script (JDL file created automatically)
<file.csh> Is a C shell script (JDL file created automatically)
Use the submit command to run a specified JDL file on a remote site.
If a bourne/bash/C shell script is supplied, a temporary JDL file is
created. If a ticket name isn't supplied, one based on the JDL/script
file is used, but in either case it is made unique by appending a
serial number if required.
If the --output_dir option is given it will become the default final output parent directory.
The --arguments, --input_sandbox, --output_sandbox and --separate_stderror options are only used when constructing a JDL file with which to submit a supplied bourne/bash/C shell script. Even then they are all optional. All but --separate_stderror can take a space separated list of values, but if there is more than one value the list has to be enclosed in quotes. For example:-
--arguments=abc --input_sandbox="inputfile1 inputfile2" --output_sandbox="outputfile1 outputfile2 outputfile3"
The --arguments option is used to define the JDL Arguments attribute.
The --input_sandbox option is used to extend the JDL InputSandbox attribute, which automatically includes the supplied script file.
The --output_sandbox option is used to extend the JDL OutputSandbox attribute, which automatically includes the stdout and, if required, stderr files.
The --separate_stderror option is used to direct error output to a separate file; by default it is merged in with the standard output.
When creating a JDL file, execution can be constrained to a specific Computing Element by selecting it in the <target>. For example:-
lcg:heplnx201.pp.rl.ac.uk
would submit the job to the RAL Tier 2 (in fact even heplnx201 would work as it just checks that the site GlueCEUniqueID contains this string) while
lcg:
would place no constraints on where the job should run.
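The kind of JDL that could be generated from a supplied script is sketched below. The attribute names (Executable, Arguments, StdOutput, StdError, InputSandbox, OutputSandbox, Requirements) are standard JDL, but everything else is an assumption made for illustration: the function name `make_jdl`, the output file name std.out, and the use of a RegExp requirement to express "GlueCEUniqueID contains this string". The JDL that RSD actually writes may differ.

```shell
# Hypothetical sketch: print a simple JDL for a script.
#   $1 = script file, $2 = arguments (may be empty), $3 = CE string (may be empty)
make_jdl() {
    script=$1
    args=$2
    ce=$3
    echo "Executable = \"$(basename "$script")\";"
    if [ -n "$args" ]; then
        echo "Arguments = \"$args\";"
    fi
    # stderr merged with stdout, as in the default described above.
    echo "StdOutput = \"std.out\";"
    echo "StdError = \"std.out\";"
    # The script itself is always shipped in the input sandbox.
    echo "InputSandbox = {\"$script\"};"
    echo "OutputSandbox = {\"std.out\"};"
    # Constrain the Computing Element only when a target string is given.
    if [ -n "$ce" ]; then
        echo "Requirements = RegExp(\"$ce\", other.GlueCEUniqueID);"
    fi
}
```

Calling `make_jdl run.sh "abc" heplnx201` prints a JDL with an Arguments line and a Requirements line, while `make_jdl run.sh "" ""` omits both, matching the unconstrained `lcg:` case.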
job list {<ticket-name>}
Use the list command to list active tickets. The ticket-name can
contain one or more wild-card "*" characters. If omitted, the
ticket-name defaults to "*".
Use the --verbose option to give details of each ticket.
job retrieve <ticket-name>
Use the retrieve command on jobs that have completed and have output
ready to return. Use the --output_dir option to select the final
output parent directory and override the default supplied when the job
was submitted. RSD will create a unique directory, based on the
ticket name, in the parent directory and move all output files to
it.
The ticket-name can contain one or more wild-card "*" characters; any ticket that hasn't got output ready to move will be ignored with a warning.
job dismiss <ticket-name>
After job output retrieval, the ticket remains active until it is dismissed,
at which time the contents of the ticket file are entered into the
ticket_history.txt file and the ticket file is deleted.
The ticket-name can contain one or more wild-card "*" characters; any ticket that isn't ready to be dismissed will be ignored with a warning.
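The bookkeeping done on dismissal can be sketched as follows. This is a simplified illustration under stated assumptions: the function name `dismiss_tickets` and the ".ticket" suffix are hypothetical, ticket_history.txt is assumed to live in the ticket directory, and the check that a ticket is actually ready to be dismissed is left out.

```shell
# Hypothetical sketch: archive and delete every ticket matching a pattern.
#   $1 = ticket directory, $2 = ticket-name pattern (may contain "*")
dismiss_tickets() {
    dir=$1
    pattern=$2
    # $pattern is deliberately unquoted so that "*" is expanded by the shell.
    for f in "$dir"/$pattern.ticket; do
        [ -f "$f" ] || continue      # an unmatched pattern stays literal
        # Append the ticket contents to the history, then remove the ticket.
        cat "$f" >> "$dir/ticket_history.txt" && rm -f "$f"
    done
}
```

The ticket is removed only if appending to the history file succeeded, so a failed write does not lose the record.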