Nick West
Last modified: Wed May 16 15:20:09 BST 2007

Data Cache Manager 5.80

Introduction

Storage Element Access, Names and The DCM URL

DCM (Data Cache Manager) was originally a utility to manage a collection of data files spread over multiple disks and owned by multiple users. Its role has now been extended to support GRID operations, at least for the EGEE/LCG GRID at RAL in the UK. It has a list of locally available Storage Elements (SEs) from which it builds catalogues, and it then uses these to locate and transfer files to local disk. It records changes in the catalogues as it moves files to keep them up to date, but it also rebuilds the catalogues by scanning the local SEs each night to ensure they stay in sync. So even if files are written to a local SE without DCM, DCM will quickly know about them.

DCM names SEs using the following syntax:-

  <site>-<type>-<service>
  e.g. ral_t1-castor-test_d0t1

Where:-

  <site>    Site name e.g. ral_t1 (RAL Tier 1) or fnal
  <type>    The storage technology e.g. dcache or castor
  <service> The individual service e.g. tape

When DCM runs, its initial output includes a list of the SEs it can access. For example:-

Local Storage Elements (in search order):-
  ral_t1_ui-nfs                   Local NFS Disks
  ral_t1-castor-prod_d0t1         RAL T1 CASTOR disk0tape1 Production Service
  ral_t1-castor-test_d0t1         RAL T1 CASTOR disk0tape1 Test Service
  ral_t1-dcache-disk              RAL T1 dCache Disk Store
  ral_t1-dcache-tape              RAL T1 dCache Tape Store
  fnal-dcache-enstore             FNAL dCache interface to Enstore
The order reflects the default search order used to retrieve files. Note how the local disk is treated as an SE.
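
As an illustration of the naming scheme, the following Python sketch splits an SE name into its parts. The helper parse_se_name is hypothetical and not part of DCM. It assumes that the individual parts never contain a hyphen and that the service part may be absent, as it is for ral_t1_ui-nfs in the listing above.

```python
from typing import NamedTuple, Optional


class SEName(NamedTuple):
    site: str
    se_type: str
    service: Optional[str]


def parse_se_name(name: str) -> SEName:
    """Split '<site>-<type>-<service>' (hypothetical helper, not DCM code)."""
    parts = name.split("-")
    if any(not part for part in parts):
        raise ValueError(f"empty component in SE name: {name!r}")
    if len(parts) == 2:
        # Assumption: a two-part name such as ral_t1_ui-nfs has no service.
        return SEName(parts[0], parts[1], None)
    if len(parts) == 3:
        return SEName(parts[0], parts[1], parts[2])
    raise ValueError(f"not a valid SE name: {name!r}")


print(parse_se_name("ral_t1-castor-test_d0t1"))
print(parse_se_name("ral_t1_ui-nfs"))
```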

DCM uses the SE names as the basis for a "DCM URL". The syntax is:-

  dcm://<SE_name>/<SE_dir>/<File_name>#<byte_size>
  e.g. dcm://fnal-dcache-enstore/pnfs/fs/usr/minos/reco_near/R1_18/snts_data/2005-04/N00007148_0008.spill.snts.R1_18.0.root#129262

When requested to transfer files, DCM first converts the file names into URLs, which are then used to determine the appropriate commands to perform the operation.
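
To make the URL layout concrete, here is a minimal Python sketch that takes a DCM URL apart. The function unpack_dcm_url is a hypothetical stand-in, not DCM's own routine, and it assumes that the SE name contains no '/' and that the byte size follows the last '#'.

```python
from typing import NamedTuple


class DcmUrl(NamedTuple):
    se_name: str
    se_dir: str      # returned without a leading '/'
    file_name: str
    byte_size: int


def unpack_dcm_url(url: str) -> DcmUrl:
    """Split a DCM URL into SE name, SE directory, file name and size."""
    prefix = "dcm://"
    if not url.startswith(prefix):
        raise ValueError(f"not a DCM URL: {url!r}")
    body, sep, size = url[len(prefix):].rpartition("#")
    if not sep or not size.isdigit():
        raise ValueError(f"missing or bad byte size: {url!r}")
    se_name, _, path = body.partition("/")
    se_dir, _, file_name = path.rpartition("/")
    if not se_name or not file_name:
        raise ValueError(f"missing SE name or file name: {url!r}")
    return DcmUrl(se_name, se_dir, file_name, int(size))


example = ("dcm://fnal-dcache-enstore/pnfs/fs/usr/minos/reco_near/R1_18/"
           "snts_data/2005-04/N00007148_0008.spill.snts.R1_18.0.root#129262")
print(unpack_dcm_url(example))
```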

Local Disk Management

DCM retains its original function of local disk management and on each disk it manages there must be a top level directory:
    dcm_cache/
which is where DCM will place files, although it can also manage files that users have placed elsewhere on these disks.

On the first disk in the list there must also be a top-level directory:-

    dcm_catalogue/
This is the "soft links catalogue", where DCM places soft links to data files on all the disks it manages. That directory has the sub-directory
    DCM/
where DCM maintains its text catalogues. The same sub-directory also holds
    history.log
which records when files are retrieved from SEs.

When DCM is run it starts by listing the disks it is managing. For example:-

DCM configuration:-
  List of DCM-managed disks:      /stage/minos-data1/d3
                                  /stage/minos-data1/d4
                                  /stage/minos-data1/d5
                                  /stage/minos-data1/d6
                                  /stage/minos-data1/d7
  Ownership group:                minos
  Scratch directory               /tmp/dcm_scratch_area_13645

User Manual

Command syntax

    $MINOS_TOOLS/dcm.sh  {global options} command {command options} {command args}

Global Options

  --debug n    Switch on debug level n (=0 off)
  --expt  e    Selected experiment.  Allowed values: minos [default] and sno
  --site  s    Select site.  CAUTION: use for testing only!!

The catalogue command

catalogue {<file>...<file>...} { --all}
Example:  catalogue  /stage/minos-data1/d4/C00080277_0000.mdaq.root
This adds the file to the text catalogue and adds a soft link to the file in the soft links catalogue:-
    dcm_catalogue/
directory which must be the top level directory on the first disk managed by DCM. This command is useful when adding a file that is not within the set of directories managed by DCM.
Example:  catalogue --all
This uses the results of the last disk scan (see the survey command) and checks that all the data files that it found are in the text and soft links catalogues.

The directory_ownership command

directory_ownership {mode}
where
  mode  (optional):-
          "full" [default] show every directory
          "compress" suppress sub-directories wholly owned by a single user
This command uses the results of the last disk scan (see the survey command) and reports, for each data directory, the users who own files in it including sub directories.

The disk_usage command

This uses the results of the last disk scan (see the survey command) to produce a summary of usage, both by disk and by user.

DCM classifies all files into 1 of 4 types:-

The get command

get {command options} file-query  file-query ...
Transfer one or more files from an SE (Storage Element).

Command Options

  --accept_dcm_url  Return files as DCM URLs; doesn't attempt any transfers

  --accept_root_url Return files as ROOT URLs if supported; otherwise transfer.

  --demand_complete_set
                    Quit without getting any files unless able to get them all
                    Default: return whatever files can be located.

  --file_list f     If command succeeds, record list of files (or URLs) in file f.
                    Will include all files i.e. even those already on disk.
                    Caution: On input f must not exist.

  --force_local     Force a copy to local dir (see  --local_dir) unless already
                    there.

  --local_dir d     Copy files to specified directory.  
                    Default: the dcm_cache directory of the disk with most space

  --max_files n     Set upper limit on number of files to transfer.
                    Default 10.  Hard upper limit of 1000 files.
                    Used to prevent misplaced wildcard from transferring
                    huge amounts of data!

  --num_get_jobs n  Run up to n transfer jobs at once.
                    Default 1.  Hard upper limit of 10 jobs.

  --remote_se se_name{/se_dir} 
                    Only copy files from selected SE  {and within selected /se_dir}
                    e.g. --remote_se ral_t1-castor-test_d0t1/gnumi/v19/fluka05_le010z185i
                      Only look in SE ral_t1-castor-test_d0t1 within directory gnumi/v19/fluka05_le010z185i
                    e.g. --remote_se 'ral_t1-dcache-disk/gnumi/v19/fluka05_le010z185i/job1.*'
                      Only look in SE ral_t1-dcache-disk within directory sub-tree gnumi/v19/fluka05_le010z185i/job1.*
                      Note: . - any single char; .* - any char string

  --test            Determine what files have to be transferred and from where  but 
                    don't transfer files
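
The default for --local_dir described above (the dcm_cache directory of the disk with most space) could be chosen along the following lines. This is only a sketch: default_local_dir is a hypothetical name, and it assumes that "most space" means most free space as reported by the operating system.

```python
import os
import shutil
from typing import Callable, Sequence


def default_local_dir(
    disks: Sequence[str],
    free_bytes: Callable[[str], int] = lambda d: shutil.disk_usage(d).free,
) -> str:
    """Return the dcm_cache directory on the disk with the most free space."""
    if not disks:
        raise ValueError("no DCM-managed disks configured")
    best = max(disks, key=free_bytes)
    return os.path.join(best, "dcm_cache")


# Free-space figures below are made up for illustration.
fake_free = {"/stage/minos-data1/d3": 10, "/stage/minos-data1/d4": 50}
print(default_local_dir(list(fake_free), fake_free.get))
```

Passing the free-space lookup in as a parameter keeps the sketch testable without touching real disks.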

Command Args

file-query  

  Either: File name 
          e.g. F00030574_0002.mdaq.root
          or an 'egrep' wildcard regular expression: 'F000256.*.cand.R1.14.root'
          Note: . - any single char; .* - any char string
          CAUTION: Once match found in any SE DCM quits searching.

  Or:     A database query for SAM enclosed in square brackets
          e.g. [ file_name like N00008695_002%.cosmic.sntp.R1_18.0.root ]
          e.g. [     "run_type physics% 
                 and data_tier sntp-near 
                 and physical_datastream_name spill%
                 and start_time < to_date('2006-02-18','yyyy-mm-dd') 
                 and end_time   > to_date('2006-02-17','yyyy-mm-dd') 
                 and version cedar" ]

          Make sure there is a space after the leading '[' or the shell
          command parser may treat it as a wildcard construction.

          Enclose in double quotes if query includes parentheses.

  Or:     A DCM URL e.g. dcm://fnal-dcache-enstore/pnfs/fs/usr/minos/rec ... .snts.R1_18.0.root#129234

All 3 types of command arg may be mixed in the same invocation. DCM first executes all SAM commands to resolve them into file names. Then, for file names that are not already DCM URLs, it searches the SE catalogues and converts them to DCM URLs. It then transfers any that it locates that are not already on local disk.

Note that the 2 stage approach allows users to have a dataset defined by a SAM query and yet retrieve files from the closest SE.

Note that, for a given file-query, DCM stops searching SE catalogues as soon as it finds any match. The logic is that a dataset should always be defined by applying a search to a single SE and not by the logical OR of all SEs. So if you want to copy some data set, say a group of files matching a wildcard, and some are already on the local disk, then, by default DCM will only find them and not copy the rest. The solution is to use the --remote_se option to force DCM to look at the SE which has the full set; it will still check the local disk so there is no risk that it will copy files it already has.
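
The search rule above can be sketched in Python as follows. Everything here is a simplified assumption: plan_get is a hypothetical name, each catalogue is modelled as a plain list of file names held in search order, and Python's re.search stands in for egrep matching.

```python
import re
from typing import Dict, List, Optional, Tuple


def plan_get(
    pattern: str,
    catalogues: Dict[str, List[str]],
    local_se: str,
    remote_se: Optional[str] = None,
) -> Tuple[Optional[str], List[str], List[str]]:
    """Return (SE used, files matched there, files still to be copied)."""
    regex = re.compile(pattern)
    # Without --remote_se every SE is tried in order; with it, only that SE.
    search_order = [remote_se] if remote_se else list(catalogues)
    for se_name in search_order:
        matches = [f for f in catalogues.get(se_name, []) if regex.search(f)]
        if matches:
            # Stop at the first SE with any match, but never copy a file
            # that is already on the local disk.
            on_disk = set(catalogues.get(local_se, []))
            to_copy = [f for f in matches if f not in on_disk]
            return se_name, matches, to_copy
    return None, [], []


# Made-up catalogue contents for illustration.
cats = {
    "ral_t1_ui-nfs": ["F00025601_0000.cand.R1.14.root"],
    "fnal-dcache-enstore": ["F00025601_0000.cand.R1.14.root",
                            "F00025602_0000.cand.R1.14.root"],
}
query = "F000256.*.cand.R1.14.root"
print(plan_get(query, cats, "ral_t1_ui-nfs"))
print(plan_get(query, cats, "ral_t1_ui-nfs", remote_se="fnal-dcache-enstore"))
```

In the first call the search stops at the local disk, so the second file is never found. In the second call the full set comes from the named SE and only the file missing from local disk is left to copy.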

If using the --file_list option be sure that the name of the file you pass is unique. The normal way to do that is to include the process ID (shell variable $$) in the file name. Otherwise, on a system with multiple jobs running, all getting files via DCM, there is a danger that two might use the same name to return their file list. As an additional precaution, DCM will reject the command if it is passed a pre-existing file.
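
For a job that drives DCM from a script, the same precaution might look like the Python sketch below. The helper file_list_path and the dcm_file_list_ prefix are illustrative assumptions; the command is only assembled and printed, not run.

```python
import os
import tempfile


def file_list_path(directory: str = tempfile.gettempdir()) -> str:
    """Build a per-process file-list name and refuse to reuse an old file."""
    path = os.path.join(directory, f"dcm_file_list_{os.getpid()}.txt")
    if os.path.exists(path):
        # DCM itself rejects a pre-existing file, so fail early here too.
        raise FileExistsError(path)
    return path


path = file_list_path()
command = ["dcm.sh", "get", "--file_list", path, "F00030574_0002.mdaq.root"]
print(" ".join(command))
```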

The --accept_dcm_url option can be useful to see what files would satisfy a request without doing any transfer. The --test option only shows you what files would have to be transferred, whereas the URL request shows files on local disks as well. The URL request also allows you to see if transfers would have to take place. The resultant URLs can later be passed to DCM for transfer, so long as they are still valid. This might be useful if running a job on a Worker Node where no catalogue is available.

The help command

The help command provides brief on-line help, but for details this document should be consulted.

The put command

put {command options} file_name file_name ...
Transfer one or more files to an SE (Storage Element).

Command Options

  --create_remote_dir
                     If necessary create remote directory
  --file_list f      If command succeeds, record list of files transferred
                     Each line of file is:-
                       Either: Name of file successfully written
                           Or: Error message starting with the character '?'
                     Caution: On input f must not exist.
  --local_dir d      Copy files from specified directory.  Default: current directory
  --overwrite        Overwrite existing file.  Default don't overwrite
  --remote_se se_name/se_dir
                     Directory on SE. Compulsory
  --test             Just test, don't transfer files

Command Args

file-name   File name relative to  --local_dir.  
            No wild-cards permitted and no check that file is recognisable as a data file.
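
A script reading the file written by --file_list for the put command could separate successes from failures as in this Python sketch. The function name read_put_file_list is hypothetical and the sample error text is invented; the only thing taken from the description above is that error lines start with '?'.

```python
from typing import Iterable, List, Tuple


def read_put_file_list(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split file-list lines into (files written, error messages)."""
    written: List[str] = []
    errors: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("?"):
            errors.append(line[1:].strip())
        else:
            written.append(line)
    return written, errors


sample = [
    "C00080277_0000.mdaq.root\n",
    "? illustrative error text\n",  # invented message, format only
]
print(read_put_file_list(sample))
```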

The survey command

survey {<se>...<se>...}
Example: survey ral_t1-castor-test_d0t1 fnal-dcache-enstore
This command rebuilds the catalogues for the selected SEs or from all available SEs if none is specified. The resulting catalogue is stored in
  dcm_catalogue/DCM/<SE name>.cat
For most SEs the scan is carried out using the appropriate commands for the SE concerned, but there are two special cases:-

Normally the survey command is executed by a nightly cron job.

The test command

  test <sub-command> <arg> ...
Is used to test and debug DCM. Typing the test command without further arguments will list what tests are currently available.

The uncatalogue command

uncatalogue <file>{ <file>...}
Example:  uncatalogue  /stage/minos-data1/d4/C00080277_0000.mdaq.root
This removes the file from the disk based catalogue and removes the soft link to the file from the soft links catalogue:-
    dcm_catalogue/
directory which must be the top level directory on the first disk managed by DCM.

Implementation notes

  • Internal Structure

    Configuration

    Individual experiments and sites are configured with the following files stored in the
      config/
    
    subdirectory.

    1. <expt>.se_servers e.g. minos.se_servers

      This file identifies all the SEs used by the experiment, the services each provides and the way to access these services.

    2. <expt>.site_<site>.se_access e.g. minos.site_ral_t1_ui.se_access

      This file specifies which of the experiment's SEs can be accessed from the local site and which interfaces to use to access them.

    3. <expt>.site_<site>.local_disks e.g. minos.site_ral_t1_ui.local_disks

      This file specifies the local disk setup at the site.

    Internal Structure

    When DCM was originally developed its function was local disk management, and as such it didn't require any formal internal structure. Now that its principal objective is SE access, two layers have been developed:-

    1. SEI: Storage Element Interface
      This layer is responsible for all commands that directly access an SE. Changes to the SEs available and methods of access should only affect this layer.

    2. FRS: File Retrieval System
      This layer takes user requests, converts them to DCM URLs and executes the commands to effect transfers and handles failures, all using the SEI layer.
    These layers are described in more detail in the following sections.

    SEI: Storage Element Interface

    Introduction

    This layer is responsible for all commands that directly access an SE. Changes to the SEs available and methods of access should only affect this layer.

    Configuration

    The system is essentially data driven and is built upon the following concepts.

    1. An SE offers a series of services e.g. 'rfio' or 'dcap'

    2. The combination of an SE name and a specific service constitutes a server named <name>;<service>

    3. From a server there is a mapping to:-

      1. URL prefix that has to be prefixed to the SE directory before it can be used in a command

      2. Environment commands: a set of 0 or more bash commands that have to be executed before the command.

      For example, for the server "ral_t1-castor-prod_d0t1;rfio"

      • The prefix is
          /castor/ads.rl.ac.uk/prod/grid/hep/disk0tape1/minos;
        
      • The environment is
        export STAGE_SVCCLASS=minosDisk0Tape1
        export STAGE_HOST=castorstager.ads.rl.ac.uk
        export RFIO_USE_CASTOR_V2=YES
        

    4. A site wanting to access an SE makes a request e.g. "get" or "list" to it. The combination of the SE name and the request constitutes an action named <name>;<request>

    5. Sites are configured by enumerating the actions that are available and hence what services they can call upon on different SEs.

    6. An action maps to:-

      1. A service

      2. A command - but only if a departure from the default for the service - see below.

    7. Typically an accessible SE offers a number of services to a site and then, to avoid explicitly enumerating all the actions the following shortcut can be used:-

      1. A request can be set to "*" which matches any request that's not explicitly listed in an action

      2. In such cases the command is default for the service. For example for the request list of the rfio service the command is rfdir.

      3. A specification can be mapped to the special service "disabled" meaning that it's not available. This allows the use of a wildcard to cover most cases and then fine tune the remainder, either with their own commands and services or disabled.
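
    The lookup described by these concepts can be sketched in Python as below. This is not DCM's configuration format: resolve_action and the dictionary layout are hypothetical, the sample site configuration is invented, and the only default command taken from the text is rfdir for the list request of the rfio service.

```python
from typing import Dict, Optional, Tuple

# Action value: (service, command override or None to use the default).
Action = Tuple[str, Optional[str]]

# Only this service default is documented; others would be added alike.
DEFAULT_COMMANDS: Dict[Tuple[str, str], str] = {("rfio", "list"): "rfdir"}


def resolve_action(
    se_name: str, request: str, actions: Dict[str, Action]
) -> Optional[Tuple[str, str]]:
    """Return (server name, command), or None if the request is unavailable."""
    action = actions.get(f"{se_name};{request}")
    if action is None:
        # Fall back to the wildcard request for this SE, if configured.
        action = actions.get(f"{se_name};*")
    if action is None:
        return None
    service, command = action
    if service == "disabled":
        return None
    if command is None:
        if (service, request) not in DEFAULT_COMMANDS:
            raise LookupError(f"no default command for {service} {request}")
        command = DEFAULT_COMMANDS[(service, request)]
    return f"{se_name};{service}", command


# Invented site configuration: wildcard to rfio, with put switched off.
site_actions: Dict[str, Action] = {
    "ral_t1-castor-prod_d0t1;*": ("rfio", None),
    "ral_t1-castor-prod_d0t1;put": ("disabled", None),
}
print(resolve_action("ral_t1-castor-prod_d0t1", "list", site_actions))
print(resolve_action("ral_t1-castor-prod_d0t1", "put", site_actions))
```

    An explicit action is consulted before the wildcard, which is what lets a site cover most requests with "*" and then fine tune or disable the rest.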

    The Routines

    The central routines are sei_assemble_command and sei_get_server_cmd, which it calls. Together they are responsible for assembling the appropriate command given the SE name and a request, for example "list" or "put" (copy to SE). Construction of ROOT URLs is handled by sei_get_root_url.

    Handling of DCM URLs (which encode the SE name, SE directory and file size) is done by sei_dcm_url_pack and sei_dcm_url_unpack.

    SE directory creation is done by sei_prepare_directory and file overwriting is done by sei_prepare_file.

    Catalogue handling is provided by sei_survey, which can scan an SE and build a text catalogue. Searching such a catalogue for a file name, and hence inferring the DCM URL (which encodes the SE name, SE directory and file size), is done by sei_search_catalogue.

    FNAL Anomalies

    There are a couple of related anomalies when it comes to FNAL:-

    I don't claim to really understand this but it's basically what Art does, or at least did, in 2006!

    FRS: File Retrieval System

    This layer takes user requests, converts them to DCM URLs, executes the commands to effect transfers and handles failures, all using the SEI layer. If the user supplies SAM queries, FRS is responsible for resolving them into a series of file names by passing them to a web based SAM client. The SEI catalogues are then searched for the file names to convert them to DCM URLs. The file transfers themselves are performed by a separate perl script, dcm_frs_job, which does the transfer, checks the size of the copied file and retries in cases where an error has occurred.

    Having the transfer as a separate script allows FRS, when transferring multiple files, to run multiple jobs in parallel.
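
    The transfer, size check, retry and parallel-job pattern can be illustrated with the Python sketch below. It is not dcm_frs_job: the function names are hypothetical, a local file copy stands in for the SE transfer command, threads stand in for separate jobs, and the limit of 3 attempts is an assumption.

```python
import os
import shutil
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Job = Tuple[str, str, int]  # (source, destination, expected size in bytes)


def transfer_with_retry(job: Job, max_attempts: int = 3) -> bool:
    """Copy one file, check its size and retry if anything went wrong."""
    source, destination, expected_size = job
    for _ in range(max_attempts):
        try:
            shutil.copyfile(source, destination)  # stand-in for the SE copy
            if os.path.getsize(destination) == expected_size:
                return True
        except OSError:
            pass  # count as a failed attempt and try again
    return False


def run_jobs(jobs: List[Job], num_get_jobs: int = 1) -> List[bool]:
    """Run up to num_get_jobs transfers at once (documented hard limit 10)."""
    workers = max(1, min(num_get_jobs, 10))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(transfer_with_retry, jobs))
```

    A job reports success only when the copied file has the expected size, which in DCM's case is the byte size carried by the DCM URL.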

    After a successful transfer FRS updates the SEI catalogues.

    Experiment API

    Introduction

    Originally designed for MINOS, DCM has been extended for SNO. Some of the code is experiment specific and will be held in the subdirectories:-
      dcm/minos
      dcm/sno
    
    apart from
      init_minos.pm
      init_sno.pm
    
    After parsing any global switches DCM knows which experiment it is dealing with and then executes the appropriate experiment initialisation.

    Calls from the generic to the experiment specific code constitute the experiment API.

    identify_file

    
         Parameters:-
         ==========
      
         $file_name   Name of file to be identified (can contain directory)
      
         Return:-
         ======
      
         $data_name   MINOS:   Currently this is returned as the component
                               between the sub-run and the data type
                      SNO:     The module name e.g. Reconstruct
         $data_type   MINOS:   Data type i.e. the extension e.g. mdaq.root
                      SNO:     Data type e.g. sno_root
         $detector    MINOS:   The detector. One of "CalDet", "Far" or "Near".
                      SNO:     The phase e.g. salt
         $run_no      Run number
         $sub_run_no  Sub-run number (or -1 if n/a)
         $version     MINOS:   Release (or "" if n/a)
                      SNO:     Pass number (or "" if n/a)
    

    frs_locate_file

        Parameters:-
        ==========
    
        Either: $file_name   File name whose access info is required.
        Or:     $db_query    A database query for SAM (MINOS) or Ral (SNO)
    
        Return:-
        ======
    
        A list of file_access_size variables, each consisting of:-
    
           $file_name:$access_info:$estimated_file_size
    
        where:-
    
        $file_name            File name.
        $access_info          MINOS:   ENSTORE directory 
                              SNO:     Tape name:file number 
        $estimated_file_size  Estimated size in GB
    
        In the case of an error a single entry is returned: "? Error message"
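
        As a sketch of how a caller might take such an entry apart, the Python
        below splits on the first and last ':' so that an access_info field
        which itself contains a colon (as the SNO "Tape name:file number" form
        does) stays intact. The helper name and the sample values are
        hypothetical and are not taken from DCM.

```python
from typing import NamedTuple


class FileAccessSize(NamedTuple):
    file_name: str
    access_info: str
    estimated_size_gb: float


def parse_file_access_size(entry: str) -> FileAccessSize:
    """Parse '$file_name:$access_info:$estimated_file_size'."""
    if entry.startswith("?"):
        # A single "? Error message" entry signals failure.
        raise RuntimeError(entry[1:].strip())
    file_name, sep1, rest = entry.partition(":")
    access_info, sep2, size = rest.rpartition(":")
    if not sep1 or not sep2:
        raise ValueError(f"malformed entry: {entry!r}")
    return FileAccessSize(file_name, access_info, float(size))


# Invented sample entries, one in each documented access_info style.
print(parse_file_access_size("some_file.mdaq.root:/some/enstore/dir:0.25"))
print(parse_file_access_size("some_file.sno_root:TAPE01:12:1.5"))
```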
    

    Global Data Structures

    See the routine init.pm