Q1. How do I publish CE-SE relationships in an MDS based on the Glue schemas?

 

The GRID is going to be populated by a large number of Computing Services (CE) and Storage Services (SE), each identified by a unique ID. A service request (job execution) submitted to a resource broker may require input files for the job. Each such file is given a logical file name (LFN), and its replicas can be spread across the GRID on different storage services.

Once the broker has the list of all physical replicas and their locations, it must bind one of them to a suitable computing service for the specified job. One possible strategy is to build a list of all CEs that the service requestor is authorized to access and that match the job requirements. Once this list is created and ordered by some rank parameter, we would like to select the best CE (top of the list) that has access to the best replica.

 

Selecting the best replica of a file for a given CE is not at all a trivial issue. The choice could be based on:

  1. current network load along the path CE-SE
  2. max IO capacity for the SE
  3. file latency

 

None of these parameters can currently be gathered from the MDS. Therefore, the temporary solution for providing a matching criterion is to explicitly publish a CE-SE binding that the broker uses to perform the needed matching.
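The matching a broker could perform with such a binding can be sketched as follows. This is only an illustration under stated assumptions, not the broker's actual algorithm: the function name select_ce and the shape of its inputs (a ranked list of CE IDs, a set of SE IDs holding a replica, and a mapping from each CE ID to its bound SE IDs) are hypothetical.

```python
def select_ce(ranked_ces, replica_ses, ce_se_bindings):
    """Return (ce_id, se_id) for the best-ranked CE bound to an SE holding a replica.

    ranked_ces:     CE unique IDs, best first (already filtered by
                    authorization and job requirements).
    replica_ses:    set of SE unique IDs that hold a replica of the file.
    ce_se_bindings: hypothetical mapping CE unique ID -> list of bound SE IDs,
                    as it could be read from the published CE-SE binding.
    """
    for ce_id in ranked_ces:
        for se_id in ce_se_bindings.get(ce_id, []):
            if se_id in replica_ses:
                return ce_id, se_id
    return None  # no ranked CE is bound to an SE holding a replica


# Illustrative data only; the IDs are placeholders.
ranked = ["ce-a", "ce-b"]
bindings = {"ce-a": ["se-1"], "ce-b": ["se-2", "se-3"]}
print(select_ce(ranked, {"se-3"}, bindings))
```

The sketch takes the first bound SE that holds a replica; it does not rank replicas by network load, IO capacity, or latency, since those values are not available from the MDS.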

 

For each CE, there is a two-level binding advertisement. At the first level (Group Level), a list of SE unique IDs is provided. At the second level (Single Level, one entry for each SE), specific CE-SE attributes can be provided. At the moment, only the mount point on the CE is published at the single level, and only when the SE is locally accessible by jobs running on that CE.

 

For a practical understanding of this solution, look at the following LDIF example. In this example, the CE whose unique ID is grid006f.cnaf.infn.it:2119/jobmanager-pbs-workq is bound to the SEs edt004.cnaf.infn.it and grid025.pd.infn.it. The first one is accessible through an NFS-mounted directory on the CE called /shared.

 

dn: GlueCESEBindGroupCEUniqueID=grid006f.cnaf.infn.it:2119/jobmanager-pbs-workq, mds-vo-name=local, o=grid
objectclass: GlueGeneralTop
objectclass: GlueSchemaVersion
GlueSchemaVersionMajor: 0
GlueSchemaVersionMinor: 1
objectclass: GlueCESEBindGroup
GlueCESEBindGroupCEUniqueID: grid006f.cnaf.infn.it:2119/jobmanager-pbs-workq
GlueCESEBindGroupSEUniqueID: edt004.cnaf.infn.it
GlueCESEBindGroupSEUniqueID: grid025.pd.infn.it

 

dn: GlueCESEBindSEUniqueID=edt004.cnaf.infn.it, GlueCESEBindGroupCEUniqueID=grid006f.cnaf.infn.it:2119/jobmanager-pbs-workq, mds-vo-name=local, o=grid
objectclass: GlueGeneralTop
objectclass: GlueSchemaVersion
GlueSchemaVersionMajor: 0
GlueSchemaVersionMinor: 1
objectclass: GlueCESEBind
GlueCESEBindCEUniqueID: grid006f.cnaf.infn.it:2119/jobmanager-pbs-workq
GlueCESEBindSEUniqueID: edt004.cnaf.infn.it
GlueCESEBindCEAccesspoint: /shared

 

dn: GlueCESEBindSEUniqueID=grid025.pd.infn.it, GlueCESEBindGroupCEUniqueID=grid006f.cnaf.infn.it:2119/jobmanager-pbs-workq, mds-vo-name=local, o=grid
objectclass: GlueGeneralTop
objectclass: GlueSchemaVersion
GlueSchemaVersionMajor: 0
GlueSchemaVersionMinor: 1
objectclass: GlueCESEBind
GlueCESEBindCEUniqueID: grid006f.cnaf.infn.it:2119/jobmanager-pbs-workq
GlueCESEBindSEUniqueID: grid025.pd.infn.it
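To illustrate how a consumer might read the single-level entries, here is a minimal Python sketch. It assumes the LDIF has already been obtained as text with one attribute per line and entries separated by a blank line (no line folding, no base64 values). The helper names parse_entries and local_access_points are hypothetical, and the sample text is a trimmed copy of the two single-level entries of the example.

```python
from collections import defaultdict

# Trimmed copy of the two single-level entries of the example
# (schema-version attributes omitted for brevity).
LDIF = """\
dn: GlueCESEBindSEUniqueID=edt004.cnaf.infn.it, GlueCESEBindGroupCEUniqueID=grid006f.cnaf.infn.it:2119/jobmanager-pbs-workq, mds-vo-name=local, o=grid
objectclass: GlueCESEBind
GlueCESEBindCEUniqueID: grid006f.cnaf.infn.it:2119/jobmanager-pbs-workq
GlueCESEBindSEUniqueID: edt004.cnaf.infn.it
GlueCESEBindCEAccesspoint: /shared

dn: GlueCESEBindSEUniqueID=grid025.pd.infn.it, GlueCESEBindGroupCEUniqueID=grid006f.cnaf.infn.it:2119/jobmanager-pbs-workq, mds-vo-name=local, o=grid
objectclass: GlueCESEBind
GlueCESEBindCEUniqueID: grid006f.cnaf.infn.it:2119/jobmanager-pbs-workq
GlueCESEBindSEUniqueID: grid025.pd.infn.it
"""


def parse_entries(text):
    """Split simple LDIF text into entries: lower-cased attribute -> list of values."""
    entries = []
    for block in text.strip().split("\n\n"):
        entry = defaultdict(list)
        for line in block.splitlines():
            if not line.strip():
                continue
            # Split on the first colon only: values such as the CE ID contain colons.
            attr, _, value = line.partition(":")
            entry[attr.strip().lower()].append(value.strip())
        entries.append(entry)
    return entries


def local_access_points(entries):
    """Map (CE ID, SE ID) to the published access point, or None if none is published."""
    points = {}
    for entry in entries:
        classes = [c.lower() for c in entry["objectclass"]]
        if "gluecesebind" not in classes:
            continue
        ce_id = entry["gluecesebindceuniqueid"][0]
        se_id = entry["gluecesebindseuniqueid"][0]
        access = entry.get("gluecesebindceaccesspoint")
        points[(ce_id, se_id)] = access[0] if access else None
    return points


for pair, mount in local_access_points(parse_entries(LDIF)).items():
    print(pair, mount)
```

A pair mapped to None corresponds to a binding for which no mount point is advertised, as for grid025.pd.infn.it in the example.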

 

 

Who is in charge of publishing the CE-SE binding in the MDS?

 

This has not yet been decided. The temporary solution will probably be the CE GRIS. Later, we can consider an entity whose task is to monitor the GRID (e.g. per VO) and dynamically advertise the best CE-SE binding for each CE.

 

Query by example:

 

I want to know the unique IDs of all storage services bound to the CE whose unique ID is grid006f.cnaf.infn.it:2119/jobmanager-pbs-workq

 

ldapsearch -h my.giis.hostname -p 2135 -b "mds-vo-name=myVO, o=grid" -x -LLL "(&(objectclass=GlueCESEBindGroup)(GlueCESEBindGroupCEUniqueID=grid006f.cnaf.infn.it:2119/jobmanager-pbs-workq))" GlueCESEBindGroupSEUniqueID

 

I want to know the unique IDs of all CEs bound to the SE grid025.pd.infn.it

 

ldapsearch -h my.giis.hostname -p 2135 -b "mds-vo-name=myVO, o=grid" -x -LLL "(&(objectclass=GlueCESEBindGroup)(GlueCESEBindGroupSEUniqueID=grid025.pd.infn.it))" GlueCESEBindGroupCEUniqueID
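When the two filters above are generated by a program rather than typed by hand, the IDs should be escaped before being placed in the filter string. The following is a minimal sketch, assuming the standard LDAP filter escaping of the characters \, *, ( and ) and of the NUL character as a backslash followed by two hex digits. The function names are hypothetical.

```python
def escape_filter_value(value):
    """Escape the characters that are special inside an LDAP search filter value."""
    escaped = []
    for ch in value:
        if ch in "\\*()\0":
            escaped.append("\\%02x" % ord(ch))
        else:
            escaped.append(ch)
    return "".join(escaped)


def ses_bound_to_ce_filter(ce_id):
    """Filter for the group-level entry of a CE (request GlueCESEBindGroupSEUniqueID)."""
    return ("(&(objectclass=GlueCESEBindGroup)"
            "(GlueCESEBindGroupCEUniqueID=%s))" % escape_filter_value(ce_id))


def ces_bound_to_se_filter(se_id):
    """Filter for the group-level entries listing an SE (request GlueCESEBindGroupCEUniqueID)."""
    return ("(&(objectclass=GlueCESEBindGroup)"
            "(GlueCESEBindGroupSEUniqueID=%s))" % escape_filter_value(se_id))


print(ses_bound_to_ce_filter("grid006f.cnaf.infn.it:2119/jobmanager-pbs-workq"))
print(ces_bound_to_se_filter("grid025.pd.infn.it"))
```

The resulting strings can be passed as the filter argument of ldapsearch, with the attribute to return given as in the two queries above.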

 

References

[1] Glue CESEBind schema.

     http://cvs.infn.it/cgi-bin/cvsweb.cgi/datatag-glue/glue-schemas/Glue-CESEBind.schema

 

 

Q2. Given an SE and a CE, how do I determine whether a job running on the CE can locally access files kept on the SE? (Locally means through standard system calls.)

 (to be updated to SE schema version 1.0)

Let's explain it by example. The CE grid006f.cnaf.infn.it:2119/jobmanager-pbs-workq can submit jobs to a set of worker nodes identified by the subcluster grid006f.cnaf.infn.it. All of them have an NFS-mounted directory locally named /shared/permanent. The remote directory is /permanent, and it is provided by a Storage Library whose unique ID is edt004.cnaf.infn.it. This Storage Library also provides a Storage Service whose unique ID is edt004.cnaf.infn.it:7777.

 

[Picture: the CE, its subcluster, and the Storage Library with its Storage Service, as described above]


The current choice in the Glue modelling is to separate services from the underlying systems. The schema lets you describe this situation exactly as it is.


The SE service edt004.cnaf.infn.it:7777 is provided by the Storage Library edt004.cnaf.infn.it:

 

dn: GlueSEUniqueID=edt004.cnaf.infn.it, mds-vo-name=local, o=grid
objectclass: GlueSETop
objectclass: GlueSchemaVersion
GlueSchemaVersionMajor: 0
GlueSchemaVersionMinor: 1
objectclass: GlueSE
GlueSEUniqueID: edt004.cnaf.infn.it
GlueSEName: edt004.cnaf.infn.it
GlueSEPort: 7777
GlueSEHostingSL: edt004.cnaf.infn.it

 

The Storage Library edt004.cnaf.infn.it has a local directory called /permanent that can be NFS mounted.

 

dn: GlueSLLocalFileSystemName=/permanent, GlueSLUniqueID=edt004.cnaf.infn.it, mds-vo-name=local, o=grid
objectclass: GlueSLTop
objectclass: GlueSchemaVersion
GlueSchemaVersionMajor: 0
GlueSchemaVersionMinor: 1
objectclass: GlueSLLocalFileSystem
GlueSLLocalFileSystemName: /permanent
GlueSLLocalFileSystemRoot: /permanent
GlueSLLocalFileSystemSize: 3000000
GlueSLLocalFileSystemAvailableSpace: 300000
GlueSLLocalFileSystemType: ext3

 

The subcluster grid006f.cnaf.infn.it has an NFS remote directory mounted as /shared/permanent from edt004.cnaf.infn.it:/permanent

 

dn: GlueHostRemoteFileSystemName=/shared/permanent, GlueSubClusterUniqueID=grid006f.cnaf.infn.it, GlueClusterUniqueID=grid006f.cnaf.infn.it, mds-vo-name=local, o=grid
objectclass: GlueClusterTop
objectclass: GlueSchemaVersion
GlueSchemaVersionMajor: 1
GlueSchemaVersionMinor: 0
objectclass: GlueHostRemoteFileSystem
GlueHostRemoteFileSystemName: /shared/permanent
GlueHostRemoteFileSystemRoot: /shared/permanent
GlueHostRemoteFileSystemType: NFS
GlueHostRemoteFileSystemSize: 3000000
GlueHostRemoteFileSystemAvailableSpace: 300000
GlueHostRemoteFileSystemServer: edt004.cnaf.infn.it:/permanent

 

The CE grid006f.cnaf.infn.it:2119/jobmanager-pbs-workq can submit jobs to the subcluster grid006f.cnaf.infn.it

 

dn: GlueCEUniqueID=grid006f.cnaf.infn.it:2119/jobmanager-pbs-workq, mds-vo-name=local, o=grid
objectclass: GlueCETop
objectclass: GlueSchemaVersion
GlueSchemaVersionMajor: 1
GlueSchemaVersionMinor: 0
objectclass: GlueCE
GlueCEUniqueID: grid006f.cnaf.infn.it:2119/jobmanager-pbs-workq
GlueCEName: long
GlueCEHostingCluster: grid006f.cnaf.infn.it

……………..

……………..
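One possible way to combine the four entries above into an answer is sketched below in Python. This is an interpretation, not a procedure defined by the schema: it assumes that GlueHostRemoteFileSystemServer always has the form host:/path, that the host part equals the Storage Library unique ID named by GlueSEHostingSL, and that the path part equals a GlueSLLocalFileSystemRoot of that library. The function name local_mount_points is hypothetical, and the dictionaries hold only the attributes needed, with GlueSLUniqueID and GlueClusterUniqueID taken from the DNs of the entries.

```python
def local_mount_points(ce, se, remote_fs_entries, sl_fs_entries):
    """Return the local directories through which jobs on the CE can reach the SE's files."""
    sl_id = se["GlueSEHostingSL"]
    # Directories exported by the Storage Library that hosts the SE.
    exported = {fs["GlueSLLocalFileSystemRoot"]
                for fs in sl_fs_entries if fs["GlueSLUniqueID"] == sl_id}
    mounts = []
    for fs in remote_fs_entries:
        # Only consider file systems mounted on the cluster hosting this CE.
        if fs["GlueClusterUniqueID"] != ce["GlueCEHostingCluster"]:
            continue
        # Assumption: the server attribute has the form "host:/path".
        host, _, path = fs["GlueHostRemoteFileSystemServer"].partition(":")
        if host == sl_id and path in exported:
            mounts.append(fs["GlueHostRemoteFileSystemRoot"])
    return mounts


# Values taken from the example entries above.
ce = {"GlueCEUniqueID": "grid006f.cnaf.infn.it:2119/jobmanager-pbs-workq",
      "GlueCEHostingCluster": "grid006f.cnaf.infn.it"}
se = {"GlueSEUniqueID": "edt004.cnaf.infn.it",
      "GlueSEHostingSL": "edt004.cnaf.infn.it"}
sl_fs = [{"GlueSLUniqueID": "edt004.cnaf.infn.it",
          "GlueSLLocalFileSystemRoot": "/permanent"}]
remote_fs = [{"GlueClusterUniqueID": "grid006f.cnaf.infn.it",
              "GlueHostRemoteFileSystemRoot": "/shared/permanent",
              "GlueHostRemoteFileSystemServer": "edt004.cnaf.infn.it:/permanent"}]

print(local_mount_points(ce, se, remote_fs, sl_fs))
```

An empty result means that, under these assumptions, no file system of the SE's Storage Library is mounted on the cluster that hosts the CE.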

 

 

 


Maintained by Sergio Andreozzi - Last update: Tue 29 October, 2002