PROPOSED EXTENSIONS TO THE CURRENT Glue CE Schema 1.0

1. CE Policy attributes: need for agreement on the "undefined" state.
(MaxTotalJobs, MaxRunningJobs, MaxCPUTime, Priority, MaxWallClockTime)

PROPOSAL: -1 = undefined policy = no policy
e.g. MaxTotalJobs=-1 means no limit on the maximum number of jobs

AGREED
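
A minimal sketch of how a consumer might interpret the agreed convention, written in Python. The helper names (policy_limit, accepts_more_jobs) are hypothetical and not part of the schema; only the -1 sentinel comes from the agreement above.

```python
from typing import Optional

# Agreed sentinel: -1 means the policy is undefined, i.e. no policy applies.
UNDEFINED_POLICY = -1


def policy_limit(value: int) -> Optional[int]:
    """Return None when the attribute carries the -1 sentinel, else the limit."""
    return None if value == UNDEFINED_POLICY else value


def accepts_more_jobs(total_jobs: int, max_total_jobs: int) -> bool:
    """True if one more job fits under MaxTotalJobs (hypothetical check)."""
    limit = policy_limit(max_total_jobs)
    return limit is None or total_jobs < limit
```

With this reading, accepts_more_jobs(1000, -1) is true because no limit is published, while accepts_more_jobs(10, 10) is false.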

2. Need to advertise GRIS access point data:

- hostname
- port number
- base dn

These are needed by a data consumer that discovers the existence of services/resources
from a top-level GIIS and wants to query the GRISes directly in order to get fresh dynamic data.
For instance, the EDG broker, having selected a subset of suitable services based on static data,
queries each GRIS to get the dynamic data and then makes its choice.

PROPOSAL: add GlueInformationServiceURL attribute (following RFC 2255)
e.g. ldap://edt004.cnaf.infn.it:2135/mds-vo-name=local,o=grid

to be published in
- each LDAP entry?
- each TOP LEVEL GRIS ENTRY?

AGREED
The attribute GlueInformationServiceURL will be added.
The need is to represent the association between an Information Service network location (GRIS URL) and the entities it describes.
The entities involved are those having a globally unique ID; so far these are Computing Element, Cluster, Subcluster, Host, Storage Service and Storage Library.
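
As an illustration, a consumer could recover the three access point fields (hostname, port number, base dn) from a GlueInformationServiceURL value as sketched below. This uses only the Python standard library; the function name gris_access_point is hypothetical, and the sketch ignores the optional attribute, scope and filter parts that RFC 2255 also allows.

```python
from urllib.parse import unquote, urlparse


def gris_access_point(url: str):
    """Split an LDAP URL into (hostname, port, base_dn)."""
    parts = urlparse(url)
    if parts.scheme != "ldap":
        raise ValueError("not an LDAP URL: " + url)
    # The base DN is the URL path without its leading slash;
    # percent-escapes, if any, are decoded.
    base_dn = unquote(parts.path.lstrip("/"))
    return parts.hostname, parts.port, base_dn


host, port, base_dn = gris_access_point(
    "ldap://edt004.cnaf.infn.it:2135/mds-vo-name=local,o=grid"
)
```

For the example URL above this yields the host edt004.cnaf.infn.it, the port 2135 and the base dn mds-vo-name=local,o=grid.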

3. Detailed host information is optional; if it is not represented, we miss information such as:
- how many hosts are present in the cluster?
- how many CPUs?

There is a TotalCPUs attribute in the CE, but it only gives the number of CPUs assigned to the queue.

PROPOSAL:
add GlueClusterNumberOfProcessors: int
add GlueClusterNumberOfNodes: int

to the Cluster classes!

NO! To get this kind of info, host entities should be instantiated and then counted.
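
A rough sketch of the counting approach, assuming the consumer has already retrieved the host entities as simple records. The field names cluster_id and cpu_count are hypothetical placeholders, not schema attributes.

```python
from collections import defaultdict


def cluster_totals(hosts):
    """Count nodes and CPUs per cluster from a list of host records."""
    totals = defaultdict(lambda: {"nodes": 0, "cpus": 0})
    for host in hosts:
        entry = totals[host["cluster_id"]]  # hypothetical field name
        entry["nodes"] += 1
        entry["cpus"] += host["cpu_count"]  # hypothetical field name
    return dict(totals)


sample_hosts = [
    {"cluster_id": "clusterA", "cpu_count": 2},
    {"cluster_id": "clusterA", "cpu_count": 2},
    {"cluster_id": "clusterB", "cpu_count": 1},
]
totals = cluster_totals(sample_hosts)
```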

4. Benchmark:
At the moment we have a benchmark class with SpecInt2000 and SpecFloat2000. Unfortunately these values are expensive to gather (the site admin has to run the benchmark software on a sample worker node, and this can take more than one hour). -> SI00 and SF00 are often not filled in!

Proposal: add one more attribute to the benchmark class: BogoMips
(see http://www.linux.org/docs/ldp/howto/mini/BogoMips.html). This value is already available on every Linux system in /proc/cpuinfo, so there is no waiting time to obtain it. It can be used to rank CPU computing capacity.

NO! BogoMIPS is Linux-specific and the Glue schema is meant to be general.

5. FreeCPUs (integer): currently defined as the number of CPUs that are available for a new job

With OpenPBS it is easy to fill: one job per CPU, therefore a CPU is free when no job is running on it.
With LSF it is more difficult: more than one running job can be configured on the same CPU.
The previous EDG approach was to set TotalCPUs = MaxRunningJobs, i.e. to consider a kind of virtual CPU. But this contradicts the TotalCPUs definition (number of physical CPUs); moreover, it is configuration dependent.

PROPOSAL

change the definition of FreeCPUs from

Number of free CPUs available to a scheduler

to

Number of CPUs that can run a new job

In the case of LSF, CPUj is free if RunningJobs(CPUj) < MaxRunningJobs(CPUj).
That is, a CPU is considered free if it can accept a new job.
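
The proposed definition can be sketched as follows. The per-CPU lists are an assumption about how a provider might hold the batch system data; the function name free_cpus is hypothetical.

```python
def free_cpus(running_jobs, max_running_jobs):
    """Count CPUs j with RunningJobs(CPUj) < MaxRunningJobs(CPUj)."""
    return sum(
        1
        for running, limit in zip(running_jobs, max_running_jobs)
        if running < limit
    )


# OpenPBS-like case: one job slot per CPU, two of four CPUs busy.
pbs_free = free_cpus([1, 0, 1, 0], [1, 1, 1, 1])

# LSF-like case: two job slots per CPU, only the second CPU is full.
lsf_free = free_cpus([1, 2, 0], [2, 2, 2])
```

With one slot per CPU the count reduces to the OpenPBS reading (a CPU is free when no job is running on it).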

NO!

6. We would like to have a metric for the computing capacity that a CE can provide to a new job, in order to rank CEs and obtain the highest performance for a new job when needed.

Could this be a valid proposal?

GlueCEStateComputingCapacity2Job: the minimum computing capacity a job will receive

this can be approximated as ceil(FreeCPUs/TotalCPUs)*(TotalCPUs*BogoMips)/MaxRunningJobs

ceil(FreeCPUs/TotalCPUs) = 1 if at least one more job can be run, 0 otherwise
(TotalCPUs*BogoMips)/MaxRunningJobs = minimum BogoMips for a new running job
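
A small sketch of the approximation above, assuming all four inputs are already known to the consumer. The function name is hypothetical, and rejecting non-positive TotalCPUs or MaxRunningJobs is an assumption of this sketch, not something the proposal specifies.

```python
import math


def computing_capacity_to_job(free_cpus, total_cpus, bogomips, max_running_jobs):
    """ceil(FreeCPUs/TotalCPUs) * (TotalCPUs*BogoMips) / MaxRunningJobs."""
    if total_cpus <= 0 or max_running_jobs <= 0:
        # Assumption: the formula is not defined for these inputs.
        raise ValueError("TotalCPUs and MaxRunningJobs must be positive")
    # 1 if at least one more job can be run, 0 otherwise.
    can_run = math.ceil(free_cpus / total_cpus)
    return can_run * (total_cpus * bogomips) / max_running_jobs


some_free = computing_capacity_to_job(2, 4, 1000.0, 8)
none_free = computing_capacity_to_job(0, 4, 1000.0, 8)
```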

NO! This is an aggregate value that can be computed at runtime.

7. Rename attributes in SMPLoad and CPUload classes:

from LoadXMin to MinX or LastXMin

this is because the naming convention we agreed on (namespace+entity+component+attribute)
leads to a redundant name for these attributes:
e.g. Glue + Host + SMPLoad + Load1Min = GlueHostSMPLoadLoad1Min

this would be

GlueHostSMPLoadMin1 or GlueHostSMPLoadLast1Min

OK for LastXMin!
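
To make the naming rule concrete, the sketch below applies namespace+entity+component+attribute literally as string concatenation. The function name glue_attribute_name is hypothetical.

```python
def glue_attribute_name(entity, component, attribute, namespace="Glue"):
    """Compose a full attribute name as namespace+entity+component+attribute."""
    return namespace + entity + component + attribute


old_name = glue_attribute_name("Host", "SMPLoad", "Load1Min")
new_name = glue_attribute_name("Host", "SMPLoad", "Last1Min")
```

Here old_name is GlueHostSMPLoadLoad1Min, which repeats "Load", while new_name is GlueHostSMPLoadLast1Min.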

8. Is the SMPLoad class necessary?

I think that having the SMP load is a bad design choice, since it has no meaning when the host has a single CPU.

At the moment we do not represent an identity for the CPU (there is no UniqueID attribute). That means that, for a 2-CPU machine, I can represent the CPU type and the load of only one of them. I guess that is why SMPLoad was introduced.

-> is it meaningful/useful to represent the load of each CPU?
-> is it easy to gather?

PROPOSAL:

Remove the SMP Load class

Option 1:

if we want to be able to represent the load of each CPU:
add a UniqueID attribute to the Processor class and associate the ProcessorLoad class with the Processor class

Option 2:

if we are not interested in the fine-grained load, but just in the average one:
keep ProcessorLoad associated with the host and change the description to state that the load is the average of the loads of all processors in a host.

NO!


18.02.2003