A) Do we modify management sections of the ensemble/config XML?

Chris proposed to isolate management sections to different XML documents. I remember that Dirk took objection to the proposal. I'd like to know whether Chris and Dirk agreed on this issue at the offline meeting in Dublin.
((T.Yoshie, 2005 Nov 15))

Chris and Dirk disagree as to whether this should be in QCDml. I believe we agreed to make this optional so that I could exclude it from my QCDml IDs and Dirk could include them in his. Can Dirk corroborate this? I think it may be worth revisiting the discussions briefly to make sure everyone is happy with the outcome.
((C. Maynard, 2005-11-15))

Yes, I still agree that we could make those entries optional.
((D. Pleiter, 2005-11-15))

I've carefully read the emails circulated in May-June. I'm sorry to bring this matter up again, but I think it is not a good idea to make the management sections optional. They contain important information such as "history of documents/config" and "number of configurations" which is useful for users. If we make the management sections optional, such information will not be generally available. I agree, as Balint once pointed out, that management information in QCDml1.1 is not sufficient for managers, and local implementations of management of IDs are necessary. Nonetheless, I think that removing all of the management sections from our standard is too drastic. I prefer keeping a "minimal set of management information" as standard.

I understand that Chris wants to remove the management sections mainly because Chris wants to write a valid ensemble XML ID before any configurations are generated or submitted to ILDG. In order to satisfy Chris's requirement, I ask you to consider Dirk's proposal in (qcdml 457): If one foresees an operation 'create' the QCDml could be made valid before it is made public. Alternatively, adding the attribute minOccurs="0" to the corresponding element would also do the job. Both modifications would be backward compatible.
((T. Yoshie, 2005-11-17))

I am in agreement with Chris, that management should be made optional. The reasons are varied. I am making commentary based on an example mark up found on the QCDML web page.

First for an ensemble: http://www.ph.ed.ac.uk/ukqcd/community/the_grid/QCDml1.1/UKQCD1.xml -- action: add. Why is this useful? It says that someone has added some 829 configurations. Since it does not detail pointers to the individual 829 configurations what use is it? The number of configurations currently belonging to an ensemble can in principle be counted with an XPath query which will be more accurate than a hardwired number. Also this management information has to be manipulated every time data is uploaded, which is a pain.

Looking at the configuration metadata http://www.ph.ed.ac.uk/ukqcd/community/the_grid/QCDml1.1/config010170.xml I see actions such as: generate. Is this useful? The fact that the data exists surely means it has been generated. The fact that it has been added by Chris may be useful, but it may be useless especially if it is legacy data and it is no longer known who has generated it. Further, for non-legacy data, it may not be easy to know who generated the data. A computer may supply a getuid() type function which allows the code to generate a UID on the fly. However, transcribing this to a name may not be so straightforward.

Incidentally, I also have problems with and

. For implementation, I'd like to know how the code producing
the metadata (perhaps running on a parallel computer) should find out
what computer it runs on? The machine may have a uname() call to
find out about the processor, and a gethostname() call to find out its
hostname (of course, for a parallel machine each node may have a different hostname).
Of course none of these calls are guaranteed to exist (see
the case of the QCDOC). So all this data would have to be munged into
every config document after the fact (or by including the snippet of
XML into the code somehow). This makes the creation of this tag awkward.
Which means people won't use it. The problems with the  tag are as follows: code may
know its name and its version. What does the  mean from the point of view of the
code? Date of compilation? Date of running? Date of writing? Also what about
multi-layer code which uses many packages, each with its own version? The precision
tag is redundant and ambiguous. Is it the precision of the code or of the produced configs?
The precision of the binary data is now encoded in the
Binary Format XML record and is not needed here.
I believe that ALL the metadata a config needs is encoded in the 
and the  which should be moved into .
((B. Joo, 2005 Nov 17))
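
As a rough illustration of the point above that the number of configurations in an ensemble can be counted by a query rather than stored as a hardwired number, here is a minimal Python sketch. The element name markovChainURI, the simplified document structure, the example URIs and the helper name count_configs are all assumptions made for this illustration; real QCDml documents use XML namespaces and a richer structure, and a real Metadata Catalogue would run the query on its own store.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified config documents (no namespaces).
# Only the idea "each config document names the ensemble it belongs to"
# is taken from the discussion; the element names are assumptions.
CONFIG_DOCS = [
    "<gaugeConfiguration><markovStep>"
    "<markovChainURI>mc://example/ens-A</markovChainURI>"
    "</markovStep></gaugeConfiguration>",
    "<gaugeConfiguration><markovStep>"
    "<markovChainURI>mc://example/ens-A</markovChainURI>"
    "</markovStep></gaugeConfiguration>",
    "<gaugeConfiguration><markovStep>"
    "<markovChainURI>mc://example/ens-B</markovChainURI>"
    "</markovStep></gaugeConfiguration>",
]


def count_configs(config_docs, ensemble_uri):
    """Count the config documents that point at the given ensemble URI."""
    count = 0
    for text in config_docs:
        root = ET.fromstring(text)
        # ElementTree supports a limited XPath subset; ".//tag" searches
        # all descendants of the root element.
        uri = root.findtext(".//markovChainURI")
        if uri is not None and uri.strip() == ensemble_uri:
            count += 1
    return count


print(count_configs(CONFIG_DOCS, "mc://example/ens-A"))
```

The count is derived from the documents that currently exist, so it cannot drift out of date the way a number written into the ensemble document can.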

Balint says:

 >>	 -- action: add. Why is this useful? It says that someone
 >> has added some 829 configurations. Since it does not detail pointers
 >> to the individual 829 configurations what use is it?

   No we have pointers. As far as I remember, revision number N in
N of the ensemble XML document corresponds to
the revision number N of the config XML document. Thus if you add 829
configs in revision N=1, you will find corresponding 829 config XML with N=1.
   I understand that your headache resides in generating XML IDs
automatically. I can compromise a bit, though I still think that it is
not a good idea to remove the management sections or make them optional.
I think  and  are mandatory.
If we remove the management sections,
we have to agree on "what to do when ILDG database is changed".
Use cases and possible actions are
   1) when you submit new ensemble:
         just register the ensemble XML ID with a Metadata Catalogue (MDC)
   2) when you want to remove the ensemble:
         just remove the ID from the MDC
   3) when you want to change the ensemble XML ID:
         remove the old one, register the new one
   4) when you submit a configuration:
         register the config XML ID with a MDC and place the config at a TURL
   5) when you remove the configuration:
         remove both the config XML ID and the config itself
   6) when you want to replace a configuration:
         just replace the configuration,
         you don't have to change the config XML ID
(Oh, this is revival of old discussions, someone sighed)
Please note that users cannot traceback/detect any modifications anyhow,
if some kind of  is not mandatory. Are you sure that's OK?
(I suppose that Middleware does not record any modifications.
A log of modifications will be maintained only locally.)
   In addition, I'm worried about two things: one is the contact person
and the other is the freshness of ensembles/configurations. Users may want
to ask something of someone who is responsible for ensembles/configurations.
If we remove management sections completely, it will be difficult for users
to find a person. Collaboration name is the only clue. Is it OK?
ILDG will be used for a long period of time. A large number of ensembles
will be archived. Don't you want to filter out very old ensembles when
you search configurations?  It is impossible to do this, because no
time stamps are recorded in XML IDs. I suppose that no time stamps
are maintained by Middleware.
  Summary of this comment: May I ask you to consider whether we need
some information on revision, person, and time stamp when we remove
archiveHistory?
((T.Yoshie, 2005 Nov 18))
   OK First point: Regarding the 'revision' and 'add' tag in the Ensemble metadata
and 'configuration metadata'. There is a presupposition that one is able to
add many configurations at the same time, by a kind of 'batch add'. Suppose
I add say 10 configs. Now my ensemble ID is at revision 1, and my 10 configs
have revision 1. Now I add another 10 configs. My ensemble ID document gets
its revision changed to revision 2. Does my insertion tool then have to
ensure that the configuration XML-s of the second batch all have revision 2 in them? I 
believe this is what Tomoteru is suggesting above. I think this is not right. I thought
that the 'revisions' tag in 'management' is a count of the total number of
revisions in the ID document, not some kind of a key between the config ID and
the XML ID. That would be bad. First, it is counterintuitive. Someone looking at the 
documents for the second batch of configs will see that they have 'revision 2' and
will wonder what has changed since 'revision 1'. Secondly it is entirely
possible that there is no batch add. To add say 100 configs, I'll run a
script to add a single config in a loop 100 times. I use the script to add in
things like the CRC checksum, the implementation info and anything else
I can't produce with my code. In this case my ensemble ID
document will
	i) fill up with 100 - "Balint added 1 configuration" tags
	ii) By the end of the process my ensemble ID will be revision 100
	(and using the above logic, my individual configurations will have
	 revisions ranging from 1 to 100)
Is this really what is desired? I would advise against using the revision
tag in the ensemble to point to the configs affected by that revision action. I am
willing to compromise that the ensemble archiveHistory be made optional, but
would like to require and its 'actions'
should refer to the ensemble ID document only (ie an 'add' action should refer to
the person adding the ensemble ID rather than adding configuration IDs). This
would make the archiveHistory actions have the same semantics as the configuration
actions (ie add, replace, remove). This way, in adding a configID, I don't need
to touch the ensemble ID.
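
The difference between the two readings of the revision tag discussed above can be made concrete with a small Python sketch. It is only a toy model under stated assumptions: the function names are hypothetical, and revisions are modelled as plain integers rather than as entries in real ensemble or configuration documents.

```python
def add_configs_coupled(n_configs):
    """Coupled reading: each single-config add also revises the ensemble ID.

    Returns the final ensemble revision and the revision stamped on each
    configuration document.
    """
    ensemble_revision = 0
    config_revisions = []
    for _ in range(n_configs):
        ensemble_revision += 1                      # ensemble document touched
        config_revisions.append(ensemble_revision)  # config keyed to that revision
    return ensemble_revision, config_revisions


def add_configs_decoupled(n_configs):
    """Self-referential reading: actions refer only to their own document.

    The ensemble ID is added once (revision 1) and is not touched when
    configurations are added; each new config document starts at revision 1.
    """
    ensemble_revision = 1
    config_revisions = [1] * n_configs
    return ensemble_revision, config_revisions


coupled_rev, coupled_cfgs = add_configs_coupled(100)
decoupled_rev, decoupled_cfgs = add_configs_decoupled(100)
print(coupled_rev, coupled_cfgs[0], coupled_cfgs[-1])
print(decoupled_rev, decoupled_cfgs[0], decoupled_cfgs[-1])
```

With a script that adds one configuration per call, the coupled reading leaves the ensemble at revision 100 with configuration revisions ranging from 1 to 100, while the self-referential reading leaves the ensemble document untouched.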
     I am also willing to compromise in the case of the configuration IDs.
Clearly, contact info is desirable (although it is possible that during the
lifetime of a configuration the contact will move between collaborations).
I would propose the following compromise:
	i) The 'generate' revision action be removed from 
into the  with something like:

   
     T3E-900
     epcc Edinburgh
     Alpha processor
   
   
     UKQCD FORTRAN
     16.8.3.1
     1997-04-04T16:20:10Z
   
   
      Joe Bloggs
      The Gauge Generation Company
      UKQCD
   

         ii) Move the  out of  to the
same level as precision, or into the 
and
  	iii) make   optional.

This is (to my mind) sensible because the generation information is grouped with the other
information to do with generation (code, machine, etc). After step ii) the only thing 
remaining in  is the  and the  tag which is
the number of elements in archiveHistory.
Archival History can then be maintained by the actual archive (perhaps
even in a separate XML document or logging database, or in the document if you prefer). In
the case of a separate system, when the XML file is retrieved by a query, the
archiveHistory could be reconstructed and pasted back into the
ID document (filling out the optional tag), or not (if the particular
MDC implementation doesn't support this). The archiveHistory tag then
doesn't need to be pre-generated by the user for his document to still be valid,
and is potentially normalised out into an optional piece.
    The rest can be generated by a script I suppose, but my gloomy prediction
is that there will be lots of 'boilerplate information' there and there may
be many errors as people forget to update their scripts.
(( B. Joo, 2005 Nov 18))
As already pointed out in May (see qcdml-452), instead of dropping the
element  a redefinition of the sub-elements would also
address the concerns which have recently been
raised again:
1) The archiveHistory in the ensemble XML document and configuration
        XML document stores only those operations which are related
        to the ensemble XML document or the configuration XML+binary files,
        respectively. With this rule none of the examples provided by
        Balint in qcdml-566 would apply anymore. In particular, when
        adding any configurations the archiveHistory of the ensemble XML
        document should NOT be extended, because this is not a change of
        the ensemble XML document itself.
2) In this case the element , ,  as mandatory.
       For revisions, starting with revision=1, we count-up when some change
       is made on the ensemble XML ID.
b) add a mandatory element , in order to record "when generation
       of the ensemble started". I think this is necessary. If archiveHistory
       is made optional and some group really drops archiveHistory, the
       ensemble ID would have no time stamp.
c) archiveHistory is optional.   "add" refers to
       "submission of the ensemble ID to the ILDG", "replace" refers to
       "replacement of the ensemble ID", as proposed by Balint and Dirk.
       Of course, we remove .
d) When one removes the ensemble, just remove the ensemble ID from MDC.
       If you use optional archiveHistory, you may keep the ID for a little
       while in MDC with  "remove" added.

The management section of the ensemble ID looks like
      
      
        2
        UKQCD
        Clover NF=2
        2002-04-04T13:20:10Z
        
        
	
              1
	  add
	  
                Chris Maynard
	    University of Edinburgh
	  
	  2003-01-10T15:20:10Z
	  Submit this ensemble ID to ILDG
	
	
              2
	  replace
	  
                Chris Maynard
	    University of Edinburgh
	  
	  2004-02-18T15:20:10Z
	  Modification is made on this ensemble ID
	
        
      

For the management section of configuration ID,
a) keep .
	
The config ID looks like
      
      
        3
        
        
          
            1
            add
            
              Chris Maynard
              University of Edinburgh
            
            2002-04-24T10:25:52Z
            Submit this config ID and config itself to ILDG
          
            2
            replace
            
              Chris Maynard
              University of Edinburgh

            
            2002-05-24T10:25:52Z
            config is replaced, config ID is unchanged
          
            3
            replace
            
              Chris Maynard
              University of Edinburgh
            
            2002-05-24T10:25:52Z
            config ID is replaced, config remains unchanged
          
        
      

      
        
          T3E-900
          epcc Edinburgh
          Alpha processor
        
        
          UKQCD FORTRAN
          16.8.3.1
          1997-04-04T16:20:10Z
        
        
          
            Chris Maynard
            University of Edinburgh
          
          2001-04-24T10:25:52Z
        
      

      ....

      single
      2632843688

      ....

Note that I have changed . As 
is still mandatory, this move is done for aesthetic reasons. In
an earlier phase of this project I would have accepted this without
discussion. Concerning the new element : it is not good
to replicate information within the schema. Furthermore, if this
element is going to be mandatory, why not keep the 
in the configuration XML mandatory? The submitter is only supposed
to add an element with  equal to add. This element
is essentially equivalent to . Adding any further elements
would be optional. I would accept dropping .
((D. Pleiter, 2005 Nov 21))

I understand that Balint proposes  to make 
optional. If we accept this proposal, only  action (XML
element in this case ) will be mandatory, while add, replace, remove
actions are optional. Another way is to mandate the generate
action in , while other actions in  are
optional. I have no preference, as long as "time stamp and person" are
mandatory.
(( T. Yoshie, 2005 Nov 21 ))

Sorry, in my previous comment I meant "generate" and not "add".
Maybe I am missing something, but it seems that ...
and generate...
 are equivalent. If  is
mandatory (to be more precise: a list of non-zero length) then
contributors have to insert only one element (I do not think that
we should worry that this element could have  unequal
"generate"). No contributor is forced to extend this list when
actions "add", "replace" or "remove" are performed.
For this reason I think we should change the schema in the following
way:
- ensemble:  is a list of optionally zero length
- configuration:  is a list with at least one element
- ensemble+configuration: element  is removed from
     managementActionType
- configuration: revisionActionType is extended by action "replace"
Element  should remain where it is.
((D. Pleiter, 2005 Nov 21))
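
The list-length rules proposed above (an ensemble archiveHistory may be empty, a configuration archiveHistory needs at least one element) could be checked along the following lines. This is a hedged Python sketch: the element names archiveHistory and elem are taken from this discussion, but the root element names, the example documents and the helper names are hypothetical, and real QCDml documents use XML namespaces that this sketch ignores. In practice the rule would be enforced by the schema itself.

```python
import xml.etree.ElementTree as ET


def history_length(xml_text):
    """Return the number of elem entries in the document's archiveHistory."""
    root = ET.fromstring(xml_text)
    history = root.find(".//archiveHistory")
    if history is None:
        return 0  # no archiveHistory at all counts as an empty list
    return len(history.findall("elem"))


def satisfies_proposal(xml_text, kind):
    """Check the proposed minimum list length for the given document kind."""
    n = history_length(xml_text)
    if kind == "configuration":
        return n >= 1   # at least one element (the generate action)
    if kind == "ensemble":
        return True     # zero-length list is allowed
    raise ValueError("kind must be 'ensemble' or 'configuration'")


# Hypothetical, simplified documents for illustration only.
ENSEMBLE_DOC = "<ensemble><management><archiveHistory/></management></ensemble>"
CONFIG_OK = (
    "<config><management><archiveHistory>"
    "<elem><revisionAction>generate</revisionAction></elem>"
    "</archiveHistory></management></config>"
)
CONFIG_EMPTY = "<config><management><archiveHistory/></management></config>"

print(satisfies_proposal(ENSEMBLE_DOC, "ensemble"))
print(satisfies_proposal(CONFIG_OK, "configuration"))
print(satisfies_proposal(CONFIG_EMPTY, "configuration"))
```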

It is a very good idea that we mandate at least one  in
the configuration . Shall we apply the same idea to
the ensemble ? If we do so,  element
proposed above is not necessary and the schema will be backward
compatible.
((T. Yoshie, 2005 Nov 22))

I disagree. It stops the  tag from being optional,
which would put us back to exactly where we started from.
For the sake of backward compatibility, I'll go along with an optional

with the semantics as suggested by Dirk for both the ensemble and the
configuration. As long as it's optional and I can keep it out of the
documents the user has to generate, and I don't have to touch the user
generated metadata, I'll remain reasonably happy. Likewise with Dirk's
semantics, I don't have to touch the ensemble metadata when I add a config.
That's great.
However, for  to be truly optional, 
must also become optional (as it essentially counts the number of
items in ).

This is not purely an aesthetic issue. I want to keep my archiveHistory
subsystem entirely separate from my user submitted IDs.
This reduces the harm I can do if I screw up with an update somewhere.
It also decouples the archiveHistory from everything else.

This will help me write simpler services, which has recently become a big
issue for me, since I am the one who has to write them all for my
collaboration. It also means I can later incrementally add sophisticated
logging and revision control without having an impact on the existing system.
So I shift my position to the following compromise (moving towards what
Dirk has suggested but not all the way):
    - ensemble:  optional list of zero length
    - config  :  optional list of zero length
    - ensemble and config :  optional (for backward compatibility
         with existing documents only)
    - crcChecksum to stay where it is (also for backward compatibility)
If a configuration is questionable the relevant collaboration can be
contacted from the  tag in the ensembleID.

On the issue of 'boiler plate information' which may be ambiguous
(implementation, algorithm) I still disagree with Dirk on philosophical
grounds as to whether it is a good thing that we now get people to record
more information than they did previously. However, the need of backward
compatibility forces me to accept that we shouldn't mess with it.
((B. Joo, 2005 Nov 21))

I disagree with Balint's proposal that  is optional. We have
to mandate  particularly for config ID. Suppose that a contributor
replaced a configuration but did not change the config ID. How are
users informed that the configuration is replaced?
 does not refer to the number of  tags in .
 counter is increased by one, if configuration or ID is changed.
(Submitter does not have to add anything to .)
With this counter, users can notice that something is changed.

I compromise as follows (this is the same as that I proposed before.
Some are proposed by Balint.)
    - ensemble:  optional list of zero length
                add new mandatory element  in 
                when one uses the optional ,
                     archiveHistory/elem/revision is also optional
    - config  :  optional list of zero length
                add new mandatory element  in 
                 has to include date and person information
    - ensemble and config :  mandatory
    - crcChecksum to stay where it is (also for backward compatibility)

I understand what bothers Balint. But we have to take care of the users'
point of view.
(( T. Yoshie, 2005 Nov 22))

If the  tag is mandatory, I cannot achieve my goal of not having
to edit the user contributed XML, since I have to update the  tag.
In this case I give up completely and suggest that we adopt Dirk and Tomoteru's scheme
of Nov 21, since it is more backward compatible than mandating new elements:

- ensemble:  is a list of optionally zero length
- configuration:  is a list with at least one element
- ensemble+configuration: element  is removed from
     managementActionType
- configuration: revisionActionType is extended by action "replace"
Element  should remain where it is.
- Dirk's semantics for archiveHistory actions (ie reference self only)

At least the coupling between ensembleIDs and configurationIDs has been
removed.
(( B. Joo, 2005 Nov 22))

The  was the biggest problem I had, but that has now gone,
which makes things a little easier. The reason for our current difficulty is
the definition of an ensemble. The definition as "a collection of gauge
configurations" is fine until we realise that this is not static. Ideally
the metadata would be non-mutable, but this is a problem when the definition
of the ensemble changes. I think really our difficulty comes from this. We
say this data is ensemble A. Now we change the data (i.e. extend the
ensemble, or remove some cfgs or whatever) and now we want to be able to
say, "Actually what we said was A is no longer true, what we now mean by A
is this". This is where we have got into a mess. Really ensemble A is still
A, but now we have A1 which is similar to A but has some changes. I don't
think that the ensemble metadata is the right place for this. Currently our
concept of an ensemble is not "this data" but "any data generated with these
parameters". So we cannot generate non-mutable metadata describing this
because it is going to change. We are attempting to construct a fudge: on
the one hand non-mutable metadata (i.e. what we say is A is A, which is not the
same as saying we can add information about A, i.e. measurements), and on the
other keeping track of changes to an ensemble. How to reconcile these two ideas? I
don't know. We are close to a fudge which may work, but is this the best
solution? I can accept a fudge, but we are going to run into the same
difficulties when we think about measurements. What about measurements on
half the ensembles, or a later measurement suggesting that the
autocorrelations are longer, so that measurements are more widely separated, or
someone binning the data, or someone measuring every 5 rather than 10
trajectories? I think we should be clear about what the problem is, then
decide if a fudge is OK, or if we need to tackle what the real problem is,
the lack of a clear definition of an ensemble. I appreciate I am not
proposing a solution here, but let's take a moment's thought before we accept
a fudge which may bite us later.
((C. Maynard Nov. 23))

I would like to suggest a different compromise:
a) ensemble::: - list is allowed to be of zero length
                                  - element  will be removed
                                  - element  will be removed
b) config:::   - list contains at least one element
                                    which is supposed to have
                                    revisionAction="generated"
                                  - element  will be removed
                                  - element  will be removed
c) ensemble:::      - will be removed
d) config:::        - will be removed
e) config:::      - remains unchanged
f) config:::   - enumeration extended by 'replaced'
I would like to explain why I suggest this compromise:
- I think that Balint's request to change the schemata such that it is
     possible to avoid automatic changes of the user documents by the MDC
     is reasonable.
- Once the  stops being mandatory (except for the
     information on when the configuration has been generated), the
     elements  do not make so much sense anymore.
- The recently suggested element  is only important when
     searching for new ensembles. Given the rather small number of
     ensembles generated, I think a separate announcement mechanism (e.g.
     ILDG sessions during lattice conferences, status reports at ILDG
     workshops) would be more appropriate to provide this information.
- These changes are clearly not backward compatible, but the changes are
     such that already existing documents can be made to conform by deleting
     some elements.  Needless to say, I sincerely hope that this is the
     last non-backward compatible change for a long period of time!
((D. Pleiter, Nov. 25))

I think that most of "removed" in Dirk's compromise should be "optional".
I and my collaboration want to record some of them.  Namely, I propose

a) ensemble::: - list is allowed to be of zero length
                                  - element  will be optional
                                  - element  will be removed
b) config:::   - list contains at least one element
                                    with revisionAction="generate"
                                  - element  will be optional
                                  - element  does not exist
c) ensemble:::      - will be optional
d) config:::        - will be optional
e) config:::      - remains unchanged
f) config:::   - optional enumeration extended by 'replace'
                                   'replace' means either ID or config
(I also replaced "generated" -> "generate", "replaced" -> "replace".)

If all of you agree with this compromise, I also agree, although it is
quite unsatisfactory. In case that optional "replace" action is not
recorded in config XML, users cannot detect replacement of
configurations (without downloading configuration itself).
I'd like to ask you to remember and accept this weak point, before
you agree with this compromise.
((T. Yoshie, Nov. 26))

I agree with what Dirk is suggesting. To address Tomoteru's concern
I ask that you consider the following: If the data part of the configuration
changes the crcChecksum will change to reflect this. If the LFN in the
config changes the LFN in the ID document changes to reflect this. If the
Lattice size changes the ensemble ID will change to reflect this.
Precision is also recorded in the metadata. I am not sure off the top of my
head about endianness (but both endianness and precision changes would
change the crcChecksum). So basically any change to the config should be
accompanied by a corresponding XML ID replacement.
However, a change in the document cannot be detected without communication.
Either the user has to check back in the database to see if his/her copy is
still current, or the database needs to inform the user (through say a
registered email) that a document has changed. The revisions tag alone cannot
solve this.
Given that other things will change, the revisions tag is redundant I think.
So I think Dirk's scheme is a good one, and I very happily agree to this
latest compromise.
((B. Joo, Nov 26))
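
The argument above, that a replaced configuration shows up as a changed checksum, can be sketched in Python as follows. Note the assumptions: zlib.crc32 is used purely as a stand-in, since the algorithm behind crcChecksum is not specified in this discussion and may differ, and the function names and byte strings are hypothetical.

```python
import zlib


def checksum(data: bytes) -> int:
    """Stand-in checksum; NOT necessarily the algorithm behind crcChecksum."""
    return zlib.crc32(data) & 0xFFFFFFFF


def config_was_replaced(recorded_checksum: int, current_data: bytes) -> bool:
    """Compare the checksum recorded in the user's copy of the config ID
    with the checksum of the binary data currently in the archive."""
    return checksum(current_data) != recorded_checksum


# Hypothetical binary payloads standing in for configuration files.
original = b"gauge-configuration-payload-v1"
replaced = b"gauge-configuration-payload-v2"   # same length, one byte differs

recorded = checksum(original)
print(config_was_replaced(recorded, original))
print(config_was_replaced(recorded, replaced))
```

As noted in the comment above, this comparison still requires the user to go back to the archive; the checksum makes a replacement detectable but does not announce it.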

Request for an optional 	tag in ensemble metadata
document 
     ...
     ...
     MILC_COARSE_ENSEMBLE
     ...
   

Purpose and motivation: The label comes from the potential
desire of a collaboration to annotate their ensemble. The
context: MILC has worked hard to create several ensembles
with different input parameters but roughly the same lattice
spacing. They then have a classification of their
ensembles as being COARSE, FINE, SUPERFINE, etc. The purpose
of the ensemble label is to mark the ensemble as belonging
to one of these classes. It is entirely separate from the issue
of observables and accurate definitions of lattice spacings,
and is similar in spirit to a CVS tag. The ensemble label
would not necessarily have any meaning to anyone outside
the generating collaboration. Since not all collaborations
may wish to use this label feature and also for backward
compatibility it should be optional (minOccurs=0). The
content of the tag would be a string containing no whitespace
(i.e. a single word).
((B.Joo, Dec 1,05))
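
The constraint requested above (an optional label that, when present, is a single word with no whitespace) could be checked as in this minimal Python sketch. The function name is hypothetical, and treating an empty string as invalid is an assumption; in the schema itself this would more naturally be expressed as a restriction on the element's string type.

```python
import re

# One or more non-whitespace characters, and nothing else.
_LABEL_PATTERN = re.compile(r"\S+")


def is_valid_label(label):
    """Return True if the label is absent (None) or a single word."""
    if label is None:
        return True  # the tag is optional
    return _LABEL_PATTERN.fullmatch(label) is not None


print(is_valid_label("MILC_COARSE_ENSEMBLE"))
print(is_valid_label("MILC COARSE ENSEMBLE"))
```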