*************************************************************************
=======================================================================
Status of MDWG discussions and remaining work
-----------------------------------------------------------------------
MDWG discussions have concentrated mainly on QCDML. We have almost completed the QCDML proposal. Remaining tasks for QCDML are
 a) listing commonly used lattice actions and making a hierarchical tree of actions;
 b) writing a schema reflecting the hierarchical tree.
Discussions on the binary format and the file (packing) format are postponed. We plan to complete everything, including the binary/file format, before Lattice 2004.

We have an issue of principle that would be better discussed community-wide. QCDML has a tag in ensemble XML. Some say that a reference (published article/preprint) is required for an ensemble/configuration to be submitted to the ILDG. A community decision is desired.
*************************************************************************

Category
 A. Ensemble vs. Configuration
 B. Management (issues to be discussed with MWWG)
 C. Physics
 D. Misc

=======================================================================
A-1. Are algorithm, machine, code and precision properties of an ensemble?
-----------------------------------------------------------------------
(yoshie,Apr13) Algorithm and precision of calculations are properties of an ensemble. Machine and code are properties of configurations.

(dirk,Apr13) Machine, code and precision should be properties of the configuration. It should be possible to generate two configurations of the same ensemble using two different (but otherwise possibly identical) machines. Also, code modifications should be allowed within one ensemble as long as the new program version in principle allows reproduction of old configurations. For consistency reasons I would put precision into the implementation section (as it essentially depends on either the machine, the program or compile-time parameters).
Concerning algorithms, I would like to suggest moving the parameters section into the configuration metadata, as minor adjustments of parameters like the step size or the maximum solver residuum should be allowed.

(yoshie,Apr15) I agree that machine and code are properties of configurations, for the same reason Dirk pointed out. For precision and algorithm, let me consider the following.

1) Some lattice QCD researchers think that hybrid Monte Carlo has to be carried out in double precision, in particular when one calculates the Hamiltonian for the accept/reject step. They might want to retrieve only configurations generated in double precision. If we place precision in ensemble xml, it will benefit them. If precision is placed in configuration xml, they have to develop a search engine that can filter out configurations generated in single precision. This is possible in principle, but the engine has to handle both ensemble and configuration xml, and the cost of developing such an engine will be higher. Therefore, I strongly suggest that precision be placed in ensemble xml.

2) The auto-correlation time depends on the algorithm and its parameters, such as the step size of HMC. I hesitate to handle sets of configurations with different auto-correlation times on the same basis. Therefore I think that the algorithm is better placed in ensemble xml. One possibility is to divide the algorithm section into two parts, one containing parameters which affect auto-correlation and one containing other information (such as the solver residue), and to place the former in ensemble xml and the latter in configuration xml. We have already agreed that the set of parameters to be marked up in the algorithm section is group dependent. In this situation, we (the LQCD community) may have to agree that one places configurations with (approximately) the same auto-correlation time in one ensemble.

(chris apr 15) I think I agree with Tomoteru here. Code and Machine are clearly properties of the configuration.
What happens to the algorithmic parameters, including precision? I think we should be guided by the physics. If you change them, you could change the auto-correlations. Thus the configurations, strictly speaking, are not in the same ensemble. So they should live in the ensemble namespace. A particular researcher can choose to ignore this effect and group ensembles that differ in this way as really being one ensemble with algorithmic tweaks, but that is up to them. A search of the ensemble database would reveal which ones have different "physics" parameters and which ones have different "algorithmic" parameters.

(chip apr 15) All of the metadata for both ensembles and configurations will be stored in a single place (virtually single, with components for each collaboration). So creating a query which specifies that the precision is double will be possible whether that piece of metadata is in the ensemble or the configuration. At worst, this will be a two-step process: (1) find the relevant ensembles; (2) find the subset of configurations in those ensembles which are double precision. I think this could be done in a single step. The query might be more difficult if you need to probe both ensemble and configuration, but tools may eventually hide that complexity.

(yoshie,Apr16) Thanks, Chip, for the input from MWWG. I understand that a tool handling both ensemble and configuration can be developed. I still think that precision is a property of the ensemble from the viewpoint of physics. Are all of you happy with the conclusion below?
 1) machine and code are placed in configuration
 2) algorithm and precision are placed in ensemble

(dirk,Apr19) I agree.

(balint,Apr21) I believe precision is really a property of the configuration. Algorithm can be in ensemble. I do know for sure from experience that we have had cases where we had to switch up to double precision and tune algorithmic parameters during a run (although maybe this is something I shouldn't be saying).
This can affect autocorrelations, sure. Did we throw away the previous configs? Of course not. Did we consider everything to be in the same ensemble? Yes we did. Hands up all those who didn't do such things in their previous lives.

(yoshie,Apr27) I learned from the recent discussions that there are many viewpoints and ways of thinking. Shall we leave the discussion at that and ask for community opinions?

(dirk,Apr27) I do not think that this is a good idea, since the "community" is likely to give rather few responses and is definitely not likely to reduce the number of viewpoints. It should clearly be foreseen that any user can select only configurations generated using double precision numbers. As Chip pointed out, this is a solvable problem. It becomes slightly more difficult if a user insists that the step size does not change (I personally would not worry about changes within a reasonable range). But it still remains possible to make an appropriate selection. As there are different opinions within the WG, I think we can safely assume that within the community there will be different opinions, too. In such a situation I would prefer the more flexible solution, i.e. precision and possibly algorithm parameters in the configuration.

(jim Apr27) The example in (QCDML 265) splits the algorithm into a part belonging to the ensemble and an implementation part containing machine, code and precision, which should be properties of individual configurations. The (QCDML 265) example fixes the algorithm and its parameters for all configurations. One could imagine variations such as keeping the algorithm type in the ensemble section but making algorithm parameters a property of the configurations. How much flexibility do collaborations desire in tweaking algorithm parameters within the same ensemble?

(Chris Apr28) I think the MDWG should decide. I can see the case either way. I think this is what is meant by "abdicating responsibility" ;^)

(yoshie,Apr29) Yes, we have to make our proposal anyway.
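Chip's two-step selection (first the ensembles, then the double-precision configurations inside them) can be sketched in a few lines of Python. The record types and field names below (Ensemble, Configuration, algorithm, precision, sequence) are hypothetical stand-ins for ensemble and configuration metadata, not QCDML element names; the point is only that precision stored per configuration remains easy to filter on.

```python
from dataclasses import dataclass

# Hypothetical, simplified metadata records (not QCDML element names).
@dataclass
class Ensemble:
    ensemble_id: str
    algorithm: str

@dataclass
class Configuration:
    ensemble_id: str
    sequence: int
    precision: str  # e.g. "single" or "double"

def select_configs(ensembles, configs, algorithm, precision):
    """Step 1: ensembles matching the ensemble-level criterion.
    Step 2: configurations in those ensembles matching the
    configuration-level criterion."""
    wanted = {e.ensemble_id for e in ensembles if e.algorithm == algorithm}
    return [c for c in configs
            if c.ensemble_id in wanted and c.precision == precision]

ensembles = [Ensemble("ens1", "HMC"), Ensemble("ens2", "PHMC")]
configs = [
    Configuration("ens1", 10, "single"),
    Configuration("ens1", 20, "double"),  # run switched to double precision
    Configuration("ens2", 10, "double"),
]
selected = select_configs(ensembles, configs, "HMC", "double")
print([(c.ensemble_id, c.sequence) for c in selected])  # [('ens1', 20)]
```

A catalog could equally express both steps as one joined query; the split above only mirrors the "at worst, two steps" description.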
I compromise on precision: let us place precision in configuration xml, because it is nevertheless possible to select configurations using the tag. For the algorithm, we may split the algorithmic parameters into those belonging to the ensemble and those belonging to the configuration. Because the parameter name/value pairs depend on the contributor (and also on the algorithm), how to split them is in general contributor dependent. Do you agree on this point? Another issue is that it is difficult to select configurations from one ensemble using algorithmic parameters. If we agree to split algorithmic parameters, do you think some guideline is necessary?

====================================================================
B-1. A mechanism to add/remove configurations to/from ensemble.
--------------------------------------------------------------------
(yoshie,Apr13) and in section would be sufficient for this purpose.

(Chris,April15) Sorry, I am slightly confused. Are you suggesting that we leave the management section as is, or modify it?

(yoshie,Apr16) Will you look at the example markup I circulated in "qcdml 291"? (Let me call this QCDML0.3.) Compared with Chris's QCDML0.2, I propose to add a subsection in . This is the way we employed in QCDML draft 4.0.

(Chris,Apr19) OK with me.

(Balint,Apr20) (referring to QCDML0.31) I personally find having a "submitData" and "" a little non-uniform. We should either have and then another , or two

(yoshie,Apr21) I see. Because we may have to withdraw configurations, I prefer a series of rather than a series of .

(Balint, Apr21) I refer mostly to the non-uniformity of style. I would be happy with ... ... ... as well as with a series of . Since this is going to be a growing list, it may be useful to mark it up with s, e.g.: ... ... ...

(yoshie,Apr27) I agree with the ... style. I think ... pairs may not be necessary, because there are no other tags just under .

(jim Apr27) ... is good. I also like the subfield proposal ... from (qcdml 313).

(yoshie,Apr29) We have almost agreed.
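If a withdrawal is recorded as a negative count, the net number of configurations in an ensemble can be recomputed from the submission history rather than stored. A minimal Python sketch using the standard library's ElementTree, assuming the element nesting implied by the path /stuff/archiveHistory/elem/action/numberConfs from the B-2 discussion (all other elements, such as the participant, are omitted):

```python
import xml.etree.ElementTree as ET

# Illustrative archive history; a withdrawal is a negative numberConfs.
doc = """<stuff>
  <archiveHistory>
    <elem><action><numberConfs>50</numberConfs></action></elem>
    <elem><action><numberConfs>40</numberConfs></action></elem>
    <elem><action><numberConfs>-5</numberConfs></action></elem>
  </archiveHistory>
</stuff>"""

def current_total(xml_text):
    """Net configuration count: sum of submissions (positive) and
    withdrawals (negative) over the whole archive history."""
    root = ET.fromstring(xml_text)
    # ElementTree paths are relative to the root element <stuff>.
    counts = root.findall("archiveHistory/elem/action/numberConfs")
    return sum(int(n.text) for n in counts)

print(current_total(doc))  # 85
```

This is the same arithmetic as the XPath 1.0 expression sum(/stuff/archiveHistory/elem/action/numberConfs).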
(Balint, Apr29) I agree with (yoshie,Apr27) and (jim,Apr27).

====================================================================
B-2. Number of configurations stored in an ensemble
--------------------------------------------------------------------
(yoshie,Apr13) We probably need an element like .

(Chris,Apr15) Good idea.

(Balint,Apr20) (referring to QCDML0.31) These I find confusing. It seems to suggest that 500 configs are planned, 430 of which were initially submitted and 70 of which were subsequently submitted. Presumably this is therefore a growable list. It may make the first tag "totalNumberConfs" redundant. This can be calculated using XPath functions in an XQuery or in an XSLT transformation.

(yoshie,Apr21) I suppose that such an XQuery can handle negative numbers (which will be necessary when we withdraw configs). If yes, I agree to remove .

(Balint, Apr 21) Consider this simple XML snippet:

    50 40 -5

which is similar in spirit to the above but with the participant etc. removed for speed of typing. The XPath query

    sum(/stuff/archiveHistory/elem/action/numberConfs)

returns 85. Below is the actual output from a program:

    [bj@blantons examples]$ ./basic_xpath_reader_test nconf.xml
    Reading via nconf.xml via the file interface...
    Reader Open Complete
    Enter XPATH expression (exit or quit to finish)
    sum(/stuff/archiveHistory/elem/action/numberConfs)
    ======== QUERY RESULT IS A NUMBER =========
    ===== Value is: 85
    Enter XPATH expression (exit or quit to finish)

(yoshie,Apr27) OK, let us remove "totalNumberConfs".

(yoshie,Apr29) We have almost agreed to remove "totalNumberConfs". An issue raised by Jim, "Should a dependency list be added to QCDML?", is moved to a new item B-5.

(balint,Apr29) I agree with (yoshie,Apr27).

====================================================================
B-3. Is in a free text?
--------------------------------------------------------------------
(Balint,Apr20) As "free text" this is meaningless for any kind of database search, and is only useful for display (unless it becomes a strict enumeration).

(yoshie,Apr21) I suggest keeping as free text. Although it is used only for display, displaying a list of projects for a collaboration is very useful, because researchers can select one from the list.

(Balint,Apr21) OK. So the tag content can be useful as, say, the name of a hyperlink.

(yoshie,Apr27) How about this: a hyperlink is highly recommended; free text is allowed.

(dirk,Apr27) I do not think that we should recommend a hyperlink, since links might become out of date. I would generally assume that this element contains human-readable information. We should, however, recommend that the information makes sense to people outside the project, e.g. "NF2+1-Clover" is OK, "MUHS56X" is bad. I do not care whether this human-understandable part is a substring of a URL, as in Tomoteru's example "QCDML0.32".

(jim,Apr27) Project names are likely to be free text. They are useful for humans to view when browsing the metadata. While browsing, project content can be a useful search term for finding related ensembles, e.g. sets of ensembles having different choices of gauge coupling and quark masses at fixed lattice spacing.

(yoshie,Apr29) Then we conclude that is "understandable free text". Balint, do you agree?

(balint,Apr29) I agree. The hyperlink idea was not meant as a requirement, but as a possibility of what an HTML rendering engine could maybe do with the tag. Sorry if this point was not clearly made.

====================================================================
B-4. How to deal with if no preprint is published
--------------------------------------------------------------------
(Balint,Apr20) What if we haven't written a preprint yet?
(yoshie,Apr21) I added (required) in QCDML0.31, because I believe that configurations made public to the ILDG should be supported by a written article.

(Balint,Apr21) OK. This is then a matter of policy. I am thinking here of how we would work. We'd generate our configs, mark them up and put them on the Grid straight away, as in our case the data grid is our "big disk". However, the data cannot become "publicly available" until a paper (or at least a preprint) is written. Somehow, then, we may need to distinguish between public and non-public configs; this of course may not be in the remit of the ILDG MDWG.

(yoshie,Apr27) Yes, this is really an issue of principle to be discussed beyond the MDWG.

(jim Apr27) The reference should resolve to a known URI when an ensemble is published. The URI could point to a web page with a list of references, updated as more published papers become available.

(yoshie,Apr29) Can we conclude that any of the following is allowed?
 a reference to an article (Phys.Rev....)
 a URI to an article (http://xxx.lanl.gov....)
 a URI to a list of articles (http://www.rccp.tsukuba.ac.jp/publist_for_Nf3_project)

(Balint,Apr29) I am happy to have a URI. Can I ask for a compromise: if the reference is to an article, the URI should point to a web page where this article is described, either a private web page or, say, the Phys.Rev. web page for the article? That way we have uniformity: the tag is a URI.

(yoshie,Apr30) Then: a URI (publisher's, archive or private) to an article, or a URI to a list of articles.

====================================================================
B-5. Should a dependency list be added to QCDML?
--------------------------------------------------------------------
(jim, Apr27) The Fermilab (meta)database will track the dependency tree (forest?) of individual configurations.
We envision obtaining the parent dependency by requiring a user to also specify the starting (parent) configuration identity when adding a new configuration to the database. The example in (QCDML 265) contains such a dependency section under . The current QCDML scheme of assigning a series and sequence number to a configuration does not track dependencies among configuration series. Should a dependency list be added to QCDML? How should it be encoded?

(Chris, Apr28) Nearly two years ago I chose series and sequence as a way of identifying the configuration in the old UKQCD schema. Clearly this is not the only way of doing it. I don't have a strong preference, and clearly it has to satisfy everyone's design requirements, but we could have discussed this before.

\begin{rant}
I find it rather frustrating that we never seem to get to the end of the design. I don't mean to have a rant at Jim in particular; we have all been guilty of doing this ;^(. It does mean that this process takes a very long time. I am/was conscious of making design decisions that had consequences downstream that I didn't necessarily understand. Hence the need to make things extensible, so that we can change the schema in the future without breaking the current version, e.g. old XML IDs are still valid against the new one. We will only find out how well our schema works once we start using it. We will have to change the schema, but hopefully it will be extensible. Even if it isn't, we can still change things, but it is more painful. What I am trying to say is that the schema will evolve, but we need to have a starting point and start using it. Preferably now.
\end{rant}

Sorry folks, but I think it needed saying. ;^( Back to the specific. I don't have any attachment to series and sequence, but if we are going to change it, can we have a proposal from SciDAC quickly please?

(yoshie,Apr29) A dependency at the configuration level is interesting, but I think it is difficult to realize in actual cases.
A collaboration may lose a predecessor due to an accident. This really happened in the CP-PACS collaboration. I think and are sufficient for us to trace the history of HMC etc.

(balint,Apr29) While dependency lists are potentially useful, they can create a design and maintenance headache. I would go along with (yoshie,Apr29) here, with maybe a "to be added later" on the dependency list issue.

(jim,Apr29) Yoshie observes that lattice QCD data may be lost by accident or be transitory, like a valence heavy-quark propagator. It's important to note, however, that the corresponding metadata is permanent. Once a metadata GFN is assigned, it can never be reused for something else. Another way of saying this is that having zero replicas of the LQCD data is a valid state. Now, a straw man proposal for tracking configuration provenance:

    www.ph.ed.ac.uk/ensemble1 Alpha-B 10000

I am willing to defer discussion of this issue to a later version if the committee so desires.

(yoshie,Apr30) I propose to put this issue on the list of "to be discussed for a future version of QCDML".

====================================================================
C-1. To what level do we group actions?
     (e.g. DBW2, IwasakiRG, ... vs. sixLinkGaugeAction)
     (e.g. npWilson, tpWilson, 1looptpWilson ....)
--------------------------------------------------------------------
(yoshie,Apr13) I think that this is the most important one, a headache, and still open for discussion. My opinion is the same as written in QCDML draft4.0. Note that the SciDAC proposal is different: it uses sixLinkGaugeAction accompanied by a label or annotation.

(dirk,Apr13) In the old proposal, actions were by construction grouped by structure. I would like to preserve this and therefore favour the SciDAC proposal.

(yoshie,Apr15) Let me consider use cases. A search engine will first list the names of actions stored in the ensemble metadata catalog. The user will then select one of the actions. Therefore the name of the action has to be familiar to researchers.
In other words, it should be a community standard. I suppose that sixLinkGaugeAction is not a standard, and users want to search in terms of Symanzik, DBW2, IwasakiRG ... These standard names should be easy for a search engine to list. Thus I propose a rule for marking up actions.
 1) The xpath "physics/action/gluon/*" or "physics/action/quark/*" is always a (standard) name of an action.
Grouping actions may benefit us. I propose the following.
 2) We group actions by mathematical structure and place the name of the group under the name of the action (see below for an example). We guarantee that a parameter name has the same meaning within a group of actions, e.g. is always the coefficient of the rectangular loop.
Another issue is to list the names of parameters for a particular action. I add an item to be discussed, "C-3. How to describe parameters of action", below. I show an example to demonstrate what I'm thinking.

    3 www.rccp.tsukuba.ac.jp/ildg/iwasakiRG.xml linkGluon 2.3 3.648 -0.331
    www.rccp.tsukuba.ac.jp/ildg/npClover.xml wilsonQuark linkGluon 2 0.1350 2.01752
    www.rccp.tsukuba.ac.jp/ildg/npClover.xml wilsonQuark linkGluon 1 0.1330 2.01752

(Chris Apr15) This is what I have been trying to do with the markup so far using substitution groups, so that actions have a gluonAction part. A sixLinkGlounAction is a specific sort of gluonAction, and iwasakiRGSixLinkGlounAction is a special case of general sixLinkGlounActions. The naming convention is rather clumsy, but that's only because XPath doesn't support substitution groups. XPath 2, out later this year, does, so we could go for a less clumsy naming convention. Thus you can search for sixLinkGaugeActions and would get back (as an example) iwasakiRGGluonAction, because it is an instance of a sixLinkGaugeAction. This object orientation is very useful. It also keeps the XML IDs rather simple, and we don't get deeply nested XML elements, which keeps XPath simple. If we stick with what's in QCDML0.2, this will work.
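A schema-aware catalog can answer "give me all sixLinkGaugeActions" by walking the inheritance links that substitution groups define, while instance documents carry only the concrete action name. The Python sketch below hard-codes such links in a hypothetical PARENT table (a real tool would extract them from the schema); the names cloverAction, wilsonQuarkAction and quarkAction, and the ensemble-to-action mapping, are illustrative assumptions.

```python
# Hypothetical inheritance links (action -> head of its substitution group).
# In practice a schema-aware tool would read these from the QCDML schema.
PARENT = {
    "sixLinkGaugeAction": "gluonAction",
    "iwasakiRGAction": "sixLinkGaugeAction",
    "DBW2Action": "sixLinkGaugeAction",
    "wilsonQuarkAction": "quarkAction",
    "cloverAction": "wilsonQuarkAction",
    "npCloverAction": "cloverAction",
}

def ancestors(name):
    """Return every group that `name` belongs to, nearest first."""
    chain = []
    while name in PARENT:
        name = PARENT[name]
        chain.append(name)
    return chain

def matches(action_tag, group):
    """True if an action element named action_tag is `group` or derives from it."""
    return action_tag == group or group in ancestors(action_tag)

def members(group):
    """All known action names deriving (directly or indirectly) from group."""
    return sorted(name for name in PARENT if group in ancestors(name))

# Instance documents carry only the concrete (community-standard) name.
ensembles = {
    "ensemble1": "iwasakiRGAction",
    "ensemble2": "DBW2Action",
    "ensemble3": "npCloverAction",
}
hits = [e for e, action in ensembles.items() if matches(action, "sixLinkGaugeAction")]
print(hits)                          # ['ensemble1', 'ensemble2']
print(members("wilsonQuarkAction"))  # ['cloverAction', 'npCloverAction']
```

Because the walk follows every level, "NP clover is a clover action, which is a Wilson quark action" is captured without writing any group name into the instance document.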
(yoshie,Apr16) I understand well that the hierarchical structure of actions is realized at the schema level and that we have moved operators (used to realize the hierarchy in the old version) to the glossary. These are already agreed. It is a different issue to design a search-oriented QCDML. A search engine does not have to refer to the schema. (The search engine developed at Tsukuba (http://lqa.rccp.tsukuba.ac.jp) does not refer to the schema.) I'm also reluctant to design QCDML depending on a future technology like XPath 2. I think it is a good idea to reflect (a part of) the structure of actions at the XML document level, because the flexibility of searches will be increased; this led to the proposal in my message (yoshie,Apr15) above. A search engine (which does not refer to the schema) can list both the names of actions (DBW2, iwasakiRG, ...) and the names of action groups (sixLinkGaugeAction) if necessary. In this context, what I did was to add the action group name to Chris's QCDML0.2. I think that the change does not spoil any ideas we have already agreed on. Other changes:
 1) parameters of an action are moved to a subsection (this is discussed in item C-3)
 2) an action is split into and (I create a new item to be discussed, C-4. Do we split action into gluon and quark?)
 3) the description of boundary conditions is moved to a section (I create a new item to be discussed, C-5. How to describe fields and boundary conditions)

(Chris,Apr19)
 1) Sorry, I really did think we had agreed on most of this stuff, but obviously we haven't! OK. Thoughts on this. We can go this way, and this was the original motivation for having the operator structure in the XML ID. However, SciDAC and others wanted a much simpler structure in the XML ID, so we came up with the new method, which keeps the structure in the XML schema so that the IDs are kept simple, to simplify the XPath query. In the past I have argued for the former, that structure is good, but we have moved away from that.
This type of labelling of the type or class of action is simple, but doesn't allow us to inherit properties, i.e. that NP clover is a clover action, which is a Wilson quark action, or even that a non-deg Wilson is also a Wilson, because it's just a flat label. This mechanism for labelling the type or class of action is not very extensible. I think we either have all the extensible structure in the XML ID (which some people don't like) or we use the OO ideas of inheritance in the schema.

(dirk,Apr19) I also thought that we agreed on using substitution groups to inherit the structures of actions. I think that it is a smart way of keeping the schema simple and extensible. One of my reasons for insisting on grouping actions by structure is that in many cases actions which share the same structure but differ in the way the couplings/parameters have been determined should be considered identical actions. This, e.g., applies to Tomoteru's examples "tpWilson" and "1looptpWilson", since otherwise we will end up with actions like "John's-favourite-tp-Wilson". If somebody is looking for a set of configurations, he should know which parameter values he is looking for. Even in the NP case you might not want to use configurations where improvement coefficients have been determined by different groups and where these coefficients therefore differ due to different improvement conditions or statistical errors.

(yoshie,Apr20) I have already agreed on substitution groups for the inheritance structure of actions since just after ILDGWS3. However, this does NOT mean that any structure of actions should be avoided at the XML ID level. I have raised this issue again because there are different opinions on the naming of actions. I prefer using DBW2, iwasakiRG, Symanzik etc., while SciDAC and Dirk prefer sixLinkAction. My proposal is a compromise. There are three types of markup:
 a) ..... (QCDML0.2, inheritance realized by schema with a long name of action)
 b) .....
    (inheritance realized by schema, but a tiny part of the action structure is explicitly written in the XML ID)
 c) iwasakiRG ..... (my guess at SciDAC's and Dirk's proposal)
I prefer b). If you disagree, I will compromise on
 b') ..... (not iwasakiRGSixLinkGlounAction).
I think the name of an action should be a community standard which often appears in references. I'd like to have input from SciDAC.

(Balint,Apr21) In this case the iwasakiRG action is a derivation of sixLinkGaugeAction (by specialising the meaning of parameters). The derivation is handled through the substitution group, but current XPath can't cope with that. Hence Tomoteru wants to put an explicit pointer to the "base class" by adding in the . Two uses arise:
 i) The user wants to find all sixLinkGaugeActions. In this case adding a tag into the Iwasaki action is not helpful, as it is nested one level deeper than the Iwasaki action tag. One could add the sixLinkGaugeAction as an attribute, i.e.: at least then the "label" is at the same level as iwasakiRGAction. However, since sixLinkGaugeAction is probably the name of a substitution group, one would probably want the type to be an enumeration, e.g.: sixLinkGaugeActionType. I should add that checking the extra tag/attribute is at least as much work as doing a Boolean OR of the desired type names (which is what was on the cards until support for substitution groups came along). I also note that SciDAC doesn't seem to like attributes on tags.
 ii) The user wants to find all iwasakiActions. In this case matching iwasakiRGAction is straightforward.
To make this more concrete, allow me to add a snippet of XSLT that we used to render Chris's prototype actions for ILDG3:
For now I see no simple alternative but to define a fixed set of such actions. Note that the rendering, or in this case the XSL (or later the database for queries), does not depend on the schema, but recall that one of the purposes of the schema is standardisation.
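The Boolean OR over a fixed set of action names can be written as a single XPath 1.0 expression using the self:: axis in a predicate. The small Python helper below only builds the query string; the base path /markovChain/physics/action/gluon and the chosen action names are assumptions for illustration, and the resulting expression can be handed to any XPath 1.0 processor.

```python
def or_query(base_path, action_names):
    """Return an XPath 1.0 expression selecting every child of base_path
    whose element name is one of action_names."""
    if not action_names:
        raise ValueError("need at least one action name")
    # XPath 1.0 knows nothing about substitution groups, so the member
    # names are enumerated explicitly and combined with 'or'.
    tests = " or ".join(f"self::{name}" for name in action_names)
    return f"{base_path}/*[{tests}]"

# Hypothetical fixed set standing in for "all sixLinkGaugeActions".
SIX_LINK_ACTIONS = ["iwasakiRGAction", "DBW2Action"]

print(or_query("/markovChain/physics/action/gluon", SIX_LINK_ACTIONS))
# /markovChain/physics/action/gluon/*[self::iwasakiRGAction or self::DBW2Action]
```

The fixed list has to be maintained by hand whenever a new action is added, which is the cost that schema-aware (XPath 2.0) querying would remove.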
In the future, when XPath 2.0 comes along, all these things will depend on the schema (since XPath 2.0 depends on it).

(yoshie,Apr27) How about one more level of nested markup?
    = ensemble1 =
    = ensemble2 =
I suppose one can do the following:
 a) search for all iwasakiRGAction
 b) search for all sixLinkGaugeAction
 c) list all actions (iwasakiRGAction and DBW2Action) belonging to sixLinkGaugeAction

(jim Apr27) Having separate derived classes for the iwasakiRG and DBW2 actions seems an unnecessary schema complication, since they are both the same six-link action with different choices for setting numerical values for the couplings. The same comment also applies to the family of Wilson/Clover quark actions. If the MDWG accepts the proposal to label actions by their commonly used names, I have the following comments. The hierarchy of action types need not be made explicit in the ensemble XML instance documents. Encoding such hierarchies is the schema's job. The metadata catalog *must* be fully schema aware, so that the query "give me all ensembles with a sixLinkGaugeAction action" returns all instances of IwasakiRG, DBW2 and ... actions. Any workarounds for XPath 1.0 query deficiencies should be addressed at the level of the database design. The present proposal's encoding of an action's supertype in the XML is incomplete, since it makes explicit only one higher level of an action hierarchy.

(Chris,Apr 28) I think I agree with Jim. The present proposal has a flat structure, and so isn't very extensible. We have the full structure in the schema. Certainly UKQCD's metadata engine reads the schema before trying to read XML IDs, so I would have thought the structure in the schema is sufficient. Before, we did have structure in the XML ID as well, but lots of people didn't like this! Bearing in mind my earlier rant, I don't want to prolong things. I am a bit nervous about the extensibility, but at this point I would rather we had a schema agreed than discuss for much longer.
In conclusion: I am not sure this is a good idea, but seeing as we have structure in the schema and I would like discussions to conclude, I will reluctantly accept this rather than keep discussing!

(yoshie,Apr29) Almost all of you do not like . I proposed this as a compromise. Some want to use , others want to use , .... OK, I withdraw my proposal. I still prefer the , ... strategy. If someone wants to list all actions belonging to sixLinkGaugeAction, one has to refer to the schema. That is OK with me. Let me note that how to group actions (at the schema level) is to be discussed later. A category of "rectangularLoopAction" is also possible.

(Balint,Apr29) I agree with Chris and Jim. Schema awareness in search engines is a good thing. Until the deficiencies of XPath are fixed, simple Boolean ORs on the names should suffice, as explained in my earlier example. I note that I think our current search engine is schema aware, but only for constructing the query. Running the query itself currently requires no schema awareness, but will in the future with XPath 2. I really don't think this issue holds us back.

(yoshie,Apr30) We have almost agreed on "no action hierarchy in XML ID". Do all of you agree that we use
 iwasakiRG, DBW2 ... rather than sixLinkAction
 npClover, tpClover, 1loopClover ... rather than Clover?

====================================================================
C-2. How to describe non-degenerate quark action
--------------------------------------------------------------------
(yoshie,Apr13) I believe that grouping action and physics parameters is the only realistic solution.

(Chris, Apr15) Er ... I thought we had agreed on this one.
    2
This is for degenerate and for non-degenerate
    2 1
That is, numberOfQuarks is not an integer but a 1d array of integers.
    ... later on
    0.135 0.1386
I haven't really thought about the names, but you can XPath query for the second instance of an element, so they could both be kappa. Maybe we don't want that.
One could discover how many instances there would be by counting how many elems there were in numberOfQuarks. This seems to be a very good solution to me, and I thought people were happy with this!

(yoshie,Apr16) I don't think we have agreed. I raised this issue again in qcdml 257 (yoshie) and proposed that we adopt the idea we employed in QCDML draft4.0. That is:
    We classify flavors into a group if the action and coupling parameters are the same, and mark up for each group.
We got a (partially) affirmative answer from Chris (qcdml 258) and an affirmative answer from Dirk (qcdml 259). The idea is not realized in Chris's proposal above (which is the same as in SciDAC-QCDML.ps, circulated just after ILDG3). Let me split the discussion into two cases:
 1) the quark action is the same for all flavors, but the quarks are non-degenerate;
 2) the quark actions are different for each flavor.
An example of case 2) is that
    u,d quarks are simulated with the twisted mass quark action
    the s quark is simulated with the np clover quark action
In this case, I suppose everyone agrees on markup like
    2 www.../twistedMassQuarkAction.xml .......
    1 www.../npCloverWilsonQuarkActionn.xml .......
(I skip here a discussion of 1 versus 1.)
What we agreed in QCDML draft4.0 is that the same principle should also be applied to case 1), like
    2 www.../npCloverWilsonQuarkActionn.xml .......
    1 www.../npCloverWilsonQuarkActionn.xml .......
(This is the case for Nf=2+1.)
A serious defect in SciDAC-QCDML.ps is that we would have to prepare a lot of action names, markup patterns (and glossaries) to describe general cases:
for the degenerate (Nf=3) case we would have
    3 www.../npCloverWilsonQuarkActionn.xml .......
for the Nf=2+1 case,
    2 1 www.../nonDegNPCloverWilsonQuarkActionn.xml .......
for the Nf=1+1+1 case,
    1 1 1 www.../threeNonDegNPCloverWilsonQuarkActionn.xml .......
We have to prepare 5 patterns for 4 flavor cases.
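The count of 5 follows if every degeneracy pattern needs its own action name: the patterns for Nf flavors are the integer partitions of Nf, which gives 3 for Nf=3 (3, 2+1, 1+1+1, as listed above) and 5 for Nf=4. A short Python sketch enumerating them, assuming (as in the argument above) that only the multiplicities of degenerate groups matter:

```python
def partitions(n, max_part=None):
    """Yield the integer partitions of n as non-increasing tuples,
    e.g. (2, 1) for Nf = 2+1."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    # Choose the largest group first, then partition the remainder
    # into groups no larger than it.
    for first in range(min(n, max_part), 0, -1):
        for rest in partitions(n - first, first):
            yield (first,) + rest

for nf in (3, 4):
    patterns = ["+".join(map(str, p)) for p in partitions(nf)]
    print(nf, len(patterns), patterns)
# 3 3 ['3', '2+1', '1+1+1']
# 4 5 ['4', '3+1', '2+2', '2+1+1', '1+1+1+1']
```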
Of course, simulations with Nf=1+1+1 or 4 flavors will not appear in the near future, but I think QCDML has to be designed so that it withstands any possible future use. I again propose to go back to the method we employed in QCDML draft4.0.

(Chris Apr19) Sorry Tomoteru, I wasn't claiming that we shouldn't discuss stuff because it was "agreed", but we have had discussions with deadlines which passed, so I had assumed that this meant the issue was closed. OK, so it's not! I think the issue of the number of flavours follows my previous comment. We need structure for extensibility. So it either goes in the XML ID, which is then quite verbose, or it goes in the schema with simple XML IDs. People have argued for the latter. The way I have proposed doing nonDeg actions means that, yes, there are more extension classes, but why is that a defect? It really only takes me 5 minutes to make a new one. We could even ditch the nonDeg label; then there is no need for new classes. Maybe this is not a good idea, but one could find out by searching for how many elems there are. Obviously if the action is different then one has to have another action piece. Looking over the examples, there would be no need to differentiate between 2+1 and 1+1+1 in the action name. We haven't called the first one 2+1, but that can be discovered by examining the number-of-quarks array. Bottom line on this one: I am not averse to verbose XML! I have argued for it in the past! If we use Tomoteru's proposal the XML documents are longer; others have argued against that, so perhaps they should comment.

(yoshie,Apr20) I thought we agreed on
    We classify flavors into a group if the action and coupling parameters are the same, and mark up each group.
See (qcdml 257), (qcdml 258), (qcdml 259). As far as I remember, no objection to this was circulated. I'm not complaining. Anyway, we have to compromise. How about this for the non-degenerate case with the same action?
    2 1 www.rccp.tsukuba.ac.jp/ildg/npClover.xml 0.1350 2.01752 0.1340 2.01752
The points are
 a) we do not prepare nonDegNPCloverWilsonQuarkAction
 b) we group flavors if all couplings are the same and place them under tags.
I would like to avoid the situation that for the degenerate case, for non-degenerate cases. I would be happy if the hopping parameter were always referred to as . Please ignore here issues to be discussed elsewhere, e.g. whether we use ...

(Chris,Apr20) I think this is a very good idea.

(Balint,Apr20) (referring to QCDML0.31) This part has been causing much debate. Personally I'd probably mark it up as follows:
    0.1350 2.01752
    0.1350 2.01752
    0.1340 2.01752
This last one is a little more verbose. Note that the total number of flavors is easily obtained by the XPath query
    count(/markovChain/physics/action/quark/npCloverAction/parameters/elem)
It also allows expression of an arbitrary number of flavours with no worries about the equivalence of a 2+1 and a 1+1+1 simulation (where, in the second, the first two flavours happen to be degenerate).

(yoshie,Apr21) Balint's proposal is interesting. Two questions.
 1) Does the XPath query
        count(/markovChain/physics/action/quark/*/parameters/elem)
    work? I'm asking because the total number of quarks is the most important information and should be easy for a search engine to count before listing the actions. It will also be necessary when we use different actions for different flavors.
 2) "Degenerate or non-degenerate" is also important. When we want to make a list like
        #quark  kappa   cSW
        2       0.1350  2.01752
        1       0.1340  2.01752
    from Balint's markup, a search engine has to do some matching work. Do you agree that this is a task of a search engine?
If yes for both 1) and 2), I agree to drop numberOfQuarks and numberOfSeaQuarks.
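Question 1) asks whether a wildcard step counts flavors across different quark actions. As a minimal sketch, the count() query can be mimicked with Python's standard xml.etree.ElementTree, which supports a limited XPath subset (child steps and `*`, but not count() itself). The sample document is hypothetical: element names follow the paths quoted in this thread, while the twistedMassWilson parameter names (kappa, mu) and all numeric values are illustrative assumptions mirroring the u,d/s example of case 2).

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: u,d with a twisted mass action, s with an np clover
# action. Tag names follow the paths quoted in the thread; values are made up.
SAMPLE = """
<markovChain>
  <physics>
    <action>
      <quark>
        <twistedMassWilson>
          <parameters>
            <elem><kappa>0.1345</kappa><mu>0.02</mu></elem>
            <elem><kappa>0.1345</kappa><mu>0.02</mu></elem>
          </parameters>
        </twistedMassWilson>
        <npCloverAction>
          <parameters>
            <elem><kappa>0.1340</kappa><cSW>2.01752</cSW></elem>
          </parameters>
        </npCloverAction>
      </quark>
    </action>
  </physics>
</markovChain>
"""


def count_flavors(xml_text, action="*"):
    """Count <elem> entries, like the XPath
    count(/markovChain/physics/action/quark/<action>/parameters/elem).

    action="*" matches every quark action element, as in question 1).
    """
    root = ET.fromstring(xml_text.strip())  # root element is <markovChain>
    return len(root.findall(f"physics/action/quark/{action}/parameters/elem"))


if __name__ == "__main__":
    print(count_flavors(SAMPLE))                    # prints 3 (all quark actions)
    print(count_flavors(SAMPLE, "npCloverAction"))  # prints 1 (one action only)
```

The wildcard answers question 1) without any schema knowledge; a real search engine would run the equivalent XPath 1.0 count() directly.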
(Balint, Apr21) First: consider the following XML snippet:
    0.1350 0.1350 2.1 0.1350 2.0
The result of running the query count(/stuff/quark/*/parameters/elem) is 3.
Program output:
    [bj@blantons examples]$ ./basic_xpath_reader_test yoshie.xml
    Reading via yoshie.xml via the file interface...
    Reader Open Complete
    Enter XPATH expression (exit or quit to finish)
    count(/stuff/quark/*/parameters/elem)
    ======== QUERY RESULT IS A NUMBER =========
    ===== Value is: 3
    Enter XPATH expression (exit or quit to finish)
So the count() idea demonstrably works.
The issue of degenerate quarks is less trivial, for the usual reasons:
 i) Comparing floating point numbers can only be done to within some epsilon. This is already a problem when looking for an individual kappa. So the same "fuzzy" matching you use to look for an individual kappa can be used to identify "degeneracy" within that value of epsilon.
 ii) One can compare string values, but that is error prone.
 iii) Matching several tags (kappa and cSW) can be cumbersome.
Is treating, say, 2+1 and 1+1+1 as separate really meaningful? Are they not both 3-flavour simulations? What do we need the degeneracy information for, except to reduce the number of rows in a table?

(yoshie,Apr27) I understand there is no difficulty in counting the number of dynamical quarks if we employ the .. markup. Shall we adopt this markup? For the gauge action, I prefer employing the same structure. I suppose it would be necessary to distinguish 2+1, 1+1+1 and 3. That is not the point here. What we have to agree on is that distinguishing them is a task of search engines, not of QCDML.

(jim Apr27) The example in (QCDML 265) proposed recording the total number of flavors in as well as listing the number of flavors each quark action represents. As we see, totaling the number of flavors is not a problem. Then we need only record the number of flavors each instance of a quark action represents.
    ... ... // list of type abstractQuarkAction
    1 0.1333 ...
    2 0.1380 ...
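Point i) suggests reusing the epsilon-based "fuzzy" matching already needed for a single kappa to detect degeneracy. A minimal sketch, assuming Balint's layout of one elem per flavor (the document below is a hypothetical Nf=2+1 example), groups elems whose kappa and cSW agree within a chosen tolerance and builds the "#quark kappa cSW" table from question 2). The tolerance, function names and values are illustrative assumptions, not agreed QCDML behaviour.

```python
import xml.etree.ElementTree as ET

# Hypothetical Nf=2+1 markup in Balint's style: one <elem> per flavor.
SAMPLE = """
<markovChain><physics><action><quark>
  <npCloverAction><parameters>
    <elem><kappa>0.1350</kappa><cSW>2.01752</cSW></elem>
    <elem><kappa>0.1350</kappa><cSW>2.01752</cSW></elem>
    <elem><kappa>0.1340</kappa><cSW>2.01752</cSW></elem>
  </parameters></npCloverAction>
</quark></action></physics></markovChain>
"""


def degeneracy_table(xml_text, action, names=("kappa", "cSW"), eps=1e-6):
    """Group flavors whose couplings all agree within eps.

    Returns (number_of_quarks, coupling_values) rows in document order.
    Grouping is greedy first-match; tolerance matching is not transitive,
    so values near the tolerance boundary may group differently by order.
    """
    root = ET.fromstring(xml_text.strip())
    rows = []  # each row is [count, values]
    for elem in root.findall(f"physics/action/quark/{action}/parameters/elem"):
        values = tuple(float(elem.findtext(name)) for name in names)
        for row in rows:
            if all(abs(a - b) <= eps for a, b in zip(row[1], values)):
                row[0] += 1
                break
        else:  # no existing group matched: start a new one
            rows.append([1, values])
    return [(count, values) for count, values in rows]


def flavor_pattern(rows):
    """Render the degeneracy breakdown, e.g. '2+1'."""
    return "+".join(str(count) for count, _ in rows)


if __name__ == "__main__":
    rows = degeneracy_table(SAMPLE, "npCloverAction")
    print("#quark kappa   cSW")
    for count, (kappa, csw) in rows:
        print(f"{count:<6} {kappa:<7} {csw}")
    print(flavor_pattern(rows))  # prints 2+1
```

Loosening eps to 0.01 merges all three rows and the pattern becomes "3", which illustrates why the choice of epsilon is a search-engine policy rather than something QCDML can fix.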
(Chris,Apr28) Great. Is this agreed?

(yoshie,Apr29) Everyone agrees to remove the total number of quarks. The number of quarks for each action has to be discussed more. I proposed to remove the number of quarks, but I should have considered Balint's proposal in more detail. Namely, we have several cases where one parameter set describes many flavors, e.g. the KS action and the twisted mass quark action. Jim's proposal sounds good, but how to describe many quarks with WilsonClover quarks is not apparent in the above example. How about this?
    ... 2 0.01 1 0.02 ...
    2 0.1345 0.02
This is for five-flavor QCD:
    two degenerate quarks with the KS action ( det(D)^{2/4} )
    one non-degenerate quark with the KS action
    two degenerate quarks with the twisted mass action
Describing many quarks for WilsonClover quarks is done in the same spirit.
    ... 2 0.1350 2.012 1 0.1340 2.012
Counting the total number of quarks is easy, and making tables is also easy.

(Balint, Apr29) In the snippet below:
    2 0.01 1 0.02 2 0.1345 0.02
the query to count the total number of flavours is easy:
    sum(/quark//parameters/elem/numberOfFlavors)
    ======== QUERY RESULT IS A NUMBER =========
    ===== Value is: 5
Counting the number of flavours for the KSAction and twistedMassWilson individually is easy:
    sum(/quark/KSAction/parameters/elem/numberOfFlavors)
    ======== QUERY RESULT IS A NUMBER =========
    ===== Value is: 3
    sum(/quark/twistedMassWilson/parameters/elem/numberOfFlavors)
    ======== QUERY RESULT IS A NUMBER =========
    ===== Value is: 2
Obtaining the detailed breakdown, i.e. that the KSAction is 2+1, is not so easy. It requires a "for loop" over elems. This cannot be done in straight XPath, but should be doable in a query language with iteration, such as XQuery or XSLT. I therefore agree with (yoshie,Apr29).

====================================================================
C-3. How to describe parameters of action.
--------------------------------------------------------------------
(yoshie,Apr15) The parameters of an action have to be easy for a search engine to list. I propose a rule:
 1) the element selected by the XPath physics/action/gluon/*/parameters/* is always a parameter, named by its tag.
I prefer regarding the number of dynamical quarks as a parameter of an action, because the number of quarks will be displayed together with the other parameters. The total number of quarks had better be moved outside physics/action/quark. See an example in C-1.

(Chris, Apr15) Again, with the substitution group, the elements kappa, beta and mass are all couplings, so one can search for kappa, or for couplings. As for the number of quarks, see above. A general comment: the action has had lots and lots of discussion. At the last ILDG meeting I presented a possible solution, which seemed to satisfy everyone. I then refined it and posted it to this list, and again everyone seemed happy with it. I don't think it fails any of the above tests, so I think we should stay as we are.

(yoshie,Apr16) Again I consider use cases. A search engine which does not refer to the schema, and hence does not interpret substitution groups, will not be able to know that the elements kappa, beta .. are couplings. If we encapsulate the elements kappa, beta .. with (the tag could be named or anything else) like
    0.1350 2.01752
such a search engine can easily find the names of the couplings.
A comment on Chris's general comment: I understand that after ILDG3 we agreed on
 a) the method to maintain the inheritance property of lattice actions
 b) separation of ensemble and configuration xml
 c) preparation of glossary xml by contributors.
I understand that your markups in QCDML0.1-0.2 are starting points for these discussions and still have much room for improvement from the viewpoint of use cases.
Another general comment: we discussed QCDML very extensively to complete QCDML draft4.0 (which was once approved by this working group). I propose to keep alive the spirit/ideas/strategy/..
written in draft4.0. For draft4.0 we discussed items C-1, C-2, C-4 and C-5 here and reached conclusions. I bring them up again because QCDML0.2 is quite different from QCDML draft4.0, and I am afraid that QCDML0.2 spoils good ideas in draft4.0. Please note that in QCDML draft4.0 the elements kappa, beta ... were encapsulated like
    0.1340
because in this way a search engine can easily find the coupling names.

(Chris Apr19) See above. I deliberately didn't do this, so that the XPath query was kept short, as SciDAC required. I am happy with structure. SciDAC weren't, so perhaps they should comment.

(Balint Apr21) If the parameters form some kind of list (e.g. in the case of the fermion action), then it is useful for them to have some kind of outer wrapper, e.g.: Note that the tags are not strictly necessary. They could be more meaningful, e.g.: whether this is or is a matter of taste. In the case of the gauge actions we instead have just one group of params (beta, u0, c1) so far. It probably doesn't hurt to put these in a couplings/parameters tag. The difficulty I see is that in the case of fermions we naturally deal with a list of (possibly degenerate) flavours, whereas in the case of the gauge action the list nature is hidden by the naming. Someone has to give.
Possible compromise 1: is always a list (possibly ) which can contain one element. For fermions as before, and for gauge:
Possible compromise 2: Simply decide that gauge and fermion actions are fundamentally different, because a fermion action can have more than one flavour and a gauge action does not. Interpret the meaning of parameter appropriately (i.e. as a list for fermions and not as a list for gauge).

(yoshie,Apr27) As I said in C-2, I prefer the .. markup even for the gauge action.

(jim Apr27) I favor having a list of parameters with an enclosing tag for both gauge and quark actions.

(Chris Apr 28) I am happy! (Surely not ;^) No, really I am!

(Balint, Apr29) I too agree.
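The enclosing wrapper is what lets a schema-unaware engine (yoshie's use case) list coupling names from position alone. A minimal sketch with xml.etree.ElementTree, assuming compromise 1 (parameters is always a list of elem, even for the gauge action) and an explicit gluon/quark split; the action IDs iwasakiRG and npClover come from the naming discussion earlier in this document, and every numeric value is a placeholder.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; numeric values are placeholders, not physics input.
SAMPLE = """
<markovChain><physics><action>
  <gluon><iwasakiRG><parameters>
    <elem><beta>2.05</beta><c1>-0.331</c1></elem>
  </parameters></iwasakiRG></gluon>
  <quark><npClover><parameters>
    <elem><kappa>0.1350</kappa><cSW>2.01752</cSW></elem>
    <elem><kappa>0.1340</kappa><cSW>2.01752</cSW></elem>
  </parameters></npClover></quark>
</action></physics></markovChain>
"""


def coupling_names(xml_text):
    """Map 'gluon/<action>' and 'quark/<action>' to their coupling names.

    No schema is consulted: any child of parameters/elem is treated as a
    coupling, which is what the enclosing wrapper is meant to guarantee.
    """
    root = ET.fromstring(xml_text.strip())
    result = {}
    for part in ("gluon", "quark"):
        for action in root.findall(f"physics/action/{part}/*"):
            names = []
            for param in action.findall("parameters/elem/*"):
                if param.tag not in names:  # first-seen order, no duplicates
                    names.append(param.tag)
            result[f"{part}/{action.tag}"] = names
    return result


if __name__ == "__main__":
    for key, names in coupling_names(SAMPLE).items():
        print(key, names)
```

Without the wrapper, the same engine would need the schema's substitution groups to know that kappa or beta is a coupling, which is the trade-off against a shorter XPath discussed above.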
====================================================================
C-4. Do we split action into gluon and quark?
--------------------------------------------------------------------
(yoshie,Apr16) This is also a revival of earlier discussions. I want to split the lattice action explicitly into gluon and quark parts, like
    ..... .....
A search engine can then easily find which is the action for gluons and which is the action for quarks. This enables researchers to search interactively: they first reduce the number of candidate ensembles by gluon action, then reduce it further by quark action.

(Chris Apr19) Now this is something that has been agreed for quite a while!

(dirk,Apr19) I do not understand this discussion, as QCDML0.2 does split the action into a quark and a gluon part.

(yoshie,Apr20) Chris, sorry, I don't remember when and where we agreed. Um, it seems to me that you think every detail of your markup is agreed. I thought the global structure/strategy was agreed. Dirk, QCDML0.2 proposes
I propose one more level of nesting, realized at the XML ID level.

(Chris,Apr20) Ah. Sorry, I got slightly confused again. As for agreement, it's more the other way round. I don't think that everything I marked up is agreed, but rather that I marked up what I thought we had agreed :^) I don't mind the encapsulation, but SciDAC wanted a short XPath to the couplings, so I didn't put it in. I think they should comment. I had proposed a strong naming convention as a stand-in for XPath2.0, which can understand "is an X": with it, a search with XPath1 can see that npCloverWilsonQuarkAction is an Action, a Quark Action, a Wilson Quark Action, a Clover Wilson Quark Action, and an np Clover Wilson Quark Action. XPath2 should (allegedly) be able to understand this from the schema without the long names. If we want structure in the XML ID, then the following is a bit messy:
where nameOfGluonAction is also an XML-schema-defined complex type for a specific gluon action.
The element attempts to classify this information, but has no structure, making it susceptible to difficulties with extension. This combines both features, which is a bit messy. I can live with the extra encapsulation, and I think we could classify the actions this way (plaquette actions, sixLink actions, Wilson, SW (or Clover), staggered), but I am still a bit concerned about extensibility. As long as we keep the full structure somewhere (in this case the schema), I suppose we are OK.

(yoshie,Apr27) We are discussing here whether we separate gluon and quark actions explicitly with the extra tags and , nested one level deeper. I understand none of you has strong objections.

(jim Apr27) Agree. The action section is divided into gauge and quark actions; see the discussion under item C-2.

(Chris Apr 28) Agree.

(yoshie,Apr29) We conclude on the and structure.

(Balint,Apr29) I agree too.

====================================================================
C-5. How to describe fields and boundary conditions
--------------------------------------------------------------------
(yoshie,Apr16) This is also a revival. In QCDML0.2, boundary conditions for gluon fields appear twice. This is because fields are placed under the action. Properties of fields (boundary conditions, normalization, ...) are better placed in an independent field section, as we did for QCDML draft4.0.

(Chris,Apr19) This is because the boundary conditions are properties of the field, and not the dimension, surely.

(dirk,Apr19) I do not think that there should be a separate fields section. But Tomoteru pointed to an issue which needs clarification. I think it is good to provide for "gluonField" within the element "generalQuarkActionType/field", since a different gauge field might be used for the fermionic part of the action (e.g. fat links). But it seems to me that documents both with and without an element "gluonField" within the element "generalQuarkActionType/field" conform to QCDML0.2.
If my understanding is correct, then the attribute 'minOccurs="0"' should be omitted.

(yoshie,Apr20) OK, I will compromise if you agree to a general rule that one omits the description of gluonField from the quark action when it is the same as in the gluon action. I suppose minOccurs="0" is correct. By the way, don't we have to consider non-degenerate quarks with different boundary conditions, e.g. ud periodic and s antiperiodic? I agree it is pathological, but I suppose it is allowed field-theoretically.

(Chris Apr28) Agree. Can we have the gauge group back? UKQCD would like to be able to do large-Nc calculations, and in principle these would be in the ILDG. Thus we need the gauge group.

(yoshie,Apr29) We have agreed on the general rule. The gauge group is fine with me.

====================================================================
D-1. Are glossary documents dependent on groups?
-------------------------------------------------------------------
(yoshie,Apr13) I think everyone agrees that glossary documents are group dependent and that MDWG will propose a guideline later.

(yoshie,Apr27) This is agreed by everyone, because more than one week has passed.

(jim, Apr27) Glossary elements should be replaced by a documentation link. They are not expected to be searchable, and their contents depend on the collaboration.

(yoshie,Apr29) How about this? A documentation link is OK. If some group wants to mark it up with XML, that is also allowed.

(Balint,Apr29) Jim's suggestion is a good one, since it simplifies the markup to a URI and removes any future hassle for the MDWG from group-dependent XML. Allowing group-dependent XML markup seems to dirty the standard. However, despite this misgiving, if there is strong feeling about having the XML included, I will go along with it in the spirit of compromise.

(yoshie,Apr30) Do all of you agree that the glossary is a URI pointing to documentation provided by the contributors?

====================================================================
D-2. Gauge fixed configurations
--------------------------------------------------------------------
(yoshie,Apr13) QCDML does not have to care about gauge-fixed configurations, because gauge fixing can be done easily at low CPU cost.

(dirk,Apr13) Fields necessary for marking up gauge-fixed configurations could be added with little overhead. As many groups use this kind of configuration, I think it would be good to provide for such fields.

(yoshie,Apr15) I agree, if the markup of gauge fixing can be done at little cost. We would probably omit the description of the gauge fixing algorithm and parameters like convergence criteria. Will someone provide us with a sample markup?

(Chris,April15) I think we can add it later!

(Balint,April21) Perhaps one could go on the premise that a gauge-fixed gauge field is still just a gauge field. Would people care about searching for already gauge-fixed configurations?

(yoshie,Apr27) Sorry, Balint, I don't understand your point. I really want to distinguish gauge-fixed configurations from unfixed configurations. Gauge fixing requires some numerical procedures. It may suffer from Gribov ambiguities... Shall we go back to Chris's proposal, "Let us add it later"?

(Dirk,Apr27) I agree with Tomoteru that we have to distinguish between configurations with and without gauge fixing. This is also why I think we should provide for this possibility from the very beginning. I would suggest making the gauge fixing parameters a property of the ensemble and adding the following element to :

(jim,Apr27) Let's defer all the subtleties of describing gauge fixing beyond providing a tag with a boolean value and a tag specifying an ILDG keyword for the more common fixing conditions, e.g. CoulombT etc.

(yoshie,Apr29) I suppose everyone agrees with Jim's proposal. Let us stop the discussion for a moment.

(Balint,Apr29) I agree too.

====================================================================
D-3. Should the configuration metadata provide a CRC for the config?
--------------------------------------------------------------------
(dirk,Apr13) Since we might consider SciDAC's LIME as a standard for packing configurations, it might make more sense to move the checksum inside a LIME Record and to remove it from the metadata. This would give us the freedom to define separate checksums for different messages within a LIME file.

(yoshie,Apr15) I prefer keeping a checksum in the configuration xml. One may unpack the LIME (DIME, ...) file before using the configuration and want to check the CRC on the fly. Another point is that the checksum is a property of the configuration, and all configuration information had better be systematically marked up in xml. I agree that the checksum can also be placed in a LIME record. (I'd like to postpone discussions on file format. Completing QCDML has higher priority.)

(chip, Apr 15) I would prefer a simple CRC as a metadata property of the configuration file, so that a generic utility can check the validity of the file without needing to understand LIME. This is being done elsewhere in the data grid community.

(Balint,Apr20) Regarding checksums and LIME: currently, the code I use for SciDAC I/O follows this record structure (more or less; I may be missing some internal records, but the point is not obscured):
    Record: File Private XML
    Record: File User XML
    Record: Record Private XML
    Record: Record User XML
    Record: Data
    Record: Checksum
I.e. each binary datum has a separate LIME Record to hold its checksum. I see a mapping here: the Ensemble XML can become the File User XML, and the Configuration XML can become the Record User XML. Then, in the current model, the checksum would be placed in a separate Record. This of course does not preclude it from also being in the configuration XML as added security (to ensure, for example, that the right config and the right XML came bundled together).
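Chip's requirement is that a generic utility can verify a file against a CRC held in the metadata without understanding LIME. A minimal sketch using Python's zlib.crc32, which accepts a running value so a large file can be checksummed chunk by chunk; the function names and the hex-string form of the stored value are assumptions, and the checksum here covers the whole file byte stream rather than any particular LIME record.

```python
import io
import zlib


def crc32_stream(stream, chunk_size=1 << 20):
    """CRC-32 of a binary stream, read in chunks so large files need little memory."""
    crc = 0
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        crc = zlib.crc32(chunk, crc)  # continue from the running value
    return crc & 0xFFFFFFFF


def crc32_file(path, chunk_size=1 << 20):
    """CRC-32 of a file on disk."""
    with open(path, "rb") as f:
        return crc32_stream(f, chunk_size)


def matches_metadata(path, crc_hex):
    """Compare a file with a CRC recorded (hypothetically) as a hex string."""
    return crc32_file(path) == int(crc_hex, 16)


if __name__ == "__main__":
    # "123456789" is the standard CRC-32 check input; its CRC is 0xCBF43926.
    print(hex(crc32_stream(io.BytesIO(b"123456789"))))
```

Chunked reading gives the same result for any chunk size, but it is still a serial computation over the bytes in order.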
(DeTar, 20 Apr 2004) When we create a configuration file, not all of the ensemble information is known. Our preference would be to include only the Configuration XML in the file that is uploaded to the archive. The Ensemble XML would be added separately to the archive database and modified separately. The SciDAC format was intended for more general uses, in which a file could contain multiple lattice fields. However, an archive configuration file contains only one lattice field, so there is no meaningful distinction between File User XML and Record User XML. We could safely omit one of them in files intended for archiving. As for our peculiar checksum, it was designed to be computable in parallel, which is not possible with the noncommutative, nonassociative crc32 sum. If a serial checksum of the binary payload such as crc32 is desired, it could be generated offline, say during the archiving process, and could presumably be recorded in the archive database at that time, preferably without altering the uploaded file. Since LIME supports parallel I/O, it can't take care of crc32 checksums.

(dirk,Apr20) Dear Carleton, wouldn't this cause a problem if the Configuration XML containing the CRC32 becomes part of a file for which this checksum is calculated? If you consider an archive file containing, e.g., a Configuration XML plus the configuration itself, one might impose the rule that the checksum refers to the configuration. But what happens when the configuration consists of several files? Maybe this situation could be solved by changing CRC32 into a list providing a checksum for each file in the archive file, in the order in which they are stored there. (In the case of parallel I/O these checksums could possibly be calculated in parallel.)

((((SUSPENDED by yoshie,Apr21))))

====================================================================
D-4. Should the plaquette be stored in the configuration metadata?
--------------------------------------------------------------------
(dirk,Apr13) Due to rounding errors, the plaquette does not allow one to check the correctness of a configuration rigorously. Nevertheless, it allows a test which many users might find more convenient (especially those who do not think in terms of bits). I would therefore suggest including the plaquette in the configuration metadata.

(yoshie,Apr15) I certainly agree. Conversion of the binary format will be done on the user side. Plaquette values will be useful for checking that the conversion does not misalign data.

(Chris,April15) As you may have guessed by now, I am a BinX fan. I can read and determine the average plaquette of a gauge config with accompanying BinX markup. In principle I can extend this to *any* format. This helps to check that cfgs are the same in different formats. I am definitely in favour of the plaquette!

(yoshie,Apr16) Chris, are you saying that you agree to place in config xml?

(Chris,Apr19) Yes.

(Balint,Apr20) It may be useful to put the plaquette in the tag rather than management, but I am not too fussed about that.

(yoshie,Apr21) I have no preference.

(Dirk,Apr27) As we have to come to a conclusion, I would prefer Balint's proposal.

(Chris, Apr28) I think in markovStep would be ideal.

(yoshie,Apr29) OK, this is decided.

====================================================================
D-5. and
-------------------------------------------------------------------
(Balint,Apr20) The precision and cLibrary tags seem to belong to no subgroup in markovChain (this is not necessarily bad). It may be useful to find some other place for them. Also, I would advocate using 32bit or 64bit rather than "single" or "double". As a final comment, it is entirely plausible that people may use two kinds of machine to generate an ensemble (say, a custom supercomputer and a Linux cluster they have time on). The native formats of these two computers may be different.
An IBM supercomputer would have big-endian byte order; a Linux one may have little-endian byte order. Three (possibly more) solutions exist:
 i) We have a standard format with a standard byte order, in which case there is no need for the tag to do the translation.
 ii) The cLibrary to do the translation and the byte order become part of the configuration. I'd personally lump in the precision here too, and potentially a BinX record.
 iii) The precision and byte order stay in the ensemble XML and are fixed. The user takes care of ensuring that the precision and byte ordering are the same for all ensemble configs, irrespective of how they were originally produced.
This is a "policy" matter. In any case I'd have something like this (either in the ensemble or the config):
    32 BIG_ENDIAN ....

((((SUSPENDED by yoshie, Apr21))))
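Balint's example pairs a precision of 32 with BIG_ENDIAN. As a minimal sketch of how a reader might use those two values, the code below decodes a raw array of reals with Python's struct module. The function name, the LITTLE_ENDIAN spelling, and the assumption that the payload is a flat array of IEEE reals (no site ordering or complex-number layout) are all illustrative.

```python
import struct

# Hypothetical metadata values mapped onto struct format characters.
BYTE_ORDER = {"BIG_ENDIAN": ">", "LITTLE_ENDIAN": "<"}
REAL_TYPE = {32: "f", 64: "d"}  # IEEE single / double precision


def decode_reals(payload, precision, byte_order):
    """Decode a flat array of reals using the declared precision and byte order."""
    fmt = BYTE_ORDER[byte_order] + REAL_TYPE[precision]
    size = struct.calcsize(fmt)  # 4 or 8 bytes with an explicit byte order
    if len(payload) % size:
        raise ValueError("payload length is not a multiple of the element size")
    return [value for (value,) in struct.iter_unpack(fmt, payload)]


if __name__ == "__main__":
    data = struct.pack(">3f", 1.0, -2.5, 0.5)  # as written on a big-endian machine
    print(decode_reals(data, 32, "BIG_ENDIAN"))     # the original values
    print(decode_reals(data, 32, "LITTLE_ENDIAN"))  # wrong byte order: garbage
```

Reading the same bytes with the wrong declared byte order silently yields different numbers rather than an error, which is exactly the kind of misalignment a stored plaquette value would help catch.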