Author: duxiong
Posted: 8/15/2007

-- Question for XML experts
[1]    document    ::=    ( prolog element Misc* ) - ( Char* RestrictedChar Char* )  

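1. What does the * in Misc*, Char*, etc. mean?
2. Does ( prolog element Misc* ) mean prolog + element + Misc*?
3. How should ( Char* RestrictedChar Char* ) be understood? Is it also Char* + RestrictedChar + Char*?
4. Is there documentation on this kind of expression and its grammar rules, or is it just a textual convention of the XML standard?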


Author: duxiong
Posted: 8/24/2007

-- Found the answer myself

document ::= prolog element Misc*

This production says that the symbol named document (which represents a well-formed XML document), consists simply of one prolog followed by one element followed by zero or more Miscs. Each of these symbols is defined in terms of other symbols and character sequences.

Note that the XML 1.0 Recommendation refers to UCS characters by their Unicode scalar values, using a notation of #x followed by only as many hex digits as needed. So #x9 in the EBNF productions means the abstract character that would be represented in Unicode 3.1's "U+" notation as U+0009. It does not necessarily mean a byte with hex value 9.
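A small Python sketch of this point: the #x notation names a code point, and the bytes that end up representing it depend on the encoding. The helper from_xml_hex is hypothetical, written only for this illustration.

```python
def from_xml_hex(ref: str) -> str:
    """Convert the spec's '#xN' notation to the character U+N (illustrative helper)."""
    if not ref.startswith("#x"):
        raise ValueError(f"not a #x reference: {ref!r}")
    return chr(int(ref[2:], 16))


tab = from_xml_hex("#x9")
print(f"U+{ord(tab):04X}")       # U+0009
print(tab.encode("utf-8"))       # b'\t'      (one byte in UTF-8)
print(tab.encode("utf-16-le"))   # b'\t\x00'  (two bytes in UTF-16LE)
```

The same character U+0009 is one byte in UTF-8 but two bytes in UTF-16, which is why "#x9" should not be read as "the byte 0x09".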

Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
S ::= (#x20 | #x9 | #xD | #xA)+
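The two productions above translate almost directly into code. A minimal Python sketch, assuming the XML 1.0 definitions quoted here (CHAR_RANGES, is_char and is_s are illustrative names, not part of any XML library):

```python
import re

# Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
CHAR_RANGES = [
    (0x9, 0x9), (0xA, 0xA), (0xD, 0xD),
    (0x20, 0xD7FF), (0xE000, 0xFFFD), (0x10000, 0x10FFFF),
]


def is_char(c: str) -> bool:
    """True if the single character c matches the XML 1.0 Char production."""
    cp = ord(c)
    return any(lo <= cp <= hi for lo, hi in CHAR_RANGES)


# S ::= (#x20 | #x9 | #xD | #xA)+ : one or more of the four whitespace characters.
S_RE = re.compile(r"[\x20\x09\x0D\x0A]+")


def is_s(text: str) -> bool:
    """True if the whole of text matches the S production."""
    return S_RE.fullmatch(text) is not None


print(is_char("A"), is_char("\x00"), is_char("\uFFFE"))  # True False False
print(is_s(" \t\r\n"), is_s(""), is_s(" x "))             # True False False
```

Char is a test on a single character, while S uses + and therefore needs at least one whitespace character.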

The first line means that a Char is any single character that falls in one of the listed ranges. Note that U+0000 through U+0008, U+000B, U+000C, U+000E through U+001F, the surrogates U+D800 through U+DFFF, and U+FFFE and U+FFFF are not Chars and are not allowed in XML 1.0 documents. The second line shows that S is a sequence of one or more of the 4 "whitespace" characters. The definition of a Comment is given as:

Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'

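This means that a Comment consists of the 4 characters <!--, then zero or more instances of either a Char that is not -, or the character - followed by a Char that is not -, and finally the 3 characters -->. As a result, -- can never appear inside a comment, and the comment text cannot end with - (so <!-- x ---> is not a valid comment).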
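The Comment production can be transcribed almost literally into a Python regular expression. This is a sketch assuming XML 1.0's Char; CHAR_NOT_HYPHEN, COMMENT_RE and is_comment are made-up names, and "Char - '-'" is written by cutting U+002D out of the #x20-#xD7FF range:

```python
import re

# Char - '-': the XML 1.0 Char ranges with U+002D ('-') cut out of #x20-#xD7FF.
CHAR_NOT_HYPHEN = r"[\t\n\r\x20-\x2C\x2E-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"

# Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'
COMMENT_RE = re.compile(rf"<!--(?:{CHAR_NOT_HYPHEN}|-{CHAR_NOT_HYPHEN})*-->")


def is_comment(text: str) -> bool:
    """True if the whole of text matches the Comment production."""
    return COMMENT_RE.fullmatch(text) is not None


print(is_comment("<!-- a-b -->"))     # True: a single '-' followed by a non-'-'
print(is_comment("<!-- a -- b -->"))  # False: '--' inside the comment
print(is_comment("<!-- x --->"))      # False: the text ends with '-'
```

Because each - inside the body must be followed by a Char other than -, the pattern rejects both -- in the middle and a trailing - before -->.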

Misc ::= Comment | PI | S

This means that a Misc is one of Comment, PI, or S. The definition of PI is too lengthy to include here, so it is left unexpanded.

Since Comment and S have been defined, it would be just as accurate to say:

Misc ::= '<!--' (((#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]) - '-') | ('-' ((#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]) - '-')))* '-->' | PI | (#x20 | #x9 | #xD | #xA)+
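The same substitution can be done mechanically by pasting pattern fragments together. In this Python sketch the fragment names are illustrative and PI is left out, as in the post. The subtraction "Char - '-'" is expressed with a negative lookahead, and CHAR is always pasted in as one unit, mirroring how the whole Char alternation must be substituted as a single parenthesized group before '-' is subtracted:

```python
import re

# Char and S from the post (XML 1.0), written as regex fragments.
CHAR = r"[\t\n\r\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
S = r"[\x20\x09\x0D\x0A]+"

# "Char - '-'": a Char that is not '-'. The lookahead performs the subtraction
# on the whole CHAR unit, i.e. (Char) - '-'.
CHAR_MINUS_HYPHEN = rf"(?:(?!-){CHAR})"

# Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'
COMMENT = rf"<!--(?:{CHAR_MINUS_HYPHEN}|-{CHAR_MINUS_HYPHEN})*-->"

# Misc ::= Comment | PI | S, with PI deliberately left out of this sketch.
MISC_RE = re.compile(rf"{COMMENT}|{S}")


def is_misc(text: str) -> bool:
    """True if text is a single Comment or S (PI is not covered here)."""
    return MISC_RE.fullmatch(text) is not None
```

Usage: is_misc("<!-- note -->") and is_misc(" \t\n") are true, while is_misc("<!-- a -- b -->") is false. MISC_RE.pattern holds the fully substituted pattern, the regex counterpart of the expanded production.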

The other components of document are defined in the same way, so by repeatedly substituting definitions every symbol eventually reduces to characters. It follows that a well-formed XML document is a sequence of UCS characters that matches these patterns.

Author: cndev
Posted: 9/23/2007

a* means zero or more occurrences of a.
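A Python sketch to make this concrete: in the spec's EBNF, a* reads the same as in a regular expression. The same reading answers question 3: Char* RestrictedChar Char* matches any string that contains at least one RestrictedChar, so subtracting it from ( prolog element Misc* ) excludes documents that contain such a character literally. The Char and RestrictedChar ranges below are taken from XML 1.1, where that document production appears; the names matches, CHAR_11, RESTRICTED and excluded_by_subtraction are illustrative.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None


# a* : zero or more occurrences of a.
for text in ["", "a", "aaa", "b"]:
    print(repr(text), matches("a*", text))
# '' True
# 'a' True
# 'aaa' True
# 'b' False

# Question 3 (XML 1.1 ranges): Char* RestrictedChar Char* means
# "any Chars, then one RestrictedChar, then any Chars", i.e. any string
# that contains at least one RestrictedChar.
CHAR_11 = r"[\x01-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
RESTRICTED = r"[\x01-\x08\x0B\x0C\x0E-\x1F\x7F-\x84\x86-\x9F]"
CONTAINS_RESTRICTED = re.compile(rf"{CHAR_11}*{RESTRICTED}{CHAR_11}*")


def excluded_by_subtraction(text: str) -> bool:
    """True if text would be removed by '- ( Char* RestrictedChar Char* )'."""
    return CONTAINS_RESTRICTED.fullmatch(text) is not None


print(excluded_by_subtraction("abc"))      # False
print(excluded_by_subtraction("a\x01bc"))  # True
```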
