The use of HTML made a critically important contribution to hypermedia by introducing (unbounded) hyperlinks or anchors, and thus encouraged the use of both hyperdocuments and active components. More generally, the adoption of HTML has promoted the idea that documents are not monolithic objects but can be regarded as built from smaller components with defined and varied content and functionality. We developed the use of these for chemistry by proposing the use of MIME types to label chemically significant components of hyperdocuments. It is generally recognised, however, that HTML has weak support for structure and poor tools for specific markup and functionality. In scientific disciplines there is a key need to exchange data such as numeric quantities with units and ranges, and domain-specific objects such as mathematical equations or chemical reactions. HTML cannot address these, and so the World Wide Web Consortium (W3C) has undertaken a major program to support robust, extensible markup.

Why is the W3C important for CML?

The members of the W3C are (primarily commercial) organisations who have agreed to create communal, non-proprietary protocols for, inter alia, the exchange of information over the Web. The cornerstone of this is eXtensible Markup Language (XML), a very simple subset of SGML. One of us (PMR) was invited to be part of the initial working group on XML, and as a result we suggest that some aspects of the process, as well as the end-product, may be of value to the chemical informatics community.

What are the basic levels of markup?

The representation of information in electronic form usually involves several layers: encoding, syntax, semantics and ontology. Each of these is discussed separately below.

What is Encoding?

This specifies the method for mapping bytes (octets) or similar concepts onto characters. Thus ASCII (the American Standard Code for Information Interchange) has specified that the character A is represented by the byte with value 65. Commonly used supersets of ASCII are ISO-8859, ISO-Latin-1 and ISO-10646 (Unicode). The latter was originally based on 16 bits and endeavours to support all the major character sets in the world. XML (and Java) are designed to be Unicode-compliant. It is critical to define the character set used in a document, and XML documents should start with a declaration that names the encoding. Lack of understanding of encoding can lead to serious corruption of information; we know of cases where characters for degrees (superscript zero) and micro (Greek mu) have been corrupted by an incorrect assumption of character sets.

What is Syntax?

This specifies how the byte stream should be tokenised. It is dependent on the application and is frequently underspecified. An example of a syntactic problem from an MDL molfile is:

YOHIMBINE
GTMACCS-II11109515132D 1 0.00479 0.00000 0
GST
29 33 0 0 1 0 1 V2000
0.4699 2.1336 0.0000 C 0 0 0 0 0 0
0.6808 1.2945 0.0000 C 0 0 1 0 0 0

Here the string in the second line contains information on how the molecular information was created, the date, the dimensionality, etc. Without a manual it is impossible to identify the tokens (e.g. …). Errors in parsing (the process of tokenising and structuring this information) are therefore common and can be extremely damaging. We know, for example, of cases where CL (chlorine) has been converted to C (carbon) by incorrectly written parsers for the ubiquitous PDB format. XML has solved many troublesome syntactic problems (e.g. …). XML enables authors to remove all syntactic ambiguity, and this, in itself, is an undramatic but major step forward for chemical informatics.
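The encoding hazard described above (degree and micro signs corrupted by an incorrect character-set assumption) can be reproduced in a few lines. The following is a minimal Python sketch, not taken from the original article: the sample string, the variable names and the specific XML declaration are all assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET

# Illustrative sample text (an assumption): degree sign U+00B0, micro sign U+00B5.
text = "25 \u00b0C, 10 \u00b5m"

# The same characters map to different byte sequences under different encodings.
utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("iso-8859-1")

# ASCII assigns the byte value 65 to the character "A".
ascii_A = "A".encode("ascii")[0]

# Wrong assumption 1: UTF-8 bytes read as ISO-8859-1.
# Each two-byte character turns into two unrelated characters.
mojibake = utf8_bytes.decode("iso-8859-1")

# Wrong assumption 2: ISO-8859-1 bytes read as UTF-8.
# The bytes for the degree and micro signs are invalid UTF-8 and are replaced.
lossy = latin1_bytes.decode("utf-8", errors="replace")

# An XML declaration that names the encoding removes the guesswork.
# The declaration below is an assumed example of such a declaration.
xml_bytes = (
    b'<?xml version="1.0" encoding="ISO-8859-1"?><t>' + latin1_bytes + b"</t>"
)
parsed = ET.fromstring(xml_bytes).text

print(ascii_A, repr(mojibake), repr(lossy), repr(parsed))
```

In this sketch the two wrong guesses corrupt the text in different ways, while the parser that is told the encoding through the declaration recovers the original characters.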
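The chlorine-to-carbon corruption mentioned above for PDB files is the kind of error that arises when a parser guesses tokens instead of following the format's fixed columns. The sketch below shows one hypothetical way this can happen; the actual faulty parsers are not described in the source. The helper `make_hetatm`, both parser functions and the coordinates are invented for illustration. The column positions follow the published PDB layout (atom name in columns 13-16, element symbol in columns 77-78).

```python
def make_hetatm(serial, name, res, chain, resseq, x, y, z, element):
    """Build a fixed-column HETATM record (hypothetical helper for this sketch)."""
    return (
        f"HETATM{serial:5d} {name:<4s} {res:>3s} {chain}{resseq:4d}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}          {element:>2s}"
    )


def element_naive(line):
    """Buggy guess: take the first letter of the atom name as the element."""
    return line[12:16].strip()[0]


def element_by_column(line):
    """Read the element symbol from its fixed columns (77-78, 1-based)."""
    return line[76:78].strip().upper()


# Illustrative records with made-up coordinates: one chlorine, one carbon.
cl_line = make_hetatm(1, "CL1", "LIG", "A", 1, 1.234, 5.678, 9.012, "CL")
c_line = make_hetatm(2, " C1", "LIG", "A", 1, 2.345, 6.789, 0.123, "C")

for line in (cl_line, c_line):
    print(element_naive(line), element_by_column(line))
```

The naive parser reports carbon for both records, whereas the column-based reader distinguishes CL from C. In XML the element symbol would be carried in explicitly delimited markup, so no such positional guessing is needed.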