MICRO CENTER: COMPUTERS AND ELECTRONICS
Random Access   chris, kp & rob
Tech Take-Apart
Introduction to XML, Part I
by kp

 

Geek Level: Beginner/Intermediate
HTML, XHTML or web-based programming experience helpful

Geek Tools:
Text editor or web development WYSIWYG; web browser

For many years, XML has been talked and written about by a lot of web gurus and programmers. Some may be asking: What can XML do for me? The answer: Efficiency. In last month's toolbox tip, I introduced using XML with an Excel spreadsheet. This is just one of the many ways in which this technology can be applied. When designing websites, XML can be a handy tool for updating lists.

The Basics
To understand how XML works, it helps to start with the essentials. By itself, XML doesn't do anything. It needs the help of other technologies, such as HTML, a stylesheet or a script, to turn it into something presentable. XML is just a method of structuring data, using custom tags to describe the data it contains.

Example XML File

In this example XML file, I listed my previous Random Access articles based on the date and type of article. XML tags depend on hierarchy: descriptive tags are nested within one another based on relevance. This structure is called a tree, and each pair of tags represents an element of this tree. My starting tag is named "newsletter", which is referred to as the "root element" because it holds all of the subsequent tags, called "child elements." My XML file has four of them: articles, version, section and articletitle. When there is a new entry, the XML follows the same logical pattern, and the file ends with the closing root element tag, which in this case is </newsletter>.

Every element needs an opening <mytag> and a closing </mytag> (an element with no content can be shortened to <mytag/>). You can nest elements and use almost any name you want, but there are a few naming rules. A name must start with a letter or an underscore, and it can't contain spaces. Dashes, periods and digits are allowed after the first character, colons are reserved for namespaces, and names beginning with "xml" are reserved. Non-English letters are permitted, although some older tools handle them poorly. It's essential to give your XML document a good structure, but the tree design can differ from one document to the next. Another important point to keep in mind: XML is case sensitive, so <Section> and <section> are different tags, and an opening tag must match its closing tag exactly. Unlike HTML, XML keeps white space, so be careful when inputting your data.
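The tree described above can be explored with a few lines of Python. This is only a minimal sketch: the element names come from the description, but the nesting and the text values are hypothetical stand-ins, not the original sample file. It uses the standard xml.etree.ElementTree module to read the root element, walk its descendants, and show that tag names are case sensitive and that white space is kept.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the sample file: the element names are from the
# article, but the nesting and the text values are assumptions.
SAMPLE = """<newsletter>
  <articles>
    <version>Example issue</version>
    <section>Tech Take-Apart</section>
    <articletitle>Introduction to XML, Part I</articletitle>
  </articles>
</newsletter>"""

root = ET.fromstring(SAMPLE)
print(root.tag)  # name of the root element

# iter() walks the tree in document order, starting with the root itself.
descendants = [element.tag for element in root.iter() if element is not root]
print(descendants)

# Tag names are case sensitive: "Section" does not match <section>.
lowercase_match = root.find("articles/section")
uppercase_match = root.find("articles/Section")
print(lowercase_match is not None, uppercase_match is None)

# White space is kept: the line break and indentation after <articles>
# are handed to the program as text.
leading_text = root.find("articles").text
print(repr(leading_text))
```

The last line is a reminder that indentation added for readability is still data as far as the parser is concerned.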

Write Your Code
To write your own XML file, all you need is a text editor. You can use something as simple as Notepad or as comprehensive as Dreamweaver. The advantage of a WYSIWYG editor is that you have access to helpful tools like line numbering and color-coded tags. If you prefer to have development guides without having to pay hundreds for a WYSIWYG editor, you can download Microsoft's XML Notepad 2007 for free. The program provides an easy-to-use interface for reviewing your XML tree structure and adding elements. In addition, you need a web browser such as Internet Explorer to check your XML. The browser works as a parser: it reads the code and reports an error if the document is not well formed, for example when a closing tag is missing or mismatched. By default, it does not check the data against a DTD or schema. In the screenshots below, you can see the difference between acceptable code and an XML error.

[Screenshots: valid XML and XML error messages displayed in the browser]
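The same kind of check a browser performs can be sketched in Python. This is a minimal example, assuming the standard xml.etree.ElementTree parser; the helper name check_well_formed and the two test strings are made up for illustration. It only tests whether the document is well formed, not whether it follows a DTD or schema.

```python
import xml.etree.ElementTree as ET

def check_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, error message)."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as error:
        return False, str(error)
    return True, None

# Hypothetical test strings: the second closes <section> with </Section>.
good = "<newsletter><section>Tech Take-Apart</section></newsletter>"
bad = "<newsletter><section>Tech Take-Apart</Section></newsletter>"

good_result = check_well_formed(good)
bad_result = check_well_formed(bad)
print(good_result)
print(bad_result)  # the message includes the line and column of the problem
```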

Once you have your editor of choice, you need to begin by saving a file with the .xml extension. Next, enter the document identifier tag in the first line of your file as shown in the sample code:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>

This is called a declaration, and it shows the version and encoding of the XML. Version 1.0 is the one in general use (a 1.1 version exists but is rarely needed). The encoding names the character set used to store the file. In the sample code, UTF-8 is specified. It is a Unicode encoding that covers English as well as practically every other language, and it is the default when no encoding is declared. Other encodings are available depending on the type of characters you want to use in your XML. The final piece of the declaration is the standalone reference. A "yes" value states that the document does not rely on markup declarations kept in an external file, such as an external DTD. A "no" value states that it may. When using a WYSIWYG editor or other programming-assist application, the XML declaration is usually created automatically for you, but it's good to know for future reference.
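To see the three parts of the declaration as a parser sees them, here is a short sketch using Python's standard xml.parsers.expat module. The helper name read_declaration and the one-element document are assumptions made for the example. The parser reports standalone as 1, 0 or -1, which the sketch maps to "yes", "no" or "omitted".

```python
import xml.parsers.expat

def read_declaration(xml_bytes):
    """Collect the version, encoding and standalone values of the declaration."""
    found = {}

    def on_declaration(version, encoding, standalone):
        found["version"] = version
        found["encoding"] = encoding
        # expat passes 1 for "yes", 0 for "no" and -1 when the part is left out.
        found["standalone"] = {1: "yes", 0: "no"}.get(standalone, "omitted")

    parser = xml.parsers.expat.ParserCreate()
    parser.XmlDeclHandler = on_declaration
    parser.Parse(xml_bytes, True)
    return found

# Hypothetical one-element document that starts with the sample declaration.
SAMPLE = b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?>\n<newsletter/>'

declaration = read_declaration(SAMPLE)
print(declaration)
```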

Below the declaration, there are a few comments included to explain the purpose of the document. Comments are created with the <!-- comments --> notation, which is the same as HTML comments. Between the opening <!-- and the closing --> you can write whatever you want, except a double dash, and the parser will not treat it as data. Remember: comments are good, so use them freely throughout your code. They help the next person understand your logic and where to make changes.
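A quick way to confirm that comments stay out of the data is to parse a commented document and list what comes back. The sketch below assumes Python's standard xml.etree.ElementTree parser with its default settings, which drop comments. The sample text is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample with one comment inside the root element.
SAMPLE = """<newsletter>
  <!-- One entry per published article -->
  <articles>Introduction to XML, Part I</articles>
</newsletter>"""

root = ET.fromstring(SAMPLE)

# Only real elements show up as children; the comment is skipped.
children = [child.tag for child in root]
print(children)
```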

XML Defined
Two topics that I haven't covered yet are DTDs (Document Type Definitions) and schemas. These are used to describe how the data should be constructed. Neither DTDs nor schemas are necessary for XML to work; they just create rules to validate the data. I haven't written a DTD for the sample XML document, so I'll use that as an illustration. A DTD can be linked as an external file or written inside the document, right after the XML declaration. In this exercise, I am using an internal DTD, which starts with the declaration:

<!DOCTYPE root element name [ element declarations ]>

Sample DTD
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE newsletter [
  <!ELEMENT newsletter ANY>
  <!ELEMENT articles ANY>
  <!ELEMENT version ANY>
  <!ELEMENT section ANY>
  <!ELEMENT articletitle ANY>
]>

Each element is declared with <!ELEMENT followed by the element name and the kind of content it allows. The word ANY means that any text or declared elements can go inside the tag. Other options include EMPTY for elements without content and (#PCDATA) for parsed character data, meaning plain text. DTDs have many more features for describing data, but going forward, schemas are more practical to use.
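An internal DTD can be inspected from Python as well. The following is a sketch under stated assumptions: it uses the standard xml.parsers.expat module, which reads the declarations but does not validate the document against them, and the DOCUMENT text (including a declaration for the root element and a one-line body) is a hypothetical example modeled on the sample DTD.

```python
import xml.parsers.expat

# Hypothetical document: an internal DTD followed by a minimal body.
DOCUMENT = b"""<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!DOCTYPE newsletter [
  <!ELEMENT newsletter ANY>
  <!ELEMENT articles ANY>
  <!ELEMENT version ANY>
  <!ELEMENT section ANY>
  <!ELEMENT articletitle ANY>
]>
<newsletter><articles>Introduction to XML, Part I</articles></newsletter>"""

declared = []

def on_element_declaration(name, model):
    # model describes the allowed content (ANY here); only the name is kept.
    declared.append(name)

parser = xml.parsers.expat.ParserCreate()
parser.ElementDeclHandler = on_element_declaration
parser.Parse(DOCUMENT, True)
print(declared)
```

Checking a document against its DTD requires a validating parser, which Python's standard library does not include.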

Schemas are generally preferred over DTDs. They are more versatile: they are written in XML themselves, they support data types such as numbers and dates, and one schema can be shared by many XML documents. To create a schema, you need to make a new file with an .xsd extension.

Sample Schema
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="newsletter">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="articles" type="xs:string"/>
        <xs:element name="version" type="xs:string"/>
        <xs:element name="section" type="xs:string"/>
        <xs:element name="articletitle" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

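Unlike DTDs, schemas always live in a separate file. The XML file points to its schema through attributes on the opening tag of the root element. Because the sample schema does not define a namespace, the attribute to use is xsi:noNamespaceSchemaLocation (xsi:schemaLocation is its counterpart for schemas that do):

<newsletter xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://www.yourwebsite.com/your_schema.xsd">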

Now back to the schema file. You start with the same XML declaration as before, then add the schema element, which wraps everything else and declares the xs prefix used by the other tags:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> your schema content </xs:schema>

Then, start composing your schema definition. First, declare your root element by using:

<xs:element name="yourtagname">

On the next line, you need to define whether the content type is simple or complex. The difference is that a simple type holds only text, while a complex type can hold child elements or attributes. In this example, the newsletter element contains other elements, so I am using <xs:complexType> in my description. With the <xs:sequence> tag, I am stating that the following elements always appear in the same order: articles, version, section and articletitle. Next, I have to define the elements themselves. The notation follows the same pattern: <xs:element name="yourelementname" type="xs:datatype"/>. The type designation has many possible values, such as xs:integer, xs:date or xs:string, depending on what kind of data the element holds. Since my data can be a combination of numbers and letters, I left the type as "xs:string" to cover my bases. Finally, close all of the tags.
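Because a schema is itself an XML file, it can be read like any other. The sketch below uses Python's standard xml.etree.ElementTree module to pull the element names out of a schema shaped like the sample and then compares a document's child elements against that order. The helper follows_sequence and the two test documents are hypothetical, and the comparison is a hand-rolled illustration of what xs:sequence means, not real schema validation (the standard library has no XSD validator).

```python
import xml.etree.ElementTree as ET

# Maps the xs prefix to the XML Schema namespace for find() and findall().
NS = {"xs": "http://www.w3.org/2001/XMLSchema"}

SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="newsletter">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="articles" type="xs:string"/>
        <xs:element name="version" type="xs:string"/>
        <xs:element name="section" type="xs:string"/>
        <xs:element name="articletitle" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

schema_root = ET.fromstring(SCHEMA)
sequence = schema_root.find("xs:element/xs:complexType/xs:sequence", NS)
expected_order = [item.get("name") for item in sequence.findall("xs:element", NS)]
print(expected_order)

def follows_sequence(xml_text):
    """Illustrative order check only; this is not schema validation."""
    root = ET.fromstring(xml_text)
    return [child.tag for child in root] == expected_order

# Hypothetical documents: the second swaps the first two elements.
in_order = ("<newsletter><articles>a</articles><version>1</version>"
            "<section>s</section><articletitle>t</articletitle></newsletter>")
out_of_order = ("<newsletter><version>1</version><articles>a</articles>"
                "<section>s</section><articletitle>t</articletitle></newsletter>")

print(follows_sequence(in_order), follows_sequence(out_of_order))
```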

There is an easier way to create XML schemas: third-party software can generate them for you. Again, you don't need a DTD, schema or namespace for your XML to work, but it is good to know what they are in the event that you run across them.

Tune in next month for a continuation of my XML tutorial. Next time, I will introduce how to use CSS (Cascading Style Sheets) and XSL (Extensible Stylesheet Language) to style your XML, plus how to integrate it into XHTML. In the meantime, check out Beginning XML from Wiley for a more comprehensive look at XML.

Shop Online:
Beginning XML, 4th Edition

References:
W3C, Extensible Markup Language (XML)
W3Schools, XML Tutorial
W3Schools, ASCII Character Reference
XML Notepad 2007

 © Micro Center