XML is a way for you to define structured information of all kinds (content, object data, inter-application messages, or syndicated content summaries) and to represent that data apart from its visual markup. In other words, it separates data from presentation.
SOURCE: COIN 78 Lesson 1
XML is a markup language for documents containing structured information.
Structured information contains both content (words, pictures, etc.) and some indication of what role that content plays (for example, content in a section heading has a different meaning from content in a footnote, which means something different than content in a figure caption or content in a database table, etc.). Almost all documents have some structure.
A markup language is a mechanism to identify structures in a document. The XML specification defines a standard way to add markup to documents. SOURCE: xml.com
XML is not a replacement for HTML. XML and HTML were designed with different goals:
HTML is about displaying information, while XML is about carrying information. SOURCE: W3Schools
EXAMPLE (Reference: COIN 78 Lesson 1):
<?xml version="1.0" encoding="UTF-8"?>
<message>
<from>Jane Doe</from>
<to>John Doe</to>
<date>February 14, 2006</date>
<body>Are the annual report files finished yet?</body>
<priority>high</priority>
<attachments>
<attachment type="jpg">file_1.jpg</attachment>
<attachment type="pdf">file_2.pdf</attachment>
</attachments>
</message>
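Because the message carries data rather than display instructions, any program with an XML parser can pull its fields out directly. The sketch below uses Python's standard-library xml.etree.ElementTree module; the function name read_message and the dictionary layout are illustrative choices, not part of the lesson.

```python
import xml.etree.ElementTree as ET

# The message document from the example above (as bytes, matching its UTF-8 declaration).
MESSAGE = b"""<?xml version="1.0" encoding="UTF-8"?>
<message>
<from>Jane Doe</from>
<to>John Doe</to>
<date>February 14, 2006</date>
<body>Are the annual report files finished yet?</body>
<priority>high</priority>
<attachments>
<attachment type="jpg">file_1.jpg</attachment>
<attachment type="pdf">file_2.pdf</attachment>
</attachments>
</message>"""


def read_message(data: bytes) -> dict:
    """Return selected message fields as a plain dictionary (hypothetical layout)."""
    root = ET.fromstring(data)
    return {
        "from": root.findtext("from"),
        "to": root.findtext("to"),
        "priority": root.findtext("priority"),
        # Each attachment has a type attribute and a file name as its text.
        "attachments": [(a.get("type"), a.text) for a in root.iter("attachment")],
    }


msg = read_message(MESSAGE)
print(msg["from"], "->", msg["to"])  # Jane Doe -> John Doe
print(msg["attachments"])            # [('jpg', 'file_1.jpg'), ('pdf', 'file_2.pdf')]
```

Nothing in the code depends on how the message would look in a browser; that is the practical meaning of "carrying" rather than "displaying" information.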
"XML was created so that richly structured documents could be used over the web. The only viable alternatives, HTML and SGML, are not practical for this purpose. HTML... comes bound with a set of semantics and does not provide arbitrary structure. SGML provides arbitrary structure, but is too difficult to implement just for a web browser." SOURCE: xml.com
XML is application-agnostic and vendor-neutral; as such, it can be read by any XML-aware application (such as most current browsers). Its main advantage for businesses is that it separates content from presentation. This means that different teams can work on a website at the same time without interfering with one another: design teams can concentrate on CSS, structure teams can work on XSL, and content teams can work on HTML/XHTML/XML. Reference: COIN 78 Lesson 1
In HTML, both the tag semantics and the tag set are fixed. An <h1> is always a first level heading and the tag <ati.product.code> is meaningless. ...XML specifies neither semantics nor a tag set. In fact XML is really a meta-language for describing markup languages. In other words, XML provides a facility to define tags and the structural relationships between them. Since there's no predefined tag set, there can't be any preconceived semantics. All of the semantics of an XML document will either be defined by the applications that process them or by stylesheets. SOURCE: xml.com
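To see that XML has no fixed tag set, a parser will happily accept a made-up name such as ati.product.code, while the meaning of each element comes only from the program reading it. This is a minimal sketch; the product record, its price element, and the currency attribute are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical product record. The tag names are invented for this example;
# XML accepts them because it has no predefined tag set (dots, hyphens, and
# underscores are legal in names after the first character).
PRODUCT = """<product>
<ati.product.code>X-100</ati.product.code>
<price currency="USD">19.95</price>
</product>"""


def describe(xml_text: str) -> tuple:
    """Return (code, price, currency) from a product record."""
    root = ET.fromstring(xml_text)
    # iter() takes a plain tag name, so the dots need no special handling.
    code = next(root.iter("ati.product.code")).text
    price_el = next(root.iter("price"))
    # XML itself says nothing about what "price" means; this program decides
    # to treat its text as a decimal amount in the given currency.
    return code, float(price_el.text), price_el.get("currency")


print(describe(PRODUCT))  # ('X-100', 19.95, 'USD')
```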
XML is a markup language that has both syntax and grammar. In this sense it IS a language, but it is also a meta-language, a language ABOUT the language properties that it describes. Reference: COIN 78 Lesson 1
XML is defined as an application profile of SGML. SGML is the Standard Generalized Markup Language defined by ISO 8879. SGML has been the standard, vendor-independent way to maintain repositories of structured documentation for more than a decade, but it is not well suited to serving documents over the web... XML is, roughly speaking, a restricted form of SGML. SOURCE: xml.com
XML is usually created for and read by humans, but it is also almost always written and read by machines. This means that it needs to meet the demands of BOTH humans and machines. In fact, the web world is increasingly populated by machine-to-machine XML, such as Web services, SOAP, and e-commerce applications. Therefore, it is all the more important that XML satisfies the demands of both of its audiences. Reference: COIN 78 Lesson 1
The XML specification defines an XML document as a text which is well-formed, i.e. it satisfies a list of syntax rules provided in the specification. The list is fairly lengthy; some key points are:
* It contains only properly encoded legal Unicode characters.
* None of the special syntax characters such as "<" and "&" appear except when performing their markup-delineation roles.
* The begin, end, and empty-element tags which delimit the elements are correctly nested, with none missing and none overlapping.
* The element tags are case-sensitive; the beginning and end tags must match exactly.
* There is a single "root" element which contains all the other elements.
SOURCE: Wikipedia
The three primary requirements are: exactly one root element; symmetrical element nesting (FOLE: the first element opened is the last element to end); and syntactically correct names (names start with a letter or an underscore, and an end tag must repeat its start tag's name exactly, preserving case and character order). Reference: COIN 78 Lesson 2
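A quick way to check the rules above is to hand small samples to a parser: a conforming XML parser must reject any document that is not well-formed. This sketch uses Python's standard-library ElementTree, which raises ParseError on such input; the sample strings and the helper name is_well_formed are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Each sample keeps or breaks one of the rules listed above.
SAMPLES = {
    "ok": "<note><to>Jane</to></note>",
    "two roots": "<note/><note/>",
    "overlapping tags": "<b><i>text</b></i>",
    "case mismatch": "<Note></note>",
    "name starts with digit": "<1note/>",
    "bare ampersand": "<note>Tom & Jerry</note>",
    "escaped ampersand": "<note>Tom &amp; Jerry</note>",
}

for label, sample in SAMPLES.items():
    print(f"{label:24} {is_well_formed(sample)}")
```

Only "ok" and "escaped ampersand" pass; every other sample violates exactly one of the rules, so the parser refuses it instead of guessing what the author meant.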
<?xml version="1.0"?>
<!-- The first step is to name and define the elements of our document -->
<!DOCTYPE books [
<!ELEMENT books (book+)>
<!ELEMENT book (title, author, subject)>
<!ELEMENT title (#PCDATA)> <!ELEMENT author (#PCDATA)> <!ELEMENT subject (#PCDATA)>
]>
<books>
<book>
<title>Title</title>
<author>Author</author>
<subject>Subject</subject>
</book>
</books>
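Well-formed is not the same as valid: the DTD additionally requires that books contain one or more book elements and that each book contain title, author, and subject in that order. Python's standard-library parser (expat, used by xml.etree.ElementTree) reads the internal DTD but does not validate against it, so the sketch below hand-codes just these two content-model rules for this one DTD. It is an illustration of what a validating parser checks, not a general DTD validator, and it ignores the #PCDATA declarations.

```python
import xml.etree.ElementTree as ET

BOOKS_DOC = """<?xml version="1.0"?>
<!DOCTYPE books [
<!ELEMENT books (book+)>
<!ELEMENT book (title, author, subject)>
<!ELEMENT title (#PCDATA)> <!ELEMENT author (#PCDATA)> <!ELEMENT subject (#PCDATA)>
]>
<books>
<book>
<title>Title</title>
<author>Author</author>
<subject>Subject</subject>
</book>
</books>"""


def conforms(root: ET.Element) -> list:
    """Return content-model violations for this specific DTD (empty if none).

    Rules checked: books -> (book+) and book -> (title, author, subject).
    """
    problems = []
    if root.tag != "books":
        problems.append(f"root is <{root.tag}>, expected <books>")
    children = list(root)
    if not children:
        problems.append("<books> needs at least one <book>")
    for child in children:
        if child.tag != "book":
            problems.append(f"<books> may not contain <{child.tag}>")
            continue
        order = [c.tag for c in child]
        if order != ["title", "author", "subject"]:
            problems.append(f"<book> contains {order}, expected title, author, subject")
    return problems


print(conforms(ET.fromstring(BOOKS_DOC)))  # []
```

A document that puts title, author, and subject directly inside books, without a book wrapper, is still well-formed but produces three violations here, which is exactly the kind of error a validating parser would report.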
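CSS (Cascading Style Sheets) are used to enhance the presentation of content in an XML document. The CSS file is linked to the XML document itself (not to the DTD or XML Schema) via an xml-stylesheet processing instruction in the document's prolog, such as <?xml-stylesheet type="text/css" href="books.css"?> (books.css standing in for the actual file name), which tells the browser's XML parser to use the linked CSS to display the XML content.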
books { text-indent: .5em; text-align: center; color: maroon; }
book { text-align: left; font-size: larger; color: blue;}
author { font-size: smaller; text-transform: uppercase; color: black;}
subject { color: black;}
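The link between an XML document and its CSS is just a processing instruction in the document, so any program can find it the same way a browser does. This sketch uses Python's standard-library xml.dom.minidom, which keeps processing instructions as nodes; the file name books.css and the helper stylesheet_links are hypothetical. Python does not apply the CSS, it only locates the link.

```python
from xml.dom import minidom

# Hypothetical document: "books.css" stands for whatever the CSS file is named.
DOC = """<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="books.css"?>
<books><book><title>Title</title></book></books>"""


def stylesheet_links(xml_text: str) -> list:
    """Return the pseudo-attribute text of every top-level xml-stylesheet instruction."""
    doc = minidom.parseString(xml_text)
    return [
        node.data
        for node in doc.childNodes
        if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
        and node.target == "xml-stylesheet"
    ]


print(stylesheet_links(DOC))  # ['type="text/css" href="books.css"']
```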
The purpose of an XML Schema is to define the legal building blocks of an XML document, just like a DTD.
An XML Schema:
We think that very soon XML Schemas will be used in most Web applications as a replacement for DTDs. Here are some reasons:
XML namespaces provide a simple method for qualifying element and attribute names used in Extensible Markup Language documents by associating them with namespaces identified by URI references. SOURCE: Namespaces in XML - W3C
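To make the idea concrete, here are two vocabularies that both use the local name "table", kept apart by their namespace URIs. The URIs below are hypothetical placeholders. Python's ElementTree stores each qualified name as "{namespace-URI}local-name", and its find methods accept a prefix-to-URI map; the prefixes in that map need not match the ones used in the document, because only the URI matters.

```python
import xml.etree.ElementTree as ET

# Two vocabularies share the local name "table"; the (hypothetical)
# namespace URIs keep them apart.
DOC = """<root xmlns:h="http://example.org/html-like"
      xmlns:f="http://example.org/furniture">
<h:table><h:tr><h:td>Apples</h:td></h:tr></h:table>
<f:table><f:name>Coffee table</f:name></f:table>
</root>"""

NS = {"h": "http://example.org/html-like", "f": "http://example.org/furniture"}

root = ET.fromstring(DOC)
# Qualified names are stored in "{namespace-URI}local-name" form.
tags = [child.tag for child in root]
furniture = root.find("f:table/f:name", NS).text

print(tags)
# ['{http://example.org/html-like}table', '{http://example.org/furniture}table']
print(furniture)  # Coffee table
```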
Reference: COIN 78 Lesson 7
Cascading Style Sheets (CSS) were developed in the mid-1990s to define the look and feel of documents written in markup languages. Extensible Stylesheet Language Transformations (XSLT) was created to transform documents. Both are style sheet languages, but they serve vastly different purposes.
What CSS Can Do
What XSLT Can Do
"An XSL stylesheet specifies the presentation of a class of XML documents by describing how an instance of the class is transformed into an XML document that uses the formatting vocabulary." In other words, a stylesheet tells a processor how to convert logical structures (the source XML document represented as a tree) into a presentational structure (the result tree). Note that an XSL stylesheet is actually an XML document! SOURCE: Web Developer's Virtual Library
A template is analogous to a method in an object-oriented programming language. It allows a single XSLT stylesheet to be broken into multiple logical units, each of which performs a specific transformation.
There are two types of templates, distinguished by having either a 'match' attribute or a 'name' attribute. If a template has a 'match' attribute, it will be invoked when the pattern specified as the value of the attribute is matched against one or more nodes in the input document. If a template has a 'name' attribute, it may be invoked by calling the template explicitly by name. SOURCE: XSLT by Example
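Python's standard library has no XSLT processor, so the sketch below is only an analogy for how match templates behave, not XSLT itself. For each element, a processor looks for a template whose pattern matches; if none does, a built-in rule processes the children in document order and keeps their text. Here "patterns" are just element names (real XSLT patterns are XPath-based), and the function heading plays the role of a named template called explicitly. All names and the sample book data are hypothetical.

```python
import xml.etree.ElementTree as ET


def heading(text):
    """Analogous to a named template, invoked explicitly by name."""
    return f"== {text} =="


# Analogous to match templates, keyed by element name only (a big simplification).
MATCH_TEMPLATES = {
    "title": lambda el: heading(el.text or ""),
    "author": lambda el: f"by {el.text}",
}


def apply_templates(el):
    """Process one element, similar in spirit to xsl:apply-templates."""
    template = MATCH_TEMPLATES.get(el.tag)
    if template is not None:
        return [template(el)]
    # Built-in rule: no template matched, so keep the text and process children in order.
    out = []
    if el.text and el.text.strip():
        out.append(el.text.strip())
    for child in el:
        out.extend(apply_templates(child))
        if child.tail and child.tail.strip():
            out.append(child.tail.strip())
    return out


SOURCE = ("<books><book><title>XML Basics</title>"
          "<author>J. Doe</author><subject>XML</subject></book></books>")
print(apply_templates(ET.fromstring(SOURCE)))  # ['== XML Basics ==', 'by J. Doe', 'XML']
```

As with methods in an object-oriented program, each template handles one kind of node, which is what lets a stylesheet be split into small logical units.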
Resource Description Framework (RDF) is a family of World Wide Web Consortium (W3C) specifications originally designed as a metadata model but which has come to be used as a general method of modeling information, through a variety of syntax formats. ... The RDF metadata model is based upon the idea of making statements about resources in the form of subject-predicate-object expressions, called triples in RDF terminology. The subject denotes the resource, and the predicate denotes traits or aspects of the resource and expresses a relationship between the subject and the object. For example, one way to represent the notion "The sky has the color blue" in RDF is as a triple of specially formatted strings: a subject denoting "the sky", a predicate denoting "has the color", and an object denoting "blue".
This mechanism for describing resources is...an evolutionary stage of the World Wide Web in which automated software can store, exchange, and use machine-readable information distributed throughout the web, in turn enabling users to deal with the information with greater efficiency and certainty. SOURCE: COIN 78 Lesson 11
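Because every RDF statement has the same three-part shape, a collection of statements can be queried with simple patterns. This minimal sketch stores triples as Python tuples and uses None as a wildcard; plain strings stand in for the "specially formatted strings" (normally URIs), and the extra statements about "the sea" are invented for the example.

```python
from typing import List, Optional, Tuple

# Each statement is a (subject, predicate, object) triple.
Triple = Tuple[str, str, str]

TRIPLES: List[Triple] = [
    ("the sky", "has the color", "blue"),
    ("the sea", "has the color", "blue"),
    ("the sky", "is above", "the sea"),
]


def match(triples, s: Optional[str] = None, p: Optional[str] = None,
          o: Optional[str] = None) -> List[Triple]:
    """Return the triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# "What has the color blue?"
blue_things = [t[0] for t in match(TRIPLES, p="has the color", o="blue")]
print(blue_things)  # ['the sky', 'the sea']
```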
Enterprise application integration (EAI) encompasses methodologies such as object-oriented programming; distributed, cross-platform program communication using message brokers with Common Object Request Broker Architecture and COM+; the modification of enterprise resource planning (ERP) to fit new objectives; enterprise-wide content and data distribution using common databases and data standards implemented with the Extensible Markup Language (XML); middleware; message queueing; and other approaches.
SOURCE: SearchSOA.com
XML's value to middleware is clear. Middleware simply "carries the load." It moves messages (XML documents) that encapsulate or abstract XML and ensures that those messages are understood by any source or target applications that need that information. Middleware may also manage the interfaces with the source or target applications and move information into and out of the applications through an unobtrusive point of integration, such as a database or an API.
Because of XML's value, every middleware vendor, new and old, has declared dominance in the XML space, applying its technology to EAI problem domains. None of us should be surprised that there's a certain degree of "puffery" to these declarations. The truth is that it's not particularly difficult to XML-enable a product. Therefore, vendors were able to react quickly.
XML-enabling a product is simply a matter of embedding a parser within the middleware and teaching the product to read and write XML from and to the canonical message format. In addition, since many of these products already have native connectors to traditional enterprise systems and data stores, such as SAP, PeopleSoft, and DB2, they provide enterprises with the ability to produce and consume XML without impacting the applications.
SOURCE: XML Journal
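The "read and write XML from and to the canonical message format" step can be sketched in a few lines. Here the canonical format is assumed, purely for illustration, to be a flat dictionary of field names to string values (real middleware formats are richer), and the field names must be valid XML names. The order data is hypothetical.

```python
import xml.etree.ElementTree as ET


def to_xml(message: dict, root_tag: str = "message") -> str:
    """Serialize a flat canonical message (field -> string value) to XML."""
    root = ET.Element(root_tag)
    for field, value in message.items():
        ET.SubElement(root, field).text = str(value)
    # ElementTree escapes characters such as "&" and "<" automatically.
    return ET.tostring(root, encoding="unicode")


def from_xml(text: str) -> dict:
    """Parse XML produced by to_xml back into the canonical form."""
    root = ET.fromstring(text)
    return {child.tag: child.text or "" for child in root}


order = {"customer": "ACME Corp", "item": "widget", "quantity": "12"}
wire = to_xml(order, "order")
print(wire)
# <order><customer>ACME Corp</customer><item>widget</item><quantity>12</quantity></order>
```

Round-tripping through from_xml gives back the original dictionary, which is the property a middleware adapter needs: applications keep their internal format, and only the edges of the system speak XML.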
Microsoft today published its BizTalk framework, a set of guidelines that will help tie together the e-commerce systems for different industries, such as banking or manufacturing, by using the XML Web standard for data exchange.
Sun today released technology that links XML and the Java programming language together, allowing software developers to build applications that use both technologies.
Unlike HTML, which has a predefined vocabulary, XML allows developers to define their own vocabulary for data, such as price and product. The result is more efficient data exchange and better Internet searching capabilities.
Microsoft's BizTalk, previously available in draft form, provides a set of guidelines for specific industries to define their XML vocabularies. It also defines a common way businesses can handle and route data to each other.
SOURCE: news.cnet.com
Microsoft, IBM, and Ariba spearheaded UDDI (Universal Description, Discovery, and Integration). The project now includes 130 companies, including some of the biggest names in the corporate world. Compaq, American Express, SAP AG, and Ford Motor Company are all committed to UDDI, as is Hewlett-Packard, whose own XML-based directory approach, called e-speak, is now being integrated with UDDI.
While the group does not refer to itself as a standards body, it does offer a framework for Web services integration. The UDDI specification utilizes World Wide Web Consortium (W3C) and Internet Engineering Task Force (IETF) standards such as XML, HTTP, and Domain Name System (DNS) protocols. It has also adopted early versions of the proposed Simple Object Access Protocol (SOAP) messaging guidelines for cross-platform programming.
In November 2000, UDDI entered its public beta-testing phase. Each of its three founders (Microsoft, IBM, and Ariba) now operates a registry server that is interoperable with servers from other members. As information goes into a registry server, it is shared by servers in the other businesses. The UDDI beta is scheduled to end in the first quarter of 2001. In the future, other companies will act as operators of the UDDI Business Registry.
UDDI registration is open to companies worldwide, regardless of their size.
SOURCE: SearchSOA.techtarget.com
A service-oriented architecture (SOA) is the underlying structure supporting communications between services. SOA defines how two computing entities, such as programs, interact in such a way as to enable one entity to perform a unit of work on behalf of another entity. Service interactions are defined using a description language. Each interaction is self-contained and loosely coupled, so that each interaction is independent of any other interaction. SOURCE: SearchSOA.techtarget.com
RDFa is a way to label content to describe a specific type of information, such as a restaurant review, an event, a person, or a product listing. These information types are called entities. Each entity has a number of properties. For example, a person has the properties name, address, job title, company, and email address.
In general, RDFa uses simple attributes in XHTML tags (often <span> or <div>) to assign brief and descriptive names to entities and properties.
SOURCE: Google Webmaster Central - About RDFa
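Because RDFa lives in ordinary attributes, a program can pick the labeled properties out of a page with a plain XML parser. In this sketch the XHTML fragment, the person's details, and the vocabulary URI bound to the "v:" prefix are all hypothetical. A real RDFa processor would also expand prefixed names like v:name into full URIs and emit RDF triples; this sketch only collects property/value pairs.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML fragment; "v:" is bound to an invented vocabulary URI.
PAGE = """<div xmlns:v="http://example.org/vocab#" typeof="v:Person">
<span property="v:name">Jane Doe</span>,
<span property="v:title">Webmaster</span> at
<span property="v:affiliation">Example Corp</span>
</div>"""


def rdfa_properties(xhtml: str) -> dict:
    """Collect property -> text pairs from elements carrying a property attribute."""
    root = ET.fromstring(xhtml)
    return {
        el.get("property"): (el.text or "").strip()
        for el in root.iter()
        if el.get("property")
    }


print(rdfa_properties(PAGE))
# {'v:name': 'Jane Doe', 'v:title': 'Webmaster', 'v:affiliation': 'Example Corp'}
```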
The Semantic Web is a mesh of information linked up in such a way as to be easily processable by machines, on a global scale. You can think of it as being an efficient way of representing data on the World Wide Web, or as a globally linked database.
The Semantic Web was thought up by Tim Berners-Lee, inventor of the WWW, URIs, HTTP, and HTML. There is a dedicated team of people at the World Wide Web Consortium (W3C) working to improve, extend and standardize the system, and many languages, publications, tools and so on have already been developed. However, Semantic Web technologies are still very much in their infancy, and although the future of the project in general appears to be bright, there seems to be little consensus about the likely direction and characteristics of the early Semantic Web.
What's the rationale for such a system? Data that is generally hidden away in HTML files is often useful in some contexts, but not in others. The problem with most of the data currently on the Web in this form is that it is difficult to use on a large scale, because there is no global system for publishing data in such a way that it can be easily processed by anyone. For example, just think of information about local sports events, weather information, plane times, Major League Baseball statistics, and television guides... all of this information is presented by numerous sites, but all in HTML. The problem with that is that, in some contexts, it is difficult to use this data in the ways that one might want to.
So the Semantic Web can be seen as a huge engineering solution... but it is more than that. As it becomes easier to publish data in a repurposable form, more people will want to publish data, and there will be a knock-on or domino effect. We may find that a large number of Semantic Web applications can be used for a variety of different tasks, increasing the modularity of applications on the Web. But enough subjective reasoning... onto how this will be accomplished.
The Semantic Web is generally built on syntaxes which use URIs to represent data, usually in triple-based structures: i.e. many triples of URI data that can be held in databases, or interchanged on the World Wide Web using a set of particular syntaxes developed especially for the task. These syntaxes are called "Resource Description Framework" syntaxes.
SOURCE: infomesh.net - The Semantic Web
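One of the simplest RDF interchange syntaxes is N-Triples, which writes each triple on its own line as URIs in angle brackets (or a quoted literal for a plain value) followed by " .". The sketch below serializes triples that way; the example.org URIs are hypothetical, and treating any object that starts with "http" as a URI is a shortcut taken only for this illustration.

```python
# Hypothetical URIs under example.org; real data would use published vocabularies.
EX = "http://example.org/"


def literal(text: str) -> str:
    """Quote a plain string literal, escaping backslash, quote, and line breaks."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def to_ntriples(triples) -> str:
    """Write (subject URI, predicate URI, object) triples, one statement per line.

    Objects starting with "http" are treated as URIs and anything else as a
    literal (a simplification made for this sketch).
    """
    lines = []
    for s, p, o in triples:
        obj = f"<{o}>" if o.startswith("http") else literal(o)
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)


data = [
    (EX + "sky", EX + "hasColor", "blue"),
    (EX + "sky", EX + "isAbove", EX + "sea"),
]
print(to_ntriples(data))
# <http://example.org/sky> <http://example.org/hasColor> "blue" .
# <http://example.org/sky> <http://example.org/isAbove> <http://example.org/sea> .
```

Because every name is a global URI, triples written by different publishers can be merged into one collection without renaming, which is the "globally linked database" idea described above.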
The Resource Description Framework (RDF) is a general framework for how to describe any Internet resource such as a Web site and its content. SOURCE: searchSOA.com
Bonus: In 300 to 400 words total (one page single spaced more or less), answer all three of the following:
XML is easier to learn and use than HTML
XML is NOT easier to learn than HTML! HTML has the advantage of using pre-defined tags with specific, non-changeable meaning. XML, on the other hand, is by its very purpose and development composed of flexible, extensible, custom-created tags. The meaning of a tag depends on its DTD or schema, which means that the same tag may have 2 different meanings in 2 different locations, or that 2 different tags may actually have the same functional meaning. The very nature of its variability makes XML much harder to learn. However, once learned, it may prove more useful...
Will XML completely replace HTML (soon or ever?)
XML will never completely replace HTML, but it may be the basis of far more variations and hence become increasingly popular. However, HTML will remain the backbone of the Internet. It is the red-blue-and-yellow of the Internet's "color spectrum", whereas XML is the 64,000 variations.
How will you use XML? Please be specific and tell me more about your career and interests.
I will first and foremost use my understanding of XML to tweak my favorite XML-based applications, such as the SimpleViewer and AutoViewer photo slideshow programs. Secondly, I will extend my XML understanding to learn new programs and applications, such as KML and RSS. Finally, I will apply my XML skills by creating web pages for myself, family, and friends. These web pages will highlight my hobbies, such as photography, travel, genealogy, web development, and blogging. Maybe someday, with a little bit of effort, I may even be able to use XML to retrieve data from sources such as Excel spreadsheets or MySQL databases, and to display them on the web. Professionally, I will use XML to upgrade the capabilities of our federal retirees' organizational web site, including adding a feed to our blog (I hope!). Coincidentally, I have just signed up for Intermediate XML, since there is so much more to learn!!
Hints - you get 1 point for questions 1-25, and 5 points for the bonus question. I'm looking for short answers in the first 25 questions, then your imagination in the bonus question. The total possible score is thus 30 points, out of 25 for the assignment.
If you can't find an answer, then you haven't learned to use http://searchwebservices.techtarget.com/, whatis.techtarget.com or Google.com. Also try http://www.xml.com/ or http://www.xml.org/ but remember that whatis.com will be hyperlinked to other subjects. Please attempt the bonus question as it works in your favor, and it affects the entire exam score.