THE XML DATA FORUM
Conference Sessions
Monday, November 4, 2002
5:00 pm - 6:00 pm
Night School
Concordance: Managing Mismatched Data from Multiple Sources
Denise Draper
Chief Software Architect
Nimble Technology
When integrating data from multiple sources, one of the primary hurdles is matching the data that refer to the same entity across different sources when there is no 'natural key.' A classic example is two systems holding customer data that must be matched on customer name or address, where the names and addresses differ subtly between the systems.
This problem is solved with various kinds of 'merge/purge' techniques for creating warehouses or cleaning source data, but the problem is different when doing virtual data integration, when the underlying source data remains 'dirty' but cannot be changed. We call this the 'concordance' problem.
In this talk, we will outline the issues involved in the concordance problem and describe a solution based on an independent concordance database, which tracks the relationships between records in multiple sources. We will cover the issues and steps involved in designing and constructing a concordance database, as well as how the concordance database is then used.
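As a rough illustration of the idea, the sketch below keeps a separate concordance table that links record ids from two sources without modifying either source. The table name, the name-normalization rule, and the sample records are all hypothetical assumptions for this example; the talk's actual design is not described here.

```python
import sqlite3

def normalize(name: str) -> str:
    """Crude matching key (an assumption): lowercase, drop punctuation, sort tokens."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(sorted(cleaned.split()))

# Two hypothetical sources that hold the same customers under different keys.
crm = {"C1": "Smith, John", "C2": "Acme Corp."}
billing = {"B7": "John Smith", "B9": "ACME Corp", "B3": "Jane Doe"}

def build_concordance(conn, source_a, source_b):
    """Record which ids in source_a and source_b refer to the same entity."""
    conn.execute(
        "CREATE TABLE concordance (a_id TEXT, b_id TEXT, PRIMARY KEY (a_id, b_id))"
    )
    # Index source_b by its normalized name so each source_a record is one lookup.
    index = {}
    for b_id, name in source_b.items():
        index.setdefault(normalize(name), []).append(b_id)
    for a_id, name in source_a.items():
        for b_id in index.get(normalize(name), []):
            conn.execute("INSERT INTO concordance VALUES (?, ?)", (a_id, b_id))
    conn.commit()

def lookup(conn, a_id):
    """Return the source_b ids linked to a source_a id; the sources stay untouched."""
    rows = conn.execute(
        "SELECT b_id FROM concordance WHERE a_id = ? ORDER BY b_id", (a_id,)
    )
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
build_concordance(conn, crm, billing)
print(lookup(conn, "C1"), lookup(conn, "C2"))
```

A real concordance would likely need fuzzier matching and manual review of uncertain links; exact matching on a normalized key is used here only to keep the sketch short.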
Tuesday, November 5, 2002
5:00 pm - 6:00 pm
Night School
Introduction to RosettaNet
Robert Oberwetter
Application Development Manager
Tokyo Electron America
RosettaNet combines the disciplines of XML, Data Integration and Modeling into an integrated business process. All three areas are critical to, and used by, RosettaNet. This presentation describes:
What is RosettaNet?
How does RosettaNet use XML?
What is a Partner Interface Process (PIP) and how is the process modeled?
How is all this used in business-to-business integration?
Attendees will learn that there is more to application integration than just the integration activities. Each integration has a business process which must be defined and modeled.
Wednesday, November 6, 2002
7:15 am - 8:15 am
SIG
Using XSLT for Cheap Data Transformation
Hal Davis
Project Manager
Mellon Financial Services
XSLT (eXtensible Stylesheet Language Transformations) offers a standard, self-documenting format for transformations of XML data. Indeed, XSLT has become a universal format for describing and implementing data transformations. It can be used to automatically generate metadata for XML documents and is also characterized by its relative simplicity (including the availability of low-cost tools). This presentation will describe XSLT, discuss its applications, compare XSLT to other data transformation alternatives, and demonstrate the simplicity of the tools.
Where XSL fits in the XML universe, and how it can be used to implement and manage data transformations within and between enterprises
How to use XSL Transformations to develop and implement basic and advanced data transformations
Differences and similarities between XSL Transformations and more conventional data transformation approaches
Wednesday, November 6, 2002
10:30 am - 11:30 am
Conference Session
Using Web Services for Integration Within and Outside the Enterprise
Leo Kraunelis
Director
OASIS/XML.org
Web Services are reducing the cost of integration. This presentation explains what web services are, and demonstrates their application through case studies and ROI examples. It also offers insight on market trends and tools.
Traditional EAI within the enterprise - but at what cost?
Built-in web services support will commoditize integration
Web services simplify integration by providing open standards-based interfaces
Integration with other businesses beyond the firewall - case studies
Web services integration still has its challenges
Web services - getting the ROI of integration
Overview of Web Services standards efforts
Wednesday, November 6, 2002
10:30 am - 11:30 am
Conference Session
Data Model Patterns, Generalizations and XML
Roland Berg
Principal Consultant
ThinkSpark
Patterns and generalizations have long been touted as the solution to managing unstable data environments. The use of metadata-centered database designs creates a great deal of flexibility in the structures and allows the database to become evolutionary in content while maintaining structural consistency.
XML and related technologies are considered to be the solution to the problem of exchanging dynamic data across diverse systems.
It is only natural that we should explore the implications of applying XML-based data interchange to patterned/generalized databases and vice-versa. This presentation is a discussion of the impact that each technological approach has on the other and an exploration of the alternatives available for representing a patterned/generalized database as an XML structure.
Specific issues addressed will include the interaction of the object-like nature of the highly generalized database with the hierarchical structure of XML and options for transporting the highly metadata-driven information via XML.
Attendees will learn:
How XML and model patterns/generalizations interact
Options for using XML to transport these structures
Strengths and weaknesses of the various options
Wednesday, November 6, 2002
11:40 am - 12:40 pm
Conference Session
XML as Meta Data
Matthew Williams
Senior Data Analyst
Worldspan
Drawing on personal experience as a data analyst and an ongoing effort to embrace and standardize XML practices within Worldspan, the speaker will address XML issues as they relate to current data administration practices. To date, most of the information surrounding XML has come from a programming/application perspective, and little attention has been given to XML as metadata. From a DA or DBA perspective, XML is metadata and, as such, needs to be incorporated into the database design process. This requires programmers/application developers, DAs, and DBAs to work closely together to harness the power of XML without jeopardizing data integrity within the organization. This presentation will address different facets of this issue, including:
XML and data integrity
XML and legacy systems
XML and the RDBMS
XML and RDBMS constructs and their relationship
XML as metadata
Modeling XML in a relational database
Wednesday, November 6, 2002
1:45 pm - 2:45 pm
Conference Session
Analytical API Update: XML for Analysis & JOLAP
Seth Grimes
Principal Consultant
Alta Plana Corporation
BI vendors led by Microsoft, Hyperion, and SAS Institute last year released version 1.0 of the XML for Analysis (XML/A) specification, "an open-standards-based messaging interface" designed to "promote the standardization of the data access interaction between a client application and business intelligence systems and other applications over the Web and in distributing environments."
Meanwhile, the nascent JOLAP specification provides a similar API for the J2EE [Java] Web services environment, one that "supports the creation and maintenance of OLAP data and metadata, in a vendor-independent manner."
An overview of the XML/A and JOLAP specifications
their histories and design points
market and vendor acceptance
their role in the .Net vs. J2EE Web services war
development directions.
Attendees will learn how to develop and integrate compliant software systems into a coherent analytical computing environment.
Wednesday, November 6, 2002
1:45 pm - 2:45 pm
Panel
The Semantic Web
Brett Champlin
Process Center of Expertise
Allstate Insurance Company
William Ruh
Senior Vice President of Professional Service
Software AG, Inc
Dave McComb
President
Semantic Arts
The Semantic Web is a much anticipated (and yet often misunderstood) concept. Fundamentally, the Semantic Web is a vision of the future in which documents and data contain descriptive metadata which allows them to be easily understood by computers. Like so many nascent ideas in technology, the potential payoffs are huge, but the implementation questions remain unanswered. Nonetheless, it deserves a much closer look.
In this session we’ll evaluate the business implications of the Semantic Web, and how much time and effort your organization should devote to further research.
What are the basic concepts and technologies underlying the Semantic Web?
Are the expectations realistic? Or are we getting carried away on an unachievable hype curve?
What are the practical business benefits? The high-payoff applications?
How will it integrate disparate information sources?
Are there any tools available to start building it now?
Wednesday, November 6, 2002
3:15 pm - 4:15 pm
Conference Session
XML Tools: XML Views
Bradley Wright
Vice President, Product Development
MetaMatrix, Inc
Mark Milodragovich
Senior Information Engineer
Nimble Technology, Inc.
Integration is clearly one of the core benefits of XML deployment, and can take various forms. In this session we examine two aspects of XML data integration.
The first, sometimes called XML Views (or virtual XML documents), dynamically mediates and integrates data from heterogeneous data sources. The speaker will present a high-level survey of vendor claims/announcements of XML Views to help attendees sort through confusing terminology.
In the second part we will discuss the integration of legacy systems into XML standard schemas. This presentation will show how the OMG Meta Object Facility (MOF) extends UML modeling to apply to modeling diverse information sources, including XML schemas, to create Platform Independent Models (PIMs). The speaker will show how the schema can be represented as a virtual model and then mapped to non-XML physical sources, as well as XML sources. He will then show how the virtual models are applied to the integration of the diverse information sources.
Virtual XML documents
Example of mapping disparate data sources into XML Documents for consumption by Web Services or other applications
Metadata modeling of XML Schema
Thursday, November 7, 2002
7:15 am - 8:15 am
SIG
XML Tools: Native XML Databases
Alex Cheng
Director of Engineering
Ipedo
Native XML databases offer a number of advantages over traditional relational databases. This session will explain the similarities and differences between the two, in terms of how they store data, the query language and application interfaces used, and the methods for managing schema. It will also demonstrate the functionality of this new product class.
Thursday, November 7, 2002
8:30 am - 9:30 am
Conference Session
Managing XML Assets
Kathryn Breininger
Internet Librarian
The Boeing Company
Darren Wrigley
Director of Repository Services
ASG
As XML becomes more widely used, the need for efficient management of XML-related assets becomes critical. This presentation describes, in two parts, how repository and registry technology are being used to manage XML assets (such as DTDs and schemas), providing discovery of, access to, and sharing of these assets. The first part of the presentation describes the range of technologies used to access and control the information assets, with a particular emphasis on the role of the repository. Attendees will learn about XML-related facilities and standards for transportation, access, and communication of XML metadata, and their place in an integrated management strategy.
The second part of the presentation examines a successful current implementation of an enterprise repository and registry at Boeing. The Central Registration Authority and Locator (CENTRAL) at Boeing is designed to store and retrieve reusable XML assets. The CENTRAL Registry contains metadata and locations for XML assets and makes these assets available to the entire Boeing enterprise as reusable objects. The speaker, Kathryn Breininger, will discuss the scope of the project and the events that led up to the development of the CENTRAL Registry, including the design of the system, and the functions and roles of the users. She will provide an overview of content management and configuration control issues, and how these have been resolved. She will also discuss how the Registry enables data integration across the enterprise. Finally, she will describe the evolution of CENTRAL in terms of its architecture, services, and functions to be provided in future phases, how CENTRAL ties in with activities in ebXML, and how existing standards are implemented in the system.
XML and metadata asset management
Registry/Repository development and implementation
Domain communities and roles
User roles and functions
How Boeing developed and implemented an in-house registry and repository
Requirements that impacted the development of the registry
Use cases for the registry
Lessons learned in the development of the registry
Benefits associated with the use of the registry
Thursday, November 7, 2002
9:45 am - 10:45 am
XML Tools: XQUERY
Denise Draper
Chief Software Architect
Nimble Technology
Alex Cheng
Director of Engineering
Ipedo
XQuery is the new query language being designed by the W3C to query XML data. This talk will introduce the main XQuery language features, in particular comparing them to SQL and existing XML access methods such as XPath. We will demonstrate how XQuery can be used to create a simple web application.
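To make the SQL and XPath comparison concrete, here is a small sketch that runs the same selection two ways: as SQL over a relational table and as a path expression over an XML tree. It does not run XQuery itself (an XQuery version would typically express the selection as a FLWOR expression); Python's ElementTree supports only a limited subset of XPath. The sample data and element names are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample data shared by both representations.
BOOKS = [("XML in Practice", 2002), ("Old Databases", 1995)]

# Relational form: flat rows in a table, selected with a SQL WHERE clause.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (title TEXT, year INTEGER)")
conn.executemany("INSERT INTO book VALUES (?, ?)", BOOKS)
sql_titles = [row[0] for row in conn.execute("SELECT title FROM book WHERE year = 2002")]

# XML form: nested elements, selected with a path expression and a predicate.
root = ET.Element("library")
for title, year in BOOKS:
    book = ET.SubElement(root, "book", year=str(year))
    ET.SubElement(book, "title").text = title
xpath_titles = [t.text for t in root.findall("book[@year='2002']/title")]

print(sql_titles, xpath_titles)
```

The SQL query filters rows by a column value, while the path expression walks the document hierarchy and filters on an attribute; both return the same titles for this data.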
Thursday, November 7, 2002
11:00 am - 12:00 pm
Conference Session
Managing Schema Chaos for XML
Dan Chang & Lucian Popa
Research Staff
IBM
The growing need to integrate disparate systems, or to allow them to exchange data, has drawn significant attention to, and excitement about, the use of XML as a "canonical data format" for such systems, especially on the Web. XML by itself, however, is necessary but not sufficient. What is needed further are common XML vocabularies or schemas.
Unfortunately, the likelihood that all disparate systems will agree to a set of common XML schemas is slim. The reality is that all sorts of "common" XML schemas that overlap or conflict with each other have been and will continue to be developed. We call this reality schema chaos and we have developed a solution framework, Xcalibur, for managing schema chaos. The key component of Xcalibur is a novel framework for mapping heterogeneous XML schemas. Our approach works in three phases. In the first phase, system supplied and/or user defined high-level mappings are checked for compatibility and expressed as a set of inter-schema correspondences. The second phase transforms these correspondences into a set of logical mappings based on the semantics of the source schemas and the target schema. The third phase translates these logical mappings into XQueries over the source schemas that produce data conforming to the structure and constraints of the target schema, and preserving the semantics of the source schemas and mappings.
Information integration and application integration are among the most critical challenges facing corporate IT staffs. XML is a promising technology for delivering the needed solution. However, without proper schema management, XML will only create a different chaos: the schema chaos. This presentation discusses a novel solution framework for managing schema chaos for XML.
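The general idea of driving a transformation from inter-schema correspondences can be sketched as follows. This is not Xcalibur and it does not generate XQuery: it substitutes a direct ElementTree transformation for the query-generation phase, and it skips compatibility checking and semantic constraints entirely. The correspondences, element names, and sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical correspondences: path within a source record -> target element name.
CORRESPONDENCES = [
    ("name/first", "givenName"),
    ("name/last", "familyName"),
    ("contact/email", "email"),
]

def translate(source_xml: str, record_tag: str, target_tag: str) -> ET.Element:
    """Build a target-schema document from source records using the correspondences."""
    source_root = ET.fromstring(source_xml)
    target_root = ET.Element(target_tag + "s")
    for record in source_root.findall(record_tag):
        out = ET.SubElement(target_root, target_tag)
        for src_path, tgt_name in CORRESPONDENCES:
            node = record.find(src_path)
            # Copy a value only when the source record actually has one.
            if node is not None and node.text:
                ET.SubElement(out, tgt_name).text = node.text.strip()
    return target_root

SOURCE = """<customers>
  <customer>
    <name><first>Ada</first><last>Lovelace</last></name>
    <contact><email>ada@example.org</email></contact>
  </customer>
  <customer>
    <name><first>Alan</first><last>Turing</last></name>
  </customer>
</customers>"""

result = translate(SOURCE, "customer", "person")
print(ET.tostring(result, encoding="unicode"))
```

Keeping the correspondences as data, separate from the code that applies them, is what lets the same mapping be checked, revised, or compiled into a different query language later.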
Thursday, November 7, 2002
11:00 am - 12:00 pm
Conference Session
XML in Unstructured Data Environments
Robert Ainsbury
General Manager US Operations
Xyleme
The features of XML that allow it to make “unstructured” data environments more manageable are among its greatest assets. Exciting new applications in the areas of content management, document publishing, multimedia, search and intellectual asset management will all be facilitated by the power of embedded, XML-enabled, computer-understandable “meaning.” Essentially, it will turn unstructured data into near-structured data. Order out of chaos.
The news and press industries have been among the fastest to adopt XML for this purpose, with widely accepted standards (such as NewsML) and use by virtually all the leading organizations worldwide. Many of the challenges and opportunities pioneered in news and publishing are now surfacing in other industries. Using this industry as an example, this lively session will offer no-nonsense insights about what to do, and what not to do, when wrestling with large-scale XML adoption.
Successful tactics for XML adoption in publishing, content management and other unstructured data applications
Common pitfalls and what not to do
Awareness of key indicators