THE DATA INTEGRATION FORUM
Conference Sessions
(NOTE: Some sessions have "Pre-Reading Materials." Follow the links to check them out!)
Monday, November 4, 2002
5:00 pm - 6:00 pm
Night School
Concordance: Managing Mismatched Data from Multiple Sources
Denise Draper
Chief Software Architect
Nimble Technology
When integrating data from multiple sources, one of the primary hurdles is matching records that refer to the same entity across different sources when there is no 'natural key.' A classic example is two systems that both house customer data, which must be matched on customer name or address even though the names and addresses differ subtly.
When building warehouses or cleaning source data, this problem is solved with various kinds of 'merge/purge' techniques. The problem is different in virtual data integration, where the underlying source data remains 'dirty' and cannot be changed. We call this the 'concordance' problem.
In this talk, we will describe the issues involved in the concordance problem and present a solution based on an independent concordance database, which tracks the relationships between records in multiple sources. We will also cover the issues and steps involved in designing and constructing a concordance database, and how the concordance database is then used.
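To make the idea concrete, here is a minimal sketch of a concordance table, assuming SQLite and hypothetical table and column names (`concordance`, `entity_id`, `source`, `source_key`); it illustrates the general technique, not the speaker's actual design. The source systems are never modified: the concordance only records which source records refer to the same real-world entity, and a self-join answers "where else does this entity appear?"

```python
import sqlite3

# Hypothetical concordance schema. The source systems stay untouched; this
# separate database only records which source records denote the same entity.
SCHEMA = """
CREATE TABLE concordance (
    entity_id  INTEGER NOT NULL,   -- surrogate ID for the real-world entity
    source     TEXT    NOT NULL,   -- name of the source system
    source_key TEXT    NOT NULL,   -- record key within that source
    PRIMARY KEY (source, source_key)
);
"""


def link(conn, entity_id, source, source_key):
    """Record (or re-point) a source record as referring to the given entity."""
    conn.execute(
        "INSERT OR REPLACE INTO concordance (entity_id, source, source_key) "
        "VALUES (?, ?, ?)",
        (entity_id, source, source_key),
    )


def equivalents(conn, source, source_key):
    """Return every (source, key) pair that refers to the same entity as the given record."""
    return conn.execute(
        """
        SELECT c2.source, c2.source_key
        FROM concordance AS c1
        JOIN concordance AS c2 ON c1.entity_id = c2.entity_id
        WHERE c1.source = ? AND c1.source_key = ?
        ORDER BY c2.source, c2.source_key
        """,
        (source, source_key),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Two systems hold the same customer under different keys (and slightly different names).
link(conn, 1, "billing", "CUST-0042")
link(conn, 1, "crm", "A-17")
link(conn, 2, "crm", "A-18")

print(equivalents(conn, "billing", "CUST-0042"))  # [('billing', 'CUST-0042'), ('crm', 'A-17')]
```

Because the matching decisions live in their own table, they can be reviewed, corrected, or re-run without touching the dirty source data.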
Monday, November 4, 2002
5:00 pm - 6:00 pm
SIG
Database Product Futures: IBM
Berni Schiefer
Distinguished Engineer
IBM Corporation
What's next for DB2? IBM is focusing on product features that simplify and automate database management, such as the new self-managing and data-integration features that DB2 Version 8 will incorporate. Come and hear what's in store for IBM's enterprise data management offerings.
Tuesday, November 5, 2002
5:00 pm - 6:00 pm
Night School
Introduction to RosettaNet
Robert Oberwetter
Application Development Manager
Tokyo Electron America
RosettaNet combines the disciplines of XML, Data Integration and Modeling into an integrated business process. All three areas are critical to, and used by, RosettaNet. This presentation describes:
What is RosettaNet?
How does RosettaNet use XML?
What is a Partner Interface Process (PIP) and how is the process modeled?
How is all this used in business-to-business integration?
Attendees will learn that there is more to application integration than the integration activities themselves: each integration has a business process that must be defined and modeled.
Tuesday, November 5, 2002
5:00 pm - 6:00 pm
SIG
Database Product Futures: Microsoft
Sam Batterman
Senior Technical Specialist
Microsoft
Microsoft's offerings continue to make inroads into most areas of the database market. During the briefing you'll learn what's next for SQL Server, its various enhancements and other major developments in Microsoft's enterprise data management product line.
Tuesday, November 5, 2002
5:00 pm - 6:00 pm
SIG
ARUG Meeting
Tom Bilcze
Roadway Express
The Advantage Repository Users Group will meet to conduct business and introduce interested attendees to the activities of the users group.
Wednesday, November 6, 2002
7:15 am - 8:15 am
SIG
Sharing Live Application Data Across the Internet: A New Concept in Data Storage
Harry Ellis
British Army
This presentation will outline a number of principles and associated technologies that should enable automatic, rule-based distribution of live application data.
Wednesday, November 6, 2002
10:30 am - 11:30 am
Conference Session
Using Web Services for Integration Within and Outside the Enterprise
Leo Kraunelis
Director
OASIS/XML.org
Web services are reducing the cost of integration. This presentation explains what Web services are and demonstrates their application through case studies and ROI examples. It also offers insight into market trends and tools.
Traditional EAI within the enterprise, but at what cost?
Built-in Web services support will commoditize integration
Web services simplify integration by providing open, standards-based interfaces
Integration with other businesses beyond the firewall: case studies
Web services integration still has its challenges
Web services: getting the ROI of integration
Overview of the Web services standards effort
Wednesday, November 6, 2002
10:30 am - 11:30 am
Conference Session
The Many Become One ... Integrating Disparate Data into an Enterprise Data Warehouse
Alan Chow
SVP, R&D
Teradata, a division of NCR
At its outset, data warehousing promised businesses a better understanding of their customers' businesses as a basis for better decision-making. Fifteen years later, some organizations have achieved that goal. They know their customers better, adapt to change faster, and their more accurate predictions pay off in business terms. Other organizations, however, have poured money into data warehousing efforts without realizing the potential returns. What's the difference? All too often, analysis is hindered by islands of data scattered across the organization.
Businesses house data throughout the organization in unconnected and incompatible data marts, creating multiple versions of the truth. Data marts can appear, especially to business units, to be a relatively cheap way to control specific information. However, current research highlights their disadvantages in terms of the cost of ongoing support and maintenance.
In this presentation, Alan Chow, SVP, R&D, Teradata, a division of NCR, explores how to integrate data from disparate transactional ERP systems into an enterprise active data warehouse, giving users a single view of their business and avoiding costly data marts and operational data stores (ODS). Alan also discusses consolidation strategies, from higher-level architectures to specific technical solutions, that increase the speed and reduce the cost of data integration between disparate systems.
Wednesday, November 6, 2002
10:30 am - 11:30 am
Conference Session
Be The Master Of Your Domain
Doug Stacey
Team Leader, Metadata Infrastructure Support
Allstate Insurance Company
Renee Zea
Data Analyst
Allstate Insurance Company
Domain Management is at the core of Allstate Insurance Company's data integration strategy. Through the management of Business Domains, Allstate's Enterprise Data Management team has achieved consistency in business definitions, documented and integrated the multiple sets of values and codes used throughout the enterprise, and provided the links between physical schemas and logical data models. By building a Domain Management set of tools, Allstate has created a solution for researching, managing, and standardizing both encoded and non-encoded data. This presentation will discuss the tools and techniques we used for building the environment and how the application community is now leveraging that information for the integration of systems.
Learn the value of domain management as a basis of data integration
Understand Allstate's approach to domain management
See a demonstration of some of Allstate's domain management toolset
Understand the benefits domain management and data integration have afforded Allstate
Wednesday, November 6, 2002
11:40 am - 12:40 pm
Panel
Database Futures
Alan Chow
SVP, R&D
Teradata, a division of NCR
William Ruh
Senior Vice President of Professional Service
Software AG, Inc
Sam Batterman
Senior Technical Specialist
Microsoft
This panel session brings together three distinctly different, but strongly held, views of the future of database technology. Software AG believes that native XML databases will explode in popularity, and soon; indeed, the company is betting its future on this view. Teradata, always the proponent of “big” solutions, sees enterprise warehousing as the model for the future, at the expense of smaller, distributed data marts. Intuitively, consolidation seems to make sense, but does it work as well in practice? Then there’s Microsoft, perhaps the only organization big enough to cover all its bases. Where does it see the marketplace heading in the near, medium and long term?
Wednesday, November 6, 2002
11:40 am - 12:40 pm
Conference Session
Are We Headed Towards Massively Distributed Integration?
Michael Hoskins
President
Data Junction Corporation
Now that integration is required both inside and outside the enterprise, new kinds of problems arise that demand new solutions. In today’s environment of Distributed Application Integration (DAI), each new application or integration point (inside or outside the enterprise) spawns a new set of integration issues, highlighting the dynamic, exponential (as opposed to the traditional linear) nature of today’s integration challenges. Consequently, these new problems are not readily addressed by the traditional standby solutions. Custom code, EAI tools and XML-only B2Bi solutions all fail to address the new challenges and concerns that crop up when attempting to "connect everyone to everything." The modus operandi of each of these solutions is to resolve one integration issue at a time, relying on a "problem du jour" framework that does not adequately confront integration at a widespread, systemic level.
To handle today’s massively distributed application integration projects, solutions must be massively distributed as well. Basic patterns in biology teach us what type of architecture effectively solves massively distributed problems: not only must the solution itself be massively distributed, it must also be highly intelligent and dynamic, changing and developing as the challenges themselves evolve. Consequently, it is through emergent integration systems, working at the firewall of each business in an integration chain, that disparate data can be mediated (semantically and syntactically) into the enterprise’s own unique systems. The presentation will also address how Web services fit into the picture.
Wednesday, November 6, 2002
1:45 pm - 2:45 pm
Panel
The Semantic Web
Brett Champlin
Process Center of Expertise
Allstate Insurance Company
William Ruh
Senior Vice President of Professional Service
Software AG, Inc
Dave McComb
President
Semantic Arts
The Semantic Web is a much-anticipated (and yet often misunderstood) concept. Fundamentally, the Semantic Web is a vision of the future in which documents and data contain descriptive metadata that allows them to be easily understood by computers. Like so many nascent ideas in technology, the potential payoffs are huge, but the implementation questions remain unanswered. Nonetheless, it deserves a much closer look.
In this session we’ll evaluate the business implications of the Semantic Web, and how much time and effort your organization should devote to further research.
What are the basic concepts and technologies underlying the Semantic Web?
Are the expectations realistic? Or are we getting carried away on an unachievable hype curve?
What are the practical business benefits? The high-payoff applications?
How will it integrate disparate information sources?
Are there any tools available to start building it now?
Wednesday, November 6, 2002
1:45 pm - 2:45 pm
Conference Session
ETL vs. EAI: Comparing Data Integration Approaches
Faisal Shah
Chief Technology Officer
Knightsbridge Solutions
EAI follows ETL as the latest category of data integration tools. Many organizations are tempted to address all of their integration needs through just one category of tool. At first, this seems like the most cost-effective and efficient way to address the integration issue.
Unfortunately, the long-term costs of trying to solve ETL issues with EAI tools (and vice versa) can far outweigh the upfront savings. The two categories treat latency, unit-of-work granularity, metadata integration, third-party product integration, and other product dimensions differently.
An organization needs to address ETL and EAI holistically and at the same time understand that there are still significant differences between the tools and ways to approach integration projects. EAI and ETL tools continue to grow closer together, but there are still significant advantages to using each for its original purpose, and knowing how to leverage these will allow an integration project to deliver the right information at the right time and at the right cost.
Dimensions of the data integration challenge
Outlining situations that call for ETL, EAI, or both
Sample integration architectures
Future of ETL and EAI
Wednesday, November 6, 2002
1:45 pm - 2:45 pm
Conference Session
Data Refactoring: Enabling Iterative and Incremental Database Development
Scott W. Ambler
President and Senior Consultant
Ronin International
“Traditional” development practices, which are still followed by many data professionals today, are nearly serial in nature and “driven” by one or more forms of entity/data model baselined early in the software lifecycle. Times have changed. Dramatically. Modern software development methodologies, including both rigorous processes such as the Rational Unified Process (RUP) and agile processes such as eXtreme Programming (XP), are based on the premise that software should be developed in an iterative and incremental manner. Furthermore, these processes are often driven by new types of artifacts (use cases and user stories, respectively) rather than data-oriented artifacts. Application developers are adopting new ways to work; why can’t data professionals?
Data refactoring is a technique that enables data professionals to work in an iterative and incremental manner, just like the application developers they support.
Like source code refactoring, data refactoring is based on the idea that you can evolve your data schema over time by applying small changes that improve its design without destroying its original invariants.
This presentation explores the issues surrounding data refactoring, which is quite simple in greenfield environments but becomes quite complex in the highly coupled reality of legacy databases, and surveys the techniques and philosophies that data professionals need to adopt to support modern development projects. Data refactoring is an enabling technique of the Agile Data method.
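As an illustration, the sketch below applies one small refactoring, renaming a column, in the incremental style described above. It assumes SQLite and hypothetical table and column names (`customer`, `fname`, `first_name`), and it uses a transition period in which triggers keep the old and new columns synchronized so that legacy applications keep working; this is one common way to handle the coupling problem, not necessarily the technique presented in the session.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, fname TEXT);
INSERT INTO customer (id, fname) VALUES (1, 'Ada');
""")

# Step 1: add the better-named column alongside the old one.
conn.execute("ALTER TABLE customer ADD COLUMN first_name TEXT")
# Step 2: migrate the existing data.
conn.execute("UPDATE customer SET first_name = fname")
# Step 3: for the transition period, keep both columns in sync so that legacy
# code writing `fname` and new code writing `first_name` both keep working.
# The WHEN clauses stop each trigger once the two columns already agree.
conn.executescript("""
CREATE TRIGGER sync_fname AFTER UPDATE OF fname ON customer
WHEN NEW.first_name IS NOT NEW.fname
BEGIN
    UPDATE customer SET first_name = NEW.fname WHERE id = NEW.id;
END;
CREATE TRIGGER sync_first_name AFTER UPDATE OF first_name ON customer
WHEN NEW.fname IS NOT NEW.first_name
BEGIN
    UPDATE customer SET fname = NEW.first_name WHERE id = NEW.id;
END;
CREATE TRIGGER sync_insert AFTER INSERT ON customer
BEGIN
    UPDATE customer
    SET first_name = COALESCE(NEW.first_name, NEW.fname),
        fname      = COALESCE(NEW.fname, NEW.first_name)
    WHERE id = NEW.id;
END;
""")
# Step 4 (not run here): once no application uses `fname`, drop the
# three triggers and then the old column.

# A legacy application still updates the old column...
conn.execute("UPDATE customer SET fname = 'Grace' WHERE id = 1")
# ...while a new application inserts using only the new one.
conn.execute("INSERT INTO customer (id, first_name) VALUES (2, 'Alan')")
print(conn.execute("SELECT id, fname, first_name FROM customer ORDER BY id").fetchall())
# [(1, 'Grace', 'Grace'), (2, 'Alan', 'Alan')]
```

Keeping both columns live for a transition period is what makes the change incremental: the schema improves immediately, while dependent applications can be migrated one at a time.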
Pre-Reading Materials
Wednesday, November 6, 2002
1:45 pm - 2:45 pm
Conference Session
Enterprise Data Integration: Development of an Enterprise Data Model
Noreen Kendle
Enterprise Architect
Delta Technology - Delta Air Lines
This presentation focuses on how to develop an Enterprise Data Model. It describes the approach developed and used at Delta Air Lines to create an Enterprise Data Model, which is now being used to create the Operational or Enterprise Data Stores that integrate operational data across the airline business. It describes a practical seven-step methodology for developing an Enterprise Data Model that combines "top-down" and "bottom-up" approaches. The methodology incorporates the enterprise view needed for integration to support an ODS and/or DW, as well as the current state (work already accomplished, i.e., existing models) for practicality and quicker development.
Definition of an Enterprise Data Model
Explanation of the methodology used, which combines top-down and bottom-up approaches
A description of how to create an Enterprise Subject Area Model, with real-world examples
A description of the development of the Enterprise Conceptual Models, with real-world examples
A description of data rationalization in the bottom-up/top-down integration that results in the Enterprise Data Model
Wednesday, November 6, 2002
3:15 pm - 4:15 pm
Conference Session
XML Tools: XML Views
Bradley Wright
Vice President, Product Development
MetaMatrix, Inc
Mark Milodragovich
Senior Information Engineer
Nimble Technology, Inc.
Integration is clearly one of the core benefits of XML deployment, and can take various forms. In this session we examine two aspects of XML data integration.
The first, sometimes called XML Views (or virtual XML documents), dynamically mediates and integrates data from heterogeneous data sources. The speaker will present a high-level survey of vendor claims and announcements about XML Views to help attendees sort through the confusing terminology.
In the second part, we will discuss the integration of legacy systems into XML standard schemas. This presentation will show how the OMG Meta Object Facility (MOF) extends UML modeling to diverse information sources, including XML schemas, to create Platform Independent Models (PIMs). The speaker will show how a schema can be represented as a virtual model and then mapped to non-XML physical sources as well as XML sources. He will then show how the virtual models are applied to the integration of the diverse information sources.
Virtual XML documents
Example of mapping disparate data sources into XML Documents for consumption by Web Services or other applications
Metadata modeling of XML Schema
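A toy illustration of a virtual XML document: the view below is assembled on request from two hypothetical sources (a relational table and a CSV export) and is never stored. The source names, element names, and join key are assumptions made for the example; commercial XML View products are considerably more sophisticated.

```python
import csv
import io
import sqlite3
import xml.etree.ElementTree as ET

# Source 1 (hypothetical): a relational table of customers.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE customers (cust_no INTEGER PRIMARY KEY, name TEXT);
INSERT INTO customers VALUES (7, 'Acme Corp'), (9, 'Globex');
""")

# Source 2 (hypothetical): a flat-file export of orders keyed by customer number.
ORDERS_CSV = "customer,order_id,amount\n7,A-100,250.00\n7,A-101,75.50\n9,B-200,10.00\n"


def customer_view(conn, orders_csv):
    """Build the XML view on request by joining both sources; nothing is materialized."""
    # Group the flat-file orders by customer number.
    orders = {}
    for row in csv.DictReader(io.StringIO(orders_csv)):
        orders.setdefault(int(row["customer"]), []).append(row)

    # Walk the relational source and nest each customer's orders under it.
    root = ET.Element("customers")
    for cust_no, name in conn.execute("SELECT cust_no, name FROM customers ORDER BY cust_no"):
        cust = ET.SubElement(root, "customer", id=str(cust_no))
        ET.SubElement(cust, "name").text = name
        for o in orders.get(cust_no, []):
            ET.SubElement(cust, "order", id=o["order_id"]).text = o["amount"]
    return root


print(ET.tostring(customer_view(db, ORDERS_CSV), encoding="unicode"))
```

A consumer such as a Web service sees a single XML document, while the data stays in its original stores and is re-read on every request.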
Wednesday, November 6, 2002
3:15 pm - 4:15 pm
Conference Session
Roadmap to Federated Data Architecture
Ho-Chun Ho
President
HoTech Corp
The goal of architectural planning is to enable organizations to optimize revenue and increase shareholder value by establishing the supporting strategy, standard processes, culture, technology and best practices. Over the years, organizations have built silo systems and isolated data islands, often for practical reasons. What is largely overlooked is that an inadequately designed data architecture organization contributes to this disparity. This presentation will discuss typical models of data architecture organizations in the U.S., the pros and cons of each type of organization, the concepts of federated governance and local autonomy, and a roadmap, based on real-life experience, for establishing data architecture in a federated manner.
Wednesday, November 6, 2002
3:15 pm - 4:15 pm
Conference Session
Information Quality through Semantic Models
Joshua Fox
Software Architect
Unicorn Solutions Ltd.
Understanding data source semantics and their reference to a unified business model is central to ensuring total information quality.
This presentation will show data managers how to apply a central conceptual model to provide semantics to data schemas. It will answer two critical questions: where is the data, and what does it mean? Combining such a model with a formal development process ensures information quality that transcends the limits of a single system, transformation, or data warehouse.
Today, data integrators analyze the business concepts behind their data and design transformation logic to unify metadata. These procedures must be repeated individually for each data source and transformation, and the resulting integrations produce low-quality output that is often impossible to maintain. Participants will learn how to apply the semantic-model approach when moving data into a data warehouse with an ETL tool, when integrating databases from recently merged organizations, or when cleansing legacy databases with badly structured data.
The presentation will demonstrate how analysts can understand their numerous data sources without re-analyzing each schema’s semantics and structure. When the rich semantic model helps implement business information quality coherently across the enterprise, disjointed data is transformed into meaningful information.
This talk is targeted at data managers, data modelers, information quality specialists, and data stewards. The topic is also relevant to EAI specialists who develop transformations for EAI message brokers. The presentation will also appeal to conference attendees who are interested in new ideas in the fields of ontology and the Semantic Web.
Thursday, November 7, 2002
8:30 am - 9:30 am
Conference Session
Engaging Data Administration in the Enterprise
Tom Bilcze
Senior Group Coordinator
Roadway Express
Is your corporate Data Administration group in danger of falling like a house of cards? Why have companies abandoned sound principles of data design and administration? Often the bottom line is that Data Architects who offered the promise of building a sound data infrastructure ended up littering the road to systems development with walls and obstructions.
In this session you will see how to build a collaborative environment by partnering with applications developers and end-user business staffs. You will discover some value-added techniques that will draw you in and make you a key player in business projects. You will see how your data modeling toolkit, analysis techniques and your company's Intranet can help you make this technique a reality.
Breaking out of the traffic cop mentality to data administration
Inventorying and building on your strengths and assets
Knowing your customer and adding value
Partnering to add to the bottom line
Building an information infrastructure for the future
Attendees will also learn:
How to update your methods and procedures for today's rapid development cycles
How to market and sell your new "value of information" philosophy
How to use your data modeling tools to effectively communicate with both technical and non-technical users
How to become an active participant in business analysis and application development efforts
Thursday, November 7, 2002
8:30 am - 9:30 am
Conference Session
New Approaches to Customer Data Integration
A) Reference-Based Customer Data Integration: What it is and Why it’s Better
Chandos Quill
Vice President, Strategic Marketing
Experian
Integrating customer data is, by nature, a reference process. Knowing whether data is accurate or not requires a picture of reality to which data cleansers and integrators can compare records. Any other process is a mathematical guessing game that tends to over- or under-merge customer records. If companies aren’t careful, they can accidentally eliminate customer relationships and perpetuate data inaccuracies.
This presentation details new reference-based data integration methods that achieve dramatically better results. These methods go beyond mere matching formulas to compare customer data to historical customer reference repositories. Case studies will be presented that demonstrate how reference-based matching has helped companies increase the accuracy and number of matched customer records, eliminate over-matching, reduce processing times and costs, and keep data integrated over time.
Why it’s important to introduce a referential database to provide links throughout the matching process
How reference based matching keeps data integrated over time
How reference-based data integration pays off in more effective customer acquisition, retention and lower costs
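The contrast between formula-based and reference-based matching can be sketched in a few lines. In this toy example, which assumes a hypothetical in-memory reference repository and a deliberately simple normalization routine, two records match only if both resolve to the same reference entity. That lets a customer who has moved still match her old record, while a different person at the same address does not; these are exactly the cases where plain string similarity tends to under- or over-merge. Real reference repositories are, of course, far larger and richer.

```python
import re

# Hypothetical reference repository: each known (name, address) variant,
# including historical addresses, maps to one reference person ID.
REFERENCE = {
    ("jane smith", "12 oak st springfield"): "P1",
    ("jane smith", "400 elm ave shelbyville"): "P1",  # Jane's previous address
    ("john smith", "12 oak st springfield"): "P2",
}

STREET_ABBREVIATIONS = {"street": "st", "avenue": "ave"}


def normalize(text):
    """Lower-case, replace punctuation with spaces, collapse whitespace, abbreviate street types."""
    text = re.sub(r"[^\w\s]", " ", text.lower())
    return " ".join(STREET_ABBREVIATIONS.get(w, w) for w in text.split())


def reference_id(name, address):
    """Look the normalized record up in the reference repository (None if unknown)."""
    return REFERENCE.get((normalize(name), normalize(address)))


def same_customer(rec_a, rec_b):
    """Match only when both records resolve to the same reference entity.

    Records absent from the reference are left unmatched, a conservative
    choice made for this sketch.
    """
    ref_a, ref_b = reference_id(*rec_a), reference_id(*rec_b)
    return ref_a is not None and ref_a == ref_b


a = ("Jane Smith", "12 Oak Street, Springfield")
b = ("JANE SMITH", "400 Elm Avenue, Shelbyville")  # same person, previous address
c = ("John Smith", "12 Oak St., Springfield")      # different person, same address
print(same_customer(a, b), same_customer(a, c))     # True False
```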
B) Data Synchronization – A New Approach to Enterprise Customer Data Integration
Jeff Canter
Vice President of Operations
Innovative Systems, Inc.
Data integration projects are complex and challenging. Customer data integration projects are even more complex and challenging because they usually support multiple business units, each with different requirements for defining "customer." The departments’ competing definitions and different business objectives often undermine the success of the traditional customer data integration project.
Data Synchronization provides a new approach to customer data integration, an approach that accommodates competing business objectives, and still provides an integrated, enterprise customer view.
This session will present a new vision for enterprise customer data integration, and real-world applications of this valuable approach. In this session, Jeff Canter will identify and explain the critical success factors for creating a sharable, enterprise customer profile that can easily be segmented into "purpose-driven" views to support the different requirements of departments and applications across the enterprise.
Critical to the success of this new approach to customer data integration is data quality. Canter will outline how organizations can ensure that these different "purpose-driven" views are consistent and accurate.
Thursday, November 7, 2002
8:30 am - 9:30 am
Conference Session
Good (Data) to Great (Data) - Part 1
Robert Seiner
Publisher, The Data Administration Newsletter (TDAN.com)
& Principal, KIK Consulting
In 2001, Jim Collins wrote a best-selling business book titled “Good to Great: Why Some Companies Make the Leap … and Some Don’t”. In this book, Mr. Collins wrote about how successful companies develop detailed business plans and build strict disciplines to go from “good” (or even mediocre or poor) companies to “great” companies.
This presentation by Robert S. Seiner (and ensuing brief discussions) will highlight a number of instances where companies did (or did not) create fundamental data management plans and implement disciplined enterprise data management efforts and what we as practitioners can learn from these efforts.
The presentation will focus on specific business needs for an enterprise data management approach, the pragmatic disciplines that will be most effective most quickly, and the data focused technologies that can be looked upon as the accelerator to enterprise data management success.
Thursday, November 7, 2002
9:45 am - 10:45 am
Conference Session
EAI Aftermath - What Next?
Sheila Jeffrey
Vice President
Wachovia
This presentation will discuss possible consequences of EAI (Enterprise Application Integration) strategies. The goal is to highlight less visible aspects of current EAI approaches for attendees, and outline an alternative end state.
EAI technology and business drivers will be briefly reviewed to explain the current challenges. The relationships between business organizations, processes and data will be presented as a context for examining the potential legacies of today’s EAI implementations. The evolution of data to information to knowledge (to wisdom) will be outlined as a driver for increasing solution complexity and growth in data volumes. Application architectures intended to address these expanding expectations will be reviewed: distributed solutions to complement legacy applications, ERP as EAI, data warehouses, and current middleware approaches.
Practical considerations of EAI implementations will be assessed in this context: the probable real business model for Web services, security considerations, retaining control and ownership of ‘your’ data, data quality issues, and metadata management concerns. The need for a simplified, re-engineered, rationalized, distributed application portfolio will be presented as the conclusion.
The take-away from this session will be an awareness of pitfalls to avoid, design techniques to employ, and criteria for assessing EAI solutions for sustained benefit.
Thursday, November 7, 2002
9:45 am - 10:45 am
Conference Session
Achieving Semantic Interoperability in Transactional Environments
Chito Jovellanos
President & CEO
forward look, inc.
This presentation will examine and critique large-scale real-world applications that address the semantic interoperability problem. Using the Financial Securities industry as a reference point, attendees will understand the evolution of interoperability problems between trading systems, the resolution strategies to date, and the tactical approaches needed to achieve semantic interoperability. You will gain an understanding of the challenges presented by real-world objectives and constraints. The speaker will also explore a new approach to the semantic interoperability problem called "semantic signaling," and will debunk the notion that XML is a prerequisite for semantic interoperability.
The approaches taken within the securities industry are indicative of the problems that will need to be dealt with in other industries that are attempting to participate in eCommerce (e.g., government, manufacturing). Every transaction has counterparties, a need for usable product information, and a reference data framework (calendar information, currency regime, taxes, fees and so on), all of which are presented to industry partners using the semantics inherent in the enterprise's internal systems.
The presentation will be of interest to both practitioners and applied researchers who are currently engaged in large-scale Enterprise Application Integration (EAI) projects. This presentation offers a practical, hands-on assessment of commercial solutions and new techniques for addressing the deep semantic issues in EAI.
Attendees should have a fundamental understanding of metadata management, basic statistics, ontology development, XML, distributed systems, and middleware.
The Semantic Interoperability Problem
Business perspectives (using the Securities industry for examples)
Technology issues
Ontologies and Standards
Case Studies from the Securities Industry (including application demonstrations and walkthroughs)
Corporate action reconciliation using mediation (e.g., global custodians who collect information from worldwide sources)
Trade matching using "semantic signaling"
Thursday, November 7, 2002
9:45 am - 10:45 am
Conference Session
Good (Data) to Great (Data) - Part 2
Robert Seiner
Publisher, The Data Administration Newsletter (TDAN.com)
Founder & Principal, KIK Consulting
Hal Davis
Project Manager
Mellor Financial Services
William Lewis
Senior Technology Specialist
Cambridge Technology Partners
In 2001, Jim Collins wrote a best-selling business book titled “Good to Great: Why Some Companies Make the Leap … and Some Don’t”. In this book, Mr. Collins wrote about how successful companies develop detailed business plans and build strict disciplines to go from “good” (or even mediocre or poor) companies to “great” companies.
This presentation by Robert S. Seiner (and ensuing brief discussions) will highlight a handful of case studies where companies did (or did not) create fundamental data management plans and implement disciplined enterprise data management efforts and what we as practitioners can learn from these efforts.
The presentation will focus on specific business needs for an enterprise data management approach, the pragmatic disciplines that will be most effective most quickly, and the data focused technologies that can be looked upon as the accelerator to enterprise data management success.
Thursday, November 7, 2002
11:00 am - 12:00 pm
Conference Session
Real-Time Integration and Analytics
Seth Grimes
Principal Consultant
Alta Plana Corporation
John Ko
Product Marketing Manager
DataMirror
Ron Agresta
Products Engineer
DataFlux
In this session, we look at the inexorable push for instantaneous information. Whatever your preferred terminology (“Real-time,” “Zero latency,” “Information on Demand,” “Active Warehousing”), you need to be working toward ever shorter timeframes for getting data into and out of analytical systems.
How is real-time integration accomplished in an XML world? Companies must be able to capture selected events, such as purchase orders or invoicing, from any application database and send them in industry-standard XML formats across the enterprise and beyond. Is the “streaming” of XML documents to application servers, B2B exchanges or other XML-driven applications the answer?
What does it take to build a real-time integration infrastructure?
Is the business case really that compelling? Do the ends justify the effort?
Real-time XML-to-database integration: What are the business benefits?
XML streaming
Practical strategies for integrating relational databases, legacy applications and XML
What are the main technical challenges in real-time analytics?