THE ANALYTICS FORUM
Conference Sessions
(NOTE: Some sessions have "Pre-Reading Materials." Follow the links to check them out!)
Tuesday, November 5, 2002
5:00 pm - 6:00 pm
SIG
The Data Vault: The Next Evolution of Data Modeling
Daniel Linstedt
Chief Technology Officer
Core Integration Partners
The Data Vault is a patent-pending technique which some industry experts have predicted may start a revolution as the next big thing in data modeling for enterprise warehousing. This SIG session, led by the creator of the Data Vault, will explain what this new concept is, what its architecture and components are, its applications, and the advantages of the Data Vault over existing techniques.
Wednesday, November 6, 2002
10:30 am - 11:30 am
Conference Session
The Many Become One...Integrating Disparate Data into an Enterprise Data Warehouse
Alan Chow
SVP, R&D
Teradata, a division of NCR
At its onset, data warehousing promised businesses a better understanding of their customers' businesses as a basis for better decision-making. Fifteen years later, some organizations have achieved that goal. They know their customers better, adapt to change faster, and make more accurate predictions that pay off in business terms. Other organizations, however, have poured money into data warehousing efforts without realizing the potential returns. What's the difference? All too often, analysis is hindered by islands of data scattered across the organization.
Businesses house data throughout the organization in unconnected and incompatible data marts, creating multiple versions of the truth. To business units in particular, data marts appear to be a relatively cheap way to keep control over specific information. However, current research highlights the disadvantages of those data marts in terms of the cost of ongoing support and maintenance.
In this presentation, Alan Chow, SVP, R&D, Teradata, a division of NCR, explores how to integrate data from disparate transactional ERP systems into an enterprise active data warehouse - giving users a single view of their business and avoiding the use of costly data marts and ODS. Alan includes a discussion of consolidation strategies, from higher-level architectures to specific technical solutions, that increase the speed and reduce the cost of data integration between disparate systems.
Wednesday, November 6, 2002
11:40 am - 12:40 pm
Conference Session
Analytical Modeling Manifesto
Tom Haughey
Chief Technology Officer
Pepsi Bottling Group
This presentation will re-examine the concept of data design for analytical systems such as the data warehouse. It will take a close look at dimensional modeling and define its proper role and context, and it will position ER modeling, dimensional modeling, and other forms within a general framework. Dimensional modeling is usually presented as the be-all and end-all of data warehousing. Is dimensional modeling one of the great con jobs in data management history? In fact, dimensional modeling has strengths and weaknesses. In some ways it has become outmoded. In other ways, it has been around for decades (and will continue to be). There are three ways to improve performance: use better hardware, use better software, and optimize the data. The primary justification for dimensional modeling is the third: it improves performance by compromising the data to compensate for the inefficiency of technology. A secondary purpose is to provide a consistent base for analysis. Dimensional modeling comes with a price and with restrictions. There are times and places where dimensional modeling is appropriate and will work, and other times and places where it is inappropriate and will actually interfere with the goals of a warehouse.
To make matters worse, the data warehouse industry suffers from a host of double-entendres that make it difficult to communicate meaningfully. It is not uncommon for two “gurus” to disagree about something without realizing that they are not talking about the same thing. Because of this it is actually necessary to start over and define some terms. This presentation will do just that: it will reexamine these concepts and redefine them; it will establish a framework for integration; and it will address a number of specific analytical modeling issues or situations, such as the following:
The main characteristics of analytical models
How to distinguish logical from physical models
The importance of using principles (not patterns) to do design
How to do database optimization
Logical vs physical models
ER model vs. dimensional model
Data model optimization
Different fundamental grains of facts
Seamless extensibility of a database
Changing dimensions
Assignment of keys, including surrogate keys
Aggregates
Prodigal data
Ragged hierarchies
Dimensions with multiple values or roles
Representing what did and did not happen
Conforming dimensions
Unexpected data
Time variant models
Dealing with changes in the model
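Two of the topics listed above, surrogate keys and changing dimensions, can be illustrated with a minimal sketch. It assumes one common technique (keeping a new dimension row, with a new surrogate key, each time a tracked attribute changes); the abstract does not say which approach the presentation takes. The class and field names (`CustomerDimension`, `CustomerRow`, `city`) and the sample values are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CustomerRow:
    surrogate_key: int        # warehouse-assigned key with no business meaning
    natural_key: str          # identifier carried over from the source system
    city: str                 # the attribute whose history is tracked
    valid_from: date
    valid_to: Optional[date]  # None marks the current version


class CustomerDimension:
    """Hypothetical dimension that keeps one row per version of a customer."""

    def __init__(self) -> None:
        self.rows: list[CustomerRow] = []
        self._next_key = 1

    def current(self, natural_key: str) -> Optional[CustomerRow]:
        for row in self.rows:
            if row.natural_key == natural_key and row.valid_to is None:
                return row
        return None

    def apply(self, natural_key: str, city: str, as_of: date) -> int:
        """Record a source value and return the surrogate key now in effect."""
        row = self.current(natural_key)
        if row is not None and row.city == city:
            return row.surrogate_key  # nothing changed, keep the current row
        if row is not None:
            row.valid_to = as_of      # close the old version
        new_row = CustomerRow(self._next_key, natural_key, city, as_of, None)
        self._next_key += 1
        self.rows.append(new_row)
        return new_row.surrogate_key


dim = CustomerDimension()
k1 = dim.apply("C-100", "Boston", date(2002, 1, 1))
k2 = dim.apply("C-100", "Boston", date(2002, 6, 1))   # unchanged: same key
k3 = dim.apply("C-100", "Denver", date(2002, 11, 6))  # changed: new key
```

Facts loaded before the change keep pointing at the old surrogate key, so history is preserved without rewriting fact rows. Whether that trade-off suits a given warehouse is the kind of question the session topics raise.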
Wednesday, November 6, 2002
1:45 pm - 2:45 pm
Conference Session
Analytical API Update: XML for Analysis & JOLAP
Seth Grimes
Principal Consultant
Alta Plana Corporation
BI vendors led by Microsoft, Hyperion, and SAS Institute last year released version 1.0 of the XML for Analysis (XML/A) specification, "an open-standards-based messaging interface" designed to "promote the standardization of the data access interaction between a client application and business intelligence systems and other applications over the Web and in distributing environments."
Meanwhile, the nascent JOLAP specification provides a similar API for the J2EE [Java] Web services environment, one that "supports the creation and maintenance of OLAP data and metadata, in a vendor-independent manner."
An overview of the XML/A and JOLAP specifications:
their histories and design points
market and vendor acceptance
their role in the .NET vs. J2EE Web services war
development directions
Attendees will learn:
how to evaluate the role that XML/A and JOLAP can and should play in their analytical computing architectures
how to develop and integrate compliant software systems into a coherent analytical computing environment
Wednesday, November 6, 2002
3:15 pm - 4:15 pm
Conference Session
Drill-Thru and the Corporate Information Factory
Nicholas Galemmo
Information Architect
Nestle
This presentation examines the issues involved in providing drill-through capability from summarized dimensional data marts into a detailed 3NF data warehouse as prescribed in the Corporate Information Factory architecture.
It presents Dr. Kimball's Conforming Dimensions concept and applies it to the CIF. It looks at issues involved in generating and preserving key values and dealing with structural differences between the 3NF and dimensional models. It identifies problem areas and possible solutions. It investigates the level of functionality a query tool should provide to support cross-model drill-through capabilities.
Issues discussed:
Data Mart publication from the Corporate Information Factory
How to maintain conforming dimensional keys in the Corporate Information Factory
The dimensional context of a drill-through query
Applying dimensional context to a 3NF model
What the query tool should do
Alternate solutions
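As a rough illustration of the drill-through problem described above, the sketch below keeps the warehouse's own identifier in the mart's dimension table so that a summarized fact row can be traced back to its detail rows. This is only one possible arrangement, not necessarily what the presentation proposes. All table names, column names, and data are hypothetical, and SQLite stands in for both the warehouse and the mart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized (3NF-style) warehouse tables holding the detail
    CREATE TABLE product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE sale (
        sale_id INTEGER PRIMARY KEY,
        product_id INTEGER REFERENCES product(product_id),
        sale_date TEXT,
        amount REAL
    );

    -- Dimensional mart: the dimension carries the warehouse key alongside
    -- its own surrogate key, which is what makes drill-through possible
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY,
                              product_id INTEGER, name TEXT);
    CREATE TABLE fact_monthly_sales (product_key INTEGER, month TEXT, total REAL);

    INSERT INTO product VALUES (1, 'Cocoa'), (2, 'Coffee');
    INSERT INTO sale VALUES
        (1, 1, '2002-10-03', 40.0),
        (2, 1, '2002-10-17', 60.0),
        (3, 1, '2002-11-02', 25.0),
        (4, 2, '2002-10-09', 15.0);
    INSERT INTO dim_product VALUES (10, 1, 'Cocoa'), (20, 2, 'Coffee');

    -- Publish the mart from the warehouse detail
    INSERT INTO fact_monthly_sales
    SELECT d.product_key, substr(s.sale_date, 1, 7), SUM(s.amount)
    FROM sale s JOIN dim_product d ON d.product_id = s.product_id
    GROUP BY d.product_key, substr(s.sale_date, 1, 7);
""")


def drill_through(product_key: int, month: str) -> list:
    """Translate a mart cell (product_key, month) into warehouse detail rows."""
    return conn.execute(
        """
        SELECT s.sale_id, s.sale_date, s.amount
        FROM sale s
        JOIN dim_product d ON d.product_id = s.product_id
        WHERE d.product_key = ? AND substr(s.sale_date, 1, 7) = ?
        ORDER BY s.sale_id
        """,
        (product_key, month),
    ).fetchall()


summary_total = conn.execute(
    "SELECT total FROM fact_monthly_sales WHERE product_key = 10 AND month = '2002-10'"
).fetchone()[0]
detail_rows = drill_through(10, "2002-10")
```

The dimensional context of the query (which product, which month) has to be rewritten in terms the normalized model understands. Here that is a key lookup plus a date-range predicate; real structural differences between the two models are usually harder to bridge.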
Thursday, November 7, 2002
8:30 am - 9:30 am
Conference Session
New Approaches to Customer Data Integration
A) Reference-Based Customer Data Integration: What it is and Why it’s Better
Chandos Quill
Vice President, Strategic Marketing
Experian
Integrating customer data is, by nature, a reference process. Knowing whether data is accurate or not requires a picture of reality to which data cleansers and integrators can compare records. Any other process is a mathematical guessing game that tends to over- or under-merge customer records. If companies aren’t careful, they can accidentally eliminate customer relationships and perpetuate data inaccuracies.
This presentation details new reference-based data integration methods that achieve dramatically better results. These methods go beyond mere matching formulas to compare customer data to historical customer reference repositories. Case studies will be presented that demonstrate how reference-based matching has helped companies increase the accuracy and number of matching customer records, eliminate over-matching, reduce processing times and costs, and keep data integrated over time.
Why it’s important to introduce a referential database to provide links throughout the matching process
How reference-based matching keeps data integrated over time
How reference-based data integration pays off in more effective customer acquisition and retention, and in lower costs
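The contrast between formula-based matching and reference-based matching can be sketched in a few lines. The example below is a toy illustration under stated assumptions, not a description of any vendor's product: the `REFERENCE` repository, the `normalize` and `link` helpers, and all names and addresses are hypothetical.

```python
def normalize(name: str, address: str) -> tuple:
    """Lower-case and collapse whitespace so trivial variants compare equal."""
    return (" ".join(name.lower().split()), " ".join(address.lower().split()))


# Hypothetical reference repository: historical name/address pairs that are
# known to belong to a person, each mapped to a persistent identifier.
REFERENCE = {
    normalize("Ann Smith", "12 Oak St"): "P1",
    normalize("Ann Jones", "98 Elm Ave"): "P1",  # same person after a name change and a move
    normalize("Ann Smith", "40 Pine Rd"): "P2",  # a different person with the same name
}


def link(records: list) -> dict:
    """Group records by the persistent ID found in the reference repository.

    Records the repository does not know are kept separate rather than guessed at.
    """
    groups: dict = {}
    for index, (name, address) in enumerate(records):
        person_id = REFERENCE.get(normalize(name, address), f"unmatched-{index}")
        groups.setdefault(person_id, []).append((name, address))
    return groups


records = [
    ("Ann Smith", "12 Oak St"),
    ("ANN JONES", "98 Elm Ave"),
    ("Ann Smith", "40 Pine Rd"),
    ("Bob Lee", "1 Main St"),
]
groups = link(records)
```

A name-similarity formula alone would likely merge the two Ann Smith records (an over-merge) and miss the link between Ann Smith and Ann Jones (an under-merge). The reference lookup avoids both, but only to the extent that the repository covers the records being matched.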
B) Data Synchronization – A New Approach to Enterprise Customer Data Integration
Jeff Canter
Vice President of Operations
Innovative Systems, Inc.
Data integration projects are complex and challenging. Customer data integration projects are even more complex and challenging because they usually support multiple business units, each with different requirements for defining "customer." The departments’ competing definitions and different business objectives often undermine the success of the traditional customer data integration project.
Data Synchronization provides a new approach to customer data integration, an approach that accommodates competing business objectives, and still provides an integrated, enterprise customer view.
This session will present a new vision for enterprise customer data integration, and real-world applications of this valuable approach. In this session, Jeff Canter will identify and explain the critical success factors for creating a sharable, enterprise customer profile that can easily be segmented into "purpose-driven" views to support the different requirements of departments and applications across the enterprise.
Thursday, November 7, 2002
9:45 am - 10:45 am
XML Tools: XQUERY
Denise Draper
Chief Software Architect
Nimble Technology
Alex Cheng
Director of Engineering
Ipedo
XQuery is the new query language being designed by the W3C to query XML data. This talk will introduce the main XQuery language features, in particular comparing them to SQL and existing XML access methods such as XPath. We will demonstrate how XQuery can be used to create a simple web application.
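To make the comparison with SQL and XPath concrete, here is a small sketch that runs the same selection two ways: as an XPath-style path with an attribute predicate, and as a SQL query. It uses Python's standard library, which implements only a limited XPath subset and no XQuery, so the XQuery form appears only as a comment and is an approximation of the draft syntax. The sample document and table are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

XML_DOC = """
<orders>
  <order id="1" customer="acme"><total>120.50</total></order>
  <order id="2" customer="globex"><total>75.00</total></order>
  <order id="3" customer="acme"><total>30.00</total></order>
</orders>
"""

root = ET.fromstring(XML_DOC)

# XPath-style selection: a path plus an attribute predicate.
# In XQuery this would be written roughly as:
#   for $o in /orders/order where $o/@customer = "acme" return string($o/@id)
xml_ids = [order.get("id") for order in root.findall("./order[@customer='acme']")]

# The same data loaded into a relational table and queried with SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT, customer TEXT, total REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (order.get("id"), order.get("customer"), float(order.findtext("total")))
        for order in root.findall("order")
    ],
)
sql_ids = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM orders WHERE customer = 'acme' ORDER BY id"
    )
]
```

Both queries select the same orders. The path expression navigates a tree and filters nodes along the way, while the SQL statement filters rows of a flat table.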
Thursday, November 7, 2002
11:00 am - 12:00 pm
Real-Time Integration & Analytics
Seth Grimes
Principal Consultant
Alta Plana Corporation
John Ko
Product Marketing Manager
DataMirror
Ron Agresta
Products Engineer
Dataflux
In this session we look at the inexorable push for instantaneous information. Whatever your preferred terminology (“Real-time”, “Zero latency”, “Information on Demand”, “Active Warehousing”), you need to be working towards shorter and shorter timeframes for getting data into and out of analytical systems.
How is real-time integration accomplished in an XML world? Companies must be able to capture selected events, such as purchase orders or invoices, from any application database and send them in industry-standard XML formats across the enterprise and beyond. Is the “streaming” of XML documents to application servers, B2B exchanges, or other XML-driven applications the answer?
What does it take to build a real-time integration infrastructure?
Is the business case really that compelling? Do the ends justify the effort?
Real-time XML-to-database integration: What are the business benefits?
XML streaming
Practical strategies for integrating relational databases, legacy applications and XML
What are the main technical challenges in real-time analytics?