TUTORIALS
(NOTE: Some sessions have "Pre-Reading Materials."  Follow the links to check them out!)


Monday, November 4, 2002
8:30 am - 4:45 pm 

T1: Introduction to XML for Enterprise Data Management and Application Integration

Peter  Aiken
Founding Director
Institute for Data Research

XML Data Management
XML represents a critical future direction for the management of data, metadata, and business rules, and will play an increasingly important role in business and systems engineering.  The first half of the tutorial describes how XML works, and shows you how to quickly and easily start incorporating XML capabilities into your data management programs.

XML Basics

What is XML?  What is it not?  How does it work as a meta-language?
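As a small illustration of the meta-language idea (not taken from the tutorial itself), the sketch below parses a made-up customer record with Python's standard xml.etree.ElementTree module. The element names, the attribute, and the describe helper are all hypothetical; the point is only that an XML document carries its own field names alongside its values.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: the tag and attribute names are invented for illustration.
DOC = """
<customer id="C-1001">
  <name>Acme Supply</name>
  <region>West</region>
</customer>
"""

def describe(xml_text):
    """Return the root tag, the root's attributes, and (tag, text) pairs for each child."""
    root = ET.fromstring(xml_text.strip())
    fields = [(child.tag, (child.text or "").strip()) for child in root]
    return root.tag, dict(root.attrib), fields

root_tag, attrs, fields = describe(DOC)
print(root_tag, attrs, fields)
```

A receiving program can discover the field names from the document itself rather than from a fixed record layout.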

XML Usage

What business problems can XML solve?  How is it being used by organizations?  How can it save you money?

XML Architecture

How does XML work from an architectural perspective?  

Overview of XML Architectural Components including:

XML Application Integration

In the past, EAI has focused on middleware-based solutions aimed at connecting disparate applications.  Now businesses are realizing that technical solutions alone cannot tame the legacy dragon: integrating new and working applications, as well as new or existing data in databases or files, built using diverse technologies, across a network connecting the machines of a company or companies.  XML-based EAI technologies permit implementation with minimal or no change to the existing applications or data, a non-intrusive approach.  This talk highlights aspects of XML-based EAI technologies that can deliver tangible integration rapidly when implemented by data management.


Monday, November 4, 2002
8:30 am - 4:45 pm 

T2: Avoiding Catastrophe in Data Integration and ETL

Michael  Scofield
Consultant and Author

Today, with greater frequency, data stewards (DBAs, data architects, data warehouse designers, etc.) are being asked to integrate data from multiple, dissimilar sources into a common database, or into a target data warehouse.  This can be because of mergers or acquisitions of companies, or the effort to integrate customer data from various owned applications to support a more aggressive CRM.  Or, it can be necessary when a data warehouse is designed with the goal of associating cause and effect data from disparate source applications.

Getting the data onto the same platform is the easy part.  Integrating it so it makes sense is much more difficult.  And this is not a challenge which most technically-oriented programmers can meet.  The successful data analyst must focus upon the data, and the business context of the data. 

Successful semantic mapping of source data to target fields depends upon a thorough understanding of the business meaning and data architecture of each source, and upon designing the target database appropriately.  By semantic, we mean ensuring that each source data field has the comparable meaning, scope, and normal behavior (not merely field-name and format) corresponding with its peer source field(s).  Merging two sources is exciting enough.  Merging three or more can be terrifying.

This workshop will cover a wide range of techniques showing many practical examples of actual data.  It is not enough to use documentation (file descriptions, etc.) of sources (which may be obsolete).  One must look at the actual data -- all of it. 

We will discuss step-by-step techniques for uncovering data anomalies, data quality problems, and semantical discontinuities in how a field is used in the context of a data source.   We start by creating an inventory of the data (particularly the sources), and the logical architecture of each data source, and the behavior of the data, from the high-level view down to the specific, detailed behavior of each field and column, and inter-dependencies.  Understanding data behavior includes understanding significant subtypes of subject entities, and their life cycle.  Data behavior also includes the quality and consistency of the data.  We will look at many examples of anomalies in data fields, and show techniques for evaluating these anomalies -- they could be errors, or merely reveal business behavior anomalies (are the auditors paying attention?).  

Techniques in data profiling and domain studies will be shown in detail with examples of surprise findings.  For example, a field may be used in one way for one entity subtype, and in a different way for another subtype.  Never underestimate the creativity of application owners in using a field for a purpose different from its original intent.  Even the treatment of negative values (such as total invoice amount) may be different for different sources.
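One way to picture a domain study is the sketch below, which is not the presenter's method. It counts the values of a field separately for each entity subtype, so that a field reused for a different purpose stands out. The field names (acct_type, status), the sample rows, and the profile_by_subtype helper are all hypothetical.

```python
from collections import Counter, defaultdict

def profile_by_subtype(rows, subtype_field, target_field):
    """Count the distinct values of target_field separately for each subtype."""
    profile = defaultdict(Counter)
    for row in rows:
        profile[row[subtype_field]][row[target_field]] += 1
    return {subtype: dict(counts) for subtype, counts in profile.items()}

# Hypothetical sample rows: 'status' holds a one-letter code for retail accounts
# but has been reused for a branch number on commercial accounts.
rows = [
    {"acct_type": "retail", "status": "A"},
    {"acct_type": "retail", "status": "C"},
    {"acct_type": "retail", "status": "A"},
    {"acct_type": "commercial", "status": "0417"},
    {"acct_type": "commercial", "status": "0032"},
]

result = profile_by_subtype(rows, "acct_type", "status")
print(result)
```

Two subtypes with value sets that do not overlap at all, as here, suggest the field means something different in each.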

Then, the task of evaluating the commonality of any pair of source fields, and determining the appropriate target field in the target database is not for the naive.  We will review some mistakes of wimp analysts who made unwarranted assumptions about source data, without even looking at the actual business data (gasp!).  In contrast, we will review sound analytical techniques for getting the correct mapping and translation to the target database.  Also, data quality issues such as validity, completeness, richness, and accuracy will be discussed with numerous examples. 

Finally, we will survey techniques of establishing an on-going data surveillance program to ensure that later production-ized loads of data will not be caught by surprise when a source changes definitions or scope of the data it supplies.  To recognize the importance of this, we must review all the external factors which force enterprises to “morph” their logical business data architectures.  Any external source which is subject to such morphing pressure may change some aspect of its logical architecture (and hence the precise meaning of the data they send you), and a receiving organization needs to be alert for such changes, or significant trauma to the target database (or data warehouse) may result. 

We will demonstrate a practical approach to (a) maintaining a comprehensive and up-to-date inventory of the significant data in an enterprise, and (b) maintaining a knowledge base of “dynamic meta-data” -- knowledge about the domains and behavior of each uncontrolled data source.  We will discuss the use of the metrics data warehouse as a means of automating data behavior surveillance.
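A minimal sketch of what automated surveillance of a source field might look like, assuming a stored baseline of the values previously seen, is given below. The domain_drift helper and the sample values are hypothetical and are not drawn from the workshop materials.

```python
def domain_drift(baseline_values, new_values):
    """Compare the set of values seen in a new load against a stored baseline."""
    baseline = set(baseline_values)
    current = set(new_values)
    return {
        "unseen_before": sorted(current - baseline),   # values the source never sent before
        "no_longer_seen": sorted(baseline - current),  # values that have disappeared
    }

# Hypothetical baseline domain for a status field, and the values in a later load.
baseline = ["A", "C", "S"]
new_load = ["A", "A", "C", "X"]

report = domain_drift(baseline, new_load)
print(report)
```

A non-empty report does not prove an error; it only signals that the source may have changed its definitions and that someone should look.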

Michael Scofield is a popular speaker and consultant in data quality and data management.  Most recently, he was Director of Data Quality at Experian (formerly TRW Credit Data).   Prior to that, he was Vice President and Manager of Information Quality for Home Savings of America (Los Angeles).  He is keenly interested in data quality assessment, and reverse engineering and mining of production databases.   

His articles on data architecture and data quality techniques have been published in Information Week, IBI System Journal, Data Management Review, the Cutter IT Journal, and the Database Newsletter.  His speaking engagements include DAMA-International conferences, Meta-data Conferences in London and the U.S., various DAMA chapters, DB2 user groups, and various CASE user group conferences.  He also writes humor, published in the Los Angeles Times and other journals.


Monday, November 4, 2002
8:30 am - 4:45 pm

T3: Transforming An Operational Model Into A Physical Warehouse Database

Tom  Haughey
Chief Technology Officer
Pepsi Bottling Group 

This presentation will show the progression of a data model from an operational model to an analytical database, including the central data warehouse model and different data mart structures. The purpose of this presentation is to discuss and demonstrate the natural progression that occurs when data migrates from the operational environment, through the data warehouse, to the decision support or data mart environment. We will use a physical operational model as the starting point. We will end with physical analytical database designs. The basic premise of this workshop is that this transformation must be based on principles, not just patterns. Throughout the workshop, real-life case study information will be presented to illustrate the concepts.

Modeling
We start with a brief review of data modeling and dimensional modeling concepts with examples. Next, a review of the overall architecture of a typical data warehouse environment will be presented. The main purpose here is to set the framework for subsequent discussions. It is essential to agree on basic definitions for the progression of a data model to make proper sense.

Central Data Warehouse
The first major DW structure is the Central Data Warehouse (CDW).  The CDW is the focal and most granular repository within the DW. The CDW supports long-term strategic analysis and reporting. It has three main purposes: to provide data to any requiring application; to support some direct querying; and to support all ad hoc querying. It houses integrated data at both atomic and summarized levels. Its main characteristics are that it is atomic, integrated, historical, read only, general purpose and application independent. The presentation will discuss and demonstrate in detail the typical kinds of transformations that an operational model undergoes in progressing to a CDW model.

Database Optimization
A sensible process for doing database optimization includes a progression of optimizations. We will reveal why these are compromises or trade-offs. A compromise is a practice that emphasizes one feature, which then becomes an advantage, against another feature, which then becomes a potential disadvantage. We discuss the importance of using principles versus just patterns to do optimizations. All database design in the analytical environment hits upon several key issues, such as surrogate versus natural keys, aggregation and history. Each will be discussed.
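To make the surrogate versus natural key issue concrete, here is a minimal sketch, not the presenter's design. It assigns warehouse-generated integer keys to natural keys arriving from a source and keeps earlier assignments stable across loads. The assign_surrogate_keys helper and the CUST-x identifiers are hypothetical.

```python
def assign_surrogate_keys(natural_keys, key_map=None):
    """Map each natural key to a stable integer surrogate, reusing existing ones."""
    key_map = dict(key_map or {})              # copy so the caller's map is untouched
    next_key = max(key_map.values(), default=0) + 1
    for nk in natural_keys:
        if nk not in key_map:
            key_map[nk] = next_key
            next_key += 1
    return key_map

# First load sees two customers; a later load repeats one and adds another.
first = assign_surrogate_keys(["CUST-A", "CUST-B"])
second = assign_surrogate_keys(["CUST-B", "CUST-C"], first)
print(second)
```

The trade-off is typical of those the session describes: the warehouse is insulated from changes to source identifiers, at the cost of maintaining the mapping.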
 

Data Marts
The next step in the progression is data marts. A data mart is an environment containing a specialized set of related data, customized for a specific community of knowledge workers, analysts or planners, to support their reporting and analysis needs. There are three types of data marts: embedded, dependent and independent.
 

Finally, we bring these all together and review the major concepts, rules and tasks involved in the transition from physical operational model to physical analytical database. 

Tom Haughey is one of four originators of Information Engineering in America. He is currently CTO for Pepsi Bottling Group after being PepsiCo’s Director of Enterprise Data Warehousing. He was formerly President of InfoModel, Inc., a consultancy in Data Warehousing. His courses have been delivered to companies around the world. He has worked on the development of seven different CASE tools and wrote his own in 1984. He formerly worked for IBM for 17 years. He is the author of many articles on DW and IE. He was VP of Technology for Silverrun Technologies. He is working on a book, "Designing the Data Warehouse - The Real Deal". Tom earned a BA in English.


Monday, November 4, 2002
8:30 am - 4:45 pm 

T4: Data Modeling - The Big Issues 

Graeme  Simsion
Senior Fellow
University of Melbourne

In this full-day tutorial, Graeme Simsion, author of Data Modeling Essentials, will look at some of the most important issues facing today’s data modelers, and offer practical approaches to addressing them.  The tutorial will address the role of data modeling and data modelers, as well as the modeling process itself.

Topics covered will include:

There will be opportunity for discussion and debate, and for attendees to introduce additional topics.

Graeme Simsion founded Australian consultancy Simsion Bowles & Associates in 1982, after working as a DBA for a major insurance company.  Over 20 years he grew the business from a one-person operation specializing in data modeling to some 70 staff in three states, offering consultancy in data management, information systems, and business process design.  Graeme sold Simsion Bowles in 1999, and is currently a Senior Fellow with Melbourne University’s Department of Information Systems.  Throughout his career, he has been a regular publisher and presenter and is the author of the widely used text, Data Modeling Essentials.


Tuesday, November 5, 2002
8:30 am - 4:45 pm 

T5: Implementing a Message-Based Data Integration Strategy

David  McComb, President
Simon  Robe, Senior Technical Consultant
Simon Hoare, Senior Technical Consultant
Semantic Arts

Messaging and Service Oriented Architectures offer huge potential improvements for enterprises, but in order to reap those benefits, companies need to have an organized approach to their adoption, configuration and use.  In this tutorial we will put message and service based architectures in context with the four other major integration strategies.  We will cover the economics, topologies and tradeoffs involved in getting a message oriented architecture implemented.  We demonstrate the need for enterprise message modeling as a new discipline and cover a methodology for achieving it.  The tutorial includes case studies and suggestions for getting started down this road.

Message Based and Service Oriented Architectures

A Target Architecture

Enterprise Message Modeling

Getting Started

There are no strict prerequisites for the tutorial, other than perhaps an appreciation for the challenges of systems integration and curiosity about how some of the more current techniques can be used to address them.  We will take care to explain new concepts, and especially acronyms as they are introduced.   

Participants will come away with an understanding of how message based approaches can fit in with traditional systems integration approaches.  They will appreciate that this is not an all or nothing endeavor.  They will have a methodology and guidelines to implement enterprise message modeling, and they will be given guidance on how to bring this into their organization.

Dave McComb, President of Semantic Arts, has been designing and managing enterprise integration projects for 26 years, 13 with Andersen Consulting and 13 independently.  He was the project manager and lead designer of the “Organic Architecture” at Velocity.com, which was perhaps the first completely meta level application architecture.  McComb is the lead inventor on three software patents, has written and spoken widely and is currently working on a book on Semantics.

Simon Robe, Senior Technical Consultant, Semantic Arts, has been doing enterprise data modeling for over 22 years, with clients including Nestle, Technicolor, Lucas, Los Angeles Police Department, and Velocity.  He specializes in the design and implementation of message based data warehouses and information delivery systems.

Simon Hoare, Senior Technical Consultant, Semantic Arts, is an enterprise architect at the implementation level, and has been for 10 years.  He was the lead architect for the server side portion of the Organic Architecture at Velocity.  He has considerable experience with Object Oriented Databases, query implementation and software portability.  He is co-inventor on two software patents, and certified as a Web Methods developer.


Tuesday, November 5, 2002
8:30 am - 4:45 pm

T6: Enterprise Metadata Implementation: Learning from Best Practices

R. Todd  Stephens
Director of the Metadata Services Group
BellSouth Corporation

This tutorial focuses on the formulation and implementation of an enterprise metadata strategy. Participants will learn techniques to understand the role of metadata in an Enterprise Application Integration (EAI) environment. 

The Metadata Services Group within BellSouth has spent the last 3 years developing an enterprise metadata solution based on a solid product line and a customer service focus.  This tutorial will develop the attendees' understanding of how to develop a successful enterprise metadata implementation.  The presenter will be taking three years of metadata experience and condensing that knowledge into this six-hour tutorial.  We will examine the principles of marketing, selling strategies, service offerings, product design, architecture, team construction and overall strategy of delivery for an enterprise metadata solution.

Over the past three years, we have seen an onslaught of architecture developments. Web Services, SOAP, UDDI, EAI, and many others made headlines as the technology boom took place. If you're  thinking of implementing enterprise metadata, no one needs to tell you about the challenges of integration.  Technical architecture may hold center stage of the developing software industry, as attention focuses on toolkits and multi-platforms; however, this area of the industry is merely the tip of an iceberg, which reaches much further than merely delivering business functionality.  Under the waterline exists the core technology that drives new and old business models.  That technology is simply defined as enterprise data and enterprise metadata.

Come see why BellSouth won a 2002 “High Commendation” Wilshire Award for Metadata Best Practices and why they have recently received praise including:

“After seeing the Enterprise Repository presentation, I can say that BellSouth has the best collection of repositories I have ever seen.”

Presentation Content

Attendees will learn the following:

Todd Stephens is the Director of the Metadata Services Group for the BellSouth Corporation, an Atlanta-based telecommunications organization serving over 37 million customers in 20 countries. Todd has served as the director since 1999 and is responsible for setting the corporate strategy and architecture for the development and implementation of the Enterprise Metadata Repositories, which include metadata, data transformation, component, XML, content, documentation, UDDI, messaging, metrics, interfaces, and the Enterprise Information Portal using XML technologies. For the past 18 years, Todd has worked in the Information Technology field including leadership positions at BellSouth, Coca-Cola, Georgia-Pacific and Cingular Wireless.   

Todd holds degrees received in 1986 in Mathematics and Computer Science from Columbus State University, and he earned an MBA degree from Georgia State University in Atlanta, GA, in 1990.  Currently, Todd is pursuing his Ph.D. in Information Systems at Nova Southeastern University. The majority of his research is focused on Metadata Reuse, Semantic Zooming, enabling Trust within the Internet, Usability and Repository Frameworks. He has two U.S. patents pending in the field of metadata, with three more in process.


Tuesday, November 5, 2002
8:30 am - 4:45 pm

T7: Developing a Master Plan for Managing Your Ever-Growing Data

Daniel  Linstedt
Chief Technology Officer
Core Integration Partners

This course will handle the business of planning for extremely large data volumes.  We will talk about the terabytes of today and the hundreds of terabytes and petabytes of tomorrow.  The course will have an underlying theme of the unique issues that large data sets present.  These issues will encompass a mix of today’s technologies (available to solve some of the traps), and tomorrow’s architectures (not yet available).

We will discuss what needs to be considered from a business and technical perspective - including: capturing information, integration and management, application of analytics, filtering, reporting, and future thoughts. We will not be covering the technical details of each section – as that would take an entire week’s worth of time.  We will be focusing on large data sets, quality, and synthesized data as a theme.  We will not be covering the technical details of different hardware/software/platforms and their ability to handle extremely large data sets.  We will be covering the planning, thought processes, up-front work, tricks and traps of these extremely large systems.

The course will present some of the tricks, traps, and mitigation strategies for working with large data sets – including staffing and skill set requirements and how they differ from projects with smaller data sets. Some of the subtopics that will be covered are data movement/migration, data mining to assist with quality, handling near-real time and batch, dealing with change data capture, managing data stores, and understanding which metrics are the right metrics.  Managing terabyte-scale data sets is like climbing Mt. Everest compared to taking a walk in the park.  Handling petabytes is like manning a mission to Mars or the moon compared to climbing Mt. Everest.

Additional sub-topics include: handling unstructured data, the role of metadata, reporting and querying from large scale information, the accuracy / inaccuracy of up-to-the-second information, and potential reporting bottlenecks.  We will conclude this course with a discussion on futures, the importance of the right framework (CIF), handling closed loop systems (lights out systems), dynamic data warehousing, unstructured information of the future and making the most of metadata.

What you might learn includes the following (from a planning perspective):

Course outline:

The business of ELDB, VLDB

Capturing Information

Integration

Management

Applying Analytics

Filtering

Reporting

Future Thoughts

Pre-Reading Materials

Dan Linstedt is a nationally known expert on data warehousing, business intelligence, databases, major computer languages, client/server, OLTP, operating systems concepts, and performance and tuning. He has been the lead engineer on enterprise-wide data warehouse projects and refinements for such large clients as KPMG, CH2MHILL, Echostar, Arrow Electronics, and Deloitte & Touche.  Recently, Dan developed his firm's groundbreaking Matrix Methodology, which provides a consistent, repeatable process for building a data warehouse and is the first methodology of its kind to conform to SEI (Software Engineering Institute) CMM level 3 specifications. Dan is also the inventor of the Data Vault – an Enterprise Wide Data Warehousing architecture.


Tuesday, November 5, 2002
8:30 am - 4:45 pm

T8: OMG's Model Driven Architecture

Jon Siegel
Vice President, Technology Transfer
Object Management Group

Cory Casanave
CEO
Data Access Technologies

Michel Brassard
Chief Technology Officer
Codagen Technologies Corporation

David Bertrand
Director, Consulting
CGI Group, Inc.

Because each middleware platform works best in a particular network niche (such as behind the firewall, or over the Internet), today's enterprise must deal with a multitude of platforms and connectivity paradigms. OMG's new Model Driven Architecture (MDA) unifies and simplifies this environment by defining software fundamentally at the model level, expressed in the standard Unified Modeling Language (UML).

An application's base model specifies every detail of its business functionality and behavior in a technology-neutral way. Working from the base model, MDA tools use OMG-standard mappings to generate interfaces and most or all of the implementation code for one or more target middleware platforms. Tools also generate cross-platform invocations, allowing easy interworking with other applications wherever they reside. MDA supports applications over their full lifecycle starting with design and moving on to coding, testing, and deployment, through maintenance, and eventually to evolution to a new platform when an application's existing platform becomes obsolete.
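The idea of one technology-neutral model feeding several platform-specific outputs can be pictured with the toy sketch below. It is not an MDA tool and uses no OMG-standard mapping: the MODEL dictionary, the type labels, and both generator functions are hypothetical, and real MDA tooling works from UML models rather than Python dictionaries.

```python
# Toy, technology-neutral model; the names and type labels are hypothetical.
MODEL = {"name": "Ticket", "attributes": [("number", "string"), ("price", "decimal")]}

# Illustrative type mappings for two target "platforms" (simplified on purpose).
SQL_TYPES = {"string": "VARCHAR(50)", "decimal": "DECIMAL(10,2)"}
PY_TYPES = {"string": "str", "decimal": "float"}

def to_sql(model):
    """Generate a CREATE TABLE statement from the model."""
    cols = ",\n".join(f"  {name} {SQL_TYPES[kind]}" for name, kind in model["attributes"])
    return f"CREATE TABLE {model['name']} (\n{cols}\n);"

def to_python(model):
    """Generate a Python class stub with annotated fields from the same model."""
    fields = "\n".join(f"    {name}: {PY_TYPES[kind]}" for name, kind in model["attributes"])
    return f"class {model['name']}:\n{fields}"

sql_text = to_sql(MODEL)
py_text = to_python(MODEL)
print(sql_text)
print(py_text)
```

Changing the model in one place changes both generated artifacts, which is the property the paragraph above attributes to working from a base model.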

Another benefit: because industry standards defined in the MDA are platform-independent, they can be used by every enterprise even in industries that haven't converged on a single middleware platform. The MDA is the base architecture for OMG standards (as of September 2001).

During the morning and first part of the afternoon, Jon Siegel will present a technical tutorial starting with MDA foundation technologies - OMG's Unified Modeling Language (UML), the MetaObject Facility (MOF), XML Metadata Interchange (XMI), and the Common Warehouse Metamodel (CWM) - and finishing up with the MDA itself.

At the heart of business is collaboration - people, departments, systems and companies working together for common business goals. We have seen the importance of collaborative business processes in the emergence of the EAI and B2B marketplaces. The Enterprise Collaboration Architecture (ECA) is a part of the OMG standard EDOC profile for UML. (A profile tailors UML to a particular purpose; key profiles have been standardized by OMG.) ECA shows how to use UML to model collaborative business processes and apply MDA to map these technology-independent business models to executable systems using a variety of technologies such as Web services, .NET, and CORBA. Cory Casanave, a principal author of the Enterprise Collaboration Architecture, will follow the MDA presentation with a one-hour technical tutorial on the EDOC profile, emphasizing ECA.

Many companies market tools today that implement the MDA development process, including generation of code from UML models. Completing our day with MDA, David Bertrand of CGI Group, Inc., and Michel Brassard of Codagen Technologies will present a technical case study of their successful MDA-based development of an application that manages the lottery of a large Canadian province. Based on the MDA, this project realized about 60%-70% code generation from UML models, providing an estimated 25% productivity gain and 30% cost savings. Based on industry standards from start to finish, the project used Rational Rose and the RUP for modeling; the Codagen Architect tool for code generation; and WebSphere Studio Application Developer and WebSphere Application Server for development and runtime.

Pre-Reading Materials

Dr. Jon Siegel, OMG's Vice President of Technology Transfer, heads OMG's technology transfer program with the goal of teaching the technical aspects and benefits of the Model Driven Architecture (MDA) based on OMG's modeling specifications UML, the MOF, XMI and CWM. Siegel's scope also includes OMG's industry-standard middleware, the Common Object Request Broker Architecture (CORBA) and the Object Management Architecture (OMA) comprised of the CORBAservices, the CORBAfacilities, and the Domain specifications in vertical markets ranging from healthcare, life sciences, and telecommunications to manufacturing and financial systems. In this capacity, he presents tutorials, seminars, and company briefings around the world, and writes magazine articles and books including the popular "CORBA 3 Fundamentals and Programming" and "Quick CORBA 3". With OMG since 1993, Siegel previously chaired the Domain Technology Committee responsible for OMG specifications in the vertical domains.


Tuesday, November 5, 2002
8:30 am - 4:45 pm 

T9: From Agile Modeling to Agile Data

Scott W. Ambler
President and Senior Consultant
Ronin International, Inc.

The Agile Modeling (AM) methodology attempts to answer many questions, including: How do you successfully model the complexities of modern-day software without getting bogged-down in mountains of paper work?   How do you effectively engineer the requirements for your system?  What techniques can you apply to analyze those requirements?  To architect and design your software?  How can you document your system in an effective manner?

The Agile Data (AD) methodology also attempts to answer many questions, including: How can developers and data professionals work together effectively?  How can enterprise-level people work together effectively with project-level people?  What techniques and technologies are available so that data professionals can work in an iterative and incremental manner, just as the majority of developers prefer to work?  How can data-related concerns be reflected on projects taking an agile approach to development?

This workshop is a straightforward, easy to understand introduction to the principles and practices of the Agile Modeling (AM) methodology and the philosophies of the Agile Data (AD) method.  A significant portion of the workshop will be the effective application of object-oriented (OO), component-based, and essential modeling techniques for developing requirements, analysis, and design models.  It includes the industry-standard techniques of the Unified Modeling Language (UML) but goes beyond them to be sufficient for the real-world development of modern business applications. While objects and components are often used to develop complex systems, learning how to work with object-oriented techniques does not need to be complicated, nor do you need to develop complex documentation to be successful using them. 

By attending this workshop you will gain a solid understanding of leading-edge modeling techniques, how they fit together, and how they may be applied simply and effectively by project teams following common software processes such as eXtreme Programming (XP) or the Rational Unified Process (RUP).  Techniques such as data refactoring will be covered to show how data professionals can effectively support iterative and incremental development efforts.  Discussion of how data professionals, enterprise architects, and enterprise administrators can support and work on such efforts will also be covered.  A brief overview of agile documentation will also be presented. 

Attendees should be familiar with the writings of the Agile Alliance for this tutorial. Having at least a cursory understanding of the Agile Modeling and Agile Data methodologies would be helpful. Pre-reading references are available on the conference web site.

Modeling techniques applied:

Pre-Reading Materials

Scott W. Ambler is President and a senior consultant of Ronin International, Inc., a software services consulting firm that specializes in software process mentoring, Agile Modeling (AM), and object/component-based software architecture and development.  He is also founder and thought leader of the Agile Modeling (AM) methodology (www.agilemodeling.com) and the Agile Data (AD) method.

Scott is the author of Agile Modeling (2002) and co-author of Mastering EJB 2/e (2002) from John Wiley & Sons.  He is also the author of the books The Elements of UML Style (September 2002), The Object Primer 2nd Edition (2001), Building Object Applications That Work (1997), Process Patterns (1998), and More Process Patterns (1999), and co-author of The Elements of Java Style (2000), all published by Cambridge University Press.  He is co-editor with Larry Constantine of the Unified Process series from CMP books (2000-2002).  Scott is a contributing editor with Software Development magazine, a contributor to IBM DeveloperWorks, and a columnist with Computing Canada. 

Scott’s personal web site is www.ambysoft.com where he has a wide variety of white papers, including the AmbySoft Inc Coding Standards for Java, that are available for free download.  Scott has spoken at a wide variety of international conferences including Software Development, UML World, Object Expo, Java Expo, and Application Development.  Scott graduated from the University of Toronto with a Master of Information Science.

