BODL, Massload, DataLoad: WCS Dataload Options
In previous releases of WCS, the mass load utility was the primary out-of-the-box tool for loading data into the WCS database. With the beta release of WCS 7, IBM introduced a new option known as the Data Load utility. It is important to keep these definitions straight: BODL is an asset developed by IBM Software Services for WebSphere to address some of the shortcomings of ID resolver/massload, the Data Load utility is in line with the BODL architecture, and technically BODL is not very different from the Data Load utility.
Officially, IBM still recommends the massload-based approach for less commonly used data, and the Data Load utility for efficiently loading product, price, and inventory data.
Overview of Dataload Utility
1. The DataReader is a Java component that does the job of reading the source file. In the case of the CSV reader, you define the source data structure in a configuration file, and the DataReader uses this information to load the data. The DataReader component implements a next() method that returns one chunk of data read from the data source (a minimal sketch follows this list). This cannot scale well for high volumes of data, as every row of the source becomes a new object going onto your JVM heap.
From my experience, parsing a source file with a Java component is always a poor choice for high-volume data loads. Compare this with SQL*Loader, a high-speed data loading utility that loads data from external files into tables in an Oracle database.
2. Mediators: mediators are available out of the box. CatalogMediator, for instance, populates the physical object for the CATALOG table from the catalog logical object when you are loading catalog data.
3. In my opinion, the Data Load utility is still not a serious contender for data loading: it has very limited support in terms of readers, supports very few business objects, and I could not figure out how one can write custom logic, as it seems to do a one-to-one mapping of source to destination fields.
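To make the reader/mediator split concrete, here is a minimal, hypothetical Java sketch. The class and method names below are illustrative only and are not the actual WCS Data Load API; they simply show why a next()-per-row reader allocates one object per source record, and how a mediator maps a logical object onto a physical table row.

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Illustrative only: not the real WCS DataReader/Mediator classes.
public class CsvLoadSketch {

    // One logical object per source row -- this is the per-row allocation
    // that lands on the JVM heap for every record read.
    static class CatalogLogicalObject {
        String identifier;
        String description;
    }

    // Reader: returns one chunk (here, one row) per call to next().
    static class CsvDataReader implements AutoCloseable {
        private final BufferedReader in;

        CsvDataReader(String file) throws IOException {
            this.in = new BufferedReader(new FileReader(file));
        }

        CatalogLogicalObject next() throws IOException {
            String line = in.readLine();
            if (line == null) {
                return null;                      // end of data source
            }
            String[] cols = line.split(",");
            CatalogLogicalObject obj = new CatalogLogicalObject();
            obj.identifier = cols[0];
            obj.description = cols.length > 1 ? cols[1] : null;
            return obj;                           // a fresh object for every row
        }

        public void close() throws IOException {
            in.close();
        }
    }

    // Mediator: maps the logical object to a physical CATALOG row.
    // Simplified; the real table also needs generated keys, member IDs, etc.
    static class CatalogMediatorSketch {
        String toInsertSql(CatalogLogicalObject o) {
            return "INSERT INTO CATALOG (IDENTIFIER, DESCRIPTION) VALUES ('"
                    + o.identifier + "', '" + o.description + "')";
        }
    }

    public static void main(String[] args) throws IOException {
        try (CsvDataReader reader = new CsvDataReader("catalog.csv")) {
            CatalogMediatorSketch mediator = new CatalogMediatorSketch();
            for (CatalogLogicalObject o = reader.next(); o != null; o = reader.next()) {
                System.out.println(mediator.toInsertSql(o));
            }
        }
    }
}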
Overview of BODL
· Mostly an asset of the IBM services group; it is freely available to WCS customers only.
· BODL is a set of Java files that work very much like the Data Load utility; it has more readers, can be customized, and, surprisingly, can handle more components than the Data Load utility (I would imagine IBM will roll the BODL features into the Data Load utility in future versions).
· Publicly, IBM does not provide any documentation or sample code for BODL; it is available only on request for WCS customers.
Overview of Massload
· Source data should be converted to XML format, based on the DTD generated using the DTDgen utility of WCS (a hedged sample follows this list).
· The generated XML should then be ID-resolved using the idresgen utility.
· Mass load the ID-resolved XML file using the massload utility.
· This can have serious performance issues; if you are processing large record sets, it is not the most efficient data load option.
· Debugging errors is very difficult with this dataload option.
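As a rough illustration of what the input looks like, here is a hedged sample of mass load XML before ID resolution. Element and attribute names mirror WCS table and column names as defined by the DTD that DTDgen produces, and the "@" aliases are placeholders that idresgen replaces with generated keys. The specific root element, columns, and values shown here are assumptions for illustration; your DTD and data will differ.

<import>
  <!-- "@catalog_1" is an internal alias; idresgen substitutes a generated key -->
  <catalog
      catalog_id="@catalog_1"
      member_id="7000000000000000001"
      identifier="SampleCatalog"
      description="Sample master catalog" />

  <!-- references the same alias, so the resolved file points at the same row -->
  <catalogdsc
      catalog_id="@catalog_1"
      language_id="-1"
      name="Sample Catalog" />
</import>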
Custom Dataload Options
· Some commercial ETL tools may not be a good fit, as WCS primary key generation and translation logic can get very messy.
· From my experience, Java is not my preferred language for high-volume data processing and data translation. If you understand the WCS data model well, you can write your own custom data processing tools using SQL*Loader, PL/SQL, or Python scripts (see the control-file sketch below); whichever technology you use, at the end of the day you are inserting records into WCS tables using SQL.
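To make the SQL*Loader route concrete, here is a minimal sketch of a control file that bulk-loads a CSV extract straight into a WCS table. The file name, delimiter, and column list are assumptions for illustration; you would match them to your own extract and the target table's real columns, and you still have to handle key generation (the KEYS table) and foreign keys yourself.

-- illustrative SQL*Loader control file; adjust file, table, and columns to your data
LOAD DATA
INFILE 'catentry_extract.csv'
APPEND
INTO TABLE CATENTRY
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
TRAILING NULLCOLS
(
  CATENTRY_ID,
  MEMBER_ID,
  CATENTTYPE_ID,
  PARTNUMBER,
  MFPARTNUMBER,
  MARKFORDELETE
)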