Extract, Transform, Load (ETL) is an integral part of Data Warehousing (DW)
implementation. The commercial tools used for this purpose capture extensive
execution traces in the form of log files containing a plethora of
information. However, there have been hardly any initiatives in which proactive
analyses were performed on ETL logs to improve their efficiency. In this
paper, we use an outlier detection technique to find the processes whose
execution traces vary most from the group. As our experiment was carried out
on actual production processes, we consider any outlier a signal rather than
noise. To identify the input parameters for the outlier detection algorithm,
we conducted a survey among the developer community, covering a varied mix of
experience and expertise. We then used simple text parsing to extract from the
logs the features shortlisted in the survey. Subsequently, we applied a
clustering-based outlier detection technique to the logs. This reduced the
scope of detailed analysis from 500 logs to 44 logs (8.8%). Among the 5
outlier clusters, 2 reflect genuine concerns, while the other 3 appear only
because of the huge number of rows involved.
Comment: 2011 3rd International Conference on Electronics Computer Technology
(ICECT 2011)
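The abstract names neither the clustering algorithm nor the surveyed features, so the following is only a minimal sketch of the parse-then-cluster idea under stated assumptions. The log line formats (`Rows read:`, `Rows written:`, `Elapsed seconds:`, `Errors:`), the four-feature set, the use of k-means, and the rule that a cluster holding fewer than 10% of all logs counts as an outlier cluster are all hypothetical. The function names are illustrative and are not taken from the paper or from any ETL tool.

```python
import math
import re
from statistics import mean, pstdev

# Hypothetical log line formats; the real features came from the developer
# survey, and the real log layout depends on the ETL tool in use.
FEATURE_PATTERNS = {
    "rows_read": re.compile(r"Rows read:\s*(\d+)"),
    "rows_written": re.compile(r"Rows written:\s*(\d+)"),
    "elapsed_s": re.compile(r"Elapsed seconds:\s*(\d+(?:\.\d+)?)"),
    "errors": re.compile(r"Errors:\s*(\d+)"),
}


def extract_features(log_text):
    """Simple text parsing: one number per feature; a missing feature defaults to 0."""
    features = []
    for pattern in FEATURE_PATTERNS.values():
        match = pattern.search(log_text)
        features.append(float(match.group(1)) if match else 0.0)
    return features


def standardize(rows):
    """Z-score each column so no single feature dominates the distance."""
    stats = [(mean(col), pstdev(col) or 1.0) for col in zip(*rows)]
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]


def kmeans(points, k, iterations=50):
    """Plain k-means with deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from every centroid chosen so far.
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(farthest)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Move each centroid to the mean of its members (keep it if empty).
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centroids.append([mean(d) for d in zip(*members)] if members else centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels


def outlier_logs(logs, k=3, small_fraction=0.1):
    """Return names of logs in clusters holding fewer than small_fraction of all logs."""
    names = list(logs)
    points = standardize([extract_features(logs[name]) for name in names])
    labels = kmeans(points, k)
    sizes = {lab: labels.count(lab) for lab in set(labels)}
    limit = small_fraction * len(names)
    return sorted(name for name, lab in zip(names, labels) if sizes[lab] < limit)


if __name__ == "__main__":
    # Synthetic, made-up logs: 18 similar jobs plus two unusual ones.
    logs = {
        f"job_{i:02d}": (
            f"Rows read: {1000 + 10 * i}\nRows written: {1000 + 10 * i}\n"
            f"Elapsed seconds: {30 + i % 3}\nErrors: 0\n"
        )
        for i in range(18)
    }
    logs["job_big"] = "Rows read: 500000\nRows written: 500000\nElapsed seconds: 900\nErrors: 0\n"
    logs["job_err"] = "Rows read: 1000\nRows written: 200\nElapsed seconds: 400\nErrors: 12\n"
    print(outlier_logs(logs, k=3))  # → ['job_big', 'job_err']
```

Z-scoring keeps raw row counts, which are orders of magnitude larger than error counts, from dominating the Euclidean distance on their own. Even so, a high-volume job such as the synthetic `job_big` is still flagged, which mirrors the abstract's observation that some outlier clusters arise purely from large row counts and need manual review to separate them from genuine concerns.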