    Easy designing steps of a local data warehouse for possible analytical data processing

    Data warehouses (DW) are used at the local or global level, depending on need. Most DWs have been designed for online use and target multinational firms; the majority of local firms simply purchase such ready-made DW applications, for which customization, maintenance and enhancement are very costly. To provide useful e-services, government departments, academic institutes, businesses, telemedicine firms and the like need a DW of their own. A lack of electricity and internet facilities, especially in rural areas, further discourages citizens from using e-services. In this digital world, every local firm is interested in having its own DW to support strategic planning and business decision making. This study highlights the basic technical steps for designing a local DW. It presents several solutions to problems that may arise during the design of the Extraction, Transformation and Loading (ETL) process, and gives detailed steps for developing the dimension tables and fact tables and for loading the data. Such data analytics typically answers business questions and suggests future courses of action.
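    As a minimal illustration of the dimension-table, fact-table and loading steps described above, the sketch below builds a tiny star schema in SQLite with Python; the table and column names are illustrative assumptions, not the study's actual design.

```python
# Minimal ETL sketch for a local star schema (illustrative names, not the
# study's actual design): extract source rows, conform a dimension with
# surrogate keys, then load a fact table that references it.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimension table: surrogate key plus descriptive attributes.
cur.execute("""CREATE TABLE dim_product (
                   product_key  INTEGER PRIMARY KEY AUTOINCREMENT,
                   product_name TEXT UNIQUE)""")

# Fact table: foreign key into the dimension plus additive measures.
cur.execute("""CREATE TABLE fact_sales (
                   product_key INTEGER REFERENCES dim_product(product_key),
                   quantity    INTEGER,
                   amount      REAL)""")

# Extract: in practice this would read from the operational sources.
source_rows = [("widget", 3, 29.97), ("gadget", 1, 14.50), ("widget", 2, 19.98)]

for name, qty, amount in source_rows:
    # Transform: look up or create the dimension member (surrogate key).
    cur.execute("INSERT OR IGNORE INTO dim_product (product_name) VALUES (?)",
                (name,))
    cur.execute("SELECT product_key FROM dim_product WHERE product_name = ?",
                (name,))
    key = cur.fetchone()[0]
    # Load: insert the fact row keyed by the surrogate key.
    cur.execute("INSERT INTO fact_sales VALUES (?, ?, ?)", (key, qty, amount))
conn.commit()

# Analytical query: aggregate facts by a dimension attribute.
for row in cur.execute("""SELECT p.product_name, SUM(f.quantity), SUM(f.amount)
                          FROM fact_sales f
                          JOIN dim_product p USING (product_key)
                          GROUP BY p.product_name"""):
    print(row)
```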

    Data mining industry : emerging trends and new opportunities

    Thesis (M.Eng.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, June 2000. Includes bibliographical references (leaves 170-179). By Walter Alberto Aldana.

    Business intelligence for sustainable competitive advantage: the case of telecommunications companies in Malaysia

    The concept of Business Intelligence (BI) as an essential competitive tool has been widely emphasized in the strategic management literature, yet the sustainability of the competitive advantage that BI capability provides firms is not well explained. To fill this gap, this study develops a model for successful BI deployment and empirically examines the association between BI deployment and sustainable competitive advantage. Taking the telecommunications industry in Malaysia as a case example, the research focuses on the perceptions held by telecommunications decision makers and executives of the factors that impact successful BI deployment. It further investigates the relationship between successful BI deployment and the sustainable competitive advantage of telecommunications organizations. Another important aim is to determine the effect of moderating factors such as organizational culture, business strategy and the use of BI tools on BI deployment and the sustainability of the firm's competitive advantage. The research combines the theoretical foundations of resource-based theory and diffusion of innovation theory to examine BI success and its relationship with the firm's sustainability. It adopts the positivist paradigm, and a two-phase sequential mixed method consisting of qualitative and quantitative approaches is employed. A tentative research model is first developed from an extensive literature review; a qualitative field study is then carried out to fine-tune the initial model, and its findings are also used to develop measures and instruments for the subsequent quantitative phase. A survey of business analysts and decision makers in telecommunications firms is conducted and analyzed using Partial Least Squares-based Structural Equation Modeling. The findings reveal that internal organizational resources such as BI governance and perceptions of BI's characteristics influence the successful deployment of BI. Organizations that practice good BI governance, with strong moral and financial support from upper management, have a better chance of putting successful BI initiatives in place. The scope of BI governance includes providing sufficient support and commitment in BI funding and implementation, laying out proper BI infrastructure and staffing, and establishing corporate-wide BI policies and procedures. Perceptions of BI's characteristics, such as its relative advantage, complexity, compatibility and observability, are also significant in ensuring BI success, implying that executives' positive perceptions of BI initiatives are necessary. Most importantly, the results indicate that once BI is successfully deployed, executives use the knowledge it provides to sustain their organizations' competitive advantage in economic, social and environmental terms. The model explains well how BI is deployed in Malaysian telecommunications companies. The study thus contributes to the existing literature and will assist future BI researchers, especially on achieving sustainable competitive advantage; in particular, the model will help practitioners identify the resources they should consider when deploying BI. Finally, this study can be extended through further adaptation to other industries and geographic contexts.

    A probabilistic multidimensional data model and its applications in business management

    This dissertation develops a conceptual data model that can efficiently handle huge volumes of data that contain uncertainty and are subject to frequent change. The model can be used to build Decision Support Systems that improve the decision-making process. Business intelligence and decision-making in today's business world require extensive use of huge volumes of data, and real-world data contain uncertainty and change over time. Business leaders should therefore have access to Decision Support Systems that can efficiently handle voluminous data, uncertainty, and modifications to uncertain data. Database product vendors provide several extensions and features to support these requirements; however, these extensions lack the support of a standard conceptual model. Standardization generally creates more competition and leads to lower prices and improved standards of living, and results from this study could become a data model standard in the area of applied decision sciences. The conceptual data model developed in this dissertation is grounded in set theory, probability axioms, and the Bayesian framework. The model, an algebra for manipulating its data, and a framework and algorithm for modifying the data are presented, and the data modification algorithm is analyzed for time and space efficiency. Formal mathematical proofs support the identified properties of the model, the algebra, and the modification framework. The decision-making ability of the model was investigated using sample data; the advantages of the model and the improvements in inventory management gained through its application are described, and the model is compared and contrasted with Bayesian belief networks. Finally, the scope of and topics for further research are described.
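    The dissertation's formal model is not reproduced in this abstract, but as a rough, hypothetical illustration of the general idea, the sketch below stores a discrete probability distribution in a single cube cell and revises it with Bayes' rule; all names and numbers are invented for illustration.

```python
# Hypothetical sketch: a cell of a probabilistic data cube holds a discrete
# distribution over possible measure values rather than one number, and new
# evidence revises it with Bayes' rule. Invented illustration only; this is
# not the dissertation's actual model.

def normalize(dist):
    total = sum(dist.values())
    return {value: p / total for value, p in dist.items()}

def bayes_update(prior, likelihood):
    """Revise a distribution over cell values given per-value likelihoods."""
    posterior = {value: prior[value] * likelihood.get(value, 0.0)
                 for value in prior}
    return normalize(posterior)

def expected_value(dist):
    return sum(value * p for value, p in dist.items())

# Uncertain demand for one (product, month) cell: value -> probability.
cell = {100: 0.2, 150: 0.5, 200: 0.3}

# Evidence (say, early-month sales) that makes higher demand more plausible.
evidence = {100: 0.1, 150: 0.4, 200: 0.9}

cell = bayes_update(cell, evidence)
print(cell, expected_value(cell))  # the distribution shifts toward 200
```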

    Pragmatic development of service based real-time change data capture

    This thesis contributes to the Change Data Capture (CDC) field by providing an empirical evaluation of the performance of CDC architectures in the context of real-time data warehousing. CDC is a mechanism for providing data warehouse architectures with fresh data from Online Transaction Processing (OLTP) databases. There are two types of CDC architecture: pull and push. Little data exists on the performance of CDC architectures in a real-time environment, yet such data is required to determine the real-time viability of the two architectures. We propose that push CDC architectures are optimal for real-time CDC; however, they are seldom implemented because they are highly intrusive towards existing systems and arduous to maintain. As part of our contribution, we pragmatically develop a service-based push CDC solution that addresses the issues of intrusiveness and maintainability. Our solution uses Data Access Services (DAS) to decouple CDC logic from the applications. A requirement for the DAS is to place minimal overhead on a transaction in an OLTP environment; we synthesize the DAS literature and pragmatically develop DAS that execute transactions efficiently in an OLTP environment. Essentially, we develop efficient RESTful DAS that expose Transactions As A Resource (TAAR). We evaluate the TAAR solution and three pull CDC mechanisms in a real-time environment using the industry-recognised TPC-C benchmark. The optimal CDC mechanism in a real-time environment will capture change data with minimal latency and have a negligible effect on the database's transactional throughput. Capture latency is the time it takes a CDC mechanism to capture a data change that has been applied to an OLTP database; the field lacks a standard definition of capture latency and of how to measure it, so we create this definition and extend the TPC-C benchmark to make the capture latency measurement. The results of our evaluation show that pull CDC is capable of real-time CDC at low levels of user concurrency. However, as user concurrency scales upwards, pull CDC has a significant impact on the database's transaction rate, which affirms the theory that pull CDC architectures are not viable in a real-time architecture. TAAR CDC, on the other hand, is capable of real-time CDC and places minimal overhead on the transaction rate, although this performance comes at the expense of CPU resources.
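    As a generic illustration of the capture-latency definition above (not the thesis's actual TPC-C instrumentation), the sketch below timestamps each change at commit and again at capture; the in-process queue stands in for whatever channel a push CDC mechanism would use, and all names are assumptions.

```python
# Generic sketch of measuring capture latency: the elapsed time between a
# change committing in the OLTP database and the CDC mechanism capturing it.
# The queue stands in for the CDC channel; names are illustrative only.
import queue
import threading
import time

change_channel = queue.Queue()

def oltp_transaction(row_id):
    # Commit the change, then publish it with its commit timestamp
    # (a push-style data access service would do this inline).
    commit_time = time.monotonic()
    change_channel.put((row_id, commit_time))

def cdc_consumer(n_changes):
    latencies = []
    for _ in range(n_changes):
        row_id, commit_time = change_channel.get()
        latencies.append(time.monotonic() - commit_time)  # capture latency
    print(f"mean capture latency: {sum(latencies) / len(latencies):.6f} s")

consumer = threading.Thread(target=cdc_consumer, args=(3,))
consumer.start()
for i in range(3):
    oltp_transaction(i)
    time.sleep(0.01)  # simulated spacing between transactions
consumer.join()
```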

    Parallel Methods for Mining Frequent Sequential patterns

    The explosive growth of data and the rapid progress of technology mean that a huge amount of data is collected every day, and that data volume contains much valuable information. Data mining is the emerging field of applying statistical and artificial intelligence techniques to the problem of finding novel, useful and non-trivial patterns in large databases: the task of discovering interesting patterns from large amounts of data by determining both implicit and explicit unidentified patterns that can direct the process of decision making. There are many data mining tasks, such as classification, clustering, association rule mining and sequential pattern mining. Among these, sequential pattern mining is an important problem and provides an effective way to analyze sequence data. The goal of sequential pattern mining is to discover interesting, unexpected and useful patterns in sequence databases. The task has a wide range of applications, such as financial data analysis in banking, the retail industry, customer shopping histories, goods transportation, consumption and services, the telecommunications industry, biological data analysis, scientific applications and network intrusion detection. Different types of sequential pattern mining can be performed: sequential patterns, maximal sequential patterns, closed sequences, and constraint-based and time-interval-based sequential patterns. Sequential pattern mining refers to the identification of frequent subsequences in sequence databases as patterns. Over the last two decades, researchers have proposed many techniques and algorithms for extracting frequent sequential patterns, in which the downward closure property plays a fundamental role. A sequential pattern is a sequence of itemsets that frequently occur in a specific order, where all items in the same itemset are assumed to share the same transaction time. One challenge for sequential pattern mining is its computational cost; another is the potentially huge number of extracted patterns. In this thesis, we present an overview of work done on sequential pattern mining and develop parallel methods for mining frequent sequential patterns in sequence databases that can tackle emerging data processing workloads while coping with larger and larger scales.
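    As a minimal illustration of the downward closure property mentioned above, the sketch below mines frequent sequential patterns apriori-style over sequences of single items; it is an invented, sequential illustration only, not one of the thesis's parallel methods (real miners such as GSP or PrefixSpan also handle itemsets and are far more efficient).

```python
# Apriori-style frequent sequence mining over sequences of single items.
# Downward closure: a pattern can be frequent only if every subsequence of
# it is frequent, so only frequent patterns need to be extended.

def is_subsequence(pattern, sequence):
    it = iter(sequence)
    return all(item in it for item in pattern)  # order-preserving containment

def support(pattern, db):
    return sum(is_subsequence(pattern, seq) for seq in db)

def mine(db, min_support):
    items = sorted({item for seq in db for item in seq})
    frequent = {}
    level = [(item,) for item in items]          # candidate 1-sequences
    while level:
        survivors = []
        for pattern in level:
            s = support(pattern, db)
            if s >= min_support:
                survivors.append(pattern)
                frequent[pattern] = s
        # Prune via downward closure: extensions of infrequent patterns
        # are themselves infrequent, so extend only the survivors.
        level = [p + (item,) for p in survivors for item in items]
    return frequent

db = [list("abcb"), list("acbc"), list("abc"), list("bc")]
for pattern, count in sorted(mine(db, min_support=3).items()):
    print(pattern, count)
```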

    CWI Self-evaluation 1999-2004

    Sixth Goddard Conference on Mass Storage Systems and Technologies Held in Cooperation with the Fifteenth IEEE Symposium on Mass Storage Systems

    This document contains copies of the technical papers received in time for publication prior to the Sixth Goddard Conference on Mass Storage Systems and Technologies, held in cooperation with the Fifteenth IEEE Symposium on Mass Storage Systems at the University of Maryland-University College Inn and Conference Center, March 23-26, 1998. As one of an ongoing series, this conference continues to provide a forum for discussion of issues relevant to the management of large volumes of data. The conference encourages all interested organizations to discuss long-term mass storage requirements and experiences in fielding solutions. Emphasis is on current and future practical solutions addressing issues in data management, storage systems and media, data acquisition, long-term retention of data, and data distribution. This year's discussion topics include architecture, tape optimization, new technology, performance, standards, site reports, and vendor solutions. Tutorials will be available on shared file systems, file system backups, data mining, and the dynamics of obsolescence.