7,807 research outputs found

    Integrating E-Commerce and Data Mining: Architecture and Challenges

    We show that the e-commerce domain can provide all the right ingredients for successful data mining and claim that it is a killer domain for data mining. We describe an integrated architecture, based on our experience at Blue Martini Software, for supporting this integration. The architecture can dramatically reduce the pre-processing, cleaning, and data understanding effort often documented to take 80% of the time in knowledge discovery projects. We emphasize the need for data collection at the application server layer (not the web server) in order to support logging of data and metadata that is essential to the discovery process. We describe the data transformation bridges required from the transaction processing systems and customer event streams (e.g., clickstreams) to the data warehouse. We detail the mining workbench, which needs to provide multiple views of the data through reporting, data mining algorithms, visualization, and OLAP. We conclude with a set of challenges. Comment: KDD workshop: WebKDD 200
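The abstract's central architectural point is that events should be logged at the application server, where semantic context (page templates, products shown, session state) is available, rather than reconstructed from web-server logs. A minimal sketch of such an application-layer event record, with illustrative field names not taken from the paper:

```python
import json
import time
from dataclasses import dataclass, field, asdict

@dataclass
class ClickstreamEvent:
    """One application-layer event; field names are illustrative."""
    session_id: str
    event_type: str           # e.g. "product_view", "add_to_cart"
    page_template: str        # semantic page identity, unknown to the web server
    products_shown: list = field(default_factory=list)
    timestamp: float = field(default_factory=time.time)

def log_event(sink: list, event: ClickstreamEvent) -> None:
    """Append the event as one JSON line, ready for a warehouse bridge."""
    sink.append(json.dumps(asdict(event)))

events: list[str] = []
log_event(events, ClickstreamEvent("s42", "product_view", "category_page",
                                   products_shown=["sku-1", "sku-2"]))
record = json.loads(events[0])
```

A web-server log would capture only the URL and status code; the `products_shown` and `page_template` fields here are exactly the metadata the authors argue is lost below the application layer.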

    Storage Location Assignment Problem: implementation in a warehouse design optimization tool

    This paper focuses on possible improvements to common warehouse storage management practices, taking its cue from the Operations Research SLAP (Storage Location Assignment Problem), with the aim of reaching an efficient and organized allocation of products to warehouse slots. The implementation of a SLAP approach in a tool able to model multiple storage policies is discussed, with the dual aim of reducing the overall required warehouse space - so as to efficiently allocate produced goods - and of minimizing internal material handling times. We show how some limits of existing warehousing information management system modules can be overcome, sketching the design of a software tool able to return an organized slot-product allocation. The results of validating a prototype on an industrial case are presented, showing the efficiency gains of the proposed approach when a dedicated-slot storage policy is adopted.
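The dedicated-slot policy the abstract mentions is often approximated with a simple greedy rule: rank products by turnover and slots by distance from the I/O point, then pair them off. A minimal sketch under that assumption (the data and function name are illustrative, not from the paper):

```python
def assign_slots(turnover: dict, slot_distance: dict) -> dict:
    """Greedy dedicated-slot SLAP heuristic: the most frequently
    picked product gets the slot nearest the depot."""
    products = sorted(turnover, key=turnover.get, reverse=True)
    slots = sorted(slot_distance, key=slot_distance.get)
    return dict(zip(products, slots))

assignment = assign_slots(
    {"A": 120, "B": 40, "C": 300},        # picks per week
    {"S1": 2.0, "S2": 5.0, "S3": 9.0},    # metres from the I/O point
)
# C (highest turnover) is assigned the closest slot, S1
```

Real SLAP formulations add capacity, compatibility, and multi-policy constraints, which is precisely where a dedicated optimization tool earns its keep over this one-pass heuristic.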

    Multidimensional Modeling


    Data warehouse automation trick or treat?

    Data warehousing systems have been around for 25 years, playing a crucial role in collecting data and transforming it into value, allowing users to make decisions based on informed business facts. It is widely accepted that a data warehouse is a critical component of a data-driven enterprise, and it becomes part of the organisation's information systems strategy, with a significant impact on the business. However, after 25 years, building a data warehouse is still painful: too time-consuming, too expensive, and too difficult to change after deployment. Data Warehouse Automation appears with the promise of addressing the limitations of traditional approaches, turning data warehouse development from a prolonged effort into an agile one, with gains in efficiency and effectiveness in data warehousing processes. So, is Data Warehouse Automation a trick or a treat? To answer this question, a case study of a data warehousing architecture using a data warehouse automation tool, called WhereScape, was developed. A survey was also conducted among organisations that use data warehouse automation tools, in order to understand their motivation for adopting this kind of tool in their data warehousing systems. Based on the results of the survey and on the case study, automation in the data warehouse building process is necessary to deliver data warehouse systems faster, and is a solution to consider when modernising data warehouse architectures, as a way to achieve results faster while keeping costs controlled and reducing risk. Data Warehouse Automation may well turn out to be a treat.

    Design of dimensional model for clinical data storage and analysis

    Current research in the field of Life and Medical Sciences is generating large volumes of data on a daily basis. It has thus become necessary to find solutions for efficient storage of this data and for correlating it and extracting knowledge from it. Clinical data generated in hospitals, clinics, and diagnostic centres falls under a similar paradigm. Patient records in various hospitals are increasing at an exponential rate, adding to the problem of data management and storage. The major storage problem is the varied dimensionality of the data, which ranges from images to numerical values. There is therefore a need for an efficient data model that can handle this multi-dimensionality and store the data with its historical aspect. For this problem at the façade of clinical informatics, we propose a clinical dimensional model design that can be used to develop a clinical data mart. The model has been designed for temporal storage of patients' data across all possible clinical parameters, covering both textual and image-based data. The availability of this data for each patient can then be used to apply data mining techniques for finding correlations among all the parameters at the level of the individual and of the population.
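A dimensional model of the kind described, a fact table of clinical observations surrounded by patient and date dimensions, with numeric and image-based parameters side by side, can be sketched in a few DDL statements. All table and column names below are assumptions for illustration, not taken from the paper:

```python
import sqlite3

# Minimal star-schema sketch: one clinical fact table, two dimensions.
ddl = """
CREATE TABLE dim_patient (
    patient_key INTEGER PRIMARY KEY,
    patient_id  TEXT,
    sex         TEXT,
    birth_year  INTEGER
);
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,
    iso_date TEXT
);
CREATE TABLE fact_observation (
    patient_key   INTEGER REFERENCES dim_patient(patient_key),
    date_key      INTEGER REFERENCES dim_date(date_key),
    parameter     TEXT,     -- e.g. 'systolic_bp' or 'chest_xray'
    numeric_value REAL,     -- NULL for image-based parameters
    image_uri     TEXT      -- NULL for numeric parameters
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(ddl)
conn.execute("INSERT INTO dim_patient VALUES (1, 'P-001', 'F', 1980)")
conn.execute("INSERT INTO dim_date VALUES (20240101, '2024-01-01')")
conn.execute("INSERT INTO fact_observation VALUES "
             "(1, 20240101, 'systolic_bp', 128.0, NULL)")
rows = conn.execute(
    "SELECT p.patient_id, f.parameter, f.numeric_value "
    "FROM fact_observation f JOIN dim_patient p USING (patient_key)"
).fetchall()
```

Storing images by URI in the fact table is one conventional way to handle the mixed text/image dimensionality the abstract raises; the date dimension gives the historical aspect.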

    The Data Lakehouse: Data Warehousing and More

    Relational Database Management Systems designed for Online Analytical Processing (RDBMS-OLAP) have been foundational to democratizing data and enabling analytical use cases such as business intelligence and reporting for many years. However, RDBMS-OLAP systems present some well-known challenges. They are primarily optimized only for relational workloads, they lead to a proliferation of data copies which can become unmanageable, and since the data is stored in proprietary formats, they can lead to vendor lock-in, restricting access to engines, tools, and capabilities beyond what the vendor offers. As the demand for data-driven decision making surges, the need for a more robust data architecture to address these challenges becomes ever more critical. Cloud data lakes have addressed some of the shortcomings of RDBMS-OLAP systems, but they present their own set of challenges. More recently, organizations have often followed a two-tier architectural approach to take advantage of both platforms, leveraging both cloud data lakes and RDBMS-OLAP systems. However, this approach brings additional challenges, complexities, and overhead. This paper discusses how a data lakehouse, a new architectural approach, achieves the combined benefits of an RDBMS-OLAP system and a cloud data lake, while also providing additional advantages. We take today's data warehousing and break it down into implementation-independent components, capabilities, and practices. We then take these aspects and show how a lakehouse architecture satisfies them. We go a step further and discuss what additional capabilities and benefits a lakehouse architecture provides over an RDBMS-OLAP system.
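A key mechanism behind the lakehouse's engine independence is an open table format: table metadata (schema, snapshot history, file lists) lives beside the data files, so any engine can read it. A deliberately simplified, Iceberg/Delta-flavoured sketch of such metadata; all keys and file names are illustrative assumptions:

```python
# Much-simplified table-format manifest in the spirit of open
# lakehouse formats; real formats add partitioning, stats, ACID logs.
table_metadata = {
    "schema": [{"name": "order_id", "type": "long"},
               {"name": "amount", "type": "double"}],
    "snapshots": [
        {"id": 1, "files": ["part-000.parquet"]},
        {"id": 2, "files": ["part-000.parquet", "part-001.parquet"]},
    ],
    "current_snapshot": 2,
}

def files_for_snapshot(meta: dict, snapshot_id: int) -> list:
    """Resolve which data files a given snapshot reads (time travel)."""
    for snap in meta["snapshots"]:
        if snap["id"] == snapshot_id:
            return snap["files"]
    raise KeyError(snapshot_id)

current = files_for_snapshot(table_metadata, table_metadata["current_snapshot"])
previous = files_for_snapshot(table_metadata, 1)
```

Because the manifest, not a proprietary engine, defines the table, the vendor lock-in the abstract describes for RDBMS-OLAP systems does not apply: reading an old snapshot is just resolving a different file list.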

    Modern technologies for data storage, organization and managing in CRM systems

    In our study we intend to emphasise the main objectives targeted by the implementation of CRM-type platforms. In line with these objectives, and in order to cover the functionality of CRM platforms, we refer to the primary methods of collecting and organising information: databases, data warehouses, and data centres in the Cloud Computing field. As a representative procedure for handling information we exemplify the OLAP technique, which is implemented by means of the SQL Server Analysis Services software instrument. Finally, we look at some of the Cloud Computing based CRM platforms and how OLAP techniques can be applied to them.
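At its core, the OLAP technique that engines such as SQL Server Analysis Services industrialise is pre-aggregation of a measure over every combination of dimensions. A pure-Python sketch of a tiny cube operator, with hypothetical sales data, to make the idea concrete:

```python
from collections import defaultdict
from itertools import combinations

ALL = "*"  # marker for an aggregated-away dimension

def cube(rows, dims, measure):
    """Compute an OLAP cube: totals for every subset of dimensions.
    A toy sketch of what a dedicated engine does at scale."""
    totals = defaultdict(float)
    for row in rows:
        for r in range(len(dims) + 1):
            for kept in combinations(dims, r):
                key = tuple(row[d] if d in kept else ALL for d in dims)
                totals[key] += row[measure]
    return dict(totals)

sales = [
    {"region": "EU", "product": "A", "amount": 10.0},
    {"region": "EU", "product": "B", "amount": 5.0},
    {"region": "US", "product": "A", "amount": 7.0},
]
c = cube(sales, ("region", "product"), "amount")
# c[("EU", "*")] is the EU total (15.0); c[("*", "*")] is the grand total (22.0)
```

Slicing (`("EU", "*")`), dicing (`("EU", "A")`), and roll-up (`("*", "*")`) then become dictionary lookups, which is why cube materialisation makes interactive CRM reporting fast.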

    Designing and Implementing a Data Warehouse using Dimensional Modeling

    As part of the business intelligence activities initiated at the University of New Mexico (UNM) in the Office of Institutional Analytics, a need for a data warehouse was established. The goal of the data warehouse is to host data related to students, faculty, staff, finance, and research, and to make it readily available for university analytics. In addition, this data warehouse will be used to generate required reports and help the university better analyze student success activities. In order to build real-time reports, it is essential that the massive amounts of transactional data related to university activities be structured in a way that is optimal for querying and reporting. This transactional data is stored in relational databases in an Operational Data Store (ODS) at UNM. But for reporting purposes, this design currently requires scores of database join operations between relational database views in order to answer even simple questions. Apart from affecting performance, i.e., the time taken to run these reports, development time is also a factor, as it is very difficult to comprehend the complex data models associated with the ODS in order to generate the appropriate queries. Dimensional modeling was employed to address this issue. Dimensional modeling was developed by two pioneers in the field, Bill Inmon and Ralph Kimball. This thesis explores both methods and implements Kimball's method of dimensional modeling, leading to a dimensional data mart based on a star schema design that was implemented using a high-performance commercial database. In addition, a data integration tool was used to perform the extract-transform-load (ETL) operations necessary to develop jobs, design workflows, and automate the loading of data into the data mart. HTML reports were developed from the data mart using a reporting tool, and performance was evaluated relative to reports generated directly from the ODS.
    On average, the reports developed on top of the data mart were at least 65% faster than those generated directly from the ODS. One reason for this is that the number of joins between tables was drastically reduced. Another is that in the ODS, reports were built against views, which are slower to query than the tables the data mart reports were built against.
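The join reduction the thesis measures comes from the ETL step: normalized ODS records are flattened once, at load time, into wide dimension rows, so reports never repeat those joins. A minimal sketch with hypothetical table and field names:

```python
# Hypothetical normalized ODS-style source tables.
ods_students = {1: {"name": "Ada", "dept_id": 10}}
ods_departments = {10: {"dept_name": "CS", "college_id": 7}}
ods_colleges = {7: {"college_name": "Engineering"}}

def load_student_dimension():
    """ETL load: resolve the student -> department -> college chain once,
    emitting flat dimension rows that reports can query join-free."""
    dim = []
    for sid, s in ods_students.items():
        dept = ods_departments[s["dept_id"]]
        college = ods_colleges[dept["college_id"]]
        dim.append({
            "student_key": sid,
            "name": s["name"],
            "department": dept["dept_name"],
            "college": college["college_name"],
        })
    return dim

dim_student = load_student_dimension()
```

A report against `dim_student` touches one table where the ODS equivalent would join three views, which is the shape of the speed-up the thesis reports.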