7,807 research outputs found
Integrating E-Commerce and Data Mining: Architecture and Challenges
We show that the e-commerce domain can provide all the right ingredients for
successful data mining and claim that it is a killer domain for data mining. We
describe an integrated architecture, based on our experience at Blue Martini
Software, for supporting this integration. The architecture can dramatically
reduce the pre-processing, cleaning, and data understanding effort often
documented to take 80% of the time in knowledge discovery projects. We
emphasize the need for data collection at the application server layer (not the
web server) in order to support logging of data and metadata that is essential
to the discovery process. We describe the data transformation bridges required
from the transaction processing systems and customer event streams (e.g.,
clickstreams) to the data warehouse. We detail the mining workbench, which
needs to provide multiple views of the data through reporting, data mining
algorithms, visualization, and OLAP. We conclude with a set of challenges. Comment: KDD workshop: WebKDD 200
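The abstract's argument for logging at the application server rather than the web server can be sketched in a few lines: the application layer can attach business metadata (product ids, cart values) that a raw web-server access log never sees. This is an illustrative sketch only; the class and method names are invented, not Blue Martini's actual API.

```python
import json
import time
from dataclasses import dataclass, field

@dataclass
class EventLogger:
    """Hypothetical application-layer clickstream logger."""
    events: list = field(default_factory=list)

    def log_event(self, session_id, event_type, **metadata):
        # The application layer can attach business metadata that a
        # web-server log (URLs and status codes) cannot capture.
        self.events.append({
            "ts": time.time(),
            "session": session_id,
            "type": event_type,
            **metadata,
        })

    def to_warehouse_rows(self):
        # Flatten events into rows for the clickstream-to-warehouse bridge.
        return [json.dumps(e, sort_keys=True) for e in self.events]

log = EventLogger()
log.log_event("s1", "add_to_cart", product_id=42, cart_value=19.99)
log.log_event("s1", "checkout", order_total=19.99)
rows = log.to_warehouse_rows()
```

Each row already carries the metadata the discovery process needs, so the downstream transformation bridge has far less cleaning to do.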
Storage Location Assignment Problem: implementation in a warehouse design optimization tool
This paper focuses on possible improvements to common warehouse storage management practices, taking its cue from the Operations Research Storage Location Assignment Problem (SLAP), with the aim of reaching an efficient and organized allocation of products to warehouse slots. The implementation of a SLAP approach in a tool able to model multiple storage policies is discussed, with the twofold aim of reducing the overall required warehouse space, so that produced goods are allocated efficiently, and of minimizing internal material handling times. It is shown how some limits of existing warehouse information management system modules can be overcome, sketching the design of a software tool able to return an organized slot-product allocation. The results of validating a prototype on an industrial case are presented, showing the efficiency gains of the proposed approach under a dedicated-slot storage policy.
Data warehouse automation: trick or treat?
Data warehousing systems have been around for 25 years playing a crucial role in
collecting data and transforming that data into value, allowing users to make decisions
based on informed business facts. It is widely accepted that a data warehouse is a critical
component of a data-driven enterprise, and it becomes part of the organisation's
information systems strategy, with a significant impact on the business. However, after
25 years, building a Data Warehouse is still painful: it is too time-consuming, too
expensive and too difficult to change after deployment.
Data Warehouse Automation appears with the promise to address the limitations of
traditional approaches, turning the data warehouse development from a prolonged effort
into an agile one, with gains in efficiency and effectiveness in data warehousing
processes. So, is Data Warehouse Automation a Trick or Treat?
To answer this question, a case study of a data warehousing architecture using a data
warehouse automation tool, called WhereScape, was developed. A survey was also
conducted among organisations that use data warehouse automation tools, in order to
understand their motivation for adopting this kind of tool in their data warehousing
systems. Based on the results of the survey and on the case study, automation in the
data warehouse building process is necessary to deliver data warehouse systems faster,
and it is a solution to consider when modernizing data warehouse architectures, as a way
to achieve results faster while keeping costs controlled and reducing risk. Data
Warehouse Automation
definitely may be a Treat.
Design of dimensional model for clinical data storage and analysis
Current research in the Life and Medical Sciences is generating large volumes of data on a daily basis. It has thus become a necessity to find solutions for the efficient storage of this data, trying to correlate it and extract knowledge from it. Clinical data generated in hospitals, clinics and diagnostic centers falls under a similar paradigm. Patient records in various hospitals are increasing at an exponential rate, adding to the problem of data management and storage. A major storage problem is the varied dimensionality of the data, ranging from images to numerical form. Therefore there is a need to develop an efficient data model that can handle this multi-dimensional data and store it with its historical aspect.
For this problem in clinical informatics we propose a clinical dimensional model design that can be used to develop a clinical data mart. The model has been designed with temporal storage of patient data in mind, covering all possible clinical parameters, which can include both textual and image-based data. The availability of this data for each patient can then be used to apply data mining techniques, finding correlations among all the parameters at the level of the individual and of the population.
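A star-schema data mart of the kind the abstract proposes can be sketched with a fact table of clinical observations keyed to patient and date dimensions, where each observation carries either a numeric value or a reference to image data. The table and column names below are assumptions for illustration, not the paper's actual design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_patient (patient_key INTEGER PRIMARY KEY, mrn TEXT, sex TEXT);
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, full_date TEXT);
CREATE TABLE fact_observation (
    patient_key   INTEGER REFERENCES dim_patient,
    date_key      INTEGER REFERENCES dim_date,
    parameter     TEXT,   -- e.g. 'glucose' or 'chest_xray'
    numeric_value REAL,   -- NULL for image-based parameters
    image_uri     TEXT    -- NULL for numeric parameters
);
""")
conn.execute("INSERT INTO dim_patient VALUES (1, 'MRN001', 'F')")
conn.execute("INSERT INTO dim_date VALUES (20240101, '2024-01-01')")
conn.execute("INSERT INTO fact_observation VALUES (1, 20240101, 'glucose', 5.4, NULL)")
conn.execute("INSERT INTO fact_observation VALUES (1, 20240101, 'chest_xray', NULL, 'img/001.png')")

# Temporal, per-patient queries become simple joins against the dimensions.
rows = conn.execute("""
    SELECT d.full_date, f.parameter, f.numeric_value, f.image_uri
    FROM fact_observation f
    JOIN dim_patient p ON p.patient_key = f.patient_key
    JOIN dim_date    d ON d.date_key = f.date_key
    WHERE p.mrn = 'MRN001'
""").fetchall()
```

Keeping a separate date dimension is what gives the mart its historical aspect: every observation is anchored to a date row, so longitudinal queries over a patient's history are ordinary joins.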
The Data Lakehouse: Data Warehousing and More
Relational Database Management Systems designed for Online Analytical
Processing (RDBMS-OLAP) have been foundational to democratizing data and
enabling analytical use cases such as business intelligence and reporting for
many years. However, RDBMS-OLAP systems present some well-known challenges.
They are primarily optimized only for relational workloads, lead to
proliferation of data copies which can become unmanageable, and since the data
is stored in proprietary formats, it can lead to vendor lock-in, restricting
access to engines, tools, and capabilities beyond what the vendor offers. As
the demand for data-driven decision making surges, the need for a more robust
data architecture to address these challenges becomes ever more critical. Cloud
data lakes have addressed some of the shortcomings of RDBMS-OLAP systems, but
they present their own set of challenges. More recently, organizations have
often followed a two-tier architectural approach to take advantage of both
these platforms, leveraging both cloud data lakes and RDBMS-OLAP systems.
However, this approach brings additional challenges, complexities, and
overhead. This paper discusses how a data lakehouse, a new architectural
approach, achieves the same benefits of an RDBMS-OLAP and cloud data lake
combined, while also providing additional advantages. We take today's data
warehousing and break it down into implementation independent components,
capabilities, and practices. We then take these aspects and show how a
lakehouse architecture satisfies them. Then, we go a step further and discuss
what additional capabilities and benefits a lakehouse architecture provides
over an RDBMS-OLAP system.
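The core lakehouse idea the abstract describes, data kept in open file formats on cheap storage with a small metadata layer on top so any engine can discover the current table state, can be illustrated with a toy transaction log. Real lakehouse table formats (e.g. Delta Lake, Apache Iceberg) are far richer; this stdlib-only sketch only shows the separation of open data files from the metadata log.

```python
import csv
import json
import os
import tempfile

root = tempfile.mkdtemp()

def commit(rows, version):
    # Write a new immutable data file in an open format (CSV here),
    # then record it in the append-only metadata log.
    path = os.path.join(root, f"part-{version}.csv")
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    with open(os.path.join(root, "_log.json"), "a") as log:
        log.write(json.dumps({"version": version, "file": path}) + "\n")

def read_table():
    # Any reader reconstructs the table from the log, not from a
    # directory listing, so all engines see the same committed state.
    rows = []
    with open(os.path.join(root, "_log.json")) as log:
        for line in log:
            entry = json.loads(line)
            with open(entry["file"], newline="") as f:
                rows.extend(list(csv.reader(f)))
    return rows

commit([["id", "amount"], ["1", "9.99"]], version=0)
commit([["2", "4.50"]], version=1)
table = read_table()
```

Because the data files are in an open format and the log is plain text, no single engine owns the data, which is the lakehouse's answer to the vendor lock-in the abstract attributes to RDBMS-OLAP systems.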
Modern technologies for data storage, organization and managing in CRM systems
In our study we intend to emphasize the main objectives targeted by the implementation of CRM-type platforms. In line with these objectives, in order to cover the functionality of CRM platforms, we refer to the prime methods of collecting and organizing information: databases, data warehouses, and data centers from the Cloud Computing field. As a representative procedure for handling information we exemplify the OLAP technique, which is implemented by means of the SQL Server Analysis Services software instrument. Finally, we look over some of the Cloud Computing based CRM platforms and how OLAP techniques can be applied to them.
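The OLAP-style aggregation the study exemplifies with SQL Server Analysis Services amounts to summarizing a measure along chosen dimensions of a fact set. The toy below rolls CRM sales facts up by region and by region x quarter; the data and dimension names are invented for illustration.

```python
from collections import defaultdict

facts = [
    {"region": "North", "quarter": "Q1", "sales": 100},
    {"region": "North", "quarter": "Q2", "sales": 150},
    {"region": "South", "quarter": "Q1", "sales": 80},
]

def rollup(facts, dims):
    # Aggregate the 'sales' measure over the requested dimensions,
    # producing one cube cell per distinct dimension combination.
    out = defaultdict(int)
    for f in facts:
        out[tuple(f[d] for d in dims)] += f["sales"]
    return dict(out)

by_region = rollup(facts, ["region"])           # 1-D summary
by_cell = rollup(facts, ["region", "quarter"])  # full 2-D cube cells
```

An OLAP engine precomputes and indexes such cells across many dimensions; the essence is the same dimension-keyed aggregation.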
Designing and Implementing a Data Warehouse using Dimensional Modeling
As a part of the business intelligence activities initiated at the University of New Mexico (UNM) in the Office of Institutional Analytics, a need for a data warehouse was established. The goal of the data warehouse is to host data related to students, faculty, staff, finance and research, and make it readily available for the purposes of university analytics. In addition, this data warehouse will be used to generate required reports and help the university better analyze student success activities. In order to build real-time reports, it is essential that the massive amounts of transactional data related to university activities be structured in a way that is optimal for querying and reporting. This transactional data is stored in relational databases in an Operational Data Store (ODS) at UNM. But for reporting purposes, this design currently requires scores of database join operations between relational database views in order to answer even simple questions. Apart from affecting performance, i.e., the time taken to run these reports, development time is also a factor, as it is very difficult to comprehend the complex data models associated with the ODS in order to generate the appropriate queries. Dimensional modeling was employed to address this issue. Dimensional modeling was developed by two pioneers in the field, Bill Inmon and Ralph Kimball. This thesis explores both methods and implements Kimball's method of dimensional modeling, leading to a dimensional data mart based on a star schema design that was implemented using a high performance commercial database. In addition, a data integration tool was used for performing extract-transform-load (ETL) operations necessary to develop jobs and design workflows and to automate the loading of data into the data mart. HTML reports were developed from the data mart using a reporting tool and performance was evaluated relative to reports generated directly from the ODS.
On average, the reports developed on top of the data mart were at least 65% faster than those generated directly from the ODS. One of the reasons for this is that the number of joins between tables was drastically reduced. Another reason is that in the ODS, reports were built against views, which are slower to query than reports developed against tables.
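The join reduction that explains the speedup can be sketched in miniature: the ETL step pays the joins once at load time, denormalizing the operational schema into a wide fact table that reports query directly. The schema and data below are invented for illustration, not UNM's actual ODS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized operational tables (the many-join source).
CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER);
CREATE TABLE dept (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE enrollment (student_id INTEGER, term TEXT, credits INTEGER);
-- Dimensional target: one wide row per enrollment fact.
CREATE TABLE fact_enrollment (student_name TEXT, dept_name TEXT,
                              term TEXT, credits INTEGER);
""")
conn.executemany("INSERT INTO student VALUES (?,?,?)",
                 [(1, "Ana", 10), (2, "Ben", 20)])
conn.executemany("INSERT INTO dept VALUES (?,?)",
                 [(10, "Math"), (20, "CS")])
conn.executemany("INSERT INTO enrollment VALUES (?,?,?)",
                 [(1, "2024F", 12), (2, "2024F", 15)])

# The transform-and-load step: joins are paid once here, not per report.
conn.execute("""
INSERT INTO fact_enrollment
SELECT s.name, d.name, e.term, e.credits
FROM enrollment e JOIN student s ON s.id = e.student_id
                  JOIN dept    d ON d.id = s.dept_id
""")

# Reports now run against a single table, with no joins at query time.
total = conn.execute("SELECT SUM(credits) FROM fact_enrollment").fetchone()[0]
```

With scores of joins collapsed into the load-time transform, report queries shrink to single-table scans or small star joins, which is where the measured 65% improvement comes from.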