33,191 research outputs found
An Open Source Based Data Warehouse Architecture to Support Decision Making in the Tourism Sector
In this paper an alternative Tourism-oriented data warehousing architecture is proposed that makes use of recent free and open-source technologies such as Java, PostgreSQL, and XML. The architecture aims to support the decision-making process by giving an integrated view of the whole tourism reality in an established context (local, regional, national, etc.) without requiring large investments to obtain the necessary software. Keywords: Tourism, Data warehousing architecture
Data Warehousing Modernization: Big Data Technology Implementation
Considering the challenges posed by Big Data, the cost of scaling traditional data warehouses is high and their performance is inadequate to meet the growing volume, variety, and velocity of data. The Hadoop ecosystem addresses both shortcomings. Hadoop can store and analyze large data sets in parallel in a distributed environment, but cannot replace existing data warehouses and RDBMS systems due to its own limitations, explained in this paper. In this paper, I identify the reasons why many enterprises fail and struggle to adapt to Big Data technologies. A brief outline of two different technologies for handling Big Data is presented: IBM's PureData System for Analytics (Netezza), usually used in reporting, and Hadoop with Hive, which is used in analytics. The paper also covers an enterprise architecture, consisting of Hadoop running alongside a massively parallel processing data warehouse, that successful companies are adopting to analyze, filter, process, and store data. Despite having the technology to support and process Big Data, industries still struggle to meet their goals due to the lack of skilled personnel to study and analyze the data; in short, data scientists and data statisticians.
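The parallel analysis style that Hadoop popularized follows the MapReduce pattern: map records to key-value pairs, shuffle them by key, then reduce each group. A minimal pure-Python sketch of that pattern (no Hadoop cluster involved; the sample records are fabricated for illustration):

```python
from collections import defaultdict

def map_phase(records):
    # Emit (key, 1) pairs, as a Hadoop mapper would for a word-count job.
    for record in records:
        for word in record.split():
            yield word.lower(), 1

def shuffle(pairs):
    # Group values by key, mimicking Hadoop's shuffle-and-sort stage.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    # Sum the counts per key, as a reducer would.
    return {key: sum(values) for key, values in grouped.items()}

records = ["big data needs big storage", "data drives decisions"]
counts = reduce_phase(shuffle(map_phase(records)))
print(counts["big"], counts["data"])  # 2 2
```

In a real cluster, the map and reduce phases run on many nodes and the shuffle moves data between them; the per-phase logic is the same.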
Building data warehouses in the era of big data: an approach for scalable and flexible big data warehouses
During the last few years, the concept of Big Data Warehousing has gained significant attention from the scientific community, highlighting the need for design changes to the traditional Data Warehouse (DW), due to its limitations, in order to achieve new characteristics relevant in Big Data contexts (e.g., scalability on commodity hardware, real-time performance, and flexible storage). The state of the art in Big Data Warehousing reflects the young age of the concept, as well as ambiguity and the lack of common approaches to building Big Data Warehouses (BDWs). Consequently, an approach for designing and implementing these complex systems is of major relevance to business analytics researchers and practitioners. This tutorial targets the design and implementation of BDWs, presenting a general approach that researchers and practitioners can follow in their Big Data Warehousing projects and exploring several demonstration cases focused on system design and data modelling in areas such as smart cities, retail, finance, and manufacturing, among others
Big Data Harmonization – Challenges and Applications
As data grows, the need for Big Data solutions increases day by day. The concept of data harmonization has existed for two decades. Data are collected from various heterogeneous sources, and data harmonization techniques bring them into a single format in a single place, also called a data warehouse. Many advances have been made in analyzing historical data using data warehousing, and innovations continually uncover new challenges and problems for it. When the volume and variety of data increase exponentially, existing tools may not support OLAP operations under the traditional warehouse approach. In this paper we focus on the research being done in the field of Big Data warehousing, category-wise. Research issues and proposed approaches for various kinds of datasets are presented. The challenges and advantages of using a data warehouse before the data mining task are also explained in detail
Enhancing Big Data Warehousing and Analytics for Spatio-Temporal Massive Data
The increasing amount of data generated by earth observation missions like Copernicus, NASA Earth Data, and climate stations is overwhelming. Every day, terabytes of data are collected from these resources for different environmental applications. This massive amount of data should therefore be effectively managed and processed to support decision-makers. In this paper, we propose an information system based on a low-latency spatio-temporal data warehouse which aims to improve drought monitoring analytics and to support the decision-making process. The proposed framework consists of 4 main modules: (1) data collection, (2) data preprocessing, (3) data loading and storage, and (4) visualization and interpretation. The data used are multi-source and heterogeneous, collected from sources such as remote sensing sensors, biophysical sensors, and climate sensors, which allows us to study drought along different dimensions. Experiments were carried out on a real case of drought monitoring in China between 2000 and 2020
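The four-module flow described above can be sketched end to end in a few lines. This is a hypothetical illustration only: the sensor readings, station names, and the below-half-of-mean drought rule are invented for the example, not taken from the paper.

```python
from statistics import mean

def collect():
    # Module 1: multi-source readings as (station, year, rainfall_mm);
    # values here are fabricated, with one incomplete record.
    return [("st1", 2000, 410.0), ("st1", 2001, None), ("st1", 2002, 120.0)]

def preprocess(rows):
    # Module 2: drop incomplete records from the heterogeneous sources.
    return [r for r in rows if r[2] is not None]

def load(rows):
    # Module 3: store facts keyed by the (station, year) spatio-temporal
    # dimensions, standing in for the warehouse's dimensional model.
    return {(s, y): v for s, y, v in rows}

def drought_years(warehouse, station):
    # Module 4: flag years whose rainfall is below 50% of the station mean
    # (an illustrative index, not the paper's drought metric).
    values = [v for (s, _), v in warehouse.items() if s == station]
    threshold = 0.5 * mean(values)
    return sorted(y for (s, y), v in warehouse.items()
                  if s == station and v < threshold)

wh = load(preprocess(collect()))
print(drought_years(wh, "st1"))  # [2002]
```

A production system would replace the in-memory dict with the actual spatio-temporal warehouse and the toy rule with a real drought index, but the module boundaries stay the same.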
Transaction Method of Warehouse Sharing Platform Based on Blockchain Technology
With the continuous development of big data and blockchain technology, warehouse sharing platforms are seeing more applications, and warehouse transaction methods have become a research focus. The original barter method cannot solve the problem of accurate warehouse transactions, and the calculation accuracy of warehouse transactions is poor. Therefore, this paper proposes a warehouse transaction model based on blockchain technology and comprehensively analyzes the form and accuracy of warehouse transactions. First, the warehouse trading platform is used to collect transaction data and transaction methods; transaction forms and results are judged according to warehouse characteristics, and irrelevant transaction information is discarded. Then, results are calculated according to the rate of change of the transaction data and transaction mode, compared with the actual transaction situation, and the parameters and indicators of the transaction calculation are adjusted. MATLAB simulation tests show that the blockchain calculation method can improve the accuracy of warehouse transactions, with the accuracy rate reaching 95.3%. The platform and form are judged according to different transaction contents, and the transaction time is calculated. The blockchain calculation method is found to meet the needs of warehouse transactions
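The property that makes a blockchain ledger attractive for recording warehouse transactions is tamper evidence: each block carries the hash of its predecessor, so altering any recorded transaction breaks the chain. A minimal sketch of that mechanism (the block fields and transactions are illustrative, not the paper's actual model):

```python
import hashlib
import json

def block_hash(block):
    # Deterministic SHA-256 digest of a block's contents.
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def append_block(chain, transaction):
    # Each new block stores the hash of the previous one, linking the chain.
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"prev": prev, "tx": transaction})
    return chain

def verify(chain):
    # Recompute every link; any tampering with an earlier block breaks it.
    for i in range(1, len(chain)):
        if chain[i]["prev"] != block_hash(chain[i - 1]):
            return False
    return True

chain = []
append_block(chain, {"warehouse": "W1", "pallets": 40})
append_block(chain, {"warehouse": "W2", "pallets": 15})
print(verify(chain))                     # True
chain[0]["tx"]["pallets"] = 400          # tamper with a recorded transaction
print(verify(chain))                     # False
```

Real deployments add consensus and distribution on top, but this hash-linking is what guarantees the integrity of the recorded transaction history.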
Ontology based data warehousing for mining of heterogeneous and multidimensional data sources
Heterogeneous and multidimensional big data sources are prevalent in virtually all business environments, and system and data analysts are unable to fast-track access to them. A robust and versatile data warehousing system is developed that integrates domain ontologies from multidimensional data sources. For example, petroleum digital ecosystems and digital oil field solutions, derived from big-data petroleum (information) systems, are in increasing demand in multibillion-dollar resource businesses worldwide. This work has been recognized by the IEEE Industrial Electronics Society and has appeared in more than 50 international conference proceedings and journals
Comparison of Data Warehousing and Big Data Principles from an Economic and Technical Standpoint and Their Applicability to Natural Gas Remote Readout Systems
In natural gas remote readout, a large amount of data is collected, posing the problem of storing and processing such data for the companies involved. Two major technologies have recently emerged as de facto standards for processing large amounts of data: data warehousing and big data. Each of these technologies provides different data processing techniques; in this paper, serial data processing and parallel data processing are considered for data warehousing and big data, respectively. The paper analyzes the feasibility of implementing new technologies for processing the large amount of data generated by remote readout of natural gas consumption. The research in this paper was conducted in collaboration with a local natural gas distribution company. A comparison of potential software vendors showed that Qlik offers the software package best suited to the requirements provided by the local natural gas distribution company, while other potential vendors also offer software packages of good quality
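The serial-versus-parallel contrast the paper draws can be shown on a single aggregation. The readings below are fabricated stand-ins for meter readouts; the parallel version splits the data into chunks, aggregates each chunk concurrently, and combines the partial results, which is the same pattern big-data frameworks apply across cluster nodes.

```python
from concurrent.futures import ThreadPoolExecutor

readings = list(range(1, 10_001))  # fabricated daily gas meter readouts

def serial_total(data):
    # Warehouse-style serial processing: one pass over the full dataset.
    return sum(data)

def parallel_total(data, workers=4):
    # Big-data-style parallel processing: chunk, aggregate chunks
    # concurrently, then combine the partial sums.
    size = len(data) // workers + 1
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum, chunks))

print(serial_total(readings) == parallel_total(readings))  # True
```

Both styles produce the same answer; the trade-off the paper examines is cost and throughput, since the chunked form scales out across machines while the serial form scales up on one.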
An Efficient Supply Chain Data Warehousing Model For Big Data Analytics
This research work is aimed at developing a supply chain data warehousing model for big data analytics to be used for reporting and analysis purposes. An Object-Oriented Design methodology was adopted for the study. A big data supply chain dataset of a retail outlet, drawn from real-world business transactions, was used for data analysis. A Google storage bucket was created in Google BigQuery for storage and analysis of the data. The data was uploaded into Google storage in the cloud, after which the supply chain data table was created using an SQL query. A star schema dimensional model was created for integrating the data in the cloud. For descriptive and diagnostic analytics, advanced feature engineering techniques were applied to the integrated datasets to create derived variables that enhanced the model's interpretability and predictive power. Google BigQuery was linked to Google Colab for big data analytics, after which a preliminary analysis was conducted in Google Colab showing the first rows of the dataset. The time series was then decomposed into trend, seasonality, and residual components alongside the original series. To perform predictive analytics, the processed dataset was split into training and test datasets to prevent over-fitting, and hyperparameters were adjusted to optimize model performance. The forecasting model was implemented within the dashboard using the ARIMA and Prophet time series forecasting methods to train the models, and a Random Forest regression machine learning model to identify the most important features driving sales and demand. MAPE and RMSE were used as evaluation metrics for the predictive analytics of the proposed model. After cross-validation of the performance metrics, the study revealed that incorporating the Prophet, ARIMA, and Random Forest models enhanced the predictive capabilities of the proposed system, leading to more precise inventory management.
In conclusion, the proposed system offers improvements in reliability, performance, scalability, and recoverability because it is designed to handle the complex, large-scale data operations that are crucial in modern business environments. The proposed supply chain data warehousing model for big data analytics is highly recommended for supply chain managers in inventory management, as it will help optimize inventory levels and improve the supply chain business. Keywords: Supply Chain, Data Warehousing, Big Data, Analytics
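The two evaluation metrics this abstract names, MAPE and RMSE, are straightforward to compute from paired actual and forecast values. A plain-Python sketch of both, with fabricated numbers for illustration:

```python
import math

def mape(actual, forecast):
    # Mean Absolute Percentage Error, in percent;
    # assumes no actual value is zero.
    return 100.0 * sum(abs((a - f) / a)
                       for a, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    # Root Mean Squared Error, in the units of the data.
    return math.sqrt(sum((a - f) ** 2
                         for a, f in zip(actual, forecast)) / len(actual))

actual = [100.0, 200.0, 400.0]    # fabricated sales figures
forecast = [110.0, 190.0, 420.0]  # fabricated model forecasts

print(round(mape(actual, forecast), 2))  # 6.67 (percent)
print(round(rmse(actual, forecast), 2))  # 14.14
```

MAPE is scale-free, which makes it easy to compare across products, while RMSE penalizes large errors more heavily; reporting both, as the study does, gives a more complete picture of forecast quality.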
- …