
    Mining Bad Credit Card Accounts from OLAP and OLTP

    Credit card companies classify accounts as good or bad based on historical data, where a bad account is one that may default on payments in the near future. If an account is classified as bad, further action can be taken to investigate its actual nature and to apply preventive measures. Marking an account as "good" when it is actually bad can lead to loss of revenue, while marking an account as "bad" when it is actually good can lead to loss of business. However, detecting bad credit card accounts in real time from Online Transaction Processing (OLTP) data is challenging because of the volume of data that must be processed to compute the risk factor. We propose an approach that precomputes and maintains the risk probability of an account based on historical transaction data held offline or in a data warehouse. Using the most recent OLTP transactional data, a risk probability is then calculated for the latest transaction and combined with the previously computed risk probability from the data warehouse. If the accumulated risk probability crosses a predefined threshold, the account is treated as a bad account and is flagged for manual verification. Comment: Conference proceedings of ICCDA, 201
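    A minimal sketch of the flagging logic described in this abstract, assuming a hypothetical threshold and a hypothetical rule for combining the precomputed and per-transaction risk probabilities (the abstract does not specify either):

```python
# Minimal sketch of the described flagging logic; the combination rule and
# threshold are illustrative assumptions, not the paper's actual model.

RISK_THRESHOLD = 0.8  # hypothetical predefined threshold

def combined_risk(warehouse_risk: float, transaction_risk: float) -> float:
    """Combine the precomputed (data warehouse) risk with the risk of the
    latest OLTP transaction; here, probability that at least one signal fires."""
    return 1.0 - (1.0 - warehouse_risk) * (1.0 - transaction_risk)

def classify_account(warehouse_risk: float, transaction_risk: float) -> str:
    risk = combined_risk(warehouse_risk, transaction_risk)
    # Accounts whose accumulated risk crosses the threshold are flagged
    # for manual verification rather than acted on automatically.
    return "bad (flag for manual verification)" if risk >= RISK_THRESHOLD else "good"

if __name__ == "__main__":
    print(classify_account(warehouse_risk=0.6, transaction_risk=0.55))
```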

    Modular sensory hardware and data processing solution for implementation of the precision beekeeping

    For successful implementation of the Precision Apiculture (Precision Beekeeping) approach, a large amount of bee colony data must be collected and processed using various hardware and software solutions. This paper presents a standalone wireless hardware system for monitoring the main parameters of a bee colony (temperature, weight and sound). The monitoring system is based on a Raspberry Pi 3 computer with connected sensors. Power is supplied by a solar panel for reliable operation in places without a constant power source. For convenient data management, a cloud-based data warehouse (DW) is proposed and developed to ease data storage and analysis. The proposed data warehouse is scalable and extendable and can be used with a variety of other ready-made hardware solutions through a variety of data-in/data-out interfaces. The core of the data warehouse is designed to provide flexible and versatile data processing, and data flow within the core is organized between data vaults in a controllable and reliable way. Our paper presents an approach for linking hardware for real-time bee colony monitoring with cloud software for data processing and visualisation. Integrating specific algorithms and models into the system will help beekeepers remotely identify different states of their colonies, such as swarming, brood rearing or death of the colony, and inform them so that they can take appropriate decisions and actions. This research work is carried out within the SAMS project, which is funded by the European Union within the H2020-ICT-39-2016-2017 call. To find out more, visit the project website https://sams-project.eu/
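    A minimal sketch of how a monitoring node of this kind might push one set of colony readings (temperature, weight, sound) to a cloud data warehouse over HTTP; the endpoint URL, payload fields and sensor values are hypothetical placeholders, not the SAMS system's actual interface:

```python
# Hypothetical sketch of a Raspberry Pi monitoring node pushing readings to a
# cloud data warehouse. Endpoint, field names and values are illustrative only.
import json
import time
import urllib.request

DW_ENDPOINT = "https://example-dw.invalid/api/colony-readings"  # placeholder URL

def read_sensors() -> dict:
    # In a real deployment these values would come from the temperature,
    # scale and microphone sensors attached to the Raspberry Pi.
    return {"temperature_c": 34.2, "weight_kg": 41.7, "sound_db": 52.0}

def push_reading(hive_id: str) -> None:
    payload = dict(read_sensors(), hive_id=hive_id, timestamp=time.time())
    req = urllib.request.Request(
        DW_ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:  # raises on HTTP errors
        print("stored, HTTP status:", resp.status)

if __name__ == "__main__":
    push_reading("hive-01")
```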

    Striving towards Near Real-Time Data Integration for Data Warehouses

    The amount of information available to large-scale enterprises is growing rapidly. While operational systems are designed to meet well-specified (short) response-time requirements, the focus of data warehouses is generally the strategic analysis of business data integrated from heterogeneous source systems. The decision-making process in traditional data warehouse environments is often delayed because data cannot be propagated from the source systems to the data warehouse in time. A real-time data warehouse aims at decreasing the time it takes to make business decisions and tries to attain zero latency between the cause and effect of a business decision. In this paper we present an architecture of an ETL environment for real-time data warehouses which supports continual near real-time data propagation. The architecture takes full advantage of existing J2EE (Java 2 Platform, Enterprise Edition) technology and enables the implementation of a distributed, scalable, near real-time ETL environment. Instead of using vendor-proprietary ETL (extraction, transformation, loading) solutions, which are often hard to scale and often do not support optimization of the time frames allocated for data extracts, our approach uses ETLets (pronounced "et-lets") and Enterprise Java Beans (EJB) for the ETL processing tasks.
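    The J2EE/ETLet architecture itself is not reproduced here, but the following sketch illustrates the underlying idea of continual near real-time propagation in Python: extract only the records changed since the last cycle, transform them, and load them continuously instead of waiting for a nightly batch window. The connections, table and column names are hypothetical:

```python
# Illustrative micro-batch ETL loop (not the paper's J2EE/ETLet implementation).
# Source/target connections, tables and columns are hypothetical.
import sqlite3
import time

def extract(src: sqlite3.Connection, last_ts: float):
    # Pull only rows changed since the previous cycle (lightweight change capture).
    return src.execute(
        "SELECT id, amount, updated_at FROM orders WHERE updated_at > ?", (last_ts,)
    ).fetchall()

def transform(rows):
    # Example transformation: normalise monetary amounts to cents.
    return [(rid, int(round(amount * 100)), ts) for rid, amount, ts in rows]

def load(dw: sqlite3.Connection, rows) -> None:
    dw.executemany(
        "INSERT OR REPLACE INTO fact_orders (id, amount_cents, updated_at) "
        "VALUES (?, ?, ?)",
        rows,
    )
    dw.commit()

def run_near_real_time(src, dw, period_s=5.0):
    last_ts = 0.0
    while True:  # continual propagation instead of a nightly batch window
        rows = extract(src, last_ts)
        if rows:
            load(dw, transform(rows))
            last_ts = max(ts for _, _, ts in rows)
        time.sleep(period_s)
```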

    Decision making on operational data: a remote approach to distributed data monitoring

    Information gathering and assimilation are normally performed by data mining tools and Online Analytical Processing (OLAP) operating on historical data stored in a data warehouse. Data mining and OLAP queries are very complex, access a significant fraction of a database, and require significant time and resources to execute. It has therefore been difficult to realize the benefits of data analysis in operational data environments: when it comes to analysing operational (dynamic) data, running complex queries on frequently changing data is next to impossible. The complexity of active data integration increases dramatically in distributed applications, which are very common in automated or e-commerce settings. We suggest a remote data analysis approach for finding hidden patterns and relationships in distributed operational data that does not adversely affect routine transaction processing. Distributed data integration over frequently updated data is performed by analysing the SQL commands arriving at the distributed databases and aggregating the data centrally to produce a real-time view of fast-changing data. This approach has been successfully evaluated on over 30 data sources for hotel properties. The paper presents the performance results of the method and a comparative study against state-of-the-art data integration techniques. The remote approach to data integration and analysis has been built into a scalable data monitoring system, demonstrating the ease of application and the performance of operational data integration.
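    A minimal sketch of the central idea, maintaining a real-time aggregate by inspecting the SQL commands arriving at the distributed sources instead of querying them; the parsed statement shape and the aggregate kept (revenue per hotel) are deliberately simplified and hypothetical:

```python
# Simplified illustration of analysing incoming SQL commands to keep a central,
# near real-time aggregate. The regex and schema (bookings per hotel) are
# hypothetical, not the paper's actual parser or data model.
import re
from collections import defaultdict

# e.g. INSERT INTO bookings (hotel_id, amount) VALUES (42, 129.50)
INSERT_RE = re.compile(
    r"INSERT\s+INTO\s+bookings\s*\(hotel_id,\s*amount\)\s*"
    r"VALUES\s*\((\d+),\s*([\d.]+)\)",
    re.IGNORECASE,
)

class CentralAggregator:
    def __init__(self):
        self.revenue_per_hotel = defaultdict(float)

    def observe(self, sql: str) -> None:
        """Called for every SQL command seen at any of the distributed sources."""
        m = INSERT_RE.search(sql)
        if m:
            hotel_id, amount = int(m.group(1)), float(m.group(2))
            self.revenue_per_hotel[hotel_id] += amount

if __name__ == "__main__":
    agg = CentralAggregator()
    agg.observe("INSERT INTO bookings (hotel_id, amount) VALUES (42, 129.50)")
    agg.observe("INSERT INTO bookings (hotel_id, amount) VALUES (42, 80.00)")
    print(dict(agg.revenue_per_hotel))  # {42: 209.5}
```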

    Utilization of NoSQL database for disaster preparedness

    Papers, communications and posters presented at the 17th AGILE Conference on Geographic Information Science "Connecting a Digital Europe through Location and Place", held at the Universitat Jaume I from 3 to 6 June 2014. Nowadays, in the age of big data, geodatabases are becoming more critical with respect to geospatial data volume, variety and capacity. Geodatabases must be capable of coping with the high demands of geospatial data services during the production, manipulation and publication stages. The concept of the NoSQL database has been introduced as a potential alternative to existing SQL databases and is expected to grow rapidly in the near future. It has the potential to combine the powerful capabilities of GIS data processing with a non-relational Database Management System (DBMS) approach. This type of data warehouse can accommodate a variety of information with different structures from the World Wide Web (WWW) in a single geodatabase. MongoDB, as one instance of a NoSQL database, offers open-source document storage with replication and data partitioning across multiple machines. For the work described in this paper it has been used for the integration of open-access geo-information, extracting geospatial information from a near real-time earthquake service, i.e. Geofon. Geospatial information is extracted from the Geofon uniform resource locator (URL) and then transferred into documents in MongoDB. This demonstrates geospatial data integration aimed at improving earthquake information content and enabling GIS analysis using the Python scripting environment on the ArcGIS 10 platform. The approach shows reliable performance even when handling relatively large geographical-names data from the GEOnet Names Service (GNS).
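    A minimal sketch of storing one extracted earthquake event as a GeoJSON document in MongoDB using pymongo; the event values, database and collection names are placeholders rather than the actual Geofon feed content, which the real workflow would parse from the service:

```python
# Illustrative sketch: an earthquake event stored as a GeoJSON document in
# MongoDB. Event values, database and collection names are placeholders; the
# real workflow would extract them from the Geofon service.
import pymongo

client = pymongo.MongoClient("mongodb://localhost:27017")
events = client["disaster_preparedness"]["earthquakes"]

# A 2dsphere index enables geospatial queries (e.g. events near a city).
events.create_index([("geometry", pymongo.GEOSPHERE)])

event = {
    "event_id": "example-0001",  # placeholder identifier
    "magnitude": 5.8,
    "depth_km": 10.0,
    "time_utc": "2014-06-03T12:00:00Z",
    "geometry": {"type": "Point", "coordinates": [126.5, -3.2]},  # lon, lat
}
events.replace_one({"event_id": event["event_id"]}, event, upsert=True)

# Example geospatial query: events within ~200 km of a point of interest.
nearby = events.find({
    "geometry": {
        "$nearSphere": {
            "$geometry": {"type": "Point", "coordinates": [127.0, -3.0]},
            "$maxDistance": 200_000,  # metres
        }
    }
})
print(list(nearby))
```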

    A Framework for Developing Real-Time OLAP algorithm using Multi-core processing and GPU: Heterogeneous Computing

    The overwhelming growth in the amount of stored data has spurred researchers to seek methods for taking optimal advantage of it, most of which face a response-time problem caused by the enormous size of the data. Most solutions suggest materialization as the favoured answer; however, such a solution cannot attain real-time answers. In this paper we propose a framework that lays out the barriers, and the suggested solutions, on the way to achieving real-time OLAP answers, which are widely used in decision support systems and data warehouses.
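    As a rough, hedged illustration of answering an aggregate query on the fly with multi-core processing rather than relying on precomputed materialized views, the sketch below partitions synthetic fact data across CPU workers and merges the partial aggregates; the data, measure and grouping key are invented, and the GPU path of the framework is not shown:

```python
# Rough sketch: on-the-fly OLAP roll-up parallelised across CPU cores, as an
# alternative to precomputed materialized views. Data and grouping key are
# synthetic; a GPU variant would follow the same partition/merge pattern.
from collections import Counter
from concurrent.futures import ProcessPoolExecutor
import random

def partial_rollup(rows):
    """Aggregate sales per region for one partition of the fact data."""
    acc = Counter()
    for region, amount in rows:
        acc[region] += amount
    return acc

def parallel_rollup(rows, workers=4):
    chunk = max(1, len(rows) // workers)
    parts = [rows[i:i + chunk] for i in range(0, len(rows), chunk)]
    total = Counter()
    with ProcessPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(partial_rollup, parts):
            total.update(partial)  # merge partial aggregates
    return total

if __name__ == "__main__":
    data = [(random.choice("NESW"), random.randint(1, 100)) for _ in range(100_000)]
    print(parallel_rollup(data).most_common())
```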

    High-Level Object Oriented Genetic Programming in Logistic Warehouse Optimization

    This dissertation focuses on the optimization of work-flow in logistic warehouses and distribution centres. The main aim is to optimize process planning, scheduling and dispatching. The problem belongs to the NP-hard complexity class, so finding an optimal solution is computationally very demanding. The motivation for this work is to fill the gap between the optimization methods studied in the academic world and the methods used in commercial production environments. The core of the optimization algorithm is built on genetic programming driven by a context-free grammar. The main contributions of the thesis are a) to propose a new optimization algorithm that respects the following optimization criteria: total processing time (makespan), resource utilization, and the congestion of warehouse aisles that may occur while tasks are processed; b) to analyse historical operational data from a warehouse and to develop a set of benchmarks that can serve as reference baseline results for further research; and c) to try to outperform the baseline results set by a skilled and trained operations manager of one of the biggest warehouses in central Europe.
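    As a rough illustration of the core mechanism, genetic programming driven by a context-free grammar, the sketch below randomly derives candidate dispatching-rule expressions from a toy grammar; the grammar, terminals and attributes are invented for illustration and are not the thesis's actual model, and the evolutionary operators (crossover, mutation, fitness) are omitted:

```python
# Toy illustration of grammar-guided candidate generation, the mechanism behind
# genetic programming driven by a context-free grammar. The grammar and the
# task attributes are invented for illustration only.
import random

# A tiny context-free grammar for priority (dispatching) rules.
GRAMMAR = {
    "<expr>": [["<expr>", "<op>", "<expr>"], ["<term>"]],
    "<op>": [["+"], ["-"], ["*"]],
    "<term>": [["task_length"], ["aisle_congestion"], ["due_time"], ["<const>"]],
    "<const>": [["1"], ["2"], ["5"]],
}

def derive(symbol="<expr>", depth=0, max_depth=4):
    """Randomly expand a nonterminal into a sentence of the grammar."""
    if symbol not in GRAMMAR:                 # terminal symbol
        return symbol
    if symbol == "<expr>" and depth >= max_depth:
        production = ["<term>"]               # bound the recursion depth
    else:
        production = random.choice(GRAMMAR[symbol])
    return " ".join(derive(s, depth + 1, max_depth) for s in production)

if __name__ == "__main__":
    random.seed(7)
    # An initial population of candidate dispatching rules.
    for rule in (derive() for _ in range(5)):
        print(rule)
```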