64,586 research outputs found
Mining Bad Credit Card Accounts from OLAP and OLTP
Credit card companies classify accounts as good or bad based on historical data, where a bad account is one that may default on payments in the near future. If an account is classified as bad, further action can be taken to investigate its actual nature and to take preventive measures. In addition, marking an account as "good" when it is actually bad could lead to loss of revenue, while marking an account as "bad" when it is actually good could lead to loss of business. However, detecting bad credit card accounts in real time from Online Transaction Processing (OLTP) data is challenging because of the volume of data that must be processed to compute the risk factor. We propose an approach that precomputes and maintains the risk probability of an account based on historical transaction data held offline or in a data warehouse. Using the most recent OLTP transactional data, a risk probability is then calculated for the latest transaction and combined with the previously computed risk probability from the data warehouse. If the accumulated risk probability crosses a predefined threshold, the account is treated as a bad account and is flagged for manual verification.
Comment: Conference proceedings of ICCDA, 201
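A minimal sketch of the thresholding step described above, assuming a simple weighted blend of the precomputed warehouse risk and the latest-transaction risk; the function names, weighting scheme and 0.8 threshold are illustrative assumptions, not the paper's actual model:

    def combined_risk(precomputed_risk: float, latest_txn_risk: float,
                      weight_recent: float = 0.3) -> float:
        """Blend warehouse-derived risk with the risk of the newest OLTP
        transaction using an assumed weighted average."""
        return (1.0 - weight_recent) * precomputed_risk + weight_recent * latest_txn_risk

    def flag_if_bad(account_id: str, precomputed_risk: float,
                    latest_txn_risk: float, threshold: float = 0.8) -> bool:
        """Flag the account for manual verification when the accumulated
        risk probability crosses the predefined threshold."""
        risk = combined_risk(precomputed_risk, latest_txn_risk)
        if risk >= threshold:
            print(f"Account {account_id} flagged for verification (risk={risk:.2f})")
            return True
        return False

    # Example: a historically risky account with a suspicious latest transaction.
    flag_if_bad("ACC-001", precomputed_risk=0.75, latest_txn_risk=0.95)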
Modular sensory hardware and data processing solution for implementation of the precision beekeeping
For successful implementation of the Precision Apiculture (Precision Beekeeping) approach, an immense amount of bee colony data must be collected and processed using various hardware and software solutions. This paper presents a standalone wireless hardware system for monitoring the main bee colony parameters (temperature, weight and sound). The monitoring system is based on a Raspberry Pi 3 computer with connected sensors. Power is supplied by a solar panel for reliable operation in places without a constant power source. For convenient data management, a cloud-based data warehouse (DW) is proposed and developed to ease data storage and analysis. The proposed data warehouse is scalable and extendable and can be used with a variety of other ready-made hardware solutions through a variety of data-in/data-out interfaces. The core of the data warehouse is designed to provide data processing flexibility and versatility, whereas data flow within the core is organized between data vaults in a controllable and reliable way. Our paper presents an approach for linking together hardware for real-time bee colony monitoring with cloud software for data processing and visualisation. Integrating specific algorithms and models into the system will help beekeepers remotely identify different states of their colonies, such as swarming, brood rearing or death of the colony, and inform them so that appropriate decisions and actions can be taken. This research work is carried out within the SAMS project, which is funded by the European Union within the H2020-ICT-39-2016-2017 call. To find out more visit the project website https://sams-project.eu/
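A minimal sketch of such a monitoring loop, assuming a hypothetical cloud ingest endpoint and placeholder sensor-reading functions (the real temperature, weight and sound drivers depend on the attached hardware):

    import time
    import requests  # third-party HTTP client

    INGEST_URL = "https://example-dw.invalid/api/measurements"  # hypothetical endpoint

    def read_temperature_c() -> float:
        return 34.2  # placeholder for the real temperature sensor driver

    def read_weight_kg() -> float:
        return 41.7  # placeholder for the real scale driver

    def read_sound_level_db() -> float:
        return 55.0  # placeholder for the real microphone processing

    def sample_and_send(hive_id: str) -> None:
        """Collect one measurement of the main colony parameters and push it
        to the cloud data warehouse."""
        payload = {
            "hive_id": hive_id,
            "timestamp": time.time(),
            "temperature_c": read_temperature_c(),
            "weight_kg": read_weight_kg(),
            "sound_db": read_sound_level_db(),
        }
        requests.post(INGEST_URL, json=payload, timeout=10)

    if __name__ == "__main__":
        while True:
            sample_and_send("hive-01")
            time.sleep(600)  # one sample every ten minutes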
Striving towards Near Real-Time Data Integration for Data Warehouses
Abstract. The amount of information available to large-scale enterprises is growing rapidly. While operational systems are designed to meet well-specified (short) response time requirements, the focus of data warehouses is generally the strategic analysis of business data integrated from heterogeneous source systems. The decision-making process in traditional data warehouse environments is often delayed because data cannot be propagated from the source systems to the data warehouse in time. A real-time data warehouse aims at decreasing the time it takes to make business decisions and tries to attain zero latency between the cause and effect of a business decision. In this paper we present an architecture of an ETL environment for real-time data warehouses that supports continual near-real-time data propagation. The architecture takes full advantage of existing J2EE (Java 2 Platform, Enterprise Edition) technology and enables the implementation of a distributed, scalable, near-real-time ETL environment. Instead of using vendor-proprietary ETL (extraction, transformation, loading) solutions, which are often hard to scale and frequently do not support optimization of the allocated time frames for data extracts, our approach proposes ETLets (pronounced "et-lets") and Enterprise JavaBeans (EJB) for the ETL processing tasks.
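The paper's implementation relies on J2EE ETLets and EJB; as a language-neutral illustration only, the following Python sketch shows the underlying idea of continual micro-batch propagation with a high-water mark (table and column names are assumptions):

    import sqlite3
    import time

    def propagate_once(source: sqlite3.Connection, warehouse: sqlite3.Connection,
                       last_seen_id: int) -> int:
        """Extract rows newer than the high-water mark, apply a trivial
        transformation, and load them into the warehouse."""
        rows = source.execute(
            "SELECT id, amount, region FROM sales WHERE id > ?", (last_seen_id,)
        ).fetchall()
        for row_id, amount, region in rows:
            warehouse.execute(
                "INSERT INTO fact_sales (source_id, amount_eur, region) VALUES (?, ?, ?)",
                (row_id, round(amount, 2), region.upper()),
            )
            last_seen_id = max(last_seen_id, row_id)
        warehouse.commit()
        return last_seen_id

    # Usage: poll in short cycles instead of waiting for a nightly batch window.
    # watermark = 0
    # while True:
    #     watermark = propagate_once(src_conn, dw_conn, watermark)
    #     time.sleep(5)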
Decision making on operational data: a remote approach to distributed data monitoring
Information gathering and assimilation is normally performed by data mining tools and Online Analytical Processing (OLAP) operating on historic data stored in a data warehouse. Data mining and OLAP queries are very complex, access a significant fraction of a database and require significant time and resources to execute. Therefore, it has so far been impossible to obtain the benefits of such analysis in operational data environments. When it comes to analysis of operational (dynamic) data, running complex queries on frequently changing data is next to impossible. The complexity of active data integration increases dramatically in distributed applications, which are very common in automated and e-commerce settings.
We suggest a remote data analysis approach that finds hidden patterns and relationships in distributed operational data without adversely affecting routine transaction processing. Distributed data integration on frequently updated data is performed by analysing the SQL commands arriving at the distributed databases and aggregating data centrally to produce a real-time view of fast-changing data. This approach has been successfully evaluated on data sources for over 30 hotel properties. This paper presents the performance results of the method and a comparative study against state-of-the-art data integration techniques. The remote approach to data integration and analysis has been built into a scalable data monitoring system, demonstrating the ease of application and the performance of operational data integration.
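A minimal sketch of the central-aggregation idea, assuming a made-up statement format and a "bookings per property" metric (neither is taken from the paper):

    import re
    from collections import defaultdict

    # Central, continuously updated view: bookings counted per hotel property.
    bookings_per_property: dict[str, int] = defaultdict(int)

    INSERT_RE = re.compile(
        r"INSERT\s+INTO\s+bookings\s*\(property_id\)\s*VALUES\s*\('([^']+)'\)",
        re.IGNORECASE,
    )

    def observe_sql(statement: str) -> None:
        """Inspect an SQL command heading to a remote operational database and
        fold its effect into the central real-time aggregate."""
        match = INSERT_RE.search(statement)
        if match:
            bookings_per_property[match.group(1)] += 1

    # Example stream of commands captured from two distributed sources.
    observe_sql("INSERT INTO bookings (property_id) VALUES ('hotel-17')")
    observe_sql("INSERT INTO bookings (property_id) VALUES ('hotel-17')")
    observe_sql("INSERT INTO bookings (property_id) VALUES ('hotel-42')")
    print(dict(bookings_per_property))  # {'hotel-17': 2, 'hotel-42': 1}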
Utilization of NoSQL database for disaster preparedness
Papers, presentations and posters presented at the 17th AGILE Conference on Geographic Information Science, "Connecting a Digital Europe through Location and Place", held at Universitat Jaume I, 3-6 June 2014.
Nowadays, in the age of big data, geodatabases are becoming more critical with respect to geospatial data volume, variety and capacity. Geodatabases must be capable of coping with the high demands placed on geospatial data services during the production, manipulation and publication stages.
The concept of the NoSQL database has been introduced as a potential alternative to existing SQL databases and is expected to grow more rapidly in the near future. It has the potential to combine powerful GIS data-processing capability with a non-relational Database Management System (DBMS) approach. This type of data warehouse can potentially accommodate a variety of differently structured information from across the World Wide Web in one single geodatabase. MongoDB, as one instance of a NoSQL database, provides open-source document storage with replication based on data partitioning across multiple machines.
For the work described in this paper, it was used to integrate open-access geo-information by extracting geospatial data from a near-real-time earthquake service, Geofon. Geospatial information is extracted from the Geofon uniform resource locator (URL) and then transferred into documents in MongoDB. This demonstrates geospatial data integration that improves earthquake information content and enables a GIS analysis approach using the Python scripting environment on the ArcGIS 10 platform. The system shows reliable performance even when handling a relatively large geographical names dataset from the GEOnet Names Service (GNS).
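A minimal sketch of loading extracted earthquake records into MongoDB with pymongo; the connection string, collection names and document fields are assumptions, and fetching/parsing the Geofon feed itself is omitted:

    from pymongo import MongoClient

    client = MongoClient("mongodb://localhost:27017")      # assumed local instance
    collection = client["disaster_prep"]["earthquakes"]    # hypothetical names

    # Example documents as they might look after extraction from the feed.
    events = [
        {"event_id": "gfz2014abcd", "magnitude": 5.6, "depth_km": 33.0,
         "location": {"type": "Point", "coordinates": [120.5, -3.2]}},
    ]
    collection.insert_many(events)

    # A 2dsphere index enables the kind of spatial queries used in GIS analysis.
    collection.create_index([("location", "2dsphere")])
    near_query = {"location": {"$near": {
        "$geometry": {"type": "Point", "coordinates": [120.0, -3.0]},
        "$maxDistance": 100_000}}}   # within 100 km
    for doc in collection.find(near_query):
        print(doc["event_id"], doc["magnitude"])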
A Framework for Developing Real-Time OLAP algorithm using Multi-core processing and GPU: Heterogeneous Computing
The overwhelming growth of stored data has spurred researchers to seek different methods for exploiting it optimally, most of which face a response-time problem caused by the enormous size of the data. Most existing solutions favour materialization. However, materialization alone cannot attain real-time answers. In this paper we propose a framework outlining the barriers to, and suggested solutions for, achieving real-time OLAP answers, which are widely used in decision support systems and data warehouses.
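As an illustration of the direction only (not the proposed framework itself), the sketch below answers an aggregate query by partitioning fact rows across CPU cores and merging partial results, rather than relying on pre-materialized cubes; the data and grouping column are invented:

    from collections import Counter
    from multiprocessing import Pool

    def partial_sum(rows):
        """Sum sales per region for one partition of the fact table."""
        acc = Counter()
        for region, amount in rows:
            acc[region] += amount
        return acc

    if __name__ == "__main__":
        facts = [("EU", 10.0), ("US", 5.0), ("EU", 7.5), ("APAC", 3.0)] * 1000
        chunks = [facts[i::4] for i in range(4)]          # one partition per worker
        with Pool(processes=4) as pool:
            partials = pool.map(partial_sum, chunks)
        total = sum(partials, Counter())                  # merge partial aggregates
        print(dict(total))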
High-Level Object Oriented Genetic Programming in Logistic Warehouse Optimization
This work is focused on work-flow optimization in logistic warehouses and distribution centers. The main aim is to optimize process planning, scheduling and dispatching. The problem has received considerable attention in recent years; it belongs to the NP-hard complexity class, for which finding an optimal solution is computationally very demanding. The main motivation for solving this problem is to fill the gap between the optimization methods developed by researchers in the academic world and the methods used in the business world. The core of the optimization algorithm is built on genetic programming driven by a context-free grammar. The main contributions of the thesis are a) to propose a new optimization algorithm that respects the makespan, resource utilization, and the aisle congestion that may occur during task processing, b) to analyze historical operational data from a warehouse and to develop a set of benchmarks that can serve as reference baseline results for further research, and c) to try to outperform the baseline results set by a skilled and trained operational manager of one of the biggest warehouses in Central Europe.
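A toy illustration of the grammar-driven part of such an approach: a context-free grammar derives priority-rule expressions, random rules are generated, and each rule is scored by the makespan it yields on a small task set. The grammar, task model and fitness are simplified assumptions and cover only rule generation and evaluation, not the dissertation's full evolutionary algorithm:

    import random

    GRAMMAR = {
        "<expr>": [["<attr>"], ["(", "<expr>", "<op>", "<expr>", ")"]],
        "<attr>": [["duration"], ["aisle_load"], ["due_time"]],
        "<op>":   [["+"], ["-"], ["*"]],
    }

    def derive(symbol="<expr>", depth=0):
        """Randomly expand a non-terminal into a priority-rule string."""
        if symbol not in GRAMMAR:
            return symbol
        options = GRAMMAR[symbol]
        production = options[0] if depth > 3 else random.choice(options)  # force termination
        return "".join(derive(s, depth + 1) for s in production)

    def makespan(rule, tasks, workers=2):
        """Sort tasks by the evolved priority rule, dispatch them greedily to
        the least-loaded worker, and return the resulting makespan."""
        order = sorted(tasks, key=lambda t: eval(rule, {}, t))  # eval is for illustration only
        loads = [0.0] * workers
        for task in order:
            loads[loads.index(min(loads))] += task["duration"]
        return max(loads)

    tasks = [{"duration": d, "aisle_load": a, "due_time": due}
             for d, a, due in [(4, 1, 9), (2, 3, 5), (6, 2, 12), (3, 1, 4)]]
    population = [derive() for _ in range(20)]
    best = min(population, key=lambda rule: makespan(rule, tasks))
    print("best rule:", best, "makespan:", makespan(best, tasks))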