715 research outputs found

    Visually Mining the Datacube using a Pixel-Oriented Technique

    No full text
    This paper introduces a new technique that eases the navigation and interactive exploration of huge multidimensional datasets. Following the pixel-oriented paradigm, the key ingredient enabling interactive navigation of extreme volumes of data is a set of functions that bijectively map data elements to screen pixels. This mapping constrains the computational complexity of the rendering process to be linear in the number of rendered pixels on the screen rather than in the dataset size. Our method furthermore supports usual information visualization techniques such as zoom and pan, anamorphosis, and texturing. As a proof of concept, we show how our technique can be adapted to interactively explore the Datacube, turning our approach into an efficient system for visual data mining. We report experiments conducted on a Datacube containing 50 million items. To our knowledge, our technique outperforms all existing ones and pushes the scalability limit close to one billion elements. Supporting all basic navigation techniques while remaining flexible makes our technique easily reusable for a large number of applications.
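
    The linear-in-pixels rendering cost follows directly from the pixel-to-data mapping the abstract describes. Below is a minimal sketch of that idea, assuming a flat array of values and a row-major mapping with pan (offset) and zoom (stride) parameters; the mapping and all names are illustrative, not the paper's actual functions.

    ```python
    # Minimal sketch of a pixel-oriented renderer: cost is O(pixels), not O(data).
    import numpy as np

    def render(data, width, height, offset=0, stride=1):
        """Fill a width x height image by mapping each screen pixel to one data element."""
        image = np.zeros((height, width), dtype=data.dtype)
        for y in range(height):
            for x in range(width):
                idx = offset + (y * width + x) * stride  # pixel -> data index map
                if 0 <= idx < len(data):                 # pixels past the data stay empty
                    image[y, x] = data[idx]
        return image

    values = np.random.rand(5_000_000)           # dataset size does not affect render cost
    img = render(values, width=800, height=600)  # work is proportional to 480,000 pixels
    ```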

    Mining Event Logs to Support Workflow Resource Allocation

    Full text link
    Workflow technology is widely used to facilitate business processes in enterprise information systems (EIS), and it has the potential to reduce design time, enhance product quality and decrease product cost. However, significant limitations still exist: although resource allocation is an important task in the context of workflow, many resource allocation operations are still performed manually, which is time-consuming. This paper presents a data mining approach to address the resource allocation problem (RAP) and improve the productivity of workflow resource management. Specifically, an Apriori-like algorithm is used to find frequent patterns in the event log, and association rules are generated according to predefined resource allocation constraints. Subsequently, a correlation measure named lift is utilized to annotate the negatively correlated resource allocation rules for resource reservation. Finally, the rules are ranked by their confidence measures and used as resource allocation rules. Comparative experiments are performed using C4.5, SVM, ID3, Naïve Bayes and the presented approach, and the results show that the presented approach is effective in both accuracy and candidate resource recommendation.
    Comment: T. Liu et al., Mining event logs to support workflow resource allocation, Knowl. Based Syst. (2012), http://dx.doi.org/10.1016/j.knosys.2012.05.01
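
    The measures that drive the approach (support, confidence and lift) can be computed directly from event-log counts. The following sketch, on a toy log with invented activities and resources, ranks candidate (activity, resource) rules by confidence and flags lift < 1 as negatively correlated, mirroring the annotation step described above; it is an illustration, not the paper's implementation.

    ```python
    # Toy rule-measure computation: support, confidence and lift from an event log.
    from collections import Counter

    # each entry: (activity, resource) extracted from one event-log record
    log = [("approve", "alice"), ("approve", "alice"), ("approve", "bob"),
           ("review", "bob"), ("review", "bob"), ("review", "carol")]

    n = len(log)
    act_count = Counter(a for a, _ in log)
    res_count = Counter(r for _, r in log)
    pair_count = Counter(log)

    rules = []
    for (act, res), c in pair_count.items():
        support = c / n
        confidence = c / act_count[act]           # P(resource | activity)
        lift = confidence / (res_count[res] / n)  # < 1 means negative correlation
        rules.append((act, res, support, confidence, lift))

    # rank candidate allocations by confidence, annotating negative correlations
    for act, res, sup, conf, lift in sorted(rules, key=lambda r: -r[3]):
        tag = "negatively correlated" if lift < 1 else "ok"
        print(f"{act} -> {res}: support={sup:.2f} confidence={conf:.2f} lift={lift:.2f} ({tag})")
    ```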

    Forecasting in Database Systems

    Get PDF
    Time series forecasting is a fundamental prerequisite for decision-making processes and crucial in a number of domains such as production planning and energy load balancing. In the past, forecasting was often performed by statistical experts in dedicated software environments outside of current database systems. However, forecasts are increasingly required by non-expert users or have to be computed fully automatically without any human intervention. Furthermore, we can observe an ever increasing data volume and the need for accurate and timely forecasts over large multi-dimensional data sets. As most data subject to analysis is stored in database management systems, a rising trend addresses the integration of forecasting inside a DBMS. Yet, many existing approaches follow a black-box style and try to keep changes to the database system as minimal as possible. While such approaches are more general and easier to realize, they miss significant opportunities for improved performance and usability. In this thesis, we introduce a novel approach that seamlessly integrates time series forecasting into a traditional database management system. In contrast to flash-back queries, which allow a view on the data in the past, we have developed a Flash-Forward Database System (F2DB) that provides a view on the data in the future. It supports a new query type - a forecast query - that enables forecasting of time series data and is automatically and transparently processed by the core engine of an existing DBMS. We discuss necessary extensions to the parser, optimizer, and executor of a traditional DBMS. We furthermore introduce various optimization techniques for three different types of forecast queries: ad-hoc queries, recurring queries, and continuous queries. First, we ease the expensive model-creation step of ad-hoc forecast queries by reducing the amount of processed data with traditional sampling techniques. Second, we decrease the runtime of recurring forecast queries by materializing models in a specialized index structure. However, a large number of time series as well as high model creation and maintenance costs require a careful selection of such models. Therefore, we propose a model configuration advisor that determines a set of forecast models for a given query workload and multi-dimensional data set. Finally, we extend forecast queries with continuous aspects, allowing an application to register a query once with our system. As new time series values arrive, we send notifications to the application based on predefined time and accuracy constraints. All of our optimization approaches aim to increase the efficiency of forecast queries while ensuring high forecast accuracy.
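
    As a rough illustration of the recurring-query optimization, the sketch below fits a model once, keeps it as a materialized object, maintains it cheaply as new values arrive, and reuses it to answer later forecast queries. The simple exponential-smoothing model and the interfaces are assumptions made for the example; F2DB performs these steps inside the DBMS engine itself.

    ```python
    # Sketch: materialize a forecast model once, then reuse it for recurring queries.
    class MaterializedForecastModel:
        """Simple exponential smoothing, standing in for an arbitrary forecast model."""

        def __init__(self, alpha=0.3):
            self.alpha = alpha
            self.level = None

        def fit(self, series):
            for y in series:
                self.level = y if self.level is None else \
                    self.alpha * y + (1 - self.alpha) * self.level
            return self

        def update(self, y):
            # cheap incremental maintenance as new time series values arrive
            self.level = self.alpha * y + (1 - self.alpha) * self.level

        def forecast(self, horizon):
            # simple exponential smoothing yields a flat forecast of the current level
            return [self.level] * horizon

    sales = [12.0, 13.5, 12.8, 14.1, 15.0]
    model = MaterializedForecastModel().fit(sales)  # expensive step, done once
    print(model.forecast(horizon=3))                # recurring queries reuse the model
    model.update(14.6)                              # maintenance instead of re-creation
    ```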

    A Process Warehouse for Process Variants Analysis

    Get PDF
    Process model variants are collections of similar process models that have evolved over time because of adjustments made to a particular process in a given domain, e.g., an order-to-cash or procure-to-pay process in the reseller or procurement domain. These adjustments produce variations between process models that should be largely identical but may differ slightly. The variations stem from new procedures, law regulations in different countries, different decision histories and organizational responsibilities, and different requirements across branches of an enterprise. Existing data warehouse approaches fail to adequately abstract and consolidate all variants into one generic process model that would make it possible to distinguish and compare different parts of different variants. This shortcoming affects the decision making of business analysts for a specific process context. This paper addresses the above shortcoming by proposing a framework to analyse process variants. The framework consists of two original contributions: (i) a novel meta-model of processes as a generic data model to capture and consolidate process variants into a reference process model; and (ii) a process warehouse model to perform typical online analytical processing operations on different variation parts, thus providing support to decision making through KPIs. The framework concepts were defined and validated using a real-life case study. Moreover, a prototype was implemented to support the validation of the framework, and performance dashboards are provided with detailed statistics at different levels of abstraction.
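
    As a small illustration of the process-warehouse idea, the sketch below rolls up an average-duration KPI over execution facts that have already been aligned to the activities of a consolidated reference model, grouped by variant. The schema, fact rows and KPI are invented for the example and are not taken from the paper.

    ```python
    # Toy OLAP-style rollup of a duration KPI per (activity, variant) pair.
    from collections import defaultdict
    from statistics import mean

    # fact rows: (process variant, activity in the reference model, duration in hours)
    facts = [
        ("DE-order-to-cash", "check-credit", 2.0),
        ("DE-order-to-cash", "ship-goods", 5.5),
        ("FR-order-to-cash", "check-credit", 3.1),
        ("FR-order-to-cash", "ship-goods", 4.8),
    ]

    by_cell = defaultdict(list)
    for variant, activity, hours in facts:
        by_cell[(activity, variant)].append(hours)

    # compare the same reference activity across variants
    for (activity, variant), hours in sorted(by_cell.items()):
        print(f"{activity:12s} {variant:16s} avg duration = {mean(hours):.1f}h")
    ```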

    Dynamic topic hierarchies and segmented rankings in textual OLAP technology

    Get PDF
    OLAP technology has been consolidating for 20 years and has recently been redesigned so that its dimensions, hierarchies and measures can support the particularities of textual data. The task of organizing textual data hierarchically can be solved by building topic hierarchies. Currently, the topic hierarchy is defined only once in the data cube, i.e., for the entire lattice of cuboids. However, such a hierarchy is sensitive to the content of the document collection: cells within the same data cube may have completely different contents, aggregating distinct document collections and causing potential changes in the topic hierarchy. Furthermore, the text segment used in the OLAP analysis also directly influences the topics listed by such a hierarchy. In this work, we present a textual data cube with multiple, dynamic topic hierarchies: multiple because they are built from different text segments, and dynamic because they are built for each cube cell. Another contribution of this work concerns the answers to multidimensional queries. The state of the art normally returns the top-k documents most relevant to the topic selected in the query. We go beyond that by returning other text segments, such as the most significant titles, abstracts and paragraphs. The approach is designed in four complementary steps, where each step further attenuates the impact of building multiple topic hierarchies and segmented rankings per cube cell. Experiments using part of the DBLP papers as a document collection reinforce our hypotheses.
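
    A minimal sketch of the per-cell, per-segment idea follows: each cube cell holds its own document collection, so topics are derived cell by cell and separately for each text segment, and a query returns ranked segments rather than whole documents. Plain term frequencies stand in here for a real topic hierarchy, and all data and names are illustrative.

    ```python
    # Per-segment "topics" (top terms) and segment rankings for one cube cell.
    from collections import Counter

    cell_docs = [  # the document collection aggregated in one cube cell
        {"title": "OLAP cubes for text", "abstract": "textual OLAP with topic hierarchies"},
        {"title": "Mining text cubes", "abstract": "topic hierarchies over document cubes"},
    ]

    def topics_for_segment(docs, segment, k=3):
        """Derive the cell's top-k terms for one text segment (title, abstract, ...)."""
        terms = Counter(w.lower() for d in docs for w in d[segment].split())
        return [t for t, _ in terms.most_common(k)]

    def top_segments(docs, segment, topic, k=2):
        """Rank and return the k segments most related to a topic, not whole documents."""
        ranked = sorted(docs, key=lambda d: d[segment].lower().count(topic), reverse=True)
        return [d[segment] for d in ranked[:k]]

    for seg in ("title", "abstract"):  # one topic list per segment, per cell
        topics = topics_for_segment(cell_docs, seg)
        print(seg, "topics:", topics)
        print("  top segments for", repr(topics[0]), "->", top_segments(cell_docs, seg, topics[0]))
    ```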

    Integrating data warehouses with web data : a survey

    Get PDF
    This paper surveys the most relevant research on combining Data Warehouse (DW) and Web data. It studies the XML technologies that are currently being used to integrate, store, query, and retrieve Web data, and their application to DWs. The paper reviews different distributed DW architectures and the use of XML languages as an integration tool in these systems. It also introduces the problem of dealing with semistructured data in a DW. It studies Web data repositories, the design of multidimensional databases for XML data sources, and the XML extensions of OnLine Analytical Processing techniques. The paper addresses the application of information retrieval technology in a DW to exploit text-rich document collections. The authors hope that the paper will help to discover the main limitations and opportunities offered by combining the DW and Web fields, as well as to identify open research lines.

    Business Intelligence on Non-Conventional Data

    Get PDF
    The revolution in digital communications witnessed over the last decade has had a significant impact on the world of Business Intelligence (BI). In the big data era, the amount and diversity of data that can be collected and analyzed for the decision-making process transcends the restricted and structured set of internal data that BI systems are conventionally limited to. This thesis investigates the unique challenges imposed by three specific categories of non-conventional data: social data, linked data and schemaless data. Social data comprises the user-generated content published through websites and social media, which can provide a fresh and timely perception of people’s tastes and opinions. In Social BI (SBI), the analysis focuses on topics, meant as specific concepts of interest within the subject area. In this context, this thesis proposes the meta-star, an alternative strategy to the traditional star schema for modeling hierarchies of topics to enable OLAP analyses. The thesis also presents an architectural framework of a real SBI project and a cross-disciplinary benchmark for SBI. Linked data employs the Resource Description Framework (RDF) to provide a public network of interlinked, structured, cross-domain knowledge. In this context, this thesis proposes an interactive and collaborative approach to build aggregation hierarchies from linked data. Schemaless data refers to the storage of data in NoSQL databases that do not force a predefined schema, but let database instances embed their own local schemata. In this context, this thesis proposes an approach to determine the schema profile of a document-based database; the goal is to facilitate users in a schema-on-read analysis process by understanding the rules that drove the usage of the different schemata. A final and complementary contribution of this thesis is an innovative technique in the field of recommendation systems to overcome user disorientation in the analysis of a large and heterogeneous wealth of data.
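
    The schema-profiling contribution can be pictured as reducing each document to the signature of its local schema and counting how often each signature occurs across the database. The sketch below does this for a few toy JSON-like documents; the field names and the flattening scheme are assumptions for illustration, not the thesis's actual algorithm.

    ```python
    # Profile a schemaless collection by counting local-schema signatures.
    from collections import Counter

    docs = [  # toy stand-ins for documents in a document-based NoSQL store
        {"user": "ann", "age": 34},
        {"user": "bob", "age": 29, "email": "bob@example.com"},
        {"user": "eve", "age": 41},
    ]

    def signature(doc, prefix=""):
        """Flatten a document into a sorted tuple of field paths with value types."""
        fields = []
        for key, value in doc.items():
            path = prefix + key
            if isinstance(value, dict):               # recurse into nested documents
                fields.extend(signature(value, path + "."))
            else:
                fields.append(f"{path}:{type(value).__name__}")
        return tuple(sorted(fields))

    profile = Counter(signature(d) for d in docs)
    for sig, count in profile.most_common():          # schema variants by frequency
        print(count, "doc(s) with schema", sig)
    ```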