370 research outputs found

    Keyword-based object search and exploration in multidimensional text databases

    We propose a novel system TEXplorer that integrates keyword-based object ranking with the aggregation and exploration power of OLAP in a text database with rich structured attributes available, e.g., a product review database. TEXplorer can be implemented within a multi-dimensional text database, where each row is associated with structural dimensions (attributes) and text data (e.g., a document). The system utilizes the text cube data model, where a cell aggregates a set of documents with matching values in a subset of dimensions. Cells in a text cube capture different levels of summarization of the documents, and can represent objects at different conceptual levels. Users query the system by submitting a set of keywords. Instead of returning a ranked list of all the cells, we propose a keyword-based interactive exploration framework that could offer flexible OLAP navigational guides and help users identify the levels and objects they are interested in. A novel significance measure of dimensions is proposed based on the distribution of IR relevance of cells. During each interaction stage, dimensions are ranked according to their significance scores to guide drilling down; and cells in the same cuboids are ranked according to their relevance to guide exploration. We propose efficient algorithms and materialization strategies for ranking top-k dimensions and cells. Finally, extensive experiments on real datasets demonstrate the efficiency and effectiveness of our approach
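The text cube model described in this abstract can be illustrated with a minimal sketch. It assumes toy data and a deliberately simple relevance score (average keyword occurrences per document); the rows, the dimension names, and the functions `build_text_cube`, `relevance`, and `rank_cells` are hypothetical and are not taken from TEXplorer. The paper's significance measure for dimensions is not reproduced here.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy rows: structured dimensions plus a free-text review.
ROWS = [
    {"brand": "Acme", "type": "laptop", "text": "light laptop with long battery life"},
    {"brand": "Acme", "type": "phone", "text": "battery drains fast"},
    {"brand": "Bolt", "type": "laptop", "text": "heavy laptop with bright screen"},
    {"brand": "Bolt", "type": "phone", "text": "great screen and battery saver with spare battery"},
]
DIMENSIONS = ("brand", "type")


def build_text_cube(rows, dimensions):
    """Map each cuboid (a subset of dimensions) to its cells.

    A cell is keyed by the dimension values it fixes and holds the
    documents of every row that matches those values.
    """
    cube = {}
    for size in range(len(dimensions) + 1):
        for cuboid in combinations(dimensions, size):
            cells = defaultdict(list)
            for row in rows:
                key = tuple(row[d] for d in cuboid)
                cells[key].append(row["text"])
            cube[cuboid] = dict(cells)
    return cube


def relevance(docs, keywords):
    """Assumed stand-in score: average keyword occurrences per document."""
    hits = sum(doc.lower().split().count(k) for doc in docs for k in keywords)
    return hits / len(docs)


def rank_cells(cube, cuboid, keywords, k=3):
    """Return the top-k (score, cell) pairs of one cuboid, best first."""
    scored = [(relevance(docs, keywords), cell) for cell, docs in cube[cuboid].items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored[:k]


if __name__ == "__main__":
    cube = build_text_cube(ROWS, DIMENSIONS)
    print(rank_cells(cube, ("type",), ["battery"]))
```

Ranking the cells of the `("type",)` cuboid for the keyword `battery` puts the `phone` cell first in this toy data, which is the kind of cue a user could follow when deciding where to drill down.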

    Revisting SQL Query Recommender System Using Hierarchical Classification

    Large volumes of data are gathered and explored in data warehouses for analytical purposes. Handling data at this scale is hard even for experts; for non-expert users, or users unfamiliar with the database schema, it is harder still. This paper aims to help this class of users by recommending SQL queries they may want to use. Recommendations are selected by tracking a user's past behavior and comparing it with that of other users. They address two problems: users may not know where to start their exploration, and they may overlook queries that would retrieve important data. Queries are recorded and compared using hierarchical classification, then re-ranked according to relevance, and relevant queries are retrieved based on users' querying behavior. Users issue a series of SQL queries through a query interface in order to analyze the data and mine it for interesting information. DOI: 10.17762/ijritcc2321-8169.150614

    Decision support systems

    Decision Support Systems (DSS) are a specific class of computerized information systems that support business and organizational decision-making activities. A properly designed DSS is an interactive software-based system intended to help decision makers compile useful information from raw data, documents, personal knowledge, and/or business models to identify and solve problems and make decisions. DSS belong to an environment with multidisciplinary foundations, including database research, artificial intelligence, human-computer interaction, simulation methods, software engineering, and telecommunications. Keywords: decision support system, decision makers, computer-based

    Evaluation of the Contemporary Issues in Data Mining and Data Warehousing

    Over the past years data warehousing and data mining tools have evolved from research into a unique and popular business application class for decision support and business intelligence. This paper focuses on presenting the applications of data mining in the business environment. It contains a general overview of data mining, providing a definition of the concept, enumerating six primary data mining techniques and mentioning the main fields for which data mining can be applied. The paper also presents the main business areas which can benefit from the use of data mining tools, along with their use cases: retail, banking and insurance. Also the main commercially available data mining tools and their key features are presented within the paper. Theoretical and empirical literature was reviewed and various gaps in literature were identified. Besides the analysis of data mining and the business areas that can successfully apply it, the paper suggested and concluded that firms and scholars need to carry out more empirical research in the area of integrity of data mining and data warehousing since this will help eliminate marketing errors in operations and practice

    The potential of semantic paradigm in warehousing of big data

    Big data have analytical potential that was hard to realize with available technologies. After new storage paradigms intended for big data such as NoSQL databases emerged, traditional systems got pushed out of the focus. The current research is focused on their reconciliation on different levels or paradigm replacement. Similarly, the emergence of NoSQL databases has started to push traditional (relational) data warehouses out of the research and even practical focus. Data warehousing is known for the strict modelling process, capturing the essence of the business processes. For that reason, a mere integration to bridge the NoSQL gap is not enough. It is necessary to deal with this issue on a higher abstraction level during the modelling phase. NoSQL databases generally lack clear, unambiguous schema, making the comprehension of their contents difficult and their integration and analysis harder. This motivated involving semantic web technologies to enrich NoSQL database contents by additional meaning and context. This paper reviews the application of semantics in data integration and data warehousing and analyses its potential in integrating NoSQL data and traditional data warehouses with some focus on document stores. Also, it gives a proposal of the future pursuit directions for the big data warehouse modelling phases

    A Survey on Automatically Mining Facets for Web Queries

    This paper presents a detailed survey of facet mining techniques and their advantages and disadvantages. A facet is any word or phrase that summarizes an important aspect of a web query. Researchers have proposed several efficient techniques that improve the user's web search experience, since users are satisfied when relevant information appears in the top results. The objectives of their research are: (1) to present an automated solution for deriving query facets by analyzing the text query; (2) to create a taxonomy of query refinement strategies for efficient results; and (3) to personalize search according to user interest

    A conceptual framework and a risk management approach for interoperability between geospatial datacubes

    Today, we observe wide use of geospatial databases that are implemented in many forms (e.g., transactional centralized systems, distributed databases, multidimensional datacubes). Among those possibilities, the multidimensional datacube is more appropriate to support interactive analysis and to guide the organization's strategic decisions, especially when different epochs and levels of information granularity are involved. However, one may need to use several geospatial multidimensional datacubes, which may be semantically heterogeneous and have different degrees of appropriateness to the context of use. Overcoming the semantic problems related to semantic heterogeneity and to the difference in appropriateness to the context of use, in a manner that is transparent to users, has been the principal aim of interoperability for the last fifteen years. However, in spite of successful initiatives, today's solutions have evolved in a non-systematic way. Moreover, no solution has been found to address specific semantic problems related to interoperability between geospatial datacubes. In this thesis, we suppose that it is possible to define an approach that addresses these semantic problems to support interoperability between geospatial datacubes. For that, we first describe interoperability between geospatial datacubes. Then, we define and categorize the semantic heterogeneity problems that may occur during the interoperability process of different geospatial datacubes. In order to resolve semantic heterogeneity between geospatial datacubes, we propose a conceptual framework that is essentially based on human communication. In this framework, software agents representing geospatial datacubes involved in the interoperability process communicate together. Such communication aims at exchanging information about the content of geospatial datacubes. Then, in order to help agents to make appropriate decisions during the interoperability process, we evaluate a set of indicators of the external quality (fitness-for-use) of geospatial datacube schemas and of production context (e.g., metadata). Finally, we implement the proposed approach to show its feasibility

    A model to integrate Data Mining and On-line Analytical Processing: with application to Real Time Process Control

    Since the widespread use of computers in business and industry, a lot of research has been done on the design of computer systems to support the decision making task. Decision support systems support decision makers in solving unstructured decision problems by providing tools to help understand and analyze decision problems to help make better decisions. Artificial intelligence is concerned with creating computer systems that perform tasks that would require intelligence if performed by humans. Much research has focused on using artificial intelligence to develop decision support systems to provide intelligent decision support. Knowledge discovery from databases centers on data mining algorithms to discover novel and potentially useful information contained in the large volumes of data that are ubiquitous in contemporary business organizations. Data mining deals with large volumes of data and tries to develop multiple views that the decision maker can use to study this multi-dimensional data. On-line analytical processing (OLAP) provides a mechanism that supports multiple views of multi-dimensional data to facilitate efficient analysis. These two techniques together can provide a powerful mechanism for the analysis of large quantities of data to aid the task of making decisions. This research develops a model for the real time process control of a large manufacturing process using an integrated approach of data mining and on-line analytical processing. Data mining is used to develop models of the process based on the large volumes of the process data. The purpose is to provide prediction and explanatory capability based on the models of the data and to allow for efficient generation of multiple views of the data so as to support analysis on multiple levels. Artificial neural networks provide a mechanism for predicting the behavior of nonlinear systems, while decision trees provide a mechanism for the explanation of states of systems given a set of inputs and outputs. OLAP is used to generate multidimensional views of the data and support analysis based on models developed by data mining. The architecture and implementation of the model for real-time process control based on the integration of data mining and OLAP is presented in detail. The model is validated by comparing results obtained from the integrated system, OLAP-only and expert opinion. The system is validated using actual process data and the results of this verification are presented. A discussion of the results of the validation of the integrated system and some limitations of this research with discussion on possible future research directions is provided
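The idea of multiple views over multi-dimensional process data can be sketched as a roll-up that averages one measure over any chosen subset of dimensions. This is only an illustration under assumed data: the readings, the dimension names `line` and `shift`, the measure `temp`, and the function `roll_up` are hypothetical and do not come from the thesis.

```python
from collections import defaultdict

# Hypothetical process readings with two dimensions and one numeric measure.
READINGS = [
    {"line": "A", "shift": "day", "temp": 200.0},
    {"line": "A", "shift": "night", "temp": 210.0},
    {"line": "B", "shift": "day", "temp": 190.0},
    {"line": "B", "shift": "night", "temp": 220.0},
]


def roll_up(rows, dims, measure):
    """Average `measure` over every combination of values of `dims`.

    Passing fewer dimensions gives a coarser view; an empty tuple
    gives the grand average over all rows.
    """
    totals = defaultdict(lambda: [0.0, 0])  # key -> [running sum, count]
    for row in rows:
        key = tuple(row[d] for d in dims)
        totals[key][0] += row[measure]
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


if __name__ == "__main__":
    print(roll_up(READINGS, ("shift",), "temp"))  # view by shift
    print(roll_up(READINGS, ("line",), "temp"))   # view by production line
    print(roll_up(READINGS, (), "temp"))          # fully aggregated view
```

In this toy data the per-line averages are identical while the per-shift averages differ, which shows why looking at the same measure along different dimensions can matter for analysis.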