
    A framework for an Integrated Mining of Heterogeneous data in decision support systems

    The volume of information available on the Internet and on corporate intranets continues to increase, along with a corresponding increase in the structured and unstructured data stored by many organizations. Over the past years, data mining techniques have been used to explore large volumes of structured data in order to discover knowledge, often in the form of a decision support system. For effective decision making, there is a need to discover knowledge from both structured and unstructured data for completeness and comprehensiveness. The aim of this paper is to present a framework to discover this kind of knowledge and to report on ongoing research work. The proposed framework is composed of three basic phases: extraction and integration, data mining, and finally the relevance of such a system to the business decision support system. In the first phase, structured and unstructured data are combined to form an XML database, the combined data warehouse (CDW). Efficiency is enhanced by clustering the unstructured data (documents) with the SOM (Self-Organizing Map) clustering algorithm and by extracting keyphrases, based on training and TF/IDF (Term Frequency/Inverse Document Frequency), with the KEA (Keyphrase Extraction Algorithm) toolkit. In the second phase, association rule mining is applied to discover knowledge from the combined data warehouse. The final phase reflects the changes that such a system will bring to the marketing decision support system. The paper also describes a developed system that evaluates the association rules mined from structured data, which forms the first phase of the research work. The proposed system is expected to improve the quality of decisions, and this will be evaluated using standard metrics for the interestingness of association rules based on statistical independence and correlation analysis.
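The interestingness metrics the abstract refers to can be illustrated with a minimal sketch (not the paper's system): support, confidence, and lift for a rule A -> B, where a lift near 1 indicates statistical independence of A and B. The basket data below is invented for illustration.

```python
# Toy evaluation of an association rule A -> B on a list of transactions
# (each transaction is a set of items).

def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence, lift) of the rule antecedent -> consequent."""
    n = len(transactions)
    a = sum(1 for t in transactions if antecedent <= t)          # count of A
    b = sum(1 for t in transactions if consequent <= t)          # count of B
    ab = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = ab / n
    confidence = ab / a if a else 0.0
    lift = confidence / (b / n) if b else 0.0  # > 1: positive correlation
    return support, confidence, lift

baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk"}]
print(rule_metrics(baskets, {"bread"}, {"milk"}))
```

Here lift is below 1, so the rule {bread} -> {milk} holds less often than independence would predict, which is exactly the kind of correlation-based screening the abstract describes.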

    Making accident data compatible with ITS-based traffic management: Turkish case

    One of the most important reasons for the high rate of accidents is an ineffective data collection and evaluation process, since the necessary information cannot be obtained effectively from traffic accident reports (TAR). Discord and non-relevant data may appear at four levels: (1) country and cultural, (2) institutional and organizational, (3) data collection, and (4) data analysis and evaluation. The case findings are consistent with the knowledge put forward in the literature: there is a transparency problem in coordination between the institutions, as well as inefficient TAR data that is open to manipulation; the problems of under-reporting and inappropriate data storage arise even before flawed statistical evaluation methods. The old-fashioned data management structure causes incompatibility with novel technologies, preventing timely interventions to reduce accidents and alleviate fatalities. Transmission of the data to the interested agencies for evaluation and effective operation of ITS-based systems should be considered. The problem areas were explored through diagnoses at the institutional, data collection, and evaluation steps, and solutions were determined accordingly for the case city of Izmir. The Turkish Scientific and Technical Research Institute

    Classification of movements of the rat based on intra-cortical signals using artificial neural network and support vector machine

    A brain-computer interface (BCI) aims to create a communication pathway between the brain and an external device. This is possible by decoding signals from the primary motor cortex and translating them into commands for a prosthetic device. The experimental design was developed starting from intra-cortical signals recorded in the rat brain. The data pre-processing included wavelet denoising, spike detection, and feature extraction. An artificial neural network (ANN) and a support vector machine (SVM) were applied to classify the rat movements into two possible classes, Hit or No Hit. The misclassification error rates from denoised and non-denoised data were statistically different (p < 0.05), demonstrating the effectiveness of the denoising technique. The ANN and SVM gave comparable classification results.
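One pre-processing step named above, spike detection, can be sketched in a few lines. This is a generic amplitude-threshold detector, not the paper's actual pipeline: the threshold rule (k times the signal's standard deviation) and the refractory window are illustrative assumptions, and the trace is synthetic.

```python
# Amplitude-threshold spike detection on a 1-D signal trace.

def detect_spikes(signal, k=2.0, refractory=3):
    """Return sample indices where |signal - mean| exceeds k * std,
    enforcing a minimum gap (refractory period, in samples) between hits."""
    mean = sum(signal) / len(signal)
    std = (sum((x - mean) ** 2 for x in signal) / len(signal)) ** 0.5
    threshold = k * std
    spikes, last = [], -refractory
    for i, x in enumerate(signal):
        if abs(x - mean) > threshold and i - last >= refractory:
            spikes.append(i)
            last = i
    return spikes

trace = [0.1, -0.2, 0.0, 5.0, 0.1, 0.0, -4.8, 0.2, 0.1, 0.0]
print(detect_spikes(trace))
```

The detected indices would then feed the feature-extraction stage, with features per spike (or per window) passed to the ANN/SVM classifiers.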

    QUANTITATIVE METHODS FOR RESERVOIR CHARACTERIZATION AND IMPROVED RECOVERY: APPLICATION TO HEAVY OIL SANDS


    Holistic integration and automatic warehousing of Open Data

    Statistical Open Data provide useful information to feed a decision-support system. Their integration and storage within these systems is achieved through ETL processes. These processes need to be automated in order to make them accessible to non-experts. They must also cope with the lack of schemas and the structural and semantic heterogeneity that characterize Open Data. To meet these challenges, we propose a new graph-based ETL approach. For the extraction, we propose automatic detection and annotation activities based on a table model. For the transformation, we propose a linear program that performs holistic matching of structural data coming from several graphs; this model yields a unique, optimal solution. For the loading, we propose a progressive process for defining the multidimensional schema and augmenting the integrated graph. Finally, we present a prototype and experimental results.
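The matching objective behind the thesis's transformation step can be illustrated on a toy case. The thesis formulates a linear program; the sketch below solves the same one-to-one matching objective by exhaustive search on tiny inputs, using an invented character-trigram similarity as a stand-in for the real similarity measure.

```python
# Toy holistic matching: align node labels of two small schema graphs so
# that total pairwise label similarity is maximal.
from itertools import permutations

def similarity(a, b):
    """Toy label similarity: Jaccard overlap of character trigrams."""
    grams = lambda s: {s[i:i + 3] for i in range(len(s) - 2)} or {s}
    ga, gb = grams(a.lower()), grams(b.lower())
    return len(ga & gb) / len(ga | gb)

def best_matching(left, right):
    """Exhaustively pick the one-to-one matching maximizing total similarity."""
    best, best_score = None, -1.0
    for perm in permutations(right, len(left)):
        score = sum(similarity(l, r) for l, r in zip(left, perm))
        if score > best_score:
            best, best_score = list(zip(left, perm)), score
    return best

print(best_matching(["country", "year", "population"],
                    ["Population", "Country", "Year"]))
```

A linear (or assignment) program replaces this factorial search with a polynomial-time solve and, as the abstract notes, guarantees a unique optimum under its constraints.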

    Disentangling the Formation Pathways of Protostars

    Star formation occurs within dense cores in molecular clouds, often associated with filamentary structures; however, there exist isolated instances of star formation, far from other forming stars. As a core collapses, a rotationally supported circumstellar disk emerges around a central gravitating potential, with the accretion of gas and dust playing a vital role in regulating the subsequent stellar mass assembly. Recent studies revealing that nearly half of all solar-type star systems are multiples raise questions about the mechanisms behind their formation. Furthermore, despite numerous discoveries of exoplanets with state-of-the-art space telescopes, the initial stages of planetary formation remain elusive. High-resolution interferometric imaging of protoplanetary disks with ALMA has ubiquitously unveiled intricate substructures, hinting at ongoing planet formation processes. To understand the formation and evolution of stars and their planetary systems, it is essential to better characterize their progenitors, known as "protostars", particularly the youngest known phase of protostars, so-called Class 0. During the early stages of star formation, when gravitational collapse initiates, the conservation of angular momentum leads to the formation of a rotationally supported disk. However, only a handful of Class 0 protostellar disks, which are highly embedded in gas and dust, have been rigorously detailed so far. Consequently, a more comprehensive analysis of Class 0/I systems is imperative for understanding their formation and evolution. This dissertation aims to address multiple outstanding questions in star formation, beginning with a detailed investigation of an extraordinary triple-source protostellar Class 0 system, L1448 IRS3B. Expanding the focus, BHR7, an isolated Class 0 source, is studied to map the transfer of angular momentum from thousands of au down to the disk.
    BHR7 serves as an ideal testbed for non-ideal MHD theory and represents a prototypical isolated Class 0 source, free from contamination by nearby forming stars. Furthermore, a high-resolution survey is conducted in the Perseus region, encompassing 12 known multiple star systems, to determine the most probable formation pathway for each of these sources. Rigorous modeling techniques and statistical tests are employed to disentangle the formation pathways of the protostars. By undertaking these investigations, I aim to enhance our understanding of star formation processes, provide observational constraints on star formation theory, and shed light on the complex formation mechanisms underlying multiple star systems, which are thought to be the early stages of exoplanet progenitors.

    Approximation methodologies for explicit model predictive control of complex systems

    This thesis concerns the development of complexity reduction methodologies for the application of multi-parametric/explicit model predictive control (mp-MPC) to complex high-fidelity models. The main advantage of mp-MPC is the offline relocation of the optimization task, and of the associated computational expense, through the use of multi-parametric programming. This allows the application of MPC to fast-sampling systems, or to systems for which online optimization is not possible due to cycle-time requirements. The application of mp-MPC to complex nonlinear systems is of critical importance and is the subject of this thesis. The first part is concerned with the adaptation and development of model order reduction (MOR) techniques for use in combination with mp-MPC algorithms. It includes the mp-MPC-oriented use of existing MOR techniques as well as the development of new ones. The use of MOR for multi-parametric moving horizon estimation is also investigated. The second part of the thesis introduces a framework for the 'equation-free' surrogate-model-based design of explicit controllers as a possible alternative to multi-parametric methods. The methodology relies upon advanced data-classification approaches and surrogate modelling techniques, and is illustrated with different numerical examples.
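The online step of explicit MPC that makes the offline relocation pay off can be sketched minimally: the multi-parametric program yields polyhedral regions, each with an affine control law u = Kx + k, so online control reduces to point location plus one affine evaluation. The regions and gains below are invented one-dimensional numbers for illustration, not results from the thesis.

```python
# Explicit MPC online evaluation for a scalar state x.
# Each region: (list of half-spaces a*x <= b, gain K, offset k).
regions = [
    ([(1.0, 0.0)], -0.5, 0.0),   # region x <= 0:  u = -0.5 x
    ([(-1.0, 0.0)], -1.2, 0.1),  # region x >= 0:  u = -1.2 x + 0.1
]

def explicit_mpc(x):
    """Locate the region containing x and evaluate its affine control law."""
    for halfspaces, K, k in regions:
        if all(a * x <= b for a, b in halfspaces):
            return K * x + k
    raise ValueError("state outside the explored parameter space")

print(explicit_mpc(-2.0))  # uses the first region's law
print(explicit_mpc(1.0))   # uses the second region's law
```

No optimization is solved online, which is what enables MPC on fast-sampling systems; the cost has moved into computing the region table offline.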

    Joint Discourse-aware Concept Disambiguation and Clustering

    This thesis addresses the tasks of concept disambiguation and clustering. Concept disambiguation is the task of linking common nouns and proper names in a text – henceforth called mentions – to their corresponding concepts in a predefined inventory. Concept clustering is the task of clustering mentions so that all mentions in one cluster denote the same concept. In this thesis, we investigate concept disambiguation and clustering from a discourse perspective and propose a discourse-aware approach for joint concept disambiguation and clustering in the framework of Markov logic. The contributions of this thesis are fourfold:

    Joint Concept Disambiguation and Clustering. In previous approaches, concept disambiguation and concept clustering have been considered two separate tasks (Schütze, 1998; Ji & Grishman, 2011). We analyze the relationship between the two tasks and argue that they can mutually support each other. We propose the – to our knowledge – first joint approach for concept disambiguation and clustering.

    Discourse-Aware Concept Disambiguation. One of the determining factors for concept disambiguation and clustering is the context definition. Most previous approaches use the same context definition for all mentions (Milne & Witten, 2008b; Kulkarni et al., 2009; Ratinov et al., 2011, inter alia). We approach the question of which context is relevant to disambiguate a mention from a discourse perspective and argue that different mentions require different notions of context: the context that is relevant to disambiguate a mention depends on its embedding into discourse. However, how a mention is embedded into discourse depends on its denoted concept. Hence, the identification of the denoted concept and of the relevant context mutually depend on each other. We propose a binwise approach with three different context definitions and model the selection of the context definition and the disambiguation jointly.

    Modeling Interdependencies with Markov Logic. To model the interdependencies between concept disambiguation and concept clustering, as well as between the context definition and the disambiguation, we use Markov logic (Domingos & Lowd, 2009). Markov logic combines first-order logic with probabilities and allows us to formalize these interdependencies concisely. We investigate how to balance linguistic appropriateness against time efficiency and propose a hybrid approach that combines joint inference with aggregation techniques.

    Concept Disambiguation and Clustering beyond English: Multi- and Cross-linguality. Given the vast amount of text written in different languages, the capability to extend an approach to languages other than English is essential. We analyze how our approach copes with languages other than English and show that it largely scales across languages, even without retraining.

    Our approach is evaluated on multiple data sets originating from different sources (e.g. news, web) and across multiple languages. As the inventory, we use Wikipedia. We compare our approach to other approaches and show that it achieves state-of-the-art results. Furthermore, we show that joint concept disambiguation and clustering, as well as joint context selection and disambiguation, lead to significant improvements ceteris paribus.
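The context dependence discussed above can be made concrete with a toy sketch. This is not the thesis's Markov logic model: it links a mention to the candidate concept whose inventory description overlaps most with a chosen context window, so the outcome changes with the context definition. The inventory entries and contexts are invented.

```python
# Toy context-based concept disambiguation against a small inventory.
from collections import Counter

def overlap(context, description):
    """Multiset word overlap between a context string and a description."""
    a = Counter(context.lower().split())
    b = Counter(description.lower().split())
    return sum(min(a[w], b[w]) for w in a)

def disambiguate(context, candidates):
    """Pick the candidate concept maximizing overlap with the context."""
    return max(candidates, key=lambda c: overlap(context, candidates[c]))

inventory = {
    "Java_(language)": "programming language class object virtual machine",
    "Java_(island)":   "island indonesia jakarta volcano population",
}
print(disambiguate("the class compiles on any virtual machine", inventory))
print(disambiguate("the island's population lives near the volcano", inventory))
```

The same mention string resolves to different concepts under different contexts; the thesis goes further by selecting the appropriate context definition jointly with the disambiguation, rather than fixing one window for all mentions.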