    Towards an automated classification of spreadsheets

    Many spreadsheets in the wild have neither documentation nor categorization associated with them. This makes it difficult to apply spreadsheet research that targets specific domains, such as finance or databases. In this paper we introduce a methodology for automatically classifying spreadsheets into different domains. We exploit existing data mining classification algorithms using spreadsheet-specific features. The algorithms were trained and validated with cross-validation on the EUSES corpus, reaching up to 89% accuracy. The best algorithm was then applied to the larger Enron corpus in order to gain some insight from it and to demonstrate the usefulness of this work.
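
    As a rough sketch of the pipeline described above, the snippet below trains and cross-validates a classifier on synthetic data with scikit-learn. The four features and three domain labels are hypothetical stand-ins, not the paper's spreadsheet-specific feature set.

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import cross_val_score

        # Hypothetical spreadsheet-specific features: each row could hold, e.g.,
        # the fraction of formula cells, numeric-cell ratio, sheet count, ...
        rng = np.random.default_rng(0)
        X = rng.random((500, 4))            # 500 spreadsheets, 4 features (synthetic)
        y = rng.integers(0, 3, size=500)    # 3 domains, e.g. financial / database / other

        clf = RandomForestClassifier(n_estimators=100, random_state=0)
        scores = cross_val_score(clf, X, y, cv=10)   # 10-fold cross-validation
        print(f"mean CV accuracy: {scores.mean():.2f}")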

    Operation of a quantum dot in the finite-state machine mode: single-electron dynamic memory

    A single-electron dynamic memory is designed based on the non-equilibrium dynamics of charge states in electrostatically defined metallic quantum dots. Using the orthodox theory for computing the transfer rates and a master equation, we model the dynamical response of devices consisting of a charge sensor coupled to either a single or a double quantum dot subjected to a pulsed gate voltage. We show that transition rates between charge states in metallic quantum dots are characterized by an asymmetry that can be controlled by the gate voltage. This effect is more pronounced when the switching between charge states corresponds to a Markovian process involving electron transport through a chain of several quantum dots. By simulating the dynamics of electron transport, we demonstrate that the quantum box operates as a finite-state machine that can be addressed by choosing suitable shapes and switching rates of the gate pulses. We further show that writing times in the ns range, and retention times six orders of magnitude longer, in the ms range, can be achieved in the double quantum dot system using experimentally feasible parameters, thereby demonstrating that the device can operate as a dynamic single-electron memory.
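
    The memory dynamics can be illustrated with a toy two-state master equation. The asymmetric rates below are made-up values that mimic the gate-voltage-controlled asymmetry, not rates computed from the orthodox theory.

        import numpy as np

        # Toy master equation for the occupation probabilities of two charge
        # states (n = 0, 1) of a dot; gamma_01 and gamma_10 are the transfer
        # rates. The asymmetry gamma_01 != gamma_10 mimics the gate-voltage
        # control described in the abstract; the values are illustrative.
        gamma_01, gamma_10 = 5.0, 0.5      # rates in 1/ns, asymmetric by design
        p = np.array([1.0, 0.0])           # start in the "empty" charge state
        dt, steps = 0.001, 5000            # time step and horizon, in ns

        for _ in range(steps):
            dp0 = -gamma_01 * p[0] + gamma_10 * p[1]
            p += dt * np.array([dp0, -dp0])

        print(p)  # relaxes towards the stationary ratio gamma_10 : gamma_01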

    Digging into acceptor splice site prediction : an iterative feature selection approach

    Feature selection techniques are often used to reduce data dimensionality, increase classification performance, and gain insight into the processes that generated the data. In this paper, we describe an iterative procedure of feature selection and feature construction steps, improving the classification of acceptor splice sites, an important subtask of gene prediction. We show that acceptor prediction can benefit from feature selection, and describe how feature selection techniques can be used to gain new insights into the classification of acceptor sites. This is illustrated by the identification of a new, biologically motivated feature: the AG-scanning feature. The results described in this paper contribute both to the domain of gene prediction and to research in feature selection techniques, describing a new wrapper-based feature weighting method that aids in knowledge discovery when dealing with complex datasets.
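
    A minimal sketch of wrapper-based selection, assuming a simple greedy forward search (the paper's actual iterative procedure and weighting scheme differ). The data and the informative features are synthetic.

        import numpy as np
        from sklearn.model_selection import cross_val_score
        from sklearn.naive_bayes import GaussianNB

        # Greedy wrapper selection: repeatedly add the feature whose inclusion
        # most improves cross-validated accuracy, stopping when nothing helps.
        # Synthetic data stands in for positional splice-site features.
        rng = np.random.default_rng(1)
        X = rng.normal(0, 1, (300, 20))
        y = (X[:, 3] + 0.7 * X[:, 7] + 0.3 * rng.normal(0, 1, 300) > 0).astype(int)

        selected, remaining, best = [], list(range(X.shape[1])), 0.0
        while remaining:
            scores = {f: cross_val_score(GaussianNB(), X[:, selected + [f]], y, cv=5).mean()
                      for f in remaining}
            f, s = max(scores.items(), key=lambda kv: kv[1])
            if s <= best:
                break                     # no remaining feature improves accuracy
            selected.append(f); remaining.remove(f); best = s

        print(selected)                   # typically recovers features 3 and 7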

    Different Approaches to Community Evolution Prediction in Blogosphere

    Predicting the future direction of community evolution is a problem of high theoretical and practical significance. It makes it possible to determine which characteristics of communities matter for their future behaviour. Knowledge about the probable future of a community aids decisions about investing in contact with its members and carrying out actions to achieve a key position in it. It also allows effective ways of forming opinions to be determined, or group participants to be protected against such activities. In this paper, a new approach to group identification and prediction of future events is presented, together with a comparison to an existing method. The experiments performed demonstrate the high quality of the prediction results. Comparison with previous studies shows that using many measures to describe the group profile, and in consequence as classifier input, can improve predictions.
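
    The claim that richer group profiles improve prediction can be sketched as follows; the group measures, the event label, and the classifier are illustrative placeholders, not the measures used in the study.

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import cross_val_score

        # Compare a single-measure group profile against a multi-measure one
        # as classifier input. All measures and labels are synthetic.
        rng = np.random.default_rng(2)
        size, density, cohesion = rng.random(400), rng.random(400), rng.random(400)
        X_multi = np.column_stack([size, density, cohesion])
        y = (density + cohesion > 1.0).astype(int)   # future event: grow vs shrink

        for name, X in [("size only", size.reshape(-1, 1)), ("all measures", X_multi)]:
            acc = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5).mean()
            print(f"{name}: {acc:.2f}")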

    On the Construction of Human-Automation Interfaces by Formal Abstraction

    In this paper we present a formal methodology and an algorithmic procedure for constructing human-automation interfaces and corresponding user manuals. Our focus is the information provided to the user about the behavior of the underlying machine, rather than the graphical and layout features of the interface itself. Our approach involves a systematic reduction of the behavioral model of the machine, as well as a systematic abstraction of the information displayed in the interface. This reduction procedure satisfies two requirements: first, the interface must be correct, so as not to cause mode confusion that may lead the user to perform incorrect actions; second, the interface must be as simple as possible and not include any unnecessary information. The algorithm for generating such interfaces can be automated, and a preliminary software system for its implementation has been developed.
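
    The reduction idea can be sketched as partition refinement on a toy Moore-style machine: states with the same display output and equivalent successors are merged into one interface mode. The machine and its labels are invented for illustration; this is not the paper's algorithm.

        # Behavioural state reduction sketch: merge states that agree on
        # output and on the block of every successor (partition refinement).
        states = ["s0", "s1", "s2", "s3"]
        output = {"s0": "ARMED", "s1": "ARMED", "s2": "FIRING", "s3": "FIRING"}
        delta = {("s0", "a"): "s2", ("s1", "a"): "s3",
                 ("s2", "a"): "s0", ("s3", "a"): "s1"}
        inputs = ["a"]

        # Start from output classes, then split until the partition is stable.
        block = {s: output[s] for s in states}
        changed = True
        while changed:
            sig = {s: (block[s], tuple(block[delta[(s, i)]] for i in inputs))
                   for s in states}
            changed = len(set(sig.values())) != len(set(block.values()))
            block = sig

        modes = {}
        for s in states:
            modes.setdefault(block[s], []).append(s)
        print(list(modes.values()))   # [['s0', 's1'], ['s2', 's3']]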

    Preceding rule induction with instance reduction methods

    A new prepruning technique for rule induction is presented which applies instance reduction before rule induction. An empirical evaluation records the predictive accuracy and size of rule-sets generated from 24 datasets from the UCI Machine Learning Repository. Three instance reduction algorithms (Edited Nearest Neighbour, AllKnn and DROP5) are compared. Each one is used to reduce the size of the training set prior to inducing a set of rules using Clark and Boswell's modification of CN2. A hybrid instance reduction algorithm (comprising AllKnn and DROP5) is also tested. For most of the datasets, pruning the training set using ENN, AllKnn or the hybrid significantly reduces the number of rules generated by CN2 without adversely affecting predictive performance. The hybrid achieves the highest average predictive accuracy.
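
    As a sketch of the pruning step only, below is a minimal Edited Nearest Neighbour implementation on synthetic data; CN2 itself is not reproduced, and k=3 is an assumed neighbourhood size.

        import numpy as np
        from sklearn.neighbors import KNeighborsClassifier

        # Edited Nearest Neighbour (ENN): drop training instances that are
        # misclassified by their k nearest neighbours, then hand the pruned
        # set to the rule inducer.
        def enn(X, y, k=3):
            keep = []
            for i in range(len(X)):
                knn = KNeighborsClassifier(n_neighbors=k)
                mask = np.arange(len(X)) != i           # leave instance i out
                knn.fit(X[mask], y[mask])
                if knn.predict(X[i:i + 1])[0] == y[i]:  # keep if neighbours agree
                    keep.append(i)
            return X[keep], y[keep]

        rng = np.random.default_rng(3)
        X = rng.random((200, 2))
        y = (X[:, 0] > 0.5).astype(int)
        y[rng.choice(200, 10, replace=False)] ^= 1      # inject 5% label noise
        X_red, y_red = enn(X, y)
        print(len(X), "->", len(X_red), "instances after ENN")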

    Building cloud applications for challenged networks

    Cloud computing has seen vast advancements and uptake in many parts of the world. However, many of its design patterns and deployment models are not well suited to locations with challenged networks, such as countries with no nearby datacenters. This paper describes the problem and discusses the options available for such locations, focusing specifically on community clouds as a short-term solution. The paper highlights the impact of recent trends in the development of cloud applications and how changing these could better support deployment in challenged networks. The paper also outlines the consequent challenges in bridging different cloud deployments, also known as cross-cloud computing.

    Fairness in Algorithmic Decision Making: An Excursion Through the Lens of Causality

    As virtually all aspects of our lives are increasingly impacted by algorithmic decision making systems, it is incumbent upon us as a society to ensure that such systems do not become instruments of unfair discrimination on the basis of gender, race, ethnicity, religion, etc. We consider the problem of determining whether the decisions made by such systems are discriminatory, through the lens of causal models. We introduce two definitions of group fairness grounded in causality: fair on average causal effect (FACE) and fair on average causal effect on the treated (FACT). We use the Rubin-Neyman potential outcomes framework for the analysis of cause-effect relationships to robustly estimate FACE and FACT. We demonstrate the effectiveness of our proposed approach on synthetic data. Our analyses of two real-world data sets, the Adult income data set from the UCI repository (with gender as the protected attribute) and the NYC Stop and Frisk data set (with race as the protected attribute), show that the evidence of discrimination obtained by FACE and FACT, or the lack thereof, is often in agreement with findings from other studies. We further show that FACT, being somewhat more nuanced than FACE, can yield findings of discrimination that differ from those obtained using FACE.
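
    A minimal sketch of a FACE-style estimate, assuming synthetic data in which the protected attribute is randomly assigned, so that a naive difference in group means estimates the average causal effect; the paper's estimators for observational data are more involved.

        import numpy as np

        # With the protected attribute A randomly assigned, the difference in
        # mean outcomes between groups estimates the average causal effect of
        # A on the decision Y; a nonzero effect signals unfairness.
        rng = np.random.default_rng(4)
        n = 10_000
        A = rng.integers(0, 2, n)                  # protected attribute
        skill = rng.normal(0, 1, n)                # legitimate covariate
        bias = 0.3                                 # injected discrimination
        Y = (skill + bias * A + rng.normal(0, 1, n) > 0).astype(float)

        face = Y[A == 1].mean() - Y[A == 0].mean()
        print(f"estimated average causal effect: {face:.3f}")  # ~0.08, nonzero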

    Assisted Diagnosis of Parkinsonism Based on the Striatal Morphology

    Parkinsonism is a clinical syndrome characterized by the progressive loss of striatal dopamine. Its diagnosis is usually corroborated by neuroimaging data, such as DaTSCAN neuroimages, that allow the possible dopamine deficiency to be visualized. During the last decade, a number of computer systems have been proposed to automatically analyze DaTSCAN neuroimages, eliminating the subjectivity inherent in visual examination of the data. In this work, we propose a computer system based on machine learning to separate Parkinsonian patients from control subjects using the size and shape of the striatal region, modeled from DaTSCAN data. First, an algorithm based on adaptive thresholding is used to parcellate the striatum. This region is then divided in two according to the division between brain hemispheres and characterized with 152 measures, extracted from the volume and its three possible 2-dimensional projections. Afterwards, the Bhattacharyya distance is used to discard the least discriminative measures and, finally, the neuroimage category is estimated by means of a Support Vector Machine classifier. This method was evaluated using a dataset with 189 DaTSCAN neuroimages, obtaining an accuracy rate above 94%. This rate outperforms those obtained by previous approaches that use the intensity of each striatal voxel as a feature. This work was supported by the MINECO/FEDER under the TEC2015-64718-R project, the Ministry of Economy, Innovation, Science and Employment of the Junta de Andalucía under the P11-TIC-7103 Excellence Project, and the Vice-Rectorate of Research and Knowledge Transfer of the University of Granada.
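
    The last two stages can be sketched as follows, assuming Gaussian class-conditional distributions for the univariate Bhattacharyya distance; the 152 measures here are random placeholders, with one made informative so the pipeline has something to find.

        import numpy as np
        from sklearn.model_selection import cross_val_score
        from sklearn.svm import SVC

        # Rank measures by the Bhattacharyya distance between the two
        # class-conditional Gaussians, keep the best, classify with an SVM.
        def bhattacharyya(x0, x1):
            m0, m1 = x0.mean(), x1.mean()
            v0, v1 = x0.var() + 1e-9, x1.var() + 1e-9
            return (0.25 * (m0 - m1) ** 2 / (v0 + v1)
                    + 0.5 * np.log((v0 + v1) / (2 * np.sqrt(v0 * v1))))

        rng = np.random.default_rng(5)
        y = rng.integers(0, 2, 189)                  # 189 subjects, as in the paper
        X = rng.normal(0, 1, (189, 152))             # 152 hypothetical measures
        X[:, 0] += 2 * y                             # make one measure informative

        dists = np.array([bhattacharyya(X[y == 0, j], X[y == 1, j])
                          for j in range(X.shape[1])])
        top = np.argsort(dists)[::-1][:10]           # keep 10 most discriminative
        acc = cross_val_score(SVC(kernel="linear"), X[:, top], y, cv=10).mean()
        print(f"CV accuracy with top features: {acc:.2f}")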

    The identification of informative genes from multiple datasets with increasing complexity

    Background: In microarray data analysis, factors such as data quality, biological variation, and the increasingly multi-layered nature of complex biological systems complicate the modelling of regulatory networks that can represent and capture the interactions among genes. We believe that the use of multiple datasets derived from related biological systems leads to more robust models. We therefore developed a novel framework for modelling regulatory networks that involves training and evaluation on independent datasets. Our approach includes the following steps: (1) ordering the datasets based on their level of noise and informativeness; (2) selecting a Bayesian classifier with an appropriate level of complexity by evaluating predictive performance on independent datasets; (3) comparing the different gene selections and the influence of increasing model complexity; (4) functional analysis of the informative genes.

    Results: In this paper, we identify the most appropriate model complexity, using cross-validation and independent test set validation, for predicting gene expression in three published datasets related to myogenesis and muscle differentiation. Furthermore, we demonstrate that models trained on simpler datasets can be used to identify interactions among genes and to select the most informative ones. We also show that these models explain the myogenesis-related genes (genes of interest) significantly better than others (P < 0.004), since the improvement in their rankings is much more pronounced. Finally, after further evaluating our results on synthetic datasets, we show that our approach outperforms a concordance method by Lai et al. in identifying informative genes from multiple datasets of increasing complexity, while additionally modelling the interactions between genes.

    Conclusions: We show that Bayesian networks derived from simpler controlled systems perform better than those trained on datasets from more complex biological systems. Further, we show that genes that are highly predictive and consistent across independent datasets, from the pool of differentially expressed genes, are more likely to be fundamentally involved in the biological process under study. We conclude that networks trained on simpler controlled systems, such as in vitro experiments, can be used to model and capture interactions among genes in more complex datasets, such as in vivo experiments, where these interactions would otherwise be concealed by a multitude of other ongoing events.
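
    Step (2) can be sketched as follows, using the number of retained features as a crude proxy for model complexity and a naive Bayes classifier as the Bayesian model; the datasets are synthetic stand-ins for the myogenesis data.

        import numpy as np
        from sklearn.model_selection import cross_val_score
        from sklearn.naive_bayes import GaussianNB

        rng = np.random.default_rng(6)

        # Synthetic expression data: 5 genuinely informative "genes" out of 50.
        def make_dataset(n, noise):
            y = rng.integers(0, 2, n)
            X = rng.normal(0, noise, (n, 50))
            X[:, :5] += y[:, None]
            return X, y

        X_train, y_train = make_dataset(200, noise=1.0)   # simpler, cleaner system
        X_test, y_test = make_dataset(200, noise=2.0)     # noisier independent data

        # Choose complexity by cross-validation, then check on independent data.
        for k in (2, 5, 20, 50):                          # increasing complexity
            cv = cross_val_score(GaussianNB(), X_train[:, :k], y_train, cv=5).mean()
            test = GaussianNB().fit(X_train[:, :k], y_train).score(X_test[:, :k], y_test)
            print(f"k={k:2d}  cv={cv:.2f}  independent={test:.2f}")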