136,386 research outputs found

    Constrained LQR for Low-Precision Data Representation

    Get PDF
    Performing computations with a low-bit number representation results in a faster implementation that uses less silicon, and hence allows an algorithm to be implemented in smaller and cheaper processors without loss of performance. We propose a novel formulation to efficiently exploit the low (or non-standard) precision number representation of some computer architectures when computing the solution to constrained LQR problems, such as those that arise in predictive control. The main idea is to include suitably-defined decision variables in the quadratic program, in addition to the states and the inputs, to allow for smaller roundoff errors in the solver. This enables one to trade off the number of bits used for data representation against speed and/or hardware resources, so that smaller numerical errors can be achieved for the same number of bits (same silicon area). Because of data dependencies, the algorithm complexity, in terms of computation time and hardware resources, does not necessarily increase despite the larger number of decision variables. Examples show that a 10-fold reduction in hardware resources is possible compared to using double precision floating point, without loss of closed-loop performance

    Computational Complexity And Algorithms For Dirty Data Evaluation And Repairing

    Get PDF
    In this dissertation, we study the dirty data evaluation and repairing problem in relational database. Dirty data is usually inconsistent, inaccurate, incomplete and stale. Existing methods and theories of consistency describe using integrity constraints, such as data dependencies. However, integrity constraints are good at detection but not at evaluating the degree of data inconsistency and cannot guide the data repairing. This dissertation first studies the computational complexity of and algorithms for the database inconsistency evaluation. We define and use the minimum tuple deletion to evaluate the database inconsistency. For such minimum tuple deletion problem, we study the relationship between the size of rule set and its computational complexity. We show that the minimum tuple deletion problem is NP-hard to approximate the minimum tuple deletion within 17/16 if given three functional dependencies and four attributes involved. A near optimal approximated algorithm for computing the minimum tuple deletion is proposed with a ratio of 2 − 1/2r , where r is the number of given functional dependencies. To guide the data repairing, this dissertation also investigates the data repairing method by using query feedbacks, formally studies two decision problems, functional dependency restricted deletion and insertion propagation problem, corresponding to the feedbacks of deletion and insertion. A comprehensive analysis on both combined and data complexity of the cases is provided by considering different relational operators and feedback types. We have identified the intractable and tractable cases to picture the complexity hierarchy of these problems, and provided the efficient algorithm on these tractable cases. Two improvements are proposed, one focuses on figuring out the minimum vertex cover in conflict graph to improve the upper bound of tuple deletion problem, and the other one is a better dichotomy for deletion and insertion propagation problems at the absence of functional dependencies from the point of respectively considering data, combined and parameterized complexities

    Identifying Character Non-Independence in Phylogenetic Data using Data Mining Techniques

    Get PDF
    Undiscovered relationships in a data set may confound analyses, particularly those that assume data independence. Such problems occur when characters used for phylogenetic analyses are not independent of one another. A main assumption of phylogenetic inference methods such as maximum likelihood and parsimony is that each character serves as an independent hypothesis of evolution. When this assumption is violated, the resulting phylogeny may not reflect true evolutionary history. Therefore, it is imperative that character non-independence be identified prior to phylogenetic analyses. To identify dependencies between phylogenetic characters, we applied three data mining techniques: 1) Bayesian networks, 2) decision tree induction, and 3) rule induction from coverings. We briefly discuss the main ideas behind each strategy, show how each technique performs on a small sample data set, and apply each method to an existing phylogenetic data set. We discuss the interestingness of the results of each method, and show that, although each method has its own strengths and weaknesses, rule induction from coverings presents the most useful solution for determining dependencies among phylogenetic data at this time

    Backward Sequential Feature Elimination And Joining Algorithms In Machine Learning

    Get PDF
    The Naïve Bayes Model is a special case of Bayesian networks with strong independence assumptions. It is typically used for classification problems. The Naïve Bayes model is trained using the given data to estimate the parameters necessary for classification. This model of classification is very popular since it is simple yet efficient and accurate. While the Naïve Bayes model is considered accurate on most of the problem instances, there is a set of problems for which the Naïve Bayes does not give accurate results when compared to other classifiers such as the decision tree algorithms. One reason for it could be the strong independence assumption of the Naïve Bayes model. This project aims at searching for dependencies between the features and studying the consequences of applying these dependencies in classifying instances. We propose two different algorithms, the Backward Sequential Joining and the Backward Sequential Elimination that can be applied in order to improve the accuracy of the Naïve Bayes model. We then compare the accuracies of the different algorithms and derive conclusion based on the results

    A Hierarchical Temporal Memory Sequence Classifier for Streaming Data

    Get PDF
    Real-world data streams often contain concept drift and noise. Additionally, it is often the case that due to their very nature, these real-world data streams also include temporal dependencies between data. Classifying data streams with one or more of these characteristics is exceptionally challenging. Classification of data within data streams is currently the primary focus of research efforts in many fields (i.e., intrusion detection, data mining, machine learning). Hierarchical Temporal Memory (HTM) is a type of sequence memory that exhibits some of the predictive and anomaly detection properties of the neocortex. HTM algorithms conduct training through exposure to a stream of sensory data and are thus suited for continuous online learning. This research developed an HTM sequence classifier aimed at classifying streaming data, which contained concept drift, noise, and temporal dependencies. The HTM sequence classifier was fed both artificial and real-world data streams and evaluated using the prequential evaluation method. Cost measures for accuracy, CPU-time, and RAM usage were calculated for each data stream and compared against a variety of modern classifiers (e.g., Accuracy Weighted Ensemble, Adaptive Random Forest, Dynamic Weighted Majority, Leverage Bagging, Online Boosting ensemble, and Very Fast Decision Tree). The HTM sequence classifier performed well when the data streams contained concept drift, noise, and temporal dependencies, but was not the most suitable classifier of those compared against when provided data streams did not include temporal dependencies. Finally, this research explored the suitability of the HTM sequence classifier for detecting stalling code within evasive malware. The results were promising as they showed the HTM sequence classifier capable of predicting coding sequences of an executable file by learning the sequence patterns of the x86 EFLAGs register. The HTM classifier plotted these predictions in a cardiogram-like graph for quick analysis by reverse engineers of malware. This research highlights the potential of HTM technology for application in online classification problems and the detection of evasive malware

    Learning Interpretable Rules for Multi-label Classification

    Full text link
    Multi-label classification (MLC) is a supervised learning problem in which, contrary to standard multiclass classification, an instance can be associated with several class labels simultaneously. In this chapter, we advocate a rule-based approach to multi-label classification. Rule learning algorithms are often employed when one is not only interested in accurate predictions, but also requires an interpretable theory that can be understood, analyzed, and qualitatively evaluated by domain experts. Ideally, by revealing patterns and regularities contained in the data, a rule-based theory yields new insights in the application domain. Recently, several authors have started to investigate how rule-based models can be used for modeling multi-label data. Discussing this task in detail, we highlight some of the problems that make rule learning considerably more challenging for MLC than for conventional classification. While mainly focusing on our own previous work, we also provide a short overview of related work in this area.Comment: Preprint version. To appear in: Explainable and Interpretable Models in Computer Vision and Machine Learning. The Springer Series on Challenges in Machine Learning. Springer (2018). See http://www.ke.tu-darmstadt.de/bibtex/publications/show/3077 for further informatio
    corecore