5,284 research outputs found

    A New General Method to Generate Random Modal Formulae for Testing Decision Procedures

    Get PDF
    The recent emergence of heavily-optimized modal decision procedures has highlighted the key role of empirical testing in this domain. Unfortunately, the introduction of extensive empirical tests for modal logics is recent, and so far none of the proposed test generators is very satisfactory. To cope with this fact, we present a new random generation method that provides benefits over previous methods for generating empirical tests. It fixes and much generalizes one of the best-known methods, the random CNF_[]m test, allowing for generating a much wider variety of problems, covering in principle the whole input space. Our new method produces much more suitable test sets for the current generation of modal decision procedures. We analyze the features of the new method by means of an extensive collection of empirical tests

    Classifying document types to enhance search and recommendations in digital libraries

    Full text link
    In this paper, we address the problem of classifying documents available from the global network of (open access) repositories according to their type. We show that the metadata provided by repositories enabling us to distinguish research papers, thesis and slides are missing in over 60% of cases. While these metadata describing document types are useful in a variety of scenarios ranging from research analytics to improving search and recommender (SR) systems, this problem has not yet been sufficiently addressed in the context of the repositories infrastructure. We have developed a new approach for classifying document types using supervised machine learning based exclusively on text specific features. We achieve 0.96 F1-score using the random forest and Adaboost classifiers, which are the best performing models on our data. By analysing the SR system logs of the CORE [1] digital library aggregator, we show that users are an order of magnitude more likely to click on research papers and thesis than on slides. This suggests that using document types as a feature for ranking/filtering SR results in digital libraries has the potential to improve user experience.Comment: 12 pages, 21st International Conference on Theory and Practise of Digital Libraries (TPDL), 2017, Thessaloniki, Greec

    A New General Method to Generate Random Modal Formulae for Testing Decision Procedures

    Full text link
    The recent emergence of heavily-optimized modal decision procedures has highlighted the key role of empirical testing in this domain. Unfortunately, the introduction of extensive empirical tests for modal logics is recent, and so far none of the proposed test generators is very satisfactory. To cope with this fact, we present a new random generation method that provides benefits over previous methods for generating empirical tests. It fixes and much generalizes one of the best-known methods, the random CNF_[]m test, allowing for generating a much wider variety of problems, covering in principle the whole input space. Our new method produces much more suitable test sets for the current generation of modal decision procedures. We analyze the features of the new method by means of an extensive collection of empirical tests

    Enhancing Sensitivity Classification with Semantic Features using Word Embeddings

    Get PDF
    Government documents must be reviewed to identify any sensitive information they may contain, before they can be released to the public. However, traditional paper-based sensitivity review processes are not practical for reviewing born-digital documents. Therefore, there is a timely need for automatic sensitivity classification techniques, to assist the digital sensitivity review process. However, sensitivity is typically a product of the relations between combinations of terms, such as who said what about whom, therefore, automatic sensitivity classification is a difficult task. Vector representations of terms, such as word embeddings, have been shown to be effective at encoding latent term features that preserve semantic relations between terms, which can also be beneficial to sensitivity classification. In this work, we present a thorough evaluation of the effectiveness of semantic word embedding features, along with term and grammatical features, for sensitivity classification. On a test collection of government documents containing real sensitivities, we show that extending text classification with semantic features and additional term n-grams results in significant improvements in classification effectiveness, correctly classifying 9.99% more sensitive documents compared to the text classification baseline

    On Security and Sparsity of Linear Classifiers for Adversarial Settings

    Full text link
    Machine-learning techniques are widely used in security-related applications, like spam and malware detection. However, in such settings, they have been shown to be vulnerable to adversarial attacks, including the deliberate manipulation of data at test time to evade detection. In this work, we focus on the vulnerability of linear classifiers to evasion attacks. This can be considered a relevant problem, as linear classifiers have been increasingly used in embedded systems and mobile devices for their low processing time and memory requirements. We exploit recent findings in robust optimization to investigate the link between regularization and security of linear classifiers, depending on the type of attack. We also analyze the relationship between the sparsity of feature weights, which is desirable for reducing processing cost, and the security of linear classifiers. We further propose a novel octagonal regularizer that allows us to achieve a proper trade-off between them. Finally, we empirically show how this regularizer can improve classifier security and sparsity in real-world application examples including spam and malware detection

    From LTL and Limit-Deterministic B\"uchi Automata to Deterministic Parity Automata

    Full text link
    Controller synthesis for general linear temporal logic (LTL) objectives is a challenging task. The standard approach involves translating the LTL objective into a deterministic parity automaton (DPA) by means of the Safra-Piterman construction. One of the challenges is the size of the DPA, which often grows very fast in practice, and can reach double exponential size in the length of the LTL formula. In this paper we describe a single exponential translation from limit-deterministic B\"uchi automata (LDBA) to DPA, and show that it can be concatenated with a recent efficient translation from LTL to LDBA to yield a double exponential, \enquote{Safraless} LTL-to-DPA construction. We also report on an implementation, a comparison with the SPOT library, and performance on several sets of formulas, including instances from the 2016 SyntComp competition

    Evolving rules for document classification

    Get PDF
    We describe a novel method for using Genetic Programming to create compact classification rules based on combinations of N-Grams (character strings). Genetic programs acquire fitness by producing rules that are effective classifiers in terms of precision and recall when evaluated against a set of training documents. We describe a set of functions and terminals and provide results from a classification task using the Reuters 21578 dataset. We also suggest that because the induced rules are meaningful to a human analyst they may have a number of other uses beyond classification and provide a basis for text mining applications
    corecore