    Inhibition in multiclass classification

    The role of inhibition is investigated in a multiclass support vector machine formalism inspired by the brain structure of insects. The so-called mushroom bodies have a set of output neurons, or classification functions, that compete with each other to encode a particular input. Strongly active output neurons depress, or inhibit, the remaining outputs without knowing which is correct or incorrect. Accordingly, we propose a classification function that embodies unselective inhibition and train it in the large-margin classifier framework. Inhibition leads to more robust classifiers in the sense that they perform well over larger regions of the hyperparameter space when assessed with leave-one-out strategies. We also show that the classifier with inhibition is a tight bound on probabilistic exponential models and is Bayes consistent for 3-class problems. These properties make the approach useful for data sets with a limited number of labeled examples. For larger data sets, it offers no significant advantage over other multiclass SVM approaches.
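
    The abstract does not give the exact form of the inhibited classification function, so the snippet below is only a rough sketch under one assumed reading: each class score is depressed by the summed activity of the competing classes, scaled by an inhibition strength alpha.

    # Hypothetical sketch of an unselectively inhibited multiclass decision rule;
    # the inhibition term and the parameter `alpha` are assumptions, not the
    # paper's actual classification function.
    import numpy as np

    def inhibited_scores(raw_scores, alpha=0.5):
        """Depress each class score by the total activity of all other classes."""
        raw_scores = np.asarray(raw_scores, dtype=float)
        total = raw_scores.sum()
        return raw_scores - alpha * (total - raw_scores)

    def predict(raw_scores, alpha=0.5):
        return int(np.argmax(inhibited_scores(raw_scores, alpha)))

    # Three output neurons competing to encode one input.
    print(predict([1.2, 0.9, 0.3]))  # -> 0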

    The Protein-Protein Interaction tasks of BioCreative III: classification/ranking of articles and linking bio-ontology concepts to full text

    BACKGROUND: Determining the usefulness of biomedical text mining systems requires realistic task definitions and data selection criteria without artificial constraints, measuring performance aspects that go beyond traditional metrics. The BioCreative III Protein-Protein Interaction (PPI) tasks were motivated by such considerations, trying to address aspects including how the end user would oversee the generated output, for instance by providing ranked results, textual evidence for human interpretation, or measurements of the time saved by using automated systems. Detecting articles that describe complex biological events such as PPIs was addressed in the Article Classification Task (ACT), where participants were asked to implement tools for detecting PPI-describing abstracts. To this end the BCIII-ACT corpus was provided, which includes training, development, and test sets of over 12,000 PPI-relevant and non-relevant PubMed abstracts labeled manually by domain experts, also recording the human classification times. The Interaction Method Task (IMT) went beyond abstracts and required mining for associations between more than 3,500 full-text articles and the interaction detection method ontology concepts that had been applied to detect the PPIs reported in them.
    RESULTS: A total of 11 teams participated in at least one of the two PPI tasks (10 in the ACT and 8 in the IMT), and a total of 62 persons were involved either as participants or in preparing data sets and evaluating these tasks. Per task, each team was allowed to submit five runs offline and another five online via the BioCreative Meta-Server. Of the 52 runs submitted for the ACT, the highest Matthews correlation coefficient (MCC) measured was 0.55 at an accuracy of 89%, and the best AUC iP/R was 68%. Most ACT teams explored machine learning methods; some also used lexical resources such as MeSH terms, PSI-MI concepts, or curated lists of verbs and nouns, and some integrated NER approaches. For the IMT, a total of 42 runs were evaluated by comparing systems against manually generated annotations produced by curators from the BioGRID and MINT databases. The highest AUC iP/R achieved by any run was 53%, and the best MCC score 0.55. For competitive systems with an acceptable recall (above 35%), the macro-averaged precision ranged between 50% and 80%, with a maximum F-score of 55%.
    CONCLUSIONS: The results of the ACT task of BioCreative III indicate that classification of large, unbalanced article collections reflecting the real class imbalance is still challenging. Nevertheless, text-mining tools that report ranked lists of relevant articles for manual selection can potentially reduce the time needed to identify half of the relevant articles to less than one quarter of the time required with unranked results. Detecting associations between full-text articles and interaction detection method PSI-MI terms (IMT) is more difficult than might be anticipated. This is due to the variability of method term mentions, errors resulting from pre-processing of articles provided as PDF files, and the heterogeneity and differing granularity of the method term concepts encountered in the ontology. However, combining the sophisticated techniques developed by the participants with supporting evidence strings derived from the articles for human interpretation could result in practical modules for biological annotation workflows.
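
    Since the ACT runs were scored with the Matthews correlation coefficient, a short worked example of that metric may help; the confusion-matrix counts below are invented for illustration and are not BioCreative III results.

    # MCC for a binary PPI-relevance classifier; the counts are hypothetical.
    from math import sqrt

    def mcc(tp, tn, fp, fn):
        denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
        return (tp * tn - fp * fn) / denom if denom else 0.0

    # Hypothetical confusion matrix over an unbalanced abstract collection.
    print(round(mcc(tp=150, tn=700, fp=60, fn=90), 3))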

    Adaptive Locking

    Adaptive locking is a new concurrency control scheme for relational database systems. An adaptive locking scheduler automatically issues to each transaction appropriate locks on its read and write sets. The read and write sets of a transaction are exactly those parts of the shared database that are necessary and sufficient to lock in order to prevent all state and view inconsistencies. This paper shows how to compute logical expressions representing the read and write sets of access statements and describes an efficient algorithm for checking whether the locks issued to different transactions cause them to conflict. The algorithm is based on extended tableaux capable of representing all conjunctive queries. The paper discusses how to use adaptive locking with complex queries and compares the new scheme to conventional locking. A prototype database system demonstrating how an adaptive locking scheduler reasons about conflict is presented.
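
    The abstract does not spell out the conflict test, so the fragment below is only a highly simplified stand-in: each lock is reduced to a conjunction of per-attribute equality constraints, and two transactions conflict when those constraints are jointly satisfiable and at least one lock is a write. The paper's actual algorithm reasons over extended tableaux for full conjunctive queries.

    # Simplified predicate-lock conflict check; equality-only constraints are an
    # assumption for illustration, not the paper's tableau-based algorithm.
    def overlaps(lock_a, lock_b):
        """True if some tuple could satisfy both conjunctive conditions."""
        for attr in set(lock_a) & set(lock_b):
            if lock_a[attr] != lock_b[attr]:
                return False  # contradictory equality constraints
        return True

    def conflicts(lock_a, mode_a, lock_b, mode_b):
        return (mode_a == "write" or mode_b == "write") and overlaps(lock_a, lock_b)

    # T1 reads downtown accounts; T2 updates an uptown account: no conflict.
    t1 = {"branch": "downtown"}
    t2 = {"branch": "uptown", "acct": 42}
    print(conflicts(t1, "read", t2, "write"))  # -> False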

    Flexible Concurrency Control by Reasoning About Database Queries and Updates

    A number of database management problems involve reasoning about queries and updates. Concurrency control is the most important example: two transactions should not be executed simultaneously if an update command issued by one might change information used in answering a query issued by the other. Existing concurrency control schemes are based on the idea of protecting discrete items of data. This thesis describes a concurrency control scheme called adaptive locking that is based instead on logical reasoning. The central notion is that of independence: informally, a query is independent of an update if executing the update cannot change the result of evaluating the query. First, the general properties of independence are investigated using a formal model-theoretic definition in the context of deductive databases. Then proof-theoretic sufficient conditions are obtained for the independence of queries and updates. These results apply to arbitrary queries and updates, and they take into account integrity constraints and recursive rules. For the special case where a query and an update are both specified by conjunctive relational algebra expressions, a decision procedure for independence is given. The procedure is of practical use because it typically requires linear time and produces answers that are precise enough to be relied upon. The procedure takes functional dependencies into account, so it constitutes a solution to an open problem identified by Blakeley, Coburn, and Larson. It is of theoretical interest for two reasons. First, its quadratic worst-case time complexity cannot be improved unless the reachability problem for directed graphs can be solved in sublinear time. Second, it applies to the widest possible natural class of queries, since deciding independence is NP-hard for nonconjunctive queries and updates.
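
    As a toy illustration of the independence definition only (not of the thesis's proof-theoretic conditions or its decision procedure), the snippet below brute-force checks, on one invented instance, that a query over department 10 is unaffected by an update to department 20.

    # Independence sanity check on a single toy instance; the relation, query,
    # and update are hypothetical examples.
    def query(db):
        """Which employees work in department 10?"""
        return {name for (name, dept, _) in db if dept == 10}

    def update(db):
        """Raise the salary of everyone in department 20."""
        return {(name, dept, sal + 100 if dept == 20 else sal)
                for (name, dept, sal) in db}

    db = {("ann", 10, 500), ("bob", 20, 400), ("cyd", 10, 450)}
    print(query(db) == query(update(db)))  # -> True on this instance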

    Not the Last Word on EBL Algorithms

    This paper describes a new domain-independent explanation-based learning (EBL) algorithm that is able to acquire useful new rules in situations where previous EBL algorithms would fail. The new algorithm is complete in the sense that every valid rule that can be extracted from an explanation can be extracted by this algorithm. The algorithm is described within a framework that provides insight into how the design of successful EBL systems takes operationality and imperfect domain theory issues into account.
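
    The abstract does not describe the new algorithm itself, but the basic step shared by EBL systems is to extract a rule whose conclusion is the explained goal and whose premises are the operational leaves of the explanation; the sketch below shows only that classic extraction step, using an invented safe_to_stack explanation.

    # Classic EBL rule extraction from an explanation tree; the example
    # explanation and operationality test are hypothetical.
    def operational_leaves(node, is_operational):
        goal, children = node
        if is_operational(goal) or not children:
            return [goal]
        leaves = []
        for child in children:
            leaves.extend(operational_leaves(child, is_operational))
        return leaves

    # safe_to_stack(a,b) <- lighter(a,b) <- weight(a,1), weight(b,5), less(1,5)
    explanation = ("safe_to_stack(a,b)",
                   [("lighter(a,b)",
                     [("weight(a,1)", []), ("weight(b,5)", []), ("less(1,5)", [])])])
    premises = operational_leaves(explanation,
                                  lambda g: g.startswith(("weight", "less")))
    print(premises, "=>", explanation[0])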

    Integrating External Information Sources to Guide Worldwide Web Information Retrieval

    Information retrieval in the worldwide web environment poses unique challenges. The most common approaches involve indexing, but indexes introduce centralization and can never be up to date. This paper advocates using external databases and information sources as guides for locating worldwide web information. This approach has been implemented in WEBFIND, a tool that discovers scientific papers made available on the web by their authors. The external information sources used by WEBFIND are MELVYL, the online University of California library catalog, and NETFIND, a service for finding email addresses. WEBFIND combines the information available from these services in order to find good starting points for searching for the papers that a user wants. At several stages in its operation, WEBFIND must solve instances of what we call the field matching problem. This problem is to determine whether or not two syntactic values are alternative representations of the same semantic entity.
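
    As an illustration of the field matching problem (this is not WEBFIND's actual matching method, which the abstract does not describe), one simple heuristic is to normalise case and punctuation and accept a match when the token sets of the two values overlap strongly.

    # Token-overlap field matching heuristic; the threshold and normalisation
    # are assumptions made for illustration.
    import re

    def tokens(value):
        return set(re.findall(r"[a-z0-9]+", value.lower()))

    def field_match(a, b, threshold=0.5):
        ta, tb = tokens(a), tokens(b)
        if not ta or not tb:
            return False
        return len(ta & tb) / min(len(ta), len(tb)) >= threshold

    print(field_match("Dept. of Computer Science, UCSD",
                      "UCSD Computer Science Department"))  # -> True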

    On Valid and Invalid Methodologies for Experimental Evaluations of EBL

    A number of experimental evaluations of explanation-based learning (EBL) have appeared in the machine learning literature. Closer examination of the experimental methodologies used in the past reveals certain methodological flaws that call into question the conclusions drawn from these experiments. This paper illustrates some of the more common methodological problems, proposes a novel experimental framework for future empirical studies of EBL, and presents an example of an experiment performed within this new framework.