
    Hierarchical Re-estimation of Topic Models for Measuring Topical Diversity

    A high degree of topical diversity is often considered to be an important characteristic of interesting text documents. A recent proposal for measuring topical diversity identifies three elements for assessing diversity: words, topics, and documents as collections of words. Topic models play a central role in this approach. Using standard topic models for measuring diversity of documents is suboptimal due to generality and impurity. General topics only include common information from a background corpus and are assigned to most of the documents in the collection. Impure topics contain words that are not related to the topic; impurity lowers the interpretability of topic models and impure topics are likely to get assigned to documents erroneously. We propose a hierarchical re-estimation approach for topic models to combat generality and impurity; the proposed approach operates at three levels: words, topics, and documents. Our re-estimation approach for measuring documents' topical diversity outperforms the state of the art on the PubMed dataset, which is commonly used for diversity experiments. Comment: Proceedings of the 39th European Conference on Information Retrieval (ECIR 2017)

    A comparison of strategies for selecting auxiliary variables for multiple imputation

    Multiple imputation (MI) is a popular method for handling missing data. Auxiliary variables can be added to the imputation model(s) to improve MI estimates. However, the choice of which auxiliary variables to include is not always straightforward. Several data-driven auxiliary variable selection strategies have been proposed, but there has been limited evaluation of their performance. Using a simulation study we evaluated the performance of eight auxiliary variable selection strategies: (1, 2) two versions of selection based on correlations in the observed data; (3) selection using hypothesis tests of the “missing completely at random” assumption; (4) replacing auxiliary variables with their principal components; (5, 6) forward and forward stepwise selection; (7) forward selection based on the estimated fraction of missing information; and (8) selection via the least absolute shrinkage and selection operator (LASSO). A complete case analysis and an MI analysis using all auxiliary variables (the “full model”) were included for comparison. We also applied all strategies to a motivating case study. The full model outperformed all auxiliary variable selection strategies in the simulation study, with the LASSO strategy the best performing auxiliary variable selection strategy overall. All MI analysis strategies that we were able to apply to the case study led to similar estimates, although computational time was substantially reduced when variable selection was employed. This study provides further support for adopting an inclusive auxiliary variable strategy where possible. Auxiliary variable selection using the LASSO may be a promising alternative when the full model fails or is too burdensome

    Carbon nanotube four-terminal devices for pressure sensing applications

    Carbon nanotubes (CNTs) are of high interest for sensing applications, owing to their superior mechanical strength, high Young’s modulus and low density. In this work, we report on a facile approach for the fabrication of carbon nanotube devices using a four-terminal configuration. Oriented carbon nanotube films were pulled out from a CNT forest wafer and then twisted into a yarn. Both the CNT film and yarn were arranged on elastomer membranes/diaphragms which were arranged on a laser-cut acrylic frame to form pressure sensors. The sensors were calibrated using a precisely controlled pressure system, showing a large change of the output voltage of approximately 50 mV at a constant supply current of 100 μA and under a low applied pressure of 15 mbar. The results indicate the high potential of using CNT films and yarns for pressure sensing applications.

    Semantic Mutation Testing for Multi-Agent Systems

    This paper introduces semantic mutation testing (SMT) into multiagent systems. SMT is a test assessment technique that makes changes to the interpretation of a program and then examines whether a given test set has the ability to detect each change to the original interpretation. These changes represent possible misunderstandings of how the program is interpreted. SMT is also a technique for assessing the robustness of a program to semantic changes. This paper applies SMT to three rule-based agent programming languages, namely Jason, GOAL and 2APL, provides several contexts in which SMT for these languages is useful, and proposes three sets of semantic mutation operators (i.e., rules to make semantic changes) for these languages respectively, and a set of semantic mutation operator classes for rule-based agent languages. This paper then shows, through preliminary evaluation of our semantic mutation operators for Jason, that SMT has some potential to assess tests and program robustness

    Finding the Optimal Balance between Over and Under Approximation of Models Inferred from Execution Logs

    Models inferred from execution traces (logs) may admit more behaviours than those possible in the real system (over-approximation) or may exclude behaviours that can indeed occur in the real system (under-approximation). Both problems negatively affect model-based testing. In fact, over-approximation results in infeasible test cases, i.e., test cases that cannot be activated by any input data. Under-approximation results in missing test cases, i.e., system behaviours that are not represented in the model are also never tested. In this paper we balance over- and under-approximation of inferred models by resorting to multi-objective optimization achieved by means of two search-based algorithms: a multi-objective Genetic Algorithm (GA) and the NSGA-II. We report the results on two open-source web applications and compare the multi-objective optimization to the state-of-the-art KLFA tool. We show that it is possible to identify regions in the Pareto front that contain models which violate fewer application constraints and have a higher bug detection ratio. The Pareto fronts generated by the multi-objective GA contain a region where models violate on average 2% of an application's constraints, compared to 2.8% for NSGA-II and 28.3% for the KLFA models. Similarly, it is possible to identify a region on the Pareto front where the multi-objective GA inferred models have an average bug detection ratio of 110: 3 and the NSGA-II inferred models have an average bug detection ratio of 101: 6. This compares to a bug detection ratio of 310928: 13 for the KLFA tool. © 2012 IEEE
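
    As a rough illustration of the Pareto-front idea mentioned in this abstract, the sketch below keeps only the non-dominated candidates when two costs are minimised at once. It is not the paper's GA or NSGA-II: the two objective names, the `dominates` and `pareto_front` helpers, and the candidate scores are all hypothetical, chosen only to show what "a region of the Pareto front" refers to.

```python
from typing import List, Tuple

# Hypothetical objectives, both to be minimised:
# (over_approximation_cost, under_approximation_cost)
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """Return True if a is no worse than b on both objectives and strictly better on one."""
    return a[0] <= b[0] and a[1] <= b[1] and (a[0] < b[0] or a[1] < b[1])


def pareto_front(points: List[Point]) -> List[Point]:
    """Keep the candidates that no other candidate dominates (input order preserved)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Made-up scores for five candidate models.
candidates = [(0.10, 0.40), (0.20, 0.20), (0.40, 0.10), (0.30, 0.30), (0.50, 0.50)]
front = pareto_front(candidates)
print(front)
```

    A multi-objective search such as NSGA-II would evolve candidates toward this front rather than filter a fixed list; the filter only shows the dominance relation that defines it.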

    Using Neural Networks for Relation Extraction from Biomedical Literature

    Using different sources of information to support automated extraction of relations between biomedical concepts contributes to the development of our understanding of biological systems. The primary comprehensive source of these relations is biomedical literature. Several relation extraction approaches have been proposed to identify relations between concepts in biomedical literature, namely, using neural network algorithms. The use of multichannel architectures composed of multiple data representations, as in deep neural networks, is leading to state-of-the-art results. The right combination of data representations can eventually lead us to even higher evaluation scores in relation extraction tasks. Thus, biomedical ontologies play a fundamental role by providing semantic and ancestry information about an entity. The incorporation of biomedical ontologies has already been proved to enhance previous state-of-the-art results. Comment: Artificial Neural Networks book (Springer) - Chapter 1

    Pneumococcal carriage in vaccine-eligible children and unvaccinated infants in Lao PDR two years following the introduction of the 13-valent pneumococcal conjugate vaccine.

    Pneumococcal carriage is a prerequisite for disease, and underpins herd protection provided by pneumococcal conjugate vaccines (PCVs). There are few data on the impact of PCVs in lower income settings, particularly in Asia. In 2013, the Lao People's Democratic Republic (Lao PDR) introduced 13-valent PCV (PCV13) as a 3 + 0 schedule (doses at 6, 10 and 14 weeks of age) with limited catch-up vaccination. We conducted two cross-sectional carriage surveys (pre- and two years post-PCV) to assess the impact of PCV13 on nasopharyngeal pneumococcal carriage in 5-8 week old infants (n = 1000) and 12-23 month old children (n = 1010). Pneumococci were detected by quantitative real-time PCR, and molecular serotyping was performed using DNA microarray. Post PCV13, there was a 23% relative reduction in PCV13-type carriage in children aged 12-23 months (adjusted prevalence ratio [aPR] 0.77 [0.61-0.96]), and no significant change in non-PCV13 serotype carriage (aPR 1.11 [0.89-1.38]). In infants too young to be vaccinated, there was no significant change in carriage of PCV13 serotypes (aPR 0.74 [0.43-1.27]) or non-PCV13 serotypes (aPR 1.29 [0.85-1.96]), although trends were suggestive of indirect effects. Over 70% of pneumococcal-positive samples contained at least one antimicrobial resistance gene, which were more common in PCV13 serotypes (p < 0.001). In 12-23 month old children, pneumococcal density of both PCV13 serotypes and non-PCV13 serotypes was higher in PCV13-vaccinated compared with undervaccinated children (p = 0.004 and p < 0.001, respectively). This study provides evidence of PCV13 impact on carriage in a population without prior PCV7 utilisation, and provides important data from a lower-middle income setting in Asia. The reductions in PCV13 serotype carriage in vaccine-eligible children are likely to result in reductions in pneumococcal transmission and disease in Lao PDR
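
    The 23% figure quoted in this abstract follows directly from the adjusted prevalence ratio. As a worked check (the symbol RR for relative reduction is introduced here for illustration only):

```latex
\mathrm{RR} = (1 - \mathrm{aPR}) \times 100\% = (1 - 0.77) \times 100\% = 23\%
```

    The reported interval for the aPR (0.61-0.96) lies below 1, which is why this reduction is described as significant, whereas the intervals for the other three comparisons include 1.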