142 research outputs found

    Tagging Scientific Publications using Wikipedia and Natural Language Processing Tools. Comparison on the ArXiv Dataset

    In this work, we compare two simple methods of tagging scientific publications with labels reflecting their content. The first label set is drawn from Wikipedia; the second is constructed from the noun phrases occurring in the analyzed corpus. We examine the statistical properties and the effectiveness of both approaches on a dataset consisting of abstracts from 0.7 million scientific documents deposited in the ArXiv preprint collection. We believe that the obtained tags can later be applied as useful document features in various machine learning tasks (document similarity, clustering, topic modelling, etc.).

    Deriving query suggestions for site search

    Modern search engines have been moving away from simplistic interfaces that aimed at satisfying a user's need with a single-shot query. Interactive features are now integral parts of web search engines. However, generating good query modification suggestions remains a challenging issue. Query log analysis is one of the major strands of work in this direction. Although much research has been performed on query logs collected on the web as a whole, query log analysis to enhance search on smaller and more focused collections has attracted less attention, despite its increasing practical importance. In this article, we report on a systematic study of different query modification methods applied to a substantial query log collected on a local website that already uses an interactive search engine. We conducted experiments in which we asked users to assess the relevance of potential query modification suggestions that have been constructed using a range of log analysis methods and different baseline approaches. The experimental results demonstrate the usefulness of log analysis to extract query modification suggestions. Furthermore, our experiments demonstrate that a more fine-grained approach than grouping search requests into sessions allows for extraction of better refinement terms from query log files. © 2013 ASIS&T

    Inclusion, Contrast and Polysemy in Dictionaries: The Relationship between Theory, Language Use and Lexicographic Practice

    This paper explores the lexicographic representation of a type of polysemy that arises when the meaning of one lexical item can either include or contrast with the meaning of another, as in the case of dog/bitch, shoe/boot, finger/thumb and animal/bird. A survey of how such pairs are represented in monolingual English dictionaries showed that dictionaries mostly represent as explicitly polysemous those lexical items whose broader and narrower readings are more distinctive and clearly separable in definitional terms. They commonly represented only the broader readings for terms that are in fact frequently used in the narrower reading, as shown by data from the British National Corpus.

    Using terminology extraction techniques for improving traceability from formal models to textual requirements

    This article deals with traceability in software engineering. More precisely, we concentrate on the role of terminological knowledge in the mapping between (informal) textual requirements and (formal) object models. We show that terminological knowledge facilitates the production of traceability links, provided that language processing technologies make it possible to build the required terminological resources semi-automatically. The presented system is one step towards incremental formalization from textual knowledge.

    ASCOT: a text mining-based web-service for efficient search and assisted creation of clinical trials

    Clinical trials are mandatory protocols describing medical research on humans and are among the most valuable sources of medical practice evidence. Searching for trials relevant to some query is laborious due to the immense number of existing protocols. Apart from search, writing new trials includes composing detailed eligibility criteria, which might be time-consuming, especially for new researchers. In this paper we present ASCOT, an efficient search application customised for clinical trials. ASCOT uses text mining and data mining methods to enrich clinical trials with metadata, which in turn serve as effective tools to narrow down search. In addition, ASCOT integrates a component for recommending eligibility criteria based on a set of selected protocols.

    Automatic term identification for bibliometric mapping

    A term map is a map that visualizes the structure of a scientific field by showing the relations between important terms in the field. The terms shown in a term map are usually selected manually with the help of domain experts. Manual term selection has the disadvantages of being subjective and labor-intensive. To overcome these disadvantages, we propose a methodology for automatic term identification and we use this methodology to select the terms to be included in a term map. To evaluate the proposed methodology, we use it to construct a term map of the field of operations research. The quality of the map is assessed by a number of operations research experts. It turns out that in general the proposed methodology performs quite well

    Mapping product and service innovation: A bibliometric analysis and a typology

    Research conducted in the innovation field lags behind organizations’ general technological development and innovativeness. Literature that previously depicted innovation types in developed markets is markedly different from progressively publicized emerging market innovation types. While capital-abundant firms tend to engage in respective pioneering and incremental innovation loops, resource-constrained firms and firms in emerging countries may partially free-ride on existing products and services through innovations such as copycat and frugal. To date, there have been no attempts to holistically consolidate product and service innovation types into one overarching typology. Using novel methods of text mining and co-citation analysis, this study systematically maps three decades of product and service innovation scholarship to provide a typology of eight major product and service innovation types. This is further supported by case study analysis to demonstrate how these innovation types fit into the cost vs market novelty matrix. This study is unique in its methodological proposition to systematically review the innovation scholarship of more than 1,400 articles through comprehensive, quantified, and objective methods that offer transparent and reproducible results. The study provides some clarity regarding the classifications and characteristics of the innovation typology

    Benchmarking Ontologies: Bigger or Better?

    A scientific ontology is a formal representation of knowledge within a domain, typically including central concepts, their properties, and relations. With the rise of computers and high-throughput data collection, ontologies have become essential to data mining and sharing across communities in the biomedical sciences. Powerful approaches exist for testing the internal consistency of an ontology, but not for assessing the fidelity of its domain representation. We introduce a family of metrics that describe the breadth and depth with which an ontology represents its knowledge domain. We then test these metrics using (1) four of the most common medical ontologies with respect to a corpus of medical documents and (2) seven of the most popular English thesauri with respect to three corpora that sample language from medicine, news, and novels. Here we show that our approach captures the quality of ontological representation and guides efforts to narrow the breach between ontology and collective discourse within a domain. Our results also demonstrate key features of medical ontologies, English thesauri, and discourse from different domains. Medical ontologies have a small intersection, as do English thesauri. Moreover, dialects characteristic of distinct domains vary strikingly as many of the same words are used quite differently in medicine, news, and novels. As ontologies are intended to mirror the state of knowledge, our methods to tighten the fit between ontology and domain will increase their relevance for new areas of biomedical science and improve the accuracy and power of inferences computed across them

    Measurement as relational, intensive and inclusive: Towards a ‘minor’ mathematics

    Minor mathematics refers to the mathematical practices that are often erased by state-sanctioned curricular images of mathematics. We use the idea of a minor mathematics to explore alternative measurement practices. We argue that minor measurement practices have been buried by a ‘major’ settler mathematics, a process of erasure that distributes ‘sensibility’ and formulates conditions of mathematics dis/ability. We emphasize how measuring involves the making and mixing of analogies, and that this involves attending to intensive relationships rather than extensive properties. Our philosophical and historical approach moves from the archeological origins of human measurement activity, to pivotal developments in modern mathematics, to configurations of curriculum. We argue that the project of proliferating multiple mathematics is required in order to disturb narrow (and perhaps white, western, male) images of mathematics—and to open up opportunities for a more pluralist and inclusive school mathematics