10,177 research outputs found

    BIKE: Bilingual Keyphrase Experiments

    This paper presents a novel strategy for translating lists of keyphrases. Typical keyphrase lists appear in scientific articles, information retrieval systems and web page meta-data. Our system combines a statistical translation model trained on a bilingual corpus of scientific papers with sense-focused look-up in a large bilingual terminological resource. For the latter, we developed a novel technique that benefits from viewing the keyphrase list as contextual help for sense disambiguation. The optimal combination of modules was discovered by a genetic algorithm. Our work applies to the French/English language pair.

    Software tools for conducting bibliometric analysis in science: An up-to-date review

    Bibliometrics has become an essential tool for assessing and analyzing the output of scientists, cooperation between universities, and the effect of state-owned science funding on national research and development performance and educational efficiency, among other applications. Therefore, professionals and scientists need a range of theoretical and practical tools to measure experimental data. This paper provides an up-to-date review of the various tools available for conducting bibliometric and scientometric analyses, including the sources of data acquisition, performance analysis and visualization tools. The included tools were divided into three categories: general bibliometric and performance analysis, science mapping analysis, and libraries; a description of all of them is provided. A comparative analysis of database source support, pre-processing capabilities, and analysis and visualization options is also provided to make the tools easier to compare. Although there are numerous bibliometric databases from which to obtain data for bibliometric and scientometric analysis, they were developed for a different purpose. The number of exportable records is between 500 and 50,000, and the coverage of the different science fields is unequal in each database. Concerning the analyzed tools, Bibliometrix contains the most extensive set of techniques and is suitable for practitioners through Biblioshiny. VOSviewer offers excellent visualization and is capable of loading and exporting information from many sources. SciMAT is the tool with powerful pre-processing and export capabilities. In view of the variability of features, users need to decide on the desired analysis output and choose the option that best fits their aims.

    Queensland University of Technology at TREC 2005

    The Information Retrieval and Web Intelligence (IR-WI) research group is a research team at the Faculty of Information Technology, QUT, Brisbane, Australia. The IR-WI group participated in the Terabyte and Robust tracks at TREC 2005, both for the first time. For the Robust track we applied our existing information retrieval system, originally designed for structured (XML) retrieval, to the domain of document retrieval. For the Terabyte track we experimented with an open source IR system, Zettair, and performed two types of experiments. First, we compared Zettair’s performance on a high-powered supercomputer with its performance on a distributed system across seven midrange personal computers. Second, we compared Zettair’s performance when a standard TREC title is used with its performance on a natural language query and on a query expanded with synonyms. We compare the systems in terms of both efficiency and retrieval performance. Our results indicate that the distributed system is faster than the supercomputer, while slightly decreasing retrieval performance, and that natural language queries also slightly decrease retrieval performance, while our query expansion technique significantly decreased performance.

    Figuring the Plural

    This report is an examination of ethnocultural, or ethnically/culturally specific, arts organizations in Canada and the United States. As our societies rapidly diversify and we seek to negotiate our increasingly complex national identities, these organizations possess enormous potential to assist in this process, for they serve as cultural advocates, cultural interpreters, facilitators of cross-cultural understanding and communication, keepers of ethnic tradition, and/or sites where prejudice is exposed and challenged.

    An authoring tool for decision support systems in context questions of ecological knowledge

    Decision support systems (DSS) support business or organizational decision-making activities, which require access to information that is stored internally in databases or data warehouses, and externally on the Web, accessed through Information Retrieval (IR) or Question Answering (QA) systems. Graphical interfaces for querying these sources make it easy to constrain query formulation dynamically based on user selections, but they lack flexibility, since their expressive power is limited by the user interface design. Natural language interfaces (NLI) are expected to be the optimal solution. However, especially for non-expert users, truly natural communication is the most difficult to realize effectively. In this paper, we propose an NLI that improves the interaction between the user and the DSS by means of referencing previous questions or their answers (i.e. anaphora such as the pronoun reference in “What traits are affected by them?”), or by eliding parts of the question (i.e. ellipsis such as “And to glume colour?” after the question “Tell me the QTLs related to awn colour in wheat”). Moreover, in order to overcome one of the main problems of NLIs, the difficulty of adapting an NLI to a new domain, our proposal is based on ontologies that are obtained semi-automatically from a framework that allows the integration of internal and external, structured and unstructured information. Therefore, our proposal can interface with databases, data warehouses, QA and IR systems. Because of the high NL ambiguity of the resolution process, our proposal is presented as an authoring tool that helps the user to query efficiently in natural language. Finally, our proposal is tested on a DSS case scenario about Biotechnology and Agriculture, whose knowledge base is the CEREALAB database as internal structured data, and the Web (e.g. PubMed) as external unstructured information. This paper has been partially supported by the MESOLAP (TIN2010-14860), GEODAS-BI (TIN2012-37493-C03-03), LEGOLANGUAGE (TIN2012-31224) and DIIM2.0 (PROMETEOII/2014/001) projects from the Spanish Ministry of Education and Competitivity. Alejandro Maté is funded by the Generalitat Valenciana under an ACIF grant (ACIF/2010/298).

    Stigmergic hyperlink's contributes to web search

    Stigmergic hyperlinks are hyperlinks with a "heart beat": if used, they stay healthy and online; if neglected, they fade, eventually getting replaced. Their life attribute is a relative usage measure that regular hyperlinks do not provide; hence PageRank-like measures have historically been well informed about the structure of webs of documents, but unaware of what users effectively do with the links. This paper elaborates on how to input the users’ perspective into Google’s original, structure-centric PageRank metric. The discussion then bridges to the Deep Web, some search challenges, and how stigmergic hyperlinks could help decentralize the search experience, facilitating user-generated search solutions and supporting new related business models.

    Comparison of articulate brachiopod nuclear and mitochondrial gene trees leads to a clade-based redefinition of protostomes (Protostomozoa) and deuterostomes (Deuterostomozoa)

    Nuclear and mtDNA sequences from selected short-looped terebratuloid (terebratulacean) articulate brachiopods yield congruent and genetically independent phylogenetic reconstructions by parsimony, neighbor-joining and maximum likelihood methods, suggesting that both sources of data are reliable guides to brachiopod species phylogeny. The present-day genealogical relationships and geographical distributions of the tested terebratuloid brachiopods are consistent with a Tethyan dispersal and subsequent radiation. Concordance of nuclear and mitochondrial gene phylogenies reinforces previous indications that articulate brachiopods, inarticulate brachiopods, phoronids and ectoprocts cluster with other organisms generally regarded as protostomes. Since ontogeny and morphology in brachiopods, ectoprocts and phoronids depart in important respects from those features supposedly diagnostic of protostomes, this demonstrates that the operational definition of protostomy by the usual ontological characters must be misleading or unreliable. New, molecular, operational definitions are proposed to replace the traditional criteria for the recognition of protostomes and deuterostomes, and the clade-based terms 'Protostomozoa' and 'Deuterostomozoa' are proposed to replace the existing terms 'Protostomia' and 'Deuterostomia'.

    Developing search strategies for clinical practice guidelines in SUMSearch and Google Scholar and assessing their retrieval performance

    Background: Information overload, increasing time constraints, and inappropriate search strategies complicate the detection of clinical practice guidelines (CPGs). The aim of this study was to provide clinicians with recommendations for search strategies to efficiently identify relevant CPGs in SUMSearch and Google Scholar. Methods: We compared the retrieval efficiency (retrieval performance) of search strategies to identify CPGs in SUMSearch and Google Scholar. For this purpose, a two-term GLAD (GuideLine And Disease) strategy was developed, combining a defined CPG term with a specific disease term (MeSH term). We used three different CPG terms and nine MeSH terms for nine selected diseases to identify the most efficient GLAD strategy for each search engine. The retrievals for the nine diseases were pooled. To compare GLAD strategies, we used a manual review of all retrievals as a reference standard. The CPGs detected had to fulfil predefined criteria, e.g., the inclusion of therapeutic recommendations. Retrieval performance was evaluated by calculating so-called diagnostic parameters (sensitivity, specificity, and "Number Needed to Read" [NNR]) for search strategies. Results: The search yielded a total of 2830 retrievals; 987 (34.9%) in Google Scholar and 1843 (65.1%) in SUMSearch. Altogether, we found 119 unique and relevant guidelines for nine diseases (reference standard). Overall, the GLAD strategies showed a better retrieval performance in SUMSearch than in Google Scholar. The performance pattern between search engines was similar: search strategies including the term "guideline" yielded the highest sensitivity (SUMSearch: 81.5%; Google Scholar: 31.9%), and search strategies including the term "practice guideline" yielded the highest specificity (SUMSearch: 89.5%; Google Scholar: 95.7%), and the lowest NNR (SUMSearch: 7.0; Google Scholar: 9.3). Conclusion: SUMSearch is a useful tool to swiftly gain an overview of available CPGs. Its retrieval performance is superior to that of Google Scholar, where a search is more time consuming, as substantially more retrievals have to be reviewed to detect one relevant CPG. In both search engines, the CPG term "guideline" should be used to obtain a comprehensive overview of CPGs, and the term "practice guideline" should be used if a less time consuming approach for the detection of CPGs is desired.
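
The three diagnostic parameters named in the abstract above can be sketched as follows. This is a minimal illustration, assuming the usual confusion-matrix definitions and NNR taken as the reciprocal of precision (records read per relevant record found). The class name, function names, and the counts in the usage note are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrievalCounts:
    """Counts of records judged against a manual reference standard."""
    true_pos: int   # relevant records that the search retrieved
    false_pos: int  # irrelevant records that the search retrieved
    false_neg: int  # relevant records that the search missed
    true_neg: int   # irrelevant records that the search did not retrieve


def sensitivity(c: RetrievalCounts) -> float:
    """Share of all relevant records that the search retrieved."""
    return c.true_pos / (c.true_pos + c.false_neg)


def specificity(c: RetrievalCounts) -> float:
    """Share of all irrelevant records that the search left out."""
    return c.true_neg / (c.true_neg + c.false_pos)


def number_needed_to_read(c: RetrievalCounts) -> float:
    """Retrieved records read per relevant record found (1 / precision)."""
    return (c.true_pos + c.false_pos) / c.true_pos


# Made-up counts for illustration only; they are NOT figures from the study.
example = RetrievalCounts(true_pos=8, false_pos=12, false_neg=2, true_neg=78)
```

With these invented counts, `sensitivity(example)` is 8 / 10 and `number_needed_to_read(example)` is 20 / 8, so a reader would screen 2.5 retrieved records on average for each relevant one.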

    The Porter stemming algorithm: then and now

    Purpose: In 1980, Porter presented a simple algorithm for stemming English language words. This paper summarises the main features of the algorithm, and highlights its role not just in modern information retrieval research, but also in a range of related subject domains. Design: Review of literature and research involving use of the Porter algorithm. Findings: The algorithm has been widely adopted and extended so that it has become the standard approach to word conflation for information retrieval in a wide range of languages. Value: The 1980 paper in Program by Porter describing his algorithm has been highly cited. This paper provides a context for the original paper as well as an overview of its subsequent use.
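
To give a flavour of the suffix stripping that the abstract above refers to, here is a minimal sketch of step 1a only (plural handling) from Porter's 1980 algorithm. It is not the full stemmer: the later steps, which depend on the "measure" of the stem, are omitted, and the function name is a hypothetical choice for this illustration.

```python
def porter_step_1a(word: str) -> str:
    """Apply only step 1a of the Porter algorithm to a lowercase word.

    The rules are tried in order, and the first matching suffix wins:
        SSES -> SS
        IES  -> I
        SS   -> SS   (left unchanged)
        S    -> (removed)
    """
    if word.endswith("sses"):
        return word[:-2]   # drop "es", keeping "ss"
    if word.endswith("ies"):
        return word[:-2]   # drop "es", keeping "i"
    if word.endswith("ss"):
        return word        # already in its step 1a form
    if word.endswith("s"):
        return word[:-1]   # drop the plural "s"
    return word


# Example inputs for the step; the outputs are intermediate stems, not words.
samples = ["caresses", "ponies", "caress", "cats"]
stems = [porter_step_1a(w) for w in samples]
```

Because conflation only requires related word forms to map to the same string, an intermediate stem such as "poni" does not need to be a dictionary word.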