
    Information Retrieval Performance Enhancement Using The Average Standard Estimator And The Multi-criteria Decision Weighted Set

    Information retrieval in large document collections is much more challenging than retrieval in traditional small collections. The main difference is the importance of correlations between related concepts in complex data structures. These structures have been studied by several information retrieval systems. This research began with a comprehensive review and comparison of several matrix dimensionality estimation techniques and their respective effects on retrieval performance using singular value decomposition (SVD) and latent semantic analysis. Two novel techniques are introduced to enhance intrinsic dimensionality estimation: the Multi-criteria Decision Weighted model, which estimates matrix intrinsic dimensionality for large document collections, and the Average Standard Estimator (ASE), which estimates data intrinsic dimensionality based on the SVD. ASE estimates the level of significance of the singular values resulting from the decomposition. ASE assumes that variables with deep relations are sufficiently correlated and that only relationships with high singular values are significant and should be retained. Experimental results over all possible dimensions indicated that ASE improved matrix intrinsic dimensionality estimation by accounting for both the magnitude of decrease in singular values and random noise distractors. Analysis based on selected performance measures indicates that for each document collection there is a region of lower dimensionalities associated with improved retrieval performance. However, the performance measures clearly disagreed on which model gave the best performance.
The introduction of the multi-weighted model and Analytic Hierarchy Process (AHP) analysis helped rank the dimensionality estimation techniques and facilitated satisfying overall model goals by balancing conflicting constraints and satisfying information retrieval priorities. ASE provided the best estimate of MEDLINE intrinsic dimensionality among all dimensionality estimation techniques, and further, ASE improved precision and relative relevance by 10.2% and 7.4% respectively. AHP analysis indicates that ASE and the weighted model ranked best among the methods, with 30.3% and 20.3% in satisfying overall model goals for MEDLINE and 22.6% and 25.1% for CRANFIELD. The weighted model improved MEDLINE relative relevance by 4.4%, while the scree plot, weighted model, and ASE provided better estimates of data intrinsic dimensionality for the CRANFIELD collection than the Kaiser-Guttman and percentage-of-variance methods. ASE provided a better estimate of CISI intrinsic dimensionality than all other tested methods, since all methods except ASE tend to underestimate it. ASE improved CISI average relative relevance and average search length by 28.4% and 22.0% respectively. This research provided evidence that a system using a weighted multi-criteria performance evaluation technique achieves better overall performance than a single-criterion ranking model. Thus, the weighted multi-criteria model with dimensionality reduction provides a more efficient implementation for information retrieval than a full-rank model.
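
The abstract above compares rules for choosing how many singular values to keep. As a rough illustration only, the sketch below implements two of the baseline rules it names (Kaiser-Guttman and percentage of variance) plus a rank-k truncation. It does not implement ASE, whose formula is not given here. The function names, the 0.85 threshold, and the "eigenvalue above the mean" form of the Kaiser-Guttman rule are assumptions made for this example.

```python
import numpy as np

def singular_values(term_doc):
    """Singular values of a term-document matrix, largest first."""
    return np.linalg.svd(np.asarray(term_doc, dtype=float), compute_uv=False)

def kaiser_guttman_rank(s):
    """Count components whose eigenvalue (s**2) exceeds the mean eigenvalue.

    Assumption: this is the mean-eigenvalue variant of the rule; on a
    correlation matrix it reduces to the classic 'eigenvalue > 1' cutoff.
    """
    eig = s ** 2
    return max(1, int(np.sum(eig > eig.mean())))

def variance_rank(s, threshold=0.85):
    """Smallest k whose leading components explain >= threshold of the variance."""
    ratio = np.cumsum(s ** 2) / np.sum(s ** 2)
    k = int(np.searchsorted(ratio, threshold)) + 1
    return min(k, len(s))  # guard against floating-point shortfall at 1.0

def truncate(term_doc, k):
    """Rank-k approximation of the matrix via truncated SVD."""
    u, s, vt = np.linalg.svd(np.asarray(term_doc, dtype=float), full_matrices=False)
    return (u[:, :k] * s[:k]) @ vt[:k, :]
```

Given a term-document matrix `A`, one might call `truncate(A, variance_rank(singular_values(A)))` to obtain a reduced-rank model; different rules can return different `k`, which is the disagreement the abstract's multi-criteria ranking is meant to resolve.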

    Information retrieval and text mining technologies for chemistry

    Efficient access to chemical information contained in scientific literature, patents, technical reports, or the web is a pressing need shared by researchers and patent attorneys from different chemical disciplines. Retrieval of important chemical information in most cases starts with finding relevant documents for a particular chemical compound or family. Targeted retrieval of chemical documents is closely connected to the automatic recognition of chemical entities in the text, which commonly involves the extraction of the entire list of chemicals mentioned in a document, including any associated information. In this Review, we provide a comprehensive and in-depth description of fundamental concepts, technical implementations, and current technologies for meeting these information demands. A strong focus is placed on community challenges addressing systems performance, more particularly CHEMDNER and CHEMDNER patents tasks of BioCreative IV and V, respectively. Considering the growing interest in the construction of automatically annotated chemical knowledge bases that integrate chemical information and biological data, cheminformatics approaches for mapping the extracted chemical names into chemical structures and their subsequent annotation together with text mining applications for linking chemistry with biological information are also presented. Finally, future trends and current challenges are highlighted as a roadmap proposal for research in this emerging field. A.V. and M.K. acknowledge funding from the European Community’s Horizon 2020 Program (project reference: 654021 - OpenMinted). M.K. additionally acknowledges the Encomienda MINETAD-CNIO as part of the Plan for the Advancement of Language Technology. O.R. and J.O. thank the Foundation for Applied Medical Research (FIMA), University of Navarra (Pamplona, Spain).
This work was partially funded by Consellería de Cultura, Educación e Ordenación Universitaria (Xunta de Galicia), and FEDER (European Union), and the Portuguese Foundation for Science and Technology (FCT) under the scope of the strategic funding of UID/BIO/04469/2013 unit and COMPETE 2020 (POCI-01-0145-FEDER-006684). We thank Iñigo García-Yoldi for useful feedback and discussions during the preparation of the manuscript.

    Relating patenting and peer-review publications: an extended perspective on the vascular health and risk management literature

    Hermann AM Mucke, HM Pharma Consultancy, Vienna, Austria. Purpose: This investigation identifies patent applications published under the international Patent Cooperation Treaty between July 2010 and January 2011 in three significant fields of vascular risk management (arterial hypertension, atherosclerosis, and aneurysms) and investigates whether the inventors have also published peer-reviewed papers directly describing their claimed invention. Results: Out of only 48 patent documents that specifically addressed at least one of the above-mentioned fields, 15 had immediate companion papers, of which 13 were published earlier than the corresponding patent applications; the majority of these papers were published by noncorporate patentees. Although the majority of patent applications (30 documents) had at least one corporate assignee, 18 came from academic environments. As expected, medical devices dominated in the aneurysm segment while pharmacology dominated hypertension and atherosclerosis. Conclusion: Although information related to hypertension, atherosclerosis, or aneurysms that was claimed in international patent applications reached the public more quickly through the corresponding peer-reviewed document if one was published, more than two-thirds of the patent applications had no such companion paper in a scientific journal. The patent literature, which is freely available online as full text, offers information to scientists and developers in the fields of vascular risk management that is not available from the peer-reviewed literature. Keywords: hypertension, atherosclerosis, aneurysm, patents as topic, publishin

    Special Libraries, September 1976

    Volume 67, Issue 9

    Towards exploratory faceted search systems

    In this thesis, we cover what we believe would be the main ingredients of an exploratory search system (ESS). In a nutshell, these are textual queries, facets, visual results, social search and query-by-example. The goal of the thesis is to show how all of these elements could readily be integrated into a typical faceted search system that users are already accustomed to. In this respect, we propose that the future of exploratory search might be a traditional faceted search system, but with the added ingredients of information visualizations and query-by-example. To illustrate our ideas we have built two freely available web applications. The first one, Biomed Search, has been positively received by the community and offers some novel characteristics. First, in order to improve on both precision and recall, Biomed Search indexes not only the text caption but also the text that refers to the image. Second, the interface uses a common pattern of zooming in on a particular search result in order to display more information. User feedback on Biomed Search has hinted towards faceted search, visual search results and query-by-example. The second system, Cloud Mining, is an attempt at implementing the vision set forth in this thesis. The system is a framework used to instantiate ESSs. It offers the novel characteristics of facet views as well as multiple-item based searches combined with textual queries. Cloud Mining paves the way to a completely pluggable search framework, in which every component would be driven by a community of users. The system was tested on large publicly available datasets and all its software components are available under an open source license. The main contributions of this thesis come as lessons learned, suggestions or recommendations as to how to extend the current paradigm of faceted search into the one of exploratory search. The search results and facets should be extended with different views. 
Query by example should be integrated with Bayesian Sets, as it reduces the handling of complex content-based searches to choosing the right plugin. Finally, the system should be thought of as a framework to instantiate ESSs, in which each of its components is a community-driven plugin. These tailored tools, when applied to a dataset of interest, could offer a collective intelligence approach to information overload.
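
For readers unfamiliar with the faceted search baseline that the thesis proposes to extend, the following is a minimal sketch of its two core operations: narrowing a result set by selected facet values, and counting the remaining values per facet for display. It is not taken from Biomed Search or Cloud Mining; the function names and the dictionary-based item representation are hypothetical choices for illustration.

```python
from collections import Counter

def filter_items(items, selected):
    """Keep items whose facet values match every selected facet.

    `items` is a list of dicts (hypothetical representation);
    `selected` maps facet name -> required value.
    """
    return [it for it in items
            if all(it.get(facet) == value for facet, value in selected.items())]

def facet_counts(items, facets):
    """For each facet, count how many items carry each value."""
    return {facet: Counter(it[facet] for it in items if facet in it)
            for facet in facets}
```

A typical loop would call `filter_items` with the user's current selections and then `facet_counts` on the result to refresh the counts shown beside each facet value.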