25 research outputs found

    A context vector model for information retrieval

    In the vector space model for information retrieval, term vectors are pairwise orthogonal, that is, terms are assumed to be independent. It is well known that this assumption is too restrictive. In this article, we present our work on an indexing and retrieval method that, based on the vector space model, incorporates term dependencies and thus obtains semantically richer representations of documents. First, we generate term context vectors based on the co-occurrence of terms in the same documents. These vectors are then used to calculate context vectors for documents. We present different techniques for estimating the dependencies among terms, and we define term weights that can be employed in the model. Experimental results on four text collections (MED, CRANFIELD, CISI, and CACM) show that incorporating term dependencies in the retrieval process performs statistically significantly better than the classical vector space model with IDF weights. We also show that the degree of semantic matching versus direct word matching that performs best varies across the four collections. We conclude that the model performs well for certain types of queries and, generally, for information tasks with high recall requirements. We therefore propose the use of the context vector model in combination with other, direct word-matching methods.
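    The abstract above outlines the general pipeline (term co-occurrence, then term and document context vectors, then ranking). The following is a minimal, hypothetical sketch of that idea in Python; the corpus, normalisation and weighting choices are placeholders and do not reproduce the paper's dependency estimators or term weights.

```python
# Minimal sketch of a co-occurrence-based context vector model (illustrative only;
# the paper's own dependency estimators and term weights are not reproduced here).
import numpy as np

docs = [
    "heart disease treatment",
    "treatment of coronary heart conditions",
    "information retrieval with vector models",
]

# Vocabulary and raw term-document counts.
vocab = sorted({t for d in docs for t in d.split()})
idx = {t: i for i, t in enumerate(vocab)}
td = np.zeros((len(vocab), len(docs)))
for j, d in enumerate(docs):
    for t in d.split():
        td[idx[t], j] += 1

# Term-term co-occurrence (terms appearing in the same documents).
cooc = td @ td.T

# Term context vectors: normalised co-occurrence rows.
ctx = cooc / np.linalg.norm(cooc, axis=1, keepdims=True)

# Document context vectors: frequency-weighted sums of their terms' context vectors.
doc_vecs = ctx.T @ td                       # shape: (vocab size, n_docs)
doc_vecs /= np.linalg.norm(doc_vecs, axis=0, keepdims=True)

def retrieve(query):
    """Rank documents by cosine similarity to the query's context vector."""
    q = np.zeros(len(vocab))
    for t in query.split():
        if t in idx:
            q += ctx[idx[t]]
    q /= np.linalg.norm(q) or 1.0
    scores = q @ doc_vecs
    return np.argsort(-scores)

print(retrieve("heart treatment"))          # e.g. ranks the medical documents first
```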

    A review on machine learning approaches and trends in drug discovery

    Abstract: Drug discovery aims at finding new compounds with specific chemical properties for the treatment of diseases. In recent years, this search has acquired a strong computational component, driven by the rapid rise and democratization of machine learning techniques. The goals set by the Precision Medicine initiative, and the new challenges they generate, call for robust, standard and reproducible computational methodologies. Predictive models based on machine learning have become particularly important in the stage prior to preclinical studies, where they can drastically reduce the cost and duration of drug discovery research. This review focuses on how these methodologies have been used in recent research. Analyzing the state of the art in this field gives an idea of where cheminformatics is heading in the short term, the limitations it faces and the positive results it has achieved. The review concentrates mainly on the methods used to model molecular data, the biological problems addressed, and the machine learning algorithms applied to drug discovery in recent years. Funding: Instituto de Salud Carlos III (PI17/01826, PI17/01561); Xunta de Galicia (ED431D 2017/16, ED431D 2017/23, ED431C 2018/4).

    A method for automatically extracting infectious disease-related primers and probes from the literature

    BACKGROUND: Primer and probe sequences are the main components of nucleic acid-based detection systems. Biologists use primers and probes for different tasks, some related to the diagnosis of infectious diseases and the prescription of treatments. The biological literature is the main source of empirically validated primer and probe sequences, so it is becoming increasingly important for researchers to be able to navigate this information. In this paper, we present a four-phase method for extracting and annotating primer/probe sequences from the literature. These phases are: (1) convert each document into a tree of paper sections, (2) detect candidate sequences using a set of finite state machine-based recognizers, (3) refine problem sequences using a rule-based expert system, and (4) annotate the extracted sequences with their related organism/gene information. RESULTS: We tested our approach on a test set of 297 manuscripts. The extracted sequences and their organism/gene annotations were manually evaluated by a panel of molecular biologists. The results of the evaluation show that our approach is suitable for automatically extracting DNA sequences, achieving precision and recall of 97.98% and 95.77%, respectively. In addition, 76.66% of the detected sequences were correctly annotated with their organism name. The system also provided correct gene-related information for 46.18% of the sequences assigned a correct organism name. CONCLUSIONS: We believe that the proposed method can facilitate routine tasks for biomedical researchers who use molecular methods to diagnose infectious diseases and prescribe treatments. In addition, the proposed method can be expanded to detect and extract other biological sequences from the literature. The extracted information can also be used to readily update available primer/probe databases or to create new ones from scratch. Funding: The present work has been funded, in part, by the European Commission through the ACGT integrated project (FP6-2005-IST-026996) and the ACTION-Grid support action (FP7-ICT-2007-2-224176), by the Spanish Ministry of Science and Innovation through the OntoMineBase project (TSI2006-13021-C02-01), the ImGraSec project (TIN2007-61768), FIS/AES PS09/00069 and COMBIOMED-RETICS, and by the Comunidad de Madrid, Spain.
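    Phase (2) of the method above detects candidate sequences with finite state machine-based recognizers. The snippet below is a hypothetical illustration of that step using a regular expression (itself a finite-state recognizer); the length threshold, IUPAC alphabet handling and example text are assumptions, not the paper's actual rules.

```python
# Illustrative sketch of candidate primer/probe detection with a regular expression
# acting as a simple finite-state recognizer. The length bounds and alphabet are
# assumptions made for this example.
import re

# Primer/probe candidates: runs of nucleotide (and IUPAC ambiguity) codes of plausible length.
IUPAC = "ACGTURYSWKMBDHVN"
CANDIDATE = re.compile(rf"\b[{IUPAC}]{{15,40}}\b")

def detect_candidates(text: str):
    """Return candidate primer/probe sequences found in a fragment of text."""
    return [m.group(0) for m in CANDIDATE.finditer(text.upper())]

paragraph = ("The forward primer 5'-TGGGCTACACACGTGCTACA-3' and the probe "
             "ACTCCTACGGGAGGCAGCAG were used for amplification.")
print(detect_candidates(paragraph))
# ['TGGGCTACACACGTGCTACA', 'ACTCCTACGGGAGGCAGCAG']
```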

    Carbon Nanotubes’ Effect on Mitochondrial Oxygen Flux Dynamics: Polarography Experimental Study and Machine Learning Models using Star Graph Trace Invariants of Raman Spectra

    [Abstract] This study presents the impact of carbon nanotubes (CNTs) on mitochondrial oxygen mass flux (Jm) under three experimental conditions. New experimental results and a new methodology, based on the star graph transform (spectral moments) of CNT Raman spectra and on perturbation theory, are reported for the first time. The experimental measurements of Jm showed that no tested CNT family inhibits the oxygen consumption profiles of mitochondria. The best model for predicting Jm for other CNTs was a random forest using eight features, with a test R-squared (R2) of 0.863 and a test root-mean-square error (RMSE) of 0.0461. The results demonstrate the capability of encoding CNT information in the spectral moments of the Raman star graph (SG) transform, with potential applicability as a predictive tool in nanotechnology and material risk assessments. Funding: Instituto de Salud Carlos III (PI13/02020, PI13/00280); Consellería de Cultura, Educación e Ordenación Universitaria, Galicia (R2014/025, GRC2014/049, R2014/039); Ministerio de Economía y Competitividad (UNLC08-1E-002, UNLC13-13-3503, CTQ2016-74881-P); Gobierno del País Vasco (IT1045-16); Conselho Nacional de Desenvolvimento Científico e Tecnológico, Brasil (308539/2016-8, 454332/2014-).
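    As a rough illustration of the modelling step described above (a random forest regressor evaluated with test R2 and RMSE), the sketch below trains scikit-learn's RandomForestRegressor on placeholder data; the eight-column feature matrix only stands in for the Raman star-graph spectral moments and does not reproduce the study's data or results.

```python
# Minimal sketch of the modelling step: a random forest regressor trained on a
# feature matrix (random placeholders standing in for the eight spectral-moment
# features) to predict mitochondrial oxygen flux Jm.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))                                  # placeholder features
y = X @ rng.normal(size=8) + rng.normal(scale=0.1, size=200)   # placeholder Jm values

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
model = RandomForestRegressor(n_estimators=500, random_state=0).fit(X_tr, y_tr)

pred = model.predict(X_te)
print("test R2  :", round(r2_score(y_te, pred), 3))
print("test RMSE:", round(mean_squared_error(y_te, pred) ** 0.5, 4))
```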

    A region-based interpolation method for mosaic images

    Image interpolation is an important operation in some image analysis and processing applications. This paper describes a region-based interpolation method for mosaic images. It is an extension of a previously presented interpolation technique based on morphological operations for binary images. Even though the principles used in this method are similar to those used in the technique for binary images, there are some substantial differences, pointed out in this paper, due to the nature of the images treated. Some experimental results are provided.
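    The paper's region-based morphological method cannot be reconstructed from the abstract alone, but the following generic sketch shows one standard way to interpolate between two binary regions, using signed distance transforms; it is given purely for illustration and is not the technique described above.

```python
# Generic sketch of interpolating between two binary regions via signed distance
# transforms. A standard shape-based technique shown for illustration only; it is
# not the region-based morphological method described in the paper.
import numpy as np
from scipy.ndimage import distance_transform_edt

def signed_distance(mask: np.ndarray) -> np.ndarray:
    """Negative inside the region, positive outside."""
    return distance_transform_edt(~mask) - distance_transform_edt(mask)

def interpolate(mask_a: np.ndarray, mask_b: np.ndarray, t: float) -> np.ndarray:
    """Intermediate region at fraction t between mask_a (t=0) and mask_b (t=1)."""
    sd = (1.0 - t) * signed_distance(mask_a) + t * signed_distance(mask_b)
    return sd <= 0.0

# Example: a small square growing into a larger one.
a = np.zeros((64, 64), dtype=bool); a[24:40, 24:40] = True
b = np.zeros((64, 64), dtype=bool); b[8:56, 8:56] = True
mid = interpolate(a, b, 0.5)
print(a.sum(), mid.sum(), b.sum())   # the intermediate area lies between the two
```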

    Challenges for future intelligent systems in biomedicine

    ABSTRACT This special issue is an example of recent efforts to bring together and exchange AI initiatives that address medical and biological issues and problems in need of innovative solutions. Similar collaborative efforts are being launched by international institutions, such as the European Commission (e.g., via the BIOINFOMED project), and US organizations such as the American Medical Informatics Association and the American College of Medical Informatics.

    Bioinformatics: towards new directions for public health*

    SUMMARY Objectives: Epidemiologists are reformulating their classical approaches to disease by considering various issues associated with “omics” areas and technologies. Traditional differences between epidemiology and genetics include background, training, terminologies, study designs and others. Public health and epidemiology increasingly seek to use methodologies and informatics tools, provided by the bioinformatics community, for managing genomic information. Our aims are to describe the most important implications of the increasing use of genomic information for public health practice, research and education; to review the contribution of bioinformatics to these issues, in terms of providing the methods and tools needed for processing genetic information from pathogens and patients; and to analyze the research challenges in biomedical informatics related to the need to integrate clinical, environmental and genetic data and the new scenarios arising in public health. Methods: Review of the literature, Internet resources, and material and reports generated by internal and external research projects. Results: New developments are needed to advance the study of the interactions between environmental agents and the genetic factors involved in the development of diseases. The use of biomarkers, biobanks, and integrated genomic/clinical databases poses serious challenges for informaticians seeking to extract useful information and knowledge for public health, biomedical research and healthcare. Conclusions: From an informatics perspective, integrated medical/biological ontologies and new semantic-based models for managing information pose new challenges for research in areas such as genetic epidemiology and the “omics” disciplines, among others. There are also various ethical, privacy, informed consent and social implications that should be carefully addressed by researchers, practitioners and policy makers.

    Learning retrieval expert combinations with genetic algorithms

    The goal of information retrieval (IR) is to provide models and systems that help users identify the documents relevant to their information needs. Extensive research has been carried out to develop retrieval methods that address this goal. These IR techniques range from purely syntax-based approaches, which consider only word frequencies, to more semantics-aware ones. However, it seems clear that there is no single method that works equally well on all collections and for all queries. Prior work suggests that combining the evidence from multiple retrieval experts can achieve significant improvements in retrieval effectiveness. A common problem of expert combination approaches is the selection of both the experts to be combined and the combination function. In most studies the experts are selected from a rather small set of candidates using some heuristics. Thus, only a reduced number of possible combinations is considered, and other, possibly better solutions are left out. In this paper we propose the use of genetic algorithms to find a suboptimal combination of experts for the document collection at hand. Our approach automatically determines both the experts to be combined and the parameters of the combination function. Because we learn this combination for each specific document collection, the approach allows us to automatically adjust the IR system to specific user needs. To learn retrieval strategies that generalize well to new queries, we propose a fitness function based on the statistical significance of the average precision obtained on a set of training queries. We test and evaluate the approach on four classical text collections. The results show that the learned combination strategies perform better than any of the individual methods and that genetic algorithms provide a viable way to learn expert combinations. The experiments also evaluate the use of a semantic indexing approach, the context vector model, in combination with classical word-matching techniques.
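    As a toy illustration of the approach described above, the sketch below evolves linear combination weights for a set of retrieval experts with a simple genetic algorithm. The expert scores and relevance judgements are synthetic placeholders, and the fitness is plain mean average precision rather than the paper's significance-based fitness function.

```python
# Simple genetic-algorithm sketch for learning a linear combination of retrieval
# experts. Fitness is mean average precision over training queries; expert scores
# and relevance judgements are synthetic.
import numpy as np

rng = np.random.default_rng(1)
n_experts, n_queries, n_docs = 3, 20, 50
expert_scores = rng.random((n_queries, n_docs, n_experts))   # per-expert document scores
relevant = rng.random((n_queries, n_docs)) < 0.1             # binary relevance judgements

def average_precision(ranking, rel):
    rel_ranked = rel[ranking]
    if not rel_ranked.any():
        return 0.0
    hits = np.cumsum(rel_ranked)
    prec = hits / (np.arange(len(ranking)) + 1)
    return float(prec[rel_ranked].mean())

def fitness(weights):
    combined = expert_scores @ weights                        # (queries, docs)
    aps = [average_precision(np.argsort(-combined[q]), relevant[q])
           for q in range(n_queries)]
    return float(np.mean(aps))

# Evolve a population of weight vectors with truncation selection and Gaussian mutation.
pop = rng.random((30, n_experts))
for generation in range(40):
    scores = np.array([fitness(w) for w in pop])
    parents = pop[np.argsort(-scores)[:10]]                   # keep the 10 fittest
    children = parents[rng.integers(0, 10, size=20)] + rng.normal(scale=0.1, size=(20, n_experts))
    pop = np.vstack([parents, np.clip(children, 0.0, None)])

best = max(pop, key=fitness)
print("learned expert weights:", np.round(best / best.sum(), 3))
```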

    New results on the theory of morphological filters by reconstruction

    ABSTRACT This paper addresses the problem of establishing bounds for the class of morphological filters by reconstruction. Morphological filters by reconstruction, which are composed of openings and closings by reconstruction, are useful for image processing because they do not introduce discontinuities. The main contributions of this paper are: (a) to establish when a combination of openings by reconstruction (respectively, of closings by reconstruction) is itself an opening by reconstruction (respectively, a closing by reconstruction); and (b) to establish, for any filter by reconstruction, upper and lower bounds that are, respectively, a closing by reconstruction and an opening by reconstruction. In addition, the paper investigates certain aspects of filters by reconstruction that possess a robustness property called the strong property. Dual and equivalent forms are given for a recently introduced family of multi-level filters. A significant side result is the identification of instances of connected openings, composed of openings and closings by reconstruction, that are not openings by reconstruction (and similarly for closings).
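    The building blocks discussed above, openings and closings by reconstruction, can be sketched with scikit-image's grayscale reconstruction as follows; the test image and structuring element are arbitrary illustrative choices.

```python
# Minimal sketch of openings and closings by reconstruction using scikit-image's
# grayscale reconstruction. The test image and structuring-element size are
# arbitrary choices for illustration.
import numpy as np
from skimage.morphology import erosion, dilation, disk, reconstruction

def opening_by_reconstruction(image, selem):
    """Erode, then reconstruct by dilation under the original image."""
    return reconstruction(erosion(image, selem), image, method="dilation")

def closing_by_reconstruction(image, selem):
    """Dilate, then reconstruct by erosion above the original image."""
    return reconstruction(dilation(image, selem), image, method="erosion")

rng = np.random.default_rng(0)
img = rng.random((64, 64))
selem = disk(3)

opened = opening_by_reconstruction(img, selem)
closed = closing_by_reconstruction(img, selem)

# The opening never exceeds the image and the closing never falls below it,
# consistent with their use as lower/upper bounds.
print(bool(np.all(opened <= img + 1e-12)), bool(np.all(closed >= img - 1e-12)))
```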