
    A cascaded approach to normalising gene mentions in biomedical literature

    Linking gene and protein names mentioned in the literature to unique identifiers in referent genomic databases is an essential step in accessing and integrating knowledge in the biomedical domain. However, it remains a challenging task due to lexical and terminological variation and the ambiguity of gene name mentions in documents. We present a generic and effective rule-based approach to link gene mentions in the literature to referent genomic databases, in which both the gene synonyms in the databases and the gene mentions in text are first pre-processed. The mapping method employs a cascaded approach, which combines exact, exact-like and token-based approximate matching by using flexible representations of a gene synonym dictionary and gene mentions generated during the pre-processing phase. We also consider multi-gene name mentions and permutation of components in gene names. A systematic evaluation of the suggested methods has identified steps that are beneficial for improving either precision or recall in gene name identification. Experiments on the BioCreAtIvE2 data sets (identification of human gene names) showed that our methods achieve highly encouraging results, with an F-measure of up to 81.20%.
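
The cascade of exact, exact-like and token-based matching can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `min_overlap` threshold, the use of Jaccard overlap for the token stage, and the toy dictionary with its placeholder identifiers are all assumptions made for illustration.

```python
import re

# Hypothetical synonym dictionary: synonym -> placeholder identifier.
TOY_DICTIONARY = {
    "BRCA1": "ID:1",
    "tumor protein p53": "ID:2",
    "interleukin 2 receptor alpha": "ID:3",
}

def normalise(name):
    """Exact-like key: lower-case and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())

def tokens(name):
    """Order-free token set; runs of letters and runs of digits are separate tokens."""
    return frozenset(re.findall(r"[a-z]+|[0-9]+", name.lower()))

def link_mention(mention, dictionary, min_overlap=0.8):
    """Return (identifiers, stage); later stages run only if earlier ones find nothing."""
    # Stage 1: exact string match.
    if mention in dictionary:
        return {dictionary[mention]}, "exact"
    # Stage 2: exact-like match on the normalised form.
    key = normalise(mention)
    hits = {gid for syn, gid in dictionary.items() if normalise(syn) == key}
    if hits:
        return hits, "exact-like"
    # Stage 3: token-based approximate match (Jaccard overlap, an assumed measure).
    # Using token sets makes the comparison insensitive to component order.
    m = tokens(mention)
    hits = set()
    for syn, gid in dictionary.items():
        s = tokens(syn)
        if m and s and len(m & s) / len(m | s) >= min_overlap:
            hits.add(gid)
    return hits, ("token" if hits else "none")

if __name__ == "__main__":
    for m in ["BRCA1", "Brca-1", "receptor alpha interleukin 2", "unknown factor"]:
        print(m, "->", link_mention(m, TOY_DICTIONARY))
```

Running the stages in order of strictness means a loose token match is only consulted when no stricter match exists, which is one plausible way to trade precision against recall.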

    Exact reconstruction of gene regulatory networks using compressive sensing.

    Background: We consider the problem of reconstructing a gene regulatory network structure from limited time series gene expression data, without any a priori knowledge of connectivity. We assume that the network is sparse, meaning the connectivity among genes is much less than full connectivity. We develop a method for network reconstruction based on compressive sensing, which takes advantage of the network's sparseness. Results: For the case in which all genes are accessible for measurement, and there is no measurement noise, we show that our method can be used to exactly reconstruct the network. For the more general problem, in which hidden genes exist and all measurements are contaminated by noise, we show that our method leads to reliable reconstruction. In both cases, coherence of the model is used to assess the ability to reconstruct the network and to design new experiments. We demonstrate that it is possible to use the coherence distribution to guide biological experiment design effectively. By collecting a more informative dataset, the proposed method helps reduce the cost of experiments. For each problem, a set of numerical examples is presented. Conclusions: The method provides a guarantee on how well the inferred graph structure represents the underlying system, reveals deficiencies in the data and model, and suggests experimental directions to remedy the deficiencies.
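
As a rough illustration of how coherence can be computed, the sketch below evaluates the standard mutual coherence used in compressive sensing: the largest absolute normalised inner product between two distinct columns of a measurement matrix. Whether the paper uses exactly this quantity is an assumption, and the function name and example matrices are hypothetical.

```python
import math

def mutual_coherence(columns):
    """Largest |cosine| between any two distinct columns (each column is a list)."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    norms = [math.sqrt(dot(c, c)) for c in columns]
    if any(n == 0.0 for n in norms):
        raise ValueError("columns must be non-zero")

    best = 0.0
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            value = abs(dot(columns[i], columns[j])) / (norms[i] * norms[j])
            best = max(best, value)
    return best

if __name__ == "__main__":
    # Orthogonal columns: coherence is zero, the most favourable case.
    print(mutual_coherence([[1, 0], [0, 1]]))
    # Correlated columns: coherence moves towards 1, making sparse recovery harder.
    print(mutual_coherence([[1, 0], [1, 1]]))
```

Lower coherence generally indicates that sparse recovery is better conditioned, which is why it is a natural candidate for comparing experimental designs.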

    Some advances in extensive bridge monitoring using low cost dynamic characterization

    Dynamic measurements will become standard for bridge monitoring in the near future, which will bring a significant reduction in maintenance costs. The US Administration has a long-term intensive research program to reduce the current maintenance cost, estimated at US$7 billion per year over 20 years. An optimal maintenance intervention program requires a historical dynamic record as well as an updated mathematical model of the structure to be monitored. If a model of the structure is not available, it can still be produced; missing measurement records from the past, however, cannot be recovered. Current acquisition systems for structural monitoring can be made more efficient by introducing the following improvements, under development in the Spanish research project “Low cost bridge health monitoring by ambient vibration tests using wireless sensors”: (a) a complete wireless system to acquire sensor data; (b) a wireless system that permits the localization and hardware identification of the whole sensor system (the localization system applied has been the object of a recent patent); and (c) automation of the modal identification process, aimed at reducing human intervention. This system is assembled from cheap components and allows the simultaneous use of a large number of sensors at a low placement cost. The engineer’s intervention is limited to the selection of sensor positions, probably based on a preliminary FE analysis. In the case of multiple setups, the positions of a number of fixed reference sensors also have to be decided. The wireless localization system will obtain the exact coordinates of all these sensor positions. When the selection of optimal positions is difficult, for example because a proper FE model is lacking, this can be compensated for by using a higher number of measuring (and reference) points. 
The low-cost acquisition system described allows the responsible bridge administration to obtain, at reasonable cost, historical dynamic identification records for use in future maintenance programs. Given the importance of the baseline monitoring record of a new bridge, a monitoring test just after its construction should therefore be highly recommended, if not compulsory.

    Completing Low-Rank Matrices with Corrupted Samples from Few Coefficients in General Basis

    Subspace recovery from corrupted and missing data is crucial for various applications in signal processing and information theory. To complete missing values and detect column corruptions, existing robust Matrix Completion (MC) methods mostly concentrate on recovering a low-rank matrix from few corrupted coefficients w.r.t. standard basis, which, however, does not apply to more general basis, e.g., Fourier basis. In this paper, we prove that the range space of an $m\times n$ matrix with rank $r$ can be exactly recovered from few coefficients w.r.t. general basis, though $r$ and the number of corrupted samples are both as high as $O(\min\{m,n\}/\log^3 (m+n))$. Our model covers previous ones as special cases, and robust MC can recover the intrinsic matrix with a higher rank. Moreover, we suggest a universal choice of the regularization parameter, which is $\lambda=1/\sqrt{\log n}$. By our $\ell_{2,1}$ filtering algorithm, which has theoretical guarantees, we can further reduce the computational cost of our model. As an application, we also find that the solutions to extended robust Low-Rank Representation and to our extended robust MC are mutually expressible, so both our theory and algorithm can be applied to the subspace clustering problem with missing values under certain conditions. Experiments verify our theories. Comment: To appear in IEEE Transactions on Information Theory.
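
Two quantities named in this abstract are easy to compute directly. The sketch below assumes the usual column-wise convention for the $\ell_{2,1}$ norm (the sum of the Euclidean norms of the columns, which suits column corruptions) and assumes the logarithm in $\lambda=1/\sqrt{\log n}$ is natural; neither convention is stated in the abstract, and the function names are hypothetical.

```python
import math

def l21_norm(matrix):
    """Sum of Euclidean norms of the columns; matrix is a list of equal-length rows."""
    if not matrix:
        return 0.0
    n_cols = len(matrix[0])
    return sum(
        math.sqrt(sum(row[j] ** 2 for row in matrix))
        for j in range(n_cols)
    )

def universal_lambda(n):
    """Regularization weight 1/sqrt(log n), assuming the natural logarithm."""
    if n <= 1:
        raise ValueError("n must be greater than 1 so that log n is positive")
    return 1.0 / math.sqrt(math.log(n))

if __name__ == "__main__":
    # Columns are (3, 4) and (0, 0): only the first contributes to the norm.
    print(l21_norm([[3, 0], [4, 0]]))
    print(universal_lambda(100))
```

Because each column enters through its own Euclidean norm, penalising this quantity tends to zero out whole columns at once, which is how it isolates corrupted columns rather than scattered entries.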

    Collecting core data in physician-staffed pre-hospital helicopter emergency medical services using a consensus-based template: international multicentre feasibility study in Finland and Norway

    Background: Comparison of services and identification of factors important for favourable patient outcomes in emergency medical services (EMS) is challenging because organization and data quality differ between services. The purpose of the present study was to evaluate the feasibility of physician-staffed EMS (p-EMS) collecting patient and system level data by using a consensus-based template. Methods: The study was an international multicentre observational study. Data were collected according to a template for uniform reporting of data from p-EMS using two different data collection methods: a standard and a focused data collection method. For the standard data collection, data were extracted retrospectively for one year from all FinnHEMS bases, and for the focused data collection, data were collected prospectively for six weeks from four selected Norwegian p-EMS bases. Completeness rates for the two data collection methods were then compared, and factors affecting completeness rates and template feasibility were evaluated. The standard chi-square test and Fisher’s exact test were used for group comparison of categorical data, the Mann-Whitney U test for continuous data, and the Kolmogorov-Smirnov test for comparison of distributional properties. Results: All missions with patient encounters were included, leaving 4437 Finnish and 128 Norwegian missions eligible for analysis. Variable completeness rates indicated that physiological variables were least documented. Information on pain and respiratory rate was most frequently missing with the standard data collection method, and systolic blood pressure was the most frequently missing variable with the focused data collection method. Completeness rates were similar or higher when patients were considered severely ill or injured but were lower for missions with short patient encounters. Completeness rates were higher with the focused data collection method than with the standard one. 
Conclusions: We found that a focused data collection method increased data capture compared to a standard data collection method. The concept of using a template for documentation of p-EMS data is feasible in physician-staffed services in Finland and Norway. The greatest deficiencies in completeness rates were evident for physiological parameters. Short missions were associated with lower completeness rates, whereas severe illness or injury did not result in reduced data capture.
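
A per-variable completeness rate of the kind compared in this study can be sketched as the share of missions in which a variable was documented. The record layout, the variable names, and the choice to treat `None` or an absent key as missing are assumptions for illustration, not the study's template.

```python
def completeness_rates(records, variables):
    """Fraction of records in which each variable is present and not None."""
    if not records:
        raise ValueError("at least one record is required")
    total = len(records)
    return {
        name: sum(1 for record in records if record.get(name) is not None) / total
        for name in variables
    }

if __name__ == "__main__":
    # Hypothetical mission records; values are placeholders, not study data.
    missions = [
        {"systolic_bp": 120, "pain_score": None},
        {"systolic_bp": None, "pain_score": None},
        {"systolic_bp": 110, "pain_score": 3},
        {"systolic_bp": 100},
    ]
    print(completeness_rates(missions, ["systolic_bp", "pain_score"]))
```

Computing the rate per variable rather than per mission makes it possible to see which specific items are least documented.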

    On the role of pre and post-processing in environmental data mining

    The quality of discovered knowledge depends heavily on data quality. Unfortunately, real data tend to contain noise, uncertainty, errors, redundancies or even irrelevant information. The more complex the reality to be analyzed, the higher the risk of obtaining low-quality data. Knowledge Discovery from Databases (KDD) offers a global framework for preparing data in the right form to perform correct analyses. On the other hand, the quality of decisions based on KDD results depends not only on the quality of the results themselves, but also on the capacity of the system to communicate those results in an understandable form. Environmental systems are particularly complex, and environmental users particularly require clarity in their results. This paper provides some details on how this can be achieved and discusses the role of pre- and post-processing in the whole process of Knowledge Discovery in environmental systems.

    EXACT2: the semantics of biomedical protocols

    © 2014 Soldatova et al.; licensee BioMed Central. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. This article has been made available through the Brunel Open Access Publishing Fund. Background: The reliability and reproducibility of experimental procedures is a cornerstone of scientific practice. There is a pressing technological need for the better representation of biomedical protocols to enable other agents (human or machine) to better reproduce results. A framework that ensures that all information required for the replication of experimental protocols is captured is essential to achieving reproducibility. Methods: We have developed the ontology EXACT2 (EXperimental ACTions), which is designed to capture the full semantics of biomedical protocols required for their reproducibility. To construct EXACT2 we manually inspected hundreds of published and commercial biomedical protocols from several areas of biomedicine. After establishing a clear pattern for extracting the required information, we utilized text-mining tools to translate the protocols into a machine amenable format. We have verified the utility of EXACT2 through the successful processing of previously ‘unseen’ (not used for the construction of EXACT2) protocols. Results: The paper reports on a fundamentally new version, EXACT2, that supports the semantically-defined representation of biomedical protocols. The ability of EXACT2 to capture the semantics of biomedical procedures was verified through a text mining use case. 
In this use case, EXACT2 is used as a reference model for text mining tools to identify terms pertinent to experimental actions, and their properties, in biomedical protocols expressed in natural language. An EXACT2-based framework for the translation of biomedical protocols to a machine amenable format is proposed. Conclusions: The EXACT2 ontology is sufficient to record, in a machine processable form, the essential information about biomedical protocols. EXACT2 defines explicit semantics of experimental actions, and can be used by various computer applications. It can serve as a reference model for the translation of biomedical protocols in natural language into a semantically-defined format. This work has been partially funded by the Brunel University BRIEF award and a grant from Occams Resources.