Machine Learning and Integrative Analysis of Biomedical Big Data.
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.
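The class-imbalance challenge named above (e.g., far fewer cases than controls in a clinical cohort) is commonly addressed with inverse-frequency class weights in the classifier's loss. A minimal sketch with made-up labels, using the widely used "balanced" heuristic (n_samples / (n_classes * count)):

```python
from collections import Counter

# Hypothetical labels for a cohort: 0 = control (majority), 1 = case (minority).
# An unweighted classifier tends to favor the majority class; inverse-frequency
# weights make misclassifying the rare class proportionally more costly.
y = [0] * 90 + [1] * 10

counts = Counter(y)
n_classes = len(counts)
# "Balanced" weighting: each class weight is inversely proportional to its
# frequency, normalized so weights average to 1 over the dataset.
weights = {c: len(y) / (n_classes * n) for c, n in counts.items()}
print(weights)  # the minority class (1) receives the larger weight
```

These weights would then be passed to a loss function or a library's `class_weight` parameter; the labels and ratios here are illustrative only.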
Pragmatic Ontology Evolution: Reconciling User Requirements and Application Performance
Increasingly, organizations are adopting ontologies to describe their large catalogues of items. These ontologies need to evolve regularly in response to changes in the domain and the emergence of new requirements. An important step of this process is the selection of candidate concepts to include in the new version of the ontology. This operation needs to take into account a variety of factors and in particular reconcile user requirements and application performance. Current ontology evolution methods focus either on ranking concepts according to their relevance or on preserving compatibility with existing applications. However, they do not take into consideration the impact of the ontology evolution process on the performance of computational tasks – e.g., in this work we focus on instance tagging, similarity computation, generation of recommendations, and data clustering. In this paper, we propose the Pragmatic Ontology Evolution (POE) framework, a novel approach for selecting from a group of candidates a set of concepts able to produce a new version of a given ontology that i) is consistent with a set of user requirements (e.g., max number of concepts in the ontology), ii) is parametrised with respect to a number of dimensions (e.g., topological considerations), and iii) effectively supports relevant computational tasks. Our approach also supports users in navigating the space of possible solutions by showing how certain choices, such as limiting the number of concepts or privileging trendy concepts rather than historical ones, would reflect on the application performance. An evaluation of POE on the real-world scenario of the evolving Springer Nature taxonomy for editorial classification yielded excellent results, demonstrating a significant improvement over alternative approaches.
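The core selection step can be pictured as scoring candidate concepts on both relevance and estimated task impact, then choosing under a user-imposed size cap. This is a generic sketch, not the authors' POE implementation; all concept names, scores, and the trade-off parameter are hypothetical:

```python
# Hypothetical candidate concepts with a user-relevance signal and an
# estimated gain on a downstream computational task (e.g., data clustering).
candidates = {
    "Deep Learning":     {"relevance": 0.9, "task_gain": 0.7},
    "Expert Systems":    {"relevance": 0.4, "task_gain": 0.2},
    "Transfer Learning": {"relevance": 0.7, "task_gain": 0.8},
}
alpha = 0.5        # trade-off between user requirements and task performance
max_concepts = 2   # example user requirement: cap on ontology growth

# Rank candidates by the combined score, then keep the top-k under the cap.
scored = sorted(
    candidates,
    key=lambda c: alpha * candidates[c]["relevance"]
                  + (1 - alpha) * candidates[c]["task_gain"],
    reverse=True,
)
selected = scored[:max_concepts]
print(selected)
```

Varying `alpha` is one way to let users see how privileging relevance over task performance (or vice versa) changes the resulting ontology.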
Multiple Retrieval Models and Regression Models for Prior Art Search
This paper presents the system called PATATRAS (PATent and Article Tracking, Retrieval and AnalysiS) realized for the IP track of CLEF 2009. Our approach presents three main characteristics: 1. The usage of multiple retrieval models (KL, Okapi) and term index definitions (lemma, phrase, concept) for the three languages considered in the present track (English, French, German), producing ten different sets of ranked results. 2. The merging of the different results based on multiple regression models using an additional validation set created from the patent collection. 3. The exploitation of patent metadata and of the citation structures for creating restricted initial working sets of patents and for producing a final re-ranking regression model. As we exploit the specific metadata of the patent documents and the citation relations only at the creation of initial working sets and during the final post-ranking step, our architecture remains generic and easy to extend.
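The merging step described above can be sketched as weighted score fusion across runs from different retrieval models. In the paper the weights come from regression models fit on a validation set; here they are simply assumed, and the run scores and document IDs are invented:

```python
# Hypothetical scored runs from two retrieval models over the same query.
run_kl    = {"pat1": 2.1, "pat2": 1.4, "pat3": 0.9}   # KL-divergence model
run_okapi = {"pat2": 3.0, "pat1": 1.0, "pat4": 2.2}   # Okapi (BM25-style) model
weights   = {"kl": 0.6, "okapi": 0.4}                 # assumed, not learned here

# Fuse: weighted sum of per-run scores, treating absent documents as 0.
docs = set(run_kl) | set(run_okapi)
fused = {
    d: weights["kl"] * run_kl.get(d, 0.0) + weights["okapi"] * run_okapi.get(d, 0.0)
    for d in docs
}
ranking = sorted(fused, key=fused.get, reverse=True)
print(ranking)
```

Score normalization across runs (omitted here for brevity) matters in practice, since different retrieval models produce scores on different scales.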
Automatic programming methodologies for electronic hardware fault monitoring
This paper presents three variants of Genetic Programming (GP) approaches for intelligent online performance monitoring of electronic circuits and systems. Reliability modeling of electronic circuits can be best performed by the stressor-susceptibility interaction model. A circuit or a system is considered to have failed once the stressor has exceeded the susceptibility limits. For on-line prediction, validated stressor vectors may be obtained by direct measurements or sensors, which after pre-processing and standardization are fed into the GP models. Empirical results are compared with artificial neural networks trained using the backpropagation algorithm and with classification and regression trees. The performance of the proposed method is evaluated by comparing the experiment results with the actual failure model values. The developed model reveals that GP could play an important role in future fault monitoring systems. This research was supported by the International Joint Research Grant of the IITA (Institute of Information Technology Assessment) foreign professor invitation program of the MIC (Ministry of Information and Communication), Korea.
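The stressor-susceptibility interaction model the paper builds on reduces, at its simplest, to a threshold check: failure is declared once any stressor exceeds the component's susceptibility limit. A toy illustration with hypothetical stressor names and limits (not taken from the paper):

```python
# Hypothetical susceptibility limits for one electronic component.
susceptibility = {"temperature_C": 85.0, "voltage_V": 5.5, "vibration_g": 3.0}

def has_failed(stressors: dict) -> bool:
    """Return True if any measured stressor exceeds its susceptibility limit."""
    return any(stressors.get(k, 0.0) > limit for k, limit in susceptibility.items())

print(has_failed({"temperature_C": 70.0, "voltage_V": 5.2}))  # all within limits
print(has_failed({"temperature_C": 91.0, "voltage_V": 5.2}))  # over-temperature
```

The GP models in the paper go beyond this static check: they learn a predictive mapping from pre-processed stressor vectors to failure behavior.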
Ontologies and Information Extraction
This report argues that, even in the simplest cases, IE is an ontology-driven process. It is not a mere text filtering method based on simple pattern matching and keywords, because the extracted pieces of text are interpreted with respect to a predefined partial domain model. This report shows that, depending on the nature and the depth of the interpretation to be done for extracting the information, more or less knowledge must be involved. This report is mainly illustrated in biology, a domain in which there are critical needs for content-based exploration of the scientific literature and which is becoming a major application domain for IE.
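The distinction the report draws between keyword filtering and interpretation against a partial domain model can be illustrated minimally: the extractor returns typed entities, not bare string matches. The tiny domain model and sentence below are hypothetical:

```python
# A deliberately tiny, hypothetical partial domain model for biology.
domain_model = {
    "p53":  {"type": "Protein", "organism": "human"},
    "TP53": {"type": "Gene",    "organism": "human"},
}

def extract(text: str):
    """Return mentions the domain model can interpret, as (mention, type) pairs."""
    tokens = text.replace(",", " ").split()
    # Interpretation step: a mention only counts if the domain model types it.
    return [(t, domain_model[t]["type"]) for t in tokens if t in domain_model]

print(extract("TP53 encodes the protein p53"))
```

Even here the output distinguishes the gene TP53 from the protein p53, which pure keyword matching cannot do; deeper interpretation requires correspondingly more domain knowledge.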
Systems approaches and algorithms for discovery of combinatorial therapies
Effective therapy of complex diseases requires control of highly non-linear complex networks that remain incompletely characterized. In particular, drug intervention can be seen as control of signaling in cellular networks. Identification of control parameters presents an extreme challenge due to the combinatorial explosion of control possibilities in combination therapy and to the incomplete knowledge of the systems biology of cells. In this review paper we describe the main current and proposed approaches to the design of combinatorial therapies, including the empirical methods used now by clinicians and alternative approaches suggested recently by several authors. New approaches for designing combinations arising from systems biology are described. We discuss in special detail the design of algorithms that identify optimal control parameters in cellular networks based on a quantitative characterization of control landscapes, maximizing utilization of incomplete knowledge of the state and structure of intracellular networks. The use of new technology for high-throughput measurements is key to these new approaches to combination therapy and essential for the characterization of control landscapes and implementation of the algorithms. Combinatorial optimization in medical therapy is also compared with the combinatorial optimization of engineering and materials science, and similarities and differences are delineated.
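The combinatorial explosion the review describes can be made concrete with a toy brute-force search over drug pairs under an assumed (entirely made-up) response function with pairwise synergies; exhaustive enumeration like this scales as 2^n in the number of drugs, which is exactly why algorithmic approaches over control landscapes are needed:

```python
from itertools import combinations

# Hypothetical single-drug effects and pairwise synergy bonuses.
effect = {"A": 0.30, "B": 0.25, "C": 0.10, "D": 0.05}
synergy = {("A", "C"): 0.20, ("B", "D"): 0.15}

def response(combo):
    """Toy response: sum of single-drug effects plus any pairwise synergies."""
    base = sum(effect[d] for d in combo)
    bonus = sum(synergy.get(tuple(sorted(p)), 0.0) for p in combinations(combo, 2))
    return base + bonus

# Exhaustively score all 2-drug combinations and keep the best.
best = max(combinations(sorted(effect), 2), key=response)
print(best, round(response(best), 2))
```

With real cellular networks the response function is unknown and must be probed experimentally, so high-throughput measurements substitute for this toy lookup table.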