129 research outputs found

    Fusion of Domain Knowledge for Dynamic Learning in Transcriptional Networks

    A critical challenge of the postgenomic era is to understand how genes are differentially regulated even when they belong to a given network. Because the fundamental mechanism controlling gene expression operates at the level of transcription initiation, computational techniques have been developed that identify cis-regulatory features and map such features into differential expression patterns. The fact that such co-regulated genes may be differentially regulated suggests that subtle differences in the shared cis-acting regulatory elements are likely significant. Thus, we carry out an exhaustive description of cis-acting regulatory features including the orientation, location and number of binding sites for a regulatory protein, the presence of binding site submotifs, the class and number of RNA polymerase sites, as well as gene expression data, which is treated as one feature among many. These features, derived from different domain sources, are analyzed concurrently, and dynamic relations are recognized to generate profiles, which are groups of promoters sharing common features. We apply this method to probe the regulatory networks governed by the PhoP/PhoQ two-component system in the enteric bacteria Escherichia coli and Salmonella enterica. Our analysis uncovered novel members of the PhoP regulon, and the resulting profiles group genes that share underlying biological properties that characterize the system kinetics. The predictions were experimentally validated to establish that the PhoP protein uses multiple mechanisms to control gene transcription and is a central element in a highly connected network. Ministerio de Ciencia y Tecnología BIO2004-0270-

    Building environmentally-aware classifiers on streaming data

    The three biggest challenges currently faced in machine learning, in our estimation, are the staggering quantity of data we wish to analyze, the incredibly small proportion of these data that are labeled, and the apparent lack of interest in creating algorithms that continually learn during inference. An unsupervised streaming approach addresses all three of these challenges, storing only a finite amount of information to model an unbounded dataset and adapting to new structures as they arise. Specifically, we are motivated by automated target recognition (ATR) in synthetic aperture sonar (SAS) imagery, the problem of finding explosive hazards on the seafloor. It has been shown that the performance of ATR can be improved by creating several specialized classifiers and fusing their predictions, instead of using a single classifier for the entire ATR task [44]. The prevailing opinion seems to be that one should have different classifiers for varying complexity of seafloor [74], but we hypothesize that fusing classifiers based on sea bottom type will yield higher accuracy and better lend itself to making explainable classification decisions. The first step of building such a system is developing a robust framework for online texture classification, the topic of this research. In this work, we improve upon StreamSoNG [85], an existing algorithm for streaming data analysis (SDA) that models each structure in the data with a neural gas [69] and detects new structures by clustering an outlier list with the possibilistic 1-means [62] (P1M) algorithm. We call the modified algorithm StreamSoNGv2, denoting that it is the second version, or verse, if you will, of StreamSoNG. Notable improvements include detection of arbitrarily-shaped clusters by using DBSCAN [37] instead of P1M, using growing neural gas [43] to model each structure with an adaptive number of prototypes, and an automated approach to estimate the n parameters.
Furthermore, we propose a novel algorithm called single-pass possibilistic clustering (SPC) for solving the same task. SPC maintains a fixed number of structures to model the data stream. These structures can be updated and merged based only on their "footprints", that is, summary statistics that contain all of the information from the stream needed by the algorithm without directly maintaining the entire stream. SPC is built on a damped window framework, allowing the user to balance the weight between old and new points in the stream with a decay factor parameter. We evaluate the two algorithms under consideration against four state-of-the-art SDA algorithms from the literature on several synthetic datasets and two texture datasets: one real (KTH-TIPS2b [68]) and one simulated. The simulated dataset, a significant research effort in itself, is of our own construction in Unreal Engine and contains on the order of 6,000 images at 720 x 720 resolution from six different texture types. Our hope is that the methodology developed here will yield effective texture classifiers for use not only in underwater scene understanding, but also in improving the performance of ATR algorithms by providing a context in which the potential target is embedded. Includes bibliographical references
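
The damped window idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the SPC algorithm itself: it only shows how a scalar summary statistic might be decayed, updated, and merged without storing the stream. The class name `Footprint`, its fields, and the `decay` parameter are hypothetical names chosen for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Footprint:
    """Hypothetical summary of a 1-D stream under a damped window."""
    weight: float = 0.0        # decayed count of points seen so far
    weighted_sum: float = 0.0  # decayed sum of point values

    def update(self, x: float, decay: float) -> None:
        # Down-weight everything seen so far, then add the new point
        # with weight 1. decay=1.0 keeps all history; smaller values
        # forget old points faster.
        self.weight = decay * self.weight + 1.0
        self.weighted_sum = decay * self.weighted_sum + x

    def mean(self) -> float:
        if self.weight == 0.0:
            raise ValueError("empty footprint has no mean")
        return self.weighted_sum / self.weight

    def merge(self, other: "Footprint") -> "Footprint":
        # Both statistics are additive, so two footprints can be
        # combined without revisiting the underlying points.
        return Footprint(self.weight + other.weight,
                         self.weighted_sum + other.weighted_sum)


def summarize(stream, decay: float) -> Footprint:
    fp = Footprint()
    for x in stream:
        fp.update(x, decay)
    return fp
```

With `decay=1.0` the mean is the ordinary average of the stream; with a decay factor below 1 the most recent points dominate, which is the trade-off the decay factor parameter controls.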

    Handling metadata in the scope of coreference detection in data collections


    Distributed Random Set Theoretic Soft/Hard Data Fusion

    Research on multisensor data fusion aims at providing the enabling technology to combine information from several sources in order to form a unified picture. The literature on fusion of conventional data provided by non-human (hard) sensors is vast and well-established. In comparison to conventional fusion systems where input data are generated by calibrated electronic sensor systems with well-defined characteristics, research on soft data fusion considers combining human-based data expressed preferably in unconstrained natural language form. Fusion of soft and hard data is even more challenging, yet necessary in some applications, and has received little attention in the past. Being a rather new area of research, soft/hard data fusion is still in a fledgling stage, with even its challenging problems yet to be adequately defined and explored. This dissertation develops a framework to enable fusion of both soft and hard data with the Random Set (RS) theory as the underlying mathematical foundation. Random set theory is an emerging theory within the data fusion community that, due to its powerful representational and computational capabilities, is gaining more and more attention among data fusion researchers. Motivated by the unique characteristics of random set theory and the main challenge of soft/hard data fusion systems, i.e. the need for a unifying framework capable of processing both unconventional soft data and conventional hard data, this dissertation argues in favor of a random set theoretic approach as the first step towards realizing a soft/hard data fusion framework. Several challenging problems related to soft/hard fusion systems are addressed in the proposed framework. First, an extension of the well-known Kalman filter within random set theory, called Kalman evidential filter (KEF), is adopted as a common data processing framework for both soft and hard data.
Second, a novel ontology (syntax+semantics) is developed to allow for modeling soft (human-generated) data, assuming target tracking as the application. Third, as soft/hard data fusion is mostly aimed at large networks of information processing, a new approach is proposed to enable distributed estimation of soft as well as hard data, addressing the scalability requirement of such fusion systems. Fourth, a method for modeling trust in the human agents is developed, which enables the fusion system to protect itself from erroneous or misleading soft data by discounting such data on the fly. Fifth, leveraging recent developments in the RS theoretic data fusion literature, a novel soft data association algorithm is developed and deployed to extend the proposed target tracking framework to the multi-target tracking case. Finally, the multi-target tracking framework is complemented by introducing a distributed classification approach applicable to target classes described with soft human-generated data. In addition, this dissertation presents a novel data-centric taxonomy of data fusion methodologies. In particular, several categories of fusion algorithms have been identified and discussed based on the data-related challenging aspect(s) addressed. It is intended to provide the reader with a generic and comprehensive view of the contemporary data fusion literature, which could also serve as a reference for data fusion practitioners by providing them with conducive design guidelines, in terms of algorithm choice, regarding the specific data-related challenges expected in a given application.

    Informational Paradigm, management of uncertainty and theoretical formalisms in the clustering framework: A review

    Fifty years have gone by since the publication of the first paper on clustering based on fuzzy sets theory. In 1965, L.A. Zadeh published “Fuzzy Sets” [335]. After only one year, the first effects of this seminal paper began to emerge, with the pioneering paper on clustering by Bellman, Kalaba, Zadeh [33], in which they proposed a prototypal clustering algorithm based on fuzzy sets theory.