1,193 research outputs found

    Which clustering algorithm is better for predicting protein complexes?

    Abstract

    Background: Protein-protein interactions (PPIs) play a key role in determining the outcome of most cellular processes. The correct identification and characterization of protein interactions, and of the networks they comprise, is critical for understanding the molecular mechanisms within the cell. Large-scale techniques such as pull-down assays and tandem affinity purification are used to detect protein interactions in an organism. Today, relatively new high-throughput methods such as yeast two-hybrid, mass spectrometry, microarrays, and phage display are also used to reveal protein interaction networks.

    Results: In this paper we evaluated four clustering algorithms on six interaction datasets. We parameterized the MCL, Spectral, RNSC, and Affinity Propagation algorithms and applied them to six PPI datasets produced experimentally by yeast two-hybrid (Y2H) and tandem affinity purification (TAP) methods. The predicted clusters, so-called protein complexes, were then compared and benchmarked against known complexes stored in published databases.

    Conclusions: While results may differ with parameterization, the MCL and RNSC algorithms appear to be the more promising and more accurate at predicting PPI complexes; they also predict more complexes than the other reviewed algorithms in absolute numbers. The spectral clustering algorithm, on the other hand, achieves the highest valid prediction rate in our experiments, but it is nearly always outperformed by RNSC and MCL in terms of geometric accuracy and generates fewer valid clusters than any other reviewed algorithm. This article demonstrates various metrics for evaluating the accuracy of such predictions, as presented in the text below. Supplementary material can be found at: http://www.bioacademy.gr/bioinformatics/projects/ppireview.htm
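As a concrete illustration of the kind of benchmarking discussed in the conclusions, the minimal sketch below scores predicted clusters against reference complexes with the widely used overlap score |A ∩ B|^2 / (|A| · |B|) and reports a valid prediction rate. The metric choice, the 0.2 threshold, and all protein identifiers are assumptions for illustration and not necessarily what the paper itself uses.

```python
# Hypothetical benchmarking sketch: a predicted cluster counts as "valid"
# if it overlaps at least one known complex above a threshold.

def overlap_score(a, b):
    """|A ∩ B|^2 / (|A| * |B|) between two sets of protein identifiers."""
    a, b = set(a), set(b)
    inter = len(a & b)
    return (inter * inter) / (len(a) * len(b))

def valid_prediction_rate(predicted, reference, threshold=0.2):
    """Fraction of predicted clusters matching at least one known complex."""
    hits = sum(
        1 for p in predicted
        if any(overlap_score(p, r) >= threshold for r in reference)
    )
    return hits / len(predicted) if predicted else 0.0

predicted = [{"P1", "P2", "P3"}, {"P7", "P8"}]
reference = [{"P1", "P2", "P3", "P4"}, {"P9", "P10"}]
print(valid_prediction_rate(predicted, reference))  # 0.5
```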

    Synchronization and Group Formation Techniques for Disaster Communication Networks in Infrastructure-less Environments

    Thesis (Ph.D.)--Seoul National University Graduate School, Department of Electrical and Computer Engineering, August 2016. Advisor: 이광복. A public safety network (PSN) has been developed as a special class of wireless communication network that aims to save lives and prevent property damage. PSNs have evolved separately from commercial wireless networks, satisfying their own requirements and regulatory constraints. With growing needs for the transmission of multimedia data, existing voice-centric PSN technologies face hurdles in fulfilling the demand for high capacity and diverse types of services. Mission-critical requirements for PSNs include the guaranteed dissemination of emergency information such as alarm texts, images, and videos of disasters even in the absence (or destruction) of cellular infrastructure. Many research projects have been launched to meet the mission-critical requirements of PSNs, e.g., Aerial Base Station with Opportunistic Links for Unexpected & TEmporary events (ABSOLUTE), Alert for All (Alert4All), Mobile Alert InformAtion system using satellites (MAIA), and so on. These projects cover emergency communications using satellite communications, aerial eNodeBs, and terrestrial radio access technologies, and they take advantage of inherent broadcasting capabilities and resilience to damage on the ground for disseminating alert messages. In this dissertation, we limit our interest to terrestrial radio access technologies, e.g., LTE, TETRA, TETRAPOL, and DMR, because PSNs should remain operational even on low-end user equipments (UEs) that lack satellite communication functionality. In Chapter 2 of this dissertation, we propose a distributed synchronization algorithm for infrastructure-less public safety networks (IPSNs). The proposed algorithm aims to minimize the number of out-of-sync UEs by efficiently forming synchronization groups and selecting synchronization reference UEs in a distributed manner. To this end, we introduce a novel affinity propagation technique that enables autonomous decisions at each UE based on local message passing among neighboring UEs. Our simulation results show that the proposed algorithm reduces the number of out-of-sync UEs by up to 40% compared to the conventional scan-and-select strategy. In Chapter 3 of this dissertation, we study an IPSN where energy efficiency and reliability are critical requirements in the absence of cellular infrastructure, i.e., base stations and wired backbone lines. We formulate IPSN group formation as a clustering problem. A subset of UEs, called group owners (GOs), are chosen to serve as virtual base stations, and each non-GO UE, referred to as a group member, is associated with a GO. We propose a novel clustering algorithm in the framework of affinity propagation, a state-of-the-art message-passing technique with a graphical model approach developed in the machine learning field. Unlike conventional clustering approaches, the proposed clustering algorithm minimizes the total energy consumption while guaranteeing link reliability by adjusting the number of GOs.
Simulation results verify that the IPSN optimized by the proposed clustering algorithm reduces the total energy consumption of the network by up to 31% compared to conventional clustering approaches. Contents: Chapter 1, Introduction; Chapter 2, Distributed Synchronization Algorithm for Infrastructure-less Public Safety Networks; Chapter 3, Reliable Low-Energy Group Formation for Infrastructure-less Public Safety Networks; Chapter 4, Conclusion and Future Work.
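The dissertation's own algorithm is distributed and constrained; purely as a hypothetical, centralised illustration of the underlying affinity propagation idea, the sketch below clusters UE positions with scikit-learn's off-the-shelf AffinityPropagation, treating exemplars as stand-ins for group owners or synchronization reference UEs. The deployment area, UE count, and parameters are illustrative assumptions.

```python
# Minimal sketch (not the dissertation's distributed algorithm): centralised
# affinity propagation over UE positions; "euclidean" affinity means negative
# squared distance is used as the similarity, and exemplar UEs play the role
# of group owners / synchronization reference UEs.
import numpy as np
from sklearn.cluster import AffinityPropagation

rng = np.random.default_rng(0)
ue_positions = rng.uniform(0, 1000, size=(50, 2))  # 50 UEs in a 1 km x 1 km area

ap = AffinityPropagation(affinity="euclidean", random_state=0).fit(ue_positions)
group_owners = ap.cluster_centers_indices_   # indices of exemplar UEs
membership = ap.labels_                      # group index of every UE
print(len(group_owners), "groups formed")
```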

    Unsupervised discovery of human behavior and dialogue patterns in data from an online game

    Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011. Cataloged from PDF version of thesis. Includes bibliographical references (p. 121-126). A content authoring bottleneck in AI, coupled with improving technology, has led to increasing efforts to use large datasets to power AI systems directly. This idea is being used to create AI agents in video games, using logs of human-played games as the dataset. This new approach to AI brings its own challenges, particularly the need to annotate the datasets used. This thesis explores automatically annotating the behavior in human-played games, namely: how can we generate a list of events, with examples, that describes the behavior in thousands of games? First, dialogue is clustered semantically to simplify the game logs. Next, sequential pattern mining is used to find action-dialogue sequences that correspond to higher-level events. Finally, these sequences are grouped according to their event. The system cannot yet replace human annotation, but the results are promising and can already help to significantly reduce the amount of human effort needed. by Tynan S. Smith. M.Eng.
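To make the first step of the pipeline concrete, here is a simplified, assumed version of the "cluster dialogue semantically" stage using TF-IDF vectors and k-means; the thesis's actual semantic clustering method may differ, and the dialogue lines and cluster count are made up for illustration.

```python
# Toy sketch of clustering game-chat lines so that similar utterances share a
# cluster label, simplifying the game logs before pattern mining.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

dialogue = [
    "hello there", "hi, how are you?", "where is the quest giver?",
    "who gives the quest?", "thanks for the help", "thank you so much",
]
X = TfidfVectorizer().fit_transform(dialogue)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
for line, label in zip(dialogue, labels):
    print(label, line)
```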

    Morphological Study of Dendritic Spines

    Reconstruction of neuron structure and the characterisation of dendritic spines are currently active topics in neurobiology research. Dendrites are cellular structures whose main role is to conduct the electrochemical stimulation received from other neural cells to the cell body of the neuron from which the dendrites project. A dendritic spine is a small membranous protrusion from a neuron's dendrite that typically receives input from a single synapse of an axon. Spines have different shapes that can change over time and are believed to be closely related to learning and memory. The main objective of this thesis is to develop an algorithm to analyse stacks of confocal microscopy images that have previously been manually processed, in order to obtain a precise three-dimensional reconstruction of the dendritic spines and study their morphology. The outline of this thesis is as follows. In the first chapter we introduce the neuronal physiological basis of dendritic spines and their classical classification, and we briefly explain how the confocal microscope works. In the methodology chapter we describe the manual procedure used by the researchers of the Cortical Circuits Laboratory to extract the dendritic spines, the spine reconstruction algorithm that we have implemented, and the 3D shape descriptors and clustering methods that we have used. In the third chapter we present the results obtained, and we end the thesis by discussing them and proposing future lines of research.
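As an illustrative sketch only, the code below computes a few simple 3D shape descriptors from a reconstructed spine point cloud and groups spines with k-means. The descriptor choices (point count, elongation, flatness) and the synthetic point clouds are assumptions for demonstration, not the thesis's exact pipeline.

```python
# Illustrative sketch: simple shape descriptors per segmented spine, then
# k-means grouping into morphological classes.
import numpy as np
from sklearn.cluster import KMeans

def shape_descriptors(points):
    """points: (N, 3) array of voxel/surface coordinates of one spine."""
    centered = points - points.mean(axis=0)
    eigvals = np.sort(np.linalg.eigvalsh(np.cov(centered.T)))[::-1]  # principal extents
    elongation = eigvals[0] / (eigvals[1] + 1e-9)
    flatness = eigvals[1] / (eigvals[2] + 1e-9)
    return np.array([len(points), elongation, flatness])

# Synthetic spines: same point count, increasingly stretched along one axis.
spines = [np.random.rand(200, 3) * np.array([s, 1.0, 1.0]) for s in (1, 2, 4, 8)]
features = np.array([shape_descriptors(p) for p in spines])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(features)
print(labels)
```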

    Clustering Approaches for Multi-source Entity Resolution

    Entity Resolution (ER) or deduplication aims at identifying entities, such as specific customer or product descriptions, in one or several data sources that refer to the same real-world entity. ER is of key importance for improving data quality and has a crucial role in data integration and querying. The previous generation of ER approaches focused on integrating records from two relational databases or performing deduplication within a single database. Nevertheless, in the era of Big Data the number of available data sources is increasing rapidly, so large-scale data mining or querying systems need to integrate data obtained from numerous sources. For example, in online digital libraries or e-shops, publications or products are incorporated from a large number of archives or suppliers across the world, or within a specified region or country, to provide a unified view for the user. This process requires data consolidation from numerous heterogeneous data sources, most of which are evolving. As the number of sources grows, data heterogeneity and velocity, as well as the variance in data quality, increase. Therefore, multi-source ER, i.e. finding matching entities in an arbitrary number of sources, is a challenging task. Previous efforts for matching and clustering entities across multiple sources (> 2) mostly treated all sources as a single source. This approach precludes the use of metadata or provenance information to enhance integration quality and leads to poor results because it ignores differences in source quality. The conventional ER pipeline consists of blocking, pair-wise matching of entities, and classification. In order to meet the new needs and requirements, holistic clustering approaches that are capable of scaling to many data sources are needed. Holistic clustering-based ER should further overcome the restriction of pairwise linking by grouping entities from multiple sources into clusters. The clustering step aims at removing false links while adding missing true links across sources. Additionally, incremental clustering and repairing approaches need to be developed to cope with the ever-increasing number of sources and new incoming entities. To this end, we developed novel clustering and repairing schemes for multi-source entity resolution. The approaches are capable of grouping entities from multiple clean (duplicate-free) sources, as well as handling data from an arbitrary combination of clean and dirty sources. The multi-source clustering schemes developed specifically for multi-source ER obtain superior results compared to general-purpose clustering algorithms. Additionally, we developed incremental clustering and repairing methods to handle evolving sources. The proposed incremental approaches are capable of incorporating new sources as well as new entities from existing sources. The more sophisticated approach is able to repair previously determined clusters, and consequently yields improved quality and a reduced dependency on the insertion order of new entities. To ensure scalability, parallel variants of all approaches are implemented on top of the Apache Flink framework, a distributed processing engine. The proposed methods have been integrated in a new end-to-end ER tool named FAMER (FAst Multi-source Entity Resolution system). The FAMER framework comprises Linking and Clustering components encompassing both batch and incremental ER functionality.
The output of the Linking component is recorded as a similarity graph in which each vertex represents an entity and each edge maintains the similarity relationship between two entities. This similarity graph is the input to the Clustering component. Comprehensive comparative evaluations show that the proposed clustering and repairing approaches for both batch and incremental ER achieve high quality while remaining scalable.
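To make the similarity-graph input concrete, here is a toy sketch that builds such a graph and clusters it with plain connected components over high-similarity edges as a naive baseline; FAMER's own clustering and repair schemes are more sophisticated and are not reproduced here, and all source names, record IDs, and the 0.8 threshold are illustrative assumptions.

```python
# Toy similarity graph: vertices are (source, record-id) pairs, edge weights
# are pairwise similarities produced by the Linking step.
import networkx as nx

g = nx.Graph()
g.add_weighted_edges_from([
    (("shopA", "p1"), ("shopB", "x9"), 0.92),
    (("shopB", "x9"), ("shopC", "k4"), 0.88),
    (("shopA", "p2"), ("shopC", "k7"), 0.35),  # weak link, likely a false match
])

# Naive baseline clustering: keep only strong edges, take connected components.
threshold = 0.8
strong = nx.Graph()
strong.add_nodes_from(g.nodes)
strong.add_edges_from((u, v) for u, v, w in g.edges(data="weight") if w >= threshold)
clusters = list(nx.connected_components(strong))
print(clusters)
```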

    Affinity-Based Reinforcement Learning : A New Paradigm for Agent Interpretability

    The steady increase in complexity of reinforcement learning (RL) algorithms is accompanied by a corresponding increase in opacity that obfuscates insights into their devised strategies. Methods in explainable artificial intelligence seek to mitigate this opacity by either creating transparent algorithms or extracting explanations post hoc. A third category exists that allows the developer to affect what agents learn: constrained RL has been used in safety-critical applications and prohibits agents from visiting certain states; preference-based RL agents have been used in robotics applications and learn state-action preferences instead of traditional reward functions. We propose a new affinity-based RL paradigm in which agents learn strategies that are partially decoupled from reward functions. Unlike entropy regularisation, we regularise the objective function with a distinct action distribution that represents a desired behaviour; we encourage the agent to act according to a prior while learning to maximise rewards. The result is an inherently interpretable agent that solves problems with an intrinsic affinity for certain actions. We demonstrate the utility of our method in a financial application: we learn continuous time-variant compositions of prototypical policies, each interpretable by its action affinities, that are globally interpretable according to customers’ financial personalities. Our method combines advantages from both constrained RL and preference-based RL: it retains the reward function but generalises the policy to match a defined behaviour, thus avoiding problems such as reward shaping and hacking. Unlike Boolean task composition, our method is a fuzzy superposition of different prototypical strategies to arrive at a more complex, yet interpretable, strategy.
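Purely as a hedged sketch of the regularisation idea described above (not the paper's exact objective), the snippet below combines a policy-gradient loss with a KL penalty that pulls the learned action distribution toward a desired prior; the loss form, the beta coefficient, and all numbers are illustrative assumptions.

```python
# Generic sketch: policy loss augmented with a KL term toward a prior action
# distribution ("affinity"), so the agent maximises reward while staying close
# to a desired behaviour.
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def regularised_loss(logits, action, advantage, prior, beta=0.1):
    """-advantage * log pi(a) + beta * KL(pi || prior)."""
    pi = softmax(logits)
    pg_term = -advantage * np.log(pi[action] + 1e-12)
    kl_term = np.sum(pi * (np.log(pi + 1e-12) - np.log(prior + 1e-12)))
    return pg_term + beta * kl_term

logits = np.array([0.2, 1.5, -0.3])   # current policy scores for 3 actions
prior = np.array([0.6, 0.3, 0.1])     # desired action affinities
print(regularised_loss(logits, action=1, advantage=2.0, prior=prior))
```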

    Review on recent advances in information mining from big consumer opinion data for product design

    In this paper, based on more than ten years of studies on this dedicated research thrust, a comprehensive review of information mining from big consumer opinion data to assist product design is presented. First, the research background and the essential terminology regarding online consumer opinion data are introduced. Next, studies concerning information extraction and information utilization of big consumer opinion data for product design are reviewed. Studies on information extraction of big consumer opinion data are explained from various perspectives, including data acquisition, opinion target recognition, feature identification and sentiment analysis, opinion summarization and sampling, etc. Reviews on information utilization of big consumer opinion data for product design are explored in terms of how to extract critical customer needs from big consumer opinion data, how to connect the voice of the customers with product design, how to make effective comparisons and reasonable rankings of similar products, and how to identify ever-evolving customer concerns efficiently. Furthermore, significant and practical research trends are highlighted for future studies. This survey will help researchers and practitioners understand the latest developments in studies and applications centered on how big consumer opinion data can be processed, analyzed, and exploited to aid product design.
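As a toy example of the kind of pipeline step surveyed here (feature identification plus sentiment analysis), the sketch below scores product features mentioned in review sentences with a tiny hand-made lexicon; the lexicon, feature list, and reviews are all invented for illustration and much simpler than the methods covered in the review.

```python
# Naive lexicon-based sentiment scoring of review sentences per product feature.
POSITIVE = {"great", "excellent", "sturdy", "fast"}
NEGATIVE = {"poor", "slow", "flimsy", "noisy"}
FEATURES = {"battery", "screen", "keyboard"}

def feature_sentiment(reviews):
    scores = {f: 0 for f in FEATURES}
    for sentence in reviews:
        words = set(sentence.lower().split())
        for feature in FEATURES & words:
            scores[feature] += len(words & POSITIVE) - len(words & NEGATIVE)
    return scores

reviews = ["The battery is excellent", "Screen is great but keyboard feels flimsy"]
print(feature_sentiment(reviews))
```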

    Module Identification for Biological Networks

    Advances in high-throughput techniques have enabled researchers to produce large-scale data on molecular interactions. Systematic analysis of these large-scale interactome datasets, based on their graph representations, has the potential to yield a better understanding of the functional organization of the corresponding biological systems. One way to chart the underlying cellular functional organization is to identify functional modules in these biological networks. However, module identification in biological networks poses several challenges. First, unlike in social and computer networks, molecules work together with diverse interaction patterns, and groups of molecules working together may have very different sizes. Second, the degrees of nodes in biological networks obey a power-law distribution, which means there are many nodes with very low degrees and few nodes with high degrees. Third, molecular interaction data contain a large number of false positives and false negatives. In this dissertation, we propose computational algorithms to overcome these challenges. To identify functional modules based on interaction patterns, we develop efficient algorithms based on the concept of block modeling: we propose a subgradient Frank-Wolfe algorithm with a path generation method to identify functional modules and recognize the functional organization of biological networks. Additionally, inspired by random walks on networks, we propose a novel two-hop random walk strategy to detect fine-sized functional modules based on interaction patterns. To overcome the degree heterogeneity problem, we propose an algorithm to identify functional modules whose topological structure is both densely connected internally and well separated from the rest of the network. To minimize the impact of noisy interactions in biological networks, we propose methods to detect conserved functional modules across multiple biological networks by integrating topological and orthology information across networks. We compare every algorithm we developed with state-of-the-art algorithms on several biological networks. The comparison results on known gold-standard biological function annotations show that our methods can enhance the accuracy of predicting protein complexes and protein functions.
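The following minimal sketch shows one simplified reading of the "two-hop random walk" idea: squaring the one-step transition matrix of a toy interaction network so that node affinities reflect paths of length two. This is an assumption-laden illustration of the general mechanism, not the dissertation's full algorithm, and the toy adjacency matrix is invented.

```python
# Two-hop transition probabilities on a toy undirected interaction network.
import numpy as np

adj = np.array([            # 5 molecules, 1 = observed interaction
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)

deg = adj.sum(axis=1, keepdims=True)
P = adj / deg               # one-step random-walk transition probabilities
P2 = P @ P                  # two-hop transition probabilities
print(np.round(P2, 2))
```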

    An Entropy-Based Position Projection Algorithm for Motif Discovery


    Diversity and Novelty: Measurement, Learning and Optimization

    The primary objective of this dissertation is to investigate research methods that answer the question: "How (and why) does one measure, learn, and optimize the novelty and diversity of a set of items?" The computational models we develop to answer this question also provide foundational mathematical techniques to shed light on the following three questions: 1. How does one reliably measure the creativity of ideas? 2. How does one form teams to evaluate design ideas? 3. How does one filter good ideas out of hundreds of submissions? Solutions to these questions are key to enabling the effective processing of the large collection of design ideas generated in a design contest. In the first part of the dissertation, we discuss key qualities needed in design metrics and propose new diversity and novelty metrics for judging design products. We show that the proposed metrics have higher accuracy and sensitivity than existing alternatives in the literature. To measure the novelty of a design item, we propose learning from human subjective responses to derive low-dimensional triplet embeddings. To measure diversity, we propose an entropy-based diversity metric, which is more accurate and sensitive than benchmarks. In the second part of the dissertation, we introduce the bipartite b-matching problem and argue the need for incorporating diversity into the objective function of matching problems. We propose new submodular and supermodular objective functions to measure diversity and develop multiple matching algorithms for diverse team formation in offline and online cases. Finally, in the third part, we demonstrate filtering and ranking of ideas using diversity metrics based on determinantal point processes as well as submodular functions. In real-world crowd experiments, we demonstrate that such ranking enables increased efficiency in filtering high-quality ideas compared to traditionally used methods.
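As a hedged illustration of what an entropy-style diversity score can look like (an assumed construction for demonstration; the dissertation's own metric may be defined differently), the sketch below computes the Shannon entropy of how a set of ideas spreads across categories: the more evenly the ideas cover distinct categories, the higher the score.

```python
# Shannon-entropy diversity of a set of ideas over (assumed) category labels.
import numpy as np

def entropy_diversity(category_labels):
    """Higher value = ideas spread more evenly across categories."""
    _, counts = np.unique(category_labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

print(entropy_diversity(["ui", "ui", "ui", "hardware"]))        # ~0.56, clumped set
print(entropy_diversity(["ui", "hardware", "eco", "pricing"]))  # ~1.39, diverse set
```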