
    Large scale clustering of protein sequences with FORCE - a layout based heuristic for weighted cluster editing

    Wittkop T, Baumbach J, Lobo FP, Rahmann S. Large scale clustering of protein sequences with FORCE - a layout based heuristic for weighted cluster editing. BMC Bioinformatics. 2007;8(1):396.
    Background: Detecting groups of functionally related proteins from their amino acid sequence alone has been a long-standing challenge in computational genome research. Several clustering approaches, following different strategies, have been published to attack this problem. Today, new sequencing technologies provide huge amounts of sequence data that have to be clustered efficiently, with constant or increased accuracy and at increased speed. Results: We advocate that the model of weighted cluster editing, also known as transitive graph projection, is well suited to protein clustering. We present the FORCE heuristic, which is based on transitive graph projection and clusters arbitrary sets of objects, given pairwise similarity measures. In particular, we apply FORCE to the problem of protein clustering and show that it outperforms the most popular existing clustering tools (Spectral clustering, TribeMCL, GeneRAGE, Hierarchical clustering, and Affinity Propagation). Furthermore, we show that FORCE is able to handle huge datasets by calculating clusters for all 192,187 prokaryotic protein sequences (66 organisms) obtained from the COG database. Finally, FORCE is integrated into the corynebacterial reference database CoryneRegNet. Conclusion: FORCE is an applicable alternative to existing clustering algorithms. Its theoretical foundation, weighted cluster editing, can outperform other clustering paradigms on protein homology clustering. FORCE is open source and implemented in Java. The software, including the source code, the clustering results for COG and CoryneRegNet, and all evaluation datasets are available at http://gi.cebitec.uni-bielefeld.de/comet/force/
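    The objective behind weighted cluster editing can be illustrated with a small sketch. It is not the FORCE heuristic itself (which the abstract describes as layout based and implemented in Java) and it does not search for a clustering. It only scores a given one, under the common formulation in which pairs with positive weight count as edges and pairs with negative weight count as non-edges. The function name, the weight convention, and the toy data are assumptions made for illustration.

```python
def cluster_editing_cost(weights, clusters):
    """Cost of turning the similarity graph into the given disjoint clusters.

    weights:  dict mapping a pair (u, v) to a signed similarity; a positive
              value is treated as an edge, a negative value as a non-edge
              (assumed convention, e.g. similarity minus a threshold).
    clusters: list of disjoint sets covering all nodes.
    """
    # Map every node to the index of the cluster it belongs to.
    label = {node: i for i, members in enumerate(clusters) for node in members}
    cost = 0.0
    for (u, v), w in weights.items():
        same_cluster = label[u] == label[v]
        if w > 0 and not same_cluster:
            cost += w      # an existing edge between clusters must be deleted
        elif w < 0 and same_cluster:
            cost += -w     # a missing edge inside a cluster must be inserted
    return cost


# Hypothetical toy similarities between four objects.
weights = {
    ("a", "b"): 3.0, ("a", "c"): 2.0, ("b", "c"): -1.0,
    ("c", "d"): 4.0, ("a", "d"): -5.0, ("b", "d"): -2.0,
}

for clustering in ([{"a", "b"}, {"c", "d"}],
                   [{"a", "b", "c"}, {"d"}],
                   [{"a", "b", "c", "d"}]):
    print(clustering, cluster_editing_cost(weights, clustering))
```

    A heuristic for this model would look for the clustering with the lowest such cost; the scoring function above only makes the objective concrete.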

    A method for vertical fragmentation of databases and its variant as a partition evaluator

    Distributed database design is an optimization problem that involves solving issues such as data fragmentation and allocation. Typically, the criteria that determine whether fragmentation and allocation are optimal are established independently: first the "best" fragmentation is sought, and then the "best" placement of the resulting fragments. Vertical fragmentation is more complicated than horizontal partitioning because of the larger number of possible alternatives. This work presents a new method for vertical fragmentation based fundamentally on the Attribute Attraction Matrix, which replaces the well-known Attribute Affinity Matrix. The heuristic uses a hierarchical clustering approach and a decision rule based on the internal homogeneity and external heterogeneity of the resulting groups. A variant is also presented so that the method can be used as a partition evaluator. Track: Workshop on Databases and Data Mining (WBDDM). Red de Universidades con Carreras en Informática (RedUNCI).

    A survey of approaches to organizing the physical level in DBMSs

    In this paper we survey various DBMS physical design options. We consider both vertical and horizontal partitioning and briefly cover replication. The survey is not limited to local systems but also includes distributed ones, which receive particular attention because they raise a further question: how to distribute data among several processing nodes. Besides theoretical approaches, we consider practical ones implemented in contemporary commercial DBMSs. We cover these aspects from the perspectives of the user, the architect, and the programmer.

    A physical model describing the interaction of nuclear transport receptors with FG nucleoporin domain assemblies

    The permeability barrier of nuclear pore complexes (NPCs) controls bulk nucleocytoplasmic exchange. It consists of nucleoporin domains rich in phenylalanine-glycine motifs (FG domains). As a bottom-up nanoscale model for the permeability barrier, we have used planar films produced with three different end-grafted FG domains, and quantitatively analyzed the binding of two different nuclear transport receptors (NTRs), NTF2 and Importin β, together with the concomitant film thickness changes. NTR binding caused only moderate changes in film thickness; the binding isotherms showed negative cooperativity and could all be mapped onto a single master curve. This universal NTR binding behavior, a key element for the transport selectivity of the NPC, was quantitatively reproduced by a physical model that treats FG domains as regular, flexible polymers and NTRs as spherical colloids with a homogeneous surface, ignoring the detailed arrangement of interaction sites along FG domains and on the NTR surface.

    Multigranularity Representations for Human Inter-Actions: Pose, Motion and Intention

    Tracking people and their body pose in videos is a central problem in computer vision. Standard tracking representations reason about temporal coherence of detected people and body parts. They have difficulty tracking targets under partial occlusions or rare body poses, where detectors often fail, since the number of training examples is often too small to deal with the exponential variability of such configurations. We propose tracking representations that track and segment people and their body pose in videos by exploiting information at multiple detection and segmentation granularities when available, whole body, parts or point trajectories. Detections and motion estimates provide contradictory information in case of false alarm detections or leaking motion affinities. We consolidate contradictory information via graph steering, an algorithm for simultaneous detection and co-clustering in a two-granularity graph of motion trajectories and detections, that corrects motion leakage between correctly detected objects, while being robust to false alarms or spatially inaccurate detections. We first present a motion segmentation framework that exploits long range motion of point trajectories and large spatial support of image regions. We show resulting video segments adapt to targets under partial occlusions and deformations. Second, we augment motion-based representations with object detection for dealing with motion leakage. We demonstrate how to combine dense optical flow trajectory affinities with repulsions from confident detections to reach a global consensus of detection and tracking in crowded scenes. Third, we study human motion and pose estimation. We segment hard to detect, fast moving body limbs from their surrounding clutter and match them against pose exemplars to detect body pose under fast motion. We employ on-the-fly human body kinematics to improve tracking of body joints under wide deformations. 
    We use motion segmentability of body parts to re-rank a set of body joint candidate trajectories and jointly infer multi-frame body pose and video segmentation. We show empirically that such multi-granularity tracking representations are worthwhile, yielding significantly more accurate multi-object tracking and detailed body pose estimation on popular datasets.

    Multidimensional study of urban squares through perimetral analysis: three Portuguese case studies

    This paper addresses one of the most symbolically and socially meaningful elements of the public open space: the urban square (Portuguese: praça). Besides their urban centrality, these spaces’ potential for liveliness depends on multiple factors and their identity as a place may only be grasped by formal methods that embrace that latent complexity and address the multi-scale and multivariate correlations of factors that defy human cognitive capabilities. This paper will present a synchronic multidimensional analysis of three Portuguese historic squares: Praça da Oliveira, Praça de Santiago (Guimarães) and Praça do Giraldo (Évora), representative of the national historic heritage.

    Spatial compositional turnover varies with trophic level and body size in marine assemblages of micro- and macroorganisms

    Aim: Spatial compositional turnover varies considerably among co-occurring assemblages of organisms, presumably shaped by common processes related to species traits. We investigated patterns of spatial turnover in a diverse set of marine assemblages using zeta diversity, which extends traditional pairwise measures of turnover to capture the roles of both rare and common species in shaping assemblage turnover. We tested the generality of hypothesized patterns related to ecological traits and provide insights into mechanisms of biodiversity change. Location: Temperate pelagic and benthic marine assemblages of micro- and macroorganisms along south-eastern Australia (30–36° S latitude). Time period: 2008–2021. Major taxa studied: Bacteria, phytoplankton, zooplankton, fish, and macrobenthic groups. Methods: Six marine datasets spanning bacteria to fishes were collated for measures of “species” occurrence, with a 1° latitude grain. For each assemblage, ecological traits of body size, habitat and trophic level were analysed for the form and rate of decline in zeta diversity and for the species retention rate. Results: Species at higher trophic levels showed two to three times the rate of zeta diversity decline compared with lower trophic levels, indicating an increase in turnover from phytoplankton to carnivorous fishes. Body size showed the hypothesized unimodal relationship with rates of turnover for macroorganisms. Patterns of bacterial turnover contrasted with those found for macroorganisms, with the highest levels of turnover in pelagic habitats compared with benthic (kelp-associated) habitats. The shape of retention rate curves showed the importance of both rare and common species in driving turnover, a finding that would not have been observable using pairwise (beta diversity) measures of turnover. Main conclusions: Our results support theoretical predictions for phytoplankton and macroorganisms, showing an increase in turnover rate with trophic level, but these predictions did not hold for bacteria. Such deviations from theory need to be investigated further to identify underlying processes that govern microbial assemblage dynamics.
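    As a rough illustration of the zeta diversity measure named in this abstract, the sketch below computes the mean number of species shared by every combination of a given number of sites, which is the usual definition of zeta diversity of that order. The function name and the three-site occurrence data are hypothetical and are not taken from the study, which worked with six marine datasets at a 1° latitude grain.

```python
from itertools import combinations


def zeta_diversity(sites, order):
    """Mean number of species shared across all combinations of `order` sites.

    sites: list of sets, each holding the species recorded at one site.
    order: how many sites are compared at once (1 <= order <= len(sites)).
    """
    shared_counts = [
        len(set.intersection(*combo))   # species present at every site in combo
        for combo in combinations(sites, order)
    ]
    return sum(shared_counts) / len(shared_counts)


# Hypothetical occurrence data for three sites.
sites = [
    {"A", "B", "C"},
    {"A", "B", "D"},
    {"A", "C", "E"},
]

# Order 1 is mean richness, order 2 is mean pairwise overlap, and so on.
for order in range(1, len(sites) + 1):
    print(order, zeta_diversity(sites, order))
```

    How quickly these values fall as the order increases is the kind of decline the abstract refers to; a steeper decline indicates higher turnover.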

    Carbon Sequestration in Synechococcus Sp.: From Molecular Machines to Hierarchical Modeling

    The U.S. Department of Energy recently announced the first five grants for the Genomes to Life (GTL) Program. The goal of this program is to "achieve the most far-reaching of all biological goals: a fundamental, comprehensive, and systematic understanding of life." While more information about the program can be found at the GTL website (www.doegenomestolife.org), this paper provides an overview of one of the five GTL projects funded, "Carbon Sequestration in Synechococcus Sp.: From Molecular Machines to Hierarchical Modeling." This project is a combined experimental and computational effort emphasizing developing, prototyping, and applying new computational tools and methods to elucidate the biochemical mechanisms of the carbon sequestration of Synechococcus Sp., an abundant marine cyanobacterium known to play an important role in the global carbon cycle. Understanding, predicting, and perhaps manipulating carbon fixation in the oceans has long been a major focus of biological oceanography and has more recently been of interest to a broader audience of scientists and policy makers. It is clear that the oceanic sinks and sources of CO2 are important terms in the global environmental response to anthropogenic atmospheric inputs of CO2 and that oceanic microorganisms play a key role in this response. However, the relationship between this global phenomenon and the biochemical mechanisms of carbon fixation in these microorganisms is poorly understood. The project includes five subprojects: an experimental investigation, three computational biology efforts, and a fifth that addresses computational infrastructure challenges of relevance to this project and the Genomes to Life program as a whole. Our experimental effort is designed to provide biology and data to drive the computational efforts and includes significant investment in developing new experimental methods for uncovering protein partners, characterizing protein complexes, and identifying new binding domains. We will also develop and apply new data measurement and statistical methods for analyzing microarray experiments. Our computational efforts include coupling molecular simulation methods with knowledge discovery from diverse biological data sets for high-throughput discovery and characterization of protein-protein complexes and developing a set of novel capabilities for inference of regulatory pathways in microbial genomes across multiple sources of information through the integration of computational and experimental technologies. These capabilities will be applied to Synechococcus regulatory pathways to characterize their interaction map and identify component proteins in these pathways. We will also investigate methods for combining experimental and computational results with visualization and natural language tools to accelerate discovery of regulatory pathways. Furthermore, given that the ultimate goal of this effort is to develop a systems-level understanding of how the Synechococcus genome affects carbon fixation at the global scale, we will develop and apply a set of tools for capturing the carbon fixation behavior of Synechococcus at different levels of resolution. Finally, because the explosion of data being produced by high-throughput experiments requires data analysis and models that are more computationally complex, more heterogeneous, and require coupling to ever increasing amounts of experimentally obtained data in varying formats, we have also established a companion computational infrastructure to support this effort as well as the Genomes to Life program as a whole. Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/63164/1/153623102321112746.pd