    Ab initio detection of fuzzy amino acid tandem repeats in protein sequences

    Background: Tandem repetitions within protein amino acid sequences often correspond to regular secondary structures and form multi-repeat 3D assemblies of varied size and function. Developing internal repetitions is one of the evolutionary mechanisms that proteins employ to adapt their structure and function under evolutionary pressure. While there is keen interest in understanding such phenomena, detecting repeating structures from sequence analysis alone is considered an arduous task, since structure and function are often preserved even under considerable sequence divergence (fuzzy tandem repeats). Results: In this paper we present PTRStalker, a new algorithm for ab initio detection of fuzzy tandem repeats in protein amino acid sequences. We show that, when PTRStalker is run on amino acid sequences from the UniProtKB/Swiss-Prot database, it detects novel tandemly repeated structures not captured by other state-of-the-art tools. Experiments with membrane proteins indicate that PTRStalker can detect global symmetries in the primary structure that are then reflected in the tertiary structure. Conclusions: PTRStalker detects fuzzy tandem repeating structures in protein sequences with performance beyond the current state of the art. Such a tool may be a valuable aid in investigating protein structural properties when X-ray data on the tertiary structure are not available.
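
    PTRStalker's internals are not described here. As a much simpler illustration of what "fuzzy" periodicity means, the sketch below scores each candidate period p by the fraction of positions where residue i equals residue i + p, and reports the shortest period that clears a threshold. This is a generic lag-based self-similarity check, not PTRStalker; the function names and the 0.6 threshold are assumptions. It tolerates substitutions but not insertions or deletions, which shift the alignment.

```python
def self_match_fraction(seq, period):
    """Fraction of positions i with seq[i] == seq[i + period]."""
    n = len(seq) - period
    if n <= 0:
        return 0.0
    return sum(seq[i] == seq[i + period] for i in range(n)) / n


def shortest_repeat_period(seq, min_fraction=0.6, min_period=1, max_period=None):
    """Return (period, fraction) for the shortest lag whose self-match fraction
    reaches min_fraction, or None. Hypothetical helper; threshold is an assumption."""
    if max_period is None:
        max_period = len(seq) // 2
    for p in range(min_period, max_period + 1):
        f = self_match_fraction(seq, p)
        if f >= min_fraction:
            return p, f
    return None
```

    For example, six copies of the hypothetical heptad LEELLKK with two substituted residues still yield period 7 with a self-match of about 0.89. Taking the shortest qualifying lag, rather than the best-scoring one, avoids reporting multiples of the true period (14, 21, ...), which can score higher because fewer positions are compared near the end of the sequence.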

    TRStalker: an efficient heuristic for finding fuzzy tandem repeats

    Motivation: Genomes of higher eukaryotic organisms contain a substantial amount of repeated sequences. Tandem Repeats (TRs) constitute a large class of repetitive sequences that originate via phenomena such as replication slippage and are characterized by close spatial contiguity. They play an important role in several molecular regulatory mechanisms and in several diseases (e.g. the trinucleotide repeat disorders). While current methods are rather effective for TRs with a low or medium level of divergence, the problem of detecting TRs with higher divergence (fuzzy TRs) is still open. Detecting fuzzy TRs is a prerequisite for enriching our view of their role in regulatory mechanisms and diseases. Fuzzy TRs are also important as tools to shed light on the evolutionary history of the genome, where higher divergence correlates with more remote duplication events.
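
    One way to make "level of divergence" concrete is to cut a candidate TR region into consecutive copies of the repeat unit and average the normalized edit distance between adjacent copies. The sketch below does this with a textbook Levenshtein distance. It is an illustrative assumption, not the TRStalker heuristic, and its fixed-length slicing is only approximate once indels change the copy lengths.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (unit-cost insert/delete/substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (ca != cb))) # substitution or match
        prev = cur
    return prev[-1]


def adjacent_copy_divergence(region, period):
    """Mean edit distance between adjacent copies, divided by the unit length.
    Hypothetical helper: a trailing partial copy is ignored."""
    copies = [region[i:i + period] for i in range(0, len(region) - period + 1, period)]
    if len(copies) < 2:
        raise ValueError("need at least two full copies of the unit")
    dists = [levenshtein(x, y) / period for x, y in zip(copies, copies[1:])]
    return sum(dists) / len(dists)
```

    For a CAG region containing one CTG copy (CAGCAGCTGCAG, period 3) the score is 2/9, about 0.22, while a perfect repeat scores 0; fuzzy TRs sit at the high end of such a scale.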

    Active and poised promoter states drive folding of the extended HoxB locus in mouse embryonic stem cells

    Gene expression states influence the three-dimensional conformation of the genome through poorly understood mechanisms. Here, we investigate the conformation of the murine HoxB locus, a gene-dense genomic region containing closely spaced genes with distinct activation states in mouse embryonic stem (ES) cells. To predict possible folding scenarios, we performed computer simulations of polymer models informed with different chromatin occupancy features, which define promoter activation states or CTCF binding sites. Single-cell imaging of locus folding was performed to test the model predictions. While CTCF occupancy alone fails to predict the in vivo folding at a genomic length scale of 10 kb, we found that homotypic interactions between active and Polycomb-repressed promoters co-occurring in the same DNA fibre fully explain the HoxB folding patterns imaged in single cells. We identify state-dependent promoter interactions as major drivers of chromatin folding in gene-dense regions.

    Automatic Structured Query Transformation Over Distributed Digital Libraries

    Structured data and complex schemas are becoming the main way to represent the information many Digital Libraries provide, thus impacting the services they offer. When searching for information across distributed Digital Libraries with heterogeneous schemas, a structured query expressed in a given schema (the global or target schema) has to be transformed into a query over the schema of the digital library it will be submitted to (the source schema). Schema mappings define the rules for this query transformation, and schema matching is the problem of learning these mappings. In this paper we address the issue of automatically learning these mappings and transforming a structured query over the target schema into a new structured query over the source schema. We propose a simple and effective schema matching method based on the well-known CORI selection algorithm, together with two ways of applying it. By evaluating the effectiveness of the resulting structured queries, we show that the method works well for accessing distributed, heterogeneous digital libraries.
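
    The abstract does not spell out how CORI is adapted to schema matching. One plausible reading, offered purely as an assumption, treats each source-schema field as a "collection" whose "documents" are the values stored under it, and ranks fields for a query term with the standard CORI belief p = b + (1 - b) * T * I, using the constants 50 and 150 and the default b = 0.4 commonly used with CORI. The helper names, field names and data are hypothetical.

```python
import math


def cori_beliefs(term, fields, b=0.4):
    """CORI belief for `term` in each source field.
    fields: dict mapping a field name to a list of documents (each a list of terms)."""
    C = len(fields)                                              # number of "collections"
    cw = {f: sum(len(d) for d in docs) for f, docs in fields.items()}  # words per field
    avg_cw = sum(cw.values()) / C
    df = {f: sum(term in d for d in docs) for f, docs in fields.items()}
    cf = sum(1 for f in fields if df[f] > 0)                     # fields containing the term
    if cf == 0:
        return {f: b for f in fields}                            # term unseen: default belief
    I = math.log((C + 0.5) / cf) / math.log(C + 1.0)
    return {f: b + (1 - b) * (df[f] / (df[f] + 50 + 150 * cw[f] / avg_cw)) * I
            for f in fields}


def match_field(query_terms, fields):
    """Hypothetical mapping step: pick the source field with the highest summed belief."""
    totals = {f: 0.0 for f in fields}
    for t in query_terms:
        for f, p in cori_beliefs(t, fields).items():
            totals[f] += p
    return max(totals, key=totals.get)
```

    A target-schema condition such as author = "rossi" would then be rewritten against whichever source field `match_field` returns; how the paper actually applies CORI (and its two variants) may differ.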

    Web Metasearch: Rank vs. Score Based Rank Aggregation Methods

    Given a set of rankings, ranking fusion is the problem of combining these lists so as to optimize the performance of the combination. The ranking fusion problem arises in many situations, and metasearch is a prominent one: it deals with combining the result lists returned by multiple search engines in response to a given query, where each item in a result list has a rank and a relevance score assigned by its search engine. Several ranking fusion methods have been proposed in the literature. They can be classified by whether (i) they rely on the rank, (ii) they rely on the score, and (iii) they require training data. Our paper makes the following contributions: (i) we report experimental results for the Markov chain rank-based methods, for which no large-scale experimental tests have yet been made; (ii) while the rank-based method Borda Count is believed to be competitive with score-based methods, we show that this is not true for metasearch; and (iii) we show that Markov chain-based methods compete with score-based methods. This is especially important in the context of metasearch, as scores are usually not available from the search engines.
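
    To make the two rank-based families concrete, here is a minimal sketch of Borda Count and of an MC4-style Markov chain aggregator, one of the Markov chain heuristics from the rank aggregation literature. It is not the paper's code. The handling of partial lists (an item present in a list beats one absent from it), the "more lists prefer q than p" move rule, and the uniform teleport with damping 0.85 are all assumptions.

```python
from collections import defaultdict


def borda(rankings):
    """Borda Count: in a list of length n, the item at 0-based position r gets n - r points.
    Assumption: an item missing from a list receives 0 points from that list."""
    scores = defaultdict(float)
    for ranking in rankings:
        n = len(ranking)
        for r, item in enumerate(ranking):
            scores[item] += n - r
    # Highest total first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda x: (-scores[x], x))


def mc4(rankings, damping=0.85, iters=200):
    """MC4-style Markov chain aggregation (a sketch, not a reference implementation)."""
    items = sorted({x for ranking in rankings for x in ranking})
    n = len(items)
    idx = {x: i for i, x in enumerate(items)}
    positions = [{x: r for r, x in enumerate(ranking)} for ranking in rankings]

    def ranks_above(pos, a, b):
        # Assumption: an item present in a list beats one absent from it.
        return a in pos and (b not in pos or pos[a] < pos[b])

    def majority_prefers(q, p):
        votes = sum(ranks_above(pos, q, p) for pos in positions)
        against = sum(ranks_above(pos, p, q) for pos in positions)
        return votes > against

    # From state p, pick q uniformly among all n items; move to q if more lists
    # place q above p than p above q, otherwise stay at p.
    trans = [[0.0] * n for _ in range(n)]
    for p in items:
        i = idx[p]
        for q in items:
            if q != p and majority_prefers(q, p):
                trans[i][idx[q]] = 1.0 / n
        trans[i][i] = 1.0 - sum(trans[i])

    # Power iteration with a uniform teleport term (assumption) so the chain is ergodic.
    pi = [1.0 / n] * n
    for _ in range(iters):
        nxt = [(1.0 - damping) / n] * n
        for i in range(n):
            for j in range(n):
                nxt[j] += damping * pi[i] * trans[i][j]
        pi = nxt
    # States with the highest stationary probability rank first.
    return sorted(items, key=lambda x: (-pi[idx[x]], x))
```

    With three hypothetical result lists [a, b, c], [a, c, b] and [b, a, c], both functions return [a, b, c]. Neither method looks at relevance scores, only at positions, which is what makes rank-based fusion usable when search engines do not expose scores.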

    The Price of Privacy Control in Mobility Sharing

    © 2020 The Society of Urban Technology. One of the main features of mobility-sharing applications is the exposure of the personal data provided to the system. Transportation and location data can reveal personal habits, preferences, and behaviors, and riders may prefer not to share the exact location of their origin and/or destination. But what is the price of privacy in terms of the decreased efficiency of a mobility-sharing system? In this paper we address privacy issues from this point of view and show how location privacy-preserving techniques could affect the performance of mobility-sharing applications, in terms of both system efficiency and quality of service. To this end, we first apply different data-masking techniques to anonymize geographical information, and then compare the performance of shareability network-based trip-matching algorithms for ride-sharing when applied to real data and to privacy-preserving data. The goal of the paper is to evaluate the performance of privacy-preserving mobility-sharing systems and to shed light on the trade-off between data privacy and its costs. The results show that the increase in total traveled distance due to the introduction of data privacy could be bounded if users are willing to spend (or “pay”) more time in order to share a trip, meaning that location data privacy affects both efficiency and quality of service.
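
    The abstract does not name the data-masking techniques it applies. As an assumption, the sketch below shows two common ones for a pickup or drop-off point: snapping it to the centre of a fixed grid cell (spatial generalization) and displacing it to a uniformly random point within a given radius (perturbation). The haversine distance between the true and masked points is the location error that a trip-matching algorithm would then have to absorb as extra detour. All names and parameters are hypothetical.

```python
import math
import random

EARTH_RADIUS_M = 6_371_000.0
M_PER_DEG = math.pi * EARTH_RADIUS_M / 180.0  # metres per degree of latitude on a sphere


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def snap_to_grid(lat, lon, cell_deg):
    """Spatial generalization: replace a point with the centre of its grid cell."""
    return (math.floor(lat / cell_deg) * cell_deg + cell_deg / 2,
            math.floor(lon / cell_deg) * cell_deg + cell_deg / 2)


def random_displacement(lat, lon, radius_m, rng=random):
    """Perturbation: move a point to a uniformly random location within radius_m metres.
    Uses a local flat-Earth approximation, adequate for radii of a few kilometres."""
    theta = rng.uniform(0.0, 2.0 * math.pi)
    d = radius_m * math.sqrt(rng.random())  # sqrt gives a uniform density over the disk
    dlat = d * math.cos(theta) / M_PER_DEG
    dlon = d * math.sin(theta) / (M_PER_DEG * math.cos(math.radians(lat)))
    return lat + dlat, lon + dlon
```

    A larger radius or a coarser grid gives stronger privacy but larger location errors, which is the kind of trade-off the paper measures in terms of traveled distance and time.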