77 research outputs found

    SimSearch: A new variant of dynamic programming based on distance series for optimal and near-optimal similarity discovery in biological sequences

    http://www.informatik.uni-trier.de/%7Eley/db/conf/iwpacbb/iwpacbb2008.html
    In this paper, we propose SimSearch, an algorithm implementing a new variant of dynamic programming based on distance series for optimal and near-optimal similarity discovery in biological sequences. The initial phase of SimSearch is devoted to filling the binary similarity matrices by signalling the distances between occurrences of the same symbol. The scoring scheme is applied afterwards, when the maximal extension of the pattern is analysed. Employing bit parallelism to analyse the upper triangle of the global similarity matrix, the new methodology searches the sequence(s) for all exact and approximate patterns in regular or reverse order. The algorithm can be parameterized to work with larger seeds for near-optimal results. Performance tests show a significant efficiency improvement over traditional optimal methods based on dynamic programming. Compared with heuristic-based methods at equal required sensitivity, the proposed algorithm's efficiency remains acceptable.
    This work has been partially supported by PRODEP
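    The abstract describes filling binary similarity matrices from the distances between occurrences of the same symbol and scanning the upper triangle of the matrix with bit parallelism. The following is a minimal Python sketch of that general idea, not the SimSearch implementation: it assumes exact matches only, uses Python integers as bit vectors (one per symbol), and reads each diagonal of the upper triangle (offset `d`, the distance between two occurrences) as a single bit vector. The function names and the `seed` threshold are hypothetical; approximate patterns, the scoring scheme and reverse-order search are not covered.

```python
from typing import Dict, List, Tuple


def symbol_masks(seq: str) -> Dict[str, int]:
    """One bit vector per symbol: bit i is set iff seq[i] == symbol."""
    masks: Dict[str, int] = {}
    for i, ch in enumerate(seq):
        masks[ch] = masks.get(ch, 0) | (1 << i)
    return masks


def diagonal_matches(masks: Dict[str, int], d: int) -> int:
    """Diagonal at offset d of the upper triangle: bit i set iff seq[i] == seq[i + d]."""
    diag = 0
    for mask in masks.values():
        # mask >> d moves the occurrence at position i + d down to bit i
        diag |= mask & (mask >> d)
    return diag


def maximal_runs(bits: int, min_len: int) -> List[Tuple[int, int]]:
    """(start, length) of every maximal run of 1-bits whose length is >= min_len."""
    runs: List[Tuple[int, int]] = []
    i = 0
    while bits:
        if bits & 1:
            start = i
            while bits & 1:
                bits >>= 1
                i += 1
            if i - start >= min_len:
                runs.append((start, i - start))
        else:
            bits >>= 1
            i += 1
    return runs


def exact_repeats(seq: str, seed: int = 3) -> List[Tuple[int, int, int]]:
    """(first_start, second_start, length) of every diagonal-maximal exact repeat >= seed."""
    masks = symbol_masks(seq)
    found: List[Tuple[int, int, int]] = []
    for d in range(1, len(seq)):
        for start, length in maximal_runs(diagonal_matches(masks, d), seed):
            found.append((start, start + d, length))
    return found


if __name__ == "__main__":
    print(exact_repeats("ACGTACGT", seed=4))  # [(0, 4, 4)]
```

    Processing a whole diagonal as one integer is what makes the approach bit-parallel: a single shift-and-AND per symbol compares every pair of positions at distance `d` at once.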

    DNA Hash Pooling and its Applications

    In this paper we describe a new technique for the comparison of populations of DNA strands. Comparison is vital to the study of ecological systems, at both the micro and macro scales. Existing methods rely on DNA sequencing and cloning, which can be costly and time-consuming, even with current sequencing techniques. Our overall objective is to address questions such as: (i) (Genome detection) Is a known genome sequence present, at least in part, in an environmental sample? (ii) (Sequence query) Is a specific fragment sequence present in a sample? (iii) (Similarity discovery) How similar in terms of sequence content are two unsequenced samples? We propose a method involving multiple filtering criteria that yield "pools" of DNA of high or very high purity. Because our method is similar in spirit to hashing in computer science, we call it DNA hash pooling. To illustrate this method, we describe protocols using pairs of restriction enzymes. The in silico empirical results we present reflect a sensitivity to experimental error. Our method will normally be performed as a filtering step prior to sequencing in order to reduce the amount of sequencing required (generally by a factor of 10 or more). Even as sequencing becomes cheaper, an order-of-magnitude reduction remains important.
    Comment: 14 pages, 3 figures. To appear in the International Journal of Nanotechnology and Molecular Computation. Improved background, analysis and reference
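    The abstract only outlines the filtering idea, so the following Python sketch is a hypothetical illustration rather than the authors' protocol. It assumes an in silico digest with two enzymes, using the EcoRI (`GAATTC`) and HindIII (`AAGCTT`) recognition sites as illustrative choices; it keeps only fragments bounded by one site of each kind and falling within an arbitrary length window, then compares two samples by the Jaccard index of their pools. Fragment boundaries are simplified to site start positions rather than true cut positions; both sites are palindromic, so the reverse-complement strand needs no separate scan. The names `hash_pool` and `jaccard` and the length bounds are assumptions.

```python
import re
from typing import List, Set, Tuple

# Illustrative enzyme pair (an assumption, not taken from the paper).
SITE_A = "GAATTC"  # EcoRI recognition site
SITE_B = "AAGCTT"  # HindIII recognition site


def site_positions(seq: str, site: str, label: str) -> List[Tuple[int, str]]:
    """Start positions of a recognition site; the lookahead also reports overlaps."""
    return [(m.start(), label) for m in re.finditer(f"(?={site})", seq)]


def hash_pool(seq: str, min_len: int = 20, max_len: int = 500) -> Set[str]:
    """Fragments between consecutive sites of *different* enzymes, within a length window."""
    sites = sorted(site_positions(seq, SITE_A, "A") + site_positions(seq, SITE_B, "B"))
    pool: Set[str] = set()
    for (p, left), (q, right) in zip(sites, sites[1:]):
        fragment = seq[p:q]
        # Both filtering criteria must hold for a fragment to enter the pool.
        if left != right and min_len <= len(fragment) <= max_len:
            pool.add(fragment)
    return pool


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Similarity of two pools; defined as 1.0 when both are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    sample = SITE_A + "T" * 20 + SITE_B + "C" * 30 + SITE_A
    pool = hash_pool(sample)
    print(sorted(len(f) for f in pool))  # [26, 36]
    print(jaccard(pool, hash_pool(sample, min_len=30)))  # 0.5
```

    Each added criterion (a second enzyme, a tighter length window) shrinks the pool, which is the sense in which filtering reduces the amount of material that would have to be sequenced.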

    Scalable factorization model to discover implicit and explicit similarities across domains

    University of Technology Sydney. Faculty of Engineering and Information Technology.
    E-commerce businesses increasingly depend on recommendation systems to introduce personalized services and products to their target customers. Achieving accurate recommendations requires a sufficient understanding of user preferences and item characteristics. With current innovations on the Web, coupled datasets are abundantly available across domains. An analysis of these datasets can provide broader knowledge of the underlying relationships between users and items. This deeper understanding strengthens collaborative filtering and leads to higher recommendation accuracy. However, how to use this knowledge effectively for recommendation remains a challenging problem. In this research, we propose to exploit both explicit and implicit similarities extracted from latent factors across domains with matrix tri-factorization. On the coupled dimensions, the common parts of the coupled factors are shared across domains, while their domain-specific parts are preserved. We show that such a configuration of both common and domain-specific parts benefits cross-domain recommendations significantly. Moreover, on the non-coupled dimensions, we propose using the middle factor of the tri-factorization to match closely related clusters across datasets and to align the matched ones in order to transfer cross-domain implicit similarities, further improving the recommendation. Furthermore, when data are coupled from different sources, the scalability of the analytical method is another significant concern. We design a distributed factorization model that can scale up as the observed data across domains increase. Our data parallelism, based on Apache Spark, enables the model to achieve the smallest communication cost. In addition, the model is equipped with an optimized solver that converges faster. We demonstrate that these key features stabilize our model's performance as the data grow. Validated on real-world datasets, our model outperforms existing algorithms in recommendation accuracy and scalability. These empirical results illustrate the potential of our research in exploiting both explicit and implicit similarities across domains to improve recommendation performance
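    The idea of sharing the common parts of coupled factors while preserving domain-specific parts can be illustrated on a toy problem. The following NumPy sketch is a simplification under stated assumptions, not the authors' model: two rating matrices coupled on their rows (users) are each approximated as `U S V^T`, where each domain's user factor is the concatenation of a shared block `C` and a domain-specific block (`D1` or `D2`), and all factors are fitted by plain full-batch gradient descent on the squared reconstruction error. Cluster matching on the non-coupled dimensions, the Spark-based distribution and the optimized solver are not modelled; the function name, ranks, initialization scale and learning rate are hypothetical.

```python
import numpy as np


def coupled_trifactorization(X1, X2, k_common=2, k_specific=1, k_cols=3,
                             lr=0.005, epochs=3000, seed=0):
    """Fit X1 ~ [C|D1] S1 V1^T and X2 ~ [C|D2] S2 V2^T; C is shared by both domains."""
    n = X1.shape[0]
    if X2.shape[0] != n:
        raise ValueError("the two domains must be coupled on the row (user) dimension")
    rng = np.random.default_rng(seed)
    k = k_common + k_specific

    def init(*shape):
        # Small random start so the product U S V^T is close to zero initially.
        return rng.normal(scale=0.3, size=shape)

    C, D1, D2 = init(n, k_common), init(n, k_specific), init(n, k_specific)
    S1, S2 = init(k, k_cols), init(k, k_cols)
    V1, V2 = init(X1.shape[1], k_cols), init(X2.shape[1], k_cols)
    losses = []
    for _ in range(epochs):
        U1, U2 = np.hstack([C, D1]), np.hstack([C, D2])
        E1 = U1 @ S1 @ V1.T - X1  # residuals of domain 1
        E2 = U2 @ S2 @ V2.T - X2  # residuals of domain 2
        losses.append(0.5 * (np.sum(E1 ** 2) + np.sum(E2 ** 2)))
        # Gradients of 0.5 * ||U S V^T - X||_F^2 with respect to U, S and V.
        gU1, gU2 = E1 @ V1 @ S1.T, E2 @ V2 @ S2.T
        gS1, gS2 = U1.T @ E1 @ V1, U2.T @ E2 @ V2
        gV1, gV2 = E1.T @ U1 @ S1, E2.T @ U2 @ S2
        # The shared block receives gradient from both domains,
        # the domain-specific blocks only from their own domain.
        C -= lr * (gU1[:, :k_common] + gU2[:, :k_common])
        D1 -= lr * gU1[:, k_common:]
        D2 -= lr * gU2[:, k_common:]
        S1 -= lr * gS1
        S2 -= lr * gS2
        V1 -= lr * gV1
        V2 -= lr * gV2
    return {"C": C, "D1": D1, "D2": D2, "S1": S1, "S2": S2,
            "V1": V1, "V2": V2, "losses": losses}


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X1 = rng.random((6, 5))  # domain 1: 6 users x 5 items
    X2 = rng.random((6, 4))  # domain 2: the same 6 users x 4 other items
    model = coupled_trifactorization(X1, X2)
    print(model["losses"][0], model["losses"][-1])
```

    Summing the two domains' gradients on `C` is what lets knowledge flow between domains, while `D1` and `D2` keep whatever structure only one domain exhibits.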

    Melis: an incremental method for the lexical annotation of domain ontologies

    In this paper, we present MELIS (Meaning Elicitation and Lexical Integration System), a method and a software tool enabling an incremental process of automatic annotation of local schemas (e.g. relational database schemas, directory trees) with lexical information. The distinguishing and original feature of MELIS is its incremental process: the more schemas are processed, the more background/domain knowledge accumulates in the system (a portion of the domain ontology is learned at every step), and the better the system performs when annotating new schemas. MELIS has been tested as a component of MOMIS-Ontology Builder, a framework able to create a domain ontology representing a set of selected data sources, described with a standard W3C language wherein concepts and attributes are annotated according to the lexical reference database. We describe the MELIS component within the MOMIS-Ontology Builder framework and provide some experimental results of MELIS as a standalone tool and as a component integrated in MOMIS

    Meta-learning

    In: Encyclopedia of Systems Biology, W. Dubitzky, O. Wolkenhauer, K-H Cho, H. Yokota (Eds.), Springer 2011
    Meta-learning methods are aimed at the automatic discovery of interesting models of data. They belong to a branch of machine learning that tries to replace the human experts involved in the data mining process of creating various computational models that learn from data

    Metaheuristic Based Clustering Algorithms for Biological Hypergraphs

    Hypergraphs are widely used for modeling and representing relationships between entities; one field where their application is prolific is bioinformatics. In the present era of big data, the size and complexity of these hypergraphs grow exponentially, making it impossible to process them manually or even to visualize their interconnectivity superficially. A common approach to tackling this complexity is to cluster similar data nodes together in order to create a more comprehensible representation. This enables similarity discovery and, hence, the extraction of hidden knowledge within the hypergraphs. Several state-of-the-art algorithms have been proposed for partitioning and clustering hypergraphs. Nevertheless, several issues remain unresolved, and existing algorithms can still be improved, especially in scalability and clustering quality. This article presents a concise survey of hypergraph-clustering algorithms with an emphasis on knowledge representation in systems biomedicine. It also suggests a novel approach to assessing clustering quality by means of cluster-quality metrics that combine expert knowledge and measurable objective distances in existing biological ontologies

    Spotify: A Strategic Analysis of its Strengths, Weaknesses, Opportunities, and Threats

    Spotify is an online streaming platform that allows consumers to listen to music and podcasts either for free with advertising or, by paying a monthly subscription, without ads. Since its founding in 2006, it has exploded onto the music scene, taking hold of a significant share of the media and entertainment market. As always in the business world, however, there is room for growth. The purpose of this analysis is to evaluate how Spotify is performing as a company, both internally and compared to its competitors. After exploring Spotify’s business practices and the state of its industry through business articles, industry analyses, and financial reports, this analysis determines Spotify’s strengths and weaknesses, as well as the opportunities and threats to its ability to increase profitability. Findings from this analysis show that Spotify is struggling to compete with rival companies in the Movies & Entertainment sub-industry, primarily due to the high cost of licensing fees paid to artists in the music industry. This report summarizes the information from the internal and external analyses and provides recommendations for Spotify’s next steps toward achieving growth and competitive advantage in its sub-industry. These recommendations include continuing the horizontal integration of non-music audio products, expanding into untapped international markets, and financing original content to create a revenue stream that is not significantly hampered by the ongoing cost of licensing from third parties