136 research outputs found

    Mining the VVV: star formation and embedded clusters

    The aim of this study is to locate previously unknown stellar clusters in catalogue data from the VISTA Variables in the Vía Láctea (VVV) survey. The method, which fits a mixture model of Gaussian densities and background noise to pre-filtered near-infrared (NIR) stellar catalogue data using the expectation-maximization algorithm, was developed by the authors for the UKIDSS Galactic Plane Survey (GPS). The search located 88 previously unknown, mainly embedded, stellar cluster candidates and 39 previously unknown sites of star formation in the 562 deg² covered by VVV in the Galactic bulge and the southern disk.
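The mixture-model idea in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes a single isotropic 2-D Gaussian cluster on top of a uniform background over a field of known area, and the names `simulate_field` and `em_cluster_plus_background` and all parameter values are hypothetical.

```python
import math
import random


def simulate_field(n_background, n_cluster, center, sigma, seed=0):
    """Toy data: uniform background on the unit square plus one Gaussian clump."""
    rng = random.Random(seed)
    points = [(rng.random(), rng.random()) for _ in range(n_background)]
    points += [(rng.gauss(center[0], sigma), rng.gauss(center[1], sigma))
               for _ in range(n_cluster)]
    return points


def em_cluster_plus_background(points, area, init_center, init_sigma,
                               init_weight=0.5, n_iter=200, min_sigma=1e-3):
    """Fit one isotropic 2-D Gaussian plus a uniform background by EM.

    Returns (weight, (cx, cy), sigma, responsibilities), where weight is the
    estimated fraction of points that belong to the Gaussian component.
    """
    weight, (cx, cy), sigma = init_weight, init_center, init_sigma
    background_density = 1.0 / area  # assumption: background is flat over the field
    resp = [0.0] * len(points)

    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to the cluster.
        var = sigma * sigma
        norm = 1.0 / (2.0 * math.pi * var)
        for idx, (x, y) in enumerate(points):
            d2 = (x - cx) ** 2 + (y - cy) ** 2
            g = weight * norm * math.exp(-d2 / (2.0 * var))
            b = (1.0 - weight) * background_density
            resp[idx] = g / (g + b)

        # M-step: re-estimate weight, center and width from the responsibilities.
        total = sum(resp)
        if total <= 0.0:
            break
        weight = min(max(total / len(points), 1e-9), 1.0 - 1e-9)
        cx = sum(r * x for r, (x, _) in zip(resp, points)) / total
        cy = sum(r * y for r, (_, y) in zip(resp, points)) / total
        var = sum(r * ((x - cx) ** 2 + (y - cy) ** 2)
                  for r, (x, y) in zip(resp, points)) / (2.0 * total)
        sigma = max(math.sqrt(var), min_sigma)

    return weight, (cx, cy), sigma, resp
```

For example, `em_cluster_plus_background(simulate_field(200, 100, (0.6, 0.4), 0.03), 1.0, (0.5, 0.5), 0.2)` fits the toy field starting from a broad initial guess at the field center. A real search would need several components, elliptical covariances, and a significance test for each candidate.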

    Mihin algoritmeja tarvitaan? (What are algorithms needed for?)

    Primality testing and bioinformatics are topical areas of algorithm research. Both are interesting from the standpoint of algorithm theory as well as applications. A recently presented fast primality test solved a classical number-theoretic problem and at the same time gave new momentum to research on data-security methods. Bioinformatics, in turn, has become a rapidly expanding multidisciplinary field as a result of developments in the new molecular biology, and it offers new kinds of challenges for algorithm research. Finnish researchers have been involved in developing bioinformatics algorithms from the very beginning.

    Efficient algorithms for the discovery of gapped factors

    Background: The discovery of surprisingly frequent patterns is of paramount interest in bioinformatics and computational biology. Among the patterns considered, those consisting of pairs of solid words that co-occur within a prescribed maximum distance, or gapped factors, emerge in a variety of contexts of DNA and protein sequence analysis. A few algorithms and tools have been developed in connection with specific formulations of the problem; however, none can comprehensively handle each of the multiple ways in which the distance between the two terms in a pair may be defined. Results: This paper presents efficient algorithms and tools for the extraction of all pairs of words, up to an arbitrarily large length, that co-occur surprisingly often in close proximity within a sequence. Whereas the number of such pairs in a sequence of n characters can be Θ(n⁴), it is shown that an exhaustive discovery process can be carried out in O(n²) or O(n³), depending on the way distance is measured. This is made possible by a prudent combination of properties of pattern maximality and monotonicity of scores, which reduces the number of word pairs to be weighed explicitly while still producing the scores attained by any of the pairs not explicitly considered. We applied our approach to the discovery of spaced dyads in DNA sequences. Conclusions: Experiments on biological datasets show that the method is effective and much faster than exhaustive enumeration of candidate patterns. Software is freely available to academic users via the web interface.
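To make the object being counted concrete, the sketch below does the exhaustive enumeration that the abstract describes as the slow baseline, not the paper's maximality-based algorithms. It assumes one particular distance definition (the gap between the end of the first word and the start of the second) and a single fixed word length; `count_gapped_pairs` and its parameters are hypothetical names.

```python
from collections import Counter


def count_gapped_pairs(seq, word_len, max_gap):
    """Count ordered word pairs (u, v) of length word_len where v starts
    between 0 and max_gap characters after u ends (brute force)."""
    counts = Counter()
    n = len(seq)
    for i in range(n - word_len + 1):
        first = seq[i:i + word_len]
        for gap in range(max_gap + 1):
            j = i + word_len + gap  # start of the second word
            if j + word_len > n:
                break  # second word would run past the end of the sequence
            counts[(first, seq[j:j + word_len])] += 1
    return counts
```

For instance, `count_gapped_pairs("ACGTACGT", 2, 1)` records the pair `("AC", "GT")` twice. Deciding which counts are surprising would additionally require an expected-frequency model, which this sketch leaves out.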

    Longest common substrings with k mismatches

    The longest common substring with k-mismatches problem is to find, given two strings S₁ and S₂, a longest substring A₁ of S₁ and A₂ of S₂ such that the Hamming distance between A₁ and A₂ is at most k.
    Peer reviewed
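A minimal sketch of one straightforward way to solve this problem, assuming the mismatch bound is "Hamming distance at most k": for every alignment (diagonal) of the two strings, slide a window that never contains more than k mismatches. This takes O(nm) time and constant extra space, and it is offered as an illustration rather than as the algorithm of the paper. The function name is hypothetical.

```python
def longest_common_substring_k_mismatches(s1, s2, k):
    """Return (length, start1, start2) of a longest pair of equal-length
    substrings of s1 and s2 whose Hamming distance is at most k."""
    n, m = len(s1), len(s2)
    best = (0, 0, 0)
    # Diagonal d aligns s1[i] with s2[i + d].
    for d in range(-(n - 1), m):
        i = max(0, -d)           # first aligned position in s1
        j = i + d                # first aligned position in s2
        length = min(n - i, m - j)
        left = 0                 # left edge of the current window
        mismatches = 0
        for right in range(length):
            if s1[i + right] != s2[j + right]:
                mismatches += 1
            # Shrink from the left until the window is valid again.
            while mismatches > k:
                if s1[i + left] != s2[j + left]:
                    mismatches -= 1
                left += 1
            window = right - left + 1
            if window > best[0]:
                best = (window, i + left, j + left)
    return best
```

With `k = 0` this reduces to the ordinary longest common substring problem, so `longest_common_substring_k_mismatches("banana", "ananas", 0)` reports a match of length 5.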

    MODER2: First-order Markov Modeling and Discovery of Monomeric and Dimeric Binding Motifs

    Motivation: Position-specific probability matrices (PPMs, also called position-specific weight matrices) have been the dominant model for transcription factor (TF)-binding motifs in DNA. There is, however, increasing recent evidence that higher-order models, such as Markov models of order one, also called adjacent dinucleotide matrices (ADMs), perform better. Unlike PPMs, ADMs can model dependencies between adjacent nucleotides. A modeling technique and software tool that estimates such models simultaneously for monomers and their dimers has been missing. Results: We present an ADM-based mixture model for monomeric and dimeric TF-binding motifs and an expectation-maximization algorithm, MODER2, for learning such models from training data and seeds. The model is a mixture that includes monomers and dimers, built from the monomers, with a description of the dimeric structure (spacing, orientation). The technique is modular, meaning that the co-operative effect of dimerization is made explicit by evaluating the difference between expected and observed models. The model is validated using HT-SELEX and generated datasets, and by comparison with some earlier PPM and ADM techniques. The ADM models explain the data slightly better than PPM models for 314 tested TFs (or their DNA-binding domains) from four families (bHLH, bZIP, ETS and Homeodomain), with the ADM mixture models learned by MODER2 being the best on average.
    Peer reviewed
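The difference between a PPM and an ADM can be seen in a small sketch of a position-specific first-order Markov model estimated from already aligned sites. This is only the ADM building block, not MODER2 itself, which learns a mixture of monomer and dimer models by expectation maximization. The functions `fit_adm` and `log_prob`, the pseudocount, and the toy sites are assumptions for illustration.

```python
import math

ALPHABET = "ACGT"


def fit_adm(sites, pseudocount=1.0):
    """Estimate an ADM from equal-length aligned sites.

    Returns (initial, transitions): initial[a] = P(x_0 = a) and
    transitions[p][a][b] = P(x_{p+1} = b | x_p = a).
    """
    length = len(sites[0])
    init_counts = {a: pseudocount for a in ALPHABET}
    for site in sites:
        init_counts[site[0]] += 1
    total = sum(init_counts.values())
    initial = {a: c / total for a, c in init_counts.items()}

    transitions = []
    for pos in range(1, length):
        counts = {a: {b: pseudocount for b in ALPHABET} for a in ALPHABET}
        for site in sites:
            counts[site[pos - 1]][site[pos]] += 1
        transitions.append({
            a: {b: counts[a][b] / sum(counts[a].values()) for b in ALPHABET}
            for a in ALPHABET
        })
    return initial, transitions


def log_prob(site, initial, transitions):
    """Log-probability of one site under the fitted ADM."""
    lp = math.log(initial[site[0]])
    for pos in range(1, len(site)):
        lp += math.log(transitions[pos - 1][site[pos - 1]][site[pos]])
    return lp
```

Fitted on the toy sites `["ACG", "ACG", "TTG", "TTG"]`, the model scores `"ACG"` above `"ATG"` because it has learned that `C` follows `A`. A PPM, which treats positions independently, would score the two equally, since `C` and `T` are equally frequent at the second position.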