136 research outputs found

Mining the VVV: star formation and embedded clusters

Authors: Haikala Lauri; Solin Otto; Ukkonen Esko
Publication venue: 'EDP Sciences'
Publication date: 30/12/2013

The aim of this study is to locate previously unknown stellar clusters from the VISTA Variables in the Vía Láctea Survey (VVV) catalogue data. The method, fitting a mixture model of Gaussian densities and background noise to pre-filtered NIR survey stellar catalogue data using the expectation maximization algorithm, was developed by the authors for the UKIDSS Galactic Plane Survey (GPS). The search located 88 previously unknown, mainly embedded, stellar cluster candidates and 39 previously unknown sites of star formation in the 562 deg² covered by VVV in the Galactic bulge and the southern disk.
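The mixture idea in the abstract above can be illustrated with a deliberately reduced sketch. It is not the authors' pipeline: it assumes one-dimensional positions, a single Gaussian component, and a uniform background on a known interval, whereas the survey work fits sky positions. The function name `em_gaussian_plus_uniform`, the initialisation, and the toy data are all hypothetical choices made for illustration.

```python
import math

def em_gaussian_plus_uniform(xs, lo, hi, iters=200):
    """Fit w*N(mu, sigma^2) + (1-w)*Uniform(lo, hi) to 1-D data with EM.

    Simplified illustration only: one Gaussian, one dimension.
    """
    n = len(xs)
    background = 1.0 / (hi - lo)            # constant density of the noise term
    mu = sum(xs) / n                        # start from the overall mean
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)
    w = 0.5                                 # initial weight of the Gaussian
    for _ in range(iters):
        # E-step: probability that each point belongs to the Gaussian
        resp = []
        for x in xs:
            g = w * math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
            resp.append(g / (g + (1 - w) * background))
        total = sum(resp)
        # M-step: re-estimate weight, mean and spread from the responsibilities
        w = total / n
        mu = sum(r * x for r, x in zip(resp, xs)) / total
        var = sum(r * (x - mu) ** 2 for r, x in zip(resp, xs)) / total
        sigma = max(math.sqrt(var), 1e-3)   # floor keeps the density finite
    return w, mu, sigma

# Toy data (made up): a tight clump near 5 on top of evenly spread background
points = [4.8, 4.9, 5.0, 5.1, 5.2] * 4 + [float(v) for v in range(11)]
weight, centre, spread = em_gaussian_plus_uniform(points, 0.0, 10.0)
```

On such data the Gaussian component should tighten around the clump while the uniform term absorbs the scattered points, which is the role the background noise component plays in the description above.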

arXiv.org e-Print Archive

CiteSeerX

EDP Sciences OAI-PMH repository (1.2.0)

What are algorithms needed for? (Mihin algoritmeja tarvitaan?)

Author: Ukkonen Esko
Publication venue: Tieteellisten seurain valtuuskunta
Publication date: 01/01/2003

Primality testing and bioinformatics are topical areas of algorithm research. Both are interesting from the standpoint of algorithm theory as well as applications. A recently presented fast primality test solved a classical number-theoretic problem and at the same time gave new impetus to research on data security methods. Bioinformatics, in turn, has become a rapidly expanding multidisciplinary research field as a result of developments in the new molecular biology, and it poses new kinds of challenges for algorithm research. Finnish researchers have been involved in developing bioinformatics algorithms from the very beginning.

Journal.fi

National Library of Finland DSpace Services

Geometric algorithms for transposition invariant content-based music retrieval

Authors: Lemström Kjell; Mäkinen Veli; Ukkonen Esko
Publication date: 01/01/2003

We are grateful to Mika Turkia for the implementations. Peer reviewed

CiteSeerX

Johns Hopkins University

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Helsingin yliopiston digitaalinen arkisto

JScholarship

Efficient algorithms for the discovery of gapped factors

Authors: Apostolico Alberto; Pizzi Cinzia; Ukkonen Esko
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: The discovery of surprisingly frequent patterns is of paramount interest in bioinformatics and computational biology. Among the patterns considered, those consisting of pairs of solid words that co-occur within a prescribed maximum distance, or gapped factors, emerge in a variety of contexts of DNA and protein sequence analysis. A few algorithms and tools have been developed in connection with specific formulations of the problem; however, none can comprehensively handle each of the multiple ways in which the distance between the two terms in a pair may be defined. Results: This paper presents efficient algorithms and tools for the extraction of all pairs of words up to an arbitrarily large length that co-occur surprisingly often in close proximity within a sequence. Whereas the number of such pairs in a sequence of n characters can be Θ(n^4), it is shown that an exhaustive discovery process can be carried out in O(n^2) or O(n^3), depending on the way distance is measured. This is made possible by a prudent combination of properties of pattern maximality and monotonicity of scores, which reduces the number of word pairs to be weighed explicitly while still producing the scores attained by any of the pairs not explicitly considered. We applied our approach to the discovery of spaced dyads in DNA sequences. Conclusions: Experiments on biological datasets show that the method is effective and much faster than exhaustive enumeration of candidate patterns. Software is freely available to academic users via the web interface.
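To make the notion of a gapped factor concrete, here is a minimal sketch of the naive baseline that the abstract contrasts against: plain enumeration of co-occurring word pairs. It is not the paper's algorithm and computes raw counts only, not surprise scores. It assumes fixed-length words and measures distance as the number of characters between the end of the first word and the start of the second; the function name `count_gapped_pairs` and both of those conventions are assumptions made for this example.

```python
from collections import Counter

def count_gapped_pairs(seq, word_len, max_gap):
    """Count ordered pairs of words (w1, w2) of length word_len where w2
    starts between 0 and max_gap characters after w1 ends."""
    counts = Counter()
    n = len(seq)
    for i in range(n - word_len + 1):
        w1 = seq[i:i + word_len]
        for gap in range(max_gap + 1):
            j = i + word_len + gap          # start of the second word
            if j + word_len > n:
                break                       # second word would run off the end
            counts[(w1, seq[j:j + word_len])] += 1
    return counts

pairs = count_gapped_pairs("ACGTACGT", word_len=2, max_gap=1)
```

Enumerating every word length and every gap in this way is what grows so quickly with sequence length, which is why the abstract emphasises avoiding explicit weighing of most pairs.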

CiteSeerX

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Archivio istituzionale della ricerca - Università di Padova

Seed-driven Learning of Position Probability Matrices from Large Sequence Sets

Authors: Taipale Jussi; Toivonen Jarkko; Ukkonen Esko
Publication venue: Schloss Dagstuhl - Leibniz-Zentrum für Informatik
Publication date: 01/01/2017

Peer reviewed

Dagstuhl Research Online Publication Server

Helsingin yliopiston digitaalinen arkisto

Longest common substrings with k mismatches

Authors: Flouri Tomas; Giaquinta Emanuele; Kobert Kassian; Ukkonen Esko
Publication date: 01/01/2015

The longest common substring with k-mismatches problem is to find, given two strings S_1 and S_2, a longest substring A_1 of S_1 and A_2 of S_2 such that the Hamming distance between A_1 and A_2 is at most k. Peer reviewed
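The problem statement above can be illustrated with a straightforward quadratic-time sketch. This is a textbook-style baseline, not the algorithm from the listed paper: it slides a window along every diagonal of the alignment grid and keeps at most k mismatches inside the window. The function name and the returned tuple layout are choices made for this example.

```python
def longest_common_substring_k_mismatches(s1, s2, k):
    """Return (length, i, j) such that s1[i:i+length] and s2[j:j+length]
    have Hamming distance at most k and length is maximal."""
    best = (0, 0, 0)
    n, m = len(s1), len(s2)
    # Diagonal d aligns s1[i] with s2[i - d]
    for d in range(-(m - 1), n):
        i0 = max(d, 0)
        j0 = i0 - d
        span = min(n - i0, m - j0)          # number of aligned positions
        left = 0
        mismatches = 0
        for right in range(span):
            if s1[i0 + right] != s2[j0 + right]:
                mismatches += 1
            # Shrink from the left until the window is valid again
            while mismatches > k:
                if s1[i0 + left] != s2[j0 + left]:
                    mismatches -= 1
                left += 1
            window = right - left + 1
            if window > best[0]:
                best = (window, i0 + left, j0 + left)
    return best

length, i, j = longest_common_substring_k_mismatches("abcdefg", "xbcxefy", 1)
```

Each diagonal is scanned once with two pointers, so the total work is proportional to the product of the two string lengths and needs only constant extra space.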

arXiv.org e-Print Archive

Elsevier - Publisher Connector

Crossref

Aaltodoc Publication Archive

Helsingin yliopiston digitaalinen arkisto

MODER2: First-order Markov Modeling and Discovery of Monomeric and Dimeric Binding Motifs

Authors: Das Pratyush; Taipale Jussi; Toivonen Jarkko; Ukkonen Esko
Publication date: 01/05/2020

Motivation: Position-specific probability matrices (PPMs, also called position-specific weight matrices) have been the dominating model for transcription factor (TF)-binding motifs in DNA. There is, however, increasing recent evidence of better performance of higher order models such as Markov models of order one, also called adjacent dinucleotide matrices (ADMs). ADMs can model dependencies between adjacent nucleotides, unlike PPMs. A modeling technique and software tool that would estimate such models simultaneously both for monomers and their dimers have been missing. Results: We present an ADM-based mixture model for monomeric and dimeric TF-binding motifs and an expectation maximization algorithm MODER2 for learning such models from training data and seeds. The model is a mixture that includes monomers and dimers, built from the monomers, with a description of the dimeric structure (spacing, orientation). The technique is modular, meaning that the co-operative effect of dimerization is made explicit by evaluating the difference between expected and observed models. The model is validated using HT-SELEX and generated datasets, and by comparing to some earlier PPM and ADM techniques. The ADM models explain data slightly better than PPM models for 314 tested TFs (or their DNA-binding domains) from four families (bHLH, bZIP, ETS and Homeodomain), the ADM mixture models by MODER2 being the best on average. Peer reviewed
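The difference between a PPM and a first-order model can be seen in a small sketch. This is not MODER2: there is no mixture, no EM, no dimer handling, and no pseudocounts. It simply assumes a set of already aligned, equal-length sites and estimates both models by counting. All function names and the toy sites are hypothetical.

```python
from collections import Counter

def fit_ppm(sites):
    """Per-position base frequencies from aligned, equal-length sites."""
    return [
        {b: c / len(sites) for b, c in Counter(s[i] for s in sites).items()}
        for i in range(len(sites[0]))
    ]

def ppm_prob(ppm, seq):
    """Probability of seq when positions are treated as independent."""
    p = 1.0
    for i, b in enumerate(seq):
        p *= ppm[i].get(b, 0.0)
    return p

def fit_adm(sites):
    """First-order model: initial base frequencies plus, for each later
    position, P(base at i | base at i-1) estimated from the sites."""
    initial = {b: c / len(sites) for b, c in Counter(s[0] for s in sites).items()}
    transitions = []
    for i in range(1, len(sites[0])):
        pair_counts = Counter((s[i - 1], s[i]) for s in sites)
        prev_counts = Counter(s[i - 1] for s in sites)
        transitions.append(
            {pair: c / prev_counts[pair[0]] for pair, c in pair_counts.items()}
        )
    return initial, transitions

def adm_prob(adm, seq):
    """Probability of seq under the first-order (adjacent dinucleotide) model."""
    initial, transitions = adm
    p = initial.get(seq[0], 0.0)
    for i in range(1, len(seq)):
        p *= transitions[i - 1].get((seq[i - 1], seq[i]), 0.0)
    return p

# Toy sites (made up): A is always followed by C, and G by T
sites = ["AC", "AC", "GT", "GT"]
ppm, adm = fit_ppm(sites), fit_adm(sites)
```

With these toy sites the PPM gives the unseen sequence "AT" the same probability as the observed "AC", because it only sees per-position frequencies, while the first-order model gives "AT" zero probability. That is the adjacent-nucleotide dependency the abstract refers to.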

Helsingin yliopiston digitaalinen arkisto

Accurate self-correction of errors in long reads using de Bruijn graphs

Authors: Rivals Eric; Salmela Leena; Ukkonen Esko; Walve Riku
Publication date: 01/01/2016

Peer reviewed

arXiv.org e-Print Archive

Crossref

INRIA a CCSD electronic archive server

PubMed Central

Helsingin yliopiston digitaalinen arkisto