Search CORE

84 research outputs found

Functional Sequence Annotation in an Error-prone Environment

Author: Koskinen Patrik
Publication venue: 'University of Helsinki Libraries'
Publication date: 22/08/2014

As more and more sequences are submitted to public databases, sequence retrieval becomes increasingly computationally challenging. When, for example, UniProtKB/TrEMBL doubles in size annually, the tools used today might not be sufficient tomorrow. Faster and computationally lighter methods for sequence retrieval are needed. This study presents a computationally more efficient tool: the Suffix Array Neighbourhood Search (SANS) tool is a hundredfold faster than BLAST, the most commonly used tool. Sequence databases grow not only in size but also in the number of different functional annotations they contain. Recent studies have shown that a large number of these annotations are assigned incorrectly. When the error level of functional annotations in the databases reaches a statistically significant level, better methods and the use of error-detection statistics are highly recommended. In the present study, we introduce novel methods for weighted statistical testing of functional annotations. Novel methods for calculating the information content value are also presented. The information content value makes it possible to discriminate informative from uninformative annotations. A growing number of functional annotation tools are introduced annually. Since no gold-standard evaluation sets exist, it is impossible to determine the reliability of the different methods. The Critical Assessment of Functional Annotations (CAFA) challenge is the first attempt to evaluate functional annotation tools by large-scale blind testing. The first CAFA challenge included the evaluation of 54 state-of-the-art methods in two different Gene Ontology categories. The results show that there is plenty of room for improvement in the prediction accuracy of the existing tools.

While new sequences are being added to public biological sequence databanks at an accelerating pace, users of these databanks face challenges in processing massive amounts of data. For example, the size of the UniProtKB sequence database doubles annually, which inevitably means that the algorithms currently used for information retrieval will become obsolete, because their efficiency will not meet future challenges. New, computationally more efficient methods are continually needed. This dissertation presents a method that is computationally more efficient than the methods currently in use. The SANS algorithm presented in the dissertation achieves a hundredfold improvement in running time compared with BLAST, the most commonly used program. Biological sequence databases grow not only in the number of sequences; the amount of information associated with the sequences grows as well. Recently, however, concerns have been raised about the correctness of this information. It has been estimated that almost half of the information in sequence databases is erroneous. Using erroneous information, for example in research, easily leads to wrong conclusions and wrong results. This dissertation presents PANNZER, a method that statistically calculates the reliability of the retrieved information and thereby maximizes its correctness. Obtaining correct information from public biological sequence databases is increasingly challenging, and international research groups have also become aware of this. One way to measure how well existing methods retrieve correct information is to organize an international competition for such methods. The first competition, the Critical Assessment of Functional Annotations (CAFA), included 54 competing methods from around the world. This dissertation also discusses that competition and its results.
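The abstract names the suffix-array neighbourhood idea behind SANS without detailing it. As a rough, hypothetical illustration of suffix-array lookup in general (not the SANS implementation), the sketch below sorts all suffixes of a small database, locates each query suffix by binary search, and lets the sequences owning the suffixes in a small window around each insertion point vote. The function names, the `$` separator and the `window` parameter are assumptions made for this example.

```python
from bisect import bisect_left
from collections import Counter


def build_suffix_array(text):
    """Naive suffix array: start positions sorted by suffix (toy inputs only)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def neighbourhood_hits(database, query, window=2):
    """Rank database sequences by votes from suffix-array neighbourhoods.

    Hypothetical simplification: for each query suffix, the `window` sorted
    suffixes just below its insertion point and the `window` suffixes at or
    above it each give one vote to the sequence they start in.
    """
    # '$' marks the boundary between consecutive database sequences
    text = "".join(seq + "$" for seq in database)
    # owner[i] = index of the database sequence that text position i belongs to
    owner = []
    for idx, seq in enumerate(database):
        owner.extend([idx] * (len(seq) + 1))
    sa = build_suffix_array(text)
    sorted_suffixes = [text[i:] for i in sa]
    votes = Counter()
    for start in range(len(query)):
        # Binary search for where this query suffix would sit in sorted order
        pos = bisect_left(sorted_suffixes, query[start:])
        for j in range(max(0, pos - window), min(len(sa), pos + window)):
            votes[owner[sa[j]]] += 1
    return votes.most_common()


if __name__ == "__main__":
    db = ["MKTAYIAKQR", "GGSSGGSSAA", "MKTAYIAKQW"]
    print(neighbourhood_hits(db, "MKTAYIAK"))
```

Sequences that share long substrings with the query end up adjacent to its suffixes in sorted order, so they collect most of the votes without any pairwise alignment. The naive `sorted` construction is only suitable for tiny inputs; a database-scale tool would need a much faster suffix array construction.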

Helsingin yliopiston digitaalinen arkisto

BARCOSEL: a tool for selecting an optimal barcode set for high-throughput sequencing

Author: Auvinen Petri
Holm Liisa
Koskinen Patrik
Mei Peng
Paulin Lars
Somervuo Panu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/07/2018

Background: Current high-throughput sequencing platforms provide the capacity to sequence multiple samples in parallel. Different samples are labeled by attaching a short, sample-specific nucleotide sequence, a barcode, to each DNA molecule prior to pooling them into a mix containing a number of libraries to be sequenced simultaneously. After sequencing, the samples are binned by identifying the barcode sequence within each sequence read. In order to tolerate sequencing errors, barcodes should be sufficiently far apart from each other in sequence space. An additional constraint, due to both nucleotide usage and base-calling accuracy, is that the proportions of the different nucleotides should be balanced at each barcode position. The number of samples to be mixed in each sequencing run may vary, which raises the problem of how to select the best subset of the barcodes available at a sequencing core facility for each run. Plenty of tools are available for de novo barcode design, but they are not suitable for subset selection. Results: We have developed a tool that can be used for three different tasks: 1) selecting an optimal barcode set from a larger set of candidates, 2) checking the compatibility of a user-defined set of barcodes, e.g. whether two or more libraries with existing barcodes can be combined in a single sequencing pool, and 3) augmenting an existing set of barcodes. In our approach, the selection process is formulated as a minimization problem. We define the cost function and a set of constraints and use integer programming to solve the resulting combinatorial problem. Based on the desired number of barcodes to be selected and the set of candidate sequences given by the user, the necessary constraints are generated automatically and the optimal solution can be found. The method is implemented in the C programming language, and a web interface is available at http://ekhidna2.biocenter.helsinki.fi/barcosel.
Conclusions: The increasing capacity of sequencing platforms raises the challenge of mixing barcodes. Our method allows the user to select a given number of barcodes from a larger existing barcode set so that sequencing errors are tolerated and the nucleotide balance is optimized. The tool is easily accessible via a web browser.
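Task 2 above rests on two measurable properties of a pooled barcode set: how far apart the barcodes are, and how balanced each position is. As a minimal sketch (not BARCOSEL's C implementation), the functions below compute the minimum pairwise Hamming distance and the share of the most common nucleotide at each position. The function names and the default `min_distance=3` are hypothetical choices: a minimum distance of d lets a decoder correct up to floor((d - 1) / 2) substitutions, so 3 is the smallest value that corrects a single substitution. Hamming distance ignores insertions and deletions, which this sketch does not model.

```python
from collections import Counter
from itertools import combinations


def hamming(a, b):
    """Number of mismatching positions between two equal-length barcodes."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes):
    """Smallest Hamming distance over all pairs in the set (needs >= 2 barcodes)."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


def max_base_fraction(barcodes):
    """For each position, the fraction of barcodes sharing the most common base.

    0.25 is the best achievable value when the number of barcodes is a multiple
    of four; 1.0 means every barcode has the same base at that position.
    """
    n = len(barcodes)
    return [max(Counter(column).values()) / n for column in zip(*barcodes)]


def pools_compatible(pool_a, pool_b, min_distance=3):
    """Hypothetical check: can two barcoded libraries share one sequencing run?"""
    combined = list(pool_a) + list(pool_b)
    return min_pairwise_distance(combined) >= min_distance


if __name__ == "__main__":
    print(pools_compatible(["ACGTAC", "TGCATG"], ["CATGCA"]))
    print(max_base_fraction(["ACGT", "ACGA"]))
```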

Directory of Open Access Journals

Helsingin yliopiston digitaalinen arkisto

BARCOSEL : a tool for selecting an optimal barcode set for high-throughput sequencing

Author: Auvinen Petri
Holm Liisa
Koskinen Patrik
Mei Peng
Paulin Lars
Somervuo Panu
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/07/2018

Background: Current high-throughput sequencing platforms provide the capacity to sequence multiple samples in parallel. Different samples are labeled by attaching a short, sample-specific nucleotide sequence, a barcode, to each DNA molecule prior to pooling them into a mix containing a number of libraries to be sequenced simultaneously. After sequencing, the samples are binned by identifying the barcode sequence within each sequence read. In order to tolerate sequencing errors, barcodes should be sufficiently far apart from each other in sequence space. An additional constraint, due to both nucleotide usage and base-calling accuracy, is that the proportions of the different nucleotides should be balanced at each barcode position. The number of samples to be mixed in each sequencing run may vary, which raises the problem of how to select the best subset of the barcodes available at a sequencing core facility for each run. Plenty of tools are available for de novo barcode design, but they are not suitable for subset selection. Results: We have developed a tool that can be used for three different tasks: 1) selecting an optimal barcode set from a larger set of candidates, 2) checking the compatibility of a user-defined set of barcodes, e.g. whether two or more libraries with existing barcodes can be combined in a single sequencing pool, and 3) augmenting an existing set of barcodes. In our approach, the selection process is formulated as a minimization problem. We define the cost function and a set of constraints and use integer programming to solve the resulting combinatorial problem. Based on the desired number of barcodes to be selected and the set of candidate sequences given by the user, the necessary constraints are generated automatically and the optimal solution can be found. The method is implemented in the C programming language, and a web interface is available at http://ekhidna2.biocenter.helsinki.fi/barcosel.
Conclusions: The increasing capacity of sequencing platforms raises the challenge of mixing barcodes. Our method allows the user to select a given number of barcodes from a larger existing barcode set so that sequencing errors are tolerated and the nucleotide balance is optimized. The tool is easily accessible via a web browser.

Peer reviewed
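For task 1, the record states that BARCOSEL solves the selection with integer programming. The sketch below swaps in plain exhaustive search over all k-subsets, which returns an exact optimum for the stated cost but is only feasible for small candidate lists. The cost function (squared deviation of each position's nucleotide fractions from 0.25), the function names and the `min_distance` default are illustrative assumptions, not BARCOSEL's actual cost or constraints.

```python
from itertools import combinations

NUCLEOTIDES = "ACGT"


def hamming(a, b):
    """Number of mismatching positions between two equal-length barcodes."""
    return sum(x != y for x, y in zip(a, b))


def imbalance(subset):
    """Hypothetical cost: squared deviation from 25% per base, summed over positions."""
    k = len(subset)
    cost = 0.0
    for column in zip(*subset):  # one tuple of bases per barcode position
        for nt in NUCLEOTIDES:
            cost += (column.count(nt) / k - 0.25) ** 2
    return cost


def select_barcodes(candidates, k, min_distance=3):
    """Exhaustively pick the k-subset with the lowest imbalance.

    Subsets containing any pair closer than `min_distance` are infeasible.
    Returns (None, inf) when no feasible subset exists.
    """
    best, best_cost = None, float("inf")
    for subset in combinations(candidates, k):
        if any(hamming(a, b) < min_distance for a, b in combinations(subset, 2)):
            continue  # violates the error-tolerance constraint
        cost = imbalance(subset)
        if cost < best_cost:
            best, best_cost = subset, cost
    return best, best_cost


if __name__ == "__main__":
    pool = ["ACGT", "CATG", "GTAC", "TGCA", "AAAA", "ACGA"]
    print(select_barcodes(pool, k=4))
```

The number of subsets grows combinatorially with the candidate list, which is why an integer programming solver is the practical choice at core-facility scale.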

Directory of Open Access Journals

Helsingin yliopiston digitaalinen arkisto

Multi-type quantum well semiconductor membrane external-cavity surface-emitting lasers (MECSELs) for widely tunable continuous wave operation

Author: Guina Mircea
Kahle Hermann
Koskinen Jesse
Phung Hoy-My
Rajala Patrik
Ranta Sanna
Tatar-Mathes Philipp
Publication date: 11/09/2023

Membrane external-cavity surface-emitting lasers (MECSELs) are at the forefront of pushing the performance limits of vertically emitting semiconductor lasers. Their simple idea of using just a very thin gain membrane (hundreds of nanometers to a few microns) opens up new possibilities through uniform double-sided optical pumping and superior heat extraction from the active area. Moreover, these advantages make more complex band-gap engineering of the active region possible, through the introduction of multiple types of quantum wells (QWs) into a single laser gain structure. In this paper, we present a new design strategy for laser gain structures with several types of QWs. The aim is to achieve broadband gain with relatively high-power operation and, potentially, a flat spectral tuning range. Our design emphasizes sufficient gain over a wide wavelength range, uniform pump absorption, and restricted carrier mobility between the different quantum wells during laser operation. A full-width at half-maximum tuning range of > 70 nm (> 21.7 THz), with more than 125 mW of power across the entire tuning range at room temperature, is demonstrated.
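The abstract quotes the tuning range both in wavelength and in frequency. The two are linked by ν = c/λ, so for a range Δλ that is small compared with the centre wavelength λ, Δν ≈ cΔλ/λ². The centre wavelength is not stated in the abstract; the illustrative helpers below (names are our own) back-calculate the value implied by the quoted figures, which comes out at roughly 0.98 µm. Because the quoted numbers are lower bounds and the formula is a first-order approximation, treat the result as an estimate.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s, exact SI value


def frequency_span(delta_lambda_m, centre_m):
    """First-order span in Hz: delta_nu ≈ c * delta_lambda / lambda**2."""
    return SPEED_OF_LIGHT * delta_lambda_m / centre_m ** 2


def implied_centre_wavelength(delta_lambda_m, delta_nu_hz):
    """Invert the same approximation: lambda ≈ sqrt(c * delta_lambda / delta_nu)."""
    return math.sqrt(SPEED_OF_LIGHT * delta_lambda_m / delta_nu_hz)


if __name__ == "__main__":
    centre = implied_centre_wavelength(70e-9, 21.7e12)
    print(f"implied centre wavelength: {centre * 1e9:.0f} nm")
```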

arXiv.org e-Print Archive

Genomic features separating ten strains of Neorhizobium galegae with different symbiotic phenotypes

Author: Koskinen Patrik
Lindström Kristina
Mousavi Seyed A
Paulin Lars
Österman Janina
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 02/05/2015

Background: The symbiotic phenotype of Neorhizobium galegae, whose strains fix nitrogen specifically with either Galega orientalis or G. officinalis, has made it a target in research on the determinants of host specificity in nitrogen fixation. The genomic differences between representative strains of the two symbiovars are, however, relatively small. This created a need for a dataset representing a larger bacterial population, in order to draw better conclusions on characteristics typical of a subset of the species. In this study, we produced draft genomes of eight strains of N. galegae with different symbiotic phenotypes, with regard to both host specificity and nitrogen fixation efficiency. These genomes were analysed together with the previously published complete genomes of N. galegae strains HAMBI 540T and HAMBI 1141. Results: The results showed that the presence of an additional rpoN sigma factor gene in the symbiosis gene region is a characteristic specific to symbiovar orientalis and is required for nitrogen fixation. The nifQ gene was also shown to be crucial for functional symbiosis in both symbiovars. Genome-wide analyses identified additional genes characteristic of strains of the same symbiovar and of strains having similar plant-growth-promoting properties on Galega orientalis. Many of these genes are involved in transcriptional regulation or in metabolic functions. Conclusions: The results of this study confirm that the only symbiosis-related gene present in one symbiovar of N. galegae but not in the other is an rpoN gene. The specific function of this gene remains to be determined, however. New genes that were identified as specific for strains of one symbiovar may be involved in determining host specificity, while others are defined as potential determinant genes for differences in the efficiency of nitrogen fixation.
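The genome-wide comparison described above can be pictured as a set operation on a gene presence/absence table. A minimal sketch, assuming each strain's gene content is already available as a set of orthologous-group identifiers (the strain and gene names in the example are hypothetical): a gene is group-specific if it is present in every strain of the target symbiovar and absent from all other strains. Real comparative analyses must also account for draft-genome incompleteness, since a gene missing from a draft assembly is not proof of absence.

```python
def group_specific_genes(presence, groups, target):
    """Genes found in every strain of `target` and in no strain outside it.

    presence: dict mapping strain name -> set of gene identifiers
    groups:   dict mapping strain name -> group label (e.g. symbiovar)
    """
    in_group = [s for s, g in groups.items() if g == target]
    out_group = [s for s, g in groups.items() if g != target]
    if not in_group:
        raise ValueError(f"no strains labelled {target!r}")
    # Core of the target group: genes shared by all of its strains
    shared = set.intersection(*(presence[s] for s in in_group))
    # Everything seen in any strain outside the group
    elsewhere = set().union(*(presence[s] for s in out_group))
    return shared - elsewhere


if __name__ == "__main__":
    presence = {
        "strain1": {"g1", "g2", "g3"},
        "strain2": {"g1", "g3"},
        "strain3": {"g1", "g4"},
        "strain4": {"g1", "g4", "g5"},
    }
    groups = {"strain1": "orientalis", "strain2": "orientalis",
              "strain3": "officinalis", "strain4": "officinalis"}
    print(group_specific_genes(presence, groups, "orientalis"))
```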

Crossref

Springer - Publisher Connector

PubMed Central

Helsingin yliopiston digitaalinen arkisto

gapFinisher: A reliable gap filling pipeline for SSPACE-LongRead scaffolder output

Author: Auvinen Petri
Jernvall Jukka
Kammonen Juhana I.
Koskinen Patrik
Laine Pia
Paulin Lars
Pereira Pedro A. B.
Smolander Olli-Pekka
Publication date: 01/01/2019

Unknown sequences, or gaps, are present in many published genomes in public databases. Gap filling is an important finishing step in de novo genome assembly, especially for large genomes. The gap filling problem is nontrivial, and while many computational tools partially solve it, several have shortcomings regarding the reliability and correctness of their output, i.e. the gap-filled draft genome. SSPACE-LongRead is a scaffolding tool that uses long reads from multiple third-generation sequencing platforms to find links between contigs and combine them. The long reads potentially contain sequence information that could fill the gaps created during scaffolding, but SSPACE-LongRead currently lacks this functionality. We present an automated pipeline called gapFinisher that processes SSPACE-LongRead output to fill gaps after scaffolding. gapFinisher is based on the controlled use of a previously published gap filling tool, FGAP, and works on all standard Linux/UNIX command lines. We compare the performance of gapFinisher against two other published gap filling tools, PBJelly and GMcloser. We conclude that gapFinisher can fill gaps in draft genomes quickly and reliably. In addition, the serial design of gapFinisher makes it scale well from prokaryote genomes to larger genomes with no increase in the computational footprint.

Peer reviewed
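Before a gap can be filled it has to be located. Assuming, as is common in FASTA assemblies, that scaffold gaps appear as runs of the ambiguity character N, a minimal sketch of gap detection looks like this. It only locates gaps; it is not part of gapFinisher, FGAP or SSPACE-LongRead, and the `min_length` parameter is a hypothetical filter for ignoring isolated ambiguous bases.

```python
import re

GAP_PATTERN = re.compile(r"[Nn]+")


def find_gaps(sequence, min_length=1):
    """Return half-open (start, end) coordinates of N-runs of at least min_length."""
    return [
        (m.start(), m.end())
        for m in GAP_PATTERN.finditer(sequence)
        if m.end() - m.start() >= min_length
    ]


if __name__ == "__main__":
    scaffold = "ACGTNNNNACGTNAC"
    for start, end in find_gaps(scaffold, min_length=2):
        print(f"gap at {start}-{end} ({end - start} bp)")
```

Half-open coordinates make the gap length simply `end - start`, which is convenient when comparing gap sizes before and after filling.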

Directory of Open Access Journals

Helsingin yliopiston digitaalinen arkisto

Genome sequence of the model plant pathogen Pectobacterium carotovorum SCC1

Author: Auvinen Petri
Harjunpaa Heidi
Holm Liisa
Koskinen Patrik
Laine Pia
Niemi Outi
Nykyri Johanna
Palva E. Tapio
Pasanen Miia
Paulin Lars
Pennanen Ville
Pirhonen Minna
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 20/12/2017

Bacteria of the genus Pectobacterium are economically important plant pathogens that cause soft rot disease on a wide variety of plant species. Here, we report the genome sequence of Pectobacterium carotovorum strain SCC1, a Finnish soft rot model strain isolated from a diseased potato tuber in the early 1980s. The genome of strain SCC1 consists of one circular chromosome of 4,974,798 bp and one circular plasmid of 5,524 bp. In total, 4,451 genes were predicted, of which 4,349 are protein-coding and 102 are RNA genes.

Peer reviewed

Helsingin yliopiston digitaalinen arkisto

Genomic features separating ten strains of Neorhizobium galegae with different symbiotic phenotypes

Author: Koskinen Jani Patrik
Lindström Anna Kristina
Mousavi Seyed Abdollah
Paulin Lars Göran
Österman Janina Maria
Publication date: 02/05/2015

Background: The symbiotic phenotype of Neorhizobium galegae, whose strains fix nitrogen specifically with either Galega orientalis or G. officinalis, has made it a target in research on the determinants of host specificity in nitrogen fixation. The genomic differences between representative strains of the two symbiovars are, however, relatively small. This created a need for a dataset representing a larger bacterial population, in order to draw better conclusions on characteristics typical of a subset of the species. In this study, we produced draft genomes of eight strains of N. galegae with different symbiotic phenotypes, with regard to both host specificity and nitrogen fixation efficiency. These genomes were analysed together with the previously published complete genomes of N. galegae strains HAMBI 540T and HAMBI 1141. Results: The results showed that the presence of an additional rpoN sigma factor gene in the symbiosis gene region is a characteristic specific to symbiovar orientalis and is required for nitrogen fixation. The nifQ gene was also shown to be crucial for functional symbiosis in both symbiovars. Genome-wide analyses identified additional genes characteristic of strains of the same symbiovar and of strains having similar plant-growth-promoting properties on Galega orientalis. Many of these genes are involved in transcriptional regulation or in metabolic functions. Conclusions: The results of this study confirm that the only symbiosis-related gene present in one symbiovar of N. galegae but not in the other is an rpoN gene. The specific function of this gene remains to be determined, however. New genes that were identified as specific for strains of one symbiovar may be involved in determining host specificity, while others are defined as potential determinant genes for differences in the efficiency of nitrogen fixation.

Peer reviewed

Helsingin yliopiston digitaalinen arkisto

Design and characterization of MECSELs for widely tunable (>25 THz) continuous wave operation

Author: Guina Mircea
Kahle Hermann
Koskinen Jesse
Phung Hoy My
Rajala Patrik
Ranta Sanna
Tatar-Mathes Philipp
Publication venue: 'Instytut Dermatologii Radoslaw Spiewak'
Publication date: 04/03/2022

Membrane external-cavity surface-emitting lasers (MECSELs) are vertically emitting semiconductor lasers that combine all the benefits of VECSELs (vertical-external-cavity surface-emitting lasers) with a new degree of freedom: gain structures can be created without monolithically integrated distributed Bragg reflectors (DBRs). The absence of the DBR and the substrate, together with the use of a very thin gain membrane (typically several hundred nanometers) that can be sandwiched between two transparent heat spreaders, represents the best solution for heat removal. The membrane configuration also allows double-sided pumping, which in turn makes it possible to use a large number of quantum well (QW) groups as well as multiple kinds of QWs in a periodic laser gain structure. Here we report on the design strategy and the results of different approaches to broadband, relatively high-power MECSEL gain structures. In particular, efficient pump absorption, sufficient gain at several different wavelengths, and carrier mobility during laser operation are discussed. We also present the characteristics of the resulting laser systems. The results show a ∼83 nm (∼25 THz) tuning range with more than 100 mW of power at all wavelengths in room-temperature operation. Strategies for further development are also discussed.

Peer reviewed

Trepo - Institutional Repository of Tampere University

Genome Sequence of Dickeya solani, a New Soft Rot Pathogen of Potato, Suggests its Emergence May Be Related to a Novel Combination of Non-Ribosomal Peptide/Polyketide Synthetase Clusters

Author: Auvinen Petri
Garlant Linda
Holm Liisa
Koskinen Patrik
Laine Pia
Paulin Lars
Pirhonen Minna
Rouhiainen Leo A O
Publication date: 01/01/2013

Peer reviewed

Crossref

Directory of Open Access Journals

Helsingin yliopiston digitaalinen arkisto