
    A Simple Data-Adaptive Probabilistic Variant Calling Model

    Background: Several sources of noise obfuscate the identification of single nucleotide variation (SNV) in next-generation sequencing data. For instance, errors may be introduced during library construction and sequencing steps. In addition, the reference genome and the algorithms used for the alignment of the reads are further critical factors determining the efficacy of variant calling methods. It is crucial to account for these factors in individual sequencing experiments. Results: We introduce a simple data-adaptive model for variant calling. This model automatically adjusts to specific factors such as alignment errors. To achieve this, several characteristics are sampled from sites with low mismatch rates, and these are used to estimate empirical log-likelihoods. These likelihoods are then combined into a score that typically gives rise to a mixture distribution, from which we determine a decision threshold to separate potentially variant sites from the noisy background. Conclusions: In simulations we show that our simple proposed model is competitive with frequently used, much more complex SNV calling algorithms in terms of sensitivity and specificity. It performs particularly well in cases with low allele frequencies. The application to next-generation sequencing data reveals stark differences in the score distributions, indicating a strong influence of data-specific sources of noise. The proposed model is specifically designed to adjust to these differences. Comment: 19 pages, 6 figures
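    To make the idea concrete, the sketch below is a toy Python version of the general approach described above: empirical log-likelihoods are estimated from features sampled at low-mismatch sites, combined into a per-site score, and cut at a threshold. The feature names, the histogram binning, and the simple percentile cut-off (standing in for the mixture-based threshold mentioned in the abstract) are assumptions made for illustration, not the authors' implementation.

```python
# Toy sketch of a data-adaptive scoring scheme; feature names and the
# percentile threshold are illustrative assumptions, not the published model.
import numpy as np

def empirical_log_likelihood(background_values, bins=50):
    """Empirical log-likelihood of a feature, estimated from values sampled
    at low-mismatch (presumably non-variant) sites."""
    hist, edges = np.histogram(background_values, bins=bins, density=True)
    hist = np.where(hist > 0, hist, 1e-12)                # avoid log(0)
    def loglik(x):
        idx = np.clip(np.digitize(x, edges) - 1, 0, len(hist) - 1)
        return np.log(hist[idx])
    return loglik

def site_scores(features, background):
    """Sum of per-feature empirical log-likelihoods; low scores are poorly
    explained by the background and flag candidate variant sites."""
    return sum(empirical_log_likelihood(background[name])(features[name])
               for name in features)

# Hypothetical features: per-site mismatch rate and mean base quality.
rng = np.random.default_rng(0)
background = {"mismatch_rate": rng.beta(1, 50, 10_000),
              "mean_qual": rng.normal(35, 3, 10_000)}
sites = {"mismatch_rate": np.array([0.01, 0.45]),
         "mean_qual": np.array([36.0, 34.0])}

scores = site_scores(sites, background)
threshold = np.quantile(site_scores(background, background), 0.01)
print(scores, "candidate variant:", scores < threshold)
```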

    ORFer – retrieval of protein sequences and open reading frames from GenBank and storage into relational databases or text files

    Background: Functional genomics involves parallel experimentation with large sets of proteins. This requires management of large sets of open reading frames as a prerequisite for the cloning and recombinant expression of these proteins. Results: A Java program was developed for the retrieval of protein and nucleic acid sequences and annotations from NCBI GenBank, using the XML sequence format. Annotations retrieved by ORFer include sequence name, organism and the completeness of the sequence. The program has a graphical user interface, although it can also be used in a non-interactive mode. For protein sequences, the program also extracts the open reading frame sequence, if available, and checks its correct translation. ORFer accepts user input in the form of single GenBank GI identifiers or accession numbers, or lists of them. It can be used to extract complete sets of open reading frames and protein sequences from any kind of GenBank sequence entry, including complete genomes or chromosomes. Sequences are either stored with their features in a relational database or exported as text files in Fasta or tab-delimited format. The ORFer program is freely available at http://www.proteinstrukturfabrik.de/orfer. Conclusion: The ORFer program allows for fast retrieval of DNA sequences, protein sequences and their open reading frames, and sequence annotations from GenBank. Furthermore, storage of sequences and features in a relational database is supported. Such a database can supplement a laboratory information management system (LIMS) with appropriate sequence information.
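    As a quick, hypothetical illustration of the retrieve-and-check workflow (not ORFer itself, which is a Java tool operating on the GenBank XML format), the Biopython sketch below fetches a GenBank nucleotide record, extracts its CDS features and re-checks their translation. The accession and e-mail address are placeholders.

```python
# Hypothetical Biopython sketch of the same retrieve-and-check idea; not ORFer.
from Bio import Entrez, SeqIO

Entrez.email = "your.name@example.org"   # NCBI requires a contact address

def fetch_orfs(accession):
    """Fetch a GenBank nucleotide record and return its CDS features together
    with a simple translation check."""
    handle = Entrez.efetch(db="nuccore", id=accession, rettype="gb",
                           retmode="text")
    record = SeqIO.read(handle, "genbank")
    handle.close()
    orfs = []
    for feat in record.features:
        if feat.type != "CDS":
            continue
        nt = feat.extract(record.seq)                     # ORF nucleotide sequence
        annotated = feat.qualifiers.get("translation", [""])[0]
        retranslated = str(nt.translate(to_stop=True))    # naive re-translation
        orfs.append({"product": feat.qualifiers.get("product", ["?"])[0],
                     "orf": str(nt),
                     "translation_ok": retranslated == annotated})
    return record.id, orfs

# Example call (placeholder accession, requires network access):
# record_id, orfs = fetch_orfs("NC_005816")
```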

    Direct measurement of molecular stiffness and damping in confined water layers

    We present direct and linear measurements of the normal stiffness and damping of a confined, few-molecule-thick water layer. The measurements were obtained by use of a small-amplitude (0.36 Å), off-resonance Atomic Force Microscopy (AFM) technique. We measured stiffness and damping oscillations revealing up to 7 layers separated by 2.56 ± 0.20 Å. Relaxation times could also be calculated and were found to indicate a significant slow-down of the dynamics of the system as the confining separation was reduced. We found that the dynamics of the system is determined not only by the interfacial pressure, but more significantly by solvation effects which depend on the exact separation of tip and surface. Thus 'solidification' seems to not be merely a result of pressure and confinement, but depends strongly on how commensurate the confining cavity is with the molecule size. We were able to model the results by starting from the simple assumption that the relaxation time depends linearly on the film stiffness. Comment: 7 pages, 6 figures, will be submitted to PR
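    As a hedged aside on the final modelling statement, one minimal way to read 'the relaxation time depends linearly on the film stiffness' in a simple spring-dashpot (Kelvin-Voigt) picture of the confined film is sketched below; this only illustrates the scaling such an assumption implies, and is not the paper's derivation.

```latex
% Illustrative sketch, not the paper's derivation: a spring--dashpot reading
% of the assumption that the relaxation time scales linearly with stiffness.
\documentclass{article}
\usepackage{amsmath}
\begin{document}
With measured film stiffness $k$ and damping coefficient $\gamma$, a natural
relaxation time of the confined layer is
\begin{equation}
  \tau = \frac{\gamma}{k}.
\end{equation}
If $\tau$ is assumed to depend linearly on the film stiffness,
$\tau = c\,k$ with $c$ a constant, the damping must grow quadratically with
the measured stiffness:
\begin{equation}
  \gamma = \tau\,k = c\,k^{2}.
\end{equation}
\end{document}
```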

    Assessing the cost of global biodiversity and conservation knowledge

    Knowledge products comprise assessments of authoritative information supported by standards, governance, quality control, data, tools, and capacity-building mechanisms. Considerable resources are dedicated to developing and maintaining knowledge products for biodiversity conservation, and they are widely used to inform policy and advise decision makers and practitioners. However, the financial cost of delivering this information is largely undocumented. We evaluated the costs and funding sources for developing and maintaining four global biodiversity and conservation knowledge products: The IUCN Red List of Threatened Species, the IUCN Red List of Ecosystems, Protected Planet, and the World Database of Key Biodiversity Areas. These are secondary data sets, built on primary data collected by extensive networks of expert contributors worldwide. We estimate that US$160 million (range: US$116–204 million), plus 293 person-years of volunteer time (range: 278–308 person-years) valued at US$14 million (range: US$12–16 million), were invested in these four knowledge products between 1979 and 2013. More than half of this financing was provided through philanthropy, and nearly three-quarters was spent on personnel costs. The estimated annual cost of maintaining data and platforms for three of these knowledge products (excluding the IUCN Red List of Ecosystems, for which annual costs were not possible to estimate for 2013) is US$6.5 million in total (range: US$6.2–6.7 million). We estimated that an additional US$114 million will be needed to reach pre-defined baselines of data coverage for all four knowledge products, and that once achieved, annual maintenance costs will be approximately US$12 million. These costs are much lower than those to maintain many other, similarly important, global knowledge products. Ensuring that biodiversity and conservation knowledge products are sufficiently up to date, comprehensive and accurate is fundamental to inform decision-making for biodiversity conservation and sustainable development. Thus, the development and implementation of plans for sustainable long-term financing for them is critical.

    The RNA workbench: Best practices for RNA and high-throughput sequencing bioinformatics in Galaxy

    RNA-based regulation has become a major research topic in molecular biology. The analysis of epigenetic and expression data is therefore incomplete if RNA-based regulation is not taken into account. Thus, it is increasingly important, but not yet standard, to combine RNA-centric data and analysis tools with other types of experimental data such as RNA-seq or ChIP-seq. Here, we present the RNA workbench, a comprehensive set of analysis tools and consolidated workflows that enable the researcher to combine these two worlds. Based on the Galaxy framework, the workbench guarantees simple access, easy extension, flexible adaptation to personal and security needs, and sophisticated analyses that are independent of command-line knowledge. Currently, it includes more than 50 bioinformatics tools dedicated to different research areas of RNA biology, including RNA structure analysis, RNA alignment, RNA annotation, RNA-protein interaction, ribosome profiling, RNA-seq analysis and RNA target prediction. The workbench is developed and maintained by experts in RNA bioinformatics and the Galaxy framework. Together with the growing community evolving around this workbench, we are committed to keeping the workbench up to date for future standards and needs, providing researchers with a reliable and robust framework for RNA data analysis.

    Genome Informatics for High-Throughput Sequencing Data Analysis: Methods and Applications

    This thesis introduces three different algorithmic and statistical strategies for the analysis of high-throughput sequencing data. First, we introduce a heuristic method based on enhanced suffix arrays to map short sequences to larger reference genomes. The algorithm builds on the idea of an error-tolerant traversal of the suffix array for the reference genome, in conjunction with the concept of matching statistics introduced by Chang and a bitvector-based alignment algorithm proposed by Myers. The algorithm supports paired-end and mate-pair alignments, and the implementation offers methods for primer detection as well as primer and poly-A trimming. In our own benchmarks as well as independent benchmarks, this tool outcompetes other currently available tools with respect to sensitivity and specificity in simulated and real data sets for a large number of sequencing protocols. Second, we introduce a novel dynamic programming algorithm for the spliced alignment problem. The advantage of this algorithm is its capability to detect not only collinear splice events, i.e. local splice events on the same genomic strand, but also circular and other non-collinear splice events. This succinct and simple algorithm handles all these cases at the same time with high accuracy. While it is on par with other state-of-the-art methods for collinear splice events, it outcompetes other tools for many non-collinear splice events. The application of this method to publicly available sequencing data led to the identification of a novel isoform of the tumor suppressor gene p53. Since this gene is one of the best studied genes in the human genome, this finding is quite remarkable and suggests that the application of our algorithm could help to identify a plethora of novel isoforms and genes. Third, we present a data-adaptive method to call single nucleotide variations (SNVs) from aligned high-throughput sequencing reads. We demonstrate that our method, based on empirical log-likelihoods, automatically adjusts to the quality of a sequencing experiment and thus renders a "decision" on when to call an SNV. In our simulations this method is on par with current state-of-the-art tools. Finally, we present biological results that have been obtained using the special features of the presented alignment algorithm.

    This thesis presents three different algorithmic and statistical strategies for the analysis of high-throughput sequencing data. First, we introduce a heuristic method based on enhanced suffix arrays that aligns short sequences to large genomes. The method builds on the idea of an error-tolerant traversal of a suffix array for reference genomes, combined with the concept of matching statistics by Chang and a bitvector-based alignment algorithm by Myers. The presented method supports paired-end and mate-pair alignments and offers procedures for detecting primer sequences and for trimming poly-A signals. In independent benchmarks as well, the method stands out for its high sensitivity and specificity on simulated and real data sets. For a large number of sequencing protocols it achieves better results than other well-known short-read alignment programs. Second, we present a dynamic programming algorithm for the spliced alignment problem. The advantage of this algorithm is its ability to identify not only collinear splice events, i.e. splice events on the same genomic strand, but also circular and other non-collinear splice events. The method is highly accurate: while it achieves results comparable to other methods in the detection of collinear splice variants, it beats its competitors in sensitivity and specificity for the prediction of non-collinear splice variants. The application of this algorithm led to the identification of novel isoforms. In our publication we report a novel isoform of the tumor suppressor gene p53. Since this gene is one of the best studied genes in the human genome, applying our algorithm could help to identify a multitude of further isoforms in less prominent genes. Third, we present a data-adaptive model for the identification of single nucleotide variations (SNVs). In our work we show that our model, based on empirical log-likelihoods, automatically adjusts to the quality of the sequencing experiments and renders a "decision" on which potential variations to classify as SNVs. In our simulations this method is on par with currently used approaches. Finally, we present a selection of biological results that are connected to the special features of the presented alignment methods.
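    As a minimal illustration of the index structure these read-mapping contributions build on (and only that; the thesis adds enhanced suffix arrays, error-tolerant traversal, matching statistics, and Myers' bitvector alignment on top), the Python sketch below performs exact pattern lookup in a plain suffix array.

```python
# Minimal suffix-array lookup sketch; a toy building block, not the thesis tool.

def build_suffix_array(text):
    """Suffix start positions of `text` in lexicographic order
    (O(n^2 log n) toy construction; real mappers use linear-time builds)."""
    return sorted(range(len(text)), key=lambda i: text[i:])

def find_occurrences(text, sa, pattern):
    """All start positions of `pattern` in `text`, via two binary searches
    over the suffix array `sa`."""
    m = len(pattern)
    lo, hi = 0, len(sa)                      # leftmost suffix with prefix >= pattern
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)                  # leftmost suffix with prefix > pattern
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[i] for i in range(start, lo))

genome = "ACGTACGTGACGT"
sa = build_suffix_array(genome)
print(find_occurrences(genome, sa, "ACGT"))   # -> [0, 4, 9]
```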

    The RAPID-CTCA trial (Rapid Assessment of Potential Ischaemic Heart Disease with CTCA) - a multicentre parallel-group randomised trial to compare early computerised tomography coronary angiography versus standard care in patients presenting with suspected or confirmed acute coronary syndrome: study protocol for a randomised controlled trial.

    BACKGROUND: Emergency department attendances with chest pain requiring assessment for acute coronary syndrome (ACS) are a major global health issue. Standard assessment includes history, examination, electrocardiogram (ECG) and serial troponin testing. Computerised tomography coronary angiography (CTCA) enables additional anatomical assessment of patients for coronary artery disease (CAD) but has only been studied in very low-risk patients. This trial aims to investigate the effect of early CTCA upon interventions, event rates and health care costs in patients with suspected/confirmed ACS who are at intermediate risk. METHODS/DESIGN: Participants will be recruited in about 35 tertiary and district general hospitals in the UK. Patients ≥18 years old with symptoms of suspected/confirmed ACS and at least one of the following will be included: (1) ECG abnormalities, e.g. ST-segment depression >0.5 mm; (2) a history of ischaemic heart disease; (3) troponin elevation above the 99th centile of the normal reference range, or an increase in high-sensitivity troponin meeting European Society of Cardiology criteria for 'rule-in' of myocardial infarction (MI). The early use of ≥64-slice CTCA as part of routine assessment will be compared to standard care. The primary endpoint will be all-cause death or recurrent type 1 or type 4b MI at 1 year, measured as the time to such an event. A number of secondary clinical, process and safety endpoints will be collected and analysed. Cost effectiveness will be estimated in terms of the lifetime incremental cost per quality-adjusted life year gained. We plan to recruit 2424 evaluable patients (1212 per arm; 2500 allowing for ~3% drop-out) to have 90% power to detect a 20% versus 15% difference in 1-year death or recurrent type 1 or type 4b MI, two-sided p < 0.05. Analysis will be on an intention-to-treat basis. The relationship between intervention and the primary outcome will be analysed using Cox proportional hazards regression adjusted for study site (used to stratify the randomisation), age, baseline Global Registry of Acute Coronary Events score, previous CAD and baseline troponin level. The results will be expressed as a hazard ratio with the corresponding 95% confidence interval and p value. DISCUSSION: The Rapid Assessment of Potential Ischaemic Heart Disease with CTCA (RAPID-CTCA) trial will recruit 2500 participants across about 35 hospital sites. It will be the first study to investigate the role of CTCA in the early assessment of patients with suspected or confirmed ACS who are at intermediate risk, including patients who have raised troponin measurements during initial assessment. TRIAL REGISTRATION: ISRCTN19102565. Registered on 3 October 2014. ClinicalTrials.gov: NCT02284191
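    As a rough, back-of-envelope check of the quoted sample size (the trial's own calculation is based on the time-to-event primary endpoint, so the numbers need not match exactly), the standard normal-approximation formula for comparing two proportions reproduces the order of magnitude:

```python
# Back-of-envelope sample-size check using the two-proportion normal
# approximation; not the trial's actual time-to-event calculation.
from scipy.stats import norm

p1, p2 = 0.20, 0.15              # assumed 1-year event rates in the two arms
alpha, power = 0.05, 0.90        # two-sided significance level and power

z_a = norm.ppf(1 - alpha / 2)
z_b = norm.ppf(power)

n_per_arm = (z_a + z_b) ** 2 * (p1 * (1 - p1) + p2 * (1 - p2)) / (p1 - p2) ** 2
print(round(n_per_arm))              # ~1208 evaluable patients per arm (trial: 1212)
print(round(2 * n_per_arm / 0.97))   # ~2491 recruited with ~3% drop-out (trial: 2500)
```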

    Systematic computational hunting for small RNAs derived from ncRNAs during dengue virus infection in endothelial HMEC-1 cells

    In recent years, a population of small RNA fragments derived from non-coding RNAs (sfd-RNAs) has gained significant interest due to its functional and structural resemblance to miRNAs, adding another level of complexity to our comprehension of small-RNA-mediated gene regulation. Despite this, tools to test the differential expression of sfd-RNAs are still lacking, since the current methods to detect miRNAs may not be directly applicable to them. The primary reasons are the lack of accurate small RNA and ncRNA annotation, the placement of multi-mapping reads (MMRs), and the multicopy nature of ncRNAs in the human genome. To address these issues, we implemented a methodology that allows the detection of differentially expressed sfd-RNAs, including canonical miRNAs, by using an integrated, copy-number-corrected ncRNA annotation. This approach was coupled with sixteen different computational strategies, composed of combinations of four aligners and four normalization methods, to provide a rank order of prediction for each differentially expressed sfd-RNA. By systematically addressing the three main problems, we could detect differentially expressed miRNAs and sfd-RNAs in dengue virus-infected human dermal microvascular endothelial cells. Although more biological evaluations are required, two molecular targets of hsa-mir-103a and hsa-mir-494 (CDK5 and PI3/AKT) appear relevant for dengue virus (DENV) infections. Here, we performed a comprehensive annotation and differential expression analysis, which can be applied in other studies addressing the role of small fragment RNA populations derived from ncRNAs in virus infection.
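    To illustrate the copy-number and multi-mapping problem described above (only as a toy example, not the authors' pipeline), the sketch below collapses genomic copies of an ncRNA into one family and counts each read at most once per family; all identifiers and data structures are invented for the example.

```python
# Toy copy-number-corrected counting; identifiers below are invented examples.
from collections import defaultdict

# Copy-number-aware annotation: each genomic locus -> the ncRNA family it belongs to.
locus_to_family = {
    "tRNA-Gly-GCC-1-1": "tRNA-Gly-GCC",
    "tRNA-Gly-GCC-1-2": "tRNA-Gly-GCC",   # second genomic copy, same family
    "MIR103A1": "hsa-mir-103a",
    "MIR103A2": "hsa-mir-103a",
}

# Multi-mapping alignments: read id -> all loci the read aligned to.
alignments = {
    "read1": ["tRNA-Gly-GCC-1-1", "tRNA-Gly-GCC-1-2"],   # hits both copies
    "read2": ["MIR103A1"],
    "read3": ["MIR103A1", "MIR103A2"],
}

def family_counts(alignments, locus_to_family):
    """Count each read at most once per ncRNA family (copy-number corrected),
    splitting it evenly if it maps to several distinct families."""
    counts = defaultdict(float)
    for read, loci in alignments.items():
        families = {locus_to_family[locus] for locus in loci}
        for fam in families:
            counts[fam] += 1.0 / len(families)
    return dict(counts)

print(family_counts(alignments, locus_to_family))
# -> {'tRNA-Gly-GCC': 1.0, 'hsa-mir-103a': 2.0}
```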

    Penilaian Kinerja Keuangan Koperasi di Kabupaten Pelalawan (Assessment of the Financial Performance of Cooperatives in Pelalawan Regency)

    This paper describes the development and financial performance of cooperatives in Pelalawan Regency during 2007-2008. The study covers primary and secondary cooperatives in 12 sub-districts. The method measures cooperative performance in terms of productivity, efficiency, growth, liquidity, and solvency. The productivity of cooperatives in Pelalawan was high, but efficiency was still low. Profit and income were high, liquidity was very high, and solvency was good.