A Simple Data-Adaptive Probabilistic Variant Calling Model
Background: Several sources of noise obfuscate the identification of single
nucleotide variation (SNV) in next generation sequencing data. For instance,
errors may be introduced during library construction and sequencing steps. In
addition, the reference genome and the algorithms used for the alignment of the
reads are further critical factors determining the efficacy of variant calling
methods. It is crucial to account for these factors in individual sequencing
experiments.
Results: We introduce a simple data-adaptive model for variant calling. This
model automatically adjusts to specific factors such as alignment errors. To
achieve this, several characteristics are sampled from sites with low mismatch
rates, and these are used to estimate empirical log-likelihoods. These likelihoods are then combined into a score that typically follows a mixture distribution, from which we determine a decision threshold to separate potentially variant sites from the noisy background.
Conclusions: In simulations we show that our proposed model is competitive, in terms of sensitivity and specificity, with frequently used and much more complex SNV-calling algorithms. It performs particularly well in cases with low allele frequencies. The application to next-generation sequencing data reveals stark differences between the score distributions, indicating a strong influence of data-specific sources of noise. The proposed model is specifically designed to adjust to these differences.
Comment: 19 pages, 6 figures
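A toy sketch of the scoring scheme described above (simulated data; the choice of feature, the histogram binning, and the largest-gap threshold are our illustrative assumptions, not the authors' implementation):

```python
import numpy as np

# Toy illustration of the data-adaptive idea: learn an empirical "noise"
# log-likelihood from sites with low mismatch rates, score all sites,
# and place a decision threshold inside the largest gap of the score
# distribution. All modelling choices here are illustrative assumptions.
rng = np.random.default_rng(0)

# Simulated per-site mismatch fractions: background noise plus 100 sites
# carrying a real variant allele at roughly 1/3 frequency.
noise = rng.beta(1, 400, size=4900)
variants = rng.beta(20, 40, size=100)
mismatch = np.concatenate([noise, variants])

# 1) Sample the background from sites with low mismatch rates and
#    estimate an empirical density over the feature.
background = mismatch[mismatch < 0.05]
bins = np.linspace(0.0, 1.0, 101)
density, _ = np.histogram(background, bins=bins, density=True)
density = np.maximum(density, 1e-6)  # floor empty bins to avoid log(0)

# 2) Score every site by its empirical log-likelihood under the
#    background model; variant sites fall into near-empty bins and
#    receive very negative scores.
idx = np.clip(np.digitize(mismatch, bins) - 1, 0, len(density) - 1)
score = np.log(density[idx])

# 3) Derive a decision threshold from the (typically bimodal) score
#    distribution; here, the midpoint of the largest gap between
#    consecutive sorted scores.
s = np.sort(score)
k = int(np.argmax(np.diff(s)))
threshold = (s[k] + s[k + 1]) / 2.0

calls = score < threshold  # potentially variant sites
```

In this simulation, the threshold falls in the wide gap between the background scores and the near-empty-bin scores of the variant sites, so the call set recovers the simulated variants.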
ORFer – retrieval of protein sequences and open reading frames from GenBank and storage into relational databases or text files
Background: Functional genomics involves parallel experimentation with large sets of proteins. This requires management of large sets of open reading frames as a prerequisite for the cloning and recombinant expression of these proteins. Results: A Java program was developed for the retrieval of protein and nucleic acid sequences and annotations from NCBI GenBank, using the XML sequence format. Annotations retrieved by ORFer include the sequence name, the organism and the completeness of the sequence. The program has a graphical user interface, although it can also be used in a non-interactive mode. For protein sequences, the program also extracts the open reading frame sequence, if available, and checks its correct translation. ORFer accepts user input in the form of single GenBank GI identifiers or accession numbers, or lists thereof. It can be used to extract complete sets of open reading frames and protein sequences from any kind of GenBank sequence entry, including complete genomes or chromosomes. Sequences are either stored with their features in a relational database or can be exported as text files in Fasta or tab-delimited format. The ORFer program is freely available at http://www.proteinstrukturfabrik.de/orfer. Conclusion: The ORFer program allows for fast retrieval of DNA sequences, protein sequences and their open reading frames, together with sequence annotations, from GenBank. Furthermore, storage of sequences and features in a relational database is supported. Such a database can supplement a laboratory information management system (LIMS) with appropriate sequence information.
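The translation check mentioned above can be sketched as follows (a hypothetical Python re-implementation for illustration only; ORFer itself is a Java program, and the function names here are ours):

```python
# Hypothetical sketch of ORFer's translation check: translate an open
# reading frame with the standard genetic code and verify that it
# matches the annotated protein sequence.
BASES = "TCAG"
# Standard genetic code (NCBI transl_table=1), one amino acid per codon
# in TCAG-major order; '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def translate(orf):
    """Translate a nucleotide ORF codon by codon (no ambiguity codes)."""
    orf = orf.upper().replace("U", "T")
    return "".join(CODON_TABLE[orf[i:i + 3]]
                   for i in range(0, len(orf) - len(orf) % 3, 3))

def translation_matches(orf, protein):
    """Check that the ORF translates to the annotated protein,
    tolerating a trailing stop codon."""
    return translate(orf).rstrip("*") == protein
```

For example, `translation_matches("ATGGCTTGA", "MA")` holds, since ATG-GCT-TGA translates to M, A, stop.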
Direct measurement of molecular stiffness and damping in confined water layers
We present direct and linear measurements of the normal stiffness and damping of a confined, few-molecule-thick water layer. The measurements were obtained by use of a small-amplitude (0.36 Å), off-resonance Atomic Force Microscopy (AFM) technique. We measured stiffness and damping oscillations revealing up to 7 layers separated by 2.56 ± 0.20 Å. Relaxation times could also be calculated and were found to indicate a significant slow-down of the dynamics of the system as the confining separation was reduced. We found that the dynamics of the system is determined not only by the interfacial pressure, but more significantly by solvation effects which depend on the exact separation of tip and surface. Thus 'solidification' seems to be not merely a result of pressure and confinement, but depends strongly on how commensurate the confining cavity is with the molecule size. We were able to model the results by starting from the simple assumption that the relaxation time depends linearly on the film stiffness.
Comment: 7 pages, 6 figures, will be submitted to PR
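Under a Kelvin–Voigt-like reading, in which the relaxation time of the layer is the ratio of damping to stiffness, the stated modelling assumption can be written as follows (our notation and our reading of the model, not necessarily the authors' exact formulation):

```latex
\tau(D) = \frac{\gamma(D)}{k(D)}, \qquad
\tau(D) = a\,k(D) \;\;\Longrightarrow\;\; \gamma(D) = a\,k(D)^{2},
```

where $D$ is the tip–surface separation, $k$ the measured layer stiffness, $\gamma$ the damping coefficient, and $a$ a fitted constant. In this reading, the damping oscillations would simply track the square of the stiffness oscillations.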
Assessing the cost of global biodiversity and conservation knowledge
Knowledge products comprise assessments of authoritative information supported by standards, governance, quality control, data, tools, and capacity-building mechanisms. Considerable resources are dedicated to developing and maintaining knowledge products for biodiversity conservation, and they are widely used to inform policy and advise decision-makers and practitioners. However, the financial cost of delivering this information is largely undocumented. We evaluated the costs and funding sources for developing and maintaining four global biodiversity and conservation knowledge products: The IUCN Red List of Threatened Species, the IUCN Red List of Ecosystems, Protected Planet, and the World Database of Key Biodiversity Areas. These are secondary data sets, built on primary data collected by extensive networks of expert contributors worldwide. We estimate that US$116–204 million, plus 293 person-years of volunteer time (range: 278–308 person-years) valued at US$12–16 million, were invested in these four knowledge products between 1979 and 2013. More than half of this financing was provided through philanthropy, and nearly three-quarters was spent on personnel costs. The estimated annual cost of maintaining data and platforms for three of these knowledge products (excluding the IUCN Red List of Ecosystems, for which annual costs could not be estimated for 2013) is US$6.2–6.7 million. We estimated that an additional US$12 million would be needed. These costs are much lower than those to maintain many other, similarly important, global knowledge products. Ensuring that biodiversity and conservation knowledge products are sufficiently up to date, comprehensive and accurate is fundamental to informing decision-making for biodiversity conservation and sustainable development. Thus, the development and implementation of plans for sustainable long-term financing for them is critical.
The RNA workbench: Best practices for RNA and high-throughput sequencing bioinformatics in Galaxy
RNA-based regulation has become a major research topic in molecular biology. The analysis of epigenetic and expression data is therefore incomplete if RNA-based regulation is not taken into account. Thus, it is increasingly important, but not yet standard, to combine RNA-centric data and analysis tools with other types of experimental data such as RNA-seq or ChIP-seq. Here, we present the RNA workbench, a comprehensive set of analysis tools and consolidated workflows that enables researchers to combine these two worlds. Based on the Galaxy framework, the workbench guarantees simple access, easy extension, flexible adaptation to personal and security needs, and sophisticated analyses that are independent of command-line knowledge. Currently, it includes more than 50 bioinformatics tools dedicated to different research areas of RNA biology, including RNA structure analysis, RNA alignment, RNA annotation, RNA-protein interaction, ribosome profiling, RNA-seq analysis and RNA target prediction. The workbench is developed and maintained by experts in RNA bioinformatics and the Galaxy framework. Together with the growing community evolving around this workbench, we are committed to keeping the workbench up to date with future standards and needs, providing researchers with a reliable and robust framework for RNA data analysis.
Genome Informatics for High-Throughput Sequencing Data Analysis: Methods and Applications
This thesis introduces three different algorithmic and statistical strategies for the analysis of high-throughput sequencing data. First, we introduce a heuristic method based on enhanced suffix arrays to map short sequences to larger reference genomes. The algorithm builds on the idea of an error-tolerant traversal of the suffix array for the reference genome, in conjunction with the concept of matching statistics introduced by Chang and a bitvector-based alignment algorithm proposed by Myers. The algorithm supports paired-end and mate-pair alignments, and the implementation offers methods for primer detection as well as primer and poly-A trimming. In our own benchmarks as well as independent benchmarks, this tool outcompetes other currently available tools with respect to sensitivity and specificity in simulated and real data sets for a large number of sequencing protocols. Second, we introduce a novel dynamic programming algorithm for the spliced alignment problem. The advantage of this algorithm is its capability to detect not only collinear splice events, i.e. local splice events on the same genomic strand, but also circular and other non-collinear splice events. This succinct and simple algorithm handles all these cases at the same time with high accuracy. While it is on par with other state-of-the-art methods for collinear splice events, it outcompetes other tools for many non-collinear splice events. The application of this method to publicly available sequencing data led to the identification of a novel isoform of the tumor suppressor gene p53. Since this gene is one of the best-studied genes in the human genome, this finding is quite remarkable and suggests that the application of our algorithm could help to identify a plethora of novel isoforms and genes. Third, we present a data-adaptive method to call single nucleotide variations (SNVs) from aligned high-throughput sequencing reads.
We demonstrate that our method, based on empirical log-likelihoods, automatically adjusts to the quality of a sequencing experiment and thus renders a "decision" on when to call an SNV. In our simulations this method is on par with current state-of-the-art tools. Finally, we present biological results that have been obtained using the special features of the presented alignment algorithm.
This thesis presents three different algorithmic and statistical strategies for the analysis of high-throughput sequencing data. First, we introduce a heuristic method based on enhanced suffix arrays that aligns short sequences to large genomes. The method builds on the idea of an error-tolerant traversal of a suffix array for reference genomes, combined with the concept of matching statistics by Chang and a bitvector-based alignment algorithm by Myers. The presented method supports paired-end and mate-pair alignments and offers methods for detecting primer sequences and for trimming poly-A signals. In independent benchmarks as well, the approach stands out for its high sensitivity and specificity in simulated and real data sets. For a large number of sequencing protocols it achieves better results than other well-known short-read alignment programs. Second, we present a dynamic-programming algorithm for the spliced alignment problem. The advantage of this algorithm is its ability to identify not only collinear splice events, i.e. splice events on the same genomic strand, but also circular and other non-collinear splice events. The method stands out for its high accuracy: while it achieves results comparable to other methods in the detection of collinear splice variants, it beats its competitors in sensitivity and specificity in the prediction of non-collinear splice variants. The application of this algorithm led to the identification of novel isoforms. In our publication we report a novel isoform of the tumor suppressor gene p53. Since this gene is one of the best-studied genes of the human genome, the application of our algorithm could help to identify a multitude of further isoforms in less prominent genes. Third, we present a data-adaptive model for the identification of single nucleotide variations (SNVs). In our work we show that our model, based on empirical log-likelihoods, automatically adapts to the quality of the sequencing experiments and renders a "decision" on which potential variations to classify as SNVs. In our simulations this method is on par with currently used approaches. Finally, we present a selection of biological results that were obtained using the special features of the presented alignment methods.
The RAPID-CTCA trial (Rapid Assessment of Potential Ischaemic Heart Disease with CTCA) - a multicentre parallel-group randomised trial to compare early computerised tomography coronary angiography versus standard care in patients presenting with suspected or confirmed acute coronary syndrome: study protocol for a randomised controlled trial.
BACKGROUND: Emergency department attendances with chest pain requiring assessment for acute coronary syndrome (ACS) are a major global health issue. Standard assessment includes history, examination, electrocardiogram (ECG) and serial troponin testing. Computerised tomography coronary angiography (CTCA) enables additional anatomical assessment of patients for coronary artery disease (CAD) but has only been studied in very low-risk patients. This trial aims to investigate the effect of early CTCA upon interventions, event rates and health care costs in patients with suspected/confirmed ACS who are at intermediate risk. METHODS/DESIGN: Participants will be recruited in about 35 tertiary and district general hospitals in the UK. Patients ≥18 years old with symptoms of suspected/confirmed ACS and at least one of the following will be included: (1) ECG abnormalities, e.g. ST-segment depression >0.5 mm; (2) history of ischaemic heart disease; (3) troponin elevation above the 99th centile of the normal reference range or an increase in high-sensitivity troponin meeting European Society of Cardiology criteria for 'rule-in' of myocardial infarction (MI). The early use of ≥64-slice CTCA as part of routine assessment will be compared to standard care. The primary endpoint will be all-cause death or recurrent type 1 or type 4b MI at 1 year, measured as the time to such an event. A number of secondary clinical, process and safety endpoints will be collected and analysed. Cost-effectiveness will be estimated in terms of the lifetime incremental cost per quality-adjusted life year gained. We plan to recruit 2424 evaluable patients (2500 with ~3% drop-out; 1212 per arm) to have 90% power to detect a difference of 20% versus 15% in 1-year death or recurrent type 1 or type 4b MI, two-sided p < 0.05. Analysis will be on an intention-to-treat basis.
The relationship between intervention and the primary outcome will be analysed using Cox proportional hazards regression adjusted for study site (used to stratify the randomisation), age, baseline Global Registry of Acute Coronary Events score, previous CAD and baseline troponin level. The results will be expressed as a hazard ratio with the corresponding 95% confidence interval and p value. DISCUSSION: The Rapid Assessment of Potential Ischaemic Heart Disease with CTCA (RAPID-CTCA) trial will recruit 2500 participants across about 35 hospital sites. It will be the first study to investigate the role of CTCA in the early assessment of patients with suspected or confirmed ACS who are at intermediate risk, including patients who have raised troponin measurements during initial assessment. TRIAL REGISTRATION: ISRCTN19102565. Registered on 3 October 2014. ClinicalTrials.gov: NCT02284191.
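As a rough cross-check of the stated sample size, the simplified two-proportion approximation below (our illustration; the protocol's own calculation for a time-to-event endpoint may differ slightly) lands close to the 1212 patients per arm quoted above:

```python
import math

# Approximate per-arm sample size for detecting 20% vs 15% event rates
# with two-sided alpha = 0.05 and 90% power, using the standard
# two-proportion normal approximation. This is an illustrative
# cross-check, not the protocol's exact (time-to-event) calculation.
z_alpha = 1.959964   # normal quantile for two-sided alpha = 0.05
z_beta = 1.281552    # normal quantile for 90% power
p1, p2 = 0.20, 0.15

n_raw = ((z_alpha + z_beta) ** 2
         * (p1 * (1 - p1) + p2 * (1 - p2))
         / (p1 - p2) ** 2)
n_per_arm = math.ceil(n_raw)   # close to the trial's 1212 per arm
```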
Systematic computational hunting for small RNAs derived from ncRNAs during dengue virus infection in endothelial HMEC-1 cells
In recent years, a population of small RNA fragments derived from non-coding RNAs (sfd-RNAs) has gained significant interest due to its functional and structural resemblance to miRNAs, adding another level of complexity to our comprehension of small-RNA-mediated gene regulation. Despite this, scientists need more tools to test the differential expression of sfd-RNAs, since current methods to detect miRNAs may not be directly applicable to them. The primary reasons are the lack of accurate small RNA and ncRNA annotation, multi-mapping read (MMR) placement, and the multicopy nature of ncRNAs in the human genome. To solve these issues, we implemented a methodology that allows the detection of differentially expressed sfd-RNAs, including canonical miRNAs, by using an integrated copy-number-corrected ncRNA annotation. This approach was coupled with sixteen different computational strategies, composed of combinations of four aligners and four normalization methods, to provide a rank-order of prediction for each differentially expressed sfd-RNA. By systematically addressing the three main problems, we could detect differentially expressed miRNAs and sfd-RNAs in dengue virus-infected human dermal microvascular endothelial cells. Although more biological evaluations are required, two molecular targets of hsa-mir-103a and hsa-mir-494 (CDK5 and PI3/AKT) appear relevant for dengue virus (DENV) infections. Here, we performed a comprehensive annotation and differential expression analysis, which can be applied in other studies addressing the role of small fragment RNA populations derived from ncRNAs in virus infection.
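The rank-ordering across the sixteen strategies can be illustrated with a small sketch (aligner and normalization names below are placeholders, and mean-rank aggregation is one simple choice, not necessarily the one used in the study):

```python
from itertools import product

# Illustrative sketch: each (aligner, normalization) strategy produces a
# ranking of candidate sfd-RNAs by differential-expression evidence; a
# consensus ordering is obtained by averaging ranks across strategies.
# Tool names are placeholders, not the study's actual choices.
aligners = ["aligner_A", "aligner_B", "aligner_C", "aligner_D"]
normalizations = ["norm_1", "norm_2", "norm_3", "norm_4"]
strategies = list(product(aligners, normalizations))  # 4 x 4 = 16

def consensus_rank(rankings):
    """rankings: dict mapping strategy -> list of sfd-RNA ids, best first.
    Returns ids ordered by their mean rank across all strategies."""
    ranks = {}
    for order in rankings.values():
        for pos, rna in enumerate(order):
            ranks.setdefault(rna, []).append(pos)
    return sorted(ranks, key=lambda r: sum(ranks[r]) / len(ranks[r]))

# Tiny made-up example: one strategy disagrees with the other fifteen.
example = {s: ["hsa-mir-103a", "hsa-mir-494", "sfd-X"] for s in strategies}
example[strategies[0]] = ["hsa-mir-494", "hsa-mir-103a", "sfd-X"]
```

Here `consensus_rank(example)` keeps hsa-mir-103a on top, since fifteen of the sixteen strategies rank it first.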
Analysis of the African coelacanth genome sheds light on tetrapod evolution
It was a zoological sensation when a living specimen of the coelacanth was first discovered in 1938, as this lineage of lobe-finned fish was thought to have gone extinct 70 million years ago. The modern coelacanth looks remarkably similar to many of its ancient relatives, and its evolutionary proximity to our own fish ancestors provides a glimpse of the fish that first walked on land. Here we report the genome sequence of the African coelacanth, Latimeria chalumnae. Through a phylogenomic analysis, we conclude that the lungfish, and not the coelacanth, is the closest living relative of tetrapods. Coelacanth protein-coding genes are significantly more slowly evolving than those of tetrapods, unlike other genomic features. Analyses of changes in genes and regulatory elements during the vertebrate adaptation to land highlight genes involved in immunity, nitrogen excretion and the development of fins, tail, ear, eye, brain, and olfaction. Functional assays of enhancers involved in the fin-to-limb transition and in the emergence of extra-embryonic tissues demonstrate the importance of the coelacanth genome as a blueprint for understanding tetrapod evolution.
Financial Performance Assessment of Cooperatives in Pelalawan Regency
This paper describes the development and financial performance of cooperatives in Pelalawan Regency during 2007–2008. The study covers primary and secondary cooperatives in 12 sub-districts. The method measures cooperative performance in terms of productivity, efficiency, growth, liquidity, and solvency. The productivity of cooperatives in Pelalawan was high, but efficiency was still low. Profit and income were high, liquidity was very high, and solvency was good.
