PILER-CR: Fast and accurate identification of CRISPR repeats
BACKGROUND: Sequencing of prokaryotic genomes has recently revealed the presence of CRISPR elements: short, highly conserved repeats separated by unique sequences of similar length. The distinctive sequence signature of CRISPR repeats can be found using general-purpose repeat- or pattern-finding software tools. However, the output of such tools is not always ideal for studying these repeats, and significant effort is sometimes needed to build additional tools and perform manual analysis of the output. RESULTS: We present PILER-CR, a program specifically designed for the identification and analysis of CRISPR repeats. The program executes rapidly, completing a 5 Mb genome in around 5 seconds on a current desktop computer. We validate the algorithm by manual curation and by comparison with published surveys of these repeats, finding that PILER-CR has both high sensitivity and high specificity. We also present a catalogue of putative CRISPR repeats identified in a comprehensive analysis of 346 prokaryotic genomes. CONCLUSION: PILER-CR is a useful tool for rapid identification and classification of CRISPR repeats. The software is donated to the public domain. Source code and a Linux binary are freely available at
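The sequence signature described above (short conserved repeats separated by unique spacers of similar length) can be illustrated with a naive exact-match scan. This is only a sketch for intuition and is not PILER-CR's algorithm. The function name `find_repeat_arrays` and the parameters `k`, `min_copies`, `min_spacer` and `max_spacer` are hypothetical choices made for this example.

```python
from collections import defaultdict


def find_repeat_arrays(seq, k=8, min_copies=3, min_spacer=4, max_spacer=12):
    """Return (kmer, start_positions) for exact k-mers that recur at least
    min_copies times with spacer lengths within [min_spacer, max_spacer].

    Illustrative only: real CRISPR finders tolerate mismatches and
    variable repeat lengths, which this exact-match scan does not.
    """
    # Index every k-mer by its 0-based start positions (in increasing order).
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)

    arrays = []
    for kmer, pos in positions.items():
        run = [pos[0]]
        for p in pos[1:]:
            spacer_len = p - run[-1] - k  # bases between consecutive copies
            if min_spacer <= spacer_len <= max_spacer:
                run.append(p)
            else:
                if len(run) >= min_copies:
                    arrays.append((kmer, run))
                run = [p]
        if len(run) >= min_copies:
            arrays.append((kmer, run))
    return arrays


# Toy input: three copies of one repeat separated by two distinct 6-base spacers.
repeat = "GTTCCAAC"
toy_seq = "TTT" + repeat + "ACGTAC" + repeat + "TGCATG" + repeat + "AAA"
toy_arrays = find_repeat_arrays(toy_seq)
```

On the toy input, the only k-mer that recurs with regular spacing is the repeat itself, since every window overlapping a spacer is unique.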
Interpreting 16S metagenomic data without clustering to achieve sub-OTU resolution
The standard approach to analyzing 16S tag sequence data, which relies on
clustering reads by sequence similarity into Operational Taxonomic Units
(OTUs), underexploits the accuracy of modern sequencing technology. We present
a clustering-free approach to multi-sample Illumina datasets that can identify
independent bacterial subpopulations regardless of the similarity of their 16S
tag sequences. Using published data from a longitudinal time-series study of
human tongue microbiota, we are able to resolve within standard 97% similarity
OTUs up to 20 distinct subpopulations, all ecologically distinct but with 16S
tags differing by as little as 1 nucleotide (99.2% similarity). A comparative
analysis of oral communities of two cohabiting individuals reveals that most
such subpopulations are shared between the two communities at 100% sequence
identity, and that dynamical similarity between subpopulations in one host is
strongly predictive of dynamical similarity between the same subpopulations in
the other host. Our method can also be applied to samples collected in
cross-sectional studies and can be used with the 454 sequencing platform. We
discuss how the sub-OTU resolution of our approach can provide new insight into
factors shaping community assembly.
Comment: Updated to match the published version. 12 pages, 5 figures +
supplement. Significantly revised for clarity, references added, results not
change
Expedited batch processing and analysis of transposon insertions
Background: With advances in sequencing technology, greater and greater amounts of eukaryotic genome data are becoming available. Often, large portions of these genomes consist of transposable elements, frequently accounting for 50% or more in vertebrates. Each transposable element family may have thousands or tens of thousands of individual copies within a given genome, and therefore it can take an exorbitant amount of time and effort to process data in a meaningful fashion. Findings: In order to combat this problem, we developed a set of bioinformatics techniques and programs to streamline the analysis. This includes a unique Perl script which automates the process of taking BLAST, RepeatMasker and similar data to extract and manipulate the hit sequences from the genome. This script, called Process_hits, uses an object-oriented methodology to compile all hit locations from a given file for processing, organize this data into usable categories, and output it in multiple formats. Conclusions: The program proved capable of handling large amounts of transposon data in an efficient fashion. It is equipped with a number of useful sub-functions, each of which is contained within its own sub-module to allow for greater expandability and as a foundation for future program design.
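As a rough illustration of the kind of hit extraction described above, the following sketch reads BLAST tabular output (the standard 12-column `-outfmt 6` layout) and pulls each hit's subject sequence from an in-memory genome. It is not the Process_hits script, which is written in Perl. The function names and the `genome` dictionary are hypothetical and exist only for this example.

```python
def reverse_complement(seq):
    """Reverse-complement a DNA string containing A, C, G, T or N."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq))


def extract_hits(blast_lines, genome):
    """Return (subject_id, start, end, strand, sequence) for each hit.

    blast_lines: iterable of tab-separated BLAST outfmt 6 lines
    genome: dict mapping subject IDs to sequence strings (assumed input)
    """
    hits = []
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        sseqid = fields[1]
        sstart, send = int(fields[8]), int(fields[9])
        # BLAST coordinates are 1-based and inclusive; sstart > send marks
        # a hit on the minus strand.
        lo, hi = min(sstart, send), max(sstart, send)
        strand = "+" if sstart <= send else "-"
        subseq = genome[sseqid][lo - 1:hi]
        if strand == "-":
            subseq = reverse_complement(subseq)
        hits.append((sseqid, lo, hi, strand, subseq))
    return hits


# Toy data: one plus-strand hit and one minus-strand hit on the same contig.
toy_genome = {"chr1": "AACCGGTTAC"}
toy_lines = [
    "q1\tchr1\t100.0\t4\t0\t0\t1\t4\t3\t6\t1e-5\t50",
    "q1\tchr1\t100.0\t4\t0\t0\t1\t4\t8\t5\t1e-5\t50",
]
toy_hits = extract_hits(toy_lines, toy_genome)
```

Normalising every hit to a sorted coordinate pair plus an explicit strand keeps downstream grouping and export steps independent of how the search tool reported orientation.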
Improving the Alignment Quality of Consistency Based Aligners with an Evaluation Function Using Synonymous Protein Words
Most sequence alignment tools can successfully align protein sequences with higher levels of sequence identity. The accuracy of corresponding structure alignment, however, decreases rapidly when considering distantly related sequences (<20% identity). In this range of identity, alignments optimized so as to maximize sequence similarity are often inaccurate from a structural point of view. Over the last two decades, most multiple protein aligners have been optimized for their capacity to reproduce structure-based alignments while using sequence information. Methods currently available differ essentially in the similarity measurement between aligned residues using substitution matrices, Fourier transform, sophisticated profile-profile functions, or consistency-based approaches, more recently
The Minang Kaluak Paku Kacang Balimbiang Motif in Casual Wear
The Minangkabau, one of the ethnic groups that contribute to the
distinctiveness of Indonesian culture, have a cultural heritage spread across
many aspects of life. One element of this heritage is the art of carving.
Carving that draws its ideas from nature carries philosophical meanings for
the life of Minangkabau society. All the carvings on the Rumah Gadang show
that the key elements shaping Minangkabau culture mirror what exists in
nature. One of the carvings on the rumah gadang is kaluak paku. Kaluak paku is
the name of a carving motif in Minangkabau custom (adat). It derives from the
curl (kelukan/kaluak) at the tip of a young fern (paku). The kaluak paku
carving of the rumah gadang symbolizes a man's responsibility in Minangkabau
adat toward the next generation, as a father to his children and as a mamak
(maternal uncle) to his kemenakan (nieces and nephews). This kaluak paku
carving of the Minangkabau rumah gadang is the source of ideas for the
garments created in this final project.
The work was created using several methods: aesthetic and ergonomic
approaches, data collection through literature study, and a creation method
based on Gustami Sp's theory of 3 stages and 6 steps. Making the work required
several kinds of data; reference data were collected from literature sources
in the form of books, journals on social media, and smartphone applications
such as Pinterest. The most important data collected were images of the visual
form of the Minangkabau kaluak paku plant carving and of casual wear.
The work produced consists of 8 casual garments. All of the pieces have an
A-line silhouette that flares at the bottom. The main material used is
primisima. The color combination applies the characteristic Minangkabau
colors taken from its customary flag, the "marawa": red, black, and yellow.
The works produced with these colors fit well with the theme, which features
the kaluak paku carving of the Minangkabau rumah gadang.
Keywords: Minang, Kaluak Paku Kacang Balimbiang, Kasua
Who Watches the Watchmen? An Appraisal of Benchmarks for Multiple Sequence Alignment
Multiple sequence alignment (MSA) is a fundamental and ubiquitous technique
in bioinformatics used to infer related residues among biological sequences.
Thus alignment accuracy is crucial to a vast range of analyses, often in ways
difficult to assess in those analyses. To compare the performance of different
aligners and help detect systematic errors in alignments, a number of
benchmarking strategies have been pursued. Here we present an overview of the
main strategies--based on simulation, consistency, protein structure, and
phylogeny--and discuss their different advantages and associated risks. We
outline a set of desirable characteristics for effective benchmarking, and
evaluate each strategy in light of them. We conclude that there is currently no
universally applicable means of benchmarking MSA, and that developers and users
of alignment tools should base their choice of benchmark depending on the
context of application--with a keen awareness of the assumptions underlying
each benchmarking strategy.
Comment: Revie
Risk of Cerebrovascular Events in 178 962 Five-Year Survivors of Cancer Diagnosed at 15 to 39 Years of Age: The TYACSS (Teenage and Young Adult Cancer Survivor Study)
Background: Survivors of teenage and young adult (TYA) cancer are at risk of cerebrovascular events, but the magnitude of and extent to which this risk varies by cancer type, decade of diagnosis, age at diagnosis and attained age remains uncertain. This is the largest ever cohort study to evaluate the risks of hospitalisation for a cerebrovascular event among long-term survivors of TYA cancer. Methods: The population-based Teenage and Young Adult Cancer Survivor Study (N=178,962) was linked to Hospital Episode Statistics data for England to investigate the risks of hospitalisation for a cerebrovascular event among 5-year survivors of cancer diagnosed when aged 15-39 years. Observed numbers of first hospitalisations for cerebrovascular events were compared with those expected from the general population using standardised hospitalisation ratios (SHR) and absolute excess risks (AER) per 10,000 person-years. Cumulative incidence was calculated with death considered a competing risk. Results: Overall, 2,782 cancer survivors were hospitalised for a cerebrovascular event, 40% higher than expected (SHR=1.4, 95% confidence interval [CI]=1.3-1.4). Survivors of central nervous system (CNS) tumours (SHR=4.6, CI=4.3-5.0), head & neck tumours (SHR=2.6, CI=2.2-3.1) and leukaemia (SHR=2.5, CI=1.9-3.1) were at greatest risk. Males had a significantly higher AER than females (AER=7 versus 3), especially among head & neck tumour survivors (AER=30 versus 11). By age 60, 9%, 6% and 5% of CNS tumour, head & neck tumour, and leukaemia survivors, respectively, had been hospitalised for a cerebrovascular event. Beyond age 60, every year 0.4% of CNS tumour survivors were hospitalised for a cerebral infarction (versus 0.1% expected), whereas at any age, every year 0.2% of head & neck tumour survivors were hospitalised for a cerebral infarction (versus 0.06% expected).
Conclusions: Survivors of a CNS tumour, head & neck tumour, and leukaemia are particularly at risk of hospitalisation for a cerebrovascular event. The excess risk of cerebral infarction among CNS tumour survivors increases with attained age. For head & neck tumour survivors this excess risk remains high across all ages. These groups of survivors, and in particular males, should be considered for surveillance of cerebrovascular risk factors and potential pharmacological interventions for cerebral infarction prevention
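For readers unfamiliar with the two measures reported above, their conventional definitions can be written as follows. The abstract itself does not state the formulas, so the notation here is an assumption: O is the observed number of first hospitalisations, E is the number expected from general-population rates, and the denominator is the person-years at risk in the cohort.

```latex
\[
\mathrm{SHR} = \frac{O}{E},
\qquad
\mathrm{AER} = \frac{O - E}{\text{person-years at risk}} \times 10\,000
\]
```

Under these definitions, an SHR of 1.4 means that observed hospitalisations were 1.4 times the expected number, which is the "40% higher than expected" reported in the results, while the AER expresses the same excess on an absolute scale per 10,000 person-years.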
Optimizing substitution matrix choice and gap parameters for sequence alignment
Background: While substitution matrices can readily be computed from reference alignments, it is challenging to compute optimal or approximately optimal gap penalties. It is also not well understood which substitution matrices are the most effective when alignment accuracy is the goal rather than homolog recognition. Here a new parameter optimization procedure, POP, is described and applied to the problems of optimizing gap penalties and selecting substitution matrices for pair-wise global protein alignments. Results: POP is compared to a recent method due to Kim and Kececioglu and found to achieve from 0.2% to 1.3% higher accuracies on pair-wise benchmarks extracted from BALIBASE. The VTML matrix series is shown to be the most accurate on several global pair-wise alignment benchmarks, with VTML200 giving best or close to the best performance in all tests. BLOSUM matrices are found to be slightly inferior, even with the marginal improvements in the bug-fixed RBLOSUM series. The PAM series is significantly worse, giving accuracies typically 2% less than VTML. Integer rounding is found to cause slight degradations in accuracy. No evidence is found that selecting a matrix based on sequence divergence improves accuracy, suggesting that the use of this heuristic in CLUSTALW may be ineffective. Using VTML200 is found to improve the accuracy of CLUSTALW by 8% on BALIBASE and 5% on PREFAB. Conclusion: The hypothesis that more accurate alignments of distantly related sequences may be achieved using low-identity matrices is shown to be false for commonly used matrix types. Source code and test data are freely available from the author's web site at http://www.drive5.com/pop.
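To make concrete how a substitution score and a gap penalty together determine a pair-wise global alignment, here is a minimal Needleman-Wunsch scoring sketch. It is not POP and does not use VTML, BLOSUM or PAM. The linear gap penalty, the `toy_score` function and all parameter values are assumptions chosen for illustration only.

```python
def global_align_score(a, b, score, gap=-4):
    """Optimal global alignment score of strings a and b.

    score: function (x, y) -> substitution score for aligning x with y
    gap:   linear penalty added for each gapped position (assumed model;
           affine gap penalties are common in practice)
    """
    m = len(b)
    # prev[j] is the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(m + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * m
        for j in range(1, m + 1):
            cur[j] = max(
                prev[j - 1] + score(a[i - 1], b[j - 1]),  # align two residues
                prev[j] + gap,                            # gap in b
                cur[j - 1] + gap,                         # gap in a
            )
        prev = cur
    return prev[m]


def toy_score(x, y):
    """Hypothetical stand-in for a substitution matrix lookup."""
    return 2 if x == y else -1


# The same pair scored under a strict and a lenient gap penalty.
strict = global_align_score("ACGT", "AGCT", toy_score, gap=-4)
lenient = global_align_score("ACGT", "AGCT", toy_score, gap=-1)
```

With the strict penalty the best alignment is ungapped and accepts two mismatches. With the lenient penalty it becomes worthwhile to open two gaps and recover a third match. This sensitivity of the optimal alignment to the parameters is why gap penalties and matrix choice are worth optimizing jointly.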