PILER-CR: Fast and accurate identification of CRISPR repeats
BACKGROUND: Sequencing of prokaryotic genomes has recently revealed the presence of CRISPR elements: short, highly conserved repeats separated by unique sequences of similar length. The distinctive sequence signature of CRISPR repeats can be found using general-purpose repeat- or pattern-finding software tools. However, the output of such tools is not always ideal for studying these repeats, and significant effort is sometimes needed to build additional tools and perform manual analysis of the output. RESULTS: We present PILER-CR, a program specifically designed for the identification and analysis of CRISPR repeats. The program executes rapidly, completing a 5 Mb genome in around 5 seconds on a current desktop computer. We validate the algorithm by manual curation and by comparison with published surveys of these repeats, finding that PILER-CR has both high sensitivity and high specificity. We also present a catalogue of putative CRISPR repeats identified in a comprehensive analysis of 346 prokaryotic genomes. CONCLUSION: PILER-CR is a useful tool for rapid identification and classification of CRISPR repeats. The software is donated to the public domain. Source code and a Linux binary are freely available at
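The sequence signature described above (a short conserved repeat recurring at roughly regular intervals, with unique spacers in between) can be illustrated with a toy scanner. This is a minimal sketch and not PILER-CR's algorithm: it assumes exact repeat copies, and the function name `find_repeat_arrays` and all of its parameters are hypothetical.

```python
from collections import defaultdict


def find_repeat_arrays(seq, k, min_period, max_period, min_copies=3):
    """Report k-mers that recur at roughly regular intervals.

    Hypothetical helper for illustration only. A hit is a k-mer with at
    least `min_copies` consecutive exact occurrences whose start-to-start
    distance lies in [min_period, max_period] (repeat length plus spacer).
    """
    # Index every k-mer by its start positions (ascending by construction).
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)

    arrays = []
    for kmer, starts in positions.items():
        run = [starts[0]]
        for pos in starts[1:]:
            if min_period <= pos - run[-1] <= max_period:
                run.append(pos)  # spacing fits: extend the current array
            else:
                if len(run) >= min_copies:
                    arrays.append((kmer, run))
                run = [pos]  # spacing broken: start a new candidate array
        if len(run) >= min_copies:
            arrays.append((kmer, run))
    return arrays


# Toy input: three copies of an 8-base repeat separated by 6-base spacers.
repeat = "GTTTCAGA"
toy_genome = "CCAT" + repeat + "ACCTGA" + repeat + "TGGACT" + repeat + "GGTA"
hits = find_repeat_arrays(toy_genome, k=8, min_period=10, max_period=20)
```

Real CRISPR repeats tolerate mismatches and vary in length, so a practical tool needs approximate matching; the exact-match index here only shows the repeat-spacer periodicity that such tools look for.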
Interpreting 16S metagenomic data without clustering to achieve sub-OTU resolution
The standard approach to analyzing 16S tag sequence data, which relies on
clustering reads by sequence similarity into Operational Taxonomic Units
(OTUs), underexploits the accuracy of modern sequencing technology. We present
a clustering-free approach to multi-sample Illumina datasets that can identify
independent bacterial subpopulations regardless of the similarity of their 16S
tag sequences. Using published data from a longitudinal time-series study of
human tongue microbiota, we are able to resolve within standard 97% similarity
OTUs up to 20 distinct subpopulations, all ecologically distinct but with 16S
tags differing by as little as 1 nucleotide (99.2% similarity). A comparative
analysis of oral communities of two cohabiting individuals reveals that most
such subpopulations are shared between the two communities at 100% sequence
identity, and that dynamical similarity between subpopulations in one host is
strongly predictive of dynamical similarity between the same subpopulations in
the other host. Our method can also be applied to samples collected in
cross-sectional studies and can be used with the 454 sequencing platform. We
discuss how the sub-OTU resolution of our approach can provide new insight into
factors shaping community assembly.
Comment: Updated to match the published version. 12 pages, 5 figures +
supplement. Significantly revised for clarity, references added, results not
change
Expedited batch processing and analysis of transposon insertions
Background: With advances in sequencing technology, greater and greater amounts of eukaryotic genome data are becoming available. Often, large portions of these genomes consist of transposable elements, frequently accounting for 50% or more in vertebrates. Each transposable element family may have thousands or tens of thousands of individual copies within a given genome, and therefore it can take an exorbitant amount of time and effort to process data in a meaningful fashion. Findings: In order to combat this problem, we developed a set of bioinformatics techniques and programs to streamline the analysis. This includes a unique Perl script which automates the process of taking BLAST, RepeatMasker and similar data to extract and manipulate the hit sequences from the genome. This script, called Process_hits, uses an object-oriented methodology to compile all hit locations from a given file for processing, organize this data into usable categories, and output it in multiple formats. Conclusions: The program proved capable of handling large amounts of transposon data in an efficient fashion. It is equipped with a number of useful sub-functions, each of which is contained within its own sub-module to allow for greater expandability and as a foundation for future program design.
Improving the Alignment Quality of Consistency Based Aligners with an Evaluation Function Using Synonymous Protein Words
Most sequence alignment tools can successfully align protein sequences with higher levels of sequence identity. The accuracy of corresponding structure alignment, however, decreases rapidly when considering distantly related sequences (<20% identity). In this range of identity, alignments optimized so as to maximize sequence similarity are often inaccurate from a structural point of view. Over the last two decades, most multiple protein aligners have been optimized for their capacity to reproduce structure-based alignments while using sequence information. Methods currently available differ essentially in the similarity measurement between aligned residues using substitution matrices, Fourier transform, sophisticated profile-profile functions, or consistency-based approaches, more recently
The Minang Kaluak Paku Kacang Balimbiang Motif in Casual Wear
The Minangkabau, one of the ethnic groups that contribute to the
distinctiveness of Indonesian culture, have a cultural heritage spread across
many aspects of life. One element of this heritage is the art of carving.
Carving, which developed by drawing ideas from nature, carries philosophical
meanings for the life of Minangkabau society. Every type of carving on the
Rumah Gadang shows that the key elements forming Minangkabau culture reflect
what exists in nature. One of the carvings on the rumah gadang is kaluak paku.
Kaluak paku is the name of a carving motif in Minangkabau adat (custom). It
derives from the coil (kelukan/kaluak) at the tip of a young fern (paku). The
kaluak paku carving on the rumah gadang symbolizes a man's responsibility in
Minangkabau adat toward the next generation, as a father to his children and
as a mamak (maternal uncle) to his kemenakan (nephews and nieces). This
Minangkabau rumah gadang kaluak paku carving is the source of ideas for the
garments created in this final project.
The creation of this work used several methods: aesthetic and ergonomic
approaches, data collection through literature study, and a creation method
based on Gustami Sp's theory of 3 stages and 6 steps. Making the work required
several kinds of data; reference data were gathered from literature sources,
namely books, journals on social media, and smartphone applications such as
Pinterest. The most important data collected were images of the visual form of
the Minangkabau kaluak paku plant carving and of casual wear.
The result of this work is 8 casual outfits. All of the pieces have an A-line
silhouette that flares at the bottom. The main material used in this work is
primisima. The color combination uses the characteristic Minangkabau colors
taken from the colors of its traditional flag, the "marawa": red, black, and
yellow. The works produced with these colors fit well with the theme, which
highlights the Minangkabau rumah gadang kaluak paku carving.
Keywords: Minang, Kaluak Paku Kacang Balimbiang, Kasua
Who Watches the Watchmen? An Appraisal of Benchmarks for Multiple Sequence Alignment
Multiple sequence alignment (MSA) is a fundamental and ubiquitous technique
in bioinformatics used to infer related residues among biological sequences.
Thus alignment accuracy is crucial to a vast range of analyses, often in ways
difficult to assess in those analyses. To compare the performance of different
aligners and help detect systematic errors in alignments, a number of
benchmarking strategies have been pursued. Here we present an overview of the
main strategies--based on simulation, consistency, protein structure, and
phylogeny--and discuss their different advantages and associated risks. We
outline a set of desirable characteristics for effective benchmarking, and
evaluate each strategy in light of them. We conclude that there is currently no
universally applicable means of benchmarking MSA, and that developers and users
of alignment tools should choose their benchmark depending on the
context of application--with a keen awareness of the assumptions underlying
each benchmarking strategy.
Comment: Revie
Optimizing substitution matrix choice and gap parameters for sequence alignment
Background: While substitution matrices can readily be computed from reference alignments, it is challenging to compute optimal or approximately optimal gap penalties. It is also not well understood which substitution matrices are the most effective when alignment accuracy is the goal rather than homolog recognition. Here a new parameter optimization procedure, POP, is described and applied to the problems of optimizing gap penalties and selecting substitution matrices for pair-wise global protein alignments. Results: POP is compared to a recent method due to Kim and Kececioglu and found to achieve from 0.2% to 1.3% higher accuracies on pair-wise benchmarks extracted from BALIBASE. The VTML matrix series is shown to be the most accurate on several global pair-wise alignment benchmarks, with VTML200 giving best or close to the best performance in all tests. BLOSUM matrices are found to be slightly inferior, even with the marginal improvements in the bug-fixed RBLOSUM series. The PAM series is significantly worse, giving accuracies typically 2% less than VTML. Integer rounding is found to cause slight degradations in accuracy. No evidence is found that selecting a matrix based on sequence divergence improves accuracy, suggesting that the use of this heuristic in CLUSTALW may be ineffective. Using VTML200 is found to improve the accuracy of CLUSTALW by 8% on BALIBASE and 5% on PREFAB. Conclusion: The hypothesis that more accurate alignments of distantly related sequences may be achieved using low-identity matrices is shown to be false for commonly used matrix types. Source code and test data are freely available from the author's web site at http://www.drive5.com/pop.
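To make concrete how a substitution matrix and a gap parameter enter a pair-wise global alignment, here is a minimal sketch of Needleman-Wunsch scoring. It is not POP and not the code distributed by the author: it assumes a single linear gap penalty rather than whatever gap model the paper optimizes, and `global_alignment_score` and `toy_score` are hypothetical names, with `toy_score` standing in for a real matrix such as VTML200.

```python
def global_alignment_score(a, b, score, gap):
    """Best global alignment score of sequences a and b.

    `score(x, y)` plays the role of the substitution matrix; `gap` is a
    (negative) linear penalty charged per gapped position. Both are the
    tunable parameters. Uses two rows of the dynamic-programming table.
    """
    # Row 0: aligning an empty prefix of a against prefixes of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]  # column 0: prefix of a against empty prefix of b
        for j in range(1, len(b) + 1):
            cur.append(max(
                prev[j - 1] + score(a[i - 1], b[j - 1]),  # align a[i-1] with b[j-1]
                prev[j] + gap,                            # gap in b
                cur[j - 1] + gap,                         # gap in a
            ))
        prev = cur
    return prev[-1]


def toy_score(x, y):
    """Placeholder scoring function (assumption): +2 match, -1 mismatch."""
    return 2 if x == y else -1


mild = global_alignment_score("ACGT", "AGT", toy_score, gap=-2)
harsh = global_alignment_score("ACGT", "AGT", toy_score, gap=-5)
```

Changing only `gap` changes the optimal score (and, in a version with traceback, possibly the alignment itself), which is why gap parameters and matrix choice have to be tuned together against reference alignments.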
Grammar-based distance in progressive multiple sequence alignment
Background: We propose a multiple sequence alignment (MSA) algorithm and compare the alignment quality and execution time of the proposed algorithm with those of existing algorithms. The proposed progressive alignment algorithm uses a grammar-based distance metric to determine the order in which biological sequences are to be pairwise aligned. The progressive alignment occurs via pairwise aligning new sequences with an ensemble of the sequences previously aligned. Results: The performance of the proposed algorithm is validated via comparison to popular progressive multiple alignment approaches, ClustalW and T-Coffee, and to the more recently developed algorithms MAFFT, MUSCLE, Kalign, and PSAlign using the BAliBASE 3.0 database of amino acid alignment files and a set of longer sequences generated by Rose software. The proposed algorithm has successfully built multiple alignments comparable to other programs with significant improvements in running time. The results are especially striking for large datasets. Conclusion: We introduce a computationally efficient progressive alignment algorithm using a grammar-based sequence distance, particularly useful in aligning large datasets.
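The abstract does not define its grammar-based distance, so the following is only an illustrative sketch of the general idea: measure how many new dictionary phrases one sequence needs once another has already been parsed, then align the closest pairs first. It assumes an LZ78-style phrase count and a normalization in the style of Lempel-Ziv sequence distances; `lz78_phrases`, `lz_distance`, and `alignment_order` are hypothetical names, and none of this is claimed to be the paper's metric.

```python
from itertools import combinations


def lz78_phrases(s):
    """Number of phrases in an LZ78-style incremental parse of s."""
    seen, phrase, count = set(), "", 0
    for ch in s:
        phrase += ch
        if phrase not in seen:  # shortest unseen phrase: add it and restart
            seen.add(phrase)
            count += 1
            phrase = ""
    return count + (1 if phrase else 0)  # count a trailing partial phrase


def lz_distance(s, t):
    """Assumed distance: extra phrases needed for one sequence given the other."""
    cs, ct = lz78_phrases(s), lz78_phrases(t)
    if max(cs, ct) == 0:
        return 0.0
    extra = max(lz78_phrases(s + t) - cs, lz78_phrases(t + s) - ct)
    return extra / max(cs, ct)


def alignment_order(seqs):
    """Index pairs sorted from most to least similar under lz_distance."""
    pairs = combinations(range(len(seqs)), 2)
    return sorted(pairs, key=lambda p: lz_distance(seqs[p[0]], seqs[p[1]]))


toy = ["ACGTACGTACGT", "ACGTACGTACGT", "TTGGCCAATTGG"]
order = alignment_order(toy)
```

Because each distance needs only a few linear-time parses instead of a full pairwise alignment, a measure of this kind is cheap to compute for many sequences, which is consistent with the running-time advantage the abstract reports for large datasets.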