BATCH-GE : batch analysis of next-generation sequencing data for genome editing assessment
Targeted mutagenesis by the CRISPR/Cas9 system is currently revolutionizing genetics. The ease of this technique has enabled genome engineering in vitro and in a range of model organisms, and has pushed experimental dimensions to unprecedented proportions. Due to its tremendous progress in terms of speed, read length, throughput and cost, Next-Generation Sequencing (NGS) has been increasingly used for the analysis of CRISPR/Cas9 genome editing experiments. However, current tools for genome editing assessment lack flexibility and fall short in the analysis of large amounts of NGS data. Therefore, we designed BATCH-GE, an easy-to-use bioinformatics tool for batch analysis of NGS-generated genome editing data, available from https://github.com/WouterSteyaert/BATCH-GE.git. BATCH-GE detects and reports indel mutations and other precise genome editing events and calculates the corresponding mutagenesis efficiencies for a large number of samples in parallel. Furthermore, this new tool provides flexibility by allowing the user to adapt a number of input variables. The performance of BATCH-GE was evaluated in two genome editing experiments aiming to generate knock-out and knock-in zebrafish mutants. This tool will not only contribute to the evaluation of CRISPR/Cas9-based experiments, but will be of use in any genome editing experiment, and can analyze data from any organism with a sequenced genome.
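The per-sample mutagenesis efficiency such a tool reports can be pictured as the fraction of sequenced reads carrying an indel that overlaps the target site. The following is a minimal sketch under that assumption; the read representation and the `mutagenesis_efficiency` helper are illustrative inventions, not BATCH-GE's actual implementation.

```python
# Hypothetical sketch: mutagenesis efficiency as the fraction of aligned
# reads with an indel overlapping the target window. The read format and
# this helper are assumptions for illustration, not BATCH-GE's code.

def mutagenesis_efficiency(reads, target_start, target_end):
    """Fraction of reads carrying an indel within [target_start, target_end].

    Each read is a dict with 'indels': a list of (position, length) tuples,
    where a negative length denotes a deletion and a positive an insertion.
    """
    edited = 0
    for read in reads:
        if any(target_start <= pos <= target_end for pos, _ in read["indels"]):
            edited += 1
    return edited / len(reads) if reads else 0.0

sample = [
    {"indels": [(105, -4)]},   # 4-bp deletion inside the target site
    {"indels": []},            # wild-type read
    {"indels": [(210, 2)]},    # insertion outside the target window
    {"indels": [(102, 1)]},    # 1-bp insertion inside the target site
]
print(mutagenesis_efficiency(sample, 100, 120))  # → 0.5
```

Real pipelines would of course derive the indel calls from alignments (e.g. CIGAR strings) rather than from hand-built dicts.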
Unleashing the Power of Hashtags in Tweet Analytics with Distributed Framework on Apache Storm
Twitter is a popular social network platform where users can interact and post texts of up to 280 characters called tweets. Hashtags, hyperlinked words in tweets, have become increasingly crucial for tweet retrieval and search. Using hashtags for tweet topic classification is a challenging problem because of the context dependency among words, slang, abbreviations and emoticons in a short tweet, along with the evolving use of hashtags. Since Twitter generates millions of tweets daily, tweet analytics is a fundamental Big Data streaming problem that often requires real-time distributed processing. This paper proposes a distributed online approach to tweet topic classification with hashtags. Implemented on Apache Storm, a distributed real-time framework, our approach incrementally identifies and updates a set of strong predictors in the Naïve Bayes model for classifying each incoming tweet instance. Preliminary experiments show promising results with up to 97% accuracy and a 37% increase in throughput on eight processors.
Comment: IEEE International Conference on Big Data 201
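The core idea of updating a Naïve Bayes model per incoming tweet can be sketched as a single-pass count update. This is a minimal standalone sketch: the class and token examples are invented, and the paper's "strong predictor" selection and Storm topology are omitted.

```python
# Minimal sketch of an incrementally updated multinomial Naive Bayes for
# tweet topic classification. Each update is a single pass over one tweet's
# tokens, so no retraining over past tweets is needed. Labels and tokens
# below are illustrative assumptions.
from collections import defaultdict
import math

class IncrementalNB:
    def __init__(self):
        self.class_counts = defaultdict(int)                    # tweets per class
        self.word_counts = defaultdict(lambda: defaultdict(int))  # per-class token counts
        self.total_words = defaultdict(int)                     # tokens per class
        self.vocab = set()

    def update(self, tokens, label):
        # Incremental step: just bump the sufficient statistics.
        self.class_counts[label] += 1
        for t in tokens:
            self.word_counts[label][t] += 1
            self.total_words[label] += 1
            self.vocab.add(t)

    def predict(self, tokens):
        n = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_lp = None, float("-inf")
        for c, cc in self.class_counts.items():
            lp = math.log(cc / n)
            for t in tokens:
                # Laplace smoothing keeps unseen tokens from zeroing the score.
                lp += math.log((self.word_counts[c][t] + 1) /
                               (self.total_words[c] + v))
            if lp > best_lp:
                best, best_lp = c, lp
        return best

nb = IncrementalNB()
nb.update(["#nba", "game", "tonight"], "sports")
nb.update(["#election", "vote", "poll"], "politics")
print(nb.predict(["#nba", "score"]))  # → sports
```

In a streaming setting each `update` call would sit inside a Storm bolt, with the counts partitioned or merged across workers.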
FindFoci: a focus detection algorithm with automated parameter training that closely matches human assignments, reduces human inconsistencies and increases speed of analysis
Accurate and reproducible quantification of the accumulation of proteins into foci in cells is essential for data interpretation and for biological inferences. To improve reproducibility, much emphasis has been placed on the preparation of samples, but less attention has been given to reporting and standardizing the quantification of foci. The current standard to quantitate foci in open-source software is to manually determine a range of parameters based on the outcome of one or a few representative images and then apply the parameter combination to the analysis of a larger dataset. Here, we demonstrate the power and utility of using machine learning to train a new algorithm (FindFoci) to determine optimal parameters. FindFoci closely matches human assignments and allows rapid automated exploration of parameter space. Thus, individuals can train the algorithm to mirror their own assignments and then automate focus counting using the same parameters across a large number of images. Using the training algorithm to match human assignments of foci, we demonstrate that applying an optimal parameter combination from a single image is not broadly applicable to analysis of other images scored by the same experimenter or by other experimenters. Our analysis thus reveals wide variation in human assignment of foci and their quantification. To overcome this, we developed training on multiple images, which reduces the inconsistency of using a single or a few images to set parameters for focus detection. FindFoci is provided as an open-source plugin for ImageJ.
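Training detection parameters to match human assignments can be pictured as searching a parameter grid for the combination that minimizes disagreement with human focus counts over several images. The sketch below makes that idea concrete; the threshold-and-size detector and the toy image are illustrative assumptions and are far simpler than FindFoci's actual algorithm.

```python
# Illustrative sketch: pick detection parameters by minimizing disagreement
# with human focus counts across training images. count_foci() is a toy
# threshold + connected-component detector, not FindFoci's method.
from itertools import product

def count_foci(image, threshold, min_size):
    """Count 4-connected components of pixels above threshold with >= min_size pixels."""
    h, w = len(image), len(image[0])
    seen = [[False] * w for _ in range(h)]
    count = 0
    for i in range(h):
        for j in range(w):
            if image[i][j] > threshold and not seen[i][j]:
                stack, size = [(i, j)], 0   # flood-fill one component
                seen[i][j] = True
                while stack:
                    y, x = stack.pop()
                    size += 1
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and not seen[ny][nx] and image[ny][nx] > threshold):
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                if size >= min_size:
                    count += 1
    return count

def train_parameters(images, human_counts, thresholds, min_sizes):
    """Grid search: return the (threshold, min_size) closest to human counts."""
    best, best_err = None, float("inf")
    for thr, ms in product(thresholds, min_sizes):
        err = sum(abs(count_foci(img, thr, ms) - h)
                  for img, h in zip(images, human_counts))
        if err < best_err:
            best, best_err = (thr, ms), err
    return best

img = [
    [0, 9, 9, 0, 0],
    [0, 9, 9, 0, 0],
    [0, 0, 0, 5, 0],   # a human scorer might dismiss the lone dim pixel
    [0, 0, 0, 0, 0],
]
print(train_parameters([img], [1], thresholds=[4, 6], min_sizes=[1, 2]))  # → (4, 2)
```

Training on multiple images, as the abstract advocates, simply means summing the error over more `(image, human_count)` pairs before choosing the parameters.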
Comparison of panel codes for aerodynamic analysis of airfoils
The purpose of this study is to create an overview of the currently most used panel codes for the computation of aerodynamic characteristics of 2D airfoils. It gives a basic description of the panel-method principle, a comparison of the individual implementations, and an evaluation of their capabilities (accuracy, applicability) on typical tasks. Three different panel codes were used in this thesis: Xfoil, JavaFoil and XFLR5. The thesis was complemented by measurements in a wind tunnel.
Incremental Principal Component Analysis: Exact implementation and continuity corrections
This paper describes some applications of an incremental implementation of principal component analysis (PCA). The algorithm updates the transformation coefficients matrix on-line for each new sample, without the need to keep all the samples in memory. The algorithm is formally equivalent to the usual batch version, in the sense that, given a sample set, the transformation coefficients at the end of the process are the same. The implications of applying PCA in real time are discussed with the help of data analysis examples. In particular, we focus on the problem of the continuity of the PCs during an on-line analysis.
Comment: accepted at http://www.icinco.org
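One way to see how an incremental PCA can be exactly equivalent to the batch version is to keep only sufficient statistics (sample count, running sum, and running sum of outer products): the covariance rebuilt from them is identical to the batch covariance, so the eigendecomposition agrees. The sketch below illustrates that equivalence; it is a generic construction under this assumption, not the paper's specific algorithm.

```python
# Sketch: exact incremental PCA via sufficient statistics. The covariance
# reconstructed from (n, sum, sum of outer products) equals the batch
# covariance, so the principal components match a batch computation on the
# same samples. This is a generic illustration, not the paper's algorithm.
import numpy as np

class ExactIncrementalPCA:
    def __init__(self, dim):
        self.n = 0
        self.s = np.zeros(dim)            # running sum of samples
        self.ss = np.zeros((dim, dim))    # running sum of outer products

    def update(self, x):
        # Per-sample update: no past samples are kept in memory.
        x = np.asarray(x, dtype=float)
        self.n += 1
        self.s += x
        self.ss += np.outer(x, x)

    def components(self):
        mean = self.s / self.n
        cov = self.ss / self.n - np.outer(mean, mean)  # population covariance
        eigvals, eigvecs = np.linalg.eigh(cov)
        order = np.argsort(eigvals)[::-1]              # descending variance
        return eigvals[order], eigvecs[:, order]

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 3)) @ np.diag([3.0, 1.0, 0.2])
ipca = ExactIncrementalPCA(3)
for row in data:
    ipca.update(row)

# Eigenvalues agree with batch PCA on the same samples.
batch_vals = np.sort(np.linalg.eigvalsh(np.cov(data.T, bias=True)))[::-1]
inc_vals, _ = ipca.components()
print(np.allclose(inc_vals, batch_vals))  # → True
```

Note that accumulating raw outer products can lose precision for long streams with large means; centered update schemes trade exactness of this simple form for better numerical behavior.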
Job Interactivity Using a Steering Service in an Interactive Grid Analysis Environment
Grid computing has been dominated by the execution of batch jobs. Interactive data analysis is a new domain in the area of grid job execution. The Grid-Enabled Analysis Environment (GAE) attempts to address this in HEP grids through the use of a Steering Service. This service provides physicists with continuous feedback on their jobs and with the ability to control and steer the execution of their submitted jobs, enabling them to move their jobs to different grid nodes when desired. The Steering Service also acts autonomously, making steering decisions on behalf of the user in an attempt to optimize job execution, and it ensures the optimal consumption of the grid user's resource quota. The Steering Service provides a web service interface defined in standard WSDL. In this paper, we discuss how the Steering Service facilitates interactive remote analysis of data generated in an Interactive Grid Analysis Environment.