1,031 research outputs found

    BusyBee Web : towards comprehensive and differential composition-based metagenomic binning

    Despite recent methodology and reference database improvements for taxonomic profiling tools, metagenomic assembly and genomic binning remain important pillars of metagenomic analysis workflows. In case reference information is lacking, genomic binning is considered to be a state-of-the-art method in mixed culture metagenomic data analysis. In this light, our previously published tool BusyBee Web implements a composition-based binning method efficient enough to function as a rapid online utility. Handling assembled contigs and long nanopore generated reads alike, the webserver provides a wide range of supplementary annotations and visualizations. Half a decade after the initial publication, we revisited existing functionality, added comprehensive visualizations, and increased the number of data analysis customization options for further experimentation. The webserver now allows for visualization-supported differential analysis of samples, which is computationally expensive and typically only performed in coverage-based binning methods. Further, users may now optionally check their uploaded samples for plasmid sequences using PLSDB as a reference database. Lastly, a new application programming interface with a supporting Python package was implemented, to allow power users fully automated access to the resource and integration into existing workflows

    Aviator: a web service for monitoring the availability of web services

    With Aviator, we present a web service and repository that facilitates surveillance of online tools. Aviator consists of a user-friendly website and two modules, a literature-mining based general and a manually curated module. The general module currently checks 9417 websites twice a day with respect to their availability and stores many features (frontend and backend response time, required RAM and size of the web page, security certificates, analytic tools and trackers embedded in the webpage and others) in a data warehouse. Aviator is also equipped with an analysis functionality, for example authors can check and evaluate the availability of their own tools or those of their peers. Likewise, users can check the availability of a certain tool they intend to use in research or teaching to avoid including unstable tools. The curated section of Aviator offers additional services. We provide API snippets for common programming languages (Perl, PHP, Python, JavaScript) as well as an OpenAPI documentation for embedding in the backend of own web services for an automatic test of their function. We query the respective APIs twice a day and send automated notifications in case of an unexpected result. Naturally, the same analysis functionality as for the literature-based module is available for the curated section. Aviator can freely be used at https://www.ccb.uni-saarland.de/aviator

    PLSDB: advancing a comprehensive database of bacterial plasmids

    Plasmids are known to contain genes encoding for virulence factors and antibiotic resistance mechanisms. Their relevance in metagenomic data processing is steadily growing. However, with the increasing popularity and scale of metagenomics experiments, the number of reported plasmids is rapidly growing as well, amassing a considerable number of false positives due to undetected misassemblies. Here, our previously published database PLSDB provides a reliable resource for researchers to quickly compare their sequences against selected and annotated previous findings. Within two years, the size of this resource has more than doubled from the initial 13,789 to now 34,513 entries over the course of eight regular data updates. For this update, we aggregated community feedback for major changes to the database featuring new analysis functionality as well as performance, quality, and accessibility improvements. New filtering steps, annotations, and preprocessing of existing records improve the quality of the provided data. Additionally, new features implemented in the web-server ease user interaction and allow for a deeper understanding of custom uploaded sequences, by visualizing similarity information. Lastly, an application programming interface was implemented along with a Python library, to allow remote database queries in automated workflows. The latest release of PLSDB is freely accessible under https://www.ccb.uni-saarland.de/plsdb

    Analyzing Adverse Events from Publicly Available Web Sources

    Data mining for drug-reaction associations is a major topic in the pharmaceutical industry. Historically the focus has been on using privately owned and maintained datasets consisting of information that has been transformed via the FDA Adverse Event Reporting System (FAERS) and privatized reporting systems that house the data from clinical trials. Our focus will be on building a pipeline that demonstrates an open source solution for building a drug’s safety profile from data collection through signal detection. In contrast this pipeline primarily uses the openFDA and social media data available through Reddit with all analysis being done in the R statistical programming language. The aim was to collect the information available in these public sources and apply popular data mining methodologies used to identify and predict the occurrence of adverse events. The results show the ability of the openFDA and social media sites to create real-time drug safety occurrence profiles by applying the same statistical methods applied in clinical trials. Social media will be shown to provide the best results when applied to prescribed daily use medications compared to common over-the-counter drugs or last line of defense medications. The information and results reported in this paper are not intended or implied to be a substitute for professional medical advice, diagnosis, or treatment. Do not delay seeking medical treatment or advice because of something you have read in this paper

    DeepCell Kiosk: scaling deep learning–enabled cellular image analysis with Kubernetes

    Deep learning is transforming the analysis of biological images, but applying these models to large datasets remains challenging. Here we describe the DeepCell Kiosk, cloud-native software that dynamically scales deep learning workflows to accommodate large imaging datasets. To demonstrate the scalability and affordability of this software, we identified cell nuclei in 10⁶ 1-megapixel images in ~5.5 h for ~US$250, with a cost below US$100 achievable depending on cluster configuration. The DeepCell Kiosk can be downloaded at https://github.com/vanvalenlab/kiosk-console; a persistent deployment is available at https://deepcell.org/

    Seafloor characterization using airborne hyperspectral co-registration procedures independent from attitude and positioning sensors

    The advance of remote-sensing technology and data-storage capabilities has progressed in the last decade to commercial multi-sensor data collection. There is a constant need to characterize, quantify and monitor the coastal areas for habitat research and coastal management. In this paper, we present work on seafloor characterization that uses hyperspectral imagery (HSI). The HSI data allows the operator to extend seafloor characterization from multibeam backscatter towards land and thus creates a seamless ocean-to-land characterization of the littoral zone

    A heuristic-based approach to code-smell detection

    Encapsulation and data hiding are central tenets of the object-oriented paradigm. Deciding what data and behaviour to form into a class and where to draw the line between its public and private details can make the difference between a class that is an understandable, flexible and reusable abstraction and one which is not. This decision is a difficult one and may easily result in poor encapsulation which can then have serious implications for a number of system qualities. It is often hard to identify such encapsulation problems within large software systems until they cause a maintenance problem (which is usually too late) and attempting to perform such analysis manually can also be tedious and error prone. Two of the common encapsulation problems that can arise as a consequence of this decomposition process are data classes and god classes. Typically, these two problems occur together – data classes are lacking in functionality that has typically been sucked into an over-complicated and domineering god class. This paper describes the architecture of a tool which automatically detects data and god classes that has been developed as a plug-in for the Eclipse IDE. The technique has been evaluated in a controlled study on two large open source systems which compares the tool results to similar work by Marinescu, who employs a metrics-based approach to detecting such features. The study provides some valuable insights into the strengths and weaknesses of the two approaches

    Programming Robots for Activities of Everyday Life

    Text-based programming remains a challenge to novice programmers in all programming domains including robotics. The use of robots is gaining considerable traction in several domains since robots are capable of assisting humans in repetitive and hazardous tasks. In the near future, robots will be used in tasks of everyday life in homes, hotels, airports, museums, etc. However, robotic missions have been either predefined or programmed using low-level APIs, making mission specification task-specific and error-prone. To harness the full potential of robots, it must be possible to define missions for specific application domains as needed. The specification of missions of robotic applications should be performed via easy-to-use, accessible ways, and at the same time, be accurate and unambiguous. Simplicity and flexibility in programming such robots are important, since end-users come from diverse domains, not necessarily with sufficient programming knowledge. The main objective of this licentiate thesis is to empirically understand the state-of-the-art in languages and tools used for specifying robot missions by end-users. The findings will form the basis for interventions in developing future languages for end-user robot programming. During the empirical study, DSLs for robot mission specification were analyzed through published literature, their websites, user manuals, sample missions and using the languages to specify missions for supported robots. After extracting data from 30 environments, 133 features were identified. A feature matrix mapping the features to the environments was developed with a feature model for robotic mission specification DSLs. Our results show that most end-user facing environments exist in the education domain for teaching novice programmers and STEM subjects. Most of the visual languages are developed using Blockly and Scratch libraries. The end-user domain abstraction needs more work since most of the visual environments abstract robotic and programming language concepts but not end-user concepts. In future work, it is important to focus on the development of reusable libraries for end-user concepts; and further, explore how end-user facing environments can be adapted for novice programmers to learn general programming skills and robot programming in low resource settings in developing countries, like Uganda
