4,527 research outputs found

    Discovering transcriptional modules by Bayesian data integration

    Motivation: We present a method for directly inferring transcriptional modules (TMs) by integrating gene expression and transcription factor binding (ChIP-chip) data. Our model extends a hierarchical Dirichlet process mixture model to allow data fusion on a gene-by-gene basis. This encodes the intuition that co-expression and co-regulation are not necessarily equivalent, and hence we do not expect all genes to group similarly in both datasets. In particular, it allows us to identify the subset of genes that share the same structure of transcriptional modules in both datasets. Results: We find that by working on a gene-by-gene basis, our model is able to extract clusters with greater functional coherence than existing methods. By combining gene expression and transcription factor binding (ChIP-chip) data in this way, we are better able to determine the groups of genes that are most likely to represent underlying TMs.

    SWITCH Innovation Lab: “Image to Concept”: Final Project Report

    By means of its “Research Data Connectome”, SWITCH, the Swiss national infrastructure service provider for higher education and research, seeks to connect open research using linked open data technologies. The goal is to make the data accessible, interoperable and valuable for research, education and innovation. In order to kick-start the development of new services, SWITCH carries out so-called “InnoLab” projects which have an experimental character and are geared towards generating quick learnings. The present InnoLab project brought together researchers and software developers from SWITCH, Wikimedia Sverige, the University of Applied Sciences of the Grisons as well as the Bern University of Applied Sciences. The goal was to develop a microservice that supports the semi-automatic tagging of images in order to interlink them with concepts on Wikidata. It thus facilitates the search and discovery of relevant images by researchers and other interested parties. The microservice builds upon an existing crowdsourcing tool, the ISA Tool, which was deployed on Wikimedia Commons in 2019, where it is used to apply “depicts” statements describing the content of images stored in the free media repository. The semi-automatic tagging functionality added to the ISA Tool in the course of the present project relies on two distinct algorithms. One extracts entities from the image itself, using the Google Cloud Vision service available on Wikimedia Commons. The other extracts entities from the image metadata, thus leveraging earlier efforts made to describe the content of the images. At the time of writing, the enhanced version of the ISA Tool is available in the test environment and can be used to add “depicts” statements to images on Wikimedia Commons. Plans to deploy it to production have been postponed due to several remaining bugs.
The key learnings gained in the course of the project can be summarized as follows:
- There are several issues that need to be tackled to allow for wider use and promotion of the ISA Tool: performance issues, reliability issues, and improvement of multilingual support.
- Once these issues have been resolved, measures should be taken to increase the visibility and take-up of the tool among potential contributors. As an accompanying measure, it would be advisable to assess and monitor the relevance of the ISA Tool in comparison to other tools and methods employed to add Structured Data on Commons. Moreover, activities to further promote the tool among the volunteer community should be accompanied by a dialogue with various stakeholders on what constitutes “good” tagging of images.
- The algorithms used for semi-automatic tagging should be further improved and/or complemented; a variety of avenues to be pursued to this effect have been suggested.
- Research use cases in the context of the SWITCH Research Data Connectome should be facilitated by developing alternatives to the current requirement of uploading all media files to Wikimedia Commons. Some initial use cases have been identified in the areas of digital humanities, medical libraries etc.
- Requirements arising from research use cases making use of “depicts” statements beyond their current use for search and discovery should be further investigated.
- If the ISA Tool is to be used on a large scale in the context of the SWITCH Research Data Connectome, the conclusion of contractual agreements with service providers may be indicated. Roles and responsibilities with regard to deployment, operations and maintenance need to be clarified.

    Improved measurement of electron antineutrino disappearance at Daya Bay

    We report an improved measurement of the neutrino mixing angle $\theta_{13}$ from the Daya Bay Reactor Neutrino Experiment. We exclude a zero value for $\sin^2 2\theta_{13}$ with a significance of 7.7 standard deviations. Electron antineutrinos from six reactors of 2.9 GW$_{\mathrm{th}}$ were detected in six antineutrino detectors deployed in two near (flux-weighted baselines of 470 m and 576 m) and one far (1648 m) underground experimental halls. Using 139 days of data, 28909 (205308) electron antineutrino candidates were detected at the far hall (near halls). The ratio of the observed to the expected number of antineutrinos assuming no oscillations at the far hall is $0.944 \pm 0.007\,(\mathrm{stat.}) \pm 0.003\,(\mathrm{syst.})$. An analysis of the relative rates in six detectors finds $\sin^2 2\theta_{13} = 0.089 \pm 0.010\,(\mathrm{stat.}) \pm 0.005\,(\mathrm{syst.})$ in a three-neutrino framework.
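
As background for the far-hall deficit reported in this abstract, the disappearance is conventionally described by the electron antineutrino survival probability. The standard three-neutrino form is sketched below; it is not quoted from the abstract, and the symbols $L$ (baseline), $E$ (antineutrino energy), $\Delta m^2_{ee}$, $\Delta m^2_{21}$ and $\theta_{12}$ are introduced here as assumptions for illustration.

```latex
P(\bar{\nu}_e \to \bar{\nu}_e) \approx 1
  - \sin^2 2\theta_{13}\,\sin^2\!\left(\frac{\Delta m^2_{ee}\,L}{4E}\right)
  - \cos^4\theta_{13}\,\sin^2 2\theta_{12}\,\sin^2\!\left(\frac{\Delta m^2_{21}\,L}{4E}\right)
```

At baselines of order 1 to 2 km the first oscillation term dominates, which is why comparing rates between near and far detectors is sensitive to $\sin^2 2\theta_{13}$.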

    Efficient Text Classification with Linear Regression Using a Combination of Predictors for Flu Outbreak Detection

    Early prediction of disease outbreaks and seasonal epidemics such as influenza may reduce their impact on daily lives. Today, the web can be used for surveillance of diseases. Search engines and Social Networking Sites (SNS) can be used to track trends of different diseases more quickly than government agencies such as the Centers for Disease Control and Prevention (CDC). SNS are widely used by diverse demographic populations, so SNS data can be used effectively to track disease outbreaks and provide necessary warnings. Although the data generated by microblogging sites is valuable for real-time analysis and outbreak predictions, the volume is huge. Therefore, one of the main challenges in analyzing this volume of data is to find an approach that is both accurate and fast. Many studies report only the accuracy of different machine learning approaches and disregard analysis time. Current SNS-based flu detection and prediction frameworks apply conventional machine learning approaches that require lengthy training and testing, which is not optimal for new outbreaks with new signs and symptoms. The aim of this study is to propose an efficient and accurate framework that uses SNS data to track disease outbreaks and provide early warnings, even for the newest outbreaks. The presented outbreak prediction framework consists of three main modules: text classification, mapping, and linear regression for weekly flu rate predictions. The text classification module utilizes the features of sentiment analysis and predefined keyword occurrences. Various classifiers, including FastText and six conventional machine learning algorithms, are evaluated to identify the most efficient and accurate one for the proposed framework. The text classifiers have been trained and tested using a pre-labeled dataset of flu-related and unrelated Twitter postings.
The selected text classifier is then used to classify over 8,400,000 tweet documents. The flu-related documents are then mapped on a weekly basis using a mapping module. Lastly, the mapped results are passed together with historical Centers for Disease Control and Prevention (CDC) data to a linear regression module for weekly flu rate predictions. The evaluation of flu tweet classification shows that FastText, together with the extracted features, achieved accurate results with an F-measure value of 89.9%, in addition to its efficiency. Therefore, FastText has been chosen as the classification module to work together with the other modules in the proposed framework, including the linear regression module, for flu trend predictions. The prediction results are compared with the available recent data from the CDC as the ground truth and show a strong correlation of 96.2%.
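
The mapping and regression stages described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `weekly_counts` and `fit_line`, the choice of ISO weeks as the mapping unit, the single-predictor regression, and all toy numbers are assumptions made for illustration, and the classifier output is stood in for by a boolean flag.

```python
from collections import Counter
from datetime import date


def weekly_counts(classified_posts):
    """Map classified posts to flu-related counts per ISO (year, week).

    classified_posts: iterable of (date, is_flu_related) pairs, where the
    boolean stands in for the output of a text classifier (hypothetical).
    """
    counts = Counter()
    for day, is_flu in classified_posts:
        if is_flu:
            iso = day.isocalendar()  # (ISO year, ISO week, ISO weekday)
            counts[(iso[0], iso[1])] += 1
    return counts


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy classified posts (invented data, not from the study).
posts = [
    (date(2024, 1, 1), True),
    (date(2024, 1, 2), False),
    (date(2024, 1, 3), True),
    (date(2024, 1, 8), True),
    (date(2024, 1, 9), True),
    (date(2024, 1, 10), True),
    (date(2024, 1, 15), True),
]
counts = weekly_counts(posts)
weeks = sorted(counts)

# Hypothetical reported weekly flu rates for the same weeks (invented).
reported_rates = [2.0, 2.5, 1.5]

# Regress the reported rate on the weekly flu-related post count.
slope, intercept = fit_line([counts[w] for w in weeks], reported_rates)

# Predict the rate for a new week with 4 flu-related posts.
predicted = slope * 4 + intercept
```

A single predictor keeps the sketch short; the title of the work refers to a combination of predictors, which would call for multiple regression instead.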

    Improved Measurement of Electron Antineutrino Disappearance at Daya Bay
