A Variance Component Based Multi-marker Association Test Using Family and Unrelated Data
Background: The value of incorporating family data into genetic association studies is increasingly recognized, especially for testing rare variants. We introduce a variance-component-based association test that jointly tests multiple common or rare variants using both family and unrelated samples. Results: The proposed approach, implemented in our R package, aggregates the information across a region based on genetic similarity instead of genotype scores, which avoids the loss of power that occurs when effects are in different directions or differ in association strength. The method also effectively leverages the linkage disequilibrium (LD) information within a region, and it produces a test statistic with an adaptively estimated number of degrees of freedom. The method readily adjusts for non-genetic contributions to familial similarity, as well as for multiple covariates. Conclusions: We demonstrate through simulations that the proposed method achieves good Type I error control and statistical power. The method is implemented in the R package “fassoc”, which provides a useful tool for data analysis and exploration.
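Why aggregating by similarity avoids the power loss of collapsing can be seen on toy data. The sketch below is restricted to unrelated samples (no family correlation is modelled) and assumes a weighted linear kernel, so the statistic reduces to a SKAT-style sum of squared per-variant scores. The Satterthwaite mean-and-variance matching is one common way to obtain an adaptively estimated number of degrees of freedom; the abstract does not say which kernel or approximation fassoc uses, and all function names here are hypothetical, not the fassoc API.

```python
# Sketch only: unrelated samples, linear kernel, hypothetical helper names.
# This is not the fassoc implementation.


def variant_scores(genotypes, residuals):
    """S_j = sum_i G_ij * r_i for each variant j, i.e. the vector G' r."""
    n_variants = len(genotypes[0])
    return [
        sum(g[j] * r for g, r in zip(genotypes, residuals))
        for j in range(n_variants)
    ]


def burden_statistic(scores):
    """Collapsing genotype scores: (1' G' r)^2, so opposite-signed scores cancel."""
    total = sum(scores)
    return total * total


def variance_component_statistic(scores, weights=None):
    """Q = r' G W G' r = sum_j w_j * S_j^2 for a weighted linear kernel."""
    if weights is None:
        weights = [1.0] * len(scores)
    return sum(w * s * s for w, s in zip(weights, scores))


def satterthwaite(eigenvalues):
    """Match the mean and variance of sum_k lambda_k * chi2_1 to a * chi2_df.

    Returns (scale a, degrees of freedom df). df approaches 1 when a single
    eigenvalue dominates and equals k when k eigenvalues are equal.
    The eigenvalues are assumed to be precomputed from the null model.
    """
    s1 = sum(eigenvalues)
    s2 = sum(lam * lam for lam in eigenvalues)
    return s2 / s1, s1 * s1 / s2


# Toy data: two variants whose effects point in opposite directions.
G = [[1, 0], [0, 1], [1, 0], [0, 1]]
r = [1.0, -1.0, 1.0, -1.0]  # null-model residuals (covariates already removed)
S = variant_scores(G, r)
print(S, burden_statistic(S), variance_component_statistic(S))
print(satterthwaite([2.0, 2.0]), satterthwaite([4.0, 0.0]))
```

Here the burden statistic is zero because the two per-variant scores cancel, while the variance-component statistic still registers the signal; family data would additionally require a kinship-based covariance in the null model.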
Transcription Factor-DNA Binding Via Machine Learning Ensembles
We present ensemble methods in a machine learning (ML) framework that combine predictions from five known motif/binding-site exploration algorithms. For a given transcription factor (TF), the ensemble starts with the position weight matrices (PWMs) for the motif collected from the component algorithms. Using dimension reduction, we identify significant PWM-based subspaces for analysis. Within each subspace, a machine learning classifier is built to identify the TF's gene (promoter) targets (Problem 1). These PWM-based subspaces form an ML-based sequence analysis tool. Problem 2 (finding binding motifs) is solved by agglomerating the k-mer (string) feature PWM-based subspaces that stand out in identifying gene targets. We approach Problem 3 (binding sites) with a novel machine learning approach that uses promoter string features and ML importance scores in a classification algorithm that locates binding sites across the genome. For target gene identification, this method improves performance (measured by the F1 score) by about 10 percentage points over (a) the motif scanning method and (b) the coexpression-based association method. The top motif outperformed the five component algorithms as well as two other common algorithms (BEST and DEME). For identifying individual binding sites on a benchmark cross-species database (Tompa et al., 2005), we match the best performer without much human intervention. The method also improved performance on mammalian TFs.
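The motif scanning baseline, and the kind of PWM-derived feature a classifier can consume, can be sketched as follows. This is a minimal illustration assuming a log2-odds PWM with a uniform background and a pseudocount of 0.5 (both our choices, not the paper's); the dimension reduction and subspace construction described above are not reproduced, and the function names are hypothetical.

```python
import math

BASES = "ACGT"


def log_odds_pwm(counts, pseudocount=0.5, background=0.25):
    """Convert per-position base counts into a log2-odds PWM.

    counts: one dict per motif position, e.g. {"T": 10} (missing bases count 0).
    """
    pwm = []
    for column in counts:
        total = sum(column.get(b, 0) for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2((column.get(b, 0) + pseudocount) / total / background)
            for b in BASES
        })
    return pwm


def best_site(pwm, sequence):
    """Slide the PWM along an uppercase sequence; return (best score, start).

    Windows containing non-ACGT characters (e.g. N) are skipped.
    Returns (-inf, -1) if no window can be scored.
    """
    width = len(pwm)
    best = (-math.inf, -1)
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(b not in BASES for b in window):
            continue
        score = sum(pwm[i][base] for i, base in enumerate(window))
        best = max(best, (score, start))
    return best


# Toy "TATA" motif built from 10 aligned sites, scanned over a short promoter.
pwm = log_odds_pwm([{"T": 10}, {"A": 10}, {"T": 10}, {"A": 10}])
print(best_site(pwm, "GGCGTATAGC"))
```

The best window score per promoter is a natural per-gene feature; thresholding it gives a scanning-style target call, while feeding it (with scores from other PWMs) into a classifier is closer in spirit to the ensemble.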
The ensemble can integrate orthogonal information from different weak learners (potentially using entirely different types of features) into a machine learner that performs consistently better across more TFs. The TF gene target identification component (Problem 1 above) is useful for constructing a transcriptional regulatory network from known TF-target associations. The ensemble is easily extendable to include more tools as well as future PWM-based information. Comment: 33 pages
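To make the evaluation metric and the idea of combining weak learners concrete, here is a small sketch. It substitutes simple soft voting (averaging per-gene scores from the component tools) for the paper's trained machine learner, which is a deliberate simplification; the tool scores, gene names and the 0.5 threshold are hypothetical toy values.

```python
def soft_vote(component_scores, threshold=0.5):
    """Average per-gene scores (each in [0, 1]) from several component tools
    and call a gene a target if the mean score reaches the threshold."""
    n_tools = len(component_scores)
    genes = component_scores[0].keys()
    return {
        gene: sum(scores[gene] for scores in component_scores) / n_tools >= threshold
        for gene in genes
    }


def f1_score(predicted, truth):
    """F1 = 2PR / (P + R), computed over target calls for the same gene set."""
    tp = sum(1 for g, p in predicted.items() if p and truth[g])
    fp = sum(1 for g, p in predicted.items() if p and not truth[g])
    fn = sum(1 for g, p in predicted.items() if not p and truth[g])
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical scores from three component tools for four candidate genes.
tool_a = {"g1": 0.9, "g2": 0.2, "g3": 0.6, "g4": 0.1}
tool_b = {"g1": 0.7, "g2": 0.6, "g3": 0.4, "g4": 0.2}
tool_c = {"g1": 0.8, "g2": 0.1, "g3": 0.7, "g4": 0.6}
truth = {"g1": True, "g2": False, "g3": True, "g4": True}

ensemble = soft_vote([tool_a, tool_b, tool_c])
print(f1_score(ensemble, truth), f1_score(soft_vote([tool_b]), truth))
```

A difference of "10 percentage points" in F1 means, for example, moving from 0.40 to 0.50 on this 0 to 1 scale.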
Learning to detect video events from zero or very few video examples
In this work we address the problem of high-level event detection in video. Specifically, we study the challenging problems of (i) learning to detect video events solely from a textual description of the event, without using any positive video examples, and (ii) additionally exploiting very few positive training samples together with a small number of "related" videos. For learning only from an event's textual description, we first identify a general learning framework and then study the impact of different design choices at the various stages of this framework. For additionally learning from example videos when true positive training samples are scarce, we employ an extension of the Support Vector Machine that allows us to exploit "related" event videos by automatically introducing different weights for subsets of the videos in the overall training set. Experimental evaluations on the large-scale TRECVID MED 2014 video dataset provide insight into the effectiveness of the proposed methods. Comment: Image and Vision Computing Journal, Elsevier, 2015, accepted for publication
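The effect of giving "related" videos a smaller weight than true positives can be illustrated with a per-sample weighted hinge loss. This sketch is not the authors' SVM extension: their method introduces the weights automatically, whereas here the weights c_i are fixed by hand, the model is linear, training uses plain full-batch subgradient descent, and the 2-D feature vectors are hypothetical stand-ins for video representations.

```python
def train_weighted_svm(X, y, c, lam=0.01, lr=0.1, epochs=200):
    """Full-batch subgradient descent on the weighted linear SVM objective
        lam/2 * ||w||^2 + (1/n) * sum_i c_i * max(0, 1 - y_i * (w . x_i + b)).
    A weight c_i < 1 makes a "related" video count less than a true positive.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 regularizer
        grad_b = 0.0
        for xi, yi, ci in zip(X, y, c):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1.0:  # hinge active: subgradient is -c_i * y_i * x_i / n
                for j in range(d):
                    grad_w[j] -= ci * yi * xi[j] / n
                grad_b -= ci * yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def decision(w, b, x):
    """Signed score; positive means the event is predicted to be present."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


# Two true positives, two negatives and one "related" video (weight 0.3).
X = [[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0], [0.5, 0.5]]
y = [1, 1, -1, -1, 1]
c = [1.0, 1.0, 1.0, 1.0, 0.3]
w, b = train_weighted_svm(X, y, c)
print([round(decision(w, b, x), 2) for x in X])
```

Setting the related video's weight to 1.0 would treat it as a full positive, and setting it to 0 would ignore it; intermediate weights let borderline examples contribute without dominating the decision boundary.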
Positioning for conceptual development using latent semantic analysis
With increasing opportunities to learn online, the problem of positioning learners in an educational network of content offers new possibilities for the utilisation of geometry-based natural language processing techniques.
In this article, the adoption of latent semantic analysis (LSA) for guiding learners in their conceptual development is investigated. We propose five new algorithmic derivations of LSA and test their validity for positioning in an experiment, in order to draw conclusions about the suitability of machine learning from previously accredited evidence. Special attention is directed towards the role of distractors and the calculation of thresholds when using similarities as a proxy for assessing conceptual closeness.
Results indicate that learning improves positioning. Distractors are of low value and seem to be replaceable by generic noise to improve threshold calculation. Furthermore, new ways to flexibly calculate thresholds were identified.
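Using similarity as a proxy for conceptual closeness requires a cut-off, and the finding that generic noise can replace distractors suggests calibrating that cut-off from noise texts. The sketch below assumes the texts have already been projected into an LSA space (the SVD step is not shown) and uses a hypothetical rule, mean plus z standard deviations of the reference-to-noise similarities; it is not one of the article's five derivations.

```python
import math
import statistics


def cosine(u, v):
    """Cosine similarity between two vectors in an (already computed) LSA space."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def noise_threshold(reference, noise_vectors, z=2.0):
    """Hypothetical calibration: mean + z * population std of the similarities
    between the reference text and generic noise texts."""
    sims = [cosine(reference, v) for v in noise_vectors]
    return statistics.mean(sims) + z * statistics.pstdev(sims)


def is_conceptually_close(learner, reference, threshold):
    """Position the learner text as close to the reference if it beats the threshold."""
    return cosine(learner, reference) >= threshold


# Toy 3-dimensional LSA vectors (hypothetical values).
reference = [1.0, 0.0, 0.0]  # e.g. an expert text for the target concept
noise = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
t = noise_threshold(reference, noise)
print(round(t, 3), is_conceptually_close([0.9, 0.1, 0.0], reference, t))
```

Raising z makes positioning more conservative; a flexible scheme could, for instance, adapt z per concept rather than fixing it globally.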
Functional interpretation of single cell similarity maps.
We present Vision, a tool for annotating the sources of variation in single-cell RNA-seq data in an automated and scalable manner. Vision operates directly on the manifold of cell-cell similarity and employs a flexible annotation approach that can operate either with or without a preconceived stratification of the cells into groups or along a continuum. We demonstrate the utility of Vision in several case studies and show that it can identify important sources of cellular variation and link them to experimental metadata, even in relatively homogeneous sets of cells. Vision produces an interactive, low-latency, feature-rich web-based report that can be easily shared among researchers, facilitating data dissemination and collaboration.
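One way to ask whether a per-cell score (for example, a gene-signature score) is a meaningful source of variation on a cell-cell similarity map is to measure its local autocorrelation over a k-nearest-neighbour graph. The sketch below uses Geary's C with binary kNN weights; we do not claim this is exactly how Vision scores annotations, and the toy cells laid out on a line are hypothetical. `math.dist` requires Python 3.8 or later.

```python
import math


def knn_graph(points, k):
    """Directed kNN graph: for each cell, the indices of its k nearest cells.
    Ties in distance are broken by cell index."""
    neighbours = []
    for i, p in enumerate(points):
        ranked = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        neighbours.append([j for _, j in ranked[:k]])
    return neighbours


def gearys_c(values, neighbours):
    """Geary's C with binary weights (w_ij = 1 if j is one of i's neighbours).

    C < 1: the score varies smoothly over the similarity map;
    C near 1: no local structure; C > 1: neighbouring cells tend to disagree.
    """
    n = len(values)
    mean = sum(values) / n
    spread = sum((v - mean) ** 2 for v in values)
    if spread == 0:
        raise ValueError("constant score: Geary's C is undefined")
    total_weight = sum(len(nb) for nb in neighbours)
    local = sum(
        (values[i] - values[j]) ** 2 for i, nb in enumerate(neighbours) for j in nb
    )
    return (n - 1) * local / (2 * total_weight * spread)


# Ten toy cells along a 1-D continuum, each linked to its two nearest cells.
cells = [(float(i),) for i in range(10)]
graph = knn_graph(cells, k=2)
smooth = [float(i) for i in range(10)]            # follows the continuum
alternating = [float(i % 2 == 0) for i in range(10)]  # flips between neighbours
print(gearys_c(smooth, graph), gearys_c(alternating, graph))
```

The smooth score yields C well below 1 while the alternating score yields C above 1, which is the kind of contrast that separates structured variation from noise.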