Phytoplankton Hotspot Prediction With an Unsupervised Spatial Community Model
Many interesting natural phenomena are sparsely distributed and discrete.
Locating the hotspots of such sparsely distributed phenomena is often difficult
because their density gradient is likely to be very noisy. We present a novel
approach to this search problem, where we model the co-occurrence relations
between a robot's observations with a Bayesian nonparametric topic model. This
approach makes it possible to produce a robust estimate of the spatial
distribution of the target, even in the absence of direct target observations.
We apply the proposed approach to the problem of finding the spatial locations
of the hotspots of a specific phytoplankton taxon in the ocean. We use
classified image data from Imaging FlowCytobot (IFCB), which automatically
measures individual microscopic cells and colonies of cells. Given these
individual taxon-specific observations, we learn a phytoplankton community
model that characterizes the co-occurrence relations between taxa. We present
experiments with simulated robot missions drawn from real observation data
collected during a research cruise traversing the US Atlantic coast. Our
results show that the proposed approach outperforms nearest neighbor and
k-means based methods for predicting the spatial distribution of hotspots from
in-situ observations.
Comment: To appear in ICRA 2017, Singapore
Transcription Factor-DNA Binding Via Machine Learning Ensembles
We present ensemble methods in a machine learning (ML) framework combining
predictions from five known motif/binding site exploration algorithms. For a
given TF, the ensemble starts with position weight matrices (PWMs) for the
motif, collected from the component algorithms. Using dimension reduction, we
identify significant PWM-based subspaces for analysis. Within each subspace a
machine classifier is built for identifying the TF's gene (promoter) targets
(Problem 1). These PWM-based subspaces form an ML-based sequence analysis tool.
Problem 2 (finding binding motifs) is solved by agglomerating k-mer (string)
feature PWM-based subspaces that stand out in identifying gene targets. We
approach Problem 3 (binding sites) with a novel machine learning approach that
uses promoter string features and ML importance scores in a classification
algorithm locating binding sites across the genome. For target gene
identification this method improves performance (measured by the F1 score) by
about 10 percentage points over (a) the motif scanning method and (b) the
coexpression-based association method. The top motif outperformed the 5 component
algorithms as well as two other common algorithms (BEST and DEME). For
identifying individual binding sites on a benchmark cross species database
(Tompa et al., 2005) we match the best performer without much human
intervention. It also improved the performance on mammalian TFs.
The ensemble can integrate orthogonal information from different weak
learners (potentially using entirely different types of features) into a
machine learner that can perform consistently better for more TFs. The TF gene
target identification component (problem 1 above) is useful in constructing a
transcriptional regulatory network from known TF-target associations. The
ensemble is easily extendable to include more tools as well as future PWM-based
information.
Comment: 33 pages
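The abstract above takes position weight matrices as its starting point and uses motif scanning as a baseline. As a rough illustration only, the following sketch builds a log-odds PWM from per-position base counts and scans a sequence for the best-scoring window. The counts, the pseudocount, the uniform background, and the function names are all assumptions made for this example; this is not the paper's ensemble method.

```python
import math

BASES = "ACGT"

def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2-odds scores.

    The pseudocount and uniform background are illustrative assumptions.
    """
    pwm = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm

def scan(sequence, pwm):
    """Score every window of the sequence; return (best_score, best_offset)."""
    width = len(pwm)
    best = (float("-inf"), -1)
    for offset in range(len(sequence) - width + 1):
        window = sequence[offset:offset + width]
        score = sum(pwm[i][base] for i, base in enumerate(window))
        best = max(best, (score, offset))
    return best

# Hypothetical counts for a 4-bp motif resembling "TGAC".
toy_counts = [
    {"A": 0, "C": 0, "G": 0, "T": 8},
    {"A": 0, "C": 0, "G": 8, "T": 0},
    {"A": 8, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 8, "G": 0, "T": 0},
]
pwm = log_odds_pwm(toy_counts)
best_score, best_offset = scan("AATGACGG", pwm)
```

In a real setting the scores from several PWMs would become features for a downstream classifier; here the scan only reports where the toy motif matches best.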
Four dimensions characterize comprehensive trait judgments of faces
People readily attribute many traits to faces: some look beautiful, some competent, some aggressive. These snap judgments have important consequences in real life, ranging from success in political elections to decisions in courtroom sentencing. Modern psychological theories argue that the hundreds of different words people use to describe others from their faces are well captured by only two or three dimensions, such as valence and dominance, a highly influential framework that has been the basis for numerous studies in social and developmental psychology, social neuroscience, and in engineering applications. However, all prior work has used only a small number of words (12 to 18) to derive underlying dimensions, limiting conclusions to date. Here we employed deep neural networks to select a comprehensive set of 100 words that are representative of the trait words people use to describe faces, and to select a set of 100 faces. In two large-scale, preregistered studies we asked participants to rate the 100 faces on the 100 words (obtaining 2,850,000 ratings from 1,710 participants), and discovered a novel set of four psychological dimensions that best explain trait judgments of faces: warmth, competence, femininity, and youth. We reproduced these four dimensions across different regions around the world, in both aggregated and individual-level data. These results provide a new and most comprehensive characterization of face judgments, and reconcile prior work on face perception with work in social cognition and personality psychology
Determining citizens' opinions about stories in the news media: analysing Google, Facebook and Twitter
We describe a method whereby a governmental policy maker can discover citizens' reactions to news stories. This is particularly relevant in the political world, where governments' policy statements are reported by the news media and discussed by citizens. The work here addresses two main questions: where are citizens discussing a news story, and what are they saying? Our strategy to answer the first question is to find news articles pertaining to the policy statements, then perform internet searches for references to the news articles' headlines and URLs. We have created a software tool that schedules repeating Google searches for the news articles and collects the results in a database, enabling the user to aggregate and analyse them to produce ranked tables of sites that reference the news articles. Using data mining techniques, we can analyse the data so that the resultant ranking reflects an overall aggregate score, taking into account multiple datasets, and this shows the most relevant places on the internet where the story is discussed. To answer the second question, we introduce the WeGov toolbox as a tool for analysing citizens' comments and behaviour pertaining to news stories. We first use the tool to identify social network discussions, using different strategies for Facebook and Twitter. We then apply different analysis components to distil the essence of the social network users' comments, to determine influential users, and to identify important comments.
SLIM: Scalable Linkage of Mobility Data
We present a scalable solution to link entities across mobility datasets using their spatio-temporal information. This is a fundamental problem in many applications such as linking user identities for security, understanding privacy limitations of location based services, or producing a unified dataset from multiple sources for urban planning. Such integrated datasets are also essential for service providers to optimise their services and improve business intelligence. In this paper, we first propose a mobility based representation and similarity computation for entities. An efficient matching process is then developed to identify the final linked pairs, with an automated mechanism to decide when to stop the linkage. We scale the process with a locality-sensitive hashing (LSH) based approach that significantly reduces candidate pairs for matching. To realize the effectiveness and efficiency of our techniques in practice, we introduce an algorithm called SLIM. In the experimental evaluation, SLIM outperforms the two existing state-of-the-art approaches in terms of precision and recall. Moreover, the LSH-based approach brings two to four orders of magnitude speedup
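The abstract says locality-sensitive hashing is used to cut down the number of candidate pairs before matching, but it does not give the construction. The sketch below shows one generic way this can work, using MinHash signatures with banding over sets of hypothetical "cell@time" tokens. The token format, the entity names, and the parameters are assumptions for illustration; this is not the SLIM algorithm itself.

```python
import hashlib
from collections import defaultdict
from itertools import combinations

def minhash_signature(tokens, num_hashes=16):
    """For each seeded hash function, keep the minimum hash over the token set."""
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{t}".encode(), digest_size=8).digest(),
                "big",
            )
            for t in tokens
        ))
    return signature

def candidate_pairs(entities, num_hashes=16, bands=8):
    """Return pairs of entities that share at least one identical signature band."""
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for name, tokens in entities.items():
        sig = minhash_signature(tokens, num_hashes)
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            buckets[key].add(name)
    pairs = set()
    for members in buckets.values():
        for left, right in combinations(sorted(members), 2):
            pairs.add((left, right))
    return pairs

# Hypothetical entities from two datasets, described by (grid cell, time bucket) tokens.
entities = {
    "a1": {"cell12@t3", "cell12@t4", "cell13@t5", "cell20@t9"},
    "b1": {"cell12@t3", "cell12@t4", "cell13@t5", "cell20@t9"},
    "c9": {"cell77@t1", "cell78@t2", "cell79@t3", "cell80@t4"},
}
pairs = candidate_pairs(entities)
```

Only the pairs returned here would go on to the more expensive similarity computation, which is where the reduction in work comes from. Entities with similar but not identical token sets collide with a probability that depends on the band and row settings.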
Point triangulation through polyhedron collapse using the l∞ norm
Multi-camera triangulation of feature points based on a minimisation of the overall l2 reprojection error can get stuck in suboptimal local minima or require slow global optimisation. For this reason, researchers have proposed optimising the l∞ norm of the l2 single-view reprojection errors, which avoids the problem of local minima entirely. In this paper we present a novel method for l∞ triangulation that minimises the l∞ norm of the l∞ reprojection errors: this apparently small difference leads to a much faster but equally accurate solution which is related to the MLE under the assumption of uniform noise. The proposed method adopts a new optimisation strategy based on solving simple quadratic equations. This stands in contrast with the fastest existing methods, which solve a sequence of more complex auxiliary Linear Programming or Second Order Cone Problems. The proposed algorithm performs well: for triangulation, it achieves the same accuracy as existing techniques while executing faster and being straightforward to implement.
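The three objectives contrasted in this abstract can be written side by side. The notation is an assumption introduced here, not taken from the paper: X is the 3D point, x_i is its observed image location in camera i, and \pi_i(X) is the projection of X into camera i. The first line assumes the common sum-of-squares form for the "overall" l2 error.

```latex
% Overall l2 reprojection error (assumed sum-of-squares form):
\min_{X} \sum_{i=1}^{n} \left\| x_i - \pi_i(X) \right\|_2^2

% l-infinity norm of the l2 single-view reprojection errors:
\min_{X} \max_{i} \left\| x_i - \pi_i(X) \right\|_2

% l-infinity norm of the l-infinity reprojection errors (the variant in this abstract):
\min_{X} \max_{i} \left\| x_i - \pi_i(X) \right\|_\infty
```

The last form bounds each image coordinate's error separately, which fits the abstract's remark about a link to maximum likelihood estimation under uniform noise.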