Occlusion Coherence: Detecting and Localizing Occluded Faces
The presence of occluders significantly impacts object recognition accuracy.
However, occlusion is typically treated as an unstructured source of noise and
explicit models for occluders have lagged behind those for object appearance
and shape. In this paper we describe a hierarchical deformable part model for
face detection and landmark localization that explicitly models part occlusion.
The proposed model structure makes it possible to augment positive training
data with large numbers of synthetically occluded instances. This allows us to
easily incorporate the statistics of occlusion patterns in a discriminatively
trained model. We test the model on several benchmarks for landmark
localization and detection including challenging new data sets featuring
significant occlusion. We find that the addition of an explicit occlusion model
yields a detection system that outperforms existing approaches for occluded
instances while maintaining competitive accuracy in detection and landmark
localization for unoccluded instances.
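The data-augmentation step the abstract describes, adding synthetically occluded copies of positive training instances, can be sketched as follows. The rectangular occluder, its noise fill, and the size limits are illustrative assumptions, not the paper's actual occluder statistics:

```python
import numpy as np

def occlude(image, rng, max_frac=0.4):
    """Paste a random rectangular occluder onto a copy of `image`.

    `image` is an HxW (or HxWxC) float array; the occluder is filled
    with uniform noise to mimic an unknown occluding object.
    """
    h, w = image.shape[:2]
    oh = rng.integers(1, max(2, int(h * max_frac)))   # occluder height
    ow = rng.integers(1, max(2, int(w * max_frac)))   # occluder width
    top = rng.integers(0, h - oh + 1)
    left = rng.integers(0, w - ow + 1)
    out = image.copy()
    out[top:top + oh, left:left + ow] = rng.random((oh, ow) + image.shape[2:])
    return out

rng = np.random.default_rng(0)
face = np.zeros((48, 48))          # stand-in for a training face crop
augmented = [occlude(face, rng) for _ in range(5)]
```

Because the occluder's position and extent are sampled, one clean positive yields many occlusion patterns, which is what lets occlusion statistics enter the discriminative training.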
EFICAz²: enzyme function inference by a combined approach enhanced by machine learning
©2009 Arakaki et al; licensee BioMed Central Ltd.
Background: We previously developed EFICAz, an enzyme function inference approach that combines predictions from non-completely overlapping component methods. Two of the four components in the original EFICAz are based on the detection of functionally discriminating residues (FDRs). FDRs distinguish between members of an enzyme family that are homofunctional (classified under the EC number of interest) and those that are heterofunctional (annotated with another EC number or lacking enzymatic activity). Each of the two FDR-based components is associated with one of two specific kinds of enzyme families. EFICAz exhibits high precision, except when the maximal test-to-training sequence identity (MTTSI) is lower than 30%. To improve EFICAz's performance in this regime, we: i) increased the number of predictive components and ii) took advantage of consensus information from the different components to make the final EC number assignment.
Results: We have developed two new EFICAz components, analogous to the two FDR-based components, where the discrimination between homo- and heterofunctional members is based on the evaluation, via Support Vector Machine models, of all the aligned positions between the query sequence and the multiple sequence alignments associated with the enzyme families. Benchmark results indicate that: i) the new SVM-based components outperform their FDR-based counterparts, and ii) both SVM-based and FDR-based components generate unique predictions. We developed classification tree models to optimally combine the results from the six EFICAz components into a final EC number prediction.
The new implementation of our approach, EFICAz², exhibits highly improved prediction precision at MTTSI < 30% compared to the original EFICAz, with only a slight decrease in prediction recall. A comparative analysis of enzyme function annotation of the human proteome by EFICAz² and KEGG shows that: i) when both sources make EC number assignments for the same protein sequence, the assignments tend to be consistent, and ii) EFICAz² generates considerably more unique assignments than KEGG.
Conclusion: Performance benchmarks and the comparison with KEGG demonstrate that EFICAz² is a powerful and precise tool for enzyme function annotation, with multiple applications in genome analysis and metabolic pathway reconstruction. The EFICAz² web service is available at: http://cssb.biology.gatech.edu/skolnick/webservice/EFICAz2/index.htm
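The idea of classifying homo- vs heterofunctional family members from all aligned positions can be illustrated with a toy sketch. The four-column alignments are invented, and a simple perceptron stands in for the Support Vector Machine models the abstract describes; this is a sketch of the feature layout (one one-hot block per alignment column), not EFICAz²'s implementation:

```python
import numpy as np

AA = "ACDEFGHIKLMNPQRSTVWY-"   # amino acids plus gap

def one_hot(seq):
    """Encode an aligned sequence as a flat one-hot vector,
    one block per alignment column."""
    v = np.zeros((len(seq), len(AA)))
    for i, a in enumerate(seq):
        v[i, AA.index(a)] = 1.0
    return v.ravel()

# Toy aligned family members: label 1 = homofunctional, 0 = heterofunctional.
seqs = ["ACDG", "ACDA", "ACEG", "GGDG", "GGDA", "GGEG"]
y = np.array([1, 1, 1, 0, 0, 0])
X = np.stack([one_hot(s) for s in seqs])

# Perceptron stand-in for the SVM classifier over aligned positions.
w = np.zeros(X.shape[1]); b = 0.0
for _ in range(50):
    for xi, yi in zip(X, y):
        p = 1 if xi @ w + b > 0 else 0
        w += (yi - p) * xi
        b += (yi - p)

pred = (X @ w + b > 0).astype(int)
```

On real families the classifier would be trained per enzyme family against its multiple sequence alignment, with the query aligned to it at prediction time.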
Measuring neutrino masses with a future galaxy survey
We perform a detailed forecast on how well a Euclid-like photometric galaxy
and cosmic shear survey will be able to constrain the absolute neutrino mass
scale. Adopting conservative assumptions about the survey specifications and
assuming complete ignorance of the galaxy bias, we estimate that the minimum
mass sum in the normal hierarchy, sum m_nu ~ 0.06 eV, can be detected at 1.5
sigma to 2.5 sigma significance, depending on the model complexity, using a
combination of galaxy and cosmic shear power spectrum measurements in
conjunction with CMB temperature and polarisation observations from Planck.
With better knowledge of the galaxy bias, the significance of the detection
could potentially reach 5.4 sigma. Interestingly, neither Planck+shear nor
Planck+galaxy alone can achieve this level of sensitivity; it is the combined
effect of galaxy and cosmic shear power spectrum measurements that breaks the
persistent degeneracies between the neutrino mass, the physical matter density,
and the Hubble parameter. Notwithstanding this remarkable sensitivity to sum
m_nu, Euclid-like shear and galaxy data will not be sensitive to the exact mass
spectrum of the neutrino sector; no significant bias (< 1 sigma) in the
parameter estimation is induced by fitting inaccurate models of the neutrino
mass splittings to the mock data, nor does the goodness-of-fit of these models
suffer any significant degradation relative to the true one (Delta chi_eff^2 <
1).
Comment: v1: 29 pages, 10 figures. v2: 33 pages, 12 figures; added sections on
shape evolution and constraints in more complex models, accepted for
publication in JCA
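The degeneracy-breaking argument, that neither probe alone constrains the neutrino mass but their combination does, can be illustrated with a toy two-parameter Fisher-matrix exercise. The matrices below are invented to show the mechanism and are not the paper's forecast numbers:

```python
import numpy as np

# Toy 2-parameter Fisher matrices (parameters: sum m_nu, omega_m).
# Each probe alone constrains a different, nearly degenerate
# parameter combination; the off-diagonal signs differ.
F_shear  = np.array([[4.0, -6.0], [-6.0, 9.1]])
F_galaxy = np.array([[4.0,  6.0], [ 6.0, 9.1]])

def sigma_mnu(F):
    """Marginalized 1-sigma error on the first parameter."""
    return np.sqrt(np.linalg.inv(F)[0, 0])

s_shear = sigma_mnu(F_shear)            # one probe: weak constraint
s_comb  = sigma_mnu(F_shear + F_galaxy) # combined: degeneracy broken
```

Because the two probes' degeneracy directions differ, summing the Fisher matrices cancels the off-diagonal correlation and the marginalized error on sum m_nu shrinks by an order of magnitude in this toy setup.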
Wireless Data Acquisition for Edge Learning: Data-Importance Aware Retransmission
By deploying machine-learning algorithms at the network edge, edge learning
can leverage the enormous real-time data generated by billions of mobile
devices to train AI models, which enable intelligent mobile applications. In
this emerging research area, one key direction is to efficiently utilize radio
resources for wireless data acquisition to minimize the latency of executing a
learning task at an edge server. Along this direction, we consider the specific
problem of making the retransmission decision in each communication round so as
to ensure both the reliability and the quantity of the training data, thereby
accelerating model convergence. To solve this problem, we propose a new
retransmission protocol called data-importance aware automatic-repeat-request
(importance ARQ). Unlike classic ARQ, which focuses merely on reliability,
importance ARQ selectively retransmits a data sample based on its uncertainty,
which reflects how much the sample helps learning and can be measured using the
model under training. Underpinning the
proposed protocol is an elegant communication-learning relation, derived in
this work, between two corresponding metrics: the signal-to-noise ratio (SNR)
and data uncertainty. This relation facilitates the design of a simple threshold-based
policy for importance ARQ. The policy is first derived based on the classic
classifier model of support vector machine (SVM), where the uncertainty of a
data sample is measured by its distance to the decision boundary. The policy is
then extended to the more complex model of convolutional neural networks (CNN)
where data uncertainty is measured by entropy. Extensive experiments have been
conducted for both the SVM and CNN using real datasets with balanced and
imbalanced distributions. Experimental results demonstrate that importance ARQ
effectively copes with channel fading and noise in wireless data acquisition to
achieve faster model convergence than the conventional channel-aware ARQ.
Comment: This is an updated version: 1) extension to general classifiers; 2)
consideration of imbalanced classification in the experiments. Submitted to
IEEE Journal for possible publication
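The threshold idea behind importance ARQ can be sketched as follows. The distance-to-boundary uncertainty measure matches the SVM case described in the abstract, but the exponential fading-SNR model and the linear uncertainty-scaled threshold are illustrative assumptions, not the paper's derived policy:

```python
import numpy as np

def svm_uncertainty(x, w, b):
    """Uncertainty of a sample under a linear SVM: inversely related
    to its distance from the decision boundary."""
    return 1.0 / (abs(x @ w + b) / np.linalg.norm(w) + 1e-9)

def importance_arq(snr_draws, uncertainty, base_thresh=1.0):
    """Retransmit until the received SNR clears a threshold that scales
    with the sample's uncertainty (important samples demand higher SNR).
    Returns the number of (re)transmissions used; gives up after the
    available draws."""
    thresh = base_thresh * uncertainty
    for k, snr in enumerate(snr_draws, start=1):
        if snr >= thresh:
            return k
    return len(snr_draws)

w, b = np.array([1.0, -1.0]), 0.0
near = np.array([0.1, 0.0])    # close to boundary -> uncertain -> important
far  = np.array([5.0, -5.0])   # far from boundary -> confident
rng = np.random.default_rng(1)
snrs = rng.exponential(2.0, size=100)   # fading-channel SNR draws
n_near = importance_arq(snrs, svm_uncertainty(near, w, b))
n_far  = importance_arq(snrs, svm_uncertainty(far, w, b))
```

Uncertain samples near the boundary thus consume more retransmissions, while confident samples are accepted on an early, possibly noisy, reception.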
Thermal effects compensation and associated uncertainty for large magnet assembly precision alignment
Big science and ambitious industrial projects continually push technical requirements beyond the grasp of conventional engineering techniques. Examples are the extremely tight micrometric assembly and alignment tolerances required in the fields of celestial telescopes, particle accelerators, and the aerospace industry. Achieving such extreme requirements for large assemblies is limited largely by the capability of the metrology used, namely its uncertainty in relation to the alignment tolerance required. The work described here was done as part of a Marie Curie European research project held at CERN, Geneva. It relates to future accelerators requiring the spatial alignment of several thousand metre-plus assemblies to a common datum within a targeted combined standard uncertainty (u_c,tg(y)) of 12 μm. The work identified several gaps in knowledge limiting such a capability. Among these was the lack of uncertainty statements for the thermal error compensation applied to correct for the assembly's dimensional instability, post metrology and during assembly and alignment. A novel methodology was developed in which a mixture of probabilistic modelling and high-precision traceable reference measurements was used to quantify the uncertainty of the various thermal expansion models used, namely empirical models, Finite Element Method (FEM) models, and FEM metamodels. Results show that the suggested methodology can accurately predict the uncertainty of the thermal deformation predictions, and thus of the compensations, made. The analysis further showed how, using this method, a 'digital twin' of the engineering structure can be calibrated with known uncertainty of its thermal deformation behaviour predictions in the micrometric range. Specifically, the combined standard uncertainties (u_c(y)) of prediction for the empirical, FEM, and FEM metamodel approaches were validated to be at most 8.7 μm, 11.28 μm, and 12.24 μm, respectively, for the studied magnet assemblies.
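A minimal sketch of an empirical thermal-compensation model with first-order (GUM-style) uncertainty propagation is shown below. The linear-expansion model and all input values and uncertainties are invented for illustration; they are not the project's validated models or results:

```python
import math

def thermal_compensation(L, alpha, dT, u_L, u_alpha, u_dT):
    """Linear thermal-expansion correction dL = L * alpha * dT and its
    combined standard uncertainty via first-order propagation:
    u^2 = (dL/dL_i * u_i)^2 summed over the inputs."""
    dL = L * alpha * dT
    u = math.sqrt((alpha * dT * u_L) ** 2 +
                  (L * dT * u_alpha) ** 2 +
                  (L * alpha * u_dT) ** 2)
    return dL, u

# Metre-scale steel assembly measured 2 K away from the 20 C reference.
dL, u = thermal_compensation(L=1.0, alpha=11.8e-6, dT=2.0,
                             u_L=1e-4, u_alpha=0.5e-6, u_dT=0.1)
```

Even this toy case shows the point of the abstract: the compensation (here ~23.6 μm) is useless for micrometric alignment unless its own standard uncertainty (here ~1.5 μm) is quantified alongside it.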
Autonomous integrated GPS/INS navigation experiment for OMV. Phase 1: Feasibility study
The phase 1 research focused on the experiment definition. A tightly integrated Global Positioning System/Inertial Navigation System (GPS/INS) navigation filter design was analyzed and was shown, via detailed computer simulation, to provide precise position, velocity, and attitude (alignment) data to support the navigation and attitude control requirements of future NASA missions. The application of the integrated filter was also shown to provide the opportunity to calibrate inertial instrument errors, which is particularly useful in reducing INS error growth during times of GPS outages. While the Orbital Maneuvering Vehicle (OMV) provides a good target platform for demonstration and for possible flight implementation to provide improved capability, a successful proof-of-concept ground demonstration can be obtained using data from any simulated mission scenario, such as the Space Transfer Vehicle, Shuttle-C, or Space Station.
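The integrated-filter concept can be illustrated with a minimal one-dimensional linear Kalman filter in which inertial propagation is corrected by GPS position fixes. The dynamics, noise levels, and measurement model are invented for illustration and are far simpler than the experiment's actual filter design:

```python
import numpy as np

# State: [position, velocity]. The prediction step plays the role of
# INS propagation; GPS position fixes provide the measurement update.
dt = 1.0
F = np.array([[1.0, dt], [0.0, 1.0]])   # constant-velocity model
H = np.array([[1.0, 0.0]])              # GPS measures position only
Q = 0.01 * np.eye(2)                    # process (INS) noise
R = np.array([[4.0]])                   # GPS noise (sigma = 2 m)

x = np.array([0.0, 1.0])                # true state: 0 m, 1 m/s
est = np.array([0.0, 0.0])              # filter starts ignorant of velocity
P = 10.0 * np.eye(2)
rng = np.random.default_rng(2)
for _ in range(50):
    x = F @ x                                   # true motion
    z = H @ x + rng.normal(0.0, 2.0, size=1)    # noisy GPS fix
    est = F @ est                               # predict (INS propagation)
    P = F @ P @ F.T + Q
    S = H @ P @ H.T + R                         # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)              # Kalman gain
    est = est + (K @ (z - H @ est)).ravel()     # measurement update
    P = (np.eye(2) - K @ H) @ P
```

During a simulated GPS outage one would simply skip the update step, letting P grow, which is the error-growth behaviour the calibrated inertial instruments help contain.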
Inference of Markovian Properties of Molecular Sequences from NGS Data and Applications to Comparative Genomics
Next Generation Sequencing (NGS) technologies generate large amounts of short
read data for many different organisms. The fact that NGS reads are generally
short makes it challenging to assemble the reads and reconstruct the original
genome sequence. For clustering genomes using such NGS data, word-count-based
alignment-free sequence comparison is a promising approach, for which the
underlying expected word counts are essential.
A plausible model for this underlying distribution of word counts is given
through modelling the DNA sequence as a Markov chain (MC). For single long
sequences, efficient statistics are available to estimate the order of MCs and
the transition probability matrix for the sequences. As NGS data do not provide
a single long sequence, inference methods on Markovian properties of sequences
based on single long sequences cannot be directly used for NGS short read data.
Here we derive a normal approximation for such word counts. We also show that
the traditional Chi-square statistic has an approximate gamma distribution,
using the Lander-Waterman model for physical mapping. We propose several
methods to estimate the order of the MC based on NGS reads and evaluate them
using simulations. We illustrate the applications of our results by clustering
genomic sequences of several vertebrate and tree species based on NGS reads
using alignment-free sequence dissimilarity measures. We find that the
estimated order of the MC has a considerable effect on the clustering results,
and that clustering with an MC of the estimated order gives a
plausible clustering of the species.
Comment: accepted by RECOMB-SEQ 201
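One ingredient of the approach, estimating Markov transition probabilities from word counts pooled over many short reads rather than from one long sequence, can be sketched as follows. The simulated chain, read lengths, and pseudocounts are illustrative assumptions:

```python
import numpy as np

def transition_matrix(reads, alphabet="ACGT"):
    """Estimate an order-1 Markov transition matrix from dinucleotide
    counts pooled across short reads (no single long sequence needed)."""
    idx = {a: i for i, a in enumerate(alphabet)}
    counts = np.ones((4, 4))                 # +1 pseudocount per cell
    for r in reads:
        for a, b in zip(r, r[1:]):           # dinucleotide word counts
            counts[idx[a], idx[b]] += 1
    return counts / counts.sum(axis=1, keepdims=True)

# Simulate reads from a toy order-1 chain where A is always followed by C.
rng = np.random.default_rng(3)
def sim_read(n=50):
    s = ["A"]
    for _ in range(n - 1):
        s.append("C" if s[-1] == "A" else rng.choice(list("ACGT")))
    return "".join(s)

P = transition_matrix([sim_read() for _ in range(200)])
```

Order selection then amounts to comparing how well transition matrices of increasing order explain the observed word counts, which is where the normal and gamma approximations derived in the paper come in.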