Object Referring in Visual Scene with Spoken Language
Object referring has important applications, especially for human-machine
interaction. While having received great attention, the task is mainly attacked
with written language (text) as input rather than spoken language (speech),
which is more natural. This paper investigates Object Referring with Spoken
Language (ORSpoken) by presenting two datasets and one novel approach. Objects
are annotated with their locations in images, text descriptions and speech
descriptions. This makes the datasets ideal for multi-modality learning. The
approach is developed by carefully decomposing the ORSpoken problem into three
sub-problems and introducing task-specific vision-language interactions at the
corresponding levels. Experiments show that our method outperforms competing
methods consistently and significantly. The approach is also evaluated in the
presence of audio noise, showing the efficacy of the proposed vision-language
interaction methods in counteracting background noise.
Comment: 10 pages, Submitted to WACV 201
Object Referring in Videos with Language and Human Gaze
We investigate the problem of object referring (OR) i.e. to localize a target
object in a visual scene coming with a language description. Humans perceive
the world more as continued video snippets than as static images, and describe
objects not only by their appearance, but also by their spatio-temporal context
and motion features. Humans also gaze at the object when they issue a referring
expression. Existing works for OR mostly focus on static images only, which
fall short in providing many such cues. This paper addresses OR in videos with
language and human gaze. To that end, we present a new video dataset for OR,
with 30,000 objects over 5,000 stereo video sequences annotated for their
descriptions and gaze. We further propose a novel network model for OR in
videos, by integrating appearance, motion, gaze, and spatio-temporal context
into one network. Experimental results show that our method effectively
utilizes motion cues, human gaze, and spatio-temporal context. Our method
outperforms previous OR methods. For dataset and code, please refer to
https://people.ee.ethz.ch/~arunv/ORGaze.html.
Comment: Accepted to CVPR 2018, 10 pages, 6 figures
COMPUTATIONAL MODELING OF PRECISION MOLDING OF ASPHERIC GLASS OPTICS
In this dissertation, research in two parallel directions is presented: the first involves the prediction of the final size and shape of a glass lens during a precision glass lens molding process, and the second introduces a method to compute and quantify the importance of higher order terms in fracture mechanics for different modes of fracture. The process of precision lens molding has received attention in recent years due to its potential to mass produce aspherical lenses. Aspherical lenses have significantly better optical properties, and conventional lens making techniques are limited to the manufacture of spherical lenses only. The conventional technique involves an iterative procedure of grinding, lapping and polishing to obtain a desired surface profile. In precision molding, however, the glass raw material, or preform, is placed between dies and heated until it becomes soft and molten. The dies are then pressed against each other to deform the molten glass to take the shape of the dies. After this stage, the glass is cooled to room temperature with nitrogen gas. Thus, unlike the traditional approach, the lens is made in a single process. Although the molding process appears to be a better alternative, there are shortcomings that need to be addressed before using the process for mass production. From the point of view of the current study, the shortcomings concern both the surface profile and the center thickness of the final lens. In the expensive process of mold preparation, the mold surfaces are first machined to be exact negatives of the required surface profile of the lens. One of the main issues is the deviation of the surface profile of the final molded lens from that of the molds, due to the complex, time- and temperature-dependent stress state experienced by the lens during the approximately 15 minute process of heating, pressing and then cooling.
In current practice, the deviation of manufactured lenses is as high as 20 microns, approximately 20 times the allowable deviation according to the optical design specifications. The empirical approach to solving this problem is to compensate the molds by trial and error based on practical experience, which is very time-consuming and costly: it usually takes 3-4 months and a considerable amount of money to compensate the molds to meet current specifications. This has motivated the development of computational solutions to arrive at a compensated mold shape, which requires predicting the lens deviation to micron-level accuracy while taking into account process parameters and the complex material behavior of glass. In this research, ABAQUS, a commercial FEM solver, is used to simulate the process and predict the final size and shape of the lens. The computational study of final size and shape includes a sensitivity analysis of the various material and process parameters. The material parameters include viscoelasticity, structural relaxation and the thermo-rheological behavior of the glass; friction and gap-dependent heat transfer at the interface; and the thermo-mechanical properties of the molds. This comprehensive study not only eliminates some of the parameters that have the least effect on the final size and shape, but also identifies the key material properties and substantiates the need to obtain them more accurately through experimentation. It should be mentioned that the material properties of the molding glasses considered are not available. The friction coefficient at the mold/glass interface is one of the important input parameters in the model. A ring compression test was used in the current research to find the friction coefficient. In this test, a 'washer', or ring-shaped, specimen is compressed between two flat dies at the molding temperature and the change in internal diameter is correlated to a friction coefficient.
The main strength of this test is the sensitivity of the inner diameter change during pressing to different friction conditions at the interface. In addition to the friction coefficient, approximate viscoelastic material properties and the TRS behavior were also determined from the experimental force and displacement data of this test. After validating the model to well within one micron, it was determined that the deviation of the lens profile with respect to the molds is primarily caused by structural relaxation of the glass, the thermal expansion behavior of the molds, friction at the glass/mold interface, and the time-temperature dependence of the viscoelastic material behavior of glass. Several practical examples and numerical studies that clearly show the cause of the deviation are presented. It is also shown that the deviation in the molded lens is affected by its location with respect to the molds. Finally, the process of mold compensation is demonstrated using the computational tool. In the second, parallel direction, a method to determine higher order coefficients in fracture mechanics from the solution of a singular integral equation is presented. In the asymptotic series, the stress intensity factor k0 is the first coefficient and the T-stress T0 is the second. For the example of an edge crack in a half space, converged values of the first twelve mode I coefficients (kn and Tn, n=0,...,5) have been determined, and for an edge crack in a finite width strip, the first six coefficients are presented. Coefficients for an internal crack in a half space are also presented. Results for an edge crack in a finite width strip are used to quantify the size of the k-dominant zone, the kT-dominant zone, and the zones associated with three and four terms, taking into account the entire region around the crack tip. Finally, this method was also applied to fracture problems with mode II loading.
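The higher order terms described above belong to an asymptotic crack-tip series; assuming the standard Williams-type form (the dissertation's exact normalization of the coefficients may differ), the mode I stress field can be sketched as:

```latex
% Williams-type asymptotic expansion near the crack tip; k_0 is the
% stress intensity factor and T_0 is the T-stress, as in the abstract.
\sigma_{ij}(r,\theta) = \sum_{n=0}^{\infty}
  \left[ k_n\, r^{\,n-\frac{1}{2}}\, f_{ij}^{(n)}(\theta)
       + T_n\, r^{\,n}\, g_{ij}^{(n)}(\theta) \right]
```

Truncating after the first term gives the k-dominant description, and keeping k0 and T0 gives the kT-dominant description, whose zones of validity the dissertation quantifies.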
LeaF: A Learning-based Fault Diagnostic System for Multi-Robot Teams
The failure-prone, complex operating environment of a standard multi-robot application dictates that some amount of fault-tolerance be incorporated into every system. In fact, the quality of the incorporated fault-tolerance has a direct impact on the overall performance of the system. Despite the extensive work being done in the field of multi-robot systems, there does not exist a general methodology for fault diagnosis and recovery. The objective of this research, in part, is to provide an adaptive approach that enables the robot team to autonomously detect and compensate for the wide variety of faults that could be experienced. The key feature of the developed approach is its ability to learn useful information from encountered faults, unique or otherwise, towards a more robust system. As part of this research, we analyzed an existing multi-agent architecture, CMM (Causal Model Method), as a fault diagnostic solution for a sample multi-robot application. Based on the analysis, we claim that a causal model approach is effective for anticipating and recovering from many types of robot team errors. However, the analysis also showed that CMM in its current form is incomplete as a turn-key solution. Due to the significant number of possible failure modes in a complex multi-robot application, and the difficulty of anticipating all possible failures in advance, one cannot guarantee the generation of a complete a priori causal model that identifies and specifies all faults that may occur in the system. Therefore, based on these preliminary studies, we designed an alternate approach, called LeaF: Learning-based Fault diagnostic architecture for multi-robot teams. LeaF is an adaptive method that uses its experience to update and extend its causal model to enable the team, over time, to better recover from faults when they occur.
LeaF combines the initial fault model with a case-based learning algorithm, LID (Lazy Induction of Descriptions), to allow robot team members to diagnose faults and to automatically update their causal models. The modified LID algorithm uses structural similarity between fault characteristics as a means of classifying previously unencountered faults. Furthermore, the use of learning allows the system to identify and categorize unexpected faults, enables team members to learn from problems encountered by others, and supports intelligent decisions regarding the environment. To evaluate LeaF, we implemented it in two challenging and dynamic physical multi-robot applications.
The other significant contribution of the research is the development of metrics to measure the fault-tolerance of a multi-robot system within the context of system performance. In addition to developing these metrics, we also outline potential methods to better interpret the obtained measures towards truly understanding the capabilities of the implemented system. The developed metrics are designed to be application independent and can be used to evaluate and/or compare different fault-tolerance architectures such as CMM and LeaF. To the best of our knowledge, this approach is the only one that attempts to capture the effect of intelligence, reasoning, or learning on the effective fault-tolerance of the system, rather than relying purely on traditional redundancy-based measures. Finally, we show the utility of the designed metrics by applying them to the physical robot experiments, measuring the effective fault-tolerance and system performance, and subsequently analyzing the calculated measures to help better understand the capabilities of LeaF.
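The structural-similarity classification used by LID can be pictured as a nearest-case lookup over fault attribute sets. The sketch below is illustrative only: the attribute encoding, the similarity measure, and the 0.5 threshold are assumptions, not LeaF's actual implementation.

```python
# Hypothetical sketch of case-based fault classification by structural
# similarity, in the spirit of LeaF's use of LID (names are illustrative).

def similarity(a: dict, b: dict) -> float:
    """Fraction of the combined attributes of two fault descriptions
    that are shared and agree in value (Jaccard-style overlap)."""
    shared = set(a) & set(b)
    agree = sum(1 for k in shared if a[k] == b[k])
    return agree / len(set(a) | set(b))

def classify(fault: dict, case_base: list, threshold: float = 0.5):
    """Return the diagnosis of the most similar stored case, or None
    (a novel fault) when no case is similar enough; novel faults are
    appended so the case base extends itself over time."""
    best_score, best_diag = 0.0, None
    for case, diagnosis in case_base:
        s = similarity(fault, case)
        if s > best_score:
            best_score, best_diag = s, diagnosis
    if best_score >= threshold:
        return best_diag
    case_base.append((fault, "unclassified"))
    return None
```

A new fault either inherits the diagnosis of its closest structural match or is flagged as novel and stored, which mirrors the abstract's point that learning lets the causal model grow beyond what was anticipated a priori.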
TimeTrader: Exploiting Latency Tail to Save Datacenter Energy for On-line Data-Intensive Applications
Datacenters running on-line, data-intensive applications (OLDIs) consume
significant amounts of energy. However, reducing their energy is challenging
due to their tight response time requirements. A key aspect of OLDIs is that
each user query goes to all or many of the nodes in the cluster, so that the
overall time budget is dictated by the tail of the replies' latency
distribution; replies see latency variations both in the network and compute.
Previous work proposes to achieve load-proportional energy by slowing down the
computation at lower datacenter loads based directly on response times (i.e.,
at lower loads, the proposal exploits the average slack in the time budget
provisioned for the peak load). In contrast, we propose TimeTrader to reduce
energy by exploiting the latency slack in the sub-critical replies which
arrive before the deadline (e.g., 80% of replies are 3-4x faster than the
tail). This slack is present at all loads and subsumes the previous work's
load-related slack. While the previous work shifts the leaves' response time
distribution to consume the slack at lower loads, TimeTrader reshapes the
distribution at all loads by slowing down individual sub-critical nodes without
increasing missed deadlines. TimeTrader exploits slack in both the network and
compute budgets. Further, TimeTrader leverages Earliest Deadline First
scheduling to largely decouple critical requests from the queuing delays of
sub-critical requests, which can then be slowed down without hurting critical
requests. A combination of real-system measurements and at-scale simulations
shows that without adding to missed deadlines, TimeTrader saves 15-19% and
41-49% energy at 90% and 30% loading, respectively, in a datacenter with 512
nodes, whereas previous work saves 0% and 31-37%.
Comment: 13 pages
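The sub-critical slack the abstract describes can be illustrated with a minimal sketch. This is not the paper's implementation: the deadline/queue/service decomposition and the idea of stretching service via frequency scaling are assumptions for illustration.

```python
# Hypothetical sketch: how much a leaf node could stretch the service of
# one reply and still meet the end-to-end deadline. TimeTrader itself
# reshapes the whole per-node latency distribution; this only illustrates
# the notion of sub-critical slack.

def slowdown_factor(deadline_ms: float, queue_ms: float, service_ms: float) -> float:
    """Factor by which service time can be stretched (e.g. via DVFS) so
    that queuing delay plus stretched service still fits the deadline."""
    budget = deadline_ms - queue_ms
    if budget <= service_ms:
        return 1.0              # critical reply: no slack, run at full speed
    return budget / service_ms  # sub-critical reply: consume the slack
```

A reply that is 3-4x faster than the tail, as in the abstract's example, has roughly a 3-4x slowdown factor available before it risks missing the deadline.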
Talk2Nav: Long-Range Vision-and-Language Navigation with Dual Attention and Spatial Memory
The role of robots in society keeps expanding, bringing with it the necessity
of interacting and communicating with humans. In order to keep such interaction
intuitive, we provide automatic wayfinding based on verbal navigational
instructions. Our first contribution is the creation of a large-scale dataset
with verbal navigation instructions. To this end, we have developed an
interactive visual navigation environment based on Google Street View; we
further design an annotation method to highlight mined anchor landmarks and
local directions between them in order to help annotators formulate typical,
human references to those. The annotation task was crowdsourced on the AMT
platform, to construct a new Talk2Nav dataset with routes. Our second
contribution is a new learning method. Inspired by spatial cognition research
on the mental conceptualization of navigational instructions, we introduce a
soft dual attention mechanism defined over the segmented language instructions
to jointly extract two partial instructions -- one for matching the next
upcoming visual landmark and the other for matching the local directions to the
next landmark. Along similar lines, we also introduce a spatial memory scheme to
encode the local directional transitions. Our work takes advantage of the
advance in two lines of research: mental formalization of verbal navigational
instructions and training neural network agents for automatic wayfinding.
Extensive experiments show that our method significantly outperforms previous
navigation methods. For demo video, dataset and code, please refer to our
project page: https://www.trace.ethz.ch/publications/2019/talk2nav/index.html
Comment: 20 pages, 10 figures, Demo Video: https://people.ee.ethz.ch/~arunv/resources/talk2nav.mp
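The soft dual attention idea can be sketched schematically: two queries each induce a distribution over the instruction tokens, extracting one "landmark" and one "direction" sub-instruction. Shapes, names, and the fixed query vectors below are illustrative assumptions; the paper's model learns these jointly with the rest of the network.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def dual_attention(tokens: np.ndarray, q_landmark: np.ndarray,
                   q_direction: np.ndarray):
    """tokens: (T, d) token embeddings; queries: (d,).
    Returns two (d,) sub-instruction embeddings: one attending to the
    landmark-related tokens, one to the direction-related tokens."""
    a_l = softmax(tokens @ q_landmark)   # attention weights for landmark
    a_d = softmax(tokens @ q_direction)  # attention weights for directions
    return a_l @ tokens, a_d @ tokens
```

Because the two distributions are computed independently over the same segmented instruction, each partial instruction can be matched against its own target (the upcoming visual landmark, or the local directions to it), which is the decomposition the abstract describes.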
Flag Aggregator: Scalable Distributed Training under Failures and Augmented Losses using Convex Optimization
Modern ML applications increasingly rely on complex deep learning models and
large datasets. There has been an exponential growth in the amount of
computation needed to train the largest models. Therefore, to scale computation
and data, these models are inevitably trained in a distributed manner in
clusters of nodes, and their updates are aggregated before being applied to the
model. However, a distributed setup is prone to Byzantine failures of
individual nodes, components, and software. With data augmentation added to
these settings, there is a critical need for robust and efficient aggregation
systems. We define the quality of workers as reconstruction ratios, and
formulate aggregation as a Maximum Likelihood Estimation procedure using
Beta densities. We show that the regularized form of the log-likelihood with
respect to the subspace can be approximately solved using an iterative least
squares solver, and provide convergence guarantees using recent convex
optimization landscape results. Our empirical findings demonstrate that our
approach significantly
enhances the robustness of state-of-the-art Byzantine resilient aggregators. We
evaluate our method in a distributed setup with a parameter server, and show
simultaneous improvements in communication efficiency and accuracy across
various tasks. The code is publicly available at
https://github.com/hamidralmasi/FlagAggregato
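One way to picture the worker-quality idea is shown below. This is a loose sketch under stated assumptions: the "subspace" is taken to be the top right-singular subspace of the stacked updates, and the paper's Beta-density MLE and iterative least squares solver are replaced by simple ratio weighting for illustration.

```python
import numpy as np

def reconstruction_ratios(updates: np.ndarray, rank: int = 1) -> np.ndarray:
    """updates: (n_workers, d) stacked gradient updates. Score each worker
    by how much of its update's norm a shared low-rank subspace explains;
    honest workers reconstruct well, Byzantine outliers do not."""
    _, _, Vt = np.linalg.svd(updates, full_matrices=False)
    basis = Vt[:rank]                    # top-`rank` right singular vectors
    proj = updates @ basis.T @ basis     # projection onto the subspace
    num = np.linalg.norm(proj, axis=1)
    den = np.linalg.norm(updates, axis=1) + 1e-12
    return num / den                     # ratios in [0, 1]

def aggregate(updates: np.ndarray, rank: int = 1) -> np.ndarray:
    """Ratio-weighted average of worker updates (illustrative only)."""
    r = reconstruction_ratios(updates, rank)
    w = r / r.sum()
    return w @ updates
```

Workers whose updates lie in the shared subspace receive high weight, while an update orthogonal to the consensus direction is effectively suppressed, which conveys the robustness goal of the abstract without reproducing its actual estimator.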
Identification of Clinical Mold Isolates by Sequence Analysis of the Internal Transcribed Spacer Region, Ribosomal Large-Subunit D1/D2, and β-Tubulin
Background: The identification of molds in clinical laboratories relies largely on phenotypic criteria, the classification of which can be subjective. Recently, molecular methods have been introduced for the identification of pathogenic molds in clinical settings. Here, we employed comparative sequence analysis to identify molds. Methods: A total of 47 clinical mold isolates were used in this study, including Aspergillus and Trichophyton. All isolates were identified by phenotypic properties, such as growth rate, colony morphology, and reproductive structures. PCR and direct sequencing, targeting the internal transcribed spacer (ITS) region, the D1/D2 region of the 28S subunit, and the β-tubulin gene, were performed using previously described primers. Comparative sequence analysis against the GenBank database was performed with the basic local alignment search tool (BLAST) algorithm. Results: For Aspergillus, 56% and 67% of the isolates were identified to the species level by ITS and β-tubulin analysis, respectively. Only D1/D2 analysis was useful for Trichophyton identification, with 100% of isolates being identified to the species level. Performances of ITS and D1/D2 analyses were comparable for species-level identification of molds othe
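The species-level call behind these percentages can be caricatured as a best-hit search over percent identity. This is a toy sketch: real analyses run BLAST against GenBank on properly aligned sequences, and the equal-length sequences, database entries, and 97% cutoff below are all illustrative assumptions.

```python
# Toy illustration of comparative sequence identification: score a query
# against reference sequences and accept a species call only when the
# best hit's percent identity clears a cutoff (values are illustrative).

def percent_identity(query: str, hit: str) -> float:
    """Naive position-by-position identity for equal-length sequences."""
    assert len(query) == len(hit)
    matches = sum(q == h for q, h in zip(query, hit))
    return 100.0 * matches / len(query)

def identify(query: str, database: dict, cutoff: float = 97.0):
    """Return the best-matching species, or None when no reference is
    similar enough for a species-level identification."""
    best = max(database, key=lambda sp: percent_identity(query, database[sp]))
    pid = percent_identity(query, database[best])
    return best if pid >= cutoff else None
```

Whether a given locus (ITS, D1/D2, or β-tubulin) yields a species-level call then depends on how well it separates closely related taxa, which is the comparison the abstract reports.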