842 research outputs found

    Object Referring in Visual Scene with Spoken Language

    Full text link
    Object referring has important applications, especially for human-machine interaction. While it has received great attention, the task is mainly attacked with written language (text) as input rather than spoken language (speech), which is more natural. This paper investigates Object Referring with Spoken Language (ORSpoken) by presenting two datasets and one novel approach. Objects are annotated with their locations in images, text descriptions, and speech descriptions, making the datasets ideal for multi-modality learning. The approach is developed by carefully breaking the ORSpoken problem down into three sub-problems and introducing task-specific vision-language interactions at the corresponding levels. Experiments show that our method outperforms competing methods consistently and significantly. The approach is also evaluated in the presence of audio noise, showing the efficacy of the proposed vision-language interaction methods in counteracting background noise. Comment: 10 pages, Submitted to WACV 201

    Object Referring in Videos with Language and Human Gaze

    Full text link
    We investigate the problem of object referring (OR), i.e. localizing a target object in a visual scene given a language description. Humans perceive the world more as continuous video snippets than as static images, and describe objects not only by their appearance but also by their spatio-temporal context and motion features. Humans also gaze at the object when they issue a referring expression. Existing works on OR mostly focus on static images only, which fall short of providing many such cues. This paper addresses OR in videos with language and human gaze. To that end, we present a new video dataset for OR, with 30,000 objects over 5,000 stereo video sequences annotated with their descriptions and gaze. We further propose a novel network model for OR in videos, integrating appearance, motion, gaze, and spatio-temporal context into one network. Experimental results show that our method effectively utilizes motion cues, human gaze, and spatio-temporal context, and outperforms previous OR methods. For the dataset and code, please refer to https://people.ee.ethz.ch/~arunv/ORGaze.html. Comment: Accepted to CVPR 2018, 10 pages, 6 figures

    COMPUTATIONAL MODELING OF PRECISION MOLDING OF ASPHERIC GLASS OPTICS

    Get PDF
    In this dissertation, research in two parallel directions is presented: the first involves predicting the final size and shape of a glass lens during a precision glass lens molding process, and the second introduces a method to compute and quantify the importance of higher-order terms in fracture mechanics for different modes of fracture. The process of precision lens molding has received attention in recent years due to its potential to mass-produce aspherical lenses. Aspherical lenses have significantly better optical properties, but conventional lens-making techniques are limited to manufacturing spherical lenses only. The conventional technique involves an iterative procedure of grinding, lapping, and polishing to obtain a desired surface profile. In precision molding, however, the glass raw material, or preform, is placed between dies and heated until it becomes soft and molten. The dies are then pressed against each other so that the molten glass takes the shape of the dies. After this stage, the glass is cooled to room temperature using nitrogen gas. Thus the lens is made in a single process, unlike the traditional approach. Although the molding process appears to be a better alternative, there are shortcomings that need to be addressed before it can be used for mass production. From the point of view of the current study, these shortcomings concern both the surface profile and the center thickness of the final lens. In the expensive process of mold preparation, the mold surfaces are first machined to be exact negatives of the required surface profile of the lens. One of the main issues is the deviation of the surface profile of the final molded lens from that of the molds, due to the complex, time- and temperature-dependent stress state experienced by the lens during the approximately 15-minute process of heating, pressing, and then cooling.
In current practice the deviation of manufactured lenses is as high as 20 microns, approximately 20 times the allowable deviation according to the optical design specifications. The empirical approach to this problem is to compensate the molds by trial and error based on practical experience, which is very time-consuming and costly: it usually takes 3-4 months and a considerable amount of money to compensate the molds to meet current specifications. This has motivated the development of computational methods to arrive at a compensated mold shape, which requires predicting the lens deviation with micron-level accuracy while taking into account the process parameters and the complex material behavior of glass. In this research, ABAQUS, a commercial FEM solver, is used to simulate the process and predict the final size and shape of the lens. The computational study of final size and shape includes a sensitivity analysis of the various material and process parameters. The material parameters include the viscoelasticity, structural relaxation, and thermo-rheological behavior of the glass; friction and gap-dependent heat transfer at the interface; and the thermo-mechanical properties of the molds. This comprehensive study not only eliminates some of the parameters that have the least effect on the final size and shape, but also identifies the key material properties and substantiates the need to obtain them more accurately through experimentation. It should be mentioned that the material properties of the molding glasses considered are not available. The friction coefficient at the mold/glass interface is one of the important input parameters in the model. A ring compression test was used in the current research to find the friction coefficient. In this test, a 'washer', or ring-shaped specimen, is compressed between two flat dies at the molding temperature, and the change in internal diameter is correlated to a friction coefficient.
The main strength of this test is the sensitivity of the inner-diameter change during pressing to different friction conditions at the interface. In addition to the friction coefficient, approximate viscoelastic material properties and the TRS behavior were also determined from the experimental force and displacement data of this test. After validating the model to well within one micron, it was determined that the deviation of the lens profile with respect to the molds is primarily caused by structural relaxation of the glass, the thermal expansion behavior of the molds, friction at the glass/mold interface, and the time-temperature dependence of the viscoelastic material behavior of the glass. Several practical examples and numerical studies that clearly show the cause of the deviation are presented. It is also shown that the deviation in the molded lens is affected by its location with respect to the molds. Finally, the process of mold compensation is demonstrated using the computational tool. In the other parallel direction, a method to determine higher-order coefficients in fracture mechanics from the solution of a singular integral equation is presented. In the asymptotic series, the stress intensity factor k0 is the first coefficient and the T-stress T0 the second. For the example of an edge crack in a half space, converged values of the first twelve mode I coefficients (kn and Tn, n = 0, ..., 5) have been determined, and for an edge crack in a finite-width strip, the first six coefficients are presented. Coefficients for an internal crack in a half space are also presented. Results for the edge crack in a finite-width strip are used to quantify the size of the k-dominant zone, the kT-dominant zone, and the zones associated with three and four terms, taking into account the entire region around the crack tip. Finally, this method was also applied to fracture problems with mode II loading.
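The ring-test idea described above reduces, in its simplest form, to a calibration lookup: the measured inner-diameter change at a fixed height reduction is mapped to a friction coefficient via curves obtained from simulation. The sketch below illustrates that lookup with linear interpolation; the calibration pairs are invented placeholders, not values from the dissertation, where the curves would come from the FEM model itself.

```python
# Hypothetical (ID reduction %, friction coefficient) calibration pairs.
# Negative ID reduction means the hole expanded (low friction).
CALIBRATION = [(-10.0, 0.02), (0.0, 0.10), (10.0, 0.25), (25.0, 0.50)]

def friction_from_id_change(id_reduction_pct: float) -> float:
    """Linearly interpolate mu from the measured inner-diameter change,
    clamping outside the calibrated range."""
    pts = sorted(CALIBRATION)
    if id_reduction_pct <= pts[0][0]:
        return pts[0][1]
    if id_reduction_pct >= pts[-1][0]:
        return pts[-1][1]
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= id_reduction_pct <= x1:
            t = (id_reduction_pct - x0) / (x1 - x0)
            return y0 + t * (y1 - y0)
    raise ValueError("unreachable for sorted calibration data")
```

In practice the inverse problem is solved against full force-displacement histories rather than a single scalar, but the monotone ID-change-to-friction relationship is what makes the test sensitive.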

    LeaF: A Learning-based Fault Diagnostic System for Multi-Robot Teams

    Get PDF
    The failure-prone, complex operating environment of a standard multi-robot application dictates that some amount of fault tolerance be incorporated into every system. In fact, the quality of the incorporated fault tolerance has a direct impact on the overall performance of the system. Despite the extensive work being done in the field of multi-robot systems, no general methodology exists for fault diagnosis and recovery. The objective of this research, in part, is to provide an adaptive approach that enables the robot team to autonomously detect and compensate for the wide variety of faults that could be experienced. The key feature of the developed approach is its ability to learn useful information from encountered faults, unique or otherwise, towards a more robust system. As part of this research, we analyzed an existing multi-agent architecture, CMM (Causal Model Method), as a fault diagnostic solution for a sample multi-robot application. Based on the analysis, we claim that a causal model approach is effective for anticipating and recovering from many types of robot team errors. However, the analysis also showed that the CMM method in its current form is incomplete as a turn-key solution. Due to the significant number of possible failure modes in a complex multi-robot application, and the difficulty of anticipating all possible failures in advance, one cannot guarantee the generation of a complete a priori causal model that identifies and specifies all faults that may occur in the system. Therefore, based on these preliminary studies, we designed an alternate approach, called LeaF: Learning-based Fault diagnostic architecture for multi-robot teams. LeaF is an adaptive method that uses its experience to update and extend its causal model to enable the team, over time, to better recover from faults when they occur.
LeaF combines the initial fault model with a case-based learning algorithm, LID (Lazy Induction of Descriptions), to allow robot team members to diagnose faults and automatically update their causal models. The modified LID algorithm uses structural similarity between fault characteristics as a means of classifying previously unencountered faults. Furthermore, the use of learning allows the system to identify and categorize unexpected faults, enables team members to learn from problems encountered by others, and supports intelligent decisions regarding the environment. To evaluate LeaF, we implemented it in two challenging and dynamic physical multi-robot applications. The other significant contribution of the research is the development of metrics to measure fault tolerance, within the context of system performance, for a multi-robot system. In addition to developing these metrics, we also outline potential methods to better interpret the obtained measures towards truly understanding the capabilities of the implemented system. The developed metrics are designed to be application-independent and can be used to evaluate and/or compare different fault-tolerance architectures like CMM and LeaF. To the best of our knowledge, this approach is the only one that attempts to capture the effect of intelligence, reasoning, or learning on the effective fault tolerance of the system, rather than relying purely on traditional redundancy-based measures. Finally, we show the utility of the designed metrics by applying them to the physical robot experiments, measuring the effective fault tolerance and system performance, and subsequently analyzing the calculated measures to help better understand the capabilities of LeaF.
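The case-based diagnosis loop described above can be caricatured in a few lines: faults are attribute-value cases, similarity is the fraction of matching attributes, and a fault that matches nothing well is stored as a new case so the model grows with experience. This is only an illustrative stand-in; the actual LID algorithm induces symbolic descriptions rather than doing a nearest-case lookup, and the attribute names below are invented.

```python
def similarity(a: dict, b: dict) -> float:
    """Fraction of attribute-value pairs the two fault cases share."""
    keys = set(a) | set(b)
    if not keys:
        return 0.0
    return sum(a.get(k) == b.get(k) for k in keys) / len(keys)

class CaseBase:
    def __init__(self, threshold: float = 0.5):
        self.cases = []            # list of (attributes, diagnosis)
        self.threshold = threshold

    def diagnose(self, fault: dict) -> str:
        best, best_sim = None, 0.0
        for attrs, label in self.cases:
            s = similarity(fault, attrs)
            if s > best_sim:
                best, best_sim = label, s
        if best is None or best_sim < self.threshold:
            # Unrecognized fault: retain it for later labeling (the
            # learning step that extends the causal model over time).
            self.cases.append((fault, "unknown"))
            return "unknown"
        return best

cb = CaseBase()
cb.cases.append(({"sensor": "lidar", "symptom": "no_data"}, "lidar_failure"))
```

Storing the unmatched case is the key point: the next robot that hits the same fault signature can reuse the (eventually labeled) diagnosis instead of rediscovering it.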

    TimeTrader: Exploiting Latency Tail to Save Datacenter Energy for On-line Data-Intensive Applications

    Get PDF
    Datacenters running on-line, data-intensive applications (OLDIs) consume significant amounts of energy. However, reducing their energy is challenging due to their tight response time requirements. A key aspect of OLDIs is that each user query goes to all or many of the nodes in the cluster, so that the overall time budget is dictated by the tail of the replies' latency distribution; replies see latency variations in both the network and the compute. Previous work proposes to achieve load-proportional energy by slowing down the computation at lower datacenter loads based directly on response times (i.e., at lower loads, the proposal exploits the average slack in the time budget provisioned for the peak load). In contrast, we propose TimeTrader to reduce energy by exploiting the latency slack in the sub-critical replies which arrive before the deadline (e.g., 80% of replies are 3-4x faster than the tail). This slack is present at all loads and subsumes the previous work's load-related slack. While the previous work shifts the leaves' response time distribution to consume the slack at lower loads, TimeTrader reshapes the distribution at all loads by slowing down individual sub-critical nodes without increasing missed deadlines. TimeTrader exploits slack in both the network and compute budgets. Further, TimeTrader leverages Earliest Deadline First scheduling to largely decouple critical requests from the queuing delays of sub-critical requests, which can then be slowed down without hurting critical requests. A combination of real-system measurements and at-scale simulations shows that without adding to missed deadlines, TimeTrader saves 15-19% and 41-49% energy at 90% and 30% loading, respectively, in a datacenter with 512 nodes, whereas previous work saves 0% and 31-37%. Comment: 13 pages
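The core scheduling idea can be sketched on a single queue: order requests by Earliest Deadline First, then stretch each sub-critical request (one that would otherwise finish early) just enough to consume its slack, which models running the node slower to save energy. The backward pass below computes the latest safe finish time for each request so that stretching one request never pushes a later one past its deadline. This is a toy single-server model of the idea, not TimeTrader's actual per-node mechanism.

```python
def edf_with_slack(requests):
    """requests: list of (service_time, deadline) pairs, all released at
    time 0. Returns per-request stretched service times that meet every
    deadline the original back-to-back EDF schedule met."""
    order = sorted(range(len(requests)), key=lambda i: requests[i][1])
    n = len(order)
    # Backward pass: latest finish each request may have without forcing
    # any later request (in EDF order) past its deadline.
    latest = [0.0] * n
    for pos in range(n - 1, -1, -1):
        dl = requests[order[pos]][1]
        if pos == n - 1:
            latest[pos] = dl
        else:
            next_svc = requests[order[pos + 1]][0]
            latest[pos] = min(dl, latest[pos + 1] - next_svc)
    # Forward pass: stretch each request up to its latest safe finish.
    t = 0.0
    stretched = [0.0] * len(requests)
    for pos, i in enumerate(order):
        svc = requests[i][0]
        finish = max(t + svc, latest[pos])   # never compress, only stretch
        stretched[i] = finish - t
        t = finish
    return stretched
```

A request stretched from service time s to s' models a slowdown factor s/s'; under a roughly linear power model, longer-but-slower execution trades the measured slack for energy, which is exactly the reshaping of the latency distribution the abstract describes.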

    Talk2Nav: Long-Range Vision-and-Language Navigation with Dual Attention and Spatial Memory

    Full text link
    The role of robots in society keeps expanding, bringing with it the necessity of interacting and communicating with humans. In order to keep such interaction intuitive, we provide automatic wayfinding based on verbal navigational instructions. Our first contribution is the creation of a large-scale dataset with verbal navigation instructions. To this end, we have developed an interactive visual navigation environment based on Google Street View; we further design an annotation method to highlight mined anchor landmarks and local directions between them, in order to help annotators formulate typical, human references to those. The annotation task was crowdsourced on the AMT platform to construct a new Talk2Nav dataset with 10,714 routes. Our second contribution is a new learning method. Inspired by spatial cognition research on the mental conceptualization of navigational instructions, we introduce a soft dual attention mechanism defined over the segmented language instructions to jointly extract two partial instructions -- one for matching the next upcoming visual landmark and the other for matching the local directions to the next landmark. Along similar lines, we also introduce a spatial memory scheme to encode the local directional transitions. Our work takes advantage of advances in two lines of research: mental formalization of verbal navigational instructions and training neural network agents for automatic wayfinding. Extensive experiments show that our method significantly outperforms previous navigation methods. For the demo video, dataset, and code, please refer to our project page: https://www.trace.ethz.ch/publications/2019/talk2nav/index.html Comment: 20 pages, 10 Figures, Demo Video: https://people.ee.ethz.ch/~arunv/resources/talk2nav.mp
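The dual attention idea can be sketched with two query vectors, one tuned to landmark-like words and one to direction-like words, each producing a softmax-weighted summary of the instruction's token embeddings. Everything here (the 2-d embeddings, the hand-set queries, the token list) is a toy stand-in for the learned components in Talk2Nav; the point is only that two queries over the same tokens yield two different partial-instruction summaries.

```python
import math

def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]

def attend(tokens, query):
    """tokens: list of embedding vectors; returns the attention-weighted
    sum of the token embeddings under dot-product scoring."""
    scores = [sum(ti * qi for ti, qi in zip(t, query)) for t in tokens]
    w = softmax(scores)
    dim = len(tokens[0])
    return [sum(w[k] * tokens[k][d] for k in range(len(tokens)))
            for d in range(dim)]

# Toy 2-d embeddings: axis 0 ~ "landmark-ness", axis 1 ~ "direction-ness".
tokens = [[1.0, 0.0],   # "church"
          [0.0, 1.0],   # "left"
          [0.9, 0.1]]   # "tower"
landmark_summary = attend(tokens, [5.0, 0.0])    # landmark-tuned query
direction_summary = attend(tokens, [0.0, 5.0])   # direction-tuned query
```

In the full model the two summaries are matched against the visual landmark and the local directional transition respectively; here they simply concentrate mass on the corresponding token groups.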

    Flag Aggregator: Scalable Distributed Training under Failures and Augmented Losses using Convex Optimization

    Full text link
    Modern ML applications increasingly rely on complex deep learning models and large datasets. There has been an exponential growth in the amount of computation needed to train the largest models. Therefore, to scale computation and data, these models are inevitably trained in a distributed manner in clusters of nodes, and their updates are aggregated before being applied to the model. However, a distributed setup is prone to Byzantine failures of individual nodes, components, and software. With data augmentation added to these settings, there is a critical need for robust and efficient aggregation systems. We define the quality of workers as reconstruction ratios in (0, 1], and formulate aggregation as a Maximum Likelihood Estimation procedure using Beta densities. We show that the regularized form of the log-likelihood with respect to the subspace can be approximately solved using an iterative least squares solver, and provide convergence guarantees using recent convex optimization landscape results. Our empirical findings demonstrate that our approach significantly enhances the robustness of state-of-the-art Byzantine-resilient aggregators. We evaluate our method in a distributed setup with a parameter server, and show simultaneous improvements in communication efficiency and accuracy across various tasks. The code is publicly available at https://github.com/hamidralmasi/FlagAggregato
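A heavily simplified version of quality-weighted aggregation looks as follows: each worker carries a quality score in (0, 1] (standing in for the paper's reconstruction ratios), and the server takes a quality-weighted mean of the submitted gradients, so that low-quality or suspect workers contribute less. The actual Flag Aggregator estimates these quantities via Beta-density maximum likelihood over a subspace with an iterative least squares solver; that machinery is entirely omitted here.

```python
def aggregate(grads, quality):
    """grads: list of gradient vectors (equal length); quality: per-worker
    weights in (0, 1]. Returns the quality-weighted mean gradient."""
    assert len(grads) == len(quality) and grads
    assert all(0.0 < q <= 1.0 for q in quality)
    total = sum(quality)
    dim = len(grads[0])
    return [sum(q * g[d] for q, g in zip(quality, grads)) / total
            for d in range(dim)]
```

With all qualities equal this reduces to the plain mean; the robustness of the real method comes from how the quality scores themselves are estimated, not from the weighted averaging step.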

    Identification of Clinical Mold Isolates by Sequence Analysis of the Internal Transcribed Spacer Region, Ribosomal Large-Subunit D1/D2, and β-Tubulin

    Get PDF
    Background: The identification of molds in clinical laboratories is largely on the basis of phenotypic criteria, the classification of which can be subjective. Recently, molecular methods have been introduced for identification of pathogenic molds in clinical settings. Here, we employed comparative sequence analysis to identify molds. Methods: A total of 47 clinical mold isolates were used in this study, including Aspergillus and Trichophyton. All isolates were identified by phenotypic properties, such as growth rate, colony morphology, and reproductive structures. PCR and direct sequencing, targeting the internal transcribed spacer (ITS) region, the D1/D2 region of the 28S subunit, and the β-tubulin gene, were performed using primers described previously. Comparative sequence analysis against the GenBank database was performed with the basic local alignment search tool (BLAST) algorithm. Results: For Aspergillus, 56% and 67% of the isolates were identified to the species level by using ITS and β-tubulin analysis, respectively. Only D1/D2 analysis was useful for Trichophyton identification, with 100% of isolates being identified to the species level. Performances of ITS and D1/D2 analyses were comparable for species-level identification of molds othe
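The decision step behind "identified to the species level" in such BLAST-based workflows is typically a percent-identity cutoff on the top hit. The sketch below illustrates that rule; the 99% species and 95% genus thresholds are placeholder values for illustration, not the criteria used in this study.

```python
def classify_hit(percent_identity: float, top_hit_species: str) -> str:
    """Map a top BLAST hit's percent identity to an identification level.
    Thresholds are hypothetical examples, not the study's criteria."""
    if percent_identity >= 99.0:
        return f"species: {top_hit_species}"
    if percent_identity >= 95.0:
        genus = top_hit_species.split()[0]   # binomial name -> genus
        return f"genus: {genus}"
    return "inconclusive"
```

Real pipelines additionally check query coverage and the identity gap to the second-best hit before committing to a species call, since a single close hit can be misleading in under-sampled genera.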