949 research outputs found

    Vehicle make and model recognition for intelligent transportation monitoring and surveillance.

    Vehicle Make and Model Recognition (VMMR) has evolved into a significant subject of study due to its importance in numerous Intelligent Transportation Systems (ITS), such as autonomous navigation, traffic analysis, traffic surveillance and security systems. A highly accurate and real-time VMMR system significantly reduces the overhead cost of resources otherwise required. The VMMR problem is a multi-class classification task with a peculiar set of issues and challenges, such as multiplicity and inter- and intra-make ambiguity among various vehicle makes and models, which need to be solved in an efficient and reliable manner to achieve a highly robust VMMR system. In this dissertation, facing the growing importance of make and model recognition of vehicles, we present a VMMR system that provides very high accuracy rates and is robust to several challenges. We demonstrate that the VMMR problem can be addressed by locating discriminative parts where the most significant appearance variations occur in each category, and learning expressive appearance descriptors. Given these insights, we consider two data-driven frameworks: a Multiple-Instance Learning-based (MIL) system using hand-crafted features and an extended application of deep neural networks using MIL. Our approach requires only image-level class labels, and the discriminative parts of each target class are selected in a fully unsupervised manner without any use of part annotations or segmentation masks, which may be costly to obtain. This advantage makes our system more intelligent, scalable, and applicable to other fine-grained recognition tasks. We constructed a dataset with 291,752 images representing 9,170 different vehicles to validate and evaluate our approach. Experimental results demonstrate that localizing parts and distinguishing their discriminative powers for categorization improve the performance of fine-grained categorization. 
Extensive experiments conducted using our approaches yield superior results for images in our real-world VMMR dataset that were occluded, poorly illuminated, or captured from partial or even non-frontal camera views. The approaches presented herewith provide a highly accurate VMMR system for real-time applications in realistic environments. We also validate our system with a significant application of VMMR to ITS that involves automated vehicular surveillance. We show that our application can provide law enforcement agencies with efficient tools to search for a specific vehicle type, make, or model, and to track the path of a given vehicle using the position of multiple cameras.

    Describing Images by Semantic Modeling using Attributes and Tags

    This dissertation addresses the problem of describing images using visual attributes and textual tags, a fundamental task that narrows down the semantic gap between the visual reasoning of humans and machines. Automatic image annotation assigns relevant textual tags to the images. In this dissertation, we propose a query-specific formulation based on Weighted Multi-view Non-negative Matrix Factorization to perform automatic image annotation. Our proposed technique seamlessly adapts to changes in the training data, naturally solves the problem of feature fusion, and handles the challenge of rare tags. Unlike tags, attributes are category-agnostic, hence their combination models an exponential number of semantic labels. Motivated by the fact that most attributes describe local properties, we propose exploiting localization cues, through semantic parsing of the human face and body, to improve person-related attribute prediction. We also demonstrate that image-level attribute labels can be effectively used as weak supervision for the task of semantic segmentation. Next, we analyze Selfie images by utilizing tags and attributes. We collect the first large-scale Selfie dataset and annotate it with different attributes covering characteristics such as gender, age, race, facial gestures, and hairstyle. We then study the popularity and sentiments of the selfies given an estimated appearance of various semantic concepts. In brief, we automatically infer what makes a good selfie. Despite its extensive usage, the deep learning literature falls short in understanding the characteristics and behavior of Batch Normalization. We conclude this dissertation by providing a fresh view, in light of information geometry and Fisher kernels, on why batch normalization works. 
We propose Mixture Normalization, which disentangles modes of variation in the underlying distribution of the layer outputs, and confirm that it effectively accelerates training of different batch-normalized architectures, including Inception-V3, Densely Connected Networks, and Deep Convolutional Generative Adversarial Networks, while achieving better generalization error.

    Aerospace medicine and biology: A continuing bibliography with indexes (supplement 341)

    This bibliography lists 133 reports, articles and other documents introduced into the NASA Scientific and Technical Information System during September 1990. Subject coverage includes aerospace medicine and psychology, life support systems and controlled environments, safety equipment, exobiology and extraterrestrial life, and flight crew behavior and performance.

    Highly Interactive Web-Based Courseware

    The grand vision of educational software is that of a networked system enabling the learner to explore, discover, and construct subject matters and communicate problems and ideas with other community members. Educational material is transformed into reusable learning objects, created collaboratively by developers, educators, and designers, preserved in a digital library, and utilized, adapted, and evolved by educators and learners. Recent advances in learning technology specified reusability and interoperability in Web-based courseware. However, a high degree of interactivity is not yet considered. Each interactive learning object represents an autonomous hypermedia entity, laborious to create, impossible to interlink and to adapt in a graduated manner, and hard to specify. Dynamic attributes, the look and feel, and functionality are predefined. This work designs and realizes learning technology for Web-based courseware with special regard to highly interactive learning objects. The first innovative aspect lies in the multi-level, component-based technology providing a graduated structuring. 
Components range from complex learning objects to toolkits to primitive components and scripts. Secondly, the proposed methodologies extend community support in Web-based courseware – collaboration and personalization – to the software layer. Components become linkable hypermedia objects and part of the courseware repository, rated, annotated, and modified by all community members. In addition to a detailed description of technology and design patterns for interactive learning objects and matching Web-based courseware, the thesis clarifies the notion of interactivity in educational software by formulating combined levels of technological and symbolic interactivity, and derives a visual scripting metaphor for transporting functionality. Further, it reviews the evolution of hypermedia and educational software to extract substantial techniques for interactive Web-based courseware. The proposed framework supports multilingual, alternative content, provides link consistency and easy maintenance, and includes state-driven online wizards, even for interactive content. The impact of a high degree of interactivity in educational software is illustrated with courseware in the Computer Graphics domain.

    Extending DoD modeling and simulation with Web 2.0, Ajax and X3D

    DoD has much to gain from Web 2.0 and the Ajax paradigm in open source. The Java language has come a long way in providing real world case studies and scalable solutions for the enterprise that are currently in production on sites such as eBay.com (http://www.ebay.com) and MLB.com (http://www.mlb.com). The most popular Ajax application in production is Google Maps (http://maps.google.com), which serves as a good example of the power of the technology. Open Source technology has matured greatly in the past three years and is now mature enough for deployment within DoD systems. In the past, management within the DoD has been reluctant to consider Enterprise Level Open Source Technologies as a solution, fearing that they might receive little to no support. In fact, the Open Source Business Model is entirely based on first developing a broad user base then providing support as a service for their clients. DoD Modeling and Simulation can create dynamic and compelling content that is ready for the challenges of the 21st century and completely integrated with the Global Information Grid (GIG) concept. This paper presents a short history of Model View Controller (MVC) architectures and goes over various pros and cons of each framework (Struts, Spring, Java Server Faces), which is critical for the deployment of a modern Java web application. Ajax and various frameworks are then discussed (Dojo, Google Web Toolkit, ZK, and Echo2). The paper then touches on Ajax3D technologies and the use of Rez to generate 3D models of entire cities and goes on to discuss possible extended functionality of the Rez concept to create a terrain system like Google Earth in X3D-Earth. http://archive.org/details/extendingdodmode109453282 US Navy (USN) author. Approved for public release; distribution is unlimited.

    Exploiting Spatio-Temporal Coherence for Video Object Detection in Robotics

    This paper proposes a method to enhance video object detection for indoor environments in robotics. Concretely, it exploits knowledge about the camera motion between frames to propagate previously detected objects to successive frames. The proposal is rooted in the concepts of planar homography, to propose regions of interest in which to find objects, and recursive Bayesian filtering, to integrate observations over time. The proposal is evaluated on six virtual, indoor environments, accounting for the detection of nine object classes over a total of ∼7k frames. Results show that our proposal improves the recall and the F1-score by a factor of 1.41 and 1.27, respectively, and achieves a significant reduction of the object categorization entropy (58.8%) when compared to a two-stage video object detection method used as baseline, at the cost of small time overheads (120 ms) and precision loss (0.92).
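
    The two ingredients named in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, the translation-only homography, and the per-frame class scores are all hypothetical values chosen for illustration. It shows how a bounding box might be carried into the next frame through a 3x3 planar homography, and how a class belief might be refined by a recursive Bayesian update.

```python
import math


def apply_homography(H, point):
    """Map a 2D point through a 3x3 homography given as nested lists."""
    x, y = point
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)


def propagate_box(H, box):
    """Warp the corners of (x_min, y_min, x_max, y_max) and return the
    axis-aligned box that encloses the warped corners."""
    x0, y0, x1, y1 = box
    corners = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
    warped = [apply_homography(H, c) for c in corners]
    xs = [p[0] for p in warped]
    ys = [p[1] for p in warped]
    return (min(xs), min(ys), max(xs), max(ys))


def bayes_update(prior, likelihood):
    """One recursive Bayesian step: posterior is proportional to
    likelihood times prior, renormalised to sum to one."""
    unnormalised = [p * l for p, l in zip(prior, likelihood)]
    total = sum(unnormalised)
    if total == 0:
        raise ValueError("likelihood assigns zero mass to every class")
    return [u / total for u in unnormalised]


def entropy(dist):
    """Shannon entropy in bits of a discrete distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0)


# Assumed camera motion: a pure image translation of (10, 5) pixels.
H = [[1.0, 0.0, 10.0], [0.0, 1.0, 5.0], [0.0, 0.0, 1.0]]
roi = propagate_box(H, (20.0, 30.0, 60.0, 90.0))

# Assumed detector scores for three classes over two successive frames.
uniform = [1 / 3, 1 / 3, 1 / 3]
belief = uniform
for scores in ([0.6, 0.3, 0.1], [0.7, 0.2, 0.1]):
    belief = bayes_update(belief, scores)

print(roi, belief, entropy(uniform), entropy(belief))
```

    In this toy run the belief concentrates on the first class as observations accumulate, so its entropy falls below that of the uniform prior, which is the kind of categorization-entropy reduction the abstract reports.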

    Object Tracking

    Object tracking consists of estimating the trajectories of moving objects in a sequence of images. Automating object tracking by computer is a difficult task: the dynamics of the many parameters that represent the features and motion of the objects, as well as temporary partial or full occlusion of the tracked objects, have to be considered. This monograph presents the development of object tracking algorithms, methods and systems. Both the state of the art in object tracking methods and new trends in research are described in this book. Its fourteen chapters are split into two sections. Section 1 presents new theoretical ideas, whereas Section 2 presents real-life applications. Despite the variety of topics contained in this monograph, it constitutes a consistent body of knowledge in the field of computer object tracking. The intention of the editor was to follow the very rapid progress in the development of methods as well as the extension of their applications.

    Text-detection and -recognition from natural images

    Text detection and recognition from images could have numerous functional applications for document analysis, such as assistance for visually impaired people; recognition of vehicle license plates; evaluation of articles containing tables, street signs, maps, and diagrams; keyword-based image exploration; document retrieval; recognition of parts within industrial automation; content-based extraction; object recognition; address block location; and text-based video indexing. This research exploited the advantages of artificial intelligence (AI) to detect and recognise text from natural images. Machine learning and deep learning were used to accomplish this task. In this research, we conducted an in-depth literature review on the current detection and recognition methods used by researchers to identify the existing challenges, wherein the differences in text resulting from disparity in alignment, style, size, and orientation combined with low image contrast and a complex background make automatic text extraction a considerably challenging and problematic task. As a result, the state-of-the-art approaches obtain low detection rates (often less than 80%) and recognition rates (often less than 60%). This has led to the development of new approaches. The aim of the study was to develop a robust text detection and recognition method from natural images with high accuracy and recall, which would be used as the target of the experiments. This method could detect all the text in the scene images, despite certain specific features associated with the text pattern. 
Furthermore, we aimed to find a solution to the two main problems concerning arbitrarily shaped text (horizontal, multi-oriented, and curved text) detection and recognition in a low-resolution scene and with various scales and of different sizes. In this research, we propose a methodology to handle the problem of text detection by using novel combination and selection features to deal with the classification algorithms of the text/non-text regions. The text-region candidates were extracted from the grey-scale images by using the MSER technique. A machine learning-based method was then applied to refine and validate the initial detection. The effectiveness of the features based on the aspect ratio, GLCM, LBP, and HOG descriptors was investigated. The text-region classifiers of MLP, SVM, and RF were trained using selections of these features and their combinations. The publicly available datasets ICDAR 2003 and ICDAR 2011 were used to evaluate the proposed method. This method achieved state-of-the-art performance by using machine learning methodologies on both databases, and the improvements were significant in terms of Precision, Recall, and F-measure. The F-measure for ICDAR 2003 and ICDAR 2011 was 81% and 84%, respectively. The results showed that the use of a suitable feature combination and selection approach could significantly increase the accuracy of the algorithms. A new dataset has been proposed to fill the gap of character-level annotation and the availability of text in different orientations and of curved text. The proposed dataset was created particularly for deep learning methods, which require a massive, complete, and varied range of training data. The proposed dataset includes 2,100 images annotated at the character and word levels to obtain 38,500 samples of English characters and 12,500 words. Furthermore, an augmentation tool has been proposed to support the proposed dataset. 
The lack of an augmentation tool for object detection motivated the proposed tool, which can update the positions of bounding boxes after transformations are applied to images. This technique helps to increase the number of samples in the dataset and reduces annotation time, because no new annotation is required. The final part of the thesis presents a novel approach for text spotting, which is a new framework for an end-to-end character detection and recognition system designed using an improved SSD convolutional neural network, wherein layers are added to the SSD networks and the aspect ratio of the characters is considered because it is different from that of the other objects. Compared with the other methods considered, the proposed method could detect and recognise characters by training the end-to-end model completely. The performance of the proposed method was better on the proposed dataset; it was 90.34. Furthermore, the F-measure of the method’s accuracy on ICDAR 2015, ICDAR 2013, and SVT was 84.5, 91.9, and 54.8, respectively. On ICDAR13, the method achieved the second-best accuracy. The proposed method could spot text in arbitrarily shaped (horizontal, oriented, and curved) scene text.
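
    The idea of keeping bounding boxes consistent under image transformations can be sketched as follows. This is a minimal illustration and not the thesis's tool: the function names, the choice of transformations (horizontal flip and resize), and the example box are assumptions, and boxes are taken to be (x_min, y_min, x_max, y_max) in pixel coordinates.

```python
def hflip_box(box, image_width):
    """Mirror a box about the vertical centre line of an image.

    The left and right edges swap roles, so the new x_min comes from
    the old x_max and vice versa.
    """
    x_min, y_min, x_max, y_max = box
    return (image_width - x_max, y_min, image_width - x_min, y_max)


def scale_box(box, sx, sy):
    """Rescale a box when the image is resized by factors sx and sy."""
    x_min, y_min, x_max, y_max = box
    return (x_min * sx, y_min * sy, x_max * sx, y_max * sy)


# Hypothetical annotation on a 200-pixel-wide image.
boxes = [(10, 20, 50, 80)]

# Flip the image horizontally, then resize it to half size; the
# annotations follow the same two steps, so no re-labelling is needed.
flipped = [hflip_box(b, 200) for b in boxes]
resized = [scale_box(b, 0.5, 0.5) for b in flipped]

print(flipped, resized)
```

    Each transformed image together with its updated boxes can then serve as an additional training sample, which is how such a tool would enlarge a dataset without further manual annotation.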