474 research outputs found

    Convolutional Neural Networks for Image Recognition in Mixed Reality Using Voice Command Labeling

    In the context of the Industrial Internet of Things (IIoT), image and object recognition has become an important factor. Camera systems provide the information needed to realize sophisticated monitoring applications, quality control solutions, or reliable prediction approaches. In recent years, the evolution of smart glasses has enabled new technical solutions, as they can be seen as mobile and ubiquitous cameras. As an important aspect in this context, the recognition of objects in images must be solved reliably to realize the aforementioned solutions. Algorithms therefore need to be trained with labeled input to recognize differences between input images. We simplify this labeling process using voice commands in Mixed Reality. The input generated by the mixed-reality labeling is fed into a convolutional neural network, which is trained to classify the images of different objects. In this work, we describe the development of this mixed-reality prototype together with its backend architecture. Furthermore, we test the classification robustness with image distortion filters. We validated our approach with format parts from a blister machine provided by a pharmaceutical packaging company in Germany. Our results indicate that the proposed architecture is suitable at least for small classification problems and is not sensitive to distortions.
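The robustness check described above can be illustrated with a minimal, self-contained sketch. This is not the paper's pipeline: the toy classes, the nearest-centroid classifier (standing in for the trained CNN), and the specific distortion filters are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def blur(img, k=3):
    """Box-blur distortion filter: average over a k x k neighborhood."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def add_noise(img, sigma=0.05):
    """Additive Gaussian noise distortion filter."""
    return np.clip(img + rng.normal(0, sigma, img.shape), 0.0, 1.0)

def make_sample(label):
    """Two toy 'part' classes: a filled square vs. two horizontal bars."""
    img = np.zeros((16, 16))
    if label == 0:
        img[4:12, 4:12] = 1.0
    else:
        img[2, :] = img[-3, :] = 1.0
    return add_noise(img, 0.02)

# Nearest-centroid classifier as a stand-in for the trained CNN.
centroids = {c: np.mean([make_sample(c) for _ in range(20)], axis=0)
             for c in (0, 1)}

def classify(img):
    return min(centroids, key=lambda c: np.linalg.norm(img - centroids[c]))

# Robustness check: accuracy on distorted test images.
correct, total = 0, 0
for c in (0, 1):
    for _ in range(10):
        for distort in (blur, add_noise):
            correct += int(classify(distort(make_sample(c))) == c)
            total += 1
print(correct / total)
```

The same loop structure applies when the stand-in classifier is swapped for a real CNN: distort each held-out image, classify, and compare accuracy against the undistorted baseline.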

    Object Detection with HoloLens 2 using Mixed Reality and Unity: a proof-of-concept


    A Mixed Reality application for Object detection with audiovisual feedback through MS HoloLenses

    As computer science develops and progresses, new technologies emerge. Recent advances in augmented reality and artificial intelligence have made these technologies pioneers of innovation and change in every field and industry. The fast-paced developments in computer vision and augmented reality have facilitated analyzing and understanding the surrounding environment. Mixed and Augmented Reality can greatly extend a user's capabilities and experiences by bringing digital data directly into the physical world where and when it is most needed. Current smart glasses such as the Microsoft HoloLens excel at positioning within the physical environment; however, object recognition is still relatively primitive. With an additional semantic understanding of the wearer's physical context, intelligent digital agents can assist workers in warehouses, factories, greenhouses, etc., or guide consumers through the completion of physical tasks.
We present a mixed reality system that, using the sensors mounted on the Microsoft HoloLens headset and a cloud service, acquires and processes data in real time to detect and track different kinds of objects, and finally superimposes geographically coherent holographic tooltips and bounding boxes on the detected objects. This goal has been achieved despite the intrinsic hardware limitations of the headset by performing part of the overall computation in an edge/cloud environment. In particular, the heavier object detection algorithms, based on Deep Neural Networks (DNNs), are executed in a RESTful cloud system hosted on an NVIDIA Jetson TX2, a fast and power-efficient embedded AI computing device. We apply YOLOv3 (You Only Look Once) as the deep learning algorithm on the server side to process the data from the user side, using a model trained on the public MS COCO dataset. This algorithm improves detection speed and provides accurate results with minimal background errors. At the same time, we compensate for cloud transmission and computation latencies by running a similarity check between the current and previous HoloLens camera frames before applying the object detection algorithms to a frame, in order to avoid running the object detection task when the user's surrounding environment is largely unchanged and to limit complex computations as much as possible. This application also aims to use modern technology to help people with visual impairment or blindness. The user can issue a voice command to initiate an environment scan. Apart from the visual feedback provided for the detected objects, the application can read out the name of each detected object along with its relative position in the user's view. A distance announcement for the detected objects is also derived using the HoloLens's spatial model.
The wearable solution offers the opportunity to efficiently locate objects, supporting orientation without extensive training of the user.
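The frame-similarity gating described above can be sketched in a few lines. The mean-absolute-difference metric and the threshold value are assumptions for illustration; the abstract does not specify which similarity measure the authors use.

```python
import numpy as np

def should_detect(prev, curr, threshold=0.05):
    """Run the (expensive) cloud detection only when the scene changed.

    Frames are grayscale arrays normalized to [0, 1]; similarity is
    measured as mean absolute pixel difference (a hypothetical choice).
    """
    if prev is None:
        return True  # first frame always triggers detection
    return float(np.mean(np.abs(curr - prev))) > threshold

rng = np.random.default_rng(1)
frame = rng.random((48, 64))

# Static scene: second frame nearly identical, so detection is skipped.
static = np.clip(frame + rng.normal(0, 0.005, frame.shape), 0.0, 1.0)

# Scene change: a large bright object enters the view.
changed = frame.copy()
changed[10:40, 10:50] = 1.0

print(should_detect(None, frame))
print(should_detect(frame, static))
print(should_detect(frame, changed))
```

Gating like this trades a cheap per-frame comparison on the headset against the much larger cost of a round trip to the Jetson TX2 server.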

    Mixing Modalities of 3D Sketching and Speech for Interactive Model Retrieval in Virtual Reality

    Sketch and speech are intuitive interaction methods that convey complementary information and have been used independently for 3D model retrieval in virtual environments. While sketching has been shown to be an effective retrieval method, not all collections are easily navigable using this modality alone. To overcome this, we implement a multimodal interface for querying 3D model databases within a virtual environment. We design a new, challenging database for sketch-based retrieval comprised of 3D chairs in which each component (arms, legs, seat, back) is independently colored. We base the sketch interface on the state of the art in 3D sketch retrieval, and use a Wizard-of-Oz-style experiment to process the voice input. In this way, we avoid the complexities of natural language processing, which frequently requires fine-tuning to be robust. We conduct two user studies and show that hybrid search strategies emerge from the combination of interactions, fostering the advantages provided by both modalities.
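One way the two modalities can combine, sketched minimally below: the spoken attributes (as the wizard encodes them) filter the collection, and the sketch-similarity score ranks what remains. The toy database, field names, and scores are all hypothetical; the paper does not publish this schema.

```python
# Hypothetical chair database: per-component colors plus a precomputed
# sketch-similarity score (higher = closer to the user's 3D sketch).
chairs = [
    {"id": 1, "legs": "red",   "back": "blue",  "sketch_score": 0.91},
    {"id": 2, "legs": "red",   "back": "green", "sketch_score": 0.62},
    {"id": 3, "legs": "black", "back": "blue",  "sketch_score": 0.95},
    {"id": 4, "legs": "red",   "back": "blue",  "sketch_score": 0.40},
]

def retrieve(db, speech_constraints, top_k=2):
    """Hybrid query: speech constraints filter, sketch similarity ranks."""
    candidates = [c for c in db
                  if all(c.get(part) == color
                         for part, color in speech_constraints.items())]
    return sorted(candidates, key=lambda c: c["sketch_score"], reverse=True)[:top_k]

# Spoken query "a chair with red legs and a blue back", encoded by the wizard.
result = retrieve(chairs, {"legs": "red", "back": "blue"})
print([c["id"] for c in result])
```

Note how chair 3, the best sketch match overall, is excluded by the spoken color constraint, which is exactly the situation where sketch alone fails to navigate the collection.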

    ESTABLISHING THE FOUNDATION TO ROBOTIZE COMPLEX WELDING PROCESSES THROUGH LEARNING FROM HUMAN WELDERS BASED ON DEEP LEARNING TECHNIQUES

    As the demand for customized, efficient, and high-quality production increases, traditional manufacturing processes are transforming into smart manufacturing with the aid of advancements in information technology, such as cyber-physical systems (CPS), the Internet of Things (IoT), big data, and artificial intelligence (AI). The key requirement for integration with these advanced information technologies is to digitize manufacturing processes so that they can be analyzed, controlled, and made to interact with other digitized components. The integration of deep learning algorithms and massive industrial data will be a critical component in realizing this process, leading to enhanced manufacturing in the Future of Work at the Human-Technology Frontier (FW-HTF). This work takes welding manufacturing as the case study to accelerate its transition to intelligent welding by robotizing a complex welding process. By integrating process sensing, data visualization, and deep learning-based modeling and optimization, a complex welding system is established, with a systematic solution to generalize domain-specific knowledge from experienced human welders. Such a system can automatically perform complex welding processes that previously could only be handled by humans. To enhance the system's tracking capabilities, we trained an image segmentation network to offer precise position information. We incorporated a recurrent neural network structure to analyze dynamic variations during welding. Addressing the challenge of human heterogeneity in data collection, we conducted experiments illustrating that even inaccurate datasets can effectively train deep learning models, provided the errors have zero mean. Fine-tuning the model with a small portion of accurate data further elevates its performance.
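The zero-mean-error observation has a simple statistical core, sketched below with a linear model in place of the deep network: when labels are corrupted by zero-mean noise, a least-squares fit still recovers the underlying relation. The two-parameter "welding relation" is invented for illustration, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical linear welding relation standing in for the deep model:
# target = 2.0 * feature_1 + 0.5 * feature_2.
X = rng.random((500, 2))
true_w = np.array([2.0, 0.5])
y_clean = X @ true_w

# "Inaccurate" human-collected labels: corrupted by zero-mean noise.
y_noisy = y_clean + rng.normal(0.0, 0.3, size=y_clean.shape)

# Least-squares fit on the noisy labels still recovers the relation,
# because the noise averages out over many samples.
w_hat, *_ = np.linalg.lstsq(X, y_noisy, rcond=None)
print(w_hat)
```

The same logic is why heterogeneous human demonstrations remain usable as training data, and why a small amount of accurate data for fine-tuning (which removes any residual bias) helps further.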

    Artificial Intelligence Technology

    This open access book aims to give readers a basic outline of today's research and technology developments in artificial intelligence (AI), help them gain a general understanding of this trend, and familiarize them with the current research hotspots, as well as some of the fundamental and widely accepted theories and methodologies in AI research and application. The book is written in comprehensible, plain language, featuring clearly explained theories and concepts and extensive analysis and examples. Some traditional findings are skipped in the narration in favor of a relatively comprehensive introduction to the evolution of artificial intelligence technology. The book provides a detailed elaboration of the basic concepts of AI and machine learning, as well as other relevant topics, including deep learning, deep learning frameworks, the Huawei MindSpore AI development framework, the Huawei Atlas computing platform, the Huawei AI open platform for smart terminals, and the Huawei CLOUD Enterprise Intelligence application platform. As the world's leading provider of ICT (information and communication technology) infrastructure and smart terminals, Huawei offers products ranging from digital data communication, cyber security, wireless technology, data storage, cloud computing, and smart computing to artificial intelligence.

    Machine Learning for Microcontroller-Class Hardware -- A Review

    The advancements in machine learning have opened a new opportunity to bring intelligence to low-end Internet-of-Things nodes such as microcontrollers. Conventional machine learning deployment has a high memory and compute footprint, hindering direct deployment on ultra-resource-constrained microcontrollers. This paper highlights the unique requirements of enabling onboard machine learning for microcontroller-class devices. Researchers use a specialized model development workflow for resource-limited applications to ensure that the compute and latency budget is within the device limits while still maintaining the desired performance. We characterize a closed-loop, widely applicable workflow of machine learning model development for microcontroller-class devices and show that several classes of applications adopt a specific instance of it. We present both qualitative and numerical insights into different stages of model development by showcasing several use cases. Finally, we identify the open research challenges and unsolved questions demanding careful consideration moving forward. (Accepted for publication in the IEEE Sensors Journal.)
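A central step in the microcontroller workflow described above is shrinking the model's memory footprint. The sketch below shows one standard technique, symmetric post-training int8 quantization of a weight tensor; it is a generic illustration of the idea, not code from the paper or from any particular deployment framework.

```python
import numpy as np

def quantize_int8(w):
    """Symmetric post-training quantization: float32 weights -> int8 + scale."""
    scale = float(np.max(np.abs(w))) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Approximate reconstruction of the original float32 weights."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(3)
w = rng.normal(0, 0.1, size=(64, 32)).astype(np.float32)

q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

# Memory shrinks 4x (float32 -> int8); per-weight error is bounded by scale/2.
print(q.nbytes, w.nbytes)
print(float(np.max(np.abs(w - w_hat))))
```

On a microcontroller, only the int8 tensor and the single scale factor are stored, and inference kernels operate on the integer values directly.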

    Dimensionality Reduction and Subspace Clustering in Mixed Reality for Condition Monitoring of High-Dimensional Production Data

    Visual analytics is becoming more and more important in the light of big data and related scenarios. Along this trend, the field of immersive analytics has been variously furthered, as it can provide sophisticated visual data analytics on the one hand while preserving user-friendliness on the other. Furthermore, recent hardware developments like smart glasses, as well as achievements in virtual-reality applications, have spurred immersive analytics solutions. Notably, such solutions can be very effective when applied to high-dimensional data sets. Taking this advantage into account, the work at hand applies immersive analytics to a high-dimensional production data set in order to improve the digital support of daily work tasks. More specifically, a mixed-reality implementation is presented that shall support manufacturers as well as data scientists in comprehensively analyzing machine data. As a particular goal, the prototype shall simplify the analysis of manufacturing data through the use of dimensionality reduction. Five aspects are mainly reported in this paper. First, it is shown how dimensionality reduction effects can be represented by clusters. Second, it is presented how the resulting information loss of the reduction is addressed. Third, the graphical interface of the developed prototype is illustrated, as it provides (1) a correlation coefficient graph, (2) a plot for the information loss, and (3) a 3D particle system. In addition, an implemented voice recognition feature of the prototype is shown, which was considered promising for selecting or deselecting the data variables users are interested in when analyzing the data. Fourth, based on a machine learning library, it is shown how the prototype reduces the computational resources required on the smart glasses. The main idea is based on a recommendation approach as well as the use of subspace clustering.
Fifth, results from a practical setting are presented, in which the prototype was shown to domain experts. The latter reported that such a tool is actually helpful for analyzing machine data on a daily basis. Moreover, it was reported that such a system can be used to educate machine operators more properly. As a general outcome of this work, the presented approach may constitute a helpful solution for industry as well as other domains like medicine.
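The pairing of dimensionality reduction with an explicit information-loss figure, as in the prototype's loss plot, can be sketched with PCA via SVD. The synthetic "machine data" (10 sensor channels driven by 2 latent factors) and the choice of PCA are assumptions for illustration; the paper does not fix a specific reduction method here.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical high-dimensional machine data: 200 samples, 10 sensor
# channels, driven mostly by 2 latent factors plus small noise.
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 10))

# PCA via SVD of the centered data matrix.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
var_ratio = s**2 / np.sum(s**2)

# Reduce to k dimensions and report the discarded variance as "information loss".
k = 2
X_reduced = Xc @ Vt[:k].T
info_loss = 1.0 - float(np.sum(var_ratio[:k]))

print(X_reduced.shape)
print(info_loss)
```

The reduced coordinates are what a 3D particle system would render (with k = 3 in practice), while the `info_loss` value is exactly the quantity such a prototype can chart so users know how much structure the projection hides.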