2,112 research outputs found
Video Summarization Using Unsupervised Deep Learning
In this thesis, we address the task of video summarization using unsupervised deep-learning architectures. Video summarization aims to generate a short summary by selecting the most informative and important frames (key-frames) or fragments (key-fragments) of the full-length video and presenting them in a temporally ordered fashion. Our objective is to overcome observed weaknesses of existing video summarization approaches that utilize RNNs for modeling the temporal dependence of frames, related to: i) the small influence of the estimated frame-level importance scores on the created video summary, ii) the inability of RNNs to model long-range dependencies between frames, and iii) the small number of parallelizable operations during the training of RNNs. To address the first weakness, we propose a new unsupervised network architecture, called AC-SUM-GAN, which formulates the selection of important video fragments as a sequence generation task and learns this task by embedding an Actor-Critic model in a Generative Adversarial Network. The feedback of a trainable Discriminator is used as a reward by the Actor-Critic model in order to explore a space of actions and learn a value function (Critic) and a policy (Actor) for video fragment selection. To tackle the remaining weaknesses, we investigate the use of attention mechanisms for video summarization and propose a new supervised network architecture, called PGL-SUM, which combines global and local multi-head attention mechanisms that take into account the temporal position of the video frames, in order to discover models of the frames' dependencies at different levels of granularity. Building on the acquired experience, we then propose a new unsupervised network architecture, called CA-SUM, which estimates the frames' importance using a novel concentrated attention mechanism that focuses on non-overlapping blocks on the main diagonal of the attention matrix and takes into account the attentive uniqueness and diversity of the associated frames of the video. All the proposed architectures have been extensively evaluated on the most commonly used benchmark datasets, demonstrating their competitiveness against other approaches and documenting the contribution of our proposals to advancing the state of the art in video summarization. Finally, we make a first attempt at producing explanations for video summarization results. Inspired by relevant work in the Natural Language Processing domain, we propose an attention-based method for explainable video summarization and evaluate the performance of various explanation signals using our CA-SUM architecture and two benchmark datasets for video summarization. The experimental results indicate the superior performance of explanation signals formed using the inherent attention weights, and demonstrate the ability of the proposed method to explain the video summarization results using clues about the focus of the attention mechanism.
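To make the concentrated attention idea concrete, the following is a minimal PyTorch sketch of self-attention restricted to non-overlapping blocks on the main diagonal of the attention matrix. It is an illustrative reconstruction under stated assumptions, not the CA-SUM implementation: the block_size value and the plain dot-product scoring are assumptions, and the model's uniqueness and diversity terms are omitted.

import torch
import torch.nn.functional as F

def concentrated_attention(features, block_size=60):
    # features: (n_frames, dim) frame embeddings; block_size is an assumed hyperparameter.
    n, d = features.shape
    logits = features @ features.t() / (d ** 0.5)   # full pairwise attention logits
    mask = torch.full((n, n), float('-inf'))
    for start in range(0, n, block_size):
        end = min(start + block_size, n)
        mask[start:end, start:end] = 0.0            # keep only the diagonal blocks
    weights = F.softmax(logits + mask, dim=-1)      # attention confined to each block
    return weights @ features                       # contextualized frame features

Confining attention to diagonal blocks keeps each frame's context local to its own video fragment, which is the structural property the concentrated attention mechanism exploits.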
Sensing Collectives: Aesthetic and Political Practices Intertwined
Are aesthetics and politics really two different things? The book takes a new look at how they intertwine, by turning from theory to practice. Case studies trace how sensory experiences are created and how collective interests are shaped. They investigate how aesthetics and politics are entangled, both in building and in disrupting collective orders, in governance and innovation. The cases range from populist rallies and artistic activism, through alternative lifestyles and consumer culture, to corporate PR and governmental policies. The authors are academics and artists. The result is a new mapping of the intermingling and co-constitution of aesthetics and politics in engagements with collective orders.
Modern meat: the next generation of meat from cells
Modern Meat is the first textbook on cultivated meat, with contributions from over 100 experts within the cultivated meat community.
The sections of Modern Meat cover five broad categories of cultivated meat: Context, Impact, Science, Society, and World.
The 19 chapters of Modern Meat, spread across these five sections, provide detailed entries on cultivated meat. They tour a wide range of topics, including the impact of cultivated meat on humans and animals, the bioprocess of cultivated meat production, how cultivated meat may become a food option in space and on Mars, and how cultivated meat may impact the economy, culture, and tradition of Asia.
Sports competition tactical analysis model of cross-modal transfer learning intelligent robot based on Swin Transformer and CLIP
Introduction: This paper presents an innovative Intelligent Robot Sports Competition Tactical Analysis Model that leverages multimodal perception to tackle the pressing challenge of analyzing opponent tactics in sports competitions. The current landscape of sports competition analysis necessitates a comprehensive understanding of opponent strategies. However, traditional methods are often constrained to a single data source or modality, limiting their ability to capture the intricate details of opponent tactics.
Methods: Our system integrates the Swin Transformer and CLIP models, harnessing cross-modal transfer learning to enable a holistic observation and analysis of opponent tactics. The Swin Transformer is employed to acquire knowledge about opponent action postures and behavioral patterns in basketball or football games, while the CLIP model enhances the system's comprehension of opponent tactical information by establishing semantic associations between images and text. To address potential imbalances and biases between these models, we introduce a cross-modal transfer learning technique that mitigates modal bias issues, thereby enhancing the model's generalization performance on multimodal data.
Results: Through cross-modal transfer learning, tactical information learned from images by the Swin Transformer is effectively transferred to the CLIP model, providing coaches and athletes with comprehensive tactical insights. Our method is rigorously tested and validated using the Sport UV, Sports-1M, HMDB51, and NPU RGB+D datasets. Experimental results demonstrate the system's strong performance in terms of prediction accuracy, stability, training time, inference time, number of parameters, and computational complexity. Notably, the system outperforms other models, with an 8.47% lower prediction error (MAE) on the Kinetics dataset, accompanied by a 72.86-second reduction in training time.
Discussion: The presented system proves to be highly suitable for real-time sports competition assistance and analysis, offering a novel and effective approach for an Intelligent Robot Sports Competition Tactical Analysis Model that maximizes the potential of multimodal perception technology. By harnessing the synergies between the Swin Transformer and CLIP models, we address the limitations of traditional methods and significantly advance the field of sports competition analysis. This innovative model opens up new avenues for comprehensive tactical analysis in sports, benefiting coaches, athletes, and sports enthusiasts alike.
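As an illustration of how visual features can be transferred into a language-aligned embedding space, the sketch below pairs a projection head with a symmetric contrastive loss. This is a generic sketch of cross-modal alignment, not the paper's system: the dimensions (768 for a Swin-like backbone, 512 for a CLIP-like text embedding), the class names, and the temperature are placeholder assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class VisualToTextHead(nn.Module):
    # Hypothetical projection from a Swin-like visual space (768-d) into a
    # CLIP-like text embedding space (512-d); all dimensions are illustrative.
    def __init__(self, vis_dim=768, txt_dim=512):
        super().__init__()
        self.proj = nn.Linear(vis_dim, txt_dim)

    def forward(self, vis_feats):
        return F.normalize(self.proj(vis_feats), dim=-1)

def alignment_loss(vis_emb, txt_emb, temperature=0.07):
    # Symmetric InfoNCE-style loss: matched image/text pairs lie on the diagonal.
    logits = vis_emb @ txt_emb.t() / temperature
    targets = torch.arange(logits.size(0))
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

Minimizing this loss pulls each visual embedding toward its paired text embedding while pushing it away from the other texts in the batch, which is one standard way to align two modalities.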
Caribbean cultural heritage and the nation:Aruba, Bonaire and Curaçao in a regional context
Centuries of intense migrations have deeply impacted expressions of cultural heritage on the ABC islands: Aruba, Bonaire, and Curaçao. This volume queries how cultural heritage on these Dutch Caribbean islands relates to the work of nation-building and nation-branding. How does the imagining of a shared political "we" relate to images deliberately produced to market these islands to a world of capital? The contributing authors address this leading question in essays that describe and analyze the cultural expressions of the ABC islands. In doing so, they compare and contrast nation-building and branding on the ABC islands with those taking place in the wider Caribbean. The expressions of cultural heritage discussed range from the importance of sports, music, literature, and the visual arts to those related to the political economy of tourism, the work of museums, the activism surrounding the question of reparations, and the politics and policies affecting the Caribbean diasporas in the North Atlantic. This volume adds to the understanding of the dynamics of nation, culture, and economy in the Caribbean.
Teaching Unknown Objects by Leveraging Human Gaze and Augmented Reality in Human-Robot Interaction
Robots are becoming increasingly popular in a wide range of environments due to their exceptional work capacity, precision, efficiency, and scalability. This development has been further encouraged by advances in Artificial Intelligence (AI), particularly Machine Learning (ML). By employing sophisticated neural networks, robots are given the ability to detect and interact with objects in their vicinity. However, a significant drawback arises from the dependency of these object detection models on extensive datasets and the availability of substantial amounts of training data.
This issue becomes particularly problematic when the specific deployment location of the robot and its surroundings, including the objects within them, are not known in advance. The vast and ever-expanding array of objects makes it virtually impossible to comprehensively cover the entire spectrum of existing objects using preexisting datasets alone. The goal of this dissertation was to teach a robot unknown objects through Human-Robot Interaction (HRI), in order to liberate it from its data dependency and from the constraints of predefined scenarios. The combination of eye tracking and Augmented Reality (AR) created a powerful synergy that enabled the human teacher to communicate with the robot and point out objects by means of gaze. This holistic approach led to the development of a multimodal HRI system that enabled the robot to identify and visually segment the Objects of Interest (OOIs) in three-dimensional space, even though they were initially unknown to it, and then examine them autonomously from different angles. Using the class information provided by the human, the robot was able to learn the objects and redetect them at a later stage. Owing to the knowledge gained from this HRI-based teaching process, the robot's object detection capabilities were comparable to those of state-of-the-art object detectors trained on extensive datasets, without being restricted to predefined classes, showcasing its versatility and adaptability. The research conducted within the scope of this dissertation made significant contributions at the intersection of ML, AR, eye tracking, and robotics. These findings not only enhance the understanding of these fields, but also pave the way for further interdisciplinary research. The scientific articles included in this dissertation have been published at high-impact conferences in the fields of robotics, eye tracking, and HRI.
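The teaching workflow described above can be summarized as a simple loop. The sketch below uses entirely hypothetical interfaces (segment_at, inspect, fine_tune are placeholder names, not the dissertation's API) to illustrate the sequence of steps: gaze selects an object, the robot segments and inspects it, and the labeled views are used to fine-tune a detector.

from dataclasses import dataclass, field

@dataclass
class GazeTeachingSession:
    # Hypothetical sketch of the gaze-driven teaching loop; the robot and
    # detector objects are assumed to expose the placeholder methods used below.
    samples: list = field(default_factory=list)

    def teach(self, robot, gaze_point, class_label):
        region = robot.segment_at(gaze_point)        # 3D-segment the gazed-at object
        views = robot.inspect(region, n_views=8)     # autonomous multi-view capture
        self.samples += [(view, class_label) for view in views]

    def finalize(self, detector):
        detector.fine_tune(self.samples)             # learn the newly taught classes
        return detector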