Computational creativity: an interdisciplinary approach to sequential learning and creative generations
Creativity seems mysterious; when we experience a creative spark, it is difficult to explain how we got the idea, and we often invoke notions like "inspiration" and "intuition" to explain the phenomenon. The fact that we are clueless about how a creative idea manifests itself does not necessarily imply that a scientific explanation cannot exist. We are unaware of how we perform certain tasks, such as riding a bike or understanding language, yet we have a growing set of computational techniques that can replicate, and hopefully explain, such activities.
We should understand that every creative act is the fruit of experience, society, and culture. Nothing comes from nothing. Novel ideas are never utterly new; they stem from representations already in the mind. Creativity involves establishing new relations between pieces of information we already have: the greater the knowledge, the greater the possibility of finding uncommon connections, and the greater the potential to be creative.
In this vein, a beneficial approach to a better understanding of creativity must include computational or mechanistic accounts of such inner procedures and the formation of the knowledge that enables such connections. That is the aim of Computational Creativity: to develop computational systems for emulating and studying creativity.
Hence, this dissertation focuses on two related research areas: discussing computational mechanisms to generate creative artifacts, and describing some implicit cognitive processes that can form the basis for creative thoughts.
Machine learning based small bowel video capsule endoscopy analysis: Challenges and opportunities
Video capsule endoscopy (VCE) is a revolutionary technology for the early diagnosis of gastric disorders. However, owing to the high redundancy and subtle manifestation of anomalies among thousands of frames, the manual interpretation of VCE videos requires considerable patience, focus, and time. The automatic analysis of these videos using computational methods is a challenge because the capsule's motion is uncontrolled and frames are often poorly captured. Several machine learning (ML) methods, including recent deep convolutional neural network approaches, have been adopted after evaluating their potential to improve VCE analysis. However, the clinical impact of these methods is yet to be investigated. This survey aimed to highlight the gaps between existing ML-based research methodologies and clinically significant rules recently established by gastroenterologists for VCE. A framework was formulated for interpreting raw frames into contextually relevant frame-level findings and subsequently merging these findings with metadata to obtain a disease-level diagnosis. Frame-level findings can be more intelligible for discriminative learning when organized in a taxonomical hierarchy. The proposed taxonomical hierarchy, formulated on the basis of pathological and visual similarities, may yield better classification metrics by setting inference classes at a higher level than training classes. Mapping from the frame level to the disease level was structured as a graph based on clinical relevance, inspired by the recent international consensus developed by domain experts. Furthermore, existing methods for VCE summarization, classification, segmentation, detection, and localization were critically evaluated and compared based on aspects deemed significant by clinicians. Numerous studies address single-anomaly detection rather than the pragmatic, multi-finding approach required in a clinical setting. The challenges and opportunities associated with VCE analysis were delineated.
A focus on maximizing the discriminative power of features corresponding to various subtle lesions and anomalies may help cope with the diverse and mimicking nature of different VCE frames. Large multicenter datasets must be created to cope with data sparsity, bias, and class imbalance. Explainability, reliability, traceability, and transparency are important for an ML-based diagnostic system in VCE. Existing ethical and legal bindings narrow the scope of possibilities where ML can potentially be leveraged in healthcare. Despite these limitations, ML-based video capsule endoscopy will revolutionize clinical practice, aiding clinicians in rapid and accurate diagnosis.
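The survey's idea of lifting fine-grained training classes to coarser inference classes, and then mapping frame-level findings to a disease-level diagnosis, can be sketched as follows. The class names, taxonomy, and finding-to-disease table below are illustrative placeholders, not the survey's actual ontology:

```python
# Hypothetical sketch: a taxonomy lifts fine-grained (training) classes to
# coarser (inference) classes, and a clinically inspired mapping aggregates
# frame-level findings into disease-level candidates. All names are toy examples.

# Child (training) class -> parent (inference) class in the taxonomy.
TAXONOMY = {
    "aphthous_ulcer": "ulcer",
    "deep_ulcer": "ulcer",
    "angiodysplasia": "vascular_lesion",
    "red_spot": "vascular_lesion",
}

# Frame-level finding -> candidate disease-level diagnoses (toy mapping).
FINDING_TO_DISEASE = {
    "ulcer": ["crohns_disease", "nsaid_enteropathy"],
    "vascular_lesion": ["small_bowel_bleeding"],
}

def infer_class(training_class: str) -> str:
    """Lift a fine-grained training class to its coarser inference class."""
    return TAXONOMY.get(training_class, training_class)

def diagnose(frame_predictions: list[str]) -> set[str]:
    """Aggregate per-frame predictions into disease-level candidates."""
    findings = {infer_class(p) for p in frame_predictions}
    diseases = set()
    for finding in findings:
        diseases.update(FINDING_TO_DISEASE.get(finding, []))
    return diseases

print(diagnose(["aphthous_ulcer", "red_spot", "aphthous_ulcer"]))
```

Evaluating at the parent level means a model that confuses two ulcer subtypes is still correct at inference time, which is one way a taxonomy can improve classification metrics.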
End-to-end Autonomous Driving: Challenges and Frontiers
The autonomous driving community has witnessed a rapid growth in approaches
that embrace an end-to-end algorithm framework, utilizing raw sensor input to
generate vehicle motion plans, instead of concentrating on individual tasks
such as detection and motion prediction. End-to-end systems, in comparison to
modular pipelines, benefit from joint feature optimization for perception and
planning. This field has flourished due to the availability of large-scale
datasets, closed-loop evaluation, and the increasing need for autonomous
driving algorithms to perform effectively in challenging scenarios. In this
survey, we provide a comprehensive analysis of more than 250 papers, covering
the motivation, roadmap, methodology, challenges, and future trends in
end-to-end autonomous driving. We delve into several critical challenges,
including multi-modality, interpretability, causal confusion, robustness, and
world models, amongst others. Additionally, we discuss current advancements in
foundation models and visual pre-training, as well as how to incorporate these
techniques within the end-to-end driving framework. To facilitate future
research, we maintain an active repository that contains up-to-date links to
relevant literature and open-source projects at
https://github.com/OpenDriveLab/End-to-end-Autonomous-Driving
Real-time motion capture and game engine technologies in contemporary dance
This Master of Arts thesis was completed in the New Media study programme of the School of Arts, Design and Architecture at Aalto University, under the supervision of Matti Niinimäki and with advising from Nuno Antonio Do Nascimento Correia and Teemu Määttänen.
This study focuses on real-time motion capture and game engine technologies in contemporary dance, with the goal of discovering how these technologies can augment contemporary dance in both the visual and audio domains, in a way in which sound, visuals, and choreography influence one another.
The methods used to achieve this goal include devising a mixed reality audiovisual dance performance as part of a practice-based research methodology, a review of related work, and an interview with a field expert.
Although the topic of motion capture in contemporary dance is fairly well researched, there is a clear shortage of studies on the ways game engines could be utilized in this segment of art, and even fewer studies examine modern hybrid club music and its influence on contemporary dance. The current research fills these gaps.
This study includes a brief overview of the Dance and Technology art movement, elucidates motion capture and game engine technologies, and attempts to define modern hybrid club music. It covers a broad selection of case studies from the contemporary dance segment related to each category, as well as the writer's own perspective and experience with motion capture and modern hybrid club music. Furthermore, this research includes an interview with the pioneering virtual performer Sam Rolfes, who actively uses real-time motion capture, game engines, and other real-time tools in his artistic practice. Finally, it explains in detail the whole design process behind the mixed reality audiovisual dance performance piece "ROCK/STAR Vol.1", the artistic component of this research.
Using various game engine technologies together with real-time motion-captured data can help establish a greater connection between the different artistic domains of a performance, as well as provide a much stronger feeling of a world and a story for the performer wearing the suit. The ability to execute things in real time that this technology offers makes it possible for performers to respond to one another, to the audience, and to the current moment in time, thus embracing and crystallizing the originality and specificity of the moment.
Media files notes:
Fragment of "ROCK/STAR Vol.1", the artistic component of this research
Description:
Video recording of the premiere of the mixed reality audiovisual dance performance "ROCK/STAR Vol.1", which took place on 20 May 2023 in the Odeion Screening Auditorium in Otaniemi. (Fragment)
Media rights: CC BY-NC-ND 4.0
Data simulation in deep learning-based human recognition
Human recognition is an important part of perception systems, such as those used in autonomous vehicles or robots. These systems often use deep neural networks for this purpose, which rely on large amounts of data that ideally cover various situations, movements, visual appearances, and interactions. However, obtaining such data is typically complex and expensive. In addition to raw data, labels are required to create training data for supervised learning. Thus, manual annotation of bounding boxes, keypoints, orientations, or actions performed is frequently necessary. This work addresses whether the laborious acquisition and creation of data can be simplified through targeted simulation. If data are generated in a simulation, information such as positions, dimensions, orientations, surfaces, and occlusions is already known, and appropriate labels can be generated automatically. A key question is whether deep neural networks trained with simulated data can be applied to real data. This work explores the use of simulated training data using examples from the field of pedestrian detection for autonomous vehicles. On the one hand, it is shown how existing systems can be improved by targeted retraining with simulation data, for example to better recognize corner cases. On the other hand, the work focuses on the generation of data that occur rarely or not at all in standard real-world datasets. It is demonstrated how training data containing finely graded action labels can be generated by the targeted acquisition and combination of motion data and 3D models, allowing even complex pedestrian situations to be recognized. Through the diverse annotation data that simulations provide, it becomes possible to train deep neural networks for a wide variety of tasks with a single dataset.
In this work, such simulated data are used to train a novel deep multitask network that brings together diverse, related tasks that were previously mostly considered independently, such as 2D and 3D human pose recognition and body and orientation estimation.
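The claim that simulation yields labels "for free" can be illustrated with a minimal sketch: because the 3D pose of every object in the simulated scene is known, a 2D bounding-box label follows directly from projecting the object's 3D corners through the camera model. The pinhole camera, its intrinsics, and the pedestrian-sized box below are illustrative assumptions, not the thesis pipeline:

```python
# Minimal sketch of automatic label generation from simulated geometry.
# The camera intrinsics and object layout are illustrative assumptions.

def project(point, fx=1000.0, fy=1000.0, cx=640.0, cy=360.0):
    """Pinhole projection of a 3D camera-space point (x, y, z) to pixel (u, v)."""
    x, y, z = point
    return (fx * x / z + cx, fy * y / z + cy)

def bbox_label(corners_3d):
    """A 2D bounding-box label derived by projecting known 3D corners:
    in simulation this needs no manual annotation."""
    pts = [project(c) for c in corners_3d]
    us = [u for u, _ in pts]
    vs = [v for _, v in pts]
    return (min(us), min(vs), max(us), max(vs))

# A 1 m-wide, 2 m-tall, pedestrian-sized box, 10 m in front of the camera.
corners = [(x, y, z) for x in (-0.5, 0.5) for y in (-1.0, 1.0) for z in (10.0, 10.5)]
print(bbox_label(corners))  # -> (590.0, 260.0, 690.0, 460.0)
```

The same known geometry also yields occlusion masks, orientations, and keypoints automatically, which is what makes a single simulated dataset usable for many tasks.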
Designing a Simulation showcasing the Pharmacological Effects of Beta-2-Agonists in Asthma Treatment; Virtual Reality as a supplement to traditional teaching methods
As educational technology evolves, there is a growing interest in applying VR to teaching complex scientific concepts that benefit from a visual and immersive learning environment. Motivated by the promising results of VR in medical education across multiple disciplines, we aimed to investigate the applicability and effectiveness of this technology in pharmacology education. This discipline, which involves understanding how drugs work within the human body, is often considered complex and challenging for students. However, it is a critical component of medical education and is essential in treating and preventing various diseases. The study was driven by two research inquiries. The primary inquiry aimed to explore the potential design possibilities of a virtual reality (VR) simulation for visualizing the pharmacological effects of beta-2-agonists in asthma treatment. The secondary question focused on evaluating the perspectives of students and educators regarding the efficacy of the VR application in learning pharmacology concepts compared to conventional teaching approaches. The application underwent two rounds of evaluation sessions with both students and teachers. Participants responded positively to the immersive learning experience, particularly appreciating the detailed visualizations and interactivity offered by the VR application. Their feedback highlighted the potential of VR to create a more intuitive understanding of complex pharmacological processes. Although the evaluation phase featured a limited number of participants, the feedback received suggested promising potential for VR as an additional tool. The study therefore serves as a proof of concept, showcasing the possibilities of VR in enhancing pharmacology education and paving the way for future research and development in this area.
Master's thesis in Software Development in collaboration with HVL (PROG399, MAMN-PRO)
Interdisciplinarity in the Age of the Triple Helix: a Film Practitioner's Perspective
This integrative chapter contextualises my research, including articles I have published as well as one of the creative artefacts developed from it, the feature film The Knife That Killed Me. I review my work in light of the ways in which technology, industry methods and academic practice have evolved, as well as how attitudes to interdisciplinarity have changed, linking these to Etzkowitz and Leydesdorff's 'Triple Helix' model (1995). I explore my own experiences and observations of opportunities and challenges posed by the intersection of different stakeholder needs and expectations, from both industry and academic perspectives, and argue that my work provides novel examples of the applicability of the 'Triple Helix' to the creative industries. The chapter concludes with a reflection on the evolution and direction of my work, the relevance of the 'Triple Helix' to creative practice, and ways in which this relationship could be investigated further.
From visuomotor control to latent space planning for robot manipulation
Deep visuomotor control is emerging as an active research area for robot manipulation. Recent advances in learning sensory and motor systems in an end-to-end manner have achieved remarkable performance across a range of complex tasks. Nevertheless, a few limitations restrict visuomotor control from being more widely adopted as the de facto choice when facing a manipulation task on a real robotic platform. First, imitation learning-based visuomotor control approaches tend to suffer from the inability to recover from an out-of-distribution state caused by compounding errors. Second, the lack of versatility in task definition limits skill generalisability. Finally, the training data acquisition process and domain transfer are often impractical. In this thesis, individual solutions are proposed to address each of these issues.
In the first part, we find policy uncertainty to be an effective indicator of potential failure cases in which the robot is stuck in out-of-distribution states. On this basis, we introduce a novel uncertainty-based approach to detect potential failure cases and a recovery strategy based on action-conditioned uncertainty predictions. Then, we propose to employ visual dynamics approximation in our model architecture to capture the motion of the robot arm rather than the static scene background, making it possible to learn versatile skill primitives. In the second part, taking inspiration from recent progress in latent space planning, we propose a gradient-based optimisation method operating within the latent space of a deep generative model for motion planning. Our approach bypasses the traditional computational challenges encountered by established planning algorithms, can specify novel constraints easily, and can handle multiple constraints simultaneously. Moreover, the training data come from simple random motor babbling of kinematically feasible robot states. Our real-world experiments further illustrate that our latent space planning approach can handle both open- and closed-loop planning in challenging environments such as heavily cluttered or dynamic scenes. This leads to the first closed-loop motion planning algorithm, to our knowledge, that can incorporate novel custom constraints, and lays the foundation for more complex manipulation tasks.
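The core of gradient-based latent-space planning can be illustrated with a toy sketch: a hand-rolled linear "decoder" with analytic gradients stands in for the deep generative model, and gradient descent on the latent code minimises a goal-reaching cost. Everything here, including the decoder weights and step sizes, is an illustrative assumption, not the thesis implementation:

```python
# Toy sketch of gradient-based planning in a latent space. A 2-D linear
# "decoder" stands in for the deep generative model; in practice the gradient
# would be obtained by backpropagation through the trained decoder.

def decode(z):
    """Map a 2-D latent code to a 2-D 'end-effector position' (toy decoder)."""
    return (1.5 * z[0] + 0.5 * z[1], 0.5 * z[0] + 1.0 * z[1])

def plan(goal, steps=500, lr=0.05):
    """Minimise ||decode(z) - goal||^2 by gradient descent on the latent z.
    Additional constraints could be added as extra penalty terms here."""
    z = [0.0, 0.0]
    for _ in range(steps):
        x = decode(z)
        e = (x[0] - goal[0], x[1] - goal[1])
        # Chain rule through the linear decoder: grad_z = 2 * J^T e.
        g0 = 2.0 * (1.5 * e[0] + 0.5 * e[1])
        g1 = 2.0 * (0.5 * e[0] + 1.0 * e[1])
        z = [z[0] - lr * g0, z[1] - lr * g1]
    return z

z_star = plan((1.0, 1.0))
print(decode(z_star))  # approaches the goal (1.0, 1.0)
```

Because the objective lives in latent space, constraints are just extra differentiable penalties on the same optimisation, which is what makes adding novel custom constraints comparatively easy in this family of approaches.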