Multimodal music information processing and retrieval: survey and future challenges
To improve performance in various music information processing
tasks, recent studies exploit different modalities that capture diverse
aspects of music. Such modalities include audio recordings, symbolic music
scores, mid-level representations, motion and gestural data, video recordings,
editorial or cultural tags, lyrics, and album cover art. This paper critically
reviews the various approaches adopted in Music Information Processing and
Retrieval and highlights how multimodal algorithms can help Music Computing
applications. First, we categorize the related literature based on the
application it addresses. Subsequently, we analyze existing information fusion
approaches, and we conclude with the set of challenges that the Music Information
Retrieval and Sound and Music Computing research communities should focus on in
the coming years.
Virtual to Real Reinforcement Learning for Autonomous Driving
Reinforcement learning is considered a promising direction for driving
policy learning. However, training an autonomous driving vehicle with
reinforcement learning in a real environment involves unaffordable
trial and error. It is more desirable to first train in a virtual environment
and then transfer to the real environment. In this paper, we propose a novel
realistic translation network that makes a model trained in a virtual
environment workable in the real world. The proposed network can convert
non-realistic virtual image input into a realistic one with similar scene
structure. Given realistic frames as input, a driving policy trained by
reinforcement learning can adapt well to real-world driving. Experiments show
that our proposed virtual to real (VR) reinforcement learning (RL) works well.
To our knowledge, this is the first successful case of a driving policy trained
by reinforcement learning that can adapt to real-world driving data.
Automatic Affine and Elastic Registration Strategies for Multi-dimensional Medical Images
Medical images have been used increasingly for diagnosis, treatment planning, monitoring disease processes, and other medical applications. A large variety of medical imaging modalities exists, including CT, X-ray, MRI, and ultrasound. Frequently a group of images needs to be compared to one another and/or combined for research or cumulative purposes. In many medical studies, multiple images are acquired from subjects at different times or with different imaging modalities. Misalignment inevitably occurs, causing anatomical and/or functional feature shifts within the images. Computerized image registration (alignment) approaches can offer automatic and accurate image alignments without extensive user involvement and provide tools for visualizing combined images. This dissertation focuses on providing automatic image registration strategies. After a thorough review of existing image registration techniques, we identified two registration strategies that enhance the current field: (1) an automated rigid body and affine registration using voxel similarity measurements based on a sequential hybrid genetic algorithm, and (2) an automated deformable registration approach based upon a linear elastic finite element formulation. Both methods streamline the registration process. They are completely automatic and require no user intervention. The proposed registration strategies were evaluated with numerous 2D and 3D MR images with a variety of tissue structures, orientations, and dimensions. Multiple registration pathways were provided with guidelines for their applications. The sequential genetic algorithm mimics the pathway of an expert manually performing registration. Experiments demonstrated that the sequential genetic algorithm registration provides high alignment accuracy and is reliable for brain tissues.
It avoids the local minima/maxima traps of conventional optimization techniques and does not require any preprocessing such as thresholding, smoothing, segmentation, or definition of base points or edges. The elastic model was shown to be highly effective at accurately aligning areas of interest that are automatically extracted from the images, such as brains. Using a finite element method to obtain the displacement of each element node by applying a boundary mapping, this method provides an accurate image registration with excellent boundary alignment of each pair of slices and consequently aligns the entire volume automatically. This dissertation presents numerous volume alignments. Surface geometries were created directly from the aligned segmented images using the Multiple Material Marching Cubes algorithm. Using the proposed registration strategies, multiple subjects were aligned to a standard MRI reference, which is aligned to a segmented reference atlas. Consequently, multiple subjects are aligned to the segmented atlas and a full fMRI analysis is possible.
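The abstract above does not say which voxel similarity measurement the genetic algorithm optimizes. As a rough illustration only, the sketch below assumes mutual information, one commonly used voxel similarity measure; the function name `mutual_information`, the bin count, and the intensity range are hypothetical choices, not taken from the dissertation. An optimizer would evaluate such a score for each candidate transform and keep the candidates that score highest.

```python
import math
from collections import Counter


def mutual_information(a, b, bins=8, lo=0.0, hi=1.0):
    """Mutual information (in nats) between two equal-length intensity lists.

    Assumption: intensities lie in [lo, hi]; values outside are clamped
    to the first or last bin.
    """
    def bin_of(v):
        i = int((v - lo) / (hi - lo) * bins)
        return min(max(i, 0), bins - 1)

    # Pair up the binned intensities of corresponding voxels.
    pairs = [(bin_of(x), bin_of(y)) for x, y in zip(a, b)]
    n = len(pairs)
    joint = Counter(pairs)               # joint histogram counts
    pa = Counter(i for i, _ in pairs)    # marginal counts for image a
    pb = Counter(j for _, j in pairs)    # marginal counts for image b

    # sum over occupied cells of p(i,j) * log(p(i,j) / (p(i) * p(j)))
    return sum(
        c / n * math.log(c * n / (pa[i] * pb[j]))
        for (i, j), c in joint.items()
    )
```

Under this assumption, a well-aligned pair of images yields a higher score than a misaligned pair, which is what makes the measure usable as a fitness function.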
Reference face graph for face recognition
Face recognition has been studied extensively; however, real-world face recognition remains a challenging task. The demand for unconstrained practical face recognition is rising with the explosion of online multimedia, such as social networks and video surveillance footage, where face analysis is of significant importance. In this paper, we approach face recognition in the context of graph theory. We recognize an unknown face using an external reference face graph (RFG). An RFG is generated, and recognition of a given face is achieved by comparing it to the faces in the constructed RFG. Centrality measures are utilized to identify distinctive faces in the reference face graph. The proposed RFG-based face recognition algorithm is robust to changes in pose and is also alignment-free. The RFG recognition is used in conjunction with DCT locality-sensitive hashing for efficient retrieval to ensure scalability. Experiments are conducted on several publicly available databases, and the results show that the proposed approach outperforms state-of-the-art methods without any preprocessing requirements such as face alignment. Due to the richness of the reference set construction, the proposed method can also handle illumination and expression variation.
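The abstract above mentions centrality measures without saying which ones are used or how the graph is built. As a minimal sketch, assume the reference faces are nodes, an edge joins two faces whose similarity reaches a threshold, and degree centrality is the measure; the function name `degree_centrality`, the threshold, and the example similarity matrix are all hypothetical and not taken from the paper.

```python
def degree_centrality(similarity, threshold=0.5):
    """Degree centrality of each node in a thresholded similarity graph.

    similarity: square matrix (list of lists) of pairwise face similarities.
    An edge i-j is kept when similarity[i][j] >= threshold (an assumption).
    Returns, for each node, the fraction of other nodes it is connected to.
    """
    n = len(similarity)
    return [
        sum(1 for j in range(n) if j != i and similarity[i][j] >= threshold)
        / (n - 1)
        for i in range(n)
    ]


# Hypothetical similarities among four reference faces.
sim = [
    [1.0, 0.9, 0.8, 0.1],
    [0.9, 1.0, 0.7, 0.2],
    [0.8, 0.7, 1.0, 0.6],
    [0.1, 0.2, 0.6, 1.0],
]
scores = degree_centrality(sim)
```

Here the third face is connected to every other face and the fourth to only one; how such scores translate into "distinctive" faces is defined by the paper, not by this sketch.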