Elicitation of expert knowledge to inform object-based audio rendering to different systems
Object-based audio presents the opportunity to optimise audio reproduction for different listening scenarios. Vector base amplitude panning (VBAP) is typically used to render object-based scenes. Optimising this process based on knowledge of the perception and practices of experts could significantly improve the end user's listening experience. An experiment was conducted to investigate how content creators perceive changes in the perceptual attributes of the same content rendered to systems with different numbers of channels, and to determine what they would do differently from standard VBAP and matrix-based downmixes to minimise these changes. Text mining and clustering of the content creators' responses revealed six general mix processes: the spatial spread of individual objects, EQ and processing, reverberation, position, bass, and level. Logistic regression models show the relationships between the mix processes, the perceived changes in perceptual attributes, and the rendering method/speaker layout. The relative frequency of use of the different mix processes was found to differ between categories of audio object, suggesting that any downmix rules should be object-category specific. These results give insight into how object-based audio can be used to improve listener experience and provide a first template for doing so across different reproduction systems
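The baseline renderer named in the abstract, VBAP, computes per-object loudspeaker gains by expressing the source direction in the basis of the surrounding loudspeaker directions. A minimal 2D sketch of pairwise amplitude panning follows (the full method operates in 3D with loudspeaker triplets); the function name is illustrative and not from the paper:

```python
import math

def vbap_pair_gains(source_deg, spk1_deg, spk2_deg):
    """2D VBAP for one loudspeaker pair: solve g1*l1 + g2*l2 = p for the
    gains g, where l1, l2 are loudspeaker unit vectors and p is the source
    direction, then normalize so g1**2 + g2**2 == 1 (constant power)."""
    to_vec = lambda deg: (math.cos(math.radians(deg)), math.sin(math.radians(deg)))
    p = to_vec(source_deg)
    l1, l2 = to_vec(spk1_deg), to_vec(spk2_deg)
    # 2x2 solve by Cramer's rule; det is nonzero for non-collinear speakers.
    det = l1[0] * l2[1] - l2[0] * l1[1]
    g1 = (p[0] * l2[1] - p[1] * l2[0]) / det
    g2 = (p[1] * l1[0] - p[0] * l1[1]) / det
    norm = math.hypot(g1, g2)
    return g1 / norm, g2 / norm
```

A source midway between a standard stereo pair at ±30° receives equal gains of about 0.707 in each channel, while a source aligned with one loudspeaker is rendered by that loudspeaker alone.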
Video browsing interfaces and applications: a review
We present a comprehensive review of the state of the art in video browsing and retrieval systems, with special emphasis on interfaces and applications. There has been a significant increase in activity (e.g., storage, retrieval, and sharing) employing video data in the past decade, both for personal and professional use. The ever-growing amount of video content available for human consumption and the inherent characteristics of video data (which, if presented in its raw format, is rather unwieldy and costly) have become driving forces for the development of more effective solutions to present video contents and allow rich user interaction. As a result, there are many contemporary research efforts toward developing better video browsing solutions, which we summarize. We review more than 40 different video browsing and retrieval interfaces and classify them into three groups: applications that use video-player-like interaction, video retrieval applications, and browsing solutions based on video surrogates. For each category, we present a summary of existing work, highlight the technical aspects of each solution, and compare them against each other
State-of-the-art on research and applications of machine learning in the building life cycle
Fueled by big data, powerful and affordable computing resources, and advanced algorithms, machine learning has been explored and applied in buildings research for the past decades and has demonstrated its potential to enhance building performance. This study systematically surveyed how machine learning has been applied at different stages of the building life cycle. By conducting a literature search on the Web of Knowledge platform, we found 9579 papers in this field and selected 153 papers for an in-depth review. The number of published papers is increasing year by year, with a focus on building design, operation, and control. However, no study was found using machine learning in building commissioning. There are successful pilot studies on fault detection and diagnosis of HVAC equipment and systems, load prediction, energy baseline estimation, load shape clustering, occupancy prediction, and learning occupant behaviors and energy use patterns. None of the existing studies has been adopted broadly by the building industry, due to common challenges: (1) a lack of large-scale labeled data to train and validate models, (2) a lack of model transferability, which prevents a model trained on one data-rich building from being used in another building with limited data, (3) a lack of strong justification of the costs and benefits of deploying machine learning, and (4) performance that might not be reliable and robust for the stated goals, as a method might work for some buildings but fail to generalize to others. Findings from the study can inform future machine learning research to improve occupant comfort, energy efficiency, demand flexibility, and resilience of buildings, as well as inspire young researchers in the field to explore multidisciplinary approaches that integrate building science, computing science, data science, and social science
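Of the pilot applications the survey lists, load prediction is the simplest to illustrate. A minimal sketch, not drawn from any specific paper in the survey: a one-lag autoregressive baseline fitted by ordinary least squares (real studies use richer features, such as weather and occupancy, and models such as neural networks or gradient boosting); all names are illustrative:

```python
def fit_ar1(loads):
    """Fit a one-lag autoregressive baseline, load_t = a*load_{t-1} + b,
    by ordinary least squares on a list of historical hourly loads."""
    x, y = loads[:-1], loads[1:]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    a = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) \
        / sum((xi - mx) ** 2 for xi in x)
    b = my - a * mx
    return a, b

def predict_next(loads, a, b):
    """Forecast the next-hour load from the most recent observation."""
    return a * loads[-1] + b
```

Even this trivial baseline highlights the survey's transferability challenge: coefficients fitted on one building's load history carry no guarantee of validity for another building.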
AutoEncoding Tree for City Generation and Applications
City modeling and generation have attracted increased interest in various applications, including gaming, urban planning, and autonomous driving. Unlike previous works focused on generating single objects or indoor scenes, the huge volumes of spatial data in cities pose a challenge to generative models. Furthermore, the scarcity of publicly available 3D real-world city datasets also hinders the development of methods for city generation. In this paper, we first collect over 3,000,000 geo-referenced objects for the cities of New York, Zurich, Tokyo, Berlin, Boston, and several other large cities. Based on this dataset, we propose AETree, a tree-structured auto-encoder neural network, for city generation. Specifically, we first propose a novel Spatial-Geometric Distance (SGD) metric to measure the similarity between building layouts and then construct a binary tree over the raw geometric data of buildings based on the SGD metric. Next, we present a tree-structured network whose encoder learns to extract and merge spatial information iteratively from the bottom up. The resulting global representation is decoded in reverse for reconstruction or generation. To address the issue of long-range dependency as the depth of the tree increases, a Long Short-Term Memory (LSTM) cell is employed as the basic network element of the proposed AETree. Moreover, we introduce a novel metric, Overlapping Area Ratio (OAR), to quantitatively evaluate the generation results. Experiments on the collected dataset demonstrate the effectiveness of the proposed model on 2D and 3D city generation. Furthermore, the latent features learned by AETree can serve downstream urban planning applications
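The abstract does not define OAR precisely; one plausible reading, for axis-aligned building footprints, is the summed pairwise intersection area divided by the summed footprint area (generated buildings should overlap as little as real ones do). A hedged sketch under that assumption, with a function name that is ours rather than the paper's:

```python
from itertools import combinations

def overlap_area_ratio(boxes):
    """Illustrative Overlapping Area Ratio for a generated building layout:
    total pairwise intersection area over total footprint area. Each box is
    (xmin, ymin, xmax, ymax), axis-aligned for simplicity. NOTE: the paper's
    exact OAR definition may differ - this is an assumed reconstruction."""
    def inter(a, b):
        # Intersection area of two axis-aligned rectangles (0 if disjoint).
        w = min(a[2], b[2]) - max(a[0], b[0])
        h = min(a[3], b[3]) - max(a[1], b[1])
        return max(w, 0.0) * max(h, 0.0)
    total = sum((x1 - x0) * (y1 - y0) for x0, y0, x1, y1 in boxes)
    overlap = sum(inter(a, b) for a, b in combinations(boxes, 2))
    return overlap / total if total else 0.0
```

Under this reading, a layout of disjoint footprints scores 0, and heavily interpenetrating buildings push the score toward 1, so lower is better.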
S-Graphs+: Real-time Localization and Mapping leveraging Hierarchical Representations
In this paper, we present an evolved version of Situational Graphs, which jointly models, in a single optimizable factor graph, both a SLAM graph, as a set of robot keyframes containing the associated measurements and robot poses, and a 3D scene graph, as a high-level representation of the environment that encodes its different geometric elements with semantic attributes and the relational information between those elements. Our proposed S-Graphs+ is a novel four-layered factor graph that includes: (1) a keyframes layer with robot pose estimates, (2) a walls layer representing wall surfaces, (3) a rooms layer encompassing sets of wall planes, and (4) a floors layer gathering the rooms within a given floor level. The above graph is optimized in real time to obtain a robust and accurate estimate of the robot's pose and its map, simultaneously constructing and leveraging the high-level information of the environment. To extract such high-level information, we present novel room and floor segmentation algorithms utilizing the mapped wall planes and free-space clusters. We tested S-Graphs+ on multiple datasets, including simulations of distinct indoor environments, real datasets captured over several construction sites and office environments, and a real public dataset of indoor office environments. S-Graphs+ outperforms relevant baselines on the majority of the datasets while extending the robot's situational awareness with a four-layered scene model. Moreover, we make the algorithm available as a Docker file.
Comment: 8 pages, 7 figures, 3 tables
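The four-layered hierarchy described in the abstract can be sketched as plain data types. This is an illustrative reconstruction, not the authors' API; in a real system each node would carry an optimizable state in a factor-graph backend such as g2o or GTSAM:

```python
from dataclasses import dataclass, field

@dataclass
class Keyframe:            # layer 1: robot pose estimates
    id: int
    pose: tuple            # (x, y, z, yaw) - simplified stand-in for SE(3)

@dataclass
class Wall:                # layer 2: wall surfaces
    id: int
    plane: tuple           # (nx, ny, nz, d) plane coefficients
    seen_from: list = field(default_factory=list)  # keyframe ids (factors)

@dataclass
class Room:                # layer 3: sets of wall planes
    id: int
    wall_ids: list

@dataclass
class Floor:               # layer 4: rooms on a given floor level
    level: int
    room_ids: list
```

The layering makes the coupling explicit: optimizing a keyframe pose propagates through the wall-observation factors up to the rooms and floors that group those walls.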
User-centered visual analysis using a hybrid reasoning architecture for intensive care units
One problem pertaining to Intensive Care Unit information systems is that, in some cases, a very dense display of data can result. To ensure the overview and readability of increasing volumes of data, special features are required (e.g., data prioritization, clustering, and selection mechanisms) along with the application of analytical methods (e.g., temporal data abstraction, principal component analysis, and detection of events). This paper addresses the problem of improving the integration of the visual and analytical methods applied to medical monitoring systems. We present a knowledge- and machine learning-based approach to support the knowledge discovery process with appropriate analytical and visual methods. It can benefit the development of user interfaces for intelligent monitors that assist with the detection and explanation of new, potentially threatening medical events. The proposed hybrid reasoning architecture provides an interactive graphical user interface to adjust the parameters of the analytical methods based on the user's task at hand. The action sequences performed on the graphical user interface by the user are consolidated in a dynamic knowledge base with specific hybrid reasoning that integrates symbolic and connectionist approaches. These captured sequences of expert knowledge can facilitate the emergence of knowledge during similar experiences and positively impact the monitoring of critical situations. The provided graphical user interface, incorporating user-centered visual analysis, facilitates the natural and effective representation of clinical information for patient care
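Of the analytical methods the abstract names, temporal data abstraction is straightforward to sketch: raw numeric vital-sign samples are merged into qualitative episodes, which are far less dense to display than the samples themselves. The thresholds, labels, and function name below are illustrative assumptions, not taken from the paper:

```python
def abstract_states(samples, low, high):
    """Simple temporal data abstraction: turn (time, value) vital-sign
    samples into merged qualitative episodes, e.g. heart rate mapped to
    LOW / NORMAL / HIGH intervals. Returns (state, t_start, t_end) tuples."""
    def label(v):
        return "LOW" if v < low else "HIGH" if v > high else "NORMAL"
    episodes = []
    for t, v in samples:
        s = label(v)
        if episodes and episodes[-1][0] == s:
            episodes[-1] = (s, episodes[-1][1], t)   # extend current episode
        else:
            episodes.append((s, t, t))               # open a new episode
    return episodes
```

For example, five heart-rate samples collapse into three episodes, which is the kind of compression that keeps a monitoring display readable as data volume grows.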