An Adaptive Penalty Approach to Multi-Pitch Estimation
This work treats multi-pitch estimation, and in particular the common misclassification issue wherein the pitch at half of the true fundamental frequency, here referred to as a sub-octave, is chosen instead of the true pitch. Extending current methods, which use an extension of the Group LASSO for pitch estimation, this work introduces an adaptive total variation penalty, which both enforces group and block sparsity and deals with errors due to sub-octaves. The method is shown to outperform current state-of-the-art sparse methods, where the model orders are unknown, while also requiring fewer tuning parameters than these. The method is also shown to outperform several conventional pitch estimation methods, even when these are given oracle model orders.
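The sub-octave ambiguity described in this abstract can be illustrated with a small sketch. It assumes an idealized signal with five equal-amplitude harmonics of a 220 Hz pitch (hypothetical values) and uses a plain, non-adaptive total variation of the harmonic amplitudes. This is not the adaptive penalty of the paper, only an indication of why a variation-based penalty could disfavor the sub-octave candidate.

```python
def harmonics(f0, count):
    """Frequencies (Hz) of the first `count` harmonics of fundamental f0."""
    return [f0 * k for k in range(1, count + 1)]

def group_amplitudes(peaks, f0, count, tol=1e-9):
    """Amplitude per harmonic of candidate f0: 1.0 if a spectral peak matches, else 0.0."""
    return [1.0 if any(abs(p - h) <= tol for p in peaks) else 0.0
            for h in harmonics(f0, count)]

def total_variation(values):
    """Sum of absolute differences between neighboring amplitudes."""
    return sum(abs(b - a) for a, b in zip(values, values[1:]))

true_f0 = 220.0                      # hypothetical true pitch
peaks = harmonics(true_f0, 5)        # observed spectral lines: 220, 440, ..., 1100 Hz

# Candidate 1: the true pitch with 5 harmonics.
true_amps = group_amplitudes(peaks, true_f0, 5)
# Candidate 2: the sub-octave (half the pitch) with 10 harmonics.
sub_amps = group_amplitudes(peaks, true_f0 / 2, 10)

# Both candidates explain every observed peak ...
sub_explains_all = sum(sub_amps) == len(peaks)
# ... but the sub-octave group alternates between empty and occupied harmonics.
tv_true = total_variation(true_amps)
tv_sub = total_variation(sub_amps)
print(sub_explains_all, tv_true, tv_sub)
```

Both candidates account for all observed peaks, so a fit term alone cannot separate them; the alternating zero and non-zero amplitudes of the sub-octave group are what a variation-type penalty can pick up on.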
Aerial navigation in obstructed environments with embedded nonlinear model predictive control
We propose a methodology for autonomous aerial navigation and obstacle
avoidance of micro aerial vehicles (MAV) using nonlinear model predictive
control (NMPC) and we demonstrate its effectiveness with laboratory
experiments. The proposed methodology can accommodate obstacles of arbitrary,
potentially non-convex, geometry. The NMPC problem is solved using PANOC: a
fast numerical optimization method which is completely matrix-free, is not
sensitive to ill conditioning, involves only simple algebraic operations and is
suitable for embedded NMPC. A C89 implementation of PANOC solves the NMPC
problem at a rate of 20Hz on board a lab-scale MAV. The MAV performs smooth
maneuvers moving around an obstacle. For increased autonomy, we propose a
simple method to compensate for the reduction of thrust over time, which comes
from the depletion of the MAV's battery, by estimating the thrust constant.
Deep-sea image processing
High-resolution seafloor mapping often requires optical methods of sensing to confirm interpretations made from sonar data. Optical digital imagery of seafloor sites can now provide very high resolution and also provides additional cues, such as color information for sediments, biota and diverse rock types. During the cruise AT11-7 of the Woods Hole Oceanographic Institution (WHOI) vessel R/V Atlantis (February 2004, East Pacific Rise) visual imagery was acquired from three sources: (1) a digital still down-looking camera mounted on the submersible Alvin, (2) observer-operated 1- and 3-chip video cameras with tilt and pan capabilities mounted on the front of Alvin, and (3) a digital still camera on the WHOI TowCam (Fornari, 2003). Imagery from the first source collected on a previous cruise (AT7-13) to the Galapagos Rift at 86°W was successfully processed and mosaicked post-cruise, resulting in a single image covering an area of about 2000 sq. m at a resolution of 3 mm per pixel (Rzhanov et al., 2003). This paper addresses the issues of the optimal acquisition of visual imagery in deep-sea conditions, and requirements for on-board processing. Shipboard processing of digital imagery allows for reviewing collected imagery immediately after the dive, evaluating its importance and optimizing acquisition parameters, and augmenting acquisition of data over specific sites on subsequent dives. Images from the DeepSea Power and Light (DSPL) digital camera offer the best resolution (3.3 megapixels) and are taken at an interval of 10 seconds (determined by the strobe's recharge rate). This makes the images suitable for mosaicking only when Alvin moves slowly (≪1/4 kt), which is not always possible for time-critical missions. Video cameras provided a source of imagery more suitable for mosaicking, despite their lower resolution. We discuss required pre-processing and image enhancement techniques and their influence on the interpretation of mosaic content.
An algorithm for determination of camera tilt parameters from acquired imagery is proposed and robustness conditions are discussed.
General highlight detection in sport videos
Attention is a psychological measure of human response to stimuli. We propose a general framework for highlight detection that compares attention intensity during the watching of sports videos. Three steps are involved: adaptive selection of salient features, unified attention estimation and highlight identification. Adaptive selection computes feature correlation to decide on an optimal set of salient features. Unified estimation combines these features using the multi-resolution autoregressive (MAR) technique and thus creates a temporal curve of attention intensity. We rank the intensity of attention to discriminate the boundaries of highlights. Such a framework alleviates semantic uncertainty around sport highlights and leads to efficient and effective highlight detection. The advantages are as follows: (1) the capability of using data at coarse temporal resolutions; (2) robustness against noise caused by modality asynchronism, perception uncertainty and feature mismatch; (3) the employment of Markovian constraints on content presentation; and (4) multi-resolution estimation of attention intensity, which enables the precise allocation of event boundaries.
Sparse Modeling of Grouped Line Spectra
This licentiate thesis focuses on clustered parametric models for estimation of line spectra, when the spectral content of a signal source is assumed to exhibit some form of grouping. Unlike previous parametric approaches, which generally require explicit knowledge of the model orders, this thesis exploits sparse modeling, where the orders are implicitly chosen. For line spectra, the non-linear parametric model is approximated by a linear system, containing an overcomplete basis of candidate frequencies, called a dictionary, and a large set of linear response variables that select and weight the components in the dictionary. Frequency estimates are obtained by solving a convex optimization program, where the sum of squared residuals is minimized. To discourage overfitting and to infer certain structure in the solution, different convex penalty functions are introduced into the optimization. The cost trade-off between fit and penalty is set by some user parameters, so as to approximate the true number of spectral lines in the signal, which implies that the response variable will be sparse, i.e., have few non-zero elements. Thus, instead of explicit model orders, the orders are implicitly set by this trade-off. For grouped variables, the dictionary is customized, and appropriate convex penalties selected, so that the solution becomes group sparse, i.e., has few groups with non-zero variables. In an array of sensors, the specific time-delays and attenuations will depend on the source and sensor positions. By modeling this, one may estimate the location of a source. In this thesis, a novel joint location and grouped frequency estimator is proposed, which exploits sparse modeling for both spectral and spatial estimates, showing robustness against sources with overlapping frequency content. For audio signals, this thesis uses two different features for clustering.
Pitch is a perceptual property of sound that may be described by the harmonic model, i.e., by a group of spectral lines at integer multiples of a fundamental frequency, which we estimate by exploiting a novel adaptive total variation penalty. The other feature, chroma, is a concept in music theory, collecting pitches at powers of 2 from each other into groups. Using a chroma dictionary, together with appropriate group sparse penalties, we propose an automatic transcription of the chroma content of a signal.
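The dictionary idea in this abstract can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not the thesis's estimator: it uses a plain l1 (LASSO) penalty rather than the group sparse penalties, a noise-free signal with a single spectral line, and iterative soft thresholding (ISTA) as the solver. The sizes N and P, the weight lam and all names are hypothetical.

```python
import cmath

N, P = 32, 64          # number of samples and of candidate frequencies (hypothetical)
true_index = 10        # true frequency is true_index / P cycles per sample
lam = 4.0              # l1 weight; it must stay below N for a non-zero solution

def atom(k):
    """Dictionary column k: complex sinusoid at frequency k / P."""
    return [cmath.exp(2j * cmath.pi * k * n / P) for n in range(N)]

A = [atom(k) for k in range(P)]   # overcomplete dictionary, stored column-wise
y = atom(true_index)              # noise-free signal with one spectral line

def soft(z, t):
    """Complex soft threshold: shrink the magnitude of z by t, keep its phase."""
    mag = abs(z)
    return z * (1 - t / mag) if mag > t else 0j

# ISTA for: minimize 0.5 * ||y - A x||^2 + lam * ||x||_1.
# For a full uniform grid with P >= N, A A^H = P * I, so 1 / P is a valid step.
step = 1.0 / P
x = [0j] * P
for _ in range(100):
    model = [sum(x[k] * A[k][n] for k in range(P)) for n in range(N)]
    resid = [y[n] - model[n] for n in range(N)]
    grad = [sum(A[k][n].conjugate() * resid[n] for n in range(N)) for k in range(P)]
    x = [soft(x[k] + step * grad[k], step * lam) for k in range(P)]

# The largest coefficient marks the estimated line; no model order was supplied.
estimate = max(range(P), key=lambda k: abs(x[k]))
print("estimated frequency:", estimate / P)
```

The weight lam plays the role of the user parameter mentioned in the abstract: raising it zeroes more coefficients, lowering it lets more dictionary columns into the solution.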
Growing Regression Forests by Classification: Applications to Object Pose Estimation
In this work, we propose a novel node splitting method for regression trees
and incorporate it into the regression forest framework. Unlike traditional
binary splitting, where the splitting rule is selected from a predefined set of
binary splitting rules via trial-and-error, the proposed node splitting method
first finds clusters of the training data which at least locally minimize the
empirical loss without considering the input space. Then splitting rules which
preserve the found clusters as much as possible are determined by casting the
problem into a classification problem. Consequently, our new node splitting
method enjoys more freedom in choosing the splitting rules, resulting in more
efficient tree structures. In addition to the Euclidean target space, we
present a variant which can naturally deal with a circular target space by the
proper use of circular statistics. We apply the regression forest employing our
node splitting to head pose estimation (Euclidean target space) and car
direction estimation (circular target space) and demonstrate that the proposed
method significantly outperforms state-of-the-art methods (38.5% and 22.5%
error reduction respectively). Comment: Paper accepted by ECCV 201
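The two-stage node split described in this abstract can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: targets are one-dimensional, the clustering is a basic 2-means on the targets alone, and the classifier is a single-feature threshold (decision stump). The data and function names are hypothetical.

```python
def two_means(targets, iterations=20):
    """Cluster 1-D targets into two groups, ignoring the inputs entirely."""
    centers = [min(targets), max(targets)]
    labels = [0] * len(targets)
    for _ in range(iterations):
        labels = [0 if abs(t - centers[0]) <= abs(t - centers[1]) else 1
                  for t in targets]
        for c in (0, 1):
            members = [t for t, l in zip(targets, labels) if l == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels

def best_stump(feature, labels):
    """Threshold on one feature that best reproduces the cluster labels."""
    values = sorted(set(feature))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best_threshold, best_errors = None, len(labels) + 1
    for t in candidates:
        errors = sum((1 if v > t else 0) != l for v, l in zip(feature, labels))
        errors = min(errors, len(labels) - errors)   # allow either polarity
        if errors < best_errors:
            best_threshold, best_errors = t, errors
    return best_threshold, best_errors

# Hypothetical training data reaching one node: inputs x, regression targets y.
x = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
y = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8]

labels = two_means(y)                      # step 1: cluster in target space only
threshold, errors = best_stump(x, labels)  # step 2: classify to recover a split rule
print(labels, threshold, errors)
```

The split rule is derived from the clusters rather than searched for by trial and error over a fixed rule set, which is the freedom the abstract refers to; a stronger classifier could replace the stump.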
A Parametric Method for Multi-Pitch Estimation
This thesis proposes a novel method for multi-pitch estimation. The method operates by posing pitch estimation as a sparse recovery problem which is solved using convex optimization techniques. In that respect, it is an extension of an earlier presented estimation method based on the group-LASSO. However, by introducing an adaptive total variation penalty, the proposed method requires fewer user-supplied parameters, thereby simplifying the estimation procedure. The method is shown to have comparable to superior performance in low noise environments when compared to three standard multi-pitch estimation methods as well as the predecessor method. Also presented is a scheme for automatic selection of the regularization parameters, thereby making the method more user friendly. Used together with this scheme, the proposed method is shown to yield accurate, although not statistically efficient, pitch estimates when evaluated on synthetic speech data.
Wavenet based low rate speech coding
Traditional parametric coding of speech facilitates low rate but provides
poor reconstruction quality because of the inadequacy of the model used. We
describe how a WaveNet generative speech model can be used to generate high
quality speech from the bit stream of a standard parametric coder operating at
2.4 kb/s. We compare this parametric coder with a waveform coder based on the
same generative model and show that approximating the signal waveform incurs a
large rate penalty. Our experiments confirm the high performance of the WaveNet
based coder and show that the speech produced by the system is able to
additionally perform implicit bandwidth extension and does not significantly
impair recognition of the original speaker for the human listener, even when
that speaker has not been used during the training of the generative model. Comment: 5 pages, 2 figure