    Beyond 2D-grids: a dependence maximization view on image browsing

    Ideally, one would like to perform image search using an intuitive and friendly approach. Many existing image search engines, however, present users with sets of images arranged in some default order on the screen, typically by relevance to a query alone. While this certainly has its advantages, a more flexible and intuitive way would arguably be to sort images into arbitrary structures such as grids, hierarchies, or spheres so that images that are visually or semantically alike are placed together. This paper focuses on designing such a navigation system for image browsers. This is a challenging task because an arbitrary layout structure makes it difficult -- if not impossible -- to compute cross-similarities between images and structure coordinates, the main ingredient of traditional layout approaches. For this reason, we resort to a recently developed machine learning technique: kernelized sorting. It is a general technique for matching pairs of objects from different domains without requiring cross-domain similarity measures, and hence it elegantly allows sorting images into arbitrary structures. Moreover, we extend it so that some images can be preselected, for instance to form the top of the hierarchy, allowing users to subsequently navigate through the search results in the lower levels in an intuitive way.
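The matching idea in the abstract above can be illustrated with a toy sketch. It assumes scalar image features, a one-dimensional layout of four slots, RBF kernels, and an exhaustive search over permutations; none of these choices comes from the paper, which would need a scalable optimizer for realistic collections. The function names (`rbf_kernel`, `center`, `dependence`, `kernelized_sort`) are hypothetical. The point is only that each domain is compared with itself through its own kernel, so no image-to-slot similarity is ever computed.

```python
import itertools
import math


def rbf_kernel(points, gamma=0.5):
    """Within-domain similarity matrix for scalar points (assumed RBF kernel)."""
    return [[math.exp(-gamma * (a - b) ** 2) for b in points] for a in points]


def center(K):
    """Double-center a symmetric kernel matrix (subtract row/column means)."""
    n = len(K)
    row_mean = [sum(row) / n for row in K]
    total_mean = sum(row_mean) / n
    return [[K[i][j] - row_mean[i] - row_mean[j] + total_mean
             for j in range(n)] for i in range(n)]


def dependence(Kc, Lc, perm):
    """Dependence score between items and slots when item i goes to slot perm[i]."""
    n = len(perm)
    return sum(Kc[i][j] * Lc[perm[i]][perm[j]]
               for i in range(n) for j in range(n))


def kernelized_sort(features, positions):
    """Brute-force search for the assignment that maximizes the dependence score."""
    Kc = center(rbf_kernel(features))
    Lc = center(rbf_kernel(positions))
    return max(itertools.permutations(range(len(features))),
               key=lambda perm: dependence(Kc, Lc, perm))


# Toy data: four "images" described by one number each, four slots on a line.
features = [3.0, 0.0, 2.0, 1.0]
positions = [0.0, 1.0, 2.0, 3.0]
assignment = kernelized_sort(features, positions)
print(assignment)  # slot index chosen for each image
```

Because a line layout is symmetric, the ordering and its mirror image score the same, so either may be returned. Exhaustive search is only feasible for a handful of items.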

    On Classification with Bags, Groups and Sets

    Many classification problems can be difficult to formulate directly in terms of the traditional supervised setting, where both training and test samples are individual feature vectors. There are cases in which samples are better described by sets of feature vectors, in which labels are only available for sets rather than individual samples, or in which individual labels are available but are not independent. To better deal with such problems, several extensions of supervised learning have been proposed, where training and/or test objects are sets of feature vectors. However, having been proposed rather independently of each other, their mutual similarities and differences have hitherto not been mapped out. In this work, we provide an overview of such learning scenarios, propose a taxonomy to illustrate the relationships between them, and discuss directions for further research in these areas.

    Experimenting Liver Fibrosis Diagnostic by Two Photon Excitation Microscopy and Bag-of-Features Image Classification

    The accurate staging of liver fibrosis is of paramount importance to determine the state of disease progression and therapy responses, and to optimize disease treatment strategies. Non-linear optical microscopy techniques such as two-photon excitation fluorescence (TPEF) and second harmonic generation (SHG) can image the endogenous signals of tissue structures and can be used for fibrosis assessment on non-stained tissue samples. While image analysis of collagen in SHG images has been consistently addressed until now, cellular and tissue information included in TPEF images, such as inflammatory and hepatic cell damage, which is as important as the collagen deposition imaged by SHG, remains poorly exploited to date. We address this situation by experimenting with liver fibrosis quantification and scoring using a combined approach based on TPEF liver surface imaging on a Thioacetamide-induced rat model and a gradient-based Bag-of-Features (BoF) image classification strategy. We report the assessed performance results and discuss the influence of specific BoF parameters on the performance of the fibrosis scoring framework.
    Funding: Romania. Executive Agency for Higher Education, Research, Development and Innovation Funding (research grant PN-II-PT-PCCA-2011-3.2-1162); Rectors' Conference of the Swiss Universities (SCIEX NMS-CH research fellowship nr. 12.135); Singapore. Agency for Science, Technology and Research (R-185-000-182-592); Singapore. Biomedical Research Council; Institute of Bioengineering and Nanotechnology (Singapore); Singapore-MIT Alliance (Computational and Systems Biology Flagship Project funding (C-382-641-001-091)); Singapore-MIT Alliance for Research and Technology (SMART BioSyM) and Mechanobiology Institute of Singapore (R-714-001-003-271)
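As a rough illustration of the Bag-of-Features step mentioned in the abstract above, the sketch below quantizes local descriptors against a codebook and returns a normalized histogram that a classifier could consume. The two-dimensional descriptors, the fixed three-word codebook, and the function names are assumptions for illustration only. The abstract does not specify the descriptors or the codebook size, and a codebook is normally learned from training data (for example by clustering) rather than written by hand.

```python
def nearest_codeword(descriptor, codebook):
    """Index of the codeword closest to the descriptor (squared Euclidean distance)."""
    def sq_dist(word):
        return sum((d - w) ** 2 for d, w in zip(descriptor, word))
    return min(range(len(codebook)), key=lambda k: sq_dist(codebook[k]))


def bof_histogram(descriptors, codebook):
    """Normalized histogram of codeword occurrences for one image."""
    counts = [0] * len(codebook)
    for descriptor in descriptors:
        counts[nearest_codeword(descriptor, codebook)] += 1
    total = len(descriptors)
    return [c / total for c in counts]


# Hypothetical toy data: 3 visual words, 4 local descriptors from one image.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptors = [(0.1, 0.0), (0.9, 1.2), (4.8, 5.1), (5.2, 4.9)]
histogram = bof_histogram(descriptors, codebook)
print(histogram)  # [0.25, 0.25, 0.5]
```

The codebook size and the descriptor type are among the kinds of BoF parameters whose influence on scoring such a study would examine.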

    Advanced algorithms for audio and image processing

    The objective of the thesis is the development of a set of innovative algorithms around the topic of beamforming in the field of acoustic imaging, audio and image processing, aimed at significantly improving the performance of devices that exploit these computational approaches. The context is therefore the improvement of devices (ultrasound machines and video/audio devices) already on the market, or the development of new ones which, through the proposed studies, can be introduced to new markets with the launch of innovative high-tech start-ups. This is the motivation and the leitmotiv behind the doctoral work carried out. In the first part of the work, an innovative image reconstruction algorithm in the field of ultrasound biomedical imaging is presented, which is connected to the development of equipment that exploits the computing opportunities currently offered at low cost by GPUs (Moore's law). The proposed target is to obtain a new image reconstruction pipeline that abandons the hardware-based architecture of such equipment. In the first part of the thesis I address the reconstruction of ultrasound images for applications envisaged on a software-based device, through image reconstruction algorithms processed in the frequency domain. An innovative beamforming algorithm based on seismic migration is presented, in which a transformation of the RF data is carried out and the reconstruction algorithm can evaluate a masking of the k-space of the data, speeding up the reconstruction process and reducing the computational burden. 
The analysis and development of the algorithms in the thesis has been approached from a feasibility standpoint, in an off-line context and on the Matlab platform, processing both synthetic simulated data and real RF data: the subsequent development of these algorithms within future ultrasound biomedical equipment will exploit a high-performance computing framework capable of processing customized kernel pipelines (henceforth called 'filters') on CPU/GPU. The filters implemented concern Plane Wave Imaging (PWI), an alternative method of acquiring the ultrasound image compared to the state of the art of traditional standard B-mode, which currently exploits sequential insonification of the sample under examination through focused beams transmitted by the probe channels. The PWI mode is interesting and opens up new scenarios compared to the usual signal acquisition and processing techniques, with the aim of making signal processing in general, and image reconstruction in particular, faster and more flexible; substantially increasing the frame rate also opens up and improves clinical applications. The innovative idea is to introduce a further filter, named masking matrix, into an offline seismic reconstruction algorithm for ultrasound imaging. The masking matrices can be computed offline knowing the system parameters, since they do not depend on the acquired data. Moreover, they can be pre-multiplied with the propagation matrices, without affecting the overall computational load. Subsequently in the thesis, the topic of beamforming in audio processing on superdirective linear arrays of microphones is addressed. The aim is to make an in-depth analysis of two main families of data-independent approaches and algorithms present in the literature by comparing their performance and the trade-off between directivity and frequency invariance, which is not yet established in the state of the art. 
The goal is to identify the best algorithm whose performance can be verified experimentally in an implementation and correlated with the array characteristics and error statistics. Frequency-invariant beam patterns are often required by systems using an array of sensors to process broadband signals. In some experimental conditions, the array spatial aperture is shorter than the involved wavelengths. In these conditions, superdirective beamforming is essential for an efficient system. I present a comparison between two methods that deal with a data-independent beamformer based on a filter-and-sum structure. Both methods (the first one numerical, the second one analytic) formulate a convex minimization problem in which the variables to be optimized are the filter coefficients or frequency responses. In the described simulations, I have chosen a geometry and a set of parameters that allow a fair comparison between the performance of the two design methods analyzed. In particular, I addressed a small linear array for audio capture with different purposes (hearing aids, audio surveillance systems, video-conference systems, multimedia devices, etc.). The research activity carried out has been used for the launch of a high-tech device through an innovative start-up in the field of glasses/audio devices (https://acoesis.com/en/). It has been shown that the proposed algorithm can achieve higher performance than the state of the art of similar algorithms, while also making it possible to connect directivity, or better generalized directivity, to the statistics of the phase and gain errors of the sensors, which is extremely important in superdirective arrays in the case of real and industrial implementation. 
Therefore, the method proposed by the comparison is innovative because it quantitatively links the physical construction characteristics of the array to measurable and experimentally verifiable quantities, making the real implementation process controllable. The third topic addressed is the reconstruction of the Room Impulse Response (RIR) using blind audio processing methods. Given an unknown audio source, the estimation of time differences of arrival (TDOAs) can be efficiently and robustly solved using blind channel identification and exploiting the cross-correlation identity (CCI). Prior blind works have improved the estimate of TDOAs by means of different algorithmic solutions and optimization strategies, while always sticking to the case of N = 2 microphones. But what if we can obtain a direct improvement in performance by just increasing N? In the fourth chapter I investigate this direction, showing that, despite its arguable simplicity, it is capable of (sharply) improving upon state-of-the-art blind channel identification methods based on CCI, without modifying the computational pipeline. Inspired by our results, we seek to encourage the community and practitioners by paving the way (with two concrete, yet preliminary, examples) towards joint approaches in which advances in the optimization are combined with an increased number of microphones, in order to achieve further improvements. Sound source localisation applications can be tackled by inferring the TDOAs between a sound-emitting source and a set of microphones. Among these applications, one can list room-aware sound reproduction, room geometry estimation, and speech enhancement. 
Although a broad spectrum of prior works estimates TDOAs from a known audio source, TDOAs can be inferred even when the signal emitted from the acoustic source is unknown, by comparing the signals received at two (or more) spatially separated microphones using the notion of cross-correlation identity (CCI). This is the key theoretical tool, not only for making the ordering of microphones irrelevant during the acquisition stage, but also for solving the problem as blind channel identification, robustly and reliably inferring TDOAs from an unknown audio source. However, when dealing with natural environments, such "mutual agreement" between microphones can be compromised by a variety of audio ambiguities such as ambient noise. Furthermore, each observed signal may contain multiple distorted or delayed replicas of the emitting source due to reflections or generic boundary effects related to the (closed) environment. Thus, robustly estimating TDOAs is a challenging problem, and CCI-based approaches cast it as single-input/multi-output blind channel identification. Such methods promote robustness in the estimate from the methodological standpoint, using energy-based regularization, sparsity or positivity constraints, while also pre-conditioning the solution space. Last but not least, acoustic imaging is an imaging modality that exploits the propagation of acoustic waves in a medium to recover the spatial distribution and intensity of sound sources in a given region. Well-known and widespread acoustic imaging applications are, for example, sonar and ultrasound. There are active and passive imaging devices: in the context of this thesis I consider a passive imaging system called Dual Cam that does not emit any sound but acquires it from the environment. 
In an acoustic image each pixel corresponds to the sound intensity of the source, whose position is described by a particular pair of angles and, when the beamformer can work in the near field as in our case, by the distance at which the system is focused. In the last part of this work I propose the use of a new modality characterized by a richer information content, namely acoustic images, for the sake of audio-visual scene understanding. Each pixel in such images is characterized by a spectral signature, associated with a specific direction in space and obtained by processing the audio signals coming from an array of microphones. By coupling such an array with a video camera, we obtain spatio-temporal alignment of acoustic images and video frames. This constitutes a powerful source of self-supervision, which can be exploited in the learning pipeline we are proposing, without resorting to expensive data annotations. However, since 2D planar arrays are cumbersome and not as widespread as ordinary microphones, we propose that the richer information content of acoustic images can be distilled, through a self-supervised learning scheme, into more powerful audio and visual feature representations. The learnt feature representations can then be employed for downstream tasks such as classification and cross-modal retrieval, without the need for a microphone array. To prove that, we introduce a novel multimodal dataset consisting of RGB videos, raw audio signals and acoustic images, aligned in space and synchronized in time. Experimental results demonstrate the validity of our hypothesis and the effectiveness of the proposed pipeline, also when tested on tasks and datasets different from those used for training. 
Chapter 6 closes the thesis, presenting the development of a new Dual Cam POC from which to build a spin-off, with a view to applying to an innovation programme for high-tech start-ups (such as the H2020 SME Instrument) for a 50K euro grant, following the idea of technology transfer. A deep analysis of the reference market, technologies and commercial competitors, the business model and the FTO of intellectual property is then conducted. Finally, following the latest technological trends (https://www.flir.eu/products/si124/), a new version of the device (planar audio array) with reduced dimensions and improved technical characteristics is simulated, simpler and easier to use than the current one, opening up new interesting possibilities of development, not only technical and scientific but also in terms of business impact.
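The identity behind the blind TDOA estimation described in the abstract above can be checked numerically. The sketch below assumes the usual noiseless single-input/multi-output model, in which each microphone signal is the unknown source convolved with that microphone's channel, so that x1 * h2 = x2 * h1 holds without knowing the source. The toy integer signals, the channel taps, and the function names are hypothetical, and reading the TDOA off the strongest tap of each known channel is a simplification: a blind method has to estimate the channels first.

```python
def convolve(a, b):
    """Full discrete convolution of two finite sequences."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def direct_path_delay(h):
    """Lag (in samples) of the strongest tap, taken here as the direct path."""
    return max(range(len(h)), key=lambda k: abs(h[k]))


# Hypothetical toy set-up: one unknown source, two microphones.
source = [1, -2, 3, 0, 1]
h1 = [0, 0, 4, 0, 1]          # direct path at lag 2, weaker echo at lag 4
h2 = [0, 0, 0, 0, 0, 3, 1]    # direct path at lag 5, weaker echo at lag 6

x1 = convolve(source, h1)     # signal observed at microphone 1
x2 = convolve(source, h2)     # signal observed at microphone 2

# Cross-channel identity: each observation filtered by the other channel agrees.
cci_holds = convolve(x1, h2) == convolve(x2, h1)

# TDOA between the two microphones, in samples.
tdoa = direct_path_delay(h2) - direct_path_delay(h1)
print(cci_holds, tdoa)  # True 3
```

With more than two microphones, every pair contributes one such identity, which is one way to see why increasing N can add constraints without changing the pipeline.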

    Privacy-Preserving Outsourced Media Search

    This work proposes a privacy-protection framework for an important application called outsourced media search. This scenario involves a data owner, a client, and an untrusted server, where the owner outsources a search service to the server. Due to lack of trust, the privacy of the client and the owner should be protected. The framework relies on multimedia hashing and symmetric encryption. It requires involved parties to participate in a privacy-enhancing protocol. Additional processing steps are carried out by the owner and the client: (i) before outsourcing low-level media features to the server, the owner has to one-way hash them, and partially encrypt each hash-value; (ii) the client completes the similarity search by re-ranking the most similar candidates received from the server. One-way hashing and encryption add ambiguity to data and make it difficult for the server to infer contents from database items and queries, so the privacy of both the owner and the client is enforced. The proposed framework realizes trade-offs among strength of privacy enforcement, quality of search, and complexity, because the information loss can be tuned during hashing and encryption. Extensive experiments demonstrate the effectiveness and the flexibility of the framework.

    TopSig: Topology Preserving Document Signatures

    Performance comparisons between File Signatures and Inverted Files for text retrieval have previously shown several significant shortcomings of file signatures relative to inverted files. The inverted file approach underpins most state-of-the-art search engine algorithms, such as Language and Probabilistic models. It has been widely accepted that traditional file signatures are inferior alternatives to inverted files. This paper describes TopSig, a new approach to the construction of file signatures. Many advances in semantic hashing and dimensionality reduction have been made in recent times, but these were not so far linked to general-purpose, signature-file-based search engines. This paper introduces a different signature file approach that builds upon and extends these recent advances. We are able to demonstrate significant improvements in the performance of signature-file-based indexing and retrieval, performance that is comparable to that of state-of-the-art inverted-file-based systems, including Language models and BM25. These findings suggest that file signatures offer a viable alternative to inverted files in suitable settings, and from the theoretical perspective it positions the file signatures model in the class of Vector Space retrieval models. Comment: 12 pages, 8 figures, CIKM 201
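To make the idea of a hashed document signature concrete, the sketch below builds a fixed-width binary signature from a bag of words and ranks documents by Hamming distance. It is a generic simhash-style construction chosen for illustration, not the TopSig implementation: the 64-bit width, the use of SHA-256 to derive a ±1 vector per term, the unweighted term sum, and all function names are assumptions.

```python
import hashlib

BITS = 64  # assumed signature width


def term_vector(term):
    """Deterministic pseudo-random +1/-1 vector derived from a hash of the term."""
    digest = hashlib.sha256(term.encode("utf-8")).digest()
    value = int.from_bytes(digest[:BITS // 8], "big")
    return [1 if (value >> i) & 1 else -1 for i in range(BITS)]


def signature(text):
    """Sum the term vectors of a document and keep only the sign of each component."""
    totals = [0] * BITS
    for term in text.lower().split():
        for i, weight in enumerate(term_vector(term)):
            totals[i] += weight
    sig = 0
    for i, total in enumerate(totals):
        if total > 0:
            sig |= 1 << i
    return sig


def hamming(a, b):
    """Number of differing bits between two signatures."""
    return bin(a ^ b).count("1")


def search(query, documents):
    """Rank documents by Hamming distance between their signature and the query's."""
    q = signature(query)
    return sorted(documents, key=lambda doc: hamming(q, signature(doc)))


docs = ["inverted files for text retrieval",
        "signature files for text retrieval",
        "acoustic imaging with microphone arrays"]
ranked = search("text retrieval signature files", docs)
print(ranked[0])
```

Documents that share many terms tend to agree on more signature bits, so ranking reduces to bitwise comparisons. A real system would also weight terms and use much wider signatures.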