Graphing of E-Science Data with varying user requirements
Based on our experience in the Swiss Experiment, exploring experimental scientific data is often done visually: starting from a global overview, users zoom in on interesting events. For huge data volumes, special data structures have to be introduced to provide fast and easy access to the data. Since it is hard to predict how users will work with the data, a generic approach requires self-adaptation of these data structures. In this paper we describe the underlying NP-hard problem and present several approaches, with varying properties, to address it. The approaches are illustrated with a small example and evaluated on a synthetic data set and user queries.
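As context for the kind of structure involved, here is a minimal sketch of a precomputed aggregation pyramid for overview-to-detail zooming; the factor-of-4 reduction, the mean-only aggregate and all names are our own illustrative assumptions, not the paper's data structure:

```python
import numpy as np

def build_pyramid(values, factor=4):
    """Precompute coarser and coarser per-block means so that an
    overview never has to scan the raw measurements."""
    levels = [np.asarray(values, float)]
    while len(levels[-1]) > factor:
        v = levels[-1]
        n = len(v) // factor * factor           # drop the ragged tail
        levels.append(v[:n].reshape(-1, factor).mean(axis=1))
    return levels

def zoom(levels, start, end, budget=500, factor=4):
    """Serve a range query from the finest level that still fits the
    plotting budget, so response time is bounded at every zoom level."""
    for depth, level in enumerate(levels):      # finest level first
        scale = factor ** depth
        lo, hi = start // scale, (end + scale - 1) // scale
        if hi - lo <= budget:
            return level[lo:hi], scale
    return levels[-1], factor ** (len(levels) - 1)
```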
Plant image retrieval using color, shape and texture features
We present a content-based image retrieval system for plant images, intended especially for the house plant identification problem. A plant image consists of a collection of overlapping leaves and possibly flowers, which makes the problem challenging. We studied the suitability of various well-known color, shape and texture features for this problem, and introduce some new texture matching techniques and shape features. Feature extraction is applied after segmenting the plant region from the background using the max-flow min-cut technique. Results on a database of 380 plant images belonging to 78 different types of plants show the promise of the proposed techniques and the overall system: in 55% of the queries, the correct plant image is retrieved among the top 15 results. Furthermore, the accuracy rises to 73% when a 132-image subset of well-segmented plant images is considered.
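As an illustration of the segment-then-describe pipeline, here is a minimal sketch using OpenCV's GrabCut, a graph-cut segmenter from the same max-flow/min-cut family, followed by a joint color histogram; the rectangle initialization, histogram size and all names are our assumptions, not the authors' system:

```python
import cv2
import numpy as np

def plant_features(path):
    """Segment the plant from the background with GrabCut, then
    describe the foreground with a joint RGB color histogram."""
    img = cv2.imread(path)
    mask = np.zeros(img.shape[:2], np.uint8)
    rect = (10, 10, img.shape[1] - 20, img.shape[0] - 20)  # assumed ROI
    bgd = np.zeros((1, 65), np.float64)
    fgd = np.zeros((1, 65), np.float64)
    cv2.grabCut(img, mask, rect, bgd, fgd, 5, cv2.GC_INIT_WITH_RECT)
    fg = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0)
    hist = cv2.calcHist([img], [0, 1, 2], fg.astype(np.uint8),
                        [8, 8, 8], [0, 256, 0, 256, 0, 256])
    return cv2.normalize(hist, hist).ravel()  # compare via cv2.compareHist
```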
Compressive Mining: Fast and Optimal Data Mining in the Compressed Domain
Real-world data typically contain repeated and periodic patterns. This
suggests that they can be effectively represented and compressed using only a
few coefficients of an appropriate basis (e.g., Fourier, Wavelets, etc.).
However, distance estimation when the data are represented using different sets
of coefficients is still a largely unexplored area. This work studies the
optimization problems related to obtaining the \emph{tightest} lower/upper
bound on Euclidean distances when each data object is potentially compressed
using a different set of orthonormal coefficients. Our technique leads to
tighter distance estimates, which translates into more accurate search,
learning and mining operations \textit{directly} in the compressed domain.
We formulate the problem of estimating lower/upper distance bounds as an
optimization problem. We establish the properties of optimal solutions, and
leverage the theoretical analysis to develop a fast algorithm to obtain an
\emph{exact} solution to the problem. The suggested solution provides the
tightest estimation of the $\ell_2$-norm or the correlation. We show that typical
data-analysis operations, such as k-NN search or k-Means clustering, can
operate more accurately using the proposed compression and distance
reconstruction technique. We compare it with many other prevalent compression
and reconstruction techniques, including random projections and PCA-based
techniques. We highlight a surprising result, namely that when the data are
highly sparse in some basis, our technique may even outperform PCA-based
compression.
The contributions of this work are generic, as our methodology is applicable
to any sequential or high-dimensional data as well as to any orthogonal data
transformation used for the underlying data compression scheme.
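To make the setting concrete, here is a minimal sketch of compressed-domain distance bounding under an orthonormal transform. It uses the simple triangle-inequality bounds (via Parseval's theorem) rather than the paper's optimal, tightest solution, and every name in it is our own:

```python
import numpy as np

def compress(x, k):
    """Keep the k largest-magnitude coefficients of an orthonormal DFT,
    plus the total energy of the discarded ones (Parseval)."""
    X = np.fft.fft(x, norm="ortho")
    keep = np.argsort(np.abs(X))[-k:]
    e = np.sum(np.abs(X) ** 2) - np.sum(np.abs(X[keep]) ** 2)
    return keep, X[keep], max(e, 0.0)

def distance_bounds(n, cx, cy):
    """Lower/upper bounds on the true Euclidean distance when x and y
    may keep *different* coefficient sets."""
    (ix, vx, ex), (iy, vy, ey) = cx, cy
    Xh, Yh = np.zeros(n, complex), np.zeros(n, complex)
    Xh[ix], Yh[iy] = vx, vy                # zero-filled reconstructions
    d = np.linalg.norm(Xh - Yh)
    slack = np.sqrt(ex) + np.sqrt(ey)      # ||x - x_hat|| = sqrt(ex), etc.
    return max(d - slack, 0.0), d + slack

x, y = np.random.randn(128), np.random.randn(128)
lb, ub = distance_bounds(128, compress(x, 16), compress(y, 16))
assert lb <= np.linalg.norm(x - y) <= ub
```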
Differentially Private Publication of Sparse Data
The problem of privately releasing data is to provide a version of a dataset
without revealing sensitive information about the individuals who contribute to
the data. The model of differential privacy allows such private release while
providing strong guarantees on the output. A basic mechanism achieves
differential privacy by adding noise to the frequency counts in the contingency
tables (or, a subset of the count data cube) derived from the dataset. However,
when the dataset is sparse in its underlying space, as is the case for most
multi-attribute relations, then the effect of adding noise is to vastly
increase the size of the published data: it implicitly creates a huge number of
dummy data points to mask the true data, making it almost impossible to work
with.
We present techniques to overcome this roadblock and allow efficient private
release of sparse data, while maintaining the guarantees of differential
privacy. Our approach is to release a compact summary of the noisy data.
Generating the noisy data and then summarizing it would still be very costly,
so we show how to shortcut this step, and instead directly generate the summary
from the input data, without materializing the vast intermediate noisy data. We
instantiate this outline for a variety of sampling and filtering methods, and
show how to use the resulting summary for approximate, private, query
answering. Our experimental study shows that this is an effective, practical
solution, with comparable and occasionally improved utility over the costly
materialization approach.
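To make the shortcut concrete, here is a minimal sketch of one filtering instantiation: noisy counts below a threshold t are dropped, and the surviving "dummy" cells among the untouched empty ones are sampled directly rather than materialized (for Laplace noise of scale 1/eps and t >= 0, an empty cell survives with probability 0.5*exp(-eps*t)). All names are ours, and this illustrates the idea rather than the paper's exact method:

```python
import numpy as np

def private_sparse_summary(counts, domain_size, eps, t):
    """Release a compact differentially private summary of a sparse
    histogram: noise the nonzero cells, then *sample* which of the many
    empty cells would have survived the threshold, without ever
    materializing them."""
    rng = np.random.default_rng()
    out = {}
    for cell, c in counts.items():          # nonzero cells: noise + filter
        noisy = c + rng.laplace(scale=1.0 / eps)
        if noisy > t:
            out[cell] = noisy
    # Each empty cell survives with p = P(Lap(1/eps) > t) = 0.5*exp(-eps*t)
    p = 0.5 * np.exp(-eps * t)
    n_empty = domain_size - len(counts)
    survivors = rng.binomial(n_empty, p)
    for cell in rng.choice(n_empty, size=survivors, replace=False):
        # A Laplace variable conditioned on exceeding t is t + Exponential
        out[("dummy", int(cell))] = t + rng.exponential(scale=1.0 / eps)
    return out
```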
Wavelet based similarity measurement algorithm for seafloor morphology
Thesis (S.M. in Naval Architecture and Marine Engineering and S.M. in Mechanical Engineering)--Massachusetts Institute of Technology, Dept. of Mechanical Engineering, 2006. Includes bibliographical references (leaves 71-73). By Ilkay Darilmaz.

The recent expansion of systematic seafloor exploration programs such as geophysical research, seafloor mapping, search and survey, resource assessment, and other scientific, commercial and military applications has created a need for rapid and robust methods of processing seafloor imagery. Given the existence of a large library of seafloor images, a fast automated image classifier is needed to determine changes in seabed morphology over time. The focus of this work is the development of a robust Similarity Measurement (SM) algorithm to address this problem. Our work uses a side-scan sonar image library for experimentation and testing. Variations in an underwater vehicle's height above the seafloor and in its pitch and roll angles distort the data obtained, so transformations to align the data must include rotation, translation, anisotropic scaling and skew. To deal with these problems, we propose to use the wavelet transform for similarity detection. Wavelets have been widely used in image processing over the last three decades. Since the wavelet transform allows a multi-resolution decomposition, it is easier to identify the similarities between two images by examining the energy distribution at each decomposition level. The energy distribution in the frequency domain at the output of the high-pass and low-pass filter banks identifies the texture discrimination. Our approach uses a statistical framework in which the wavelet coefficients are fitted to a generalized Gaussian density distribution. The Kullback-Leibler entropy metric is then used to measure the distance between wavelet coefficient distributions. To select the top N most likely matching images, the database images are ranked by minimum Kullback-Leibler distance. The statistical approach is effective in eliminating rotation, mis-registration and skew problems by working in the wavelet domain. It is recommended that further work focus on choosing the best wavelet packet to increase the robustness of the algorithm developed in this thesis.
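The statistical pipeline the thesis describes, fitting each wavelet subband to a generalized Gaussian and comparing images by Kullback-Leibler distance, can be sketched compactly. Moment matching and the closed-form divergence below follow Do and Vetterli's well-known texture-retrieval formulation; the wavelet, decomposition level and names are our assumptions:

```python
import numpy as np
import pywt
from scipy.special import gamma
from scipy.optimize import brentq

def fit_ggd(band):
    """Moment-matching estimate of a zero-mean generalized Gaussian's
    scale (alpha) and shape (beta) for one wavelet subband."""
    c = np.ravel(band)
    m1, m2 = np.mean(np.abs(c)), np.mean(c ** 2)
    r = m1 ** 2 / m2
    f = lambda b: gamma(2 / b) ** 2 / (gamma(1 / b) * gamma(3 / b)) - r
    beta = brentq(f, 0.05, 10.0)            # invert the moment ratio
    alpha = m1 * gamma(1 / beta) / gamma(2 / beta)
    return alpha, beta

def kld_ggd(p, q):
    """Closed-form Kullback-Leibler divergence between two zero-mean
    generalized Gaussian densities (Do & Vetterli, 2002)."""
    (a1, b1), (a2, b2) = p, q
    return (np.log((b1 * a2 * gamma(1 / b2)) / (b2 * a1 * gamma(1 / b1)))
            + (a1 / a2) ** b2 * gamma((b2 + 1) / b1) / gamma(1 / b1)
            - 1 / b1)

def signature(image, wavelet="db4", level=3):
    """One (alpha, beta) pair per detail subband of the image."""
    coeffs = pywt.wavedec2(image, wavelet, level=level)
    return [fit_ggd(band) for triple in coeffs[1:] for band in triple]

def distance(sig_a, sig_b):
    """Rank database images by the sum of per-subband divergences."""
    return sum(kld_ggd(p, q) for p, q in zip(sig_a, sig_b))
```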
Application of Wavelets and Principal Component Analysis in Image Query and Mammography
Breast cancer is currently one of the major causes of death for women in the U.S. Mammography is currently the most effective method for detecting breast cancer, and early detection has proven to be an efficient tool for reducing the number of deaths. Mammography is the most demanding of all clinical imaging applications, as it requires high contrast, high signal-to-noise ratio and high resolution with minimal x-radiation. According to studies [36], 10% to 30% of women having breast cancer and undergoing mammography have negative mammograms, i.e. are misdiagnosed. Furthermore, only 20%-40% of the women who undergo biopsy have cancer. Biopsies are expensive, invasive and traumatic to the patient. The high rate of false positives is partly because of the difficulties in the diagnosis process and partly due to the fear of missing a cancer. These facts motivate research aimed at enhancing mammogram images (e.g. by enhancing features such as clustered calcification regions, which have been found to be associated with breast cancer), at providing CAD (Computer-Aided Diagnostics) tools that can alert the radiologist to potentially malignant regions in the mammograms, and at developing tools for automated classification of mammograms into benign and malignant classes. In this paper we apply wavelet and Principal Component analysis, including the approximate Karhunen-Loève transform, to mammographic images to derive feature vectors used for classifying mammographic images from an early stage of malignancy. Another area where wavelet analysis has been found useful is image query. Image query of large databases must provide a fast and efficient search for the query image. Recently, a group of researchers developed an algorithm based on wavelet analysis that provides a fast and efficient search in large databases. Their method overcomes some of the difficulties associated with previous approaches, but the search algorithm is sensitive to displacement and rotation of the query image, because wavelet analysis is not invariant under displacement and rotation. In this study we propose the integration of the Hotelling transform to remedy this sensitivity, and we provide some experimental results in the context of standard alphabetic characters.
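A minimal sketch of the proposed Hotelling (principal-axes) normalization: rotate each image into a canonical orientation before the wavelet decomposition, so the wavelet signature no longer depends on the query image's rotation. The foreground threshold and all names are our assumptions:

```python
import numpy as np
from scipy import ndimage

def hotelling_align(img, frac=0.1):
    """Rotate an image to the orientation of its principal axes
    (the Hotelling transform), so the subsequent wavelet-based query
    is insensitive to the original rotation."""
    ys, xs = np.nonzero(img > frac * img.max())   # foreground pixels
    pts = np.stack([xs, ys]).astype(float)
    pts -= pts.mean(axis=1, keepdims=True)        # center on the centroid
    w, v = np.linalg.eigh(pts @ pts.T / pts.shape[1])
    # Angle of the major axis; the sign convention depends on the image
    # coordinate system, which is enough for a canonical pose.
    angle = np.degrees(np.arctan2(v[1, -1], v[0, -1]))
    return ndimage.rotate(img, angle, reshape=False)
```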
Fast retrieval of weather analogues in a multi-petabyte meteorological archive
The European Centre for Medium-Range Weather Forecasts (ECMWF) manages
the largest archive of meteorological data in the world. At the time of writing,
it holds around 300 petabytes and grows at a rate of 1 petabyte per week. This
archive is now mature, and contains valuable datasets such as several reanalyses,
providing a consistent view of the weather over several decades.
Weather analogue is the term used by meteorologists to refer to similar weather situations.
Looking for analogues in an archive using a brute force approach requires
data to be retrieved from tape and then compared to a user-provided weather
pattern, using a chosen similarity measure. Such an operation would be slow and
costly.
In this work, a wavelet-based fingerprinting scheme is proposed to index all weather
patterns from the archive, over a selected geographical domain. The system answers
search queries by computing the fingerprint of the query pattern and looking
for close matches in the index. Searches are fast enough to be perceived as
instantaneous.
A web-based application is provided, allowing users to express their queries interactively
in a friendly and straightforward manner by sketching weather patterns
directly in their web browser. Matching results are then presented as a series of
weather maps, labelled with the date and time at which they occur.
The system has been deployed as part of the Copernicus Climate Data Store and
allows the retrieval of weather analogues from ERA5, a 40-year hourly reanalysis
dataset.
Some preliminary results of this work have been presented at the International
Conference on Computational Science 2018 (Raoult et al., 2018).
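The fingerprinting idea can be sketched in a few lines: reduce each archived field over the chosen domain to a tiny vector of coarse wavelet coefficients, index those vectors, and answer a query by nearest-neighbour search on the fingerprints alone. The wavelet choice, level and names below are our assumptions, not ECMWF's implementation:

```python
import numpy as np
import pywt

def fingerprint(field, level=3):
    """Compact fingerprint of a 2-D weather field: keep only the
    coarsest wavelet approximation over the selected domain."""
    coeffs = pywt.wavedec2(np.asarray(field, float), "haar", level=level)
    return coeffs[0].ravel()

def build_index(archive):
    """archive: iterable of (timestamp, 2-D field). Run once, offline."""
    stamps, prints = zip(*((t, fingerprint(f)) for t, f in archive))
    return list(stamps), np.stack(prints)

def search(index, sketch, k=10):
    """Return the k timestamps whose fingerprints best match the
    user-drawn pattern (Euclidean distance; brute force is fine at
    fingerprint size, a KD-tree would scale further)."""
    stamps, prints = index
    d = np.linalg.norm(prints - fingerprint(sketch), axis=1)
    return [stamps[i] for i in np.argsort(d)[:k]]
```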