19 research outputs found

    Burrows‐Wheeler post‐transformation with effective clustering and interpolative coding

    Lossless compression methods based on the Burrows‐Wheeler transform (BWT) are regarded as an excellent compromise between speed and compression efficiency: they provide compression rates close to the PPM algorithms, with the speed of dictionary‐based methods. Instead of the laborious statistics‐gathering process used in PPM, the BWT reversibly sorts the input symbols, using as the sort key as many following characters as necessary to make the sort unique. Characters occurring in similar contexts are sorted close together, resulting in a clustered symbol sequence. Run‐length encoding and Move‐to‐Front (MTF) recoding, combined with a statistical Huffman or arithmetic coder, are then typically used to exploit the clustering. A drawback of MTF recoding is that knowledge of the character that produced each MTF number is lost. In this paper, we present a new, competitive Burrows‐Wheeler post‐transform stage that takes advantage of interpolative coding, a fast binary encoding method for integer sequences that can exploit clusters without requiring explicit statistics. We introduce a fast and simple way to retain knowledge of the run characters during MTF recoding, and use this to improve the clustering of MTF numbers and run‐lengths by applying a reversible, stable sort with the run characters as sort keys. Experiments on common text corpora show a significant improvement in the compression rate.
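    To make the MTF step concrete, here is a minimal sketch of plain Move‐to‐Front recoding (a standard textbook version, not the post‐transform stage proposed in the paper): clustered input symbols from the BWT map to runs of small integers, which the later entropy‐coding stage can exploit.

```python
def mtf_encode(data, alphabet):
    # Move-to-Front: emit each symbol's current table position,
    # then move that symbol to the front of the table.
    table = list(alphabet)
    out = []
    for ch in data:
        i = table.index(ch)
        out.append(i)
        table.insert(0, table.pop(i))
    return out

def mtf_decode(codes, alphabet):
    # Inverse transform: replay the same table updates.
    table = list(alphabet)
    out = []
    for i in codes:
        ch = table[i]
        out.append(ch)
        table.insert(0, table.pop(i))
    return "".join(out)
```

    Clustered input such as "aaabbb" encodes to mostly zeros ([0, 0, 0, 1, 0, 0]), which illustrates why MTF output pairs well with a statistical coder.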

    Automatic detection of cereal rows by means of pattern recognition techniques

    Automatic locating of weeds in fields is an active research topic in precision agriculture. A reliable and practical plant identification technique would enable a reduction in herbicide amounts and lower production costs, along with reducing damage to the ecosystem. When the seeds have been sown row-wise, most weeds can be located between the sowing rows. The present work describes a clustering-based method for recognizing plantlet rows in a set of aerial photographs taken by a drone flying at approximately ten meters. The algorithm includes three phases: segmentation of green objects in the view, feature extraction, and clustering of plants into individual rows. Segmentation separates the plants from the background. The main feature to be extracted is the center of gravity of each plant segment. A tentative clustering is obtained piecewise by applying the 2D Fourier transform to image blocks to obtain the direction of and distance between the rows. The precise sowing line position is finally derived by principal component analysis. The method was able to find the rows in photographs of size 1452 x 969 pixels in approximately 0.11 s, with an accuracy of 94 per cent.
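    The final PCA step can be sketched as fitting the dominant direction of the plant centroids in one row cluster; the leading eigenvector of the covariance matrix gives the sowing line's direction. This is an illustrative sketch under that reading of the abstract, not the authors' implementation.

```python
import numpy as np

def principal_axis(points):
    # Fit the dominant direction of a 2D point cloud (e.g. plant
    # centroids assigned to one sowing row) via PCA: the row line
    # passes through the mean, along the leading eigenvector.
    pts = np.asarray(points, dtype=float)
    mean = pts.mean(axis=0)
    cov = np.cov((pts - mean).T)
    eigvals, eigvecs = np.linalg.eigh(cov)  # ascending eigenvalues
    direction = eigvecs[:, np.argmax(eigvals)]  # unit vector along the row
    return mean, direction
```

    For centroids lying near a straight sowing line, the returned direction is the line's slope up to sign; outlier plants (weeds between rows) would be filtered out before this fit.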

    Clustering of Shared Subobjects in Databases

    The topic of this article is multi-criterion, structure-based clustering in object-oriented databases. We study an object class which is the target (subobject) of several multi-valued reference types from other object classes. The aim is to serve all access paths fairly, so that the number of page accesses is proportional to the number of referenced occurrences of the subobject class. An efficient heuristic algorithm is developed for inserting new subobjects into an existing page set. Significant benefits can be obtained in read-intensive applications if the confluent references are semantically correlated.

    Keywords: Clustering, Page allocation, Object-oriented databases

    1 Introduction
    Clustering of objects (records) is necessary for effective database processing, as long as they are stored on conventional disks with slow random access. Three general kinds of clustering can be distinguished: 1. Content-based clustering: Objects sharing the same value for a certain attribute are placed..
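    A greedy placement heuristic in this spirit can be sketched as follows: place a new subobject on the page that already holds most of the subobjects it is co-referenced with, provided that page has room. All names and the capacity model are illustrative assumptions, not the paper's actual algorithm.

```python
from collections import Counter

def choose_page(coreference_sets, page_of, capacity, page_load):
    # Greedy sketch: each set in coreference_sets holds subobjects that
    # are referenced together with the new subobject. Vote for the pages
    # already storing those subobjects; pick the most popular page with
    # free space, else open a fresh page.
    votes = Counter()
    for refs in coreference_sets:
        for obj in refs:
            if obj in page_of:
                votes[page_of[obj]] += 1
    for page, _ in votes.most_common():
        if page_load.get(page, 0) < capacity:
            return page
    return max(page_load, default=0) + 1  # allocate a new page
```

    When the confluent references are correlated, co-referenced subobjects accumulate on shared pages, so each access path touches roughly as many pages as it has referenced occurrences.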

    Probabilistic Iterative Expansion of Candidates in Mining Frequent Itemsets

    A simple new algorithm is suggested for frequent itemset mining, using item probabilities as the basis for generating candidates. The method first finds all the frequent items, and then generates an estimate of the frequent sets, assuming item independence. The candidates are stored in a trie where each path from the root to a node represents one candidate itemset. The method expands the trie iteratively until all frequent itemsets are found. Expansion is based on scanning through the data set in each iteration cycle and extending the subtries based on observed node frequencies. Trie probing can be restricted to only those nodes which possibly need extension. The number of candidates is usually quite moderate: for dense datasets 2-4 times the number of final frequent itemsets, for non-dense sets somewhat more. In practical experiments the method has been observed to make clearly fewer passes than the well-known Apriori method. As for speed, our non-optimised implementation is in some cases faster and in others slower than the comparison methods.
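    The independence-based seeding step can be sketched as follows: estimate a pair's support as n · p(a) · p(b) from the individual item frequencies, and keep pairs whose estimate reaches the support threshold. This is a simplified illustration of the idea (pairs only, no trie), with names chosen here rather than taken from the paper.

```python
from itertools import combinations

def initial_candidates(transactions, minsup):
    # Seed candidate generation under the item-independence assumption:
    # estimated support of {a, b} is n * p(a) * p(b).
    n = len(transactions)
    counts = {}
    for t in transactions:
        for item in t:
            counts[item] = counts.get(item, 0) + 1
    # Keep frequent items and their empirical probabilities.
    frequent = {i: c / n for i, c in counts.items() if c >= minsup}
    cands = []
    for a, b in combinations(sorted(frequent), 2):
        if n * frequent[a] * frequent[b] >= minsup:
            cands.append((a, b))
    return sorted(frequent), cands
```

    A subsequent data-set scan would then count the true supports of these candidates and extend only those trie nodes whose observed frequency warrants it.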