4,059 research outputs found

    k-Nearest Neighbour Classifiers: 2nd Edition (with Python examples)

    Get PDF
    Perhaps the most straightforward classifier in the arsenal or machine learning techniques is the Nearest Neighbour Classifier -- classification is achieved by identifying the nearest neighbours to a query example and using those neighbours to determine the class of the query. This approach to classification is of particular importance because issues of poor run-time performance is not such a problem these days with the computational power that is available. This paper presents an overview of techniques for Nearest Neighbour classification focusing on; mechanisms for assessing similarity (distance), computational issues in identifying nearest neighbours and mechanisms for reducing the dimension of the data. This paper is the second edition of a paper previously published as a technical report. Sections on similarity measures for time-series, retrieval speed-up and intrinsic dimensionality have been added. An Appendix is included providing access to Python code for the key methods.Comment: 22 pages, 15 figures: An updated edition of an older tutorial on kN

    Optimization of Tree Modes for Parallel Hash Functions: A Case Study

    Full text link
    This paper focuses on parallel hash functions based on tree modes of operation for an inner Variable-Input-Length function. This inner function can be either a single-block-length (SBL) and prefix-free MD hash function, or a sponge-based hash function. We discuss the various forms of optimality that can be obtained when designing parallel hash functions based on trees where all leaves have the same depth. The first result is a scheme which optimizes the tree topology in order to decrease the running time. Then, without affecting the optimal running time we show that we can slightly change the corresponding tree topology so as to minimize the number of required processors as well. Consequently, the resulting scheme decreases in the first place the running time and in the second place the number of required processors.Comment: Preprint version. Added citations, IEEE Transactions on Computers, 201

    Sharing large data collections between mobile peers

    Get PDF
    New directions in the provision of end-user computing experiences mean that we need to determine the best way to share data between small mobile computing devices. Partitioning large structures so that they can be shared efficiently provides a basis for data-intensive applications on such platforms. In conjunction with such an approach, dictionary-based compression techniques provide additional benefits and help to prolong battery life

    Designing a resource-efficient data structure for mobile data systems

    Get PDF
    Designing data structures for use in mobile devices requires attention on optimising data volumes with associated benefits for data transmission, storage space and battery use. For semi-structured data, tree summarisation techniques can be used to reduce the volume of structured elements while dictionary compression can efficiently deal with value-based predicates. This project seeks to investigate and evaluate an integration of the two approaches. The key strength of this technique is that both structural and value predicates could be resolved within one graph while further allowing for compression of the resulting data structure. As the current trend is towards the requirement for working with larger semi-structured data sets this work would allow for the utilisation of much larger data sets whilst reducing requirements on bandwidth and minimising the memory necessary both for the storage and querying of the data

    Efficient classification using parallel and scalable compressed model and Its application on intrusion detection

    Full text link
    In order to achieve high efficiency of classification in intrusion detection, a compressed model is proposed in this paper which combines horizontal compression with vertical compression. OneR is utilized as horizontal com-pression for attribute reduction, and affinity propagation is employed as vertical compression to select small representative exemplars from large training data. As to be able to computationally compress the larger volume of training data with scalability, MapReduce based parallelization approach is then implemented and evaluated for each step of the model compression process abovementioned, on which common but efficient classification methods can be directly used. Experimental application study on two publicly available datasets of intrusion detection, KDD99 and CMDC2012, demonstrates that the classification using the compressed model proposed can effectively speed up the detection procedure at up to 184 times, most importantly at the cost of a minimal accuracy difference with less than 1% on average

    Efficient data representation for XML in peer-based systems

    Get PDF
    Purpose - New directions in the provision of end-user computing experiences mean that the best way to share data between small mobile computing devices needs to be determined. Partitioning large structures so that they can be shared efficiently provides a basis for data-intensive applications on such platforms. The partitioned structure can be compressed using dictionary-based approaches and then directly queried without firstly decompressing the whole structure. Design/methodology/approach - The paper describes an architecture for partitioning XML into structural and dictionary elements and the subsequent manipulation of the dictionary elements to make the best use of available space. Findings - The results indicate that considerable savings are available by removing duplicate dictionaries. The paper also identifies the most effective strategy for defining dictionary scope. Research limitations/implications - This evaluation is based on a range of benchmark XML structures and the approach to minimising dictionary size shows benefit in the majority of these. Where structures are small and regular, the benefits of efficient dictionary representation are lost. The authors' future research now focuses on heuristics for further partitioning of structural elements. Practical implications - Mobile applications that need access to large data collections will benefit from the findings of this research. Traditional client/server architectures are not suited to dealing with high volume demands from a multitude of small mobile devices. Peer data sharing provides a more scalable solution and the experiments that the paper describes demonstrate the most effective way of sharing data in this context. Social implications - Many services are available via smartphone devices but users are wary of exploiting the full potential because of the need to conserve battery power. The approach mitigates this challenge and consequently expands the potential for users to benefit from mobile information systems. This will have impact in areas such as advertising, entertainment and education but will depend on the acceptability of file sharing being extended from the desktop to the mobile environment. Originality/value - The original work characterises the most effective way of sharing large data sets between small mobile devices. This will save battery power on devices such as smartphones, thus providing benefits to users of such devices

    A novel semi-fragile forensic watermarking scheme for remote sensing images

    Get PDF
    Peer-reviewedA semi-fragile watermarking scheme for multiple band images is presented. We propose to embed a mark into remote sensing images applying a tree structured vector quantization approach to the pixel signatures, instead of processing each band separately. The signature of themmultispectral or hyperspectral image is used to embed the mark in it order to detect any significant modification of the original image. The image is segmented into threedimensional blocks and a tree structured vector quantizer is built for each block. These trees are manipulated using an iterative algorithm until the resulting block satisfies a required criterion which establishes the embedded mark. The method is shown to be able to preserve the mark under lossy compression (above a given threshold) but, at the same time, it detects possibly forged blocks and their position in the whole image.Se presenta un esquema de marcas de agua semi-frĂĄgiles para mĂșltiples imĂĄgenes de banda. Proponemos incorporar una marca en imĂĄgenes de detecciĂłn remota, aplicando un enfoque de cuantizaciĂłn del vector de ĂĄrbol estructurado con las definiciones de pĂ­xel, en lugar de procesar cada banda por separado. La firma de la imagen hiperespectral se utiliza para insertar la marca en el mismo orden para detectar cualquier modificaciĂłn significativa de la imagen original. La imagen es segmentada en bloques tridimensionales y un cuantificador de vector de estructura de ĂĄrbol se construye para cada bloque. Estos ĂĄrboles son manipulados utilizando un algoritmo iteractivo hasta que el bloque resultante satisface un criterio necesario que establece la marca incrustada. El mĂ©todo se muestra para poder preservar la marca bajo compresiĂłn con pĂ©rdida (por encima de un umbral establecido) pero, al mismo tiempo, detecta posiblemente bloques forjados y su posiciĂłn en la imagen entera.Es presenta un esquema de marques d'aigua semi-frĂ gils per a mĂșltiples imatges de banda. Proposem incorporar una marca en imatges de detecciĂł remota, aplicant un enfocament de quantitzaciĂł del vector d'arbre estructurat amb les definicions de pĂ­xel, en lloc de processar cada banda per separat. La signatura de la imatge hiperespectral s'utilitza per inserir la marca en el mateix ordre per detectar qualsevol modificaciĂł significativa de la imatge original. La imatge Ă©s segmentada en blocs tridimensionals i un quantificador de vector d'estructura d'arbre es construeix per a cada bloc. Aquests arbres sĂłn manipulats utilitzant un algoritme iteractiu fins que el bloc resultant satisfĂ  un criteri necessari que estableix la marca incrustada. El mĂštode es mostra per poder preservar la marca sota compressiĂł amb pĂšrdua (per sobre d'un llindar establert) perĂČ, al mateix temps, detecta possiblement blocs forjats i la seva posiciĂł en la imatge sencera
    • 

    corecore