3,053 research outputs found

A binary neural k-nearest neighbour technique

Author: Austin J.
Hodge V.J.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 02/02/2005

K-Nearest Neighbour (k-NN) is a widely used technique for classifying and clustering data. k-NN is effective but is often criticised for its polynomial run-time growth, because it calculates the distance to every other record in the data set for each record in turn. This paper evaluates a novel k-NN classifier, built from binary neural networks, with linear growth and faster run-time. The binary neural approach uses robust encoding to map standard ordinal, categorical and real-valued data sets onto a binary neural network. The binary neural network uses high-speed pattern matching to recall the k best matches. We compare various configurations of the binary approach to a conventional approach for memory overheads, training speed, retrieval speed and retrieval accuracy. We demonstrate the superior performance of the binary approach over the standard approach with respect to speed and memory requirements, and we pinpoint the optimal configurations.
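For orientation, the conventional k-NN approach that the abstract criticises can be sketched as follows. This is a minimal illustration of the brute-force baseline, not the binary neural method evaluated in the paper; the function name `knn_classify`, the Euclidean metric, and the toy data are assumptions made here for the example.

```python
import heapq
import math
from collections import Counter


def knn_classify(train, labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training records.

    Hypothetical helper for illustration: Euclidean distance is assumed.
    """
    # The query is compared against every training record, so one lookup
    # costs O(n); classifying all n records this way costs O(n^2).
    distances = [(math.dist(record, query), label)
                 for record, label in zip(train, labels)]
    nearest = heapq.nsmallest(k, distances, key=lambda pair: pair[0])
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data, invented for this sketch.
train = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.9)]
labels = ["a", "a", "b", "b"]
prediction = knn_classify(train, labels, (1.1, 0.9), k=3)
```

The full scan over `train` inside `knn_classify` is the step whose cost grows with the data set and which the paper's binary approach aims to avoid.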

White Rose Research Online

A systematic review of data quality issues in knowledge discovery tasks

Author: Corrales David Camilo
Corrales Juan Carlos
Ledezma Agapito Ismael
Publication venue: 'Universidad de Medellin'
Publication date: 07/11/2015

Data volumes are growing rapidly because organizations continuously capture data to support better decision-making. The most fundamental challenge is to explore these large volumes of data and extract useful knowledge for future actions through knowledge discovery tasks; however, much of the data is of poor quality. We present a systematic review of data quality issues in knowledge discovery tasks and a case study applied to the agricultural disease known as coffee rust.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

Universidad de Medellín: Revistas Científicas

Repositorio Institucional Universidad de Medellín

DIALNET

Weighted Visibility Graph with Complex Network Features in the Detection of Epilepsy

Author: Cao J
Siuly Siuly
Supriya Supriya
Wang Hua
Zhang Yanchun
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2016

Crossref

Victoria University Eprints Repository

SAFE: Self-Attentive Function Embeddings for Binary Similarity

Author: Baldoni Roberto
Di Luna Giuseppe Antonio
Massarelli Luca
Petroni Fabio
Querzoni Leonardo
Publication venue
Publication date: 01/01/2019

The binary similarity problem consists of determining whether two functions are similar by considering only their compiled form. Advanced techniques for binary similarity have recently gained momentum because they can be applied in several fields, such as copyright disputes, malware analysis, and vulnerability detection, and thus have an immediate practical impact. Current solutions compare functions by first transforming their binary code into multi-dimensional vector representations (embeddings), and then comparing the vectors through simple and efficient geometric operations. However, embeddings are usually derived from binary code using manual feature extraction, which may fail to consider important function characteristics, or may consider features that are not important for the binary similarity problem. In this paper we propose SAFE, a novel architecture for the embedding of functions based on a self-attentive neural network. SAFE works directly on disassembled binary functions, does not require manual feature extraction, is computationally more efficient than existing solutions (i.e., it does not incur the computational overhead of building or manipulating control flow graphs), and is more general, as it works on stripped binaries and on multiple architectures. We report the results of a quantitative and qualitative analysis showing that SAFE provides a noticeable performance improvement over previous solutions. Furthermore, we show that clusters of our embedding vectors are closely related to the semantics of the implemented algorithms, paving the way for further interesting applications (e.g. semantic-based binary function search). Comment: Published in International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA) 201
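The "simple and efficient geometric operations" used to compare embeddings can be illustrated with a short sketch. The abstract does not name the operation, so cosine similarity is an assumption here, and the three vectors are invented stand-ins rather than real SAFE output.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical embeddings: emb_a and emb_b stand for two compilations of the
# same function, emb_c for an unrelated function.
emb_a = [0.9, 0.1, 0.3]
emb_b = [0.8, 0.2, 0.35]
emb_c = [-0.5, 0.9, -0.1]

same_function_score = cosine_similarity(emb_a, emb_b)
other_function_score = cosine_similarity(emb_a, emb_c)
```

Under this assumption, a pair of functions would be judged similar when the score exceeds a threshold chosen on validation data; with the made-up vectors above, `same_function_score` is higher than `other_function_score`.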

arXiv.org e-Print Archive

Crossref

Archivio della ricerca- Università di Roma La Sapienza

Fast and Efficient Classification, Tracking, and Simulation in Wireless Sensor Networks

Author: Jiang Hao
Publication venue: Clemson University Libraries
Publication date: 01/08/2012

Wireless sensor networks are composed of large numbers of resource-lean sensors that collect low-level inputs from the physical world. The applications present challenges for programmers. On the one hand, lightweight algorithms are required given the limited capacity of the constituent devices. On the other, the algorithms must be scalable to accommodate large networks. In this thesis, we focus on the design and implementation of fast and lean (yet scalable) algorithms for classification, simulation, and target tracking in the context of wireless sensor networks. We briefly consider each of these challenges in turn. The first challenge is to achieve high precision classification of high-level events in-network using limited computational and energy resources. We present in-network implementations of a Bayesian classifier and a condensed kd-tree classifier for identifying events of interest on resource-lean embedded sensors. The first approach uses preprocessed sensor readings to derive a multi-dimensional Bayesian classifier used to classify sensor data in real-time. The second introduces an innovative condensed kd-tree to represent preprocessed sensor data and uses a fast nearest-neighbor search to determine the likelihood of class membership for incoming samples. Both classifiers consume limited resources and provide high precision classification. To evaluate each approach, two case studies are considered, in the contexts of human movement and vehicle navigation, respectively. The classification accuracy is above 85% for both classifiers across the two case studies. The second challenge is to achieve high performance parallel simulation of sensor network hardware. This is achieved by reducing the synchronization overhead among distributed simulation processes. Traditional parallel simulation strategies introduce significant synchronization overhead, reducing the simulation speed. 
We present an optimistic simulation algorithm with support for backtracking and re-execution. The algorithm reduces the number of synchronization cycles to the number of transmissions in the network under test. Concretely, we implement SnapSim, an extension to the popular Avrora simulator, based on this algorithm. The experimental results show that our prototype system improves the performance of Avrora by 2 to 10 times for typical network-centric sensor network applications, and by up to three orders of magnitude for applications that use the radio infrequently. The third challenge is to efficiently track a moving target in a network. The difficulty again lies in the conflict between the limited resource capacity of typical sensors and the significant processing requirements of typical tracking algorithms. We introduce an in-network object tracking framework for tracking mobile objects using resource-lean sensors. The framework is based on a distributed, dynamically scoped tracking algorithm which adaptively scopes the event detection region based on object speed. A leader node records the samples across an event region (without the aid of time synchronization) and estimates the object's location in situ. To minimize the number of radio transmissions, the location snapshotting rate is also adjusted based on the object speed. In this dissertation, focusing on the above challenges, we present the design, implementation, and evaluation of classification, simulation, and tracking contributions.
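The kd-tree classifier mentioned in the abstract relies on a fast nearest-neighbor search. The sketch below shows a standard (uncondensed) kd-tree with pruning, not the thesis's condensed variant, whose construction the abstract does not describe; the function names, the dict-based node layout, and the sample readings and labels are all assumptions made for illustration.

```python
import math


def build_kdtree(points, depth=0):
    """Build a kd-tree (nested dicts) from a list of (coords, label) pairs."""
    if not points:
        return None
    axis = depth % len(points[0][0])  # cycle through the dimensions
    points = sorted(points, key=lambda p: p[0][axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }


def nearest(node, query, best=None):
    """Return (distance, (coords, label)) for the stored point closest to query."""
    if node is None:
        return best
    coords, _ = node["point"]
    d = math.dist(coords, query)
    if best is None or d < best[0]:
        best = (d, node["point"])
    diff = query[node["axis"]] - coords[node["axis"]]
    near, far = ((node["left"], node["right"]) if diff < 0
                 else (node["right"], node["left"]))
    best = nearest(near, query, best)
    # Visit the far subtree only if the splitting plane is closer than the
    # best match so far; this pruning is what makes the search fast.
    if abs(diff) < best[0]:
        best = nearest(far, query, best)
    return best


# Hypothetical preprocessed sensor readings with invented class labels.
samples = [((0.1, 0.2), "walking"), ((0.9, 0.8), "running"),
           ((0.15, 0.25), "walking"), ((0.85, 0.9), "running")]
tree = build_kdtree(samples)
distance, (coords, label) = nearest(tree, (0.2, 0.2))
```

Here the incoming sample takes the label of its single nearest stored reading; a likelihood of class membership, as the abstract describes, would need additional per-class information that this sketch does not model.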

Clemson University: TigerPrints