k-Nearest Neighbour Classifiers: 2nd Edition (with Python examples)
Perhaps the most straightforward classifier in the arsenal of machine
learning techniques is the Nearest Neighbour Classifier -- classification is
achieved by identifying the nearest neighbours to a query example and using
those neighbours to determine the class of the query. This approach to
classification is of particular importance because poor run-time
performance is not such a problem these days with the computational power that
is available. This paper presents an overview of techniques for Nearest
Neighbour classification, focusing on mechanisms for assessing similarity
(distance), computational issues in identifying nearest neighbours and
mechanisms for reducing the dimension of the data.
This paper is the second edition of a paper previously published as a
technical report. Sections on similarity measures for time-series, retrieval
speed-up and intrinsic dimensionality have been added. An Appendix is included
providing access to Python code for the key methods.
Comment: 22 pages, 15 figures: An updated edition of an older tutorial on kN
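The classification rule described in this abstract (find the nearest neighbours of a query, then let them determine its class) can be sketched in a few lines of Python. This is a minimal illustration, not the code from the paper's appendix: the function name `knn_classify`, the choice of Euclidean distance, the majority vote, and the toy data are all assumptions made here.

```python
import math
from collections import Counter


def knn_classify(query, examples, labels, k=3):
    """Return the majority label among the k examples closest to `query`.

    Assumptions for this sketch: Euclidean distance and an unweighted vote.
    """
    # Distance from the query to every stored example (brute-force search).
    distances = [math.dist(query, x) for x in examples]
    # Indices of the k smallest distances.
    nearest = sorted(range(len(examples)), key=distances.__getitem__)[:k]
    # Unweighted majority vote over the neighbours' labels.
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters.
train_x = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = ["a", "a", "a", "b", "b", "b"]

print(knn_classify((1.1, 1.0), train_x, train_y, k=3))
```

The brute-force scan over all examples is the run-time cost the abstract refers to; the retrieval speed-up techniques it mentions would replace that scan.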
Weyl Spreading Sequence Optimizing CDMA
This paper shows an optimal spreading sequence in the Weyl sequence class,
which is similar to the set of the Oppermann sequences for asynchronous CDMA
systems. Sequences in the Weyl sequence class have the desired property that the
order of cross-correlation is low. Therefore, sequences in the Weyl sequence
class are expected to minimize the inter-symbol interference. We evaluate the
upper bound of cross-correlation and odd cross-correlation of spreading
sequences in the Weyl sequence class and construct the optimization problem:
minimize the upper bound of the absolute values of cross-correlation and odd
cross-correlation. Since our optimization problem is convex, we can derive the
optimal spreading sequences as the global solution of the problem. We show
their signal to interference plus noise ratio (SINR) in a special case. From
this result, we propose how the initial elements are assigned, that is, how
spreading sequences are assigned to each user. In an asynchronous CDMA system,
we also numerically compare the bit error rate of our spreading sequences with
that of the Gold codes, the Oppermann sequences, the optimal Chebyshev
spreading sequences and the SP sequences. Our spreading sequence, which yields
the global solution, achieves the best performance among the spreading
sequences tested.
Efficient Constellation-Based Map-Merging for Semantic SLAM
Data association in SLAM is fundamentally challenging, and handling ambiguity
well is crucial to achieve robust operation in real-world environments. When
ambiguous measurements arise, conservatism often mandates that the measurement
is discarded or a new landmark is initialized rather than risking an incorrect
association. To address the inevitable `duplicate' landmarks that arise, we
present an efficient map-merging framework to detect duplicate constellations
of landmarks, providing a high-confidence loop-closure mechanism well-suited
for object-level SLAM. This approach uses an incrementally-computable
approximation of landmark uncertainty that only depends on local information in
the SLAM graph, avoiding expensive recovery of the full system covariance
matrix. This enables a search based on geometric consistency (GC) (rather than
full joint compatibility (JC)) that inexpensively reduces the search space to a
handful of `best' hypotheses. Furthermore, we reformulate the commonly-used
interpretation tree to allow for more efficient integration of clique-based
pairwise compatibility, accelerating the branch-and-bound max-cardinality
search. Our method is demonstrated to match the performance of full JC methods
at significantly-reduced computational cost, facilitating robust object-based
loop-closure over large SLAM problems.
Comment: Accepted to IEEE International Conference on Robotics and Automation
(ICRA) 201
Random Access for Machine-Type Communication based on Bloom Filtering
We present a random access method inspired by Bloom filters that is suited
for Machine-Type Communications (MTC). Each accessing device sends a
\emph{signature} during the contention process. A signature is constructed
using the Bloom filtering method and contains information on the device
identity and the connection establishment cause. We instantiate the proposed
method over the current LTE-A access protocol. However, the method is
applicable to a more general class of random access protocols that use
preambles or other reservation sequences, as expected to be the case in 5G
systems. We show that our method utilizes the system resources more efficiently
and achieves significantly lower connection establishment latency in case of
synchronous arrivals, compared to the variant of the LTE-A access protocol that
is optimized for MTC traffic. A dividend of the proposed method is that it
allows the base station (BS) to acquire the device identity and the connection
establishment cause already in the initial phase of the connection
establishment, thereby enabling their differentiated treatment by the BS.
Comment: Accepted for presentation at IEEE Globecom 201
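As a rough illustration of the idea of a Bloom-filter signature carrying a device identity and an establishment cause, the sketch below builds a generic Bloom filter in Python. It is not the paper's construction over LTE-A preambles, which the abstract does not detail: the filter length `m`, the number of hash functions `k`, the salted SHA-256 hashing, and all function names are assumptions made here.

```python
import hashlib


def bloom_positions(item: str, m: int, k: int) -> list[int]:
    """Map `item` to k bit positions in [0, m) using salted SHA-256 digests."""
    positions = []
    for salt in range(k):
        digest = hashlib.sha256(f"{salt}:{item}".encode()).digest()
        positions.append(int.from_bytes(digest[:8], "big") % m)
    return positions


def make_signature(device_id: str, cause: str, m: int = 64, k: int = 3) -> int:
    """Build an m-bit signature (as an int) containing the identity and the cause."""
    bits = 0
    for item in (f"id={device_id}", f"cause={cause}"):
        for pos in bloom_positions(item, m, k):
            bits |= 1 << pos
    return bits


def may_contain(signature: int, item: str, m: int = 64, k: int = 3) -> bool:
    """True if every bit for `item` is set; false positives are possible."""
    return all((signature >> pos) & 1 for pos in bloom_positions(item, m, k))


# Hypothetical device and cause values, for illustration only.
sig = make_signature("device-17", "alarm")
print(may_contain(sig, "id=device-17"), may_contain(sig, "cause=alarm"))
```

A receiver testing candidate identities against the signature never misses an inserted item, but it may accept one that was not inserted; that false-positive rate depends on `m` and `k`.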
Using TEI for an Endangered Language Lexical Resource: The Nxaʔamxcín Database-Dictionary Project
This paper describes the evolution of a lexical resource project for Nxaʔamxcín, an endangered Salish language, from the project’s inception in the 1990s, based on legacy materials recorded in the 1960s and 1970s, to its current form as an online database that is transformable into various print and web-based formats for varying uses. We illustrate how we are using TEI P5 for data-encoding and archiving and show that TEI is a mature, reliable, flexible standard which is a valuable tool for lexical and morphological markup and for the production of lexical resources. Lexical resource creation, as is the case with language documentation and description more generally, benefits from portability and thus from conformance to standards (Bird and Simons 2003, Thieberger 2011). This paper therefore also discusses standards-harmonization, focusing on our attempt to achieve interoperability in format and terminology between our database and standards proposed for LMF, RELISH and GOLD. We show that, while it is possible to achieve interoperability, ultimately it is difficult to do so convincingly, thus raising questions about what conformance to standards means in practice.
National Foreign Language Resource Cente
k-Nearest Neighbour Classifiers - A Tutorial
Perhaps the most straightforward classifier in the arsenal of Machine Learning techniques is the Nearest Neighbour Classifier – classification is achieved by identifying the nearest neighbours to a query example and using those neighbours to determine the class of the query. This approach to classification is of particular importance because poor run-time performance is not such a problem these days with the computational power that is available. This paper presents an overview of techniques for Nearest Neighbour classification, focusing on mechanisms for assessing similarity (distance), computational issues in identifying nearest neighbours and mechanisms for reducing the dimension of the data. This paper is the second edition of a paper previously published as a technical report. Sections on similarity measures for time-series, retrieval speed-up and intrinsic dimensionality have been added. An Appendix is included providing access to Python code for the key methods.