8,196 research outputs found
A comparison of standard spell checking algorithms and a novel binary neural approach
In this paper, we propose a simple, flexible, and efficient hybrid spell checking methodology based upon phonetic matching, supervised learning, and associative matching in the AURA neural system. We integrate Hamming Distance and n-gram algorithms that have high recall for typing errors and a phonetic spell-checking algorithm in a single novel architecture. Our approach is suitable for any spell checking application though aimed toward isolated word error correction, particularly spell checking user queries in a search engine. We use a novel scoring scheme to integrate the retrieved words from each spelling approach and calculate an overall score for each matched word. From the overall scores, we can rank the possible matches. In this paper, we evaluate our approach against several benchmark spellchecking algorithms for recall accuracy. Our proposed hybrid methodology has the highest recall rate of the techniques evaluated. The method has a high recall rate and low-computational cost
Artificial Intelligence Approach to the Determination of Physical Properties of Eclipsing Binaries. I. The EBAI Project
Achieving maximum scientific results from the overwhelming volume of
astronomical data to be acquired over the next few decades will demand novel,
fully automatic methods of data analysis. Artificial intelligence approaches
hold great promise in contributing to this goal. Here we apply neural network
learning technology to the specific domain of eclipsing binary (EB) stars, of
which only some hundreds have been rigorously analyzed, but whose numbers will
reach millions in a decade. Well-analyzed EBs are a prime source of
astrophysical information whose growth rate is at present limited by the need
for human interaction with each EB data-set, principally in determining a
starting solution for subsequent rigorous analysis. We describe the artificial
neural network (ANN) approach which is able to surmount this human bottleneck
and permit EB-based astrophysical information to keep pace with future data
rates. The ANN, following training on a sample of 33,235 model light curves,
outputs a set of approximate model parameters (T2/T1, (R1+R2)/a, e sin(omega),
e cos(omega), and sin i) for each input light curve data-set. The whole sample
is processed in just a few seconds on a single 2GHz CPU. The obtained
parameters can then be readily passed to sophisticated modeling engines. We
also describe a novel method polyfit for pre-processing observational light
curves before inputting their data to the ANN and present the results and
analysis of testing the approach on synthetic data and on real data including
fifty binaries from the Catalog and Atlas of Eclipsing Binaries (CALEB)
database and 2580 light curves from OGLE survey data. [abridged]Comment: 52 pages, accepted to Ap
Stacked Cross-modal Feature Consolidation Attention Networks for Image Captioning
Recently, the attention-enriched encoder-decoder framework has aroused great
interest in image captioning due to its overwhelming progress. Many visual
attention models directly leverage meaningful regions to generate image
descriptions. However, seeking a direct transition from visual space to text is
not enough to generate fine-grained captions. This paper exploits a
feature-compounding approach to bring together high-level semantic concepts and
visual information regarding the contextual environment fully end-to-end. Thus,
we propose a stacked cross-modal feature consolidation (SCFC) attention network
for image captioning in which we simultaneously consolidate cross-modal
features through a novel compounding function in a multi-step reasoning
fashion. Besides, we jointly employ spatial information and context-aware
attributes (CAA) as the principal components in our proposed compounding
function, where our CAA provides a concise context-sensitive semantic
representation. To make better use of consolidated features potential, we
further propose an SCFC-LSTM as the caption generator, which can leverage
discriminative semantic information through the caption generation process. The
experimental results indicate that our proposed SCFC can outperform various
state-of-the-art image captioning benchmarks in terms of popular metrics on the
MSCOCO and Flickr30K datasets
How to Solve Classification and Regression Problems on High-Dimensional Data with a Supervised Extension of Slow Feature Analysis
Supervised learning from high-dimensional data, e.g., multimedia data, is a challenging task. We propose an extension of slow feature analysis (SFA) for supervised dimensionality reduction called graph-based SFA (GSFA). The algorithm extracts a label-predictive low-dimensional set of features that can be post-processed by typical supervised algorithms to generate the ďŹnal label or class estimation. GSFA is trained with a so-called training graph, in which the vertices are the samples and the edges represent similarities of the corresponding labels. A new weighted SFA optimization problem is introduced, generalizing the notion of slowness from sequences of samples to such training graphs. We show that GSFA computes an optimal solution to this problem in the considered function space, and propose several types of training graphs. For classiďŹcation, the most straightforward graph yields features equivalent to those of (nonlinear) Fisher discriminant analysis. Emphasis is on regression, where four different graphs were evaluated experimentally with a subproblem of face detection on photographs. The method proposed is promising particularly when linear models are insufficient, as well as when feature selection is difficult
Quantum machine learning: a classical perspective
Recently, increased computational power and data availability, as well as
algorithmic advances, have led machine learning techniques to impressive
results in regression, classification, data-generation and reinforcement
learning tasks. Despite these successes, the proximity to the physical limits
of chip fabrication alongside the increasing size of datasets are motivating a
growing number of researchers to explore the possibility of harnessing the
power of quantum computation to speed-up classical machine learning algorithms.
Here we review the literature in quantum machine learning and discuss
perspectives for a mixed readership of classical machine learning and quantum
computation experts. Particular emphasis will be placed on clarifying the
limitations of quantum algorithms, how they compare with their best classical
counterparts and why quantum resources are expected to provide advantages for
learning problems. Learning in the presence of noise and certain
computationally hard problems in machine learning are identified as promising
directions for the field. Practical questions, like how to upload classical
data into quantum form, will also be addressed.Comment: v3 33 pages; typos corrected and references adde
Data Mining to Uncover Heterogeneous Water Use Behaviors From Smart Meter Data
Knowledge on the determinants and patterns of water demand for different consumers supports the design of customized demand management strategies. Smart meters coupled with big data analytics tools create a unique opportunity to support such strategies. Yet, at present, the information content of smart meter data is not fully mined and usually needs to be complemented with water fixture inventory and survey data to achieve detailed customer segmentation based on end use water usage. In this paper, we developed a dataâdriven approach that extracts information on heterogeneous water end use routines, main end use components, and temporal characteristics, only via data mining existing smart meter readings at the scale of individual households. We tested our approach on data from 327 households in Australia, each monitored with smart meters logging water use readings every 5 s. As part of the approach, we first disaggregated the householdâlevel water use time series into different end uses via Autoflow. We then adapted a customer segmentation based on eigenbehavior analysis to discriminate among heterogeneous water end use routines and identify clusters of consumers presenting similar routines. Results revealed three main water end use profile clusters, each characterized by a primary end use: shower, clothes washing, and irrigation. Timeâofâuse and intensityâofâuse differences exist within each class, as well as different characteristics of regularity and periodicity over time. Our customer segmentation analysis approach provides utilities with a concise snapshot of recurrent water use routines from smart meter data and can be used to support customized demand management strategies.TU Berlin, Open-Access-Mittel - 201
Data mining as a tool for environmental scientists
Over recent years a huge library of data mining algorithms has been developed to tackle a variety of problems in fields such as medical imaging and network traffic analysis. Many of these techniques are far more flexible than more classical modelling approaches and could be usefully applied to data-rich environmental problems. Certain techniques such as Artificial Neural Networks, Clustering, Case-Based Reasoning and more recently Bayesian Decision Networks have found application in environmental modelling while other methods, for example classification and association rule extraction, have not yet been taken up on any wide scale. We propose that these and other data mining techniques could be usefully applied to difficult problems in the field. This paper introduces several data mining concepts and briefly discusses their application to environmental modelling, where data may be sparse, incomplete, or heterogenous
- âŚ