Query-driven document partitioning and collection selection
Abstract: We present a novel strategy to partition a document collection onto several servers and to perform effective collection selection. The method is based on the analysis of query logs. We propose a novel document representation called the query-vectors model, in which each document is represented as a list recording the queries for which the document itself is a match, along with their ranks. To both partition the collection and build the collection selection function, we co-cluster queries and documents. The document clusters are then assigned to the underlying IR servers, while the query clusters represent queries that return similar results and are used for collection selection. We show that this document partition strategy greatly boosts the performance of standard collection selection algorithms, including CORI, compared with a round-robin assignment. Secondly, we show that by performing collection selection through matching the query to the existing query clusters and then choosing only one server, we reach an average precision-at-5 of up to 1.74 and consistently improve CORI precision by a factor of between 11% and 15%. As a side result, we show a way to select rarely asked-for documents. Separating these documents from the rest of the collection allows the indexer to produce a more compact index containing only relevant documents that are likely to be requested in the future. In our tests, around 52% of the documents (3,128,366) are not returned among the first 100 top-ranked results of any query.
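The query-vectors representation described above can be illustrated with a minimal sketch. It assumes the query log has already been replayed against a reference engine, giving a hypothetical dict that maps each query id to its ranked list of document ids; the function names `build_query_vectors` and `rarely_requested` are illustrative, not the authors' code, and the co-clustering step is not shown. Documents whose query vector stays empty are the "rarely asked-for" documents that never appear in the top 100 results of any query.

```python
from collections import defaultdict


def build_query_vectors(results, top_k=100):
    """Map each document to the (query_id, rank) pairs of the queries that return it.

    results: hypothetical input, a dict from query id to a ranked list of
    document ids (rank 1 = best). Only the first top_k results of each query count.
    """
    vectors = defaultdict(list)
    for query_id, ranked_docs in results.items():
        for rank, doc_id in enumerate(ranked_docs[:top_k], start=1):
            vectors[doc_id].append((query_id, rank))
    return dict(vectors)


def rarely_requested(all_doc_ids, vectors):
    """Documents with an empty query vector, i.e. never in any query's top_k."""
    return sorted(d for d in all_doc_ids if d not in vectors)


if __name__ == "__main__":
    log_results = {"q1": ["d1", "d2", "d3"], "q2": ["d2", "d4"]}
    vecs = build_query_vectors(log_results)
    print(vecs["d2"])  # [('q1', 2), ('q2', 1)]
    print(rarely_requested(["d1", "d2", "d3", "d4", "d5"], vecs))  # ['d5']
```

Keeping the rank alongside each query id preserves the information a later weighting or clustering step would need; how the original method weights ranks is not stated in the abstract.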
A Survey of Adaptive Resonance Theory Neural Network Models for Engineering Applications
This survey samples from the ever-growing family of adaptive resonance theory
(ART) neural network models used to perform the three primary machine learning
modalities, namely, unsupervised, supervised and reinforcement learning. It
comprises a representative list from classic to modern ART models, thereby
painting a general picture of the architectures developed by researchers over
the past 30 years. The learning dynamics of these ART models are briefly
described, and their distinctive characteristics such as code representation,
long-term memory and corresponding geometric interpretation are discussed.
Useful engineering properties of ART (speed, configurability, explainability,
parallelization and hardware implementation) are examined along with current
challenges. Finally, a compilation of online software libraries is provided. It
is expected that this overview will be helpful to new and seasoned ART
researchers.
Constraint-Based Ontology Induction From Online Customer Reviews
We present an unsupervised, domain-independent technique for inducing a product-specific ontology of product features based upon online customer reviews. We frame ontology induction as a logical assignment problem and solve it with a bounds consistency constrained logic program. Using shallow natural language processing techniques, reviews are parsed into phrase sequences where each phrase refers to a single concept. Traditional document clustering techniques are adapted to collect phrases into initial concepts. We generate a token graph for each initial concept cluster and find a maximal clique to define the corresponding logical set of concept sub-elements. The logic program assigns tokens to clique sub-elements. We apply the technique to several thousand digital camera customer reviews and evaluate the results by comparing them to the ontologies represented by several prominent online buying guides. Because our results are drawn directly from customer comments, differences between our automatically induced product features and those in extant guides may reflect opportunities for better managing customer-producer relationships rather than errors in the process.
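The token-graph and maximal-clique step can be sketched as follows. The abstract does not say how edges are defined, so this sketch assumes, hypothetically, that an edge joins two tokens that co-occur in at least one phrase of the concept cluster; it also uses a simple greedy search, which returns a maximal clique (no other token can be added) but not necessarily a maximum one. The names `token_graph` and `greedy_maximal_clique` are illustrative only.

```python
from itertools import combinations


def token_graph(phrases):
    """Hypothetical token graph: nodes are lower-cased tokens; an edge joins
    two tokens that co-occur in at least one phrase (an assumed edge rule)."""
    adj = {}
    for phrase in phrases:
        tokens = set(phrase.lower().split())
        for t in tokens:
            adj.setdefault(t, set())
        for a, b in combinations(sorted(tokens), 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def greedy_maximal_clique(adj):
    """Visit tokens by descending degree (ties broken alphabetically) and keep
    each one adjacent to every token kept so far. The result is maximal,
    not necessarily maximum."""
    order = sorted(adj, key=lambda t: (-len(adj[t]), t))
    clique = []
    for t in order:
        if all(t in adj[m] for m in clique):
            clique.append(t)
    return clique


if __name__ == "__main__":
    cluster = ["battery life", "long battery life", "lcd screen"]
    print(greedy_maximal_clique(token_graph(cluster)))  # ['battery', 'life', 'long']
```

The resulting clique would play the role of the set of concept sub-elements to which the logic program then assigns tokens; that assignment step is not sketched here.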
Machine Learning in Wireless Sensor Networks: Algorithms, Strategies, and Applications
Wireless sensor networks monitor dynamic environments that change rapidly
over time. This dynamic behavior is either caused by external factors or
initiated by the system designers themselves. To adapt to such conditions,
sensor networks often adopt machine learning techniques to avoid unnecessary
redesign. Machine learning also inspires many practical
solutions that maximize resource utilization and prolong the lifespan of the
network. In this paper, we present an extensive literature review over the
period 2002-2013 of machine learning methods that were used to address common
issues in wireless sensor networks (WSNs). The advantages and disadvantages of
each proposed algorithm are evaluated against the corresponding problem. We
also provide a comparative guide to aid WSN designers in developing suitable
machine learning solutions for their specific application challenges.
Comment: Accepted for publication in IEEE Communications Surveys and Tutorials
- …