Search CORE

782 research outputs found

mARC: Memory by Association and Reinforcement of Contexts

Author: Descourt Patrice
Rimoux Norbert
Publication venue
Publication date: 10/12/2013
Field of study

This paper introduces the memory by Association and Reinforcement of Contexts (mARC). mARC is a novel data modeling technology rooted in the second quantization formulation of quantum mechanics. It is an all-purpose incremental and unsupervised data storage and retrieval system which can be applied to all types of signal or data, structured or unstructured, textual or not. mARC can be applied to a wide range of information clas-sification and retrieval problems like e-Discovery or contextual navigation. It can also for-mulated in the artificial life framework a.k.a Conway "Game Of Life" Theory. In contrast to Conway approach, the objects evolve in a massively multidimensional space. In order to start evaluating the potential of mARC we have built a mARC-based Internet search en-gine demonstrator with contextual functionality. We compare the behavior of the mARC demonstrator with Google search both in terms of performance and relevance. In the study we find that the mARC search engine demonstrator outperforms Google search by an order of magnitude in response time while providing more relevant results for some classes of queries

arXiv.org e-Print Archive

CiteSeerX

Evaluating a Cluster of Low-Power ARM64 Single-Board Computers with MapReduce

Author: McDermott Daniel
Publication venue: EWU Digital Commons
Publication date: 01/01/2018
Field of study

With the meteoric rise of enormous data collection in science, industry, and the cloud, methods for processing massive datasets have become more crucial than ever. MapReduce is a restricted programing model for expressing parallel computations as simple serial functions, and an execution framework for distributing those computations over large datasets residing on clusters of commodity hardware. MapReduce abstracts away the challenging low-level synchronization and scalability details which parallel and distributed computing often necessitate, reducing the concept burden on programmers and scientists who require data processing at-scale. Typically, MapReduce clusters are implemented using inexpensive commodity hardware, emphasizing quantity over quality due to the fault-tolerant nature of the MapReduce execution framework. The nascent explosion of inexpensive single-board computers designed around multi-core 64bit ARM processors, such as the RasberryPi 3, Pine64, and Odroid C2, has opened new avenues for inexpensive and low-power cluster computing. In this thesis, we implement a novel cluster around low-power ARM64 single-board computers and the Disco Python MapReduce execution framework. We use MapReduce to empirically evaluate our cluster by solving the Word Count and Inverted Link Index problems for the Wikipedia article dataset. We benchmark our MapReduce solutions against local solutions of the same algorithms for a conventional low-power x86 platform. We show our cluster out-performs the conventional platform for larger benchmarks, thus demonstrating low-power single-board computers as a viable avenue for data-intensive cluster computing

Eastern Washington University: EWU Digital Commons

Mining Missing Hyperlinks from Human Navigation Traces: A Case Study of Wikipedia

Author: Clemesha A.
Milgram S.
Milne D.
Popescul A.
Singer P.
Taskar B.
West R.
West R.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 13/03/2015
Field of study

Hyperlinks are an essential feature of the World Wide Web. They are especially important for online encyclopedias such as Wikipedia: an article can often only be understood in the context of related articles, and hyperlinks make it easy to explore this context. But important links are often missing, and several methods have been proposed to alleviate this problem by learning a linking model based on the structure of the existing links. Here we propose a novel approach to identifying missing links in Wikipedia. We build on the fact that the ultimate purpose of Wikipedia links is to aid navigation. Rather than merely suggesting new links that are in tune with the structure of existing links, our method finds missing links that would immediately enhance Wikipedia's navigability. We leverage data sets of navigation paths collected through a Wikipedia-based human-computation game in which users must find a short path from a start to a target article by only clicking links encountered along the way. We harness human navigational traces to identify a set of candidates for missing links and then rank these candidates. Experiments show that our procedure identifies missing links of high quality

arXiv.org e-Print Archive

CiteSeerX

Crossref

Recommended from our members

Can we do better than co-citations? Bringing Citation Proximity Analysis from idea to practice in research articles recommendation

Author: Khadka Anita
Knoth Petr
Publication venue
Publication date: 01/01/2017
Field of study

In this paper, we build on the idea of Citation Proximity Analysis (CPA), originally introduced in [1], by developing a step by step scalable approach for building CPA-based recommender systems. As part of this approach, we introduce three new proximity functions, extending the basic assumption of co-citation analysis (stating that the more often two articles are co-cited in a document, the more likely they are related) to take the distance between the co-cited documents into account. Ask- ing the question of whether CPA can outperform co-citation analysis in recommender systems, we have built a CPA based recommender system from a corpus of 368,385 full-texts articles and conducted a user survey to perform an initial evaluation. Two of our three proximity functions used within CPA outperform co-citations on our evaluation dataset

Open Research Online (The Open University)