Relating Web pages to enable information-gathering tasks
We argue that relationships between Web pages are functions of the user's
intent. We identify a class of Web tasks - information-gathering - that can be
facilitated by a search engine that provides links to pages which are related
to the page the user is currently viewing. We define three kinds of intentional
relationships that correspond to whether the user is a) seeking sources of
information, b) reading pages which provide information, or c) surfing through
pages as part of an extended information-gathering process. We show that these
three relationships can be productively mined using a combination of textual
and link information, and we provide three scoring mechanisms that correspond
to them: {\em SeekRel}, {\em FactRel} and {\em SurfRel}. We build a set of
capacitated subnetworks - each corresponding to a particular keyword - that
mirror the interconnection structure of the World Wide Web, and compute the
scores as flows on these subnetworks. The capacities of the links are derived
from the {\em hub} and {\em authority} values of the nodes they connect,
following the work of Kleinberg (1998) on assigning authority to pages in
hyperlinked environments. We evaluated our scoring mechanism by running
experiments on four data sets taken from the Web. We present user evaluations
of the relevance of the top results returned by our scoring mechanisms and
compare those to the top results returned by Google's Similar Pages feature,
and the {\em Companion} algorithm proposed by Dean and Henzinger (1999).
Comment: In Proceedings of ACM Hypertext 200
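To make the flow-based scoring concrete, a minimal sketch follows of one plausible reading of the approach: HITS hub/authority values set edge capacities on a subnetwork, and the maximum flow between two pages serves as a relatedness score. The capacity rule (hub value of the tail node times authority value of the head node) and the use of networkx are illustrative assumptions, not the paper's exact SeekRel/FactRel/SurfRel definitions.

# A minimal sketch, assuming networkx; the capacity rule is an assumption.
import networkx as nx

def relatedness_score(G: nx.DiGraph, source: str, target: str) -> float:
    """Score how related `target` is to `source` via max-flow on a capacitated copy of G."""
    hubs, auths = nx.hits(G)                      # Kleinberg-style hub/authority scores
    H = nx.DiGraph()
    for u, v in G.edges():
        # Assumed capacity rule: hub value of the source node times
        # authority value of the node it points to.
        H.add_edge(u, v, capacity=hubs[u] * auths[v])
    flow_value, _ = nx.maximum_flow(H, source, target)
    return flow_value

# Toy usage on a few hyperlinked pages.
G = nx.DiGraph([("hub", "a"), ("hub", "b"), ("a", "b"), ("b", "c")])
print(relatedness_score(G, "hub", "c"))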
Role of Ranking Algorithms for Information Retrieval
As use of the Web increases day by day, users easily get lost in its rich
hyperlink structure. The main aim of a website owner is to provide users with
information relevant to their needs. We explain how Web mining is used to
categorize users and pages by analyzing user behavior and the content of
pages, and then describe Web structure mining.
This paper surveys different page-ranking algorithms and compares those used
for information retrieval. PageRank-based algorithms such as PageRank (PR),
Weighted PageRank (WPR), HITS (Hyperlink-Induced Topic Search), Distance Rank
and EigenRumor are discussed and compared. A simulation interface has been
designed for the PageRank and Weighted PageRank algorithms; PageRank is the
ranking algorithm on which the Google search engine is based.
Comment: Keywords: Page Rank, Web Mining, Web Structure Mining, Web Content Mining
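As a point of reference for the comparison above, here is a minimal sketch of the basic PageRank iteration. The damping factor of 0.85 and the uniform handling of dangling pages are conventional choices; the Weighted PageRank (WPR) variant would instead apportion rank by in-link and out-link weights rather than splitting it evenly.

# A minimal sketch of the standard PageRank power iteration.
def pagerank(links, d=0.85, iterations=50):
    """links maps each page to the list of pages it links to."""
    pages = set(links) | {v for outs in links.values() for v in outs}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - d) / n for p in pages}
        for p in pages:
            outs = links.get(p, [])
            if not outs:                          # dangling page: spread rank evenly
                for q in pages:
                    new_rank[q] += d * rank[p] / n
            else:
                share = d * rank[p] / len(outs)
                for q in outs:
                    new_rank[q] += share
        rank = new_rank
    return rank

# Toy usage: three mutually linked pages.
print(pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]}))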
Mining large-scale human mobility data for long-term crime prediction
Traditional crime prediction models based on census data are limited, as they
fail to capture the complexity and dynamics of human activity. With the rise of
ubiquitous computing, there is the opportunity to improve such models with data
that make for better proxies of human presence in cities. In this paper, we
leverage large human mobility data to craft an extensive set of features for
crime prediction, as informed by theories in criminology and urban studies. We
employ averaging and boosting ensemble techniques from machine learning to
investigate their power in predicting yearly counts of different types of
crimes occurring in New York City at the census tract level. Our study shows that
spatial and spatio-temporal features derived from Foursquare venues and
checkins, subway rides, and taxi rides, improve the baseline models relying on
census and POI data. The proposed models achieve absolute R^2 metrics of up to
65% (on a geographical out-of-sample test set) and up to 89% (on a temporal
out-of-sample test set). This indicates that, next to the residential population
of an area, the ambient population there is strongly predictive of the area's
crime levels. We deep-dive into the main crime categories, and find that the
predictive gain of the human dynamics features varies across crime types: such
features bring the biggest boost in case of grand larcenies, whereas assaults
are already well predicted by the census features. Furthermore, we identify and
discuss top predictive features for the main crime categories. These results
offer valuable insights for those responsible for urban policy or law
enforcement.
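A minimal sketch of the kind of ensemble-regression setup described above, using scikit-learn's gradient boosting on synthetic stand-in features: the real Foursquare, subway, taxi and census features and the NYC crime counts are not reproduced here, and the geographical out-of-sample split is approximated by a random holdout.

# A minimal sketch; the data below are synthetic stand-ins, not the paper's features.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_tracts = 500
X = rng.normal(size=(n_tracts, 6))                 # stand-ins for census/POI/mobility features
y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(scale=0.5, size=n_tracts)  # stand-in crime counts

# Random holdout approximating an out-of-sample evaluation.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)
print("held-out R^2:", r2_score(y_te, model.predict(X_te)))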
Sequential quasi-Monte Carlo: Introduction for Non-Experts, Dimension Reduction, Application to Partly Observed Diffusion Processes
SMC (Sequential Monte Carlo) is a class of Monte Carlo algorithms for
filtering and related sequential problems. Gerber and Chopin (2015) introduced
SQMC (Sequential quasi-Monte Carlo), a QMC version of SMC. This paper has two
objectives: (a) to introduce Sequential Monte Carlo to the QMC community, whose
members are usually less familiar with state-space models and particle
filtering; (b) to extend SQMC to the filtering of continuous-time state-space
models, where the latent process is a diffusion. A recurring point in the paper
will be the notion of dimension reduction, that is, how to implement SQMC in
such a way that it provides good performance despite the high dimension of the
problem.
Comment: To be published in the proceedings of MCMQMC 201
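For readers new to SMC, a minimal sketch of a bootstrap particle filter for a linear-Gaussian AR(1) state-space model follows. SQMC would replace the i.i.d. random numbers driving propagation and resampling with a low-discrepancy (quasi-Monte Carlo) sequence; the model and parameter values here are purely illustrative.

# A minimal sketch of plain SMC (bootstrap particle filter) on an illustrative AR(1) model.
import numpy as np

def bootstrap_filter(obs, n_particles=1000, rho=0.9, sigma_x=1.0, sigma_y=0.5, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(scale=sigma_x, size=n_particles)                  # initial particles
    means = []
    for y_t in obs:
        x = rho * x + rng.normal(scale=sigma_x, size=n_particles)    # propagate dynamics
        w = np.exp(-0.5 * ((y_t - x) / sigma_y) ** 2)                # observation weights
        w /= w.sum()
        means.append(np.sum(w * x))                                  # filtering mean estimate
        x = x[rng.choice(n_particles, size=n_particles, p=w)]        # multinomial resampling
    return np.array(means)

# Toy usage: simulate data from the same model, then filter it.
rng = np.random.default_rng(1)
x_true, obs = 0.0, []
for _ in range(50):
    x_true = 0.9 * x_true + rng.normal()
    obs.append(x_true + rng.normal(scale=0.5))
print(bootstrap_filter(obs)[:5])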
GraphBLAST: A High-Performance Linear Algebra-based Graph Framework on the GPU
Graph algorithms are challenging to implement with high performance on new
parallel hardware such as GPUs for three reasons: (1) the difficulty of devising
suitable graph building blocks, (2) load imbalance on parallel hardware, and
(3) the low arithmetic intensity of graph problems.
To address some of these challenges, GraphBLAS is an innovative, on-going
effort by the graph analytics community to propose building blocks based on
sparse linear algebra, which will allow graph algorithms to be expressed in a
performant, succinct, composable and portable manner. In this paper, we examine
the performance challenges of a linear-algebra-based approach to building graph
frameworks and describe new design principles for overcoming these bottlenecks.
Among the new design principles is exploiting input sparsity, which allows
users to write graph algorithms without specifying push and pull direction.
Exploiting output sparsity allows users to tell the backend which values of the
output of a single vectorized computation they do not need, so that those values
are never computed. Load balancing distributes work evenly amongst parallel
workers; we describe the load-balancing features that matter for handling graphs
with different characteristics. The design principles described in this paper
have been implemented in "GraphBLAST", the first high-performance linear
algebra-based graph framework on NVIDIA GPUs that is open-source. The results
show that on a single GPU, GraphBLAST has on average at least an order of
magnitude speedup over previous GraphBLAS implementations SuiteSparse and GBTL,
comparable performance to the fastest GPU hardwired primitives and
shared-memory graph frameworks Ligra and Gunrock, and better performance than
any other GPU graph framework, while offering a simpler and more concise
programming model.
Comment: 50 pages, 14 figures, 14 tables
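To illustrate the linear-algebra view of graph computation that GraphBLAS builds on, here is a minimal sketch of breadth-first search expressed as repeated sparse matrix-vector products with a mask over unvisited vertices. It uses SciPy on the CPU and is not GraphBLAST's actual GPU API.

# A minimal sketch of BFS as masked sparse matrix-vector products (SciPy, CPU only).
import numpy as np
import scipy.sparse as sp

def bfs_levels(adj, source):
    """adj[i, j] != 0 means an edge i -> j; returns BFS level per vertex (-1 if unreached)."""
    n = adj.shape[0]
    levels = np.full(n, -1)
    frontier = np.zeros(n, dtype=bool)
    frontier[source] = True
    level = 0
    while frontier.any():
        levels[frontier] = level
        reached = (adj.T @ frontier) != 0          # "push" step as a matrix-vector product
        frontier = reached & (levels == -1)        # mask out already-visited vertices
        level += 1
    return levels

# Toy directed graph: 0 -> 1, 1 -> 2, 0 -> 2, 2 -> 3.
rows, cols = [0, 1, 0, 2], [1, 2, 2, 3]
A = sp.csr_matrix((np.ones(4), (rows, cols)), shape=(4, 4))
print(bfs_levels(A, 0))                            # expected: [0 1 1 2]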