Semantic-driven matchmaking of web services using case-based reasoning
With the rapid proliferation of Web services as the medium of choice to securely publish application services beyond the firewall, accurate yet flexible matchmaking of similar services is becoming increasingly important, both for the human user and for dynamic composition engines. In this paper, we present a novel approach that utilizes the case-based reasoning (CBR) methodology for modelling dynamic Web service discovery and matchmaking. Our framework considers Web service execution experiences in the decision-making process and is highly adaptable to the service requester's constraints. The framework also uses OWL semantic descriptions extensively for implementing both the components of the CBR engine and the matchmaking profile of the Web services.
Bad news: analysis of the quality of information on influenza prevention returned by Google in English and Italian
Information available to the public influences the approach of the population toward vaccination against influenza compared with other preventative approaches. In this study, we have analyzed the first 200 websites returned by searching Google on two topics (prevention of influenza and influenza vaccine), in English and Italian. For all four searches, websites were classified according to their typology (government, commercial, professional, portals, etc.) and their trustworthiness as defined by the Journal of the American Medical Association (JAMA) score, which assesses whether they provide some basic elements of information quality (IQ): authorship, currency, disclosure, and references. The type of information described was also assessed to add another dimension of IQ. Websites on influenza prevention were classified according to the type of preventative approach mentioned (vaccine, lifestyle, hygiene, complementary medicine, etc.) and according to whether the approaches were in agreement with evidence-based medicine (EBM). Websites on influenza vaccination were classified as pro-vaccine, anti-vaccine, or neutral. The great majority of websites described EBM approaches to influenza prevention and had a pro-vaccine orientation. Government websites mainly pointed to EBM preventative approaches and had a pro-vaccine orientation, while there was a higher proportion of commercial websites among those promoting non-EBM approaches. Although the JAMA score was lower in commercial websites, it did not correlate with the preventative approaches suggested or the orientation toward vaccines. For each of the four search engine result pages (SERP), only one website displayed the health-of-the-net (HON) seal. In the SERP on vaccines, journalistic websites were the most abundant category and ranked higher than average in both languages.
Analysis using natural language processing showed that journalistic websites were mostly reporting news about two specific topics (different in the two languages). While the ranking by Google favors EBM approaches and, in English, does not promote commercial websites, in both languages it gives a great advantage to news. Thus, the type of news published during the influenza season probably plays a key role in orienting public opinion because of its high visibility. This raises important questions about the relationships between health IQ, trustworthiness, and newsworthiness.
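The JAMA score mentioned in the abstract above can be read as a count of how many of the four named criteria a website meets. The sketch below assumes one point per criterion; the abstract does not spell out the scoring rule, and the `SiteAssessment` class and `jama_score` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class SiteAssessment:
    """Manual assessment of one website against the four criteria
    named in the abstract (hypothetical container, not from the study)."""
    authorship: bool   # authors and credentials are stated
    currency: bool     # dates of posting or update are given
    disclosure: bool   # ownership, sponsorship, conflicts are disclosed
    references: bool   # sources for the content are listed


def jama_score(site: SiteAssessment) -> int:
    """Return the number of criteria met (0 to 4), assuming one point each."""
    return sum([site.authorship, site.currency, site.disclosure, site.references])


example = SiteAssessment(authorship=True, currency=False,
                         disclosure=True, references=True)
print(jama_score(example))
```

Under this assumed rule, a site that meets every criterion scores 4 and a site that meets none scores 0.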
FLASH: Randomized Algorithms Accelerated over CPU-GPU for Ultra-High Dimensional Similarity Search
We present FLASH (Fast LSH Algorithm for Similarity search accelerated with
HPC), a similarity search system for ultra-high dimensional datasets on a
single machine that does not require similarity computations and is tailored
for high-performance computing platforms. By leveraging an LSH-style randomized
indexing procedure and combining it with several principled techniques, such as
reservoir sampling, recent advances in one-pass minwise hashing, and
count-based estimations, we reduce the computational and parallelization costs
of similarity search while retaining sound theoretical guarantees.
We evaluate FLASH on several real, high-dimensional datasets from different
domains, including text, malicious URLs, click-through prediction, and social
networks. Our experiments shed new light on the difficulties associated
with datasets having several million dimensions. Current state-of-the-art
implementations either fail at the presented scale or are orders of magnitude
slower than FLASH. FLASH is capable of computing an approximate k-NN graph,
from scratch, over the full webspam dataset (1.3 billion nonzeros) in less than
10 seconds. Computing a full k-NN graph in less than 10 seconds on the webspam
dataset using brute force would require at least 20 teraflops. We
provide CPU and GPU implementations of FLASH for replicability of our results.
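The ingredients named in the abstract above (minwise hashing, fixed-size buckets filled by reservoir sampling, and count-based ranking that avoids similarity computations) can be illustrated with a small single-threaded sketch. This is not the authors' implementation: it uses plain independent minwise hashes rather than one-pass minwise hashing, and the class name `ReservoirLSHIndex`, its parameters, and the hash family are assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict

PRIME = 2_147_483_647  # Mersenne prime 2^31 - 1; feature ids must be below it


def make_hash_params(num_hashes, seed=0):
    """Draw (a, b) pairs for hash functions h(x) = (a * x + b) mod PRIME."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hashes)]


def minhash_signature(items, params):
    """Minwise hash of a non-empty set of integer feature ids."""
    return tuple(min((a * x + b) % PRIME for x in items) for a, b in params)


class ReservoirLSHIndex:
    """Toy LSH index: each bucket keeps at most `reservoir_size` ids."""

    def __init__(self, num_tables, hashes_per_table, reservoir_size, seed=0):
        self.num_tables = num_tables
        self.k = hashes_per_table
        self.reservoir_size = reservoir_size
        self.params = make_hash_params(num_tables * hashes_per_table, seed)
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        self.counts = [defaultdict(int) for _ in range(num_tables)]
        self.rng = random.Random(seed + 1)

    def _keys(self, items):
        # Split one long signature into one k-tuple key per table.
        sig = minhash_signature(items, self.params)
        return [sig[t * self.k:(t + 1) * self.k] for t in range(self.num_tables)]

    def add(self, item_id, items):
        for t, key in enumerate(self._keys(items)):
            self.counts[t][key] += 1
            seen = self.counts[t][key]
            bucket = self.tables[t][key]
            if len(bucket) < self.reservoir_size:
                bucket.append(item_id)
            else:
                # Reservoir sampling: keep the new id with probability
                # reservoir_size / seen, replacing a uniformly chosen slot.
                j = self.rng.randrange(seen)
                if j < self.reservoir_size:
                    bucket[j] = item_id

    def query(self, items, top_k):
        # Count-based ranking: candidates are ordered by how many tables
        # they collide in; no similarity between vectors is computed.
        votes = Counter()
        for t, key in enumerate(self._keys(items)):
            for candidate in self.tables[t].get(key, ()):
                votes[candidate] += 1
        return [candidate for candidate, _ in votes.most_common(top_k)]


index = ReservoirLSHIndex(num_tables=8, hashes_per_table=2, reservoir_size=4)
index.add("a", {1, 2, 3, 4, 5})
index.add("b", {1, 2, 3, 4, 6})
index.add("c", {100, 200, 300})
print(index.query({1, 2, 3, 4, 5}, top_k=2))
```

Capping each bucket with a reservoir bounds memory per bucket regardless of how skewed the data is, which is one plausible reason such a design suits fixed-size GPU buffers.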
Please, talk about it! When hotel popularity boosts preferences
Many consumers post on-line reviews, affecting the average evaluation of products and services. Yet, little is known about the importance of the number of reviews for consumer decision making. We conducted an on-line experiment (n = 168) to assess the joint impact of the average evaluation, a measure of quality, and the number of reviews, a measure of popularity, on hotel preference. The results show that consumers' preference increases with the number of reviews, regardless of whether the average evaluation is high or low. This is not what one would expect from an informational point of view, and review websites fail to take this pattern into account. This novel result is mediated by demographics: young people, and in particular young males, are less affected by popularity, relying more on quality. We suggest the adoption of appropriate ranking mechanisms to fit consumer preferences. © 2014 Elsevier Ltd
Living Knowledge
Diversity, especially as manifested in language and knowledge, is a function of local goals, needs, competences, beliefs, culture, opinions and personal experience. The Living Knowledge project considers diversity an asset rather than a problem. Within the project, foundational ideas that emerged from the synergistic contribution of different disciplines, methodologies (with which many partners were previously unfamiliar) and technologies flowed into concrete diversity-aware applications, such as the Future Predictor and the Media Content Analyser, which provide users with better structured information while coping with Web-scale complexities. The key notions of diversity, fact, opinion and bias have been defined in relation to three methodologies: Media Content Analysis (MCA), which operates from a social sciences perspective; Multimodal Genre Analysis (MGA), which operates from a semiotic perspective; and Facet Analysis (FA), which operates from a knowledge representation and organization perspective. A conceptual architecture that pulls all of them together has become the core of the tools for automatic extraction and of the way they interact. In particular, the conceptual architecture has been implemented in the Media Content Analyser application. The scientific and technological results obtained are described in the following.
Improving Entity Retrieval on Structured Data
The increasing amount of data on the Web, in particular of Linked Data, has
led to a diverse landscape of datasets, which makes entity retrieval a
challenging task. Explicit cross-dataset links, for instance to indicate
co-references or related entities, can significantly improve entity retrieval.
However, only a small fraction of entities are interlinked through explicit
statements. In this paper, we propose a two-fold entity retrieval approach. In
a first, offline preprocessing step, we cluster entities based on the
x-means and spectral clustering algorithms. In the second step,
we propose an optimized retrieval model which takes advantage of our
precomputed clusters. For a given set of entities retrieved by the BM25F
retrieval approach and a given user query, we further expand the result set
with relevant entities by considering features of the queries, entities and the
precomputed clusters. Finally, we re-rank the expanded result set with respect
to its relevance to the query. We perform a thorough experimental evaluation on
the Billion Triple Challenge (BTC12) dataset. The proposed approach shows
significant improvements compared to the baseline and state-of-the-art
approaches.
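The expand-then-re-rank step described in the abstract above can be sketched as follows. This is a simplified illustration, not the paper's retrieval model: the baseline hits are assumed to come from some BM25F run, the cluster assignments are assumed to be precomputed, and the token-overlap scorer is a stand-in for the paper's query, entity and cluster features. All function and variable names are hypothetical.

```python
def token_overlap(query, label):
    """Stand-in relevance score: Jaccard overlap of lower-cased tokens."""
    q = set(query.lower().split())
    l = set(label.lower().split())
    union = q | l
    return len(q & l) / len(union) if union else 0.0


def expand_and_rerank(initial_hits, cluster_of, members_of, score, limit):
    """Expand baseline hits with their cluster members, then re-rank.

    initial_hits: entity ids returned by the baseline retriever
    cluster_of:   dict mapping entity id -> cluster id (precomputed offline)
    members_of:   dict mapping cluster id -> list of entity ids
    score:        callable(entity id) -> relevance to the current query
    limit:        maximum number of results to return
    """
    candidates = list(dict.fromkeys(initial_hits))  # dedupe, keep order
    seen = set(candidates)
    for entity in initial_hits:
        cluster = cluster_of.get(entity)
        for neighbor in members_of.get(cluster, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                candidates.append(neighbor)
    return sorted(candidates, key=score, reverse=True)[:limit]


# Toy data, invented for illustration only.
labels = {"e1": "Barack Obama", "e2": "Obama Barack Hussein", "e3": "Paris"}
cluster_of = {"e1": 0, "e2": 0, "e3": 1}
members_of = {0: ["e1", "e2"], 1: ["e3"]}
query = "barack obama"

ranked = expand_and_rerank(
    initial_hits=["e1"],
    cluster_of=cluster_of,
    members_of=members_of,
    score=lambda e: token_overlap(query, labels[e]),
    limit=10,
)
print(ranked)
```

In this toy run, `e2` is never returned by the baseline but enters the result set because it shares a cluster with `e1`, which is the effect the clustering step is meant to have when explicit links are missing.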