    Semantic-driven matchmaking of web services using case-based reasoning

    With the rapid proliferation of Web services as the medium of choice for securely publishing application services beyond the firewall, accurate yet flexible matchmaking of similar services is becoming increasingly important, both for the human user and for dynamic composition engines. In this paper, we present a novel approach that utilises the case-based reasoning (CBR) methodology for modelling dynamic Web service discovery and matchmaking. Our framework considers Web service execution experiences in the decision-making process and is highly adaptable to the service requester's constraints. The framework also utilises OWL semantic descriptions extensively for implementing both the components of the CBR engine and the matchmaking profile of the Web services.

    Bad news: analysis of the quality of information on influenza prevention returned by Google in English and Italian

    Information available to the public influences the approach of the population toward vaccination against influenza compared with other preventative approaches. In this study, we have analyzed the first 200 websites returned by searching Google on two topics (prevention of influenza and influenza vaccine), in English and Italian. For all four searches, websites were classified according to their typology (government, commercial, professional, portals, etc.) and their trustworthiness as defined by the Journal of the American Medical Association (JAMA) score, which assesses whether they provide some basic elements of information quality (IQ): authorship, currency, disclosure, and references. The type of information described was also assessed to add another dimension of IQ. Websites on influenza prevention were classified according to the type of preventative approach mentioned (vaccine, lifestyle, hygiene, complementary medicine, etc.) and whether the approaches were in agreement with evidence-based medicine (EBM) or not. Websites on influenza vaccination were classified as pro-vaccine, anti-vaccine, or neutral. The great majority of websites described EBM approaches to influenza prevention and had a pro-vaccine orientation. Government websites mainly pointed at EBM preventative approaches and had a pro-vaccine orientation, while there was a higher proportion of commercial websites among those which promote non-EBM approaches. Although the JAMA score was lower in commercial websites, it did not correlate with the preventative approaches suggested or the orientation toward vaccines. For each of the four search engine result pages (SERP), only one website displayed the health-of-the-net (HON) seal. In the SERP on vaccines, journalistic websites were the most abundant category and ranked higher than average in both languages. Analysis using natural language processing showed that journalistic websites were mostly reporting news about two specific topics (different in the two languages). While the ranking by Google favors EBM approaches and, in English, does not promote commercial websites, in both languages it gives a great advantage to news. Thus, the type of news published during the influenza season probably has a key importance in orienting public opinion due to its high visibility. This raises important questions on the relationships between health IQ, trustworthiness, and newsworthiness.

    FLASH: Randomized Algorithms Accelerated over CPU-GPU for Ultra-High Dimensional Similarity Search

    We present FLASH (Fast LSH Algorithm for Similarity search accelerated with HPC), a similarity search system for ultra-high dimensional datasets on a single machine, that does not require similarity computations and is tailored for high-performance computing platforms. By leveraging an LSH-style randomized indexing procedure and combining it with several principled techniques, such as reservoir sampling, recent advances in one-pass minwise hashing, and count-based estimations, we reduce the computational and parallelization costs of similarity search, while retaining sound theoretical guarantees. We evaluate FLASH on several real, high-dimensional datasets from different domains, including text, malicious URL, click-through prediction, social networks, etc. Our experiments shed new light on the difficulties associated with datasets having several million dimensions. Current state-of-the-art implementations either fail on the presented scale or are orders of magnitude slower than FLASH. FLASH is capable of computing an approximate k-NN graph, from scratch, over the full webspam dataset (1.3 billion nonzeros) in less than 10 seconds. Computing a full k-NN graph in less than 10 seconds on the webspam dataset, using brute-force ($n^2D$), will require at least 20 teraflops. We provide CPU and GPU implementations of FLASH for replicability of our results.
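The minwise hashing idea that the FLASH abstract builds on can be illustrated with a minimal sketch. This is not the FLASH implementation: it uses the classic scheme with k independent hash functions rather than the one-pass variant the abstract mentions, and the function names (`make_hash_funcs`, `minhash_signature`, `estimate_jaccard`) and parameters are hypothetical, chosen only for illustration. It assumes set elements are non-negative integers smaller than the chosen prime.

```python
import random

# Illustrative prime modulus (2**31 - 1); an assumption, not taken from the paper.
PRIME = 2_147_483_647


def make_hash_funcs(k, seed=0, prime=PRIME):
    """Draw k random linear hash functions h(x) = (a*x + b) mod prime."""
    rng = random.Random(seed)
    return [(rng.randrange(1, prime), rng.randrange(0, prime), prime)
            for _ in range(k)]


def minhash_signature(items, hash_funcs):
    """Keep, for each hash function, the minimum hash value over the set."""
    return [min((a * x + b) % p for x in items) for (a, b, p) in hash_funcs]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree; approximates Jaccard similarity."""
    matches = sum(1 for u, v in zip(sig_a, sig_b) if u == v)
    return matches / len(sig_a)


def exact_jaccard(a, b):
    """Exact Jaccard similarity, for comparison with the estimate."""
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    funcs = make_hash_funcs(256, seed=42)
    doc_a = set(range(0, 1000))
    doc_b = set(range(500, 1500))
    sig_a = minhash_signature(doc_a, funcs)
    sig_b = minhash_signature(doc_b, funcs)
    print("exact:", exact_jaccard(doc_a, doc_b))
    print("estimate:", estimate_jaccard(sig_a, sig_b))
```

Signatures of fixed length k stand in for the original sets, so two items can be compared without touching their full high-dimensional representations. An LSH index would additionally group signature positions into bands and use them as bucket keys, which this sketch leaves out.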

    Please, talk about it! When hotel popularity boosts preferences

    Many consumers post on-line reviews, affecting the average evaluation of products and services. Yet, little is known about the importance of the number of reviews for consumer decision making. We conducted an on-line experiment (n = 168) to assess the joint impact of the average evaluation, a measure of quality, and the number of reviews, a measure of popularity, on hotel preference. The results show that consumers' preference increases with the number of reviews, independently of the average evaluation being high or low. This is not what one would expect from an informational point of view, and review websites fail to take this pattern into account. This novel result is mediated by demographics: young people, and in particular young males, are less affected by popularity, relying more on quality. We suggest the adoption of appropriate ranking mechanisms to fit consumer preferences. © 2014 Elsevier Ltd.

    Living Knowledge

    Diversity, especially as manifested in language and knowledge, is a function of local goals, needs, competences, beliefs, culture, opinions and personal experience. The Living Knowledge project considers diversity as an asset rather than a problem. Within the project, foundational ideas that emerged from the synergistic contribution of different disciplines, methodologies (with which many partners were previously unfamiliar) and technologies flowed into concrete diversity-aware applications, such as the Future Predictor and the Media Content Analyser, which provide users with better structured information while coping with Web-scale complexities. The key notions of diversity, fact, opinion and bias have been defined in relation to three methodologies: Media Content Analysis (MCA), which operates from a social sciences perspective; Multimodal Genre Analysis (MGA), which operates from a semiotic perspective; and Facet Analysis (FA), which operates from a knowledge representation and organization perspective. A conceptual architecture that pulls all of them together has become the core of the tools for automatic extraction and of the way they interact. In particular, the conceptual architecture has been implemented in the Media Content Analyser application. The scientific and technological results obtained are described in the following.

    Improving Entity Retrieval on Structured Data

    The increasing amount of data on the Web, in particular of Linked Data, has led to a diverse landscape of datasets, which makes entity retrieval a challenging task. Explicit cross-dataset links, for instance to indicate co-references or related entities, can significantly improve entity retrieval. However, only a small fraction of entities are interlinked through explicit statements. In this paper, we propose a two-fold entity retrieval approach. In a first, offline preprocessing step, we cluster entities based on the x-means and spectral clustering algorithms. In the second step, we propose an optimized retrieval model which takes advantage of our precomputed clusters. For a given set of entities retrieved by the BM25F retrieval approach and a given user query, we further expand the result set with relevant entities by considering features of the queries, entities and the precomputed clusters. Finally, we re-rank the expanded result set with respect to the relevance to the query. We perform a thorough experimental evaluation on the Billion Triple Challenge (BTC12) dataset. The proposed approach shows significant improvements compared to the baseline and state-of-the-art approaches.
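The expand-then-re-rank step described in the entity retrieval abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' model: the initial result list stands in for BM25F output, the cluster assignments are assumed to be precomputed and to cover every entity, and relevance is approximated by plain query-term overlap. The function name `expand_and_rerank` and all parameter names are hypothetical.

```python
from collections import defaultdict


def expand_and_rerank(query, initial_ids, entity_text, cluster_of, top_k=5):
    """Expand an initial result list with cluster co-members, then re-rank.

    query        -- user query string
    initial_ids  -- entity ids from a first-stage retriever (stand-in for BM25F)
    entity_text  -- dict: entity id -> textual description
    cluster_of   -- dict: entity id -> precomputed cluster id (assumed complete)
    """
    query_terms = set(query.lower().split())
    if not query_terms:
        return list(initial_ids)[:top_k]

    # Invert the cluster assignment: cluster id -> set of member entity ids.
    members = defaultdict(set)
    for entity_id, cluster_id in cluster_of.items():
        members[cluster_id].add(entity_id)

    # Expansion: add every entity that shares a cluster with an initial hit.
    candidates = set(initial_ids)
    for entity_id in initial_ids:
        candidates |= members[cluster_of[entity_id]]

    # Re-ranking: toy relevance score = fraction of query terms in the text.
    def score(entity_id):
        terms = set(entity_text[entity_id].lower().split())
        return len(query_terms & terms) / len(query_terms)

    # Sort by descending score, breaking ties by entity id for determinism.
    ranked = sorted(candidates, key=lambda e: (-score(e), e))
    return ranked[:top_k]


if __name__ == "__main__":
    entity_text = {
        "e1": "Berlin city Germany",
        "e2": "Berlin capital of Germany",
        "e3": "Paris city France",
    }
    cluster_of = {"e1": 0, "e2": 0, "e3": 1}
    print(expand_and_rerank("berlin capital", ["e1"], entity_text, cluster_of))
```

The point of the sketch is the control flow: entities that were never explicitly linked to the first-stage hits can still enter the result set through shared cluster membership, and the final ordering is decided by a separate relevance score.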