1,359 research outputs found

    CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

    Based on the information provided by European projects and national initiatives related to multimedia search, as well as by domain experts who participated in the CHORUS think-tanks and workshops, this document reports on the state of the art in multimedia content search from a technical and socio-economic perspective. The technical perspective includes an up-to-date view of content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark initiatives that measure the performance of multimedia search engines. From a socio-economic perspective, we take stock of the impact and legal consequences of these technical advances and point out future directions of research.

    #Healthy: smart digital food safety and nutrition communication strategies—a critical commentary

    This paper explores how food safety and nutrition organisations can harness the power of search engines, games, apps, social media, and digital analytics tools to craft broad-reaching and engaging digital communications. We start with search engines, showing how organisations can identify popular food safety and nutrition queries, facilitating the creation of timely and in-demand content. To ensure this content is discoverable by search engines, we cover several non-technical aspects of search engine optimisation (SEO). We next explore the potential of games, apps, social media, and going viral for reaching and engaging the public, and how digital data-based tools can be used to optimise communications. Throughout, we draw on examples not only from Europe and North America but also from China. While we are enthusiastic about the benefits of digital communications, we recognise that they are not without their drawbacks and challenges. To help organisations evaluate whether a given digital approach is appropriate for their objectives, we end each section with a discussion of limitations. We conclude with a discussion of the General Data Protection Regulation (GDPR) and the practical, philosophical, and policy challenges associated with communicating food safety and nutrition information digitally.

    Zero-Shot Relation Extraction via Reading Comprehension

    We show that relation extraction can be reduced to answering simple reading comprehension questions, by associating one or more natural-language questions with each relation slot. This reduction has several advantages: we can (1) learn relation-extraction models by extending recent neural reading-comprehension techniques, (2) build very large training sets for those models by combining relation-specific crowd-sourced questions with distant supervision, and even (3) do zero-shot learning by extracting new relation types that are only specified at test time, for which we have no labeled training examples. Experiments on a Wikipedia slot-filling task demonstrate that the approach can generalize to new questions for known relation types with high accuracy, and that zero-shot generalization to unseen relation types is possible, at lower accuracy levels, setting the bar for future work on this task. Comment: CoNLL 2017.
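
    The reduction described above can be illustrated with a minimal sketch (not the authors' model): each relation slot is paired with one or more question templates, and an off-the-shelf extractive reading-comprehension model answers them against the source text. The templates, the score threshold used as a crude stand-in for "no relation", and the example sentence are all assumptions introduced here for illustration.

        # A minimal sketch of relation extraction reduced to reading comprehension.
        # The question templates and the confidence threshold are assumptions.
        from transformers import pipeline

        qa = pipeline("question-answering")  # generic extractive reader

        # Hypothetical question templates, keyed by relation type.
        TEMPLATES = {
            "educated_at": "Where did {subject} study?",
            "occupation": "What is {subject}'s occupation?",
            "spouse": "Who is {subject} married to?",
        }

        def extract_relations(subject, context, threshold=0.3):
            """Return (relation, answer) pairs whose reader confidence clears the threshold."""
            results = []
            for relation, template in TEMPLATES.items():
                question = template.format(subject=subject)
                pred = qa(question=question, context=context)
                if pred["score"] >= threshold:
                    results.append((relation, pred["answer"]))
            return results

        # Zero-shot use: adding a new template at test time covers a new relation
        # type without any relation-specific training examples.
        TEMPLATES["place_of_birth"] = "Where was {subject} born?"
        print(extract_relations(
            "Marie Curie",
            "Marie Curie was born in Warsaw and studied at the University of Paris."))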

    Wikidata and knowledge graphs in practice: using semantic SEO to create discoverable, accessible, machine-readable definitions of the people, places, and services in libraries and archives

    Libraries expand the access and visibility of data and research in support of an informed public. Search engines have limited knowledge of the dynamic nature of libraries: their people, their services, and their resources. The very definition of libraries in online environments is outdated and misleading. This article offers a solution to this metadata problem by redefining libraries for machine-learning environments and search engines. Two ways to approach this problem are implementing local structured data in a knowledge graph model and publishing “inside-out” definitions in Semantic Web endpoints. MSU Library has found that implementing a “Knowledge Graph” linked data model leads to improved discovery and interpretation by the bots and search engines that index and describe what libraries are, what they do, and their scholarly content. In contrast, LSE Library has found that contributing to Wikidata, a collaborative and global metadata source, can increase understanding of libraries and extend their reach and engagement. This article demonstrates how Wikidata can be used to push out data, covers the technical details of knowledge graph markup, and describes the practice of semantic Search Engine Optimization (SEO). It explores how metadata can represent an organization equitably and how this improves the reach of global information communities.
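
    As a rough illustration of the kind of knowledge graph markup the article discusses, the sketch below builds a schema.org "Library" entity as JSON-LD and links it to a Wikidata item via sameAs. Every value shown (names, URLs, the Wikidata identifier) is an invented placeholder, not data from MSU or LSE.

        # A hypothetical schema.org Library description serialized as JSON-LD.
        import json

        library = {
            "@context": "https://schema.org",
            "@type": "Library",
            "name": "Example University Library",
            "url": "https://library.example.edu",
            "sameAs": "https://www.wikidata.org/wiki/Q0000000",  # placeholder Wikidata item
            "openingHours": "Mo-Fr 08:00-22:00",
            "department": [
                {"@type": "Library", "name": "Special Collections and Archives"}
            ],
            "employee": [
                {"@type": "Person", "name": "Jane Librarian", "jobTitle": "Data Services Librarian"}
            ],
        }

        # Embedding this JSON-LD in a <script type="application/ld+json"> tag makes
        # the library's people and services machine-readable to search engine crawlers.
        print(json.dumps(library, indent=2))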

    Evaluation of the quality of Alexa’s metrics

    Alexa is a tool whose name is easily confused with Amazon's voice device, but it is in fact a web traffic analysis tool. Very little is known about how it functions and where its data comes from. With so little information available, how can one know whether the tool provides good value? Comparing Alexa with other tools such as Google Analytics gives insight into the quality of its metrics and makes it possible to judge its transparency, reliability, trustworthiness, and flexibility. To achieve this, a state-of-the-art review of the subject was carried out, covering the metrics, the tools, and the methods, which gave the study its direction. This led to the more practical side of the project: collecting and assessing data. After a call was sent out to multiple networks, a sample of 10 websites was assembled; they varied greatly, but all held information that would help answer the research questions. A strict work methodology was followed to ensure the data was not tainted and remained usable, which facilitated the analysis and ensured that no backtracking would be necessary. The findings were not as striking as expected: some results were more similar across tools than originally predicted, although the correlation between the numbers was very low. Hardly any websites in the sample produced consistently similar results (only one did), and one metric showed no resemblance at all between the different tools. Beyond the results drawn from the data and charts, numerous limitations of the tools were identified, and they clearly made it harder to reach conclusive results. Even though Alexa presents itself as a useful tool for the everyday individual, it has quite a few limitations that more substantial tools do not. There are evidently also improvements to be made in the standardisation of such tools in order to make their use easier for all. Not all the results of this study were conclusive, but the door is open for a more in-depth project that would answer the additional questions that arose.
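
    One way the study's "very low correlation" finding could be quantified is a rank correlation of the same metric as reported by the two tools across the sample sites. The sketch below assumes a single metric (monthly visits) and uses invented figures for ten placeholder sites; it is not the study's data or method, only an illustration of the comparison.

        # Rank-correlating one traffic metric as reported by Alexa vs. Google
        # Analytics for a small sample of sites (all figures are placeholders).
        from scipy.stats import spearmanr

        sites = ["site-%02d" % i for i in range(1, 11)]
        alexa_visits = [12000, 800, 45000, 300, 9500, 150, 22000, 5000, 700, 1300]
        ga_visits =    [15400, 950, 30000, 410, 8700, 900, 18000, 4200, 2500, 1100]

        rho, p_value = spearmanr(alexa_visits, ga_visits)
        print(f"Spearman rank correlation across {len(sites)} sites: "
              f"rho={rho:.2f} (p={p_value:.3f})")

        # A low rho means the two tools rank the same sites very differently,
        # even if some individual numbers happen to look similar.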

    Automatically Assessing the Need of Additional Citations for Information Quality Verification in Wikipedia Articles

    Quality flaw prediction in Wikipedia is an ongoing research trend. In this work, we tackle the problem of automatically assessing the need to include additional citations that help verify an article's content: the so-called Refimprove quality flaw. This information quality flaw ranks among the five most frequent flaws and accounts for 12.4% of the flawed articles in the English Wikipedia. Three different state-of-the-art approaches were evaluated: under-bagged decision trees, biased-SVM, and centroid-based balanced SVM. These methods aim to handle the imbalance between the number of articles tagged as flawed and the remaining untagged documents in Wikipedia, which can help in the learning stage of the algorithms. A uniformly sampled balanced SVM classifier was also evaluated as a baseline. The results showed that under-bagged decision trees with the min rule as the aggregation method perform best, achieving an F1 score of 0.96 on the test corpus from the 1st International Competition on Quality Flaw Prediction in Wikipedia, a well-known uniform evaluation corpus in this research field. Likewise, biased-SVM also achieved an F1 score that outperforms previously published results.
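
    The best-performing method named above, under-bagged decision trees with min-rule aggregation, can be sketched as follows. This is not the paper's exact setup: each bag keeps every minority ("flawed") example plus an equal-sized random sample of the majority class, and the ensemble's positive score is the minimum of the bags' predicted probabilities. Features, data, and the 0.5 threshold are assumptions for illustration.

        # Under-bagged decision trees with min-rule aggregation for an
        # imbalanced binary task (1 = flawed article, 0 = untagged article).
        import numpy as np
        from sklearn.tree import DecisionTreeClassifier

        def fit_under_bagged_trees(X, y, n_bags=10, seed=0):
            """Each bag trains on all minority examples plus an equal-sized majority sample."""
            rng = np.random.default_rng(seed)
            minority_idx = np.flatnonzero(y == 1)
            majority_idx = np.flatnonzero(y == 0)
            trees = []
            for _ in range(n_bags):
                sampled_majority = rng.choice(majority_idx, size=len(minority_idx), replace=False)
                idx = np.concatenate([minority_idx, sampled_majority])
                tree = DecisionTreeClassifier(random_state=int(rng.integers(1 << 30)))
                tree.fit(X[idx], y[idx])
                trees.append(tree)
            return trees

        def predict_min_rule(trees, X, threshold=0.5):
            """Min rule: a document is flagged only if every bag assigns it a high flaw probability."""
            probs = np.stack([t.predict_proba(X)[:, 1] for t in trees])  # (n_bags, n_samples)
            return (probs.min(axis=0) >= threshold).astype(int)

        # Toy usage with random features standing in for article representations.
        rng = np.random.default_rng(1)
        X = rng.normal(size=(1000, 20))
        y = (rng.random(1000) < 0.1).astype(int)  # ~10% "flawed" articles
        trees = fit_under_bagged_trees(X, y)
        print("Flagged:", predict_min_rule(trees, X[:20]).sum(), "of first 20 articles")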

    Damage Detection and Mitigation in Open Collaboration Applications

    Collaborative functionality is changing the way information is amassed, refined, and disseminated in online environments. A subclass of these systems, characterized by open collaboration, uniquely allows participants to *modify* content with low barriers to entry. A prominent example and our case study, English Wikipedia, exemplifies the vulnerabilities: more than 7% of its edits are blatantly unconstructive. Our measurement studies show this damage manifests in novel socio-technical forms, limiting the effectiveness of computational detection strategies from related domains. In turn, this has made much of the mitigation the responsibility of a poorly organized and ill-routed human workforce. We aim to improve all facets of this incident response workflow. Complementing language-based solutions, we first develop content-agnostic predictors of damage. We implicitly glean reputations for system entities and overcome sparse behavioral histories with a spatial reputation model that combines evidence from multiple granularities. We also identify simple yet indicative metadata features that capture participatory dynamics and content maturation. When brought to bear on damage corpora, our contributions: (1) advance benchmarks over a broad set of security issues (vandalism), (2) perform well in the first anti-spam specific approach, and (3) demonstrate their portability over diverse open collaboration use cases. Probabilities generated by our classifiers can also intelligently route human assets using prioritization schemes optimized for capture rate or impact minimization. Organizational primitives are introduced that improve workforce efficiency. The whole of these strategies is then implemented in a tool (STiki) that has been used to revert more than 350,000 damaging instances from Wikipedia. These uses are analyzed to learn about human aspects of the edit review process, including scalability, motivation, and latency. Finally, we conclude by measuring the practical impacts of this work, discussing how to better integrate our solutions, and revealing outstanding vulnerabilities that speak to research challenges for open collaboration security.
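
    A minimal sketch of the content-agnostic idea described above: score incoming edits from simple metadata features and route the highest-risk edits to human reviewers first. The features, training data, and classifier below are invented for illustration and are not STiki's actual model.

        # Hypothetical metadata-only damage scoring and review-queue prioritization.
        from dataclasses import dataclass
        from sklearn.ensemble import RandomForestClassifier

        @dataclass
        class Edit:
            is_anonymous: bool      # registered vs. IP editor
            editor_age_days: float  # account age (0 for IPs)
            comment_length: int     # length of the edit summary
            size_delta: int         # bytes added or removed
            local_hour: float       # time of day of the edit

        def featurize(e: Edit):
            return [int(e.is_anonymous), e.editor_age_days, e.comment_length,
                    e.size_delta, e.local_hour]

        # Hypothetical labeled history: 1 = damaging, 0 = constructive.
        train_edits = [Edit(True, 0, 0, -420, 3.0), Edit(False, 900, 35, 120, 14.5),
                       Edit(True, 0, 2, 5000, 2.0), Edit(False, 30, 20, -15, 10.0)]
        train_labels = [1, 0, 1, 0]

        clf = RandomForestClassifier(n_estimators=50, random_state=0)
        clf.fit([featurize(e) for e in train_edits], train_labels)

        # Route pending edits to reviewers in descending order of damage
        # probability (a capture-rate-oriented prioritization).
        pending = [Edit(True, 0, 0, -300, 4.0), Edit(False, 2000, 40, 80, 12.0)]
        scores = clf.predict_proba([featurize(e) for e in pending])[:, 1]
        queue = sorted(zip(scores, range(len(pending))), reverse=True)
        print([(f"edit {i}", round(s, 2)) for s, i in queue])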