19,930 research outputs found

    Lucene4IR: Developing information retrieval evaluation resources using Lucene

    Get PDF
    The workshop and hackathon on developing Information Retrieval Evaluation Resources using Lucene (L4IR) was held on the 8th and 9th of September, 2016 at the University of Strathclyde in Glasgow, UK and funded by the ESF Elias Network. The event featured three main elements: (i) a series of keynote and invited talks on industry, teaching and evaluation; (ii) planning, coding and hacking where a number of groups created modules and infrastructure to use Lucene to undertake TREC based evaluations; and (iii) a number of breakout groups discussing challenges, opportunities and problems in bridging the divide between academia and industry, and how we can use Lucene for teaching and learning Information Retrieval (IR). The event was composed of a mix and blend of academics, experts and students wanting to learn, share and create evaluation resources for the community. The hacking was intense and the discussions lively creating the basis of many useful tools but also raising numerous issues. It was clear that by adopting and contributing to most widely used and supported Open Source IR toolkit, there were many benefits for academics, students, researchers, developers and practitioners - providing a basis for stronger evaluation practices, increased reproducibility, more efficient knowledge transfer, greater collaboration between academia and industry, and shared teaching and training resources

    BlogForever: D2.5 Weblog Spam Filtering Report and Associated Methodology

    Get PDF
    This report is written as a first attempt to define the BlogForever spam detection strategy. It comprises a survey of weblog spam technology and approaches to their detection. While the report was written to help identify possible approaches to spam detection as a component within the BlogForver software, the discussion has been extended to include observations related to the historical, social and practical value of spam, and proposals of other ways of dealing with spam within the repository without necessarily removing them. It contains a general overview of spam types, ready-made anti-spam APIs available for weblogs, possible methods that have been suggested for preventing the introduction of spam into a blog, and research related to spam focusing on those that appear in the weblog context, concluding in a proposal for a spam detection workflow that might form the basis for the spam detection component of the BlogForever software
    • 

    corecore