
    Lucene4IR: Developing information retrieval evaluation resources using Lucene

    The workshop and hackathon on developing Information Retrieval Evaluation Resources using Lucene (L4IR) was held on the 8th and 9th of September, 2016 at the University of Strathclyde in Glasgow, UK, and was funded by the ESF Elias Network. The event featured three main elements: (i) a series of keynote and invited talks on industry, teaching and evaluation; (ii) planning, coding and hacking, in which a number of groups created modules and infrastructure for using Lucene to undertake TREC-based evaluations; and (iii) a number of breakout groups discussing challenges, opportunities and problems in bridging the divide between academia and industry, and how Lucene can be used for teaching and learning Information Retrieval (IR). The event brought together a mix of academics, experts and students wanting to learn, share and create evaluation resources for the community. The hacking was intense and the discussions lively, creating the basis of many useful tools but also raising numerous issues. It was clear that adopting and contributing to the most widely used and supported Open Source IR toolkit offered many benefits for academics, students, researchers, developers and practitioners - providing a basis for stronger evaluation practices, increased reproducibility, more efficient knowledge transfer, greater collaboration between academia and industry, and shared teaching and training resources.

    A Fast Content-Based Image Retrieval Method Using Deep Visual Features

    Fast and scalable Content-Based Image Retrieval using visual features is required for document analysis, medical image analysis, and similar applications. Convolutional Neural Network (CNN) activations used as features have achieved outstanding performance in this area. Deep convolutional representations that use the softmax function in the output layer are also among such visual features. However, almost all image retrieval systems hold their index of visual features in main memory to achieve high responsiveness, which limits their applicability to big data applications. In this paper, we propose a fast method for calculating cosine similarity with the L2 norm indexed in advance on Elasticsearch. We evaluate our approach with the ImageNet dataset and the VGG-16 pre-trained model. The evaluation results show the effectiveness and efficiency of our proposed method. Comment: accepted in ICDAR-WML: The 2nd International Workshop on Machine Learning 201
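The idea of indexing the L2 norm in advance can be illustrated with a minimal sketch. This is not the paper's Elasticsearch implementation: the in-memory dictionary index and the function names `build_index` and `cosine_search` are hypothetical, chosen only to show that once each stored vector carries its precomputed norm, a query needs just one dot product per document.

```python
import math


def l2_norm(vec):
    """Euclidean (L2) norm of a feature vector."""
    return math.sqrt(sum(x * x for x in vec))


def build_index(features):
    """Store each vector with its L2 norm, computed once at indexing time.

    `features` maps a document id to its feature vector (hypothetical layout).
    """
    return {doc_id: (vec, l2_norm(vec)) for doc_id, vec in features.items()}


def cosine_search(index, query, top_k=3):
    """Rank indexed vectors by cosine similarity to `query`.

    The document norms come from the index, so only the query norm and
    one dot product per document are computed at search time.
    """
    q_norm = l2_norm(query)
    scores = []
    for doc_id, (vec, d_norm) in index.items():
        dot = sum(q * d for q, d in zip(query, vec))
        denom = q_norm * d_norm
        # Guard against zero vectors, whose cosine similarity is undefined.
        scores.append((doc_id, dot / denom if denom else 0.0))
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top_k]
```

A call such as `cosine_search(build_index({"a": [1.0, 0.0], "c": [3.0, 3.0]}), [2.0, 0.0])` returns the document ids ordered by similarity to the query. In a real deployment the vectors would be CNN activations and the norms would be stored as a field alongside them in the search engine.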

    Optimization of the search engine ElasticSearch

    This thesis presents the work done in the Search on Demand team at Orange. It covers the optimization of the search engine Elasticsearch, the ways to bring data into it by means of an ETL process, and how relevance can be tuned using Lucene's inverted indices.

    An automated system to search, track, classify and report sensitive information exposed on an intranet

    Master's thesis in Information Security, Universidade de Lisboa, Faculdade de Ciências, 2015. Over time, enterprises have focused their main attention on cyber attacks against their infrastructures that originate from the outside, and so they end up, to some extent, underrating the dangers that exist on their internal network. As a result, little importance is given to the information available to every employee connected to the internal network, which may be of a sensitive nature and most likely should not be accessible to everyone. Currently, the detection of documents with sensitive or confidential information unduly exposed on the internal network of PTP (Portugal Telecom Portugal) is a rather time-consuming manual process. This project's contribution is Hound, an automated system that searches for documents, exposed to all employees, with possibly sensitive content and classifies them according to their degree of sensitivity, generating reports with the gathered information. This system was integrated into a larger PT project in order to provide DCY (Cybersecurity Department) with mechanisms to improve its effectiveness in vulnerability detection, in terms of the exposure of files/documents with sensitive or confidential information on its internal network.

    Implementing Semantic Search to a Case Management System

    The amount of information in today's information society is immense, which creates a need for intuitive and effective search functionalities and applications. In addition to openly available search applications, organizations need internal search functionalities to optimize their information management. This thesis provides an implementation suggestion for a semantic search application for JoutseNet, a case management system used by the authorities and the employees of the city of Turku. The thesis begins by introducing some relevant fundamentals of natural language processing and search engines. A literature review is used to find semantic search implementation methods from previous research papers. The JoutseNet case is introduced with background information on the case management process, brief user research, and an examination of the current state of the system. Learnings from the fundamental guidelines and the conducted research are combined to implement the search application. After the implementation documentation, guidelines for optimizing and testing the application are given. The value and performance of the implementation are yet to be determined because the production data of the JoutseNet system could not be used for research purposes. A comprehensive suggestion is provided, but further research and development is still needed before delivering it to the production environment.

    Enabling European archaeological research: The ARIADNE E-infrastructure

    Research e-infrastructures, digital archives and data services have become important pillars of the scientific enterprise, which in recent decades has become ever more collaborative, distributed and data-intensive. The archaeological research community has been an early adopter of digital tools for data acquisition, organisation, analysis and presentation of research results of individual projects. However, the provision of e-infrastructure and services for data sharing, discovery, access and re-use has lagged behind. This situation is being addressed by ARIADNE: the Advanced Research Infrastructure for Archaeological Dataset Networking in Europe. This EU-funded network has developed an e-infrastructure that enables data providers to register and provide access to their resources (datasets, collections) through the ARIADNE data portal, facilitating discovery, access and other services across the integrated resources. This article describes the current landscape of data repositories and services for archaeologists in Europe, and the issues that make interoperability between them difficult to realise. The results of the ARIADNE surveys on users' expectations and requirements are also presented. The main section of the article describes the architecture of the e-infrastructure, core services (data registration, discovery and access) and various other extant or experimental services. The ongoing evaluation of the data integration and services is also discussed. Finally, the article summarises lessons learned, and outlines the prospects for the wider engagement of the archaeological research community in sharing data through ARIADNE.