
    Large-scale interactive exploratory visual search

    Large-scale visual search has been one of the most challenging issues in the era of big data. It demands techniques that are not only highly effective and efficient but also allow users to conveniently express their information needs and refine their intents. In this thesis, we focus on developing an exploratory framework for large-scale visual search, and we develop a number of enabling techniques, including compact visual content representation for scalable search, near-duplicate video shot detection, and action-based event detection. We propose a novel scheme for extremely low bit-rate visual search, which transmits compressed visual words, consisting of a vocabulary tree histogram and descriptor orientations, rather than the raw descriptors. Compact representation of video data is achieved by identifying keyframes of a video, which also helps users comprehend visual content efficiently; we propose a novel Bag-of-Importance model for static video summarization. Near-duplicate detection is another key issue for large-scale visual search, since there exist a large number of nearly identical images and videos, and we propose an improved near-duplicate video shot detection approach for more effective shot representation. Event detection is one way of bridging the semantic gap in visual search; we focus in particular on human-action-centred event detection and propose an enhanced sparse coding scheme to model human actions. The proposed approach significantly reduces computational cost while achieving recognition accuracy comparable to state-of-the-art methods. Finally, we propose an integrated solution addressing the prime challenges raised by large-scale interactive visual search. The proposed system is also one of the first attempts at exploratory visual search, and it provides users with more robust results to support their exploration.
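    To make the compact-representation idea concrete, here is a minimal sketch of bag-of-visual-words quantisation. It is not the thesis's vocabulary-tree scheme or its orientation compression: it assumes a flat, pre-trained visual vocabulary (e.g. k-means centroids) and uses random arrays in place of real local descriptors, showing only how descriptors are mapped to word indices and a small normalised histogram that is far cheaper to transmit than the descriptors themselves.

```python
import numpy as np

def quantize_to_visual_words(descriptors, vocabulary):
    """Assign each local descriptor to its nearest visual word (centroid)."""
    # descriptors: (n, d) array; vocabulary: (k, d) array of centroids
    dists = np.linalg.norm(descriptors[:, None, :] - vocabulary[None, :, :], axis=2)
    return dists.argmin(axis=1)

def bow_histogram(word_ids, vocab_size):
    """Build an L1-normalised bag-of-visual-words histogram for one image."""
    hist = np.bincount(word_ids, minlength=vocab_size).astype(float)
    return hist / max(hist.sum(), 1.0)

# Toy data: 200 random 128-D descriptors and a 256-word vocabulary (both invented here).
rng = np.random.default_rng(0)
descriptors = rng.standard_normal((200, 128))
vocabulary = rng.standard_normal((256, 128))
hist = bow_histogram(quantize_to_visual_words(descriptors, vocabulary), 256)
print(hist.shape, hist.sum())   # (256,) 1.0
```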

    Matchmakers or tastemakers? Platformization of cultural intermediation & social media’s engines for ‘making up taste’

    There are long-standing practices and processes that have traditionally mediated between the production and consumption of cultural content. Prominent instances of these are: curating content by identifying and selecting cultural content to promote to a particular set of audiences; measuring audience behaviours to construct knowledge about their tastes; and guiding audiences through recommendations from cultural experts. These cultural intermediation processes are currently being transformed, and social media platforms play important roles in this transformation. However, their role is often attributed to the work of users and/or recommendation algorithms, so the processes through which data about users’ tastes are aggregated and made ready for algorithmic processing are largely neglected. This study takes that neglect as an important gap in our understanding of social media platforms’ role in the transformation of cultural intermediation. To address this gap, the notion of platformization is used as a theoretical lens to examine the role of users and algorithms as part of social media’s distinct data-based sociotechnical configuration, which is built on the so-called ‘platform-logic’. Based on a set of conceptual ideas and the findings derived from a single case study of a music discovery platform, the thesis develops a framework to explain the ‘platformization of cultural intermediation’. This framework outlines how curation, guidance, and measurement processes are ‘platformed’ in the course of the development and optimisation of a social media platform; this is the main contribution of the thesis. The study also contributes to the literature by developing the concept of social media’s engines for ‘making up taste’. This concept illuminates how social media operate as sociotechnical cultural intermediaries and participate in tastemaking in ways that acquire legitimacy from the long-standing trust in the objectivity of classification, quantification, and measurement processes.

    The Future of Information Sciences: INFuture2009: Digital Resources and Knowledge Sharing


    Defending Face-Recognition Technology (And Defending Against It)

    This Article looks beneath the surface of attacks on face-recognition technology and explains how it can be an exceptionally useful tool for law enforcement, complementing traditional forensic evidence such as fingerprints and DNA. It punctures myths about the technology and explains how existing rules of criminal procedure, developed for other kinds of forensic evidence, are readily adaptable to face recognition. It opposes across-the-board restrictions on the use of face-recognition technologies and advocates a more sophisticated set of guarantees of defendant access to the information necessary to probe the reliability of computerized face matches. Defendants must have reasonable access to the details of the technology and how it was used so that they have a meaningful opportunity to inform the factfinder of doubts about reliability. Part II explains the technology, starting with machine learning, which enables a computer to represent faces digitally based on their physical characteristics so that they can be matched with other faces. This part also explains how shortcomings in the algorithms or in the training database of faces can produce identification errors, both false positives and false negatives. Part III explores existing and potential uses of face recognition in law enforcement, placing the technology in the context of traditional police investigations. Part IV summarizes the relatively sparse case law and the much fuller literature on face-recognition technology, evaluates claims of threats to privacy in particular, and analyzes legal principles developed for analogous conventional criminal investigative and proof methods. Part V constructs a legal framework for evaluating the probativeness of face-recognition technology in criminal prosecutions, develops strategies, and offers actual cross-examination questions to guide defense counsel in challenging face-recognition technology. Part VI acknowledges that some specific uses of the technology to scan crowds and streams of people may need judicial control and suggests a draft statute to assure such control.
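    To illustrate the matching step Part II describes, the following toy sketch compares two face embeddings against a similarity threshold. It is not any deployed face-recognition system; it assumes faces have already been converted to fixed-length embedding vectors by some trained model (the vectors here are invented), and it only shows how the threshold choice trades false positives against false negatives.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two face embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def is_match(probe_embedding, gallery_embedding, threshold=0.6):
    """Declare a match when similarity exceeds the threshold.

    A looser threshold raises false positives (wrong people flagged as matches);
    a stricter one raises false negatives (true matches missed).
    """
    return cosine_similarity(probe_embedding, gallery_embedding) >= threshold

# Illustrative embeddings only; real systems obtain them from a trained network.
rng = np.random.default_rng(1)
probe = rng.standard_normal(128)
candidate = probe + 0.1 * rng.standard_normal(128)   # a slightly perturbed version of the same face
print(is_match(probe, candidate))
```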

    Methods for improving entity linking and exploiting social media messages across crises

    Entity Linking (EL) is the task of automatically identifying entity mentions in texts and resolving them to a corresponding entity in a reference knowledge base (KB). A large number of tools are available for different types of documents and domains; however, the entity linking literature has shown that the quality of a tool varies across corpora and depends on the specific characteristics of the corpus it is applied to. Moreover, a lack of precision on particularly ambiguous mentions often spoils the usefulness of automated disambiguation results in real-world applications. In the first part of this thesis I explore an approximation of the difficulty of linking entity mentions and frame it as a supervised classification task. Classifying difficult-to-disambiguate entity mentions can help identify critical cases as part of a semi-automated system, while detecting latent corpus characteristics that affect entity linking performance. Moreover, despite the large number of entity linking tools proposed over the past years, some tools work better on short mentions while others perform better when there is more contextual information. To this end, I propose a solution that exploits the results of distinct entity linking tools on the same corpus by leveraging their individual strengths on a per-mention basis. The proposed solution proved effective and outperformed the individual entity linking systems in a series of experiments. An important component in the majority of entity linking tools is the probability that a mention links to a given entity in a reference knowledge base, and this probability is usually computed over a static snapshot of the KB. However, an entity’s popularity is temporally sensitive and may change due to short-term events; these changes may then be reflected in the KB, so EL tools can produce different results for the same mention at different times. I investigate how this prior probability changes over time and how overall disambiguation performance varies when using KB snapshots from different time periods. The second part of this thesis is mainly concerned with short texts. Social media has become an integral part of modern society. Twitter, for instance, is one of the most popular social media platforms in the world, enabling people to share their opinions and post short messages about any subject on a daily basis. I first present an approach to identifying informative messages during catastrophic events using deep learning techniques. Automatically detecting informative messages posted by users during major events enables professionals involved in crisis management to better estimate damage using only relevant information posted on social media channels, and to act immediately. I also perform an analysis of Twitter messages posted during the Covid-19 pandemic: I collected 4 million tweets posted in Portuguese since the beginning of the pandemic and provide an analysis of the debate around it, using topic modeling, sentiment analysis, and hashtag recommendation techniques to offer insights into the online discussion of the Covid-19 pandemic.
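    The mention-to-entity prior discussed above is commonly estimated by counting how often a surface form links to each candidate entity in a snapshot of the reference KB. The sketch below is a simplified illustration rather than the thesis's implementation; the snapshot data and entity names are invented, and it only shows why priors computed from snapshots taken at different times can steer a linker toward different entities.

```python
from collections import Counter, defaultdict

def build_link_priors(anchor_entity_pairs):
    """Estimate P(entity | mention) from (mention text, linked entity) pairs
    harvested from one snapshot of a reference KB (e.g. anchor texts)."""
    counts = defaultdict(Counter)
    for mention, entity in anchor_entity_pairs:
        counts[mention.lower()][entity] += 1
    priors = {}
    for mention, entity_counts in counts.items():
        total = sum(entity_counts.values())
        priors[mention] = {e: c / total for e, c in entity_counts.items()}
    return priors

# Hypothetical snapshots: the same surface form links to different entities over time.
snapshot_2019 = [("paris", "Paris,_France")] * 90 + [("paris", "Paris_Hilton")] * 10
snapshot_2021 = [("paris", "Paris,_France")] * 60 + [("paris", "Paris_Hilton")] * 40

print(build_link_priors(snapshot_2019)["paris"])   # prior strongly favours the city
print(build_link_priors(snapshot_2021)["paris"])   # prior has shifted in the newer snapshot
```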

    Modeling Non-Standard Text Classification Tasks

    Text classification deals with discovering knowledge in texts and is used for extracting, filtering, or retrieving information in streams and collections. The discovery of knowledge is operationalized by modeling text classification tasks, which is mainly a human-driven engineering process. The outcome of this process, a text classification model, is used to inductively learn a text classification solution from a priori classified examples. The building blocks of modeling text classification tasks cover four aspects: (1) the way examples are represented, (2) the way examples are selected, (3) the way classifiers learn from examples, and (4) the way models are selected. This thesis proposes methods that improve the prediction quality of text classification solutions for unseen examples, especially for non-standard tasks where standard models do not fit. The original contributions are related to the aforementioned building blocks: (1) Several topic-orthogonal text representations are studied in the context of non-standard tasks and a new representation, namely co-stems, is introduced. (2) A new active learning strategy that goes beyond standard sampling is examined. (3) A new one-class ensemble for improving the effectiveness of one-class classification is proposed. (4) A new model selection framework to cope with subclass distribution shifts that occur in dynamic environments is introduced.
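    Building block (3) above refers to one-class classification, where a classifier is trained on examples of the target class only. The sketch below is a generic illustration of one-class ensembling, not the specific ensemble the thesis proposes: assuming scikit-learn is available, it bags several one-class SVMs over bootstrap subsamples of the target class and combines their decisions by majority vote.

```python
import numpy as np
from sklearn.svm import OneClassSVM

class OneClassEnsemble:
    """Bagging-style ensemble of one-class SVMs combined by majority vote (illustrative only)."""

    def __init__(self, n_members=11, subsample=0.8, nu=0.1, seed=0):
        self.n_members, self.subsample, self.nu = n_members, subsample, nu
        self.rng = np.random.default_rng(seed)
        self.members = []

    def fit(self, X_target):
        """Train each member on a bootstrap subsample of the target-class examples."""
        n = len(X_target)
        size = max(1, int(self.subsample * n))
        self.members = []
        for _ in range(self.n_members):
            idx = self.rng.choice(n, size=size, replace=True)
            self.members.append(OneClassSVM(nu=self.nu, kernel="rbf").fit(X_target[idx]))
        return self

    def predict(self, X):
        """Majority vote over member predictions (+1 = target class, -1 = outlier)."""
        votes = np.stack([m.predict(X) for m in self.members])
        return np.sign(votes.sum(axis=0))

# Toy data: train on the target class only, then flag points drawn from elsewhere.
rng = np.random.default_rng(0)
target = rng.normal(0, 1, size=(200, 5))
outliers = rng.normal(4, 1, size=(20, 5))
model = OneClassEnsemble().fit(target)
print(model.predict(outliers))   # mostly -1
```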

    Enhancing disaster situational awareness through scalable curation of social media

    Online social media is today used during humanitarian disasters by victims, responders, journalists and others to publicly exchange accounts of ongoing events, requests for help, aggregate reports, reflections and commentary. In many cases, incident reports become available on social media before being picked up by traditional information channels, and often include rich evidence such as photos and video recordings. However, individual messages are sparse in content, and message inflow rates can reach hundreds of thousands of items per hour during large-scale events. Current information management methods struggle to make sense of this vast body of knowledge, due to limitations in terms of accuracy and scalability of processing, summarization capabilities, organizational acceptance and even basic understanding of users' needs. If solutions to these problems can be found, social media can be mined to offer disaster responders unprecedented levels of situational awareness. This thesis provides a first comprehensive overview of humanitarian disaster stakeholders and their information needs, against which the utility of the proposed and future information management solutions can be assessed. The research then shows how automated online text-clustering techniques can provide report de-duplication, timely event detection, and ranking and summarization of content in rapid social media streams. To identify and filter reports that correspond to the information needs of specific stakeholders, crowdsourced information extraction is combined with supervised classification techniques to generalize human annotation behaviour and scale up processing capacity by several orders of magnitude. These hybrid processing techniques are implemented in CrisisTracker, a novel software tool, and evaluated through deployment in a large-scale multi-language disaster information management setting. Evaluation shows that the proposed techniques can effectively make social media an accessible complement to currently relied-on information collection methods, enabling disaster analysts to detect and comprehend unfolding events more quickly, more deeply and with greater coverage.
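    The report de-duplication and clustering step described above can be illustrated with a toy single-pass routine that attaches each incoming message to the most similar existing cluster or starts a new one. This is a minimal sketch under simplifying assumptions (plain bag-of-words cosine similarity and a fixed threshold), not CrisisTracker's actual implementation, which must scale to far higher message rates.

```python
import re
from collections import Counter
from math import sqrt

def bag_of_words(text):
    """Lowercased word-count vector for one message."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

def cluster_stream(messages, threshold=0.5):
    """Greedy single-pass clustering: attach each message to the most similar
    existing cluster centroid, or start a new cluster (a new 'story')."""
    clusters = []   # each cluster: {"centroid": Counter, "messages": [...]}
    for msg in messages:
        vec = bag_of_words(msg)
        best, best_sim = None, threshold
        for c in clusters:
            sim = cosine(vec, c["centroid"])
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append({"centroid": vec, "messages": [msg]})
        else:
            best["messages"].append(msg)
            best["centroid"] += vec   # fold the new message into the centroid
    return clusters

stream = [
    "Bridge collapsed on the main road, several injured",
    "Main road bridge has collapsed, injuries reported",
    "Volunteers needed at the shelter near the stadium",
]
print([len(c["messages"]) for c in cluster_stream(stream)])   # [2, 1]: duplicates grouped
```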