4 research outputs found

    Improved approximations for min sum vertex cover and generalized min sum set cover

    We study the generalized min sum set cover (GMSSC) problem, wherein, given a collection of hyperedges $E$ with arbitrary covering requirements $\{k_e \in \mathbb{Z}_+ : e \in E\}$, the goal is to find an ordering of the vertices that minimizes the total cover time of the hyperedges; a hyperedge $e$ is considered covered at the first time when $k_e$ of its vertices have appeared in the ordering. We give a 4.642-approximation algorithm for GMSSC, coming close to the bound of 4, which is best possible already for the classical special case (with all $k_e = 1$) of min sum set cover (MSSC) studied by Feige, Lovász and Tetali [11], and improving upon the previous best known bound of 12.4 due to Im, Sviridenko and van der Zwaan [20]. Our algorithm is based on transforming the LP solution by a suitable kernel and applying randomized rounding. This also gives an LP-based 4-approximation for MSSC. As part of the analysis of our algorithm, we also derive an inequality on the lower tail of a sum of independent Bernoulli random variables, which might be of independent interest and broader utility. Another well-known special case is the min sum vertex cover (MSVC) problem, in which the input hypergraph is a graph (i.e., $|e| = 2$) and $k_e = 1$ for every edge $e \in E$. We give a $16/9 \approx 1.778$ approximation for MSVC, and show a matching integrality gap for the natural LP relaxation. This improves upon the previous best 1.999946-approximation of Barenholz, Feige and Peleg [6]. (The claimed 1.79-approximation result of Iwata, Tetali and Tripathi [21] for MSVC turned out to have an unfortunate, seemingly unfixable, mistake in it.) Finally, we revisit MSSC and consider the $\ell_p$ norm of the cover times of the hyperedges. Using a dual fitting argument, we show that the natural greedy algorithm simultaneously achieves approximation guarantees of $(p + 1)^{1+1/p}$ for all $p \ge 1$, giving another proof of the result of Golovin, Gupta, Kumar and Tangwongsan [13], and showing its tightness up to NP-hardness. For $p = 1$, this gives yet another proof of the 4-approximation for MSSC.
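    The greedy rule and the cover-time objective mentioned in this abstract can be illustrated with a small sketch. It is not taken from the paper: the function names (`greedy_mssc`, `cover_times`, `lp_norm`, `greedy_bound`) and the toy instance are assumptions made for illustration, and it covers only the MSSC case where every $k_e = 1$.

```python
from typing import Dict, Hashable, Iterable, List, Set


def greedy_mssc(hyperedges: Dict[Hashable, Set[Hashable]]) -> List[Hashable]:
    """Greedy ordering for MSSC (all k_e = 1): repeatedly pick the vertex
    that appears in the largest number of still-uncovered hyperedges."""
    uncovered = {e: set(vs) for e, vs in hyperedges.items()}
    order: List[Hashable] = []
    while uncovered:
        counts: Dict[Hashable, int] = {}
        for vs in uncovered.values():
            for v in vs:
                counts[v] = counts.get(v, 0) + 1
        if not counts:
            raise ValueError("an empty hyperedge can never be covered")
        # Sorting by repr makes tie-breaking deterministic (illustrative choice).
        best = max(sorted(counts, key=repr), key=lambda v: counts[v])
        order.append(best)
        uncovered = {e: vs for e, vs in uncovered.items() if best not in vs}
    return order


def cover_times(hyperedges: Dict[Hashable, Set[Hashable]],
                order: List[Hashable]) -> Dict[Hashable, int]:
    """Cover time of e = 1-based position of its first vertex in the ordering."""
    position = {v: t for t, v in enumerate(order, start=1)}
    return {e: min(position[v] for v in vs if v in position)
            for e, vs in hyperedges.items()}


def lp_norm(times: Iterable[int], p: float) -> float:
    """l_p norm of a collection of cover times."""
    return sum(t ** p for t in times) ** (1.0 / p)


def greedy_bound(p: float) -> float:
    """The guarantee (p + 1)^(1 + 1/p) stated in the abstract."""
    return (p + 1) ** (1 + 1 / p)


# Hypothetical toy instance: three hyperedges over vertices a, b, c.
edges = {"e1": {"a", "b"}, "e2": {"a", "c"}, "e3": {"c"}}
order = greedy_mssc(edges)          # ['a', 'c']
times = cover_times(edges, order)   # e1 and e2 at time 1, e3 at time 2
print(order, times, lp_norm(times.values(), 1), greedy_bound(1))
```

    For $p = 1$ the norm is simply the sum of cover times, and `greedy_bound(1)` evaluates to $(1+1)^{1+1} = 4$, the factor quoted for MSSC.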

    Recuperação multimodal e interativa de informação orientada por diversidade (Multimodal and interactive diversity-oriented information retrieval)

    Advisor: Ricardo da Silva Torres. Thesis (doctorate), Universidade Estadual de Campinas, Instituto de Computação.
    Abstract: Information retrieval methods, especially for multimedia data, have evolved towards integrating multiple sources of evidence when analyzing the relevance of items for a given user search task. In this context, to attenuate the semantic gap between low-level features extracted from the content of digital objects and high-level semantic concepts (objects, categories, etc.), and to make systems adaptive to different user needs, interactive models have brought the user closer to the retrieval loop, allowing user-system interaction mainly through implicit or explicit relevance feedback. Analogously, diversity promotion has emerged as an alternative for tackling ambiguous or underspecified queries. Additionally, several works have addressed the issue of minimizing the user effort required to provide relevance assessments while keeping an acceptable overall effectiveness. This thesis discusses, proposes, and experimentally analyzes multimodal and interactive diversity-oriented information retrieval methods. This work comprehensively covers the interactive information retrieval literature and also discusses recent advances, major research challenges, and promising research opportunities. We have proposed and evaluated two relevance-diversity trade-off enhancement workflows, which integrate multiple kinds of information from images, such as visual features, textual metadata, geographic information, and user credibility descriptors. In turn, as an integration of interactive retrieval and diversity promotion techniques, aimed at maximizing the coverage of multiple query interpretations/aspects and speeding up the information transfer between the user and the system, we have proposed and evaluated a multimodal learning-to-rank method trained with relevance feedback over diversified results. Our experimental analysis shows that the joint usage of multiple information sources positively impacted the relevance-diversity balancing algorithms. Our results also suggest that the integration of multimodal relevance-based filtering and reranking was effective in improving result relevance and also boosted diversity promotion methods. Furthermore, through a thorough experimental analysis, we have investigated several research questions related to the possibility of improving result diversity while keeping or even improving relevance in interactive search sessions. Moreover, we analyze how much the diversification effort affects overall search session results and how different diversification approaches behave for the different data modalities. By analyzing the overall and per-feedback-iteration effectiveness, we show that introducing diversity may harm initial results, whereas it significantly enhances the overall session effectiveness, considering not only relevance and diversity but also how early the user is exposed to the same amount of relevant items and diversity.
    Doctorate. Computer Science. Doctor of Computer Science. P-4388/2010. 140977/2012-0. CAPES. CNP
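    As a generic illustration of the relevance-diversity trade-off discussed in this abstract, the sketch below uses Maximal Marginal Relevance (MMR), a well-known greedy re-ranking heuristic. It is not the thesis's method: the function name `mmr_rerank`, the `trade_off` parameter, and the toy scores and clusters are all assumptions made for illustration.

```python
from typing import Callable, Dict, Hashable, List


def mmr_rerank(relevance: Dict[Hashable, float],
               similarity: Callable[[Hashable, Hashable], float],
               k: int,
               trade_off: float = 0.5) -> List[Hashable]:
    """Greedy MMR: at each step pick the item maximizing
    trade_off * relevance - (1 - trade_off) * (max similarity to picked items)."""
    candidates = sorted(relevance, key=repr)  # deterministic tie-breaking
    selected: List[Hashable] = []
    while candidates and len(selected) < k:
        def score(item: Hashable) -> float:
            # Redundancy is 0 while nothing has been selected yet.
            redundancy = max((similarity(item, s) for s in selected), default=0.0)
            return trade_off * relevance[item] - (1 - trade_off) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical data: items a and b are near-duplicates, c is a different aspect.
relevance = {"a": 0.9, "b": 0.85, "c": 0.5}
cluster = {"a": 0, "b": 0, "c": 1}


def same_cluster(x: str, y: str) -> float:
    return 1.0 if cluster[x] == cluster[y] else 0.0


print(mmr_rerank(relevance, same_cluster, k=3, trade_off=1.0))  # pure relevance
print(mmr_rerank(relevance, same_cluster, k=3, trade_off=0.5))  # balanced
```

    With `trade_off=1.0` the ranking follows relevance alone; lowering it promotes the less relevant but novel item `c` above the near-duplicate `b`, which mirrors the effect described above where diversification can lower early relevance while broadening coverage.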

    User Intent in Online Video Search

    In recent years, user expectations of what video search engines can do have risen significantly. Users expect video search engines to be useful as an instrument that facilitates communication, education, entertainment and problem solving and, in relation to this, to satisfy diverse information needs. A user's information need is the lack that a user is attempting to overcome by engaging in information seeking behavior and can be seen as having two important dimensions: it comprises both a 'what' dimension reflecting the topic of the search and a 'why' dimension corresponding to the user intent, the immediate reason, purpose or goal behind the information need. Video search engines are relatively successful at returning search results that users find to be on topic. These results do not, however, completely satisfy the users' information needs unless they also fulfill the users' intents. The purpose of this thesis is to enable the intent-related focus shift in the design and realization of video search engines and to advance them in terms of user intent in order to satisfy users' information needs to their full extent. This advancement is challenging because it affects the entire pipeline of the video search engine: video indexing, query processing, and search results ranking. However, it also has the potential to substantially improve the overall utility of video search engines and increase the impact, significance and economic value of online video content. We start to tackle this challenge by analyzing a real-world transaction log produced by a state-of-the-art video search engine with the objective of obtaining a deeper understanding of why queries submitted by users in their search sessions fail.
Based on the results of this analysis, we build classifiers to automatically predict these reasons for query failure given a set of multimodal features derived from both the user interactions with the search engine and the search results produced by the engine. Our analysis of the transaction log reveals several distinct reasons why user queries in video search fail. One of these reasons is the way user goals are expressed in the query, i.e., a single query can correspond to different underlying goals. In other words, intent is often not explicitly reflected in the query. This fact motivates us to tackle this challenge and to investigate the usefulness of incorporating user intent in video search engines. As a first step, we investigate the nature of the immediate reason, purpose or goal behind a user information need that constitutes intent. We apply a social-Web mining approach combined with crowdsourcing and a manual coding process in order to derive a conceptual model (i.e., a typology constituting search intent categories) covering different reasons why users consult video search engines. This typology forms the basis for integrating user intent in video search engines. We then provide evidence that users differentiate videos in search engine results lists on the basis of these user intent categories. In addition to understanding which search intents exist in video search, it is equally important to understand which intents are associated with videos. This understanding is crucial, as it forms the foundation for matching the search intents expressed by users in search scenarios to the intents that are associated with videos stored in the video search engine's index. While in search scenarios intent can be characterized by the different reasons why users consult a video search engine, comparable user actions can be investigated for why videos were added to the search engine's index in the first place.
For this reason, we investigate the user action of uploading videos to the Internet and apply a combination of social-Web mining and crowdsourcing to arrive at a conceptual model (i.e., a typology constituting uploader intent categories) that characterizes the various reasons why users upload videos to the Internet. We then build algorithms that automatically classify videos into these categories. Finally, we demonstrate that uploader intent categories correlate with search intent categories, which provides the opportunity for incorporating intent into the retrieval functions of video search engines. With search intent categories and uploader intent categories and their automatic prediction at hand, we face the challenging task of introducing user intent into search results rankings so that the resulting video search results lists optimally reflect user intent. We propose an intent-aware video search result optimization approach that exploits the structure of topically relevant initial results lists produced by the search engine in response to user-submitted queries in order to predict which search intent(s) the user would most likely wish to satisfy. Based on this information, the approach optimizes the initial lists so that search results with the highest potential to satisfy the user's search intent(s) are positioned at the very top of the list without decreasing its topical focus. Finally, although this thesis contributes a substantial amount of research towards user intent-aware video search engines, we believe that additional challenges will emerge in the future that go beyond those addressed in this thesis. We identify and discuss these challenges and expect them to attract significant research efforts that will lead to productive outcomes in the field of user intent-aware video search engines in the coming years.
    Intelligent Systems. Electrical Engineering, Mathematics and Computer Scienc