134 research outputs found

    ReSLLM: Large Language Models are Strong Resource Selectors for Federated Search

    Federated search, which involves integrating results from multiple independent search engines, will become increasingly pivotal in the context of Retrieval-Augmented Generation pipelines empowering LLM-based applications such as chatbots. These systems often distribute queries among various search engines, ranging from specialized (e.g., PubMed) to general (e.g., Google), based on the nature of user utterances. A critical aspect of federated search is resource selection - the selection of appropriate resources prior to issuing the query to ensure high-quality and rapid responses, and contain costs associated with calling the external search engines. However, current SOTA resource selection methodologies primarily rely on feature-based learning approaches. These methods often involve the labour-intensive and expensive creation of training labels for each resource. In contrast, LLMs have exhibited strong effectiveness as zero-shot methods across NLP and IR tasks. We hypothesise that in the context of federated search LLMs can assess the relevance of resources without the need for extensive predefined labels or features. In this paper, we propose ReSLLM. Our ReSLLM method exploits LLMs to drive the selection of resources in federated search in a zero-shot setting. In addition, we devise an unsupervised fine-tuning protocol, the Synthetic Label Augmentation Tuning (SLAT), where the relevance of previously logged queries and snippets from resources is predicted using an off-the-shelf LLM and then in turn used to fine-tune ReSLLM with respect to resource selection. Our empirical evaluation and analysis detail the factors influencing the effectiveness of LLMs in this context. The results showcase the merits of ReSLLM for resource selection: not only competitive effectiveness in the zero-shot setting, but also large improvements when fine-tuned using the SLAT protocol

    Ranking for Web Data Search Using On-The-Fly Data Integration

    Ranking - the algorithmic decision on how relevant an information artifact is for a given information need and the sorting of artifacts by their concluded relevancy - is an integral part of every search engine. In this book we investigate how structured Web data can be leveraged for ranking with the goal to improve the effectiveness of search. We propose new solutions for ranking using on-the-fly data integration and experimentally analyze and evaluate them against the latest baselines

    Scalability of findability: decentralized search and retrieval in large information networks

    Amid the rapid growth of information today is the increasing challenge for people to survive and navigate its magnitude. Dynamics and heterogeneity of large information spaces such as the Web challenge information retrieval in these environments. Collection of information in advance and centralization of IR operations are hardly possible because systems are dynamic and information is distributed. While monolithic search systems continue to struggle with scalability problems of today, the future of search likely requires a decentralized architecture where many information systems can participate. As individual systems interconnect to form a global structure, finding relevant information in distributed environments transforms into a problem concerning not only information retrieval but also complex networks. Understanding network connectivity will provide guidance on how decentralized search and retrieval methods can function in these information spaces. The dissertation studies one aspect of scalability challenges facing classic information retrieval models and presents a decentralized, organic view of information systems pertaining to search in large scale networks. It focuses on the impact of network structure on search performance and investigates a phenomenon we refer to as the Clustering Paradox, in which the topology of interconnected systems imposes a scalability limit. Experiments involving large scale benchmark collections provide evidence on the Clustering Paradox in the IR context. In an increasingly large, distributed environment, decentralized searches for relevant information can continue to function well only when systems interconnect in certain ways. Relying on partial indexes of distributed systems, some level of network clustering enables very efficient and effective discovery of relevant information in large scale networks. Increasing or reducing network clustering degrades search performances. 
    Given this specific level of network clustering, search time is well explained by a poly-logarithmic relation to network size, indicating a high scalability potential for searching in a continuously growing information space

    2021 - The Second Annual Fall Symposium of Student Scholars

    The full program book from the Fall 2020 Symposium of Student Scholars, held on November 18, 2021. Includes abstracts from the presentations and posters.

    Immigration Federalism: A Reappraisal

    This Article identifies how the current spate of state and local regulation is changing the way elected officials, scholars, courts, and the public think about the constitutional dimensions of immigration law and governmental responsibility for immigration enforcement. Reinvigorating the theoretical possibilities left open by the Supreme Court in its 1875 Chy Lung v. Freeman decision, state and local officials characterize their laws as unavoidable responses to the policy problems they face when they are squeezed between the challenges of unauthorized migration and the federal government’s failure to fix a broken system. In the October 2012 term, in Arizona v. United States, the Court addressed, but did not settle, the difficult empirical, theoretical, and constitutional questions necessitated by these enactments and their attendant justifications. Our empirical investigation, however, discovered that most state and local immigration laws are not organic policy responses to pressing demographic challenges. Instead, such laws are the product of a more nuanced and politicized process in which demographic concerns are neither necessary nor sufficient factors and in which federal inactivity and subfederal activity are related phenomena, fomented by the same actors. This Article focuses on the constitutional and theoretical implications of these processes: It presents an evidence-based theory of state and local policy proliferation; it cautions legal scholars to rethink functionalist accounts for the rise of such laws; and it advises courts to reassess their use of traditional federalism frameworks to evaluate these subfederal enactments

    Private and censorship-resistant communication over public networks

    Society’s increasing reliance on digital communication networks is creating unprecedented opportunities for wholesale surveillance and censorship. This thesis investigates the use of public networks such as the Internet to build robust, private communication systems that can resist monitoring and attacks by powerful adversaries such as national governments. We sketch the design of a censorship-resistant communication system based on peer-to-peer Internet overlays in which the participants only communicate directly with people they know and trust. This ‘friend-to-friend’ approach protects the participants’ privacy, but it also presents two significant challenges. The first is that, as with any peer-to-peer overlay, the users of the system must collectively provide the resources necessary for its operation; some users might prefer to use the system without contributing resources equal to those they consume, and if many users do so, the system may not be able to survive. To address this challenge we present a new game theoretic model of the problem of encouraging cooperation between selfish actors under conditions of scarcity, and develop a strategy for the game that provides rational incentives for cooperation under a wide range of conditions. The second challenge is that the structure of a friend-to-friend overlay may reveal the users’ social relationships to an adversary monitoring the underlying network. To conceal their sensitive relationships from the adversary, the users must be able to communicate indirectly across the overlay in a way that resists monitoring and attacks by other participants. We address this second challenge by developing two new routing protocols that robustly deliver messages across networks with unknown topologies, without revealing the identities of the communication endpoints to intermediate nodes or vice versa. 
    The protocols make use of a novel unforgeable acknowledgement mechanism that proves that a message has been delivered without identifying the source or destination of the message or the path by which it was delivered. One of the routing protocols is shown to be robust to attacks by malicious participants, while the other provides rational incentives for selfish participants to cooperate in forwarding messages