134 research outputs found
ReSLLM: Large Language Models are Strong Resource Selectors for Federated Search
Federated search, which involves integrating results from multiple
independent search engines, will become increasingly pivotal in the context of
Retrieval-Augmented Generation pipelines empowering LLM-based applications such
as chatbots. These systems often distribute queries among various search
engines, ranging from specialized (e.g., PubMed) to general (e.g., Google),
based on the nature of user utterances. A critical aspect of federated search
is resource selection - the selection of appropriate resources prior to issuing
the query to ensure high-quality and rapid responses, and contain costs
associated with calling the external search engines. However, current SOTA
resource selection methodologies primarily rely on feature-based learning
approaches. These methods often involve the labour-intensive and expensive
creation of training labels for each resource. In contrast, LLMs have exhibited
strong effectiveness as zero-shot methods across NLP and IR tasks. We
hypothesise that in the context of federated search LLMs can assess the
relevance of resources without the need for extensive predefined labels or
features. In this paper, we propose ReSLLM. Our ReSLLM method exploits LLMs to
drive the selection of resources in federated search in a zero-shot setting. In
addition, we devise an unsupervised fine-tuning protocol, the Synthetic Label
Augmentation Tuning (SLAT), where the relevance of previously logged queries
and snippets from resources is predicted using an off-the-shelf LLM and then in
turn used to fine-tune ReSLLM with respect to resource selection. Our empirical
evaluation and analysis detail the factors influencing the effectiveness of
LLMs in this context. The results showcase the merits of ReSLLM for resource
selection: not only competitive effectiveness in the zero-shot setting, but
also large improvements when fine-tuned using the SLAT protocol
Ranking for Web Data Search Using On-The-Fly Data Integration
Ranking - the algorithmic decision on how relevant an information artifact is for a given information need and the sorting of artifacts by their concluded relevancy - is an integral part of every search engine. In this book we investigate how structured Web data can be leveraged for ranking with the goal to improve the effectiveness of search. We propose new solutions for ranking using on-the-fly data integration and experimentally analyze and evaluate them against the latest baselines
Scalability of findability: decentralized search and retrieval in large information networks
Amid the rapid growth of information today, people face an increasing challenge to survive and navigate its magnitude. The dynamics and heterogeneity of large information spaces such as the Web challenge information retrieval in these environments. Collecting information in advance and centralizing IR operations are hardly possible because systems are dynamic and information is distributed. While monolithic search systems continue to struggle with today's scalability problems, the future of search likely requires a decentralized architecture in which many information systems can participate. As individual systems interconnect to form a global structure, finding relevant information in distributed environments becomes a problem concerning not only information retrieval but also complex networks. Understanding network connectivity will provide guidance on how decentralized search and retrieval methods can function in these information spaces. The dissertation studies one aspect of the scalability challenges facing classic information retrieval models and presents a decentralized, organic view of information systems pertaining to search in large-scale networks. It focuses on the impact of network structure on search performance and investigates a phenomenon we refer to as the Clustering Paradox, in which the topology of interconnected systems imposes a scalability limit. Experiments involving large-scale benchmark collections provide evidence of the Clustering Paradox in the IR context. In an increasingly large, distributed environment, decentralized searches for relevant information can continue to function well only when systems interconnect in certain ways. Relying on partial indexes of distributed systems, some level of network clustering enables very efficient and effective discovery of relevant information in large-scale networks. Increasing or reducing network clustering from that level degrades search performance. Given this specific level of network clustering, search time is well explained by a poly-logarithmic relation to network size, indicating a high scalability potential for searching in a continuously growing information space
2021 - The Second Annual Fall Symposium of Student Scholars
The full program book from the Fall 2021 Symposium of Student Scholars, held on November 18, 2021. Includes abstracts from the presentations and posters.
Immigration Federalism: A Reappraisal
This Article identifies how the current spate of state and local regulation is changing the way elected officials, scholars, courts, and the public think about the constitutional dimensions of immigration law and governmental responsibility for immigration enforcement. Reinvigorating the theoretical possibilities left open by the Supreme Court in its 1875 Chy Lung v. Freeman decision, state and local officials characterize their laws as unavoidable responses to the policy problems they face when they are squeezed between the challenges of unauthorized migration and the federal government’s failure to fix a broken system. In the October 2012 term, in Arizona v. United States, the Court addressed, but did not settle, the difficult empirical, theoretical, and constitutional questions necessitated by these enactments and their attendant justifications. Our empirical investigation, however, discovered that most state and local immigration laws are not organic policy responses to pressing demographic challenges. Instead, such laws are the product of a more nuanced and politicized process in which demographic concerns are neither necessary nor sufficient factors and in which federal inactivity and subfederal activity are related phenomena, fomented by the same actors. This Article focuses on the constitutional and theoretical implications of these processes: It presents an evidence-based theory of state and local policy proliferation; it cautions legal scholars to rethink functionalist accounts for the rise of such laws; and it advises courts to reassess their use of traditional federalism frameworks to evaluate these subfederal enactments
Private and censorship-resistant communication over public networks
Society’s increasing reliance on digital communication networks is creating unprecedented opportunities for wholesale
surveillance and censorship. This thesis investigates the use of public networks such as the Internet to build
robust, private communication systems that can resist monitoring and attacks by powerful adversaries such as national
governments.
We sketch the design of a censorship-resistant communication system based on peer-to-peer Internet overlays in which
the participants only communicate directly with people they know and trust. This ‘friend-to-friend’ approach protects
the participants’ privacy, but it also presents two significant challenges. The first is that, as with any peer-to-peer
overlay, the users of the system must collectively provide the resources necessary for its operation; some users might
prefer to use the system without contributing resources equal to those they consume, and if many users do so, the
system may not be able to survive.
To address this challenge we present a new game theoretic model of the problem of encouraging cooperation between
selfish actors under conditions of scarcity, and develop a strategy for the game that provides rational incentives for
cooperation under a wide range of conditions.
The second challenge is that the structure of a friend-to-friend overlay may reveal the users’ social relationships to
an adversary monitoring the underlying network. To conceal their sensitive relationships from the adversary, the
users must be able to communicate indirectly across the overlay in a way that resists monitoring and attacks by other
participants.
We address this second challenge by developing two new routing protocols that robustly deliver messages across
networks with unknown topologies, without revealing the identities of the communication endpoints to intermediate
nodes or vice versa. The protocols make use of a novel unforgeable acknowledgement mechanism that proves that a
message has been delivered without identifying the source or destination of the message or the path by which it was
delivered. One of the routing protocols is shown to be robust to attacks by malicious participants, while the other
provides rational incentives for selfish participants to cooperate in forwarding messages