3,723 research outputs found
Query Answering over Wikipedia for Mobile Devices on the Android Platform
This bachelor's thesis deals with the development of a system for query answering over Wikipedia for mobile devices running the Android operating system. The technical report describes the theoretical background related to the topic as well as the implementation of the server system and the client-side application. Part of the thesis is dedicated to testing the system, and the final part outlines potential future development.
Leveraging Formulae and Text for Improved Math Retrieval
Large collections containing millions of math formulas are available online. Retrieving math expressions from these collections is challenging. Users can use a formula, a formula plus text, or math questions to express their math information needs. The structural complexity of formulas requires specialized processing. Despite the existence of math search systems and online community question-answering websites for math, little is known about mathematical information needs. This research first explores the characteristics of math searches using a general search engine. The findings show how math searches differ from general searches. Then, test collections for math-aware search are introduced. The ARQMath test collections have two main tasks: 1) finding answers for math questions and 2) contextual formula search. Each test collection (ARQMath-1 to -3) uses the same collection, Math Stack Exchange posts from 2010 to 2018, and introduces different topics for each task. Compared to previous test collections, ARQMath has a much larger number of diverse topics and an improved evaluation protocol. Another key goal of this research is to leverage text and math information for improved math information retrieval. Three formula search models that use only the formula, with no context, are introduced. The first model is an n-gram embedding model using both symbol layout tree and operator tree representations. The second model uses tree-edit distance to re-rank the results from the first model. Finally, a learning-to-rank model that leverages full-tree, sub-tree, and vector similarity scores is introduced. To use context, Math Abstract Meaning Representation (MathAMR) is introduced, which generalizes AMR trees to include math formula operations and arguments. MathAMR is then used for contextualized formula search with a fine-tuned Sentence-BERT model.
The experiments show that tree-edit distance ranking achieves the current state-of-the-art results on the contextual formula search task, and that the MathAMR model can be beneficial for re-ranking. This research also addresses the answer retrieval task, introducing a two-step retrieval model in which similar questions are first found and then the answers previously given to those similar questions are ranked. The proposed model fine-tunes two Sentence-BERT models, one for finding similar questions and another for ranking the answers. For the Sentence-BERT models, raw text as well as MathAMR are used
Evorus: A Crowd-powered Conversational Assistant Built to Automate Itself Over Time
Crowd-powered conversational assistants have been shown to be more robust
than automated systems, but do so at the cost of higher response latency and
monetary costs. A promising direction is to combine the two approaches for high
quality, low latency, and low cost solutions. In this paper, we introduce
Evorus, a crowd-powered conversational assistant built to automate itself over
time by (i) allowing new chatbots to be easily integrated to automate more
scenarios, (ii) reusing prior crowd answers, and (iii) learning to
automatically approve response candidates. Our 5-month-long deployment with 80
participants and 281 conversations shows that Evorus can automate itself
without compromising conversation quality. Crowd-AI architectures have long
been proposed as a way to reduce cost and latency for crowd-powered systems;
Evorus demonstrates how automation can be introduced successfully in a deployed
system. Its architecture allows future researchers to make further innovation
on the underlying automated components in the context of a deployed open domain
dialog system.
Comment: 10 pages. To appear in the Proceedings of the Conference on Human
Factors in Computing Systems 2018 (CHI'18)
Keyword Search on RDF Graphs - A Query Graph Assembly Approach
Keyword search provides ordinary users an easy-to-use interface for querying
RDF data. In this paper, we study how to assemble, from the input keywords, a
query graph that represents the user's query intention accurately and
efficiently. Based on the input keywords, we first obtain the elementary query
graph building blocks, such as entity/class vertices and predicate edges. Then,
we formally define the query graph assembly (QGA) problem. Unfortunately, we
prove theoretically that QGA is an NP-complete problem. In order to solve that,
we design some heuristic lower bounds and propose a bipartite graph
matching-based best-first search algorithm. The algorithm's time complexity is
, where is the number of the keywords and is a
tunable parameter, i.e., the maximum number of candidate entity/class vertices
and predicate edges allowed to match each keyword. Although QGA is intractable,
both and are small in practice. Furthermore, the algorithm's time
complexity does not depend on the RDF graph size, which guarantees the good
scalability of our system in large RDF graphs. Experiments on DBpedia and
Freebase confirm the superiority of our system on both effectiveness and
efficiency
Search Bias Quantification: Investigating Political Bias in Social Media and Web Search
Users frequently use search systems on the Web as well as online social media to learn about ongoing events and public opinion on personalities. Prior studies have shown that the top-ranked results returned by these search engines can shape user opinion about the topic (e.g., event or person) being searched. In the case of polarizing topics like politics, where multiple competing perspectives exist, the political bias in the top search results can play a significant role in shaping public opinion towards (or away from) certain perspectives. Given the considerable impact that search bias can have on the user, we propose a generalizable search bias quantification framework that not only measures the political bias in the ranked list output by the search system but also decouples the bias introduced by the different sources: input data and ranking system. We apply our framework to study the political bias in searches related to the 2016 US Presidential primaries in Twitter social media search and find that both input data and ranking system matter in determining the final search output bias seen by the users. Finally, we use the framework to compare the relative bias of two popular search systems, Twitter social media search and Google web search, for queries related to politicians and political events. We end by discussing some potential solutions to signal the bias in the search results to make users more aware of it.
Automatic Extraction and Assessment of Entities from the Web
The search for information about entities, such as people or movies, plays an increasingly important role on the Web. This information is still scattered across many Web pages, making it more time consuming for a user to find all relevant information about an entity. This thesis describes techniques to extract entities and information about these entities from the Web, such as facts, opinions, questions and answers, interactive multimedia objects, and events. The findings of this thesis are that it is possible to create a large knowledge base automatically using a manually-crafted ontology. The precision of the extracted information was found to be between 75–90% (facts and entities respectively) after using assessment algorithms. The algorithms from this thesis can be used to create such a knowledge base, which can be used in various research fields, such as question answering, named entity recognition, and information retrieval