16,121 research outputs found
Measuring and Managing Answer Quality for Online Data-Intensive Services
Online data-intensive services parallelize query execution across distributed
software components. Interactive response time is a priority, so online query
executions return answers without waiting for slow running components to
finish. However, data from these slow components could lead to better answers.
We propose Ubora, an approach to measure the effect of slow running components
on the quality of answers. Ubora randomly samples online queries and executes
them twice. The first execution elides data from slow components and provides
fast online answers; the second execution waits for all components to complete.
Ubora uses memoization to speed up mature executions by replaying network
messages exchanged between components. Our systems-level implementation works
for a wide range of platforms, including Hadoop/Yarn, Apache Lucene, the
EasyRec Recommendation Engine, and the OpenEphyra question answering system.
Ubora computes answer quality much faster than competing approaches that do not
use memoization. With Ubora, we show that answer quality can and should be used
to guide online admission control. Our adaptive controller processed 37% more
queries than a competing controller guided by the rate of timeouts.Comment: Technical Repor
Latent Dirichlet Allocation (LDA) for improving the topic modeling of the official bulletin of the spanish state (BOE)
Since Internet was born most people can access fully free to a lot sources of information. Every day a lot of web pages are created and new content is uploaded and shared. Never in the history the humans has been more informed but also uninformed due the huge amount of information that can be access. When we are looking for something in any search engine the results are too many for reading and filtering one by one. Recommended Systems (RS) was created to help us to discriminate and filter these information according to ours preferences. This contribution analyses the RS of the official agency of publications in Spain (BOE), which is known as "Mi BOE'. The way this RS works was analysed, and all the meta-data of the published documents were analysed in order to know the coverage of the system. The results of our analysis show that more than 89% of the documents cannot be recommended, because they are not well described at the documentary level, some of their key meta-data are empty. So, this contribution proposes a method to label documents automatically based on Latent Dirichlet Allocation (LDA). The results are that using this approach the system could recommend (at a theoretical point of view) more than twice of documents that it now does, 11% vs 23% after applied this approach
- …