
    A Faster Algorithm to Build New Users Similarity List in Neighbourhood-based Collaborative Filtering

    Neighbourhood-based Collaborative Filtering (CF) has been applied in industry for several decades because of its easy implementation and high recommendation accuracy. As the core of neighbourhood-based CF, the task of dynamically maintaining users' similarity lists is challenged by the cold-start and scalability problems. Several methods have recently been proposed to solve these two problems. However, these methods apply an $O(n^2)$ algorithm to compute the similarity list in a special case where new users, with enough recommendation data, have the same rating list. To address the large computational cost caused by this special case, we design a faster ($O(\frac{1}{125}n^2)$) algorithm, the TwinSearch Algorithm, which avoids repeatedly computing and sorting the similarity list for new users and so saves computational resources. Both theoretical and experimental results show that the TwinSearch Algorithm achieves better running time than the traditional method.
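    The abstract does not spell out how TwinSearch locates a twin, so the following is only a minimal sketch of the underlying idea: when a new user has exactly the same rating list as an existing user, the existing similarity list can be copied instead of recomputed and re-sorted. The hash lookup on the rating list is a stand-in chosen for illustration, not the paper's search procedure, and `SimilarityIndex`, `add_user`, and `cosine_calls` are hypothetical names.

```python
from bisect import insort
from math import sqrt


class SimilarityIndex:
    """Hypothetical helper that keeps one sorted similarity list per user."""

    def __init__(self):
        self.ratings = {}       # user -> {item: rating}
        self.sim_lists = {}     # user -> sorted list of (-similarity, other_user)
        self.by_signature = {}  # frozenset of (item, rating) pairs -> first user seen
        self.cosine_calls = 0   # number of pairwise similarities actually computed

    def _cosine(self, a, b):
        """Cosine similarity between two sparse rating dicts."""
        self.cosine_calls += 1
        dot = sum(a[i] * b[i] for i in a.keys() & b.keys())
        if dot == 0:
            return 0.0
        norm_a = sqrt(sum(v * v for v in a.values()))
        norm_b = sqrt(sum(v * v for v in b.values()))
        return dot / (norm_a * norm_b)

    def add_user(self, user, ratings):
        signature = frozenset(ratings.items())
        twin = self.by_signature.get(signature)
        if twin is None:
            # No twin known: compute and sort the full list (the traditional path).
            new_list = sorted(
                (-self._cosine(ratings, other), u)
                for u, other in self.ratings.items()
            )
            self.by_signature[signature] = user
        else:
            # Twin found: copy its list and add the twin itself with similarity 1.
            new_list = list(self.sim_lists[twin])
            insort(new_list, (-1.0, twin))
        # Register the new user in every existing user's list.
        for neg_sim, other in new_list:
            insort(self.sim_lists[other], (neg_sim, user))
        self.ratings[user] = dict(ratings)
        self.sim_lists[user] = new_list
        return new_list


index = SimilarityIndex()
index.add_user("a", {"x": 5, "y": 3})
index.add_user("b", {"x": 4, "z": 2})
index.add_user("c", {"x": 5, "y": 3})  # same rating list as "a"
```

    In this sketch, adding "c" triggers no similarity computation: its list is derived entirely from the list already stored for "a".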

    On the genericity properties in networked estimation: Topology design and sensor placement

    In this paper, we consider networked estimation of linear, discrete-time dynamical systems monitored by a network of agents. To minimize the power requirement at the (possibly battery-operated) agents, we require that the agents exchange information with their neighbors only once per dynamical system time-step, in contrast to consensus-based estimation, where the agents exchange information until they reach a consensus. It can be verified that, with this restriction on information exchange, measurement fusion alone results in an unbounded estimation error at every agent that does not have an observable set of measurements in its neighborhood. To overcome this challenge, state-estimate fusion has been proposed to recover the system observability. However, we show that adding state-estimate fusion may not recover observability when the system matrix is structured-rank ($S$-rank) deficient. In this context, we characterize state-estimate fusion and measurement fusion under both full $S$-rank and $S$-rank deficient system matrices. Comment: submitted for IEEE journal publication

    Software library for stream-based recommender systems

    Traditionally, a machine learning algorithm learns from a data set that has been built and cleaned beforehand. One can also analyze that data set with data mining techniques and draw conclusions from it. Both concepts have numerous applications worldwide, from medical diagnosis to speech recognition and search engine queries. However, it is traditionally assumed that the data set is available at all times. That is not necessarily true of modern data, as distributed applications receive and process millions of data streams within a limited time. Techniques are therefore needed to mine and process these data streams in a limited time period, with good results and effective scaling as the data grows. Recommender systems are one specific use case of analyzing given data and predicting from it. Several online services use recommender systems to deliver personalized content to their users. In many cases, recommendations are one of the most effective traffic generators in such services. The problem lies in finding the best small subset of items in a system that matches the personal preferences of each user, through the analysis of a very large amount of historical data. This problem gets more attention if it is considered as a generic problem rather than a specific one, that is, if a library is built so that it can be extended and used as a tool to help build a system for a specific use case. One can distinguish between perfect solutions and statistically similar ones. Because of the large amount of data available, reprocessing all the data every time new data arrives would not be feasible, so incremental algorithms are used to process incoming data and keep the recommendation model up to date. The purpose of this work is to implement such a library, which contains and evaluates these incremental approaches to recommendation, since current ones are task-specific.
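    The abstract does not name the algorithms the library contains, so the following is only a minimal sketch of what "incremental" means here: each incoming (user, item) event updates the model in place, and nothing is reprocessed from scratch. Item co-occurrence counting is an illustrative choice, and `IncrementalCooccurrence`, `update`, and `recommend` are hypothetical names, not the library's API.

```python
from collections import defaultdict


class IncrementalCooccurrence:
    """Hypothetical stream recommender based on item co-occurrence counts."""

    def __init__(self):
        self.user_items = defaultdict(set)  # user -> items seen so far
        self.co_counts = defaultdict(lambda: defaultdict(int))  # item -> item -> count

    def update(self, user, item):
        """Fold one (user, item) event into the model without a full rebuild."""
        seen = self.user_items[user]
        if item in seen:
            return
        for other in seen:
            self.co_counts[item][other] += 1
            self.co_counts[other][item] += 1
        seen.add(item)

    def recommend(self, user, k=3):
        """Return up to k unseen items, ranked by co-occurrence with the user's items."""
        seen = self.user_items.get(user, set())
        scores = defaultdict(int)
        for item in seen:
            for other, count in self.co_counts.get(item, {}).items():
                if other not in seen:
                    scores[other] += count
        # Break ties by item name so the ranking is deterministic.
        return sorted(scores, key=lambda i: (-scores[i], i))[:k]


model = IncrementalCooccurrence()
stream = [("u1", "a"), ("u1", "b"), ("u2", "a"),
          ("u2", "b"), ("u2", "c"), ("u3", "a")]
for user, item in stream:
    model.update(user, item)
```

    Each `update` call touches only the counts involving the new item and the items that user has already seen, which is the property that keeps the model current as the stream grows.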

    Recommender Systems

    The ongoing rapid expansion of the Internet greatly increases the need for effective recommender systems to filter the abundant information. Extensive research on recommender systems is conducted by a broad range of communities, including social and computer scientists, physicists, and interdisciplinary researchers. Despite substantial theoretical and practical achievements, unification and comparison of different approaches are lacking, which impedes further advances. In this article, we review recent developments in recommender systems and discuss the major challenges. We compare and evaluate available algorithms and examine their roles in future developments. In addition to algorithms, physical aspects are described to illustrate the macroscopic behavior of recommender systems. Potential impacts and future directions are discussed. We emphasize that recommendation has great scientific depth and combines diverse research fields, which makes it of interest to physicists as well as interdisciplinary researchers. Comment: 97 pages, 20 figures (To appear in Physics Reports)

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability issues.