11 research outputs found

    An Algorithm for the Longest Common Subsequence and Substring Problem

    In this note, we first introduce a new problem called the longest common subsequence and substring problem. Let X and Y be two strings over an alphabet Σ. The longest common subsequence and substring problem for X and Y is to find the longest string that is a subsequence of X and a substring of Y. We propose an algorithm to solve the problem.
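To make the problem statement concrete, here is a minimal dynamic-programming sketch. It is not necessarily the algorithm proposed in the note; the function name `longest_subsequence_substring` and the table `f` are illustrative assumptions. The table entry `f[i][j]` holds the length of the longest suffix of `y[:j]` that is a subsequence of `x[:i]`, which gives an O(|X|·|Y|) time solution.

```python
def longest_subsequence_substring(x: str, y: str) -> str:
    """Return a longest string that is a subsequence of x and a substring of y.

    Illustrative sketch only; names and structure are assumptions.
    """
    m, n = len(x), len(y)
    # f[i][j]: length of the longest suffix of y[:j] that is a subsequence of x[:i]
    f = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                # Match the last characters and extend the shorter solution.
                f[i][j] = f[i - 1][j - 1] + 1
            else:
                # x[i-1] cannot be matched to y[j-1], so it is not used.
                f[i][j] = f[i - 1][j]
    # The answer ends at the position j of y that maximizes f[m][j].
    best_j = max(range(n + 1), key=lambda j: f[m][j])
    length = f[m][best_j]
    return y[best_j - length:best_j]
```

For example, with `x = "abcde"` and `y = "xacey"`, the string `"ace"` is a substring of `y` and a subsequence of `x`, and no longer string satisfies both conditions.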

    Re-Use Dynamic Programming for Sequence Alignment: An Algorithmic Toolkit

    The problem of comparing two sequences S and T to determine their similarity is one of the fundamental problems in pattern matching. In this manuscript we will be primarily concerned with sequences as our objects and with various string comparison metrics. Our goal is to survey a methodology for utilizing repetitions in sequences in order to speed up the comparison process. Within this framework we consider various methods of parsing the sequences in order to frame their repetitions, and present a toolkit of various solutions whose time complexity depends on both the chosen parsing method and the string-comparison metric used for the alignment.

    Building user profiles based on sequences for content and collaborative filtering

    Modeling user profiles is a necessary step for most information filtering systems, such as recommender systems, to provide personalized recommendations. However, most of them represent users or items as vectors, applying different types of mathematical operations between them and neglecting sequential or content-based information. Hence, in this paper we study how to propose an adaptive mechanism to obtain user sequences using different sources of information, allowing the generation of hybrid recommendations as a seamless, transparent technique from the system viewpoint. As a proof of concept, we develop the Longest Common Subsequence (LCS) algorithm as a similarity metric to compare the user sequences. In adapting this algorithm to recommendation, we include different parameters: to control the efficiency by reducing the information used in the algorithm (preference filter), to decide when a neighbor is considered useful enough to be included in the process (confidence filter), to identify whether two interactions are equivalent (matching threshold), and to normalize the length of the LCS in a bounded interval (normalization functions). These parameters can be extended to work with any type of sequential algorithm. We evaluate our approach against several state-of-the-art recommendation algorithms using different evaluation metrics measuring the accuracy, diversity, and novelty of the recommendations, and analyze the impact of the proposed parameters. We have found that our approach offers competitive performance, outperforming content, collaborative, and hybrid baselines, and producing positive results when either content- or rating-based information is exploited. This article has been co-funded by the European Social Fund (ESF) within the 2017 call for predoctoral contracts and the Spanish Ministry of Economy, Industry and Competitiveness (project reference: TIN2016-80630-P).
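The idea of using the LCS length as a bounded similarity between two user sequences can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, items are compared by plain equality rather than a matching threshold, and dividing by the longer sequence length is just one possible normalization function, assumed here for simplicity.

```python
from typing import Hashable, Sequence


def lcs_length(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Length of the longest common subsequence, using two rolling DP rows."""
    prev = [0] * (len(b) + 1)
    for item_a in a:
        curr = [0]
        for j, item_b in enumerate(b, start=1):
            if item_a == item_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """LCS length scaled into [0, 1] (assumed normalization: longer length)."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))
```

Two users whose item sequences are `[1, 2, 3, 4]` and `[2, 4]` share the subsequence `[2, 4]`, so this particular normalization yields a similarity of 0.5.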

    Two algorithms for LCS Consecutive Suffix Alignment

    The problem of aligning two sequences A and B to determine their similarity is one of the fundamental problems in pattern matching. A challenging, basic variation of the sequence similarity problem is the incremental string comparison problem, denoted Consecutive Suffix Alignment, which is, given two strings A and B, to compute the alignment solution of each suffix of A versus B. Here, we present two solutions to the Consecutive Suffix Alignment Problem under the LCS (Longest Common Subsequence) metric, where the LCS metric measures the subsequence of maximal length common to A and B. The first solution is an O(nL) time and space algorithm for constant alphabets, where the size of the compared strings is O(n) and L ≤ n denotes the size of the LCS of A and B. The second solution is an O(nL + n log|Σ|) time and O(n) space algorithm for general alphabets, where Σ denotes the alphabet of the compared strings.
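As a point of reference for what the problem asks, the sketch below computes only the LCS length of every suffix of A versus B with the textbook quadratic dynamic program, filled from the end of both strings. It is a simple baseline under stated assumptions, not either of the O(nL) algorithms described in the abstract, and the function name `suffix_lcs_lengths` is hypothetical.

```python
def suffix_lcs_lengths(a: str, b: str) -> list[int]:
    """result[i] is the LCS length of the suffix a[i:] and the whole of b.

    Baseline O(len(a) * len(b)) time, O(len(a) + len(b)) space sketch.
    """
    n, m = len(a), len(b)
    nxt = [0] * (m + 1)       # DP row for the suffix a[i+1:]
    result = [0] * (n + 1)    # result[n] = 0 for the empty suffix
    for i in range(n - 1, -1, -1):
        cur = [0] * (m + 1)   # cur[j] = LCS length of a[i:] and b[j:]
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                cur[j] = nxt[j + 1] + 1
            else:
                cur[j] = max(nxt[j], cur[j + 1])
        result[i] = cur[0]
        nxt = cur
    return result
```

Each step of the outer loop corresponds to prepending one character to the current suffix of A, which is the incremental setting that Consecutive Suffix Alignment refers to.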

    Applying reranking strategies to route recommendation using sequence-aware evaluation

    Venue recommendation approaches have become particularly useful due to the increasing number of users registered in location-based social networks (LBSNs), applications where it is possible to share the venues someone has visited and establish connections with other users in the system. Moreover, the venue recommendation problem has certain characteristics that differ from traditional recommendation, and it can also benefit from other contextual aspects to recommend not only independent venues, but complete routes or venue sequences of related locations. Hence, in this paper, we investigate the problem of route recommendation from the perspective of generating a sequence of meaningful locations for the users, by analyzing both their personal interests and the intrinsic relationships between the venues. We divide this problem into three stages, proposing general solutions to each case: first, we state a general methodology to derive user routes from LBSN datasets that can be applied in as many scenarios as possible; second, we define a reranking framework that generates sequences of items from recommendation lists using different techniques; and third, we propose an evaluation metric that captures both accuracy and sequentiality at the same time. We report our experiments on several LBSN datasets using different recommendation quality metrics and algorithms. As a result, we have found that classical recommender systems are comparable to specifically tailored algorithms for this task, although exploiting the temporal dimension, in general, helps to improve the performance of these techniques; additionally, the proposed reranking strategies show promising results in terms of finding a trade-off between relevance, sequentiality, and distance, essential dimensions in both venue and route recommendation tasks. This work has been funded by the Ministerio de Ciencia, Innovación y Universidades (reference: TIN2016-80630-P) and by the European Social Fund (ESF), within the 2017 call for predoctoral contracts.

    Exploiting subsequence matching in Recommender Systems

    Since their inception in the early 1990s, recommender systems have experienced exponential growth and are used in a large number of applications because of their usefulness in helping users choose items based on their tastes and needs. Nowadays, they are indispensable in many companies that offer their services through the Internet, the most important medium for information exchange. For this reason, continuous innovation in these systems is essential to make recommendations that keep surprising users while improving the existing ones. In this Master's Thesis, we have studied the current state of these systems, paying special attention to neighborhood-based collaborative filtering and content-based algorithms. However, due to the disadvantages that each system may have separately, combinations of these systems are often used in real applications, creating hybrid recommenders. To support this study, we propose as a novel aspect the use of the longest common subsequence (LCS) algorithm as a similarity metric between users, also introducing a general and transparent technique to generate sequences using both content and collaborative information, allowing us to generate hybrid recommenders in a simple way. Complementing these new recommenders, we also detail other auxiliary parameters (confidence, preference, normalization functions, and different orderings), whose main goal is to improve the performance of these LCS-based systems. In addition to the definition of these new recommenders, the study is complemented with experimental results using three well-known datasets in the area: Movielens, Lastfm, and MovieTweetings. Each of them is oriented to exploit a specific aspect of the sequence generation process. The results have been obtained using ranking metrics (Precision, Recall, MAP, and nDCG) and novelty and diversity metrics (α-nDCG, EPC, EPD, Aggregate diversity, EILD, and Gini). With these experiments, we aimed to compare the performance of recommenders based on the longest common subsequence against other well-known recommenders in the area. In summary, we have observed that the proposed recommenders are highly competitive in the experiments performed, and sometimes even better than other recommenders known in the area, both in terms of ranking quality metrics and in terms of novelty and diversity. However, we have also observed that, in some cases, hybrid recommenders based on the longest common subsequence perform worse than purely collaborative versions. In any case, we believe this is a proposal with enough potential to be worthy of further investigation.