An Algorithm for the Longest Common Subsequence and Substring Problem
In this note, we first introduce a new problem called the longest common
subsequence and substring problem. Let X and Y be two strings over an
alphabet Σ. The longest common subsequence and substring problem for
X and Y is to find the longest string which is a subsequence of X and a
substring of Y. We propose an algorithm to solve the problem.
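The problem can be made concrete with a quadratic dynamic program. The sketch below is an illustration under stated assumptions, not necessarily the algorithm proposed in the note. The function name `longest_subseq_substring` is hypothetical; `x` is the string in which the answer must occur as a subsequence, and `y` is the string in which it must occur as a substring. For each pair of prefixes it tracks the length of the longest suffix of the `y` prefix that is a subsequence of the `x` prefix.

```python
def longest_subseq_substring(x: str, y: str) -> str:
    """Return a longest string that is a subsequence of x and a substring of y.

    Illustrative sketch only; O(len(x) * len(y)) time, O(len(y)) space.
    """
    n = len(y)
    # prev[j] = length of the longest suffix of y[:j] that is a
    # subsequence of the prefix of x processed so far.
    prev = [0] * (n + 1)
    for xi in x:
        cur = [0] * (n + 1)
        for j in range(1, n + 1):
            if xi == y[j - 1]:
                # Match the last characters and extend the best suffix
                # ending just before position j.
                cur[j] = prev[j - 1] + 1
            else:
                # xi cannot match y[j-1], so it is of no use here.
                cur[j] = prev[j]
        prev = cur
    # The best answer ends at the position j with the largest value.
    best_end = max(range(n + 1), key=lambda j: prev[j])
    return y[best_end - prev[best_end]:best_end]


print(longest_subseq_substring("abcde", "xacey"))
```

When several answers have the same length, this sketch returns the one that ends earliest in `y`.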
Re-Use Dynamic Programming for Sequence Alignment: An Algorithmic Toolkit
The problem of comparing two sequences S and T to determine their similarity is one of the fundamental problems in pattern matching. In this manuscript we will be primarily concerned with sequences as our objects and with various string comparison metrics. Our goal is to survey a methodology for utilizing repetitions in sequences in order to speed up the comparison process. Within this framework we consider various methods of parsing the sequences in order to frame their repetitions, and present a toolkit of various solutions whose time complexity depends both on the chosen parsing method as well as on the string-comparison metric used for the alignment.
Building user profiles based on sequences for content and collaborative filtering
Modeling user profiles is a necessary step for most information filtering systems – such
as recommender systems – to provide personalized recommendations. However, most
of them work with users or items as vectors, by applying different types of mathematical
operations between them and neglecting sequential or content-based information.
Hence, in this paper we study how to propose an adaptive mechanism to obtain user
sequences using different sources of information, allowing the generation of hybrid
recommendations as a seamless, transparent technique from the system viewpoint. As
a proof of concept, we develop the Longest Common Subsequence (LCS) algorithm as
a similarity metric to compare the user sequences, where, in the process of adapting
this algorithm to recommendation, we include different parameters to control the efficiency
by reducing the information used in the algorithm (preference filter), to decide
when a neighbor is considered useful enough to be included in the process (confidence
filter), to identify whether two interactions are equivalent (δ-matching threshold), and
to normalize the length of the LCS in a bounded interval (normalization functions).
These parameters can be extended to work with any type of sequential algorithm.
We evaluate our approach with several state-of-the-art recommendation algorithms
using different evaluation metrics measuring the accuracy, diversity, and novelty of the
recommendations, and analyze the impact of the proposed parameters. We have found
that our approach offers a competitive performance, outperforming content, collaborative,
and hybrid baselines, and producing positive results when either content- or
rating-based information is exploited.
This article has been co-funded by the European Social Fund (ESF) within the 2017 call for predoctoral contracts and the Spanish Ministry of Economy, Industry and Competitiveness (project reference: TIN2016-80630-P).
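The idea of using the LCS as a user similarity, together with the filters named in the abstract above, can be sketched as follows. This is a minimal illustration and not the authors' implementation. Everything concrete here is an assumption: the function name `lcs_similarity`, the representation of an interaction as an `(item, rating)` pair, the rating cutoff `min_rating` standing in for the preference filter, the rating tolerance `delta` standing in for the matching threshold, the minimum LCS length `min_lcs` standing in for the confidence filter, and division by the longer sequence as the normalization function.

```python
from typing import List, Tuple

Interaction = Tuple[str, float]  # hypothetical (item id, rating) pair


def lcs_similarity(u: List[Interaction], v: List[Interaction],
                   min_rating: float = 0.0, delta: float = 0.0,
                   min_lcs: int = 0) -> float:
    """LCS-based similarity in [0, 1] between two user sequences."""
    # Preference filter (assumed form): drop low-rated interactions.
    u = [x for x in u if x[1] >= min_rating]
    v = [x for x in v if x[1] >= min_rating]
    if not u or not v:
        return 0.0
    # Textbook LCS dynamic program, keeping one row at a time.
    prev = [0] * (len(v) + 1)
    for item_u, rating_u in u:
        cur = [0]
        for j, (item_v, rating_v) in enumerate(v, start=1):
            # Matching threshold (assumed form): same item, and
            # ratings that differ by at most delta.
            if item_u == item_v and abs(rating_u - rating_v) <= delta:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    lcs = prev[-1]
    # Confidence filter (assumed form): ignore weak neighbors.
    if lcs < min_lcs:
        return 0.0
    # Normalization (assumed form): bound the score by the longer sequence.
    return lcs / max(len(u), len(v))


alice = [("a", 5), ("b", 4), ("c", 1)]
bob = [("a", 4), ("c", 1), ("b", 4)]
print(lcs_similarity(alice, bob, delta=1.0))
```

Raising `min_rating` shortens both sequences before the quadratic step, which is one way a preference filter could reduce the cost of the comparison.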
Two algorithms for LCS Consecutive Suffix Alignment
The problem of aligning two sequences A and B to determine their similarity is one of the fundamental problems in pattern matching. A challenging, basic variation of the sequence similarity problem is the incremental string comparison problem, denoted Consecutive Suffix Alignment, which is, given two strings A and B, to compute the alignment solution of each suffix of A versus B. Here, we present two solutions to the Consecutive Suffix Alignment Problem under the LCS (Longest Common Subsequence) metric, where the LCS metric measures the subsequence of maximal length common to A and B. The first solution is an O(nL) time and space algorithm for constant alphabets, where the size of the compared strings is O(n) and L ≤ n denotes the size of the LCS of A and B. The second solution is an O(nL + n log|Σ|) time and O(n) space algorithm for general alphabets, where Σ denotes the alphabet of the compared strings.
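For orientation, the task described above can be stated as a naive baseline that reruns the textbook LCS dynamic program once per suffix of A. This sketch is not either of the two algorithms from the paper and comes nowhere near their O(nL) bound (it takes cubic time for strings of length n). The names `lcs_length` and `consecutive_suffix_lcs` are hypothetical, and reporting only the LCS length per suffix is a simplifying assumption about what "alignment solution" means.

```python
from typing import List


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0]
        for j, bj in enumerate(b, start=1):
            if ch == bj:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def consecutive_suffix_lcs(a: str, b: str) -> List[int]:
    """LCS length of every non-empty suffix a[i:] against b (naive baseline)."""
    return [lcs_length(a[i:], b) for i in range(len(a))]


print(consecutive_suffix_lcs("abc", "ac"))
```

The entry at index `i` of the result corresponds to the suffix of `a` that starts at position `i`.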
Applying reranking strategies to route recommendation using sequence-aware evaluation
Venue recommendation approaches have become particularly useful nowadays due to the increasing number of users registered in location-based social networks (LBSNs), applications where it is possible to share the venues someone has visited and establish connections with other users in the system. Besides, the venue recommendation problem has certain characteristics that differ from traditional recommendation, and it can also benefit from other contextual aspects to recommend not only independent venues but also complete routes or venue sequences of related locations. Hence, in this paper, we investigate the problem of route recommendation from the perspective of generating a sequence of meaningful locations for the users, by analyzing both their personal interests and the intrinsic relationships between the venues. We divide this problem into three stages, proposing general solutions to each case: First, we state a general methodology to derive user routes from LBSN datasets that can be applied in as many scenarios as possible; second, we define a reranking framework that generates sequences of items from recommendation lists using different techniques; and third, we propose an evaluation metric that captures both accuracy and sequentiality at the same time. We report our experiments on several LBSN datasets and by means of different recommendation quality metrics and algorithms.
As a result, we have found that classical recommender systems are comparable to specifically tailored algorithms for this task, although exploiting the temporal dimension, in general, helps improve the performance of these techniques; additionally, the proposed reranking strategies show promising results in terms of finding a trade-off between relevance, sequentiality, and distance, essential dimensions in both venue and route recommendation tasks.
This work has been funded by the Ministerio de Ciencia, Innovación y Universidades (reference: TIN2016-80630-P) and by the European Social Fund (ESF), within the 2017 call for predoctoral contracts.
Exploiting subsequence matching in Recommender Systems
Since their inception in the early 1990s, recommender systems have experienced exponential
growth as they are being used in a large number of applications because of their
usefulness in helping users choose items based on their tastes and needs. Nowadays, they
are indispensable in many companies that offer their service through the Internet, the
most important method for information exchange. For this reason, continuous innovation
in these systems is essential to make recommendations that are able to continue surprising
users, while improving the existing ones.
In this Master's Thesis, we have studied and researched the current state of these
systems, paying special attention to neighborhood-based collaborative filtering and
content-based algorithms. However, due to the disadvantages that each system may have
separately, combinations of these systems are often used in real applications, creating hybrid
recommenders. To support this study, we propose the use of the longest common
subsequence (LCS) algorithm as a novel aspect to be used as a similarity metric between
users, also introducing a general and transparent technique to generate sequences using
both content and collaborative information, allowing us to generate hybrid recommenders
in a simple way. Complementing these new recommenders, we also detail other auxiliary
parameters (confidence, preference, normalization functions, and different orderings),
whose main goal is to improve the performance of these LCS-based systems.
On the other hand, in addition to the definition of these new recommenders, the study
is complemented with experimental results using three well-known datasets in the area:
Movielens, Lastfm and MovieTweetings. Each of them is oriented to exploit a
specific aspect of the sequence generation process. The results have been obtained by using
ranking metrics (Precision, Recall, MAP, or nDCG) and novelty and diversity metrics
(α-nDCG, EPC, EPD, Aggregate diversity, EILD, and Gini). With these experiments,
we aimed at comparing the performance of recommenders based on the longest common
subsequence against other well-known recommenders in the area.
As a summary, we have observed in the experiments performed that the proposed
recommenders are highly competitive, and sometimes they are even better than other
recommenders known in the area, both in terms of ranking quality metrics, and novelty and
diversity dimensions. However, we have also observed that, in some cases, the use of hybrid
recommenders based on the longest common subsequence results in worse performance than
other purely collaborative versions. In any case, we believe this is a proposal with enough
potential to be worthy of further investigation.