9 research outputs found
Unsupervised keyword extraction from microblog posts via hashtags
© River Publishers. Nowadays, huge amounts of texts are being generated for social networking purposes on Web. Keyword extraction from such texts like microblog posts benefits many applications such as advertising, search, and content filtering. Unlike traditional web pages, a microblog post usually has some special social feature like a hashtag that is topical in nature and generated by users. Extracting keywords related to hashtags can reflect the intents of users and thus provides us better understanding on post content. In this paper, we propose a novel unsupervised keyword extraction approach for microblog posts by treating hashtags as topical indicators. Our approach consists of two hashtag enhanced algorithms. One is a topic model algorithm that infers topic distributions biased to hashtags on a collection of microblog posts. The words are ranked by their average topic probabilities. Our topic model algorithm can not only find the topics of a collection, but also extract hashtag-related keywords. The other is a random walk based algorithm. It first builds a word-post weighted graph by taking into account posts themselves. Then, a hashtag biased random walk is applied on this graph, which guides the algorithm to extract keywords according to hashtag topics. Last, the final ranking score of a word is determined by the stationary probability after a number of iterations. We evaluate our proposed approach on a collection of real Chinese microblog posts. Experiments show that our approach is more effective in terms of precision than traditional approaches considering no hashtag. The result achieved by the combination of two algorithms performs even better than each individual algorithm
A Comprehensive Exploration of Personalized Learning in Smart Education: From Student Modeling to Personalized Recommendations
With the development of artificial intelligence, personalized learning has
attracted much attention as an integral part of intelligent education. China,
the United States, the European Union, and others have put forward the
importance of personalized learning in recent years, emphasizing the
realization of the organic combination of large-scale education and
personalized training. The development of a personalized learning system
oriented to learners' preferences and suited to learners' needs should be
accelerated. This review provides a comprehensive analysis of the current
situation of personalized learning and its key role in education. It discusses
the research on personalized learning from multiple perspectives, combining
definitions, goals, and related educational theories to provide an in-depth
understanding of personalized learning from an educational perspective,
analyzing the implications of different theories on personalized learning, and
highlighting the potential of personalized learning to meet the needs of
individuals and to enhance their abilities. Data applications and assessment
indicators in personalized learning are described in detail, providing a solid
data foundation and evaluation system for subsequent research. Meanwhile, we
start from both student modeling and recommendation algorithms and deeply
analyze the cognitive and non-cognitive perspectives and the contribution of
personalized recommendations to personalized learning. Finally, we explore the
challenges and future trajectories of personalized learning. This review
provides a multidimensional analysis of personalized learning through a more
comprehensive study, providing academics and practitioners with cutting-edge
explorations to promote continuous progress in the field of personalized
learning.Comment: 82 pages,5 figure
Transformer-based Vulnerability Detection in Code at EditTime: Zero-shot, Few-shot, or Fine-tuning?
Software vulnerabilities bear enterprises significant costs. Despite
extensive efforts in research and development of software vulnerability
detection methods, uncaught vulnerabilities continue to put software owners and
users at risk. Many current vulnerability detection methods require that code
snippets can compile and build before attempting detection. This,
unfortunately, introduces a long latency between the time a vulnerability is
injected to the time it is removed, which can substantially increases the cost
of fixing a vulnerability. We recognize that the current advances in machine
learning can be used to detect vulnerable code patterns on syntactically
incomplete code snippets as the developer is writing the code at EditTime. In
this paper we present a practical system that leverages deep learning on a
large-scale data set of vulnerable code patterns to learn complex
manifestations of more than 250 vulnerability types and detect vulnerable code
patterns at EditTime. We discuss zero-shot, few-shot, and fine-tuning
approaches on state of the art pre-trained Large Language Models (LLMs). We
show that in comparison with state of the art vulnerability detection models
our approach improves the state of the art by 10%. We also evaluate our
approach to detect vulnerability in auto-generated code by code LLMs.
Evaluation on a benchmark of high-risk code scenarios shows a reduction of up
to 90% vulnerability reduction
Knowledge science, engineering and management: 8th international conference, ksem 2015 Chongqing, China, october 28-30, 2015 proceedings
Knowledge science, engineering and management: 8th international conference, ksem 2015 Chongqing, China, october 28-30, 2015 proceeding
Exploring attributes, sequences, and time in Recommender Systems: From classical to Point-of-Interest recommendation
Tesis Doctoral inédita leída en la Universidad Autónoma de Madrid, Escuela Politécnica Superior, Departamento de Ingenieria Informática. Fecha de lectura: 08-07-2021Since the emergence of the Internet and the spread of digital communications
throughout the world, the amount of data stored on the Web has been
growing exponentially. In this new digital era, a large number of companies
have emerged with the purpose of ltering the information available on the
web and provide users with interesting items. The algorithms and models
used to recommend these items are called Recommender Systems. These
systems are applied to a large number of domains, from music, books, or
movies to dating or Point-of-Interest (POI), which is an increasingly popular
domain where users receive recommendations of di erent places when
they arrive to a city.
In this thesis, we focus on exploiting the use of contextual information, especially
temporal and sequential data, and apply it in novel ways in both
traditional and Point-of-Interest recommendation. We believe that this type
of information can be used not only for creating new recommendation models
but also for developing new metrics for analyzing the quality of these
recommendations. In one of our rst contributions we propose di erent
metrics, some of them derived from previously existing frameworks, using
this contextual information. Besides, we also propose an intuitive algorithm
that is able to provide recommendations to a target user by exploiting the
last common interactions with other similar users of the system.
At the same time, we conduct a comprehensive review of the algorithms
that have been proposed in the area of POI recommendation between 2011
and 2019, identifying the common characteristics and methodologies used.
Once this classi cation of the algorithms proposed to date is completed, we
design a mechanism to recommend complete routes (not only independent
POIs) to users, making use of reranking techniques. In addition, due to the
great di culty of making recommendations in the POI domain, we propose
the use of data aggregation techniques to use information from di erent
cities to generate POI recommendations in a given target city.
In the experimental work we present our approaches on di erent datasets
belonging to both classical and POI recommendation. The results obtained
in these experiments con rm the usefulness of our recommendation proposals,
in terms of ranking accuracy and other dimensions like novelty, diversity,
and coverage, and the appropriateness of our metrics for analyzing temporal
information and biases in the recommendations producedDesde la aparici on de Internet y la difusi on de las redes de comunicaciones
en todo el mundo, la cantidad de datos almacenados en la red ha crecido
exponencialmente. En esta nueva era digital, han surgido un gran n umero
de empresas con el objetivo de ltrar la informaci on disponible en la red
y ofrecer a los usuarios art culos interesantes. Los algoritmos y modelos
utilizados para recomendar estos art culos reciben el nombre de Sistemas de
Recomendaci on. Estos sistemas se aplican a un gran n umero de dominios,
desde m usica, libros o pel culas hasta las citas o los Puntos de Inter es (POIs,
en ingl es), un dominio cada vez m as popular en el que los usuarios reciben
recomendaciones de diferentes lugares cuando llegan a una ciudad.
En esta tesis, nos centramos en explotar el uso de la informaci on contextual,
especialmente los datos temporales y secuenciales, y aplicarla de forma novedosa
tanto en la recomendaci on cl asica como en la recomendaci on de POIs.
Creemos que este tipo de informaci on puede utilizarse no s olo para crear
nuevos modelos de recomendaci on, sino tambi en para desarrollar nuevas
m etricas para analizar la calidad de estas recomendaciones. En una de
nuestras primeras contribuciones proponemos diferentes m etricas, algunas
derivadas de formulaciones previamente existentes, utilizando esta informaci
on contextual. Adem as, proponemos un algoritmo intuitivo que es
capaz de proporcionar recomendaciones a un usuario objetivo explotando
las ultimas interacciones comunes con otros usuarios similares del sistema.
Al mismo tiempo, realizamos una revisi on exhaustiva de los algoritmos que
se han propuesto en el a mbito de la recomendaci o n de POIs entre 2011 y
2019, identi cando las caracter sticas comunes y las metodolog as utilizadas.
Una vez realizada esta clasi caci on de los algoritmos propuestos hasta la
fecha, dise~namos un mecanismo para recomendar rutas completas (no s olo
POIs independientes) a los usuarios, haciendo uso de t ecnicas de reranking.
Adem as, debido a la gran di cultad de realizar recomendaciones en el
ambito de los POIs, proponemos el uso de t ecnicas de agregaci on de datos
para utilizar la informaci on de diferentes ciudades y generar recomendaciones
de POIs en una determinada ciudad objetivo.
En el trabajo experimental presentamos nuestros m etodos en diferentes
conjuntos de datos tanto de recomendaci on cl asica como de POIs. Los
resultados obtenidos en estos experimentos con rman la utilidad de nuestras
propuestas de recomendaci on en t erminos de precisi on de ranking y de
otras dimensiones como la novedad, la diversidad y la cobertura, y c omo de
apropiadas son nuestras m etricas para analizar la informaci on temporal y
los sesgos en las recomendaciones producida
Exploiting general-purpose background knowledge for automated schema matching
The schema matching task is an integral part of the data integration process. It is usually the first step in integrating data. Schema matching is typically very complex and time-consuming. It is, therefore, to the largest part, carried out by humans. One reason for the low amount of automation is the fact that schemas are often defined with deep background knowledge that is not itself present within the schemas. Overcoming the problem of missing background knowledge is a core challenge in automating the data integration process.
In this dissertation, the task of matching semantic models, so-called ontologies, with the help of external background knowledge is investigated in-depth in Part I. Throughout this thesis, the focus lies on large, general-purpose resources since domain-specific resources are rarely available for most domains. Besides new knowledge resources, this thesis also explores new strategies to exploit such resources.
A technical base for the development and comparison of matching systems is presented in Part II. The framework introduced here allows for simple and modularized matcher development (with background knowledge sources) and for extensive evaluations of matching systems.
One of the largest structured sources for general-purpose background knowledge are knowledge graphs which have grown significantly in size in recent years. However, exploiting such graphs is not trivial. In Part III, knowledge graph em- beddings are explored, analyzed, and compared. Multiple improvements to existing approaches are presented.
In Part IV, numerous concrete matching systems which exploit general-purpose background knowledge are presented. Furthermore, exploitation strategies and resources are analyzed and compared. This dissertation closes with a perspective on real-world applications
Deep learning applied to the assessment of online student programming exercises
Massive online open courses (MOOCs) teaching coding are increasing in number and popularity. They commonly include homework assignments in which the students must write code that is evaluated by
functional tests. Functional testing may to some extent be automated
however provision of more qualitative evaluation and feedback may
be prohibitively labor-intensive. Provision of qualitative evaluation at
scale, automatically, is the subject of much research effort.
In this thesis, deep learning is applied to the task of performing
automatic assessment of source code, with a focus on provision of
qualitative feedback. Four tasks: language modeling, detecting idiomatic code, semantic code search, and predicting variable names are
considered in detail.
First, deep learning models are applied to the task of language modeling source code. A comparison is made between the performance of
different deep learning language models, and it is shown how language
models can be used for source code auto-completion. It is also demonstrated how language models trained on source code can be used for
transfer learning, providing improved performance on other tasks.
Next, an analysis is made on how the language models from the
previous task can be used to detect idiomatic code. It is shown that
these language models are able to locate where a student has deviated
from correct code idioms. These locations can be highlighted to the
student in order to provide qualitative feedback.
Then, results are shown on semantic code search, again comparing
the performance across a variety of deep learning models. It is demonstrated how semantic code search can be used to reduce the time taken
for qualitative evaluation, by automatically pairing a student submission with an instructor’s hand-written feedback.
Finally, it is examined how deep learning can be used to predict
variable names within source code. These models can be used in a
qualitative evaluation setting where the deep learning models can be
used to suggest more appropriate variable names. It is also shown that
these models can even be used to predict the presence of functional
errors.
Novel experimental results show that: fine-tuning a pre-trained
language model is an effective way to improve performance across a
variety of tasks on source code, improving performance by 5% on average; pre-trained language models can be used as zero-shot learners across a variety of tasks, with the zero-shot performance of some architectures outperforming the fine-tuned performance of others; and
that language models can be used to detect both semantic and syntactic errors. Other novel findings include: removing the non-variable
tokens within source code has negligible impact on the performance of
models, and that these remaining tokens can be shuffled with only a
minimal decrease in performance.Engineering and Physical Sciences Research Council (EPSRC) fundin