1,158 research outputs found
Web readability and computer-assisted language learning
Proficiency in a second language is of vital importance for many people. Today’s access to corpora of text, including the Web, allows new techniques for improving language skill. Our project’s aim is the development of techniques for presenting the user with suitable web text, to allow optimal language acquisition via reading. Some text found on the Web may be of a suitable level of difficulty, but appropriate techniques need to be devised for locating it, as well as methods for rapid retrieval. The experiments described here compare the range of difficulty of text found on the Web to that found in traditional hard-copy texts for English as a Second Language (ESL) learners, using standard readability measures. The results show that the ESL text readability range falls within the range for Web text. This suggests that an on-line text retrieval engine based on readability can be of use to language learners. However, web pages pose their own difficulty, since those with scores representing high readability are often of limited use. Therefore, readability measurement techniques need to be modified for the Web domain.
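The abstract does not say which readability measures were used. As an illustration only, the sketch below computes one widely used measure, Flesch Reading Ease, and checks whether a score lies inside a target range. The syllable counter is a rough vowel-group heuristic, and the helper names (`count_syllables`, `flesch_reading_ease`, `in_range`) are assumptions, not part of the described project.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: one syllable per group of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease score; higher values indicate easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


def in_range(score: float, low: float, high: float) -> bool:
    """True if a score lies inside a target readability range (inclusive)."""
    return low <= score <= high
```

A retrieval engine of the kind suggested above could, under these assumptions, keep only pages whose score falls inside the range observed for ESL texts. Very short pages or navigation-heavy pages can score as highly readable without being useful, which is the limitation the abstract points out.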
Human Resources Recommender system based on discrete variables
Dissertation presented as the partial requirement for obtaining a Master's degree in Information Management, specialization in Knowledge Management and Business Intelligence
Natural Language Processing and Understanding has become one of the most exciting and challenging
fields in the area of Artificial Intelligence and Machine Learning. In a rapidly changing business
environment, having data transformed in a way that makes it easy to interpret
is among the greatest competitive advantages a company can have. The
purpose of this dissertation is to implement a recommender system for the Human
Resources department of a company that will aid the decision-making process of filling a specific job
position with the right candidate. The recommender system will be fed with applicants, each
represented by their skills, and will produce a subset of the most suitable candidates for a given job position.
This work uses StarSpace, a novel neural embedding model, whose aim is to represent entities in a
common vector space and then compute similarity measures among them.
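The matching step described above can be sketched as a similarity ranking in a shared vector space. This is a minimal illustration, not the dissertation's implementation: the skill vectors are assumed to be supplied by the caller (in the described work they would come from a trained StarSpace model), averaging skill vectors to represent a candidate or a job is an assumption, and `cosine`, `embed` and `rank_candidates` are hypothetical names.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def embed(skills, skill_vectors):
    """Represent a set of skills as the average of their vectors (assumption)."""
    known = [skill_vectors[s] for s in skills if s in skill_vectors]
    if not known:
        raise ValueError("no known skills to embed")
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def rank_candidates(job_skills, candidates, skill_vectors, top_k=3):
    """Return the top_k (name, similarity) pairs, most similar first."""
    job_vec = embed(job_skills, skill_vectors)
    scored = [
        (name, cosine(job_vec, embed(skills, skill_vectors)))
        for name, skills in candidates.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

With hand-made two-dimensional vectors, a candidate whose skills point in the same direction as the job's required skills ranks above one whose skills are orthogonal to them.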
The role of machine learning in personalised instructional sequencing for language learning
The origins of personalised instructional sequencing can be traced back to Ancient Greece, to the time of Alexander the Great's tutor, Aristotle. However, over the centuries the demand for education and the growth in student numbers have been disproportionately greater than the number of teachers in training. Therefore, there has been a longstanding interest in finding a way to scale education without negatively affecting learning outcomes. This interest was fuelled further by the advent of computers and artificial intelligence, when a plethora of systems and models were built to bring technology-driven personalised instructional sequencing to the world. Unfortunately, results were far from groundbreaking and many challenges still remain.
In my thesis, I investigate three aspects of personalised instructional sequencing: the personalised instructional sequencing mechanism, the student knowledge representation, and human forgetting. While I do not cover the entirety of personalised instructional sequencing, I cover what I consider the foundational components. I link psychological theory to model selection and design in each of my systems and present experiments to illustrate their impact. I show how reinforcement learning can be used for vocabulary learning. I also present a model that uses neural collaborative filtering to learn student knowledge representations. Lastly, I present a state-of-the-art model to predict the probability of vocabulary word recall for students learning English as a second language. The system's novelty lies in the use of word complexity to adapt the forgetting curve as well as its incorporation of psychological theory to select an appropriate model.
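To make the idea of a complexity-adapted forgetting curve concrete, the sketch below uses an exponential half-life curve in which more complex words are forgotten faster. The functional form, the way complexity shortens the half-life, and the name `recall_probability` are all assumptions chosen for illustration; the abstract does not specify the thesis's actual model.

```python
def recall_probability(days_since_review: float,
                       base_half_life_days: float,
                       complexity: float) -> float:
    """Estimated probability of recalling a word.

    Assumed model: p = 2 ** (-t / h), where the half-life h is the base
    half-life divided by (1 + complexity), so complex words decay faster.
    """
    if days_since_review < 0 or base_half_life_days <= 0 or complexity < 0:
        raise ValueError("arguments must be non-negative, half-life positive")
    half_life = base_half_life_days / (1.0 + complexity)
    return 2.0 ** (-days_since_review / half_life)
```

Under these assumptions, a word with complexity 0 and a four-day base half-life is recalled with probability 0.5 after four days, while a word with complexity 1 drops to 0.25 over the same interval.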
Text Complexity Classification Based on Linguistic Information: Application to Intelligent Tutoring of ESL
The goal of this work is to build a classifier that can identify text
complexity within the context of teaching reading to English as a Second
Language (ESL) learners. To present language learners with texts that are
suitable to their level of English, a set of features that can describe the
phonological, morphological, lexical, syntactic, discursive, and psychological
complexity of a given text were identified. Using a corpus of 6171 texts, which
had already been classified into three different levels of difficulty by ESL
experts, different experiments were conducted with five machine learning
algorithms. The results showed that the adopted linguistic features provide a
good overall classification performance (F-Score = 0.97). A scalability
evaluation was conducted to test if such a classifier could be used within real
applications, where it can be, for example, plugged into a search engine or a
web-scraping module. In this evaluation, the texts in the test set are not only
different from those from the training set but also of different types (ESL
texts vs. children reading texts). Although the overall performance of the
classifier decreased significantly (F-Score = 0.65), the confusion matrix shows
that most of the classification errors are between classes two and three
(the middle-level classes) and that the system has a robust performance in
categorizing texts of classes one and four. This behavior can be explained by the
difference in classification criteria between the two corpora. Hence, the
observed results confirm the usability of such a classifier within a real-world
application.
Comment: This is an unpublished pre-print; the JDMDH journal requires
submission to arxiv.org before submission to the journal (see the link:
https://jdmdh.episciences.org/page/submissions#)
Text-based Sentiment Analysis and Music Emotion Recognition
Nowadays, with the expansion of social media, large amounts of user-generated
texts like tweets, blog posts or product reviews are shared online. Sentiment polarity
analysis of such texts has become highly attractive and is utilized in recommender
systems, market predictions, business intelligence and more. We also witness deep
learning techniques becoming top performers on those types of tasks. There are
however several problems that need to be solved for efficient use of deep neural
networks on text mining and text polarity analysis.
First of all, deep neural networks are data hungry. They need to be fed with
datasets that are big in size, cleaned and preprocessed as well as properly labeled.
Second, the modern natural language processing concept of word embeddings as a
dense and distributed text feature representation solves sparsity and dimensionality
problems of the traditional bag-of-words model. Still, there are various uncertainties
regarding the use of word vectors: should they be generated from the same dataset
that is used to train the model, or is it better to source them from big and popular
collections that work as generic text feature representations? Third, it is not easy for
practitioners to find a simple and highly effective deep learning setup for various
document lengths and types. Recurrent neural networks are weak with longer texts
and optimal convolution-pooling combinations are not easily conceived. It is thus
convenient to have generic neural network architectures that are effective and can
adapt to various texts, encapsulating much of design complexity.
This thesis addresses the above problems to provide methodological and practical
insights for utilizing neural networks on sentiment analysis of texts and achieving
state-of-the-art results. Regarding the first problem, the effectiveness of various
crowdsourcing alternatives is explored and two medium-sized and emotion-labeled
song datasets are created utilizing social tags. One of the research interests of Telecom
Italia was the exploration of relations between music emotional stimulation and
driving style. Consequently, a context-aware music recommender system that aims
to enhance driving comfort and safety was also designed. To address the second
problem, a series of experiments with large text collections of various contents and
domains were conducted. Word embeddings with different parameters were tested
and results revealed that their quality is influenced (mostly but not only) by the
size of texts they were created from. When working with small text datasets, it is
thus important to source word features from popular and generic word embedding
collections. Regarding the third problem, a series of experiments involving convolutional
and max-pooling neural layers were conducted. Various patterns relating
text properties and network parameters with optimal classification accuracy were
observed. Combining convolutions of words, bigrams, and trigrams with regional
max-pooling layers in a couple of stacks produced the best results. The derived
architecture achieves competitive performance on sentiment polarity analysis of
movie, business and product reviews.
Given that labeled data are becoming the bottleneck of the current deep learning
systems, a future research direction could be the exploration of various data programming
possibilities for constructing even bigger labeled datasets. Investigation
of feature-level or decision-level ensemble techniques in the context of deep neural
networks could also be fruitful. Different feature types usually represent complementary
characteristics of data. Combining word embedding and traditional text
features or utilizing recurrent networks on document splits and then aggregating the
predictions could further increase prediction accuracy of such models.
Improving Computer-Assisted Language Learning through Hierarchical Knowledge Structures
A common drawback in traditional language education is that all students in the same class use the same content. Since students may have different backgrounds such as prior knowledge and learning speed, one single curriculum may not be able to accommodate every student. Unfortunately, most students cannot afford personalized language learning, since preparing personalized learning content can be very time-consuming and potentially requires a significant amount of expert labor. Recently, researchers have proposed automatic systems to assist language education, such as Computer-based Assessment Systems (CAT) and Intelligent Tutoring Systems (ITS). However, previous work usually characterizes the student's knowledge and the difficulty of learning content using numeric scores, which may not be comprehensive. To improve on this, this thesis introduces hierarchical knowledge structures to assist in multiple tasks in language education. First, this structure multidimensionally characterizes the difficulty of each learning material by its relative difficulty to other materials and models the whole corpus with a graph structure. Additionally, we can utilize the hierarchical knowledge structure to multidimensionally assess a student's prior knowledge, predict the student's future performance on a specific task, and recommend learning content that is appropriate for each student. Furthermore, the hierarchical knowledge structure enables us to build a framework to characterize existing learning curricula extracted from textbooks and online learning tools, and apply expert wisdom that we have discovered to automatically design learning curricula. The hierarchical knowledge structure reduces the cost of expert labor and potentially makes language education more affordable and more engaging.
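One way to picture a graph of relative difficulty is a directed graph in which each learning material lists the materials judged easier than it. The sketch below is an assumption-laden illustration, not the thesis's structure: the material names, the `EASIER_MATERIALS` map, and the rule "recommend a material once everything easier than it is mastered" are all hypothetical. It uses `graphlib.TopologicalSorter` from the Python standard library (3.9+).

```python
from graphlib import TopologicalSorter

# Hypothetical corpus: material -> set of materials judged directly easier.
EASIER_MATERIALS = {
    "greetings": set(),
    "present_tense": {"greetings"},
    "past_tense": {"present_tense"},
    "food_vocabulary": {"greetings"},
    "restaurant_dialogue": {"past_tense", "food_vocabulary"},
}


def curriculum_order(graph):
    """One valid easy-to-hard ordering of all materials."""
    return list(TopologicalSorter(graph).static_order())


def recommend(graph, mastered):
    """Materials not yet mastered whose easier materials are all mastered."""
    return sorted(
        material
        for material, easier in graph.items()
        if material not in mastered and easier <= mastered
    )
```

Because the ordering is partial rather than a single numeric score, two students who have mastered different subsets receive different recommendations even when they have completed the same number of materials.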
Extended Recommendation Framework: Generating the Text of a User Review as a Personalized Summary
We propose to augment rating based recommender systems by providing the user
with additional information which might help him in his choice or in the
understanding of the recommendation. We consider here, as a new task, the
generation of personalized reviews associated with items. We use an extractive
summary formulation for generating these reviews. We also show that the two
information sources, ratings and items, can both be used for estimating
ratings and for generating summaries, leading to improved performance for each
system compared to the use of a single source. Besides these two contributions,
we show how a personalized polarity classifier can integrate the rating and
textual aspects. Overall, the proposed system offers the user three
personalized hints for a recommendation: rating, text and polarity. We evaluate
these three components on two datasets using appropriate measures for each
task.