316 research outputs found

    Knowledge Representation and WordNets

    Knowledge itself is a representation of “real facts”. Knowledge is a logical model that presents facts from “the real world” which can be expressed in a formal language. Representation means the construction of a model of some part of reality. Knowledge representation is relevant to both cognitive science and artificial intelligence. In cognitive science it describes the way people store and process information. In the AI field the goal is to store knowledge in such a way that intelligent programs can represent information as closely as possible to human intelligence. Knowledge Representation refers to the formal representation of knowledge intended to be processed and stored by computers, and to the drawing of conclusions from this knowledge. Examples of applications are expert systems, machine translation systems, computer-aided maintenance systems and information retrieval systems (including database front-ends). Keywords: knowledge, representation, AI models, databases, CAMs
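
    As a small, purely illustrative example (not taken from the abstract above), lexical knowledge of the kind encoded in a WordNet can be queried through NLTK, where concepts are synsets linked by semantic relations such as hypernymy:

```python
# Illustrative sketch: a WordNet represents lexical knowledge as synsets
# (concepts) connected by semantic relations such as hypernymy (is-a).
import nltk
nltk.download("wordnet", quiet=True)          # fetch the WordNet data once
from nltk.corpus import wordnet as wn

dog = wn.synsets("dog")[0]                    # first sense of "dog"
print(dog.definition())                       # gloss of the concept
print([h.name() for h in dog.hypernyms()])    # more general concepts (e.g. canine.n.02)
```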

    Measuring associational thinking through word embeddings

    [EN] The development of a model to quantify semantic similarity and relatedness between words has been the major focus of many studies in various fields, e.g. psychology, linguistics, and natural language processing. Unlike the measures proposed by most previous research, this article is aimed at automatically estimating the strength of association between words that may or may not be semantically related. We demonstrate that the performance of the model depends not only on the combination of independently constructed word embeddings (namely, corpus- and network-based embeddings) but also on the way these word vectors interact. The research concludes that the weighted average of the cosine-similarity coefficients derived from independent word embeddings in a double vector space tends to yield high correlations with human judgements. Moreover, we demonstrate that evaluating word associations through a measure that relies not only on the rank ordering of word pairs but also on the strength of associations can reveal findings that go unnoticed by traditional measures such as Spearman's and Pearson's correlation coefficients.
    Financial support for this research has been provided by the Spanish Ministry of Science, Innovation and Universities [grant number RTC 2017-6389-5], the Spanish “Agencia Estatal de Investigación” [grant number PID2020-112827GB-I00 / AEI / 10.13039/501100011033], and the European Union's Horizon 2020 research and innovation program [grant number 101017861: project SMARTLAGOON]. Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature.
    Periñán-Pascual, C. (2022). Measuring associational thinking through word embeddings. Artificial Intelligence Review. 55(3):2065-2102. https://doi.org/10.1007/s10462-021-10056-6
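
    A minimal sketch of the kind of combination described above: a weighted average of cosine similarities taken from two independently built embedding spaces. The toy vectors and the weight alpha are assumptions for illustration, not the authors' actual embeddings or parameter values.

```python
# Sketch: combine corpus-based and network-based embeddings by a weighted
# average of their cosine similarities. alpha and the toy vectors are
# illustrative assumptions, not values from the paper.
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def association_strength(w1, w2, corpus_emb, network_emb, alpha=0.5):
    """Weighted average of cosine similarities in two vector spaces."""
    sim_corpus = cosine(corpus_emb[w1], corpus_emb[w2])
    sim_network = cosine(network_emb[w1], network_emb[w2])
    return alpha * sim_corpus + (1.0 - alpha) * sim_network

# Toy example with random vectors standing in for the two embedding spaces.
rng = np.random.default_rng(0)
corpus_emb = {w: rng.normal(size=100) for w in ("coffee", "cup")}
network_emb = {w: rng.normal(size=50) for w in ("coffee", "cup")}
print(association_strength("coffee", "cup", corpus_emb, network_emb, alpha=0.6))
```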

    Exploiting the conceptual space in hybrid recommender systems: a semantic-based approach

    Unpublished doctoral thesis. Universidad Autónoma de Madrid, Escuela Politécnica Superior, October 200

    Towards the automatic evaluation of stylistic quality of natural texts: constructing a special-­purpose corpus of stylistic edits from the Wikipedia revision history

    This thesis proposes an approach to the automatic evaluation of the stylistic quality of natural texts through data-driven methods of Natural Language Processing. Advantages of data-driven methods and their dependency on the size of training data are discussed. The advantages of using Wikipedia as a source for textual data mining are also presented. The method in this project crucially involves a program for the quick automatic extraction of user-edited sentences from the Wikipedia Revision History. The resulting edits have been compiled into a large-scale corpus of examples of stylistic editing. The complete modular structure of the extraction program is described and its performance is analyzed. Furthermore, the need to separate stylistic edits from factual ones is discussed, and a number of Machine Learning classification algorithms for this task are proposed and tested. The program developed in this project was able to process approximately 10% of the whole Russian Wikipedia revision history (200 gigabytes of textual data) in one month, resulting in the extraction of more than two million user edits. The best algorithm for the classification of edits into factual and stylistic ones achieved 86.2% cross-validation accuracy, which is comparable with the state-of-the-art performance of similar models described in published papers.
    Master's thesis in Computational Linguistics and Language Technology (Master i Datalingvistikk og språkteknologi).
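
    A hedged sketch of the edit-classification step mentioned above, assuming a simple TF-IDF plus logistic-regression pipeline; the features, the classifier and the toy edit pairs are stand-ins, not the models actually evaluated in the thesis.

```python
# Sketch: classify Wikipedia edits as stylistic vs. factual from the
# (old sentence, new sentence) pair. Features and model are illustrative
# assumptions, not the thesis's actual pipeline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Each example concatenates the sentence before and after the edit.
edits = [
    "the city was founded in 1854 ||| the city was founded in 1856",   # factual
    "he was borned in Moscow ||| he was born in Moscow",               # stylistic
    "the film earned $10M ||| the film earned $25M",                   # factual
    "this is very very important ||| this is very important",          # stylistic
]
labels = ["factual", "stylistic", "factual", "stylistic"]

model = make_pipeline(TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
                      LogisticRegression(max_iter=1000))
print(cross_val_score(model, edits, labels, cv=2).mean())  # toy cross-validation
```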

    OntoTouTra: tourist traceability ontology based on big data analytics

    Tourist traceability is the analysis of the set of actions, procedures, and technical measures that allows us to identify and record the space–time causality of the tourist's touring, from the beginning to the end of the chain of the tourist product. Besides, the traceability of tourists has implications for infrastructure, transport, products, marketing, the commercial viability of the industry, and the management of the destination's social, environmental, and cultural impact. To this end, a tourist traceability system requires a knowledge base for processing elements such as functions, objects, events, and logical connectors among them. A knowledge base provides us with information on the preparation, planning, and implementation or operation stages. In this regard, unifying tourism terminology in a traceability system is a challenge because we need a central repository that promotes standards for tourists and suppliers in forming a formal body of knowledge representation. Some studies are related to the construction of ontologies in tourism, but none focus on tourist traceability systems. For these reasons, we propose OntoTouTra, an ontology that uses formal specifications to represent knowledge of tourist traceability systems. This paper outlines the development of the OntoTouTra ontology and how we gathered and processed data from ubiquitous computing using Big Data analysis techniques.
    This research was financially supported by the Ministry of Science, Technology, and Innovation of Colombia (733-2015) and by the Universidad Santo Tomás Seccional Tunja.
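
    A minimal sketch of how a traceability concept could be expressed as an OWL-style ontology fragment with rdflib; the namespace, classes and property shown here are hypothetical illustrations, not the actual OntoTouTra vocabulary.

```python
# Sketch: a tiny OWL-style ontology fragment for tourist traceability.
# The namespace and class/property names are hypothetical illustrations,
# not the actual OntoTouTra vocabulary.
from rdflib import Graph, Namespace, RDF, RDFS
from rdflib.namespace import OWL

ONTO = Namespace("http://example.org/ontotoutra#")
g = Graph()
g.bind("otra", ONTO)

# Classes: a tourist and an event in their space-time trace.
g.add((ONTO.Tourist, RDF.type, OWL.Class))
g.add((ONTO.TraceEvent, RDF.type, OWL.Class))

# Property linking a tourist to the events of their tour.
g.add((ONTO.hasTraceEvent, RDF.type, OWL.ObjectProperty))
g.add((ONTO.hasTraceEvent, RDFS.domain, ONTO.Tourist))
g.add((ONTO.hasTraceEvent, RDFS.range, ONTO.TraceEvent))

print(g.serialize(format="turtle"))
```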

    Sentence Embedding Approach using LSTM Auto-encoder for Discussion Threads Summarization

    Online discussion forums are repositories of valuable information where users interact, articulate their ideas and opinions, and share experiences about numerous topics. These online discussion forums are internet-based online communities where users can ask for help and find the solution to a problem. A new user of online discussion forums becomes exhausted from reading the significant number of irrelevant replies in a discussion. An automated discussion thread summarizing system (DTS) is necessary to create a candid view of the entire discussion of a query. Most of the previous approaches for automated DTS use the continuous bag of words (CBOW) model as a sentence embedding tool, which is poor at capturing the overall meaning of the sentence and is unable to grasp word dependency. To overcome these limitations, we introduce the LSTM Auto-encoder as a sentence embedding technique to improve the performance of DTS. The empirical results, in terms of the proposed approach's average precision, recall, and F-measure with respect to ROUGE-1 and ROUGE-2 on two standard experimental datasets, demonstrate the effectiveness and efficiency of the proposed approach, which outperforms the state-of-the-art CBOW model in sentence embedding tasks and boosts the performance of the automated DTS model.
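
    A minimal PyTorch sketch of an LSTM auto-encoder whose encoder hidden state serves as the sentence embedding; the vocabulary size, dimensions and toy input are assumptions, not the paper's configuration.

```python
# Sketch: an LSTM auto-encoder that yields a fixed-size sentence embedding
# (the encoder's final hidden state). Vocabulary size, dimensions and the
# toy input below are illustrative assumptions, not the paper's setup.
import torch
import torch.nn as nn

class LSTMAutoencoder(nn.Module):
    def __init__(self, vocab_size=5000, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token_ids):
        x = self.embed(token_ids)                 # (batch, seq, emb)
        _, (h, c) = self.encoder(x)               # h: (1, batch, hidden)
        sentence_embedding = h.squeeze(0)         # fixed-size code per sentence
        dec_out, _ = self.decoder(x, (h, c))      # teacher-forced reconstruction
        logits = self.out(dec_out)                # predict the input tokens back
        return logits, sentence_embedding

model = LSTMAutoencoder()
tokens = torch.randint(0, 5000, (2, 12))          # two toy sentences of 12 tokens
logits, emb = model(tokens)
print(emb.shape)                                  # torch.Size([2, 256])
```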

    CNN vs. LSTM for Turkish text classification

    In this paper, the efficiency of two state-of-the-art text classification techniques, i.e., Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM), for supporting Turkish text classification has been investigated. In addition, the effect of the main preprocessing steps such as Tokenization, Stop Word Elimination, Stemming, etc. has also been studied. Several experiments using the "TTC-3600" dataset were performed, and it has been observed that both CNN and LSTM can efficiently support the Turkish language and can achieve quite good performance. Regarding data preprocessing, results indicated that such a process improves the performance; however, for the Turkish language, it is preferable to exclude stemming. Also, when comparing the performance of feature extraction techniques for processing the Turkish language, Word2Vec outperforms TF-IDF. © 2021 IEEE
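
    A small Keras sketch of a 1-D CNN text classifier of the kind compared in the paper; the vocabulary size, sequence length and layer sizes are illustrative assumptions rather than the reported TTC-3600 setup.

```python
# Sketch: a 1-D CNN text classifier. Sizes are illustrative assumptions;
# the embedding weights could be initialised from pretrained Word2Vec vectors.
import tensorflow as tf
from tensorflow.keras import layers, models

VOCAB_SIZE, SEQ_LEN, NUM_CLASSES = 20000, 200, 6   # e.g. six categories, as in TTC-3600

model = models.Sequential([
    layers.Input(shape=(SEQ_LEN,), dtype="int32"),  # token-id sequences
    layers.Embedding(VOCAB_SIZE, 100),              # optionally Word2Vec-initialised
    layers.Conv1D(128, kernel_size=5, activation="relu"),
    layers.GlobalMaxPooling1D(),
    layers.Dense(64, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```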

    Landing on the right job : a machine learning approach to match candidates with jobs applying semantic embeddings

    Project Work presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics.
    Job application screening is a challenging and time-consuming task to execute manually. For recruiting companies such as Landing.Jobs it poses constraints on the ability to scale the business. Some systems have been built to assist recruiters in screening applications, but they tend to overlook the challenges related to natural language. On the other hand, most people nowadays, especially in the IT sector, use the Internet to look for jobs; however, given the huge number of job postings online, it can be complicated for a candidate to short-list the right ones to apply to. In this work we test a collection of Machine Learning algorithms and, through the use of cross-validation, we calibrate the most important hyper-parameters of each algorithm. The learning algorithms attempt to learn what makes a successful match between a candidate profile and job requirements, using historical data of selected/rejected applications in the screening phase for training. The features we use for building our models include the similarities between the job requirements and the candidate profile in dimensions such as skills, profession and location, and a set of job features which intend to capture the experience level, salary expectations, among others. In a first set of experiments, our best results emerge from the application of the Multilayer Perceptron algorithm (also known as Feed-Forward Neural Networks). After this, we improve the skills-matching feature by applying techniques for semantically embedding required/offered skills in order to tackle problems such as synonyms and typos, which artificially degrade the similarity between the job profile and the candidate profile and degrade the overall quality of the results. Through the use of the word2vec algorithm for embedding skills and a Multilayer Perceptron to learn the overall matching, we obtain our best results. We believe our results could be further improved by extending the idea of semantic embedding to other features and by finding candidates with job preferences similar to those of the target candidate, building upon that a richer representation of the candidate profile. We consider that the final model we present in this work can be deployed in production as a first-level tool for doing the heavy lifting of screening all applications, then passing the top N matches on for manual inspection. Also, the results of our model can be used to complement any recommendation system in place by simply running the model, which encodes the profiles of all candidates in the database, upon any new job opening and recommending the job to the candidates that yield the highest matching probability.
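
    A rough sketch of the matching step under stated assumptions: precomputed similarity features (for example, a skills similarity that could come from averaged word2vec vectors) are fed to a small neural network. The feature set, toy data and the scikit-learn MLP below are illustrative, not the project's actual model.

```python
# Sketch: candidate-job matching from engineered similarity features.
# Feature names, toy data and the MLP are illustrative assumptions.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

# Each row: [skills_similarity, profession_similarity, location_match,
#            experience_gap, salary_fit] -- computed beforehand, e.g. the
# skills similarity as the cosine of averaged word2vec skill vectors.
X = np.array([
    [0.92, 0.80, 1.0, 0.0, 0.9],
    [0.35, 0.40, 0.0, 2.0, 0.3],
    [0.75, 0.90, 1.0, 1.0, 0.7],
    [0.20, 0.10, 0.0, 3.0, 0.2],
    [0.88, 0.70, 1.0, 0.0, 0.8],
    [0.30, 0.25, 1.0, 2.0, 0.4],
])
y = np.array([1, 0, 1, 0, 1, 0])   # 1 = selected at screening, 0 = rejected

mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
print(cross_val_score(mlp, X, y, cv=3).mean())   # toy cross-validation score
```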