301 research outputs found

    NLP Analysis of Email Interactions to find automation opportunities

    Get PDF
    Finding automatization opportunities for email interactions can have positive effects for several industries, especially in tasks such as reading, receiving, writing and responding emails, categorizing emails or even to prevent loss of productivity and financial loses by dealing with spam, or improve users' satisfaction; even improving automatic categorization systems can mitigate negative impacts on personal and organization performance. Furthermore, people who work in companies spend around 28 % of their time reading and answering emails. In this project we proposed a methodology based on NLP and Unsupervised Machine Learning to look for opportunities of automation arising from recurrent email patterns found in email texts. We intent to facilitate the linguistic analysis in order to retrieve interaction patterns that can trigger automation actions. We proposed CRISP-DM methodology that lays the groundwork for detection of automatization opportunities in tasks relates. We compared the unsupervised machine learning methods K-Means, DBSCAN, and HDBSCAN with four clustering metrics applied to the Enron e-mails dataset transformed into paragraph vectors and performed several experiments with Word Mover's Distance, Euclidean Distance, L2-Norm and Cosine Similarity. Although our process yielded limited results in the detection of email interactions, we found that DBSCAN combined with Euclidean Distance was the best method among all scores. This project also contributes to the parameterization literature of said clustering algorithms as well as showing which methods, distances and scores settings are relevant for unsupervised email mining

    V-Measure: A conditional entropy-based external cluster evaluation

    Get PDF
    We present V-measure, an external entropy-based cluster evaluation measure. Vmeasure provides an elegant solution to many problems that affect previously defined cluster evaluation measures including 1) dependence on clustering algorithm or data set, 2) the “problem of matching”, where the clustering of only a portion of data points are evaluated and 3) accurate evaluation and combination of two desirable aspects of clustering, homogeneity and completeness. We compare V-measure to a number of popular cluster evaluation measures and demonstrate that it satisfies several desirable properties of clustering solutions, using simulated clustering results. Finally, we use V-measure to evaluate two clustering tasks: document clustering and pitch accent type clustering

    Visual SLAM with RGB-D cameras based on pose graph optimization

    Get PDF
    En este trabajo abordamos el problema de localización y mapeo simultáneo (SLAM) utilizando únicamente información obtenida mediante una cámara RGB-D. El objetivo principal es desarrollar un sistema SLAM capaz de estimar la trayectoria completa del sensor y generar una representación 3D consistente del entorno en tiempo real. Para lograr este objetivo, el sistema se basa en un método de estimación del movimiento del sensor a partir de información de profundidad densa y en técnicas de reconocimiento de lugares a partir de características visuales. A partir de estos algoritmos, se extraen restricciones espaciales entre fotogramas cuidadosamente seleccionados. Con estas restricciones espaciales se construye un grafo de poses, empleado para inferir la trayectoria más verosímil. El sistema se ha diseñado para ejecutarse en dos hilos paralelos: uno para el seguimiento y el otro para la construcción de la representación consistente. El sistema se evalúa en conjuntos de datos públicamente accesible, alcanzando una precisión comparable a sistemas de SLAM del estado del arte. Además, el hilo de seguimiento se ejecuta a una frecuencia de 60 Hz en un ordenador portátil de prestaciones modestas. También se realizan pruebas en situaciones más realistas, procesando observaciones adquiridas mientras se movía el sensor por dos entornos de interiores distintos

    Using Generic Summarization to Improve Music Information Retrieval Tasks

    Get PDF
    In order to satisfy processing time constraints, many MIR tasks process only a segment of the whole music signal. This practice may lead to decreasing performance, since the most important information for the tasks may not be in those processed segments. In this paper, we leverage generic summarization algorithms, previously applied to text and speech summarization, to summarize items in music datasets. These algorithms build summaries, that are both concise and diverse, by selecting appropriate segments from the input signal which makes them good candidates to summarize music as well. We evaluate the summarization process on binary and multiclass music genre classification tasks, by comparing the performance obtained using summarized datasets against the performances obtained using continuous segments (which is the traditional method used for addressing the previously mentioned time constraints) and full songs of the same original dataset. We show that GRASSHOPPER, LexRank, LSA, MMR, and a Support Sets-based Centrality model improve classification performance when compared to selected 30-second baselines. We also show that summarized datasets lead to a classification performance whose difference is not statistically significant from using full songs. Furthermore, we make an argument stating the advantages of sharing summarized datasets for future MIR research.Comment: 24 pages, 10 tables; Submitted to IEEE/ACM Transactions on Audio, Speech and Language Processin

    Intelligent Information Access to Linked Data - Weaving the Cultural Heritage Web

    Get PDF
    The subject of the dissertation is an information alignment experiment of two cultural heritage information systems (ALAP): The Perseus Digital Library and Arachne. In modern societies, information integration is gaining importance for many tasks such as business decision making or even catastrophe management. It is beyond doubt that the information available in digital form can offer users new ways of interaction. Also, in the humanities and cultural heritage communities, more and more information is being published online. But in many situations the way that information has been made publicly available is disruptive to the research process due to its heterogeneity and distribution. Therefore integrated information will be a key factor to pursue successful research, and the need for information alignment is widely recognized. ALAP is an attempt to integrate information from Perseus and Arachne, not only on a schema level, but to also perform entity resolution. To that end, technical peculiarities and philosophical implications of the concepts of identity and co-reference are discussed. Multiple approaches to information integration and entity resolution are discussed and evaluated. The methodology that is used to implement ALAP is mainly rooted in the fields of information retrieval and knowledge discovery. First, an exploratory analysis was performed on both information systems to get a first impression of the data. After that, (semi-)structured information from both systems was extracted and normalized. Then, a clustering algorithm was used to reduce the number of needed entity comparisons. Finally, a thorough matching was performed on the different clusters. ALAP helped with identifying challenges and highlighted the opportunities that arise during the attempt to align cultural heritage information systems

    A Comprehensive Survey on Word Representation Models: From Classical to State-Of-The-Art Word Representation Language Models

    Full text link
    Word representation has always been an important research area in the history of natural language processing (NLP). Understanding such complex text data is imperative, given that it is rich in information and can be used widely across various applications. In this survey, we explore different word representation models and its power of expression, from the classical to modern-day state-of-the-art word representation language models (LMS). We describe a variety of text representation methods, and model designs have blossomed in the context of NLP, including SOTA LMs. These models can transform large volumes of text into effective vector representations capturing the same semantic information. Further, such representations can be utilized by various machine learning (ML) algorithms for a variety of NLP related tasks. In the end, this survey briefly discusses the commonly used ML and DL based classifiers, evaluation metrics and the applications of these word embeddings in different NLP tasks
    corecore