6 research outputs found

    Supervised and unsupervised methods for learning representations of linguistic units

    Word representations, also called word embeddings, are generic representations, often high-dimensional vectors. They map the discrete space of words into a continuous vector space, which allows us to handle rare or even unseen events, e.g. by considering the nearest neighbors. Many Natural Language Processing tasks can be improved by word representations if we extend the task-specific training data with the general knowledge incorporated in the word representations. The first publication investigates a supervised, graph-based method to create word representations. This method leads to a graph-theoretic similarity measure, CoSimRank, with equivalent formalizations that show CoSimRank’s close relationship to Personalized PageRank and SimRank. The new formalization is efficient because it can use the graph-based word representation to compute a single node similarity without having to compute the similarities of the entire graph. We also show how we can take advantage of fast matrix multiplication algorithms. In the second publication, we use existing unsupervised methods for word representation learning and combine these with semantic resources by learning representations for non-word objects like synsets and entities. We also investigate improved word representations which incorporate the semantic information from the resource. The method is flexible in that it can take any word representations as input and does not need an additional training corpus. A sparse tensor formalization guarantees efficiency and parallelizability. In the third publication, we introduce a method that learns an orthogonal transformation of the word representation space that focuses the information relevant for a task in an ultradense subspace whose dimensionality is smaller than that of the original space by a factor of 100. 
We use ultradense representations for a Lexicon Creation task in which words are annotated with three types of lexical information: sentiment, concreteness and frequency. The final publication introduces a new calculus for the interpretable ultradense subspaces, including polarity, concreteness, frequency and part-of-speech (POS). The calculus supports operations like “−1 × hate = love” and “give me a neutral word for greasy” (i.e., oleaginous) and extends existing analogy computations like “king − man + woman = queen”.
    Abstract (translated from the German): Word representations, also known as word embeddings, are generic representations, usually high-dimensional vectors. They map the discrete space of words into a continuous vector space and allow us to handle rare or unseen events, for example by considering the nearest neighbors. Many problems in computational linguistics can be solved with word representations by extending task-specific training data with the general information contained in the word representations. In the first publication, we investigate supervised, graph-based methods for creating word representations. These methods lead to a graph-based similarity measure, CoSimRank, for which two equivalent formulations exist that show its close relationship to both Personalized PageRank and SimRank. The new formulation can compute individual node similarities efficiently because graph-based word representations can be used. In the second publication, we use existing word representations and combine them with semantic resources by learning representations for objects that are not words, such as synsets and entities. Our method is flexible in that it accepts any word representations as input and needs no additional training corpus. In the third publication, we present a method that learns an orthogonal transformation of the vector space of the word representations. This transformation focuses relevant information in an ultradense subspace. We use the ultradense representations to create lexicons with three different kinds of information: sentiment, concreteness and frequency. The last publication presents a new calculus for the interpretable ultradense subspaces: sentiment, concreteness, frequency and part of speech. This calculus includes operations such as “−1 × hate = love” and “neutral word for Winkeladvokat” (i.e., Anwalt, the neutral German word for a lawyer) and extends existing computations such as “uncle − man + woman = aunt”.
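
The nearest-neighbor lookup and the analogy arithmetic mentioned in this abstract can be illustrated with a minimal sketch. The three-dimensional vectors, the `EMBEDDINGS` table and the helper names `cosine`, `nearest` and `analogy` are invented here for illustration and do not come from the thesis; real word embeddings are learned from data and have far more dimensions.

```python
import math

# Toy embeddings, invented for illustration only.
# Hypothetical dimensions: royalty, maleness, femaleness.
EMBEDDINGS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.0, 0.2, 0.2],
}


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(vector, exclude=()):
    """Return the vocabulary word whose embedding is most similar to `vector`."""
    candidates = [w for w in EMBEDDINGS if w not in exclude]
    return max(candidates, key=lambda w: cosine(vector, EMBEDDINGS[w]))


def analogy(a, b, c):
    """Return the word closest to a - b + c, excluding the three input words."""
    target = [
        x - y + z
        for x, y, z in zip(EMBEDDINGS[a], EMBEDDINGS[b], EMBEDDINGS[c])
    ]
    return nearest(target, exclude=(a, b, c))
```

With these toy vectors, `analogy("king", "man", "woman")` returns `"queen"`, and an unseen vector such as `[0.8, 0.7, 0.2]` is mapped to its nearest known word, `"king"`. Excluding the input words from the candidates is a common convention in analogy evaluation, assumed here rather than taken from the abstract.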

    HPS: High precision stemmer

    Research into unsupervised ways of stemming has resulted, in the past few years, in the development of methods that are reliable and perform well. Our approach further shifts the boundaries of the state of the art by providing more accurate stemming results. The approach builds a stemmer in two stages. In the first stage, a stemming algorithm based upon clustering, which exploits the lexical and semantic information of words, is used to prepare large-scale training data for the second-stage algorithm. The second-stage algorithm uses a maximum entropy classifier. The stemming-specific features help the classifier decide when and how to stem a particular word. In our research, we have pursued the goal of creating a multi-purpose stemming tool. Its design opens up possibilities of solving non-traditional tasks such as approximating lemmas or improving language modeling. However, we still aim at very good results in the traditional task of information retrieval. The conducted tests reveal exceptional performance in all the above-mentioned tasks. Our stemming method is compared with three state-of-the-art statistical algorithms and one rule-based algorithm. We used corpora in the Czech, Slovak, Polish, Hungarian, Spanish and English languages. In the tests, our algorithm excels in stemming previously unseen words (words that are not present in the training set). Moreover, our approach requires very little text data for training when compared with competing unsupervised algorithms.
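
The two-stage idea can be sketched roughly as follows. This is a simplified illustration under several assumptions, not the HPS implementation: stage one is replaced by hand-given word clusters whose longest common prefix is taken as the stem, the suffix features are hypothetical, and the maximum entropy classifier is a small softmax regression trained by stochastic gradient ascent. All names (`CLUSTERS`, `training_pairs`, `features`, `train`, `stem`) are invented for this sketch.

```python
import math
from collections import defaultdict
from os.path import commonprefix

# Stand-in for the stage-one clustering output (assumed, hand-written).
CLUSTERS = [
    ["walk", "walks", "walked", "walking"],
    ["jump", "jumps", "jumped", "jumping"],
    ["talk", "talks", "talked", "talking"],
]


def training_pairs(clusters):
    """Stage 1: label each word with the number of characters to strip."""
    pairs = []
    for cluster in clusters:
        stem_form = commonprefix(cluster)
        pairs.extend((word, len(word) - len(stem_form)) for word in cluster)
    return pairs


def features(word):
    """Hypothetical features: the last 1 to 3 characters plus a bias term."""
    feats = [f"suf{k}={word[-k:]}" for k in (1, 2, 3) if len(word) > k]
    return feats + ["bias"]


def predict_proba(weights, labels, feats):
    """Softmax over the summed feature weights of each label."""
    scores = {y: sum(weights[(f, y)] for f in feats) for y in labels}
    top = max(scores.values())
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    total = sum(exps.values())
    return {y: e / total for y, e in exps.items()}


def train(pairs, epochs=100, rate=0.5):
    """Stage 2: fit a maximum entropy model by stochastic gradient ascent."""
    labels = sorted({y for _, y in pairs})
    weights = defaultdict(float)  # (feature, label) -> weight
    for _ in range(epochs):
        for word, gold in pairs:
            feats = features(word)
            probs = predict_proba(weights, labels, feats)
            for y in labels:
                # Gradient of the log-likelihood: observed minus expected.
                grad = (1.0 if y == gold else 0.0) - probs[y]
                for f in feats:
                    weights[(f, y)] += rate * grad
    return weights, labels


def stem(word, weights, labels):
    """Strip the most probable number of trailing characters from `word`."""
    probs = predict_proba(weights, labels, features(word))
    cut = max(labels, key=lambda y: probs[y])
    cut = min(cut, len(word) - 1)  # always keep at least one character
    return word[: len(word) - cut]
```

After `weights, labels = train(training_pairs(CLUSTERS))`, calling `stem("parked", weights, labels)` strips the suffix of a word that never appeared in the training clusters, which is the generalization to unseen words that the abstract emphasizes.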

    Incorporating stakeholders in policy assessment: Generating a framework for system analysis and data driven policy making

    In this thesis, we explore how social media can assist decision making and present a case study of the development and implementation of an open data framework in a small organization. We analyze the role of social media data in providing policy insight and in prioritizing initiatives among citizens; in particular, we explore the application of sentiment analysis to data mined from Twitter. Data related to poverty and basic income was collected for 24 days in 2019, then cleaned and prepared for natural language processing. A subset of the data was manually labeled for sentiment in order to inform and train the AI. This analysis of public opinion on poverty is situated within the Sustainable Development Goals and support for poverty reduction policies. We also explore the case study of the District of Squamish in developing and applying an open data framework aligned with its strategic values, with a view to continuous improvement and correct documentation of the system. We develop a policy, guidelines and a framework tailored to this organization, with small communities in mind. We use the Squamish open data case study as a model for the framework, and we inform the framework with the district's strategic direction and meetings with the district working group to better direct its development and application. Finally, we explore a joint solution for advancing equality policies and targeted governmental initiatives while examining the interactions of external and internal stakeholders. We present the social media case as a component that supports the overall framework and empowers citizens with valuable data, with potential for information analysis by other stakeholders.
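
The cleaning and manual-labeling steps described above might look like the following minimal sketch. The abstract does not say which cleaning rules were applied, so the regular expressions, the decision to drop the retweet marker, and the function names `clean_tweet` and `sample_for_labeling` are all assumptions made for illustration.

```python
import random
import re

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_WORD_RE = re.compile(r"[^a-z0-9\s]")


def clean_tweet(text):
    """Lowercase a tweet, strip URLs, mentions and punctuation, and tokenize."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")  # keep the hashtag word, drop the symbol
    text = NON_WORD_RE.sub(" ", text)
    return [token for token in text.split() if token != "rt"]


def sample_for_labeling(tweets, k, seed=0):
    """Draw a reproducible random subset of tweets for manual annotation."""
    rng = random.Random(seed)
    return rng.sample(tweets, min(k, len(tweets)))
```

For example, `clean_tweet("RT @user: Basic income would END poverty! https://t.co/abc #UBI")` yields the tokens `basic`, `income`, `would`, `end`, `poverty` and `ubi`. A fixed seed keeps the labeled subset reproducible, which matters if annotation is repeated or audited.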

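    Análisis de sentimientos en redes sociales orientado a la percepción de la calidad de servicios de internet, redes móviles, tv cable y electricidad [Sentiment analysis in social networks oriented to the perceived quality of internet, mobile network, cable TV and electricity services]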

    Degree project (Ingeniero Civil Informático). Each social network has a different focus, and its users consequently have different profiles; this was the deciding factor in choosing the network to work with in this thesis. Because the aim is to automate sentiment analysis of opinions expressed on social networks, Twitter was chosen as the platform from which the data is extracted, given how easy it makes accessing and storing the available information. With the social network chosen, the remaining questions are: Is the available information valuable? Can it be accessed? Can the extraction and analysis of thousands upon thousands of posts be automated? Can this be used to steer a company's development toward a focus on customer satisfaction? The answer to all of these questions is yes, and this thesis sets out to demonstrate it. The work uses techniques such as web scraping to search for and download information, custom code for text processing, and classifiers and AI (artificial intelligence) algorithms for sentiment analysis and automation based on supervised learning, and finally presents the results as charts. For sentiment analysis, the algorithm must be adapted to the context in which it is used, because when the text is cleaned, which words are removed for contributing no information or sentiment to the sentence depends on that context. 
Given the focus of the thesis, the importance of taking people's opinions into account through new channels such as Twitter, and the fact that services such as internet access have become indispensable, the thesis analyzes opinion on the quality of service delivered by various companies in the internet, cable, mobile network and electricity sectors. To this end, the number of posts on these topics on Twitter was checked manually and found to show constant activity, so it was decided to extract the information monthly. The rest of this report presents the full development of the thesis idea, explaining methodologies, technical aspects, theory and results.
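
The point about context-dependent cleaning combined with supervised learning can be sketched as below. The thesis abstract does not name its classifiers, so the multinomial Naive Bayes model here is a stand-in chosen for brevity; the stop-word list `DOMAIN_STOPWORDS`, the four training sentences in `TRAIN` and the class name `NaiveBayes` are likewise invented for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stop words for the telecom/utility context: words such as
# "internet" or "servicio" appear in nearly every post and carry no sentiment.
DOMAIN_STOPWORDS = {"el", "la", "de", "mi", "es", "internet", "servicio"}

# Invented training examples, labeled by hand.
TRAIN = [
    ("el internet es muy lento", "neg"),
    ("pésimo servicio sin señal", "neg"),
    ("excelente servicio muy rápido", "pos"),
    ("la conexión es estable y rápida", "pos"),
]


def tokenize(text):
    """Lowercase, split on whitespace, and drop context-specific stop words."""
    return [t for t in text.lower().split() if t not in DOMAIN_STOPWORDS]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing."""

    def fit(self, examples):
        self.word_counts = defaultdict(Counter)
        self.class_counts = Counter(label for _, label in examples)
        for text, label in examples:
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())
        # Words never seen in training are ignored.
        tokens = [t for t in tokenize(text) if t in self.vocab]
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n / total)
            score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

With `model = NaiveBayes().fit(TRAIN)`, the post "mi internet sigue lento" is classified as `neg` and "excelente conexión" as `pos`. Changing `DOMAIN_STOPWORDS` changes which tokens reach the classifier, which is the sense in which the cleaning step depends on context.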

    Information models in sentiment analysis based on linguistic resources

    The beginning of the new millennium was marked by the rapid development of social networks, cloud-based internet technologies and the application of artificial intelligence in web tools. The extremely rapid growth in the number of texts on the Internet (blogs, e-commerce websites, forums, discussion groups, short-message systems, social networks and news portals) has increased the need for methods of fast, comprehensive and accurate text analysis. This makes the development of language technologies important; their primary tasks include document classification, document clustering, information retrieval, word-sense disambiguation, text extraction, machine translation, computer speech recognition, natural language generation and sentiment analysis. In computational linguistics, several names are in use for the area concerned with processing sentiment in text: sentiment classification, opinion mining, sentiment analysis and sentiment extraction. By its nature and the methods it uses, sentiment analysis of text belongs to the area of computational linguistics that deals with text classification. In sentiment processing, one generally speaks of three kinds of text classification:...

    Social Media Marketing Evaluation Decision Making Processes and the Agency-Client Relationship

    Evaluation of social media marketing is central to its success. This thesis seeks to contribute to our understanding of social media marketing evaluation processes and outcomes, together with an exploration of the dynamics of agency-client relationships. It contributes to knowledge across three major themes (strategy development, evaluation, and agency-client relationships) and is one of the first studies to consider the role of the agency-client relationship in social media marketing. In particular, the study addresses a gap in current knowledge by revealing the significant influence of agency-client relationships on the processes and outcomes of social media marketing strategy development and evaluation. Adopting the ontological and epistemological position that reality is socially constructed, a qualitative study of twenty social media marketers provided a specialist digital agency perspective of social media campaigns. Data was collected through semi-structured interviews with key practitioners, supported by a cognitive-mapping elicitation technique. The findings generate knowledge of the first two major themes, strategy and evaluation, through the development of two process models: the ‘Cycle of Social Media Marketing’ for strategy, and the ‘Cycle of Social Media Marketing Evaluation’ for evaluation. Findings for the third theme reject the traditional view of agency-client relationships and instead offer a fresh perspective on these relationships in social media marketing, identifying three sub-themes: context, conflict and co-creation. The findings reveal key techniques for enhancing client relationships, including client account management strategies; the impact of conflict on trust between both parties; the crucial role of mutual participation in the development of strategy and evaluation; and the importance of co-creation, largely facilitated through collaborative learning workshops. 
This study has implications for scholars, as it contributes to our understanding of evaluation in relation to strategy development in a rapidly developing area of modern marketing practice, affirming the importance of social media data analysis to decision-making. It also has implications for practice, as it extends knowledge through conceptualisations of processes and offers insights into the influence and dynamics of agency-client interactions in social media marketing. Finally, a key contribution to knowledge is the development of two conceptual frameworks, The Contextualised Conceptual Framework of Social Media Marketing Evaluation in Strategy Development and The Conceptual Framework of Agency-Client Dynamics in Social Media Marketing, which encapsulate the multi-layered nature of this study and the vital importance of evaluation in social media marketing.