    Mapping wordnets from the perspective of inter-lingual equivalence

    This paper explores inter-lingual equivalence from the perspective of linking two large lexico-semantic databases, namely the Princeton WordNet of English and the plWordNet (pl. Słowosieć) of Polish. Wordnets are built as networks of lexico-semantic relations between words and their meanings, and constitute a type of monolingual dictionary cum thesaurus. The development of wordnets for different languages has given rise to many wordnet linking projects (e.g. EuroWordNet, Vossen, 2002). Regardless of the linking method used, these projects require defining rules for establishing equivalence links between wordnet building blocks, known as synsets (sets of synonymous lexical units, i.e., lemma-sense pairs). In this paper, a set of inter-wordnet relations used in mapping the plWordNet onto the Princeton WordNet is analysed, and an attempt is made to relate them to the equivalence taxonomies described in the specialist literature on bilingual lexicography and translation.
    Rzutowanie wordnetów w perspektywie ekwiwalencji międzyjęzykowej. The article presents an analysis of inter-lingual equivalence from the perspective of linking two large wordnets: the Polish Słowosieć and the English Princeton WordNet. Wordnets are relational lexico-semantic databases that describe a network of lexico-semantic relations between words and their meanings. They thus constitute a kind of monolingual dictionary combined with a thesaurus. The development of wordnets for many of the world's languages subsequently led to their being linked to one another. This required defining a methodology for establishing equivalence between their basic elements, i.e. synsets, which are sets of synonymous lexical units, i.e. pairs of a lemma and a sense number. In the article we analyse the set of inter-wordnet relations used in the mapping between Słowosieć and the Princeton WordNet, relating them to the equivalence taxonomies postulated in the lexicographic and translation-studies literature.

    Identifying sources of opinions with conditional random fields and extraction patterns

    Journal Article: Recent systems have been developed for sentiment classification, opinion recognition, and opinion analysis (e.g., detecting polarity and strength). We pursue another aspect of opinion analysis: identifying the sources of opinions, emotions, and sentiments. We view this problem as an information extraction task and adopt a hybrid approach that combines Conditional Random Fields (Lafferty et al., 2001) and a variation of AutoSlog (Riloff, 1996a). While CRFs model source identification as a sequence tagging task, AutoSlog learns extraction patterns. Our results show that the combination of these two methods performs better than either one alone. The resulting system identifies opinion sources with 79.3% precision and 59.5% recall using a head noun matching measure, and 81.2% precision and 60.6% recall using an overlap measure.

    CREATE: Concept Representation and Extraction from Heterogeneous Evidence

    Traditional information retrieval methodology is guided by the document retrieval paradigm, in which relevant documents are returned in response to user queries. This paradigm faces a serious drawback if the desired result is not explicitly present in a single document. The problem becomes more obvious when a user tries to obtain complete information about a real-world entity, such as a person, company, or location. In such cases, various facts about the target entity or concept need to be gathered from multiple document sources. In this work, we present a method to extract information about a target entity based on the concept retrieval paradigm, which focuses on extracting and, if necessary, blending information related to a concept from multiple sources. The paradigm is built around a generic notion of concept, defined as any item that can be thought of as a topic of interest. Concepts may correspond to any real-world entity, such as a restaurant, person, city, or organization, or to any abstract item, such as a news topic, event, or theory. The Web is a heterogeneous collection of data in different forms, such as facts, news, and opinions. We propose different models for different forms of data, all of which work towards the same goal of concept-centric retrieval. We motivate our work with studies of current trends and demands in information seeking. The framework helps in understanding the intent of content, i.e. opinion versus fact. Our work has been conducted on free-text data in English. Nevertheless, our framework can be easily transferred to other languages.

    Pushing the envelope of sentiment analysis beyond words and polarities

    Idioms are multi-word expressions that hold both a literal and a figurative meaning, the latter being conventionally understood by native speakers. Their overall meaning often cannot be deduced from the literal meaning of their constituent words. Sentiment analysis, also referred to as opinion mining, aims to automatically extract and classify sentiments, opinions, and emotions expressed in text. The research in this thesis is motivated by the fact that idioms, which often express an affective stance towards an entity or an event, are not featured systematically in sentiment analysis. To estimate the degree to which the inclusion of idioms as features may improve the results of traditional sentiment analysis, we compared our results to two state-of-the-art sentiment analysis approaches. Firstly, we collected a set of idioms that are relevant to sentiment analysis, i.e. those that can be mapped to an emotion. These mappings were obtained using a crowdsourcing approach. Secondly, to evaluate the results of sentiment analysis, we assembled a corpus of sentences in which idioms are used in context. Each sentence was annotated with an emotion, which formed the basis for the gold standard used for the comparison against the baseline methods. The classification performance was improved by almost 20 percentage points. Despite the positive findings from our initial experiments, the main limitation was the significant knowledge-engineering overhead involved in hand-crafting the lexico-semantic resources used to support idiom-based features. To minimise the bottleneck associated with the acquisition of such resources, we scaled up our original approach by automating their engineering. Subsequently, these resources were used to replace the manually engineered counterparts of such features in the originally proposed method. The fully automated approach outperformed the two baseline methods by 7 and 9 percentage points.
These improvements, however, were smaller than those achieved in the initial study. Nevertheless, we have demonstrated not only that idiom-based features can be automatically engineered, but also that they improve sentiment classification results when such features are present. Taking a long-term view of the research in this thesis, we want to address the limitations of state-of-the-art sentiment analysis approaches by focusing on a full range of emotions, rather than sentiment polarity. However, there is no consensus among researchers on a standardised framework for classifying emotions. Proposing such a framework would be a major contribution to the field of sentiment analysis, as it would stimulate its evolution into fully-fledged emotion classification and allow for systematic comparison of independent studies. With this goal in mind, we investigated the utility of different classification frameworks for sentiment analysis. A comprehensive statistical analysis of our experimental results provided explicit evidence that, in relative terms, six basic emotions are best suited for sentiment analysis. However, we identified a major shortcoming of this framework: it oversimplifies positive emotions.

    Setting an agenda for English for academic and professional purposes in Spain

    This paper outlines the practical implications of the use of the term English for Academic and Professional Purposes (EAPP), a position originally taken by Alcaraz (2000). The article proposes an agenda for EAPP within tertiary-level education in Spain. Firstly, we propose a methodological and theoretical linguistic framework for the EAPP classes in our engineering and business degrees. It takes the form of Personalized Learning and Continuous Evaluation, and links our ideas about the nature of language issues in EAPP with a rigorous theoretical framework based on Hallidayan Systemic Linguistics. Secondly, we describe a methodology for Academic English based on corpus linguistic techniques, which involves rapidly building and processing a corpus so as to extract lexico-grammatical information that has direct application in the classroom. Thirdly, we suggest that English for Professional Purposes consists of interpersonal activities such as negotiating effectively; conducting interviews and surveys; listening and taking notes during meetings and presentations; communicating effectively on the telephone as well as with video conferencing technology; and giving oral presentations. Finally, we conclude by outlining the specific skills that a teacher of EAPP would need.

    Evaluative meaning in scientific writing : macro- and micro-analytic perspectives using data mining

    In this thesis, we elaborate characteristics of evaluative meaning of different scientific disciplines and trace their diachronic linguistic evolution. A main focus lies on newly emerged disciplines, such as computational linguistics, which emerged through contact between two other disciplines, such as computer science and linguistics. Here, we consider (1) whether these newly emerged disciplines have created characteristics of their own over time, showing a process of diversification, and (2) whether they have also adopted characteristics from their disciplines of origin, reflected in a linguistic imprint, and if this might have changed over time. The newly emerged disciplines considered are computational linguistics, bioinformatics, digital construction and microelectronics, which have emerged through contact between computer science and a further discipline (linguistics, biology, mechanical engineering, and electrical engineering, respectively). In terms of theory, this work is grounded in a linguistic theory rooted in sociolinguistics, Systemic Functional Linguistics (SFL; Halliday, 2004), which with its functional perspective on language allowed us to position evaluative meaning within a linguistic theory and to create a model of analysis to trace choices made in the semantic system on the level of lexico-grammar. Moreover, its notion of register, concerned with functional variation, i.e. variation according to language use, combined with the sociolinguistic perspective made it possible to compare the linguistic choices made according to different social contexts, to which the disciplines belong. This allowed us to trace register diversification processes and registerial imprint of evaluative meaning across disciplines. In terms of methods, we apply classification as a data mining technique, taking a macro- and micro-analytic perspective (cf. Jockers, 2013) on the results. 
In doing so, we gain insights into the degree of diversification and imprint (macro-analysis) and the kind of diversification and imprint (micro-analysis). Studies so far have considered either the macro- or the micro-analytic perspective. By considering both, we are able to investigate generalizable trends as well as detailed linguistic characteristics of evaluative meaning across disciplines and time. The approach presented in this thesis draws its strength from being grounded in a linguistic theory, which proved to be extremely useful in defining and testing hypotheses and interpreting results. Moreover, an empirical analysis of evaluative meaning across disciplines and time was possible by combining corpus-based methods with data mining techniques.
    This dissertation works out the evaluative characteristics of various scientific disciplines and examines their diachronic linguistic development. A main focus lies on recently emerged disciplines (e.g. computational linguistics) that formed through contact between two other disciplines (e.g. computer science and linguistics). In this context, we investigate (1) whether these newly emerged disciplines develop characteristics of their own over time and thus exhibit a process of diversification, and (2) whether they also adopt characteristics of their disciplines of origin and thus show a linguistic imprint from them, and whether this imprint may have changed over time. The relatively recently emerged disciplines examined are computational linguistics, bioinformatics, digital construction and microelectronics, which arose through contact between computer science and another discipline, in our case linguistics, biology, mechanical engineering and electrical engineering, respectively. The work is based on the sociolinguistic theory of Systemic Functional Linguistics (SFL; Halliday, 2004). Its functional perspective on language allowed us to position the semantic concept of evaluation within a linguistic theory and to develop a model of analysis for tracing choices from the semantic system at the lexico-grammatical level. Also particularly important here is the SFL concept of register, which is concerned with functional variation, i.e. variation according to language use. The combination of functional variation and a sociolinguistic perspective made it possible to examine and compare the linguistic choices regarding evaluation made in different social contexts (i.e. the different disciplines). In this way, register-specific diversification processes and imprints with respect to evaluation could be identified for the disciplines examined. Methodologically, classification, a data mining technique, was applied, which made it possible to explore the results from a macro- and a micro-analytic perspective (cf. Jockers, 2013). This yielded insights into the degree of diversification and imprint (macro-analysis) and the kind of diversification and imprint (micro-analysis). Studies so far have applied either the macro- or the micro-analytic perspective. By including both levels, we were able both to establish generalizable trends and to examine detailed linguistic characteristics and diachronic changes of evaluative expressions in different disciplines. The strengths of the approach presented in this dissertation lie in its grounding in a linguistic theory, which proved very helpful in formulating and testing hypotheses and in interpreting the results. In addition, the approach enabled an empirical analysis of evaluation in scientific disciplines through the interplay of corpus-based methods and data mining techniques.
