    Words are Malleable: Computing Semantic Shifts in Political and Media Discourse

    Recently, researchers have started to pay attention to the detection of temporal shifts in the meaning of words. However, most (if not all) of these approaches restrict their efforts to uncovering change over time, thus neglecting other valuable dimensions such as social or political variability. We propose an approach for detecting semantic shifts between different viewpoints, broadly defined as sets of texts that share a specific metadata feature, which can be a time period but also a social entity such as a political party. For each viewpoint, we learn a semantic space in which each word is represented as a low-dimensional neural embedding vector. The challenge is to compare the meaning of a word in one space to its meaning in another space and to measure the size of the semantic shift. We compare the effectiveness of a measure based on optimal transformations between the two spaces with a measure based on the similarity of the word's neighbors in the respective spaces. Our experiments demonstrate that the combination of these two measures performs best. We show that semantic shifts occur not only over time but also across different viewpoints within a short period of time. For evaluation, we demonstrate how this approach captures meaningful semantic shifts and can help improve other tasks, such as contrastive viewpoint summarization and ideology detection (measured as classification accuracy) in political texts. We also show that the two laws of semantic change that were empirically shown to hold for temporal shifts also hold for shifts across viewpoints. These laws state that frequent words are less likely to shift meaning, while words with many senses are more likely to do so. Comment: In Proceedings of the 26th ACM International Conference on Information and Knowledge Management (CIKM 2017)
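
The two measures compared in this abstract can be illustrated with a short sketch. It assumes both embedding matrices share the same vocabulary in the same row order, uses orthogonal Procrustes as the "optimal transformation", uses the Jaccard overlap of k-nearest-neighbor sets as the neighbor measure, and combines the two by a simple weighted average. These are illustrative choices (the function names, `k` and `alpha` are hypothetical), not necessarily the paper's exact formulation.

```python
import numpy as np

def procrustes_shift(A, B):
    """Per-word cosine distance after orthogonally aligning space A onto space B.

    A, B: (n_words, dim) arrays whose rows refer to the same shared vocabulary.
    """
    # Orthogonal Procrustes: R = argmin ||A R - B||_F subject to R^T R = I.
    u, _, vt = np.linalg.svd(A.T @ B)
    aligned = A @ (u @ vt)
    cos = np.sum(aligned * B, axis=1) / (
        np.linalg.norm(aligned, axis=1) * np.linalg.norm(B, axis=1))
    return 1.0 - cos

def neighbor_shift(A, B, k=10):
    """Per-word 1 - Jaccard overlap of the k nearest neighbors in each space."""
    def knn(M):
        unit = M / np.linalg.norm(M, axis=1, keepdims=True)
        sims = unit @ unit.T
        np.fill_diagonal(sims, -np.inf)  # a word is not its own neighbor
        return np.argsort(-sims, axis=1)[:, :k]
    na, nb = knn(A), knn(B)
    shifts = np.empty(len(A))
    for i in range(len(A)):
        sa, sb = set(na[i]), set(nb[i])
        shifts[i] = 1.0 - len(sa & sb) / len(sa | sb)
    return shifts

def combined_shift(A, B, k=10, alpha=0.5):
    # Hypothetical combination rule: a weighted average of the two scores.
    return alpha * procrustes_shift(A, B) + (1 - alpha) * neighbor_shift(A, B, k)
```

If B is an exact rotation of A, both scores are zero for every word, because a rotation preserves cosine similarities and therefore neighborhoods; only genuine changes in relative position register as shifts.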

    A Review On Automatic Text Summarization Approaches

    It has been more than 50 years since the first investigations into automatic text summarization. Various techniques have been successfully used to extract the important content from a text document to form its summary. In this study, we review some of the studies that have been conducted in this still-developing research area. The review covers the basics of text summarization, the types of summarization, the methods that have been used, and some areas in which text summarization has been applied. Furthermore, this paper reviews the significant efforts made in studies of sentence extraction, domain-specific summarization, and multi-document summarization, and provides the theoretical explanation and fundamental concepts related to them. In addition, the advantages and limitations of the approaches commonly used for text summarization are highlighted.
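
Sentence extraction, one of the approaches reviewed here, can be sketched with the classic word-frequency heuristic: score each sentence by the average frequency of its content words across the document and keep the top-scoring sentences in document order. This is a minimal illustration, not a method taken from the review; the regex sentence splitter and the tiny `STOPWORDS` set are assumptions.

```python
import re
from collections import Counter

# Assumed tiny stop-word list; real systems use a much larger one.
STOPWORDS = {"a", "an", "and", "are", "is", "of", "the", "to", "in"}

def tokenize(sentence):
    """Lower-cased alphabetic tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS]

def summarize(text, n_sentences=2):
    """Frequency-based extractive summary: keep the n highest-scoring sentences."""
    # Naive sentence split on terminal punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(s):
        words = tokenize(s)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    # sorted() stays stable with reverse=True, so ties keep document order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:n_sentences])]
```

Averaging (rather than summing) the word frequencies keeps long sentences from winning merely by length.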

    Text documents clustering using modified multi-verse optimizer

    In this study, a multi-verse optimizer (MVO) is utilised for the text document clustering (TDC) problem. TDC is treated as a discrete optimization problem, and an objective function based on the Euclidean distance is applied as the similarity measure. TDC is tackled by dividing the documents into clusters such that documents belonging to the same cluster are similar, whereas those belonging to different clusters are dissimilar. MVO, a recent metaheuristic optimization algorithm designed for continuous optimization problems, can intelligently navigate different areas of the search space and search deeply within each area using a particular learning mechanism. The proposed algorithm, called MVOTDC, adopts the convergence behaviour of the MVO operators to deal with discrete, rather than continuous, optimization problems. To evaluate MVOTDC, a comprehensive comparative study is conducted on six text document datasets with various numbers of documents and clusters. The quality of the final results is assessed using precision, recall, F-measure, entropy, accuracy, and purity. Experimental results reveal that the proposed method performs competitively in comparison with state-of-the-art algorithms. Statistical analysis is also conducted and shows that MVOTDC produces significant results in comparison with three well-established methods.
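
To make the discrete formulation concrete, the sketch below treats a candidate solution as one cluster label per document, scores it with a Euclidean-distance objective (assumed here to be the sum of distances from documents to their cluster centroids; the paper's exact objective may differ), and computes purity, one of the evaluation measures listed. The MVO search itself is not shown, and the function names are hypothetical.

```python
import numpy as np
from collections import Counter

def clustering_objective(X, labels):
    """Sum of Euclidean distances from each document vector to its cluster centroid.

    X: (n_docs, n_terms) document vectors; labels: one cluster id per document,
    i.e. the discrete candidate solution a metaheuristic would search over.
    Lower is better.
    """
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    total = 0.0
    for c in np.unique(labels):
        members = X[labels == c]
        centroid = members.mean(axis=0)
        total += np.linalg.norm(members - centroid, axis=1).sum()
    return total

def purity(labels, classes):
    """Fraction of documents that belong to the majority true class of their cluster."""
    labels, classes = np.asarray(labels), np.asarray(classes)
    hits = sum(Counter(classes[labels == c]).most_common(1)[0][1] for c in np.unique(labels))
    return hits / len(labels)
```

A metaheuristic such as MVO would repeatedly propose new label vectors and keep those with a lower `clustering_objective`; purity is computed only afterwards, against the known classes.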

    Eigendecompositions of Transfer Operators in Reproducing Kernel Hilbert Spaces

    Transfer operators such as the Perron–Frobenius or Koopman operator play an important role in the global analysis of complex dynamical systems. The eigenfunctions of these operators can be used to detect metastable sets, to project the dynamics onto the dominant slow processes, or to separate superimposed signals. We extend transfer operator theory to reproducing kernel Hilbert spaces and show that these operators are related to Hilbert space representations of conditional distributions, known as conditional mean embeddings in the machine learning community. Moreover, numerical methods to compute empirical estimates of these embeddings are akin to data-driven methods for the approximation of transfer operators such as extended dynamic mode decomposition and its variants. One main benefit of the presented kernel-based approaches is that these methods can be applied to any domain where a similarity measure given by a kernel is available. We illustrate the results with the aid of guiding examples and highlight potential applications in molecular dynamics as well as video and text data analysis.
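
Extended dynamic mode decomposition (EDMD), which the abstract relates to the kernel-based estimators, can be sketched in a few lines: evaluate a dictionary of functions on snapshot pairs, solve a least-squares problem for the Koopman matrix, and take its eigenvectors as approximate eigenfunctions. This sketch uses an explicit monomial dictionary (`monomials` is a hypothetical helper) rather than the reproducing-kernel formulation the paper develops.

```python
import numpy as np

def edmd(X, Y, basis):
    """Extended dynamic mode decomposition (EDMD).

    X, Y: (m, d) arrays of snapshot pairs with Y[i] the successor of X[i].
    basis: callable mapping an (m, d) array to (m, n) dictionary values.
    Returns the (n, n) Koopman matrix estimate and its eigen-decomposition;
    an eigenvector xi gives the approximate eigenfunction phi(x) = basis(x) @ xi.
    """
    psi_x, psi_y = basis(X), basis(Y)
    K = np.linalg.pinv(psi_x) @ psi_y  # least squares: psi_x @ K ~= psi_y
    eigvals, eigvecs = np.linalg.eig(K)
    return K, eigvals, eigvecs

def monomials(Z):
    # Hypothetical dictionary for 1-D states: [1, x, x^2].
    return np.hstack([np.ones_like(Z), Z, Z ** 2])
```

For the linear map x_{t+1} = 0.5 x_t sampled on [-1, 1], the monomials 1, x and x^2 are exact Koopman eigenfunctions, so `edmd` recovers the eigenvalues 1, 0.5 and 0.25.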

    Hybrid harmony search algorithm for continuous optimization problems

    The Harmony Search (HS) algorithm has been extensively adopted in the literature to address optimization problems in many different fields, such as industrial design, civil engineering, and electrical and mechanical engineering. To ensure its search performance, HS requires extensive tuning of its four control parameters, namely harmony memory size (HMS), harmony memory consideration rate (HMCR), pitch adjustment rate (PAR), and bandwidth (BW). However, the tuning process is often cumbersome and problem dependent, and no single setting fits all problems. Additionally, despite many useful works, HS and its variants still suffer from weak exploitation, which can lead to poor convergence. To address these issues, this thesis proposes to augment HS with adaptive tuning using the Grey Wolf Optimizer (GWO). Meanwhile, to enhance its exploitation, this thesis also proposes adopting a new variant of the opposition-based learning (OBL) technique. Taken together, the proposed hybrid algorithm, called IHS-GWO, aims to address continuous optimization problems. IHS-GWO is evaluated using two standard benchmark sets and two real-world optimization problems. The first benchmark set consists of 24 classical unimodal and multimodal benchmark functions, whilst the second contains 30 state-of-the-art benchmark functions from the Congress on Evolutionary Computation (CEC). The two real-world optimization problems are the three-bar truss and spring design problems. Statistical analysis using the Wilcoxon rank-sum and Friedman tests, comparing the results of IHS-GWO with recent HS variants and other metaheuristics, demonstrates its superior performance.
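
For reference, the basic Harmony Search loop that these four parameters control is sketched below: HMCR decides whether a variable is drawn from memory or sampled at random, PAR decides whether a memory value is pitch-adjusted, and BW sets the adjustment width. This is plain HS on a bounded box, not IHS-GWO; the GWO-based adaptive tuning and the OBL variant are omitted, and expressing BW as a fraction of each variable's range is an assumption.

```python
import random

def harmony_search(f, bounds, hms=10, hmcr=0.9, par=0.3, bw=0.05, iters=5000, seed=0):
    """Minimize f over a box with the basic Harmony Search loop.

    bounds: list of (low, high) pairs, one per decision variable.
    bw is taken as a fraction of each variable's range (an assumption).
    Returns the best harmony found and its objective value.
    """
    rng = random.Random(seed)
    # Harmony memory: hms random solutions and their objective values.
    memory = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(hms)]
    scores = [f(h) for h in memory]
    for _ in range(iters):
        new = []
        for j, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:
                # Memory consideration: reuse variable j from a stored harmony.
                x = memory[rng.randrange(hms)][j]
                if rng.random() < par:
                    # Pitch adjustment: small random step of at most bw * range.
                    x += rng.uniform(-1.0, 1.0) * bw * (hi - lo)
            else:
                # Random selection from the whole range.
                x = rng.uniform(lo, hi)
            new.append(min(max(x, lo), hi))  # clip back into bounds
        score = f(new)
        # Replace the worst stored harmony if the new one is better.
        worst = max(range(hms), key=scores.__getitem__)
        if score < scores[worst]:
            memory[worst], scores[worst] = new, score
    best = min(range(hms), key=scores.__getitem__)
    return memory[best], scores[best]
```

Because all four parameters are fixed constants here, the sketch also shows why tuning is problem dependent: the same `bw` that explores well on a wide box may be too coarse to refine a solution near the optimum.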

    Programmable Insight: A Computational Methodology to Explore Online News Use of Frames

    The Internet is a major source of news content. Online news is a form of large-scale narrative text with rich, complex content that embeds deep meanings (facts, strategic communication frames, and biases) that shape and shift the standards, values, attitudes, and beliefs of the masses. Currently, this body of narrative text remains untapped, due in large part to human limitations: the human ability to comprehend rich text and extract hidden meanings is far superior to that of known computational algorithms, but it does not scale. In this research, online news framing is given a computational treatment to expose a deeper level of expressivity, coined “double subjectivity”, which is characterized by its cumulative amplification effects. A visual language is offered for extracting the spatial and temporal dynamics of double subjectivity, which may give insight into social influence on critical issues such as environmental, economic, or political discourse. This research offers the benefits of (1) scalability in processing hidden meanings in big data and (2) visibility of the entire network's dynamics over time and space, giving users insight into the current status and future trends of mass communication. Dissertation/Thesis: Doctoral Dissertation, Computer Science, 201

    Microarray-Based Sketches of the HERV Transcriptome Landscape

    Human endogenous retroviruses (HERVs) are spread throughout the genome, and their long terminal repeats (LTRs) constitute a wide collection of putative regulatory sequences. Phylogenetic similarities and the profusion of integration sites, two inherent characteristics of transposable elements, make it difficult to study the expression of individual loci in a large-scale approach, and historically, apart from some placental and testis-regulated elements, it was generally accepted that HERVs are silent due to epigenetic control. Herein, we introduce a generic method aiming to optimally characterize individual loci associated with 25-mer probes by minimizing cross-hybridization risks. We therefore set up a microarray dedicated to a collection of 5,573 HERVs that can reasonably be assigned to a unique genomic position. We obtained a first view of the HERV transcriptome using a composite panel of 40 normal and 39 tumor samples. The experiment showed that almost one third of the HERV repertoire is indeed transcribed. The HERV transcriptome follows tropism rules, is sensitive to the state of differentiation, and, unexpectedly, does not seem to correlate with the age of the HERV families. The probeset definition within the U3 and U5 regions was used to assign a function (i.e., promoter or polyA) to some LTRs and revealed that (i) autonomous active LTRs are broadly subject to operational determinism; (ii) cellular gene density is substantially higher in the vicinity of active LTRs than around silent LTRs; and (iii) the configuration of neighboring cellular genes differs between active and silent LTRs, with an approximately 8 kb zone upstream of promoter LTRs characterized by a drastic reduction in sense cellular genes. These observations are discussed in terms of virus/host adaptive strategies, and together with the methods and tools developed for this purpose, this work paves the way for further HERV transcriptome projects.
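
A much-simplified version of the probe-selection idea is to keep only those 25-mers of a locus that occur nowhere else in the collection, on either strand. The sketch below does exactly that with exact string matching. It is an assumed stand-in for illustration only: real cross-hybridization risk also depends on near-matches and hybridization thermodynamics, which this code ignores, and `unique_probes` is a hypothetical name.

```python
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]

def unique_probes(loci, k=25):
    """For each locus, list its k-mers that occur in no other locus on either strand.

    loci: dict mapping a locus id to its DNA sequence (A/C/G/T).
    Exact matching only: a crude proxy for cross-hybridization risk.
    """
    # Record which loci contain each k-mer, on both strands.
    owners = defaultdict(set)
    for locus, seq in loci.items():
        seq = seq.upper()
        for strand in (seq, reverse_complement(seq)):
            for i in range(len(strand) - k + 1):
                owners[strand[i:i + k]].add(locus)
    # Keep forward-strand k-mers owned by this locus alone.
    probes = {}
    for locus, seq in loci.items():
        seq = seq.upper()
        kmers = (seq[i:i + k] for i in range(len(seq) - k + 1))
        probes[locus] = [kmer for kmer in kmers if owners[kmer] == {locus}]
    return probes
```

Checking the reverse complement matters because a probe hybridizes to the complementary strand, so a k-mer that appears reversed and complemented in another locus is just as risky as a direct repeat.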

    Advances in Meta-Heuristic Optimization Algorithms in Big Data Text Clustering

    This paper presents a comprehensive survey of meta-heuristic optimization algorithms in text clustering applications and highlights their main procedures. These Artificial Intelligence (AI) algorithms are recognized as promising swarm intelligence methods because of their success in solving machine learning problems, especially text clustering problems. This paper reviews all of the relevant literature on meta-heuristic-based text clustering applications, including many variants, such as basic, modified, hybridized, and multi-objective methods. In addition, the main procedures of text clustering are presented and critically discussed. The review also reports the advantages and disadvantages of these methods and recommends potential future research paths. The main keywords considered in this paper are text, clustering, meta-heuristic, optimization, and algorithm.

    Identification and monitoring polarization from social network perspective

    Get PDF
    Polarization is a new phenomenon that threatens the cohesion and social development of our society. The rise of social media is known to have contributed significantly to the emergence of this phenomenon, as can be seen from the multiplication of far-right and racist online communities and from ill-structured political discourse, for example by scrutinizing recent US or EU elections. Automatic identification of polarization from social media plays a key role in devising an appropriate defence strategy to tackle the issue and avoid escalation. This thesis implements several methods to identify polarization from Twitter data from the Trump-Clinton US election campaign, using metrics such as the Belief Polarization Index (BPI) and sentiment analysis. Furthermore, semantic role labelling and argument mining were applied to derive the argument structure of polarized discourse. In particular, we constructed thirteen topics of interest that were used as potential candidates for polarized discourse. For each topic, the cosine distance between the two candidates' frequency of the topic over time was used to indicate polarization (called the Belief Polarization Index). Statistical inference on sentiment scores was used to assign either a positive or a negative polarity, which was then further examined using the argument structure. All the proposed approaches attempt to measure the polarization between two individuals from different perspectives, which may offer hints or references for future research.
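
The Belief Polarization Index as described here (the cosine distance between two candidates' frequency of a topic over time) can be sketched as follows. The input layout, one list of per-period mention counts per topic and candidate, and the function names are assumptions for illustration.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("a topic with no mentions has no direction")
    return 1.0 - dot / (norm_u * norm_v)

def belief_polarization_index(series_a, series_b):
    """Per-topic distance between two candidates' topic-frequency time series.

    series_a, series_b: dict topic -> list of per-period counts (same periods).
    With non-negative counts, 0 means identical temporal profiles and
    1 means the two candidates never discuss the topic in the same period.
    """
    topics = sorted(series_a.keys() & series_b.keys())
    return {t: cosine_distance(series_a[t], series_b[t]) for t in topics}
```

Because cosine distance is scale-invariant, raw counts and normalized frequencies give the same index, so a candidate who simply tweets more is not counted as more polarized.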