
833 research outputs found

Words are Malleable: Computing Semantic Shifts in Political and Media Discourse

Author: Gallie W. B.
Jin P.
Kusner M. J.
Le Q. V.
Mikolov T.
Reese S. D.
Publication date: 01/01/2017

Researchers have recently begun to study the detection of temporal shifts in the meaning of words. However, most (if not all) of these approaches restrict their efforts to uncovering change over time, neglecting other valuable dimensions such as social or political variability. We propose an approach for detecting semantic shifts between different viewpoints, where a viewpoint is broadly defined as a set of texts that share a specific metadata feature, which can be a time period but also a social entity such as a political party. For each viewpoint, we learn a semantic space in which each word is represented as a low-dimensional neural embedded vector. The challenge is to compare the meaning of a word in one space to its meaning in another space and to measure the size of the semantic shift. We compare the effectiveness of a measure based on optimal transformations between the two spaces with a measure based on the similarity of the neighbors of the word in the respective spaces. Our experiments demonstrate that the combination of these two performs best. We show that semantic shifts occur not only over time but also along different viewpoints within a short period of time. For evaluation, we demonstrate how this approach captures meaningful semantic shifts and can help improve other tasks such as contrastive viewpoint summarization and ideology detection (measured as classification accuracy) in political texts. We also show that the two laws of semantic change that were empirically shown to hold for temporal shifts also hold for shifts across viewpoints. These laws state that frequent words are less likely to shift meaning, while words with many senses are more likely to do so. Comment: In Proceedings of the 26th ACM International Conference on Information and Knowledge Management (CIKM 2017).
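The neighbor-based measure mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy two-dimensional vectors, the use of cosine similarity, and the choice of Jaccard overlap between the k nearest neighbors are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def nearest_neighbors(space, word, k):
    """Return the set of the k words most similar to `word` in one space."""
    target = space[word]
    scored = [(cosine(target, vec), other)
              for other, vec in space.items() if other != word]
    scored.sort(reverse=True)
    return {other for _, other in scored[:k]}

def neighbor_shift(space_a, space_b, word, k=2):
    """Hypothetical shift score: 1 minus the Jaccard overlap of the
    word's neighbor sets in the two viewpoint spaces (0 = same neighbors)."""
    neighbors_a = nearest_neighbors(space_a, word, k)
    neighbors_b = nearest_neighbors(space_b, word, k)
    union = neighbors_a | neighbors_b
    if not union:
        return 0.0
    return 1.0 - len(neighbors_a & neighbors_b) / len(union)

# Toy embeddings for two viewpoints (made-up numbers, not real data).
viewpoint_a = {"reform": [1.0, 0.0], "progress": [0.9, 0.1],
               "change": [0.8, 0.2], "cuts": [0.0, 1.0],
               "austerity": [0.1, 0.9]}
viewpoint_b = dict(viewpoint_a, reform=[0.0, 1.0])

shift_reform = neighbor_shift(viewpoint_a, viewpoint_b, "reform")
shift_progress = neighbor_shift(viewpoint_a, viewpoint_b, "progress")
```

In this toy setup the word whose vector moves between the two spaces has no neighbors in common across them, while a word whose vector stays fixed keeps part of its neighborhood. Comparing neighbor sets avoids having to align the two spaces first, which is what the transformation-based measure would require.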

arXiv.org e-Print Archive

Crossref

International Migration, Integration and Social Cohesion online publications

UvA-DARE

A Review On Automatic Text Summarization Approaches

Author: Basiron Halizah
C Suppiah Puspalata
Jaya Kumar Yogan
Ngo Hea Choon
Ong Sing Goh
Publication venue: 'Science Publications'
Publication date: 01/01/2016

It has been more than 50 years since the initial investigation of automatic text summarization began. Various techniques have been successfully used to extract the important content from a text document to represent the document summary. In this study, we review some of the studies that have been conducted in this still-developing research area. It covers the basics of text summarization, the types of summarization, the methods that have been used, and some areas in which text summarization has been applied. Furthermore, this paper also reviews the significant efforts that have been put into studies concerning sentence extraction, domain-specific summarization, and multi-document summarization, and provides the theoretical explanation and the fundamental concepts related to them. In addition, the advantages and limitations of the approaches commonly used for text summarization are also highlighted in this study.

Universiti Teknikal Malaysia Melaka (UTeM) Repository

Text documents clustering using modified multi-verse optimizer

Author: Abasi Ammar Kamal
Al-Betar Mohammed Azmi
Alomari Osama Ahmad
Awadallah Mohammed A.
Khader Ahamad Tajudin
Naim Syibrah
Publication venue: Institute of Advanced Engineering and Science
Publication date: 01/12/2020

In this study, a multi-verse optimizer (MVO) is utilised for the text document clustering (TDC) problem. TDC is treated as a discrete optimization problem, and an objective function based on the Euclidean distance is applied as the similarity measure. TDC is tackled by dividing the documents into clusters; documents belonging to the same cluster are similar, whereas those belonging to different clusters are dissimilar. MVO, which is a recent metaheuristic optimization algorithm established for continuous optimization problems, can intelligently navigate different areas in the search space and search deeply in each area using a particular learning mechanism. The proposed algorithm is called MVOTDC, and it adapts the convergence behaviour of MVO operators to deal with discrete, rather than continuous, optimization problems. For evaluating MVOTDC, a comprehensive comparative study is conducted on six text document datasets with various numbers of documents and clusters. The quality of the final results is assessed using precision, recall, F-measure, entropy, accuracy, and purity measures. Experimental results reveal that the proposed method performs competitively in comparison with state-of-the-art algorithms. Statistical analysis is also conducted and shows that MVOTDC can produce significant results in comparison with three well-established methods.
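A Euclidean-distance objective of the kind the abstract above mentions can be sketched as follows. The exact objective used by MVOTDC is not given in the abstract, so the average document-to-centroid distance below, along with the function names and the toy vectors, is an assumption for illustration only.

```python
import math

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]

def clustering_cost(docs, labels):
    """Average Euclidean distance from each document vector to the centroid
    of its assigned cluster; lower values mean tighter clusters."""
    clusters = {}
    for doc, label in zip(docs, labels):
        clusters.setdefault(label, []).append(doc)
    total = 0.0
    for members in clusters.values():
        center = centroid(members)
        total += sum(math.dist(member, center) for member in members)
    return total / len(docs)

# Four toy document vectors and two candidate assignments (discrete solutions).
docs = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
cost_good = clustering_cost(docs, [0, 0, 1, 1])
cost_bad = clustering_cost(docs, [0, 1, 0, 1])
```

Each candidate solution is a vector of cluster labels, which is what makes the problem discrete: an optimizer searches over label assignments and prefers those with lower cost.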

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Institute of Advanced Engineering and Science

Eigendecompositions of Transfer Operators in Reproducing Kernel Hilbert Spaces

Author: Klus Stefan
Muandet Krikamol
Schuster Ingmar
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/12/2017

Transfer operators such as the Perron-Frobenius or Koopman operator play an important role in the global analysis of complex dynamical systems. The eigenfunctions of these operators can be used to detect metastable sets, to project the dynamics onto the dominant slow processes, or to separate superimposed signals. We extend transfer operator theory to reproducing kernel Hilbert spaces and show that these operators are related to Hilbert space representations of conditional distributions, known as conditional mean embeddings in the machine learning community. Moreover, numerical methods to compute empirical estimates of these embeddings are akin to data-driven methods for the approximation of transfer operators such as extended dynamic mode decomposition and its variants. One main benefit of the presented kernel-based approaches is that these methods can be applied to any domain where a similarity measure given by a kernel is available. We illustrate the results with the aid of guiding examples and highlight potential applications in molecular dynamics as well as video and text data analysis.

arXiv.org e-Print Archive

Heriot Watt Pure

Repository: Freie Universität Berlin (FU), Math Department (fu_mi_publications)

MPG.PuRe

Hybrid harmony search algorithm for continuous optimization problems

Author: Ala’a Atallah Hamad Alomoush
Publication date: 01/09/2020

The Harmony Search (HS) algorithm has been extensively adopted in the literature to address optimization problems in many different fields, such as industrial design, civil engineering, and electrical and mechanical engineering. To ensure its search performance, HS requires extensive tuning of its four control parameters, namely harmony memory size (HMS), harmony memory consideration rate (HMCR), pitch adjustment rate (PAR), and bandwidth (BW). However, the tuning process is often cumbersome and problem dependent, and no single setting fits all problems. Additionally, despite many useful works, HS and its variants still suffer from weak exploitation, which can lead to poor convergence. To address these issues, this thesis proposes to augment HS with adaptive tuning using the Grey Wolf Optimizer (GWO). To enhance its exploitation, this thesis also proposes to adopt a new variant of the opposition-based learning (OBL) technique. Taken together, the proposed hybrid algorithm, called IHS-GWO, aims to address continuous optimization problems. IHS-GWO is evaluated using two standard benchmarking sets and two real-world optimization problems. The first benchmarking set consists of 24 classical unimodal and multimodal benchmark functions, whilst the second contains 30 state-of-the-art benchmark functions from the Congress on Evolutionary Computation (CEC). The two real-world optimization problems are the three-bar truss and spring design problems. Statistical analysis using the Wilcoxon rank-sum and Friedman tests, comparing IHS-GWO's results with recent HS variants and other metaheuristics, demonstrates superior performance.
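The role of the four control parameters named in the abstract above is easiest to see in the basic Harmony Search loop. The sketch below is plain HS with fixed parameters, not the thesis's IHS-GWO hybrid: it includes neither the GWO-based adaptive tuning nor the opposition-based learning step. The function name, default parameter values, and the convention of scaling the bandwidth by the variable range are assumptions.

```python
import random

def harmony_search(objective, bounds, hms=10, hmcr=0.9, par=0.3, bw=0.05,
                   iterations=2000, seed=0):
    """Minimise `objective` over box `bounds` with basic Harmony Search.

    hms  : harmony memory size (number of stored candidate solutions)
    hmcr : probability of reusing a value from memory for each variable
    par  : probability of pitch-adjusting a value taken from memory
    bw   : bandwidth of the pitch adjustment, as a fraction of the range
    """
    rng = random.Random(seed)
    memory = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(hms)]
    scores = [objective(harmony) for harmony in memory]
    for _ in range(iterations):
        new = []
        for dim, (lo, hi) in enumerate(bounds):
            if rng.random() < hmcr:
                value = rng.choice(memory)[dim]      # memory consideration
                if rng.random() < par:               # pitch adjustment
                    value += rng.uniform(-bw, bw) * (hi - lo)
            else:
                value = rng.uniform(lo, hi)          # random selection
            new.append(min(hi, max(lo, value)))      # clamp to the bounds
        new_score = objective(new)
        worst = max(range(hms), key=scores.__getitem__)
        if new_score < scores[worst]:                # replace the worst harmony
            memory[worst], scores[worst] = new, new_score
    best = min(range(hms), key=scores.__getitem__)
    return memory[best], scores[best]

def sphere(x):
    """Classical unimodal benchmark function with its minimum at the origin."""
    return sum(v * v for v in x)

search_bounds = [(-5.0, 5.0), (-5.0, 5.0)]
best_solution, best_score = harmony_search(sphere, search_bounds, seed=1)
```

Because every parameter here is a fixed constant, the sketch also shows why tuning is problem dependent: a bandwidth that suits one search range or landscape can be too coarse or too fine for another.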

UMP Institutional Repository

Programmable Insight: A Computational Methodology to Explore Online News Use of Frames

Publication date: 01/01/2017

Abstract: The Internet is a major source of online news content. Online news is a form of large-scale narrative text with rich, complex contents that embed deep meanings (facts, strategic communication frames, and biases) for shaping and transitioning the standards, values, attitudes, and beliefs of the masses. Currently, this body of narrative text remains untapped, in large part because of human limitations. The human ability to comprehend rich text and extract hidden meanings is far superior to known computational algorithms but does not scale. In this research, computational treatment is given to online news framing to expose a deeper level of expressivity, termed “double subjectivity”, which is characterized by its cumulative amplification effects. A visual language is offered for extracting the spatial and temporal dynamics of double subjectivity, which may give insight into social influence on critical issues in areas such as environmental, economic, or political discourse. This research offers the benefits of 1) scalability for processing hidden meanings in big data and 2) visibility of the entire network dynamics over time and space, giving users insight into the current status and future trends of mass communication. Dissertation/Thesis. Doctoral Dissertation, Computer Science 201

ASU Digital Repository

Microarray-Based Sketches of the HERV Transcriptome Landscape

Author: A Boese
A Buzdin
A Flockerzi
A Forsman
A Katzourakis
A Rearden
A Schwartz
A Smallwood
AB Conley
AB Rabson
AE Peaston
AFA Smit
AS Ptolemy
B Bjerregaard
Bertrand Bonnaud
BR Cullen
C Hu
CA Dunn
CJ Cohen
Consortium Mouse Genome Sequencing
Cécile Montgiraud
D Bello
D Jjingo
D Reiss
DA Nickerson
DL Mager
E Birney
E Gogvadze
ES Lander
F Li
F Mallet
F Wang-Johanning
François Mallet
G Lee BShin
G Navarro
G Okahara
GJ Faulkner
J Brosius
J Costas
J Gimenez
J Paces
JC Venter
JD Hollister
JD Storey
JF Hughes
JJ Goedert
JL Blond
JM Jern PCoffin
JP Pichon
Juliette Gimenez
K Boller
K Buscher
K Kimura
K Trejbalova
LC Samuelson
LN van de Lagemaat
M Denne
M Long
M Matouskova
M Oja
M Sauter
Magali Jaillard
MM Webber
MT Romanish
N Bannert
N de Parseval
N Wentzensen
Nathalie Mugnier
P Medstrand
P Pérot
Philippe Pérot
Q Liang
R Belshaw
R Lower
R Strick
RA Irizarry
RC Gentleman
Richard Cordaux
S Blaise
S Ehlhardt
S Kaufmann
S Mi
S Muradrasoli
S Prudhomme
SJ Cooper
TW Lyden
U Martin
V Armbruester
VG Tusher
VM Ruda
W Seifarth
WE Johnson
WF Doolittle
Y Stauffer
Y Sun
YH Cheng
Publication venue: Public Library of Science
Publication date: 01/01/2012

Human endogenous retroviruses (HERVs) are spread throughout the genome, and their long terminal repeats (LTRs) constitute a wide collection of putative regulatory sequences. Phylogenetic similarities and the profusion of integration sites, two inherent characteristics of transposable elements, make it difficult to study individual locus expression in a large-scale approach. Historically, apart from some placental and testis-regulated elements, it was generally accepted that HERVs are silent due to epigenetic control. Herein, we introduce a generic method that aims to optimally characterize individual loci associated with 25-mer probes by minimizing cross-hybridization risks. We therefore set up a microarray dedicated to a collection of 5,573 HERVs that can reasonably be assigned to a unique genomic position. We obtained a first view of the HERV transcriptome by using a composite panel of 40 normal and 39 tumor samples. The experiment showed that almost one third of the HERV repertoire is indeed transcribed. The HERV transcriptome follows tropism rules, is sensitive to the state of differentiation and, unexpectedly, seems not to correlate with the age of the HERV families. The probeset definition within the U3 and U5 regions was used to assign a function to some LTRs (i.e. promoter or polyA) and revealed that (i) autonomous active LTRs are broadly subjected to operational determinism, (ii) the cellular gene density is substantially higher in the surrounding environment of active LTRs compared to silent LTRs, and (iii) the configuration of neighboring cellular genes differs between active and silent LTRs, showing an approximately 8 kb zone upstream of promoter LTRs characterized by a drastic reduction in sense cellular genes. These gathered observations are discussed in terms of virus/host adaptive strategies, and together with the methods and tools developed for this purpose, this work paves the way for further HERV transcriptome projects.

Public Library of Science (PLOS)

Crossref

INRIA a CCSD electronic archive server

Directory of Open Access Journals

Advances in Meta-Heuristic Optimization Algorithms in Big Data Text Clustering

Author: Abualigah L
Alshinwan M
Elaziz MA
Gandomi AH
Hamad HA
Khasawneh AM
Omari M
Publication venue: 'MDPI AG'
Publication date: 12/05/2021

This paper presents a comprehensive survey of meta-heuristic optimization algorithms in text clustering applications and highlights their main procedures. These Artificial Intelligence (AI) algorithms are recognized as promising swarm intelligence methods due to their ability to solve machine learning problems, especially text clustering problems. This paper reviews the relevant literature on meta-heuristic-based text clustering applications, including many variants, such as basic, modified, hybridized, and multi-objective methods. The main procedures of text clustering and critical discussions are also given. Finally, this review reports the advantages and disadvantages of these methods and recommends potential future research paths. The main keywords considered in this paper are text, clustering, meta-heuristic, optimization, and algorithm.

OPUS - University of Technology Sydney

Identification and monitoring polarization from social network perspective

Author: Wang X. (Xiaowen)
Publication venue: University of Oulu
Publication date: 17/09/2020

Abstract. Polarization is a new phenomenon that threatens the cohesion and social development of our society. The rise of social media is known to have contributed significantly to the emergence of this phenomenon, as can be seen from the multiplication of far-right and racist online communities as well as from ill-structured political discourse, for example in recent US or EU elections. Automatic identification of polarization from social media plays a key role in devising an appropriate defence strategy to tackle the issue and avoid escalation. This thesis implements several methods to identify polarization in Twitter data from the Trump-Clinton US election campaign, using metrics such as the Belief Polarization Index (BPI) and sentiment analysis. Furthermore, semantic role labelling and argument mining were applied to derive the argument structure of polarized discourse. In particular, we constructed thirteen topics of interest that were used as potential candidates for polarized discourse. For each topic, the cosine distance between the two candidates' frequencies of the topic over time was used to indicate the polarization (called the Belief Polarization Index). Statistical inference on sentiment scores was implemented to convey either a positive or negative polarity, which was then further examined using argument structure. All the proposed approaches attempt to measure the polarization between two individuals from different perspectives, which may give some hints or references for future research.
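The Belief Polarization Index described in the abstract above, a cosine distance between two candidates' topic-frequency series, can be sketched as follows. The function name, the weekly binning, and the counts are made up for illustration, and the thesis may normalise the frequencies differently.

```python
import math

def cosine_distance(series_a, series_b):
    """Return 1 minus the cosine similarity of two equal-length series."""
    if len(series_a) != len(series_b):
        raise ValueError("series must cover the same time bins")
    dot = sum(a * b for a, b in zip(series_a, series_b))
    norm_a = math.sqrt(sum(a * a for a in series_a))
    norm_b = math.sqrt(sum(b * b for b in series_b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("a series with no mentions has no direction")
    return 1.0 - dot / (norm_a * norm_b)

# Hypothetical weekly counts of tweets on one topic for two candidates.
candidate_one = [12, 30, 4, 0, 18]
candidate_two = [2, 5, 25, 40, 3]
bpi_topic = cosine_distance(candidate_one, candidate_two)
```

A value near 0 means the two candidates raise the topic at the same times and in similar proportion, while a value near 1 means their attention to the topic barely overlaps. Since counts are non-negative, the distance stays between 0 and 1.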

University of Oulu Repository - Jultika