7,353 research outputs found

Generating indicative-informative summaries with SumUM

Author: Benbrahim Mohamed
Guy Lapalme
Horacio Saggion
Jing Hongyan
Johnson Frances C
Jordan Michael P
Radev Dragomir R
Teufel S.
Tombros Anastasios
Publication venue: 'MIT Press - Journals'
Publication date: 01/01/2002

We present and evaluate SumUM, a text summarization system that takes a raw technical text as input and produces an indicative-informative summary. The indicative part of the summary identifies the topics of the document, and the informative part elaborates on some of these topics according to the reader's interest. SumUM motivates the topics, describes entities, and defines concepts. It is a first step toward exploring the issue of dynamic summarization. This is accomplished through a process of shallow syntactic and semantic analysis, concept identification, and text regeneration. Our method was developed through the study of a corpus of abstracts written by professional abstractors. Relying on human judgment, we have evaluated the indicativeness, informativeness, and text acceptability of the automatic summaries. The results thus far indicate good performance when compared with other summarization technologies.

CiteSeerX

Crossref

White Rose Research Online

Head to head: Semantic similarity of multi-word terms

Author: Buerki Andreas
Corcoran Padraig
Gagarin Andrei
Spasic Irena
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018

Terms are linguistic signifiers of domain-specific concepts. Semantic similarity between terms refers to the corresponding distance in the conceptual space. In this study, we use lexico-syntactic information to define a vector space representation in which cosine similarity closely approximates semantic similarity between the corresponding terms. Given a multi-word term, each word is weighted in terms of its defining properties. In this context, the head noun is given the highest weight. Other words are weighted depending on their relations to the head noun. We formalized the problem as that of determining a topological ordering of a directed acyclic graph, which is based on constituency and dependency relations within a noun phrase. To counteract the errors associated with automatically inferred constituency and dependency relations, we implemented a heuristic approach to approximating the topological ordering. Different weights are assigned to different words based on their positions. Clustering experiments performed on such a vector space representation showed considerable improvement over the conventional bag-of-words representation. Specifically, it more consistently reflected semantic similarity between the terms. This was established by analyzing the differences between automatically generated dendrograms and manually constructed taxonomies. In conclusion, our method can be used to semi-automate taxonomy construction.
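
The idea of weighting the head noun most heavily before taking cosine similarity can be illustrated with a minimal sketch. This is not the authors' method: the abstract derives word positions from a topological ordering of a graph built on constituency and dependency relations, whereas the sketch below simply assumes the head position is known and uses a hypothetical geometric `decay` by linear distance from the head. The function names `term_vector` and `cosine` and the example terms are also assumptions made for illustration.

```python
import math
from collections import defaultdict


def term_vector(words, head_index, decay=0.5):
    """Build a sparse vector for a multi-word term.

    The head noun gets weight 1.0; every other word gets decay ** distance,
    where distance is its linear distance from the head (an assumed,
    simplified stand-in for the ordering described in the abstract).
    With decay=1.0 this reduces to a plain bag-of-words vector.
    """
    vec = defaultdict(float)
    for i, word in enumerate(words):
        vec[word.lower()] += decay ** abs(head_index - i)
    return dict(vec)


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(key, 0.0) for key, weight in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical example terms; the head noun is the last word of each.
knee_pain = term_vector(["knee", "joint", "pain"], head_index=2)
back_pain = term_vector(["chronic", "back", "pain"], head_index=2)
pain_clinic = term_vector(["pain", "clinic"], head_index=1)

print(cosine(knee_pain, back_pain))    # two terms sharing the head "pain"
print(cosine(knee_pain, pain_clinic))  # "pain" is only a modifier in the second term
```

Under these assumptions, two terms that share a head noun score higher than two terms that merely share a modifier, whereas an unweighted bag-of-words vector (`decay=1.0`) ranks the same pairs the other way round for this example.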

Online Research @ Cardiff

Semi-automatic discovery of multilingual elements in English historical corpora : Methods and challenges

Author: Nurmi Arja
Tuominen Jukka
Tyrkkö Jukka
Publication venue: 'Brill'
Publication date: 10/01/2020

Trepo - Institutional Repository of Tampere University

The polylogue project: part 1: shortmind

Author: Bazarnik Katarzyna
Bindervoet Erik
Conley Tim
Henkes Robbert-Jan
Mihálycsa Erika
Păcurar Elena
Sanz Gallego Guillermo
Senn Fritz
Terrinoni Enrico
Wawrzycka Jolanta
Publication date: 01/01/2012

The aim of this collaborative project [edited by F. Senn, E. Mihálycsa and J. Wawrzycka], the work of ten authors and covering more than ten languages, is to chart the possibilities of translation to recreate in the target-language (TL) texts the anomalous, elliptic, pre-grammatical, inchoative forms that became almost a signature mark of the Joycean interior monologue, and which are here called 'shortmind'. It therefore addresses such issues as indeterminacy, (anomalous) word order, punctuation, ellipsis, polysemy, ungrammaticality, linguistic sub-standards, etc., and examines the (un)willingness of translated texts to breach ingrained rules and norms of (syntactic, narrative) control, correctness and coherence in the TL culture.

Ghent University Academic Bibliography

Solutions to Detect and Analyze Online Radicalization : A Survey

Author: Correa Denzil
Sureka Ashish
Publication date: 21/01/2013

Online Radicalization (also called Cyber-Terrorism, Extremism, Cyber-Racism or Cyber-Hate) is widespread and has become a major and growing concern to society, governments and law enforcement agencies around the world. Research shows that various platforms on the Internet, which offer a low barrier to publishing content, allow anonymity, provide exposure to millions of users and enable very quick and widespread diffusion of a message, are being misused for malicious intent. These include YouTube (a popular video sharing website), Twitter (an online micro-blogging service), Facebook (a popular social networking website), online discussion forums and the blogosphere. Such platforms are being used to form hate groups and racist communities, spread extremist agendas, incite anger or violence, promote radicalization, recruit members and create virtual organizations and communities. Automatic detection of online radicalization is a technically challenging problem because of the vast amount of data, unstructured and noisy user-generated content, dynamically changing content and adversarial behavior. Several solutions have been proposed in the literature to combat and counter cyber-hate and cyber-extremism. In this survey, we review solutions to detect and analyze online radicalization. We review 40 papers published at 12 venues from June 2003 to November 2011. We present a novel classification scheme to classify these papers. We analyze these techniques, perform trend analysis, discuss limitations of existing techniques and identify research gaps.

arXiv.org e-Print Archive

CiteSeerX

The Emerging Scholarly Brain

Author: A Josang
A Lancichinetti
A Szalay
A-L Barabasi
AJ Connolly
B Höldobler
C Alexander
CG Jung
CL Borgman
DO Hebb
EA Henneken
F Murtagh
J Bollen
J West
JA Baldwin
JD West
K Frisch von
KS Fu
L Leydesdorff
LL Thurstone
M Girvan
M Golay
M Rosvall
MEJ Newman
MJ Kurtz
O Stapledon
P Bonacich
P Ginsparg
PG Ossorio
PM Davis
PM Fitts
RJ Hanisch
S Brin
S Deerwester
S Fortunato
S Pinker
Y Zhao
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 04/08/2010

It is now a commonplace observation that human society is becoming a coherent super-organism, and that the information infrastructure forms its emerging brain. Perhaps, as the underlying technologies are likely to become billions of times more powerful than those we have today, we could say that we are now building the lizard brain for the future organism. Comment: to appear in Future Professional Communication in Astronomy-II (FPCA-II), editors A. Heck and A. Accomazz

arXiv.org e-Print Archive

Crossref

The TXM Portal Software giving access to Old French Manuscripts Online

Author: Heiden Serge
Lavrentiev Alexei
Publication venue: HAL CCSD
Publication date: 21/05/2012

Full text online: http://www.lrec-conf.org/proceedings/lrec2012/workshops/13.ProceedingsCultHeritage.pdf

This paper presents the new TXM software platform, which gives online access to images and tagged transcriptions of Old French manuscripts for concordancing and text mining. The platform can import medieval sources encoded in XML according to the TEI Guidelines for linking manuscript images to transcriptions and for encoding several diplomatic levels of transcription, including abbreviations and word-level corrections. It includes a sophisticated tokenizer able to deal with TEI tags at different levels of the linguistic hierarchy. Words are tagged on the fly during the import process using the IMS TreeTagger tool with a specific language model. Synoptic editions displaying manuscript images and text transcriptions side by side are produced automatically during the import process. Texts are organized in a corpus with their own metadata (title, author, date, genre, etc.), and several word-property indexes are produced for the CQP search engine to allow efficient word-pattern searches for building different types of frequency lists or concordances. For syntactically annotated texts, special indexes are produced for the Tiger Search engine to allow efficient building of syntactic concordances. The platform has also been tested on classical Latin, ancient Greek, Old Slavonic and Old Hieroglyphic Egyptian corpora (including various types of encoding and annotations).
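
The two outputs the abstract mentions, frequency lists and concordances, can be illustrated with a minimal sketch in plain Python. This does not use TXM, CQP or TreeTagger and says nothing about how those tools are implemented; the functions `tokenize`, `frequency_list` and `concordance`, the regex-based tokenizer and the sample sentence are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a naive tokenizer,
    far simpler than one that has to handle TEI markup)."""
    return re.findall(r"\w+", text.lower())


def frequency_list(tokens):
    """Return (word, count) pairs sorted from most to least frequent."""
    return Counter(tokens).most_common()


def concordance(tokens, keyword, width=3):
    """Return keyword-in-context rows as (left context, keyword, right context),
    with up to `width` tokens of context on each side."""
    rows = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, token, right))
    return rows


# Hypothetical sample line, used only to exercise the functions.
sample = "Li rois apela son fil et li rois dist"
tokens = tokenize(sample)

for word, count in frequency_list(tokens):
    print(word, count)

for left, key, right in concordance(tokens, "rois", width=2):
    print(f"{left:>10} [{key}] {right}")
```

A corpus engine answers the same kind of query from precomputed indexes rather than by scanning the token list, which is what makes searches over large corpora efficient.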

HAL-ENS-LYON