Search CORE

1,119 research outputs found

Filaments of Meaning in Word Space

Author: Holst Anders
Karlgren Jussi
Sahlgren Magnus
Publication date: 01/01/2008

Word space models, in the sense of vector space models built on distributional data taken from texts, are used to model semantic relations between words. We argue that the high dimensionality of typical vector space models leads to unintuitive effects on modeling likeness of meaning and that the local structure of word spaces is where interesting semantic relations reside. We show that the local structure of word spaces has substantially different dimensionality and character than the global space and that this structure shows potential to be exploited for further semantic analysis using methods for local analysis of vector space structure rather than the globally scoped methods typically in use today, such as singular value decomposition or principal component analysis.
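One of the unintuitive effects of high dimensionality that the abstract alludes to can be illustrated with a small sketch. It is not taken from the paper: it simply draws random Gaussian vectors (an assumption standing in for real word vectors) and measures how the spread of pairwise cosine similarities shrinks as the dimension grows. The function names `cosine` and `cosine_spread` are hypothetical.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_spread(dim, n_vectors=40, seed=0):
    """Standard deviation of pairwise cosine similarities among
    random Gaussian vectors of the given dimension.

    Assumption: random vectors are only a stand-in for a real word space.
    """
    rng = random.Random(seed)
    vectors = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
               for _ in range(n_vectors)]
    sims = [cosine(vectors[i], vectors[j])
            for i in range(n_vectors)
            for j in range(i + 1, n_vectors)]
    mean = sum(sims) / len(sims)
    variance = sum((s - mean) ** 2 for s in sims) / len(sims)
    return math.sqrt(variance)


if __name__ == "__main__":
    # The spread narrows as the dimension grows: in a high-dimensional
    # space almost every pair of random vectors looks nearly orthogonal.
    for dim in (2, 20, 200, 2000):
        print(dim, round(cosine_spread(dim), 3))
```

When global similarities concentrate like this, most of the usable contrast sits among a point's nearest neighbours, which is in line with the abstract's emphasis on local structure.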

CiteSeerX

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

Software institutes' Online Digital Archive

The Most Influential Paper Gerard Salton Never Wrote

Author: Dubin David
Publication venue: Graduate School of Library and Information Science. University of Illinois at Urbana-Champaign.
Publication date: 01/01/2004

Gerard Salton is often credited with developing the vector space model (VSM) for information retrieval (IR). Citations to Salton give the impression that the VSM must have been articulated as an IR model sometime between 1970 and 1975. However, the VSM as it is understood today evolved over a longer time period than is usually acknowledged, and an articulation of the model and its assumptions did not appear in print until several years after those assumptions had been criticized and alternative models proposed. An often cited overview paper titled “A Vector Space Model for Information Retrieval” (alleged to have been published in 1975) does not exist, and citations to it represent a confusion of two 1975 articles, neither of which was an overview of the VSM as a model of information retrieval. Until the late 1970s, Salton did not present vector spaces as models of IR generally but rather as models of specific computations. Citations to the phantom paper reflect an apparently widely held misconception that the operational features and explanatory devices now associated with the VSM must have been introduced at the same time it was first proposed as an IR model.

Illinois Digital Environment for Access to Learning and Scholarship Repository

Automated Annotation-Based Bio-Ontology Alignment with Structural Validation

Author: Amanda White
Antonio Sanfilippo
Bob Baddeley
Carol Bult
Cliff Joslyn
Judith Blake
Karin Rodland
Mary Dolan
Rick Riensche
Publication date: 01/01/2009

We outline the structure of an automated process that first aligns multiple bio-ontologies in terms of their genomic co-annotations and then measures the structural quality of that alignment. We illustrate the method with a genomic analysis of 70 genes implicated in lung disease against the Gene Ontology.
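The idea of aligning ontologies through shared annotations can be sketched as follows. This is only an illustration under stated assumptions, not the authors' procedure: it scores term pairs by the Jaccard overlap of their annotated gene sets (a swapped-in similarity measure), uses a hypothetical `threshold` parameter, and does not attempt the structural validation step the abstract mentions. The term identifiers and gene names are made up.

```python
def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def align_by_coannotation(onto_a, onto_b, threshold=0.5):
    """Link terms from two ontologies whose annotated gene sets overlap.

    onto_a, onto_b: dicts mapping a term ID to the set of genes annotated
    to that term. Assumption: Jaccard overlap and the threshold are
    illustrative choices, not taken from the paper.
    Returns (term_a, term_b, score) tuples, best score first.
    """
    links = []
    for term_a, genes_a in onto_a.items():
        for term_b, genes_b in onto_b.items():
            score = jaccard(genes_a, genes_b)
            if score >= threshold:
                links.append((term_a, term_b, score))
    return sorted(links, key=lambda link: -link[2])


if __name__ == "__main__":
    # Hypothetical term IDs and gene names, for illustration only.
    onto_a = {"A:1": {"g1", "g2", "g3"}, "A:2": {"g4"}}
    onto_b = {"B:1": {"g1", "g2"}, "B:2": {"g5"}}
    for link in align_by_coannotation(onto_a, onto_b):
        print(link)
```

A real alignment would also need to check that the proposed links respect each ontology's hierarchy, which is the part the abstract calls structural validation.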

Crossref

The Jackson Laboratory: The Mouseion at the JAXlibrary

Nature Precedings

Monitoring conceptual development with text mining technologies: CONSPECT

Author: Buelow Katja
Haley Debra
Wild Fridolin
Publication date: 01/10/2010

This paper evaluates CONSPECT, a service that analyses states in a learner’s conceptual development. It combines two technologies – Latent Semantic Analysis (LSA) to analyse text and Network Analysis (NA) to provide visualisations – into a technique called Meaningful Interaction Analysis (MIA). CONSPECT was designed to help both online learners and their tutors monitor the learners’ conceptual development. This paper reports on the validation experiments undertaken to determine how well LSA matches first-year medical students in clustering concepts and in annotating text. The validation used several techniques, including card sorting and Likert scales. CONSPECT produces almost ‘peer’ quality results; what remains to be tested is whether it improves with more advanced learners. One of the experiments showed an average correlation of 0.7 between humans and CONSPECT.

Open Research Online (The Open University)

Shallow reading with Deep Learning: Predicting popularity of online content using only its title

Author: Marasek Krzysztof
Rokita Przemyslaw
Stokowiec Wojciech
Trzcinski Tomasz
Wolk Krzysztof
Publication venue: Springer Science and Business Media LLC
Publication date: 21/07/2017

With the ever-decreasing attention span of contemporary Internet users, the title of online content (such as a news article or video) can be a major factor in determining its popularity. To take advantage of this phenomenon, we propose a new method based on a bidirectional Long Short-Term Memory (LSTM) neural network designed to predict the popularity of online content using only its title. We evaluate the proposed architecture on two distinct datasets of news articles and news videos distributed in social media that contain over 40,000 samples in total. On those datasets, our approach improves the performance over traditional shallow approaches by a margin of 15%. Additionally, we show that using pre-trained word vectors in the embedding layer improves the results of LSTM models, especially when the training set is small. To our knowledge, this is the first attempt at predicting popularity using only textual information from the title.

arXiv.org e-Print Archive

Crossref