Exploratory topic modeling with distributional semantics
As we continue to collect and store textual data in a multitude of domains,
we are regularly confronted with material whose largely unknown thematic
structure we want to uncover. With unsupervised, exploratory analysis, no prior
knowledge about the content is required and highly open-ended tasks can be
supported. In the past few years, probabilistic topic modeling has emerged as a
popular approach to this problem. Nevertheless, the representation of the
latent topics as aggregations of semi-coherent terms limits their
interpretability and level of detail.
This paper presents an alternative approach to topic modeling that maps
topics as a network for exploration, based on distributional semantics using
learned word vectors. From the granular level of terms and their semantic
similarity relations, global topic structures emerge as clustered regions and
gradients of concepts. Moreover, the paper discusses the visual interactive
representation of the topic map, which plays an important role in supporting
its exploration.
Comment: Conference: The Fourteenth International Symposium on Intelligent
Data Analysis (IDA 2015)
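The idea of topics emerging as clustered regions of a term similarity network can be illustrated with a minimal sketch. This is not the paper's implementation: the two-dimensional vectors, the 0.9 similarity threshold, and the use of connected components as "regions" are all assumptions chosen to keep the example small.

```python
import math
from itertools import combinations

# Toy 2-D "word vectors" (invented for illustration); learned embeddings
# would normally have hundreds of dimensions.
VECTORS = {
    "goal": (0.9, 0.1),
    "match": (0.8, 0.2),
    "striker": (0.95, 0.05),
    "election": (0.1, 0.9),
    "senate": (0.2, 0.8),
    "ballot": (0.05, 0.95),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def build_network(vectors, threshold):
    """Link every pair of terms whose cosine similarity reaches the threshold."""
    edges = {term: set() for term in vectors}
    for a, b in combinations(vectors, 2):
        if cosine(vectors[a], vectors[b]) >= threshold:
            edges[a].add(b)
            edges[b].add(a)
    return edges


def regions(edges):
    """Return the connected components of the network as sorted term lists."""
    seen, result = set(), []
    for start in edges:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            term = stack.pop()
            if term in component:
                continue
            component.add(term)
            stack.extend(edges[term] - component)
        seen |= component
        result.append(sorted(component))
    return sorted(result)


network = build_network(VECTORS, threshold=0.9)  # threshold is an assumption
topic_regions = regions(network)
print(topic_regions)
```

With these toy vectors the sports terms and the politics terms fall into two separate regions. Gradients between concepts, which the abstract also mentions, would require keeping the similarity weights rather than thresholding them.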
Probabilistic Bag-Of-Hyperlinks Model for Entity Linking
Many fundamental problems in natural language processing rely on determining
what entities appear in a given text. Commonly referenced as entity linking,
this step is a fundamental component of many NLP tasks such as text
understanding, automatic summarization, semantic search or machine translation.
Name ambiguity, word polysemy, context dependencies and a heavy-tailed
distribution of entities contribute to the complexity of this problem.
We here propose a probabilistic approach that makes use of an effective
graphical model to perform collective entity disambiguation. Input mentions
(i.e., linkable token spans) are disambiguated jointly across an entire
document by combining a document-level prior of entity co-occurrences with
local information captured from mentions and their surrounding context. The
model is based on simple sufficient statistics extracted from data, thus
relying on few parameters to be learned.
Our method does not require extensive feature engineering, nor an expensive
training procedure. We use loopy belief propagation to perform approximate
inference. The low complexity of our model makes this step sufficiently fast
for real-time usage. We demonstrate the accuracy of our approach on a wide
range of benchmark datasets, showing that it matches, and in many cases
outperforms, existing state-of-the-art methods.
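The combination of a document-level co-occurrence prior with local mention evidence can be sketched as follows. This sketch does not use loopy belief propagation as the abstract describes; it enumerates every joint assignment exhaustively, which is only feasible for a handful of mentions. All mentions, candidate entities, and probabilities below are invented for illustration.

```python
import math
from itertools import product

# Hypothetical local scores p(entity | mention, context).
LOCAL = {
    "Jordan": {"Jordan_(country)": 0.6, "Michael_Jordan": 0.4},
    "Bulls": {"Chicago_Bulls": 0.7, "Bull_(animal)": 0.3},
}

# Hypothetical pairwise co-occurrence scores between entities.
COOCCUR = {
    frozenset(("Michael_Jordan", "Chicago_Bulls")): 0.9,
}
DEFAULT_COOCCUR = 0.05  # assumed score for pairs never seen together


def joint_score(assignment):
    """Log-score of one joint assignment: local evidence plus pairwise prior."""
    mentions = list(assignment)
    score = sum(math.log(LOCAL[m][assignment[m]]) for m in mentions)
    for i in range(len(mentions)):
        for j in range(i + 1, len(mentions)):
            pair = frozenset((assignment[mentions[i]], assignment[mentions[j]]))
            score += math.log(COOCCUR.get(pair, DEFAULT_COOCCUR))
    return score


def disambiguate(local):
    """Pick the joint assignment with the highest score by brute force."""
    mentions = list(local)
    best = max(
        product(*(local[m] for m in mentions)),
        key=lambda entities: joint_score(dict(zip(mentions, entities))),
    )
    return dict(zip(mentions, best))


# Baseline that ignores the document-level prior.
independent = {m: max(c, key=c.get) for m, c in LOCAL.items()}
joint = disambiguate(LOCAL)
print(independent)
print(joint)
```

In this toy case the local scores alone link "Jordan" to the country, while the joint score prefers the basketball player because of the co-occurrence with the team. Approximate inference such as loopy belief propagation replaces the exhaustive search when documents contain many mentions.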
The intelligent browser for TEXPROS
Browsing is a technique that helps users formulate their queries and retrieve information in an information retrieval system. It lets users understand their information needs and gain knowledge of the system during the course of browsing, and thus eases the users' burden when issuing queries. The basic components of the browser are an underlying structure that allows users to navigate and a browsing process controller that provides users with the needed assistance during each browsing session.
In this dissertation, a new infrastructure (OP-Net), transformed from the existing object network, is proposed. Each object in the object network is transformed into a predicate-augmented information repository. The predicate associated with each information repository governs the content of relevant documents in the repository during the browsing process and is updated continuously according to queries given by the user. The OP-Net with the relevant information repositories provides a dynamic and efficient environment for browsing.
A new ranking model is also proposed based on the signature of the documents and the user's query. The signature of a document is a document representative which utilizes the information provided by the dual model in TEXPROS (TEXt PROcessing System). With the signatures, the similarity of the document and the query can be computed, and the ranks of the documents can be derived.
This dissertation describes a three-layer architecture for the browser. At the top layer, the browsing process controller conducts and monitors the browsing process, and utilizes the services provided by the service providers. At the bottom of this architecture is the storage management system, which stores the documents and their associated frame instances and responds to the requests from the service providers in the second layer. This architecture supports the principle of information hiding by allowing the design of each component to change without changing the others. The dissertation concludes by proposing potential improvements and future research.
Basic tasks of sentiment analysis
Subjectivity detection is the task of identifying objective and subjective
sentences. Objective sentences are those which do not exhibit any sentiment.
It is therefore desirable for a sentiment analysis engine to find and separate the
objective sentences for further analysis, e.g., polarity detection. In
subjective sentences, opinions can often be expressed on one or multiple
topics. Aspect extraction is a subtask of sentiment analysis that consists in
identifying opinion targets in opinionated text, i.e., in detecting the
specific aspects of a product or service the opinion holder is either praising
or complaining about.
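A minimal sketch of subjectivity detection as a filtering step is given below. It is an illustration only: the tiny word list is invented, and a lexicon lookup is a stand-in for whatever classifier a real sentiment analysis engine would use.

```python
import re

# Hypothetical opinion lexicon; a real system would use a much larger
# resource or a trained classifier.
SUBJECTIVE_WORDS = {"great", "terrible", "love", "hate", "disappointing", "excellent"}


def is_subjective(sentence):
    """Treat a sentence as subjective if it contains any lexicon word."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return any(token in SUBJECTIVE_WORDS for token in tokens)


def split_by_subjectivity(sentences):
    """Separate sentences into (objective, subjective) lists, preserving order."""
    objective = [s for s in sentences if not is_subjective(s)]
    subjective = [s for s in sentences if is_subjective(s)]
    return objective, subjective


review = [
    "The phone has a 6.1 inch screen.",
    "The battery life is excellent.",
    "It ships with a charger.",
    "I hate the camera app.",
]
objective, subjective = split_by_subjectivity(review)
print(objective)
print(subjective)
```

Aspect extraction would then operate on the subjective sentences, for example to identify "battery life" and "camera app" as the opinion targets.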
The INCF Digital Atlasing Program: Report on Digital Atlasing Standards in the Rodent Brain
The goal of the INCF Digital Atlasing Program is to provide the vision and direction necessary to make the rapidly growing collection of multidimensional data of the rodent brain (images, gene expression, etc.) widely accessible and usable to the international research community. This Digital Brain Atlasing Standards Task Force was formed in May 2008 to investigate the state of rodent brain digital atlasing, and formulate standards, guidelines, and policy recommendations.

Our first objective has been the preparation of a detailed document that includes the vision and specific description of an infrastructure, systems and methods capable of serving the scientific goals of the community, as well as practical issues for achieving the goals. This report builds on the 1st INCF Workshop on Mouse and Rat Brain Digital Atlasing Systems (Boline et al., 2007, _Nature Precedings_, doi:10.1038/npre.2007.1046.1) and includes a more detailed analysis of both the current state and desired state of digital atlasing along with specific recommendations for achieving these goals.