Weak signal identification with semantic web mining

Author: Thorleuchter Dirk
Van den Poel Dirk
Publication venue: 'Elsevier BV'
Publication date: 01/01/2013

We investigate the automated identification of weak signals, in Ansoff's sense, to improve strategic planning and technological forecasting. The literature shows that weak signals can be found in an organization's environment and that they appear in different contexts. We use internet information to represent the organization's environment and select those websites that are related to a given hypothesis. In contrast to related research, we provide a methodology that uses latent semantic indexing (LSI) to identify weak signals. This improves on existing knowledge-based approaches because LSI takes aspects of meaning into account and can therefore identify similar textual patterns in different contexts. A new weak signal maximization approach is introduced that replaces the prediction modeling approach commonly used in LSI. It makes it possible to calculate the largest number of relevant weak signals represented by singular value decomposition (SVD) dimensions. A case study identifies and analyses weak signals to predict trends in the field of on-site medical oxygen production, which supports the planning of research and development (R&D) for a medical oxygen supplier. The results show that the proposed methodology enables organizations to identify weak signals from the internet for a given hypothesis, which helps strategic planners to react ahead of time.
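The LSI step mentioned in this abstract can be illustrated with a minimal sketch. It assumes NumPy is available, uses a hypothetical four-document toy corpus (not data from the paper), and fixes the number of SVD dimensions at k = 2 by hand; the paper's weak signal maximization approach for choosing dimensions is not reproduced here.

```python
import numpy as np

# Hypothetical toy corpus for illustration only.
docs = [
    "oxygen concentrator hospital supply",
    "hospital oxygen production on site",
    "solar panel energy storage",
    "battery energy storage grid",
]

def term_document_matrix(texts):
    """Build a raw term-count matrix with one row per term, one column per document."""
    vocab = sorted({word for text in texts for word in text.split()})
    index = {word: i for i, word in enumerate(vocab)}
    matrix = np.zeros((len(vocab), len(texts)))
    for j, text in enumerate(texts):
        for word in text.split():
            matrix[index[word], j] += 1.0
    return vocab, matrix

def lsi_document_vectors(matrix, k):
    """Project documents into a k-dimensional latent space via truncated SVD."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # Rows of V_k scaled by the top-k singular values are the document coordinates.
    return vt[:k].T * s[:k]

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

vocab, A = term_document_matrix(docs)
vectors = lsi_document_vectors(A, k=2)
sim_related = cosine(vectors[0], vectors[1])    # two oxygen-related documents
sim_unrelated = cosine(vectors[0], vectors[2])  # oxygen vs. energy document
```

In this toy setting the two oxygen-related documents end up close together in the latent space, while the oxygen and energy documents do not.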

Ghent University Academic Bibliography

Fraunhofer-ePrints

XML documents clustering using a tensor space model

Author: Kutty Sangeetha
Li Yuefeng
Nayak Richi
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

The traditional Vector Space Model (VSM) cannot represent both the structure and the content of XML documents. This paper introduces a novel method of representing XML documents in a Tensor Space Model (TSM) and then using that representation for clustering. Empirical analysis shows that the proposed method scales to large datasets. In addition, the factorized matrices it produces help to improve cluster quality through an enriched document representation of both structure and content information.

CiteSeerX

Queensland University of Technology ePrints Archive

Giving order to image queries

Author: Hare Jonathan
Lewis Paul
Martinez Kirk
Sinclair Patrick
Publication date: 30/01/2008

Users of image retrieval systems often find it frustrating that the image they are looking for is not ranked near the top of the results they are presented with. This paper presents a computational approach for ranking keyworded images in order of relevance to a given keyword. Our approach uses machine learning to attempt to learn which visual features within an image are most related to the keywords, and then provides a ranking based on similarity to a visual aggregate. To evaluate the technique, a Web 2.0 application has been developed to obtain a corpus of user-generated ranking information for a given image collection, which can be used to evaluate the performance of the ranking algorithm.
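The final ranking step, ordering images by similarity to a visual aggregate, might look roughly like the sketch below. It makes strong assumptions that the abstract does not confirm: each image is already reduced to a small numeric feature vector, the aggregate is the plain mean of those vectors, and similarity is cosine similarity. The file names and feature values are hypothetical, and the learning of which features relate to a keyword is not shown.

```python
import math

def centroid(vectors):
    """Component-wise mean of equally sized feature vectors (the assumed 'aggregate')."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank_by_aggregate(features):
    """Return image names sorted from most to least similar to the aggregate."""
    aggregate = centroid(list(features.values()))
    return sorted(features, key=lambda name: cosine(features[name], aggregate), reverse=True)

# Hypothetical feature vectors for images that all carry the keyword "beach".
features = {
    "beach_1.jpg": [0.9, 0.1, 0.0],
    "beach_2.jpg": [0.8, 0.2, 0.1],
    "beach_3.jpg": [0.7, 0.3, 0.0],
    "mislabelled.jpg": [0.0, 0.1, 0.9],
}
ranking = rank_by_aggregate(features)
```

With these made-up values, the image whose features differ most from the rest of the keyword group is ranked last.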

Southampton (e-Prints Soton)

Uncovering protein interaction in abstracts and text using a novel linear model and word proximity networks

Author: Abi-Haidar Alaa
Kaur Jasleen
Maguitman Ana G.
Radivojac Predrag
Retchsteiner Andreas
Rocha Luis M.
Verspoor Karin
Wang Zhiping
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2008

We participated in three of the protein-protein interaction subtasks of the Second BioCreative Challenge: classification of abstracts relevant for protein-protein interaction (IAS), discovery of protein pairs (IPS) and identification of text passages characterizing protein interaction (ISS) in full-text documents. We approached the abstract classification task with a novel, lightweight linear model inspired by spam-detection techniques, as well as an uncertainty-based integration scheme. We also used a Support Vector Machine and the Singular Value Decomposition on the same features for comparison purposes. Our approach to the full-text subtasks (protein pair and passage identification) includes a feature expansion method based on word-proximity networks. Our approach to the abstract classification task (IAS) was among the top submissions for this task in terms of the performance measures used in the challenge evaluation (accuracy, F-score and AUC). We also report on a web tool we produced using our approach: the Protein Interaction Abstract Relevance Evaluator (PIARE). Our approach to the full-text tasks resulted in one of the highest recall rates as well as one of the highest mean reciprocal ranks of correct passages. Our approach to abstract classification shows that a simple linear model, using relatively few features, is capable of generalizing and uncovering the conceptual nature of protein-protein interaction from the bibliome. Since the novel approach is based on a very lightweight linear model, it can be easily ported and applied to similar problems. In full-text problems, the expansion of word features with word-proximity networks is shown to be useful, though the need for some improvements is discussed.
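Feature expansion with a word-proximity network can be sketched as follows. The abstract does not say how the network is built, so this example assumes a simple construction: edge weights count how often two words occur within a fixed window of each other, and a seed feature is expanded with every neighbour whose weight reaches a threshold. The function names, the window size, the threshold, and the sentences are all hypothetical.

```python
from collections import defaultdict

def build_proximity_network(sentences, window=2):
    """Count how often two distinct words occur within `window` tokens of each other."""
    weights = defaultdict(lambda: defaultdict(int))
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            for other in tokens[i + 1 : i + 1 + window]:
                if other != word:
                    weights[word][other] += 1
                    weights[other][word] += 1
    return weights

def expand_features(seed_words, network, min_weight=2):
    """Add every neighbour linked to a seed word by at least `min_weight` co-occurrences."""
    expanded = set(seed_words)
    for word in seed_words:
        for neighbour, weight in network.get(word, {}).items():
            if weight >= min_weight:
                expanded.add(neighbour)
    return expanded

# Hypothetical sentences for illustration only.
sentences = [
    "raf1 binds mek1 directly",
    "raf1 binds ras in vitro",
    "mek1 phosphorylates erk2",
]
network = build_proximity_network(sentences, window=2)
expanded = expand_features({"binds"}, network, min_weight=2)
```

Here "raf1" co-occurs with "binds" in two sentences, so it is the only word added to the seed set at this threshold. Lowering `min_weight` or widening the window expands the feature set more aggressively.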

arXiv.org e-Print Archive

Crossref

CONICET Digital

Springer - Publisher Connector

PubMed Central

University of Melbourne Institutional Repository