Query Expansion with Locally-Trained Word Embeddings
Continuous space word embeddings have received a great deal of attention in
the natural language processing and machine learning communities for their
ability to model term similarity and other relationships. We study the use of
term relatedness in the context of query expansion for ad hoc information
retrieval. We demonstrate that word embeddings such as word2vec and GloVe, when
trained globally, underperform corpus- and query-specific embeddings on
retrieval tasks. These results suggest that other tasks benefiting from global
embeddings may also benefit from local embeddings.
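The mechanism behind local embeddings is simple to sketch: train term vectors only on a topically focused (local) corpus, then expand the query with its nearest neighbours. The following is a minimal illustrative toy, using plain co-occurrence vectors in place of word2vec or GloVe; the corpus, function names, and window size are assumptions, not the authors' implementation.

```python
from collections import defaultdict
import math

def cooccurrence_vectors(docs, window=2):
    """Build simple co-occurrence vectors from a local (topic-specific) corpus."""
    vecs = defaultdict(lambda: defaultdict(int))
    for doc in docs:
        toks = doc.lower().split()
        for i, w in enumerate(toks):
            for j in range(max(0, i - window), min(len(toks), i + window + 1)):
                if i != j:
                    vecs[w][toks[j]] += 1
    return vecs

def cosine(u, v):
    dot = sum(u.get(k, 0) * v.get(k, 0) for k in set(u) | set(v))
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def expand_query(query_term, docs, k=2):
    """Expand a query term with its k nearest neighbours in the local space."""
    vecs = cooccurrence_vectors(docs)
    q = vecs[query_term]
    scored = [(cosine(q, vecs[w]), w) for w in vecs if w != query_term]
    return [w for _, w in sorted(scored, reverse=True)[:k]]
```

A locally trained space of this kind adapts the neighbourhood of each query term to the retrieved sub-corpus, which is the effect the abstract attributes to local embeddings.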
Automatic text classification seeking word similarity and hidden meanings
We adopt the Latent Semantic Indexing (LSI) method to classify documents that are related by some means not restricted to the terms they contain, seeking other forms of similarity as well.
Reducing the dimensionality of the term-document matrix is not new; between 200 and 300 dimensions are usually adopted.
In this work, we turn LSI into a semi-supervised algorithm and determine the ideal number of dimensions during the training phase.
The algorithm uses a space isometric to the one defined by the term-document matrix to speed up the computations.
Track: Workshop on Databases and Data Mining (WBDDM). Red de Universidades con Carreras en Informática (RedUNCI)
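The LSI step at the core of this approach, a truncated SVD of the term-document matrix, can be sketched as follows. The matrix values and the choice k = 2 are toy assumptions (the paper determines the number of dimensions during training, which is not reproduced here).

```python
import numpy as np

# Toy term-document matrix (rows = terms, columns = documents).
A = np.array([
    [2, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 2, 0, 1],
    [0, 0, 1, 2],
], dtype=float)

# LSI: truncated SVD of the term-document matrix.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2  # illustrative dimensionality; the paper selects this during training
doc_vecs = (np.diag(s[:k]) @ Vt[:k, :]).T  # documents in the k-dim latent space

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Documents can now be compared (or classified) in the latent space.
sim = cos(doc_vecs[0], doc_vecs[1])
```

Classification then amounts to comparing a document's latent vector against labelled documents or class centroids in the same reduced space.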
A Computational Narrative Analysis of Children-Parent Attachment Relationships
Children's narratives implicitly represent their experiences and emotions. The relationships infants establish with their environment shape their relationships with others and their concept of themselves. In this context, the Attachment Story Completion Task (ASCT) contains a series of unfinished stories used to project the self in relation to attachment. Unfinished-story procedures present a dilemma that needs to be solved and a codification into the secure, secure/insecure, or insecure attachment categories. This paper analyses a story corpus to explain 3- to 6-year-old children-parent attachment relationships. It is a computational approach to exploring attachment representational models in two unfinished story lines: "The stolen bike" and "The present". The resulting corpora contain 184 stories in one corpus and 170 in the other. The Latent Semantic Analysis (LSA) and Linguistic Inquiry and Word Count (LIWC) computational frameworks are used to observe the emotions that children project. As a result, the computational analysis of the children's mental representational model, in both corpora, has proven comparable to expert judgements in attachment categorization.
Discovering user access pattern based on probabilistic latent factor model
There has been an increased demand for characterizing user access patterns with web mining techniques, since the informative knowledge extracted from web server log files can offer benefits not only for web site structure improvement but also for a better understanding of user navigational behavior. In this paper, we present a web usage mining method, which utilizes web user usage and page linkage information to capture user access patterns based on the Probabilistic Latent Semantic Analysis (PLSA) model. A specific probabilistic model analysis algorithm, the EM algorithm, is applied to the integrated usage data to infer the latent semantic factors as well as to generate user session clusters for revealing user access patterns. Experiments have been conducted on a real-world data set to validate the effectiveness of the proposed approach. The results have shown that the presented method is capable of characterizing the latent semantic factors and generating user profiles in terms of weighted page vectors, which may reflect the common access interest exhibited by users within the same session cluster. © 2005, Australian Computer Society, Inc.
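The PLSA EM iteration on a session-page count matrix can be sketched as below, assuming the standard factorisation with P(z|d) and P(w|z); the count matrix and the number of latent factors are invented toy values, not the paper's data.

```python
import numpy as np

# Toy session-page count matrix: rows are user sessions, columns are pages.
N = np.array([
    [3, 0, 1, 0, 2, 0],
    [2, 1, 0, 0, 3, 0],
    [0, 0, 2, 3, 0, 1],
    [0, 1, 3, 2, 0, 2],
], dtype=float)
n_d, n_w = N.shape
n_z = 2  # number of latent factors (an assumption)

rng = np.random.default_rng(0)
Pz_d = rng.random((n_d, n_z)); Pz_d /= Pz_d.sum(axis=1, keepdims=True)
Pw_z = rng.random((n_z, n_w)); Pw_z /= Pw_z.sum(axis=1, keepdims=True)

for _ in range(50):
    # E-step: P(z|d,w) proportional to P(z|d) * P(w|z)
    Pz_dw = Pz_d[:, :, None] * Pw_z[None, :, :]   # shape (d, z, w)
    Pz_dw /= Pz_dw.sum(axis=1, keepdims=True)
    # M-step: re-estimate from expected counts n(d,w) * P(z|d,w)
    Nz = N[:, None, :] * Pz_dw                    # shape (d, z, w)
    Pw_z = Nz.sum(axis=0)
    Pw_z /= Pw_z.sum(axis=1, keepdims=True)
    Pz_d = Nz.sum(axis=2)
    Pz_d /= Pz_d.sum(axis=1, keepdims=True)
```

After convergence, sessions can be soft-clustered by their dominant factor in P(z|d), and each factor's P(w|z) row gives a weighted page vector of the kind the abstract describes.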
Neural Vector Spaces for Unsupervised Information Retrieval
We propose the Neural Vector Space Model (NVSM), a method that learns
representations of documents in an unsupervised manner for news article
retrieval. In the NVSM paradigm, we learn low-dimensional representations of
words and documents from scratch using gradient descent and rank documents
according to their similarity with query representations that are composed from
word representations. We show that NVSM performs better at document ranking
than existing latent semantic vector space methods. The addition of NVSM to a
mixture of lexical language models and a state-of-the-art baseline vector space
model yields a statistically significant increase in retrieval effectiveness.
Consequently, NVSM adds a complementary relevance signal. Next to semantic
matching, we find that NVSM performs well in cases where lexical matching is
needed.
NVSM learns a notion of term specificity directly from the document
collection without feature engineering. We also show that NVSM learns
regularities related to Luhn significance. Finally, we give advice on how to
deploy NVSM in situations where model selection (e.g., cross-validation) is
infeasible. We find that an unsupervised ensemble of multiple models trained
with different hyperparameter values performs better than a single
cross-validated model. Therefore, NVSM can safely be used for ranking documents
without supervised relevance judgments.
Comment: TOIS 201
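The ranking step described above (compose a query representation from word representations, then rank documents by similarity) can be sketched as follows. The embeddings here are hand-picked toy vectors rather than representations learned by gradient descent, and averaging is used as a simple composition function; none of this reproduces NVSM's actual training.

```python
import numpy as np

# Toy "learned" representations; in NVSM these come from unsupervised training.
word_vecs = {
    "neural":  np.array([1.0, 0.2]),
    "ranking": np.array([0.8, 0.4]),
    "cricket": np.array([-0.9, 0.7]),
}
doc_vecs = {
    "d1": np.array([0.9, 0.3]),    # document about neural ranking
    "d2": np.array([-0.8, 0.6]),   # document about cricket
}

def compose(query):
    """Compose a query vector from its word vectors (here: their average)."""
    vs = [word_vecs[w] for w in query.split() if w in word_vecs]
    return np.mean(vs, axis=0)

def rank(query):
    """Rank documents by cosine similarity with the composed query vector."""
    q = compose(query)
    score = lambda d: float(q @ d / (np.linalg.norm(q) * np.linalg.norm(d)))
    return sorted(doc_vecs, key=lambda name: score(doc_vecs[name]), reverse=True)

order = rank("neural ranking")  # d1 should outrank d2
```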
Cross-language source code re-use detection using latent semantic analysis
[EN] Nowadays, the Internet is the main source for getting information from blogs, encyclopedias, discussion forums, source code repositories, and other resources that are available just one click away. The temptation to re-use these materials is very high. Even source codes are easily available through a simple search on the Web. There is a need to detect potential instances of source code re-use. Source code re-use detection has usually been approached by comparing source codes in their compiled version. When dealing with cross-language source code re-use, traditional approaches can deal only with the programming languages supported by the compiler. We assume that a source code is a piece of text, with its syntax and structure, so we aim at applying models for free-text re-use detection to source code. In this paper we compare a Latent Semantic Analysis (LSA) approach with previously used text re-use detection models
for measuring cross-language similarity in source code. The LSA-based approach shows slightly better results than the other models, being able to distinguish between re-used and related source codes with high performance.
This work was partially supported by Universitat Politècnica de València, WIQ-EI (IRSES grant n. 269180), and the DIANA-APPLICATIONS (TIN2012-38603-C02-01) project. The work of the fourth author is also supported by the VLC/CAMPUS Microcluster on Multimodal Interaction in Intelligent Systems.
Flores Sáez, E.; Barrón-Cedeño, LA.; Moreno Boronat, LA.; Rosso, P. (2015). Cross-language source code re-use detection using latent semantic analysis. Journal of Universal Computer Science. 21(13):1708-1725. https://doi.org/10.3217/jucs-021-13-1708
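Treating source code as plain text, an LSA pipeline for cross-language similarity can be sketched as follows. The snippets, the tokeniser, and the latent dimensionality are illustrative assumptions, not the authors' experimental setup.

```python
import re
import numpy as np

# Toy corpus: the same summation re-used in Java and Python, plus an
# unrelated snippet. LSA treats each source code as a bag of text tokens.
snippets = {
    "java_sum": "int total = 0; for (int x : nums) total += x; return total;",
    "py_sum":   "total = 0\nfor x in nums:\n    total += x\nreturn total",
    "py_other": "resp = fetch(url)\nreturn resp.json()",
}

def tokens(code):
    return re.findall(r"[A-Za-z_]+", code.lower())

vocab = sorted({t for c in snippets.values() for t in tokens(c)})
names = list(snippets)
A = np.array([[tokens(snippets[d]).count(t) for d in names] for t in vocab],
             dtype=float)                     # term-document matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
D = (np.diag(s[:k]) @ Vt[:k, :]).T            # documents in the latent space

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

sim_reused = cos(D[names.index("java_sum")], D[names.index("py_sum")])
sim_unrelated = cos(D[names.index("java_sum")], D[names.index("py_other")])
```

Because the re-used pair shares identifiers and structure-level tokens even across languages, its latent-space similarity should exceed that of the unrelated pair, which is the signal the paper exploits.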
Function Based Design-by-Analogy: A Functional Vector Approach to Analogical Search
Design-by-analogy is a powerful approach to augment traditional concept generation methods by expanding the set of generated ideas using similarity relationships from solutions to analogous problems. While the concept of design-by-analogy has been known for some time, few actual methods and tools exist to assist designers in systematically seeking and identifying analogies from general data sources, databases, or repositories, such as patent databases. A new method for extracting functional analogies from data sources has been developed to provide this capability, here based on a functional basis rather than form or conflict descriptions. Building on past research, we utilize a functional vector space model (VSM) to quantify the analogous similarity of an idea's functionality. We quantitatively evaluate the functional similarity between represented design problems and, in this case, patent descriptions of products. We also develop document parsing algorithms to reduce text descriptions of the data sources down to the key functions, for use in the functional similarity analysis and functional vector space modeling. To do this, we apply Zipf's law on word count order reduction to reduce the words within the documents down to the applicable functionally critical terms, thus providing a mapping process for function-based search. The reduction of a document into functionally analogous words enables the matching to novel ideas that are functionally similar, which can be customized in various ways. This approach thereby provides relevant sources of design-by-analogy inspiration. As a verification of the approach, two original design problem case studies illustrate the distance range of analogical solutions that can be extracted. This range extends from very near-field, literal solutions to far-field cross-domain analogies.
National Science Foundation (U.S.) (Grant CMMI-0855326); National Science Foundation (U.S.) (Grant CMMI-0855510); National Science Foundation (U.S.) (Grant CMMI-0855293); SUTD-MIT International Design Centre (IDC)
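The functional VSM idea (reduce each document to counts over a functional vocabulary, then compare documents by cosine similarity) can be sketched as below. The functional term list and the example texts are hypothetical, and this crude keyword filter merely stands in for the paper's Zipf-based word-count reduction and functional-basis parsing.

```python
from collections import Counter
import math

# Hypothetical functional vocabulary; the paper uses a functional basis,
# and these verbs are illustrative only.
FUNCTIONAL_TERMS = {"convert", "transmit", "store", "rotate", "couple", "guide"}

def functional_vector(text):
    """Reduce a document to counts over functional terms (a stand-in for
    the paper's Zipf-based reduction to functionally critical words)."""
    counts = Counter(w.strip(".,").lower() for w in text.split())
    return {t: counts[t] for t in FUNCTIONAL_TERMS if counts[t]}

def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

problem  = "a device to convert rotary motion and transmit torque"
patent_a = "gearbox to convert and transmit power, rotate shafts"
patent_b = "container to store fluids"

sim_a = cosine(functional_vector(problem), functional_vector(patent_a))
sim_b = cosine(functional_vector(problem), functional_vector(patent_b))
```

Ranking patents by this functional similarity is what lets the method surface both near-field and far-field analogies: documents from any domain score highly as long as their functional vocabulary overlaps the design problem's.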