Representation Learning for Words and Entities
This thesis presents new methods for unsupervised learning of distributed
representations of words and entities from text and knowledge bases. The first
algorithm presented in the thesis is a multi-view algorithm for learning
representations of words called Multiview Latent Semantic Analysis (MVLSA). By
incorporating up to 46 different types of co-occurrence statistics for the same
vocabulary of English words, I show that MVLSA outperforms other
state-of-the-art word embedding models. Next, I focus on learning entity
representations for search and recommendation and present the second method of
this thesis, Neural Variational Set Expansion (NVSE). NVSE is also an
unsupervised learning method, but it is based on the Variational Autoencoder
framework. Evaluations with human annotators show that NVSE can facilitate
better search and recommendation of information gathered from noisy, automatic
annotation of unstructured natural language corpora. Finally, I move from
unstructured data and focus on structured knowledge graphs. I present novel
approaches for learning embeddings of vertices and edges in a knowledge graph
that obey logical constraints.
Comment: PhD thesis. Keywords: Machine Learning, Natural Language Processing,
Representation Learning, Knowledge Graphs, Entities, Word Embeddings, Entity
Embedding
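The multi-view fusion idea behind MVLSA can be sketched roughly as follows: compute a per-view latent projection of each co-occurrence matrix with an SVD, then combine the views with a joint SVD over the concatenated projections. This is a minimal illustration in the spirit of generalized CCA; the function name and the exact fusion step are our assumptions, not the thesis's precise MVLSA objective.

```python
import numpy as np

def multiview_lsa(views, k):
    """Hypothetical sketch: fuse several co-occurrence matrices over the
    same vocabulary into one word embedding via per-view SVD followed by
    a joint SVD over the concatenated projections (generalized-CCA
    flavour; the thesis's exact objective may differ)."""
    projections = []
    for X in views:  # each X: |vocab| x contexts matrix for one view
        U, s, _ = np.linalg.svd(X, full_matrices=False)
        projections.append(U[:, :k])  # per-view latent representation
    stacked = np.hstack(projections)  # |vocab| x (k * num_views)
    G, _, _ = np.linalg.svd(stacked, full_matrices=False)
    return G[:, :k]  # shared k-dimensional embedding per word

# toy usage with two random "views" over a 6-word vocabulary
rng = np.random.default_rng(0)
emb = multiview_lsa([rng.random((6, 10)), rng.random((6, 8))], k=3)
```

Each row of `emb` is the fused representation of one vocabulary word; real views would be sparse PPMI-style co-occurrence counts rather than random matrices.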
From Frequency to Meaning: Vector Space Models of Semantics
Computers understand very little of the meaning of human language. This
profoundly limits our ability to give instructions to computers, the ability of
computers to explain their actions to us, and the ability of computers to
analyse and process text. Vector space models (VSMs) of semantics are beginning
to address these limits. This paper surveys the use of VSMs for semantic
processing of text. We organize the literature on VSMs according to the
structure of the matrix in a VSM. There are currently three broad classes of
VSMs, based on term-document, word-context, and pair-pattern matrices, yielding
three classes of applications. We survey a broad range of applications in these
three categories and we take a detailed look at a specific open source project
in each category. Our goal in this survey is to show the breadth of
applications of VSMs for semantics, to provide a new perspective on VSMs for
those who are already familiar with the area, and to provide pointers into the
literature for those who are less familiar with the field.
Sparse online collaborative filtering with dynamic regularization
Collaborative filtering (CF) approaches are widely applied in recommender systems. Traditional CF approaches are costly to train and cannot capture changes in user interests and item popularity. Most CF approaches assume that user interests remain unchanged throughout the whole process; however, user preferences are always evolving and the popularity of items is always changing. Additionally, in a sparse rating matrix, the amount of known rating data is very small. In this paper, we propose a method of online collaborative filtering with dynamic regularization (OCF-DR), which considers dynamic information and uses a neighborhood factor to track dynamic changes in online collaborative filtering (OCF). The results of experiments on the MovieLens100K, MovieLens1M, and HetRec2011 datasets show that the proposed methods achieve significant improvements over several baseline approaches.
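The core of online collaborative filtering is a streaming matrix-factorization update: each incoming rating triggers one SGD step on the affected user and item factors. The sketch below is a generic illustration only; OCF-DR's actual dynamic regularizer is derived from neighborhood information, which we merely gesture at here with a hypothetical error-scaled weight.

```python
import numpy as np

def online_mf_update(P, Q, u, i, r, lr=0.05, lam=0.1):
    """One online CF step: plain SGD on a rating-matrix factorization.
    `dyn_lam` stands in for OCF-DR's dynamic regularization; the
    error-scaled form used here is our assumption, not the paper's."""
    err = r - P[u] @ Q[i]                 # prediction error for this event
    dyn_lam = lam / (1.0 + abs(err))      # hypothetical dynamic regularizer
    P[u] += lr * (err * Q[i] - dyn_lam * P[u])
    Q[i] += lr * (err * P[u] - dyn_lam * Q[i])
    return err

rng = np.random.default_rng(1)
P, Q = rng.normal(size=(4, 3)), rng.normal(size=(5, 3))
# stream of (user, item, rating) events processed one at a time
for u, i, r in [(0, 1, 4.0), (2, 3, 2.0), (0, 1, 4.0)]:
    online_mf_update(P, Q, u, i, r)
```

Because factors are updated per event rather than by retraining, the model can follow drifting user interests, which is the motivation the abstract describes.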
A Partitioning Based Algorithm to Fuzzy Tricluster
Fuzzy clustering allows an object to belong to multiple clusters and represents the affiliation of objects to clusters by membership degrees. It is extended to fuzzy coclustering by assigning membership functions to both objects and features. In this paper we propose a new fuzzy triclustering (FTC) algorithm for automatic categorization of three-dimensional data collections. FTC specifies a membership function for each dimension and is able to generate fuzzy clusters simultaneously on three dimensions. Thus FTC divides a three-dimensional cube into many small blocks, which should be triclusters with strong coherent bonding among their members. Experimental studies on MovieLens demonstrate the strength of FTC in terms of accuracy compared to several recent popular fuzzy clustering and coclustering approaches.
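The membership machinery underlying such methods is the fuzzy c-means membership matrix, where each object receives a degree of belonging to every cluster. The sketch below shows only this one-dimensional building block, as a point of reference; FTC itself maintains a separate membership function per dimension of the three-dimensional cube, which is not reproduced here.

```python
import numpy as np

def fuzzy_memberships(X, centers, m=2.0):
    """Fuzzy c-means style memberships: U[n, c] is the degree to which
    point n belongs to cluster c, with rows summing to 1. Fuzzifier m
    controls how soft the assignment is (m -> 1 approaches hard
    clustering)."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)  # (N, C)
    d = np.maximum(d, 1e-12)  # avoid division by zero at a center
    ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=2)

X = np.array([[0.0, 0.0], [1.0, 1.0], [0.1, 0.0]])
C = np.array([[0.0, 0.0], [1.0, 1.0]])
U = fuzzy_memberships(X, C)
```

A point sitting exactly on a center gets membership near 1 for that cluster, while points between centers split their membership, which is the "objects in multiple clusters" behaviour the abstract describes.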
Web Mining for Web Personalization
Web personalization is the process of customizing a Web site to the needs of specific users, taking advantage of the knowledge acquired from the analysis of the user's navigational behavior (usage data) in correlation with other information collected in the Web context, namely, structure, content, and user profile data. Due to the explosive growth of the Web, the domain of Web personalization has gained great momentum in both the research and commercial areas. In this article we present a survey of the use of Web mining for Web personalization. More specifically, we introduce the modules that comprise a Web personalization system, emphasizing the Web usage mining module. A review of the most common methods in use, as well as the technical issues that arise, is given, along with a brief overview of the most popular tools and applications available from software vendors. Moreover, the most important research initiatives in the Web usage mining and personalization areas are presented.
Corporate Smart Content Evaluation
Nowadays, a wide range of information sources are available due to the
evolution of web and collection of data. Plenty of these information are
consumable and usable by humans but not understandable and processable by
machines. Some data may be directly accessible in web pages or via data feeds,
but most of the meaningful existing data is hidden within deep web databases
and enterprise information systems. Besides the inability to access a wide
range of data, manual processing by humans is effortful, error-prone and not
contemporary any more. Semantic web technologies deliver capabilities for
machine-readable, exchangeable content and metadata for automatic processing
of content. The enrichment of heterogeneous data with background knowledge
described in ontologies induces re-usability and supports automatic processing
of data. The establishment of “Corporate Smart Content” (CSC) - semantically
enriched data with high information content with sufficient benefits in
economic areas - is the main focus of this study. We describe three actual
research areas in the field of CSC concerning scenarios and datasets
applicable for corporate applications, algorithms and research. Aspect-
oriented Ontology Development advances modular ontology development and
partial reuse of existing ontological knowledge. Complex Entity Recognition
enhances traditional entity recognition techniques to recognize clusters of
related textual information about entities. Semantic Pattern Mining combines
semantic web technologies with pattern learning to mine for complex models by
attaching background knowledge. This study introduces the afore-mentioned
topics by analyzing applicable scenarios with economic and industrial focus,
as well as research emphasis. Furthermore, a collection of existing datasets
for the given areas of interest is presented and evaluated. The target
audience includes researchers and developers of CSC technologies - people
interested in semantic web features, ontology development, automation,
extracting and mining valuable information in corporate environments. The aim
of this study is to provide a comprehensive and broad overview over the three
topics, give assistance for decision making in interesting scenarios and
choosing practical datasets for evaluating custom problem statements. Detailed
descriptions about attributes and metadata of the datasets should serve as
starting point for individual ideas and approaches
Studies of computer mediated communications systems : a synthesis of the findings
This report is an attempt to collect and synthesize current knowledge about computer-mediated communication systems. It focuses on computerized conferencing systems, for which most evaluation studies have been conducted, and also includes those electronic mail and office support systems for which evaluative information is available. It was made possible only through the participation of the many systems designers and evaluators listed below, who took the time to help build a common conceptual framework and report their findings in terms of that common framework.
Stateful data-parallel processing
Democratisation of data means that more people than ever are involved in the data analysis process. This is beneficial—it brings domain-specific knowledge from broad fields—but data scientists do not have adequate tools to write algorithms and execute them at scale. The processing models of current data-parallel processing systems, designed for scalability and fault tolerance, are stateless. Stateless processing facilitates capturing parallelisation opportunities and hides fault tolerance. However, data scientists want to write stateful programs—with explicit state that they can update, such as matrices in machine learning algorithms—and are used to imperative-style languages. Such programs struggle to execute with high performance in stateless data-parallel systems.
Representing state explicitly makes data-parallel processing at scale challenging. To achieve scalability, state must be distributed and coordinated across machines. In the event of failures, state must be recovered to provide correct results. We introduce stateful data-parallel processing that addresses these challenges by: (i) representing state as a first-class citizen so that a system can manipulate it; (ii) introducing two distributed mutable state abstractions for scalability; and (iii) an integrated approach to scale-out and fault tolerance that recovers large state—spanning the memory of multiple machines. To support imperative-style programs, a static analysis tool analyses Java programs that manipulate state and translates them to a representation that can execute on SEEP, an implementation of a stateful data-parallel processing model. SEEP is evaluated with stateful Big Data applications and shows comparable or better performance than state-of-the-art stateless systems.
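The "state as a first-class citizen" idea can be illustrated with a toy operator whose running state lives in an explicit field that the runtime can snapshot and restore for fault tolerance. All names below are our own illustration, not SEEP's API, and real systems would persist snapshots off the failing machine.

```python
from dataclasses import dataclass, field

@dataclass
class StatefulOperator:
    """Toy sketch of explicit mutable operator state: because the state
    is a visible field rather than hidden inside user code, the system
    can checkpoint it for recovery and partition it for scale-out."""
    state: dict = field(default_factory=dict)

    def process(self, key, value):
        # imperative update to explicit, first-class state
        self.state[key] = self.state.get(key, 0) + value
        return self.state[key]

    def checkpoint(self):
        return dict(self.state)  # snapshot taken by the runtime

    def restore(self, snapshot):
        self.state = dict(snapshot)  # recovery after a failure

op = StatefulOperator()
op.process("a", 1)
op.process("a", 2)
snap = op.checkpoint()   # consistent snapshot of the running state
op.process("a", 100)     # updates after the snapshot would be lost...
op.restore(snap)         # ...and recovery rolls back to the snapshot
```

Making the state manipulable by the system, rather than opaque application memory, is what lets scale-out and failure recovery be handled by the runtime instead of the data scientist.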