2,176 research outputs found

Inference and Evaluation of the Multinomial Mixture Model for Text Clustering

Author: Banerjee
Church
Deerwester
François Yvon
Halkidi
Hofmann
Jain
Katz
Kuhn
Lange
Loïs Rigouste
Mosimann
Nigam
Olivier Cappé
Robert
Sebastiani
Shahnaz
Publication venue: 'Elsevier BV'
Publication date: 01/01/2006

In this article, we investigate the use of a probabilistic model for unsupervised clustering in text collections. Unsupervised clustering has become a basic module for many intelligent text processing applications, such as information retrieval, text classification or information extraction. The model considered in this contribution consists of a mixture of multinomial distributions over the word counts, each component corresponding to a different theme. We present and contrast various estimation procedures, which apply in both supervised and unsupervised contexts. In supervised learning, this work suggests a criterion for evaluating the posterior odds of new documents which is more statistically sound than the "naive Bayes" approach. In an unsupervised context, we propose measures to set up a systematic evaluation framework and start by examining the Expectation-Maximization (EM) algorithm as the basic tool for inference. We discuss the importance of initialization and the influence of other features such as the smoothing strategy or the size of the vocabulary, thereby illustrating the difficulties incurred by the high dimensionality of the parameter space. We also propose a heuristic algorithm based on iterative EM with vocabulary reduction to solve this problem. Using the fact that the latent variables can be analytically integrated out, we finally show that the Gibbs sampling algorithm is tractable and compares favorably to the basic expectation-maximization approach.
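As a rough illustration of the model this abstract describes, the following is a minimal sketch of EM for a mixture of multinomials over word counts. It is not the authors' implementation: the function name `em_multinomial_mixture`, the additive smoothing constant `alpha`, the smoothing of the mixture weights, and the optional `init_resp` argument are all assumptions made for this example.

```python
import math
import random

def em_multinomial_mixture(counts, k, n_iter=50, alpha=1.0, init_resp=None, seed=0):
    """Fit a k-component multinomial mixture to document word counts with EM.

    counts:    list of documents, each a list of word counts over the vocabulary.
    alpha:     additive smoothing constant (an assumed choice for this sketch).
    init_resp: optional initial soft assignments; random if omitted.
    Returns (weights, word_probs, responsibilities).
    """
    n_docs, n_words = len(counts), len(counts[0])
    if init_resp is None:
        # Random soft initialization; results can depend strongly on this step.
        rng = random.Random(seed)
        resp = []
        for _ in range(n_docs):
            row = [rng.random() + 1e-3 for _ in range(k)]
            total = sum(row)
            resp.append([r / total for r in row])
    else:
        resp = [list(row) for row in init_resp]

    for _ in range(n_iter):
        # M-step: smoothed mixture weights and per-theme word probabilities.
        weights = [(1.0 + sum(resp[d][c] for d in range(n_docs))) / (n_docs + k)
                   for c in range(k)]
        word_probs = []
        for c in range(k):
            expected = [alpha + sum(resp[d][c] * counts[d][w] for d in range(n_docs))
                        for w in range(n_words)]
            total = sum(expected)
            word_probs.append([e / total for e in expected])

        # E-step: posterior over themes for each document, computed in log space.
        for d in range(n_docs):
            logp = [math.log(weights[c])
                    + sum(counts[d][w] * math.log(word_probs[c][w])
                          for w in range(n_words))
                    for c in range(k)]
            top = max(logp)
            unnorm = [math.exp(lp - top) for lp in logp]
            z = sum(unnorm)
            resp[d] = [u / z for u in unnorm]

    return weights, word_probs, resp
```

Calling `em_multinomial_mixture(counts, k=2)` on a small count matrix returns the mixture weights, the per-theme word distributions, and a soft assignment of each document to the themes. Running it with several seeds is one simple way to observe the sensitivity to initialization that the abstract mentions.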

arXiv.org e-Print Archive

CiteSeerX

Crossref

HAL Descartes

Recognizing cited facts and principles in legal judgements

Author: Shulayeva Olga
Siddharthan Advaith
Wyner Adam
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017

In common law jurisdictions, legal professionals cite facts and legal principles from precedent cases to support their arguments before the court for their intended outcome in a current case. This practice stems from the doctrine of stare decisis, where cases that have similar facts should receive similar decisions with respect to the principles. It is essential for legal professionals to identify such facts and principles in precedent cases, though this is a highly time-intensive task. In this paper, we present studies that demonstrate that human annotators can achieve reasonable agreement on which sentences in legal judgements contain cited facts and principles (respectively, κ=0.65 and κ=0.95 for inter- and intra-annotator agreement). We further demonstrate that it is feasible to automatically annotate sentences containing such legal facts and principles in a supervised machine learning framework based on linguistic features, reporting per-category precision and recall figures of between 0.79 and 0.89 for classifying sentences in legal judgements as cited facts, principles or neither using a Bayesian classifier, with an overall κ of 0.72 with the human-annotated gold standard.
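The κ figures in this abstract are chance-corrected agreement scores. The abstract does not say which variant was used, so the sketch below assumes Cohen's kappa for two annotators. The function name `cohen_kappa` and the example labels are hypothetical.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: fraction of items given identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1.0 - expected)

# Hypothetical sentence labels from two annotators.
annotator_1 = ["fact", "fact", "principle", "neither", "neither", "principle"]
annotator_2 = ["fact", "principle", "principle", "neither", "neither", "principle"]
print(round(cohen_kappa(annotator_1, annotator_2), 2))
```

Here the annotators agree on five of six sentences, and agreement expected by chance is one third, so kappa is about 0.75.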

Aberdeen University Research

Springer - Publisher Connector

Open Research Online (The Open University)

Cronfa at Swansea University

Query expansion with naive bayes for searching distributed collections

Author: Yang Hui
Zhang Minjie
Publication date: 01/01/2002

The proliferation of online information resources increases the importance of effective and efficient distributed searching. However, the problem of word mismatch seriously hurts the effectiveness of distributed information retrieval. Automatic query expansion has been suggested as a technique for dealing with the fundamental issue of word mismatch. In this paper, we propose a method, query expansion with Naive Bayes, to address the problem, discuss its implementation in the IISS system, and present experimental results demonstrating its effectiveness. This technique not only enhances the discriminatory power of typical queries for choosing the right collections but also significantly improves retrieval results.

CiteSeerX

Open Research Online (The Open University)

A novel Big Data analytics and intelligent technique to predict driver's intent

Author: Abtahi
Adam Grzywaczewski
Agrawal
Al-Sultan
Asimov
Bernardo
Bezdek
Bhavsar
Bostrom
Chang
Chen
Dawson
De Domenico
Diaz-Cabrera
Doctor
Doctor
Dreier
Faiyaz Doctor
Filev
Froehlich
Gerhardt
Grudin
Grzywaczewski
Hashem
Hawkins
Hawkins
Haykin
Hirsch
Huang
Huang
Iqbal
Jaguar Land Rover Limited
Jain
James
Kaisler
Kapicioglu
Karyotis
Karyotis
Kotsiantis
Kumar
Kumar
Kurihata
Lech Birek
Liao
Liu
Luukka
Mahmud
Maniak
Maniak
McFarland
McInerney
Mitchell
Nasoz
Noulas
Palen
Pang
Parpinelli
Poli
Quercia
Rahat Iqbal
Rainville
Reininger
Richards
Rish
Sagiroglu
Simmons
Sun
Suthaharan
Tan
Tran
Utgoff
Victor Chang
Wang
Warren
Wells-Parker
Whitley
Zadeh
Publication venue: 'Elsevier BV'
Publication date: 06/04/2018

The modern age offers great potential for automatically predicting the driver's intent through the increasing miniaturization of computing technologies, rapid advancements in communication technologies and continuous connectivity of heterogeneous smart objects. Inside the cabin and engine of modern cars, dedicated computer systems need to possess the ability to exploit the wealth of information generated by heterogeneous data sources with different contextual and conceptual representations. Processing and utilizing this diverse and voluminous data involves many challenges concerning the design of the computational technique used to perform this task. In this paper, we investigate the various data sources available in the car and the surrounding environment, which can be utilized as inputs in order to predict the driver's intent and behavior. As part of investigating these potential data sources, we conducted experiments on e-calendars for a large number of employees, and have reviewed a number of available geo-referencing systems. Through the results of a statistical analysis and by computing location recognition accuracy results, we explored in detail the potential utilization of calendar location data to detect the driver's intentions. In order to exploit the numerous diverse data inputs available in modern vehicles, we investigate the suitability of different Computational Intelligence (CI) techniques, and propose a novel fuzzy computational modelling methodology. Finally, we outline the impact of applying advanced CI and Big Data analytics techniques in modern vehicles on the driver and society in general, and discuss ethical and legal issues arising from the deployment of intelligent self-learning cars.

University of Essex Research Repository

Crossref

Teesside University's Research Repository

Coventry University Pure Portal

A unified approach to authorship attribution and verification

Author: Font Valverde Martí
Ginebra Molins Josep
Puig Oriol Xavier
Publication venue: 'Informa UK Limited'
Publication date: 01/01/2016

In authorship attribution, one assigns texts from an unknown author to one of two or more candidate authors by comparing the disputed texts with texts known to have been written by the candidate authors. In authorship verification, one decides whether a text or a set of texts could have been written by a given author. These two problems are usually treated separately. By assuming an open-set classification framework for the attribution problem, contemplating the possibility that none of the candidate authors is the unknown author, the verification problem becomes a special case of the attribution problem. Here both problems are posed as a formal Bayesian multinomial model selection problem and are given a closed-form solution, tailored for categorical data, naturally incorporating text length and dependence in the analysis, and coping well with settings with a small number of training texts. The approach to authorship verification is illustrated by exploring whether a court ruling sentence could have been written by the judge who signs it, and the approach to authorship attribution is illustrated by revisiting the authorship attribution of the Federalist papers and through a small simulation study.
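To give a sense of what a closed-form Bayesian multinomial model comparison can look like, here is a minimal sketch that assumes a symmetric Dirichlet prior over category probabilities. This is a generic Dirichlet-multinomial Bayes factor, not the model from the paper, which the abstract does not specify in detail. The function names, the prior parameter `alpha`, and the example counts are all hypothetical.

```python
import math

def log_marginal(counts, alpha=1.0):
    """Log marginal likelihood of a count vector (as an ordered sequence)
    under a multinomial with a symmetric Dirichlet(alpha) prior."""
    k, n = len(counts), sum(counts)
    result = math.lgamma(k * alpha) - math.lgamma(k * alpha + n)
    for c in counts:
        result += math.lgamma(alpha + c) - math.lgamma(alpha)
    return result

def log_bayes_factor_same_source(known, disputed, alpha=1.0):
    """Log Bayes factor: one shared multinomial for both texts versus
    an independent multinomial for each text."""
    pooled = [a + b for a, b in zip(known, disputed)]
    return (log_marginal(pooled, alpha)
            - log_marginal(known, alpha)
            - log_marginal(disputed, alpha))

# Hypothetical counts of two categories (e.g. two function words).
known_text = [10, 0]
similar_text = [9, 1]
different_text = [0, 10]
print(log_bayes_factor_same_source(known_text, similar_text) > 0)
print(log_bayes_factor_same_source(known_text, different_text) < 0)
```

A positive log Bayes factor favors the hypothesis that both texts share one category distribution, and a negative value favors separate distributions. Because the marginal likelihood depends on the total count, longer texts yield stronger evidence in either direction.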

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Machine Learning: A Review

Author: Udousoro Isonkobong Christopher
Publication venue: 'Bilingual Publishing Co.'
Publication date: 18/11/2020

Due to the complexity of data, interpreting patterns or extracting information becomes difficult; machine learning is therefore used to teach machines how to handle data more efficiently. As datasets grow, various organizations now apply machine learning applications and algorithms. Many industries apply machine learning to extract relevant information for analysis purposes. Many scholars, mathematicians and programmers have carried out research and applied several machine learning approaches in order to find solutions to problems. In this paper, we focus on a general review of machine learning, including various machine learning techniques. These techniques can be applied to different fields such as image processing, data mining, predictive analysis and so on. The paper aims at reviewing machine learning techniques and algorithms. The research methodology is based on qualitative analysis, in which the literature on machine learning is reviewed.

Bilingual Publishing Co. (BPC): E-Journals

Learning Taxonomy Adaptation in Large-scale Classification

Author: Amblard Cécile
Amini Massih-Reza
Babbar Rohit
Eric Gaussier
Partalas Ioannis
Publication venue: Microtome Publishing
Publication date: 01/05/2016

Hal - Université Grenoble Alpes

A Study on the Performances of Representation Strategies Handled For Text Categorization

Author: Dr. K. Meenakshi Sundaram, K. Ramya
Publication venue: 'Auricle Technologies, Pvt., Ltd.'
Publication date: 30/09/2014

No abstract

International Journal on Recent and Innovation Trends in Computing and Communication

Tree Induction vs. Logistic Regression: A Learning-Curve Analysis

Author: Perlich Claudia
Provost Foster
Simonoff Jeffrey S.
Publication venue: Stern School of Business, New York University
Publication date: 01/01/2001

Tree induction and logistic regression are two standard, off-the-shelf methods for building models for classification. We present a large-scale experimental comparison of logistic regression and tree induction, assessing classification accuracy and the quality of rankings based on class-membership probabilities. We use a learning-curve analysis to examine the relationship of these measures to the size of the training set. The results of the study show several remarkable things. (1) Contrary to prior observations, logistic regression does not generally outperform tree induction. (2) More specifically, and not surprisingly, logistic regression is better for smaller training sets and tree induction for larger data sets. Importantly, this often holds for training sets drawn from the same domain (i.e., the learning curves cross), so conclusions about induction-algorithm superiority on a given domain must be based on an analysis of the learning curves. (3) Contrary to conventional wisdom, tree induction is effective at producing probability-based rankings, although apparently comparatively less so for a given training-set size than at making classifications. Finally, (4) the domains on which tree induction and logistic regression are ultimately preferable can be characterized surprisingly well by a simple measure of signal-to-noise ratio. (Information Systems Working Papers Series)

CiteSeerX

New York University Faculty Digital Archive