Unsupervised Keyword Extraction from Polish Legal Texts
In this work, we present an application of the recently proposed unsupervised keyword extraction algorithm RAKE to a corpus of Polish legal texts from the field of public procurement. RAKE is essentially a language- and domain-independent method. Its only language-specific input is a stoplist containing a set of non-content words. The performance of the method depends heavily on the choice of this stoplist, which should be adapted to the domain. Therefore, we complement the RAKE algorithm with an automatic approach to selecting non-content words, based on the statistical properties of term distributions.
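The core RAKE idea can be sketched in a few lines: split the word stream into candidate phrases at stopword boundaries, then score each phrase by the sum of its members' degree-to-frequency ratios. The stoplist below is a hypothetical hand-picked one; the paper's contribution is deriving such a list automatically from term-distribution statistics.

```python
import re
from collections import defaultdict

# Hypothetical minimal stoplist; the paper derives one automatically
# from the statistical properties of term distributions.
STOPWORDS = {"the", "of", "and", "a", "to", "in", "is", "for", "on"}

def rake_scores(text):
    """Score candidate keywords RAKE-style: sum of deg(w)/freq(w) per phrase."""
    words = re.findall(r"[a-zA-Z]+", text.lower())
    # Split the word sequence into candidate phrases at stopwords.
    phrases, current = [], []
    for w in words:
        if w in STOPWORDS:
            if current:
                phrases.append(current)
            current = []
        else:
            current.append(w)
    if current:
        phrases.append(current)
    # Word co-occurrence statistics within candidate phrases.
    freq = defaultdict(int)
    degree = defaultdict(int)
    for p in phrases:
        for w in p:
            freq[w] += 1
            degree[w] += len(p)  # degree counts co-occurring words (incl. itself)
    # Phrase score = sum of member-word degree/frequency ratios.
    return {" ".join(p): sum(degree[w] / freq[w] for w in p) for p in phrases}

scores = rake_scores("public procurement law regulates the award of public contracts")
```

Because degree rewards words that co-occur in long candidates, multi-word phrases tend to outscore isolated words, which suits legal terminology such as multi-word statutory terms.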
Transforming legal documents for visualization and analysis
Regulations, laws, norms, and other documents of a legal nature are a relevant part of any governmental organisation. During the digitisation and transformation stages towards a digital-government model, information and communication technologies are explored to improve the internal processes and working practices of government infrastructures. This paper introduces preliminary results of a research line devoted to developing visualisation techniques for enhancing the readability and comprehension of legal texts. The content of documents is conveyed to a well-defined model, which is enriched with automatically extracted semantic information. Then, a set of digital views is created for document exploration from both a structural and a semantic point of view. Effective and easier-to-use digital interfaces can enable and promote citizen engagement in decision-making processes, provide information for the public, and also enhance the study and analysis of legal texts by lawmakers, legal practitioners, and assorted scholars.

“SmartEGOV: Harnessing EGOV for Smart Governance (Foundations, Methods, Tools) / NORTE-01-0145-FEDER-000037”, supported by Norte Portugal Regional Operational Programme (NORTE 2020), under the PORTUGAL 2020 Partnership Agreement, through the European Regional Development Fund (ERDF).
Classification of protein interaction sentences via gaussian processes
The increase in the availability of protein interaction studies in textual format, coupled with the demand for easier access to the key results, has led to a need for text mining solutions. In the text processing pipeline, classification is a key step for extraction of small sections of relevant text. Consequently, for the task of locating protein-protein interaction sentences, we examine the use of a classifier which has rarely been applied to text: Gaussian processes (GPs). GPs are a non-parametric probabilistic analogue to the more popular support vector machines (SVMs). We find that GPs outperform the SVM and naïve Bayes classifiers on binary sentence data, whilst showing equivalent performance on abstract and multiclass sentence corpora. In addition, the lack of a margin parameter, which requires costly tuning, along with the principled multiclass extensions enabled by the probabilistic framework, make GPs an appealing alternative worthy of further adoption.
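The pipeline the abstract describes can be sketched with scikit-learn's GP classifier over TF-IDF sentence vectors. This is a toy illustration, not the paper's setup: the sentences and labels below are invented stand-ins for the protein-interaction corpora, and the RBF kernel choice is an assumption.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

# Hypothetical sentences standing in for the protein-interaction corpora.
sentences = [
    "protein A binds protein B in vitro",
    "kinase X phosphorylates substrate Y",
    "the cell culture was incubated overnight",
    "samples were stored at minus eighty degrees",
]
labels = [1, 1, 0, 0]  # 1 = interaction sentence, 0 = not

X = TfidfVectorizer().fit_transform(sentences).toarray()
# A GP classifier with an RBF kernel: unlike an SVM there is no margin
# parameter to tune, and it returns calibrated class probabilities.
gpc = GaussianProcessClassifier(kernel=RBF(1.0)).fit(X, labels)
probs = gpc.predict_proba(X)  # posterior P(class | sentence) per row
```

The probabilistic output is what makes the multiclass extension principled: the per-class posteriors come directly from the model rather than from a one-vs-one voting heuristic bolted onto margins.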
Semi-automated dialogue act classification for situated social agents in games
As a step toward simulating dynamic dialogue between agents and humans in virtual environments, we describe learning a model of social behavior composed of interleaved utterances and physical actions. In our model, utterances are abstracted as {speech act, propositional content, referent} triples. After training a classifier on 100 gameplay logs from The Restaurant Game annotated with dialogue act triples, we have automatically classified utterances in an additional 5,000 logs. A quantitative evaluation of statistical models learned from the gameplay logs demonstrates that semi-automatically classified dialogue acts yield significantly more predictive power than automatically clustered utterances, and serve as a better common currency for modeling interleaved actions and utterances.
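The train-then-auto-label loop can be sketched with a bag-of-words naive Bayes classifier. Everything below is invented for illustration: the utterances, the speech-act labels, and the classifier choice are assumptions, and the real model predicts full {speech act, propositional content, referent} triples rather than a single label.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical annotated utterances: (utterance, speech-act label).
train = [
    ("can i get a menu please", "REQUEST"),
    ("here is your food", "GIVE"),
    ("thanks so much", "THANK"),
    ("i would like the steak please", "REQUEST"),
]
texts, acts = zip(*train)
vec = CountVectorizer()
clf = MultinomialNB().fit(vec.fit_transform(texts), acts)
# Auto-label an unseen utterance, as done for the additional 5,000 logs.
pred = clf.predict(vec.transform(["could i have some water please"]))[0]
```

The point of the semi-automated setup is that a small hand-annotated seed (100 logs) buys labels for a corpus fifty times larger, at the cost of some label noise that the downstream statistical models must tolerate.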
Multimedia Retrieval by Means of Merge of Results from Textual and Content Based Retrieval Subsystems
The main goal of this paper is to present our experiments in the ImageCLEF 2009 campaign (photo retrieval task). In 2008 we showed empirically that Text-Based Image Retrieval (TBIR) methods beat Content-Based Image Retrieval (CBIR) in the quality of results, so this time we developed several experiments in which the CBIR subsystem helps the TBIR subsystem. The main improvement of the TBIR system [6] is the named-entity sub-module. In the CBIR system [3], the number of low-level features has been increased from the 68 components used at ImageCLEF 2008 to 114 components, and only the Mahalanobis distance has been used. We propose an ad-hoc management of the delivered topics, and the generation of XML structures for the 0.5 million photograph captions (corpus) delivered. Two different merging algorithms were developed, and a third one tries to improve our previous cluster-level results by promoting diversity. Out of 84 submitted experiments, our best run ranked 16th for precision metrics, 19th for MAP score, and 11th for the diversity value. Our best text-only experiment ranked 6th out of 41.
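The distance measure the CBIR subsystem relies on can be sketched directly: the Mahalanobis distance scales each feature direction by the inverse covariance of the corpus, so correlated or high-variance components do not dominate the comparison. The feature matrix below is random toy data, not the paper's 114-component descriptors.

```python
import numpy as np

def mahalanobis(x, y, cov_inv):
    """Mahalanobis distance between feature vectors x and y."""
    d = x - y
    return float(np.sqrt(d @ cov_inv @ d))

# Hypothetical low-level image features (the paper uses 114 components).
rng = np.random.default_rng(0)
features = rng.normal(size=(50, 4))          # 50 images, 4-dim toy features
cov_inv = np.linalg.inv(np.cov(features.T))  # inverse covariance of the corpus
query, candidate = features[0], features[1]
dist = mahalanobis(query, candidate, cov_inv)
```

With the identity matrix as `cov_inv` this reduces to the plain Euclidean distance, which is why it is often described as a covariance-corrected Euclidean metric.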
Document Word Clouds: Visualising Web Documents as Tag Clouds to Aid Users in Relevance Decisions
Information Retrieval systems spend great effort on determining the significant terms in a document. A user looking at a document, by contrast, cannot benefit from such information: he has to read the text to understand which words are important. In this paper we take a look at the idea of enhancing the perception of web documents with visualisation techniques borrowed from the tag clouds of Web 2.0. Highlighting the important words in a document with a larger font size gives a quick impression of the relevant concepts in a text. As this process does not depend on a user query, it can also be used for exploratory search. A user study showed that even simple TF-IDF values, used as the notion of word importance, helped users decide more quickly whether or not a document is relevant to a topic.
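The TF-IDF-to-font-size mapping the study relies on can be sketched as follows. The function name, the linear scaling, and the pixel range are illustrative assumptions; only the use of TF-IDF as the importance signal comes from the abstract.

```python
import math
from collections import Counter

def tfidf_font_sizes(doc, corpus, min_px=10, max_px=36):
    """Map each word in `doc` to a font size proportional to its TF-IDF weight."""
    words = doc.lower().split()
    tf = Counter(words)
    n_docs = len(corpus)

    def idf(w):
        # Smoothed inverse document frequency over the reference corpus.
        df = sum(1 for d in corpus if w in d.lower().split())
        return math.log((1 + n_docs) / (1 + df)) + 1

    weights = {w: tf[w] * idf(w) for w in tf}
    lo, hi = min(weights.values()), max(weights.values())
    span = (hi - lo) or 1.0  # avoid division by zero when all weights tie
    # Linearly rescale weights into the [min_px, max_px] font-size range.
    return {w: round(min_px + (weights[w] - lo) / span * (max_px - min_px))
            for w in weights}

doc = "tag clouds visualise tag importance"
corpus = [doc, "search engines rank documents", "users read documents"]
sizes = tfidf_font_sizes(doc, corpus)
```

In the example, the repeated and corpus-distinctive word "tag" receives the largest font, which is exactly the at-a-glance signal the word-cloud rendering is meant to convey.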
Extracting collective trends from Twitter using social-based data mining
The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-642-40495-5_62. Proceedings of the 5th International Conference, ICCCI 2013, Craiova, Romania, September 11-13, 2013.

Social networks have become an important environment for collective-trend extraction. The interactions amongst users provide information about their preferences and relationships. This information can be used to measure the influence of ideas or opinions, and how they spread within the network. Currently, one of the most relevant and popular social networks is Twitter. This social network was created to share comments and opinions. The information provided by users is especially useful in different fields and research areas, such as marketing. The data are presented as short text strings containing different ideas expressed by real people. With this representation, different data mining and text mining techniques (such as classification and clustering) can be used for knowledge extraction, trying to distinguish the meaning of the opinions. This work focuses on analysing how these techniques can interpret these opinions within the social network, using information related to the IKEA® company.

The preparation of this manuscript has been supported by the Spanish Ministry of Science and Innovation under the following projects: TIN2010-19872, ECO2011-30105 (National Plan for Research, Development and Innovation) and the Multidisciplinary Project of Universidad Autónoma de Madrid (CEMU-2012-034).
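One of the techniques the abstract mentions, clustering short texts into collective trends, can be sketched with TF-IDF vectors and k-means. The tweets below are invented stand-ins for the brand-related data, and the two-cluster choice is an assumption.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Hypothetical brand-related tweets (stand-ins for the real dataset).
tweets = [
    "love my new ikea bookshelf",
    "ikea furniture is great value",
    "assembly instructions are confusing",
    "missing screws in the flat pack again",
]
X = TfidfVectorizer().fit_transform(tweets)
# Group the opinions into two collective trends (e.g. praise vs. complaints).
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
labels = km.labels_  # cluster assignment per tweet
```

In practice, the cluster centroids' highest-weighted terms serve as a readable label for each trend, which is how unsupervised output is turned into interpretable collective opinions.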
From text summarisation to style-specific summarisation for broadcast news
In this paper we report on a series of experiments investigating the path from text summarisation to style-specific summarisation of spoken news stories. We show that the portability of traditional text-summarisation features to broadcast news depends on the diffusiveness of the information in the broadcast news story. An analysis of two categories of news stories (containing only read speech or including some spontaneous speech) demonstrates the importance of the style and the quality of the transcript when extracting the summary-worthy information content. Further experiments indicate the advantages of style-specific summarisation of broadcast news.
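A minimal example of the kind of traditional extractive feature whose portability the paper studies is sentence scoring by average term frequency. The function below is a generic sketch, not the paper's feature set; sentence splitting by punctuation is itself an assumption that breaks down on speech transcripts, which is part of the paper's point.

```python
import re
from collections import Counter

def extract_summary(text, n_sentences=1):
    """Rank sentences by the mean corpus frequency of their words
    (a classic extractive-summarisation feature) and keep the top n."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    tf = Counter(re.findall(r"[a-z]+", text.lower()))

    def score(s):
        toks = re.findall(r"[a-z]+", s.lower())
        return sum(tf[t] for t in toks) / (len(toks) or 1)

    return sorted(sentences, key=score, reverse=True)[:n_sentences]

summary = extract_summary(
    "The market fell. The market fell sharply today. Weather was mild.")
```

On read speech with clean transcripts such features transfer reasonably well; on spontaneous speech, disfluencies and transcript errors degrade both the sentence boundaries and the frequency statistics the score depends on.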