SeLeCT: a lexical cohesion based news story segmentation system

Author: Carthy Joe
Smeaton Alan F.
Stokes Nicola
Publication venue: 'IOS Press'
Publication date: 01/01/2004

In this paper we compare the performance of three distinct approaches to lexical cohesion based text segmentation. Most work in this area has focused on the discovery of textual units that discuss subtopic structure within documents. In contrast, our segmentation task requires the discovery of topical units of text, i.e., distinct news stories from broadcast news programmes. Our approach to news story segmentation (the SeLeCT system) is based on an analysis of lexical cohesive strength between textual units using a linguistic technique called lexical chaining. We evaluate the relative performance of SeLeCT with respect to two other cohesion based segmenters: TextTiling and C99. Using a recently introduced evaluation metric, WindowDiff, we contrast the segmentation accuracy of each system on both "spoken" (CNN news transcripts) and "written" (Reuters newswire) news story test sets extracted from the TDT1 corpus.

CiteSeerX

Irish Universities

DCU Online Research Access Service

Time and position distributions in large volume spherical scintillation detectors

Author: Alimonti
Birks
Gatti
Gioacchino Ranucci
Knoll
Kraus
Paolo Lombardi
Papoulis
Prunty
Ranucci
Zhaomin
Publication venue: 'Elsevier BV'
Publication date: 08/10/2007

Large spherical scintillation detectors are playing an increasingly important role in experimental neutrino physics studies. From the instrumental point of view the primary signal response of these set-ups is constituted by the time and amplitude of the anode pulses delivered by each individual phototube following a particle interaction in the scintillator. In this work, under some approximate assumptions, we derive a number of analytical formulas able to give a fairly accurate description of the most important timing features of these detectors, intended to complement the more complete Monte Carlo studies normally used for a full modelling approach. The paper is completed with a mathematical description of the event position distributions which can be inferred, through some inference algorithm, starting from the primary time measures of the photomultiplier tubes. Comment: 29 pages, 20 figures, accepted for publication in Nucl. Instr. and Meth.

arXiv.org e-Print Archive

Crossref

Automated Big Text Security Classification

Author: Alzhrani Khudran
Boult Terrance E.
Chow C. Edward
Rudd Ethan M.
Publication date: 21/10/2016

In recent years, traditional cybersecurity safeguards have proven ineffective against insider threats. Famous cases of sensitive information leaks caused by insiders, including the WikiLeaks release of diplomatic cables and the Edward Snowden incident, have greatly harmed the U.S. government's relationship with other governments and with its own citizens. Data Leak Prevention (DLP) is a solution for detecting and preventing information leaks from within an organization's network. However, state-of-the-art DLP detection models are only able to detect very limited types of sensitive information, and research in the field has been hindered by the lack of available sensitive texts. Many researchers have focused on document-based detection with artificially labeled "confidential documents" for which security labels are assigned to the entire document, when in reality only a portion of the document is sensitive. This type of whole-document based security labeling increases the chances of preventing authorized users from accessing non-sensitive information within sensitive documents. In this paper, we introduce Automated Classification Enabled by Security Similarity (ACESS), a new and innovative detection model that penetrates the complexity of big text security classification/detection. To analyze the ACESS system, we constructed a novel dataset, containing formerly classified paragraphs from diplomatic cables made public by the WikiLeaks organization. To our knowledge this paper is the first to analyze a dataset that contains actual formerly sensitive information annotated at paragraph granularity. Comment: Pre-print of Best Paper Award IEEE Intelligence and Security Informatics (ISI) 2016 Manuscript

arXiv.org e-Print Archive

Crossref

Integrating Prosodic and Lexical Cues for Automatic Topic Segmentation

Author: Andreas Stolcke
Dilek Hakkani-Tür
Elizabeth Shriberg
Grosz B.
Gökhan Tür
Hearst Marti A
Passonneau Rebecca J
Publication date: 01/01/2000

We present a probabilistic model that uses both prosodic and lexical cues for the automatic segmentation of speech into topically coherent units. We propose two methods for combining lexical and prosodic information using hidden Markov models and decision trees. Lexical information is obtained from a speech recognizer, and prosodic features are extracted automatically from speech waveforms. We evaluate our approach on the Broadcast News corpus, using the DARPA-TDT evaluation metrics. Results show that the prosodic model alone is competitive with word-based segmentation methods. Furthermore, we achieve a significant reduction in error by combining the prosodic and word-based knowledge sources. Comment: 27 pages, 8 figures

arXiv.org e-Print Archive

CiteSeerX

Crossref

Bilkent University Institutional Repository

Automatic Segmentation of Multiparty Dialogue

Author: Hsueh Pei-Yun
Moore Johanna
Renals Steve
Publication date: 01/01/2006

In this paper, we investigate the problem of automatically predicting segment boundaries in spoken multiparty dialogue. We extend prior work in two ways. We first apply approaches that have been proposed for predicting top-level topic shifts to the problem of identifying subtopic boundaries. We then explore the impact on performance of using ASR output as opposed to human transcription. Examination of the effect of features shows that predicting top-level and predicting subtopic boundaries are two distinct tasks: (1) for predicting subtopic boundaries, the lexical cohesion-based approach alone can achieve competitive results, (2) for predicting top-level boundaries, the machine learning approach that combines lexical-cohesion and conversational features performs best, and (3) conversational cues, such as cue phrases and overlapping speech, are better indicators for the top-level prediction task. We also find that the transcription errors inevitable in ASR output have a negative impact on models that combine lexical-cohesion and conversational features, but do not change the general preference of approach for the two tasks.

CiteSeerX

Edinburgh Research Explorer

Phase Stability and Segregation in Alloy 22 Base Metal and Weldments

Author: Keeler Raymond E.
LaCombe Jeffrey
Namjoshi Shantanu A.
Russel Paige
Smiecinski Amy J.
Publication venue: Digital Scholarship@UNLV
Publication date: 25/10/2004

The current design of the waste disposal containers relies heavily on encasement in a multi-layered container, featuring a corrosion barrier of Alloy 22, a Ni-Cr-Mo-W based alloy with excellent corrosion resistance over a wide range of conditions. The fundamental concern from the perspective of the Yucca Mountain Project, however, is the inherent uncertainty in the (very) long-term stability of the base metal and welds. Should the properties of the selected materials change over the long service life of the waste packages, it is conceivable that the desired performance characteristics (such as corrosion resistance) will become compromised, leading to premature failure of the system. To address this, we will study the phase stability and solute segregation characteristics of Alloy 22 base metal and welds. A better understanding of the underlying microstructural evolution tendencies, and their connections with corrosion behavior, will (in turn) produce a higher confidence in the extrapolated behavior of the container materials over time periods that are not feasibly tested in a laboratory. Additionally, the knowledge gained here may potentially lead to cost savings through development of safe and realistic design constraints and model assumptions throughout the entire disposal system.

University of Nevada, Las Vegas Repository

GumDrop at the DISRPT2019 Shared Task: A Model Stacking Approach to Discourse Unit Segmentation and Connective Detection

Author: Gong Mackenzie
Liu Yan
Liu Yang
Peng Siyao
Yu Yue
Zeldes Amir
Zhu Yilun
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2019

In this paper we present GumDrop, Georgetown University's entry at the DISRPT 2019 Shared Task on automatic discourse unit segmentation and connective detection. Our approach relies on model stacking, creating a heterogeneous ensemble of classifiers, which feed into a metalearner for each final task. The system encompasses three trainable component stacks: one for sentence splitting, one for discourse unit segmentation and one for connective detection. The flexibility of each ensemble allows the system to generalize well to datasets of different sizes and with varying levels of homogeneity. Comment: Proceedings of Discourse Relation Parsing and Treebanking (DISRPT2019)

arXiv.org e-Print Archive

Crossref

A Study of Speed of the Boundary Element Method as applied to the Realtime Computational Simulation of Biological Organs

Author: P Kirana Kumara
Publication date: 14/01/2014

In this work, the possibility of simulating biological organs in realtime using the Boundary Element Method (BEM) is investigated. Biological organs are assumed to follow linear elastostatic material behavior, and the constant boundary element is the element type used. First, a Graphics Processing Unit (GPU) is used to speed up the BEM computations to achieve realtime performance. Next, instead of the GPU, a computer cluster is used. Results indicate that BEM is fast enough to provide for realtime graphics if biological organs are assumed to follow linear elastostatic material behavior. Although the present work does not conduct any simulation using nonlinear material models, results from using the linear elastostatic material model imply that it would be difficult to obtain realtime performance if highly nonlinear material models that properly characterize biological organs are used. Although the use of BEM for the simulation of biological organs is not new, the results presented in the present study are not found elsewhere in the literature. Comment: preprint, draft, 2 tables, 47 references, 7 files. Codes that can solve three dimensional linear elastostatic problems using constant boundary elements (of triangular shape) while ignoring body forces are provided as supplementary files; codes are distributed under the MIT License in three versions: i) MATLAB version ii) Fortran 90 version (sequential code) iii) Fortran 90 version (parallel code)

arXiv.org e-Print Archive

Open Access Repository of IISc Research Publications

New Jersey History (NJH - E-Journal)