
    Generating indicative-informative summaries with SumUM

    We present and evaluate SumUM, a text summarization system that takes a raw technical text as input and produces an indicative-informative summary. The indicative part of the summary identifies the topics of the document, and the informative part elaborates on some of these topics according to the reader's interest. SumUM motivates the topics, describes entities, and defines concepts. This is accomplished through a process of shallow syntactic and semantic analysis, concept identification, and text regeneration. The system is a first step toward exploring the issue of dynamic summarization. Our method was developed through the study of a corpus of abstracts written by professional abstractors. Relying on human judgment, we have evaluated the indicativeness, informativeness, and text acceptability of the automatic summaries. The results thus far indicate good performance compared with other summarization technologies.

    Thematic Annotation: extracting concepts out of documents

    Contrary to standard approaches to topic annotation, the technique used in this work does not rely centrally on some form of (possibly statistical) keyword extraction. Instead, the proposed annotation algorithm uses a large-scale semantic database, the EDR Electronic Dictionary, which provides a concept hierarchy based on hyponym and hypernym relations. This concept hierarchy is used to generate a synthetic representation of the document by aggregating the words present in topically homogeneous document segments into the set of concepts that best preserves the document's content. The technique takes a previously unexplored approach to topic selection. Instead of using semantic similarity measures based on a semantic resource, it processes the resource itself to extract the part of the conceptual hierarchy relevant to the document content. This conceptual hierarchy is then searched for the set of concepts that best represents the topics discussed in the document. Notably, the algorithm can extract generic concepts that are not directly present in the document. Comment: Technical report EPFL/LIA. 81 pages, 16 figures.
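The following minimal Python sketch makes concrete the idea of aggregating document words into more generic concepts. It makes three assumptions. First, a tiny hypothetical child-to-hypernym map stands in for the EDR concept hierarchy. Second, the whole document is treated as one segment. Third, concepts are chosen by a simple illustrative rule: visit concepts from most to least specific, and keep any concept that subsumes at least `min_support` not-yet-covered document words. This rule is not the report's selection algorithm.

```python
# Hypothetical toy hierarchy (child -> hypernym); a stand-in for EDR, not real EDR data.
HYPERNYM = {
    "cat": "feline", "lion": "feline", "feline": "mammal",
    "dog": "canine", "wolf": "canine", "canine": "mammal",
    "sparrow": "bird", "bird": "animal",
    "mammal": "animal", "animal": "entity",
}


def ancestors(concept):
    """Hypernyms above `concept`, nearest first."""
    chain = []
    while concept in HYPERNYM:
        concept = HYPERNYM[concept]
        chain.append(concept)
    return chain


def relevant_hierarchy(words):
    """Restrict the hierarchy to concepts reachable from the document words,
    mapping each concept to the set of distinct document words it subsumes."""
    covers = {}
    for w in set(words):
        for c in [w] + ancestors(w):
            covers.setdefault(c, set()).add(w)
    return covers


def select_concepts(words, min_support=2):
    """Illustrative rule: visit concepts from most to least specific (ties broken
    alphabetically) and keep one if it subsumes at least `min_support`
    document words that no previously kept concept covers."""
    covers = relevant_hierarchy(words)
    order = sorted(covers, key=lambda c: (-len(ancestors(c)), c))
    uncovered = set(words)
    chosen = []
    for c in order:
        newly = covers[c] & uncovered
        if len(newly) >= min_support:
            chosen.append(c)
            uncovered -= newly
    return chosen


doc = ["cat", "lion", "cat", "dog", "wolf", "sparrow"]
print(select_concepts(doc))
```

With this toy hierarchy, the document words yield `canine` and `feline`, neither of which occurs in the text. `sparrow` stays uncovered because no concept groups it with another document word.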

    A network model of interpersonal alignment in dialog

    In dyadic communication, both interlocutors adapt to each other linguistically; that is, they align interpersonally. In this article, we develop a framework for modeling interpersonal alignment in terms of the structural similarity of the interlocutors’ dialog lexica. This is done by means of so-called two-layer time-aligned network series, that is, a time-adjusted graph model. The graph model is partitioned into two layers, so that the interlocutors’ lexica are captured as subgraphs of an encompassing dialog graph. Each constituent network of the series is updated utterance by utterance. Thus, both the inherent bipartition of dyadic conversations and their gradual development are modeled. The notion of alignment is then operationalized within a quantitative model of structure formation based on the mutual information of the subgraphs that represent the interlocutors’ dialog lexica. By adapting and further developing several models of complex network theory, we show that dialog lexica evolve as a novel class of graphs that has not been considered before in the area of complex (linguistic) networks. Additionally, we show that our framework allows dialogs to be classified according to their alignment status. To the best of our knowledge, this is the first approach to measuring alignment in communication that explores the similarities of graph-like cognitive representations. Keywords: alignment in communication; structural coupling; linguistic networks; graph distance measures; mutual information of graphs; quantitative network analysis
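The Python sketch below shows one simple way to make "mutual information of two subgraphs" concrete. It is a swapped-in illustration, not the article's time-aligned network-series model. Each interlocutor's lexicon is assumed to be a plain list of undirected word-pair edges, with no utterance-wise updates. For every pair of words occurring in either graph, "this pair is linked" is treated as one binary variable per layer. Mutual information between the two layers is then estimated from simple frequency counts.

```python
import math
from collections import Counter
from itertools import combinations


def edge_indicators(edges_a, edges_b):
    """Pair up 0/1 edge indicators of two undirected lexicon graphs over every
    unordered pair of words that occurs in either graph."""
    a = {frozenset(e) for e in edges_a}
    b = {frozenset(e) for e in edges_b}
    nodes = sorted({n for e in a | b for n in e})
    pairs = [frozenset(p) for p in combinations(nodes, 2)]
    return [(int(p in a), int(p in b)) for p in pairs]


def mutual_information(observations):
    """Plug-in estimate of I(X;Y) in bits from a list of (x, y) observations."""
    n = len(observations)
    joint = Counter(observations)
    px = Counter(x for x, _ in observations)
    py = Counter(y for _, y in observations)
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )


# Hypothetical toy lexica for two speakers.
speaker_a = [("red", "ball"), ("ball", "game"), ("game", "win")]
speaker_b = [("red", "ball"), ("ball", "game"), ("team", "win")]
print(mutual_information(edge_indicators(speaker_a, speaker_b)))
```

Absent edges count as observations too. As a result, the score reflects agreement on both links and non-links. It also registers systematic disagreement, so it measures dependence between the two lexica rather than overlap as such.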

    Is breathing sensitive to the communication partner?

    This paper investigates breathing profiles in eleven female speakers (subjects) talking successively with the same two female partners. The breathing kinematics of both interlocutors were recorded synchronously with two inductance plethysmographs. To understand the role of breathing in dialogue, we analyzed changes in breathing pauses according to the main dialogue events (listening, back-channels, turn starts, and turn continuations). Breathing and syllable rates were also compared between partners and subjects. Inhalations and related pauses were shorter before a turn continuation than before a turn start. The delay between speech offset in a breathing cycle and the onset of the next inhalation was longer when speaker and listener swapped roles than when the speaker continued the turn. This was observed for both partners and subjects. The partners differed in their breathing and articulation rates, but the two rates were not clearly correlated. In agreement with previous work, the current study shows that breathing kinematics is strongly linked to dialogue events. However, it does not show any clear effect of the partner on the speaker's breathing. This last result is discussed in relation to methodological aspects.

    Spoken content retrieval: A survey of techniques and technologies

    Speech media, that is, digital audio and video containing spoken content, have blossomed in recent years. Large collections are accruing on the Internet as well as in private and enterprise settings. This growth has motivated extensive research on techniques and technologies that facilitate reliable indexing and retrieval. Spoken content retrieval (SCR) requires combining audio and speech processing technologies with methods from information retrieval (IR). SCR research initially investigated planned speech structured in document-like units. It has since shifted its focus to more informal spoken content produced spontaneously, outside the studio and in conversational settings. This survey provides an overview of the field of SCR. It covers component technologies, the relationship of SCR to text IR and automatic speech recognition, and user interaction issues. It is aimed at researchers with backgrounds in speech technology or IR who seek deeper insight into how these fields are integrated to support research and development, thus addressing the core challenges of SCR.

    Extracting (good) discourse examples from an oral specialised corpus of wine tasting interactions

    This article outlines the semi-automated extraction of dictionary examples used in compiling a professional online dictionary of wine tasting. Named OenoLex Bourgogne, this dictionary was started in response to a demand for a lexicographic information tool from the French wine industry of Burgundy, namely the Bureau Interprofessionnel des Vins de Bourgogne.

    Using term clouds to represent segment-level semantic content of podcasts

    Spoken audio, like any time-continuous medium, is notoriously difficult to browse or skim without an interface that provides semantically annotated jump points telling users where to listen in. Having human annotators create time-aligned metadata is prohibitively expensive. This motivates the investigation of segment-level semantic content representations based on transcripts generated by automatic speech recognition (ASR). This paper examines the feasibility of using term clouds to give users a structured representation of the semantic content of podcast episodes. Podcast episodes are visualized as a series of sub-episode segments, each represented by a term cloud derived from an ASR transcript. The quality of segment-level term clouds is measured quantitatively, and their utility is investigated in a small-scale user study based on human-labeled segment boundaries. The segment-level clouds generated from ASR transcripts prove useful. We therefore examine an adaptation of text tiling techniques to speech, so that segments can be generated as part of a fully automated indexing and structuring system for browsing spoken audio. Results demonstrate that the generated segments are comparable to human-selected segment boundaries.
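Below is a Python sketch of two steps: cutting a transcript into topical segments, then summarizing each segment as a term cloud. It rests on several assumptions:

- The ASR output has already been split into sentence-like units.
- A small hypothetical stopword list is used.
- Segmentation follows the general TextTiling recipe: block similarity at each gap, depth scores, and boundaries only at valleys deeper than the mean minus half a standard deviation.
- Cloud weights are tf-idf with a smoothed idf computed across segments.

It is not the paper's exact adaptation of text tiling or its cloud weighting.

```python
import math
import re
from collections import Counter
from statistics import mean, pstdev

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "on", "of", "and", "to", "is", "in"}


def tokens(sentence):
    """Lowercased content words of one transcript unit."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS]


def cosine(c1, c2):
    dot = sum(v * c2[t] for t, v in c1.items())
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0


def gap_similarities(sents, block):
    """Cosine similarity of the `block` units before and after each gap."""
    toks = [tokens(s) for s in sents]
    sims = []
    for i in range(1, len(sents)):
        left = Counter(w for t in toks[max(0, i - block):i] for w in t)
        right = Counter(w for t in toks[i:i + block] for w in t)
        sims.append(cosine(left, right))
    return sims


def depth_scores(sims):
    """Climb to the nearest peak on each side of a gap and sum the two drops."""
    depths = []
    for j, s in enumerate(sims):
        left = j
        while left > 0 and sims[left - 1] >= sims[left]:
            left -= 1
        right = j
        while right < len(sims) - 1 and sims[right + 1] >= sims[right]:
            right += 1
        depths.append((sims[left] - s) + (sims[right] - s))
    return depths


def segment(sents, block=2):
    """Split at valleys whose depth exceeds mean - std/2; needs at least 2 units."""
    sims = gap_similarities(sents, block)
    depths = depth_scores(sims)
    cutoff = mean(depths) - pstdev(depths) / 2
    last = len(sims) - 1
    bounds = [
        j + 1  # gap j lies between unit j and unit j + 1
        for j, d in enumerate(depths)
        if d > cutoff
        and (j == 0 or sims[j - 1] > sims[j])
        and (j == last or sims[j + 1] > sims[j])
    ]
    cuts = [0] + bounds + [len(sents)]
    return [list(range(a, b)) for a, b in zip(cuts, cuts[1:])]


def term_clouds(sents, segments, k=3):
    """Top-k (term, weight) pairs per segment: tf times smoothed idf over segments."""
    tfs = [Counter(w for i in seg for w in tokens(sents[i])) for seg in segments]
    df = Counter(w for tf in tfs for w in tf)
    n = len(tfs)
    clouds = []
    for tf in tfs:
        scored = [(w, c * (math.log((1 + n) / (1 + df[w])) + 1)) for w, c in tf.items()]
        scored.sort(key=lambda wc: (-wc[1], wc[0]))
        clouds.append(scored[:k])
    return clouds


SENTENCES = [
    "the cat sat on the mat", "a cat likes the warm mat", "the cat sleeps on a mat",
    "the rocket engine burns fuel", "rocket fuel feeds the engine", "the engine lifts the rocket",
]
segments = segment(SENTENCES)
for seg, cloud in zip(segments, term_clouds(SENTENCES, segments)):
    print(seg, [w for w, _ in cloud])
```

On these six toy sentences, the single boundary falls before sentence 3. The clouds' top terms are `cat`, `mat`, `likes` for the first segment and `engine`, `rocket`, `fuel` for the second. In an interface, the weights would typically set each term's font size.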

    Why stagnant? Behind the scenes in Indonesia's reformed state asset management policies

    This study seeks to answer the question “why is policy innovation in Indonesia, in particular reformed state asset management laws and regulations, stagnant?” It takes an empirical and qualitative approach, identifying and exploring potential influences that impede the full and equal implementation of these laws and regulations. The policies and regulations governing state asset management have become an urgent question in many countries worldwide (Conway, 2006; Dow, Gillies, Nichols, & Polen, 2006; Kaganova, McKellar, & Peterson, 2006; McKellar, 2006b). Awareness has grown of the complex and crucial role that state assets play in public service provision. Indonesia is an example of such a country, having introduced a ‘big-bang’ reform of state asset management laws, policies, regulations, and technical guidelines. Two main reasons propelled this policy innovation: a) common worldwide challenges in state asset management practices, such as incomplete information systems, accountability, and governance adherence and conceptualisation (Kaganova, McKellar, & Peterson, 2006); and b) unfavourable state asset audit results in all regional governments across Indonesia. The latter reason is emphasised, as the Indonesian government admits to past neglect in ensuring efficiency and best practice in its state asset management. Before the reform there was a euphoria of building and developing state assets and public infrastructure to support the government programs of the day. This euphoria produced high growth in Indonesia, yet little attention appears to have been paid to how the assets bought or built were managed. Until 2003-2004, state asset management was considered minimal. Assets were inventoried manually, and public sector accounting standards and financial reporting standards were incomplete (Hadiyanto, 2009). 
During that time, transparency, accountability, and the maintenance of state assets were not the main focus of either the government or society itself (Hadiyanto, 2009). Indonesia demonstrated its enthusiasm for reforming state asset management policies and practices by establishing the Directorate General of State Assets in 2006. The Directorate General of State Assets has stressed its new direction in state asset management laws and policies by introducing Republic of Indonesia Law Number 38 Year 2008. This amended regulation overrules Republic of Indonesia Law Number 6 Year 2006 on Central/Regional Government State Asset Management (Hadiyanto, 2009c). Law Number 38/2008 aims to further embed good governance principles and puts forward a ‘highest and best use of assets’ principle in state asset management (Hadiyanto, 2009a). The study takes a qualitative case study approach with triangulated data collection. The methods are document analysis (all relevant state asset management laws, regulations, policies, technical guidelines, and external audit reports), semi-structured interviews, and on-site observation. The empirical data cover a sample of four Indonesian regional governments and 70 interviews conducted between January and July 2010. The data were analysed thematically to identify common influences on, and challenges to, policy innovation within Indonesia. Drawing on these empirical data, the study explores specific influences impeding state asset management reform, answering the question of why innovative policy implementation is stagnant. 
An in-depth analysis of each factor influencing state asset management reform, together with the interviewees’ opinions on each factor, suggests the potential of an ‘excuse rhetoric’. In this rhetoric, the identified factors serve as a smokescreen, or as myths that public policy makers and implementers believe in, to explain the stagnation of innovative policy. This study offers insights to Indonesian policy makers interested in ensuring the conceptualisation and full implementation of innovative policies, particularly, although not exclusively, in the context of state asset management practices.

    Prosodic phrasing, pitch range, and word order variation in Murrinhpatha

    Like many Indigenous Australian languages, Murrinhpatha has flexible word order with no apparent configurational syntax. We analyzed an experimental corpus of Murrinhpatha utterances for associations between different thematic role orders, intonational phrasing patterns, and pitch downtrends. We found that initial constituents (Agents or Patients) tend to carry the highest pitch targets (HiF0), followed by patterns of downstep and declination. Sentence-final verbs always have lower HiF0 values than either initial or medial Agents or Patients. Thematic role order does not influence intonational patterns. The results suggest that Murrinhpatha has positional prosody, although final nominals can disrupt global pitch downtrends regardless of thematic role.