
10 research outputs found

Modeling Statistical Properties of Written Text

Author: A Clauset
A Saichev
A Sarkar
A-L Barabási
AK Joshi
Alessandro Flammini
B Liu
C Cattuto
C Elkan
C Manning
D de Solla Price
D Newman
DM Blei
E Alvarez-Lacalle
Enrico Scalas
F Menczer
Filippo Menczer
G Salton
GK Zipf
H Chen
HA Simon
HS Heaps
J Allan
J Kleinberg
J Pennebaker
JL Dolby
JS Adelman
K-I Goh
KW Church
M Jansche
M Porter
M. Ángeles Serrano
MA Nowak
MD Hauser
N Chomsky
QD Atkinson
R Albert
R Baeza-Yates
R Feldman
R Madsen
RH Baayen
S Chakrabarti
S Fortunato
SM Katz
T Griffiths
T Hofmann
TL Griffiths
VP Maslov
W Li
WS Murray
Y Yang
Publication venue: Public Library of Science
Publication date: 29/04/2009

Written text is one of the fundamental manifestations of human language, and the study of its universal regularities can give clues about how our brains process information and how we, as a society, organize and share it. Among these regularities, only Zipf's law has been explored in depth. Other basic properties, such as the existence of bursts of rare words in specific documents, have only been studied independently of each other and mainly by descriptive models. As a consequence, there is a lack of understanding of linguistic processes as complex emergent phenomena. Beyond Zipf's law for word frequencies, here we focus on burstiness, Heaps' law describing the sublinear growth of vocabulary size with the length of a document, and the topicality of document collections, which encode correlations within and across documents absent in random null models. We introduce and validate a generative model that explains the simultaneous emergence of all these patterns from simple rules. As a result, we find a connection between the bursty nature of rare words and the topical organization of texts and identify dynamic word ranking and memory across documents as key mechanisms explaining the non-trivial organization of written text. Our research can have broad implications and practical applications in computer science, cognitive science and linguistics.
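Two of the regularities named in this abstract can be measured directly on any tokenized text. The sketch below is only an illustration, not the authors' generative model: it assumes whitespace tokenization and a made-up toy sentence, and the function names (`rank_frequency`, `vocabulary_growth`, `heaps_exponent`) are hypothetical. It builds the rank-frequency table behind Zipf's law, tracks vocabulary size as tokens are read (the curve described by Heaps' law), and estimates the Heaps exponent with a least-squares fit on log-log axes.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Return (rank, word, count) triples, most frequent first (the Zipf's law view).

    Ties in count are broken alphabetically so the output is deterministic.
    """
    counts = Counter(tokens)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(rank, word, count) for rank, (word, count) in enumerate(ranked, start=1)]


def vocabulary_growth(tokens):
    """Return V(n), the number of distinct words seen after each of the first n tokens."""
    seen = set()
    growth = []
    for token in tokens:
        seen.add(token)
        growth.append(len(seen))
    return growth


def heaps_exponent(growth):
    """Least-squares slope of log V(n) against log n, i.e. beta in V(n) ~ K * n**beta.

    Needs at least two points; beta < 1 indicates sublinear vocabulary growth.
    """
    xs = [math.log(n) for n in range(1, len(growth) + 1)]
    ys = [math.log(v) for v in growth]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    tokens = "the cat sat on the mat and the dog sat on the cat".split()
    print(rank_frequency(tokens)[:3])  # [(1, 'the', 4), (2, 'cat', 2), (3, 'on', 2)]
    print(vocabulary_growth(tokens))   # [1, 2, 3, 4, 4, 5, 6, 6, 7, 7, 7, 7, 7]
    print(heaps_exponent(vocabulary_growth(tokens)))
```

On a toy sentence the fitted exponent means little; on a real corpus, a value clearly below 1 is what "sublinear growth" refers to. The log-log least-squares fit is a simple, common estimator and is not necessarily the one used in the paper.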

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Diposit Digital de la Universitat de Barcelona

Automatic summarization of voicemail messages using lexical and prosodic features

Author: Chen F.
Cordoba R.
Garofolo J.
Gotoh Y.
Hakkani-Tür D.
Hirschberg J.
Hori C.
Huang J.
Jansche M.
Kato Y.
Konstantinos Koumpis
Koumpis K.
Kubala F.
Maclay H.
Makhoul J.
Medan Y.
Morgan N.
Padmanabhan M.
Paksoy E.
Rohlicek J. R.
Saon G.
Scott M.
Shriberg E.
Steve Renals
Stevenson M.
Valenza R.
Walker M. A.
Warnke V.
Williams G.
Zechner K.
Zweig M. H.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2005

This article presents trainable methods for extracting principal content words from voicemail messages. The short text summaries generated are suitable for mobile messaging applications. The system uses a set of classifiers to identify the summary words, with each word described by a vector of lexical and prosodic features. We use an ROC-based algorithm, Parcel, to select input features (and classifiers). We have performed a series of objective and subjective evaluations using unseen data from two different speech recognition systems, as well as human transcriptions of voicemail speech.

CiteSeerX

Crossref

Edinburgh Research Archive

Edinburgh Research Explorer

Inference of string mappings for speech technology

Author: Jansche M.
Publication venue: Ohio State Univ.
Publication date: 01/01/2003

OhioLINK Electronic Thesis and Dissertation Center

MPG.PuRe

The zero-inflated negative binomial regression model with correction for misclassification: an example in caries research

Author: Dominique Declerck
Emmanuel Lesaffre
Gustafson P.
Jansche M.
Sakamoto Y.
Samuel M Mwalili
Vanobbergen J.
Publication venue: SAGE Publications

Crossref

A Classification of Bioinformatics Algorithms from the Viewpoint of Maximizing Expected Accuracy (MEA)

Author: Do C.B.
Gross S.S.
Jansche M.
Kiyoshi Asai
Michiaki Hamada
Nánási M.
Suzuki J.
Publication venue: Mary Ann Liebert, Inc.

Many estimation problems in bioinformatics are formulated as point estimation problems in a high-dimensional discrete space. In general, it is difficult to design reliable estimators for this type of problem, because the number of possible solutions is immense, which leads to an extremely low probability for every solution, even for the one with the highest probability. Therefore, maximum score and maximum likelihood estimators do not work well in this situation, although they are widely employed in a number of applications. Maximizing expected accuracy (MEA) estimation, in which accuracy measures of the target problem and the entire distribution of solutions are considered, is a more successful approach. In this review, we provide an extensive discussion of algorithms and software based on MEA. We describe how a number of algorithms used in previous studies can be classified from the viewpoint of MEA. We believe that this review will be useful not only for users wishing to utilize software to solve the estimation problems appearing in this article, but also for developers wishing to design algorithms on the basis of MEA.
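The contrast between picking the single most probable solution and maximizing expected accuracy can be made concrete on a toy case. The sketch below is not taken from the review: it assumes a made-up distribution over three-symbol binary strings, uses the number of matching positions (a Hamming-style score) as the accuracy measure, and the function names are hypothetical. Under that per-position accuracy, the MEA estimate can be found by choosing, position by position, the symbol with the highest marginal probability; a brute-force search over all candidates is included to confirm this.

```python
from itertools import product


def expected_accuracy(candidate, dist):
    """Expected number of positions where `candidate` matches a solution drawn from `dist`."""
    return sum(p * sum(c == s for c, s in zip(candidate, sol)) for sol, p in dist.items())


def map_estimate(dist):
    """The single most probable solution (a maximum-likelihood style point estimate)."""
    return max(dist, key=dist.get)


def mea_estimate(dist, alphabet="01"):
    """Brute-force MEA: the string over `alphabet` with the highest expected accuracy."""
    length = len(next(iter(dist)))
    candidates = ("".join(chars) for chars in product(alphabet, repeat=length))
    return max(candidates, key=lambda c: expected_accuracy(c, dist))


def marginal_estimate(dist):
    """Pick the most probable symbol at each position from the marginal distribution.

    Because expected per-position accuracy is a sum of per-position marginals,
    this coincides with MEA for that accuracy measure.
    """
    length = len(next(iter(dist)))
    result = []
    for i in range(length):
        weight = {}
        for sol, p in dist.items():
            weight[sol[i]] = weight.get(sol[i], 0.0) + p
        result.append(max(weight, key=weight.get))
    return "".join(result)


if __name__ == "__main__":
    # Hypothetical distribution: "000" is the most probable single solution,
    # but most probability mass has a 1 in the first position.
    dist = {"000": 0.4, "110": 0.3, "101": 0.3}
    print(map_estimate(dist))       # 000
    print(mea_estimate(dist))       # 100
    print(marginal_estimate(dist))  # 100
```

Here "100" is never itself a likely outcome, yet its expected accuracy (2.0 matching positions) beats that of the most probable string "000" (1.8). Brute-force enumeration is only feasible for tiny spaces; real MEA methods rely on problem-specific dynamic programming over marginals.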

Crossref

PubMed Central

Grammar logicised: relativisation

Author: A.E. Ades
E. Engdahl
F.J. Newmeyer
G. Morrill
Glyn Morrill
I.A. Sag
J. Lambek
J. Sprouse
J.-Y. Girard
J.M. Andreoli
M. Jansche
M. Kanazawa
M. Steedman
P. Deane
P. Hofmeister
P.M. Postal
Y. Kubota
Publication venue: Springer Science and Business Media LLC

Crossref

A self-contained approach to Mellin transform analysis for square integrable functions; applications

Author: Betero M.
Butzer P.L.
Hewitt E.
Higgins J.R.
Jansche Stefan
Naylor D.
Ostrowsky N.
Paul L. Butzer
Sneddon I.N.
Zayed A.I.
Publication venue: Informa UK Limited

Crossref

Compositional Machine Transliteration

Author: A. Kumaran
Goto I.
Jansche M.
Jiampojamarn S.
Kang B. J.
Lafferty J. D.
Lee J. S.
Li H.
Malik M. G. A.
Mitesh M. Khapra
Nakov P.
Oh J.-H.
Pushpak Bhattacharyya
Stalls B. G.
Udupa R.
Veeravalli S.
Publication venue: Association for Computing Machinery (ACM)

Crossref

ABRAXAS1 orchestrates BRCA1 activities to counter genome destabilizing repair pathways: lessons from breast cancer patients

Author: Deniz M. (Miriam)
Faust U. (Ulrike)
Heitmeir B. (Benedikt)
Jansche R. (Rebecca)
Merk T. (Tatjana)
Peltoketo H. (Hellevi)
Pospiech H. (Helmut)
Pylkäs K. (Katri)
Riess A. (Angelika)
Roggia C. (Cristiana)
Sachsenweger J. (Juliane)
Schroeder C. (Christopher)
Tzschach A. (Andreas)
Wiesmüller L. (Lisa)
Winqvist R. (Robert)
Publication venue: Springer Nature
Publication date: 01/01/2023

It has been well established that mutations in BRCA1 and BRCA2, which compromise functions in DNA double-strand break repair (DSBR), confer hereditary breast and ovarian cancer risk. Importantly, mutations in these genes explain only a minor fraction of the hereditary risk and of the subset of DSBR-deficient tumors. Our screening efforts identified two truncating germline mutations in the gene encoding the BRCA1 complex partner ABRAXAS1 in German early-onset breast cancer patients. To unravel the molecular mechanisms triggering carcinogenesis in these carriers of heterozygous mutations, we examined DSBR functions in patient-derived lymphoblastoid cells (LCLs) and in genetically manipulated mammary epithelial cells. Using these strategies, we were able to demonstrate that these truncating ABRAXAS1 mutations exerted dominant effects on BRCA1 functions. Interestingly, we did not observe haploinsufficiency regarding homologous recombination (HR) proficiency (reporter assay, RAD51 foci, PARP-inhibitor sensitivity) in mutation carriers. However, the balance was shifted toward the use of mutagenic DSBR pathways. The dominant effect of truncated ABRAXAS1 devoid of the C-terminal BRCA1 binding site can be explained by retention of the N-terminal interaction sites for other BRCA1-A complex partners like RAP80. In this case, BRCA1 was channeled from the BRCA1-A to the BRCA1-C complex, which induced single-strand annealing (SSA). Further truncation, additionally deleting the coiled-coil region of ABRAXAS1, unleashed excessive DNA damage responses (DDRs), de-repressing multiple DSBR pathways including SSA and non-homologous end-joining (NHEJ). Our data reveal de-repression of low-fidelity repair activities as a common feature of cells from patients with heterozygous mutations in genes encoding BRCA1 and its complex partners.

University of Oulu Repository - Jultika

Formal Is Natural: Toward an Ecological Phonology

Author: A. Dirksen
D. Gibbon
Dafydd Gibbon
E.-A. Urua
G. Clements
J. Bachan
J. Barlow
J. Carson-Berndsen
K. Beesley
K. Dziubalska-Kołaczyk
L. Hyman
L. Karttunen
M. Jansche
M. Liberman
N. Govender
R. Kaplan
R. Tucker
S. Bromberger
S. Hertz
T. Vennemann
Publication venue: Walter de Gruyter GmbH
Publication date: 01/01/2009

Gibbon D. Formal is natural: toward an ecological phonology. Poznań Studies in Contemporary Linguistics. 2009;45(1):73-102.

Natural Phonology (NP) has a history of opposition to abstractness, to generative linguistics and to formalist approaches, and differs from these in its strong focus on external rather than distributional, structural evidential domains. But evidence domains are orthogonal to empirical and formal methods, and, like formalist theories such as Optimality Theory (OT), the pedigree of NP includes structuralist and generative phonology. In an analysis which is sympathetic to both NP and OT, this contribution examines the relation between NP and OT, analyses a classic OT case study of syllabification in Tashlhiyt Berber, and presents computational linguistic analyses of this case, as well as of English syllable phonotactics and of tone language tonotactics. The contribution advocates an opening towards these methods, and the adoption of explicit, consistent, precise, complete and sound formal criteria for theories, which enable an exact interpretation in terms of operational models and computational implementations, and practical applications. The general frame of reference is the Ecological Cycle in theory formation, from clarification of the domain through theory construction, interpretation with a model, evaluation and application in the original evidential domain, with payback to the language community from which the evidence was gained.

Crossref

Publications at Bielefeld University