
40 research outputs found

Using Text Segmentation to Enhance the Cluster Hypothesis

Author: B. Levrat
F. Saubion
S. Lamprier
T. Amghar
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2008

An alternative way to tackle Information Retrieval, called Passage Retrieval, considers text fragments independently rather than assessing the global relevance of documents. In such a context, a document is not penalized when its relevant information is surrounded by parts of text that deviate from the topic of interest. In this paper, we propose to study the impact of taking these text fragments into account in a document clustering process. The use of clustering in Information Retrieval is mainly supported by the cluster hypothesis, which states that relevant documents tend to be more similar to each other than to non-relevant documents, so that a clustering process is likely to gather them together. Previous experiments have shown that clustering the top-ranked documents retrieved in response to a user's query allows Information Retrieval systems to improve their effectiveness. In the clustering process used in these studies, documents were considered as a whole. However, the assumption that a document can address more than one topic or concept may also affect the document clustering process. Considering the passages of the retrieved documents separately may allow the creation of clusters that better represent the topics addressed. Different approaches have been assessed, and the results show that using text fragments in the clustering process may indeed prove worthwhile.
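The abstract does not say how passages are built or which clustering algorithm is used. The following Python sketch therefore rests on assumptions of its own: passages are fixed-size word windows, passages are compared as raw term-frequency vectors with cosine similarity, and they are grouped by a simple single-pass (leader) clustering with a hypothetical threshold. It only illustrates the key effect described above: a document covering two topics can contribute passages to two different clusters.

```python
import math
from collections import Counter

def split_passages(text, size=5):
    """Split a document into consecutive windows of `size` words (hypothetical passage definition)."""
    words = text.lower().split()
    return [words[i:i + size] for i in range(0, len(words), size)] or [[]]

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(v * b[t] for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cluster_passages(docs, size=5, threshold=0.3):
    """Single-pass (leader) clustering of passages.

    Each passage joins the most similar existing cluster if the cosine similarity
    reaches `threshold` (an assumed value); otherwise it starts a new cluster.
    Returns, for each cluster, the set of document ids whose passages it contains.
    """
    clusters = []  # list of (centroid term counts, set of document ids)
    for doc_id, text in enumerate(docs):
        for words in split_passages(text, size):
            vec = Counter(words)
            best, best_sim = None, threshold
            for cluster in clusters:
                sim = cosine(vec, cluster[0])
                if sim >= best_sim:
                    best, best_sim = cluster, sim
            if best is None:
                clusters.append((vec, {doc_id}))
            else:
                best[0].update(vec)  # centroid kept as summed term counts
                best[1].add(doc_id)
    return [doc_ids for _, doc_ids in clusters]

docs = [
    "apple banana fruit apple banana goal match team goal match",
    "goal match team score goal apple banana fruit juice apple",
]
print(cluster_passages(docs))  # [{0, 1}, {0, 1}]: both documents feed both topic clusters
```

Clustering whole documents would have to put each of these mixed-topic documents in a single group; working on passages lets each topic form its own cluster.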

Okina

Hal-Diderot

Bir Procrutes Hikâyesi: Türkçe Fransızca Gibi İşlenirmi ? [A Procrustes Story: Is Turkish Processed Like French?]

Author: B. Levrat
B. Parlak
O. Senemoglu
S. Turhan
T. Amghar
Publication venue: Taylor & Francis (Routledge)
Publication date: 01/01/2013

International audience

Okina

Hal-Diderot

Managing Genetic Algorithm Parameters to Improve SegGen - A Thematic Segmentation Algorithm

Author: B. Levrat
S. Saygili
T. Acarman
T. Amghar
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2013

SegGen [1] is a linear thematic segmentation algorithm based on a variant of the Strength Pareto Evolutionary Algorithm [2]. It aims at optimizing the two criteria of Salton's [3] definition of segments: a segment is a part of text whose internal cohesion and dissimilarity with its adjacent segments are maximal. This paper describes improvements made to SegGen by tuning the genetic algorithm parameters according to the evolution of the quality of the generated populations. Two kinds of reasons motivate this tuning, and both have been implemented here. First, as measured by global criteria of population quality, the overall quality of the generated populations increases as the process goes on, so it seems reasonable to set parameter values and define new operators that favor intensification and reduce diversification in the search process. Second, since the individuals in the populations are plausible segmentations, it seems reasonable, when computing the sentence similarities involved in the two criteria to be optimized, to weight each sentence of the current segmentation according to its distance to the boundaries of the segment it belongs to. Although this tuning of the algorithm's parameters currently rests on experimentally based estimations, the first results are promising.
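The abstract does not give the weighting function used for sentences near segment boundaries, nor its direction. As a hedged illustration, the Python sketch below assumes that sentences far from a segment's boundaries are more representative of its topic and gives them a larger weight through a hypothetical linear scheme (`boundary_weights`), then uses these weights in a weighted internal-cohesion score computed over a precomputed sentence-similarity matrix.

```python
def boundary_weights(segment_len):
    """Hypothetical weighting: a sentence's weight grows linearly with its distance
    to the nearest boundary of its segment (edge sentences get the smallest weight)."""
    dists = [min(i, segment_len - 1 - i) for i in range(segment_len)]
    d_max = max(dists)
    return [(1 + d) / (1 + d_max) for d in dists]

def weighted_cohesion(sim, start, end):
    """Weighted mean pairwise similarity of sentences start..end-1 of one segment.

    `sim` is a precomputed sentence-similarity matrix; each pair is weighted by the
    product of the two sentences' boundary weights.
    """
    n = end - start
    if n < 2:
        return 1.0  # convention assumed here for single-sentence segments
    w = boundary_weights(n)
    num = den = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            pair_w = w[i] * w[j]
            num += pair_w * sim[start + i][start + j]
            den += pair_w
    return num / den

# Sentence 0 sits on the segment edge and is unrelated to the other two.
sim = [[1.0, 0.0, 0.0],
       [0.0, 1.0, 1.0],
       [0.0, 1.0, 1.0]]
print(weighted_cohesion(sim, 0, 3))  # 0.4, versus 1/3 for the unweighted mean
```

Under this assumption, a misplaced sentence at the edge of a segment lowers the cohesion score less than it would with uniform weights, which makes the criterion less sensitive to slightly misplaced boundaries.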

Okina

Hal-Diderot

RQM description of the charge form factor of the pion and its asymptotic behavior

Author: A. Amghar
A. Youanc Le
A.F. Krutov
A.V. Efrimov
B. Bakamjian
B. Bakker
B. Desplanques
C. Alabiso
C.D. Roberts
C.J. Bebek
D. Merten
F. Cardarelli
G.P. Lepage
G.R. Farrar
H.-M. Choi
J. Carbonell
J. Carlson
J. He
J. Volmer
J.L. Basdevant
J.P.B.C. Melo de
J.P.C.B. Melo de
N. Isgur
P. Maris
P.A.M. Dirac
P.L. Chung
Q.B. Li
R. Tarrach
S. Godfrey
S. Simula
S.B. Gerasimov
S.J. Brodsky
S.N. Sokolov
S.R. Amendolia
T. Horn
T.W. Allen
V. Bernard
V. Braguta
V. Tadevosyan
W.H. Klink
Publication venue: Springer Science and Business Media LLC
Publication date: 10/06/2009

The pion charge and scalar form factors, $F_1(Q^2)$ and $F_0(Q^2)$, are first calculated in different forms of relativistic quantum mechanics. This is done using the solution of a mass operator that contains both confinement and one-gluon-exchange interactions. Results of calculations based on a one-body current are compared to experiment for the charge form factor. As could be expected, the point-form results, as well as the instant- and front-form ones in a parallel momentum configuration, fail to reproduce experiment. The other results, corresponding to a perpendicular momentum configuration (instant form in the Breit frame and front form with $q^+=0$), do much better. The comparison of charge and scalar form factors shows that the spin-1/2 nature of the constituents plays an important role. Since only the last set of results represents a reasonable basis for improving the description of the charge form factor, this set is then discussed with regard to the asymptotic QCD power-law behavior $Q^{-2}$. The contribution of two-body currents to achieving the right power law is considered, while the scalar form factor, $F_0(Q^2)$, is shown to have the right power-law behavior in any case. The low-$Q^2$ behavior of the charge form factor and the pion decay constant are also discussed.

Comment: 30 pages, 10 figures
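Whether a form factor follows the $Q^{-2}$ power law can be checked numerically by estimating its local exponent $n = d\ln F/d\ln Q$, which should approach $-2$ at large $Q^2$. The Python sketch below is not the paper's relativistic quantum mechanics calculation: it applies this check to a hypothetical monopole test function $F(Q^2) = 1/(1+Q^2/m^2)$ with an assumed value of $m^2$, chosen only because its exponent is known analytically to be $-2x/(1+x)$ with $x = Q^2/m^2$.

```python
import math

def monopole(q2, m2=0.6):
    """Hypothetical monopole form factor F(Q^2) = 1 / (1 + Q^2/m^2), with an assumed m^2 in GeV^2.
    Not the result of the paper; used only as a test function."""
    return 1.0 / (1.0 + q2 / m2)

def effective_power(f, q2, eps=1e-4):
    """Estimate the local exponent n in F ~ Q^n, i.e. d ln F / d ln Q, by a central difference in ln Q."""
    ln_q = 0.5 * math.log(q2)
    q2_plus = math.exp(2 * (ln_q + eps))
    q2_minus = math.exp(2 * (ln_q - eps))
    return (math.log(f(q2_plus)) - math.log(f(q2_minus))) / (2 * eps)

# The exponent goes from about 0 at low Q^2 toward -2 (the Q^{-2} law) at large Q^2.
for q2 in (0.01, 1.0, 100.0, 1e4):
    print(q2, round(effective_power(monopole, q2), 3))
```

The same `effective_power` helper could be applied to any tabulated or computed $F_1(Q^2)$ to see at which momentum transfers the asymptotic regime is reached.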

arXiv.org e-Print Archive

HAL-IN2P3

Crossref

Hal - Université Grenoble Alpes

EDP Sciences OAI-PMH repository (1.2.0)

Classification en recherche d'information : Utilisation de segments thématiques [Classification in Information Retrieval: Using Thematic Segments]

Author: B. Levrat
F. Saubion
S. Lamprier
T. Amghar
Publication date: 01/01/2008

Okina

Traveling Among Clusters: A Way to Reconsider the Benefits of the Cluster Hypothesis

Author: B. Levrat
F. Saubion
S. Lamprier
T. Amghar
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2010

Relying on the Cluster Hypothesis, which states that relevant documents tend to be more similar to each other than to non-relevant documents, most information retrieval systems that organize search results as a set of clusters seek to gather all relevant documents in the same cluster. We propose here to reconsider the benefits of the resulting concentration of relevant information. Contrary to what is commonly assumed, we believe that systems which aim to distribute the relevant documents over different clusters, being more likely to highlight different aspects of the subject, may be at least as useful to the user as systems gathering all relevant documents in a single group. Since existing evaluation measures tend to strongly favor the latter systems, we first investigate ways to assess more fairly the ability to reach the relevant information from the list of cluster descriptions. Finally, we show that systems distributing the relevant information over different clusters may actually provide better information access than classical systems.
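The abstract contrasts systems that concentrate relevant documents in one cluster with systems that spread them over several clusters, without giving the evaluation measure it proposes. The Python sketch below only quantifies that contrast with two hypothetical indicators (`relevant_spread` is an illustrative name); it is not the measure defined in the paper.

```python
def relevant_spread(clusters, relevant):
    """Describe how relevant documents are spread over clusters.

    Returns (number of clusters containing at least one relevant document,
    share of all relevant documents found in the single best cluster).
    Both indicators are hypothetical illustrations, not the paper's measure.
    """
    counts = [len(set(c) & relevant) for c in clusters]
    clusters_hit = sum(1 for n in counts if n > 0)
    best_share = max(counts, default=0) / len(relevant) if relevant else 0.0
    return clusters_hit, best_share

relevant = {1, 2, 4}
gathering = [[1, 2, 4], [3, 5, 6]]       # all relevant documents in one cluster
distributing = [[1, 3], [2, 5], [4, 6]]  # one relevant document per cluster
print(relevant_spread(gathering, relevant))     # (1, 1.0)
print(relevant_spread(distributing, relevant))  # (3, 0.3333333333333333)
```

A measure that rewards only the share held by the best cluster rates the first system far higher, even though, as the abstract argues, the second may expose more aspects of the subject to the user.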

HAL Descartes

Okina

Hal-Diderot

Toward a More Global and Coherent Segmentation of Texts

Author: B. Levrat
F. Saubion
S. Lamprier
T. Amghar
Publication venue: Informa UK Limited
Publication date: 01/01/2008

The automatic text segmentation task consists of identifying the most important thematic breaks in a document in order to cut it into homogeneous passages. Text segmentation has motivated a large amount of research. We focus here on statistical approaches that rely on an analysis of the distribution of words in the text. Usually, texts are segmented sequentially on the basis of very local clues. However, such an approach prevents the text from being considered globally, particularly regarding the degree of granularity adopted for expressing the different topics it addresses. We therefore propose two new segmentation algorithms, ClassStruggle and SegGen, which use criteria that render global views of texts. ClassStruggle is based on an initial clustering of the sentences of the text, which allows similarities to be considered within a group rather than individually. It relies on the distribution of the occurrences of the members of each class to segment the texts. SegGen evaluates potential segmentations of the whole text using a genetic algorithm. It searches for a segmentation that optimizes two criteria: maximizing the internal cohesion of the segments and minimizing the similarity between adjacent segments. According to experimental results, both approaches appear to be very competitive compared to existing methods.
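The two criteria SegGen optimizes, and the Pareto comparison that a multi-objective genetic algorithm relies on, can be sketched in Python as follows. The exact similarity measure and aggregation used by SegGen are not given here, so the sketch assumes term-frequency vectors with cosine similarity, cohesion defined as the mean similarity of each sentence to its own segment, and dissimilarity defined as the mean of (1 - cosine) between adjacent segments. The genetic operators themselves (selection, crossover, mutation) are omitted.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(v * b[t] for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def criteria(sentences, boundaries):
    """Score a candidate segmentation on two criteria to be maximized.

    sentences: list of token lists; boundaries: sorted indices where a new segment starts.
    cohesion: mean similarity of each sentence to its own segment (assumed definition).
    dissimilarity: mean (1 - cosine) between adjacent segments (assumed definition).
    """
    cuts = [0, *boundaries, len(sentences)]
    segments = [sentences[a:b] for a, b in zip(cuts, cuts[1:])]
    seg_vecs = [Counter(t for s in seg for t in s) for seg in segments]
    sims = [cosine(Counter(s), vec) for seg, vec in zip(segments, seg_vecs) for s in seg]
    cohesion = sum(sims) / len(sims)
    gaps = [1.0 - cosine(u, v) for u, v in zip(seg_vecs, seg_vecs[1:])]
    dissimilarity = sum(gaps) / len(gaps) if gaps else 0.0
    return cohesion, dissimilarity

def dominates(p, q):
    """Pareto dominance for maximized criteria: p is no worse on every criterion and better on one."""
    return all(x >= y for x, y in zip(p, q)) and any(x > y for x, y in zip(p, q))

text = [["cat", "dog"], ["dog", "cat"], ["car", "road"], ["road", "car"]]
good = criteria(text, [2])  # cut at the real topic shift
bad = criteria(text, [1])   # cut inside the first topic
print(good, bad, dominates(good, bad))
```

Keeping both criteria separate and comparing candidates by dominance, rather than summing them into a single score, is what allows a Pareto-based algorithm to maintain a set of trade-off segmentations.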

Okina

Hal-Diderot

Thematic Segment Retrieval Revisited

Author: B. Levrat
F. Saubion
S. Lamprier
T. Amghar
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2008

Documents, especially long ones, may contain very diverse passages related to different topics. Passage Retrieval approaches have shown that, in most cases, considering these passages independently when computing the similarity of a document with a user's query can bring a large benefit. Experiments have been conducted to identify the kinds of passages best suited to such a process. Contrary to what could have been expected, working with thematic segments, which are likely to represent only one topic each, has led to much lower effectiveness than using arbitrary sequences of words. In this paper, we show that this paradoxical observation is mainly due to biases induced by the great diversity in length of the thematic passages. We therefore propose to cope with these biases by using a more powerful text length normalization technique. Experiments show that, when length biases are set aside, thematic passages are better suited than arbitrary sequences of words to retrieve relevant information in response to a user's query.
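The abstract does not name the length normalization technique it adopts. As one well-known example, and not necessarily the paper's choice, pivoted document length normalization (Singhal, Buckley and Mitra, 1996) divides a raw score by $(1 - s)\,p + s\,\ell$, where $\ell$ is the passage length, $p$ a pivot length and $s$ a slope. The Python sketch below assumes the pivot is the average passage length and uses an assumed slope of 0.2; with $s = 1$ it reduces to plain division by length.

```python
def pivoted_norm(length, pivot, slope=0.2):
    """Pivoted length normalization factor: (1 - slope) * pivot + slope * length.
    With slope = 1 this reduces to dividing by the raw length."""
    return (1.0 - slope) * pivot + slope * length

def normalize(raw_scores, lengths, slope=0.2):
    """Divide each passage's raw score by its pivoted norm; the pivot is taken as
    the average passage length (an assumption made for this sketch)."""
    pivot = sum(lengths) / len(lengths)
    return [s / pivoted_norm(n, pivot, slope) for s, n in zip(raw_scores, lengths)]

raw = [10.0, 10.0]   # same raw match score
lengths = [50, 150]  # a short and a long thematic passage
print(normalize(raw, lengths, slope=1.0))  # plain length division: strong bias toward the short passage
print(normalize(raw, lengths, slope=0.2))  # pivoted: the gap between the two passages shrinks
```

Passages of widely varying length, such as thematic segments, are exactly the case where the choice of normalization changes which passage ranks first.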

Okina

Hal-Diderot

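Başım mı ağrıyor yoksa ben mi başımda ağrıyorum? : Türkçe ve Fransızca karşılaştırmalı metonimi araştırması [Is My Head Aching, or Am I Aching in My Head?: A Comparative Study of Metonymy in Turkish and French]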

Author: B. Levrat
B. Parlak
O. Senemoglu
T. Amghar
Publication date: 01/01/2014

National audience

Okina

Hal-Diderot

Taking Differences between Turkish and English Languages into account in Internal Representations

Author: B. Levrat
B. Parlak
S. Turhan
T. Amghar
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2013

It is generally assumed that the representation of the meaning of sentences in a knowledge representation language does not depend on the natural language in which this meaning is initially expressed. We argue here that, although a sentence can always be translated from one language to another, this rests mainly on the fact that both languages are natural languages. Using online translation systems (e.g. the Google and Yandex translators) makes it clear that structural differences between languages give rise to more or less faithful translations depending on how close the languages involved are, and there is no doubt that the effects of these differences are even more crucial when one of the languages is a knowledge representation language. Our point is illustrated through numerous examples of Turkish sentences and their English translations, emphasizing differences between these languages, which belong to two different natural language families. As knowledge representation languages, we use first order predicate logic (FOPP) and the conceptual graph (CG) language with its associated logical semantics. We show that important Turkish constructions, such as gerunds, action names and differences in focus, lead to representations based on the reification of verbal predicates and favor CG as a semantic network representation language, whereas English seems better suited to the traditional predicate-centered representation schema. We conclude that this first study gives rise to ideas that may serve as new inspiration in the area of knowledge representation of linguistic data and its use in natural language translation systems.
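The contrast between a predicate-centered representation and one based on the reification of verbal predicates can be made concrete with a small data structure. The Python sketch below is a hypothetical illustration, not the paper's formalization: the `Atom` type, the relation names `agent` and `object`, and the event identifier `e1` are all assumptions. It shows why reification suits gerund-like constructions: once the verb is an event node, the event itself can be the argument of another predicate.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Atom:
    """One predicate applied to a tuple of arguments, e.g. read(ali, book)."""
    predicate: str
    args: tuple

def predicate_centered(verb, agent, obj):
    """FOPL-style schema: the verb itself is the predicate and the participants are its arguments."""
    return [Atom(verb, (agent, obj))]

def reified(verb, agent, obj, event="e1"):
    """Reified, CG-like schema: the verb becomes an event node linked to the participants
    by case relations (relation names 'agent' and 'object' are illustrative)."""
    return [Atom(verb, (event,)), Atom("agent", (event, agent)), Atom("object", (event, obj))]

# "Ali reads the book" in both schemas.
print(predicate_centered("read", "ali", "book"))
facts = reified("read", "ali", "book")
# With a reified event, a gerund-like use ("Ali's reading the book pleased me")
# can take the event itself as an argument of another predicate.
facts.append(Atom("please", ("e1", "speaker")))
print(facts)
```

In the predicate-centered form, referring to the reading event itself would require an extra construct, whereas in the reified form it is simply the node `e1`.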

Crossref

Okina

Hal-Diderot