    Using Text Segmentation to Enhance the Cluster Hypothesis

    An alternative approach to Information Retrieval, called Passage Retrieval, considers text fragments independently rather than assessing the global relevance of documents. In such a context, a document is not penalized when its relevant information is surrounded by parts of text that deviate from the topic of interest. In this paper, we study the impact of taking these text fragments into account in a document clustering process. The use of clustering in Information Retrieval is mainly supported by the cluster hypothesis, which states that relevant documents tend to be more similar to each other than to non-relevant documents, so that a clustering process is likely to gather them. Previous experiments have shown that clustering the top documents retrieved in response to a user's query can improve the effectiveness of Information Retrieval systems. In the clustering processes used in these studies, documents were considered as wholes. However, the assumption that a document can refer to more than one topic or concept may also affect document clustering. Considering the passages of the retrieved documents separately may make it possible to create clusters that better represent the topics addressed. Different approaches were assessed, and the results show that using text fragments in the clustering process may indeed prove useful.
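
The contrast between comparing whole documents and comparing their passages can be sketched in a few lines. This is a minimal sketch, assuming bag-of-words cosine similarity and, as a stand-in for real text segmentation, fixed windows of two sentences; the function names and the rule that the best-matching passage pair decides (`max`) are hypothetical choices, not the approaches assessed in the paper. A clustering algorithm would then use one of these similarity functions to compare retrieved documents.

```python
import math
import re
from collections import Counter


def bow(text):
    # Bag-of-words vector: lowercase word counts, punctuation ignored.
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    # Cosine similarity between two sparse count vectors (0.0 if either is empty).
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def passages(doc, size=2):
    # Hypothetical stand-in for text segmentation: fixed windows of `size` sentences.
    sents = [s.strip() for s in re.split(r"[.!?]", doc) if s.strip()]
    return [" ".join(sents[i:i + size]) for i in range(0, len(sents), size)]


def global_similarity(d1, d2):
    # Whole-document comparison, as in classical document clustering.
    return cosine(bow(d1), bow(d2))


def passage_similarity(d1, d2, size=2):
    # Passage-based comparison (hypothetical rule): the best-matching passage pair decides.
    p1 = [bow(p) for p in passages(d1, size)]
    p2 = [bow(p) for p in passages(d2, size)]
    return max((cosine(a, b) for a in p1 for b in p2), default=0.0)


doc_a = "Cats purr. Cats sleep. Stocks fell. Markets crashed."
doc_b = "Cats purr. Cats sleep."
print(round(global_similarity(doc_a, doc_b), 3))   # 0.775
print(round(passage_similarity(doc_a, doc_b), 3))  # 1.0
```

Because only the best-matching passages count, the off-topic second half of `doc_a` no longer lowers its similarity to `doc_b`. Other aggregations, such as averaging over passage pairs or clustering the passages themselves instead of whole documents, are equally plausible readings of the idea.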

    A Procrustes Tale: Can Turkish Be Processed Like French?

    International audience

    Idiopathic orthostatic hypotension: Recent data (eleven cases) and review of the literature

    Eight cases of Shy-Drager syndrome and three of Bradbury-Eggleston idiopathic orthostatic hypotension were examined. In all cases, examination of circulatory reflexes showed major dysfunction of the sympathetic vasoconstrictor system. Anomalies in the vagal cardiomoderator system were less consistent. Daily urinary excretion of catecholamines was normal. Characteristically, no rise in blood or urine norepinephrine levels was found on standing (orthostatism). Insulin-induced hypoglycemia produced a normal rise in urinary adrenaline excretion in three of ten patients. Plasma dopamine-beta-hydroxylase activity was normal. The renin-angiotensin-aldosterone system showed variable basal activity, but its activity usually rose on standing. On average, cerebrospinal fluid homovanillic acid levels were very low both before and after probenecid, whereas hydroxyindoleacetic acid levels were normal. Cerebral autoregulation was impaired in two of four cases. Pathophysiologically, the two clinical types, with or without central neurological signs, are indistinguishable.

    Taking Differences between Turkish and English Languages into account in Internal Representations

    It is generally assumed that the representation of the meaning of sentences in a knowledge representation language does not depend on the natural language in which this meaning is initially expressed. We argue that, although a sentence can always be translated from one language into another, this is possible mainly because both languages are natural languages. Online translation systems (e.g., the Google and Yandex translators) make it clear that structural differences between languages yield more or less faithful translations depending on how close the languages involved are, and the effect of these differences is undoubtedly more crucial when one of the languages is a knowledge representation language. We illustrate our point with numerous examples of Turkish sentences and their English translations, emphasizing differences between these languages, which belong to two different natural language families. As knowledge representation languages, we use first-order predicate logic (FOPL) and the conceptual graph (CG) language with its associated logical semantics. We show that important Turkish constructions, such as gerunds, action names, and differences in focus, lead to representations that reify verbal predicates and thus favor CG as a semantic network representation language, whereas English seems better suited to the traditional predicate-centered representation scheme. We conclude that this first study gives rise to ideas that may inspire new work on the knowledge representation of linguistic data and its use in natural language translation systems.
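
The difference between a predicate-centered and a reified representation can be seen on a single sentence. The following is a hypothetical illustration, not an example taken from the paper: for "Ali reads a book", a predicate-centered FOPL formula treats reading as a two-place predicate, whereas a reified formula introduces the reading event \(e\) as a term that further predicates can describe. The reified form is the kind of structure that a nominalized construction such as the Turkish action name in *Ali'nin kitap okuması* ("Ali's reading of a book") seems to call for. The predicate names Read, Reading, Agent and Object are assumptions.

```latex
% Predicate-centered representation
\exists x\,\big(\mathrm{Book}(x) \land \mathrm{Read}(\mathrm{Ali}, x)\big)

% Reified representation: the reading event e becomes a term
\exists e\,\exists x\,\big(\mathrm{Book}(x) \land \mathrm{Reading}(e)
  \land \mathrm{Agent}(e, \mathrm{Ali}) \land \mathrm{Object}(e, x)\big)
```

In the reified version, properties of the event itself (its time, its focus, or its role as the argument of another verb) attach to \(e\) without changing the arity of any predicate, which is also how a conceptual graph links a concept node for the action to its participants.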

    Traveling Among Clusters: A Way to Reconsider the Benefits of the Cluster Hypothesis

    Relying on the Cluster Hypothesis, which states that relevant documents tend to be more similar to each other than to non-relevant documents, most information retrieval systems that organize search results into clusters seek to gather all relevant documents in the same cluster. Here we reconsider the benefits of the resulting concentration of relevant information. Contrary to common assumption, we argue that systems that distribute relevant documents across different clusters, being more likely to highlight different aspects of the subject, may be at least as useful to the user as systems that gather all relevant documents in a single group. Since existing evaluation measures strongly favor the latter systems, we first investigate fairer ways to assess how easily relevant information can be reached from the list of cluster descriptions. Finally, we show that systems distributing relevant information across different clusters may actually provide better access to information than classical systems.
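
The Cluster Hypothesis itself can be checked on a judged result list by comparing the average similarity among relevant documents with the average similarity between relevant and non-relevant ones. This is a minimal sketch, assuming bag-of-words cosine similarity and toy relevance judgments; the function name `cluster_hypothesis_check` and the example documents are hypothetical, and this is not the evaluation measure proposed in the paper.

```python
import math
import re
from collections import Counter
from itertools import combinations


def bow(text):
    # Bag-of-words vector: lowercase word counts, punctuation ignored.
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    # Cosine similarity between two sparse count vectors (0.0 if either is empty).
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def mean(values):
    return sum(values) / len(values) if values else 0.0


def cluster_hypothesis_check(relevant, non_relevant):
    """Return (mean relevant-relevant similarity, mean relevant-non-relevant similarity)."""
    rel = [bow(d) for d in relevant]
    non = [bow(d) for d in non_relevant]
    rel_rel = mean([cosine(a, b) for a, b in combinations(rel, 2)])
    rel_non = mean([cosine(a, b) for a in rel for b in non])
    return rel_rel, rel_non


relevant = ["solar panel efficiency", "solar panel cost"]
non_relevant = ["football match results", "panel discussion schedule"]
rel_rel, rel_non = cluster_hypothesis_check(relevant, non_relevant)
print(round(rel_rel, 3), round(rel_non, 3))  # 0.667 0.167
```

When the first value clearly exceeds the second, a clustering algorithm is likely to put relevant documents together; whether it should put them all in one cluster is exactly the design choice the abstract questions.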

    Topic Segmentation: Unity of the Text vs. Independence of the Segments

    Conference date: 01/2008. National audience

    Toward a More Global and Coherent Segmentation of Texts

    The automatic text segmentation task consists of identifying the most important thematic breaks in a document in order to divide it into homogeneous passages. Text segmentation has motivated a large amount of research. We focus here on statistical approaches that rely on an analysis of the distribution of words in the text. Usually, texts are segmented sequentially on the basis of very local clues. However, such an approach prevents the text from being considered as a whole, particularly regarding the degree of granularity adopted to express the different topics it addresses. We therefore propose two new segmentation algorithms, ClassStruggle and SegGen, which use criteria that provide a global view of the text. ClassStruggle is based on an initial clustering of the sentences of the text, which allows similarities to be considered within groups rather than between individual sentences. It segments the text on the basis of the distribution of the occurrences of the members of each class. SegGen uses a genetic algorithm to evaluate candidate segmentations of the whole text. It searches for a segmentation that optimizes two criteria: maximizing the internal cohesion of the segments and minimizing the similarity between adjacent segments. According to experimental results, both approaches appear to be very competitive with existing methods.
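
The two criteria that SegGen optimizes can be made concrete on a toy text. This is a minimal sketch, assuming pre-tokenized lowercase sentences, bag-of-words cosine similarity, and a hypothetical equal-weight scalar score (mean cohesion minus mean adjacent similarity); the paper may combine the criteria differently. Exhaustive search over all boundary sets stands in for the genetic algorithm, which is only feasible because the example has four sentences.

```python
import math
from collections import Counter
from itertools import combinations


def cosine(a, b):
    # Cosine similarity between two sparse count vectors (0.0 if either is empty).
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def bow(sentences):
    # Bag of words over one or more already lowercased, unpunctuated sentences.
    return Counter(word for s in sentences for word in s.split())


def to_segments(sentences, bounds):
    # bounds are the indices where a new segment starts, e.g. (2,) -> [0:2], [2:]
    cuts = [0, *bounds, len(sentences)]
    return [sentences[i:j] for i, j in zip(cuts, cuts[1:])]


def cohesion(segment):
    # Internal cohesion: mean pairwise cosine between the segment's sentences.
    pairs = list(combinations([bow([s]) for s in segment], 2))
    return sum(cosine(a, b) for a, b in pairs) / len(pairs) if pairs else 1.0


def score(sentences, bounds):
    segs = to_segments(sentences, bounds)
    mean_cohesion = sum(cohesion(s) for s in segs) / len(segs)
    adjacent = [cosine(bow(a), bow(b)) for a, b in zip(segs, segs[1:])]
    mean_adjacent = sum(adjacent) / len(adjacent) if adjacent else 0.0
    # Hypothetical scalarisation: reward cohesion, penalise adjacent similarity equally.
    return mean_cohesion - mean_adjacent


def best_segmentation(sentences):
    # Exhaustive search over every set of boundaries (stand-in for SegGen's genetic algorithm).
    positions = range(1, len(sentences))
    candidates = [c for k in range(len(sentences)) for c in combinations(positions, k)]
    return max(candidates, key=lambda b: score(sentences, b))


sentences = ["cats purr softly", "cats sleep softly",
             "stocks fell sharply", "stocks rose sharply"]
print(best_segmentation(sentences))  # (2,)
```

With n sentences there are 2^(n-1) possible boundary sets, so exhaustive search quickly becomes impractical on real texts; that growth is what makes a heuristic search such as a genetic algorithm attractive for evaluating segmentations of the whole text.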