

Correct your Text with Google

Author: Jacquemont Stéphanie
Jacquenet François
Sebban Marc
Publication venue: HAL CCSD
Publication date: 02/11/2007

To appear in the Proceedings of the International Conference on Web Intelligence, IEEE 2007. With the increasing amount of text files produced nowadays, spell checkers have become essential tools for the everyday tasks of millions of end users. Over the years, several tools have been designed that show decent performance. Grammar checkers may improve the correction of texts, but they require large resources. We think that basic spell checking can be improved, as a step in that direction, by using the Web as a corpus and taking into account the context of words identified as potential misspellings. We propose to use the Google search engine and some machine learning techniques to design a flexible and dynamic spell checker that can evolve over time with new linguistic features.
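As a rough illustration of the idea in the abstract above (using phrase frequencies in a large corpus to choose between candidate corrections in context), here is a minimal sketch. It is not the authors' system: the `TOY_COUNTS` table, `toy_hit_count`, and `best_candidate` are hypothetical names, the counts are made up, and a small local table stands in for search-engine hit counts. The machine learning component mentioned in the abstract is not modeled.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for web hit counts; the numbers are invented.
TOY_COUNTS: Dict[str, int] = {
    "a piece of cake": 900,
    "a peace of cake": 3,
    "peace of mind": 800,
    "piece of mind": 40,
}


def toy_hit_count(phrase: str) -> int:
    """Return the (made-up) frequency of a phrase, 0 if unseen."""
    return TOY_COUNTS.get(phrase.lower(), 0)


def best_candidate(
    left: List[str],
    candidates: List[str],
    right: List[str],
    hit_count: Callable[[str], int] = toy_hit_count,
) -> str:
    """Pick the candidate whose surrounding phrase is most frequent."""

    def score(word: str) -> int:
        # Build the phrase: left context + candidate + right context.
        return hit_count(" ".join(left + [word] + right))

    return max(candidates, key=score)


if __name__ == "__main__":
    print(best_candidate(["a"], ["peace", "piece"], ["of", "cake"]))
    print(best_candidate([], ["peace", "piece"], ["of", "mind"]))
```

The same word pair resolves differently depending on context, which is the point of using surrounding words rather than a dictionary lookup alone. Passing a different `hit_count` function would let the lookup be swapped for any other frequency source.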

HAL-UJM

Crossref

The Snippets Taxonomy in Web Search Engines

Author: A Broder
A Khalili
A Strzelecki
A Uyar
Andrej Miklosik
BJ Jansen
CC Wakefield
D Bilal
D Elsweiler
D Lewandowski
J Sachse
K Juel Vang
K Kousha
R Heersmink
W Hop
WT Kritzinger
Y Zhao
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 23/10/2019

In this paper, we analyzed search results for 50,000 keywords collected from the localized Polish Google search engine. We proposed a taxonomy of the snippets displayed in search results: regular, rich, news, featured, and entity-type snippets. We observed some correlations between snippets that overlap for the same keywords. The results show that commercial keywords do not produce results with rich or entity-type snippets, whereas keywords that do produce snippets are not commercial in nature. We found that a significant number of snippets are scholarly articles and rich card carousels. We close with conclusions and research limitations. Comment: 12 pages, 3 tables

arXiv.org e-Print Archive

Crossref

An Emergent Approach to Text Analysis Based on a Connectionist Model and the Web

Author: Cimino Mario Giovanni Cosimo Antonio
Vaglini Gigliola
Publication venue: 'MDPI AG'
Publication date: 01/01/2013

In this paper, we present a method to provide proactive assistance in text checking, based on usage relationships between words as structured on the Web. For a given sentence, the method builds a connectionist structure of relationships between word n-grams. This structure is then parameterized by means of an unsupervised, language-agnostic optimization process. Finally, the method provides a representation of the sentence that brings out the least prominent usage-based relational patterns, making it easy to find badly written and unpopular text. The study includes the problem statement and its characterization in the literature, as well as the proposed approach and some experimental use
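To make the notion of "least prominent usage-based patterns" in the abstract above more concrete, the following sketch ranks the word bigrams of a sentence by how often they are used, least used first. It is a deliberate simplification and not the paper's method: a plain frequency lookup replaces the connectionist structure and its unsupervised optimization, and `TOY_BIGRAM_COUNTS`, `bigrams`, and `rank_by_usage` are hypothetical names with invented counts.

```python
from typing import Dict, List, Tuple

Bigram = Tuple[str, str]

# Hypothetical usage counts standing in for Web-derived n-gram statistics.
TOY_BIGRAM_COUNTS: Dict[Bigram, int] = {
    ("the", "cat"): 500,
    ("cat", "sat"): 120,
    ("sat", "on"): 300,
    ("on", "the"): 900,
    ("the", "mat"): 200,
    ("on", "teh"): 1,
}


def bigrams(words: List[str]) -> List[Bigram]:
    """Return consecutive word pairs of a token list."""
    return list(zip(words, words[1:]))


def rank_by_usage(
    sentence: str,
    counts: Dict[Bigram, int] = TOY_BIGRAM_COUNTS,
) -> List[Tuple[Bigram, int]]:
    """Score each bigram by its usage count and sort ascending,
    so the least used (most suspicious) pairs come first."""
    words = sentence.lower().split()
    scored = [(bg, counts.get(bg, 0)) for bg in bigrams(words)]
    return sorted(scored, key=lambda item: item[1])


if __name__ == "__main__":
    for pair, count in rank_by_usage("The cat sat on teh mat"):
        print(pair, count)
```

In this toy run the pairs around the misspelled word surface at the top of the ranking, which is the kind of signal a reader could use to locate badly written text.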

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

Archivio della Ricerca - Università di Pisa

Extracción de información en informes médicos

Author: Hervás Martín Lucía
Martínez Simón Víctor
Sánchez Martínez Irene
Publication date: 01/01/2013

Access to the information contained in a medical report is vital both for research and for the treatment of patients. However, the relevant information is usually written in natural language, so processing it automatically is not a trivial task. With this goal in mind, we have developed a system that produces a file representing the most relevant content of a clinical report. As part of this representation, the system detects medical concepts belonging to UMLS, one of the most widely used ontologies in this field. Beforehand, it automatically performs spelling correction, acronym expansion, and detection of affirmed, negated, and speculated sentences. All of this works in two of the most widely spoken languages in the world: Spanish and English. The representation in turn makes it possible to develop applications that use it, so we have also implemented a search engine for medical reports as an example. Finally, as part of this work, we describe the whole process followed during our participation in the Conference and Labs of the Evaluation Forum 2013, one of the best-known organizations internationally in the field of information retrieval, and include the scientific paper written for it.

Docta Complutense