5 research outputs found

    Correct your Text with Google

    To appear in the Proceedings of the International Conference on Web Intelligence, IEEE 2007. With the increasing number of text files produced nowadays, spell checkers have become essential tools for the everyday tasks of millions of end users. Over the years, several tools have been designed that show decent performance. Grammar checkers may further improve the correction of texts, but they require large resources. We think that basic spell checking can be improved by using the Web as a corpus and by taking into account the context of words that are identified as potential misspellings. We propose to use the Google search engine and some machine learning techniques to design a flexible and dynamic spell checker that may evolve over time with new linguistic features
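The context-based idea in this abstract can be illustrated with a minimal sketch. It is not the authors' system: a small in-memory string stands in for Web hit counts, no search engine is queried, no machine learning is involved, and the names `phrase_counter` and `best_correction` are hypothetical.

```python
from typing import Callable, Iterable


def phrase_counter(corpus: str) -> Callable[[str], int]:
    """Return a function that counts how often a phrase occurs in `corpus`.

    Assumption: this local count is a toy stand-in for the number of hits a
    Web search engine would report for the same phrase.
    """
    text = " ".join(corpus.lower().split())  # normalize case and whitespace

    def count(phrase: str) -> int:
        return text.count(" ".join(phrase.lower().split()))

    return count


def best_correction(left: str, candidates: Iterable[str], right: str,
                    count: Callable[[str], int]) -> str:
    """Pick the candidate whose phrase, with its context, is most frequent."""
    return max(candidates, key=lambda w: count(f"{left} {w} {right}".strip()))


# Toy corpus (invented for illustration only).
corpus = ("a piece of cake . a piece of cake for you . "
          "world peace of mind . a piece of paper")
count = phrase_counter(corpus)

# "peace" is a valid word, so a dictionary lookup alone would not flag it;
# the surrounding words are what favor "piece" here.
print(best_correction("a", ["peace", "piece"], "of cake", count))
```

The point of the sketch is that the candidate is scored together with its neighbors, so a real-word error can be caught even though each word on its own is spelled correctly.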

    The Snippets Taxonomy in Web Search Engines

    In this paper, we analyzed search results for 50,000 keywords collected from the localized Polish Google search engine. We proposed a taxonomy for the snippets displayed in search results, consisting of regular, rich, news, featured, and entity snippet types. We observed some correlations between overlapping snippets for the same keywords. Results show that commercial keywords do not produce results with rich or entity snippets, whereas keywords that do produce snippets are not commercial in nature. We found that a significant number of snippets are scholarly articles and rich card carousels. We close with our conclusions and the limitations of the research. Comment: 12 pages, 3 tables

    An Emergent Approach to Text Analysis Based on a Connectionist Model and the Web

    In this paper, we present a method to provide proactive assistance in text checking, based on usage relationships between words as structured on the Web. For a given sentence, the method builds a connectionist structure of relationships between word n-grams. This structure is then parameterized by means of an unsupervised, language-agnostic optimization process. Finally, the method provides a representation of the sentence that brings out the least prominent usage-based relational patterns, making it easy to find badly written and unpopular text. The study includes the problem statement and its characterization in the literature, as well as the proposed solving approach and some experimental use

    Extracción de información en informes médicos (Information Extraction from Medical Reports)

    Access to the information contained in a medical report is vital both for research and for the treatment of patients. However, the relevant information is usually written in natural language, so processing it automatically is not a trivial task. With this goal in mind, we have developed a system that produces a file representing the most relevant content of a clinical report. As part of this representation, the system detects medical concepts belonging to UMLS, one of the most widely used ontologies in this field. Beforehand, it automatically performs spelling correction, acronym expansion, and detection of affirmed, negated, and speculated sentences. All of this works in two of the most widely spoken languages in the world: Spanish and English. The representation in turn makes it possible to develop applications that use it, so a search engine for medical reports has also been implemented as an example. Finally, as part of this work, we describe the whole process followed during our participation in the Conference and Labs of the Evaluation Forum 2013, a self-organized body that is well known in the international information retrieval community, together with the scientific paper written for it.