    Weaving Entities into Relations: From Page Retrieval to Relation Mining on the Web

    With its sheer amount of information, the Web is clearly an important frontier for data mining. While Web mining must start with content on the Web, there is no effective "search-based" mechanism to help sift through the information on the Web. Our goal is to provide such an online search-based facility for supporting query primitives, upon which Web mining applications can be built. As a first step, this paper aims at entity-relation discovery, or E-R discovery, as a useful function to weave scattered entities on the Web into coherent relations. To begin with, as our proposal, we formalize the concept of E-R discovery. Further, to realize E-R discovery, as our main thesis, we abstract tuple ranking, the essential challenge of E-R discovery, as pattern-based co-occurrence analysis. Finally, as our key insight, we observe that such relation mining shares the same core functions as traditional page-retrieval systems, which enables us to build the new E-R discovery upon today's search engines, almost for free. We report our system prototype and testbed, WISDM-ER, with a real Web corpus. Our case studies have demonstrated high promise, achieving 83%-91% accuracy for real benchmark queries, and thus the real possibility of enabling ad-hoc Web mining tasks with online E-R discovery.

    A posteriori metadata from automated provenance tracking: Integration of AiiDA and TCOD

    In order to make results of computational scientific research findable, accessible, interoperable and re-usable, it is necessary to decorate them with standardised metadata. However, there are a number of technical and practical challenges that make this process difficult to achieve in practice. Here the implementation of a protocol is presented to tag crystal structures with their computed properties, without the need for human intervention to curate the data. This protocol leverages the capabilities of AiiDA, an open-source platform to manage and automate scientific computational workflows, and TCOD, an open-access database storing computed materials properties using a well-defined and exhaustive ontology. Based on these, the complete procedure to deposit computed data in the TCOD database is automated. All relevant metadata are extracted from the full provenance information that AiiDA tracks and stores automatically while managing the calculations. Such a protocol also enables reproducibility of scientific data in the field of computational materials science. As a proof of concept, the AiiDA-TCOD interface is used to deposit 170 theoretical structures together with their computed properties and their full provenance graphs, consisting of over 4600 AiiDA nodes.

    Review of five English learners' dictionaries on CD-ROM

    Language technologies for a multilingual Europe

    This volume of the series “Translation and Multilingual Natural Language Processing” includes most of the papers presented at the Workshop “Language Technology for a Multilingual Europe”, held at the University of Hamburg on September 27, 2011 in the framework of the conference GSCL 2011 with the topic “Multilingual Resources and Multilingual Applications”, along with several additional contributions. In addition to an overview article on Machine Translation and two contributions on the European initiatives META-NET and Multilingual Web, the volume includes six full research articles. Our intention with this workshop was to bring together various groups concerned with the umbrella topics of multilingualism and language technology, especially multilingual technologies. This encompassed, on the one hand, representatives from research and development in the field of language technologies, and, on the other hand, users from diverse areas such as industry, administration and funding agencies, among others. The Workshop “Language Technology for a Multilingual Europe” was co-organised by the two GSCL working groups “Text Technology” and “Machine Translation” (http://gscl.info) as well as by META-NET (http://www.meta-net.eu).

    A user's guide to electronic dictionaries for language learners

    A review of software for text analysis

    "The book reviews a selection of software for computer-assisted text analysis. The primary aim is to provide a detailed and up-to-date account of the spectrum of available text analysis software and to catalogue the kinds of support the selected software offers to the user. A related, more general goal is to record the tendencies in both functionality and technology and to identify the areas where more development is needed. For this reason the presented selection of software comprises not only fully developed commercial and research programs, but also prototypes and beta versions. An additional aspect with regard to the kinds of software reviewed is that both qualitatively and quantitatively oriented types of research are included. Depending on research purposes and project design, the text analyst can profit from available tools independently of their orientation. Today it is often the case that, in computational support, the borderline between quantitative and qualitative methodologies becomes 'obscure'; instead, one can detect a number of commonalities which can be placed within a broader text analysis context. The following fifteen programs are reviewed: AQUAD, ATLAS.ti, CoAN, Code-A-Text, DICTION, DIMAP-MCCA, HyperRESEARCH, KEDS, NUD-IST, QED, TATOE, TEXTPACK, TextSmart, WinMAXpro, and WordStat; the criteria and methodology used for selecting them are delineated. The last part of the book contains an extensive discussion about text analysis programs and the concrete issues raised by the review." (author's abstract)

    Natural language software registry (second edition)
