Acquiring Compound Word Translations both Automatically and Dynamically
This paper addresses the problem of compound word translation and proposes approaches to acquiring such translations. The proposed approaches exploit web data and use English translations to link source-language words with their correspondences in the target language. The paper uses the Japanese-Chinese language pair for illustration and presents initial experimental results. The proposed method is language-independent and can therefore be applied to other language pairs.
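The abstract does not spell out how the English translations link the two languages, so the following is only a minimal sketch of one reading: English serves as a pivot, and target-language candidates are ranked by how many English translations they share with the source word. The dictionaries `JA_EN` and `EN_ZH` and the function `pivot_translate` are hypothetical toy stand-ins for translations that would be mined from web data. They are not the authors' resources or code.

```python
from collections import Counter

# Hypothetical toy dictionaries standing in for translations mined from the web.
JA_EN = {
    "経済成長": ["economic growth", "economic development"],
}
EN_ZH = {
    "economic growth": ["经济增长"],
    "economic development": ["经济发展", "经济增长"],
}


def pivot_translate(source_word, src_to_en, en_to_tgt):
    """Rank target-language candidates by the number of English pivots they share."""
    votes = Counter()
    for english in src_to_en.get(source_word, []):
        for target in en_to_tgt.get(english, []):
            votes[target] += 1
    # Most-supported candidates first; ties keep first-seen order.
    return [word for word, _ in votes.most_common()]


print(pivot_translate("経済成長", JA_EN, EN_ZH))  # ['经济增长', '经济发展']
```

Counting shared pivots is the simplest possible scoring, chosen here for brevity. A word with no dictionary entry simply yields an empty candidate list.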
D7.1. Criteria for evaluation of resources, technology and integration.
This deliverable defines how evaluation is carried out at each integration cycle in the PANACEA project. As PANACEA aims at producing large-scale resources, evaluation becomes a critical and challenging issue: critical because the quality of the results delivered to users must be assessed, and challenging because we are exploring rather new areas through a technical platform, so some new methodologies will have to be explored and existing ones adapted.
Foundation, Implementation and Evaluation of the MorphoSaurus System: Subword Indexing, Lexical Learning and Word Sense Disambiguation for Medical Cross-Language Information Retrieval
In everyday medical practice, which involves a great deal of documentation and literature research, the vast majority of textually encoded information is now available electronically. This makes the development of powerful methods for efficient retrieval a matter of high priority.
Judged from the perspective of medical sublanguage, common text retrieval systems lack morphological functionality (inflection, derivation and composition), lexical-semantic functionality, and the ability to analyse large document collections across languages.
This doctoral thesis covers the theoretical foundations of the MorphoSaurus system (an acronym for Morphem-Thesaurus, i.e. morpheme thesaurus). Its methodological core is a thesaurus organized around morphemes of medical expert and lay language, whose entries are linked across languages by semantic relations. Building on this, a procedure is presented that segments (complex) words into morphemes, which are then replaced by language-independent, concept-class-like symbols. The resulting representation is the basis for cross-language, morpheme-oriented text retrieval.
In addition to the core technology, a method for the automatic acquisition of lexicon entries is presented, through which existing morpheme lexicons are extended to further languages. Taking cross-language phenomena into account then leads to a novel procedure for resolving semantic ambiguities.
The performance of morpheme-oriented text retrieval is tested empirically in extensive, standardized evaluations and compared with common approaches.
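As a rough illustration of the subword indexing idea summarized above, the sketch below segments words by greedy longest match against a small lexicon and replaces each subword with a language-independent class symbol. As a result, German *Leberentzündung* and English *hepatitis* receive the same representation. The lexicon, the `#hepat`/`#inflamm` symbols and the `segment` function are hypothetical. MorphoSaurus's actual lexicon, symbols and segmentation rules are not described in this abstract.

```python
# Hypothetical lexicon: German and English subwords mapped to shared,
# language-independent class symbols. None marks an affix that is dropped.
LEXICON = {
    "leber": "#hepat",
    "hepat": "#hepat",
    "liver": "#hepat",
    "entzünd": "#inflamm",
    "itis": "#inflamm",
    "ung": None,
}


def segment(word, lexicon):
    """Greedy longest-match segmentation of a word into class symbols."""
    text = word.lower()
    max_len = max(len(entry) for entry in lexicon)
    symbols = []
    i = 0
    while i < len(text):
        # Try the longest lexicon entry that starts at position i.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in lexicon:
                if lexicon[piece] is not None:
                    symbols.append(lexicon[piece])
                i += length
                break
        else:
            # No lexicon entry starts here: skip one character.
            i += 1
    return symbols


print(segment("Leberentzündung", LEXICON))  # ['#hepat', '#inflamm']
print(segment("hepatitis", LEXICON))        # ['#hepat', '#inflamm']
```

Greedy longest match is used only to keep the sketch short. It can mis-segment words where a shorter first piece would lead to a better overall split.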
PowerAqua: Open Question Answering on the Semantic Web
With the rapid growth of semantic information in the Web, the processes of searching and querying these very large amounts of heterogeneous content have become increasingly challenging. This research tackles the problem of supporting users in querying and exploring information across multiple and heterogeneous Semantic Web (SW) sources.
A review of literature on ontology-based Question Answering reveals the limitations of existing technology. Our approach is based on providing a natural language Question Answering interface for the SW, PowerAqua. The realization of PowerAqua represents a considerable advance with respect to other systems, which restrict their scope to an ontology-specific or homogeneous fraction of the publicly available SW content. To our knowledge, PowerAqua is the only system that is able to take advantage of the semantic data available on the Web to interpret and answer user queries posed in natural language. In particular, PowerAqua is uniquely able to answer queries by combining and aggregating information, which can be distributed across heterogeneous semantic resources.
Here, we provide a complete overview of our work on PowerAqua, including: the research challenges it addresses; its architecture; the techniques we have realised to map queries to semantic data, to integrate partial answers drawn from different semantic resources and to rank alternative answers; and the evaluation studies we have performed to assess the performance of PowerAqua. We believe our experiences can be extrapolated to a variety of end-user applications that wish to open up to large-scale, heterogeneous structured datasets and to exploit effectively what is possibly the greatest wealth of data in the history of Artificial Intelligence.
Paradigms of Mediated Translation in Armenian: An Exploration
This paper examines four discrete issues influencing the macro-context of mediated translations into Armenian from Late Antiquity to the modern period. The first treats religious scripture, reviewing the very different contexts for the translation of the Hebrew Scriptures (5th century) and the Qur’ān (17th century). The second analyzes the Silk Route as a vehicle for exchange between peripheral cultures that facilitated the Armenian reception of two works of Sanskrit literature. The third pursues evolving literary traditions and their textual diffusion through a case study of the Alexander Romance. Finally, the fourth examines the nature of colonial experiments in the 18th and 19th centuries in creating regionality within the wider process of globalization, which impinged on the translation processes of communities in different parts of the Armenian oikoumene of the time, with special attention to Mesrop Tałiadian’s novel Vēp Vardgisi of 1846.
COSPO/CENDI Industry Day Conference
The conference's objective was to provide a forum where government information managers and industry information technology experts could exchange views openly, discuss their respective needs and compare them with available or soon-to-be-available solutions. Technical summaries and points of contact are provided for the following sessions: secure products, protocols, and encryption; information providers; electronic document management and publishing; information indexing, discovery, and retrieval (IIDR); automated language translators; IIDR - natural language capabilities; IIDR - advanced technologies; IIDR - distributed heterogeneous and large database support; and communications - speed, bandwidth, and wireless.
Challenges to knowledge representation in multilingual contexts
To meet the increasing demands of complex inter-organizational processes and the need for continuous innovation and internationalization, new organisational forms are evidently being adopted that foster more intensive collaboration processes and sharing of resources, in what can be called collaborative networks (Camarinha-Matos, 2006:03). Information and knowledge are crucial resources in collaborative networks, and their management is a fundamental process to optimize.
Knowledge organisation and collaboration systems are thus important instruments for the success of collaborative networks of organisations, and over the last decade they have been researched in computer science, information science, management sciences, terminology and linguistics.
Nevertheless, research in this area has paid little attention to multilingual contexts of collaboration, which pose specific and challenging problems. It is clear, then, that access to and representation of knowledge will increasingly take place in multilingual settings, which requires overcoming the difficulties inherent in the presence of multiple languages through processes such as the localization of ontologies.
Although localization, like other processes involving multilingualism, is a well-developed practice whose methodologies and tools are fruitfully employed by the language industry to develop and adapt multilingual content, it has not yet been sufficiently explored as a means of supporting the development of knowledge representations, in particular ontologies, expressed in more than one language. Multilingual knowledge representation is therefore an open research area calling for cross-contributions from knowledge engineering, terminology, ontology engineering, cognitive sciences, computational linguistics, natural language processing, and management sciences.
This workshop brought together researchers interested in multilingual knowledge representation in a multidisciplinary environment to debate the possibilities of cross-fertilization between knowledge engineering, terminology, ontology engineering, cognitive sciences, computational linguistics, natural language processing, and management sciences, applied to contexts where multilingualism continuously creates new and demanding challenges for current knowledge representation methods and techniques.
Six papers dealing with different approaches to multilingual knowledge representation are presented in this workshop, most of them describing tools, approaches and results obtained in ongoing projects.
In the first paper, Andrés Domínguez Burgos, Koen Kerremans and Rita Temmerman present Termontospider, a software module that is part of a workbench for terminological and ontological mining. Termontospider is a wiki crawler that aims to traverse Wikipedia optimally in search of domain-specific texts from which terminological and ontological information can be extracted. The crawler is part of a tool suite for automatically developing multilingual termontological databases, i.e. ontologically underpinned multilingual terminological databases. The authors describe the basic principles behind the crawler and summarize the research setting in which the tool is currently being tested.
In the second paper, Fumiko Kano compares four feature-based similarity measures derived from the cognitive sciences. The purpose of the comparative analysis is to identify the model that is potentially most effective for mapping independent ontologies in a culturally influenced domain. To this end, datasets based on standardized, pre-defined feature dimensions and values, obtainable from the UNESCO Institute for Statistics (UIS), were used, so that the similarity measures are tested against objectively developed datasets. According to the author, the results demonstrate that the Bayesian Model of Generalization is the most effective cognitive model for identifying the most similar corresponding concepts for a targeted socio-cultural community.
In another presentation, Thierry Declerck, Hans-Ulrich Krieger and Dagmar Gromann present ongoing work proposing an approach to the automatic extraction of information from multilingual financial Web resources, in order to provide candidate terms for building ontology elements or instances of ontology concepts. The authors present an approach complementary to the direct localization/translation of ontology labels: terminologies are acquired by accessing and harvesting the multilingual Web presences of structured information providers in the financial field. This leads to the detection of candidate terms in various multilingual financial sources, which can be used not only as labels of ontology classes and properties but also for the possible generation of (multilingual) domain ontologies themselves.
In the next paper, Manuel Silva, António Lucas Soares and Rute Costa argue that, despite the availability of tools, resources and techniques for constructing ontological artifacts, developing a shared conceptualization of a given reality still raises questions about the principles and methods that support the initial phases of conceptualization. According to the authors, these questions become more complex when conceptualization takes place in a multilingual setting. To tackle these issues, the authors present conceptME, a collaborative platform in which terminological and knowledge representation processes support domain experts throughout a conceptualization framework, allowing multilingual data to be included as a way to promote knowledge sharing, enhance conceptualization and support multilingual ontology specification.
In another presentation, Frieda Steurs and Hendrik J. Kockaert present TermWise, a large project dealing with legal terminology and phraseology for the Belgian public services, specifically the translation office of the Ministry of Justice. The project aims to develop an advanced tool that incorporates expert knowledge into the algorithms that extract specialized language from textual data (legal documents); its outcome is a knowledge database of Dutch/French equivalents for legal concepts, enriched with the phraseology related to those terms.
Finally, Deborah Grbac, Luca Losito, Andrea Sada and Paolo Sirito report on the preliminary results of a pilot project currently under way at the UCSC Central Library, in which they propose adapting the model used by translators working within European Union institutions to subject librarians employed in large, multilingual academic institutions. The authors use User Experience (UX) analysis to provide subject librarians with visual support in the form of “ontology tables” that depict the conceptual links and connections between words and concepts, presented according to their semantic and linguistic meaning.
The organizers hope that the selection of papers presented here will be of interest to a broad audience and will be a starting point for further discussion and cooperation.