Search CORE

26 research outputs found

The ALVIS Format for Linguistically Annotated Documents

Author: Alphonse Erick
Derivière Julien
Hamon Thierry
Nazarenko Adeline
Vauvert Guillaume
Weissenbacher Davy
Publication venue
Publication date: 01/01/2006
Field of study

The paper describes the ALVIS annotation format designed for the indexing of large collections of documents in topic-specific search engines. This paper is exemplified on the biological domain and on MedLine abstracts, as developing a specialized search engine for biologists is one of the ALVIS case studies. The ALVIS principle for linguistic annotations is based on existing works and standard propositions. We made the choice of stand-off annotations rather than inserted mark-up. Annotations are encoded as XML elements which form the linguistic subsection of the document record

arXiv.org e-Print Archive

CiteSeerX

HAL-Paris 13

A Robust Linguistic Platform for Efficient and Domain specific Web Content Analysis

Author: Aubin Sophie
Derivière Julien
Hamon Thierry
Nazarenko Adeline
Poibeau Thierry
Publication venue
Publication date: 30/05/2007
Field of study

Web semantic access in specific domains calls for specialized search engines with enhanced semantic querying and indexing capacities, which pertain both to information retrieval (IR) and to information extraction (IE). A rich linguistic analysis is required either to identify the relevant semantic units to index and weight them according to linguistic specific statistical distribution, or as the basis of an information extraction process. Recent developments make Natural Language Processing (NLP) techniques reliable enough to process large collections of documents and to enrich them with semantic annotations. This paper focuses on the design and the development of a text processing platform, Ogmios, which has been developed in the ALVIS project. The Ogmios platform exploits existing NLP modules and resources, which may be tuned to specific domains and produces linguistically annotated documents. We show how the three constraints of genericity, domain semantic awareness and performance can be handled all together

arXiv.org e-Print Archive

CiteSeerX

HAL-Paris 13

Text Augmentation: Inserting markup into natural language text with PPM Models

Author: Yeates Stuart Andrew
Publication venue: The University of Waikato
Publication date: 01/01/2006
Field of study

This thesis describes a new optimisation and new heuristics for automatically marking up XML documents. These are implemented in CEM, using PPM models. CEM is significantly more general than previous systems, marking up large numbers of hierarchical tags, using n-gram models for large n and a variety of escape methods. Four corpora are discussed, including the bibliography corpus of 14682 bibliographies laid out in seven standard styles using the BIBTEX system and marked up in XML with every field from the original BIBTEX. Other corpora include the ROCLING Chinese text segmentation corpus, the Computists’ Communique corpus and the Reuters’ corpus. A detailed examination is presented of the methods of evaluating markup algorithms, including computational complexity measures and correctness measures from the fields of information retrieval, string processing, machine learning and information theory. A new taxonomy of markup complexities is established and the properties of each taxon are examined in relation to the complexity of marked-up documents. The performance of the new heuristics and optimisation is examined using the four corpora.

CiteSeerX

Research Commons@Waikato

CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

Author: Boujemaa Nozha
Compañó Ramón
Dosch Christoph
Geurts Joost
Karlgren Jussi
King Paul
Kompatsiaris Yiannis
Köhler Joachim
Le Moine Jean-Yves
Ortgies Robert
Point Jean-Charles
Rotenberg Boris
Rudström Åsa
Sebe Nicu
Publication venue: Chorus Project Consortium
Publication date: 01/01/2007
Field of study

Based on the information provided by European projects and national initiatives related to multimedia search, as well as domain experts who participated in the CHORUS Think-tanks and workshops, this document reports on the state of the art related to multimedia content search from a technical and a socio-economic perspective. The technical perspective includes an up-to-date view on content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark initiatives to measure the performance of multimedia search engines. From a socio-economic perspective we inventory the impact and legal consequences of these technical advances and point out future directions of research.

RISE – Research Institutes of Sweden

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Swedish Institute of Computer Science Publications Database

Software institutes' Online Digital Archive

Barry Smith an sich

Author: Erion Gerald J.
Zúñiga Y. Postigo Gloria
Publication venue
Publication date: 01/01/2017
Field of study

Festschrift in Honor of Barry Smith on the occasion of his 65th Birthday. Published as issue 4:4 of the journal Cosmos + Taxis: Studies in Emergent Order and Organization. Includes contributions by Wolfgang Grassl, Nicola Guarino, John T. Kearns, Rudolf Lüthe, Luc Schneider, Peter Simons, Wojciech Żełaniec, and Jan Woleński

PhilPapers

A customized semantic service retrieval methodology for the digital ecosystems environment

Author: Dong Hai
Publication venue: Curtin University
Publication date: 01/01/2010
Field of study

With the emergence of the Web and its pervasive intrusion on individuals, organizations, businesses etc., people now realize that they are living in a digital environment analogous to the ecological ecosystem. Consequently, no individual or organization can ignore the huge impact of the Web on social well-being, growth and prosperity, or the changes that it has brought about to the world economy, transforming it from a self-contained, isolated, and static environment to an open, connected, dynamic environment. Recently, the European Union initiated a research vision in relation to this ubiquitous digital environment, known as Digital (Business) Ecosystems. In the Digital Ecosystems environment, there exist ubiquitous and heterogeneous species, and ubiquitous, heterogeneous, context-dependent and dynamic services provided or requested by species. Nevertheless, existing commercial search engines lack sufficient semantic supports, which cannot be employed to disambiguate user queries and cannot provide trustworthy and reliable service retrieval. Furthermore, current semantic service retrieval research focuses on service retrieval in the Web service field, which cannot provide requested service retrieval functions that take into account the features of Digital Ecosystem services. 
Hence, in this thesis, we propose a customized semantic service retrieval methodology, enabling trustworthy and reliable service retrieval in the Digital Ecosystems environment, by considering the heterogeneous, context-dependent and dynamic nature of services and the heterogeneous and dynamic nature of service providers and service requesters in Digital Ecosystems. The customized semantic service retrieval methodology comprises: 1) a service information discovery, annotation and classification methodology; 2) a service retrieval methodology; 3) a service concept recommendation methodology; 4) a quality of service (QoS) evaluation and service ranking methodology; and 5) a service domain knowledge updating, and service-provider-based Service Description Entity (SDE) metadata publishing, maintenance and classification methodology. The service information discovery, annotation and classification methodology is designed for discovering ubiquitous service information from the Web, annotating the discovered service information with ontology mark-up languages, and classifying the annotated service information by means of specific service domain knowledge, taking into account the heterogeneous and context-dependent nature of Digital Ecosystem services and the heterogeneous nature of service providers. The methodology is realized by the prototype of a Semantic Crawler, the aim of which is to discover service advertisements and service provider profiles from webpages, and to annotate the information with service domain ontologies. The service retrieval methodology enables service requesters to precisely retrieve the annotated service information, taking into account the heterogeneous nature of Digital Ecosystem service requesters. The methodology is presented by the prototype of a Service Search Engine.
Since service requesters can be divided according to the group which has relevant knowledge with regard to their service requests, and the group which does not have relevant knowledge with regard to their service requests, we respectively provide two different service retrieval modules. The module for the first group enables service requesters to directly retrieve service information by querying its attributes. The module for the second group enables service requesters to interact with the search engine to denote their queries by means of service domain knowledge, and then retrieve service information based on the denoted queries. The service concept recommendation methodology concerns the issue of incomplete or incorrect queries. The methodology enables the search engine to recommend relevant concepts to service requesters, once they find that the service concepts eventually selected cannot be used to denote their service requests. We premise that there is some extent of overlap between the selected concepts and the concepts denoting service requests, as a result of the impact of service requesters’ understandings of service requests on the selected concepts by a series of human-computer interactions. Therefore, a semantic similarity model is designed that seeks semantically similar concepts based on selected concepts. The QoS evaluation and service ranking methodology is proposed to allow service requesters to evaluate the trustworthiness of a service advertisement and rank retrieved service advertisements based on their QoS values, taking into account the context-dependent nature of services in Digital Ecosystems. The core of this methodology is an extended CCCI (Correlation of Interaction, Correlation of Criterion, Clarity of Criterion, and Importance of Criterion) metrics, which allows a service requester to evaluate the performance of a service provider in a service transaction based on QoS evaluation criteria in a specific service domain.
The evaluation result is then incorporated with the previous results to produce the eventual QoS value of the service advertisement in a service domain. Service requesters can rank service advertisements by considering their QoS values under each criterion in a service domain. The methodology for service domain knowledge updating, service-provider-based SDE metadata publishing, maintenance, and classification is initiated to allow: 1) knowledge users to update service domain ontologies employed in the service retrieval methodology, taking into account the dynamic nature of services in Digital Ecosystems; and 2) service providers to update their service profiles and manually annotate their published service advertisements by means of service domain knowledge, taking into account the dynamic nature of service providers in Digital Ecosystems. The methodology for service domain knowledge updating is realized by a voting system for any proposals for changes in service domain knowledge, and by assigning different weights to the votes of domain experts and normal users. In order to validate the customized semantic service retrieval methodology, we build a prototype – a Customized Semantic Service Search Engine. Based on the prototype, we test the mathematical algorithms involved in the methodology by a simulation approach and validate the proposed functions of the methodology by a functional testing approach.

espace@Curtin

Sociolinguistically Driven Approaches for Just Natural Language Processing

Author: Blodgett Su Lin
Publication venue: ScholarWorks@UMass Amherst
Publication date: 06/04/2021
Field of study

Natural language processing (NLP) systems are now ubiquitous. Yet the benefits of these language technologies do not accrue evenly to all users, and indeed they can be harmful; NLP systems reproduce stereotypes, prevent speakers of non-standard language varieties from participating fully in public discourse, and re-inscribe historical patterns of linguistic stigmatization and discrimination. How harms arise in NLP systems, and who is harmed by them, can only be understood at the intersection of work on NLP, fairness and justice in machine learning, and the relationships between language and social justice. In this thesis, we propose to address two questions at this intersection: i) How can we conceptualize harms arising from NLP systems?, and ii) How can we quantify such harms? We propose the following contributions. First, we contribute a model in order to collect the first large dataset of African American Language (AAL)-like social media text. We use the dataset to quantify the performance of two types of NLP systems, identifying disparities in model performance between Mainstream U.S. English (MUSE)- and AAL-like text. Turning to the landscape of bias in NLP more broadly, we then provide a critical survey of the emerging literature on bias in NLP and identify its limitations. Drawing on work across sociology, sociolinguistics, linguistic anthropology, social psychology, and education, we provide an account of the relationships between language and injustice, propose a taxonomy of harms arising from NLP systems grounded in those relationships, and propose a set of guiding research questions for work on bias in NLP. Finally, we adapt the measurement modeling framework from the quantitative social sciences to effectively evaluate approaches for quantifying bias in NLP systems. We conclude with a discussion of recent work on bias through the lens of style in NLP, raising a set of normative questions for future work

ScholarWorks@UMass Amherst

Interventions using digital tools to improve students’ engagement and learning outcomes in higher business education

Author: Bertheussen Bernt Arne
Publication venue: UiT Norges arktiske universitet
Publication date: 01/01/2016
Field of study

The papers of this thesis are not available in Munin. Paper 1: Bertheussen, B. A.: "Cultivating spreadsheet usage in a finance course through learning and assessment innovations". Available in International Journal of Innovation in Education 2015, 3(1). Paper 2: Bertheussen, B. A., Myrland, Ø.: "Relation between academic performance and students’ engagement in digital learning activities". Available in Journal of Education for Business 2016, 91(3), 1–7. Paper 3: Bertheussen, B. A.: "Er handelshøyskolene innelåst i historiske pedagogiske spor?". Available in Magma 2013, 16(5), 40–48. Paper 4: Bertheussen, B. A.: "Ruteark eller regneark. Kognitive utfordringer med å løse finansoppgaver på papier og PC". Available in Uniped 2012, 35(3), 87–101. Paper 5: Bertheussen, B. A.: "Validating a Digital Assessment Practice". (Manuscript). Paper 6: Bertheussen, B. A.: "Power to business professors. Automatic grading of problem-solving tasks". Available in Journal of Accounting Education 2014, 32(1), 76–87. Paper 7: Bertheussen, B. A.: "Automatisk formativ feedback kan gi god motivasjon og læring". Available in Uniped 2014, 37(4), 59–71. Paper 8: Bertheussen, B. A.: "Revitalizing plenary finance lectures". Available in Beta 2013, 27(1), 78–92. The purpose of the present study was to develop interventions using digital tools to improve student engagement and learning outcomes. The empirical context was an undergraduate finance course wherein digital learning and assessment interventions were important features of the course design. When designing the interventions, the development activities were underpinned by pedagogical principles based on cognitive and sociocultural learning perspectives. Special emphasis was placed on integrating spreadsheet usage into all learning and assessment activities and constructively aligning course targets, assessment tasks and learning activities with the overall goal to foster an active and engaging learning environment.
In addition, rooted in a pragmatic research paradigm, the methodology utilised includes many similarities with interventionist action research, which has gained a foothold in qualitative management accounting research. This interventionist research project includes two main contributions. The first is its impact on practice by designing and developing interventions to solve complex problems in an authentic classroom setting. Consequently, six practical educational interventions are discussed in this dissertation. The second contribution is theory building, which advances our knowledge regarding the characteristics of the interventions and the process of designing and developing them. Consequently, a total of eight refereed scientific articles have been produced during this research and development project. As outlined in this study, the development of the digital formative feedback intervention is in line with research stating that, in higher education, traditional paper-based feedback is being supplemented with, and in some cases replaced by, innovative use of ICT. Moreover, software algorithms can effectively provide detailed and helpful individual formative feedback to students regarding their learning processes and outcomes. This study strongly supports the claim that it is problematic to use technology to enhance learning without recognition through assessments. The digital summative assessment intervention reported is regarded as a precondition for establishing a spreadsheet user-culture in the subject, especially as it served as an ‘icebreaker’ for other learning interventions that were integrated into the course design. The intervention processes discussed have been through several iterations and their stepwise development and implementation have emerged through negotiating, compromising and resolving tension between the practitioner researcher, students and institution.
The resulting compromises resolved tensions which sometimes resulted from limited physical resources. As the students valued the outcome from engaging in the digital learning and assessment interventions, they had a flexible attitude and deployed their private infrastructure (laptops) within the learning environment. Consequently, a vital part of the institution’s infrastructure was transformed from a fixed asset (number of PCs available in a data lab) to a flexible asset in the theatres. This compromise that was negotiated between the institution, the practitioner researcher and the students was essential for the digital educational interventions to work and progress. The overall theoretical research findings from this study are presented in the form of a tentative framework, which can help bridge the gap between the intervention practice and theory. A central conjecture in the framework is that tool usage that is integrated into interventions can be influential on learning activity and engagement and consequently on students’ learning outcomes. Moreover, the framework supports the notion of ICT as a mediating cultural tool that provides a new type of affordance that can extend the mind and promote an active and engaging learning environment. In particular, integrating a spreadsheet tool in learning of management accounting subjects can offer opportunities for learners to rapidly construct financial models, enable simulations using the completed models and stimulate subject reflections based on the functions of the models and their results. The practical outcome of this study has been emphasised through the development of artefacts that aim to support practitioners intending to integrate spreadsheet usage within their subject teaching and learning. By publishing and sharing the artefacts, the current research project is capable of informing future development and implementation decisions by guiding practitioners in similar pedagogical contexts

Munin - Open Research Archive

NORA - Norwegian Open Research Archives

Modeling competing risks in discrete event simulation models: illustrating and comparing different approaches

Author: Degeling K.
IJzerman M. J.
Koffijberg H.
Publication venue
Publication date: 01/05/2017
Field of study

University of Twente Research Information