33 research outputs found

    Complex question answering on semi-structured repositories: a user centric process enhanced with context

    The World Wide Web (WWW) was envisioned as a network of interlinked hypertext documents, creating an information space where humans and machines could communicate. However, information published on the traditional Web was and is unstructured, so it is (mostly) consumable by humans only. As a consequence, searching and retrieving information on this syntactic and ever-evolving Web is a task mainly performed by humans, and it is not always easy to accomplish. It has therefore become essential to evolve towards a more structured and meaningful Web in which information is given well-defined meaning, enabling cooperation between humans and machines. This Web is usually referred to as the Semantic Web. Moreover, the Semantic Web is fully achievable only if data from different sources is linked, creating a Linked Open Data (LOD) repository. This new Web of Linked (Open) Data (i.e. the Semantic Web) has opened new opportunities but also new challenges. Question Answering (QA) over semantic information is now an active research field that tries to take advantage of Semantic Web technologies to improve the question answering task. The main goal of the World Search project is to explore the Semantic Web to create mechanisms that support users in specific application domains in answering complex questions based on data from different repositories. However, the assessment of the state of the art leads to the conclusion that existing applications do not support users in answering complex questions. Accordingly, this work focuses on studying and developing methodologies and processes that help users find accurate answers to complex questions that cannot be answered using traditional systems.
This includes: (i) overcoming the difficulty users have in envisioning the schema underlying the knowledge repositories; (ii) bridging the natural language expressed by users and the (formal) language understood by the repositories; (iii) processing and returning relevant information that appropriately answers the users' questions. To achieve this goal, a user-centric process is proposed, comprising a set of functionalities that are exploited iteratively, incrementally and interactively. The process and its functionalities aim to help users build complex queries against semi-structured repositories (e.g. LOD repositories). The functionalities considered necessary to support the user in answering complex questions are identified, and a formal description of them is provided. The proposal is materialized in a prototype that implements these functionalities. Experiments with the prototype show that users do benefit from them: they allow users to navigate efficiently over the information repositories; the gap between the conceptualizations of the different stakeholders is minimized; and users can answer complex questions that they could not answer with traditional systems. In short, this document presents a proposal that demonstrably allows complex questions to be answered, in a user-driven manner, on semi-structured repositories

    Distributed Web-Scale Infrastructure For Crawling, Indexing And Search With Semantic Support

    In this paper, we describe our work in progress in the scope of web-scale information extraction and information retrieval utilizing distributed computing. We present a distributed architecture built on top of the MapReduce paradigm for information retrieval, information processing and intelligent search supported by spatial capabilities. The proposed architecture focuses on crawling documents in several different formats, information extraction, lightweight semantic annotation of the extracted information, indexing of the extracted information and, finally, indexing of documents based on the geo-spatial information found in a document. We demonstrate the architecture on two use cases, where the first is search in job offers retrieved from the LinkedIn portal and the second is search in BBC news feeds, and we discuss several problems we had to face during the implementation. We also discuss spatial search applications for both cases, because both LinkedIn job offer pages and BBC news feeds contain a lot of spatial information to extract and process
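
As a rough illustration of indexing on top of the MapReduce paradigm, the following sketch simulates the map, shuffle and reduce steps in a single Python process. It is not the authors' implementation: the toy documents, the `KNOWN_PLACES` set standing in for a gazetteer lookup, and all function names are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical toy corpus; a real system would read crawled documents.
DOCS = {
    "job-1": "Java developer wanted in London",
    "news-1": "Flooding reported near London and Paris",
}

# Assumed stand-in for a gazetteer lookup: a fixed set of known place names.
KNOWN_PLACES = {"london", "paris"}


def map_phase(doc_id, text):
    """Emit (key, doc_id) pairs: one per term, plus one per recognised place."""
    for token in text.lower().split():
        yield ("term:" + token, doc_id)
        if token in KNOWN_PLACES:
            yield ("place:" + token, doc_id)


def shuffle(pairs):
    """Group emitted values by key, as a MapReduce framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, doc_ids):
    """Collapse the values for one key into a sorted, de-duplicated posting list."""
    return key, sorted(set(doc_ids))


def build_index(docs):
    """Run map, shuffle and reduce in sequence and return the inverted index."""
    pairs = [pair for doc_id, text in docs.items() for pair in map_phase(doc_id, text)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


index = build_index(DOCS)
print(index["place:london"])
```

Keeping the place keys separate from the term keys lets a spatial query consult only the `place:` postings, which is one simple way to index documents by the geographic names they mention.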

    Izvori wikipedije: relevantne i /ili ne relevantne informacije [Wikipedia sources: relevant and/or irrelevant information]

    With the advent of the internet and the gradual creation of a global information network, an information revolution took place in the 1990s. Wikipedia, an open-content online encyclopedia (www.wikipedia.org), is a convincing example of open-source production: its open, free content can be edited by anyone. Wikipedia is an international internet project that attempts to create a free encyclopedia in multiple languages, built by volunteers using wiki software. Wikipedia encourages contributors to become "registered users" by setting out the advantages of a user account, including building a reputation in the community. All Wikipedia content is licensed under the GNU Free Documentation License. Evaluating Wikipedia content helps readers identify quality articles, but quality assessment is itself a particular challenge, primarily because of the dynamic nature of this online encyclopedia and the associated characteristics that make the task considerably harder. Vandalism covers adding, deleting or modifying article text, and research shows that destructive edits account for 3-6% of the total. Many studies show that Wikipedia is used as a source of information for private as well as academic purposes, because it provides immediate and concise information in a simple way. The aim of this paper is to give a brief introduction to the history of Wikipedia, its origin and development, the information it provides, and the evaluation of that information. The paper attempts to reach a conclusion on how legitimate the information on Wikipedia is and whether it can be used for academic purposes

    Social Media and Collective Intelligence: Ongoing and Future Research Streams

    The tremendous growth in the use of Social Media has led to radical paradigm shifts in the ways we communicate, collaborate, consume, and create information. Our focus in this special issue is on the reciprocal interplay of Social Media and Collective Intelligence. We therefore discuss constituting attributes of Social Media and Collective Intelligence, and we structure the rapidly growing body of literature including adjacent research streams such as Social Network Analysis, Web Science, and Computational Social Science. We conclude by making propositions for future research where in particular the disciplines of artificial intelligence, computer science, and information systems can substantially contribute to the interdisciplinary academic discourse

    Information Systems for “Wicked Problems” - Research at the Intersection of Social Media and Collective Intelligence

    The objective of this commentary is to propose fruitful research directions built upon the reciprocal interplay of social media and collective intelligence. We focus on “wicked problems” – a class of problems that Introne et al. (Künstl. Intell. 27:45–52, 2013) call “problems for which no single computational formulation of the problem is sufficient, for which different stakeholders do not even agree on what the problem really is, and for which there are no right or wrong answers, only answers that are better or worse from different points of view”. We argue that information systems research in particular can aid in designing appropriate systems due to benefits derived from the combined perspectives of both social media and collective intelligence. We document the relevance and timeliness of social media and collective intelligence for business and information systems engineering, pinpoint needed functionality of information systems for wicked problems, describe related research challenges, highlight prospective suitable methods to tackle those challenges, and review examples of initial results

    Linking geographic vocabularies through WordNet

    The linked open data (LOD) paradigm has emerged as a promising approach to structuring and sharing geospatial information. One of the major obstacles to this vision lies in the difficulty of automatically integrating the heterogeneous vocabularies and ontologies that provide the semantic backbone of the growing constellation of open geo-knowledge bases. In this article, we show how to utilize WordNet as a semantic hub to increase the integration of LOD. With this purpose in mind, we devise Voc2WordNet, an unsupervised mapping technique between a given vocabulary and WordNet, combining intensional and extensional aspects of the geographic terms. Voc2WordNet is evaluated against a sample of human-generated alignments with the OpenStreetMap (OSM) Semantic Network, a crowdsourced geospatial resource, and the GeoNames ontology, the vocabulary of a large digital gazetteer. These empirical results indicate that the approach can obtain high precision and recall
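
For readers unfamiliar with the evaluation measures, the sketch below shows how precision and recall can be computed when automatically produced alignments are compared with human-generated ones. It is a generic illustration, not the Voc2WordNet evaluation code: the function name, the example terms and the synset identifiers are hypothetical placeholders.

```python
def precision_recall(predicted, gold):
    """Compare predicted alignments with a human-made gold standard.

    Both arguments are sets of (vocabulary_term, wordnet_synset) pairs.
    """
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical alignments; the synset identifiers are illustrative placeholders.
gold = {
    ("river", "river.n.01"),
    ("pub", "pub.n.01"),
    ("peak", "peak.n.04"),
}
predicted = {
    ("river", "river.n.01"),
    ("pub", "pub.n.01"),
    ("peak", "peak.n.01"),  # wrong sense: counts against precision and recall
    ("park", "park.n.01"),  # not in the gold sample: counts against precision
}

precision, recall = precision_recall(predicted, gold)
print(precision, recall)
```

Precision is the share of proposed alignments that the human annotators also made; recall is the share of the annotators' alignments that the technique recovered.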

    Algorithms for Recollection of Search Terms Based on the Wikipedia Category Structure

    The common user interface for a search engine consists of a text field where the user can enter queries consisting of one or more keywords. Keyword-based search engines work well when users have a clear idea of what they are looking for and can articulate their query using the same terms as those indexed. For our multimedia database containing 202,868 items with text descriptions, we supplement such a search engine with a category-based interface whose category structure is tailored to the content of the database. This facilitates browsing and lets users look for named entities even if they have forgotten the names. We demonstrate that this approach allows users who fail to recollect the name of a named entity to retrieve data with little effort. In all our experiments, it takes 1 query on a category and on average 2.49 clicks, compared to 5.68 queries on the database’s traditional text search engine for a 68.3% success probability, or 6.01 queries when the user also turns to Google, for a 97.1% success probability