Implementation of a knowledge discovery and enhancement module from structured information gained from unstructured sources of information
Integrated master's thesis. Informatics and Computing Engineering. Faculdade de Engenharia, Universidade do Porto. 201
Analysis of spatio-social relations in a photographic archive (Flickr)
This thesis aims to study and analyse the complex spatio-social relations among social entities who interact together in a spatially structured social group. This aim is approached in three steps:
1. Collecting and classifying spatio-social data,
2. Disambiguating place names that people use to refer to their homes and
3. Analysing data of this kind (numerically and visually).
The source of spatio-social data used in this work is Flickr, a Yahoo photo-sharing site on which users maintain a social network of friends and a collection of photos on their profiles. According to available statistics1, the Flickr database contains more than three billion photos, of which a hundred million are geo-tagged. Two different samples were explored when retrieving data from the Flickr database. Initially, a random collection of photos uploaded to Flickr during the examined periods was gathered on a daily basis. This was followed by much narrower and more precise criteria for the second sampling, which resulted in the Flickr GB data sample.
The thesis concludes that location plays a dominant role in the online behavior of social entities who interact together via the internet. The core contributions of this thesis are in the areas of:
1. Extracting an indicative sample from very large data sets,
2. Disambiguating the place names that people use in natural language to refer to their home locations, and
3. Proposing potential new insights into the behaviors of social entities with spatio-social relations.
Overall, the popularity of social networking sites and the availability of data that can be obtained from the web (whether provided voluntarily or retrieved as a by-product of online interactions) are likely to continue increasing in the future. In addition, the realm of spatio-social data analysis and its visualization continues to expand, as do the types of maps that are achievable, the visualization packages with which those maps can be built, the number of map users, and the coverage that gazetteers offer for vague terms. Therefore, the methods, algorithms and applications developed in this study can benefit researchers in the social and e-social sciences, those interested in developing and maintaining social networking sites, geographers who work on the disambiguation of fuzzy vernacular geographic terms, visualization and spatial data analysts in general, and those seeking to develop better business strategies (e.g. localization and personalization).
1 (http://www.Flickr.com, retrieved 20/07/09)
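The place-name disambiguation step described above (resolving the vague names people use for their home locations) can be illustrated with a minimal gazetteer lookup. This is a hedged sketch, not the thesis's actual algorithm: the gazetteer entries, the country-hint heuristic and the population tie-break are all illustrative assumptions.

```python
# Toy gazetteer: name -> list of (country, lat, lon, population) candidates.
# The entries below are illustrative, not real gazetteer data.
GAZETTEER = {
    "cambridge": [("GB", 52.205, 0.119, 124_000),
                  ("US", 42.373, -71.110, 118_000)],
    "london":    [("GB", 51.507, -0.128, 8_900_000),
                  ("CA", 42.984, -81.246, 404_000)],
}

def disambiguate(place, country_hint=None):
    """Return the best (country, lat, lon) for a vernacular place name."""
    candidates = GAZETTEER.get(place.strip().lower(), [])
    if country_hint:  # e.g. inferred from the user's other geo-tagged photos
        candidates = [c for c in candidates if c[0] == country_hint] or candidates
    if not candidates:
        return None
    # Default tie-break: prefer the most populous referent, a common heuristic.
    best = max(candidates, key=lambda c: c[3])
    return best[:3]

print(disambiguate("London"))           # most populous candidate -> GB
print(disambiguate("Cambridge", "US"))  # country hint overrides population
```

A real pipeline would replace the dictionary with a full gazetteer (e.g. GeoNames) and derive the country hint from the user's geo-tagged photos, but the candidate-filter-then-tie-break shape stays the same.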
Adaptive Semantic Annotation of Entity and Concept Mentions in Text
The recent years have seen an increase in interest for knowledge repositories that are useful across applications, in contrast to the creation of ad hoc or application-specific databases.
These knowledge repositories figure as a central provider of unambiguous identifiers and semantic relationships between entities. As such, these shared entity descriptions serve as a common vocabulary to exchange and organize information in different formats and for different purposes. Therefore, there has been remarkable interest in systems that are able to automatically tag textual documents with identifiers from shared knowledge repositories so that the content in those documents is described in a vocabulary that is unambiguously understood across applications.
Tagging textual documents according to these knowledge bases is a challenging task. It involves recognizing the entities and concepts that have been mentioned in a particular passage and attempting to resolve any ambiguity of language in order to choose one of many possible meanings for a phrase. There has been substantial work on recognizing and disambiguating entities for specialized applications, or constrained to limited entity types and particular types of text. In the context of shared knowledge bases, since each application has potentially very different needs, systems must have unprecedented breadth and flexibility to ensure their usefulness across applications. Documents may exhibit different language and discourse characteristics, discuss very diverse topics, or require a focus on parts of the knowledge repository that are inherently harder to disambiguate. In practice, for developers looking for a system to support their use case, it is often unclear whether an existing solution is applicable, leading them to trial-and-error and ad hoc use of multiple systems in an attempt to achieve their objective.
In this dissertation, I propose a conceptual model that unifies related techniques in this space under a common multi-dimensional framework that enables the elucidation of strengths and limitations of each technique, supporting developers in their search for a suitable tool for their needs. Moreover, the model serves as the basis for the development of flexible systems able to support document tagging for different use cases. I describe such an implementation, DBpedia Spotlight, along with extensions that we made to the DBpedia knowledge base to support it. I report evaluations of this tool on several well-known data sets, and demonstrate applications to diverse use cases for further validation.
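The recognize-then-disambiguate pipeline described above can be sketched in a few lines: spot phrases that match known surface forms, then score each candidate meaning by the similarity between the mention's context and the candidate's context model. The tiny lexicon and context vectors below are toy assumptions; DBpedia Spotlight itself learns such models from Wikipedia rather than using hand-written data.

```python
from collections import Counter
import math

# Surface form -> candidate knowledge-base identifiers (illustrative sample).
SURFACE_FORMS = {"jaguar": ["dbpedia:Jaguar_Cars", "dbpedia:Jaguar"]}

# Candidate -> bag-of-words context model (illustrative sample).
CONTEXTS = {
    "dbpedia:Jaguar_Cars": Counter(["car", "engine", "british", "drive"]),
    "dbpedia:Jaguar":      Counter(["cat", "predator", "jungle", "prey"]),
}

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def tag(text):
    """Spot known surface forms, then pick the best-fitting candidate."""
    tokens = text.lower().split()
    context = Counter(tokens)  # use the whole passage as disambiguation context
    annotations = []
    for i, tok in enumerate(tokens):
        if tok in SURFACE_FORMS:
            best = max(SURFACE_FORMS[tok],
                       key=lambda c: cosine(context, CONTEXTS[c]))
            annotations.append((i, tok, best))
    return annotations

# The animal sense wins here: the context mentions "prey" and "jungle".
print(tag("the jaguar stalked its prey through the jungle"))
```

Real systems add phrase-level spotting, prior (commonness) probabilities, and trained weighting schemes, but the spot-candidates-score structure is the same.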
CHORUS Deliverable 2.2: Second report - identification of multi-disciplinary key issues for gap analysis toward EU multimedia search engines roadmap
After addressing the state of the art during the first year of CHORUS and establishing the existing landscape in multimedia search engines, we identified and analyzed gaps within the European research effort during our second year. In this period we focused on three directions, namely technological issues, user-centred issues and use cases, and socio-economic and legal aspects. These were assessed through two central studies: firstly, a concerted vision of the functional breakdown of a generic multimedia search engine, and secondly, representative use-case descriptions with a related discussion of requirements for technological challenges. Both studies were carried out in cooperation and consultation with the community at large through EC concertation meetings (multimedia search engines cluster), several meetings with our Think-Tank, presentations at international conferences, and surveys addressed to EU project coordinators as well as national initiative coordinators. Based on the feedback obtained, we identified two types of gaps, namely core technological gaps that involve research challenges, and “enablers”, which are not necessarily technical research challenges but have an impact on innovation progress. New socio-economic trends are presented, as well as emerging legal challenges.
Applying Wikipedia to Interactive Information Retrieval
There are many opportunities to improve the interactivity of information retrieval systems beyond the ubiquitous search box. One idea is to use knowledge bases—e.g. controlled vocabularies, classification schemes, thesauri and ontologies—to organize, describe and navigate the information space. These resources are popular in libraries and specialist collections, but have proven too expensive and narrow to be applied to everyday web-scale search. Wikipedia has the potential to bring structured knowledge into more widespread use. This online, collaboratively generated encyclopaedia is one of the largest and most consulted reference works in existence. It is broader, deeper and more agile than the knowledge bases put forward to assist retrieval in the past. Rendering this resource machine-readable is a challenging task that has captured the interest of many researchers. Many see it as a key step required to break the knowledge acquisition bottleneck that crippled previous efforts. This thesis claims that the roadblock can be sidestepped: Wikipedia can be applied effectively to open-domain information retrieval with minimal natural language processing or information extraction. The key is to focus on gathering and applying human-readable rather than machine-readable knowledge. To demonstrate this claim, the thesis tackles three separate problems: extracting knowledge from Wikipedia; connecting it to textual documents; and applying it to the retrieval process. First, we demonstrate that a large thesaurus-like structure can be obtained directly from Wikipedia, and that accurate measures of semantic relatedness can be efficiently mined from it. Second, we show that Wikipedia provides the necessary features and training data for existing data mining techniques to accurately detect and disambiguate topics when they are mentioned in plain text.
Third, we provide two systems and user studies that demonstrate the utility of the Wikipedia-derived knowledge base for interactive information retrieval.
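The semantic relatedness measures mentioned above can be mined cheaply from Wikipedia's link graph: two articles are related roughly in proportion to the overlap of the articles that link to them. Below is a hedged sketch of one well-known normalised-distance formulation of this idea; the link sets and the total article count are toy assumptions, not real Wikipedia data.

```python
import math

def relatedness(inlinks_a, inlinks_b, total_articles):
    """Link-overlap relatedness: 1.0 for identical inlink sets, 0.0 for none.

    Computed as 1 minus a normalised distance over the sizes of the two
    inlink sets, their intersection, and the total number of articles.
    """
    overlap = len(inlinks_a & inlinks_b)
    if overlap == 0:
        return 0.0
    big = max(len(inlinks_a), len(inlinks_b))
    small = min(len(inlinks_a), len(inlinks_b))
    distance = ((math.log(big) - math.log(overlap))
                / (math.log(total_articles) - math.log(small)))
    return max(0.0, 1.0 - distance)

# Toy example: two articles that share most of their incoming links
# score close to 1 (articles here are just integer IDs).
a = {1, 2, 3, 4, 5}
b = {3, 4, 5, 6}
print(relatedness(a, b, total_articles=1_000_000))
```

Because it needs only set intersections over precomputed inlink lists, this kind of measure scales to the whole encyclopaedia without any natural language processing, which matches the thesis's emphasis on cheap, human-generated structure.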