Retrieving with good sense
Although always present in text, word sense ambiguity only recently came to be regarded as a potentially solvable problem in information
retrieval. The growth of interest in word senses resulted from new directions taken in
disambiguation research. This paper first outlines this research and then surveys the resulting efforts in information
retrieval. Although the majority of attempts to improve retrieval effectiveness were unsuccessful, much was learnt
from the research, most notably an understanding of the circumstances under which disambiguation may prove useful to retrieval.
The SIMBAD astronomical database
Simbad is the reference database for identification and bibliography of
astronomical objects. It contains identifications, 'basic data', bibliography,
and selected observational measurements for several million astronomical
objects. Simbad is developed and maintained by CDS, Strasbourg. Building the
database contents is achieved with the help of several contributing institutes.
Scanning the bibliography is the result of the collaboration of CDS with
bibliographers in Observatoire de Paris (DASGAL), Institut d'Astrophysique de
Paris, and Observatoire de Bordeaux. When selecting catalogues and tables for
inclusion, priority is given to optimal multi-wavelength coverage of the
database, and to support of research developments linked to large projects. In
parallel, the systematic scanning of the bibliography reflects the diversity
and general trends of astronomical research.
A WWW interface to Simbad is available at: http://simbad.u-strasbg.fr/Simbad
Comment: 14 pages, 5 Postscript figures; to be published in A&A
Virtual WWW Documents: a Concept to Explicit the Structure of WWW Sites
http://www.emse.fr/~beigbeder/PUBLIS/1999-BCS-IRSG-p185-doan-v1.pdf
International audience
This paper presents a new concept, the virtual WWW document (VWD): a set of WWW pages representing a logical information space, generally dealing with one particular domain. The VWD is described using metadata in XML syntax and is accessed through a metadata.class file stored at the root level of WWW sites. We suggest how the VWD can improve information retrieval on the WWW and reduce the network load generated by robots. We describe a prototype implemented in Java, within an application in the environmental domain. The exchange of such metadata relies on a flexible architecture based on two kinds of robots, generalists and specialists, that collect and organize this metadata in order to locate resources on the WWW. These robots contribute to the overall self-organizing information process by exchanging their indices, thereby forwarding their knowledge to each other.
Classification design : understanding the decisions between theory and consequence
Classification systems are systems of terms and term relationships intended to sort and gather like concepts and documents. These systems are ubiquitous as the substrate of our interactions with library collections, retail websites, and bureaucracies. Through their design and impact, classification systems share with other technologies an unavoidable though often ignored relationship to politics, power, and authority (Fleischmann & Wallace, 2007). Despite concern among scholars that classification systems embody values and bias, there is little work examining how these qualities are built into a classification system. Specifically, we do not adequately understand classification construction, in which classification designers make decisions by applying classification theory to the specific context of a project (Park, 2008). If systems embody values, particularly values that might either cause harm (Berman, 1971) or provide an additional means of communicating the creator’s position (Feinberg, 2007), we must understand how and when the system takes on these qualities. This dissertation bridges critical classification theory with design-oriented classification theory. Where critical classification theory is concerned with the outcomes of classification system design, design-oriented classification theory is concerned with the correct processes by which to build a classification system. To connect the consequences of classification system design to designers’ methods and intentions, I use the research lens of infrastructure studies, particularly infrastructural inversion (Star & Ruhleder, 1996), that is, making visible the work behind infrastructures such as classification systems. Accordingly, my research focuses on designers’ decisions and rethinks our assumptions regarding the factors that classification designers consider in making their design decisions.
I adopted an ethnographic approach to the study of classification design that would make visible design decisions and designers’ consideration of factors. Using this approach, I studied the daily design work of volunteer classification designers who maintain a curated folksonomy. Using the grounded theory method (Strauss & Corbin, 1998), I analyzed the designers’ decisions. My analysis identified the implications of the designers’ convergences and divergences from established classification methods for the character of the system and for the connection between classification theory and classification methods. I show how the factors—and the prioritization of factors—that these designers considered in making their decisions were consistent with the values and needs of the community. Therefore, I argue that classification designers have an important role in creating the values or bias of a classification system. In particular, designers’ divergence from universal guidelines and designers’ choices among sources of evidence represent opportunities to align a classification system to its community. I recommend that classification research focus on such instances of divergence and choice to understand the connection between classification design and the values of classification systems. The Introduction motivates the problem space around values in classification systems and outlines my approach in focusing on classification design. The Literature Review outlines the dominant theories in classification scholarship according to three elements of classification design: what decisions designers make, what information designers use in their decisions, and what skills designers apply to their decisions. In the Methods chapter, I introduce the site of my ethnographic research (The Fanwork Repository), detail my ethnographic methods, summarize the types of data I collected, and describe my grounded analysis. 
Three findings chapters examine one type of complex decision each: Names, Works, and Guidelines, respectively. In the fourth findings chapter, Synthesis, I define 10 factors designers considered across these complex design decisions. I then discuss how the factors figured into complex design decisions, how the factors overlapped and conflicted in design decisions, and how designers understood their role in making complex design decisions. In the Discussion chapter I connect the findings from the site of my ethnography to classification scholarship. In the Conclusion, I consider the contribution of examining classification systems as infrastructure, highlight the differences in accounts of classification design decisions made visible through classification theory and infrastructure studies approaches, and present suggestions for future research in classification design and the study of classification systems as infrastructure.
Personalization of tagging systems
Social media systems have encouraged end user participation in the Internet, for the purpose of storing and distributing Internet content, sharing opinions and maintaining relationships. Collaborative tagging allows users to annotate the resulting user-generated content, and enables effective retrieval of otherwise uncategorised data. However, compared to professional web content production, collaborative tagging systems face the challenge that end-users assign tags in an uncontrolled manner, resulting in unsystematic and inconsistent metadata.
This paper introduces a framework for the personalization of social media systems. We pinpoint three tasks that would benefit from personalization: collaborative tagging, collaborative browsing and collaborative s
Guided generation of pedagogical concept maps from the Wikipedia
We propose a new method for guided generation of concept maps from open-access online knowledge resources such as wikis. Based on this method, we have implemented a prototype that extracts semantic relations from sentences surrounding hyperlinks in Wikipedia’s articles and lets a learner create customized learning objects in real time, based on collaborative recommendations that consider her earlier knowledge. Open-source modules enable pedagogically motivated exploration in wiki spaces, corresponding to an intelligent tutoring system. The method extracted compact noun–verb–noun phrases, suggested for labeling arcs between nodes that were labeled with article titles. On average, 80 percent of these phrases were useful, while their length was only 20 percent of the length of the original sentences. Experiments indicate that even simple analysis algorithms can effectively support user-initiated information retrieval and the building of intuitive learning objects that follow the learner’s needs.
Peer reviewed
Applying Wikipedia to Interactive Information Retrieval
There are many opportunities to improve the interactivity of information retrieval systems beyond the ubiquitous search box. One idea is to use knowledge bases (e.g., controlled vocabularies, classification schemes, thesauri and ontologies) to organize, describe and navigate the information space. These resources are popular in libraries and specialist collections, but have proven too expensive and narrow to be applied to everyday web-scale search. Wikipedia has the potential to bring structured knowledge into more widespread use. This online, collaboratively generated encyclopaedia is one of the largest and most consulted reference works in existence. It is broader, deeper and more agile than the knowledge bases put forward to assist retrieval in the past. Rendering this resource machine-readable is a challenging task that has captured the interest of many researchers. Many see it as a key step required to break the knowledge acquisition bottleneck that crippled previous efforts. This thesis claims that the roadblock can be sidestepped: Wikipedia can be applied effectively to open-domain information retrieval with minimal natural language processing or information extraction. The key is to focus on gathering and applying human-readable rather than machine-readable knowledge. To demonstrate this claim, the thesis tackles three separate problems: extracting knowledge from Wikipedia; connecting it to textual documents; and applying it to the retrieval process. First, we demonstrate that a large thesaurus-like structure can be obtained directly from Wikipedia, and that accurate measures of semantic relatedness can be efficiently mined from it. Second, we show that Wikipedia provides the necessary features and training data for existing data mining techniques to accurately detect and disambiguate topics when they are mentioned in plain text. 
Third, we provide two systems and user studies that demonstrate the utility of the Wikipedia-derived knowledge base for interactive information retrieval.