    The history of information retrieval research

    This paper describes a brief history of the research and development of information retrieval systems, starting with the creation of electromechanical searching devices and moving through the early adoption of computers to search for items relevant to a user's query. The advances achieved by information retrieval researchers from the 1950s to the present day are detailed next, focusing on the process of locating relevant information. The paper closes with speculation on where the future of information retrieval lies.

    Axiomatic thinking for information retrieval: introduction to special issue

    Amigo, E.; Fang, H.; Mizzaro, S.; Zhai, C.

    Outlier Edge Detection Using Random Graph Generation Models and Applications

    Outliers are samples generated by mechanisms different from those that generate normal data samples. Graphs, in particular social network graphs, may contain nodes and edges created by scammers, by malicious programs, or mistakenly by normal users. Detecting outlier nodes and edges is important for data mining and graph analytics. However, previous research in the field has focused only on detecting outlier nodes. In this article, we study the properties of edges and propose outlier edge detection algorithms using two random graph generation models. We found that the edge-ego-network, which can be defined as the induced graph that contains the two end nodes of an edge, their neighboring nodes, and the edges that link these nodes, contains critical information for detecting outlier edges. We evaluated the proposed algorithms by injecting outlier edges into real-world graph data. Experimental results show that the proposed algorithms can effectively detect outlier edges. In particular, the algorithm based on the Preferential Attachment random graph generation model consistently gives good performance regardless of the test graph data. Furthermore, the proposed algorithms are not limited to outlier edge detection. We demonstrate three different applications that benefit from them: 1) a preprocessing tool that improves the performance of graph clustering algorithms; 2) an outlier node detection algorithm; and 3) a novel noisy data clustering algorithm. These applications show the great potential of the proposed outlier edge detection techniques.
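    The abstract does not spell out the scoring functions used under the two random graph models, so the following is only a minimal sketch of the edge-ego-network idea, assuming networkx; the preferential-attachment-style score and the function names are illustrative stand-ins, not the paper's method.

```python
import networkx as nx

def edge_ego_network(G, u, v):
    """Induced subgraph on an edge's two endpoints and their neighbors."""
    nodes = {u, v} | set(G.neighbors(u)) | set(G.neighbors(v))
    return G.subgraph(nodes)

def pa_edge_score(G, u, v):
    """Illustrative stand-in score: edges whose endpoints share few
    neighbors relative to their degrees look less 'expected' under a
    preferential-attachment-style model."""
    common = len(set(G.neighbors(u)) & set(G.neighbors(v)))
    return (1 + common) / (G.degree(u) * G.degree(v)) ** 0.5

G = nx.karate_club_graph()
scores = {(u, v): pa_edge_score(G, u, v) for u, v in G.edges()}
suspects = sorted(scores, key=scores.get)[:5]  # lowest scores first
print(suspects)  # candidate outlier edges
u, v = suspects[0]
print(edge_ego_network(G, u, v).number_of_nodes())  # size of its ego network
```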

    Semantic Web: Who is who in the field – A bibliometric analysis

    The Semantic Web (SW) is one of the main efforts aiming to enhance human and machine interaction by representing data in a way machines can understand, so that they can mediate data and services. It is a fast-moving and multidisciplinary field. This study conducts a thorough bibliometric analysis of the field by collecting data from Web of Science (WOS) and Scopus for the period 1960-2009. It utilizes a total of 44,157 papers with 651,673 citations from Scopus, and 22,951 papers with 571,911 citations from WOS. Based on these papers and citations, it evaluates the research performance of the SW by identifying the most productive players, the major scholarly communication media, highly cited authors, influential papers, and emerging stars.
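    As an illustration of the kind of aggregation behind such a bibliometric analysis, here is a minimal sketch assuming a hypothetical table of paper records (authors plus citation counts) exported from Scopus or WOS; the column names and data are invented.

```python
import pandas as pd

# Hypothetical export: one row per paper, with its author list and citations.
papers = pd.DataFrame({
    "authors":   [["A. Smith", "B. Lee"], ["B. Lee"], ["C. Kim", "A. Smith"]],
    "citations": [120, 45, 300],
})

# Credit each paper's citations to every co-author, then rank authors
# by paper count and total citations received.
per_author = papers.explode("authors").groupby("authors")["citations"]
ranking = per_author.agg(["count", "sum"]).sort_values("sum", ascending=False)
print(ranking)
```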

    A pilot study in an application of text mining to learning system evaluation

    Text mining concerns discovering and extracting knowledge from unstructured data. It transforms textual data into a usable, intelligible format that facilitates classifying documents, finding explicit relationships or associations between documents, and clustering documents into categories. Given a collection of survey comments evaluating the civil engineering learning system, a text mining technique is applied to discover and extract knowledge from the comments. This research focuses on a systematic way to apply a software tool, SAS Enterprise Miner, to the survey data. The purpose is to categorize the comments into different groups in an attempt to identify the major concerns of the users, i.e., the students. Each group is associated with a set of key terms. This assists the evaluators of the learning system in gleaning the main ideas from the summarized terms without having to go through a potentially huge amount of data.
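    The study's pipeline uses SAS Enterprise Miner, which is proprietary; as a rough illustration of the same idea, here is a hedged sketch of an equivalent open-source pipeline that clusters free-text comments and surfaces a set of key terms per group. The example comments are invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

comments = [
    "the simulation module is slow to load",
    "loading the simulation takes too long",
    "quiz feedback was very helpful",
    "helpful and clear quiz feedback",
]

# Vectorize comments, cluster them, then list top terms per cluster.
vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(comments)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

terms = vec.get_feature_names_out()
for c in range(km.n_clusters):
    top = km.cluster_centers_[c].argsort()[::-1][:3]
    print(f"cluster {c}:", [terms[i] for i in top])
```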

    Template Mining for Information Extraction from Digital Documents

    Published or submitted for publication.

    Technology classification with latent semantic indexing

    Many national and international governments establish organizations to fund applied science research. To this end, several organizations have defined procedures for identifying relevant projects based on prioritized technologies. Even for applied science research projects that combine several technologies, it is difficult to identify all the corresponding technologies across research-funding organizations. In this paper, we present an approach that supports researchers and research-funding planners by classifying applied science research projects according to the corresponding technologies of research-funding organizations. In contrast to related work, the problem is solved by considering results from the literature concerning application-based technological relationships and by creating a new approach based on latent semantic indexing (LSI) as a semantic text classification algorithm. Technologies that occur together in the process of creating an application are grouped into classes, semantic textual patterns are identified as representative of each class, and projects are assigned to one of these classes. This enables each project to be assigned to all technologies semantically grouped by use of LSI. The approach is evaluated using the example of defense- and security-based technological research, because the growing importance of this application field leads to an increasing number of research projects and to the appearance of many new technologies.
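    As a hedged sketch of the LSI-based classification the abstract describes: project TF-IDF vectors into a latent semantic space via truncated SVD, then assign a new project description to the nearest technology class. The class labels, texts, and centroid classifier below are illustrative assumptions, not the paper's exact setup.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.neighbors import NearestCentroid
from sklearn.pipeline import make_pipeline

# Invented project descriptions labeled with technology classes.
train_texts = [
    "radar signal processing for airborne surveillance",
    "encrypted tactical radio communication links",
    "radar clutter suppression algorithms",
    "secure waveform design for military radios",
]
train_labels = ["sensing", "communications", "sensing", "communications"]

# LSI = TF-IDF followed by truncated SVD into a low-rank semantic space.
lsi = make_pipeline(TfidfVectorizer(), TruncatedSVD(n_components=2))
Z = lsi.fit_transform(train_texts)

# Assign a new project to the nearest class centroid in LSI space.
clf = NearestCentroid().fit(Z, train_labels)
new_project = ["adaptive radar detection in heavy clutter"]
print(clf.predict(lsi.transform(new_project)))  # expected: ['sensing']
```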

    A Novel and Domain-Specific Document Clustering and Topic Aggregation Toolset for a News Organisation

    Large collections of documents are becoming increasingly common in the news gathering industry. A review of the literature shows a growing interest in data-driven journalism and, specifically, that the journalism profession needs better tools to understand and develop actionable knowledge from large document sets. On a daily basis, journalists are tasked with searching a diverse range of document sets, including news gathering services, emails, freedom of information requests, court records, government reports, press releases, and many other types of generally unstructured documents. Document clustering techniques can help address the problem of understanding the ever-expanding quantities of documents available to journalists by finding patterns within documents. These patterns can be used to develop useful and actionable knowledge that can contribute to journalism. News articles in particular are fertile ground for document clustering principles. Term weighting schemes assign importance to terms within a document and are central to the study of document clustering methods. This study contributes a review of the dominant and most commonly used term frequency weighting functions put forward in research, establishes the merits and limitations of each approach, and proposes modifications to develop a news-centric document clustering and topic aggregation approach. Experimentation was conducted on a large unstructured collection of newspaper articles from the Irish Times to establish whether the newly proposed news-centric term weighting and document similarity approach improves document clustering accuracy and topic aggregation capabilities for news articles when compared to the traditional term weighting approach. Whilst the experimentation shows that the developed approach is promising compared to the manual document clustering effort undertaken by the three expert journalist users, it also highlights the challenges of natural language processing and document clustering methods in general. The results may suggest that a blended approach of complementing automated methods with human-level supervision and guidance yields the best results.
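    The abstract does not detail the proposed news-centric weighting modifications, so the sketch below shows only the traditional baseline being modified: TF-IDF term weights and cosine similarity between articles, with invented example headlines.

```python
import math
from collections import Counter

docs = [
    "council votes to approve new housing plan".split(),
    "housing plan approved after council vote".split(),
    "storm warning issued for the west coast".split(),
]

N = len(docs)
df = Counter(t for d in docs for t in set(d))  # document frequency per term

def tfidf(doc):
    """Classic TF-IDF weights for one tokenized document."""
    tf = Counter(doc)
    return {t: (tf[t] / len(doc)) * math.log(N / df[t]) for t in tf}

def cosine(a, b):
    """Cosine similarity between two sparse weight dicts."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

vecs = [tfidf(d) for d in docs]
# Related housing stories score high; the storm story scores 0.
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```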