
Finding Support Documents with a Logistic Regression Approach

Author: He Daqing
Li Qi
Publication venue: 'American College of Medical Physics (ACMP)'
Publication date: 28/07/2011

Entity retrieval finds relevant results for a user’s information needs at a finer granularity, the “entity”. To retrieve such entities, existing approaches usually first locate a small set of support documents that contain answer entities, and then detect the answer entities within this set. In the literature, support documents are viewed as relevant documents, and finding them is treated as a conventional document retrieval problem. In this paper, we argue that finding support documents and finding relevant documents, although they sound similar, differ in important ways. We further propose a logistic regression approach to finding support documents. Our experimental results show that the logistic regression method performs significantly better than a baseline system that treats support document finding as a conventional document retrieval problem.

D-Scholarship@Pitt
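
To make the logistic regression approach in "Finding Support Documents with a Logistic Regression Approach" concrete, here is a minimal sketch, not the authors' implementation. It assumes each candidate document has already been turned into a small feature vector; the three features (retrieval score, density of entities of the target type, proximity of query terms to those entities), the document names doc_a and doc_b, and the toy training data are all hypothetical. The model is fit by plain batch gradient descent on the log-loss, and documents are then ranked by their predicted probability of being a support document.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on the mean log-loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # derivative of the log-loss with respect to the logit
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Estimated probability that a document with features x is a support document."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical features per document, scaled to [0, 1]:
#   [retrieval score, density of target-type entities, query-term proximity to entities]
train_X = [
    [0.9, 0.8, 0.7],  # support document
    [0.6, 0.9, 0.8],  # support document
    [0.9, 0.1, 0.2],  # relevant, but contains no answer entity
    [0.2, 0.2, 0.1],  # non-relevant
]
train_y = [1, 1, 0, 0]

w, b = train_logistic(train_X, train_y)

candidates = {"doc_a": [0.8, 0.1, 0.1], "doc_b": [0.5, 0.85, 0.75]}
ranked = sorted(candidates, key=lambda d: predict_proba(w, b, candidates[d]), reverse=True)
print(ranked)
```

In this toy setup, doc_a has a high retrieval score but few candidate entities, so a purely relevance-based ranking would favour it, while the learned model ranks doc_b first. That mirrors, in miniature, the distinction between relevant and support documents that the abstract draws.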

Relation Discovery from Web Data for Competency Management

Author: Eisenstadt M.
Goncalves A
Motta E.
Pacheco R
Song D.
Uren V.
Zhu J.L.
Publication date: 01/12/2007

This paper describes a technique for automatically discovering associations between people and expertise from an analysis of very large data sources (including web pages, blogs and emails), using a family of algorithms that perform accurate named-entity recognition, assign different weights to terms according to an analysis of document structure, and assess distances between terms in a document. My contribution is to add a social networking approach called BuddyFinder, which relies on associations within a large enterprise-wide "buddy list" to help delimit the search space and also to provide a form of 'social triangulation', whereby the system can discover documents from your colleagues that contain pertinent information about you. This work has been influential in the information retrieval community generally, as it is the basis of a landmark system that achieved overall first place in every category in the Enterprise Search Track of TREC 2006.

Open Access Institutional Repository at Robert Gordon University

Open Research Online (The Open University)
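
The 'social triangulation' idea in "Relation Discovery from Web Data for Competency Management" can be illustrated with a small sketch: keep only documents written by people on a target person's buddy list that also mention that person. The buddy-list format, the document tuples, the names, and the plain case-insensitive name match are all assumptions made for illustration; a real system would match the name variants found by its named-entity recognition step rather than a raw substring.

```python
def colleague_documents(target, buddies, documents):
    """Return ids of documents by the target's colleagues that mention the target.

    buddies:   person -> set of names on that person's buddy list (hypothetical format)
    documents: list of (doc_id, author, text) tuples (hypothetical format)
    """
    # Treat the buddy relation as symmetric: either side listing the other counts.
    colleagues = set(buddies.get(target, set()))
    for person, friends in buddies.items():
        if target in friends:
            colleagues.add(person)
    colleagues.discard(target)

    needle = target.lower()  # naive name match; NER-based matching would be more robust
    return [doc_id for doc_id, author, text in documents
            if author in colleagues and needle in text.lower()]


buddies = {"Alice": {"Bob"}, "Carol": {"Alice"}, "Dave": set()}
documents = [
    ("d1", "Bob", "Alice presented her work on ontology alignment."),
    ("d2", "Dave", "Alice gave a talk on text mining."),  # Dave is not a colleague
    ("d3", "Carol", "Meeting notes: ask alice about NER."),
    ("d4", "Bob", "Unrelated budget memo."),  # does not mention Alice
]
print(colleague_documents("Alice", buddies, documents))
```

Restricting the author set first is what "delimits the search space": only documents from the target's social neighbourhood are scanned at all.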

DeltaPhish: Detecting Phishing Webpages in Compromised Websites

Author: AY Fu
B Biggio
C Cortes
C Ludl
DH Wolpert
G Chechik
G Xiang
J Hong
KT Chen
L Wenyin
M Khonji
MJ Swain
PF Felzenszwalb
RB Basnet
S Marchal
TC Chen
Publication date: 01/01/2017

The large-scale deployment of modern phishing attacks relies on the automatic exploitation of vulnerable websites in the wild, to maximize profit while hindering attack traceability, detection and blacklisting. To the best of our knowledge, this is the first work that specifically leverages this adversarial behavior for detection purposes. We show that phishing webpages can be accurately detected by highlighting HTML code and visual differences with respect to other (legitimate) pages hosted within a compromised website. Our system, named DeltaPhish, can be installed as part of a web application firewall to detect the presence of anomalous content on a website after compromise and, where needed, block access to it. DeltaPhish is also robust against adversarial attempts in which the HTML code of the phishing page is carefully manipulated to evade detection. We empirically evaluate it on more than 5,500 webpages collected in the wild from compromised websites, showing that it detects more than 99% of phishing webpages while misclassifying less than 1% of legitimate pages. We further show that the detection rate remains higher than 70% even under very sophisticated attacks carefully designed to evade our system.
Comment: Preprint version of the work accepted at ESORICS 201

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Università di Cagliari

Archivio istituzionale della ricerca - Università di Genova
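
The core intuition in DeltaPhish, that an injected phishing page looks unlike the other pages hosted on the same compromised site, can be sketched with a much simpler signal than the paper uses. The sketch below is a stand-in built on assumptions, not DeltaPhish's feature set or classifier: it reduces each page to a set of structural HTML tokens (tag names and tag@attribute-name pairs), compares it with every other page on the site using Jaccard similarity, and flags it when even the closest match falls below a hypothetical threshold of 0.5. The example pages are invented. DeltaPhish itself also compares visual appearance and learns its decision from data.

```python
from html.parser import HTMLParser


class _StructureTokens(HTMLParser):
    """Collects coarse structural tokens: tag names and tag@attribute-name pairs."""

    def __init__(self):
        super().__init__()
        self.tokens = set()

    def handle_starttag(self, tag, attrs):
        self.tokens.add(tag)
        for name, _value in attrs:
            self.tokens.add(f"{tag}@{name}")


def html_tokens(html):
    parser = _StructureTokens()
    parser.feed(html)
    parser.close()
    return parser.tokens


def jaccard(a, b):
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def looks_injected(page_html, other_pages_html, threshold=0.5):
    """Flag a page whose structure is far from every other page on the same site.

    threshold is a hypothetical value, not one taken from the DeltaPhish paper.
    """
    page = html_tokens(page_html)
    best = max(jaccard(page, html_tokens(other)) for other in other_pages_html)
    return best < threshold


home = ('<html><body><nav class="menu"><a href="/">Home</a></nav>'
        '<div class="content"><p>Welcome</p></div><footer id="f">(c)</footer></body></html>')
about = ('<html><body><nav class="menu"><a href="/">Home</a></nav>'
         '<div class="content"><p>About us</p><img src="team.jpg"></div>'
         '<footer id="f">(c)</footer></body></html>')
news = home.replace("Welcome", "Latest news")
phish = ('<html><body><form action="http://evil.example/login" method="post">'
         '<input type="text" name="user"><input type="password" name="pass">'
         '<button>Sign in</button></form></body></html>')

print(looks_injected(news, [home, about]))   # shares the site's layout
print(looks_injected(phish, [home, about]))  # structurally alien to the site
```

Comparing against the site's own pages, rather than against a global model of "legitimate" HTML, is what lets this kind of check exploit the fact that the attacker reused someone else's website.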

Identifying e-Commerce in Enterprises by means of Text Mining and Classification Algorithms

Author: Bianchi Gianpiero
Bruni Renato
Scalfati Francesco
Publication venue: 'Hindawi Limited'
Publication date: 01/01/2018

Monitoring specific features of enterprises, for example the adoption of e-commerce, is an important and basic task for several economic activities. This type of information is usually obtained through surveys, which are costly because of the number of personnel involved. Detecting this information automatically would allow considerable savings. This can be done computationally, since the information is generally publicly available online through corporate websites. This work describes how to convert the detection of e-commerce into a supervised classification problem, where each record is obtained from the automatic analysis of one corporate website, and the class is the presence or absence of e-commerce facilities. Generating such data records automatically requires several text-mining phases; in particular, we compare six strategies based on the selection of the best words and best n-grams. We then classify the resulting dataset with four classification algorithms: Support Vector Machines, Random Forest, Statistical and Logical Analysis of Data, and a Logistic Classifier. This turns out to be a difficult classification problem. However, after careful design and set-up of the whole procedure, the results on a practical case of Italian enterprises are encouraging.

Directory of Open Access Journals

Archivio della ricerca- Università di Roma La Sapienza
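
The record-building step in "Identifying e-Commerce in Enterprises by means of Text Mining and Classification Algorithms" can be sketched as follows. This is one possible "best n-grams" selection rule, not necessarily any of the six strategies compared in the paper: each bigram is scored by the absolute difference between the fraction of e-commerce sites and the fraction of non-e-commerce sites that contain it, the top k are kept, and each website becomes a binary record over those n-grams, ready for a classifier such as an SVM or a random forest. The website texts, labels and the value of k are invented for illustration.

```python
import re
from collections import Counter


def ngrams(text, n):
    """Set of word n-grams in a page's visible text (letters only, lower-cased)."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


def select_best_ngrams(texts, labels, n=2, k=3):
    """Keep the k n-grams whose document frequency differs most between classes.

    Score = |share of positive sites containing g - share of negative sites containing g|.
    Ties are broken alphabetically so the output is deterministic.
    """
    pos = [ngrams(t, n) for t, y in zip(texts, labels) if y == 1]
    neg = [ngrams(t, n) for t, y in zip(texts, labels) if y == 0]
    df_pos = Counter(g for grams in pos for g in grams)
    df_neg = Counter(g for grams in neg for g in grams)

    def score(g):
        return abs(df_pos[g] / len(pos) - df_neg[g] / len(neg))

    vocabulary = set(df_pos) | set(df_neg)
    return sorted(vocabulary, key=lambda g: (-score(g), g))[:k]


def to_record(text, selected, n=2):
    """Binary record: 1 if the site contains the selected n-gram, else 0."""
    present = ngrams(text, n)
    return [1 if g in present else 0 for g in selected]


# Invented website texts; label 1 = e-commerce facilities present.
sites = [
    "add to cart and proceed to checkout",
    "add to cart free shipping",
    "about us contact and where we are",
    "about us our history",
]
labels = [1, 1, 0, 0]

selected = select_best_ngrams(sites, labels)
print(selected)
print(to_record("Add to cart now", selected))
```

Negative indicators such as "about us" are kept as well, since a feature that is frequent only on non-e-commerce sites is just as informative for the classifier.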

Mining web data for competency management

Author: Goncalves A.L.
Motta E.
Pacheco R.
Uren V.S.
Zhu J.
Publication venue: IEEE Computer Society
Publication date: 01/01/2005

We present CORDER (COmmunity Relation Discovery by named Entity Recognition), an unsupervised machine learning algorithm that exploits named entity recognition and co-occurrence data to associate individuals in an organization with their expertise and associates. We discuss the problems associated with evaluating unsupervised learners and report our initial evaluation experiments.

CiteSeerX

Open Research Online (The Open University)
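
Below is a minimal sketch of co-occurrence-based association in the spirit of CORDER, assuming named entities have already been recognized and are given as (entity, token position) pairs per document. The distance-decayed weight 1 / (1 + d), where d is the token distance to the person's nearest mention, is a hypothetical choice for illustration and is not CORDER's published relation-strength formula; the names and positions are invented.

```python
from collections import defaultdict


def relation_strengths(documents, person):
    """Accumulate association weights between `person` and every co-occurring entity.

    documents: one list per document of (entity, token_position) pairs, i.e. the output
               of a named-entity recognition step that is assumed to have run already.
    Weight per co-occurring mention: 1 / (1 + d), with d the token distance to the
    person's nearest mention in that document (a hypothetical decay, not CORDER's).
    """
    scores = defaultdict(float)
    for mentions in documents:
        person_positions = [pos for entity, pos in mentions if entity == person]
        if not person_positions:
            continue  # only documents that mention the person contribute
        for entity, pos in mentions:
            if entity == person:
                continue
            nearest = min(abs(pos - p) for p in person_positions)
            scores[entity] += 1.0 / (1.0 + nearest)
    return dict(scores)


docs = [
    [("J. Smith", 0), ("ontology", 3), ("semantic web", 10)],
    [("J. Smith", 5), ("ontology", 6)],
    [("ontology", 0), ("K. Jones", 2)],  # no mention of J. Smith: ignored
]
strengths = relation_strengths(docs, "J. Smith")
print(sorted(strengths, key=strengths.get, reverse=True))
```

Summing over documents rewards entities that co-occur with the person often, while the distance decay rewards those mentioned close by; both signals are unsupervised, which is why evaluation needs separate care, as the abstract notes.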


Knowledge Cartography: Software tools and mapping techniques

Publication venue: 'Springer Fachmedien Wiesbaden GmbH'
Publication date: 15/09/2008

Knowledge Cartography is the discipline of mapping intellectual landscapes. The focus of this book is on the process by which manually crafting interactive, hypertextual maps clarifies one’s own understanding and helps communicate it. The authors see mapping software as a set of visual tools for reading and writing in a networked age. In an information ocean, the primary challenge is to find meaningful patterns around which we can weave plausible narratives. Maps of concepts, discussions and arguments make the connections between ideas tangible and disputable. With 17 chapters from leading researchers and practitioners, the book presents the current state of the art in the field. Part 1 focuses on educational applications in schools and universities, before Part 2 turns to applications in professional communities.

Open Research Online (The Open University)

Automatically assembling a full census of an academic field

Author: Clauset Aaron
Morgan Allison C.
Way Samuel F.
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2018

The composition of the scientific workforce shapes the direction of scientific research, directly through the selection of questions to investigate, and indirectly through its influence on the training of future scientists. In most fields, however, complete census information is difficult to obtain, complicating efforts to study workforce dynamics and the effects of policy. This is particularly true in computer science, which lacks a single, all-encompassing directory or professional organization. A full census of computer science would serve many purposes, not the least of which is a better understanding of the trends and causes of unequal representation in computing. Previous academic census efforts have relied on narrow or biased samples, or on professional society membership rolls. A full census can be constructed directly from online departmental faculty directories, but doing so by hand is prohibitively expensive and time-consuming. Here, we introduce a topical web crawler for automating the collection of faculty information from web-based department rosters, and demonstrate the resulting system on the 205 PhD-granting computer science departments in the U.S. and Canada. This method constructs a complete census of the field within a few minutes, and achieves over 99% precision and recall. We conclude by comparing the resulting 2017 census to a hand-curated 2011 census to quantify turnover and retention in computer science, in general and for female faculty in particular, demonstrating the types of analysis made possible by automated census construction.
Comment: 11 pages, 6 figures, 2 tables

arXiv.org e-Print Archive

CU Scholar Institutional Repository

Directory of Open Access Journals

FigShare
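
The topical web crawler in "Automatically assembling a full census of an academic field" can be illustrated with a best-first crawl: links whose anchor text looks on-topic are fetched first, so a faculty roster is reached after few page visits. In this sketch the keyword list TOPIC_TERMS, the in-memory link graph that stands in for real HTTP fetching and HTML parsing, the example URLs, and the roster predicate are all hypothetical; the paper's crawler and its page classifiers are not reproduced here.

```python
import heapq

# Keywords suggesting that a link leads toward a faculty roster (hypothetical list).
TOPIC_TERMS = ("faculty", "people", "directory", "professor")


def link_priority(anchor_text):
    """Number of topic keywords in a link's anchor text."""
    text = anchor_text.lower()
    return sum(term in text for term in TOPIC_TERMS)


def topical_crawl(start, links, is_roster, max_pages=50):
    """Best-first crawl: always expand the frontier URL whose anchor looked most on-topic.

    links:     url -> list of (anchor_text, target_url); stands in for fetching pages.
    is_roster: predicate deciding whether a visited page is a faculty roster.
    Returns (roster_url or None, number of pages visited).
    """
    frontier = [(0, 0, start)]  # (negated priority, insertion order, url)
    order = 1
    seen = {start}
    visited = 0
    while frontier and visited < max_pages:
        _, _, url = heapq.heappop(frontier)
        visited += 1
        if is_roster(url):
            return url, visited
        for anchor, target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                heapq.heappush(frontier, (-link_priority(anchor), order, target))
                order += 1
    return None, visited


links = {
    "/cs": [("News", "/cs/news"), ("Events", "/cs/events"), ("People", "/cs/people")],
    "/cs/news": [("Archive", "/cs/news/archive")],
    "/cs/people": [("Staff", "/cs/people/staff"),
                   ("Faculty directory", "/cs/people/faculty")],
}
print(topical_crawl("/cs", links, lambda url: url == "/cs/people/faculty"))
```

A breadth-first crawl of the same graph would visit the news and events pages before the roster; prioritising on-topic links is what keeps a crawl of hundreds of departments fast.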