91 research outputs found
Context-aware person identification in personal photo collections
Identifying the people in photos is an important need for users of photo management systems. We present MediAssist, one such system which facilitates browsing, searching and semi-automatic annotation of personal photos, using analysis of both image content and the context in which the photo is captured. This semi-automatic annotation includes annotation of the identity of people in photos. In this paper, we focus on such person annotation, and propose person identification techniques based on a combination of context and content. We propose language modelling and nearest neighbor approaches to context-based person identification, in addition to novel face color and image color content-based features (used alongside face recognition and body patch features). We conduct a comprehensive empirical study of these techniques using the real private photo collections of a number of users, and show that combining context- and content-based analysis improves performance over content or context alone
Coping with noise in a real-world weblog crawler and retrieval system
In this paper we examine the effects of noise when creating a real-world weblog corpus for information retrieval. We focus on the DiffPost (Lee et al. 2008) approach to noise removal from blog pages, examining the difficulties encountered when crawling the blogosphere during the creation of a real-world corpus of blog pages. We introduce and evaluate a number of enhancements to the original DiffPost approach in order to increase the robustness of the algorithm. We then extend DiffPost by looking at the anchor-text to text ratio, and discover that the time-interval between crawls is more important to the successful application of noise-removal algorithms within the blog context than any additional improvements to the removal algorithm itself
Efficient storage and decoding of SURF feature points
Practical use of SURF feature points in large-scale indexing and retrieval engines requires an efficient means for storing and decoding these features. This paper investigates several methods for compression and storage of SURF feature points, considering both storage consumption and disk-read efficiency. We compare each scheme with a baseline plain-text encoding scheme as used by many existing SURF implementations. Our final proposed scheme significantly reduces both the time required to load and decode feature points, and the space required to store them on disk
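As a rough illustration of the trade-off this abstract describes, the sketch below contrasts a plain-text encoding with a fixed-width binary one. The abstract does not specify the proposed scheme, so the record layout (x, y, scale, orientation, then a 64-value descriptor, all stored as 32-bit floats) and the function names are assumptions made purely for illustration.

```python
import struct

DESCRIPTOR_DIM = 64  # assumed descriptor length; not stated in the abstract

# Hypothetical fixed-width record: x, y, scale, orientation, then the descriptor,
# all little-endian float32.
RECORD = struct.Struct("<4f%df" % DESCRIPTOR_DIM)


def encode_text(points):
    """Baseline-style encoding: one whitespace-separated line per feature point."""
    lines = []
    for x, y, scale, ori, desc in points:
        lines.append(" ".join(repr(v) for v in (x, y, scale, ori, *desc)))
    return "\n".join(lines).encode("ascii")


def encode_binary(points):
    """Pack each feature point into one fixed-size binary record."""
    return b"".join(
        RECORD.pack(x, y, scale, ori, *desc) for x, y, scale, ori, desc in points
    )


def decode_binary(blob):
    """Unpack fixed-size records without any string parsing."""
    return [(r[0], r[1], r[2], r[3], list(r[4:])) for r in RECORD.iter_unpack(blob)]
```

With this layout every point occupies a constant `RECORD.size` bytes, so a reader can seek directly to the n-th point, whereas the text form must be parsed value by value. Note that float32 storage is lossy for values that are not exactly representable in 32 bits.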
Combination of content analysis and context features for digital photograph retrieval.
In recent years digital cameras have seen an enormous rise
in popularity, leading to a huge increase in the quantity of
digital photos being taken. This brings with it the challenge of organising these large collections. The MediAssist project uses date/time and GPS location for the
organisation of personal collections. However, this context
information is not always sufficient to support retrieval
when faced with a large, shared, archive made up of
photos from a number of users. We present work in this
paper which retrieves photos of known objects (buildings,
monuments) using both location information and content-based
retrieval tools from the AceToolbox. We show that
for this retrieval scenario, where a user is searching for
photos of a known building or monument in a large shared
collection, content-based techniques can offer a significant
improvement over ranking based on context (specifically
location) alone
An investigation of term weighting approaches for microblog retrieval
The use of effective term frequency weighting and document length normalisation strategies has been shown over a number of decades to have a significant positive effect for document retrieval. When dealing with much shorter documents, such as those obtained from microblogs, it would seem intuitive that these would have less benefit. In this paper we investigate their effect on microblog retrieval performance using the Tweets2011 collection from the TREC 2011 Microblog Track
Combining social network analysis and sentiment analysis to explore the potential for online radicalisation
The increased online presence of jihadists has raised the possibility of individuals being radicalised via the Internet. To date, the study of violent radicalisation has focused on dedicated jihadist websites and forums. This may not be the ideal starting point for such research, as participants in these venues may be described as "already made-up minds". Crawling a global social networking platform, such as YouTube, on the other hand, has the potential to unearth content and interaction aimed at radicalisation of those with little or no apparent prior interest in violent jihadism. This research explores whether such an approach is indeed fruitful. We collected a large dataset from a group within YouTube that we identified as potentially having a radicalising agenda. We analysed this data using social network analysis and sentiment analysis tools, examining the topics discussed and what the sentiment polarity (positive or negative) is towards these topics. In particular, we focus on gender differences in this group of users, suggesting more extreme and less tolerant views among female users
A generic news story segmentation system and its evaluation
The paper presents an approach to segmenting broadcast TV news programmes automatically into individual news stories. We first segment the programme into individual shots, and then a number of analysis tools are run on the programme to extract features to represent each shot. The results of these feature extraction tools are then combined using a support vector machine trained to detect anchorperson shots. A news broadcast can then be segmented into individual stories based on the location of the anchorperson shots within the programme. We use one generic system to segment programmes from two different broadcasters, illustrating the robustness of our feature extraction process to the production styles of different broadcasters
"I'm Eating a Sandwich in Glasgow": Modeling locations with tweets
Social media such as Twitter generate large quantities of data about what a person is thinking and doing in a particular location. We leverage this data to build models of locations to improve our understanding of a user's geographic context. Understanding the user's geographic context can in turn enable a variety of services that allow us to present information, recommend businesses and services, and place advertisements that are relevant at a hyper-local level.
In this paper we create language models of locations using coordinates extracted from geotagged Twitter data. We model locations at varying levels of granularity, from the zip code to the country level. We measure the accuracy of these models by the degree to which we can predict the location of an individual tweet, and further by the accuracy with which we can predict the location of a user. We find that we can meet the performance of the industry standard tool for predicting both the tweet and the user at the country, state and city levels, and far exceed its performance at the hyper-local level, achieving a three- to ten-fold increase in accuracy at the zip code level
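The idea of a per-location language model can be sketched as follows. The abstract does not state the tokenisation, smoothing or model order used, so the whitespace tokeniser, the unigram model with add-one smoothing, and the class name `LocationLanguageModel` are all assumptions made for illustration only.

```python
import math
from collections import Counter


class LocationLanguageModel:
    """Unigram language model per location, with add-one smoothing (assumed)."""

    def __init__(self):
        self.term_counts = {}  # location -> Counter of terms seen there
        self.vocab = set()     # all terms seen in any location

    def add(self, location, text):
        """Add one geotagged text (e.g. a tweet) to the model of its location."""
        tokens = text.lower().split()
        self.term_counts.setdefault(location, Counter()).update(tokens)
        self.vocab.update(tokens)

    def log_likelihood(self, location, text):
        """Log-probability of the text under the location's smoothed unigram model."""
        counts = self.term_counts.get(location, Counter())
        total = sum(counts.values())
        vocab_size = len(self.vocab)
        return sum(
            math.log((counts[t] + 1) / (total + vocab_size))
            for t in text.lower().split()
        )

    def predict(self, text):
        """Return the location whose model assigns the text the highest likelihood."""
        return max(self.term_counts, key=lambda loc: self.log_likelihood(loc, text))
```

The same structure applies at any granularity (zip code, city, state, country): only the location key changes. Predicting a user's location could, under the same assumptions, sum the log-likelihoods of all of that user's tweets before taking the maximum.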
A study of the imaging of contrast agents for use in computerised tomography
A computed tomography (CT) scanner is a device which
is capable of mapping the variation in linear attenuation
coefficient in a slice through an object. This is achieved
by the multiple measurement of the attenuation of an X-ray
beam at various positions and angles through the body. In
medical diagnostic imaging using CT, contrast agents are
administered to patients resulting in increased
attenuation of the beam in the areas where the contrast
agent resides. The increased contrast results in the
easier and more accurate visualisation of abnormalities.
In contrast-enhanced CT, iodine is almost universally
used as the contrast agent when imaging the heart and
associated arteries / veins. This is due to its low
toxicity and high enhancement. It has been used
extensively in traditional diagnostic radiology prior to
the introduction of CT. A study was performed to determine
whether iodine was the optimum element, in terms of the
minimum concentration needed for visualisation, to use in
contrast-enhanced CT scanning of the myocardium / heart
wall. The results of this study show that gadolinium, and
not iodine, is the optimum element to use as a CT contrast
agent. Gadolinium, chelated to DTPA, is presently used as
a contrast agent in MRI.
The above study concentrated only on the particular
case of imaging the myocardium. A theoretical study was
undertaken to determine the minimum concentration of any
element when scanned using two different imaging methods.
The situation studied was that of administering the
contrast agent / analyte to a cylinder, which is itself
contained inside another cylinder, the space between
filled with some matrix. By varying the size of the inner
cylinder, administration of a contrast agent to various
organs or arteries can be simulated. By varying the size
of the outer cylinder, various object / patient sizes can
be studied.
In the first imaging method, two scans are performed
at any energy, one with and one without the analyte
present. These scans are subtracted to yield an image of
the analyte alone. In the second method two scans are
performed; one on the high side and one on the low side of
the K absorption edge of the analyte. Again these are
subtracted to yield an image of the analyte since the
variation in the attenuation of the matrix across the
K-edge is minor compared to that of the analyte.
The equations were verified by both computer
simulations and experimental scans. Two important results
were obtained. As the relative size of the inner cylinder
decreases, firstly the optimum element shifts towards
higher atomic number transition elements and secondly, the
ratio of the minimum concentration of the optimum
elements to the minimum concentration of iodine needed
decreases making the case for using the transition
elements as contrast agents stronger when imaging low
relative size objects
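
The two subtraction methods described in this abstract can be written out as a brief sketch. The notation is an assumption and is not taken from the thesis: $\mu(E)$ is the reconstructed linear attenuation coefficient at energy $E$, $(\mu/\rho)_a$ is the mass attenuation coefficient of the analyte, $c$ is its concentration (mass per unit volume), $\mu_m$ is the attenuation of the matrix, and $E_+$, $E_-$ are energies just above and just below the analyte's K-edge. The analyte is assumed to add linearly to the matrix attenuation.

```latex
% Method 1: scans with and without the analyte at the same energy E
\Delta\mu_1 = \mu_{\text{with}}(E) - \mu_{\text{without}}(E)
            = c \left(\frac{\mu}{\rho}\right)_a(E)

% Method 2: scans on the high and low side of the analyte's K-edge
\Delta\mu_2 = \mu(E_+) - \mu(E_-)
            = c \left[ \left(\frac{\mu}{\rho}\right)_a(E_+)
                     - \left(\frac{\mu}{\rho}\right)_a(E_-) \right]
              + \left[ \mu_m(E_+) - \mu_m(E_-) \right]
            \approx c \left[ \left(\frac{\mu}{\rho}\right)_a(E_+)
                           - \left(\frac{\mu}{\rho}\right)_a(E_-) \right]
```

The approximation in the second method relies on the matrix term being small, which is the abstract's point that the matrix attenuation varies little across the K-edge compared with that of the analyte.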
- …