Adding Context to Automated Text Input Error Analysis with Reference to Understanding How Children Make Typing Errors
Despite the enormous body of literature studying the typing errors of adults, children's typing errors remain an understudied area. It is well known in the field of Child-Computer Interaction that children are not 'little adults'. Findings about how adults make typing mistakes therefore cannot simply be transferred to children without first understanding the differences between the two groups.
To understand how children differ from adults in the way they make typing mistakes, typing data were gathered from both children and adults. It was important that the data collected from the contrasting participant groups were comparable. Various methods of collecting typing data from adults were reviewed for suitability with children. Several issues were identified that could create a bias towards the adults. To resolve these issues, new tools and methods were designed, such as a new phrase set, a new data collector and new computer experience questionnaires.
Additionally, no existing method for analysing typing data was suitable for use with both children and adults. A new categorisation method was therefore defined, based on typing errors made by both groups, and implemented as a Java program, which dramatically reduced the time required to categorise typing errors.
Finally, in a large study, typing data collected from 231 primary school children, aged between 7 and 10 years, and 229 undergraduate computing students were analysed. Grouping the typing errors according to the context in which they occurred allowed for a much more detailed analysis than was possible with error rates alone. The analysis showed that children frequently made a set of errors that adults rarely made. These child-specific errors suggest that differences exist between the ways the two groups make typing errors, and that children's typing errors should therefore be studied in their own right.
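The abstract does not reproduce the categorisation method itself. As a purely illustrative sketch, the standard text-entry approach aligns the presented and transcribed phrases with Levenshtein-style dynamic programming and labels each edit operation; the function below is a hypothetical example of that general technique, not the thesis's method.

```python
def classify_errors(presented: str, transcribed: str) -> list[tuple[str, str, str]]:
    """Return (category, expected_char, typed_char) for each typing error."""
    m, n = len(presented), len(transcribed)
    # dp[i][j] = edit distance between presented[:i] and transcribed[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if presented[i - 1] == transcribed[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # omission
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # match / substitution
    # Trace back through the table, labelling each non-matching step.
    errors, i, j = [], m, n
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1]
                and presented[i - 1] == transcribed[j - 1]):
            i, j = i - 1, j - 1                       # correct keystroke
        elif i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + 1:
            errors.append(("substitution", presented[i - 1], transcribed[j - 1]))
            i, j = i - 1, j - 1
        elif j > 0 and dp[i][j] == dp[i][j - 1] + 1:
            errors.append(("insertion", "", transcribed[j - 1]))
            j -= 1
        else:
            errors.append(("omission", presented[i - 1], ""))
            i -= 1
    return list(reversed(errors))

print(classify_errors("the quick", "teh quik"))
```

Contextual grouping, as the study describes, would then bucket these labelled errors by surrounding characters rather than reporting a single error rate.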
A cortical model of object perception based on Bayesian networks and belief propagation.
Evidence suggests that high-level feedback plays an important role in visual perception by shaping the response in lower cortical levels (Sillito et al. 2006, Angelucci and Bullier 2003, Bullier 2001, Harrison et al. 2007). A notable example of this is reflected by the retinotopic activation of V1 and V2 neurons in response to illusory contours, such as Kanizsa figures, which has been reported in numerous studies (Maertens et al. 2008, Seghier and Vuilleumier 2006, Halgren et al. 2003, Lee 2003, Lee and Nguyen 2001). The illusory contour activity emerges first in lateral occipital cortex (LOC), then in V2 and finally in V1, strongly suggesting that the response is driven by feedback connections. Generative models and Bayesian belief propagation have been suggested to provide a theoretical framework that can account for feedback connectivity, explain psychophysical and physiological results, and map well onto the hierarchical distributed cortical connectivity (Friston and Kiebel 2009, Dayan et al. 1995, Knill and Richards 1996, Geisler and Kersten 2002, Yuille and Kersten 2006, Deneve 2008a, George and Hawkins 2009, Lee and Mumford 2003, Rao 2006, Litvak and Ullman 2009, Steimer et al. 2009).
The present study explores the role of feedback in object perception, taking as a starting point the HMAX model, a biologically inspired hierarchical model of object recognition (Riesenhuber and Poggio 1999, Serre et al. 2007b), and extending it to include feedback connectivity. A Bayesian network that captures the structure and properties of the HMAX model is developed, replacing the classical deterministic view with a probabilistic interpretation. The proposed model approximates the selectivity and invariance operations of the HMAX model using the belief propagation algorithm. Hence, the model not only achieves successful feedforward recognition invariant to position and size, but is also able to reproduce modulatory effects of higher-level feedback, such as illusory contour completion, attention and mental imagery. Overall, the model provides a biophysiologically plausible interpretation, based on state-of-the-art probabilistic approaches and supported by current experimental evidence, of the interaction between top-down global feedback and bottom-up local evidence in the context of hierarchical object perception.
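The abstract does not give the model's equations. The sketch below is a hypothetical two-node illustration of the Pearl-style sum-product message passing such models build on: a high-level cause sends a top-down prediction (pi message) to a low-level feature, whose local evidence is passed back up (lambda message). All distributions are toy assumptions.

```python
import numpy as np

# P(low | high): columns indexed by the high-level state.
P_low_given_high = np.array([[0.9, 0.2],    # P(low=0 | high=0), P(low=0 | high=1)
                             [0.1, 0.8]])   # P(low=1 | high=0), P(low=1 | high=1)
prior_high = np.array([0.5, 0.5])           # prior over the high-level cause
evidence_low = np.array([0.3, 0.7])         # soft local evidence on the feature

# Bottom-up (lambda) message: how the low-level evidence votes on the cause.
msg_up = P_low_given_high.T @ evidence_low

# Belief at the high level combines the prior with bottom-up evidence.
belief_high = prior_high * msg_up
belief_high /= belief_high.sum()

# Top-down (pi) message: the prediction sent back down. It excludes the
# child's own lambda message to avoid double-counting evidence.
msg_down = P_low_given_high @ prior_high

# Belief at the low level: feedback prediction times local evidence. This is
# how weak local input (e.g. an illusory contour) can be "filled in" from above.
belief_low = msg_down * evidence_low
belief_low /= belief_low.sum()

print("belief over high-level cause:", belief_high)
print("belief over low-level feature:", belief_low)
```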
Incremental Non-Greedy Clustering at Scale
Clustering is the task of organizing data into meaningful groups. Modern clustering applications such as entity resolution put several demands on clustering algorithms: (1) scalability to massive numbers of points as well as clusters, (2) incremental additions of data, and (3) support for any user-specified similarity function.
Hierarchical clusterings are often desired as they represent multiple alternative flat clusterings (e.g., at different granularity levels). These tree-structured clusterings provide both fine-grained clusters and a way to represent uncertainty in the presence of newly arriving data. Previous work on hierarchical clustering does not fully address all three of the aforementioned desiderata. Work on incremental hierarchical clustering often makes greedy, irrevocable clustering decisions that are regretted in the presence of future data. Work on scalable hierarchical clustering does not support incremental additions or deletions; these methods often impose requirements on the similarity functions used and/or empirically tend to over-merge clusters, which can lead to inaccurate clusterings.
In this thesis, we present incremental and scalable methods for hierarchical clustering that empirically satisfy the above desiderata. Our work aims to represent uncertainty and meaningful alternative clusterings, to efficiently reconsider past decisions in the incremental case, and to use parallelism to scale to massive datasets. Our method, Grinch, handles incrementally arriving data in a non-greedy fashion, reconsidering past decisions through tree-structure re-arrangements (e.g., rotations and grafts) invoked in accordance with the user's specified similarity function. To achieve scalability to massive datasets, our method, SCC, builds a hierarchical clustering in a level-wise, bottom-up manner: certain clustering decisions are made independently in parallel within each level, and a global similarity threshold schedule prevents greedy over-merging. We show how SCC can be combined with the tree-structure re-arrangements in Grinch to form a mini-batch algorithm that is both scalable and incremental. Lastly, we generalize our hierarchical clustering approaches to DAG-structured ones, which can better represent uncertainty in clustering by representing overlapping clusters, and we introduce an efficient bottom-up method for DAG-structured clustering, Llama. For each of the proposed methods, we provide both a theoretical and an empirical analysis. Empirically, our methods achieve state-of-the-art results on clustering benchmarks in both the batch and the incremental settings, including multiple-point improvements in dendrogram purity and scalability to billions of points.
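The rotation and graft operations themselves are not specified in the abstract. The hypothetical sketch below shows only the nearest-neighbour sibling-insertion step with which a Grinch-style incremental update begins; all re-arrangement logic is omitted, and the exhaustive leaf search stands in for a smarter one.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class Node:
    points: list                                   # points under this subtree
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = None

def nearest_leaf(root: Node, p, sim: Callable) -> Node:
    leaves, stack = [], [root]
    while stack:
        n = stack.pop()
        if n.children:
            stack.extend(n.children)
        else:
            leaves.append(n)
    return max(leaves, key=lambda leaf: sim(leaf.points[0], p))

def insert(root: Node, p, sim: Callable) -> Node:
    """Insert p as the sibling of its nearest leaf; return the (new) root."""
    leaf = nearest_leaf(root, p, sim)
    new_leaf = Node(points=[p])
    merged = Node(points=list(leaf.points) + [p],
                  children=[leaf, new_leaf], parent=leaf.parent)
    leaf.parent = new_leaf.parent = merged
    if merged.parent is None:
        return merged                              # the leaf was the root
    siblings = merged.parent.children
    siblings[siblings.index(leaf)] = merged
    ancestor = merged.parent
    while ancestor is not None:                    # update subtree point sets
        ancestor.points.append(p)
        ancestor = ancestor.parent
    return root

sim = lambda x, y: -abs(x - y)                     # toy similarity on numbers
root = Node(points=[1.0])
for p in [1.1, 5.0, 5.2]:
    root = insert(root, p, sim)
print(root.points)                                 # all four points at the root
```

A non-greedy method would follow each insertion with rotation and graft checks against the user's similarity function, which is exactly where this greedy sketch and Grinch diverge.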
Proceedings of the NASA Conference on Space Telerobotics, volume 4
Papers presented at the NASA Conference on Space Telerobotics are compiled. The theme of the conference was man-machine collaboration in space. The conference provided a forum for researchers and engineers to exchange ideas on the research and development required for the application of telerobotic technology to the space systems planned for the 1990s and beyond. Volume 4 contains papers related to the following subject areas: manipulator control; telemanipulation; flight experiments (systems and simulators); sensor-based planning; robot kinematics, dynamics, and control; robot task planning and assembly; and research activities at the NASA Langley Research Center.
Similarity search and data mining techniques for advanced database systems.
Modern automated methods for the measurement, collection, and analysis of data in industry and science are producing ever more data of drastically increasing structural complexity. On the one hand, this growing complexity is justified by the need for a richer and more precise description of real-world objects; on the other hand, it is driven by the rapid progress in measurement and analysis techniques that allow users a versatile exploration of objects. In order to manage the huge volume of such complex data, advanced database systems are employed. In contrast to conventional database systems that support exact-match queries, the users of these advanced database systems focus on applying similarity search and data mining techniques.
Based on an analysis of typical advanced database systems, such as biometric, biological, multimedia, moving-object, and CAD-object database systems, the following three challenging characteristics of complexity are identified: uncertainty (probabilistic feature vectors), multiple instances (a set of homogeneous feature vectors), and multiple representations (a set of heterogeneous feature vectors). The goal of this thesis is therefore to develop similarity search and data mining techniques that are capable of handling uncertain, multi-instance, and multi-represented objects.
The first part of this thesis deals with similarity search techniques. Object identification is a similarity search technique that is typically used for the recognition of objects from image, video, or audio data. We therefore develop a novel probabilistic model for object identification, based on which two novel types of identification queries are defined. In order to process the novel query types efficiently, we introduce an index structure called the Gauss-tree. In addition, we specify further probabilistic models and query types for uncertain multi-instance objects and uncertain spatial objects, and, based on the index structure, develop algorithms for the efficient processing of these query types. The practical benefits of using probabilistic feature vectors are demonstrated in a real-world application for video similarity search. Furthermore, a similarity search technique based on aggregated multi-instance objects is presented that is likewise suitable for video similarity search; it takes multiple representations into account in order to achieve better effectiveness.
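The Gauss-tree's internal structure is not described in the abstract. As a hypothetical sketch of the underlying idea of probabilistic feature vectors, each object below stores a per-dimension mean and standard deviation, and an identification query ranks objects by the likelihood of having generated the query vector; a linear scan stands in for the index.

```python
import math

def gaussian_density(x: float, mu: float, sigma: float) -> float:
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def identification_query(db, query_vec, k=1):
    """Rank database objects by the likelihood that they generated query_vec.

    db: list of (object_id, means, sigmas); each object is a probabilistic
    feature vector modelled as an axis-parallel Gaussian."""
    scored = []
    for obj_id, means, sigmas in db:
        likelihood = 1.0
        for x, mu, sigma in zip(query_vec, means, sigmas):
            likelihood *= gaussian_density(x, mu, sigma)
        scored.append((likelihood, obj_id))
    scored.sort(reverse=True)
    return scored[:k]

db = [("face_A", [0.2, 0.7], [0.05, 0.10]),
      ("face_B", [0.6, 0.4], [0.20, 0.15])]
print(identification_query(db, [0.25, 0.65]))
```

An index such as the Gauss-tree would prune most of these density evaluations rather than scoring every object.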
The second part of this thesis deals with two major data mining techniques: clustering and classification. Since privacy preservation is a very important demand of distributed advanced applications, we propose using uncertainty for data obfuscation in order to preserve privacy during clustering. Furthermore, a model-based and a density-based clustering method for multi-instance objects are developed. Afterwards, original extensions and enhancements of the density-based clustering algorithms DBSCAN and OPTICS for handling multi-represented objects are introduced. Since several advanced database systems, such as biological or multimedia database systems, handle predefined, very large class systems, two novel classification techniques for large class sets that benefit from using multiple representations are defined. The first classification method is based on the idea of a k-nearest-neighbor classifier: it employs a novel density-based technique to reduce training instances and exploits the entropy impurity of the local neighborhood in order to weight a given representation. The second technique addresses hierarchically organized class systems: it uses a novel hierarchical, supervised method for the reduction of large multi-instance objects, e.g. audio or video, and applies support vector machines for efficient hierarchical classification of multi-represented objects. The user benefits of this technique are demonstrated by a prototype that classifies large music collections.
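As a hypothetical illustration of the entropy-impurity weighting idea described for the first classification method (the thesis's exact scheme may differ): a representation whose local neighborhood is pure receives a high weight, a mixed one a low weight.

```python
import math
from collections import Counter

def entropy(labels) -> float:
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def representation_weight(neighbor_labels, num_classes: int) -> float:
    """Pure neighborhoods (low entropy) => trust this representation more."""
    max_entropy = math.log2(num_classes)
    return 1.0 - entropy(neighbor_labels) / max_entropy

# A representation whose 5-NN labels all agree gets full weight; a mixed one less.
print(representation_weight(["pop"] * 5, num_classes=4))                  # 1.0
print(representation_weight(["pop", "rock", "jazz", "pop", "rock"], 4))   # ~0.24
```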
The effectiveness and efficiency of all proposed techniques are discussed and verified by comparison with conventional approaches in versatile experimental evaluations on real-world datasets.
On Two Web IR Boosting Tools: Clustering and Ranking
This thesis investigates several research problems that arise in modern Web Information Retrieval (WebIR). The Holy Grail of modern WebIR is to find a way to organize and rank results so that the most 'relevant' come first. The first breakthrough technique was the exploitation of the link structure of the Web graph to rank the result pages, using the well-known HITS and PageRank algorithms. These link-analysis approaches have since been improved and extended, but they still seem insufficient to provide a satisfying search experience.
In a number of situations a flat list of search results is not enough, and the users might desire to have search results grouped on-the-fly in folders of similar topics. In addition, the folders should be annotated with meaningful labels for rapid identification of the desired group of results. In other situations, users may have different search goals even when they express them with the same query. In this case the search results should be personalized according to the users' on-line activities. In order to address this need, we will discuss the algorithmic ideas behind SnakeT, a hierarchical clustering meta-search engine which personalizes searches according to the clusters selected by users on-the-fly.
There are also situations where users desire access to fresh information. In these cases, traditional link analysis may not be suitable: there may simply not have been enough time for many links to accumulate to a recently published piece of information. In order to address this need, we will discuss the algorithmic and numerical ideas behind a new algorithm for ranking fresh information, such as news articles or blogs.
When link analysis does suffice to produce good-quality search results, the huge amount of Web information calls for fast ranking methodologies. We will discuss numerical methodologies for accelerating the eigenvector-like computation commonly used by link analysis.
An important result of this thesis is that we show how to address the above predominant issues of Web Information Retrieval by using clustering and ranking methodologies. We will demonstrate that clustering and ranking have a mutual reinforcement property which has not yet been studied intensively, and that this property can be exploited to boost the precision of both methodologies.
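PageRank is named above; the following is a minimal power-iteration sketch of the eigenvector-like computation in question. The thesis's acceleration techniques are not reproduced, and the graph is a toy assumption.

```python
import numpy as np

def pagerank(adj: np.ndarray, damping=0.85, tol=1e-10, max_iter=200):
    n = adj.shape[0]
    # Column-stochastic transition matrix; dangling pages jump uniformly.
    out_deg = adj.sum(axis=0)
    M = np.where(out_deg > 0, adj / np.maximum(out_deg, 1), 1.0 / n)
    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        r_next = damping * M @ r + (1 - damping) / n
        if np.abs(r_next - r).sum() < tol:   # L1 convergence check
            return r_next
        r = r_next
    return r

# adj[i, j] = 1 if page j links to page i.
adj = np.array([[0, 0, 1],
                [1, 0, 0],
                [1, 1, 0]], dtype=float)
print(pagerank(adj))
```

Acceleration work of the kind the thesis discusses typically targets exactly this iteration, e.g. by extrapolation or by exploiting the block structure of the Web graph.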
Is operational research in UK universities fit-for-purpose for the growing field of analytics?
Over the last decade, considerable interest has been generated in the use of analytical methods in organisations. Along with this, many have reported a significant gap between organisational demand for analytically trained staff and the number of potential recruits qualified for such roles. This interest is of high relevance to the operational research discipline, both in terms of raising the profile of the field and in the teaching and training of graduates to fill these roles. What is less clear, however, is the extent to which operational research teaching in universities, or indeed teaching on the various courses labelled as 'analytics', offers a curriculum that can prepare graduates for these roles.
It is within this space that this research is positioned, specifically seeking to analyse the suitability of current provision, limited to master's education in UK universities, and to make recommendations on how curricula may be developed. To do so, a mixed-methods research design, in the pragmatic tradition, is presented, comprising a variety of research instruments. Firstly, a computational literature review of analytics is presented, assessing (amongst other things) the amount of research into analytics from a range of disciplines. Secondly, a historical analysis is performed of the literature regarding elements that can be seen as precursors of analytics, such as management information systems, decision support systems and business intelligence. Thirdly, an analysis of job adverts is included, utilising an online topic model and correlation analyses. Fourthly, online materials from UK universities concerning relevant degrees are analysed using a bagged support vector classifier and a bespoke module analysis algorithm. Finally, interviews with both potential employers of graduates and academics involved in analytics courses are presented.
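As a hypothetical sketch of the fourth instrument, the following bags linear support vector classifiers over course-description text; the documents, labels and hyperparameters are illustrative assumptions, not the thesis's configuration.

```python
from sklearn.ensemble import BaggingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy module descriptions; real inputs would be scraped university materials.
docs = [
    "optimisation and simulation methods for decision making",
    "statistical learning and predictive analytics",
    "data mining and business analytics with large datasets",
    "forecasting and visual analytics for managers",
    "compilers, operating systems and computer architecture",
    "databases and web application programming",
    "software engineering project and agile development",
    "networks, security and distributed systems",
]
labels = ["analytics"] * 4 + ["other"] * 4

model = make_pipeline(
    TfidfVectorizer(),
    # Subsample without replacement so every tiny training draw contains both
    # classes; on a realistic corpus one would use bootstrap resampling.
    BaggingClassifier(LinearSVC(), n_estimators=10, max_samples=0.75,
                      bootstrap=False, random_state=0),
)
model.fit(docs, labels)
print(model.predict(["machine learning for business analytics"]))
```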
The results of these separate analyses are synthesised and contrasted. The outcome is an assessment of the current state of the market, some reflections on the role operational research may have, and a framework for the development of analytics curricula.
The principal contribution of this work is practical: tangible recommendations on curriculum design and development, as well as guidance to the operational research community in general on how it may react to the growth of analytics. Additional contributions are made with respect to methodology, through the novel mixed-methods approach employed, and to theory, with insights into how trends develop in both the jobs market and academia. It is hoped that the insights here may be of value to course designers seeking to react to similar trends in a wide range of disciplines and fields.
An investigation of computer based nominal data record linkage
The Internet now provides access to vast volumes of nominal data (data associated with names, e.g. birth/death records, parish records, text articles, multimedia) collected for a range of different purposes. This research focuses on parish registers containing baptism, marriage, and burial records. Mining these data resources involves linkage: investigating how two records are related with regard to attributes such as surname, spatio-temporal location, legal association and inter-relationships. Furthermore, as well as handling the implicit constraints of nominal data, such a system must also be able to handle automatically a range of temporal and spatial rules and constraints.
The research examines the linkage rules that apply and how such rules interact. This investigation reports on current practices in several disciplines (e.g. history, demography, genealogy, and epidemiology) and how these are implemented in current computer and database systems. The practical aspects of this study, and the workbench approach proposed, are centred on the extensive Lancashire & Cheshire Parish Register archive held on the MIMAS database computer located at Manchester University. The research also proposes how these findings can have wider applications.
This thesis describes some initial research into this problem. It describes three prototypes of a nominal data workbench that allow the specification and examination of several linkage types, and discusses the merits of alternative name matching methods, name grouping techniques and method comparisons. The conclusion is that, in the cases examined so far, effective nominal data linkage is essentially a query optimisation process. The process is made more efficient if linkage-specific indexes exist, and the work suggests that query re-organisation based on these indexes, though a complex process, is entirely feasible. To facilitate the use of indexes and to guide the optimisation process, the work suggests the use of formal ontologies.
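The abstract does not name the specific name-matching methods compared. The sketch below shows one standard combination such a workbench might examine: grouping names by a simplified Soundex code, then keeping close pairs within each group by string similarity. All details are illustrative assumptions.

```python
from collections import defaultdict
from difflib import SequenceMatcher

# Simplified Soundex code table (vowels, h, w and y carry no code).
CODES = {**dict.fromkeys("bfpv", "1"), **dict.fromkeys("cgjkqsxz", "2"),
         **dict.fromkeys("dt", "3"), "l": "4",
         **dict.fromkeys("mn", "5"), "r": "6"}

def soundex(name: str) -> str:
    name = name.lower()
    code, prev = name[0].upper(), CODES.get(name[0], "")
    for ch in name[1:]:
        digit = CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "hw":          # h/w do not break a run of equal codes
            prev = digit
    return (code + "000")[:4]

def candidate_pairs(names, threshold=0.8):
    """Group names by Soundex code, then keep close pairs within each group."""
    groups = defaultdict(list)
    for n in names:
        groups[soundex(n)].append(n)
    pairs = []
    for group in groups.values():
        for i, a in enumerate(group):
            for b in group[i + 1:]:
                if SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold:
                    pairs.append((a, b))
    return pairs

print(candidate_pairs(["Smith", "Smyth", "Smithe", "Taylor", "Tailor", "Turner"]))
```

In the query-optimisation framing the thesis arrives at, the Soundex grouping plays the role of a linkage-specific index: it bounds the set of record pairs the more expensive comparison ever has to touch.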