9 research outputs found
Automatic assessment of electronic exams
Electronic exams have become considerably more common, and many different examination systems have been created for them. The development of automatic assessment has also slowly been gaining popularity. Much of this development has focused on the assessment of essays. The same progress can presumably also be applied to other exam types.
This thesis examines different exam types and how they have been digitized, i.e., how the computer has replaced paper as the answering medium. Computers have long been used for writing essays, but only recently have they been adopted for taking exams on a larger scale. In digitizing mathematics exams, the greatest obstacle is entering mathematical formulas and symbols on a computer, but this is not an insurmountable problem.
Using computers only for answering exams leaves much of their potential untapped; computers can also assess the exams automatically. This considerably reduces teachers' workload and increases students' opportunities for independent learning. Even a simple program that eases the management and assessment of the answer material obtained from examination systems contributes to this goal. This thesis presents a program built for this task, named PunaKynä.
A Study of Retrieval Success with Original Works of Art Comparing the Subject Index Terms Provided by Experts in Art Museums With Those Provided by Novice and Intermediate Indexers
This paper compares the retrieval success of terms for searching online art museum collections from two different origins: terms that are natural byproducts of curatorial processes, and terms provided by volunteer gallery teachers and students. The terms used by scholars and gallery teachers obtained the best retrieval, with approximately 15% of terms successfully retrieving the desired work. Little successful application of the terms available in the Art and Architecture Thesaurus (AAT) or of the terms used by scholars was seen in the online museum collections. Overall, the terms supplied by study participants had poor retrieval success. Application of additional index terms describing the basic elements, materials, and colors featured in the works, together with terms from the AAT, could improve retrieval.
The Relative Effectiveness of Text and Images in Image Search Result Listings
This study was conducted to determine the best type of image surrogate to use within search result sets: Text, Image Preview, or Text + Image Preview. Users' performance and satisfaction with the three different image surrogates within search result sets were evaluated. Data were collected from 28 participants via a web-based system of questionnaires and logs of their interactions with result set presentations. Of the three image surrogate types, Image Preview and Text + Image Preview surrogates consistently outperformed Text surrogates on measures of the time required to make relevance judgments, the quality of those relevance judgments, perceived ease of use, and perceived usefulness. While relevance judgment scoring with Image Preview and Text + Image Preview surrogates was identical, answers to the post-session questionnaire indicated that users may prefer the Text + Image Preview surrogate, as it was "liked best overall" by more people.
Human-Centered Content-Based Image Retrieval
Retrieval of images that lack (suitable) annotations cannot be achieved through traditional Information Retrieval (IR) techniques. Access to such collections can instead be achieved by applying computer vision techniques to the IR problem, an approach baptized Content-Based Image Retrieval (CBIR). In contrast with most purely technological approaches, the thesis Human-Centered Content-Based Image Retrieval approaches the problem from a human/user-centered perspective. Psychophysical experiments were conducted in which people were asked to categorize colors. The data gathered from these experiments were fed to a Fast Exact Euclidean Distance (FEED) transform (Schouten & Van den Broek, 2004), which enabled the segmentation of color space based on human perception (Van den Broek et al., 2008). This unique color space segmentation was exploited for texture analysis and image segmentation, and subsequently for full-featured CBIR. In addition, a unique CBIR benchmark was developed (Van den Broek et al., 2004, 2005). This benchmark was used to explore what and how several parameters (e.g., color and distance measures) of the CBIR process influence retrieval results. In contrast with other research, users' judgements were used as the metric. The online IR and CBIR system Multimedia for Art Retrieval (M4ART) (URL: http://www.m4art.org) has been (partly) founded on the techniques discussed in this thesis.
References:
- Broek, E.L. van den, Kisters, P.M.F., and Vuurpijl, L.G. (2004). The utilization of human color categorization for content-based image retrieval. Proceedings of SPIE (Human Vision and Electronic Imaging), 5292, 351-362. [see also Chapter 7]
- Broek, E.L. van den, Kisters, P.M.F., and Vuurpijl, L.G. (2005). Content-Based Image Retrieval Benchmarking: Utilizing Color Categories and Color Distributions. Journal of Imaging Science and Technology, 49(3), 293-301. [see also Chapter 8]
- Broek, E.L. van den, Schouten, Th.E., and Kisters, P.M.F. (2008). Modeling Human Color Categorization. Pattern Recognition Letters, 29(8), 1136-1144. [see also Chapter 5]
- Schouten, Th.E. and Broek, E.L. van den (2004). Fast Exact Euclidean Distance (FEED) transformation. In J. Kittler, M. Petrou, and M. Nixon (Eds.), Proceedings of the 17th IEEE International Conference on Pattern Recognition (ICPR 2004), Vol. 3, pp. 594-597. August 23-26, Cambridge, United Kingdom. [see also Appendix C]
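The perception-based color categorization described in this abstract can be illustrated as nearest-prototype assignment in RGB space. This is only a minimal sketch: the prototype values below are rough, hypothetical picks, whereas the thesis derives its color space segmentation from psychophysical data via the FEED transform.

```python
# Sketch: assign an RGB pixel to a color category by nearest prototype.
# The prototype values are hypothetical choices for illustration only;
# the thesis instead segments color space from human categorization data.
PROTOTYPES = {
    "red": (255, 0, 0), "green": (0, 128, 0), "blue": (0, 0, 255),
    "yellow": (255, 255, 0), "black": (0, 0, 0), "white": (255, 255, 255),
}

def categorize(rgb: tuple[int, int, int]) -> str:
    def dist2(proto):  # squared Euclidean distance in RGB space
        return sum((a - b) ** 2 for a, b in zip(rgb, proto))
    return min(PROTOTYPES, key=lambda name: dist2(PROTOTYPES[name]))

print(categorize((250, 10, 10)))  # closest to the red prototype
```

A perception-based segmentation replaces the naive Euclidean nearest-prototype rule with category boundaries measured from human judgments, which is precisely what the psychophysical experiments above provide.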
Finding hidden semantics of text tables
Combining data from different sources for further automatic processing is often hindered by differences in the underlying semantics and representation. Therefore, when linking information presented in documents in tabular form with data held in databases, it is important to determine as much information as possible about the table and its content. Important information about the table data is often given in the text surrounding the table in that document. The table's creators cannot clarify all the semantics in the table itself, so they use the table's context, the text around it, to give further information. These semantics are very useful when integrating and using the data, but are often difficult to detect automatically. We propose a solution to part of this problem based on a domain ontology. The input to our system is a document that contains tabular data, and the system aims to find semantics in the document that are related to the tabular data. The output of our system is a set of detected semantics linked to the corresponding table. The system uses elements of semantic detection, semantic representation, and data integration. Semantic detection uses a domain ontology in which we store concepts of that domain. This allows us to analyse the content of the document (text) and detect context information about the tables it contains. Our approach consists of two components: (1) extract, from the domain ontology, concepts, synonyms, and relations that correspond to the table data; (2) build a tree for the paragraphs and use this tree to detect the hidden semantics by searching for words matching the extracted concepts. Semantic representation techniques then allow representation of the detected semantics of the table data. Our system represents the detected semantics as either 'semantic units' or 'enhanced metadata'.
Semantic units are a flexible set of meta-attributes that describe the meaning of a data item along with its detected semantics. In addition, each semantic unit has a concept label associated with it that specifies the relationship between the unit and the real-world aspects it describes. In the enhanced metadata, table metadata is enhanced with the semantics and representation context found in the text. Integrating data in our proposed system takes place in two steps. First, the semantic units are converted to a common context reflecting the application; this is achieved by using appropriate conversion functions. Secondly, semantically identical semantic units are identified and integrated into a common representation; the latter is the subject of future work. Thus the research has shown that semantics about a table reside in the surrounding text, and that it is possible to locate and use these semantics by transforming them into an appropriate form to enhance the basic table metadata.
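The concept-matching step described above can be sketched as follows. This is a minimal illustration only: the tiny ontology, its synonym lists, and the example paragraph are hypothetical stand-ins for the domain ontology and documents the system actually uses.

```python
# Minimal sketch: detect ontology concepts in the text surrounding a table.
# The "ontology" below is a hypothetical stand-in for a real domain ontology.
ontology = {
    "temperature": {"synonyms": {"temp", "temperature"}, "unit": "celsius"},
    "rainfall": {"synonyms": {"rain", "rainfall", "precipitation"}, "unit": "mm"},
}

def detect_semantics(paragraph: str) -> list[dict]:
    """Return a 'semantic unit' for each ontology concept found in the text."""
    words = {w.strip(".,;:()").lower() for w in paragraph.split()}
    units = []
    for concept, info in ontology.items():
        if words & info["synonyms"]:  # any synonym appears in the paragraph
            units.append({"concept": concept, "unit": info["unit"]})
    return units

para = "Table 2 reports monthly rainfall in mm and the mean temp for each site."
print(detect_semantics(para))
```

In the real system, the matched concepts would be linked back to the corresponding table as semantic units or folded into its enhanced metadata, rather than merely returned as a list.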
Perceived features and similarity of images: An investigation into their relationships and a test of Tversky's contrast model.
The creation, storage, manipulation, and transmission of images have become less costly and more efficient. Consequently, the numbers of images and their users are growing rapidly. This poses challenges to those who organize and provide access to them. One of these challenges is similarity matching. Most current content-based image retrieval (CBIR) systems, which can extract only low-level visual features such as color, shape, and texture, use similarity measures based on geometric models of similarity. However, most human similarity judgment data violate the metric axioms of these models. Tversky's (1977) contrast model, which defines similarity as a feature contrast task and equates the degree of similarity of two stimuli to a linear combination of their common and distinctive features, explains human similarity judgments much better than the geometric models. This study tested the contrast model as a conceptual framework to investigate the nature of the relationships between features and similarity of images as perceived by human judges. Data were collected from 150 participants who performed two tasks: an image description task and a similarity judgment task. Qualitative (content analysis) and quantitative (correlational) methods were used to seek answers to four research questions related to the relationships between common and distinctive features and similarity judgments of images, as well as measures of their common and distinctive features. Structural equation modeling, correlation analysis, and regression analysis confirmed the relationships between perceived features and similarity of objects hypothesized by Tversky (1977).
Tversky's (1977) contrast model, based upon a combination of two methods for measuring common and distinctive features and two methods for measuring similarity, produced statistically significant structural coefficients between the independent latent variables (common and distinctive features) and the dependent latent variable (similarity). This model fit the data well for a sample of 30 (435 pairs of) images and 150 participants (χ2 =16.97, df=10, p = .07508, RMSEA= .040, SRMR= .0205, GFI= .990, AGFI= .965). The goodness of fit indices showed the model did not significantly deviate from the actual sample data. This study is the first to test the contrast model in the context of information representation and retrieval. It is hoped that the results of the study will provide the foundations for future research that will further test the contrast model and assist designers of image organization and retrieval systems by pointing toward alternative document representations and similarity measures that more closely match human similarity judgments.
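The contrast model at the center of this study can be sketched in a few lines. The feature sets and weights below are hypothetical illustrations; the model defines similarity as s(a, b) = θ·f(A ∩ B) − α·f(A − B) − β·f(B − A), a weighted combination of common and distinctive features, taking f here as simple set cardinality.

```python
# Sketch of Tversky's (1977) contrast model:
#   s(a, b) = theta*f(A ∩ B) - alpha*f(A - B) - beta*f(B - A)
# f is set cardinality here; the weights and feature sets are
# hypothetical choices for illustration.
def tversky_similarity(A: set, B: set, theta=1.0, alpha=0.5, beta=0.5) -> float:
    common = len(A & B)      # features shared by both images
    only_a = len(A - B)      # features only in the first image
    only_b = len(B - A)      # features only in the second image
    return theta * common - alpha * only_a - beta * only_b

img1 = {"blue", "sky", "tree", "water"}
img2 = {"blue", "sky", "building"}
print(tversky_similarity(img1, img2))  # 2 common vs. 2 + 1 distinctive features
```

Note that with α ≠ β the measure becomes asymmetric, which is one way the contrast model accommodates human judgment data that violate the metric axioms of geometric models.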
Information Retrieval Beyond the Text Document
published or submitted for publication
Combinatoric Models of Information Retrieval Ranking Methods and Performance Measures for Weakly-Ordered Document Collections
This dissertation answers three research questions: (1) What are the characteristics of a combinatoric measure, based on the Average Search Length (ASL), that performs the same as a probabilistic version of the ASL? (2) Does the combinatoric ASL measure produce the same performance result as the one that is obtained by ranking a collection of documents and calculating the ASL by empirical means? (3) When do the ASL and either the Expected Search Length, MZ-based E, or Mean Reciprocal Rank measure both imply that one document ranking is better than another document ranking? Concepts and techniques from enumerative combinatorics and other branches of mathematics were used in this research to develop combinatoric models and equations for several information retrieval ranking methods and performance measures. Empirical, statistical, and simulation means were used to validate these models and equations. The document cut-off performance measure equation variants that were developed in this dissertation can be used for performance prediction and to help study any vector V of ranked documents, at arbitrary document cut-off points, provided that (1) relevance is binary and (2) the following information can be determined from the ranked output: the document equivalence classes and their relative sequence, the number of documents in each equivalence class, and the number of relevant documents that each class contains. The performance measure equations yielded correct values for both strongly- and weakly-ordered document collections.
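The empirical ASL mentioned in question (2) can be computed directly from a ranked list. This is a minimal sketch under the assumption of binary relevance, taking the ASL as the mean 1-based rank position of the relevant documents; the ranking below is made-up data.

```python
def average_search_length(ranking: list) -> float:
    """Empirical ASL: mean 1-based rank of the relevant documents.

    `ranking` is a list of binary relevance judgments ordered as the
    retrieval system ranked the documents (True = relevant).
    """
    relevant_ranks = [i for i, rel in enumerate(ranking, start=1) if rel]
    return sum(relevant_ranks) / len(relevant_ranks)

# Hypothetical ranked output: relevant documents at ranks 1, 3, and 4.
print(average_search_length([True, False, True, True, False]))  # (1 + 3 + 4) / 3
```

The combinatoric version developed in the dissertation predicts this quantity analytically from the equivalence-class structure of a weakly-ordered ranking, rather than averaging over an observed output as this empirical sketch does.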