CLiFF Notes: Research In Natural Language Processing at the University of Pennsylvania
The Computational Linguistics Feedback Forum (CLiFF) is a group of students and faculty who gather once a week to discuss the members' current research. As the word feedback suggests, the group's purpose is the sharing of ideas. The group also promotes interdisciplinary contacts between researchers who share an interest in Cognitive Science.
There is no single theme describing the research in Natural Language Processing at Penn. There is work done in CCG, Tree adjoining grammars, intonation, statistical methods, plan inference, instruction understanding, incremental interpretation, language acquisition, syntactic parsing, causal reasoning, free word order languages, ... and many other areas. With this in mind, rather than trying to summarize the varied work currently underway here at Penn, we suggest reading the following abstracts to see how the students and faculty themselves describe their work. Their abstracts illustrate the diversity of interests among the researchers, explain the areas of common interest, and describe some very interesting work in Cognitive Science.
This report is a collection of abstracts from both faculty and graduate students in Computer Science, Psychology and Linguistics. We pride ourselves on the close working relations between these groups, as we believe that the communication among the different departments and the ongoing inter-departmental research not only improves the quality of our work, but makes much of that work possible.
MC2: MPEG-7 content modelling communities
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University.
The use of multimedia content on the web has grown significantly in recent years. Websites such as Facebook, YouTube and Flickr cater for enormous amounts of multimedia content uploaded by users. This vast amount of multimedia content requires comprehensive content modelling; otherwise, retrieving relevant content will be challenging. Modelling multimedia content can be an extremely time-consuming task that may seem impossible, particularly when undertaken by individual users. However, the advent of Web 2.0 and associated communities, such as YouTube and Flickr, has shown that users appear to be more willing to collaborate in order to take on enormous tasks such as multimedia content modelling. Harnessing the power of communities to achieve comprehensive content modelling is the primary focus of this research.
The aim of this thesis is to explore collaborative multimedia content modelling and in particular the effectiveness of existing multimedia content modelling tools, taking into account the key development challenges of existing collaborative content modelling research and the associated modelling tools. Four research objectives are pursued in order to achieve this: first, design a user experiment to study users’ tagging behaviour with existing multimedia tagging tools and identify any relationships between such user behaviour; second, design and develop a framework for MPEG-7 content modelling communities based on the results of the experiment; third, implement an online service as a proof of concept of the framework; fourth, validate the framework through the online service during a repeat of the initial user experiment.
This research contributes, first, a conceptual model of user behaviour visualised as a fuzzy cognitive map and, second, an MPEG-7 framework for multimedia content modelling communities (MC2) and its proof of concept as an online service. The fuzzy cognitive model embodies relationships between user tagging behaviour and context and provides an understanding of user priorities in the description of content features and the relationships that exist between them. The MC2 framework, developed based on the fuzzy cognitive model, is deep-rooted in user content modelling behaviour and content preferences. A proof of concept of the MC2 framework is implemented as an online service in which all metadata is modelled using MPEG-7. The online service is validated, first, empirically with the same group of users and through the same experiment that led to the development of the fuzzy cognitive model and, second, functionally against the folksonomy and MPEG-7 content modelling tools used in the initial experiment. The validation demonstrates that MC2 has the advantages without the shortcomings of existing multimedia tagging tools by harnessing the ease of use of folksonomy tools while producing comprehensive structured metadata.
Supported by the UK Engineering and Physical Sciences Research Council (EPSRC).
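As a rough illustration of how a fuzzy cognitive map can be evaluated, the sketch below iterates one common update rule: each concept's new activation is a sigmoid of the weighted sum of the activations that influence it. The concept names and weights are hypothetical placeholders and are not taken from the thesis, and the abstract does not say which update rule, if any, was used.

```python
import math


def sigmoid(x):
    """Squash any real value into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def fcm_step(state, weights):
    """One synchronous update of a fuzzy cognitive map.

    weights[i][j] is the causal influence of concept i on concept j,
    conventionally a value in [-1, 1].
    """
    n = len(state)
    return [
        sigmoid(sum(weights[i][j] * state[i] for i in range(n)))
        for j in range(n)
    ]


def fcm_run(state, weights, max_steps=100, tol=1e-6):
    """Iterate until activations change by less than tol, or max_steps is hit."""
    for _ in range(max_steps):
        new_state = fcm_step(state, weights)
        if max(abs(a - b) for a, b in zip(new_state, state)) < tol:
            return new_state
        state = new_state
    return state


# Hypothetical concepts and weights, for illustration only.
concepts = ["context_familiarity", "tagging_effort", "tag_detail"]
weights = [
    [0.0, -0.4, 0.6],  # familiarity lowers effort and raises detail
    [0.0, 0.0, -0.5],  # effort lowers detail
    [0.0, 0.0, 0.0],   # detail influences nothing in this toy map
]
final_state = fcm_run([0.8, 0.5, 0.5], weights)
```

Other variants add each concept's previous activation to the sum or use a different squashing function; the choice affects whether and where the map settles.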
Latent Semantic Indexing (LSI) Based Distributed System and Search On Encrypted Data
Latent semantic indexing (LSI) was initially introduced to overcome the issues of synonymy and polysemy in the traditional vector space model (VSM). LSI, however, has challenges of its own, mainly scalability. Although LSI was introduced in 1990, few attempts have been made to provide an efficient solution for it; most of the literature focuses on LSI’s applications rather than on improving the original algorithm. In this work we analyze the first framework to provide a scalable implementation of LSI and report its performance on the distributed environment of RAAD.
The possibility of adopting LSI for search over encrypted data is also investigated. The importance of that field stems from the need for cloud computing as an effective computing paradigm that provides affordable access to high computational power. Encryption is usually applied to prevent unauthorized access to the data (the host is assumed to be curious); however, this limits accessibility to the data, given that search over encrypted data has yet to catch up with the latest techniques adopted by the Information Retrieval (IR) community. In this work we propose a system that uses LSI for indexing and free-text queries for retrieval.
The results show that the available LSI framework does scale to large datasets; however, it has some limitations with respect to factors such as dictionary size and memory limit. When the exact settings of the baseline were replicated on RAAD, it performed relatively slower. This could result from RAAD's use of a distributed file system or from network latency. The results also show that the proposed system for applying LSI to encrypted data retrieved documents in the same order as the baseline (unencrypted data).
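For readers unfamiliar with LSI, the standard textbook formulation is summarized below. The symbols are chosen here for illustration: A is an m-by-n term-document matrix, k is the number of retained dimensions, q is a query vector over the m terms, and d_j is the j-th column of A. The abstract does not state which variant the analyzed framework implements, so this should be read as the generic technique only.

```latex
% Rank-k truncated singular value decomposition of the term-document matrix
A \approx A_k = U_k \Sigma_k V_k^{\top},
\qquad U_k \in \mathbb{R}^{m \times k},\;
\Sigma_k \in \mathbb{R}^{k \times k},\;
V_k \in \mathbb{R}^{n \times k}

% Folding a document column d_j or a query q into the k-dimensional latent space
\hat{d}_j = \Sigma_k^{-1} U_k^{\top} d_j,
\qquad
\hat{q} = \Sigma_k^{-1} U_k^{\top} q

% Ranking documents by cosine similarity in the latent space
\operatorname{sim}(q, d_j) =
\frac{\hat{q}^{\top} \hat{d}_j}{\lVert \hat{q} \rVert \, \lVert \hat{d}_j \rVert}
```

With this definition, \hat{d}_j is the j-th row of V_k written as a column vector. The scalability challenge mentioned above arises mainly from computing the truncated SVD of a large, sparse A.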
A hybrid approach for item collection recommendations : an application to automatic playlist continuation
Current recommender systems aim mainly to generate accurate item recommendations, without properly evaluating the multiple dimensions of the recommendation problem. However, in many domains, such as music, where items are rarely consumed in isolation, users instead need a set of items designed to work well together, while having some cognitive properties as a whole, related to their perception of quality and satisfaction.
In this thesis, a hybrid case-based recommendation approach for item collections is proposed. In particular, an application to automatic playlist continuation, addressing similar cognitive concepts, rather than similar users, is presented. Playlists, which are sets of music items designed to be consumed as a sequence, with a specific purpose and within a specific context, are treated as cases. The proposed recommender system is based on a meta-level hybridization. First, Latent Dirichlet Allocation is applied to the set of past playlists, described as distributions over music styles, to identify their underlying concepts. Then, for a started playlist, its semantic characteristics, such as its latent concept and the styles of the included items, are inferred, and Case-Based Reasoning is applied to the set of past playlists addressing the same concept, to construct and recommend a relevant playlist continuation. A graph-based item model is used to overcome the semantic gap between songs’ signal-based descriptions and users’ high-level preferences, and to efficiently capture the playlists’ structures and the similarity of the music items in them. As the proposed method bases its reasoning on previous playlists, it does not require the construction of complex user profiles to generate accurate recommendations. Furthermore, apart from relevance, support for parameters beyond accuracy, such as increased coherence or support for diverse items, is provided to deliver a more complete user experience.
Experiments on real music datasets have revealed improved results, compared to other state of the art techniques, while achieving a “good trade-off” between recommendations’ relevance, diversity and coherence. Finally, although actually focusing on playlist continuations, the designed approach could be easily adapted to serve other recommendation domains with similar characteristics.
Postprint (published version)
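The case-retrieval idea in this abstract can be sketched in a heavily simplified form. The sketch assumes the latent concept of each playlist is already known (the Latent Dirichlet Allocation step is omitted) and swaps in plain Jaccard overlap for the thesis's graph-based item model. The function name, data layout and concept labels are hypothetical.

```python
from collections import Counter


def continue_playlist(seed_tracks, seed_concept, past_playlists, n=3):
    """Suggest up to n tracks to continue a started playlist.

    past_playlists is a list of (concept, tracks) pairs. Only cases with the
    same concept as the seed are used. Each case is weighted by its Jaccard
    overlap with the seed, and every unseen track collects the weights of the
    cases that contain it.
    """
    seed = set(seed_tracks)
    scores = Counter()
    for concept, tracks in past_playlists:
        if concept != seed_concept:
            continue  # reason only over playlists addressing the same concept
        case = set(tracks)
        union = seed | case
        overlap = len(seed & case) / len(union) if union else 0.0
        for track in case - seed:
            scores[track] += overlap
    candidates = [(t, s) for t, s in scores.items() if s > 0]
    # Highest score first; ties broken alphabetically for determinism.
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return [track for track, _ in candidates[:n]]


# Hypothetical past cases and a started playlist.
past = [
    ("workout", ["a", "b", "c"]),
    ("workout", ["a", "d"]),
    ("chill", ["a", "e"]),
]
suggestions = continue_playlist(["a", "b"], "workout", past)
```

Because the reasoning runs over past playlists rather than over users, no user profile is needed, which matches the motivation stated in the abstract.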
Visual Text Analysis in Digital Humanities
In 2005, Franco Moretti introduced Distant Reading to analyse entire literary text collections. This was a rather revolutionary idea compared to the traditional Close Reading, which focuses on the thorough interpretation of an individual work. Both reading techniques are the prior means of Visual Text Analysis. We present an overview of the research conducted since 2005 on supporting text analysis tasks with close and distant reading visualizations in the digital humanities. To this end, we classify the surveyed papers according to a taxonomy of text analysis tasks, categorize the close and distant reading techniques applied to support the investigation of these tasks, and illustrate approaches that combine both reading techniques in order to provide a multi-faceted view of the textual data. In addition, we take a look at the text sources used and at the typical data transformation steps required for the proposed visualizations. Finally, we summarize collaboration experiences when developing visualizations for close and distant reading, and we give an outlook on future challenges in that research area.
Semantic discovery and reuse of business process patterns
Patterns currently play an important role in modern information systems (IS) development, but their use has mainly been restricted to the design and implementation phases of the development lifecycle. Given the increasing significance of business modelling in IS development, patterns have the potential to provide a viable solution for promoting reusability of recurrent generalized models in the very early stages of development. As a statement of research-in-progress, this paper focuses on business process patterns and proposes an initial methodological framework for the discovery and reuse of business process patterns within the IS development lifecycle. The framework borrows ideas from the domain engineering literature and proposes the use of semantics to drive both the discovery of patterns and their reuse.
Feedback 2.0: An Investigation into Using Sharable Feedback Tags as Programming Feedback
Objectives: Learning and teaching computer programming is a recognised challenge in Higher Education. Since feedback is regarded as being the most important part of the learning process, it is expected that improving it could support students' learning. This thesis aims to investigate how new forms of feedback can improve student learning of programming and how feedback sharing can further enhance the students' learning experience.
Methods: This thesis investigates the use of new forms of feedback for programming courses. The work explores the use of collaborative tagging often found in Web 2.0 software systems and a feedback approach that requires examiners to annotate students' source code with short, potentially reusable feedback. The thesis utilises a variety of research methods including questionnaires, focus groups and collection of system usage data recorded from student interactions with their feedback. Sentiment and thematic analysis are used to investigate how well feedback tags communicate the intended message from examiners to students. The approaches used are tested and refined over two preliminary investigations before use in the final investigation.
Results: The work identified that a majority of students responded positively to the new feedback approach described. Student engagement was high with up to 100% viewing their feedback and at least 42% of students opting to share their feedback. Students in the cohort who achieved either the lower or higher marks for the assignment appeared more likely to share their feedback.
Conclusions: This thesis has demonstrated that sharing of feedback can be useful for disseminating good practice and common pitfalls. Provision of feedback which is contextually rich and textually concise has resulted in higher engagement from students. However, the outcomes of this research have been shown to be influenced by the assessment process adopted by the University. For example, students were more likely to engage with their feedback if marks were unavailable at the time of feedback release. This issue and many others are proposed as further work.
Usefulness of social tagging in organizing and providing access to the web: An analysis of indexing consistency and quality
This dissertation research points out major challenging problems with current Knowledge Organization (KO) systems, such as subject gateways or web directories: (1) the current systems use traditional knowledge organization systems based on controlled vocabulary which is not very well suited to web resources, and (2) information is organized by professionals not by users, which means it does not reflect intuitively and instantaneously expressed users’ current needs. In order to explore users’ needs, I examined social tags which are user-generated uncontrolled vocabulary. As investment in professionally-developed subject gateways and web directories diminishes (support for both BUBL and Intute, examined in this study, is being discontinued), understanding characteristics of social tagging becomes even more critical.
Several researchers have discussed social tagging behavior and its usefulness for classification or retrieval; however, further research is needed to qualitatively and quantitatively investigate social tagging in order to verify its quality and benefit. This research particularly examined the indexing consistency of social tagging in comparison to professional indexing to examine the quality and efficacy of tagging. The data analysis was divided into three phases: analysis of indexing consistency, analysis of tagging effectiveness, and analysis of tag attributes. Most indexing consistency studies have been conducted with a small number of professional indexers, and they tended to exclude users. Furthermore, the studies have mainly focused on physical library collections. This dissertation research bridged these gaps by (1) extending the scope of resources to various web documents indexed by users and (2) employing the Information Retrieval (IR) Vector Space Model (VSM)-based indexing consistency method, since it is suitable for dealing with a large number of indexers. As a second phase, an analysis of tagging effectiveness with tagging exhaustivity and tag specificity was conducted to ameliorate the drawbacks of consistency analysis based on only the quantitative measures of vocabulary matching. Finally, to investigate tagging patterns and behaviors, a content analysis on tag attributes was conducted based on the FRBR model.
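One simple way to picture a VSM-based consistency measure is to treat each indexer's or group's terms for a document as a term-frequency vector and compare two such vectors by cosine similarity. The sketch below does exactly that; the term lists are made up for illustration, and the dissertation's actual weighting and aggregation scheme is not described in the abstract.

```python
import math
from collections import Counter


def cosine_consistency(terms_a, terms_b):
    """Cosine similarity between two bags of index terms (0.0 to 1.0)."""
    vec_a, vec_b = Counter(terms_a), Counter(terms_b)
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty term list shares nothing with anything
    return dot / (norm_a * norm_b)


# Hypothetical terms assigned to the same web document.
tagger_terms = ["python", "programming", "tutorial", "python"]
professional_terms = ["python", "programming", "languages"]
score = cosine_consistency(tagger_terms, professional_terms)
```

Because the vectors aggregate term counts, the same function works whether a vector represents one indexer or thousands of taggers, which is why a vector-based measure suits large groups better than pairwise term matching.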
The findings revealed that there was greater consistency over all subjects among taggers compared to that for two groups of professionals. The analysis of tagging exhaustivity and tag specificity in relation to tagging effectiveness was conducted to ameliorate difficulties associated with limitations in the analysis of indexing consistency based on only the quantitative measures of vocabulary matching. Examination of exhaustivity and specificity of social tags provided insights into particular characteristics of tagging behavior and its variation across subjects. To further investigate the quality of tags, a Latent Semantic Analysis (LSA) was conducted to determine to what extent tags are conceptually related to professionals’ keywords and it was found that tags of higher specificity tended to have a higher semantic relatedness to professionals’ keywords. This leads to the conclusion that the term’s power as a differentiator is related to its semantic relatedness to documents. The findings on tag attributes identified the important bibliographic attributes of tags beyond describing subjects or topics of a document. The findings also showed that tags have essential attributes matching those defined in FRBR. Furthermore, in terms of specific subject areas, the findings originally identified that taggers exhibited different tagging behaviors representing distinctive features and tendencies on web documents characterizing digital heterogeneous media resources. These results have led to the conclusion that there should be an increased awareness of diverse user needs by subject in order to improve metadata in practical applications.
This dissertation research is the first necessary step to utilize social tagging in digital information organization by verifying the quality and efficacy of social tagging. This dissertation research combined both quantitative (statistics) and qualitative (content analysis using FRBR) approaches to vocabulary analysis of tags, which provided a more complete examination of the quality of tags. Through the detailed analysis of tag properties undertaken in this dissertation, we have a clearer understanding of the extent to which social tagging can be used to replace (and in some cases to improve upon) professional indexing.