
    Doctor of Philosophy

    Get PDF
    In the past few years, we have seen a tremendous increase in digital data being generated. By 2011, storage vendors had shipped 905 PB of purpose-built backup appliances. By 2013, the number of objects stored in Amazon S3 had reached 2 trillion. Facebook had stored 20 PB of photos by 2010. All of these require an efficient storage solution. To improve space efficiency, compression and deduplication are widely used. Compression works by identifying repeated strings and replacing them with more compact encodings, while deduplication partitions data into fixed-size or variable-size chunks and removes duplicate blocks. While these two approaches have greatly improved space efficiency, they still have limitations. First, traditional compressors are limited in their ability to detect redundancy across a large range, since they search for redundant data at a fine-grained level (string level). For deduplication, metadata embedded in an input file changes more frequently than the data itself, which introduces unnecessary unique chunks and leads to poor deduplication. In addition, cloud storage systems suffer from unpredictable and inefficient performance because of interference among different types of workloads. This dissertation proposes techniques to improve the effectiveness of traditional compressors and deduplication in improving space efficiency, and a new IO scheduling algorithm to improve performance predictability and efficiency for cloud storage systems. The common idea is to utilize similarity. To improve the effectiveness of compression and deduplication, similarity in content is used to transform an input file into a compression- or deduplication-friendly format. We propose Migratory Compression, a generic data transformation that identifies similar data at a coarse-grained level (block level) and then groups similar blocks together. It can be used as a preprocessing stage for any traditional compressor. We find that metadata has a large impact on reducing the benefit of deduplication. To isolate the impact of metadata, we propose separating metadata from data. Three approaches are presented for use cases with different constraints. For the commonly used tar format, we propose Migratory Tar: a data transformation and also a new tar format that deduplicates better. We also present a case study in which we use deduplication to reduce storage consumption for storing disk images while at the same time achieving high performance in image deployment. Finally, we apply the same principle of utilizing similarity to IO scheduling to prevent interference between random and sequential workloads, leading to efficient, consistent, and predictable performance for sequential workloads and high disk utilization.
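    To make the two space-saving techniques concrete, the following is a minimal sketch (not the dissertation's implementation) of fixed-size chunk deduplication and of a Migratory-Compression-style pass that reorders similar blocks before handing the stream to a conventional compressor. The per-block similarity feature used here (the smallest 8-byte window hash) is an illustrative assumption; grouping similar blocks lets the compressor's limited search window find redundancy that would otherwise lie too far apart.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # fixed-size chunking; real systems often use variable-size chunks

def deduplicate(data: bytes):
    """Split data into fixed-size chunks and keep only one copy of each unique chunk."""
    store = {}   # fingerprint -> chunk payload
    recipe = []  # ordered fingerprints needed to rebuild the original file
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        fp = hashlib.sha256(chunk).hexdigest()
        store.setdefault(fp, chunk)  # duplicate chunks are stored only once
        recipe.append(fp)
    return store, recipe

def migratory_compress(data: bytes) -> bytes:
    """Group similar blocks together before compression (a coarse sketch).

    A real Migratory Compression implementation computes resemblance features
    per block; here each block's 'feature' is simply its smallest 8-byte window
    hash, which tends to cluster byte-wise similar blocks next to each other.
    """
    blocks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]

    def feature(block: bytes) -> int:
        windows = (block[j:j + 8] for j in range(0, max(len(block) - 8, 1)))
        return min(zlib.crc32(w) for w in windows)

    order = sorted(range(len(blocks)), key=lambda i: feature(blocks[i]))
    reordered = b"".join(blocks[i] for i in order)
    # The permutation 'order' must be stored alongside the output so the
    # original block layout can be restored after decompression (omitted here).
    return zlib.compress(reordered)
```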

    An Investigation of Students' Use and Understanding of Evaluation Strategies

    Get PDF
    One expected outcome of physics instruction is that students develop quantitative reasoning skills, including evaluation of problem solutions. To investigate students' use of evaluation strategies, we developed and administered tasks prompting students to check the validity of a given expression. We collected written (N>673) and interview (N=31) data at the introductory, sophomore, and junior levels. Tasks were administered in three different physics contexts: the velocity of a block at the bottom of an incline with friction, the electric field due to three point charges of equal magnitude, and the final velocities of two masses in an elastic collision. Responses were analyzed using modified grounded theory and phenomenology. In these three contexts, we explored different facets of students' use and understanding of evaluation strategies. First, we document and analyze the various evaluation strategies students use when prompted, comparing them to canonical strategies. Second, we describe how the identified strategies relate to prior work, with particular emphasis on how a strategy we describe as grouping relates to the phenomenon of chunking as described in cognitive science. Finally, we examine how the prevalence of these strategies varies across different levels of the physics curriculum. From our quantitative data, we found that while all the surveyed student populations drew from the same set of evaluation strategies, the percentage of students who used sophisticated evaluation strategies was higher in the sophomore and junior/senior populations than in the first-year population. From our case studies of two pair interviews (one pair of first-years and one pair of juniors), we found that while evaluating an expression, both juniors and first-years performed similar actions. However, while the first-year students focused on computation and checked for arithmetic consistency with the laws of physics, juniors checked for computational correctness and probed whether the equation accurately described the physical world and obeyed the laws of physics. Our case studies suggest that a key difference between expert and novice evaluation is that experts extract physical meaning from their results and make sense of them by comparing them to other representations of the laws of physics and to real-life experience. We conclude with remarks including implications for classroom instruction as well as suggestions for future work.
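    As a concrete illustration of the kind of canonical evaluation strategy such tasks target (the abstract does not reproduce the expressions actually given to students), a limiting-case check for the incline context might run as follows, assuming a block released from rest that slides a distance L down an incline of angle theta with kinetic friction coefficient mu:

```latex
% Energy balance: kinetic energy gained equals gravitational work minus frictional work
\frac{1}{2}mv^{2} = mgL\sin\theta - \mu m g\cos\theta\, L
\quad\Longrightarrow\quad
v = \sqrt{2gL\left(\sin\theta - \mu\cos\theta\right)}

% Limiting-case checks an expert might perform:
% \mu \to 0:            v = \sqrt{2gL\sin\theta} = \sqrt{2gh}, the frictionless result.
% \mu \to \tan\theta:   v = 0, since friction exactly balances gravity along the incline.
% \mu > \tan\theta:     the radicand is negative, signalling the block never reaches the bottom.
```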

    Toward a Heuristic Model for Evaluating the Complexity of Computer Security Visualization Interface

    Get PDF
    Computer security visualization has gained much attention in the research community in the past few years. However, the advancement of security visualization research has been hampered by the lack of standardization in visualization design, centralized datasets, and evaluation methods. We propose a new heuristic model for evaluating the complexity of computer security visualizations. This complexity evaluation method is designed to evaluate the efficiency of performing visual search in security visualizations by measuring the critical memory capacity load needed to perform such tasks. Our method is based on research in cognitive psychology along with characteristics found in a majority of security visualizations. The main goal of developing this complexity evaluation method is to guide computer security visualization design and to compare different visualization designs. Finally, we compare several well-known computer security visualization systems. The proposed method has the potential to be extended to other areas of information visualization.

    What is the nature of the knowledge specialist teachers conceive of as deep subject and pedagogical knowledge of primary mathematics?

    Get PDF
    One of the key recommendations of the Williams review of primary mathematics (2008) was for every school to have a primary mathematics specialist teacher (MaST) with ‘deep mathematical subject and pedagogical knowledge’ (Williams, 2008, p. 7). This knowledge would act as a ‘nucleus’ (p. 1) for the whole school, with MaSTs supporting the teaching and learning of mathematics across the primary phase. As yet there is no model for the knowledge of these specialist teachers. This study aimed to examine the nature of this knowledge as conceived of by a small sample of MaSTs, by conducting interviews as they undertook the role, and again after they had developed it over two years and completed the Masters-level training programme. The interviewer identified with the MaSTs the knowledge they conceived that they drew on in their teaching of one aspect of the mathematics curriculum and which they identified as deep subject knowledge. There were common features in this knowledge, which are argued to be indicative of the knowledge of the specialist teachers more generally. These features related to knowledge of progression across the primary phase. The MaSTs perceived that they gained new knowledge of mathematics and pedagogy which enabled them to support other staff but also impacted on their own teaching. The research found only a partial relationship between the current models which articulate the knowledge of primary classroom teachers of mathematics (Rowland et al., 2009; Ball et al., 2008; Ma, 1999) and the knowledge which MaSTs conceived that they drew on and identified as deep. The research examined the relationship between the perceived knowledge of these teachers as specialists and as class teachers, finding examples of case and strategic knowledge (Shulman, 1986). The MaSTs identified their new knowledge as distinct from that gained by classroom experience and valued the Masters aspects of their training programme.

    The effects of age and expertise on discourse processing

    Get PDF
    The paradoxical nature of adult development is that it is marked by a decline in processing capacity but an increase in knowledge. A specialized formulation of increased knowledge that can occur throughout the lifespan is expertise. Because discourse processing is both a method of acquiring domain expertise and a process facilitated by domain expertise, the nature of this interrelationship is central to successful aging. However, the processes through which expertise facilitates discourse processing are virtually unexplored within the cognitive aging literature. Four experiments investigating this issue are presented. The first experiment investigated age differences in on-line reading strategies of readers with high and low recall, using passages in which expertise was induced by giving high-knowledge subjects titles to passages that were otherwise incoherent. In Experiment 2, age differences in the parsing mechanisms underlying discourse processing of high- and low-knowledge listeners were examined using speech segmentation methodology. Experiment 3 was conducted to examine age differences in the effects of task demands on the reading strategies of high- and low-knowledge adults. Lastly, in Experiment 4, age differences in discourse processing strategies were investigated in the real-world domain of cooking.

    The Resonant Dynamics of Speech Perception: Interword Integration and Duration-Dependent Backward Effects

    Full text link
    How do listeners integrate temporally distributed phonemic information into coherent representations of syllables and words? During fluent speech perception, variations in the durations of speech sounds and silent pauses can produce different perceived groupings. For example, increasing the silence interval between the words "gray chip" may result in the percept "great chip", whereas increasing the duration of fricative noise in "chip" may alter the percept to "great ship" (Repp et al., 1978). The ARTWORD neural model quantitatively simulates such context-sensitive speech data. In ARTWORD, sequential activation and storage of phonemic items in working memory provides bottom-up input to unitized representations, or list chunks, that group together sequences of items of variable length. The list chunks compete with each other as they dynamically integrate this bottom-up information. The winning groupings feed back to provide top-down support to their phonemic items. Feedback establishes a resonance which temporarily boosts the activation levels of selected items and chunks, thereby creating an emergent conscious percept. Because the resonance evolves more slowly than working memory activation, it can be influenced by information presented after relatively long intervening silence intervals. The same phonemic input can hereby yield different groupings depending on its arrival time. Processes of resonant transfer and competitive teaming help determine which groupings win the competition. Habituating levels of neurotransmitter along the pathways that sustain the resonant feedback lead to a resonant collapse that permits the formation of subsequent resonances.

    Air Force Office of Scientific Research (F49620-92-J-0225); Defense Advanced Research Projects Agency and Office of Naval Research (N00014-95-1-0409); National Science Foundation (IRI-97-20333); Office of Naval Research (N00014-92-J-1309, N00014-95-1-0657)
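    To give a feel for the item-and-chunk dynamics described above, here is a highly simplified toy simulation. It is not ARTWORD's published equations: the update rules, rate constants, and two-chunk inventory are illustrative assumptions chosen only to show bottom-up evidence, chunk competition, slower chunk integration, and top-down support in a runnable form.

```python
dt = 0.01
steps = 600

items = ["gray", "great", "chip", "ship"]          # phonemic items in working memory
chunks = {"gray chip": ["gray", "chip"],            # candidate list chunks (groupings)
          "great chip": ["great", "chip"]}

x = {it: 0.0 for it in items}    # item activations (fast layer)
y = {ck: 0.0 for ck in chunks}   # chunk activations (slow layer)

def external_input(t, it):
    """Toy bottom-up speech input: 'gray' arrives early, 'chip' after a silent interval."""
    if it == "gray" and 0.0 <= t < 1.0:
        return 1.0
    if it == "chip" and 2.0 <= t < 3.0:
        return 1.0
    return 0.0

for step in range(steps):
    t = step * dt
    # Item layer: passive decay + external input + top-down support from chunks
    # that contain the item (the feedback that sustains a resonance).
    for it in items:
        top_down = sum(y[ck] for ck, members in chunks.items() if it in members)
        x[it] += dt * (-0.5 * x[it] + external_input(t, it) + 0.8 * top_down)
    # Chunk layer: integrates bottom-up item evidence, competes via lateral
    # inhibition, and evolves more slowly than the item layer (rate 0.2).
    total = sum(y.values())
    for ck, members in chunks.items():
        bottom_up = sum(x[it] for it in members) / len(members)
        inhibition = total - y[ck]
        y[ck] += dt * 0.2 * (-y[ck] + bottom_up - 1.5 * inhibition)
        y[ck] = max(y[ck], 0.0)

# The grouping with the larger final activation is the "winning" percept.
print({ck: round(v, 3) for ck, v in y.items()})
```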

    On the Use of Parsing for Named Entity Recognition

    Get PDF
    Parsing is a core natural language processing technique that can be used to obtain the structure underlying sentences in human languages. Named entity recognition (NER) is the task of identifying the entities that appear in a text. NER is a challenging natural language processing task that is essential to extract knowledge from texts in multiple domains, ranging from financial to medical. It is intuitive that the structure of a text can help determine whether or not a certain portion of it is an entity and, if so, establish its concrete limits. However, parsing has been a relatively little-used technique in NER systems, since most of them have chosen shallow approaches to deal with text. In this work, we study the characteristics of NER, a task that is far from being solved despite its long history; we analyze the latest advances in parsing that make its use advisable in NER settings; we review the different approaches to NER that make use of syntactic information; and we propose a new way of using parsing in NER based on casting parsing itself as a sequence labeling task.

    Xunta de Galicia; ED431C 2020/11. Xunta de Galicia; ED431G 2019/01. This work has been funded by MINECO, AEI and FEDER of UE through the ANSWER-ASAP project (TIN2017-85160-C2-1-R); and by Xunta de Galicia through a Competitive Reference Group grant (ED431C 2020/11). CITIC, as Research Center of the Galician University System, is funded by the Consellería de Educación, Universidade e Formación Profesional of the Xunta de Galicia through the European Regional Development Fund (ERDF/FEDER), with 80% from the Galicia ERDF 2014-20 Operational Programme and the remaining 20% from the Secretaría Xeral de Universidades (Ref. ED431G 2019/01). Carlos Gómez-Rodríguez has also received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (FASTPARSE, Grant No. 714150).
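    To make the final proposal more concrete: casting both tasks as sequence labeling means each token receives one label for NER (e.g. a BIO tag) and one label encoding its syntactic attachment (e.g. the relative position of its head plus the dependency relation). The sketch below illustrates that idea under those assumptions; the sentence, tree, and label encoding are illustrative choices, not necessarily the exact scheme used in the paper.

```python
from typing import List, Tuple

# Toy sentence with gold NER (BIO) tags and a gold dependency tree
# (1-based head index per token, 0 = root). All values are illustrative.
tokens = ["John", "Smith", "works", "at", "Google", "."]
bio    = ["B-PER", "I-PER", "O", "O", "B-ORG", "O"]
heads  = [2, 3, 0, 5, 3, 3]
rels   = ["compound", "nsubj", "root", "case", "obl", "punct"]

def parse_as_sequence_labels(heads: List[int], rels: List[str]) -> List[str]:
    """Encode a dependency tree as one label per token: relative head offset + relation.

    A token at position i with head h gets the label "<h-i>@<rel>", so parsing
    reduces to predicting one tag per token, exactly like sequence-labeling NER.
    """
    labels = []
    for i, (h, rel) in enumerate(zip(heads, rels), start=1):
        offset = h - i if h != 0 else 0   # the root keeps offset 0 by convention here
        labels.append(f"{offset:+d}@{rel}")
    return labels

def merge_views(bio: List[str], syn: List[str]) -> List[Tuple[str, str]]:
    """Pair the NER tag with the syntactic tag so a single tagger (or two heads
    of the same tagger) can predict both views over the same token sequence."""
    return list(zip(bio, syn))

syn_labels = parse_as_sequence_labels(heads, rels)
for tok, (ner_tag, syn_tag) in zip(tokens, merge_views(bio, syn_labels)):
    print(f"{tok:8s} NER={ner_tag:6s} SYN={syn_tag}")
```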