ViPEr-HiSS: A Case for Storage Design Tools
The viability of large-scale multimedia applications depends on the
performance of storage systems. Providing cost-effective access to vast
amounts of video, image, audio, and text data requires (a) proper
configuration of storage hierarchies as well as (b) efficient resource
management techniques at all levels of the storage hierarchy. The
resulting complexities of the hardware/software co-design in turn
contribute to difficulties in making accurate predictions about
performance, scalability, and cost-effectiveness of a storage system.
Moreover, poor decisions at design time can be costly and problematic to
correct in later stages of development. Hence, measurement of systems
after they have been developed is not a desirable approach to
predicting their performance. What is needed is the ability to evaluate
the system's design while there are still opportunities to make
corrections to fundamental design flaws. In this paper we describe the
framework of ViPEr-HiSS, a tool which facilitates design, development, and
subsequent performance evaluation of designs of multimedia storage
hierarchies by providing mechanisms for relatively easy experimentation
with (a) system configurations as well as (b) application- and media-aware
resource management techniques.
(Also cross-referenced as UMIACS-TR-99-69)
LiteMat: a scalable, cost-efficient inference encoding scheme for large RDF graphs
The number of linked data sources and the size of the linked open data graph
keep growing every day. As a consequence, semantic RDF services are more and
more confronted with various "big data" problems. Query processing in the
presence of inferences is one of them. For instance, to complete the answer set of
SPARQL queries, RDF database systems evaluate semantic RDFS relationships
(subPropertyOf, subClassOf) through time-consuming query rewriting algorithms
or space-consuming data materialization solutions. To reduce the memory
footprint and ease the exchange of large datasets, these systems generally
apply a dictionary approach for compressing triple data sizes by replacing
resource identifiers (IRIs), blank nodes and literals with integer values. In
this article, we present a structured resource identification scheme using a
clever encoding of concepts and property hierarchies for efficiently evaluating
the main common RDFS entailment rules while minimizing triple materialization
and query rewriting. We will show how this encoding can be computed by a
scalable parallel algorithm and directly be implemented over the Apache Spark
framework. The efficiency of our encoding scheme is emphasized by an evaluation
conducted over both synthetic and real-world datasets.
Comment: 8 pages, 1 figure
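The hierarchy-aware identifier idea can be illustrated with a small prefix-encoding sketch. This is a simplified illustration under assumed names (`encode`, `is_subclass`, the toy taxonomy), not LiteMat's exact bit layout: each class identifier embeds its ancestors' bits as a prefix, so a subClassOf check reduces to a shift-and-compare rather than query rewriting or triple materialization.

```python
def encode(hierarchy, root):
    """Assign each class a (code, bits) pair so that every descendant's code
    starts with its ancestor's bit prefix (illustrative scheme only)."""
    codes = {root: (0, 0)}

    def visit(node):
        children = hierarchy.get(node, [])
        if not children:
            return
        width = len(children).bit_length()  # bits needed for local child ids 1..n
        code, bits = codes[node]
        for i, child in enumerate(children, start=1):
            codes[child] = ((code << width) | i, bits + width)
            visit(child)

    visit(root)
    return codes


def is_subclass(codes, a, b):
    """a rdfs:subClassOf b iff b's code is a bit prefix of a's code."""
    code_a, bits_a = codes[a]
    code_b, bits_b = codes[b]
    return bits_a >= bits_b and (code_a >> (bits_a - bits_b)) == code_b


# Hypothetical toy hierarchy, for illustration only.
taxonomy = {"Thing": ["Agent", "Place"], "Agent": ["Person", "Organization"]}
codes = encode(taxonomy, "Thing")
# is_subclass(codes, "Person", "Agent") -> True
# is_subclass(codes, "Person", "Place") -> False
```

Because all descendants of a class share its bit prefix, their integer identifiers form a contiguous range, so answering a query against a superclass becomes a range scan over the dictionary-encoded triples, which is what lets this style of encoding avoid both rewriting and materialization.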
From software APIs to web service ontologies: a semi-automatic extraction method
Successful employment of semantic web services depends on the availability of
high-quality ontologies to describe the domains of these services. As always,
building such ontologies is difficult and costly, thus hampering web service
deployment. Our hypothesis is that since the functionality offered by a web
service is reflected by the underlying software, domain ontologies could be
built by analyzing the documentation of that software. We verify this
hypothesis in the domain of RDF ontology storage tools. We implemented and
fine-tuned a semi-automatic method to extract domain ontologies from software
documentation. The quality of the extracted ontologies was verified against a
high-quality hand-built ontology of the same domain. Despite the low
linguistic quality of the corpus, our method allows extracting a considerable
amount of information for a domain ontology.
Analyzing Large Collections of Electronic Text Using OLAP
Computer-assisted reading and analysis of text has various applications in
the humanities and social sciences. The increasing size of many electronic text
archives has the advantage of a more complete analysis but the disadvantage of
taking longer to obtain results. On-Line Analytical Processing (OLAP) is a
method used to store and quickly analyze multidimensional data. By storing text
analysis information in an OLAP system, a user can obtain answers to inquiries
in a matter of seconds as opposed to minutes, hours, or even days. This
analysis is user-driven, allowing various users the freedom to pursue their own
direction of research.
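The multidimensional idea can be sketched in a few lines. This is a toy illustration with hypothetical data (the fact tuples and the `rollup` helper are assumptions, not any particular OLAP product's API): term counts are stored as facts over dimensions such as year and section, and a roll-up aggregates along any chosen subset of those dimensions in a single pass.

```python
from collections import Counter

# Hypothetical fact table: (year, section, term, count), as a text-analysis
# ETL step might produce it from a document archive.
facts = [
    ("2001", "politics", "election", 12),
    ("2001", "sports",   "election", 1),
    ("2002", "politics", "election", 9),
    ("2002", "politics", "budget",   4),
]

def rollup(facts, dims):
    """Aggregate counts over the chosen dimension positions (a tiny roll-up)."""
    cube = Counter()
    for *keys, count in facts:
        cube[tuple(keys[d] for d in dims)] += count
    return cube

by_term = rollup(facts, dims=[2])   # total mentions per term
by_year = rollup(facts, dims=[0])   # total mentions per year
# by_term[("election",)] == 22
```

A real OLAP system precomputes and indexes such aggregates, which is what turns an archive-wide question into a lookup answered in seconds rather than a full re-scan of the text.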