ALOJA: A framework for benchmarking and predictive analytics in Hadoop deployments
This article presents the ALOJA project and its analytics tools, which leverage machine learning to interpret Big Data benchmark performance data and guide tuning. ALOJA is part of a long-term collaboration between BSC and Microsoft to automate the characterization of the cost-effectiveness of Big Data deployments, currently focusing on Hadoop. Hadoop presents a complex run-time environment, where costs and performance depend on a large number of configuration choices. The ALOJA project has created an open, vendor-neutral repository, featuring over 40,000 Hadoop job executions and their performance details. The repository is accompanied by a test-bed and tools to deploy and evaluate the cost-effectiveness of different hardware configurations, parameters and Cloud services. Despite early success within ALOJA, a comprehensive study requires automation of modeling procedures to allow analysis of large and resource-constrained search spaces. The predictive analytics extension, ALOJA-ML, provides an automated system for knowledge discovery by modeling environments from observed executions. The resulting models can forecast execution behaviors, predicting execution times for new configurations and hardware choices. This also enables model-based anomaly detection and efficient benchmark guidance by prioritizing executions. In addition, the community can benefit from the ALOJA data sets and framework to improve the design and deployment of Big Data applications. This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 639595). This work is partially supported by the Ministry of Economy of Spain under contracts TIN2012-34557 and 2014SGR1051.
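As a rough illustration of the kind of predictive modeling ALOJA-ML describes (a minimal sketch, not ALOJA's actual code; the feature names and values below are hypothetical), a regression model can be fitted to observed executions and then used to forecast execution times for unseen configurations:

    # Minimal sketch, not ALOJA's code: learn execution time from
    # configuration features of past Hadoop runs, then forecast times
    # for unseen configurations. All feature names/values are hypothetical.
    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor

    runs = pd.DataFrame({
        "mappers":      [4, 8, 8, 16, 16, 32],
        "io_buffer_kb": [64, 64, 128, 128, 256, 256],
        "compression":  [0, 1, 0, 1, 0, 1],        # 0 = off, 1 = on
        "exec_time_s":  [410, 305, 290, 220, 205, 180],
    })

    X, y = runs.drop(columns="exec_time_s"), runs["exec_time_s"]
    model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

    # Forecast the execution time of a configuration never run before.
    new_cfg = pd.DataFrame({"mappers": [24], "io_buffer_kb": [192], "compression": [1]})
    print(model.predict(new_cfg))

Ranking candidate configurations by such predicted times is what enables the benchmark-guidance use case the abstract mentions.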
The Digital Architectures of Social Media: Comparing Political Campaigning on Facebook, Twitter, Instagram, and Snapchat in the 2016 U.S. Election
The present study argues that political communication on social media is
mediated by a platform's digital architecture, defined as the technical
protocols that enable, constrain, and shape user behavior in a virtual space. A
framework for understanding digital architectures is introduced, and four
platforms (Facebook, Twitter, Instagram, and Snapchat) are compared along the
typology. Using the 2016 US election as a case, interviews with three
Republican digital strategists are combined with social media data to qualify
the study's theoretical claim that a platform's network structure,
functionality, algorithmic filtering, and datafication model affect political
campaign strategy on social media.
A systematic review of speech recognition technology in health care
BACKGROUND: To undertake a systematic review of existing literature relating to speech recognition technology and its application within health care. METHODS: A systematic review of existing literature from 2000 was undertaken. Inclusion criteria were: all papers that referred to speech recognition (SR) in health care settings, used by health professionals (allied health, medicine, nursing, technical or support staff), with an evaluation of patient or staff outcomes. Experimental and non-experimental designs were considered. Six databases (Ebscohost including CINAHL, EMBASE, MEDLINE including the Cochrane Database of Systematic Reviews, OVID Technologies, PreMEDLINE, PsycINFO) were searched by a qualified health librarian trained in systematic review searches, initially capturing 1,730 references. Fourteen studies met the inclusion criteria and were retained. RESULTS: The heterogeneity of the studies made comparative analysis and synthesis of the data challenging, resulting in a narrative presentation of the results. SR, although not as accurate as human transcription, does deliver reduced turnaround times and cost-effective reporting, although the evidence of improved workflow processes is equivocal. CONCLUSIONS: SR systems have substantial benefits and should be considered in light of the cost and selection of the SR system, training requirements, length of the transcription task, potential use of macros and templates, the presence of accented voices or experienced and inexperienced typists, and workflow patterns. Funding for this study was provided by the University of Western Sydney.
NICTA is funded by the Australian Government through the Department of
Communications and the Australian Research Council through the ICT
Centre of Excellence Program. NICTA is also funded and supported by the
Australian Capital Territory, the New South Wales, Queensland and Victorian
Governments, the Australian National University, the University of New South
Wales, the University of Melbourne, the University of Queensland, the
University of Sydney, Griffith University, Queensland University of
Technology, Monash University and other university partners
An Introduction to Programming for Bioscientists: A Python-based Primer
Computing has revolutionized the biological sciences over the past several
decades, such that virtually all contemporary research in the biosciences
utilizes computer programs. The computational advances have come on many
fronts, spurred by fundamental developments in hardware, software, and
algorithms. These advances have influenced, and even engendered, a phenomenal
array of bioscience fields, including molecular evolution and bioinformatics;
genome-, proteome-, transcriptome- and metabolome-wide experimental studies;
structural genomics; and atomistic simulations of cellular-scale molecular
assemblies as large as ribosomes and intact viruses. In short, much of
post-genomic biology is increasingly becoming a form of computational biology.
The ability to design and write computer programs is among the most
indispensable skills that a modern researcher can cultivate. Python has become
a popular programming language in the biosciences, largely because (i) its
straightforward semantics and clean syntax make it a readily accessible first
language; (ii) it is expressive and well-suited to object-oriented programming,
as well as other modern paradigms; and (iii) the many available libraries and
third-party toolkits extend the functionality of the core language into
virtually every biological domain (sequence and structure analyses,
phylogenomics, workflow management systems, etc.). This primer offers a basic
introduction to coding, via Python, and it includes concrete examples and
exercises to illustrate the language's usage and capabilities; the main text
culminates with a final project in structural bioinformatics. A suite of
Supplemental Chapters is also provided. Starting with basic concepts, such as
that of a 'variable', the Chapters methodically advance the reader to the point
of writing a graphical user interface to compute the Hamming distance between
two DNA sequences. Comment: 65 pages total, including 45 pages text, 3 figures, 4 tables,
numerous exercises, and 19 pages of Supporting Information; currently in
press at PLOS Computational Biology.
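The Hamming distance computation that the Supplemental Chapters build toward reduces to a few lines of Python (a minimal sketch of the function itself, without the graphical user interface described in the abstract):

    def hamming_distance(seq1: str, seq2: str) -> int:
        """Number of positions at which two equal-length DNA sequences differ."""
        if len(seq1) != len(seq2):
            raise ValueError("sequences must be of equal length")
        return sum(a != b for a, b in zip(seq1, seq2))

    print(hamming_distance("GATTACA", "GACTATA"))  # prints 2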
Hierarchical classification for multiple, distributed web databases
The proliferation of online information resources increases the importance of effective and efficient distributed searching. Our research aims to provide an alternative hierarchical categorization and search capability based on a Bayesian network learning algorithm. Our proposed approach, which is grounded in automatic textual analysis of the subject content of online web databases, attempts to address the database selection problem by first classifying web databases into a hierarchy of topic categories. The experimental results reported demonstrate that such a classification approach not only effectively reduces the class search space, but also helps to significantly improve classification accuracy.
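As a rough sketch of the classify-then-search idea (illustrative only: the paper uses a Bayesian network learning algorithm, for which a multinomial naive Bayes text classifier stands in here; the categories and subject texts are hypothetical):

    # Illustrative stand-in for the paper's Bayesian network approach:
    # route a web database into a top-level topic category from its
    # subject text, shrinking the search space before database selection.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    docs = [  # hypothetical subject descriptions of web databases
        "gene expression protein sequence alignment",
        "clinical trials patient outcomes drug therapy",
        "stock market portfolio risk investment",
        "exchange rates monetary policy inflation",
    ]
    topics = ["science", "science", "finance", "finance"]

    clf = make_pipeline(TfidfVectorizer(), MultinomialNB()).fit(docs, topics)
    print(clf.predict(["protein folding simulation database"]))  # -> ['science']

Each predicted category can then hold its own finer-grained classifier, yielding the hierarchy the abstract describes.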
Cosmological Simulations Using Special Purpose Computers: Implementing P3M on Grape
An adaptation of the Particle-Particle/Particle-Mesh (P3M) code to the
special purpose hardware GRAPE is presented. The short range force is
calculated by a four chip GRAPE-3A board, while the rest of the calculation is
performed on a Sun Sparc 10/51 workstation. The limited precision of the GRAPE
hardware and algorithm constraints introduce stochastic errors of the order of
a few percent in the gravitational forces. Tests of this new P3MG3A code show
that it is a robust tool for cosmological simulations. The code currently
achieves a peak efficiency of one third the speed of the vectorized P3M code on
a Cray C-90 and significant improvements are planned in the near future.
Special purpose computers like GRAPE are therefore an attractive alternative to
supercomputers for numerical cosmology. Comment: 9 pages (ApJS style); uuencoded compressed PostScript file (371 kb)
Also available by anonymous 'ftp' to astro.Princeton.EDU [128.112.24.45] in:
summers/grape/p3mg3a.ps (668 kb) and WWW at:
http://astro.Princeton.EDU/~library/prep.html (as POPe-600) Send all
comments, questions, requests, etc. to: [email protected]
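Schematically, P3M splits the gravitational force into a long-range part solved on a mesh via FFTs and a short-range pair correction summed directly within a cutoff radius; the latter is the piece GRAPE accelerates in hardware. A simplified Python sketch of the direct-sum part (Plummer softening only, omitting the shaping function a real P3M code uses to blend the two parts; all parameter values are illustrative):

    import numpy as np

    def short_range_forces(pos, masses, r_cut, eps=1e-2, G=1.0):
        """Direct-sum pair forces for separations below r_cut (Plummer softening)."""
        forces = np.zeros_like(pos)
        n = len(pos)
        for i in range(n):
            for j in range(i + 1, n):
                d = pos[j] - pos[i]
                r2 = d @ d
                if r2 < r_cut ** 2:
                    f = G * masses[i] * masses[j] * d / (r2 + eps ** 2) ** 1.5
                    forces[i] += f          # Newton's third law pair
                    forces[j] -= f
        return forces

    pos = np.random.rand(128, 3)            # toy particle positions in a unit box
    print(short_range_forces(pos, np.ones(128), r_cut=0.1)[:3])

The O(n^2)-within-cutoff structure of this loop is exactly what makes it a good fit for special-purpose pipelined hardware like GRAPE.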
Design: One, but in different forms
This overview paper defends an augmented cognitively oriented generic-design
hypothesis: there are both significant similarities between the design
activities implemented in different situations and crucial differences between
these and other cognitive activities; yet, characteristics of a design
situation (related to the design process, the designers, and the artefact)
introduce specificities in the corresponding cognitive activities and
structures that are used, and in the resulting designs. We thus augment the
classical generic-design hypothesis with that of different forms of designing.
We review the data available in the cognitive design research literature and
propose a series of candidate factors underlying such forms of design, outlining a
number of directions that require further elaboration.
Queensland University of Technology at TREC 2005
The Information Retrieval and Web Intelligence (IR-WI) research group is based at the Faculty of Information Technology, QUT, Brisbane, Australia. The IR-WI group participated in the Terabyte and Robust tracks at TREC 2005, both for the first time. For the Robust track we applied our existing information retrieval system, originally designed for structured (XML) retrieval, to the domain of document retrieval. For the Terabyte track we experimented with an open-source IR system, Zettair, and performed two types of experiments. First, we compared Zettair’s performance on a high-powered supercomputer with its performance on a distributed system across seven midrange personal computers. Second, we compared Zettair’s performance when a standard TREC title query is used with its performance on a natural language query and on a query expanded with synonyms. We compare the systems in terms of both efficiency and retrieval performance. Our results indicate that the distributed system is faster than the supercomputer, while slightly decreasing retrieval performance, and that natural language queries also slightly decrease retrieval performance, while our query expansion technique significantly decreased performance.
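For context, the synonym expansion the abstract reports on can be sketched as follows (illustrative only, not the group's actual pipeline; the synonym table is hypothetical, and the abstract notes that this style of expansion significantly hurt retrieval performance):

    # Naive synonym expansion of a TREC-style title query (hypothetical table).
    SYNONYMS = {
        "car": ["automobile", "vehicle"],
        "safety": ["security", "protection"],
    }

    def expand_query(query: str) -> list[str]:
        terms = query.lower().split()
        expanded = list(terms)
        for term in terms:
            expanded.extend(SYNONYMS.get(term, []))  # add synonyms, keep originals
        return expanded

    print(expand_query("car safety"))
    # -> ['car', 'safety', 'automobile', 'vehicle', 'security', 'protection']

Added synonyms broaden recall but can drift the query off-topic, which is consistent with the performance drop the group observed.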