ALOJA: A framework for benchmarking and predictive analytics in Hadoop deployments
This article presents the ALOJA project and its analytics tools, which leverage machine learning to interpret Big Data benchmark performance data and guide tuning. ALOJA is part of a long-term collaboration between BSC and Microsoft to automate the characterization of the cost-effectiveness of Big Data deployments, currently focusing on Hadoop. Hadoop presents a complex run-time environment, where costs and performance depend on a large number of configuration choices. The ALOJA project has created an open, vendor-neutral repository, featuring over 40,000 Hadoop job executions and their performance details. The repository is accompanied by a test-bed and tools to deploy and evaluate the cost-effectiveness of different hardware configurations, parameters and Cloud services. Despite early success within ALOJA, a comprehensive study requires automation of modeling procedures to allow analysis of large and resource-constrained search spaces. The predictive analytics extension, ALOJA-ML, provides an automated system for knowledge discovery by modeling environments from observed executions. The resulting models can forecast execution behaviors, predicting execution times for new configurations and hardware choices. This also enables model-based anomaly detection and efficient benchmark guidance by prioritizing executions. In addition, the community can benefit from the ALOJA data sets and framework to improve the design and deployment of Big Data applications. This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 639595). This work is partially supported by the Ministry of Economy of Spain under contracts TIN2012-34557 and 2014SGR1051.
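As a rough illustration of the kind of predictive modeling ALOJA-ML describes (a minimal sketch, not ALOJA's actual code; the feature names and values below are hypothetical), a regression model can be fitted to observed executions and then used to forecast execution times for unseen configurations:

    # Minimal sketch, not ALOJA's code: learn execution time from
    # configuration features of past Hadoop runs, then forecast times
    # for unseen configurations. All feature names/values are hypothetical.
    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor

    runs = pd.DataFrame({
        "mappers":      [4, 8, 8, 16, 16, 32],
        "io_buffer_kb": [64, 64, 128, 128, 256, 256],
        "compression":  [0, 1, 0, 1, 0, 1],        # 0 = off, 1 = on
        "exec_time_s":  [410, 305, 290, 220, 205, 180],
    })

    X, y = runs.drop(columns="exec_time_s"), runs["exec_time_s"]
    model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

    # Forecast the execution time of a configuration never run before.
    new_cfg = pd.DataFrame({"mappers": [24], "io_buffer_kb": [192], "compression": [1]})
    print(model.predict(new_cfg))

Ranking candidate configurations by such predicted times is what enables the benchmark-guidance use case the abstract mentions.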
The Digital Architectures of Social Media: Comparing Political Campaigning on Facebook, Twitter, Instagram, and Snapchat in the 2016 U.S. Election
The present study argues that political communication on social media is
mediated by a platform's digital architecture, defined as the technical
protocols that enable, constrain, and shape user behavior in a virtual space. A
framework for understanding digital architectures is introduced, and four
platforms (Facebook, Twitter, Instagram, and Snapchat) are compared along the
typology. Using the 2016 US election as a case, interviews with three
Republican digital strategists are combined with social media data to qualify
the study's theoretical claim that a platform's network structure,
functionality, algorithmic filtering, and datafication model affect political
campaign strategy on social media.
A systematic review of speech recognition technology in health care
BACKGROUND: To undertake a systematic review of existing literature relating to speech recognition technology and its application within health care. METHODS: A systematic review of existing literature from 2000 was undertaken. Inclusion criteria were: all papers that referred to speech recognition (SR) in health care settings, used by health professionals (allied health, medicine, nursing, technical or support staff), with an evaluation of patient or staff outcomes. Experimental and non-experimental designs were considered. Six databases (Ebscohost including CINAHL, EMBASE, MEDLINE including the Cochrane Database of Systematic Reviews, OVID Technologies, PreMEDLINE, PsycINFO) were searched by a qualified health librarian trained in systematic review searches, initially capturing 1,730 references. Fourteen studies met the inclusion criteria and were retained. RESULTS: The heterogeneity of the studies made comparative analysis and synthesis of the data challenging, resulting in a narrative presentation of the results. SR, although not as accurate as human transcription, does deliver reduced turnaround times and cost-effective reporting, although the evidence of improved workflow processes is equivocal. CONCLUSIONS: SR systems have substantial benefits and should be considered in light of the cost and selection of the SR system, training requirements, length of the transcription task, potential use of macros and templates, the presence of accented voices or experienced and inexperienced typists, and workflow patterns. Funding for this study was provided by the University of Western Sydney.
NICTA is funded by the Australian Government through the Department of
Communications and the Australian Research Council through the ICT
Centre of Excellence Program. NICTA is also funded and supported by the
Australian Capital Territory, the New South Wales, Queensland and Victorian
Governments, the Australian National University, the University of New South
Wales, the University of Melbourne, the University of Queensland, the
University of Sydney, Griffith University, Queensland University of
Technology, Monash University and other university partners
An Introduction to Programming for Bioscientists: A Python-based Primer
Computing has revolutionized the biological sciences over the past several
decades, such that virtually all contemporary research in the biosciences
utilizes computer programs. The computational advances have come on many
fronts, spurred by fundamental developments in hardware, software, and
algorithms. These advances have influenced, and even engendered, a phenomenal
array of bioscience fields, including molecular evolution and bioinformatics;
genome-, proteome-, transcriptome- and metabolome-wide experimental studies;
structural genomics; and atomistic simulations of cellular-scale molecular
assemblies as large as ribosomes and intact viruses. In short, much of
post-genomic biology is increasingly becoming a form of computational biology.
The ability to design and write computer programs is among the most
indispensable skills that a modern researcher can cultivate. Python has become
a popular programming language in the biosciences, largely because (i) its
straightforward semantics and clean syntax make it a readily accessible first
language; (ii) it is expressive and well-suited to object-oriented programming,
as well as other modern paradigms; and (iii) the many available libraries and
third-party toolkits extend the functionality of the core language into
virtually every biological domain (sequence and structure analyses,
phylogenomics, workflow management systems, etc.). This primer offers a basic
introduction to coding, via Python, and it includes concrete examples and
exercises to illustrate the language's usage and capabilities; the main text
culminates with a final project in structural bioinformatics. A suite of
Supplemental Chapters is also provided. Starting with basic concepts, such as
that of a 'variable', the Chapters methodically advance the reader to the point
of writing a graphical user interface to compute the Hamming distance between
two DNA sequences. Comment: 65 pages total, including 45 pages text, 3 figures, 4 tables,
numerous exercises, and 19 pages of Supporting Information; currently in
press at PLOS Computational Biology.
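The Hamming distance computation that the Supplemental Chapters build toward reduces to a few lines of Python (a minimal sketch of the function itself, without the graphical user interface described in the abstract):

    def hamming_distance(seq1: str, seq2: str) -> int:
        """Number of positions at which two equal-length DNA sequences differ."""
        if len(seq1) != len(seq2):
            raise ValueError("sequences must be of equal length")
        return sum(a != b for a, b in zip(seq1, seq2))

    print(hamming_distance("GATTACA", "GACTATA"))  # prints 2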
Hierarchical classification for multiple, distributed web databases
The proliferation of online information resources increases the importance of effective and efficient distributed searching. Our research aims to provide an alternative hierarchical categorization and search capability based on a Bayesian network learning algorithm. Our proposed approach, which is grounded in automatic textual analysis of the subject content of online web databases, attempts to address the database selection problem by first classifying web databases into a hierarchy of topic categories. The experimental results reported demonstrate that such a classification approach not only effectively reduces the class search space, but also helps to significantly improve classification accuracy.
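As a rough sketch of the classify-then-search idea (illustrative only: the paper uses a Bayesian network learning algorithm, for which a multinomial naive Bayes text classifier stands in here; the categories and subject texts are hypothetical):

    # Illustrative stand-in for the paper's Bayesian network approach:
    # route a web database into a top-level topic category from its
    # subject text, shrinking the search space before database selection.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    docs = [  # hypothetical subject descriptions of web databases
        "gene expression protein sequence alignment",
        "clinical trials patient outcomes drug therapy",
        "stock market portfolio risk investment",
        "exchange rates monetary policy inflation",
    ]
    topics = ["science", "science", "finance", "finance"]

    clf = make_pipeline(TfidfVectorizer(), MultinomialNB()).fit(docs, topics)
    print(clf.predict(["protein folding simulation database"]))  # -> ['science']

Each predicted category can then hold its own finer-grained classifier, yielding the hierarchy the abstract describes.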
Cosmological Simulations Using Special Purpose Computers: Implementing P3M on Grape
An adaptation of the Particle-Particle/Particle-Mesh (P3M) code to the
special purpose hardware GRAPE is presented. The short range force is
calculated by a four chip GRAPE-3A board, while the rest of the calculation is
performed on a Sun Sparc 10/51 workstation. The limited precision of the GRAPE
hardware and algorithm constraints introduce stochastic errors of the order of
a few percent in the gravitational forces. Tests of this new P3MG3A code show
that it is a robust tool for cosmological simulations. The code currently
achieves a peak efficiency of one third the speed of the vectorized P3M code on
a Cray C-90 and significant improvements are planned in the near future.
Special purpose computers like GRAPE are therefore an attractive alternative to
supercomputers for numerical cosmology. Comment: 9 pages (ApJS style); uuencoded compressed PostScript file (371 kb)
Also available by anonymous 'ftp' to astro.Princeton.EDU [128.112.24.45] in:
summers/grape/p3mg3a.ps (668 kb) and WWW at:
http://astro.Princeton.EDU/~library/prep.html (as POPe-600) Send all
comments, questions, requests, etc. to: [email protected]
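Schematically, P3M splits the gravitational force into a long-range part solved on a mesh via FFTs and a short-range pair correction summed directly within a cutoff radius; the latter is the piece GRAPE accelerates in hardware. A simplified Python sketch of the direct-sum part (Plummer softening only, omitting the shaping function a real P3M code uses to blend the two parts; all parameter values are illustrative):

    import numpy as np

    def short_range_forces(pos, masses, r_cut, eps=1e-2, G=1.0):
        """Direct-sum pair forces for separations below r_cut (Plummer softening)."""
        forces = np.zeros_like(pos)
        n = len(pos)
        for i in range(n):
            for j in range(i + 1, n):
                d = pos[j] - pos[i]
                r2 = d @ d
                if r2 < r_cut ** 2:
                    f = G * masses[i] * masses[j] * d / (r2 + eps ** 2) ** 1.5
                    forces[i] += f          # Newton's third law pair
                    forces[j] -= f
        return forces

    pos = np.random.rand(128, 3)            # toy particle positions in a unit box
    print(short_range_forces(pos, np.ones(128), r_cut=0.1)[:3])

The O(n^2)-within-cutoff structure of this loop is exactly what makes it a good fit for special-purpose pipelined hardware like GRAPE.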
Design: One, but in different forms
This overview paper defends an augmented cognitively oriented generic-design
hypothesis: there are both significant similarities between the design
activities implemented in different situations and crucial differences between
these and other cognitive activities; yet, characteristics of a design
situation (related to the design process, the designers, and the artefact)
introduce specificities in the corresponding cognitive activities and
structures that are used, and in the resulting designs. We thus augment the
classical generic-design hypothesis with that of different forms of designing.
We review the data available in the cognitive design research literature and
propose a series of candidate factors underlying such forms of design, outlining a
number of directions that require further elaboration.
Queensland University of Technology at TREC 2005
The Information Retrieval and Web Intelligence (IR-WI) research group is based at the Faculty of Information Technology, QUT, Brisbane, Australia. The IR-WI group participated in the Terabyte and Robust tracks at TREC 2005, both for the first time. For the Robust track we applied our existing information retrieval system, originally designed for structured (XML) retrieval, to the domain of document retrieval. For the Terabyte track we experimented with an open-source IR system, Zettair, and performed two types of experiments. First, we compared Zettair’s performance on a high-powered supercomputer with its performance on a distributed system across seven midrange personal computers. Second, we compared Zettair’s performance when a standard TREC title query is used with its performance on a natural language query and on a query expanded with synonyms. We compare the systems in terms of both efficiency and retrieval performance. Our results indicate that the distributed system is faster than the supercomputer, while slightly decreasing retrieval performance, and that natural language queries also slightly decrease retrieval performance, while our query expansion technique significantly decreased performance.
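For context, the synonym expansion the abstract reports on can be sketched as follows (illustrative only, not the group's actual pipeline; the synonym table is hypothetical, and the abstract notes that this style of expansion significantly hurt retrieval performance):

    # Naive synonym expansion of a TREC-style title query (hypothetical table).
    SYNONYMS = {
        "car": ["automobile", "vehicle"],
        "safety": ["security", "protection"],
    }

    def expand_query(query: str) -> list[str]:
        terms = query.lower().split()
        expanded = list(terms)
        for term in terms:
            expanded.extend(SYNONYMS.get(term, []))  # add synonyms, keep originals
        return expanded

    print(expand_query("car safety"))
    # -> ['car', 'safety', 'automobile', 'vehicle', 'security', 'protection']

Added synonyms broaden recall but can drift the query off-topic, which is consistent with the performance drop the group observed.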