
    ALOJA: A framework for benchmarking and predictive analytics in Hadoop deployments

    This article presents the ALOJA project and its analytics tools, which leverage machine learning to interpret Big Data benchmark performance data and guide tuning. ALOJA is part of a long-term collaboration between BSC and Microsoft to automate the characterization of the cost-effectiveness of Big Data deployments, currently focusing on Hadoop. Hadoop presents a complex run-time environment, where costs and performance depend on a large number of configuration choices. The ALOJA project has created an open, vendor-neutral repository, featuring over 40,000 Hadoop job executions and their performance details. The repository is accompanied by a test-bed and tools to deploy and evaluate the cost-effectiveness of different hardware configurations, parameters and Cloud services. Despite early success within ALOJA, a comprehensive study requires automation of modeling procedures to allow an analysis of large and resource-constrained search spaces. The predictive analytics extension, ALOJA-ML, provides an automated system allowing knowledge discovery by modeling environments from observed executions. The resulting models can forecast execution behaviors, predicting execution times for new configurations and hardware choices. This also enables model-based anomaly detection and efficient benchmark guidance by prioritizing executions. In addition, the community can benefit from the ALOJA data-sets and framework to improve the design and deployment of Big Data applications. This project has received funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No 639595). This work is partially supported by the Ministry of Economy of Spain under contracts TIN2012-34557 and 2014SGR1051. Peer reviewed. Postprint (published version).
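    The kind of model-based prediction described above can be illustrated with a minimal sketch: a regressor trained on observed benchmark runs that forecasts the run time of configurations that have not yet been executed. The configuration columns and the tiny in-line dataset below are hypothetical and are not drawn from the ALOJA repository.

```python
# Minimal sketch of model-based execution-time prediction, in the spirit of
# ALOJA-ML: fit a regressor on observed runs, then forecast unseen configurations.
# The feature names and the toy measurements below are hypothetical.

import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical observed executions: configuration -> measured run time (seconds).
runs = pd.DataFrame({
    "mappers":      [4, 8, 8, 16, 16, 32],
    "io_buffer_kb": [64, 64, 128, 128, 256, 256],
    "compression":  [0, 1, 0, 1, 0, 1],   # 0 = off, 1 = on
    "exec_time_s":  [910, 620, 585, 410, 395, 300],
})

X = runs.drop(columns="exec_time_s")
y = runs["exec_time_s"]

model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Forecast a configuration that was never run; comparing such predictions with
# later measurements is one way to flag anomalies or prioritize benchmarks.
candidate = pd.DataFrame({"mappers": [32], "io_buffer_kb": [128], "compression": [0]})
print("predicted run time (s):", model.predict(candidate)[0])
```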

    The Digital Architectures of Social Media: Comparing Political Campaigning on Facebook, Twitter, Instagram, and Snapchat in the 2016 U.S. Election

    The present study argues that political communication on social media is mediated by a platform's digital architecture, defined as the technical protocols that enable, constrain, and shape user behavior in a virtual space. A framework for understanding digital architectures is introduced, and four platforms (Facebook, Twitter, Instagram, and Snapchat) are compared along this typology. Using the 2016 US election as a case, interviews with three Republican digital strategists are combined with social media data to qualify the study's theoretical claim that a platform's network structure, functionality, algorithmic filtering, and datafication model affect political campaign strategy on social media.

    A systematic review of speech recognition technology in health care

    BACKGROUND To undertake a systematic review of existing literature relating to speech recognition technology and its application within health care. METHODS A systematic review of existing literature from 2000 was undertaken. Inclusion criteria were: all papers that referred to speech recognition (SR) in health care settings, used by health professionals (allied health, medicine, nursing, technical or support staff), with an evaluation of patient or staff outcomes. Experimental and non-experimental designs were considered. Six databases (Ebscohost including CINAHL, EMBASE, MEDLINE including the Cochrane Database of Systematic Reviews, OVID Technologies, PreMEDLINE, PsycINFO) were searched by a qualified health librarian trained in systematic review searches, initially capturing 1,730 references. Fourteen studies met the inclusion criteria and were retained. RESULTS The heterogeneity of the studies made comparative analysis and synthesis of the data challenging, resulting in a narrative presentation of the results. SR, although not as accurate as human transcription, does deliver reduced turnaround times and cost-effective reporting, although evidence of improved workflow processes is equivocal. CONCLUSIONS SR systems have substantial benefits and should be considered in light of the cost and selection of the SR system, training requirements, length of the transcription task, potential use of macros and templates, the presence of accented voices or experienced and inexperienced typists, and workflow patterns. Funding for this study was provided by the University of Western Sydney. NICTA is funded by the Australian Government through the Department of Communications and the Australian Research Council through the ICT Centre of Excellence Program. NICTA is also funded and supported by the Australian Capital Territory, the New South Wales, Queensland and Victorian Governments, the Australian National University, the University of New South Wales, the University of Melbourne, the University of Queensland, the University of Sydney, Griffith University, Queensland University of Technology, Monash University and other university partners.

    An Introduction to Programming for Bioscientists: A Python-based Primer

    Computing has revolutionized the biological sciences over the past several decades, such that virtually all contemporary research in the biosciences utilizes computer programs. These computational advances have come on many fronts, spurred by fundamental developments in hardware, software, and algorithms. These advances have influenced, and even engendered, a phenomenal array of bioscience fields, including molecular evolution and bioinformatics; genome-, proteome-, transcriptome- and metabolome-wide experimental studies; structural genomics; and atomistic simulations of cellular-scale molecular assemblies as large as ribosomes and intact viruses. In short, much of post-genomic biology is increasingly becoming a form of computational biology. The ability to design and write computer programs is among the most indispensable skills that a modern researcher can cultivate. Python has become a popular programming language in the biosciences, largely because (i) its straightforward semantics and clean syntax make it a readily accessible first language; (ii) it is expressive and well-suited to object-oriented programming, as well as other modern paradigms; and (iii) the many available libraries and third-party toolkits extend the functionality of the core language into virtually every biological domain (sequence and structure analyses, phylogenomics, workflow management systems, etc.). This primer offers a basic introduction to coding, via Python, and it includes concrete examples and exercises to illustrate the language's usage and capabilities; the main text culminates with a final project in structural bioinformatics. A suite of Supplemental Chapters is also provided. Starting with basic concepts, such as that of a 'variable', the Chapters methodically advance the reader to the point of writing a graphical user interface to compute the Hamming distance between two DNA sequences. Comment: 65 pages total, including 45 pages text, 3 figures, 4 tables, numerous exercises, and 19 pages of Supporting Information; currently in press at PLOS Computational Biology.
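    The Hamming distance computation mentioned above reduces to counting the positions at which two equal-length sequences differ; the following minimal Python sketch, which is illustrative rather than code from the primer itself, shows the idea.

```python
# Minimal sketch of the Hamming distance between two equal-length DNA
# sequences: the number of positions at which the nucleotides differ.
# Illustrative example only; not code from the primer.

def hamming_distance(seq_a: str, seq_b: str) -> int:
    if len(seq_a) != len(seq_b):
        raise ValueError("Hamming distance requires sequences of equal length")
    return sum(a != b for a, b in zip(seq_a, seq_b))

if __name__ == "__main__":
    print(hamming_distance("GATTACA", "GACTATA"))  # -> 2
```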

    Cosmological Simulations Using Special Purpose Computers: Implementing P3M on Grape

    An adaptation of the Particle-Particle/Particle-Mesh (P3M) code to the special purpose hardware GRAPE is presented. The short range force is calculated by a four chip GRAPE-3A board, while the rest of the calculation is performed on a Sun Sparc 10/51 workstation. The limited precision of the GRAPE hardware and algorithm constraints introduce stochastic errors of the order of a few percent in the gravitational forces. Tests of this new P3MG3A code show that it is a robust tool for cosmological simulations. The code currently achieves a peak efficiency of one third the speed of the vectorized P3M code on a Cray C-90, and significant improvements are planned in the near future. Special purpose computers like GRAPE are therefore an attractive alternative to supercomputers for numerical cosmology. Comment: 9 pages (ApJS style); uuencoded compressed PostScript file (371 kb). Also available by anonymous 'ftp' to astro.Princeton.EDU [128.112.24.45] in: summers/grape/p3mg3a.ps (668 kb) and WWW at: http://astro.Princeton.EDU/~library/prep.html (as POPe-600). Send all comments, questions, requests, etc. to: [email protected]
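    The particle-particle half of a P3M force split, the part offloaded to the GRAPE board above, amounts to direct summation of softened pairwise gravity within a cutoff radius, with the mesh supplying the long-range contribution. The sketch below illustrates only that short-range direct sum; the cutoff, softening, and toy particle cloud are illustrative assumptions, not parameters from the paper.

```python
# Minimal sketch of the particle-particle (short-range) stage of a P3M split:
# direct summation of softened pairwise gravity inside a cutoff radius.
# The long-range (mesh) stage is omitted; all values here are illustrative.

import numpy as np

def short_range_accel(pos, mass, r_cut=1.0, soft=0.05, G=1.0):
    """Per-particle acceleration from neighbours closer than r_cut."""
    n = len(pos)
    accel = np.zeros_like(pos)
    for i in range(n):
        dr = pos - pos[i]                             # vectors from particle i to all others
        r2 = np.einsum("ij,ij->i", dr, dr) + soft**2  # softened squared distances
        mask = np.sqrt(r2) < r_cut                    # keep only short-range neighbours
        mask[i] = False                               # skip self-interaction
        inv_r3 = r2[mask] ** -1.5
        accel[i] = G * np.sum((mass[mask] * inv_r3)[:, None] * dr[mask], axis=0)
    return accel

rng = np.random.default_rng(0)
positions = rng.uniform(0.0, 4.0, size=(100, 3))      # toy particle cloud
masses = np.ones(100)
print(short_range_accel(positions, masses)[0])
```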

    Design: One, but in different forms

    This overview paper defends an augmented cognitively oriented generic-design hypothesis: there are both significant similarities between the design activities implemented in different situations and crucial differences between these and other cognitive activities; yet, characteristics of a design situation (related to the design process, the designers, and the artefact) introduce specificities in the corresponding cognitive activities and structures that are used, and in the resulting designs. We thus augment the classical generic-design hypothesis with that of different forms of designing. We review the data available in the cognitive design research literature and propose a series of candidates underlying such forms of design, outlining a number of directions requiring further elaboration.

    Queensland University of Technology at TREC 2005

    The Information Retrieval and Web Intelligence (IR-WI) research group is based at the Faculty of Information Technology, QUT, Brisbane, Australia. The IR-WI group participated in the Terabyte and Robust tracks at TREC 2005, both for the first time. For the Robust track we applied our existing information retrieval system, originally designed for structured (XML) retrieval, to the domain of document retrieval. For the Terabyte track we experimented with an open-source IR system, Zettair, and performed two types of experiments. First, we compared Zettair’s performance on a high-powered supercomputer and on a distributed system across seven midrange personal computers. Second, we compared Zettair’s performance when using a standard TREC title query, a natural language query, and a query expanded with synonyms. We compared the systems in terms of both efficiency and retrieval performance. Our results indicate that the distributed system is faster than the supercomputer, while slightly decreasing retrieval performance, and that natural language queries also slightly decrease retrieval performance, while our query expansion technique significantly decreased performance.
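    Synonym-based query expansion of the kind compared above can be sketched in a few lines: each query term is supplemented with synonyms drawn from a lookup table before the query is submitted to the retrieval engine. The synonym table and query below are made-up illustrations; the paper's actual expansion method and its integration with Zettair are not reproduced here.

```python
# Minimal sketch of dictionary-based synonym query expansion. The synonym
# table and example query are hypothetical, for illustration only.

SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "accident": ["crash", "collision"],
}

def expand_query(query: str) -> str:
    """Append known synonyms of each query term to the original query."""
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        expanded.extend(SYNONYMS.get(term, []))
    return " ".join(expanded)

print(expand_query("car accident reports"))
# -> "car accident reports automobile vehicle crash collision"
```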