
    Platform Independent Real-Time X3D Shaders and their Applications in Bioinformatics Visualization

    Since the introduction of programmable Graphics Processing Units (GPUs) and procedural shaders, hardware vendors have each developed their own real-time shading language standard, none of which is fully platform independent. Although real-time programmable shader technology can be used to build 3D applications on a single system, this platform dependence keeps shaders out of 3D Internet applications. The primary purpose of this dissertation is to design a framework for translating different shader formats into platform-independent shaders and embedding them into the eXtensible 3D (X3D) scene for 3D web applications. The framework includes a back-end core shader converter, which translates shaders among different shading languages through a middle XML layer, and a shader library containing a basic set of shaders that developers can load and extend. The framework is then applied to some applications in biomolecular visualization.
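The translation scheme described above, with an XML layer sitting between vendor-specific shading languages, can be sketched roughly as follows. This is an illustrative sketch only: the function names and the shape of the intermediate representation are invented here and do not reflect the dissertation's actual converter API.

```python
import xml.etree.ElementTree as ET

# Hypothetical sketch: a shader is parsed into a vendor-neutral XML tree,
# then serialized into a target shading language.

def to_xml_ir(uniforms, body):
    """Build a minimal XML intermediate representation of a shader."""
    shader = ET.Element("shader")
    for name, dtype in uniforms:
        ET.SubElement(shader, "uniform", name=name, type=dtype)
    ET.SubElement(shader, "body").text = body
    return shader

def emit_glsl(shader):
    """Serialize the XML IR as GLSL-style source text."""
    lines = [f"uniform {u.get('type')} {u.get('name')};"
             for u in shader.findall("uniform")]
    lines += ["void main() {", "    " + shader.find("body").text, "}"]
    return "\n".join(lines)

ir = to_xml_ir([("lightPos", "vec3")], "gl_FragColor = vec4(1.0);")
print(emit_glsl(ir))
```

A second emitter targeting another shading language would consume the same XML tree, which is what makes the middle layer the point of platform independence.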

    Evaluation and performance of reading from big data formats

    The emergence of new application profiles has caused a steep surge in the volume of data generated nowadays. Data heterogeneity is a modern trend, as unstructured types of data, such as videos and images, and semi-structured types, such as JSON and XML files, are becoming increasingly widespread. Consequently, new challenges arise related to analyzing and extracting important insights from huge bodies of information. The field of big data analytics has been developed to address these issues. Performance plays a key role in analytical scenarios, as it empowers applications to generate value in a more efficient and less time-consuming way. In this context, files are used to persist large quantities of information, which can be accessed later by analytic queries. Text files have the advantage of providing easier interaction with the end user, whereas binary files propose structures that enhance data access. Among them, Apache ORC and Apache Parquet are formats that present characteristics such as column-oriented organization and data compression, which are used to achieve better query performance. The objective of this project is to assess the usage of such files by SAP Vora, a distributed database management system, in order to draw out processing techniques used in big data analytics scenarios and apply them to improve the performance of queries executed upon CSV files in Vora. Two techniques were employed to achieve this goal: file pruning, which allows Vora's relational engine to ignore files possessing irrelevant information for the query, and block pruning, which, when processing a file, disregards individual blocks that do not contain data targeted by the query.
Results demonstrate that these modifications enhance the efficiency of analytical workloads executed upon CSV files in Vora, thus narrowing the performance gap between queries executed upon this format and those targeting files tailored for big data scenarios, such as Apache Parquet and Apache ORC. The project was developed during an internship at SAP in Walldorf, Germany.
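The two pruning techniques described above can be illustrated with a small sketch, assuming per-file and per-block min/max statistics on a filter column have been collected beforehand; the data layout and names here are invented for illustration and do not reflect Vora's internal API.

```python
# Illustrative sketch of file pruning and block pruning for a range
# predicate, given precomputed min/max statistics per file and per block.

def prune(files, predicate_min, predicate_max):
    """Return only the (file, block) pairs that may satisfy the predicate."""
    survivors = []
    for f in files:
        # File pruning: skip whole files whose value range misses the predicate.
        if f["max"] < predicate_min or f["min"] > predicate_max:
            continue
        for b in f["blocks"]:
            # Block pruning: skip individual blocks the same way.
            if b["max"] < predicate_min or b["min"] > predicate_max:
                continue
            survivors.append((f["name"], b["id"]))
    return survivors

files = [
    {"name": "a.csv", "min": 0,  "max": 40,
     "blocks": [{"id": 0, "min": 0,  "max": 9},
                {"id": 1, "min": 10, "max": 40}]},
    {"name": "b.csv", "min": 50, "max": 90,
     "blocks": [{"id": 0, "min": 50, "max": 90}]},
]
print(prune(files, 20, 45))   # only block 1 of a.csv can match
```

The same min/max idea underlies the statistics embedded in Parquet and ORC footers; the project applies it externally to CSV files, which carry no such metadata of their own.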

    The Family of MapReduce and Large Scale Data Processing Systems

    In the last two decades, the continuous increase of computational power has produced an overwhelming flow of data, which has called for a paradigm shift in computing architecture and large-scale data processing mechanisms. MapReduce is a simple and powerful programming model that enables easy development of scalable parallel applications to process vast amounts of data on large clusters of commodity machines. It isolates the application from the details of running a distributed program, such as data distribution, scheduling, and fault tolerance. However, the original implementation of the MapReduce framework had some limitations that have been tackled by many research efforts in follow-up works after its introduction. This article provides a comprehensive survey of a family of approaches and mechanisms for large-scale data processing that are based on the original idea of the MapReduce framework and are currently gaining momentum in both the research and industrial communities. We also cover systems that have been implemented to provide declarative programming interfaces on top of the MapReduce framework. In addition, we review several large-scale data processing systems that resemble some of the ideas of the MapReduce framework for different purposes and application scenarios. Finally, we discuss some future research directions for implementing the next generation of MapReduce-like solutions.
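The programming model's contract can be shown in a few lines: the user supplies only a map function and a reduce function, and the framework handles grouping by key (plus, in real systems, distribution, scheduling, and fault tolerance). Below is a minimal single-process word-count sketch of that contract, not any particular framework's API.

```python
from collections import defaultdict
from itertools import chain

def map_fn(line):
    # Map: emit (key, value) pairs from one input record.
    return [(word, 1) for word in line.split()]

def reduce_fn(key, values):
    # Reduce: fold all values grouped under one key.
    return key, sum(values)

def map_reduce(inputs, map_fn, reduce_fn):
    groups = defaultdict(list)
    for key, value in chain.from_iterable(map(map_fn, inputs)):
        groups[key].append(value)          # shuffle: group by key
    return dict(reduce_fn(k, vs) for k, vs in groups.items())

print(map_reduce(["to be or", "not to be"], map_fn, reduce_fn))
# → {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```

Everything the survey discusses, from declarative layers to MapReduce-like systems, preserves this two-function interface while changing what happens between map and reduce.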

    Just-in-time Analytics Over Heterogeneous Data and Hardware

    Industry and academia are continuously becoming more data-driven and data-intensive, relying on the analysis of a wide variety of datasets to gain insights. At the same time, data variety increases continuously across multiple axes. First, data comes in multiple formats, such as the binary tabular data of a DBMS, raw textual files, and domain-specific formats. Second, different datasets follow different data models, such as the relational and the hierarchical one. Data location also varies: Some datasets reside in a central "data lake", whereas others lie in remote data sources. In addition, users execute widely different analysis tasks over all these data types. Finally, the process of gathering and integrating diverse datasets introduces several inconsistencies and redundancies in the data, such as duplicate entries for the same real-world concept. In summary, heterogeneity significantly affects the way data analysis is performed. In this thesis, we aim for data virtualization: Abstracting data out of its original form and manipulating it regardless of the way it is stored or structured, without a performance penalty. To achieve data virtualization, we design and implement systems that i) mask heterogeneity through the use of heterogeneity-aware, high-level building blocks and ii) offer fast responses through on-demand adaptation techniques. Regarding the high-level building blocks, we use a query language and algebra to handle multiple collection types, such as relations and hierarchies, express transformations between these collection types, as well as express complex data cleaning tasks over them. In addition, we design a location-aware compiler and optimizer that masks away the complexity of accessing multiple remote data sources. Regarding on-demand adaptation, we present a design to produce a new system per query. 
The design uses customization mechanisms that trigger runtime code generation to mimic the system most appropriate to answer a query fast: Query operators are thus created based on the query workload and the underlying data models; the data access layer is created based on the underlying data formats. In addition, we exploit emerging hardware by customizing the system implementation based on the available heterogeneous processors (CPUs and GPGPUs). We thus pair each workload with its ideal processor type. The end result is a just-in-time database system that is specific to the query, data, workload, and hardware instance. This thesis redesigns the data management stack to natively cater for data heterogeneity and exploit hardware heterogeneity. Instead of centralizing all relevant datasets, converting them to a single representation, and loading them into a monolithic, static, suboptimal system, our design embraces heterogeneity. Overall, our design decouples the type of performed analysis from the original data layout; users can perform their analysis across data stores, data models, and data formats, while at the same time experiencing the performance offered by a custom system that has been built on demand to serve their specific use case.
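The idea of generating a data access layer per query and per format can be caricatured in a few lines. This is a toy sketch only: the thesis systems emit low-level code, and the format names and accessor shapes below are invented for illustration.

```python
# Toy illustration of runtime code generation: the scan operator's source
# is specialized to the data format and filter, then compiled with exec().

def generate_scan(fmt, column, threshold):
    if fmt == "csv":
        # CSV rows arrive as raw text and need parsing at access time.
        access = f"float(row.split(',')[{column}])"
    else:
        # Assume a binary row is already a tuple of floats.
        access = f"row[{column}]"
    src = (f"def scan(rows):\n"
           f"    return [row for row in rows if {access} > {threshold}]\n")
    namespace = {}
    exec(src, namespace)            # "compile" the specialized operator
    return namespace["scan"]

csv_scan = generate_scan("csv", 1, 10.0)
print(csv_scan(["a,5.0", "b,12.5", "c,30.0"]))   # → ['b,12.5', 'c,30.0']
```

The point of the specialization is that the generated operator contains no branching on the format: the format decision was paid once at generation time, not once per row.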

    Visual and verbal processing in reasoning

    This programme of research, involving seven experiments, investigates Evans' (1980a; 1980b) revised version of the Dual Process theory of reasoning (Wason and Evans, 1975). A Type 2 process is characterised as verbal-rational and a Type 1 process as non-verbal and non-logical. Evans links the processes to two statistical components of observed reasoning performance: the Type 1 process reflects non-logical response biases and the Type 2 process reflects attention to the logical nature of the task. Six experiments employ a concurrent articulation (with or without a short-term memory load) methodology devised by Baddeley and Hitch (1974) for investigating their Working Memory model. Four experiments apply this technique to conditional reasoning tasks in an attempt to disrupt the verbal Type 2 process. Some weak evidence for the revised Dual Process theory is found. There is a tendency, marked in only one experiment, for concurrent articulation to inhibit logical performance whilst having little effect on response biases. Unexpectedly, articulation conditions (without memory load) are characterised by faster responding than silent conditions. The results are inconsistent with Hitch and Baddeley's (1976) data and several features of their Working Memory model. Two further experiments repeat and extend their work. A number of important theoretical implications are discussed in the light of recent revisions to their theory (e.g. Baddeley, 1983). A possible connection is drawn between Type 1 and Type 2 processes and dual memory codes (Paivio, 1971; 1983) and thought systems (Paivio, 1975) of a verbal and visual nature. The hypothesis that Type 1 processes may be associated with visual mechanisms is tested by introducing a factor into three experiments to induce use of a visual code. This does not affect the Type 1 process but facilitates logical performance. These results are discussed in relation to the revised Dual Process theory. An explanation in terms of a recent tricoding model for processing of pictures and words (Snodgrass, 1980; 1984) is suggested.
Science and Engineering Research Council

    Proceedings of the 8th Python in Science conference

    The SciPy conference provides a unique opportunity to learn about and affect what is happening in the realm of scientific computing with Python. Attendees have the opportunity to review the available tools and how they apply to specific problems. By providing a forum for developers to share their Python expertise with the wider commercial, academic, and research communities, this conference fosters collaboration and facilitates the sharing of software components, techniques, and a vision for high-level language use in scientific computing.

    A Co-Processor Approach for Efficient Java Execution in Embedded Systems

    This thesis deals with a hardware accelerated Java virtual machine, named REALJava. The REALJava virtual machine is targeted for resource constrained embedded systems. The goal is to attain increased computational performance with reduced power consumption. While these objectives are often seen as trade-offs, in this context both of them can be attained simultaneously by using dedicated hardware. The target level of the computational performance of the REALJava virtual machine is initially set to be as fast as the currently available full custom ASIC Java processors. As a secondary goal all of the components of the virtual machine are designed so that the resulting system can be scaled to support multiple co-processor cores. The virtual machine is designed using the hardware/software co-design paradigm. The partitioning between the two domains is flexible, allowing customizations to the resulting system, for instance the floating point support can be omitted from the hardware in order to decrease the size of the co-processor core. The communication between the hardware and the software domains is encapsulated into modules. This allows the REALJava virtual machine to be easily integrated into any system, simply by redesigning the communication modules. Besides the virtual machine and the related co-processor architecture, several performance enhancing techniques are presented. These include techniques related to instruction folding, stack handling, method invocation, constant loading and control in time domain. The REALJava virtual machine is prototyped using three different FPGA platforms. The original pipeline structure is modified to suit the FPGA environment. The performance of the resulting Java virtual machine is evaluated against existing Java solutions in the embedded systems field. The results show that the goals are attained, both in terms of computational performance and power consumption. 
The computational performance in particular is evaluated thoroughly, and the results show that the REALJava is more than twice as fast as the fastest full custom ASIC Java processor. In addition to standard Java virtual machine benchmarks, several new Java applications are designed to both verify the results and broaden the spectrum of the tests.
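One of the performance techniques named above, instruction folding, can be sketched abstractly: a short stack-bytecode pattern such as load-load-add-store is collapsed into a single fused operation that the co-processor executes in one step. The opcode names and encoding below are illustrative only, not REALJava's actual instruction set.

```python
# Hypothetical sketch of instruction folding over a stack bytecode stream.
# A recognized four-instruction window is replaced by one fused operation.

FOLDABLE = {("iload", "iload", "iadd", "istore"): "fused_add"}

def fold(bytecode):
    out, i = [], 0
    while i < len(bytecode):
        window = tuple(op for op, _ in bytecode[i:i + 4])
        if window in FOLDABLE:
            # Keep the operands, replace four ops with one fused op.
            args = tuple(arg for _, arg in bytecode[i:i + 4])
            out.append((FOLDABLE[window], args))
            i += 4
        else:
            out.append(bytecode[i])
            i += 1
    return out

code = [("iload", 1), ("iload", 2), ("iadd", None), ("istore", 3)]
print(fold(code))   # one fused operation instead of four
```

In hardware the folding happens in the fetch/decode stage rather than as a rewriting pass, but the pattern-matching idea is the same.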

    A Framework for Industry 4.0

    The potential of Industry 4.0 will allow the national industry to develop all kinds of procedures, especially in terms of competitive differentiation. The prospects and motivations behind Industry 4.0 relate to management that is essentially geared towards the industrial internet, to the integrated analysis and use of data, to the digitalization of products and services, to new disruptive business models, and to cooperation within the value chain. It is through the integration of Cyber-Physical Systems (CPS) into the maintenance process that it is possible to carry out continuous monitoring of industrial machines, as well as to apply advanced techniques for predictive and proactive maintenance. The present work is based on the MANTIS project, aiming to construct a specific platform for the proactive maintenance of industrial machines, targeting particularly the case of GreenBender ADIRA Steel Sheet. In other words, the aim is to reduce maintenance costs, increase the efficiency of the process and, consequently, the profit. Essentially, the MANTIS project is a multinational research project in which the CISTER Research Unit plays a key role, particularly in providing the communications infrastructure for one MANTIS Pilot. The methodology is based on a follow-up study, carried out jointly with the client, within the scope of the implementation of the ADIRA Pilot. The macro phases followed in the present work are: 1) detailed analysis of the business needs; 2) preparation of the architecture specification; 3) implementation/development; 4) tests and validation; 5) support; 6) stabilization; 7) corrective and evolutionary maintenance; and 8) final project analysis and corrective measures to be applied in future projects.
The expected results of the development of such a project relate to the integration of the industrial maintenance process, to the continuous monitoring of the machines, and to the application of advanced techniques of preventive and proactive maintenance of industrial machines, particularly based on techniques and good practices of the Software Engineering area and on the integration of Cyber-Physical Systems.
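The continuous-monitoring idea behind such a platform can be sketched in miniature: sensor readings from a machine are smoothed with a rolling mean, and maintenance is flagged before a hard failure threshold is hit. The sensor trace, window size, and threshold below are invented for the example and are not taken from the MANTIS pilot.

```python
from collections import deque

# Illustrative proactive-maintenance sketch: alert when the rolling mean
# of a sensor reading crosses a warning threshold.

def monitor(readings, window=3, warn_at=70.0):
    recent = deque(maxlen=window)
    alerts = []
    for t, value in enumerate(readings):
        recent.append(value)
        rolling = sum(recent) / len(recent)
        if rolling >= warn_at:
            alerts.append((t, round(rolling, 1)))  # schedule maintenance early
    return alerts

# Simulated machine temperature trace drifting upward over time.
print(monitor([60, 62, 65, 71, 76, 80]))   # → [(4, 70.7), (5, 75.7)]
```

Real CPS deployments replace the fixed threshold with learned models, but the structure, stream in, smoothed signal, early alert out, is the same.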

    Computational Design of Novel Non-Ribosomal Peptides

    Non-ribosomal peptide synthetases (NRPSs) are modular enzymatic machines that catalyze the ribosome-independent production of structurally complex small peptides, many of which have important clinical applications as antibiotics, antifungals, and anti-cancer agents. Several groups have tried to expand natural product diversity by intermixing different NRPS modules to create synthetic peptides. This approach has not been as successful as anticipated, suggesting that these modules are not fully interchangeable. Here, we explored whether inter-modular linkers (IMLs) impact the ability of NRPS modules to communicate during the synthesis of NRPs. We developed a parser to extract 39,804 IMLs from both well-annotated and putative NRPS biosynthetic gene clusters from 39,232 bacterial genomes and established the first IML database. We analyzed these IMLs and identified a striking relationship between IMLs and the amino acid substrates of their adjacent modules. More than 92% of the identified IMLs connect modules that activate a particular pair of substrates, suggesting that significant specificity is embedded within these sequences. We therefore propose that incorporating the correct IML is critical when attempting combinatorial biosynthesis of novel NRPSs. In addition to the IML database and IML-Parser, we have developed the NRP Discovery Pipeline, a set of bioinformatics and cheminformatics tools that will help facilitate early discovery of novel NRPs. Our pipeline comprises five modules: (1) NRP comprehensive combinatorial biosynthesis: a tool that helps generate virtual libraries of NRPs. (2) NRP sequence-based predictor: a classifier based only on peptide sequences to help triage peptides with no antibacterial activity. (3) Pep2struc: a tool that helps convert peptide sequences to their 2D structures for both linear and constrained peptides.
(4) NRP structure-based predictor: a second classifier based on peptide structures to filter out predicted inactive peptides. (5) NRPS Designer: a tool that helps reprogram the bacterial genome by editing its NRP BGC to synthesize the peptide of interest. The IML database as well as the NRPS-Parser have been made available on the web at https://nrps-linker.unc.edu. The entire source code of the projects discussed in this dissertation is hosted in a GitHub repository (https://github.com/SWFarag).
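The core of the linker analysis described above amounts to grouping IMLs by the substrate pair of their adjacent modules and measuring how concentrated each linker is on one pair. The sketch below illustrates only that grouping idea; the linker sequences and field names are fabricated, not drawn from the actual database.

```python
from collections import Counter, defaultdict

# Sketch: map each linker sequence to its dominant adjacent-substrate
# pair and the fraction of occurrences that pair accounts for.

def substrate_specificity(imls):
    by_linker = defaultdict(Counter)
    for linker_seq, substrate_pair in imls:
        by_linker[linker_seq][substrate_pair] += 1
    return {seq: (pairs.most_common(1)[0][0],
                  pairs.most_common(1)[0][1] / sum(pairs.values()))
            for seq, pairs in by_linker.items()}

imls = [("MTTQ", ("Ala", "Gly")), ("MTTQ", ("Ala", "Gly")),
        ("MTTQ", ("Ser", "Thr")), ("QPLV", ("Ser", "Thr"))]
spec = substrate_specificity(imls)
print(spec["MTTQ"])   # dominant pair and its share of occurrences
```

Applied over the full 39,804-IML dataset, a specificity fraction near 1.0 for most linkers is what the ">92%" finding expresses.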

    Advances in Human-Robot Interaction

    Rapid advances in the field of robotics have made it possible to use robots not just in industrial automation but also in entertainment, rehabilitation, and home service. Since robots will likely affect many aspects of human existence, fundamental questions of human-robot interaction must be formulated and, if at all possible, resolved. Some of these questions are addressed in this collection of papers by leading HRI researchers