
    Origins of Modern Data Analysis Linked to the Beginnings and Early Development of Computer Science and Information Engineering

    The history of data analysis addressed here is underpinned by two themes: tabular data analysis, and the analysis of collected heterogeneous data. "Exploratory data analysis" is taken as the heuristic approach that begins with data and information and seeks an underlying explanation for what is observed or measured. I also cover some of the evolving context of research and applications, including scholarly publishing, technology transfer, and the economic relationship of the university to society.

    Born to learn: The inspiration, progress, and future of evolved plastic artificial neural networks

    Biological plastic neural networks are systems of extraordinary computational capabilities shaped by evolution, development, and lifetime learning. The interplay of these elements leads to the emergence of adaptive behavior and intelligence. Inspired by such intricate natural phenomena, Evolved Plastic Artificial Neural Networks (EPANNs) use simulated evolution in silico to breed plastic neural networks with a large variety of dynamics, architectures, and plasticity rules: these artificial systems are composed of inputs, outputs, and plastic components that change in response to experiences in an environment. These systems may autonomously discover novel adaptive algorithms and lead to hypotheses on the emergence of biological adaptation. EPANNs have seen considerable progress over the last two decades. Current scientific and technological advances in artificial neural networks are now setting the conditions for radically new approaches and results. In particular, the limitations of hand-designed networks could be overcome by more flexible and innovative solutions. This paper brings together a variety of inspiring ideas that define the field of EPANNs. The main methods and results are reviewed. Finally, new opportunities and developments are presented.
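
    To make the idea of a plastic component concrete, here is a minimal sketch of the kind of building block EPANNs evolve: a generalized Hebbian (ABCD) weight-update rule, a common choice in this literature, whose coefficients would be the evolved quantities. The network, data, and fixed coefficient values below are illustrative, not taken from the paper.

    ```python
    import numpy as np

    def hebbian_abcd_update(w, pre, post, A, B, C, D, eta):
        """Generalized Hebbian (ABCD) plasticity rule: the weight change mixes a
        correlation term, pre-only and post-only terms, and a constant drift.
        In an EPANN, evolution would tune A, B, C, D and the learning rate eta."""
        return w + eta * (A * np.outer(post, pre) + B * pre + C * post[:, None] + D)

    rng = np.random.default_rng(0)
    n_in, n_out = 4, 3
    w = rng.normal(scale=0.1, size=(n_out, n_in))

    # Fixed, hand-picked coefficients for illustration; evolution would search these.
    A, B, C, D, eta = 1.0, 0.0, 0.0, 0.0, 0.01   # pure Hebbian correlation term

    for step in range(100):
        x = rng.uniform(-1.0, 1.0, size=n_in)     # an "experience" from the environment
        y = np.tanh(w @ x)                        # network response
        w = hebbian_abcd_update(w, x, y, A, B, C, D, eta)   # lifetime plasticity

    print("weights after lifetime learning:\n", np.round(w, 3))
    ```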

    A hybrid computational approach for seismic energy demand prediction

    In this paper, a hybrid genetic programming (GP) approach with multiple genes is implemented for developing prediction models of spectral energy demands. A multi-objective strategy is used for maximizing the accuracy and minimizing the complexity of the models. Both structural properties and earthquake characteristics are considered in prediction models of four demand parameters. Here, the earthquake records are classified based on soil type, assuming that different soil classes have linear relationships in terms of GP genes. Therefore, linear regression analysis is used to connect genes for different soil types, which results in a total of sixteen prediction models. The accuracy and effectiveness of these models were assessed using different performance metrics, and their performance was compared with several other models. The results indicate that the proposed models are not only simple but also outperform other spectral energy demand models proposed in the literature.
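
    A minimal sketch of the multi-gene idea described above: each gene is a nonlinear sub-expression, and a linear regression over the gene outputs is fitted separately per soil class. The two gene functions and the toy data here are hypothetical stand-ins for what GP would actually evolve.

    ```python
    import numpy as np

    # Hypothetical "genes": nonlinear sub-expressions of the kind GP might evolve.
    # The genes in the paper are discovered by evolution, not written by hand.
    def gene1(X):
        return np.log1p(X[:, 0]) * X[:, 1]

    def gene2(X):
        return np.sqrt(np.abs(X[:, 1])) / (1.0 + X[:, 0])

    def design_matrix(X):
        """Bias column plus one column per gene: the model is linear in the genes."""
        return np.column_stack([np.ones(len(X)), gene1(X), gene2(X)])

    rng = np.random.default_rng(1)
    for soil_class in ("A", "B", "C", "D"):        # one linear model per soil class
        X = rng.uniform(0.1, 2.0, size=(50, 2))    # toy structural/ground-motion features
        y = 1.5 * gene1(X) - 0.7 * gene2(X) + rng.normal(scale=0.05, size=50)
        coeffs, *_ = np.linalg.lstsq(design_matrix(X), y, rcond=None)
        print(f"soil class {soil_class}: coefficients = {np.round(coeffs, 3)}")
    ```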

    A typology of different development and testing options for symbolic regression modelling of measured and calculated datasets

    Data-driven modelling is used to develop two alternative types of predictive environmental model: a simulator, a model of a real-world process developed from a conceptual understanding of physical relations and/or from measured records, and an emulator, an imitator of some other model developed on predicted outputs calculated by that source model. A simple four-way typology called the Emulation Simulation Typology (EST) is proposed that distinguishes between (i) model type and (ii) different uses of model development period and model test period datasets. To address the question of to what extent simulator and emulator solutions might be considered interchangeable, i.e. provide similar levels of output accuracy when tested on data different from that used in their development, a pair of counterpart pan evaporation models was created using symbolic regression. Each model type delivered similar levels of predictive skill to that of the other and to published solutions. Input–output sensitivity analysis of the two different model types likewise confirmed two very similar underlying response functions. This study demonstrates that the type and quality of data on which a model is tested has a greater influence on model accuracy assessment than the type and quality of data on which a model is developed, provided that the development record is sufficiently representative of the conceptual underpinnings of the system being examined. Thus, previously reported substantial disparities occurring in goodness-of-fit statistics for pan evaporation models are most likely explained by the use of either measured or calculated data to test particular models, where lower scores do not necessarily represent major deficiencies in the solution itself.
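
    The cross-testing logic of the typology can be sketched as below, with an ordinary polynomial fit standing in for symbolic regression and synthetic data standing in for the measured and calculated pan evaporation records; both substitutions are for illustration only.

    ```python
    import numpy as np

    rng = np.random.default_rng(2)
    x = np.linspace(0.0, 1.0, 200)
    measured = np.sin(2.0 * x) + rng.normal(scale=0.05, size=x.size)  # noisy record
    calculated = np.sin(2.0 * x)                                      # source-model output

    def fit(y):
        """Cubic polynomial least squares, standing in for symbolic regression."""
        return np.polynomial.Polynomial.fit(x, y, deg=3)

    simulator = fit(measured)     # developed on measured data
    emulator = fit(calculated)    # developed on calculated (source-model) outputs

    # Cross each model type with each test dataset, as in the four-way typology.
    for model_name, model in (("simulator", simulator), ("emulator", emulator)):
        for data_name, y in (("measured", measured), ("calculated", calculated)):
            rmse = np.sqrt(np.mean((model(x) - y) ** 2))
            print(f"{model_name} tested on {data_name} data: RMSE = {rmse:.4f}")
    ```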

    Recent Trends in Software Engineering Research As Seen Through Its Publications

    This study provides some insight into the field of software engineering through analysis of its recent research publications. Data for this study are taken from the ACM's Guide to Computing Literature (GUIDE). They include both the professionally assigned Computing Classification System (CCS) descriptors and the title text of each software engineering publication reviewed by the GUIDE from 1998 through 2001. The first part of this study provides a snapshot of software engineering by applying co-word analysis techniques to the data. This snapshot indicates recent themes or areas of interest, which, when compared with the results from earlier studies, reveal current trends in software engineering. Software engineering continues to have no central focus. Concepts like software development, process improvement, applications, parallelism, and user interfaces are persistent and, thus, help define the field, but they provide little guidance for researchers or developers of academic curricula. Of more interest and use are the specific themes illuminated by this study, which provide a clearer indication of the current interests of the field. Two prominent themes are the related issues of programming-in-the-large and best practices. Programming-in-the-large is the term often applied to large-scale and long-term software development, where project and people management, code reusability, performance measures, documentation, and software maintenance issues take on special importance. These issues began emerging in earlier periods, but seem to have risen to prominence during the current period. Another important discovery is the trend in software development toward using networking and the Internet. Many network- and Internet-related descriptors were added to the CCS in 1998. The prominent appearance and immediate use of these descriptors during this period indicate that this is a real trend and not just an aberration caused by their recent addition. The titles of the period reflect the prominent themes and trends. In addition to corroborating the keyword analysis, the title text confirms the relevance of the CCS and its most recent revision. By revealing current themes and trends in software engineering, this study provides some guidance to the developers of academic curricula and indicates directions for further research and study.
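
    Co-word analysis, as used in this study, starts from pairwise co-occurrence counts of descriptors across publications. A minimal sketch, with made-up descriptor sets in place of the actual GUIDE records:

    ```python
    from collections import Counter
    from itertools import combinations

    # Toy records: each set holds one publication's CCS descriptors (made up here,
    # in place of the actual GUIDE data).
    publications = [
        {"software development", "process improvement"},
        {"software development", "reusable software", "documentation"},
        {"user interfaces", "software development"},
        {"process improvement", "software maintenance", "software development"},
    ]

    # Co-word analysis begins with pairwise co-occurrence counts of descriptors.
    cooccurrence = Counter()
    for descriptors in publications:
        for pair in combinations(sorted(descriptors), 2):
            cooccurrence[pair] += 1

    # The most frequent pairs point at the themes a co-word map would cluster.
    for pair, count in cooccurrence.most_common(5):
        print(count, pair)
    ```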

    Evaluation of OpenAI Codex for HPC Parallel Programming Models Kernel Generation

    We evaluate AI-assisted generative capabilities on fundamental numerical kernels in high-performance computing (HPC), including AXPY, GEMV, GEMM, SpMV, Jacobi Stencil, and CG. We test the generated kernel codes for a variety of language-supported programming models, including (1) C++ (e.g., OpenMP [including offload], OpenACC, Kokkos, SyCL, CUDA, and HIP), (2) Fortran (e.g., OpenMP [including offload] and OpenACC), (3) Python (e.g., Numba, CuPy, and PyCUDA), and (4) Julia (e.g., Threads, CUDA.jl, AMDGPU.jl, and KernelAbstractions.jl). We use the GitHub Copilot capabilities powered by OpenAI Codex available in Visual Studio Code as of April 2023 to generate a vast amount of implementations given simple <kernel> + <programming model> + <optional hints> prompt variants. To quantify and compare the results, we propose a proficiency metric around the initial 10 suggestions given for each prompt. Results suggest that the OpenAI Codex outputs for C++ correlate with the adoption and maturity of programming models. For example, OpenMP and CUDA score highly, whereas HIP is still lacking. We found that prompts from either a targeted language such as Fortran or the more general-purpose Python can benefit from adding code keywords, while Julia prompts perform acceptably well for its mature programming models (e.g., Threads and CUDA.jl). We expect these benchmarks to provide a point of reference for each programming model's community. Overall, understanding the convergence of large language models, AI, and HPC is crucial due to its rapidly evolving nature and how it is redefining human-computer interactions.
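
    The paper defines its own proficiency metric over the initial 10 suggestions; the scoring rubric below is a hypothetical stand-in used only to illustrate the shape of such a metric, not the authors' actual scoring.

    ```python
    # Hypothetical rubric: the paper defines its own proficiency metric, so these
    # labels and weights are an illustrative stand-in, not the authors' scoring.
    SCORES = {"correct": 1.0, "partially_correct": 0.5, "wrong": 0.0}

    def proficiency(labels):
        """Average score over the initial 10 suggestions for one prompt."""
        first_ten = labels[:10]
        return sum(SCORES[label] for label in first_ten) / len(first_ten)

    # e.g. labels a reviewer might assign to ten AXPY-in-OpenMP suggestions
    labels = ["correct", "correct", "partially_correct", "wrong", "correct",
              "correct", "partially_correct", "correct", "wrong", "correct"]
    print(f"proficiency = {proficiency(labels):.2f}")
    ```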

    Novel statistical approaches to text classification, machine translation and computer-assisted translation

    This thesis presents several contributions in the fields of automatic text classification, machine translation, and computer-assisted translation within the statistical framework. In automatic text classification, a new application called bilingual text classification is proposed, together with a series of models aimed at capturing such bilingual information. To this end, two approaches to this application are presented: the first is based on a naive assumption of independence between the two languages involved, while the second, more sophisticated, considers the existence of a correlation between words in different languages. The first approach led to the development of five models based on unigram models and smoothed n-gram models. These models were evaluated on three tasks of increasing complexity, the most complex of which was analysed from the point of view of a document-indexing assistance system. The second approach is characterised by translation models capable of capturing correlation between words in different languages. In our case, the chosen translation model was model M1 together with a unigram model. This model was evaluated on the two simplest tasks, outperforming the naive approach that assumes independence between words in different languages drawn from bilingual texts. In machine translation, the word-based statistical translation models M1, M2, and HMM are extended within the mixture-modelling framework in order to define context-dependent translation models. Likewise, an iterative dynamic-programming search algorithm, originally designed for model M2, is extended to mixtures of M2 models.

    Civera Saiz, J. (2008). Novel statistical approaches to text classification, machine translation and computer-assisted translation [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/2502
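
    The word-based model M1 referred to above is IBM Model 1, whose translation probabilities t(f|e) are classically trained with expectation-maximization. A minimal sketch on a toy corpus (the data and initialization are illustrative):

    ```python
    from collections import defaultdict

    # Toy bilingual corpus of (source, target) sentence pairs; data is illustrative.
    corpus = [
        (["la", "casa"], ["the", "house"]),
        (["la", "casa", "verde"], ["the", "green", "house"]),
        (["el", "libro"], ["the", "book"]),
    ]

    t = defaultdict(lambda: 0.1)   # translation probabilities t(f|e), near-uniform start

    for _ in range(20):            # EM iterations
        counts = defaultdict(float)
        totals = defaultdict(float)
        for f_sent, e_sent in corpus:
            for f in f_sent:
                norm = sum(t[(f, e)] for e in e_sent)   # E-step: expected alignments
                for e in e_sent:
                    frac = t[(f, e)] / norm
                    counts[(f, e)] += frac
                    totals[e] += frac
        for (f, e), c in counts.items():                # M-step: re-estimate t(f|e)
            t[(f, e)] = c / totals[e]

    print(f"t(casa|house) = {t[('casa', 'house')]:.3f}")
    ```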