Origins of Modern Data Analysis Linked to the Beginnings and Early Development of Computer Science and Information Engineering
The history of data analysis that is addressed here is underpinned by two
themes: those of tabular data analysis and the analysis of collected
heterogeneous data. "Exploratory data analysis" is taken as the heuristic
approach that begins with data and information and seeks underlying explanation
for what is observed or measured. I also cover some of the evolving context of
research and applications, including scholarly publishing, technology transfer
and the economic relationship of the university to society.
Comment: 26 pages
Born to learn: The inspiration, progress, and future of evolved plastic artificial neural networks
Biological plastic neural networks are systems of extraordinary computational
capabilities shaped by evolution, development, and lifetime learning. The
interplay of these elements leads to the emergence of adaptive behavior and
intelligence. Inspired by such intricate natural phenomena, Evolved Plastic
Artificial Neural Networks (EPANNs) use simulated evolution in silico to breed
plastic neural networks with a large variety of dynamics, architectures, and
plasticity rules: these artificial systems are composed of inputs, outputs, and
plastic components that change in response to experiences in an environment.
These systems may autonomously discover novel adaptive algorithms, and lead to
hypotheses on the emergence of biological adaptation. EPANNs have seen
considerable progress over the last two decades. Current scientific and
technological advances in artificial neural networks are now setting the
conditions for radically new approaches and results. In particular, the
limitations of hand-designed networks could be overcome by more flexible and
innovative solutions. This paper brings together a variety of inspiring ideas
that define the field of EPANNs. The main methods and results are reviewed.
Finally, new opportunities and developments are presented.
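To illustrate what a "plastic component" can look like, the sketch below encodes a Hebbian-style weight update whose coefficients could serve as the genome an evolutionary search tunes. It is a hypothetical minimal example, not a specific system from the EPANN literature.

```python
# A minimal, hypothetical sketch of a plastic component: a Hebbian
# weight update whose coefficients (eta, A, B, C) could form the genome
# an EPANN evolves. Not any specific system from the literature.
import numpy as np

def hebbian_update(w, pre, post, genome):
    """One plasticity step: dw depends on pre- and post-synaptic activity."""
    eta, A, B, C = genome
    return w + eta * (A * np.outer(post, pre) + B * pre + C)

rng = np.random.default_rng(0)
w = rng.normal(scale=0.1, size=(2, 3))   # 3 inputs -> 2 outputs
genome = (0.05, 1.0, 0.0, 0.0)           # evolution would search this space

for _ in range(100):                     # "lifetime" experience
    pre = rng.uniform(size=3)            # input activity
    post = np.tanh(w @ pre)              # network response
    w = hebbian_update(w, pre, post, genome)

print(w.round(2))
```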
A hybrid computational approach for seismic energy demand prediction
In this paper, a hybrid genetic programming (GP) approach with multiple genes is implemented to develop prediction models of spectral energy demands. A multi-objective strategy is used to maximize the accuracy and minimize the complexity of the models. Both structural properties and earthquake characteristics are considered in the prediction models of four demand parameters. The earthquake records are classified by soil type, under the assumption that different soil classes have linear relationships in terms of GP genes. Linear regression analysis is therefore used to connect genes across soil types, resulting in a total of sixteen prediction models. The accuracy and effectiveness of these models are assessed using different performance metrics, and their performance is compared with several other models. The results indicate that the proposed models are not only simple but also outperform other spectral energy demand models proposed in the literature.
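To make the gene-combination idea concrete, the sketch below shows one hedged way to realise it in Python with gplearn: evolve a few symbolic "genes" as features, then connect them with linear regression, as the paper does per soil class. The data, function set, and parameters are illustrative assumptions, not the paper's setup.

```python
# A minimal sketch of multi-gene GP: evolve several symbolic "genes"
# (features) and combine them with a fitted linear model. Synthetic data;
# gplearn is assumed as the GP engine.
import numpy as np
from gplearn.genetic import SymbolicTransformer
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
# Hypothetical inputs standing in for structural/earthquake features.
X = rng.uniform(size=(200, 4))
y = 2.0 * X[:, 0] * X[:, 2] + np.log1p(X[:, 1]) + 0.1 * rng.normal(size=200)

# Evolve a handful of genes (symbolic features).
genes = SymbolicTransformer(
    n_components=4, generations=20, population_size=500,
    function_set=("add", "sub", "mul", "div", "log"),
    parsimony_coefficient=0.01,  # complexity penalty, multi-objective style
    random_state=0,
)
G = genes.fit_transform(X, y)

# Linear regression connects the genes; the paper fits one such model
# per soil class, a single class is shown here.
lin = LinearRegression().fit(G, y)
print("R^2 on training data:", lin.score(G, y))
```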
A typology of different development and testing options for symbolic regression modelling of measured and calculated datasets
Data-driven modelling is used to develop two alternative types of predictive environmental model: a simulator, a model of a real-world process developed from a conceptual understanding of physical relations and/or from measured records, and an emulator, an imitator of some other model developed on predicted outputs calculated by that source model. A simple four-way typology called the Emulation Simulation Typology (EST) is proposed that distinguishes between (i) model type and (ii) different uses of model development period and model test period datasets. To address the question of to what extent simulator and emulator solutions might be considered interchangeable, i.e. provide similar levels of output accuracy when tested on data different from that used in their development, a pair of counterpart pan evaporation models was created using symbolic regression. Each model type delivered levels of predictive skill similar to the other and to published solutions. Input–output sensitivity analysis of the two model types likewise confirmed two very similar underlying response functions. This study demonstrates that the type and quality of data on which a model is tested has a greater influence on model accuracy assessment than the type and quality of data on which it is developed, provided that the development record is sufficiently representative of the conceptual underpinnings of the system being examined. Thus, previously reported substantial disparities in goodness-of-fit statistics for pan evaporation models are most likely explained by the use of either measured or calculated data to test particular models, where lower scores do not necessarily represent major deficiencies in the solution itself.
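The simulator/emulator distinction can be illustrated with a short symbolic-regression sketch. The version below uses gplearn on synthetic data; the process, inputs, and settings are stand-ins, not the paper's pan evaporation models.

```python
# A minimal sketch of the simulator/emulator distinction using symbolic
# regression (gplearn assumed). The "physical" process and its inputs
# are synthetic stand-ins.
import numpy as np
from gplearn.genetic import SymbolicRegressor

rng = np.random.default_rng(1)
X = rng.uniform(0.0, 1.0, size=(300, 3))   # e.g., temperature, wind, humidity
measured = X[:, 0] ** 2 + 0.5 * X[:, 1] + 0.05 * rng.normal(size=300)

# Simulator: fitted to measured records of the real-world process.
simulator = SymbolicRegressor(generations=20, population_size=500,
                              parsimony_coefficient=0.01, random_state=1)
simulator.fit(X, measured)

# Emulator: fitted to the *outputs of another model* (here the simulator
# stands in for the source model), not to measurements.
source_outputs = simulator.predict(X)
emulator = SymbolicRegressor(generations=20, population_size=500,
                             parsimony_coefficient=0.01, random_state=2)
emulator.fit(X, source_outputs)

# EST-style check: both model types scored on held-out measured data.
X_test = rng.uniform(0.0, 1.0, size=(100, 3))
measured_test = X_test[:, 0] ** 2 + 0.5 * X_test[:, 1]
print("simulator vs measured:", simulator.score(X_test, measured_test))
print("emulator  vs measured:", emulator.score(X_test, measured_test))
```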
Recent Trends in Software Engineering Research As Seen Through Its Publications
This study provides some insight into the field of software engineering through an analysis of its recent research publications. Data for this study are taken from the ACM's Guide to Computing Literature (GUIDE). They include both the professionally assigned Computing Classification System (CCS) descriptors and the title text of each software engineering publication reviewed by the GUIDE from 1998 through 2001.
The first part of this study provides a snapshot of software engineering by applying co-word analysis techniques to the data. This snapshot indicates recent themes or areas of interest, which, when compared with the results from earlier studies, reveal current trends in software engineering.
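At its core, co-word analysis counts how often pairs of descriptors are assigned to the same publication; strongly co-occurring pairs indicate themes. A minimal sketch with invented descriptors:

```python
# A minimal sketch of co-word analysis on CCS-style descriptor lists:
# count how often pairs of descriptors co-occur on one publication.
# The descriptor lists below are invented examples.
from collections import Counter
from itertools import combinations

publications = [
    ["software development", "process improvement"],
    ["software development", "user interfaces", "internet"],
    ["internet", "software development"],
]

pair_counts = Counter()
for descriptors in publications:
    for a, b in combinations(sorted(set(descriptors)), 2):
        pair_counts[(a, b)] += 1

# Strongly co-occurring pairs suggest themes in the field.
for pair, n in pair_counts.most_common(3):
    print(n, pair)
```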
Software engineering continues to have no central focus. Concepts like software development, process improvement, applications, parallelism, and user interfaces are persistent and, thus, help define the field, but they provide little guidance for researchers or developers of academic curricula.
Of more interest and use are the specific themes illuminated by this study, which provide a clearer indication of the current interests of the field. Two prominent themes are the related issues of programming-in-the-large and best practices.
Programming-in-the-large is the term often applied to large-scale and long-term software development, where project and people management, code reusability, performance measures, documentation, and software maintenance issues take on special importance. These issues began emerging in earlier periods, but seem to have risen to prominence during the current period.
Another important discovery is the trend in software development toward using networking and the Internet. Many network- and Internet-related descriptors were added to the CCS in 1998. The prominent appearance and immediate use of these descriptors during this period indicate that this is a real trend and not just an aberration caused by their recent addition.
The titles of the period reflect the prominent themes and trends. In addition to corroborating the keyword analysis, the title text confirms the relevance of the CCS and its most recent revision.
By revealing current themes and trends in software engineering, this study provides some guidance to the developers of academic curricula and indicates directions for further research and study.
Evaluation of OpenAI Codex for HPC Parallel Programming Models Kernel Generation
We evaluate AI-assisted generative capabilities on fundamental numerical
kernels in high-performance computing (HPC), including AXPY, GEMV, GEMM, SpMV,
Jacobi Stencil, and CG. We test the generated kernel codes for a variety of
language-supported programming models, including (1) C++ (e.g., OpenMP
[including offload], OpenACC, Kokkos, SYCL, CUDA, and HIP), (2) Fortran (e.g.,
OpenMP [including offload] and OpenACC), (3) Python (e.g., Numba, CuPy, and
PyCUDA), and (4) Julia (e.g., Threads, CUDA.jl, AMDGPU.jl, and
KernelAbstractions.jl). We use the GitHub Copilot capabilities powered by
OpenAI Codex available in Visual Studio Code as of April 2023 to generate a
vast number of implementations given simple prompt variants that combine the
kernel, programming model, and language. To quantify and compare the results, we
propose a proficiency metric around the initial 10 suggestions given for each
prompt. Results suggest that the OpenAI Codex outputs for C++ correlate with
the adoption and maturity of programming models. For example, OpenMP and CUDA
score highly, whereas HIP still lags behind. We found that prompts from
either a targeted language such as Fortran or the more general-purpose Python
can benefit from adding code keywords, while Julia prompts perform acceptably
well for its mature programming models (e.g., Threads and CUDA.jl). We expect
these benchmarks to provide a point of reference for each programming
model's community. Overall, understanding the convergence of large language
models, AI, and HPC is crucial due to its rapidly evolving nature and how it is
redefining human-computer interactions.
Comment: Accepted at the Sixteenth International Workshop on Parallel
Programming Models and Systems Software for High-End Computing (P2S2), 2023,
to be held in conjunction with ICPP 2023: The 52nd International Conference
on Parallel Processing. 10 pages, 6 figures, 5 tables
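For reference, here is a hand-written (not Codex-generated) version of the simplest kernel in the suite, AXPY (y = a*x + y), in Numba, one of the Python programming models the study covers:

```python
# A hand-written illustration (not Codex output) of the AXPY kernel
# (y = a*x + y) in Numba, one of the evaluated Python programming models.
import numpy as np
from numba import njit, prange

@njit(parallel=True)
def axpy(a, x, y):
    # Parallel loop over the vector elements, updating y in place.
    for i in prange(x.shape[0]):
        y[i] = a * x[i] + y[i]

x = np.ones(1_000_000)
y = np.full(1_000_000, 2.0)
axpy(3.0, x, y)
print(y[:3])  # [5. 5. 5.]
```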
Novel statistical approaches to text classification, machine translation and computer-assisted translation
This thesis presents several contributions in the fields of automatic text
classification, machine translation and computer-assisted translation within
the statistical framework.
In automatic text classification, a new application called bilingual text
classification is proposed, together with a series of models aimed at
capturing such bilingual information. Two approaches to this application are
presented: the first rests on a naive assumption of independence between the
two languages involved, while the second, more sophisticated one considers
the existence of a correlation between words in different languages. The
first approach led to the development of five models based on unigram models
and smoothed n-gram models. These models were evaluated on three tasks of
increasing complexity, the most complex of which was analysed from the point
of view of a document indexing assistance system. The second approach is
characterised by translation models capable of capturing the correlation
between words in different languages. In our case, the translation model
chosen was the M1 model together with a unigram model. This model was
evaluated on the two simplest tasks, outperforming the naive approach, which
assumes independence between words in different languages drawn from
bilingual texts.
In machine translation, the word-based statistical translation models M1, M2
and HMM are extended within the mixture-modelling framework in order to
define context-dependent translation models. An iterative dynamic-programming
search algorithm, originally designed for the M2 model, is likewise extended
to the case of mixtures of M2 models. This search algorithm…
Civera Saiz, J. (2008). Novel statistical approaches to text classification,
machine translation and computer-assisted translation [Unpublished doctoral
thesis]. Universitat Politècnica de València.
https://doi.org/10.4995/Thesis/10251/2502
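The M1 model referred to above is IBM Model 1. Below is a minimal EM training sketch on a toy Spanish-English corpus, purely illustrative of the model (NULL alignment omitted), not the thesis's implementation:

```python
# A minimal EM sketch of IBM Model 1 (the "M1" word-based translation
# model the thesis builds on), trained on a toy parallel corpus.
from collections import defaultdict

corpus = [
    (["la", "casa"], ["the", "house"]),
    (["la", "casa", "verde"], ["the", "green", "house"]),
]

# Uniform initialisation of the lexical translation table t(f|e).
t = defaultdict(lambda: 0.25)

for _ in range(10):                      # EM iterations
    count = defaultdict(float)           # expected counts c(f, e)
    total = defaultdict(float)           # normaliser per source word e
    for f_sent, e_sent in corpus:
        for f in f_sent:
            z = sum(t[(f, e)] for e in e_sent)   # E-step: soft alignment
            for e in e_sent:
                p = t[(f, e)] / z
                count[(f, e)] += p
                total[e] += p
    for (f, e), c in count.items():      # M-step: re-estimate t(f|e)
        t[(f, e)] = c / total[e]

print(round(t[("casa", "house")], 3))    # converges towards 1.0
```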