
    Towards Optimized K Means Clustering using Nature-inspired Algorithms for Software Bug Prediction

    In today's software development environment, the necessity of providing quality software products has undoubtedly remained the largest difficulty. As a result, early software bug prediction in the development phase is critical for lowering maintenance costs and improving overall software performance. Clustering is a well-known unsupervised method for classifying data and finding related patterns hidden in a dataset.
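
    As a rough, illustrative sketch of the clustering setup described above (not the paper's actual pipeline or dataset), the snippet below applies k-means to a few made-up software metrics; the metric names, the two-cluster assumption, and the scikit-learn usage are our own, and a nature-inspired optimizer would typically replace or tune the default centroid initialization.

# Hedged sketch: k-means over hypothetical software metrics to separate
# likely-defective from likely-clean modules. Features and data are
# illustrative only, not the paper's dataset or tuned pipeline.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Toy module metrics: [lines of code, cyclomatic complexity, past changes]
X = np.array([
    [120,  4,  2],
    [950, 31, 18],
    [200,  6,  3],
    [870, 27, 22],
    [150,  5,  1],
])

X_scaled = StandardScaler().fit_transform(X)  # metrics live on very different scales

# Two clusters: one would be interpreted as "bug-prone", the other as "clean".
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_scaled)
print(kmeans.labels_)           # cluster assignment per module
print(kmeans.cluster_centers_)  # centroids in the scaled metric space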

    Applications of Trajectory Data From the Perspective of a Road Transportation Agency: Literature Review and Maryland Case Study

    Transportation agencies have an opportunity to leverage increasingly available trajectory datasets to improve their analyses and decision-making processes. However, this data is typically purchased from vendors, which means agencies must understand its potential benefits beforehand in order to properly assess its value relative to the cost of acquisition. While the literature concerned with trajectory data is rich, it is naturally fragmented and focused on technical contributions in niche areas, which makes it difficult for government agencies to assess its value across different transportation domains. To overcome this issue, the current paper explores trajectory data from the perspective of a road transportation agency interested in acquiring trajectories to enhance its analyses. The paper provides a literature review illustrating applications of trajectory data in six areas of road transportation systems analysis: demand estimation, modeling human behavior, designing public transit, traffic performance measurement and prediction, environment, and safety. In addition, it visually explores 20 million GPS traces in Maryland, illustrating existing applications of trajectory data and suggesting new ones.
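
    As a minimal, hypothetical illustration of one application named above (traffic performance measurement), the snippet below derives point-to-point speeds from a single GPS trace using the haversine distance; the sample points and the simplistic workflow are invented and are not the Maryland dataset or the paper's analysis.

# Illustrative sketch: derive speeds along one invented GPS trace.
# A real agency workflow would map-match millions of traces before
# aggregating any performance measures.
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two lat/lon points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371000 * asin(sqrt(a))

# (timestamp_s, lat, lon) samples of one vehicle
trace = [(0, 39.2904, -76.6122), (30, 39.2921, -76.6101), (60, 39.2940, -76.6079)]

for (t0, la0, lo0), (t1, la1, lo1) in zip(trace, trace[1:]):
    dist = haversine_m(la0, lo0, la1, lo1)
    speed_kmh = dist / (t1 - t0) * 3.6
    print(f"{t0:>3}s -> {t1:>3}s: {dist:6.1f} m, {speed_kmh:5.1f} km/h")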

    The Challenge of Machine Learning in Space Weather Nowcasting and Forecasting

    The numerous recent breakthroughs in machine learning (ML) make it imperative to carefully ponder how the scientific community can benefit from a technology that, although not necessarily new, is today living its golden age. This Grand Challenge review paper is focused on the present and future role of machine learning in space weather. The purpose is twofold. On one hand, we discuss previous works that use ML for space weather forecasting, focusing in particular on the few areas that have seen the most activity: the forecasting of geomagnetic indices, of relativistic electrons at geosynchronous orbits, of solar flare occurrence, of coronal mass ejection propagation time, and of solar wind speed. On the other hand, this paper serves as a gentle introduction to the field of machine learning tailored to the space weather community and as a pointer to a number of open challenges that we believe the community should undertake in the next decade. The recurring themes throughout the review are the need to shift our forecasting paradigm to a probabilistic approach focused on the reliable assessment of uncertainties, and the combination of physics-based and machine learning approaches, known as gray-box.
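
    As a small, self-contained illustration of the probabilistic-forecast paradigm the review advocates (not any model from the reviewed works), the snippet below scores invented event-probability forecasts against invented outcomes with the Brier score, one standard measure in probabilistic verification.

# Hedged illustration of probabilistic verification: the forecasts and
# observations are invented, and the Brier score stands in for the fuller
# uncertainty assessment (reliability diagrams, etc.) discussed in the review.
import numpy as np

# Predicted probability that an event (e.g. a flare) occurs on each day
p_forecast = np.array([0.1, 0.8, 0.3, 0.6, 0.05, 0.9])
# What actually happened (1 = event occurred, 0 = it did not)
observed   = np.array([0,   1,   0,   1,   0,    1  ])

brier = np.mean((p_forecast - observed) ** 2)   # 0 is perfect; 0.25 is always saying 0.5
print(f"Brier score: {brier:.3f}")

# A deterministic yes/no forecast is just the special case p in {0, 1};
# e.g. always predicting "no event" here scores:
print(f"Brier score of a constant 'no' forecast: {np.mean((0 - observed) ** 2):.3f}")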

    Bayesian nonparametric models for data exploration

    Making sense out of data is one of the biggest challenges of our time. With the emergence of technologies such as the Internet, sensor networks or deep genome sequencing, a true data explosion has been unleashed that affects all fields of science and our everyday life. Recent breakthroughs, such as self-driving cars or champion-level Go player programs, have demonstrated the potential benefits of exploiting data, mostly in well-defined supervised tasks. However, we have barely started to actually explore and truly understand data. In fact, data holds valuable information for answering some of the most important questions for humanity: How does aging impact our physical capabilities? What are the underlying mechanisms of cancer? Which factors make countries wealthier than others? Most of these questions cannot be stated as well-defined supervised problems, and might benefit enormously from multidisciplinary research efforts involving easy-to-interpret models and rigorous exploratory data analyses. Efficient data exploration might lead to life-changing scientific discoveries, which can later be turned into a more impactful exploitation phase, to put forward more informed policy recommendations, decision-making systems, medical protocols or improved models for highly accurate predictions. This thesis proposes tailored Bayesian nonparametric (BNP) models to solve specific data exploratory tasks across different scientific areas including sport sciences, cancer research, and economics. We resort to BNP approaches to facilitate the discovery of unexpected hidden patterns within data. BNP models place a prior distribution over an infinite-dimensional parameter space, which makes them particularly useful in probabilistic models where the number of hidden parameters is unknown a priori. Under this prior distribution, the posterior distribution of the hidden parameters given the data will assign high probability mass to those configurations that best explain the observations. Hence, inference over the hidden variables can be performed using standard Bayesian inference techniques, therefore avoiding expensive model selection steps. This thesis is application-focused and highly multidisciplinary. More precisely, we propose an automatic grading system for sportive competitions to compare athletic performance regardless of age, gender and environmental aspects; we develop BNP models to perform genetic association and biomarker discovery in cancer research, either using genetic information and Electronic Health Records or clinical trial data; finally, we present a flexible infinite latent factor model of international trade data to understand the underlying economic structure of countries and their evolution over time.
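
    As a toy illustration of the BNP idea that the number of hidden components need not be fixed in advance, and not one of the thesis's tailored models, a truncated Dirichlet-process Gaussian mixture can be fitted with generic scikit-learn tooling on synthetic data:

# Generic BNP-flavoured example, not a thesis model: a Dirichlet process
# mixture lets the posterior switch off unneeded components, so the
# effective number of clusters is inferred rather than fixed a priori.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
# Synthetic data drawn from three well-separated Gaussians
X = np.vstack([
    rng.normal(loc=-5, scale=0.5, size=(100, 2)),
    rng.normal(loc=0,  scale=0.5, size=(100, 2)),
    rng.normal(loc=5,  scale=0.5, size=(100, 2)),
])

# Truncated Dirichlet process prior with a generous upper bound of 10 components
dpgmm = BayesianGaussianMixture(
    n_components=10,
    weight_concentration_prior_type="dirichlet_process",
    random_state=0,
).fit(X)

# Most of the 10 mixture weights collapse towards zero; roughly three stay active.
print(np.round(dpgmm.weights_, 3))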

    The assessment and development of methods in (spatial) sound ecology

    As vital ecosystems across the globe come under unprecedented pressure from climate change and industrial land use, understanding the processes driving ecosystem viability has never been more critical. Nuanced ecosystem understanding comes from well-collected field data and a wealth of associated interpretations. In recent years the most popular methods of ecosystem monitoring have shifted from often damaging and labour-intensive manual data collection to automated methods of data collection and analysis. Sound ecology describes the school of research that uses information transmitted through sound to infer properties about an area's species, biodiversity, and health. In this thesis, we explore and develop state-of-the-art automated monitoring with sound, specifically relating to data storage practice and spatial acoustic recording and analysis. In the first chapter, we explore the necessity and methods of ecosystem monitoring, focusing on acoustic monitoring, later exploring how and why sound is recorded and the current state of the art in acoustic monitoring. Chapter one concludes by setting out the aims and overall content of the following chapters. We begin the second chapter by exploring methods used to mitigate data storage expense, a widespread issue as automated methods quickly amass vast amounts of data which can be expensive and impractical to manage. Importantly, I explain how these data management practices are often used without known consequence, something I then address. Specifically, I present evidence that the most used data reduction methods (namely compression and temporal subsetting) have a surprisingly small impact on the information content of recorded sound compared to the method of analysis. This work also adds to the increasing evidence that deep learning-based methods of environmental sound quantification are more powerful and robust to experimental variation than more traditional acoustic indices. In the latter chapters, I focus on using multichannel acoustic recording for sound-source localisation. Knowing where a sound originated has a range of ecological uses, including counting individuals, locating threats, and monitoring habitat use. While an exciting application of acoustic technology, spatial acoustics has had minimal uptake owing to the expense, impracticality and inaccessibility of equipment. In my third chapter, I introduce MAARU (Multichannel Acoustic Autonomous Recording Unit), a low-cost, easy-to-use and accessible solution to this problem. I explain the software and hardware necessary for spatial recording and show how MAARU can be used to localise the direction of a sound to within ±10°. In the fourth chapter, I explore how MAARU devices deployed in the field can be used for enhanced ecosystem monitoring by spatially clustering individuals by calling direction, yielding more accurate abundance approximations and crude species-specific habitat usage monitoring. Most literature on spatial acoustics cites the need for many accurately synced recording devices over an area; this chapter provides the first evidence of advances made with just one recorder. Finally, I conclude this thesis by restating my aims and discussing my success in achieving them. Specifically, in the thesis' conclusion, I reiterate the contributions made to the field as a direct result of this work and outline some possible development avenues.
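
    As a hedged sketch of the kind of direction estimate that multichannel recording enables, and not MAARU's actual localisation code, the snippet below recovers the inter-channel delay between two microphones by cross-correlation and converts it to a bearing; the signals, microphone spacing and sample rate are all invented.

# Illustrative two-microphone direction-of-arrival estimate, not the MAARU
# implementation: estimate the inter-channel delay by cross-correlation and
# convert it to an angle via the far-field geometry sin(theta) = c*tau/d.
import numpy as np

fs = 48_000          # sample rate (Hz), assumed
d = 0.20             # microphone spacing (m), assumed
c = 343.0            # speed of sound (m/s)

# Fake signal: a short click arriving 5 samples later on the second channel
rng = np.random.default_rng(1)
click = rng.normal(size=64)
ch1 = np.concatenate([np.zeros(100), click, np.zeros(100)])
ch2 = np.concatenate([np.zeros(105), click, np.zeros(95)])

corr = np.correlate(ch2, ch1, mode="full")
lag = np.argmax(corr) - (len(ch1) - 1)      # samples by which ch2 lags ch1
tau = lag / fs                              # delay in seconds

theta = np.degrees(np.arcsin(np.clip(c * tau / d, -1.0, 1.0)))
print(f"estimated delay: {lag} samples, bearing ≈ {theta:.1f}°")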

    AN INTELLIGENT NAVIGATION SYSTEM FOR AN AUTONOMOUS UNDERWATER VEHICLE

    The work in this thesis concerns the development of a novel multisensor data fusion (MSDF) technique, which synergistically combines Kalman filtering, fuzzy logic and genetic algorithm approaches, aimed at enhancing the accuracy of an autonomous underwater vehicle (AUV) navigation system formed by an integration of a global positioning system and an inertial navigation system (GPS/INS). The Kalman filter has been a popular method for integrating the data produced by the GPS and INS to provide optimal estimates of the AUV's position and attitude. In this thesis, the sequential use of a linear Kalman filter and an extended Kalman filter is proposed. The former is used to fuse the data from a variety of INS sensors, and its output is used as an input to the latter, where integration with GPS data takes place. The use of an adaptation scheme based on fuzzy logic to cope with the divergence problem caused by insufficiently known a priori filter statistics is also explored. The choice of fuzzy membership functions for the adaptation scheme is first carried out using a heuristic approach. Single-objective and multiobjective genetic algorithm techniques are then used to optimize the parameters of the membership functions with respect to certain performance criteria in order to improve the overall accuracy of the integrated navigation system. Results are presented that show that the proposed algorithms can provide a significant improvement in the overall navigation performance of an autonomous underwater vehicle. The proposed technique is considered to be the first of its kind applied to AUV navigation technology and is thus regarded as a major contribution to the field.
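
    The following toy snippet illustrates the generic predict/update cycle underlying such GPS/INS fusion; it is a plain one-dimensional Kalman filter with invented numbers, not the thesis's sequential KF/EKF architecture or its fuzzy- and GA-tuned variants.

# Toy 1-D Kalman filter: a dead-reckoned (INS-like) position prediction is
# corrected by noisy (GPS-like) position fixes. This shows only the generic
# predict/update cycle; all values are invented.
import numpy as np

q, r = 0.05, 4.0          # assumed process and measurement noise variances
x, p = 0.0, 1.0           # initial position estimate and its variance

velocity = 1.0            # assumed constant forward speed (m/s), from the INS
dt = 1.0
gps_fixes = [1.3, 2.4, 2.8, 4.3, 5.1]   # invented noisy position measurements

for z in gps_fixes:
    # Predict: propagate the INS-based motion model
    x, p = x + velocity * dt, p + q
    # Update: blend in the GPS fix, weighted by the Kalman gain
    k = p / (p + r)
    x, p = x + k * (z - x), (1 - k) * p
    print(f"fused position: {x:5.2f} m  (variance {p:.3f})")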

    Intelligent Transportation Related Complex Systems and Sensors

    Building around innovative services related to different modes of transport and traffic management, intelligent transport systems (ITS) are being widely adopted worldwide to improve the efficiency and safety of the transportation system. They enable users to be better informed and to make safer, more coordinated, and smarter decisions on the use of transport networks. Current ITSs are complex systems, made up of several components/sub-systems characterized by time-dependent interactions among themselves. Some examples of these transportation-related complex systems include: road traffic sensors, autonomous/automated cars, smart cities, smart sensors, virtual sensors, traffic control systems, smart roads, logistics systems, smart mobility systems, and many others that are emerging from niche areas. The efficient operation of these complex systems requires: i) efficient solutions to the issues of sensors/actuators used to capture and control the physical parameters of these systems, as well as the quality of data collected from them; ii) tackling complexities using simulations and analytical modelling techniques; and iii) applying optimization techniques to improve the performance of these systems. This collection includes twenty-four papers, which cover scientific concepts, frameworks, architectures and various other ideas on analytics, trends and applications of transportation-related data.

    Roadmap on signal processing for next generation measurement systems

    Signal processing is a fundamental component of almost any sensor-enabled system, with a wide range of applications across different scientific disciplines. Time series data, images, and video sequences comprise representative forms of signals that can be enhanced and analysed for information extraction and quantification. The recent advances in artificial intelligence and machine learning are shifting the research attention towards intelligent, data-driven signal processing. This roadmap presents a critical overview of the state-of-the-art methods and applications, aiming to highlight future challenges and research opportunities towards next-generation measurement systems. It covers a broad spectrum of topics ranging from basic to industrial research, organized in concise thematic sections that reflect the trends and the impacts of current and future developments per research field. Furthermore, it offers guidance to researchers and funding agencies in identifying new prospects.

    Text Similarity Between Concepts Extracted from Source Code and Documentation

    Context: Constant evolution in software systems often results in their documentation losing sync with the content of the source code. The traceability research field has often helped in the past with the aim of recovering links between code and documentation when the two fall out of sync. Objective: The aim of this paper is to compare the concepts contained within the source code of a system with those extracted from its documentation, in order to detect how similar these two sets are. If vastly different, the difference between the two sets might indicate a considerable ageing of the documentation, and a need to update it. Methods: In this paper we reduce the source code of 50 software systems to a set of key terms, each containing the concepts of one of the systems sampled. At the same time, we reduce the documentation of each system to another set of key terms. We then use four different approaches for set comparison to detect how similar the sets are. Results: Using the well-known Jaccard index as the benchmark for the comparisons, we have discovered that the cosine distance has excellent comparative power, depending on the pre-training of the machine learning model. In particular, the SpaCy and FastText embeddings offer up to 80% and 90% similarity scores. Conclusion: For most of the sampled systems, the source code and the documentation tend to contain very similar concepts. Given the accuracy for one pre-trained model (e.g., FastText), it also becomes evident that a few systems show a measurable drift between the concepts contained in the documentation and in the source code.
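
    A minimal sketch of the two comparison measures named above, using made-up term sets and random stand-in vectors rather than the paper's extracted concepts or a particular pre-trained embedding:

# Illustrative only: Jaccard similarity over term sets and cosine similarity
# over (here random) embedding vectors. In the study the vectors would come
# from pre-trained models such as SpaCy or FastText.
import numpy as np

code_terms = {"parser", "token", "index", "cache", "query"}
doc_terms  = {"parser", "token", "query", "configuration"}

jaccard = len(code_terms & doc_terms) / len(code_terms | doc_terms)

rng = np.random.default_rng(42)
v_code, v_doc = rng.normal(size=300), rng.normal(size=300)   # stand-in embeddings
cosine = float(v_code @ v_doc / (np.linalg.norm(v_code) * np.linalg.norm(v_doc)))

print(f"Jaccard similarity: {jaccard:.2f}")
print(f"Cosine similarity:  {cosine:.2f}")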