
    An update on statistical boosting in biomedicine

    Statistical boosting algorithms have triggered a lot of research during the last decade. They combine a powerful machine-learning approach with classical statistical modelling, offering various practical advantages like automated variable selection and implicit regularization of effect estimates. They are extremely flexible, as the underlying base-learners (regression functions defining the type of effect for the explanatory variables) can be combined with any kind of loss function (target function to be optimized, defining the type of regression setting). In this review article, we highlight the most recent methodological developments on statistical boosting regarding variable selection, functional regression and advanced time-to-event modelling. Additionally, we provide a short overview of relevant applications of statistical boosting in biomedicine.
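    The mechanics the abstract alludes to are compact enough to sketch. Below is a minimal componentwise L2-boosting illustration in Python (our own sketch, not code from the article; the function name and setup are invented): each iteration fits every candidate base-learner, here one simple linear regression per explanatory variable, to the current residuals (the negative gradient of the squared-error loss) and updates only the best-fitting one by a small step length nu, which is what produces the automated variable selection and implicit regularization mentioned above.

```python
import numpy as np

def l2_boost(X, y, n_iter=200, nu=0.1):
    """Componentwise L2 boosting with one univariate linear base-learner
    per column of X; only the best-fitting learner is updated per step."""
    n, p = X.shape
    Xc = X - X.mean(axis=0)            # centre covariates: slope-only learners
    f = np.full(n, y.mean())           # initial fit: the offset
    coef = np.zeros(p)
    for _ in range(n_iter):
        r = y - f                      # negative gradient of 0.5 * (y - f)^2
        slopes = Xc.T @ r / (Xc ** 2).sum(axis=0)         # OLS slope per learner
        sse = ((r[:, None] - Xc * slopes) ** 2).sum(axis=0)
        j = np.argmin(sse)             # best-fitting base-learner wins
        coef[j] += nu * slopes[j]      # small step: implicit regularization
        f += nu * slopes[j] * Xc[:, j]
    return y.mean(), coef              # offset and effects (for centred X)
```

    Covariates whose coefficient is still exactly zero when the loop is stopped early were never selected; the step length nu and the number of iterations jointly control the shrinkage.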

    LEED for Cities Pilot Program: Atlantic Beach, Florida Case Study

    This case study publication provides an overview of the U.S. Green Building Council’s LEED for Cities & Communities Pilot Program. It specifically provides insight into the certification process that was pursued by the City of Atlantic Beach, Florida, between 2018 and 2019. LEED for Cities is a certification for local governments that requires planners to examine a wide array of different performance metrics. These metrics focus on topics related to energy, water, waste, transportation and human experience. Through a community partnership with USGBC Florida and the City of Atlantic Beach, Florida, the UNF Environmental Center worked through the certification process and documented the steps that were taken along the way. The case study itself is a result of a Directed Independent Study (DIS) undertaken by Sean Lahav, a UNF Master of Public Administration (MPA) student at the time, who conducted research under the faculty leadership of Dr. David Lambert from the UNF Coggin College of Business. The publication provides other municipalities and communities across Florida with a benchmark of understanding for tackling the certification process independently.

    Methods for Modelling Response Styles

    Rating scales are ubiquitous in empirical research, especially in the social sciences, where they are used for measuring abstract concepts such as opinion or attitude. Survey questions typically employ rating scales, for example when persons are asked to self-report their perceptions of films or their job satisfaction. Yet, the use of a rating scale is subjective. Some persons may use only the middle of the rating scale, whilst others choose to use only the extremes. Consequently, persons with the same opinion may very well answer the same survey question using different ratings. This leads to the response style problem: how can we take into account, when analyzing such data, that different ratings can potentially have different meanings to different persons? This dissertation makes methodological and empirical contributions towards modelling rating scale data while accounting for such differences in response styles. The general approach is to identify individuals in the data who exhibit similar response styles, and to extract substantive information only within such groups. These elements naturally lead to a synthesis of cluster analysis and dimensionality reduction methods. In order to identify these response styles, responses to multiple survey questions are used to assess within-subject rating scale usage. Both non-parametric and parametric approaches are formulated and studied, and accompanying open-source software implementations are made available. The added value of the developed algorithms is illustrated by applying them to empirical data. Applications range from sensometrics and brand studies to psychology and political science.
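    As a rough illustration of the "identify similar response styles, then analyze within groups" idea (our own sketch, not one of the dissertation's algorithms; the function name and features are invented), one can summarize each respondent's within-subject scale usage with a few statistics and cluster on those, assuming an odd-length scale such as 1-7:

```python
import numpy as np
from sklearn.cluster import KMeans

def response_style_clusters(R, k=3, scale_max=7, seed=0):
    """R: (n_respondents, n_items) integer ratings on 1..scale_max.
    Clusters respondents by how they use the scale, not by content."""
    midpoint = (1 + scale_max) / 2
    feats = np.column_stack([
        R.mean(axis=1),                              # overall level
        R.std(axis=1),                               # spread of scale usage
        ((R == 1) | (R == scale_max)).mean(axis=1),  # extreme responding
        (R == midpoint).mean(axis=1),                # midpoint responding
    ])
    feats = (feats - feats.mean(axis=0)) / (feats.std(axis=0) + 1e-12)
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(feats)
```

    Substantive analysis, such as the dimensionality reduction step the abstract mentions, would then be carried out within each resulting group.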

    Advance in optimal design and deployment of ambient intelligence systems

    A brilliant future is forecast for Ambient Intelligence (AmI) systems: sensitive environments able to anticipate people’s actions and to react intelligently in support of them. AmI relies on decision-making processes, usually hidden from the users, giving rise to the so-called smart environments. Envisioned environments include smart homes, health monitoring, education, workspaces, sports, assisted living, and so forth. Moreover, the complexity of these environments is continuously growing, thereby increasing the difficulty of making suitable decisions in support of human activity; decision-making is therefore one of the critical parts of these systems. Several techniques, including classification techniques as well as mathematical programming tools, can be efficiently combined with AmI environments and may help to alleviate decision-making issues. In the first part of this work we introduce two AmI environments where decision-making plays a primary role:
    • An AmI system for athletes’ training. This system monitors ambient variables as well as the athletes’ biometry, making decisions during a training session to meet the training goals. Several techniques have been used to test different decision engines: interpolation by means of (m, s)-splines, k-Nearest-Neighbors, and dynamic programming based on Markov Decision Processes.
    • An AmI system for poaching detection. In this case, the aim is to locate the origin of a gunshot using a network of acoustic sensors; the location is performed by means of a hyperbolic multilateration method.
    Moreover, the quality of the decisions is directly related to the quality of the information available. It is therefore necessary that the nodes of the AmI infrastructure in charge of sensing and networking tasks be placed correctly. In fact, the placement problem is twofold: nodes must be near the important places where valuable events occur, and network connectivity is mandatory so that the captured data can be transmitted and put to use. In addition, other constraints, such as network deployment cost, can be considered, so there are usually trade-offs between sensing capacity and communication capabilities. Two kinds of placement are possible: deterministic placement, where the position of each node can be precisely selected, and random deployment, where, due to the large number of nodes or the inaccessibility of the terrain, the only suitable option is a random scattering of the nodes. This thesis addresses three network placement problems. The first two are posed in general form and apply to any AmI scenario; the goal is to select the best positions for the nodes while keeping the network connected. The options examined are a deterministic placement, solved by means of an Ant Colony Optimization metaheuristic for continuous domains, and a random placement, where partially controlled deployments of clustered networks take place: for each cluster, both the target deployment point and the dispersion of the nodes around it can be selected, leading to a stochastic problem that is solved by decomposing it into deployment phases, one per cluster. Finally, the third network placement scenario is tightly related to the poaching detection AmI environment. Using a derivative-free descent methodology, the goal is to select the placement with maximal sensing coverage and minimal deployment cost; since the two goals conflict, a Pareto front is constructed to let the designer select the desired operating point. (Universidad Politécnica de Cartagena)
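    To make the gunshot-location step concrete, here is a minimal hyperbolic multilateration sketch (our own illustration, not the thesis code; the sensor layout and speed of sound are assumptions): each time-difference-of-arrival (TDOA) relative to a reference sensor constrains the source to one branch of a hyperbola, and the source position is recovered by nonlinear least squares over the range-difference residuals.

```python
import numpy as np
from scipy.optimize import least_squares

C = 343.0  # assumed speed of sound in air, m/s

def locate_gunshot(sensors, tdoa, x0=None):
    """sensors: (m, 2) positions; tdoa: (m-1,) arrival-time differences of
    sensors 1..m-1 relative to sensor 0. Returns the estimated source (x, y)."""
    def residuals(p):
        d = np.linalg.norm(sensors - p, axis=1)  # distance to every sensor
        return (d[1:] - d[0]) - C * tdoa         # hyperbolic TDOA constraints
    if x0 is None:
        x0 = sensors.mean(axis=0)                # start at the array centroid
    return least_squares(residuals, x0).x

# Toy self-check: four sensors, a source at (120, 80)
sensors = np.array([[0, 0], [200, 0], [200, 200], [0, 200]], dtype=float)
t = np.linalg.norm(sensors - np.array([120.0, 80.0]), axis=1) / C
print(locate_gunshot(sensors, t[1:] - t[0]))     # approximately [120. 80.]
```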

    CLADAG 2021 BOOK OF ABSTRACTS AND SHORT PAPERS

    The book collects the short papers presented at the 13th Scientific Meeting of the Classification and Data Analysis Group (CLADAG) of the Italian Statistical Society (SIS). The meeting was organized by the Department of Statistics, Computer Science and Applications of the University of Florence, under the auspices of the Italian Statistical Society and the International Federation of Classification Societies (IFCS). CLADAG is a member of the IFCS, a federation of national, regional, and linguistically-based classification societies. It is a non-profit, non-political scientific organization, whose aims are to further classification research.

    Censored regression techniques for credit scoring

    This thesis investigates the use of newly developed survival analysis tools for credit scoring. Credit scoring techniques are currently used by financial institutions to estimate the probability of a customer defaulting on a loan by a predetermined time in the future. While a number of classification techniques are currently used, banks are now becoming more concerned with estimating the lifetime of the loan rather than just the probability of default. Difficulties arise when using standard statistical techniques due to the presence of censoring in the data. Survival analysis, originating from the medical and engineering fields, is an area of statistics that typically deals with censored lifetime data. The theoretical developments in this thesis revolve around linear regression for censored data, in particular the Buckley-James method. The Buckley-James method is analogous to linear regression and gives estimates of the expected lifetime given a set of explanatory variables. The first development is a measure of fit for censored regression, similar to the classical r-squared of linear regression. Next, the variable-reduction technique of stepwise selection is extended to the Buckley-James method. For the last development, the Buckley-James algorithm is altered to incorporate non-linear regression methods such as neural networks and Multivariate Adaptive Regression Splines (MARS). MARS shows promise in terms of predictive power and interpretability in both simulation and empirical studies. The practical section of the thesis involves using the new techniques to predict the time to default and time to repayment of unsecured personal loans from a database obtained from a major Australian bank. The analyses are unique, being the first published work on applying Buckley-James and related methods to a large-scale financial database.
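    For readers unfamiliar with the Buckley-James method, the following minimal sketch (our own, not the thesis implementation; ties among residuals are ignored for brevity) shows the core iteration: censored responses are replaced by their conditional expectation under a Kaplan-Meier estimate of the residual distribution, and ordinary least squares is refit until the coefficients settle.

```python
import numpy as np

def km_jumps(e, delta):
    """Kaplan-Meier probability masses of the residual distribution
    (tied residuals not handled, for brevity)."""
    order = np.argsort(e, kind="stable")
    e, delta = e[order], delta[order]
    n, surv = len(e), 1.0
    pts, mass = [], []
    for i in range(n):
        if delta[i]:                       # uncensored residuals carry mass
            jump = surv / (n - i)          # S(e-) / (number still at risk)
            pts.append(e[i]); mass.append(jump)
            surv -= jump
    return np.array(pts), np.array(mass)

def buckley_james(X, t, delta, n_iter=50, tol=1e-6):
    """t: observed (possibly right-censored) responses; delta: 1 if uncensored.
    Impute censored responses via the Kaplan-Meier residual distribution,
    refit ordinary least squares, and repeat."""
    X1 = np.column_stack([np.ones(len(t)), X])    # design matrix with intercept
    beta = np.linalg.lstsq(X1, t, rcond=None)[0]  # naive OLS starting values
    for _ in range(n_iter):
        e = t - X1 @ beta
        pts, mass = km_jumps(e, delta)
        y_star = t.astype(float).copy()
        for i in np.where(delta == 0)[0]:         # impute each censored case
            tail = pts > e[i]
            if mass[tail].sum() > 0:              # E[eps | eps > e_i]
                y_star[i] = X1[i] @ beta + (pts[tail] * mass[tail]).sum() / mass[tail].sum()
        beta_new = np.linalg.lstsq(X1, y_star, rcond=None)[0]
        if np.max(np.abs(beta_new - beta)) < tol:
            break
        beta = beta_new                           # note: the iteration may
    return beta                                   # oscillate rather than converge
```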

    Financial risk management in shipping investment, a machine learning approach

    There has been a plethora of research into company credit risk and financial default prediction from both academics and financial professionals alike. However, only a limited volume of the literature has focused on international shipping company financial distress prediction, with previous research concentrating largely on classic linear modelling techniques. The gaps identified in this research demonstrate the need for increased effort to address the inherent nonlinear nature of shipping operations, as well as the noisy and incomplete composition of shipping company financial statement data. Furthermore, the gaps illustrate the need for a workable definition of financial distress, which to date has too often been classed only by the ultimate state of bankruptcy/insolvency. That definition prohibits the practical application of methodologies aimed at the timely identification of financial distress, which would allow remedial measures to be implemented to avoid ultimate financial collapse. This research contributes to the field by addressing these gaps through i) the creation of a machine-learning-based financial distress forecasting methodology and ii) the development, on that foundation, of a software toolkit for financial distress prediction. The toolkit enables the financial risk principles embedded within the methodology to be readily integrated into an enterprise/corporate risk management system. The methodology and software were tested through the application of a bulk shipping company case study utilising 5000 bulk shipping company-year accounting observations for the period 2000-2018, in combination with market and macroeconomic data. The results demonstrate that the methodology improves the capture of distress correlations that traditional financial distress models have struggled to achieve. The methodology's capacity to adequately treat the problem of missing data in company financial statements was also validated. Finally, the results highlight the successful application of the software toolkit for the development of a multi-model, real-time system which can enhance the financial monitoring of shipping companies by acting as a practical "early warning system" for financial distress.
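    The two methodological points stressed above, nonlinearity and incomplete financial statement data, can be illustrated with a small sketch (our own, not the thesis toolkit; the toy panel and feature names are invented). scikit-learn's HistGradientBoostingClassifier is one nonlinear learner that accepts missing entries natively, so gappy accounting ratios need no separate imputation stage:

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical company-year panel: four accounting ratios with gaps and a
# binary distress flag driven nonlinearly by the first two ratios.
n = 2000
X = rng.normal(size=(n, 4))               # e.g. leverage, liquidity, ROA, coverage
y = (X[:, 0] - 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.5, size=n) > 0.8).astype(int)
X[rng.random(X.shape) < 0.15] = np.nan    # ~15% missing, as in real statements

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = HistGradientBoostingClassifier(max_iter=300).fit(X_tr, y_tr)  # NaN-aware splits
print("hold-out AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```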