
    The power law model applied to the marathon world record

    In September 2013 the men's marathon world record was broken. The aim of this study is to apply to the 2013 Berlin Marathon a mathematical model based on the power law, which analyses the distribution of finishing times (marks) and checks its goodness of fit. The results show that the correlations obtained in all categories were highly significant (r ≥ 0.978; p < 0.001), with a coefficient of determination R² ≥ 0.969. In conclusion, applying the power law to the 2013 Berlin Marathon men's race has been a useful and feasible study, and the fit between the data and the mathematical model has been so accurate…
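The abstract does not spell out the model's functional form. As a minimal sketch, assuming the power law maps a finisher's rank k to their time as t(k) = a * k^b and is fitted by ordinary least squares on logarithms, the fit and the reported statistics r and R² can be computed as below. The function name `fit_power_law` and the synthetic data are illustrative assumptions, not the study's code or results.

```python
import math


def fit_power_law(values):
    """Fit values[k-1] ~ a * k**b by least squares on log(value) vs log(rank k).

    Returns (a, b, r, r_squared), where r is the Pearson correlation in log-log space.
    """
    xs = [math.log(k) for k in range(1, len(values) + 1)]
    ys = [math.log(v) for v in values]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx                 # exponent = slope of the log-log regression line
    a = math.exp(my - b * mx)     # prefactor = exp(intercept)
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r, r * r         # for simple linear regression, R^2 = r^2
```

With finishing times sorted from fastest to slowest, `a` is the model's time for rank 1 and `b` measures how quickly times grow down the field.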

    Human exploration of complex knowledge spaces

    Driven by need or curiosity, as humans we constantly act as information seekers. Whenever we work, study, or play, we naturally look for information in spaces where pieces of our knowledge and culture are linked through semantic and logical relations. Nowadays, far from being just an abstraction, these information spaces are widespread, easily accessible complex structures hosted by technological systems, from the whole World Wide Web to the paramount example of Wikipedia. They are all information networks. How we move on these networks, and how our learning experience could be made more efficient while exploring them, are the key questions investigated in this thesis. To this end, concepts, tools and models from graph theory and complex systems analysis are borrowed to combine empirical observations of real user behaviour in knowledge spaces with theoretical findings from cognitive science research. The thesis investigates how the structure of a knowledge space can affect its own exploration in learning-type tasks, and how users typically explore information networks when looking for information or following learning paths. The research approach is exploratory and follows three main lines of research. Extending previous work in algorithmic education, the first contribution focuses on the topological properties of the information network and how they affect the efficiency of a simulated learning exploration. To this end, a general class of algorithms is introduced that, building on well-established findings on educational scheduling, captures some of the behaviours of an individual moving through a knowledge space while learning. In exploring this space, learners move along connections, periodically revisit some concepts, and sometimes jump to very distant ones. 
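The three behaviours named above (following links, periodic revisits, occasional long jumps) can be illustrated with a toy walker. This is a hypothetical sketch, not the thesis's algorithm class: the parameters `p_jump` and `revisit_every` are assumptions, and the fixed revisit period is only a crude stand-in for the educational scheduling findings the thesis builds on.

```python
import random


def learning_walk(adj, start, steps, p_jump=0.05, revisit_every=5, seed=0):
    """Hypothetical learner on a graph given as {node: [neighbours]}.

    Every `revisit_every` steps the learner returns to a previously seen concept;
    otherwise it jumps to a random node with probability `p_jump` (or when stuck),
    and moves along a link the rest of the time.
    """
    rng = random.Random(seed)
    nodes = list(adj)
    path = [start]
    current = start
    for t in range(1, steps + 1):
        if t % revisit_every == 0 and len(path) > 1:
            current = rng.choice(path[:-1])      # revisit an earlier concept
        elif rng.random() < p_jump or not adj[current]:
            current = rng.choice(nodes)          # jump, possibly far away
        else:
            current = rng.choice(adj[current])   # follow a semantic link
        path.append(current)
    return path
```

Running such a walker on synthetic and real graphs and comparing, for example, how many distinct concepts it covers per step is one way to make "efficiency" concrete; that metric is likewise an assumption here.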
To investigate the effect of networked information structures on these dynamics, both synthetic and real-world graphs are considered, such as subsections of Wikipedia and word-association graphs. The results reveal the existence of optimal topological structures for the defined learning dynamics. These feature small-world and scale-free properties, with a balance between the number of hubs and the number of least-connected items. Surprisingly, the real-world networks analysed turn out to be close to optimal. To uncover the role of the semantic content of the information to be learned in information-seeking tasks, empirical data on user traffic logs in the Wikipedia system are then considered. From these data, and by means of first-order Markov chain models, user paths through the encyclopaedia can be simulated and treated as proxies for the real paths. These paths are then analysed at an abstract semantic level, by mapping individual pages to points in a reduced semantic space. Recurrent patterns emerge along the walks, and they become even more evident when contrasted with paths originating from goal-oriented information-seeking games, thus providing some hints about how users navigate freely while seeking information. Still, different systems need to be considered to evaluate longer, more constrained and more structured learning dynamics. This is the focus of the third line of investigation, in which learning paths are extracted from advanced scientific textbooks and treated as if they were walks suggested by their authors through an underlying knowledge space. Strategies to extract the paths from the textbooks are proposed, and preliminary results on their statistical properties are discussed. 
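A first-order Markov chain simulates a path by choosing each next page only from the current page's observed outgoing traffic. A minimal sketch, assuming the logs are aggregated into (source, target, count) triples, the kind of data Wikipedia's public clickstream dumps provide; the function names and input format are assumptions, not the thesis's pipeline.

```python
import random
from collections import defaultdict


def build_chain(transitions):
    """Group (source, target, count) triples into {source: [(target, count), ...]}."""
    chain = defaultdict(list)
    for src, dst, count in transitions:
        chain[src].append((dst, count))
    return chain


def simulate_path(chain, start, max_len, seed=0):
    """Sample a path where the next page depends only on the current one (first order)."""
    rng = random.Random(seed)
    path = [start]
    # Stop at max_len or when the current page has no recorded outgoing clicks.
    while len(path) < max_len and chain.get(path[-1]):
        targets, weights = zip(*chain[path[-1]])
        path.append(rng.choices(targets, weights=weights, k=1)[0])
    return path
```

Transition probabilities here are proportional to click counts, so heavily used links dominate the simulated paths.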
Moreover, by taking advantage of the Wikipedia information network, Kauffman's theory of the adjacent possible is formalized in a learning context, introducing the adjacent learnable: the part of the knowledge space that the reader can explore as she learns new concepts by following the suggested learning path. From this perspective, the paths are analysed as particular realizations of knowledge-space explorations, which makes it possible to contrast different approaches to education quantitatively.
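One simple reading of the adjacent learnable, assumed here by analogy with Kauffman's adjacent possible, is the set of concepts one link away from what has already been learned. The sketch below tracks its size along a learning path; the one-step-neighbourhood definition and the function names are assumptions, not the thesis's formalization.

```python
def adjacent_learnable(adj, learned):
    """Concepts linked to some learned concept but not yet learned themselves."""
    frontier = set()
    for node in learned:
        frontier.update(adj.get(node, ()))
    return frontier - set(learned)


def adjacent_learnable_along(adj, path):
    """Size of the adjacent learnable after each concept on the path is learned."""
    learned = set()
    sizes = []
    for concept in path:
        learned.add(concept)
        sizes.append(len(adjacent_learnable(adj, learned)))
    return sizes
```

Comparing these size curves for paths taken from different textbooks would be one way to contrast teaching orders quantitatively.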

    Efficient Reorganisation of Hybrid Index Structures Supporting Multimedia Search Criteria

    This thesis describes the development and setup of hybrid index structures. These are access methods for retrieval in hybrid data spaces, which are formed by one or more relational or normalised columns in conjunction with one non-relational or non-normalised column. Examples of such hybrid data spaces include textual data combined with geographical data, or data from enterprise content management systems. However, any non-relational data type, such as image feature vectors or comparable types, may be stored as well. Hybrid index structures are known to perform retrieval operations efficiently. Unfortunately, little is known about the reorganisation operations that insert or update row tuples, and fundamental research has mainly been carried out in simulation-based environments. This work builds on a previous thesis that implemented hybrid access structures in a realistic database environment. That implementation showed that retrieval works efficiently, but that the restructuring approaches require too much effort to set up in, e.g., web search engine environments where several thousand documents are inserted or modified every day. Because such search engines rely on relational database systems as storage backends, these access methods for hybrid data spaces must be set up in real-world database management systems. This thesis applies a systematic approach to optimising the rearrangement algorithms in realistic scenarios. To this end, a measurement and evaluation scheme is created and repeatedly applied to an evolving implementation and model of hybrid index structures, in order to optimise the regrouping algorithms so that hybrid index structures can be set up in real-world information systems. A set of input corpora is selected and applied to the test suite together with the evaluation scheme. 
In summary, this thesis describes input sets, a test suite including an evaluation scheme, and optimisation iterations on reorganisation algorithms that reflect a theoretical model framework, in order to provide efficient reorganisation of hybrid index structures supporting multimedia search criteria.
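To make "reorganisation" concrete: in a hybrid index, an update must undo a row's old entries in both the relational and the non-relational dimension before adding the new ones. The toy structure below (term, then a relational value such as a geographic cell, then row ids) is a hypothetical illustration of that cost, not one of the structures studied in the thesis.

```python
from collections import defaultdict


class HybridIndex:
    """Hypothetical hybrid index: term -> cell -> set of row ids."""

    def __init__(self):
        self.postings = defaultdict(lambda: defaultdict(set))
        self.rows = {}  # row id -> (cell, terms), kept so updates can remove old entries

    def upsert(self, row_id, cell, text):
        # An update is a delete of the old entries followed by an insert.
        if row_id in self.rows:
            self.delete(row_id)
        terms = set(text.lower().split())
        for term in terms:
            self.postings[term][cell].add(row_id)
        self.rows[row_id] = (cell, terms)

    def delete(self, row_id):
        cell, terms = self.rows.pop(row_id)
        for term in terms:
            bucket = self.postings[term][cell]
            bucket.discard(row_id)
            if not bucket:                 # drop empty cell buckets
                del self.postings[term][cell]
            if not self.postings[term]:    # drop terms with no remaining cells
                del self.postings[term]

    def search(self, term, cell):
        """Rows containing `term` whose relational attribute equals `cell`."""
        return set(self.postings.get(term, {}).get(cell, set()))
```

Every upsert touches one posting set per old and per new term, which is why bulk insert and update workloads make reorganisation cost the bottleneck.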

    Enabling parallelism and optimizations in data mining algorithms for power-law data

    Today's data mining tasks aim to extract meaningful information from large amounts of data in a reasonable time, mainly by means of (a) algorithmic advances, such as fast approximate algorithms and efficient learning algorithms, and (b) architectural advances, such as machines with massive compute capacity built from distributed multi-core processors and high-throughput accelerators. For current and future generations of processors, parallel algorithms are critical for fully utilizing computing resources. Furthermore, exploiting data properties for performance gains becomes crucial for data mining applications. In this work, we focus on power-law behavior, a common property of a large class of data such as text data, internet traffic, and click-stream data. Specifically, we address the following questions in the context of power-law data: How well do the critical data mining algorithms of current interest fit today's parallel architectures? Which algorithmic and mapping opportunities can be leveraged to further improve performance? What are the relative challenges and gains of such approaches? First, we investigate the suitability of the "frequency estimation" problem for GPU-scale parallelism. Sketching algorithms are a popular choice for this task due to their desirable trade-off between estimation accuracy and space-time efficiency. However, most past work on sketch-based frequency estimation has focused on CPU implementations. In our work, we propose a novel approach for sketches that exploits the natural skewness of power-law data to efficiently utilize the massive parallelism of modern GPUs. Next, we explore the problem of "identifying top-K frequent elements" in distributed data streams on modern distributed settings with both multi-core and multi-node CPU parallelism. 
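For readers unfamiliar with sketch-based frequency estimation, the following is a plain CPU Count-Min Sketch: `depth` rows of `width` counters, one hashed counter per row per item, and the minimum across rows as the estimate. It is the textbook structure only; the thesis's GPU approach is not reproduced, and using Python's built-in `hash` with per-row salts is a simplification rather than a proper pairwise-independent hash family.

```python
import random


class CountMinSketch:
    """Count-Min Sketch: depth rows of width counters; estimates never undercount."""

    def __init__(self, width, depth, seed=0):
        rng = random.Random(seed)
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]
        # One random salt per row so each row hashes items differently.
        self.salts = [rng.getrandbits(64) for _ in range(depth)]

    def _index(self, row, item):
        return hash((self.salts[row], item)) % self.width

    def update(self, item, count=1):
        # Add the count to one counter in every row.
        for row in range(self.depth):
            self.table[row][self._index(row, item)] += count

    def estimate(self, item):
        # Collisions only inflate counters, so the row minimum is the tightest bound.
        return min(self.table[row][self._index(row, item)] for row in range(self.depth))
```

On skewed, power-law streams the few heavy items dominate whatever counters they land in, so their relative error tends to be small, which is the property the thesis exploits.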
Sketch-based approaches, such as the Count-Min Sketch (CMS) with a top-K heap, have an excellent update time but lack the important property of reducibility, which is needed to exploit data parallelism. At the other end, the popular Frequent Algorithm (FA) produces reducible summaries, but its update costs are high. Our approach, Topkapi, gives the best of both worlds: it is reducible like FA and has an efficient update time similar to CMS. For power-law data, Topkapi has strong theoretical guarantees and leads to significant performance gains relative to past work. Finally, we study Word2Vec, a popular word embedding method widely used in machine learning and natural language processing applications such as machine translation, sentiment analysis, and query answering. This time, we target Single Instruction Multiple Data (SIMD) parallelism. With the increasing vector lengths of commodity CPUs, such as the 512-bit vectors of AVX-512, efficient utilization of the vector processing units becomes a major performance factor. By employing a static multi-version code generation strategy coupled with an algorithmic approximation based on the power-law frequency distribution of words, we achieve significant reductions in training time relative to the state of the art. Ph.D. thesis.
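Reducibility means that summaries built on separate partitions of a stream can be merged into a valid summary of the whole. The Frequent Algorithm is commonly identified with the Misra-Gries summary; the sketch below shows it with a standard merge rule (add the counters, then subtract the (k+1)-th largest count and drop non-positive ones). This illustrates the FA side of the trade-off only; it is not Topkapi.

```python
from collections import Counter


def misra_gries(stream, k):
    """Frequent algorithm (Misra-Gries) keeping at most k counters."""
    counters = {}
    for x in stream:
        if x in counters:
            counters[x] += 1
        elif len(counters) < k:
            counters[x] = 1
        else:
            # No free counter: decrement all, deleting those that reach zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters


def merge(a, b, k):
    """Merge two summaries into one with at most k counters."""
    combined = Counter(a)
    combined.update(b)  # add counts key by key
    if len(combined) <= k:
        return dict(combined)
    kth = sorted(combined.values(), reverse=True)[k]  # (k+1)-th largest count
    return {key: c - kth for key, c in combined.items() if c - kth > 0}
```

For example, merging `misra_gries(["a"] * 50 + list("bcdefgh"), 3)` with `misra_gries(["a"] * 40 + ["b"] * 20 + list("xyz"), 3)` yields `{"a": 86, "b": 18}`, undercounting "a" (true count 90) by no more than the total stream length divided by k + 1. The per-item decrement loop is the expensive update step the abstract contrasts with CMS.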

    The social components of innovation: from data analysis to mathematical modelling

    Novelties are a key driver of societal progress, yet we lack a comprehensive understanding of the factors that generate them. Recent evidence suggests that innovation emerges from the balance between exploiting past discoveries and exploring new possibilities, the so-called "adjacent possible". This thesis aims to develop new analysis tools and models to study how people navigate the seemingly infinite space of possibilities. First, I extend the notion of the adjacent possible to account for novelties as combinations of existing elements. In particular, I model innovation as a random walk on an expanding complex network of content, in which novelties correspond not only to first visits of nodes but also to first visits of links. The model correctly reproduces how novelties emerge in empirical data, highlighting the importance of the exploration process in shaping the growth of the network. Second, since people continuously interact and exchange information with each other, I investigate the role of social interactions in enhancing discoveries. I therefore propose a model in which multiple agents extend their adjacent possible through the links of a complex social network, thereby exploiting opportunities coming from their contacts. By adding a social dimension to the adjacent possible, I prove that the discovery potential of an individual is influenced by their position in the social network. Finally, I combine the two notions of the adjacent possible, in the content and social dimensions, to develop a data-driven model of music exploration on online platforms. In this model, multiple agents grow their individual space of possibilities by exploring a network of similarity between artists, while exploiting suggestions from their friends on the social network. The comparison with empirical data indicates that the adjacent possible, in both the content and the social space, plays a crucial role in determining the individual propensity to innovate.
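Counting novelties as first visits of both nodes and links, as described above, produces two discovery curves from a single walk (the kind of curve usually compared with Heaps' law). A minimal sketch, assuming links are undirected so that a→b and b→a count as the same link; the function name is hypothetical.

```python
def count_novelties(walk):
    """Cumulative number of distinct nodes and distinct (undirected) links along a walk."""
    seen_nodes, seen_links = set(), set()
    node_novelties, link_novelties = [], []
    prev = None
    for node in walk:
        seen_nodes.add(node)
        if prev is not None:
            seen_links.add(frozenset((prev, node)))  # order-free link identity
        node_novelties.append(len(seen_nodes))
        link_novelties.append(len(seen_links))
        prev = node
    return node_novelties, link_novelties
```

For the walk a, b, a, c, b, the node curve is [1, 2, 2, 3, 3] while the link curve is [0, 1, 1, 2, 3]: the last step discovers nothing new in nodes but still produces a novel combination.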

    Modelling the social dynamics of contagion and discovery using dynamical processes on complex networks.

    PhD thesis. Complex networks have been successfully used to describe the social structure on top of which many real-world social processes take place. In this thesis, I focus on developing network models that aim to capture the fundamental mechanisms behind the dynamics of adoption of ideas, behaviours, or items. I start by considering the transmission of a single idea from one individual to another, in an epidemic-like fashion. Recent evidence has shown that mechanisms of complex contagion can effectively capture the fundamental rules of social reinforcement and peer pressure typical of social systems. Along this line, I propose a model of complex recovery in which the social influence mechanism acts on the recovery rule rather than on the infection rule, leading to explosive behaviours. Yet, in human communication, interactions can occur in groups. I therefore extend the pairwise representation given by graphs by using simplicial complexes instead. I develop a model of simplicial contagion, showing how the inclusion of these higher-order interactions can dramatically alter the spreading dynamics. I then consider a single individual and model the dynamics of discovery as paths of sequential adoptions, with the first visit of an idea representing a novelty. Starting from the empirically observed dynamics of correlated novelties, according to which one discovery leads to another, I develop a model of biased random walks in which exploring the interlinked space of possible discoveries also, as a byproduct, changes the strengths of their connections. By balancing exploration and exploitation, the model reproduces the basic footprints of real-world innovation processes. Nevertheless, people do not live and work in isolation, and social ties can shape their behaviours. 
Thus, I consider interacting discovery processes to investigate how social interactions contribute to the collective emergence of new ideas and teamwork, and how explorers can exploit opportunities coming from their social contacts.
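A walk whose traversals change the strengths of the connections it uses can be sketched as an edge-reinforced random walk: each traversed link gains weight, making it more likely to be reused (exploitation), while untried links keep their base weight (exploration). This is a generic hypothetical sketch; the reinforcement rule, the `delta` parameter and the base weight of 1.0 are assumptions, not the thesis's exact bias.

```python
import random


def reinforced_walk(adj, start, steps, delta=1.0, seed=0):
    """Edge-reinforced random walk on {node: [neighbours]}.

    Every undirected link starts with weight 1.0 and gains `delta` each time it is
    traversed. Returns the path and the weights of the links that were used.
    """
    rng = random.Random(seed)
    weights = {}  # frozenset({u, v}) -> current weight

    def w(u, v):
        return weights.get(frozenset((u, v)), 1.0)

    path = [start]
    for _ in range(steps):
        u = path[-1]
        nbrs = adj[u]
        if not nbrs:
            break
        v = rng.choices(nbrs, weights=[w(u, n) for n in nbrs], k=1)[0]
        weights[frozenset((u, v))] = w(u, v) + delta  # reinforce the traversed link
        path.append(v)
    return path, weights
```

Setting `delta=0` recovers an unbiased random walk; larger values make the walker revisit familiar links more often.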
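The simplicial contagion mentioned in this abstract adds a group channel to ordinary pairwise spreading: a susceptible node in a 2-simplex (a filled triangle) whose two partners are both infected can be infected at a separate rate. The step below is a minimal discrete-time SIS sketch in that spirit; the parameter names `beta`, `beta_delta` and `mu` and the synchronous update scheme are assumptions.

```python
import random


def simplicial_sis_step(infected, edges, triangles, beta, beta_delta, mu, rng):
    """One synchronous step of a discrete-time simplicial SIS model (sketch).

    Pairwise channel: each infected neighbour on an edge infects with prob `beta`.
    Group channel: in a triangle whose other two nodes are infected, prob `beta_delta`.
    Infected nodes recover with prob `mu`; all rules read the previous state.
    """
    new_infected = set()
    for i in infected:
        if rng.random() >= mu:  # fails to recover, stays infected
            new_infected.add(i)
    for u, v in edges:
        for s, t in ((u, v), (v, u)):
            if s in infected and t not in infected and rng.random() < beta:
                new_infected.add(t)
    for tri in triangles:
        for i in tri:
            others = [n for n in tri if n != i]
            if i not in infected and all(n in infected for n in others):
                if rng.random() < beta_delta:
                    new_infected.add(i)
    return new_infected
```

Because the group channel only fires when two partners are already infected, it acts as social reinforcement, which is what can make the transition to widespread adoption abrupt.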

    On Information Usage Modeling.

    Although it is well known that information usage usually follows the 80/20 rule and concentrates on a few items, there has been no analytical model to describe this skewed distribution. This dissertation provides a theoretical foundation, based on Simon's modeling of empirical phenomena and Chen's index approach, to identify the factors that shape this usage pattern. Using Chen's index approach, we conclude that the distance and slope of the data points determine the shape of the distribution. We further examine the critical parameters of Simon's model through computer simulations, and find that the probability of new entry (α) and the rate of decay (β) are the two predominant factors affecting the patterns of information usage. Based on the effects of these two parameters, we establish the limiting conditions under which these empirical phenomena hold. Finally, we show how our findings can be applied to improve the weeding process in libraries, a procedure that can be extended to the archive management of information systems.
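Simon's model can be simulated in a few lines: at each step a new item enters with probability α; otherwise a past use is copied uniformly at random, which picks an existing item with probability proportional to its usage so far. The sketch covers only the α mechanism; the abstract does not specify how the decay rate β enters, so it is deliberately omitted. `top_share` is a hypothetical helper for checking the 80/20 pattern.

```python
import random
from collections import Counter


def simon_process(steps, alpha, seed=0):
    """Simon's model: new item with prob alpha, else reuse proportional to past usage."""
    rng = random.Random(seed)
    history = [0]  # item id of every past use; starts with a single item
    next_id = 1
    for _ in range(steps):
        if rng.random() < alpha:
            history.append(next_id)  # a new item enters
            next_id += 1
        else:
            # Uniform choice over past uses = choice proportional to usage counts.
            history.append(rng.choice(history))
    return Counter(history)


def top_share(usage, fraction=0.2):
    """Share of all uses that goes to the most-used `fraction` of items."""
    counts = sorted(usage.values(), reverse=True)
    k = max(1, int(len(counts) * fraction))
    return sum(counts[:k]) / sum(counts)
```

Sweeping `alpha` and computing `top_share` for each run shows how lower entry rates concentrate usage on fewer items.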

    Do Environmentally Friendly Companies Outperform Environmentally Unfriendly Companies in Financial Markets? An Analysis of Financial Performance & Corporate Social Responsibility

    Over the last century, the prevailing view of the relationship between humans and the natural world has shifted from a period of major exploitation to a time of conservation and appreciation. Catastrophic events such as Hurricane Katrina in 2005, whose damage was aggravated by sea level rise and wetland degradation, have opened the public's eyes to the negative impacts humans have on the environment, and to what will come if we do not change our ways. Implementing sustainability practices has become a norm, if not a necessity, for companies in the corporate world that wish to prosper. Using cross-sectional data from Newsweek's 2015 Green Rankings list and a variety of online financial sources, this study examines the relationship between corporate sustainability efforts, specifically green efforts as reported by Newsweek, and performance in financial markets. Companies may strive for sustainability for its own sake, but they may also hope that their efforts will be rewarded with better financial performance and with recognition by the consuming and investing public. To address financial performance, this study examines the relationship between Newsweek's Green Rankings and a variety of financial indicators. To address public perception, a survey conducted within the Union College community is used to evaluate how well known the environmentally friendly companies on Newsweek's 2015 Green Rankings are among people with various demographic backgrounds, particularly the millennial age group. The survey also evaluates how people perceive a company compared with its actual efforts as measured by Newsweek. If there is a relationship between sustainability efforts and financial performance or public perception, then companies should incorporate environmentally friendly practices into day-to-day operations and learn to market these developments in a way that connects with consumers.

    Mathematical model of power law applied to the marathon

    Objective: To apply a power-law mathematical model to a marathon race, describing the distribution of finishing times, and to check its goodness of fit. Method: Two models were applied to the 2010 London Women's Marathon in all categories: an increasing model based on finishing time and a decreasing model based on average speed. Results: The correlations obtained in all categories were highly significant, with a correlation coefficient r = 0.980 (p < 0.001) and a coefficient of determination R² = 0.9737. Conclusions: Applying a power-law mathematical model to the marathon can be useful and feasible, and the data fitted the model fairly accurately. Universidad de Granada. Departamento de Educación Física y Deportiva
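The increasing (time) and decreasing (speed) models are closely linked, because average speed is the race distance divided by time. Assuming both are rank-based power laws fitted by least squares on logarithms (the abstract does not state the exact form), with rank k, fitted constants a and b, and marathon distance D:

```latex
\begin{aligned}
t_k &= a\,k^{b}, \qquad v_k = \frac{D}{t_k} = \frac{D}{a}\,k^{-b}, \qquad D = 42.195\ \text{km},\\
\ln v_k &= \ln D - \ln t_k = \left(\ln D - \ln a\right) - b\,\ln k .
\end{aligned}
```

Since ln v_k is an affine function of ln t_k with slope -1, a log-log fit of speed against rank returns exponent -b and a correlation of equal magnitude but opposite sign, so under this assumption R² is identical for the two models.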