222 research outputs found

    Kernel-Based Ranking. Methods for Learning and Performance Estimation

    Machine learning provides tools for automated construction of predictive models in data-intensive areas of engineering and science. The family of regularized kernel methods has in recent years become one of the mainstream approaches to machine learning, due to a number of advantages the methods share. The approach provides theoretically well-founded solutions to the problems of under- and overfitting, allows learning from structured data, and has been empirically demonstrated to yield high predictive performance on a wide range of application domains. Historically, the problems of classification and regression have gained the majority of attention in the field. In this thesis we focus on another type of learning problem, that of learning to rank. In learning to rank, the aim is to learn, from a set of past observations, a ranking function that can order new objects according to how well they match some underlying criterion of goodness. As an important special case of the setting, we can recover the bipartite ranking problem, corresponding to maximizing the area under the ROC curve (AUC) in binary classification. Ranking applications appear in a large variety of settings; examples encountered in this thesis include document retrieval in web search, recommender systems, information extraction, and automated parsing of natural language. We consider the pairwise approach to learning to rank, where ranking models are learned by minimizing the expected probability of ranking any two randomly drawn test examples incorrectly. The development of computationally efficient kernel methods based on this approach has in the past proven to be challenging. Moreover, it is not clear which techniques for estimating the predictive performance of learned models are the most reliable in the ranking setting, and how the techniques can be implemented efficiently. The contributions of this thesis are as follows. 
First, we develop RankRLS, a computationally efficient kernel method for learning to rank that is based on minimizing a regularized pairwise least-squares loss. In addition to training methods, we introduce a variety of algorithms for tasks such as model selection, multi-output learning, and cross-validation, based on computational shortcuts from matrix algebra. Second, we improve the fastest known training method for the linear version of the RankSVM algorithm, which is one of the most well-established methods for learning to rank. Third, we study the combination of the empirical kernel map and reduced set approximation, which allows the large-scale training of kernel machines using linear solvers, and propose computationally efficient solutions to cross-validation when using the approach. Next, we explore the problem of reliable cross-validation when using AUC as a performance criterion, through an extensive simulation study. We demonstrate that the proposed leave-pair-out cross-validation approach leads to more reliable performance estimation than commonly used alternative approaches. Finally, we present a case study on applying machine learning to information extraction from biomedical literature, which combines several of the approaches considered in the thesis. The thesis is divided into two parts. Part I provides the background for the research work and summarizes the most central results; Part II consists of the five original research articles that are the main contribution of this thesis.
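The link between pairwise ranking and AUC described in the abstract above can be illustrated with a small sketch. It is not taken from the thesis: the function name `pairwise_auc` and the convention of counting a tied pair as half correct are assumptions made for illustration. The function computes AUC as the fraction of positive-negative pairs that a scoring function orders correctly.

```python
from itertools import product


def pairwise_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Illustrative helper, not code from the thesis. Ties count as half
    a correct pair, which is an assumed (though common) convention.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    correct = 0.0
    for p, n in product(pos, neg):
        if p > n:
            correct += 1.0   # positive scored above negative: correct pair
        elif p == n:
            correct += 0.5   # tie: half credit
    return correct / (len(pos) * len(neg))


if __name__ == "__main__":
    # Three of the four positive-negative pairs are ordered correctly.
    print(pairwise_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

This brute-force version visits every pair, so its cost grows with the product of the two class sizes; it is meant only to make the definition concrete.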

    Planning Group Meals Based on Preferences of Attendees

    Generally, the present disclosure is directed to determining an optimal place and time for a meeting based on the preferences of the people attending the meeting. In particular, in some implementations, the systems and methods of the present disclosure can include or otherwise leverage one or more machine-learned models to predict an optimal restaurant and time for a group meal based on personal preferences and/or time availability of members of the group.

    Multidisciplinary perspectives on Artificial Intelligence and the law

    This open access book presents an interdisciplinary, multi-authored, edited collection of chapters on Artificial Intelligence (‘AI’) and the Law. AI technology has come to play a central role in the modern data economy. Through a combination of increased computing power, the growing availability of data and the advancement of algorithms, AI has now become an umbrella term for some of the most transformational technological breakthroughs of this age. The importance of AI stems from both the opportunities that it offers and the challenges that it entails. While AI applications hold the promise of economic growth and efficiency gains, they also create significant risks and uncertainty. The potential and perils of AI have thus come to dominate modern discussions of technology and ethics – and although AI was initially allowed to largely develop without guidelines or rules, few would deny that the law is set to play a fundamental role in shaping the future of AI. As the debate over AI is far from over, the need for rigorous analysis has never been greater. This book thus brings together contributors from different fields and backgrounds to explore how the law might provide answers to some of the most pressing questions raised by AI. An outcome of the Católica Research Centre for the Future of Law and its interdisciplinary working group on Law and Artificial Intelligence, it includes contributions by leading scholars in the fields of technology, ethics and the law.

    Document Based Clustering For Detecting Events in Microblogging Websites

    Social media has a great influence on our daily lives. People share their opinions, stories, and news, and broadcast events using social media. This results in great amounts of information in social media. It is cumbersome to identify and organize the interesting events within these massive volumes of data, and browsing, searching, and monitoring events becomes more and more challenging. A lot of work has been done in the area of topic detection and tracking (TDT). Most of these methods are based on single-modality (e.g., text, images) information or multi-modality information. In single-modality analysis, many existing methods adopt visual information (e.g., images and videos) or textual information (e.g., names, time references, locations, title, tags, and description) in isolation to model event data for event detection and tracking. This problem can be resolved by a novel multi-modal social event tracking and evolutionary framework that not only effectively captures the events, but also generates a summary of these events over time. We propose a novel method that works with mmETM, which can effectively model social documents that include long text along with images. It learns the similarities between the textual and visual modalities to separate the visual and non-visual representative topics. To incorporate our method into social event tracking, we adopted an incremental learning technique, represented as mmETM, which gives informative textual and visual topics of events in social media over time. To validate our work, we used a sample data set and conducted various experiments on it. Both subjective and quantitative assessments show that the proposed mmETM technique performs favorably against several state-of-the-art techniques.

    Recent Applications in Graph Theory

    Graph theory, being a rigorously investigated field of combinatorial mathematics, is adopted by a wide variety of disciplines addressing a plethora of real-world applications. Advances in graph algorithms and software implementations have made graph theory accessible to a larger community of interest. Ever-increasing interest in machine learning and model deployments for network data demands a coherent selection of topics rewarding a fresh, up-to-date summary of the theory and fruitful applications to probe further. This volume is a small yet unique contribution to graph theory applications and modeling with graphs. The subjects discussed include information hiding using graphs, dynamic graph-based systems to model and control cyber-physical systems, graph reconstruction, average distance neighborhood graphs, and pure and mixed-integer linear programming formulations to cluster networks.

    Brain-Computer Interface

    Brain-computer interfacing (BCI) with the use of advanced artificial intelligence identification is a rapidly growing new technology that allows a silently commanding brain to manipulate devices ranging from smartphones to advanced articulated robotic arms when physical control is not possible. BCI can be viewed as a collaboration between the brain and a device via the direct passage of electrical signals from neurons to an external system. The book provides a comprehensive summary of conventional and novel methods for processing brain signals. The chapters cover a range of topics including noninvasive and invasive signal acquisition, signal processing methods, deep learning approaches, and implementation of BCI in experimental problems.

    Dynamic learning of the environment for eco-citizen behavior

    The development of sustainable smart cities requires the deployment of Information and Communication Technology (ICT) to ensure better services and available information at any time and everywhere. Although IoT devices are becoming more powerful and cheaper, deploying an extensive sensor network in an urban context can still be expensive. This thesis proposes a technique for estimating missing environmental information in large-scale environments. Our technique can provide information for areas of the environment that are not covered by sensing devices. 
The contribution of our proposal is summarized in the following points: * limiting the number of sensing devices to be deployed in an urban environment; * the exploitation of heterogeneous data acquired from intermittent devices; * real-time processing of information; * self-calibration of the system. Our proposal uses the Adaptive Multi-Agent System (AMAS) approach to solve the problem of information unavailability. In this approach, an exception is considered as a Non-Cooperative Situation (NCS) that has to be solved locally and cooperatively. HybridIoT exploits both homogeneous information (information of the same type) and heterogeneous information (information of different types or units) acquired from available sensing devices to provide accurate estimates at points of the environment where a sensing device is not available. The proposed technique enables estimating accurate environmental information under conditions of uncertainty arising from the urban application context in which the project is situated, and which have not been explored by state-of-the-art solutions: * openness: sensors can enter or leave the system at any time without the need for any reconfiguration; * large scale: the system can be deployed in a large urban context and ensure correct operation with a significant number of devices; * heterogeneity: the system handles different types of information without any a priori configuration. Our proposal does not require any input parameters or reconfiguration. The system can operate in open, dynamic environments such as cities, where a large number of sensing devices can appear or disappear at any time and without any prior notification. We carried out different experiments to compare the obtained results to various standard techniques to assess the validity of our proposal. We also developed a pipeline of standard techniques to produce baseline results that will be compared to those obtained by our multi-agent proposal.

    Recent Advances in Social Data and Artificial Intelligence 2019

    The importance and usefulness of subjects and topics involving social data and artificial intelligence are becoming widely recognized. This book contains invited review, expository, and original research articles dealing with, and presenting state-of-the-art accounts of, the recent advances in the subjects of social data and artificial intelligence, and potentially their links to Cyberspace.

    Parallel and Flow-Based High Quality Hypergraph Partitioning

    Balanced hypergraph partitioning is a classic NP-hard optimization problem that is a fundamental tool in such diverse disciplines as VLSI circuit design, route planning, sharding distributed databases, optimizing communication volume in parallel computing, and accelerating the simulation of quantum circuits. Given a hypergraph and an integer k, the task is to divide the vertices into k disjoint blocks with bounded size, while minimizing an objective function on the hyperedges that span multiple blocks. In this dissertation we consider the most commonly used objective, the connectivity metric, where we aim to minimize the number of different blocks connected by each hyperedge. The most successful heuristic for balanced partitioning is the multilevel approach, which consists of three phases. In the coarsening phase, vertex clusters are contracted to obtain a sequence of structurally similar but successively smaller hypergraphs. Once sufficiently small, an initial partition is computed. Lastly, the contractions are successively undone in reverse order, and an iterative improvement algorithm is employed to refine the projected partition on each level. An important aspect in designing practical heuristics for optimization problems is the trade-off between solution quality and running time. The appropriate trade-off depends on the specific application, the size of the data sets, and the computational resources available to solve the problem. Existing algorithms are either slow, sequential and offer high solution quality, or are simple, fast, easy to parallelize, and offer low quality. While this trade-off cannot be avoided entirely, our goal is to close the gaps as much as possible. We achieve this by improving the state of the art in all non-trivial areas of the trade-off landscape with only a few techniques, but employed in two different ways. 
Furthermore, most research on parallelization has focused on distributed memory, which neglects the greater flexibility of shared-memory algorithms and the wide availability of commodity multi-core machines. In this thesis, we therefore design and revisit fundamental techniques for each phase of the multilevel approach, and develop highly efficient shared-memory parallel implementations thereof. We consider two iterative improvement algorithms, one based on the Fiduccia-Mattheyses (FM) heuristic, and one based on label propagation. For these, we propose a variety of techniques to improve the accuracy of gains when moving vertices in parallel, as well as low-level algorithmic improvements. For coarsening, we present a parallel variant of greedy agglomerative clustering with a novel method to resolve cluster join conflicts on-the-fly. Combined with a preprocessing phase for coarsening based on community detection, a portfolio of from-scratch partitioning algorithms, as well as recursive partitioning with work-stealing, we obtain our first parallel multilevel framework. It is the fastest partitioner known, and achieves medium-high quality, beating all parallel partitioners, and is close to the highest quality sequential partitioner. Our second contribution is a parallelization of an n-level approach, where only one vertex is contracted and uncontracted on each level. This extreme approach aims at high solution quality via very fine-grained, localized refinement, but seems inherently sequential. We devise an asynchronous n-level coarsening scheme based on a hierarchical decomposition of the contractions, as well as a batch-synchronous uncoarsening, and later fully asynchronous uncoarsening. In addition, we adapt our refinement algorithms, and also use the preprocessing and portfolio. 
This scheme is highly scalable, and achieves the same quality as the highest quality sequential partitioner (which is based on the same components), but is of course slower than our first framework due to fine-grained uncoarsening. The last ingredient for high quality is an iterative improvement algorithm based on maximum flows. In the sequential setting, we first improve an existing idea by solving incremental maximum flow problems, which leads to smaller cuts and is faster due to engineering efforts. Subsequently, we parallelize the maximum flow algorithm and schedule refinements in parallel. Beyond striving for the highest quality, we present a deterministically parallel partitioning framework. We develop deterministic versions of the preprocessing, coarsening, and label propagation refinement. Experimentally, we demonstrate that the penalties for determinism in terms of partition quality and running time are very small. All of our claims are validated through extensive experiments, comparing our algorithms with state-of-the-art solvers on large and diverse benchmark sets. To foster further research, we make our contributions available in our open-source framework Mt-KaHyPar. While it seems inevitable that, with ever-increasing problem sizes, we must transition to distributed memory algorithms, the study of shared-memory techniques is not in vain. With the multilevel approach, even the inherently slow techniques have a role to play in fast systems, as they can be employed to boost quality on coarse levels at little expense. Similarly, techniques for shared-memory parallelism are important, both as soon as a coarse graph fits into memory, and as local building blocks in the distributed algorithm.
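The connectivity metric named in the abstract above can be made concrete with a minimal sketch. It is not code from Mt-KaHyPar. The function name, the input layout, and the optional per-hyperedge weights are all hypothetical. The sketch also assumes the common "connectivity minus one" form of the objective, in which a hyperedge that lies entirely inside one block contributes zero; the abstract itself only says that the number of blocks connected by each hyperedge is minimized.

```python
def connectivity_objective(hyperedges, block_of, weights=None):
    """Sum over hyperedges of (number of distinct blocks touched - 1).

    hyperedges: list of vertex lists (hypothetical input layout)
    block_of:   mapping from vertex to its block id
    weights:    optional per-hyperedge weights (assumed, default 1)
    """
    total = 0
    for i, edge in enumerate(hyperedges):
        blocks = {block_of[v] for v in edge}   # distinct blocks this edge spans
        w = 1 if weights is None else weights[i]
        total += w * (len(blocks) - 1)         # uncut edges contribute 0
    return total


if __name__ == "__main__":
    hyperedges = [[0, 1, 2], [2, 3], [0, 3]]
    block_of = {0: 0, 1: 0, 2: 1, 3: 1}
    # The first and third hyperedges span two blocks; the second is uncut.
    print(connectivity_objective(hyperedges, block_of))
```

A partitioner would evaluate this objective together with the block-size bound mentioned in the abstract; the size check is omitted here because the abstract does not state the exact bound.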

    Fifth NASA Goddard Conference on Mass Storage Systems and Technologies

    This document contains copies of those technical papers received in time for publication prior to the Fifth Goddard Conference on Mass Storage Systems and Technologies held September 17 - 19, 1996, at the University of Maryland, University Conference Center in College Park, Maryland. As one of an ongoing series, this conference continues to serve as a unique medium for the exchange of information on topics relating to the ingestion and management of substantial amounts of data and the attendant problems involved. This year's discussion topics include storage architecture, database management, data distribution, file system performance and modeling, and optical recording technology. There will also be a paper on Application Programming Interfaces (API) for a Physical Volume Repository (PVR) defined in Version 5 of the Institute of Electrical and Electronics Engineers (IEEE) Reference Model (RM). In addition, there are papers on specific archives and storage products.