74 research outputs found
Towards an Architecture for Efficient Distributed Search of Multimodal Information
The creation of very large-scale multimedia search engines, with more than one billion
images and videos, is a pressing need of digital societies where data is generated by multiple connected devices. Distributing search indexes in cloud environments is the inevitable solution to deal with the increasing scale of image and video collections. The distribution of such indexes in this setting raises multiple challenges such as the even partitioning of data space, load balancing across index nodes and the fusion of the results computed over multiple nodes. The main question behind this thesis is how to reduce and distribute the multimedia retrieval computational complexity?
This thesis studies the extension of sparse hash inverted indexing to distributed settings.
The main goal is to ensure that indexes are uniformly distributed across computing nodes while keeping similar documents on the same nodes. Load balancing is performed at both node and index level, to guarantee that the retrieval process is not delayed by nodes that have to inspect larger subsets of the index.
Multimodal search requires the combination of the search results from individual modalities and document features. This thesis studies rank fusion techniques focused on reducing complexity by automatically selecting only the features that improve retrieval effectiveness.
The achievements of this thesis span both distributed indexing and rank fusion research.
Experiments across multiple datasets show that sparse hashes can be used to distribute documents and queries across index entries in a balanced and redundant manner across nodes. Rank fusion results show that is possible to reduce retrieval complexity and improve efficiency by searching only a subset of the feature indexes
International Conference on Continuous Optimization (ICCOPT) 2019 Conference Book
The Sixth International Conference on Continuous Optimization took place on the campus of the Technical University of Berlin, August 3-8, 2019. The ICCOPT is a flagship conference of the Mathematical Optimization Society (MOS), organized every three years. ICCOPT 2019 was hosted by the Weierstrass Institute for Applied Analysis and Stochastics (WIAS) Berlin. It included a Summer School and a Conference with a series of plenary and semi-plenary talks, organized and contributed sessions, and poster sessions.
This book comprises the full conference program. It contains, in particular, the scientific program in survey style as well as with all details, and information on the social program, the venue, special meetings, and more
Predictive Modelling Approach to Data-Driven Computational Preventive Medicine
This thesis contributes novel predictive modelling approaches to data-driven computational preventive medicine and offers an alternative framework to statistical analysis in preventive medicine research. In the early parts of this research, this thesis presents research by proposing a synergy of machine learning methods for detecting patterns and developing inexpensive predictive models from healthcare data to classify the potential occurrence of adverse health events. In particular, the data-driven methodology is founded upon a heuristic-systematic assessment of several machine-learning methods, data preprocessing techniques, models’ training estimation and optimisation, and performance evaluation, yielding a novel computational data-driven framework, Octopus.
Midway through this research, this thesis advances research in preventive medicine and data mining by proposing several new extensions in data preparation and preprocessing. It offers new recommendations for data quality assessment checks, a novel multimethod imputation (MMI) process for missing data mitigation, a novel imbalanced resampling approach, and minority pattern reconstruction (MPR) led by information theory. This thesis also extends the area of model performance evaluation with a novel classification performance ranking metric called XDistance.
In particular, the experimental results show that building predictive models with the methods guided by our new framework (Octopus) yields domain experts' approval of the new reliable models’ performance. Also, performing the data quality checks and applying the MMI process led healthcare practitioners to outweigh predictive reliability over interpretability. The application of MPR and its hybrid resampling strategies led to better performances in line with experts' success criteria than the traditional imbalanced data resampling techniques. Finally, the use of the XDistance performance ranking metric was found to be more effective in ranking several classifiers' performances while offering an indication of class bias, unlike existing performance metrics
The overall contributions of this thesis can be summarised as follow. First, several data mining techniques were thoroughly assessed to formulate the new Octopus framework to produce new reliable classifiers. In addition, we offer a further understanding of the impact of newly engineered features, the physical activity index (PAI) and biological effective dose (BED). Second, the newly developed methods within the new framework. Finally, the newly accepted developed predictive models help detect adverse health events, namely, visceral fat-associated diseases and advanced breast cancer radiotherapy toxicity side effects. These contributions could be used to guide future theories, experiments and healthcare interventions in preventive medicine and data mining
Semantic discovery and reuse of business process patterns
Patterns currently play an important role in modern information systems (IS) development and their use has mainly been restricted to the design and implementation phases of the development lifecycle. Given the increasing significance of business modelling in IS development, patterns have the potential of providing a viable solution for promoting reusability of recurrent generalized models in the very early stages of development. As a statement of research-in-progress this paper focuses on business process patterns and proposes an initial methodological framework for the discovery and reuse of business process patterns within the IS development lifecycle. The framework borrows ideas from the domain engineering literature and proposes the use of semantics to drive both the discovery of patterns as well as their reuse
The multiple pheromone Ant clustering algorithm
Ant Colony Optimisation algorithms mimic the way ants use pheromones for marking paths to important locations. Pheromone traces are followed and reinforced by other ants, but also evaporate over time. As a consequence, optimal paths attract more pheromone, whilst the less useful paths fade away. In the Multiple Pheromone Ant Clustering Algorithm (MPACA), ants detect features of objects represented as nodes within graph space. Each node has one or more ants assigned to each feature. Ants attempt to locate nodes with matching feature values, depositing pheromone traces on the way. This use of multiple pheromone values is a key innovation. Ants record other ant encounters, keeping a record of the features and colony membership of ants. The recorded values determine when ants should combine their features to look for conjunctions and whether they should merge into colonies. This ability to detect and deposit pheromone representative of feature combinations, and the resulting colony formation, renders the algorithm a powerful clustering tool. The MPACA operates as follows: (i) initially each node has ants assigned to each feature; (ii) ants roam the graph space searching for nodes with matching features; (iii) when departing matching nodes, ants deposit pheromones to inform other ants that the path goes to a node with the associated feature values; (iv) ant feature encounters are counted each time an ant arrives at a node; (v) if the feature encounters exceed a threshold value, feature combination occurs; (vi) a similar mechanism is used for colony merging. The model varies from traditional ACO in that: (i) a modified pheromone-driven movement mechanism is used; (ii) ants learn feature combinations and deposit multiple pheromone scents accordingly; (iii) ants merge into colonies, the basis of cluster formation. The MPACA is evaluated over synthetic and real-world datasets and its performance compares favourably with alternative approaches
Recommended from our members
Smart Grid Risk Management
Current electricity infrastructure is being stressed from several directions -- high demand, unreliable supply, extreme weather conditions, accidents, among others. Infrastructure planners have, traditionally, focused on only the cost of the system; today, resilience and sustainability are increasingly becoming more important. In this dissertation, we develop computational tools for efficiently managing electricity resources to help create a more reliable and sustainable electrical grid. The tools we present in this work will help electric utilities coordinate demand to allow the smooth and large scale integration of renewable sources of energy into traditional grids, as well as provide infrastructure planners and operators in developing countries a framework for making informed planning and control decisions in the presence of uncertainty.
Demand-side management is considered as the most viable solution for maintaining grid stability as generation from intermittent renewable sources increases. Demand-side management, particularly demand response (DR) programs that attempt to alter the energy consumption of customers either by using price-based incentives or up-front power interruption contracts, is more cost-effective and sustainable in addressing short-term supply-demand imbalances when compared with the alternative that involves increasing fossil fuel-based fast spinning reserves. An essential step in compensating participating customers and benchmarking the effectiveness of DR programs is to be able to independently detect the load reduction from observed meter data. Electric utilities implementing automated DR programs through direct load control switches are
also interested in detecting the reduction in demand to efficiently pinpoint non-functioning devices to reduce maintenance costs. We develop sparse optimization methods for detecting a small change in the demand for electricity of a customer in response to a price change or signal from the utility, dynamic learning methods for scheduling the maintenance of direct load control switches whose operating state is not directly observable and can only be inferred from the metered electricity consumption, and machine learning methods for accurately forecasting the load of hundreds of thousands of residential, commercial and industrial customers. These algorithms have been implemented in the software system provided by AutoGrid, Inc., and this system has helped several utilities in the Pacific Northwest, Oklahoma, California and Texas, provide more reliable power to their customers at significantly reduced prices.
Providing power to widely spread out communities in developing countries using the conventional power grid is not economically feasible. The most attractive alternative source of affordable energy for these communities is solar micro-grids. We discuss risk-aware robust methods to optimally size and operate solar micro-grids in the presence of uncertain demand and uncertain renewable generation. These algorithms help system operators to increase their revenue while making their systems more resilient to inclement weather conditions
- …