
    Observational Evidence for Tidal Interaction in Close Binary Systems

    This paper reviews the rich corpus of observational evidence for tidal effects in short-period binaries. We review the evidence for ellipsoidal variability and for the observational manifestation of apsidal motion in eclipsing binaries. Among the long-term effects, circularization has been studied the most, and a transition period between circular and eccentric orbits has been derived for eight coeval samples of binaries. As binaries are expected to reach synchronization before circularization, one can expect to find eccentric binaries in a pseudo-synchronized state, the evidence for which is reviewed. The paper reviews the Rossiter-McLaughlin effect and its potential to probe spin-orbit alignment. We discuss the tidal interaction in close binaries that are orbited by a third, distant companion, and review the pumping of the binary eccentricity by the third star. We then discuss the idea that the tidal interaction induced by this eccentricity modulation can shrink the binary separation. The paper discusses extrasolar planets and the observational evidence for tidal interaction with their parent stars, which can induce radial drift of short-period planets and circularization of planetary orbits. The paper reviews the revolution in the study of binaries that is currently taking place, driven by large-scale photometric surveys that are detecting many thousands of new binaries and tens of extrasolar planets. In particular, we review several studies that have already used thousands of light curves of eclipsing binaries to study tidal circularization of early-type stars in the LMC. Comment: 67 pages. Review paper. To appear in "Tidal effects in stars, planets and disks", M.-J. Goupil and J.-P. Zahn (eds.), EAS Publications Series

    Clustering in the Big Data Era: methods for efficient approximation, distribution, and parallelization

    Data clustering is an unsupervised machine learning task whose objective is to group together similar items. As a versatile data mining tool, data clustering has numerous applications, such as object detection and localization using data from 3D laser-based sensors, finding popular routes using geolocation data, and finding similar patterns of electricity consumption using smart meters. The datasets in modern IoT-based applications are getting more and more challenging for conventional clustering schemes. Big Data is a term used to loosely describe hard-to-manage datasets. In particular, large numbers of data points, high rates of data production, large numbers of dimensions, high skewness, and distributed data sources are aspects that challenge classical data processing schemes, including clustering methods. This thesis contributes to efficient big data clustering for distributed and parallel computing architectures, representative of the processing environments in the edge-cloud computing continuum. The thesis also proposes approximation techniques to cope with certain challenging aspects of big data. Regarding distributed clustering, the thesis proposes MAD-C, abbreviating Multi-stage Approximate Distributed Cluster-Combining. MAD-C leverages an approximation-based data synopsis that drastically lowers the required communication bandwidth among the distributed nodes and achieves multiplicative savings in computation time, compared to a baseline that centrally gathers and clusters the data. The thesis shows MAD-C can be used to detect and localize objects with high accuracy using data from distributed 3D laser-based sensors. Furthermore, the work in the thesis shows how to utilize MAD-C to efficiently detect objects within a restricted area for geofencing purposes. Regarding parallel clustering, the thesis proposes a family of algorithms called PARMA-CC, abbreviating Parallel Multistage Approximate Cluster Combining.
Using approximation-based data synopses, PARMA-CC algorithms achieve scalability on multi-core systems by facilitating parallel execution of threads with limited dependencies, which are resolved using fine-grained synchronization techniques. To further enhance efficiency, PARMA-CC algorithms can be configured with respect to different data properties. Analytical and empirical evaluations show that PARMA-CC algorithms achieve significantly higher scalability than state-of-the-art methods while preserving high accuracy. On parallel high-dimensional clustering, the thesis proposes IP.LSH.DBSCAN, abbreviating Integrated Parallel Density-Based Clustering through Locality-Sensitive Hashing (LSH). IP.LSH.DBSCAN fuses the creation of an LSH index into the process of data clustering, and it takes advantage of data parallelization and fine-grained synchronization. Analytical and empirical evaluations show that IP.LSH.DBSCAN facilitates parallel density-based clustering of massive datasets using desired distance measures, resulting in several orders of magnitude lower latency than the state of the art for high-dimensional data. In essence, the thesis proposes methods and algorithmic implementations targeting the problem of big data clustering and applications using distributed and parallel processing. The proposed methods (available as open-source software) are extensible and can be used in combination with other methods.
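The LSH-plus-density idea behind IP.LSH.DBSCAN can be illustrated with a toy sketch. This is not the thesis' implementation: the function name, the random-hyperplane hash family, and the "dense bucket equals cluster" shortcut are all illustrative assumptions; the real algorithm integrates LSH index construction with a full DBSCAN-style expansion and fine-grained parallel synchronization.

```python
import numpy as np
from collections import defaultdict

def lsh_density_cluster(points, n_planes=8, min_pts=3, seed=0):
    """Toy sketch: bucket points with random-hyperplane LSH, then promote
    sufficiently populated buckets to clusters. Nearby points tend to fall
    on the same side of every random hyperplane, so they share a bucket."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((n_planes, points.shape[1]))
    # each point's signature: the sign pattern of its projections
    sigs = (points @ planes.T > 0).astype(int)
    buckets = defaultdict(list)
    for i, s in enumerate(sigs):
        buckets[tuple(s)].append(i)
    labels = -np.ones(len(points), dtype=int)  # -1 = noise
    cid = 0
    for members in buckets.values():
        if len(members) >= min_pts:  # dense bucket ~ core neighbourhood
            labels[members] = cid
            cid += 1
    return labels
```

The key design point the sketch preserves is that hashing replaces the all-pairs distance computations that dominate classical density-based clustering.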

    Synchronization Inspired Data Mining

    Advances in modern technologies produce huge amounts of data in various fields, increasing the need for efficient and effective data mining tools to uncover the information contained implicitly in the data. This thesis aims to propose innovative and solid algorithms for data mining from a novel perspective: synchronization. Synchronization is a prevalent phenomenon in nature in which a group of events spontaneously comes into co-occurrence with a common rhythm through mutual interactions. The mechanism of synchronization allows the control of complex processes by simple operations based on interactions between objects. The first main part of this thesis focuses on developing innovative algorithms for data mining. Inspired by the concept of synchronization, this thesis presents Sync (Clustering by Synchronization), a novel approach to clustering. In combination with the Minimum Description Length (MDL) principle, it allows discovering the intrinsic clusters without any data distribution assumptions or parameter settings. In addition, relying on the different dynamic behaviors of objects during the process towards synchronization, the algorithm SOD (Synchronization-based Outlier Detection) is further proposed. Outlier objects can be naturally flagged via the definition of the Local Synchronization Factor (LSF). To cure the curse of dimensionality in clustering, a subspace clustering algorithm, ORSC, is introduced, which automatically detects clusters in subspaces of the original feature space. This approach proposes a weighted local interaction model to ensure that all objects in a common cluster, which may reside in an arbitrarily oriented subspace, naturally move together. In order to reveal the underlying patterns in graphs, a graph partitioning approach, RSGC (Robust Synchronization-based Graph Clustering), is presented. The key philosophy of RSGC is to consider graph clustering as a dynamic process towards synchronization.
Inherited from the powerful concept of synchronization, RSGC shows several desirable properties that do not exist in other competing methods. For all presented algorithms, their efficiency and effectiveness are thoroughly analyzed. The benefits over traditional approaches are further demonstrated by evaluating them on synthetic as well as real-world data sets. Beyond the theoretical research on novel data mining algorithms, the second main part of the thesis focuses on brain network analysis based on Diffusion Tensor Images (DTI). A new framework for automated white matter tract clustering is first proposed to identify the meaningful fiber bundles in the human brain by combining ideas from time series mining with density-based clustering. Subsequently, enhancements and variations of this approach are discussed, allowing for a more robust, efficient, or effective way to find hierarchies of fiber bundles. Based on the structural connectivity network, an automated prediction framework is proposed to analyze and understand the abnormal patterns in patients with Alzheimer's disease.
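The synchronization idea behind Sync can be sketched minimally as follows. This is a simplification: the published algorithm uses a Kuramoto-style sine coupling and the MDL principle to choose the interaction range, whereas this toy version simply pulls each point toward the mean of its eps-neighbourhood until dense groups collapse onto a common location.

```python
import numpy as np

def sync_cluster(points, eps=0.5, steps=50, tol=1e-3):
    """Toy synchronization-based clustering: points repeatedly interact
    with their eps-neighbours and 'synchronize' onto shared locations;
    points that converge to the same location form one cluster."""
    x = points.astype(float).copy()
    for _ in range(steps):
        d = np.linalg.norm(x[:, None] - x[None, :], axis=2)
        nbr = d <= eps  # interaction partners (includes self)
        # pull each point toward the centre of its neighbourhood
        x = np.array([x[nbr[i]].mean(axis=0) for i in range(len(x))])
    labels = -np.ones(len(x), dtype=int)
    cid = 0
    for i in range(len(x)):
        if labels[i] == -1:
            same = np.linalg.norm(x - x[i], axis=1) < tol
            labels[same] = cid
            cid += 1
    return labels
```

The appeal the sketch illustrates is that cluster membership emerges from local dynamics alone, with no global distribution assumptions.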

    Enhanced non-parametric sequence learning scheme for internet of things sensory data in cloud infrastructure

    The Internet of Things (IoT) Cloud is an emerging technology that enables machine-to-machine, human-to-machine and human-to-human interaction through the Internet. IoT sensor devices tend to generate sensory data known for their dynamic and heterogeneous nature. This makes the data difficult to manage on the sensor devices themselves, due to their limited computation power and storage space. However, Cloud Infrastructure as a Service (IaaS) compensates for the limitations of IoT devices by making its computation power and storage resources available to process IoT sensory data. In IoT-Cloud IaaS, resource allocation is the process of distributing optimal resources to execute data request tasks that comprise data filtering operations. Recently, machine learning, non-heuristic, multi-objective and hybrid algorithms have been applied for efficient resource allocation to execute IoT sensory data filtering request tasks in IoT-enabled Cloud IaaS. However, the filtering task is still prone to several challenges: global search entrapment in event and error outlier detection as the dimension of the dataset increases, the inability to recover missing data for effective redundant data elimination, and local search entrapment that leads to unbalanced workloads on the resources required for task execution. In this thesis, enhancements of the Non-Parametric Sequence Learning (NPSL), Perceptually Important Point (PIP) and Efficient Energy Resource Ranking-Virtual Machine Selection (ERVS) algorithms are proposed. The Non-Parametric Sequence-based Agglomerative Gaussian Mixture Model (NPSAGMM) technique was first utilized to improve the detection of event and error outliers in the global space as the dimension of the dataset increases. Then, the Perceptually Important Points K-means-enabled Cosine and Manhattan (PIP-KCM) technique was employed to recover missing data and thus improve the elimination of duplicate sensed data records.
Finally, an Efficient Resource Balance Ranking-based Glowworm Swarm Optimization (ERBV-GSO) technique was used to resolve the local search entrapment for near-optimal solutions and to reduce workload imbalance on the resources available for task execution in the IoT-Cloud IaaS platform. Experiments were carried out using the NetworkX simulator, and the results of the N-PSAGMM, PIP-KCM and ERBV-GSO techniques were compared with the N-PSL, PIP, ERVS and Resource Fragmentation Aware (RF-Aware) algorithms. The experimental results showed that the proposed NPSAGMM, PIP-KCM, and ERBV-GSO techniques achieved improvement rates of 3.602%/6.74% in precision, 9.724%/8.77% in recall, and 5.350%/4.42% in area under the curve for the detection of event and error outliers. Furthermore, the results indicated an improvement rate of 94.273% F1-score, a 0.143 reduction ratio, and a minimum 0.149% root mean squared error for redundant data elimination, as well as a minimum of 608 virtual machine migrations, 47.62% resource utilization and a 41.13% load balancing degree for the allocation of the resources deployed to execute sensory data filtering tasks. Therefore, the proposed techniques have proven effective for improving the load balancing of allocated resources, detecting outliers (events and errors) efficiently, and eliminating redundant data records in the IoT-based Cloud IaaS infrastructure.
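The outlier-detection step can be illustrated with a simplified stand-in. The thesis' NPSAGMM fits a Gaussian mixture; the single-Gaussian Mahalanobis rule below is only an illustrative assumption, as are the function name and threshold, but it conveys the same idea of flagging sensor readings that lie far from the bulk of the data.

```python
import numpy as np

def mahalanobis_outliers(X, threshold=3.0):
    """Flag rows of X whose Mahalanobis distance from the sample mean
    exceeds `threshold` (a one-component simplification of a Gaussian
    mixture model's low-likelihood criterion)."""
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    inv = np.linalg.inv(cov)
    diff = X - mu
    # quadratic form diff^T inv diff, computed row-wise
    d2 = np.einsum('ij,jk,ik->i', diff, inv, diff)
    return np.sqrt(d2) > threshold
```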

    4D Nucleome of Cancer

    Chromosomal translocations and aneuploidy are hallmarks of cancer genomes; however, the impact of these aberrations on the nucleome (i.e., nuclear structure and gene expression) is not yet understood. This dissertation aims to understand the changes in nuclear structure and function that occur as a result of cancer, i.e., the 4D nucleome of cancer. Understanding nuclear shape and organization, and how they change over time in both healthy and cancer cells, is an area of exploration for the 4D Nucleome project. First, I explore healthy cells, including periodic changes in nuclear shape as fibroblast cells grow and divide. Shape and volume changed significantly over the time series, including a periodic frequency consistent with the cell cycle. Next, combined analysis of genome-wide chromosome conformation capture and RNA-sequencing data identified regions with different expression or interactions in cells grown in 2D or 3D cell culture. I then elucidate how chromosomal aberrations affect the nucleome of cancer cells. A high-copy-number region is studied, and we show that around sites of translocation, chromatin accessibility more directly reflects transcription. The methods developed, including a new copy-number-based normalization method, were released in the 4D Nucleome Analysis Toolbox (NAT), a publicly available MATLAB toolbox allowing others to use the tools for assessment of the nucleome. Finally, I describe continuing projects. By comparing cancer stem cells to non-stem-cell-like cancer cells, a bin on chromosome 8 was identified that includes two stem-cell-related transcription factors, POU5F1B and MYC. Tools for evaluating allele-specific expression are then developed and used to measure how allele-specific structure and function vary through the cell cycle.
This work creates a foundation for robust analysis of chromosome conformation and provides insight into the effect of nuclear organization in cancer. PhD, Bioinformatics, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/140814/1/laseaman_1.pd
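The intuition behind copy-number-based normalization of contact data can be sketched in one simple, hypothetical form (the NAT toolbox's actual scheme may differ): each contact-matrix entry is rescaled by the copy numbers of both interacting genomic bins, so an amplified region does not masquerade as a region of stronger interaction.

```python
import numpy as np

def copy_number_normalize(contacts, copy_number):
    """Divide each contact count by the product of the copy numbers of
    its two bins, removing the signal inflation that amplification alone
    would cause in a Hi-C style contact matrix."""
    cn = np.asarray(copy_number, dtype=float)
    return contacts / np.outer(cn, cn)
```

A quick sanity check of the idea: if raw contacts were exactly proportional to the product of bin copy numbers, normalization should flatten the matrix to a constant.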

    Intelligent Sensor Networks

    In the last decade, wireless or wired sensor networks have attracted much attention. However, most designs target general sensor network issues including protocol stack (routing, MAC, etc.) and security issues. This book focuses on the close integration of sensing, networking, and smart signal processing via machine learning. Based on their world-class research, the authors present the fundamentals of intelligent sensor networks. They cover sensing and sampling, distributed signal processing, and intelligent signal learning. In addition, they present cutting-edge research results from leading experts

    Towards Comprehensive Foundations of Computational Intelligence

    Abstract. Although computational intelligence (CI) covers a vast variety of different methods, it still lacks an integrative theory. Several proposals for CI foundations are discussed: computing and cognition as compression, meta-learning as search in the space of data models, (dis)similarity-based methods providing a framework for such meta-learning, and a more general approach based on chains of transformations. Many useful transformations that extract information from features are discussed. Heterogeneous adaptive systems are presented as a particular example of transformation-based systems, and the goal of learning is redefined to facilitate the creation of simpler data models. The need to understand data structures leads to techniques for logical and prototype-based rule extraction and to the generation of multiple alternative models, while the need to increase the predictive power of adaptive models leads to committees of competent models. Learning from partial observations is a natural extension towards reasoning based on perceptions, and an approach to intuitive solving of such problems is presented. Throughout the paper, neurocognitive inspirations are frequently used and are especially important in modeling higher cognitive functions. Promising directions such as liquid and laminar computing are identified, and many open problems are presented.
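The "chains of transformations" view can be made concrete with a tiny sketch. The specific transformations shown are hypothetical examples, not ones taken from the paper; the point is only that a data model can be expressed as a composition of simple feature transformations.

```python
from functools import reduce
import numpy as np

def chain(*transforms):
    """Compose transformations left to right: the model applies each
    transformation in turn, mirroring the chain-of-transformations view
    of adaptive systems."""
    return lambda x: reduce(lambda v, t: t(v), transforms, x)

# hypothetical example chain: centre the data, scale it, keep one feature
centre = lambda x: x - x.mean(axis=0)
scale = lambda x: x / x.std(axis=0)
first_axis = lambda x: x[:, 0]

model = chain(centre, scale, first_axis)
```

Swapping any link in the chain (a different scaling, a learned projection) yields a different model in the same search space, which is the meta-learning picture the paper describes.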

    Advanced analysis and visualisation techniques for atmospheric data

    Atmospheric science is the study of a large, complex system which is becoming increasingly important to understand. There are many climate models which aim to contribute to that understanding through computational simulation of the atmosphere. Generating these models, and confirming the accuracy of their outputs, requires the collection of large amounts of data. These data are typically gathered during campaigns lasting a few weeks, during which various sources of measurements are used. Some are ground based, others are airborne sondes, but one of the primary sources is measurement instruments on board aircraft. Flight planning for the numerous sorties is based on pre-determined goals with unpredictable influences, such as weather patterns, and on the results of limited analyses of data from previous sorties. There is little scope for adjusting the flight parameters during a sortie based on the data received, due to the large volumes of data and the difficulty of processing the data online. The introduction of unmanned aircraft with extended flight durations also requires a team of mission scientists, with the added complication of disseminating observations between shifts. Earth's atmosphere is a non-linear system, yet the data gathered are sampled at discrete temporal and spatial intervals, introducing a source of variance. Clustering provides a convenient way of grouping similar data while also acknowledging that, for each discrete sample, a minor shift in time and/or space could produce a range of values which lie within its cluster region. This thesis puts forward a set of requirements to enable the presentation of cluster analyses to the mission scientist in a convenient and functional manner. This will enable in-flight decision making as well as rapid feedback for future flight planning. Current state-of-the-art clustering algorithms are analysed, and none is found to satisfy all of the proposed requirements.
New clustering algorithms are developed to achieve these goals. These novel clustering algorithms are brought together, along with other visualisation techniques, into a software package which is used to demonstrate how the analyses can provide information to mission scientists in flight. The ability to carry out offline analyses on historical data, whether to reproduce the online analyses of the current sortie or to provide comparative analyses from previous missions, is also demonstrated. Methods for offline analyses of historical data prior to continuing the analyses in an online manner are also considered. The original contributions of this thesis are the development of five new clustering algorithms which address key challenges: speed and accuracy for typical hyper-elliptical offline clustering; speed and accuracy for offline arbitrarily shaped clusters; online dynamic and evolving clustering for arbitrarily shaped clusters; transitions between offline and online techniques; and the application of these techniques to atmospheric science data analysis.
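The online, evolving style of clustering the thesis targets can be sketched with a toy micro-cluster scheme. This is an illustrative assumption, not one of the thesis' five algorithms: each incoming sample either joins the nearest micro-cluster (if within a radius) or seeds a new one, so the model adapts point by point, as would be needed during a sortie.

```python
import numpy as np

class OnlineMicroClusters:
    """Toy online clustering: maintain running centroids; assign each
    new sample to the nearest centroid within `radius`, updating that
    centroid incrementally, or start a new micro-cluster otherwise."""
    def __init__(self, radius=1.0):
        self.radius = radius
        self.centres = []  # running centroids
        self.counts = []   # samples absorbed per centroid

    def add(self, x):
        x = np.asarray(x, dtype=float)
        if self.centres:
            d = [np.linalg.norm(x - c) for c in self.centres]
            i = int(np.argmin(d))
            if d[i] <= self.radius:
                # incremental centroid update (running mean)
                self.counts[i] += 1
                self.centres[i] += (x - self.centres[i]) / self.counts[i]
                return i
        self.centres.append(x.copy())
        self.counts.append(1)
        return len(self.centres) - 1
```

Chains of adjacent micro-clusters can later be merged offline, which is one common way such schemes recover arbitrarily shaped clusters.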