Advances in knowledge discovery and data mining Part II
19th Pacific-Asia Conference, PAKDD 2015, Ho Chi Minh City, Vietnam, May 19-22, 2015, Proceedings, Part II
Recommended from our members
Analyzing, Mining, and Predicting Networked Behaviors
Network structure exists in various types of real-world data, such as online and offline social networks, traffic networks, computer networks, brain networks, and countless other cases where there are relationships between different entities in the data. What are the roles of network structures in these data? First, the network captures inherent characteristics of the data themselves. This is clear from the definition of the network, which represents the relationships between entities: e.g., the social links among people in a social network describe how they interact with each other; a road network summarizes how the roads are laid out geographically; a brain network obtained from fMRI images represents pairs of brain regions that are active at the same time; a computer network constrains the paths via which internet packets, and thus information or viruses, can spread. Second, the network structures affect the evolution of the data over time. For example, new friendship links in an online social network are frequently created between friends of friends. Similarly, the current road network structure is without a doubt taken into consideration when roads are added or temporarily closed. As we grow, our brains also grow, including the addition of useful links or the cleanup of unnecessary links between brain regions. Third, the network structures act as guidance for many different processes happening in the data. For instance, the links between users on a social network dictate how gossip can spread; the roads influence how traffic flows in a city; the links between brain regions affect the way we think and how effectively we do things; the connections between computers route the transfer of any information on the internet.
In this thesis, I studied the network effect in various networked behaviors, including analyzing this effect, finding its patterns, and predicting future networked behaviors. First, I gained insights into the data by analyzing the accompanying network structures as well as their evolution. Second, I proposed algorithms for mining different network patterns that help summarize the effect of the network structures on different networked behaviors. Finally, I proposed models to predict the evolution of networked behaviors over time. Toward these tasks, I explored a wide variety of network data, including protein-protein interaction networks, online social networks, collaboration networks, chemical compounds, and traffic networks. Overall, I tackled these network data from different aspects and developed a number of methods for effectively mining and forecasting networked behaviors in data.
AVATAR - Machine Learning Pipeline Evaluation Using Surrogate Model
© 2020, The Author(s). The evaluation of machine learning (ML) pipelines is essential during automatic ML pipeline composition and optimisation. Previous methods such as Bayesian-based and genetic-based optimisation, which are implemented in Auto-Weka, Auto-sklearn and TPOT, evaluate pipelines by executing them. Pipeline composition and optimisation with these methods therefore require a tremendous amount of time, which prevents them from exploring complex pipelines to find better predictive models. To further explore this research challenge, we have conducted experiments showing that many of the generated pipelines are invalid, and that it is unnecessary to execute them to find out whether they are good pipelines. To address this issue, we propose a novel method to evaluate the validity of ML pipelines using a surrogate model (AVATAR). AVATAR accelerates automatic ML pipeline composition and optimisation by quickly ignoring invalid pipelines. Our experiments show that AVATAR is more efficient in evaluating complex pipelines than traditional evaluation approaches that require their execution.
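The idea in the abstract above, rejecting invalid pipelines without executing them, can be illustrated with a small sketch. This is not the AVATAR surrogate model itself: the component names, the `COMPONENTS` capability table, and the data-property labels are all hypothetical and chosen only for illustration. The sketch assumes each component declares which data properties it cannot handle and which ones it removes, and then walks a candidate pipeline over a set of property labels instead of over real data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """Hypothetical description of one pipeline step."""
    name: str
    cannot_handle: frozenset  # data properties that make this step fail
    removes: frozenset        # data properties this step eliminates


# Hypothetical capability table (an assumption, not taken from the paper).
COMPONENTS = {
    "imputer": Component("imputer", frozenset(), frozenset({"missing_values"})),
    "one_hot": Component("one_hot", frozenset({"missing_values"}),
                         frozenset({"categorical"})),
    "linear_model": Component("linear_model",
                              frozenset({"missing_values", "categorical"}),
                              frozenset()),
}


def is_valid(pipeline, data_properties):
    """Check a pipeline against property labels only; no data is processed."""
    props = set(data_properties)
    for name in pipeline:
        comp = COMPONENTS[name]
        if props & comp.cannot_handle:
            return False  # this step would fail on data with these properties
        props -= comp.removes  # simulate the step's effect on the properties
    return True


def filter_valid(candidates, data_properties):
    """Keep only the candidates worth executing for real."""
    return [p for p in candidates if is_valid(p, data_properties)]


if __name__ == "__main__":
    props = {"missing_values", "categorical"}
    candidates = [
        ["imputer", "one_hot", "linear_model"],
        ["one_hot", "linear_model"],
    ]
    print(filter_valid(candidates, props))
```

In a sketch like this, the check costs a few set operations per component, so an optimiser could discard invalid candidates before spending time training them.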
SIS 2017. Statistics and Data Science: new challenges, new generations
The 2017 SIS Conference aims to highlight the crucial role of Statistics in Data Science. In this new domain of ‘meaning’ extracted from data, the increasing amount of data produced and made available in databases has brought new challenges. These involve different fields: statistics, machine learning, information and computer science, optimization, and pattern recognition. Together, these fields make a considerable contribution to the analysis of ‘Big data’, open data, and relational and complex data, both structured and unstructured. The aim is to collect contributions from the different domains of Statistics on high-dimensional data quality validation, sampling extraction, dimensionality reduction, pattern selection, data modelling, testing hypotheses, and confirming conclusions drawn from the data.