12 research outputs found

    Time series data mining: preprocessing, analysis, segmentation and prediction. Applications

    Get PDF
    Currently, the amount of data produced by information systems is increasing exponentially, which motivates the development of automatic techniques to process and mine these data correctly. Specifically, this Thesis tackles these problems for time series data, that is, temporal data collected chronologically. This kind of data can be found in many fields of science, such as palaeoclimatology, hydrology and finance. Time series data mining (TSDM) comprises several tasks with different objectives, such as classification, segmentation, clustering, prediction and analysis; this Thesis focuses on time series preprocessing, segmentation and prediction. Preprocessing is a prerequisite for subsequent tasks: for example, the reconstruction of missing values in incomplete parts of a time series can be essential for clustering. In this Thesis, we tackled the problem of massive missing-data reconstruction in significant wave height (SWH) time series from the Gulf of Alaska. It is very common for buoys to stop working for certain periods, usually because of malfunctioning or bad weather conditions. The relations among the time series of the different buoys are analysed and exploited to reconstruct the missing stretches. In this context, evolutionary artificial neural networks (EANNs) with product units (PUs) are trained, showing that the resulting models are simple and able to recover these values with high precision. Time series segmentation consists in dividing the time series into subsequences to achieve different purposes, for instance finding useful patterns in the time series. In this Thesis, we have developed novel bioinspired algorithms in this context. For paleoclimate data, an initial genetic algorithm (GA) was proposed to discover early warning signals of tipping points (TPs), whose detection was supported by expert opinions.
However, given that the expert had to evaluate every solution returned by the algorithm individually, assessing the results was very tedious. This led to an improvement of the GA so that solutions could be evaluated automatically. For significant wave height time series, the objective was the detection of groups that contain extreme waves, i.e. those which are relatively large with respect to other waves close in time, the main motivation being the design of alert systems. This was done using a hybrid algorithm (HA) in which a local search (LS) process was included by means of a likelihood-based segmentation, assuming that the points follow a beta distribution. Finally, the analysis of similarities between different periods of European stock markets was also tackled, with the aim of evaluating the influence of the different markets in Europe. When segmenting time series with the aim of reducing the number of points, different techniques have been proposed; however, this remains an open challenge given the difficulty of operating with large amounts of data in different applications. In this work, we propose a novel statistically driven coral reef optimisation algorithm (SCRO), which automatically adapts its parameters during the evolution, taking into account the statistical distribution of the population fitness. This algorithm improves on the state of the art in terms of accuracy and robustness. The problem has also been tackled using an improved BBPSO algorithm, which dynamically updates the cognitive and social components during the evolution and exploits mathematical simplifications when computing the fitness of the solutions, significantly reducing the computational cost of previously proposed coral reef methods. In addition, the joint optimisation of the two conflicting objectives (clustering quality and approximation quality) is an interesting open challenge, which is also tackled in this Thesis.
For that purpose, a multi-objective evolutionary algorithm (MOEA) for time series segmentation is developed, improving both the clustering quality of the solutions and their approximation quality. Time series prediction is the estimation of future values by observing and studying previous ones. In this context, we solve this task by applying prediction over high-order representations of the elements of the time series, i.e. the segments obtained by time series segmentation. This is applied to two challenging problems: the prediction of extreme wave height and fog prediction. On the one hand, the number of extreme values in SWH time series is much smaller than the number of standard values, so these values cannot be predicted with standard algorithms without taking the imbalance ratio of the dataset into account. For that, an algorithm that automatically finds the set of segments and then applies EANNs is developed, showing a high ability to detect and predict these special events. On the other hand, fog prediction suffers from the same problem, that is, the number of fog events is much lower than that of non-fog events, and therefore also requires special treatment. A preprocessing of data coming from sensors situated in different parts of the Valladolid airport is used to build a simple artificial neural network (ANN) model, which is physically corroborated and discussed. The last challenge, which opens new horizons, is the estimation of the statistical distribution of a time series to guide different methodologies. Here, the estimation of a mixed distribution for SWH time series is used to fix the threshold of peaks-over-threshold (POT) approaches. Also, the determination of the best-fitting distribution for the time series is used to discretise it and make a prediction which treats the problem as ordinal classification. The work developed in this Thesis is supported by twelve papers in international journals, seven papers in international conferences, and four papers in national conferences.
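    The peaks-over-threshold idea mentioned above can be illustrated with a minimal sketch. The thesis estimates a mixed statistical distribution to fix the threshold; here, as a simplified stand-in, the threshold is taken as a high empirical quantile of synthetic wave-height-like data (all values below are illustrative, not from the thesis).

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic wave-height-like series: an illustrative stand-in for SWH data.
swh = rng.weibull(2.0, size=5000) * 2.5

# Simplified threshold choice: a high empirical quantile stands in for the
# threshold the thesis derives from a fitted mixed distribution.
threshold = np.quantile(swh, 0.95)

# Peaks over threshold: the exceedances an extreme-value model would use.
exceedances = swh[swh > threshold] - threshold
print(len(exceedances) / len(swh))  # close to 0.05 by construction
```

    By construction, roughly 5% of the points exceed the threshold; an extreme-value distribution would then be fitted to the exceedances alone.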

    Data mining and exploration of the Nuclear Science References

    Get PDF
    124 leaves : col. ill. ; 29 cm. Includes abstract and appendix. Includes bibliographical references (leaves 122-124). The Nuclear Science References (NSR) is a carefully curated bibliographic dataset focused on nuclear science literature. A domain-specific search engine and supporting tools have been developed to aid and encourage the exploration of the NSR. User queries are analyzed to form a series of filters to retrieve relevant NSR entries from a database. The resulting information is presented in multiple views including lists, bar charts, and network graphs. The network graph representations offer unique insights on collaborations centered around a given parameter such as a nuclide or group of authors. The capability of clustering algorithms to expose trends within the dataset is demonstrated by clustering authors based on publication traits. A vector space model based on the metadata provided in the NSR is used to recommend semantically similar NSR entries. The completed work serves as both an example and a framework for future analysis of the NSR.
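    A vector space model over entry metadata, as described above, can be sketched in a few lines: each entry becomes a term-count vector and recommendations come from cosine similarity. The entry keys and keyword strings below are hypothetical, not actual NSR records.

```python
import numpy as np

# Toy NSR-like metadata: keyword strings per entry (hypothetical examples).
entries = {
    "2010AB01": "beta decay half-life 60Co",
    "2011CD02": "beta decay Q-value 60Co levels",
    "2012EF03": "fusion cross section heavy ion",
}

# Build a vocabulary and map each term to a vector dimension.
vocab = sorted({w for text in entries.values() for w in text.split()})
index = {w: i for i, w in enumerate(vocab)}

def vectorise(text):
    """Turn a metadata string into a term-count vector."""
    v = np.zeros(len(vocab))
    for w in text.split():
        v[index[w]] += 1.0
    return v

def recommend(key):
    """Return the entry most cosine-similar to the given one."""
    q = vectorise(entries[key])
    scores = {}
    for other, text in entries.items():
        if other == key:
            continue
        v = vectorise(text)
        scores[other] = q @ v / (np.linalg.norm(q) * np.linalg.norm(v))
    return max(scores, key=scores.get)

print(recommend("2010AB01"))  # the entry sharing "beta decay" and "60Co"
```

    A production system would weight terms (e.g. TF-IDF) and index vectors for fast lookup, but the recommendation principle is the same.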

    Towards Collaborative Scientific Workflow Management System

    Get PDF
    The big data explosion has impacted several domains in recent years, from research areas to a diverse range of business models. As this intensive amount of data opens up the possibility of several interesting knowledge discoveries, over the past few years many research domains have shifted towards analyzing these massive amounts of data. Scientific Workflow Management Systems (SWfMSs) have gained much popularity in recent years for accelerating such data-intensive analyses, visualization, and discovery of important information. Data-intensive tasks are often significantly time-consuming and complex in nature, and hence SWfMSs are designed to efficiently support the specification, modification, execution, failure handling, and monitoring of the tasks in a scientific workflow. As far as the complexity, dimension, and volume of data are concerned, their effective analysis or management often becomes challenging for an individual and instead requires the collaboration of multiple scientists. Hence the notion of a 'Collaborative SWfMS' was coined, which has gained significant interest among researchers in recent years, as none of the existing SWfMSs directly supports real-time collaboration among scientists. In collaborative SWfMSs, consistency management in the face of conflicting concurrent operations by the collaborators is a major challenge because of the highly interconnected document structure among the computational modules, where any minor change in one part of the workflow can strongly impact another part of the collaborative workflow through the data-link relations among them.
In addition to consistency management, studies show several other challenges that need to be addressed for a successful design of collaborative SWfMSs, such as sub-workflow composition and execution by different sub-groups, the relationship between scientific workflows and collaboration models, sub-workflow monitoring, and seamless integration and access control of the workflow components among collaborators. In this thesis, we propose a locking scheme to facilitate consistency management in collaborative SWfMSs. The proposed method works by locking workflow components at a granular attribute level, in addition to supporting locks on a targeted part of the collaborative workflow. We conducted several experiments to analyze the performance of the proposed method in comparison to related existing methods. Our studies show that the proposed method can reduce the average waiting time of a collaborator by up to 36% while increasing the average workflow update rate by up to 15% in comparison to existing descendant modular-level locking techniques for collaborative SWfMSs. We also propose a role-based access control technique for the management of collaborative SWfMSs, leveraging the Collaborative Interactive Application Methodology (CIAM) to investigate role-based access control in the context of collaborative SWfMSs. We present the proposed method with a use case from the Plant Phenotyping and Genotyping research domain. Recent studies show that collaborative SWfMSs present different sets of opportunities and challenges. From our investigation of existing research on collaborative SWfMSs and the findings of our two prior studies, we propose an architecture for collaborative SWfMSs. As a proof of concept of the proposed architecture we present SciWorCS, a Collaborative Scientific Workflow Management System, which is the first of its kind to the best of our knowledge. We present several real-world use cases of scientific workflows using SciWorCS.
Finally, we conduct several user studies using SciWorCS, comprising different real-world scientific workflows (i.e., from myExperiment), to understand user behavior and styles of work in the context of collaborative SWfMSs. In addition to evaluating SciWorCS, the user studies reveal several interesting facts which can significantly contribute to the research domain, as none of the existing methods considered such empirical studies, relying instead only on computer-generated simulated studies for evaluation.
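    The idea of attribute-level locking can be sketched as follows: conflicts are detected per (module, attribute) pair, so two collaborators editing different attributes of the same module never block each other. This is a minimal illustration of the granularity idea, not the thesis implementation; the class, collaborator, and module names are invented for the example.

```python
import threading

class WorkflowLockManager:
    """Grants locks at the granularity of a single module attribute,
    so collaborators editing different attributes of the same module
    proceed concurrently (a sketch of the idea, not the thesis code)."""

    def __init__(self):
        self._guard = threading.Lock()      # protects the lock table itself
        self._held = {}                     # (module, attribute) -> collaborator

    def acquire(self, collaborator, module, attribute):
        """Return True if the lock is granted (re-entrant for its holder)."""
        with self._guard:
            key = (module, attribute)
            owner = self._held.get(key)
            if owner is None or owner == collaborator:
                self._held[key] = collaborator
                return True
            return False                    # attribute held by someone else

    def release(self, collaborator, module, attribute):
        with self._guard:
            if self._held.get((module, attribute)) == collaborator:
                del self._held[(module, attribute)]

mgr = WorkflowLockManager()
assert mgr.acquire("alice", "align_reads", "input_path")
assert mgr.acquire("bob", "align_reads", "threads")        # same module, other attribute
assert not mgr.acquire("bob", "align_reads", "input_path")  # conflicts with alice
```

    Coarser locks on a targeted part of the workflow, as the thesis also supports, could be layered on top by treating a sub-workflow identifier as the key.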

    Algorithms in E-recruitment Systems

    Get PDF

    Bionano-Interfaces through Peptide Design

    Get PDF
    The clinical success of restoring bone and tooth function through implants critically depends on the maintenance of an infection-free, integrated interface between the host tissue and the biomaterial surface. Surgical site infections, i.e. infections occurring within one year of surgery, arise in approximately 160,000-300,000 cases in the US annually. Antibiotics are the conventional treatment for the prevention of infections, but they are becoming ineffective as widespread use drives bacterial antibiotic resistance. There is an urgent need both to combat bacterial drug resistance through new antimicrobial agents and to limit the spread of drug resistance by restricting their delivery to the implant site. This work aims to reduce surgical site infections from implants by designing chimeric antimicrobial peptides that integrate a novel and effective delivery method. In recent years, antimicrobial peptides (AMPs) have attracted interest as natural sources for new antimicrobial agents. As part of the immune system in all life forms, they are examples of antibacterial agents whose efficacy has been successfully maintained across evolutionary time. Both natural and synthetic AMPs show significant promise for solving the antibiotic resistance problem. In this work, AMP1 and AMP2 were shown to be active against three different strains of pathogens in Chapter 4; in the literature, these peptides have been shown to be effective against multi-drug-resistant bacteria. However, their effective delivery to the implantation site limits their clinical use. In recent years, different groups have adapted covalent-chemistry-based or non-specific physical adsorption methods for antimicrobial peptide coatings on implant surfaces. Many of these procedures use harsh chemical conditions requiring multiple reaction steps. Furthermore, none of these methods allows control of the orientation of the molecules on the surfaces, which is an essential consideration for biomolecules.
In the last few decades, solid-binding peptides have attracted high interest due to their material specificity and self-assembly properties. These peptides offer robust surface adsorption and assembly in diverse applications. In this work, a design method for chimeric antimicrobial peptides that can self-assemble and self-orient onto biomaterial surfaces was demonstrated. Three specific aims address this two-fold strategy of self-assembly and self-orientation: 1) develop classification and design methods using rough set theory and a genetic algorithm search to customize antibacterial peptides; 2) develop chimeric peptides by designing spacer sequences to improve the activity of antimicrobial peptides on titanium surfaces; 3) verify the approach as an enabling technology by expanding the chimeric design approach to other biomaterials. In Aim 1, a peptide classification tool was developed because selecting an antimicrobial peptide for a given application is difficult among the thousands of available peptide sequences. A rule-based rough set theory classification algorithm was developed to group antimicrobial peptides by chemical properties; this is the first time rough set theory has been applied to peptide activity analysis. The classification method achieved low false discovery rates on benchmark data sets. The novel rough set theory method was combined with a novel genetic algorithm search, resulting in a method for customizing active antibacterial peptides using sequence-based relationships. Inspired by the fact that spacer sequences play critical roles between functional protein domains, in Aim 2 chimeric peptides were designed to combine solid-binding functionality with antimicrobial functionality. To improve how these functions worked together in the same peptide sequence, new spacer sequences were engineered.
The rough set theory method from Aim 1 was used to find structure-based relationships and thereby discover new spacer sequences which improved the antimicrobial activity of the chimeric peptides. In Aim 3, the proposed approach is demonstrated as an enabling technology: chimeric antimicrobial self-assembling peptides were tested on calcium phosphate, verifying the modularity of the approach, and further chimeric peptides were designed for the common biomaterials zirconia and urethane polymer. Finally, an antimicrobial peptide was engineered for a dental adhesive system, applying the spacer design concepts to optimize antimicrobial activity.
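    The core rough set notion behind such a rule-based classifier is the lower approximation: objects whose indiscernibility class (same condition attributes) falls entirely within a decision class are certainly members of it, and yield exact rules. The toy decision table below, with invented peptide attributes, illustrates the computation; it is not the thesis's actual data or method.

```python
from collections import defaultdict

# Toy decision table for peptides (hypothetical attributes):
# (net_charge, hydrophobicity) -> activity label.
table = [
    (("pos", "high"), "active"),
    (("pos", "high"), "active"),
    (("pos", "low"),  "active"),
    (("pos", "low"),  "inactive"),
    (("neg", "low"),  "inactive"),
]

# Group objects into indiscernibility classes by their condition attributes.
blocks = defaultdict(list)
for i, (cond, dec) in enumerate(table):
    blocks[cond].append(i)

# Lower approximation of "active": classes whose members are all active.
lower = [i for cond, idx in blocks.items()
         if all(table[j][1] == "active" for j in idx)
         for i in idx]
print(sorted(lower))  # objects certainly active: [0, 1]
```

    Objects 2 and 3 share the same attributes but differ in label, so they fall only in the upper approximation; a rough set classifier would emit a certain rule from the lower approximation ("pos charge and high hydrophobicity implies active") and only possible rules from the boundary.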

    LIPIcs, Volume 244, ESA 2022, Complete Volume

    Get PDF
    LIPIcs, Volume 244, ESA 2022, Complete Volume

    Proceedings of the 23rd International Conference of the International Federation of Operational Research Societies

    Full text link