2,776 research outputs found
Inductive queries for a drug designing robot scientist
It is increasingly clear that machine learning algorithms need to be integrated in an iterative scientific discovery loop, in which data is queried repeatedly by means of inductive queries and where the computer provides guidance to the experiments that are being performed. In this chapter, we summarise several key challenges in achieving this integration of machine learning and data mining algorithms in methods for the discovery of Quantitative Structure Activity Relationships (QSARs). We introduce the concept of a robot scientist, in which all steps of the discovery process are automated; we discuss the representation of molecular data such that knowledge discovery tools can analyse it, and we discuss the adaptation of machine learning and data mining algorithms to guide QSAR experiments
Recursion Aware Modeling and Discovery For Hierarchical Software Event Log Analysis (Extended)
This extended paper presents 1) a novel hierarchy and recursion extension to
the process tree model; and 2) the first, recursion aware process model
discovery technique that leverages hierarchical information in event logs,
typically available for software systems. This technique allows us to analyze
the operational processes of software systems under real-life conditions at
multiple levels of granularity. The work can be positioned in-between reverse
engineering and process mining. An implementation of the proposed approach is
available as a ProM plugin. Experimental results based on real-life (software)
event logs demonstrate the feasibility and usefulness of the approach and show
the huge potential to speed up discovery by exploiting the available hierarchy.Comment: Extended version (14 pages total) of the paper Recursion Aware
Modeling and Discovery For Hierarchical Software Event Log Analysis. This
Technical Report version includes the guarantee proofs for the proposed
discovery algorithm
TGSum: Build Tweet Guided Multi-Document Summarization Dataset
The development of summarization research has been significantly hampered by
the costly acquisition of reference summaries. This paper proposes an effective
way to automatically collect large scales of news-related multi-document
summaries with reference to social media's reactions. We utilize two types of
social labels in tweets, i.e., hashtags and hyper-links. Hashtags are used to
cluster documents into different topic sets. Also, a tweet with a hyper-link
often highlights certain key points of the corresponding document. We
synthesize a linked document cluster to form a reference summary which can
cover most key points. To this aim, we adopt the ROUGE metrics to measure the
coverage ratio, and develop an Integer Linear Programming solution to discover
the sentence set reaching the upper bound of ROUGE. Since we allow summary
sentences to be selected from both documents and high-quality tweets, the
generated reference summaries could be abstractive. Both informativeness and
readability of the collected summaries are verified by manual judgment. In
addition, we train a Support Vector Regression summarizer on DUC generic
multi-document summarization benchmarks. With the collected data as extra
training resource, the performance of the summarizer improves a lot on all the
test sets. We release this dataset for further research.Comment: 7 pages, 1 figure in AAAI 201
Identification of Tumor Evolution Patterns by Means of Inductive Logic Programming
In considering key events of genomic disorders in the development and progression of cancer, the correlation between genomic instability and carcinogenesis is currently under investigation. In this work, we propose an inductive logic programming approach to the problem of modeling evolution patterns for breast cancer. Using this approach, it is possible to extract fingerprints of stages of the disease that can be used in order to develop and deliver the most adequate therapies to patients. Furthermore, such a model can help physicians and biologists in the elucidation of molecular dynamics underlying the aberrations-waterfall model behind carcinogenesis. By showing results obtained on a real-world dataset, we try to give some hints about further approach to the knowledge-driven validations of such hypotheses
Argumentation for machine learning: a survey
Existing approaches using argumentation to aid or improve machine learning differ in the type of machine learning technique they consider, in their use of argumentation and in their choice of argumentation framework and semantics. This paper presents a survey of this relatively young field highlighting, in particular, its achievements to date, the applications it has been used for as well as the benefits brought about by the use of argumentation, with an eye towards its future
- …