14,033 research outputs found

    How to solve it by knowledge mining

    Get PDF
    Frequently we are amazed by the increasing number of problems to be solved that flourish in our daily activities. Often, related to these problems, we also have an incredible amount of data. Since we cannot always afford the time and resources to solve them, we keep on gathering and storing data in large databases, widening the gap between raw and interpreted data. At this point we should reflect on Polya's maxim "A great discovery solves a great problem" and realize that databases encompass the knowledge necessary for guiding the decision-making process. The question that remains is how to organize and explore this knowledge. This paper presents some approaches to knowledge discovery in databases found in the literature, analyzing issues in classifying and clustering large data sets. Track: 2nd Workshop on Theoretical Aspects of Artificial Intelligence. Red de Universidades con Carreras en Informática (RedUNCI).
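
    The classification and clustering issues the paper surveys can be made concrete with a small sketch. The snippet below is a minimal, illustrative example of clustering a large table of records with scikit-learn's MiniBatchKMeans, which processes data in small batches and so scales to databases too large to handle at once; the synthetic data, cluster count, and batch size are all assumptions for illustration, not choices taken from the paper.

        # Minimal clustering sketch for large data sets (illustrative parameters).
        from sklearn.cluster import MiniBatchKMeans
        from sklearn.datasets import make_blobs

        # Stand-in for a large database of raw, uninterpreted records.
        X, _ = make_blobs(n_samples=100_000, centers=5, n_features=10, random_state=0)

        # Cluster in mini-batches rather than loading everything at once.
        model = MiniBatchKMeans(n_clusters=5, batch_size=4096, random_state=0)
        labels = model.fit_predict(X)

        print(labels[:10])                   # cluster assignment of the first records
        print(model.cluster_centers_.shape)  # (5, 10): one prototype per cluster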

    AI Solutions for MDS: Artificial Intelligence Techniques for Misuse Detection and Localisation in Telecommunication Environments

    Get PDF
    This report considers the application of Artificial Intelligence (AI) techniques to the problem of misuse detection and misuse localisation within telecommunications environments. A broad survey of techniques is provided, covering inter alia rule-based systems, model-based systems, case-based reasoning, pattern matching, clustering and feature extraction, artificial neural networks, genetic algorithms, artificial immune systems, agent-based systems, data mining and a variety of hybrid approaches. The report then considers the central issue of event correlation, which is at the heart of many misuse detection and localisation systems. The notion of being able to infer misuse by the correlation of individual temporally distributed events within a multiple data stream environment is explored, along with a range of techniques covering model-based approaches, `programmed' AI and machine learning paradigms. It is found that, in general, correlation is best achieved via rule-based approaches, but that these suffer from a number of drawbacks, such as the difficulty of developing and maintaining an appropriate knowledge base, and the lack of ability to generalise from known misuses to new, unseen misuses. Two distinct approaches are evident. One attempts to encode knowledge of known misuses, typically within rules, and uses this to screen events. This approach cannot generally detect misuses for which it has not been programmed, i.e. it is prone to issuing false negatives. The other attempts to `learn' the features of event patterns that constitute normal behaviour and, by observing patterns that do not match expected behaviour, detect when a misuse has occurred. This approach is prone to issuing false positives, i.e. inferring misuse from innocent patterns of behaviour that the system was not trained to recognise. Contemporary approaches are seen to favour hybridisation, often combining detection or localisation mechanisms for both abnormal and normal behaviour, the former to capture known cases of misuse, the latter to capture unknown cases. In some systems, these mechanisms even update each other to increase detection rates and lower false-positive rates. It is concluded that hybridisation offers the most promising future direction, but that a rule- or state-based component is likely to remain, being the most natural approach to the correlation of complex events. The challenge, then, is to mitigate the weaknesses of canonical programmed systems such that learning, generalisation and adaptation are more readily facilitated.
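
    The hybrid architecture the report favours can be sketched in a few lines: a rule-based screen encodes known misuses (prone to false negatives), while a model of normal behaviour flags deviations (prone to false positives). In the minimal sketch below, the event features, the example rule, and the use of scikit-learn's IsolationForest as the learned component are all assumptions for illustration, not the report's own design.

        # Hybrid misuse detection sketch: rules for known misuse,
        # an anomaly model for deviations from learned normal behaviour.
        import numpy as np
        from sklearn.ensemble import IsolationForest

        def rule_based_screen(event):
            """Known-misuse rule, e.g. too many failed logins (hypothetical)."""
            return event["failed_logins"] > 10

        # Train the anomaly model on normal traffic only.
        rng = np.random.default_rng(0)
        normal = rng.normal(loc=[2.0, 20.0], scale=[1.0, 5.0], size=(1000, 2))
        model = IsolationForest(random_state=0).fit(normal)

        def detect(event):
            if rule_based_screen(event):                      # known misuse
                return "misuse (matched rule)"
            x = [[event["failed_logins"], event["calls_per_min"]]]
            if model.predict(x)[0] == -1:                     # deviates from normal
                return "suspicious (anomalous behaviour)"
            return "normal"

        print(detect({"failed_logins": 15, "calls_per_min": 30}))   # caught by rule
        print(detect({"failed_logins": 1, "calls_per_min": 300}))   # caught by anomaly model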

    Probabilistic analysis of the human transcriptome with side information

    Get PDF
    Understanding functional organization of genetic information is a major challenge in modern biology. Following the initial publication of the human genome sequence in 2001, advances in high-throughput measurement technologies and efficient sharing of research material through community databases have opened up new views on the study of living organisms and the structure of life. In this thesis, novel computational strategies have been developed to investigate a key functional layer of genetic information, the human transcriptome, which regulates the function of living cells through protein synthesis. The key contributions of the thesis are general exploratory tools for high-throughput data analysis that have provided new insights into cell-biological networks, cancer mechanisms and other aspects of genome function. A central challenge in functional genomics is that high-dimensional genomic observations are associated with high levels of complex and largely unknown sources of variation. By combining statistical evidence across multiple measurement sources and the wealth of background information in genomic data repositories, it has been possible to resolve some of the uncertainties associated with individual observations and to identify functional mechanisms that could not be detected based on individual measurement sources. Statistical learning and probabilistic models provide a natural framework for such modeling tasks. Open source implementations of the key methodological contributions have been released to facilitate further adoption of the developed methods by the research community.Comment: Doctoral thesis. 103 pages, 11 figures
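
    One common strategy for combining statistical evidence across measurement sources, as the thesis advocates, is to model the variation the sources share. The sketch below uses classical canonical correlation analysis as a simple stand-in for the probabilistic dependency models the thesis develops; the simulated "measurement platforms" and all dimensions are assumptions for illustration.

        # Sketch: find variation shared by two measurement sources (CCA).
        import numpy as np
        from sklearn.cross_decomposition import CCA

        rng = np.random.default_rng(0)
        shared = rng.normal(size=(200, 2))  # latent signal common to both platforms
        X = shared @ rng.normal(size=(2, 50)) + 0.5 * rng.normal(size=(200, 50))
        Y = shared @ rng.normal(size=(2, 30)) + 0.5 * rng.normal(size=(200, 30))

        cca = CCA(n_components=2).fit(X, Y)
        X_c, Y_c = cca.transform(X, Y)

        # High correlation of projected components indicates signal supported
        # by both sources rather than platform-specific noise.
        for k in range(2):
            print(np.corrcoef(X_c[:, k], Y_c[:, k])[0, 1])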

    Fossil gaps inferred from phylogenies alter the apparent nature of diversification in dragonflies and their relatives

    Get PDF
    Background: The fossil record has suggested that clade growth may differ in marine and terrestrial taxa, supporting equilibrial models in the former and expansionist models in the latter. However, incomplete sampling may bias findings based on fossil data alone. To attempt to correct for such bias, we assemble phylogenetic supertrees of one of the oldest clades of insects, the Odonatoidea (dragonflies, damselflies and their extinct relatives), using MRP and MRC. We use the trees to determine when, and in what clades, changes in taxonomic richness have occurred. We then test whether equilibrial or expansionist models are supported by fossil data alone, and whether findings differ when phylogenetic information is used to infer gaps in the fossil record.

    Results: There is broad agreement in family-level relationships between the two supertrees, though with some uncertainty along the backbone of the tree regarding dragonflies (Anisoptera). "Anisozygoptera" are shown to be paraphyletic when fossil information is taken into account. In both trees, decreases in net diversification are associated with species-poor extant families (Neopetaliidae, Hemiphlebiidae), and an upshift is associated with Calopterygidae + Polythoridae. When ghost ranges are inferred from the fossil record, many families are shown to have much earlier origination dates. In a phylogenetic context, the number of family-level lineages is shown to be up to twice as high as the fossil record alone suggests through the Cretaceous and Cenozoic, and a logistic increase in richness is detected, in contrast to the exponential increase indicated by fossils alone.

    Conclusions: Our analysis supports the notion that taxa which appear to have diversified exponentially according to fossil data may in fact have diversified more logistically. This in turn suggests that one of the major apparent differences between the marine and terrestrial fossil records may simply be an artifact of incomplete sampling. Our results also support previous suggestions that adult colouration plays an important role in odonate radiation, and that Anisozygoptera should be grouped in a single inclusive taxon with Anisoptera, separate from Zygoptera.
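
    The model comparison behind the Conclusions, logistic versus exponential growth in family-level richness, can be sketched as a curve-fitting exercise. The toy richness series below is an assumption for illustration; the study fits counts derived from its supertrees and the fossil record.

        # Sketch: compare exponential and logistic fits to a richness time series.
        import numpy as np
        from scipy.optimize import curve_fit

        def exponential(t, n0, r):
            return n0 * np.exp(r * t)

        def logistic(t, k, n0, r):
            return k / (1 + (k / n0 - 1) * np.exp(-r * t))

        t = np.linspace(0, 250, 26)  # time since clade origin (toy scale)
        richness = logistic(t, 40, 2, 0.05) + np.random.default_rng(0).normal(0, 1, t.size)

        p_exp, _ = curve_fit(exponential, t, richness, p0=[2, 0.01], maxfev=10_000)
        p_log, _ = curve_fit(logistic, t, richness, p0=[50, 2, 0.05], maxfev=10_000)

        # Compare by residual sum of squares (a crude proxy; a full analysis
        # would also penalise the logistic model's extra parameter).
        for name, f, p in [("exponential", exponential, p_exp),
                           ("logistic", logistic, p_log)]:
            print(name, "RSS =", round(float(np.sum((richness - f(t, *p)) ** 2)), 1))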

    New Fundamental Technologies in Data Mining

    Get PDF
    The progress of data mining technology and its broad public popularity establish a need for a comprehensive text on the subject. The book series entitled "Data Mining" addresses this need by presenting in-depth descriptions of novel mining algorithms and many useful applications. Beyond helping readers understand each topic in depth, the two books present useful hints and strategies for solving the problems discussed in their chapters. The contributing authors have highlighted many future research directions that will foster multi-disciplinary collaborations and hence lead to significant further development in the field of data mining.

    Artificial Intelligence for Drug Discovery: Are We There Yet?

    Full text link
    Drug discovery is adapting to novel technologies such as data science, informatics, and artificial intelligence (AI) to accelerate effective treatment development while reducing costs and animal experiments. AI is transforming drug discovery, as indicated by increasing interest from investors, industrial and academic scientists, and legislators. Successful drug discovery requires optimizing properties related to pharmacodynamics, pharmacokinetics, and clinical outcomes. This review discusses the use of AI in the three pillars of drug discovery: diseases, targets, and therapeutic modalities, with a focus on small molecule drugs. AI technologies, such as generative chemistry, machine learning, and multi-property optimization, have enabled several compounds to enter clinical trials. The scientific community must carefully vet known information to address the reproducibility crisis. The full potential of AI in drug discovery can only be realized with sufficient ground truth and appropriate human intervention at later pipeline stages.Comment: 30 pages, 4 figures, 184 references
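
    The multi-property optimization mentioned above can be illustrated with a small scoring sketch: candidates are judged on several properties at once rather than on potency alone. The example below uses RDKit descriptors; the property windows, the desirability scheme, and the example SMILES are assumptions for illustration, not a validated scoring method.

        # Sketch: multi-property desirability score for candidate molecules.
        from rdkit import Chem
        from rdkit.Chem import Descriptors

        def desirability(value, low, high):
            """1.0 inside the preferred window, falling off linearly outside."""
            if low <= value <= high:
                return 1.0
            edge = low if value < low else high
            return max(0.0, 1.0 - abs(value - edge) / (high - low))

        def score(smiles):
            mol = Chem.MolFromSmiles(smiles)
            if mol is None:
                return 0.0
            d_mw = desirability(Descriptors.MolWt(mol), 200, 500)   # drug-like weight
            d_logp = desirability(Descriptors.MolLogP(mol), 0, 5)   # Lipinski-style logP
            d_tpsa = desirability(Descriptors.TPSA(mol), 20, 140)   # permeability proxy
            return (d_mw * d_logp * d_tpsa) ** (1 / 3)              # geometric mean

        for smi in ["CC(=O)Oc1ccccc1C(=O)O",  # aspirin
                    "CCCCCCCCCCCCCCCC"]:      # hexadecane: greasy, poorly drug-like
            print(smi, round(score(smi), 3))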

    Advanced Knowledge Technologies at the Midterm: Tools and Methods for the Semantic Web

    Get PDF
    The University of Edinburgh and research sponsors are authorised to reproduce and distribute reprints and on-line copies for their purposes, notwithstanding any copyright annotation hereon. The views and conclusions contained herein are the author's and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of other parties.

    In a celebrated essay on the new electronic media, Marshall McLuhan wrote in 1962: "Our private senses are not closed systems but are endlessly translated into each other in that experience which we call consciousness. Our extended senses, tools, technologies, through the ages, have been closed systems incapable of interplay or collective awareness. Now, in the electric age, the very instantaneous nature of co-existence among our technological instruments has created a crisis quite new in human history. Our extended faculties and senses now constitute a single field of experience which demands that they become collectively conscious. Our technologies, like our private senses, now demand an interplay and ratio that makes rational co-existence possible. As long as our technologies were as slow as the wheel or the alphabet or money, the fact that they were separate, closed systems was socially and psychically supportable. This is not true now when sight and sound and movement are simultaneous and global in extent." (McLuhan 1962, p.5, emphasis in original)

    Over forty years later, the seamless interplay that McLuhan demanded between our technologies is still barely visible. McLuhan's predictions of the spread, and increased importance, of electronic media have of course been borne out, and the worlds of business, science and knowledge storage and transfer have been revolutionised. Yet the integration of electronic systems as open systems remains in its infancy.

    Advanced Knowledge Technologies (AKT) aims to address this problem: to create a view of knowledge and its management across its lifecycle, and to research and create the services and technologies that such unification will require. Halfway through its six-year span, the results are beginning to come through, and this paper will explore some of the services, technologies and methodologies that have been developed. We hope to give a sense in this paper of the potential for the next three years, to discuss the insights and lessons learnt in the first phase of the project, and to articulate the challenges and issues that remain.

    The WWW provided the original context that made the AKT approach to knowledge management (KM) possible. AKT, initially proposed in 1999, brought together an interdisciplinary consortium with the technological breadth and complementarity to create the conditions for a unified approach to knowledge across its lifecycle. The combination of this expertise, and the time and space afforded the consortium by the IRC structure, suggested the opportunity for a concerted effort to develop an approach to advanced knowledge technologies based on the WWW as a basic infrastructure.

    The technological context of AKT altered for the better in the short period between the development of the proposal and the beginning of the project itself, with the development of the Semantic Web (SW), which foresaw much more intelligent manipulation and querying of knowledge.
    The opportunities that the SW provided for, e.g., more intelligent retrieval put AKT at the centre of information technology innovation and knowledge management services; the AKT skill set would clearly be central to the exploitation of those opportunities. The SW, as an extension of the WWW, provides an interesting set of constraints for the knowledge management services AKT tries to provide. As a medium for the semantically informed coordination of information, it has suggested a number of ways in which the objectives of AKT can be achieved, most obviously through the provision of knowledge management services delivered over the web, as opposed to the creation and provision of technologies to manage knowledge.

    AKT is working on the assumption that many web services will be developed and provided for users. The KM problem in the near future will be one of deciding which services are needed and of coordinating them. Many of these services will be largely or entirely legacies of the WWW, and so the capabilities of the services will vary. As well as providing useful KM services in their own right, AKT will be aiming to exploit this opportunity by reasoning over services, brokering between them, and providing essential meta-services for SW knowledge service management.

    Ontologies will be a crucial tool for the SW. The AKT consortium brings together a great deal of expertise on ontologies, and ontologies were always going to be a key part of the strategy. All kinds of knowledge sharing and transfer activities will be mediated by ontologies, and ontology management will be an important enabling task. Different applications will need to cope with inconsistent ontologies, or with the problems that will follow the automatic creation of ontologies (e.g. merging of pre-existing ontologies to create a third). Ontology mapping, and the elimination of conflicts of reference, will be important tasks. All of these issues are discussed, along with our proposed technologies.

    Similarly, specifications of tasks will be used for the deployment of knowledge services over the SW, but in general it cannot be expected that in the medium term there will be standards for task (or service) specifications. The brokering meta-services that are envisaged will have to deal with this heterogeneity.

    The emerging picture of the SW is one of great opportunity, but it will not be a well-ordered, certain or consistent environment. It will comprise many repositories of legacy data, outdated and inconsistent stores, and requirements for common understandings across divergent formalisms. There is clearly a role for standards to play in bringing much of this context together, and AKT is playing a significant role in these efforts. But standards take time to emerge, they take political power to enforce, and they have been known to stifle innovation (in the short term). AKT is keen to understand the balance between principled inference and statistical processing of web content. Logical inference on the Web is tough: complex queries using traditional AI inference methods bring most distributed computer systems to their knees. Do we set up semantically well-behaved areas of the Web? Is any part of the Web in which semantic hygiene prevails interesting enough to reason in? These and many other questions need to be addressed if we are to provide effective knowledge technologies for our content on the web.
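
    The ontology-mediated querying discussed above can be sketched concretely. The snippet below uses rdflib to assert a tiny class hierarchy and then answers a SPARQL query by following the subclass link, the kind of semantically informed retrieval the SW promises; the namespace, graph contents, and query are invented for illustration and are not AKT deliverables.

        # Sketch: ontology-aware retrieval over a tiny RDF graph.
        from rdflib import Graph, Literal, Namespace, RDF, RDFS

        EX = Namespace("http://example.org/akt/")
        g = Graph()

        # A small class hierarchy and one instance.
        g.add((EX.Researcher, RDFS.subClassOf, EX.Person))
        g.add((EX.alice, RDF.type, EX.Researcher))
        g.add((EX.alice, RDFS.label, Literal("Alice")))

        # Ask for Persons; the Researcher instance is found via the hierarchy.
        q = """
        PREFIX ex: <http://example.org/akt/>
        PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
        SELECT ?who ?name WHERE {
            ?cls rdfs:subClassOf ex:Person .
            ?who a ?cls ; rdfs:label ?name .
        }
        """
        for row in g.query(q):
            print(row.who, row.name)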

    Advances in Nuclear Magnetic Resonance for Drug Discovery

    Get PDF
    Background: Drug discovery is a complex and unpredictable endeavor with a high failure rate. Current trends in the pharmaceutical industry have exacerbated these challenges and are contributing to the dramatic decline in productivity observed over the last decade. The industrialization of science, which forces the drug discovery process to adhere to assembly-line protocols, imposes unnecessary restrictions such as short project timelines. Recent advances in nuclear magnetic resonance are responding to these self-imposed limitations and are providing opportunities to increase the success rate of drug discovery.

    Objective/Method: A review of recent advancements in NMR technology that have the potential to significantly impact and benefit the drug discovery process is presented. These include fast NMR data collection protocols and high-throughput protein structure determination, rapid protein-ligand co-structure determination, lead discovery using fragment-based NMR affinity screens, NMR metabolomics to monitor in vivo efficacy and toxicity for lead compounds, and the identification of new therapeutic targets through the functional annotation of proteins by FAST-NMR.

    Conclusion: NMR is a critical component of the drug discovery process, where the versatility of the technique enables it to continually expand and evolve its role. NMR is expected to maintain this growth over the next decade with advancements in automation, speed of structure calculation, in-cell imaging techniques, and the expansion of NMR-amenable targets.
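
    One readout behind the fragment-based NMR affinity screens mentioned above is chemical shift perturbation (CSP): comparing 1H-15N HSQC peak positions with and without a fragment to locate binding-site residues. The sketch below uses the commonly cited combined-shift formula with a 0.14 nitrogen scaling factor; the peak lists and the 0.1 ppm threshold are invented for illustration.

        # Sketch: chemical shift perturbation from HSQC peak lists.
        import math

        free  = {"G10": (8.21, 109.5), "L11": (7.95, 121.3), "K12": (8.40, 119.8)}
        bound = {"G10": (8.23, 109.6), "L11": (8.30, 123.0), "K12": (8.41, 119.9)}

        def csp(dh, dn, alpha=0.14):
            """Combined 1H/15N shift change in ppm (standard weighting)."""
            return math.sqrt(dh ** 2 + (alpha * dn) ** 2)

        for res in free:
            dh = bound[res][0] - free[res][0]
            dn = bound[res][1] - free[res][1]
            delta = csp(dh, dn)
            flag = "  <- perturbed, candidate binding-site residue" if delta > 0.1 else ""
            print(f"{res}: CSP = {delta:.3f} ppm{flag}")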