GenEpi: gene-based epistasis discovery using machine learning.
Background
Genome-wide association studies (GWAS) provide a powerful means to identify associations between genetic variants and phenotypes. However, GWAS techniques for detecting epistasis, the interactions between genetic variants associated with phenotypes, are still limited. We believe that developing an efficient and effective GWAS method to detect epistasis will be key to discovering sophisticated pathogenesis, which is especially important for complex diseases such as Alzheimer's disease (AD).
Results
In this regard, this study presents GenEpi, a computational package that uncovers epistasis associated with phenotypes using the proposed machine learning approach. GenEpi identifies both within-gene and cross-gene epistasis through a two-stage modeling workflow. In both stages, GenEpi adopts two-element combinatorial encoding when producing features and constructs the prediction models by L1-regularized regression with stability selection. On simulated data, GenEpi outperformed other widely used methods at detecting the ground-truth epistasis. As for real data, this study uses AD as an example to demonstrate the capability of GenEpi to find disease-related variants and variant interactions that show both biological meaning and predictive power.
Conclusions
The results on simulated data and AD demonstrate that GenEpi can detect the epistasis associated with phenotypes effectively and efficiently. The released package can be generalized to largely facilitate the studies of many complex diseases in the near future.
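The feature-selection step named above, L1-regularized regression with stability selection, can be sketched in general form. The following is an illustrative scikit-learn sketch of the generic technique, not GenEpi's own implementation; the data, regularization strength, and selection threshold are all invented for the example:

```python
import numpy as np
from sklearn.linear_model import Lasso

# Stability selection: fit an L1-regularized (Lasso) model on many
# random subsamples of the data and keep only the features whose
# coefficients are nonzero in a large fraction of the fits.
rng = np.random.default_rng(0)
n, p = 200, 20
X = rng.standard_normal((n, p))
# Synthetic phenotype: only features 0 and 3 truly matter.
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + 0.1 * rng.standard_normal(n)

n_rounds, frac = 50, 0.5
counts = np.zeros(p)
for _ in range(n_rounds):
    idx = rng.choice(n, size=int(n * frac), replace=False)
    model = Lasso(alpha=0.1).fit(X[idx], y[idx])
    counts += model.coef_ != 0  # tally which features survived

# Keep features selected in at least 80% of the subsample fits.
stable = np.where(counts / n_rounds >= 0.8)[0]
print(stable)
```

The L1 penalty alone already yields sparse models; repeating the fit over subsamples filters out features that are only selected by chance on a particular split.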
Considering Human Aspects on Strategies for Designing and Managing Distributed Human Computation
A human computation system can be viewed as a distributed system in which the
processors are humans, called workers. Such systems harness the cognitive power
of a group of workers connected to the Internet to execute relatively simple
tasks, whose solutions, once grouped, solve a problem that systems equipped
with only machines could not solve satisfactorily. Examples of such systems are
Amazon Mechanical Turk and the Zooniverse platform. A human computation
application comprises a group of tasks, each of which can be performed by one
worker. Tasks might have dependencies on one another. In this study, we
propose a theoretical framework to analyze such type of application from a
distributed systems point of view. Our framework is established on three
dimensions that represent different perspectives in which human computation
applications can be approached: quality-of-service requirements, design and
management strategies, and human aspects. By using this framework, we review
human computation in the perspective of programmers seeking to improve the
design of human computation applications and managers seeking to increase the
effectiveness of human computation infrastructures in running such
applications. In doing so, besides integrating and organizing what has been
done in this direction, we also put into perspective the fact that the human
aspects of the workers in such systems introduce new challenges in terms of,
for example, task assignment, dependency management, and fault prevention and
tolerance. We discuss how they are related to distributed systems and other
areas of knowledge.
The Synthesizability of Molecules Proposed by Generative Models
The discovery of functional molecules is an expensive and time-consuming
process, exemplified by the rising costs of small molecule therapeutic
discovery. One class of techniques of growing interest for early-stage drug
discovery is de novo molecular generation and optimization, catalyzed by the
development of new deep learning approaches. These techniques can suggest novel
molecular structures intended to maximize a multi-objective function, e.g.,
suitability as a therapeutic against a particular target, without relying on
brute-force exploration of a chemical space. However, the utility of these
approaches is stymied by ignorance of synthesizability. To highlight the
severity of this issue, we use a data-driven computer-aided synthesis planning
program to quantify how often molecules proposed by state-of-the-art generative
models cannot be readily synthesized. Our analysis demonstrates that there are
several tasks for which these models generate unrealistic molecular structures
despite performing well on popular quantitative benchmarks. Synthetic
complexity heuristics can successfully bias generation toward
synthetically-tractable chemical space, although doing so necessarily detracts
from the primary objective. This analysis suggests that to improve the utility
of these models in real discovery workflows, new algorithm development is
warranted.
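The trade-off described above, where biasing generation with a synthetic-complexity heuristic necessarily detracts from the primary objective, amounts to penalized multi-objective scoring. The sketch below is a toy illustration of that idea; the scoring functions and candidate values are invented stand-ins, not any real property predictor or complexity heuristic:

```python
# Toy sketch: penalize a hypothetical synthetic-complexity score when
# ranking candidate molecules, so easier-to-make structures win even
# when slightly worse on the primary objective.

def property_score(mol):
    # Hypothetical primary objective (e.g., predicted activity).
    return mol["activity"]

def complexity(mol):
    # Hypothetical synthetic-complexity heuristic (higher = harder).
    return mol["sa"]

def biased_score(mol, lam=0.5):
    # Subtracting a weighted complexity term trades some of the
    # primary objective for synthesizability.
    return property_score(mol) - lam * complexity(mol)

candidates = [
    {"name": "A", "activity": 0.9, "sa": 0.8},  # potent but hard to make
    {"name": "B", "activity": 0.7, "sa": 0.2},  # less potent, easy to make
]
best = max(candidates, key=biased_score)
print(best["name"])
```

Here candidate A scores higher on the raw objective (0.9 vs. 0.7), but once complexity is penalized (0.9 − 0.5·0.8 = 0.5 vs. 0.7 − 0.5·0.2 = 0.6), the more tractable candidate B is preferred.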
Agreement Between Experts and an Untrained Crowd for Identifying Dermoscopic Features Using a Gamified App: Reader Feasibility Study
Background
Dermoscopy is commonly used for the evaluation of pigmented lesions, but agreement between experts for identification of dermoscopic structures is known to be relatively poor. Expert labeling of medical data is a bottleneck in the development of machine learning (ML) tools, and crowdsourcing has been demonstrated as a cost- and time-efficient method for the annotation of medical images.
Objective
The aim of this study is to demonstrate that crowdsourcing can be used to label basic dermoscopic structures from images of pigmented lesions with similar reliability to a group of experts.
Methods
First, we obtained labels for 248 images of melanocytic lesions, with 31 dermoscopic “subfeatures” labeled by 20 dermoscopy experts. Due to low interrater reliability (IRR), these were then collapsed into 6 dermoscopic “superfeatures” based on structural similarity: dots, globules, lines, network structures, regression structures, and vessels. These images were then used as the gold standard for the crowd study. The commercial platform DiagnosUs was used to obtain annotations from a nonexpert crowd for the presence or absence of the 6 superfeatures in each of the 248 images. We replicated this methodology with a group of 7 dermatologists to allow direct comparison with the nonexpert crowd. The Cohen κ value was used to measure agreement across raters.
Results
In total, we obtained 139,731 ratings of the 6 dermoscopic superfeatures from the crowd. There was relatively lower agreement for the identification of dots and globules (the median κ values were 0.526 and 0.395, respectively), whereas network structures and vessels showed the highest agreement (the median κ values were 0.581 and 0.798, respectively). This pattern was also seen among the expert raters, who had median κ values of 0.483 and 0.517 for dots and globules, respectively, and 0.758 and 0.790 for network structures and vessels. The median κ values between nonexperts and thresholded average–expert readers were 0.709 for dots, 0.719 for globules, 0.714 for lines, 0.838 for network structures, 0.818 for regression structures, and 0.728 for vessels.
Conclusions
This study confirmed that IRR for different dermoscopic features varied among a group of experts; a similar pattern was observed in a nonexpert crowd. There was good or excellent agreement for each of the 6 superfeatures between the crowd and the experts, highlighting the similar reliability of the crowd for labeling dermoscopic images. This confirms the feasibility and dependability of using crowdsourcing as a scalable solution to annotate large sets of dermoscopic images, with several potential clinical and educational applications, including the development of novel, explainable ML tools.
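The Cohen κ statistic reported throughout these results corrects raw rater agreement for agreement expected by chance, and is straightforward to compute. The ratings below are invented purely to illustrate the calculation:

```python
from sklearn.metrics import cohen_kappa_score

# Two raters labeling presence (1) or absence (0) of a feature in
# 10 images. They agree on 8 of 10 (observed agreement 0.8), and both
# label "present" 6 times (chance agreement 0.6*0.6 + 0.4*0.4 = 0.52).
rater_a = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
rater_b = [1, 0, 1, 0, 0, 1, 0, 1, 1, 1]

kappa = cohen_kappa_score(rater_a, rater_b)
# kappa = (0.8 - 0.52) / (1 - 0.52) = 7/12 ≈ 0.583
print(round(kappa, 3))
```

A κ of 0 means agreement no better than chance and 1 means perfect agreement; the 0.7-0.8 values between the crowd and thresholded experts above fall in the range conventionally read as good to excellent agreement.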
Prospective for urban informatics
The specialization of different urban sectors, theories, and technologies, and their confluence in city development, have led to greatly accelerated growth in urban informatics, the transdisciplinary field for understanding and developing the city through new information technologies. While this young and highly promising field has attracted multiple reviews of its advances and outlook for its future, it would be instructive to probe further into the research initiatives of this rapidly evolving field, as a reference for the development not only of urban informatics but of the future of cities as a whole. This article thus presents a collection of research initiatives for urban informatics, based on reviews of the state of the art in this field. The initiatives cover three levels: the future of urban science; core enabling technologies, including geospatial artificial intelligence, high-definition mapping, quantum computing, artificial intelligence and the internet of things (AIoT), digital twins, explainable artificial intelligence, distributed machine learning, and privacy-preserving deep learning; and applications in urban design and planning, transport, location-based services, and the metaverse, together with a discussion of algorithmic and data-driven approaches. The article concludes with hopes for the future development of urban informatics and focuses on the balance between our ever-increasing reliance on technology and important societal concerns.