1,367 research outputs found
Automatic document classification of biological literature
Background: Document classification is a widespread problem with many applications, from organizing search engine snippets to spam filtering. We previously described Textpresso, a text-mining system for biological literature, which marks up full text according to a shallow ontology that includes terms of biological interest. This project investigates document classification in the context of biological literature, making use of the Textpresso markup of a corpus of Caenorhabditis elegans literature.
Results: We present a two-step text categorization algorithm to classify a corpus of C. elegans papers. Our classification method first uses a support vector machine-trained classifier, followed by a novel, phrase-based clustering algorithm. This clustering step autonomously creates cluster labels that are descriptive and understandable by humans. This clustering engine performed better on a standard test-set (Reuters 21578) compared to previously published results (F-value of 0.55 vs. 0.49), while producing cluster descriptions that appear more useful. A web interface allows researchers to quickly navigate through the hierarchy and look for documents that belong to a specific concept.
Conclusions: We have demonstrated a simple method to classify biological documents that embodies an improvement over current methods. While the classification results are currently optimized for Caenorhabditis elegans papers by human-created rules, the classification engine can be adapted to different types of documents. We have demonstrated this by presenting a web interface that allows researchers to quickly navigate through the hierarchy and look for documents that belong to a specific concept
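The F-values quoted in the Results above (0.55 vs. 0.49) can be illustrated with a small sketch. It assumes that "F-value" means the usual harmonic mean of precision and recall, which the abstract does not spell out; the function names and the toy document sets are hypothetical and are not taken from the paper.

```python
def precision_recall(predicted: set, relevant: set) -> tuple:
    """Precision and recall of one predicted cluster against one reference class."""
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Toy example (invented document IDs): a cluster of four documents,
# two of which belong to a reference class of three documents.
p, r = precision_recall({1, 2, 3, 4}, {3, 4, 5})
print(f"precision={p:.2f} recall={r:.2f} F1={f_measure(p, r):.2f}")
```

How per-cluster scores are aggregated into a single figure for a test set such as Reuters 21578 is a further choice the abstract leaves open.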
Protein structure-based evaluation of missense variants: Resources, challenges and future directions.
We provide an overview of the methods that can be used for protein structure-based evaluation of missense variants. The algorithms can be broadly divided into those that calculate the difference in free energy (ΔΔG) between the wild type and variant structures and those that use structural features to predict the damaging effect of a variant without providing a ΔΔG. A wide range of machine learning approaches have been employed to develop those algorithms. We also discuss challenges and opportunities for variant interpretation in view of the recent breakthrough in three-dimensional structural modelling using deep learning.
The contribution of missense mutations in core and rim residues of protein-protein interfaces to human disease.
Missense mutations at protein–protein interaction sites, called interfaces, are important contributors to human disease. Interfaces are non-uniform surface areas characterized by two main regions, “core” and “rim”, which differ in terms of evolutionary conservation and physicochemical properties. Moreover, within interfaces, only a small subset of residues (“hot spots”) is crucial for the binding free energy of the protein–protein complex. We performed a large-scale structural analysis of human single amino acid variations (SAVs) and demonstrated that disease-causing mutations are preferentially located within the interface core, as opposed to the rim (p<0.01). In contrast, the interface rim is significantly enriched in polymorphisms, similar to the remaining non-interacting surface. Energetic hot spots tend to be enriched in disease-causing mutations compared to non-hot spots (p=0.05), regardless of their occurrence in core or rim residues. For individual amino acids, the frequency of substitution into a polymorphism or disease-causing mutation differed from that of other amino acids and was related to its structural location, as was the type of physicochemical change introduced by the SAV. In conclusion, this study demonstrated the different distribution and properties of disease-causing SAVs and polymorphisms within different structural regions and in relation to the energetic contribution of amino acids in protein–protein interfaces, thus highlighting the importance of a structural systems biology approach for predicting the effect of SAVs.
Youth Entrepreneurship in Germany: Empirical Evidence on the How, the Why, the How Many, the Who and the When
Youth entrepreneurship is an increasingly prominent aspect of entrepreneurship support policies, but there is surprisingly little relevant research-based empirical evidence. This research gap is particularly noticeable when it comes to the personal and contextual factors that steer young people's decision to start a business. Using statistically representative survey data from the Global Entrepreneurship Monitor for Germany, we apply logit regressions to determine the influence of 10 independent variables on the likelihood of starting a business. We distinguish between 18–24-year-olds and 25–64-year-olds as well as between founders and non-founders. Self-efficacy in entrepreneurial skills, fear of failure and gender are the strongest influencing variables for the person-related factors and knowledge of other founders for the contextual factors. For younger people, the formal level of education and the perception of local entrepreneurial opportunities do not play a role in the decision to start a business, whereas they are very important for older people. Our results suggest that start-up promotion policies should explicitly address the empirically proven factors of youth entrepreneurship instead of a “one size fits all” policy for new businesses, regardless of the age of the founders.
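The logit regression named in the abstract above can be sketched in a few lines. This is a minimal illustration only: the data are invented (not Global Entrepreneurship Monitor data), only two hypothetical binary predictors are used instead of the ten in the study, and plain gradient descent stands in for whatever estimation software the authors actually used.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logit(rows, labels, lr=0.5, epochs=5000):
    """Fit a logit model by batch gradient descent on the mean log-loss.

    Returns [intercept, coefficient_1, ..., coefficient_k].
    """
    n, k = len(rows), len(rows[0])
    weights = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for x, y in zip(rows, labels):
            z = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
            err = sigmoid(z) - y
            grad[0] += err
            for j, xi in enumerate(x):
                grad[j + 1] += err * xi
        weights = [w - lr * g / n for w, g in zip(weights, grad)]
    return weights


# Invented toy sample: (self_efficacy, fear_of_failure) -> started a business?
rows = [(1, 0)] * 4 + [(1, 1)] * 2 + [(0, 1)] * 4 + [(0, 0)] * 2
labels = [1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0]

intercept, b_self_efficacy, b_fear = fit_logit(rows, labels)
print("self-efficacy coefficient:", round(b_self_efficacy, 2))
print("fear-of-failure coefficient:", round(b_fear, 2))
```

A positive coefficient raises the modelled log-odds of founding and a negative one lowers them. Comparing age groups, as the study does, would amount to fitting such a model separately for each group.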
Reading Instruction for the Handicapped Child: Questions and Answers
The concern of parents and teachers that some children have needs significantly different from the majority of other students has brought about educational opportunities which provide special learning environments and unique teaching procedures. From this educational endeavor, programs entitled special education have been established for the purpose of helping handicapped children develop their abilities to a maximum. It is important that the teacher of reading be aware of several essential principles regarding special education. First, teachers often become frustrated because the screening process for special education is often such a time-consuming procedure. Some children may remain in a regular classroom for almost the entire year while diagnosticians and other specialists test and prescribe for their particular learning needs. Secondly, many children are classified as borderline handicapped and, as a result, may not have the opportunity to participate in special education programs. In such cases the regular classroom teacher must retain the primary responsibility for meeting the special needs of some students.
Symmetric mixed states of qubits: local unitary stabilizers and entanglement classes
We classify, up to local unitary equivalence, local unitary stabilizer Lie algebras for symmetric mixed states into six classes. These include the stabilizer types of the Werner states, the GHZ state and its generalizations, and Dicke states. For all but the zero algebra, we classify entanglement types (local unitary equivalence classes) of symmetric mixed states that have those stabilizers. We make use of the identification of symmetric density matrices with polynomials in three variables with real coefficients and apply the representation theory of SO(3) on this space of polynomials. Comment: 10 pages, 1 table, title change and minor clarifications for published version
Using Transcriptomes as Mutant Phenotypes Reveals Functional Regions of a Mediator Subunit in Caenorhabditis elegans
Although transcriptomes have recently been used as phenotypes with which to perform epistasis analyses, they are not yet used to study intragenic function/structure relationships. We developed a theoretical framework to study allelic series using transcriptomic phenotypes. As a proof-of-concept, we apply our methods to an allelic series of dpy-22, a highly pleiotropic Caenorhabditis elegans gene orthologous to the human gene MED12, which encodes a subunit of the Mediator complex. Our methods identify functional units within dpy-22 that modulate Mediator activity upon various genetic programs, including the Wnt and Ras modules
Implicit theories of a desire for fame
The aim of the present studies was to generate implicit theories of a desire for fame among the general population. In Study 1, we were able to develop a factor-analytic model of conceptions of the desire to be famous that initially comprised nine separate factors: ambition, meaning derived through comparison with others, psychologically vulnerable, attention seeking, conceitedness, social access, altruistic, positive affect, and glamour. Analysis that sought to examine replicability among these factors suggested that three factors (altruistic, positive affect, and glamour) display neither factor congruence nor adequate internal reliability. A second study examined the validity of these factors in predicting profiles of individuals who may desire fame. The findings from this study suggested that two of the nine factors (positive affect and altruism) could not be considered strong factors within the model. Overall, the findings suggest that implicit theories of a desire for fame comprise six factors. The discussion focuses on how an implicit model of a desire for fame might progress into formal theories of a desire for fame.