Citation Function and Polarity Classification in Biomedical Papers
The traditional reference evaluation method treats all citations equally. However, a citation can serve various functions. It may reflect the citing paper author’s motivation as well as his/her true attitude towards the cited paper. Investigating such information can be achieved through citation content analysis.
This thesis develops an 8-category classification scheme for citation function and polarity to help understand what role a citation plays in scientific papers. A biomedical citation corpus is annotated with this scheme and used in experiments with supervised machine learning methods. Several types of features that capture the characteristics of citation sentences are extracted with natural language processing techniques and serve as inputs to automatic classifiers. The importance of cue phrases in citation classification is also discussed.
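To make the idea of cue-phrase features concrete, here is a minimal sketch of how a citation sentence could be turned into counts that a supervised classifier might consume. The phrase groups and the function name `cue_phrase_features` are hypothetical illustrations; the thesis's actual cue-phrase list and its 8 categories are not given in this abstract.

```python
import re

# Hypothetical cue-phrase groups, for illustration only.
# They are NOT the thesis's actual cue phrases or categories.
CUE_PHRASES = {
    "contrast": ["however", "in contrast", "unlike"],
    "support": ["consistent with", "in agreement with", "confirmed"],
    "usage": ["we used", "following", "as described"],
}


def cue_phrase_features(sentence):
    """Count whole-phrase matches of each cue-phrase group in a citation sentence."""
    text = sentence.lower()
    features = {}
    for group, phrases in CUE_PHRASES.items():
        features[group] = sum(
            len(re.findall(r"\b" + re.escape(phrase) + r"\b", text))
            for phrase in phrases
        )
    return features


if __name__ == "__main__":
    print(cue_phrase_features(
        "However, our results are consistent with Smith et al. [3]."
    ))
```

Counts like these would typically be combined with other sentence-level features before being passed to a classifier.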
Do peers see more in a paper than its authors?
Recent years have shown a gradual shift in the content of biomedical publications that is freely accessible, from titles and abstracts to full text. This has enabled new forms of automatic text analysis and has given rise to some interesting questions: How informative is the abstract compared to the full text? What important information in the full text is not present in the abstract? What should a good summary contain that is not already in the abstract? Do authors and peers see an article differently? We answer these questions by comparing the information content of the abstract to that in citances, that is, sentences containing citations to that article. We contrast the important points of an article as judged by its authors with those seen by peers. Focusing on the area of molecular interactions, we perform manual and automatic analysis, and we find that the set of all citances to a target article not only covers most information (entities, functions, experimental methods, and other biological concepts) found in its abstract, but also contains 20% more concepts. We further present a detailed summary of the differences across information types, and we examine the effects that other citations and time have on the content of citances.
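The abstract-versus-citance comparison can be pictured as a set comparison over extracted concepts. The sketch below assumes the concepts have already been extracted as plain strings; the function name `concept_coverage` and the example concepts are hypothetical, and the paper's own extraction and counting procedure may differ.

```python
def concept_coverage(abstract_concepts, citance_concepts):
    """Compare concepts found in an abstract with those found in its citances.

    Returns the fraction of abstract concepts that also appear in the
    citances, plus the concepts that appear only in the citances.
    """
    abstract_set = set(abstract_concepts)
    citance_set = set(citance_concepts)
    covered = abstract_set & citance_set
    extra = citance_set - abstract_set
    fraction = len(covered) / len(abstract_set) if abstract_set else 0.0
    return {"covered_fraction": fraction, "extra_concepts": sorted(extra)}


if __name__ == "__main__":
    # Made-up concept lists, for illustration only.
    abstract = ["BRCA1", "binding", "yeast two-hybrid", "p53"]
    citances = ["BRCA1", "binding", "p53", "phosphorylation"]
    print(concept_coverage(abstract, citances))
```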
Incidental or influential? – A decade of using text-mining for citation function classification.
This work looks in depth at several studies that have attempted to automate the process of citation importance classification based on the publications’ full text. We offer a comparison of their individual similarities, strengths and weaknesses. We analyse a range of features that have been previously used in this task. Our experimental results confirm that the number of in-text references is highly predictive of influence. Contrary to the work of Valenzuela et al. (2015), we find abstract similarity to be one of the most predictive features. Overall, we show that many of the features previously described in the literature have either been reported as not particularly predictive, cannot be reproduced from their existing descriptions, or should not be used because they rely on external, changing evidence. Additionally, we find significant variance in the results provided by the PDF extraction tools used in the pre-processing stages of citation extraction. This has a direct and significant impact on the classification features that rely on this extraction process. Consequently, we discuss challenges and potential improvements in the classification pipeline, provide a critical review of the performance of individual features and address the importance of constructing a large-scale gold-standard reference dataset.
Detecting Slow Wave Sleep Using a Single EEG Signal Channel
Background: In addition to the cost and complexity of processing multiple signal channels, manual sleep staging is also tedious, time consuming, and error-prone. The aim of this paper is to propose an automatic slow wave sleep (SWS) detection method that uses only one channel of the electroencephalography (EEG) signal.
New Method: The proposed approach distinguishes itself from previous automatic sleep staging methods by using three specially designed feature groups. The first feature group characterizes the waveform pattern of the EEG signal. The remaining two feature groups are developed to resolve the difficulties caused by interpersonal EEG signal differences.
Results and comparison with existing methods: The proposed approach was tested with 1,003 subjects, and the SWS detection results show a kappa coefficient of 0.66, an accuracy of 0.973, a sensitivity of 0.644 and a positive predictive value of 0.709. By excluding sleep apnea patients and persons older than 55, the SWS detection results improved to kappa coefficient, 0.76; accuracy, 0.963; sensitivity, 0.758; and positive predictive value, 0.812.
Conclusions: With newly developed signal features, this study proposed and tested a single-channel EEG-based SWS detection method. The effectiveness of the proposed approach was demonstrated by applying it to detect the SWS of 1,003 subjects. Our test results show that a low SWS ratio and sleep apnea can degrade the performance of SWS detection. The results also show that a large and accurately staged sleep dataset is of great importance when developing automatic sleep staging methods.
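The four reported measures (kappa, accuracy, sensitivity, positive predictive value) all follow from a binary confusion matrix of SWS versus non-SWS epochs. The sketch below uses the standard definitions; the function name `binary_metrics` and the counts in the example are made up for illustration and are not the paper's data.

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard binary-detection metrics from confusion-matrix counts.

    tp/fp/fn/tn: true positives, false positives, false negatives,
    true negatives (here, "positive" would mean an SWS epoch).
    """
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)  # also called recall
    ppv = tp / (tp + fp)          # positive predictive value (precision)
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_observed = accuracy
    p_expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    kappa = (p_observed - p_expected) / (1 - p_expected)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "ppv": ppv,
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    print(binary_metrics(tp=40, fp=10, fn=10, tn=40))
```

Because kappa corrects for chance agreement, it can stay moderate even when accuracy is very high, which is what one would expect when the positive class is rare.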
Chi-square-based scoring function for categorization of MEDLINE citations
Objectives: Text categorization has been used in biomedical informatics for
identifying documents containing relevant topics of interest. We developed a
simple method that uses a chi-square-based scoring function to determine the
likelihood of MEDLINE citations containing a genetics-relevant topic. Methods: Our
procedure requires construction of a genetic and a nongenetic domain document
corpus. We used MeSH descriptors assigned to MEDLINE citations for this
categorization task. We compared frequencies of MeSH descriptors between the two
corpora by applying a chi-square test. A MeSH descriptor was considered to be a
positive indicator if its relative observed frequency in the genetic domain
corpus was greater than its relative observed frequency in the nongenetic
domain corpus. The output of the proposed method is a list of scores for all
the citations, with the highest score given to those citations containing MeSH
descriptors typical for the genetic domain. Results: Validation was done on a
set of 734 manually annotated MEDLINE citations. It achieved predictive
accuracy of 0.87 with 0.69 recall and 0.64 precision. We evaluated the method
by comparing it to three machine learning algorithms (support vector machines,
decision trees, naïve Bayes). Although the differences were not statistically
significant, the results showed that our chi-square scoring performs as
well as the compared machine learning algorithms. Conclusions: We suggest that the
chi-square scoring is an effective solution to help categorize MEDLINE
citations. The algorithm is implemented in the BITOLA literature-based
discovery support system as a preprocessor for the gene symbol disambiguation
process. Comment: 34 pages, 2 figures
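A rough sketch of how such a scoring scheme could work is given below. It computes a Pearson chi-square statistic per descriptor from document frequencies in the two corpora, signs it by which corpus has the higher relative frequency, and sums the signed values over a citation's descriptors. The summation rule, the function names, and the toy corpora are assumptions made for illustration; the abstract does not specify the exact scoring function.

```python
from collections import Counter


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def descriptor_weights(genetic_docs, nongenetic_docs):
    """Signed chi-square per descriptor (assumes both corpora are non-empty).

    Positive if the descriptor is relatively more frequent in the genetic
    corpus, negative otherwise. Each document is a list of descriptors.
    """
    g_counts = Counter(d for doc in genetic_docs for d in set(doc))
    n_counts = Counter(d for doc in nongenetic_docs for d in set(doc))
    n_g, n_n = len(genetic_docs), len(nongenetic_docs)
    weights = {}
    for desc in set(g_counts) | set(n_counts):
        a, c = g_counts[desc], n_counts[desc]
        chi2 = chi_square_2x2(a, n_g - a, c, n_n - c)
        sign = 1 if a / n_g > c / n_n else -1
        weights[desc] = sign * chi2
    return weights


def score_citation(descriptors, weights):
    """Assumed scoring rule: sum of signed weights of the citation's descriptors."""
    return sum(weights.get(d, 0.0) for d in set(descriptors))


if __name__ == "__main__":
    # Toy corpora, for illustration only.
    genetic = [["Genes", "Humans"], ["Genes", "Mutation"]]
    nongenetic = [["Humans", "Surgery"], ["Humans", "Surgery"]]
    w = descriptor_weights(genetic, nongenetic)
    print(score_citation(["Genes", "Mutation"], w))
    print(score_citation(["Humans", "Surgery"], w))
```

Ranking citations by this score puts those with descriptors typical of the genetic corpus at the top, which matches the output described in the abstract.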
Innovation through pertinent patents research based on physical phenomena involved
One can find innovative solutions to complex industrial problems by looking for knowledge in patents. Traditional keyword search in patent databases has been widely used. More recently, different computational methods that limit human intervention have been developed. We aim to define a method to improve the search for relevant patents in order to solve industrial problems and specifically to deduce evolution opportunities. The non-automatic, semi-automatic, and automatic search methods all use keywords. For a detailed keyword search, we propose as a basis the functional decomposition and the analysis of the physical phenomena involved in achieving the required function. The method presented in this paper is illustrated by the search for solutions to design a bi-phasic separator in deep offshore.