6 research outputs found

The structure of PWMs.

Author: Arwa Bin Raies (471499)
Hicham Mansour (471500)
Roberto Incitti (78756)
Vladimir B. Bajic (8687)

We can generate six PWMs, one for each pattern order. For example, the first PWM to the left corresponds to the pattern order (, , ). Each row corresponds to a word and each column to a segment; each cell holds the frequency of that word in that segment.
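
A minimal Python sketch of this layout, under two assumptions: the three tagged concept types are genes, diseases, and methylation words (the tagging categories named in the paper's abstract), and each sentence is split into four segments. The names `CONCEPTS`, `N_SEGMENTS`, and `empty_pwms` are illustrative and do not come from the source.

```python
from itertools import permutations

# Assumed concept types; their orderings give the pattern orders.
CONCEPTS = ("gene", "disease", "methylation")
# Assumed number of segments (columns) per sentence.
N_SEGMENTS = 4


def empty_pwms(vocabulary):
    """Return one zero-filled word-by-segment matrix per pattern order."""
    return {
        order: {word: [0] * N_SEGMENTS for word in vocabulary}
        for order in permutations(CONCEPTS)
    }


# Hypothetical three-word vocabulary, for illustration only.
pwms = empty_pwms(["promoter", "island", "tumor"])
print(len(pwms))  # 6
```

Three concept types can be ordered in 3! = 6 ways, which matches the six matrices described in the caption.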

FigShare

DEMGD system architecture.

Author: Arwa Bin Raies (471499)
Hicham Mansour (471500)
Roberto Incitti (78756)
Vladimir B. Bajic (8687)

The input to the system is the Input Text, and the output is Summary Tables and Full Reports. The system consists of four modules: Text Pre-processing, Structured Data Representation, Classification, and Associations Extraction.

FigShare

Combining Position Weight Matrices and Document-Term Matrix for Efficient Extraction of Associations of Methylated Genes and Diseases from Free Text

Author: Arwa Bin Raies (471499)
Hicham Mansour (471500)
Roberto Incitti (78756)
Vladimir B. Bajic (8687)
Publication date: 16/10/2013

Background: In a number of diseases, certain genes are reported to be strongly methylated and thus can serve as diagnostic markers in many cases. Scientific literature in digital form is an important source of information about methylated genes implicated in particular diseases. The large volume of electronic text makes it difficult and impractical to search for this information manually.

Methodology: We developed a novel text mining methodology based on a new concept of position weight matrices (PWMs) for text representation and feature generation. We applied PWMs in conjunction with the document-term matrix to extract, with high accuracy, associations between methylated genes and diseases from free text. The performance results are based on a large, manually classified dataset. Additionally, we developed a web tool, DEMGD, which automates extraction of these associations from free text. DEMGD presents the extracted associations in summary tables and full reports, in addition to evidence tagging of text with respect to genes, diseases, and methylation words. The methodology we developed in this study can be applied to similar association extraction problems from free text.

Conclusion: The new methodology developed in this study allows for efficient identification of associations between concepts. Our method, applied to methylated genes in different diseases, is implemented as a web tool, DEMGD, which is freely available at http://www.cbrc.kaust.edu.sa/demgd/. The data is available for online browsing and download.

Directory of Open Access Journals

PubMed Central

FigShare

Computing the scores.

Author: Arwa Bin Raies (471499)
Hicham Mansour (471500)
Roberto Incitti (78756)
Vladimir B. Bajic (8687)

The figure shows an example of a normalized PWM. To compute the score, we sum the weights of one word from each column. For example, the word ‘promoter’ appears in the first segment, so we take its weight from the first column of the PWM. The same step is applied to the second and third segments. Five words appear in the last segment, so we take the maximum of their weights. The score of the pattern is 0.2336 + 0.1619 + 0.1724 + 0.1315 = 0.6994.
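
The scoring rule in this caption can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the function name `score_pattern`, the example words other than ‘promoter’, and all weights other than the four quoted in the caption are hypothetical, and words missing from the PWM (or empty segments) are assumed to contribute 0.

```python
def score_pattern(pwm, segments):
    """Sum, over segments, the largest weight of any word in that segment."""
    total = 0.0
    for column, words in enumerate(segments):
        # Assumption: words absent from the PWM are ignored,
        # and a segment with no known words contributes 0.
        weights = [pwm[word][column] for word in words if word in pwm]
        total += max(weights, default=0.0)
    return total


# Hypothetical normalized PWM. Only the values 0.2336, 0.1619, 0.1724
# and 0.1315 are taken from the caption; everything else is made up.
pwm = {
    "promoter":        [0.2336, 0.0, 0.0, 0.0],
    "of":              [0.0, 0.1619, 0.0, 0.0],
    "hypermethylated": [0.0, 0.0, 0.1724, 0.0],
    "in":              [0.0, 0.0, 0.0, 0.1315],
    "with":            [0.0, 0.0, 0.0, 0.0900],
    "patients":        [0.0, 0.0, 0.0, 0.0400],
}

segments = [
    ["promoter"],
    ["of"],
    ["hypermethylated"],
    ["in", "patients", "with", "colorectal", "cancer"],  # five words
]

score = score_pattern(pwm, segments)
print(round(score, 4))  # 0.6994
```

Taking the maximum in a multi-word segment means a single strongly weighted word determines that segment's contribution, regardless of how many other words surround it.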

FigShare

PWM generation.

Author: Arwa Bin Raies (471499)
Hicham Mansour (471500)
Roberto Incitti (78756)
Vladimir B. Bajic (8687)

The PWM summarizes the frequency of words in each segment. For example, the words ‘CPG’ and ‘island’ appear in the first segment of the sentence, so the cells in the rows for these words and the first column are each incremented by one. The same step is applied to the words in the remaining three segments. The same matrix is then updated using other sentences with the same pattern order.
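
The counting step described in this caption might look like the following Python sketch. It assumes sentences arrive already split into four lists of lowercase tokens and that every occurrence of a word adds one to its cell; `update_pwm` and the two example sentences are hypothetical.

```python
from collections import defaultdict

N_SEGMENTS = 4  # assumed number of segments per sentence


def update_pwm(pwm, segments):
    """Add one to the (word, segment) cell for every word occurrence."""
    for column, words in enumerate(segments):
        for word in words:
            pwm[word][column] += 1
    return pwm


# Rows are created on first use, one zero-filled row per new word.
pwm = defaultdict(lambda: [0] * N_SEGMENTS)

# Two made-up sentences that share the same pattern order.
update_pwm(pwm, [["cpg", "island"], ["of"], ["is", "hypermethylated", "in"], []])
update_pwm(pwm, [["cpg", "island"], ["in"], ["was", "methylated", "in"], ["patients"]])

print(pwm["cpg"])  # [2, 0, 0, 0]
print(pwm["in"])   # [0, 1, 2, 0]
```

Dividing each column by its total would be one way to obtain normalized weights from these counts; the caption does not say which normalization the authors use.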

FigShare

Dataset representation using PWMs.

Author: Arwa Bin Raies (471499)
Hicham Mansour (471500)
Roberto Incitti (78756)
Vladimir B. Bajic (8687)

Each pattern in a sentence is represented with twelve features and a class label. The first six features correspond to the scores generated from the positive PWMs, and the following six features correspond to the scores generated from the negative PWMs.
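
As a rough Python sketch of this representation, the twelve features can be assembled by scoring one pattern against each positive PWM and then each negative PWM. The helper names `score` and `feature_vector`, the max-per-segment scoring rule, the zero weight for unseen words, and the toy matrices are assumptions for illustration only.

```python
def score(pwm, segments):
    """Sum, over segments, the largest weight of any known word (else 0)."""
    total = 0.0
    for column, words in enumerate(segments):
        weights = [pwm[word][column] for word in words if word in pwm]
        total += max(weights, default=0.0)
    return total


def feature_vector(positive_pwms, negative_pwms, segments):
    """Return six positive-PWM scores followed by six negative-PWM scores."""
    if len(positive_pwms) != 6 or len(negative_pwms) != 6:
        raise ValueError("expected six positive and six negative PWMs")
    return ([score(pwm, segments) for pwm in positive_pwms]
            + [score(pwm, segments) for pwm in negative_pwms])


# Toy data: one non-empty positive PWM, everything else empty.
positive = [{"promoter": [0.5, 0.0, 0.0, 0.0]}] + [{} for _ in range(5)]
negative = [{} for _ in range(6)]
segments = [["promoter"], [], [], []]

features = feature_vector(positive, negative, segments)
print(len(features))  # 12
```

The class label would be stored alongside this vector when building a training set for the classifier.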

FigShare