
64 research outputs found

Analyzing imputed financial data: a new approach to cluster analysis

Author: Halima Bensmail
Ramon P. DeGennaro

The authors introduce a novel statistical modeling technique to cluster analysis and apply it to financial data. Their two main goals are to handle missing data and to find homogeneous groups within the data. Their approach is flexible and handles large and complex data structures with missing observations and with quantitative and qualitative measurements. The authors achieve this result by mapping the data to a new structure that is free of distributional assumptions in choosing homogeneous groups of observations. Their new method also provides insight into the number of different categories needed for classifying the data. The authors use this approach to partition a matched sample of stocks. One group offers dividend reinvestment plans, and the other does not. Their method partitions this sample with almost 97 percent accuracy even when using only easily available financial variables. One interpretation of their result is that the misclassified companies are the best candidates either to adopt a dividend reinvestment plan (if they have none) or to abandon one (if they currently offer one). The authors offer other suggestions for applications in the field of finance.

Research Papers in Economics

Postgenomics: Proteomics and Bioinformatics in Cancer Research

Author: Bensmail Halima
Haoudi Abdelali
Publication venue: Hindawi Publishing Corporation
Publication date: 01/01/2003

Now that the human genome is completed, the characterization of the proteins encoded by the sequence remains a challenging task. The study of the complete protein complement of the genome, the “proteome,” referred to as proteomics, will be essential if new therapeutic drugs and new disease biomarkers for early diagnosis are to be developed. Research efforts are already underway to develop the technology necessary to compare the specific protein profiles of diseased versus nondiseased states. These technologies provide a wealth of information and rapidly generate large quantities of data. Processing these large amounts of data will lead to useful predictive mathematical descriptions of biological systems, which will permit rapid identification of novel therapeutic targets and of metabolic disorders. Here, we present an overview of the current status and future research approaches in defining the cancer cell's proteome in combination with different bioinformatics and computational biology tools toward a better understanding of health and disease.

Crossref

Directory of Open Access Journals

PubMed Central

Regularized Gaussian Discriminant Analysis Through Eigenvalue Decomposition

Author: Gilles Celeux
Halima Bensmail
Publication venue: JSTOR
Publication date: 01/01/2006

Crossref

Supervised cross-modal factor analysis for multiple modal data classification

Author: Bensmail Halima
Duan Kanghong
Wang Jim Jing-Yan
Wang Jingbin
Zhou Yihua
Publication date: 18/08/2015

In this paper we study the problem of learning from multiple modal data for the purpose of document classification. In this problem, each document is composed of two different modalities of data, i.e., an image and a text. Cross-modal factor analysis (CFA) has been proposed to project the two modalities to a shared data space, so that the classification of an image or a text can be performed directly in this space. A disadvantage of CFA is that it ignores the supervision information. In this paper, we improve CFA by incorporating the supervision information to represent and classify both the image and text modalities of documents. We project both image and text data to a shared data space by factor analysis, and then train a class label predictor in the shared space to use the class label information. The factor analysis parameter and the predictor parameter are learned jointly by solving a single objective function. With this objective function, we minimize both the distance between the projections of the image and text of the same document and the classification error of the projection, measured by the hinge loss function. The objective function is optimized by an alternate optimization strategy in an iterative algorithm. Experiments on two different multiple modal document data sets show the advantage of the proposed algorithm over other CFA methods.
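One way to write down the kind of joint objective this abstract describes is sketched below. The abstract gives no formula, so every symbol is an assumption made for illustration: x_i and t_i are the image and text features of document i, U and V are the two projection matrices, w is a linear predictor in the shared space, y_i ∈ {−1, +1} is a binary class label, and C is a trade-off weight. The paper's actual formulation (constraints, regularizers, multi-class handling) may differ.

```latex
\min_{U,\,V,\,w}\;
\sum_{i=1}^{n} \bigl\| U^{\top} x_i - V^{\top} t_i \bigr\|_2^2
\;+\;
C \sum_{i=1}^{n} \Bigl[
  \max\bigl(0,\; 1 - y_i\, w^{\top} U^{\top} x_i\bigr)
  + \max\bigl(0,\; 1 - y_i\, w^{\top} V^{\top} t_i\bigr)
\Bigr]
```

The first sum pulls the two projections of the same document together; the second is the hinge loss of the predictor on each projection. An alternate optimization of the kind the abstract mentions would update (U, V) with w held fixed, then w with (U, V) held fixed, and repeat.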

arXiv.org e-Print Archive

CiteSeerX

Crossref

Multiple graph regularized protein domain ranking

Author: Bensmail Halima
Gao Xin
Wang Jim Jing-Yan
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2012

Background: Protein domain ranking is a fundamental task in structural biology. Most protein domain ranking methods rely on the pairwise comparison of protein domains while neglecting the global manifold structure of the protein domain database. Recently, graph regularized ranking that exploits the global structure of the graph defined by the pairwise similarities has been proposed. However, the existing graph regularized ranking methods are very sensitive to the choice of the graph model and parameters, and this remains a difficult problem for most of the protein domain ranking methods. Results: To tackle this problem, we have developed the Multiple Graph regularized Ranking algorithm, MultiG-Rank. Instead of using a single graph to regularize the ranking scores, MultiG-Rank approximates the intrinsic manifold of protein domain distribution by combining multiple initial graphs for the regularization. Graph weights are learned with ranking scores jointly and automatically, by alternately minimizing an objective function in an iterative algorithm. Experimental results on a subset of the ASTRAL SCOP protein domain database demonstrate that MultiG-Rank achieves a better ranking performance than single graph regularized ranking methods and pairwise similarity based ranking methods. Conclusion: The problem of graph model and parameter selection in graph regularized protein domain ranking can be solved effectively by combining multiple graphs. This aspect of generalization introduces a new frontier in applying multiple graphs to solving protein domain ranking applications.
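The idea of ranking over a combination of several graphs can be illustrated with a minimal sketch. This is not the MultiG-Rank algorithm itself: the graph weights are fixed by hand rather than learned jointly with the scores, and the ranking step is a generic manifold-ranking iteration chosen for illustration. The function names, the `alpha` parameter, and the toy graphs are all hypothetical.

```python
def combine_graphs(graphs, weights):
    """Weighted sum of several affinity matrices (lists of lists) of equal size."""
    n = len(graphs[0])
    return [[sum(w * g[i][j] for g, w in zip(graphs, weights))
             for j in range(n)] for i in range(n)]


def graph_rank(affinity, query, alpha=0.5, iters=200):
    """Generic graph-regularized ranking by iterating
    f <- alpha * S f + (1 - alpha) * query,
    where S is the symmetrically normalized affinity matrix."""
    n = len(affinity)
    deg = [sum(row) for row in affinity]
    s = [[affinity[i][j] / (deg[i] * deg[j]) ** 0.5 if deg[i] and deg[j] else 0.0
          for j in range(n)] for i in range(n)]
    f = list(query)
    for _ in range(iters):
        f = [alpha * sum(s[i][j] * f[j] for j in range(n)) + (1 - alpha) * query[i]
             for i in range(n)]
    return f


# Two toy graphs over four items: a chain 0-1-2-3 and a graph with edges 0-1 and 2-3 only.
graph_a = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
graph_b = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]

combined = combine_graphs([graph_a, graph_b], weights=[0.5, 0.5])  # fixed, not learned
scores = graph_rank(combined, query=[1.0, 0.0, 0.0, 0.0])
ranking = sorted(range(len(scores)), key=lambda i: -scores[i])
```

Items closer to the query item in the combined graph receive higher scores. In the method described by the abstract, the weights passed to `combine_graphs` would instead be updated in alternation with the scores.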

arXiv.org e-Print Archive

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

Functional Clustering Algorithm for High-Dimensional Proteomics Data

Author: Aruna Buddana
Bensmail Halima
Haoudi Abdelali
Semmes O. John
Publication venue: Hindawi Publishing Corporation
Publication date: 01/01/2005

Clustering proteomics data is a challenging problem for any traditional clustering algorithm. Usually, the number of samples is much smaller than the number of protein peaks. A clustering algorithm that does not take into consideration the number of features or variables (here the number of peaks) is therefore needed. An innovative hierarchical clustering algorithm may be a good approach. We propose here a new dissimilarity measure for hierarchical clustering combined with functional data analysis. We present a specific application of functional data analysis (FDA) to a high-throughput proteomics study. The performance of the proposed algorithm is compared with that of two popular dissimilarity measures in the clustering of samples from normal and human T-cell leukemia virus type 1 (HTLV-1)-infected patients.

Crossref

Directory of Open Access Journals

PubMed Central

Detection of statistically significant network changes in complex biological networks

Author: Antonio Iavarone
Halima Bensmail
Luigi Cerulo
Michele Ceccarelli
Raghvendra Mall
Publication venue: Springer Nature
Publication date: 01/01/2017

Table S1. Description of data: GHD and MRA Results for all the 457 considered transcription factors on the TCGA and Rembrandt datasets. (XLSX 62.7 kb)

Springer - Publisher Connector

FigShare

An unsupervised disease module identification technique in biological networks using novel quality metric based on connectivity, conductance and modularity

Author: Bensmail Halima
Ceccarelli Michele
Kunji Khalid
Mall Raghvendra
Ullah Ehsan
Publication date: 26/03/2018

Disease processes are usually driven by several genes interacting in molecular modules or pathways leading to the disease. The identification of such modules in gene or protein networks is at the core of computational methods in biomedical research. Against this background, the Disease Module Identification (DMI) DREAM Challenge was initiated as an effort to systematically assess module identification methods on a panel of 6 diverse genomic networks. In this paper, we propose a generic refinement method based on ideas of merging and splitting the hierarchical tree obtained from any community detection technique for constrained DMI in biological networks. The only constraint was that the size of each community lies in the range [3, 100]. We propose a novel model evaluation metric, called F-score, computed from several unsupervised quality metrics such as modularity, conductance and connectivity, to determine the quality of a graph partition at a given level of the hierarchy. We also propose a quality measure, namely Inverse Confidence, which ranks and prunes insignificant modules to obtain a curated list of candidate disease modules (DM) for a biological network. The predicted modules are evaluated on the basis of the total number of unique candidate modules that are associated with complex traits and diseases from over 200 genome-wide association study (GWAS) datasets. During the competition, we identified 42 modules, ranking 15th at the official false detection rate (FDR) cut-off of 0.05 for identifying statistically significant DM in the 6 benchmark networks. However, for the more stringent FDR cut-offs of 0.025 and 0.01, the proposed method identified 31 (rank 9) and 16 DMIs (rank 10), respectively. In additional analysis, our proposed approach detected a total of 44 DM in the networks, compared with 60 for the winner of the DREAM Challenge. Interestingly, for several individual benchmark networks, our performance was better than or competitive with that of the winner.
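Two of the quality metrics named in this abstract, modularity and conductance, have standard textbook definitions that can be sketched in a few lines. The code below is only an illustration of those two definitions on an undirected, unweighted graph. It does not reproduce the paper's F-score or Inverse Confidence measures, whose formulas are not given in the abstract. The function names and the toy graph are hypothetical.

```python
from collections import defaultdict


def modularity(edges, communities):
    """Newman modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    label = {node: i for i, comm in enumerate(communities) for node in comm}
    internal = defaultdict(int)    # edges with both endpoints in community i
    degree_sum = defaultdict(int)  # total degree of the nodes in community i
    for u, v in edges:
        degree_sum[label[u]] += 1
        degree_sum[label[v]] += 1
        if label[u] == label[v]:
            internal[label[u]] += 1
    return sum(internal[i] / m - (degree_sum[i] / (2 * m)) ** 2
               for i in range(len(communities)))


def conductance(edges, community):
    """Number of cut edges divided by the smaller of the two side volumes."""
    inside = set(community)
    cut = vol_in = vol_out = 0
    for u, v in edges:
        for node in (u, v):
            if node in inside:
                vol_in += 1
            else:
                vol_out += 1
        if (u in inside) != (v in inside):
            cut += 1
    denom = min(vol_in, vol_out)
    return cut / denom if denom else 0.0


def size_ok(community, low=3, high=100):
    """Size constraint stated in the abstract: community size in [3, 100]."""
    return low <= len(community) <= high


# Toy graph: two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
partition = [{0, 1, 2}, {3, 4, 5}]

q = modularity(edges, partition)       # higher is better
phi = conductance(edges, partition[0])  # lower is better
```

A refinement procedure of the kind the abstract describes could compare such metrics across levels of a hierarchical tree and merge or split communities that violate `size_ok`; how the paper combines the metrics is not specified here.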

Open Access Repository


Coevolution Analysis of HIV-1 Envelope Glycoprotein Complex

Author: Bensmail Halima
Haoudi Abdelali
Kunji Khalid
Rawi Reda
Publication venue: Public Library of Science (PLoS)
Publication date: 18/11/2015

The HIV-1 Env spike is the main protein complex that facilitates HIV-1 entry into CD4+ host cells. HIV-1 entry is a multistep process that is not yet completely understood. This process involves several protein-protein interactions between HIV-1 Env and a variety of host cell receptors, along with many conformational changes within the spike. Owing to its high mutation rate and plasticity, HIV-1 Env has developed strategies to escape immense immune pressure and entry inhibitors. We applied a coevolution and residue-residue contact detecting method to identify coevolution patterns within HIV-1 Env protein sequences representing all group M subtypes. We identified 424 coevolving residue pairs within HIV-1 Env. The majority of predicted pairs are residue-residue contacts and are proximal in 3D structure. Furthermore, many of the detected pairs have functional implications due to contributions in either CD4 or coreceptor binding, or in variable loop, gp120-gp41, and interdomain interactions. This study provides a new dimension of information in HIV research. The identified residue couplings may be important not only in assisting gp120 and gp41 coordinate structure prediction, but also in designing new and effective entry inhibitors that incorporate mutation patterns of HIV-1 Env.

Harvard University - DASH

Directory of Open Access Journals

PubMed Central

FigShare

The triglyceride glucose-waist-to-height ratio outperforms obesity and other triglyceride-related parameters in detecting prediabetes in normal-weight Qatari adults: A cross-sectional study

Author: Abdelilah Arredouani
Elias N. Haoudi
Halima Bensmail
Neyla S. Al Akl
Publication venue: Frontiers Media SA
Publication date: 01/04/2023

Introduction: The triglyceride-glucose (TyG)-driven indices, incorporating obesity indices, have been proposed as reliable markers of insulin resistance and related comorbidities such as diabetes. This study evaluated the effectiveness of these indices in detecting prediabetes in normal-weight individuals from a Middle Eastern population. Methods: Using the data of 5,996 adult Qatari participants from the Qatar Biobank cohort, we employed adjusted logistic regression to assess the ability of various obesity and triglyceride-related indices to detect prediabetes in normal-weight (18.5 ≤ BMI < 25 kg/m²) adults (≥18 years). Results: Of the normal-weight adults, 13.62% had prediabetes. TyG-waist-to-height ratio (TyG-WHTR) was significantly associated with prediabetes among normal-weight men [OR per 1-SD 2.68; 95% CI (1.67–4.32)] and women [OR per 1-SD 2.82; 95% CI (1.61–4.94)]. Compared with other indices, TyG-WHTR had the highest area under the curve (AUC) value for prediabetes in men [AUC: 0.76, 95% CI (0.70–0.81)] and women [AUC: 0.73, 95% CI (0.66–0.80)], and performed significantly better than other indices (p < 0.05) in detecting prediabetes in men. TyG-WHTR had diagnostic value similar to that of fasting plasma glucose (FPG). Discussion: Our findings suggest that the TyG-WHTR index could be a better indicator of prediabetes for general clinical usage in normal-weight Qatari adult men than other obesity and TyG-related indices. TyG-WHTR can help identify a person’s risk for developing prediabetes in both men and women when combined with FPG results.
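The abstract does not spell out how the indices are computed. The sketch below uses the commonly cited definitions, stated here as assumptions: the TyG index as ln(fasting triglycerides × fasting glucose / 2) with both values in mg/dL, and TyG-WHTR as the TyG index multiplied by the waist-to-height ratio. The normal-weight BMI range is the one given in the abstract. Function names and the example values are hypothetical, and no diagnostic cut-off is implied.

```python
import math


def tyg_index(triglycerides_mg_dl, glucose_mg_dl):
    """TyG index (assumed definition): ln(TG [mg/dL] * fasting glucose [mg/dL] / 2)."""
    return math.log(triglycerides_mg_dl * glucose_mg_dl / 2)


def waist_to_height_ratio(waist_cm, height_cm):
    """Waist circumference divided by height, both in the same unit."""
    return waist_cm / height_cm


def tyg_whtr(triglycerides_mg_dl, glucose_mg_dl, waist_cm, height_cm):
    """TyG-WHTR (assumed definition): TyG index times waist-to-height ratio."""
    return (tyg_index(triglycerides_mg_dl, glucose_mg_dl)
            * waist_to_height_ratio(waist_cm, height_cm))


def is_normal_weight(weight_kg, height_cm):
    """Normal-weight range used in the abstract: 18.5 <= BMI < 25 kg/m^2."""
    bmi = weight_kg / (height_cm / 100) ** 2
    return 18.5 <= bmi < 25


# Hypothetical participant, for illustration only.
if is_normal_weight(weight_kg=65.0, height_cm=170.0):
    score = tyg_whtr(triglycerides_mg_dl=150.0, glucose_mg_dl=100.0,
                     waist_cm=85.0, height_cm=170.0)
```

In a study of this kind, the index would then be standardized and entered into an adjusted logistic regression, which is how per-1-SD odds ratios such as those reported above are obtained.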

Directory of Open Access Journals