17 research outputs found

    A critical evaluation of network and pathway based classifiers for outcome prediction in breast cancer

    Recently, several classifiers that combine primary tumor data, like gene expression data, and secondary data sources, such as protein-protein interaction networks, have been proposed for predicting outcome in breast cancer. In these approaches, new composite features are typically constructed by aggregating the expression levels of several genes. The secondary data sources are employed to guide this aggregation. Although many studies claim that these approaches improve classification performance over single gene classifiers, the gain in performance is difficult to assess. This stems mainly from the fact that different breast cancer data sets and validation procedures are employed to assess the performance. Here we address these issues by employing a large cohort of six breast cancer data sets as benchmark set and by performing an unbiased evaluation of the classification accuracies of the different approaches. Contrary to previous claims, we find that composite feature classifiers do not outperform simple single gene classifiers. We investigate the effect of (1) the number of selected features; (2) the specific gene set from which features are selected; (3) the size of the training set and (4) the heterogeneity of the data set on the performance of composite feature and single gene classifiers. Strikingly, we find that randomization of secondary data sources, which destroys all biological information in these sources, does not result in a deterioration in performance of composite feature classifiers. Finally, we show that when a proper correction for gene set size is performed, the stability of single gene sets is similar to the stability of composite feature sets. Based on these results there is currently no reason to prefer prognostic classifiers based on composite features over single gene classifiers for predicting outcome in breast cancer
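To make the notion of a composite feature concrete, the sketch below aggregates the expression levels of several genes into one value per sample. Averaging is only one possible aggregation rule and is an assumption here, as are the function name, the gene names, and the toy expression values; the evaluated methods each define their own aggregation.

```python
from statistics import mean


def composite_feature(expression, gene_set):
    """Return one aggregated value per sample for the genes in gene_set.

    expression maps a gene name to its per-sample expression levels.
    Averaging is an illustrative choice, not the rule of any specific method.
    """
    n_samples = len(next(iter(expression.values())))
    return [mean(expression[g][s] for g in gene_set) for s in range(n_samples)]


# Hypothetical expression matrix: three genes, two samples.
expression = {
    "GENE_A": [1.0, 3.0],
    "GENE_B": [2.0, 5.0],
    "GENE_C": [0.0, 1.0],
}

# A gene set such as one derived from a protein-protein interaction network.
feature = composite_feature(expression, ["GENE_A", "GENE_B"])
```

A single gene classifier would instead use each row of `expression` directly as a feature.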

    A regression model for estimating DNA copy number applied to capture sequencing data

    Motivation: Target enrichment, also referred to as DNA capture, provides an effective way to focus sequencing efforts on a genomic region of interest. Capture data are typically used to detect single-nucleotide variants. They can also be used to detect copy number alterations, which is particularly useful in the context of cancer, where such changes occur frequently. In copy number analysis, it is common practice to determine log-ratios between test and control samples, but this approach results in a loss of information as it disregards the total coverage or intensity at a locus. Results: We modeled the coverage or intensity of the test sample as a linear function of the control sample. This regression approach is able to deal with regions that are completely deleted, which are problematic for methods that use log-ratios. To demonstrate the utility of our approach, we used capture data to determine copy number for a set of 600 genes in a panel of nine breast cancer cell lines. We found high concordance between our results and those generated using a single-nucleotide polymorphism genotyping platform. When we compared our results with log-ratio-based methods, including ExomeCNV, we found that our approach produced better overall correlation with SNP data
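The regression idea in this abstract can be illustrated with a minimal sketch that fits test coverage as a linear function of control coverage. The no-intercept least-squares fit, the function name, and the toy coverage values are assumptions made for illustration; they are not the authors' actual model.

```python
import math


def fit_slope(control, test):
    """Least-squares slope of test = slope * control (no intercept, by assumption)."""
    numerator = sum(c * t for c, t in zip(control, test))
    denominator = sum(c * c for c in control)
    if denominator == 0:
        raise ValueError("control coverage is zero at every locus")
    return numerator / denominator


# Hypothetical per-probe coverage for one gene.
control = [100.0, 200.0, 150.0, 50.0]
single_copy_loss = [50.0, 100.0, 75.0, 25.0]
fully_deleted = [0.0, 0.0, 0.0, 0.0]

slope_loss = fit_slope(control, single_copy_loss)   # half the control coverage
slope_deleted = fit_slope(control, fully_deleted)   # zero coverage is handled

# A log-ratio, by contrast, is undefined when the test coverage is zero.
try:
    math.log2(fully_deleted[0] / control[0])
    log_ratio_defined = True
except ValueError:
    log_ratio_defined = False
```

The slope stays well defined for the fully deleted region, whereas the log-ratio raises an error there.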

    ENSEMBLE


    Characterization and correction of stray light in TROPOMI-SWIR

    The shortwave infrared (SWIR) spectrometer module of the Tropospheric Monitoring Instrument (TROPOMI), on board the ESA Copernicus Sentinel-5 Precursor satellite, is used to measure atmospheric CO and methane columns. For this purpose, calibrated radiance measurements are needed that are minimally contaminated by instrumental stray light. Therefore, a method has been developed and applied in an on-ground calibration campaign to characterize stray light in detail using a monochromatic quasi-point light source. The dynamic range of the signal was extended to more than 7 orders of magnitude by performing measurements with different exposure times, saturating detector pixels at the longer exposure times. Analysis of the stray light indicates about 4.4 % of the detected light is correctable stray light. An algorithm was then devised and implemented in the operational data processor to correct in-flight SWIR observations in near-real time, based on Van Cittert deconvolution. The stray light is approximated by a far-field kernel independent of position and wavelength and an additional kernel representing the main reflection. Applying this correction significantly reduces the stray-light signal, for example in a simulated dark forest scene close to bright clouds by a factor of about 10. Simulations indicate that this reduces the stray-light error sufficiently for accurate gas-column retrievals. In addition, the instrument contains five SWIR diode lasers that enable long-term, in-flight monitoring of the stray-light distribution
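As a rough illustration of Van Cittert deconvolution, the sketch below corrects a one-dimensional signal contaminated by stray light from a position-independent kernel. The 1-D geometry, the kernel values, the additive stray-light model, and the function names are all assumptions for illustration; this is not the operational TROPOMI algorithm.

```python
def add_stray_light(signal, kernel):
    """Return signal plus stray light; kernel[d] is the fraction scattered d pixels away."""
    n = len(signal)
    out = list(signal)
    for i in range(n):
        for j in range(n):
            d = abs(i - j)
            if 0 < d < len(kernel):
                out[i] += kernel[d] * signal[j]
    return out


def van_cittert(measured, kernel, iterations=5):
    """Iterate estimate <- estimate + (measured - forward_model(estimate))."""
    estimate = list(measured)
    for _ in range(iterations):
        remodelled = add_stray_light(estimate, kernel)
        estimate = [e + (m - r) for e, m, r in zip(estimate, measured, remodelled)]
    return estimate


# Hypothetical scene: bright pixels next to dark ones.
true_scene = [1000.0, 1000.0, 10.0, 10.0, 10.0]
kernel = [0.0, 0.01, 0.005]  # index 0 is unused; weak stray light only

measured = add_stray_light(true_scene, kernel)
corrected = van_cittert(measured, kernel)

error_before = max(abs(m - t) for m, t in zip(measured, true_scene))
error_after = max(abs(c - t) for c, t in zip(corrected, true_scene))
```

Because the kernel is weak, each iteration shrinks the residual error by roughly the total kernel weight, so a handful of iterations suffices in this toy case.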

    Determination of the TROPOMI-SWIR instrument spectral response function

    The Tropospheric Monitoring Instrument (TROPOMI) is the single instrument on board the ESA Copernicus Sentinel-5 Precursor satellite. TROPOMI is a nadir-viewing imaging spectrometer with bands in the ultraviolet and visible, the near infrared and the shortwave infrared (SWIR). An accurate instrument spectral response function (ISRF) is required in the SWIR band where absorption lines of CO, methane and water vapor overlap. In this paper, we report on the determination of the TROPOMI-SWIR ISRF during an extensive on-ground calibration campaign. Measurements are taken with a monochromatic light source scanning the whole detector, using the spectrometer itself to determine the light intensity and wavelength. The accuracy of the resulting ISRF calibration key data is well within the requirement for trace-gas retrievals. Long-term in-flight monitoring of SWIR ISRF is achieved using five on-board diode lasers

    Classification results for merged and paired setting.

    In the merged setting, one Affymetrix data set is set aside as the test set and the remaining four Affymetrix data sets are merged into a single data set. This is repeated until each of the five data sets has served as the test set. Top row: Results for the merged setting. The red lines indicate the median. Bottom row: Only the five Affymetrix data sets were used in the paired setting.

    Classification results of the ER positive data only.

    The ER positive cases from a single data set were set aside as the test set, while ER positive cases from the remaining five data sets were merged into a single training set. This was repeated until each data set had been used as the left-out test set, resulting in six AUC values. The red lines indicate the median. A: CV-optimized number of features; B: 50 best features.
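The leave-one-data-set-out procedure in this caption can be sketched as follows. The data set names and the function name are hypothetical placeholders; training the classifier and computing the AUC are left out.

```python
def leave_one_dataset_out(dataset_names):
    """Return (training_names, test_name) pairs, one per left-out data set."""
    splits = []
    for test_name in dataset_names:
        training_names = [name for name in dataset_names if name != test_name]
        splits.append((training_names, test_name))
    return splits


# Hypothetical labels for the six data sets.
datasets = ["D1", "D2", "D3", "D4", "D5", "D6"]
splits = leave_one_dataset_out(datasets)

for training_names, test_name in splits:
    # Here one would merge the training sets, fit a classifier,
    # and compute one AUC value on the left-out test set.
    pass
```

Six data sets yield six splits, and therefore six AUC values.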

    New method to determine the instrument spectral response function, applied to TROPOMI-SWIR

    Olga Nájera-Ramírez was born in 1955 and raised in the small town of Davenport, California. She is the fourth of six children. In the early 1950s, her parents came to the United States from the state of Durango, Mexico, part of a migration of Mexican-Americans to the North Coast of Santa Cruz County. Her father worked in the fields and at the Davenport Cement Plant. When Nájera-Ramírez was eight, her father died and her family labored in the fields along with finding other jobs to support themselves. Her mother worked in packing sheds and canneries in several places in Santa Cruz County. This oral history begins with Nájera-Ramírez’s recollections of growing up in Davenport. Nájera-Ramírez’s early labor as a farmworker and the importance she placed on creating familia within the community in Davenport ground her later vision of facilitating access to the university system for people of diverse locations. Even as a small child in Davenport, Nájera-Ramírez was interested in becoming a teacher. Her high school counselor held what Nájera-Ramírez termed “a paternalistic view of the minorities” and discouraged her from pursuing an advanced education in academia. But Nájera-Ramírez persevered, and despite a lack of mentors or even financial advising, became the first in her family to attend a four-year college, entering UC Santa Cruz as a student in Merrill College in 1973. As a UCSC student, Nájera-Ramírez danced with Los Mejicas, which galvanized what would become a lifelong interest in conducting research on the dance and traditions of Mexico and Mexican folklore. She earned a dual degree in history and Latin American studies from UC Santa Cruz in 1977. Nájera-Ramírez remembers the Chicano/Latino graduation feeling like a family party; this speaks simultaneously to the small numbers of Chicanas and Latinos graduating in 1977 and to the importance of music, food, and cultura within a university setting to sustain people of color. 
Her recollections of Chicano/a Latino/a life at UCSC in the 1970s, as well as her faculty mentors and classes, are an invaluable contribution to a little-documented aspect of UCSC history. Nájera-Ramírez's involvement with Los Mejicas during her undergraduate career in 1976 gave her the opportunity to meet Rafael Zamarripa, a well-known folklorico maestro, in Colorado. As a result of this life-altering meeting, Nájera-Ramírez decided to attend the University of Guadalajara to study dance further. After three years, she returned to the United States and attained her MA in Latin American Studies from the University of Texas in 1983. She also married her husband, Ronaldo Ramírez, that year. In 1987, Nájera-Ramírez earned her PhD in anthropology from the University of Texas, with a specialization in folklorico studies. Nájera-Ramírez is perhaps unique among UCSC faculty in that she is a native of Santa Cruz County who attended UC Santa Cruz, and then returned to her alma mater for a lifelong career as a tenured professor. In 1989, Nájera-Ramírez was hired by UCSC’s anthropology department, where she has now taught for twenty-five years. She is also a founder of UCSC’s Latin American and Latino Studies department and has directed the Chicano/Latino Research Center (CLRC). Striking in Nájera-Ramírez’s interview is her dedication to communities of color who are producing knowledge of “Greater Mexico” and beyond. This is evident primarily through her mentorship of graduate students of color, active guidance of Los Mejicas, and participation in cross-border projects of the CLRC. Along with being a published writer, Olga is a film producer who has created two major films, La Charreada: Rodeo a la Mexicana and Danza Folklórica Escénica: El Sello Artístico de Rafael Zamarripa (Mexican Folkloric Dance: Rafael Zamarripa’s Artistic Trademark). 
She describes the making of these two films in her oral history and demonstrates her dedication to visual arts and culture. Nájera-Ramírez has also served as the faculty advisor for Grupo Folklórico Los Mejicas of UCSC since 1997, which has been dancing folklorico since 1972. Los Mejicas fosters a strong sense of community at UCSC, thereby helping with the retention of Chicano/a and Latino/a students. The group performs at public schools throughout California and in the process does outreach to potential UCSC students of all cultural and ethnic backgrounds. Olga Nájera-Ramírez was interviewed in three sessions by Susy Zepeda at her home in Santa Cruz County. The interviews took place on May 2, May 16, and May 30, 2013. Nájera-Ramírez’s articulate reflections, warmth, and intellect facilitated powerful storytelling that offered a unique perspective from a lifelong connection with Santa Cruz County. The interviews were transcribed by Irene Reti and a transcript was returned to both Zepeda, who audited it for accuracy of transcription, and Nájera-Ramírez, who edited it for flow and accuracy. We chose not to italicize the Spanish in the transcript, a political decision that recognizes that italics can “other” Spanish words as “foreign,” or non-normative. This is a style preferred by many Latino/a writers today. Copies of this volume are on deposit in Special Collections and in the circulating stacks at the UCSC Library, as well as on the library’s website. The Regional History Project is supported administratively by Elisabeth Remak-Honnef, Head of Special Collections and Archives, and Interim University Librarian, Elizabeth Cowell. Irene Reti, Director, Regional History Project, University Library, and Susy Zepeda, Interviewer, Regional History Project. University of California, Santa Cruz, April 11, 201

    Performance of the NMC employing single genes and composite features constructed from different secondary data sources.

    For each combination of feature extraction method and secondary data source and each pair of data sets, we obtained one AUC value, resulting in 30 AUC values per combination. The number of features for each classifier was determined in the cross-validation procedure (CV-optimized). A: Each box plot shows the median, the 25th and 75th percentiles and the standard deviation of the 30 AUC values. Outliers are depicted by crosses. The boxes are sorted in descending order according to the median. B: This panel shows the result of pairwise comparisons between all combinations of feature extraction methods and secondary data sources. If, for a given combination of training and test data set, the AUC value of classifier i is higher (lower) than the AUC value of classifier j on the same test data set, it is counted as a win (loss) for classifier i. Element (i, j) in the matrix represents the ratio of wins to losses of method i compared to method j. Green indicates an overall win, red an overall loss and white represents draws. The rows and columns are sorted as in Panel A. Abbreviations: SG: Single genes; C: Chuang; L: Lee and T: Taylor.
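The pairwise win/loss counting described for Panel B can be sketched as below. The AUC values are made-up toy numbers and the function name is hypothetical; only three train/test pairs are shown instead of 30.

```python
def win_loss_counts(auc_by_method):
    """Count wins and losses of each method against every other method.

    auc_by_method maps a method label to its AUC values, aligned so that
    position k refers to the same training/test pair for every method.
    Ties count as neither a win nor a loss.
    """
    counts = {}
    for method_i, aucs_i in auc_by_method.items():
        for method_j, aucs_j in auc_by_method.items():
            if method_i == method_j:
                continue
            wins = sum(a > b for a, b in zip(aucs_i, aucs_j))
            losses = sum(a < b for a, b in zip(aucs_i, aucs_j))
            counts[(method_i, method_j)] = (wins, losses)
    return counts


# Toy AUC values for three train/test pairs (illustrative only).
auc_by_method = {
    "SG": [0.70, 0.65, 0.72],
    "C": [0.68, 0.65, 0.74],
    "L": [0.60, 0.66, 0.70],
}

counts = win_loss_counts(auc_by_method)
```

The matrix element (i, j) in the caption corresponds to the ratio wins / losses for the pair; the counts are kept separate here so that a zero loss count does not cause a division by zero.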