614 research outputs found

    Scalable Multi-document Summarization Using Natural Language Processing

    In this age of the Internet, Natural Language Processing (NLP) techniques are key to providing the information users require. However, with the extensive usage of available data, a secondary level of wrappers that interact with NLP tools has become necessary. These tools must extract a concise summary from the primary data set retrieved. The main reason for using text summarization techniques is to obtain this secondary level of information. Text summarization using NLP techniques is an interesting area of research with various implications for information retrieval. This report deals with the use of Latent Semantic Analysis (LSA) for generic text summarization and compares it with other available models. It proposes text summarization using LSA in conjunction with open-source NLP frameworks such as Mahout and Lucene. The LSA algorithm can be scaled to multiple large documents using these frameworks. The performance of this algorithm is then compared with that of other models commonly used for summarization, using Recall-Oriented Understudy for Gisting Evaluation (ROUGE) scores. This project implements a text summarization framework that uses available open-source tools and cloud resources to summarize documents in many languages, in the case of this study English and Hindi.
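The abstract above compares summarizers by ROUGE score. As a rough illustration of what such a score measures, here is a minimal sketch of ROUGE-N recall: the share of reference n-grams that also occur in the candidate summary, with counts clipped. The function name `rouge_n_recall` and the whitespace tokenizer are assumptions made for this example; the report's own evaluation setup is not described in the listing.

```python
from collections import Counter


def rouge_n_recall(candidate: str, reference: str, n: int = 1) -> float:
    """Fraction of reference n-grams that also appear in the candidate.

    Hypothetical helper for illustration only: it lowercases and splits on
    whitespace, with no stemming or stop-word removal.
    """
    def ngrams(text: str) -> Counter:
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    cand, ref = ngrams(candidate), ngrams(reference)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Clip each n-gram's credit at the number of times the candidate contains it.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    candidate = "the cat lay on the mat"
    print(rouge_n_recall(candidate, reference, n=1))
    print(rouge_n_recall(candidate, reference, n=2))
```

A higher value means the candidate recovers more of the reference summary's wording; it says nothing about fluency or factual accuracy.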

    Analysis of Growth Curves Under Some Special Covariance Structures

    In this dissertation we consider the growth curve, or generalized MANOVA, model in its most general form, given by Y_ij = A_i ξ B_ij + ε_ij, and develop statistical methodology for analyzing data using this model. Here g represents the number of groups, Y_ij is the observation matrix, ξ is a matrix of unknown parameters, A_i is a known matrix of rank g, and B_ij is a matrix of rank k. Further, the rows of the error matrix ε_ij are independent and each distributed as N_{p_ij}(0, Σ_ij). This model accommodates different kinds of unbalanced data, such as monotone data, data missing from any occasion, and data observed at unequally spaced time points. Our main results are: (1) derivation of the formulae for the maximum likelihood estimates (MLEs) of the parameters involved, (2) construction of tests for general linear hypotheses of the form H_0: E_{q×g} ξ_{g×k} F_{k×v} = 0 for known full-rank matrices E and F, and (3) derivation of the formulae for prediction of (a) future observations corresponding to an individual, (b) the unobserved portion of partially observed data for a new individual, and (c) any missing value of an observation vector. Deriving the maximum likelihood estimates and the prediction formulae for unbalanced data is a challenging problem. We have derived these results for two types of covariance structure for Σ_ij. These structures, namely the equicorrelation structure and the autoregressive structure, are the most commonly used in the literature. For the autoregressive structure, the maximum likelihood estimator of the correlation parameter turns out to be a solution of a cubic equation. We prove that this cubic equation has a unique real root in (-1, 1), which proves the uniqueness of the MLE. Further, we notice that the autoregressive structure leads to a Markov structure when the data are observed at unequally spaced time intervals.
For the model with Markov covariance structure, we derive a formula for estimating a missing value and show that the estimator based on this formula depends on only two neighboring data values. The results for the equicorrelation structure are included in Chapter 2 and those for the autoregressive structure (and the Markov structure as well) are included in Chapter 3. Finally, in the fourth chapter we point out some drawbacks of fitting linear growth curve models to biological data and suggest fitting nonlinear models to growth data. After reviewing the popular nonlinear models, we show the analysis of nonlinear models with different covariance structures using SAS software.
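The statement that a missing value under a Markov covariance structure depends on only its two neighbors can be illustrated with the textbook conditional mean of a Gaussian process whose correlation between times s and t is rho^|s - t|. The sketch below is not the dissertation's estimator, which would also involve the estimated growth-curve mean and parameters. It assumes a known common mean, a common variance, and 0 < rho < 1; the function name `markov_interpolate` is hypothetical.

```python
def markov_interpolate(x_prev: float, x_next: float,
                       gap_prev: float, gap_next: float,
                       rho: float, mean: float = 0.0) -> float:
    """Conditional mean of a missing value given its two observed neighbors.

    Assumes (for illustration) a Gaussian process with constant mean, common
    variance, and correlation rho ** |s - t| between times s and t.
    gap_prev and gap_next are the time distances to the two neighbors.
    """
    if not 0.0 < rho < 1.0:
        raise ValueError("this sketch assumes 0 < rho < 1")
    a = rho ** gap_prev   # correlation with the earlier neighbor
    b = rho ** gap_next   # correlation with the later neighbor
    denom = 1.0 - (a * b) ** 2
    w_prev = a * (1.0 - b * b) / denom
    w_next = b * (1.0 - a * a) / denom
    return mean + w_prev * (x_prev - mean) + w_next * (x_next - mean)


if __name__ == "__main__":
    # Equal spacing reduces to rho / (1 + rho**2) * (x_prev + x_next).
    print(markov_interpolate(1.0, 3.0, gap_prev=1, gap_next=1, rho=0.5))
```

Because of the Markov property, observations further away than the two neighbors add no information to this conditional mean, which is why no other data values appear as arguments.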

    Evaluating The Effectiveness Of Smoking Cessation Intervention Program In Low Income Emergency Department Adult Populations Using Moderation And Meditational Analysis

    Cigarette and tobacco use is common among ED patients from lower socioeconomic backgrounds. Our goal in this study was to conduct moderation and mediation analysis to evaluate the effectiveness of an enhanced smoking cessation intervention as compared to standard care for adult smokers in the ED. Our study is a secondary analysis of a two-arm randomized clinical trial conducted by Dr. Bernstein. Subjects in the enhanced care arm received a motivational interview by a trained research assistant, 6 weeks of nicotine replacement therapy (NRT) initiated in the ED, a faxed referral to the state smokers’ quitline, a booster call, and a brochure. Subjects in the control arm received the brochure, which provided quitline information. We used mediation analysis to assess the treatment effects of the mediators (NRT use and quitline calls) and moderation analysis to evaluate effect modification, that is, the interaction of the moderators (baseline nicotine dependency and craving) with the treatment. The outcomes were 7-day abstinence and number of cigarettes smoked per day at three months. We found significant mediation effects of NRT use on both outcomes. However, speaking to a quitline counselor had only marginal mediation effects. We could not detect any interaction or effect modification for either of the two moderators on 7-day abstinence or number of cigarettes smoked per day.

    The implication of GPP130 shedding by PC7 and Furin in lung cancer progression

    Lung cancer is the leading cause of cancer death in Canada and causes significant morbidity in patients. Globally, the World Health Organization (WHO) reports lung cancer as the leading cause of cancer-related deaths, with 2.21 million diagnoses per year resulting in approximately 1.8 million deaths per year. Initially asymptomatic, it progresses to a highly invasive and quickly metastasizing cancer. It is responsible for more deaths per year than four other deadly cancers combined: colon, breast, prostate, and pancreas. Proprotein Convertases (PCs) are a family of nine serine proteases that play a role in the maturation of secretory precursor proteins. Basic amino acid-specific PCs activate or inactivate precursor proteins by cleaving them at single or paired basic amino acid residues. They are crucial for various biological processes, including the activation of growth factors that play a vital role in cellular transformation and the likelihood of tumor formation.
Of the nine serine proteases identified, the physiological functions of the seventh member of the family, PC7, currently remain mostly unidentified. To further identify novel PC7 substrates, a quantitative proteomics screen for selective enrichment of N-glycosylated polypeptides secreted from hepatic HuH7 cells was performed. This identified two type-II transmembrane proteins that were shed into soluble secreted forms by PC7/Furin: CAncer Susceptibility Candidate 4 (CASC4) and Golgi PhosphoProtein of 130 kDa (GPP130). Subsequent studies on CASC4 by our laboratory reported the protective role that CASC4 plays against migration and invasion in Triple Negative Breast Cancer. GPP130 is a type-II transmembrane protein with a luminal domain containing endosomal and Golgi-retrieval determinants, enabling a unique subcellular trafficking route. So far, the role of GPP130 has only been extensively studied in the binding and retrograde trafficking of Shiga toxin. However, recent reports have shown its implication in cell-cycle progression and cellular proliferation of head and neck cancer cells. Our analysis from cBioPortal for Cancer Genomics revealed that GPP130 is amplified in up to 35% of patients with lung cancer. The work presented here shows the implications of shedding of GPP130 by PC7 and Furin in lung cancer progression by identifying the region of GPP130 responsible for cellular growth, and unravels potential therapeutic strategies for GPP130-induced cellular proliferation.

    Institutional Allocation In Initial Public Offerings: Empirical Evidence

    We analyze institutional allocation in initial public offerings (IPOs) using a new dataset of US offerings between 1997 and 1998. We document a positive relationship between institutional allocation and day one IPO returns. This is partly explained by the practice of giving institutions more shares in IPOs with strong pre-market demand, consistent with book-building theories. However, institutional allocation also contains private information about first-day IPO returns not reflected in pre-market demand and other public information. Our evidence supports book-building theories of IPO underpricing, but suggests that institutional allocation in underpriced issues is in excess of that explained by book-building alone.

    Two new families of high-gain DC-DC power electronic converters for DC-microgrids

    Distributing electric power in dc form is an appealing solution in many applications such as telecommunications, data centers, commercial buildings, and microgrids. A high-gain dc-dc power electronic converter can be used to individually link low-voltage elements such as solar panels, fuel cells, and batteries to the dc voltage bus, which is usually 400 volts. This way, it is not necessary to put such elements in a series string to build up their voltages. Consequently, each element can function at its optimal operating point regardless of the other elements in the system. In this dissertation, first a comparative study of dc microgrid architectures and their advantages over their ac counterparts is presented. Voltage level selection of dc distribution systems is discussed from the cost, reliability, efficiency, and safety standpoints. Next, a new family of non-isolated high-voltage-gain dc-dc power electronic converters with unidirectional power flow is introduced. This family of converters benefits from a low voltage stress across its switches. The proposed topologies are versatile, as they can be utilized as single-input or double-input power converters. In either case, they draw continuous currents from their sources. Lastly, a bidirectional high-voltage-gain dc-dc power electronic converter is proposed. This converter comprises a bidirectional boost converter which feeds a switched-capacitor architecture. The switched-capacitor stage suggested here has several advantages over the existing approaches. For example, it benefits from a higher voltage gain while using fewer capacitors. The proposed converters are highly efficient and modular. The operating modes, dc voltage gain, and design procedure for each converter are discussed in detail. Hardware prototypes have been developed in the lab. The results obtained from the hardware agree with those of the simulation models. --Abstract, page iv
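To see why a dedicated high-gain topology is attractive for reaching a 400 V bus, it helps to look at the ideal gain of a conventional boost converter in continuous conduction, V_out / V_in = 1 / (1 - D), where D is the duty cycle. The sketch below uses only that textbook relation for a lossless converter; it does not model the converters proposed in the dissertation, and the 40 V source voltage is an assumed example value.

```python
def ideal_boost_gain(duty: float) -> float:
    """Ideal (lossless, continuous-conduction) boost gain V_out / V_in."""
    if not 0.0 <= duty < 1.0:
        raise ValueError("duty cycle must satisfy 0 <= D < 1")
    return 1.0 / (1.0 - duty)


def duty_for_gain(v_in: float, v_out: float) -> float:
    """Duty cycle an ideal boost converter needs to step v_in up to v_out."""
    if v_in <= 0.0 or v_out < v_in:
        raise ValueError("requires 0 < v_in <= v_out")
    return 1.0 - v_in / v_out


if __name__ == "__main__":
    # Assumed example: a 40 V source feeding a 400 V dc bus.
    duty = duty_for_gain(40.0, 400.0)
    print(f"required duty cycle: {duty:.2f}")
    print(f"ideal gain at that duty cycle: {ideal_boost_gain(duty):.1f}")
```

In this idealized picture a tenfold step-up already needs a duty cycle of 0.9, and in a real converter losses reduce the achievable gain further at such extreme duty cycles, which is the usual motivation for adding multiplier or switched-capacitor stages.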

    User-Generated Content and Online Product Search - The Case of the Indian Automobile Industry

    The individual's online search is done privately and is considered an accurate measure of purchase intention. Online product information influences purchases and can be expected to influence purchase intention. Using Google's volume of online search trends and 23,000 online automobile reviews scraped from a popular Indian online forum, the relationship between product reviews and online product information search is examined. The reviews are divided into three groups based on the numerical rating associated with each review. Latent Dirichlet Allocation is used in each group to extract discussion topics from the review text. The study shows that online review valence significantly impacts online search volume. Further, while the topics of discussion across the three rating groups overlap, their impact on online search volume is different. The study demonstrates the need to look at rating information more closely and the influence of online UGC on online information search.

    High Voltage Gain DC/DC Power Electronic Converters

    A DC/DC power converter provides high voltage gain using integrated boost and voltage multiplier (VM) stages. The boost cell operates according to a switching sequence to alternately energize and discharge a primary winding. A VM cell electrically coupled to the primary winding of the boost cell charges a multiplier capacitor to a DC output voltage greater than the input voltage when the primary winding is energized and discharges the multiplier capacitor when the primary winding is discharged.

    Can Social Networks Help Mitigate Information Asymmetry in Online Markets?

    This study examines whether online social networks can help mitigate information asymmetry in online markets, and if so, what aspects of these networks generate value for market participants. Using a comprehensive dataset on transactions and social network information in an online peer-to-peer lending market, Prosper.com, we empirically study the linkage between borrowers' social network positions and their transactional outcomes. Our results highlight the distinction between the structural and relational dimensions of social networks. Stronger ties, where social and economic relations intertwine with each other, create value by both exerting peer pressure and increasing the verifiability of network ties, thereby alleviating the information asymmetry between borrowers and lenders. Our findings contribute to the growing IS literature on the economics of social networks as well as to the study of online quality signaling mechanisms.