
    New challenges for text mining: mapping between text and manually curated pathways

    Background: Associating literature with pathways poses new challenges to the Text Mining (TM) community. There are three main challenges to this task: (1) the identification of the mapping position of a specific entity or reaction in a given pathway, (2) the recognition of the causal relationships among multiple reactions, and (3) the formulation and implementation of required inferences based on biological domain knowledge. Results: To address these challenges, we constructed new resources to link the text with a model pathway; they are: the GENIA pathway corpus with event annotation and NF-kB pathway. Through their detailed analysis, we address the untapped resource, ‘bio-inference,’ as well as the differences between text and pathway representation. Here, we show the precise comparisons of their representations and the nine classes of ‘bio-inference’ schemes observed in the pathway corpus. Conclusions: We believe that the creation of such rich resources and their detailed analysis is the significant first step for accelerating the research of the automatic construction of pathway from text.

    New structures to solve aggregated queries for trips over public transportation networks

    Representing the trajectories of mobile objects has become a hot topic owing to the widespread use of smartphones and other GPS devices. However, few works have focused on representing trips over public transportation networks (buses, subway, and trains), where a user's trip can be seen as a sequence of stages, each performed within a vehicle shared with many other users. In this context, representing vehicle journeys reduces redundancy because all the passengers inside a vehicle share the same arrival time for each stop. In addition, each vehicle journey follows exactly the sequence of stops corresponding to its line, which makes it unnecessary to represent that sequence for each journey. To address data management for transportation systems, we designed a conceptual model that gave us better insight into this data domain and allowed us to define relevant terms and detect sources of redundancy in the data. Then, we designed two compact representations focused on users' trips (TTCTR) and on vehicle trips (AcumM), respectively. Each approach has its own strengths and answers some queries efficiently. We include experimental results over synthetic trips generated from accurate schedules obtained from a real network description (the bus transportation system of Madrid) to show the space/time trade-off of both approaches. We considered a wide range of queries about the use of the transportation network, such as counting-based or aggregate queries regarding the load of any line of the network at different times. Comment: This research has received funding from the European Union's Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie Actions H2020-MSCA-RISE-2015 BIRDS GA No. 69094
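
The load query mentioned in the abstract above can be illustrated with a small sketch. This is not the paper's TTCTR or AcumM structure; it only shows, under assumed toy data, why storing boardings and alightings per vehicle journey is enough to answer "how many passengers are on board after each stop". The names `LINE_STOPS`, `stages`, and `build_load_profile` are hypothetical.

```python
from itertools import accumulate

# Hypothetical toy data: one line with four stops, served by one vehicle journey.
LINE_STOPS = ["A", "B", "C", "D"]

# Each stage of a user trip on this journey: (board_stop_index, alight_stop_index).
stages = [(0, 2), (0, 3), (1, 2), (2, 3)]


def build_load_profile(stages, n_stops):
    """Return the number of passengers on board when leaving each stop."""
    delta = [0] * n_stops
    for board, alight in stages:
        delta[board] += 1   # one passenger gets on at `board`
        delta[alight] -= 1  # and gets off at `alight`
    # A running sum of the per-stop changes gives the load after each stop.
    return list(accumulate(delta))


load = build_load_profile(stages, len(LINE_STOPS))
for stop, passengers in zip(LINE_STOPS, load):
    print(stop, passengers)
```

Because every passenger on the journey shares the same stop sequence and arrival times, only the per-stop counts need to be kept to answer this kind of aggregate query; individual trips are not consulted at query time.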

    Assigning Polarity Scores to Reviews Using Machine Learning Techniques

    We propose a novel type of document classification task that quantifies how much a given document (review) appreciates the target object by using a continuous measure called sentiment polarity score (SP score) rather than binary polarity (good or bad). An SP score gives a concise summary of a review, and provides more information than binary classification. The difficulty of this task lies in the quantification of polarity. In this paper we use support vector regression (SVR) to tackle this problem. Experiments on book reviews using five-point scales show that SVR outperforms a multi-class classification method using support vector machines, and the results are close to human performance. We also examine the effect of sentence subjectivity detection using a Naive Bayes classifier, and show that this improves the robustness of the classifier.