
    Characteristic of partition-circuit matroid through approximation number

    Rough set theory is a useful tool for dealing with uncertain, granular, and incomplete knowledge in information systems, and it is based on equivalence relations or partitions. Matroid theory generalizes linear independence in vector spaces and has a variety of applications in many fields. In this paper, we propose a new type of matroid, the partition-circuit matroid, which is induced by a partition. First, since a partition satisfies the circuit axioms of matroid theory, it induces a matroid, which we call a partition-circuit matroid. Because partitions and equivalence relations on the same universe are in one-to-one correspondence, some characteristics of partition-circuit matroids are studied through rough sets. Second, analogously to the upper approximation number proposed by Wang and Zhu, we define the lower approximation number. Some characteristics of partition-circuit matroids and their dual matroids are investigated through the lower and upper approximation numbers. Comment: 12 pages
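
    The rough-set notions mentioned in this abstract can be illustrated with a minimal sketch. The lower and upper approximations below follow the standard rough-set definitions. The two "approximation number" functions are an assumption: the abstract does not define them, so the sketch simply counts the blocks contained in, or intersecting, the target set. The example partition and all function names are hypothetical.

```python
from typing import Hashable, Iterable, List, Set

Block = Set[Hashable]


def lower_approximation(partition: List[Block], target: Iterable[Hashable]) -> Set[Hashable]:
    """Union of all blocks that are entirely contained in the target set."""
    target_set = set(target)
    result: Set[Hashable] = set()
    for block in partition:
        if block <= target_set:
            result |= block
    return result


def upper_approximation(partition: List[Block], target: Iterable[Hashable]) -> Set[Hashable]:
    """Union of all blocks that share at least one element with the target set."""
    target_set = set(target)
    result: Set[Hashable] = set()
    for block in partition:
        if block & target_set:
            result |= block
    return result


def lower_approximation_number(partition: List[Block], target: Iterable[Hashable]) -> int:
    """ASSUMPTION (not stated in the abstract): number of blocks contained in the target."""
    target_set = set(target)
    return sum(1 for block in partition if block <= target_set)


def upper_approximation_number(partition: List[Block], target: Iterable[Hashable]) -> int:
    """ASSUMPTION (not stated in the abstract): number of blocks meeting the target."""
    target_set = set(target)
    return sum(1 for block in partition if block & target_set)


# Hypothetical universe {1, ..., 5} split into three blocks.
example_partition: List[Block] = [{1, 2}, {3}, {4, 5}]
example_target = {1, 2, 4}

lower = lower_approximation(example_partition, example_target)
upper = upper_approximation(example_partition, example_target)
lower_count = lower_approximation_number(example_partition, example_target)
upper_count = upper_approximation_number(example_partition, example_target)
```

    For the hypothetical target {1, 2, 4}, only the block {1, 2} lies inside the target, while {1, 2} and {4, 5} both intersect it, so the lower approximation is a subset of the upper one.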

    Classifying sequences by the optimized dissimilarity space embedding approach: a case study on the solubility analysis of the E. coli proteome

    We evaluate a version of the recently proposed classification system named Optimized Dissimilarity Space Embedding (ODSE) that operates in the input space of sequences of generic objects. The ODSE system was originally presented as a classification system for patterns represented as labeled graphs. However, since ODSE is founded on the dissimilarity space representation of the input data, the classifier can be easily adapted to any input domain in which a meaningful dissimilarity measure can be defined. Here we demonstrate the effectiveness of the ODSE classifier for sequences by considering an application that deals with recognizing the solubility degree of the Escherichia coli proteome. Solubility, or analogously aggregation propensity, is an important property of protein molecules that is intimately related to the mechanisms underlying the chemico-physical process of folding. Each protein in our dataset is initially associated with a solubility degree and is represented as a sequence of symbols denoting the 20 amino acid residues. The computational results, which we stress were achieved with no context-dependent tuning of the ODSE system, confirm the validity and generality of the ODSE-based approach for structured data classification. Comment: 10 pages, 49 references
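
    The dissimilarity space representation that this abstract relies on can be sketched as follows: each sequence becomes a vector of its dissimilarities to a fixed set of prototypes. This is only the basic embedding step, not the ODSE system itself, whose optimization procedure is not described here. The choice of Levenshtein edit distance as the dissimilarity measure, the prototype sequences, and the function names are all assumptions made for illustration.

```python
from typing import List, Sequence


def edit_distance(a: Sequence, b: Sequence) -> int:
    """Levenshtein distance between two sequences (ASSUMED dissimilarity measure)."""
    previous = list(range(len(b) + 1))
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            current.append(
                min(
                    previous[j] + 1,                       # deletion
                    current[j - 1] + 1,                    # insertion
                    previous[j - 1] + (item_a != item_b),  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def embed(sequence: Sequence, prototypes: List[Sequence]) -> List[int]:
    """Map a sequence to the vector of its dissimilarities to each prototype."""
    return [edit_distance(sequence, prototype) for prototype in prototypes]


# Hypothetical prototypes written in the amino acid alphabet.
example_prototypes = ["MKV", "MKL", "AAA"]
example_vector = embed("MKV", example_prototypes)
```

    Once every sequence is embedded this way, any classifier that works on fixed-length numeric vectors can be applied to the result.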

    Statistical Inferences for Polarity Identification in Natural Language

    Information forms the basis for all human behavior, including the ubiquitous decision-making that people constantly perform in their everyday lives. It is thus the mission of researchers to understand how humans process information to reach decisions. To facilitate this task, this work proposes a novel method of studying the reception of granular expressions in natural language. The approach uses LASSO regularization as a statistical tool to extract decisive words from textual content and to draw statistical inferences based on the correspondence between word occurrences and an exogenous response variable. Accordingly, the method has significant implications for social sciences and Information Systems research: researchers can now identify text segments and word choices that are statistically relevant to authors or readers and, based on this knowledge, test hypotheses from behavioral research. We demonstrate the contribution of our method by examining how authors communicate subjective information through narrative materials. This allows us to answer the question of which words to choose when communicating negative information. In addition, we show that investors trade not only upon facts in financial disclosures but are also distracted by filler words and non-informative language. Practitioners, for example those in investor communications or marketing, can exploit our insights to enhance their writing based on the true perception of word choice.
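
    The word-selection idea in this abstract can be illustrated with a minimal sketch: regress a response variable on word counts with an L1 penalty and keep the words whose coefficients are non-zero. The coordinate-descent solver below is a generic textbook LASSO and not the authors' implementation, and it covers only the selection step, not the statistical inference. The vocabulary, word counts, response values, and penalty strength are hypothetical.

```python
from typing import List


def soft_threshold(value: float, threshold: float) -> float:
    """Shrink value toward zero by threshold; return 0.0 inside [-threshold, threshold]."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(
    X: List[List[float]], y: List[float], alpha: float, n_iter: int = 200
) -> List[float]:
    """Minimize (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1 on centered data."""
    n, p = len(X), len(X[0])
    # Center columns and response so that no intercept term is needed.
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    residual = [value - y_mean for value in y]
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            z = sum(Xc[i][j] ** 2 for i in range(n)) / n
            if z == 0.0:
                continue  # constant column carries no information
            # Correlation of column j with the residual that excludes column j.
            rho = sum(Xc[i][j] * (residual[i] + Xc[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, alpha) / z
            delta = new_w - w[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= Xc[i][j] * delta
                w[j] = new_w
    return w


# Hypothetical toy data: word counts per document and a numeric response.
vocabulary = ["loss", "growth", "the"]
counts = [
    [2, 0, 3],
    [0, 2, 3],
    [1, 0, 2],
    [0, 1, 2],
    [0, 0, 3],
    [1, 1, 2],
]
response = [-2.0, 2.0, -1.0, 1.0, 0.0, 0.0]

weights = lasso_coordinate_descent(counts, response, alpha=0.1)
selected_words = [word for word, weight in zip(vocabulary, weights) if weight != 0.0]
```

    In this toy data the response depends only on "loss" and "growth", so the penalty leaves the filler word "the" with a zero coefficient and it drops out of the selected words.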