49 research outputs found

    Social Fingerprinting: Identifying Users of Social Networks by their Data Footprint

    This research defines, models, and quantifies a new metric for social networks: the social fingerprint. Just as one's fingers leave behind a unique trace in a print, this dissertation introduces and demonstrates that the manner in which people interact with other accounts on social networks creates a unique data trail. Accurate identification of a user's social fingerprint can address the growing demand for improved techniques in unique user account analysis, computational forensics and social network analysis. In this dissertation, we theorize, construct and test novel software and methodologies which quantify features of social network data. All approaches and methodologies are framed to test the accuracy of social fingerprint identification. Further, we demonstrate and verify that features of anonymous data trails observed on social networks are unique identifiers of social network users. Lastly, this research delivers scalable technology for future research in social network analysis, business analytics and social fingerprinting.

    NASA/American Society for Engineering Education (ASEE) Summer Faculty Fellowship Program, 1990

    Since 1964, NASA has supported a program of summer faculty fellowships for engineering and science educators. The objectives are to further the professional knowledge of qualified engineering and science faculty members; to stimulate an exchange of ideas between participants and NASA; to enrich and refresh the research and teaching activities of participants' institutions; and to contribute to the research objectives of the NASA center. The study program consists of lectures and seminars on topics of general interest or that are directly relevant to the research topics.

    Acta Cybernetica : Volume 22. Number 3.


    Application of multilevel control techniques to classes of distributed parameter plants

    This study concerns the application of a combination of multilevel hierarchical systems analysis techniques and Pontryagin's minimum principle (multilevel control) to the problem of optimally controlling two classes of dynamic distributed parameter plants representing concentration balances in streams, rivers and estuaries. The concentrations treated in this study are those deemed the most effective indicators of water quality: dissolved oxygen (DO) and biochemical oxygen demand (BOD). One class of plants treated in this study consists of linear continuous distributed parameter plants represented mathematically by sets of simultaneous partial differential equations. Optimal control of a plant of this class is initiated by applying spatial discretization followed by a combination of multilevel techniques and Pontryagin's minimum principle for lumped parameter systems. This approach reduces the original problem of optimally controlling a distributed parameter plant to a hierarchy of subproblems composed of ordinary differential and algebraic equations that can be solved iteratively. A general two-dimensional plant representative of a class of two-step discrete dynamic distributed parameter plants is derived from mass balances at the faces of a model of a volume element of a waterway. The resulting set of simultaneous finite-difference equations represents dynamic balances of concentrations at a finite number of spatial points in a reach of a waterway at selected time instants. Application of Pontryagin's minimum principle for discrete systems in conjunction with multilevel hierarchical systems analysis techniques reduces the problem of controlling such a plant optimally to a hierarchy of subproblems to be solved iteratively. Implicit in the application of optimal control to a plant is the selection of a suitable performance index functional with which to measure the relative optimality of each solution iteration. 
A variety of performance indices based upon physical considerations is utilized in conjunction with several different control modes for a number of plants representative of the two classes treated in this study. Subproblem hierarchies corresponding to both continuous and discrete distributed parameter plants representing concentrations balances in waterway reaches subject to multilevel optimal control are aggregated into super hierarchies. These super hierarchies possess at least one more level than those corresponding to the single reaches and represent, in this context, the concentrations balances in multireach or regional portions of waterways. Sufficient boundary, initial and final conditions are presented for numerical solution of the subproblem hierarchies developed in this study. Flow charts for the corresponding digital computer programs also are depicted. A proof of consistency between the ordinary differential equations of the spatially discretized plant and the partial differential equations of the continuous distributed parameter plant that it approximates is developed for a representative plant. A proof of convergence of the solutions of the equations of the same spatially discretized plant also is developed. Stability analyses are conducted for representative continuous and discrete distributed parameter plants. The optimal control of the spatially discretized continuous distributed parameter plant is formulated as a linear regulator problem and the associated performance index is utilized as a Liapunov function. The optimal control of the discrete distributed parameter plant with time-varying mean volume flow rate is formulated as the problem of optimal control of a nonstationary system which is treated by transforming the nonstationary system to an equivalent stationary system. The z-transform is applied to the finite-difference equations of the plant to facilitate evaluation of the effect of the presence of transport lags. 
The relationship between structural characteristics and computational efficiency of subproblem hierarchies is analyzed. Multilevel hierarchical systems analysis techniques are applied to the sensitivity analysis of a spatially discretized distributed parameter plant subject to multilevel optimal control. The combination of discretization and multilevel techniques is shown to reduce the generation of trajectory sensitivity coefficients for an optimally controlled distributed parameter plant to generation of trajectory sensitivity coefficients for a series of lumped parameter plants under optimal control. A normalized performance index sensitivity function also is developed for the same plant. Numerical results of multilevel optimization are presented for various control modes and configurations applied to plants representing: single reaches of a tidal river, four contiguous reaches of a tidal river, six contiguous reaches of a tidal river with taper and waste dischargers, and single reaches of an estuary. The study culminates with the application of one of the single reach subproblem hierarchies for a discrete distributed parameter plant under multilevel optimal control and multilevel hierarchical systems analysis techniques to the problem of minimizing total treatment cost for a multireach portion of a tidal river. This demonstrates the feasibility and efficiency of the multilevel approach to the solution of dynamic systems optimization problems of regional scope.
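The spatial discretization step described in this abstract (a partial differential equation replaced by a set of ordinary differential equations at grid points) can be illustrated with a minimal sketch. The abstract does not give the plant equations, so the model here is an assumption: a single concentration obeying first-order decay with advection, dc/dt = -u dc/dx - k c, discretized with an upwind difference and a zero-concentration inflow boundary. The names `semi_discrete_matrix` and `simulate` are hypothetical.

```python
import numpy as np

def semi_discrete_matrix(n, dx, u, k):
    """Return A such that dc/dt = A @ c approximates dc/dt = -u dc/dx - k c.

    Assumed model (not taken from the abstract): upwind differencing,
    u >= 0, and zero concentration entering at the upstream boundary.
    """
    A = np.zeros((n, n))
    for i in range(n):
        A[i, i] = -u / dx - k      # outflow from cell i plus first-order decay
        if i > 0:
            A[i, i - 1] = u / dx   # inflow from the upstream neighbour
    return A

def simulate(c0, A, dt, steps):
    """Integrate the lumped (ODE) system with forward Euler."""
    c = np.asarray(c0, dtype=float).copy()
    for _ in range(steps):
        c = c + dt * (A @ c)
    return c
```

With `u = 0` the grid points decouple and each one decays as `exp(-k t)`, which gives a simple sanity check on the discretized system.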

    Introspective knowledge acquisition for case retrieval networks in textual case base reasoning.

    Textual Case Based Reasoning (TCBR) aims at effective reuse of information contained in unstructured documents. The key advantage of TCBR over traditional Information Retrieval systems is its ability to incorporate domain-specific knowledge to facilitate case comparison beyond simple keyword matching. However, substantial human intervention is needed to acquire and transform this knowledge into a form suitable for a TCBR system. In this research, we present automated approaches that exploit statistical properties of document collections to alleviate this knowledge acquisition bottleneck. We focus on two important knowledge containers: relevance knowledge, which shows relatedness of features to cases, and similarity knowledge, which captures the relatedness of features to each other. The terminology is derived from the Case Retrieval Network (CRN) retrieval architecture in TCBR, which is used as the underlying formalism in this thesis applied to text classification. Latent Semantic Indexing (LSI) generated concepts are a useful resource for relevance knowledge acquisition for CRNs. This thesis introduces a supervised LSI technique called sprinkling that exploits class knowledge to bias LSI's concept generation. An extension of this idea, called Adaptive Sprinkling (AS), has been proposed to handle inter-class relationships in complex domains like hierarchical (e.g. Yahoo directory) and ordinal (e.g. product ranking) classification tasks. Experimental evaluation results show the superiority of CRNs created with sprinkling and AS, not only over LSI on its own, but also over state-of-the-art classifiers like Support Vector Machines (SVM). Current statistical approaches based on feature co-occurrences can be utilized to mine similarity knowledge for CRNs. However, related words often do not co-occur in the same document, though they co-occur with similar words. We introduce an algorithm to efficiently mine such indirect associations, called higher order associations. 
Empirical results show that CRNs created with the acquired similarity knowledge outperform both LSI and SVM. Incorporating acquired knowledge into the CRN transforms it into a densely connected network. While improving retrieval effectiveness, this has the unintended effect of slowing down retrieval. We propose a novel retrieval formalism called the Fast Case Retrieval Network (FCRN) which eliminates redundant run-time computations to improve retrieval speed. Experimental results show FCRN's ability to scale up over high dimensional textual casebases. Finally, we investigate novel ways of visualizing and estimating complexity of textual casebases that can help explain performance differences across casebases. Visualization provides a qualitative insight into the casebase, while complexity is a quantitative measure that characterizes classification or retrieval hardness intrinsic to a dataset. We study correlations of experimental results from the proposed approaches against complexity measures over diverse casebases.
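The idea of using class knowledge to bias LSI's concept generation can be sketched roughly as follows. The abstract does not spell out the procedure, so this is an assumed reading of sprinkling: artificial class-label terms are appended to the term-document matrix, a truncated SVD is taken, and the artificial rows are dropped afterwards. The function name `sprinkled_lsi` and the parameter `n_sprinkle` are hypothetical.

```python
import numpy as np

def sprinkled_lsi(X, labels, n_classes, k, n_sprinkle=2):
    """Rank-k LSI approximation of X biased by class labels (assumed scheme).

    X         : terms x documents matrix
    labels    : class index (0..n_classes-1) for each document
    n_sprinkle: number of artificial terms added per class
    """
    n_terms, n_docs = X.shape
    # One block of artificial "class terms" per class; a document gets 1s
    # in the block that matches its label.
    S = np.zeros((n_classes * n_sprinkle, n_docs))
    for j, c in enumerate(labels):
        S[c * n_sprinkle:(c + 1) * n_sprinkle, j] = 1.0
    A = np.vstack([X, S])
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]
    # Drop the sprinkled rows; only the original terms are kept.
    return A_k[:n_terms, :]
```

Documents of the same class share the sprinkled terms, so the leading singular vectors are pulled toward class structure; how strongly depends on `n_sprinkle`, which would need tuning in practice.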

    Helmholtz Principle-Based Keyword Extraction

    In today's world of evolving technology, everybody wishes to accomplish tasks in the least time. As the information available online grows every day, it becomes very difficult to summarize more than 100 documents in acceptable time. Thus, "text summarization" is a challenging problem in the area of Natural Language Processing (NLP), especially in the context of global languages. In this thesis, we survey the taxonomy of text summarization from different aspects. It briefly explains different approaches to summarization and the evaluation parameters. Also presented are thorough details and facts about more than fifty automatic text summarization systems, to ease the job of researchers and serve as a short encyclopedia for the investigated systems. Keyword extraction methods play a vital role in text mining and document processing. Keywords represent the essential content of a document. Text mining applications take advantage of keywords for processing documents. A quality keyword is a word that concisely represents the exact content of the text. It is very difficult to process a large number of documents to get high quality keywords in acceptable time. This thesis gives a comparison between the most popular keyword extraction method, tf-idf, and the proposed method, which is based on the Helmholtz Principle. The Helmholtz Principle is based on ideas from image processing and derived from the Gestalt theory of human perception. We also investigate the run time needed to extract the keywords by both methods. Experimental results show that the keyword extraction method based on the Helmholtz Principle outperforms tf-idf.
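For readers unfamiliar with the tf-idf baseline named in this abstract, a minimal sketch is given below. It uses one common variant (relative term frequency times the natural log of inverse document frequency) with whitespace tokenization; the thesis may use a different weighting, and the Helmholtz-based method itself is not shown because the abstract does not specify it. The function name `tfidf_keywords` is hypothetical.

```python
import math
from collections import Counter

def tfidf_keywords(docs, top_k=3):
    """Return the top_k highest-scoring tf-idf words for each document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each word appears.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    results = []
    for tokens in tokenized:
        tf = Counter(tokens)
        scores = {
            w: (count / len(tokens)) * math.log(n_docs / df[w])
            for w, count in tf.items()
        }
        # Sort by descending score; break ties alphabetically.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        results.append(ranked[:top_k])
    return results
```

A word that occurs in every document gets an idf of zero, so frequent function words such as "the" drop out of the ranking without a stop-word list.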

    Matrix factorization over dioids and its applications in data mining

    Matrix factorizations are an important tool in data mining, and they have been used extensively for finding latent patterns in the data. They often make it possible to separate structure from noise, as well as to considerably reduce the dimensionality of the input matrix. While classical matrix decomposition methods, such as nonnegative matrix factorization (NMF) and singular value decomposition (SVD), have proved to be very useful in data analysis, they are limited by the underlying algebraic structure. NMF, in particular, tends to break patterns into smaller bits, often mixing them with each other. This happens because overlapping patterns interfere with each other, making it harder to tell them apart. In this thesis we study matrix factorization over algebraic structures known as dioids, which are characterized by the lack of additive inverse ("negative numbers") and the idempotency of addition (a + a = a). Using dioids makes it easier to separate overlapping features and, in particular, allows better handling of the above-mentioned pattern-breaking problem. We consider different types of dioids, ranging from continuous (subtropical and tropical algebras) to discrete (Boolean algebra). Among these, the Boolean algebra is perhaps the best known, and there exist methods that allow one to obtain high quality Boolean matrix factorizations in terms of the reconstruction error. In this work, however, a different objective function is used: the description length of the data, which enables us to obtain compact and highly interpretable results. The tropical and subtropical algebras, on the other hand, are much less known in the data mining field. While they find applications in areas such as job scheduling and discrete event systems, they are virtually unknown in the context of data analysis. 
We will use them to obtain idempotent nonnegative factorizations that are similar to NMF, but are better at separating the most prominent features of the data.
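The role of idempotent addition can be made concrete with a small sketch of matrix products over the algebras named in this abstract. It shows only the products, not the thesis's factorization algorithms, and assumes the usual definitions: Boolean (OR, AND), subtropical (max, times) and tropical (max, plus). The function names are hypothetical.

```python
import numpy as np

def boolean_product(B, C):
    """Boolean product: out[i, j] = OR_k (B[i, k] AND C[k, j])."""
    B = np.asarray(B, dtype=bool)
    C = np.asarray(C, dtype=bool)
    return (B[:, :, None] & C[None, :, :]).any(axis=1)

def max_times_product(B, C):
    """Subtropical product: out[i, j] = max_k B[i, k] * C[k, j]."""
    B = np.asarray(B, dtype=float)
    C = np.asarray(C, dtype=float)
    return (B[:, :, None] * C[None, :, :]).max(axis=1)

def max_plus_product(B, C):
    """Tropical product: out[i, j] = max_k (B[i, k] + C[k, j])."""
    B = np.asarray(B, dtype=float)
    C = np.asarray(C, dtype=float)
    return (B[:, :, None] + C[None, :, :]).max(axis=1)
```

Because addition is replaced by OR or max, two overlapping patterns that cover the same cell do not add up: under the Boolean product the cell stays 1, whereas the ordinary product would give 2.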