
    Constrained multi-group project allocation using Mahalanobis distance

    Optimal allocation is one of the most active research areas in operations research using binary integer variables. The allocation of multi-constrained projects among several available options along a given planning horizon is an especially significant problem in the general area of item classification. The main goal of this dissertation is to develop an analytical approach for selecting projects that would be most attractive from an economic point of view to be developed or allocated among several options, such as in-house engineers and private contractors (in transportation projects). A relevant limiting resource, in addition to the availability of funds, is in-house manpower availability. In this thesis, the concept of Mahalanobis distance (MD) is used as the classification criterion. This is a generalization of the Euclidean distance that takes into account the correlation of the characteristics defining the scope of a project. The desirability of allocating a given project to an option is defined in terms of its MD to that particular option. Ideally, each project should be allocated to its closest option. This, however, may not be possible because of the available levels of each relevant resource. The allocation process is formulated mathematically using two Binary Integer Programming (BIP) models. The first formulation maximizes the dollar value of benefits derived by the traveling public from the projects being implemented, subject to a budget, a total-sum-of-MD, and in-house manpower constraints. The second formulation minimizes the total sum of MD subject to the budget and in-house manpower constraints. The proposed solution methodology for the BIP models is based on the branch-and-bound method. In particular, one of the contributions of this dissertation is the development of a strategy for branching variables and node selection that is consistent with allocation priorities based on MD, improving branch-and-bound performance and handling large-scale applications. The suggested allocation process includes: (a) multiple allocation groups; (b) multiple constraints; and (c) different BIP models. Numerical experiments with different projects and options illustrate the application of the proposed approach.
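
    As a concrete illustration of the classification criterion, the sketch below computes the MD of each project to each candidate option. The feature names and data are hypothetical values invented for this example, and the sketch stops short of the BIP/branch-and-bound allocation stage that the dissertation develops.

    # Illustrative sketch only: Mahalanobis distance of project feature
    # vectors to allocation options. All names and numbers are hypothetical.
    import numpy as np

    def mahalanobis(x, mean, cov_inv):
        """Mahalanobis distance of vector x from a distribution with the
        given mean and inverse covariance matrix."""
        d = x - mean
        return float(np.sqrt(d @ cov_inv @ d))

    # Hypothetical project scope features: [cost_score, complexity, duration]
    projects = np.array([[3.0, 2.5, 1.0],
                         [1.2, 0.8, 2.2],
                         [2.1, 1.9, 1.5]])

    # Hypothetical historical feature samples characterizing each option
    options = {
        "in_house":   np.array([[1.0, 0.9, 2.0], [1.4, 1.1, 2.4], [1.1, 0.7, 1.9]]),
        "contractor": np.array([[3.1, 2.4, 1.1], [2.8, 2.7, 0.9], [3.3, 2.2, 1.2]]),
    }

    for name, samples in options.items():
        mean = samples.mean(axis=0)
        # pinv guards against a singular covariance from few samples
        cov_inv = np.linalg.pinv(np.cov(samples, rowvar=False))
        for i, p in enumerate(projects):
            print(f"project {i} -> {name}: MD = {mahalanobis(p, mean, cov_inv):.2f}")

    In the full formulation, these distances would feed the BIP objective (minimizing the total sum of MD, or constraining it) rather than a simple closest-option rule, since resource limits can prevent every project from going to its nearest option.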

    GlySpy: A software suite for assigning glycan topologies from sequential mass spectral data

    GlySpy is a suite of algorithms used to determine the structure of glycans. Glycans, which are orderly aggregations of monosaccharides such as glucose, mannose, and fucose, are often attached to proteins and lipids, and provide a wide range of biological functions. Previous biomolecule-sequencing algorithms have operated on linear polymers such as proteins or DNA but, because glycans form complicated branching structures, new approaches are required. GlySpy uses data derived from sequential mass spectrometry (MSn), in which a precursor molecule is fragmented to form products, each of which may then be fragmented further, gradually disassembling the glycan. GlySpy resolves the structures of the original glycans by examining these disassembly pathways. The four main components of GlySpy are: (1) OSCAR (the Oligosaccharide Subtree Constraint Algorithm), which accepts analyst-selected MSn disassembly pathways and produces a set of plausible glycan structures; (2) IsoDetect, which reports the MSn disassembly pathways that are inconsistent with a set of expected structures, and which therefore may indicate the presence of alternative isomeric structures; (3) IsoSolve, which attempts to assign the branching structures of multiple isomeric glycans found in a complex mixture; and (4) Intelligent Data Acquisition (IDA), which provides automated guidance to the mass spectrometer operator, selecting glycan fragments for further MSn disassembly. This dissertation provides a primer for the underlying interdisciplinary topics (carbohydrates, glycans, MSn, and so on) and also presents a survey of the relevant literature with a focus on currently available tools. Each of GlySpy's four algorithms is described in detail, along with results from their application to biologically derived glycan samples. A summary enumerates GlySpy's contributions, which include de novo glycan structural analysis, favorable performance characteristics, interpretation of higher-order MSn data, and the automation of both data acquisition and analysis.
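
    To see why branching makes glycan sequencing harder than sequencing linear polymers, consider the toy sketch below: a branched glycan is modelled as a tree, and the subtree masses released by single glycosidic cleavages (the product ions an MSn experiment could select for further disassembly) are enumerated. This is an illustration only, not GlySpy's actual representation or algorithms, and the residue masses are approximate monoisotopic values.

    # Toy model of MSn disassembly of a branched glycan; not GlySpy itself.
    RESIDUE_MASS = {"Hex": 162.053, "HexNAc": 203.079, "Fuc": 146.058}

    class Node:
        def __init__(self, residue, children=()):
            self.residue = residue
            self.children = list(children)

    def subtree_mass(node):
        """Total residue mass of the subtree rooted at node."""
        return RESIDUE_MASS[node.residue] + sum(subtree_mass(c) for c in node.children)

    def glycosidic_fragments(node):
        """Masses of subtrees released by cleaving each glycosidic bond once,
        i.e. candidate product ions for one round of MSn disassembly."""
        fragments = []
        for child in node.children:
            fragments.append(subtree_mass(child))
            fragments.extend(glycosidic_fragments(child))
        return fragments

    # A small branched glycan: HexNAc core with two Hex arms, one fucosylated.
    glycan = Node("HexNAc", [Node("Hex", [Node("Fuc")]), Node("Hex")])
    print(sorted(round(m, 3) for m in glycosidic_fragments(glycan)))

    Distinct branching topologies can share the same composition and precursor mass, which is why the disassembly pathways themselves, rather than single spectra, carry the structural information that algorithms like OSCAR exploit.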

    Unsupervised multilingual learning

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2010. Cataloged from the PDF version of the thesis. Includes bibliographical references (p. 241-254). For centuries, scholars have explored the deep links among human languages. In this thesis, we present a class of probabilistic models that exploit these links as a form of naturally occurring supervision. These models allow us to substantially improve performance on core text processing tasks, such as morphological segmentation, part-of-speech tagging, and syntactic parsing. Beyond these traditional NLP tasks, we also present a multilingual model for lost-language decipherment. We test this model on the ancient Ugaritic language. Our results show that we can automatically uncover much of the historical relationship between Ugaritic and Biblical Hebrew, a known related language. By Benjamin Snyder. Ph.D.
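
    As a deliberately simple illustration of the multilingual intuition (a known related language acting as naturally occurring supervision for an unknown one), the toy sketch below matches words of an "unknown" language to a known relative by string similarity. The thesis itself uses far richer probabilistic models; the word lists here are invented transliterations for illustration, not real Ugaritic or Hebrew data.

    # Toy cognate matching by string similarity; a stand-in for the thesis's
    # probabilistic decipherment models, not a reimplementation of them.
    from difflib import SequenceMatcher

    lost_words = ["malk", "bayt", "yam"]            # hypothetical "lost" forms
    known_words = ["melek", "bayit", "yam", "har"]  # hypothetical known cognates

    for w in lost_words:
        best = max(known_words, key=lambda k: SequenceMatcher(None, w, k).ratio())
        score = SequenceMatcher(None, w, best).ratio()
        print(f"{w} -> {best} (similarity {score:.2f})")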

    Real Time Crime Prediction Using Social Media

    There is no doubt that crime is on the increase and has a detrimental influence on a nation's economy, despite numerous studies on crime prediction aimed at minimising crime rates. Historically, data mining techniques for crime prediction have relied on historical information and have mostly been country-specific. In fact, only a few of the earlier studies on crime prediction follow a standard data mining procedure. Hence, considering the current worldwide crime trend, in which criminals routinely publish their criminal intent on social media and invite others to watch or engage in different crimes, an alternative and more dynamic strategy is needed. The goal of this research is to improve the performance of crime prediction models. This thesis therefore explores the potential of using information from social media (Twitter) for crime prediction in combination with historical crime data. It also identifies, using data mining techniques, the most relevant feature engineering needed for a United Kingdom dataset to improve crime prediction model performance. Additionally, this study presents a function that could be used by every police district in the United Kingdom for data cleansing, pre-processing, and feature engineering. A Shiny app was also used to display tweet sentiment trends, supporting crime prevention in near-real time.

    Exploratory analysis is essential for revealing the data pre-processing and feature engineering needed before feeding the data into a machine learning model. Based on the earlier documented studies available, this is the first research to carry out a full exploratory analysis of historical British crime statistics using the stop-and-search historical dataset. Based on the findings of that exploratory study, an algorithm was created to clean the data and prepare it for further analysis and model creation. This provides a ready-to-use dataset for future research, particularly for non-experts constructing models to forecast crime or conducting investigations across the roughly 32 police districts of the United Kingdom.

    Moreover, this is the first study to present a complete collection of geo-spatial parameters for training a crime prediction model by combining demographic data from the same United Kingdom source with hourly sentiment polarity that was not restricted to a Twitter keyword search. Six base models frequently mentioned in the previous literature were selected, trained on the stop-and-search historical crime dataset, evaluated on test data, and finally validated with the London and Kent crime datasets.

    Two datasets were created from Twitter and historical data: historical crime data with a Twitter sentiment score, and historical data without it. Six of the most prevalent machine learning classifiers (Random Forest, Decision Tree, k-nearest neighbours, support vector machine, neural network, and Naïve Bayes) were trained and tested on these datasets, and the hyperparameters of each of the six models were tuned using random grid search. Voting classifiers and a logistic-regression stacked ensemble of different models were also trained and tested on the same datasets to improve on the individual models' performance. In addition, two combinations of stacked ensembles of multiple models were constructed, and the most suitable model for crime prediction on the UK dataset was chosen on the basis of their performance.

    In terms of interpretation, this research differs from most earlier studies that employed Twitter data in that several methodologies were used to show how each attribute contributed to the construction of the model, and the findings were discussed and interpreted in the context of the study. Further, a Shiny app visualisation tool was designed to display each tweet's sentiment score, text, user screen name, and vicinity, allowing the investigation of criminal actions in near-real time. The evaluation of the models revealed that Random Forest, Decision Tree, and k-nearest neighbour outperformed the other models, with Decision Tree and Random Forest performing better consistently when evaluated on test data.
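
    As a concrete illustration of the modelling step described above, the sketch below trains one of the six classifiers (a Random Forest) with a random grid search over its hyperparameters, in the spirit of the pipeline the thesis describes. The column names and synthetic data are placeholders invented here, not the actual UK stop-and-search schema or the thesis's results.

    # Minimal sketch of the classifier-plus-random-grid-search step using
    # scikit-learn; all data and column names are synthetic placeholders.
    import numpy as np
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import RandomizedSearchCV, train_test_split

    rng = np.random.default_rng(0)
    n = 500
    df = pd.DataFrame({
        "hour": rng.integers(0, 24, n),
        "district": rng.integers(0, 32, n),    # ~32 police districts
        "sentiment": rng.uniform(-1, 1, n),    # hourly tweet polarity score
        "crime": rng.integers(0, 2, n),        # binary target
    })

    X_train, X_test, y_train, y_test = train_test_split(
        df[["hour", "district", "sentiment"]], df["crime"], random_state=0)

    # Random grid search over hyperparameters, as the abstract describes.
    search = RandomizedSearchCV(
        RandomForestClassifier(random_state=0),
        param_distributions={"n_estimators": [100, 200, 400],
                             "max_depth": [4, 8, None]},
        n_iter=5, cv=3, random_state=0)
    search.fit(X_train, y_train)
    print("test accuracy:", search.score(X_test, y_test))

    Comparing this model trained with and without the "sentiment" column mirrors the thesis's two-dataset design for measuring what the Twitter signal adds over historical data alone.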

    Proceedings of the 7th International Conference on Functional-Structural Plant Models, Saariselkä, Finland, 9-14 June 2013


    Computer modelling of agroforestry systems


    Detecting and mapping forest nutrient deficiencies: eucalyptus variety (Eucalyptus grandis x Eucalyptus urophylla) trees in KwaZulu-Natal, South Africa.

    Doctoral Degree. University of KwaZulu-Natal, Pietermaritzburg. Abstract available in PDF.

    The role of structured induction in expert systems

    A "structured induction" technique was developed and tested using a rules- from -examples generator together with a chess -specific application package. A drawback of past experience with computer induction, reviewed in this thesis, has been the generation of machine -oriented rules opaque to the user. By use of the structured approach humanly understandable rules were synthesized from expert supplied examples. These rules correctly performed chess endgame classifications of sufficient complexity to be regarded as difficult by international master standard players. Using the "Interactive ID3" induction tools developed by the author, chess experts, with a little programming support, were able to generate rules which solve problems considered difficult or impossible by conventional programming techniques. Structured induction and associated programming tools were evaluated using the chess endgames Icing and Pawn vs. King (Black -tomove) and King and Pawn vs. King and Rook (White -to -move, White Pawn on a7) as trial problems of measurable complexity.Structured solutions to both trial problems are presented, and implications of this work for the design of expert systems languages are assessed