
    Writer Identification Using Inexpensive Signal Processing Techniques

    We propose using novel and classical audio and text signal-processing techniques, among others, for "inexpensive", fast writer identification on scanned hand-written documents, treated "visually". Here "inexpensive" refers to the efficiency of the identification process in terms of CPU cycles while preserving decent accuracy for preliminary identification. This is a comparative study of multiple algorithm combinations in a pattern recognition pipeline implemented in Java around the open-source Modular Audio Recognition Framework (MARF), which can do much more than process audio. We present our preliminary experimental findings for this identification task. We simulate "visual" identification by "looking" at the hand-written document as a whole rather than trying to extract fine-grained features from it prior to classification. Comment: 9 pages; 1 figure; presented at CISSE'09 at http://conference.cisse2009.org/proceedings.aspx ; includes the application source code; based on MARF described in arXiv:0905.123

    A New Approach to Time Domain Classification of Broadband Noise in Gravitational Wave Data

    Broadband noise in gravitational wave (GW) detectors, also known as triggers, can often be a deterrent to the efficiency with which astrophysical search pipelines detect sources. It is important to understand their instrumental or environmental origin so that they can be eliminated or accounted for in the data. Since the number of triggers is large, data mining approaches such as clustering and classification are useful tools for this task. Classification of triggers based on a handful of discrete properties has been done in the past. The waveform, or 'shape', of the triggers carries rich information that has been explored only in a rather restricted way so far. This paper presents a new way to classify triggers that derives information from both the trigger waveforms and their discrete physical properties. It uses a sequential combination of the Longest Common Sub-Sequence (LCSS), and LCSS coupled with Fast Time Series Evaluation (FTSE), for waveform classification, and multidimensional hierarchical classification (MHC) analysis for grouping based on physical properties. A generalized k-means algorithm is used with the LCSS (and LCSS+FTSE) to cluster the triggers, with a validity measure determining the correct number of clusters in the absence of any prior knowledge. The results are demonstrated by simulations and by application to a segment of real LIGO data from the sixth science run. Comment: 16 pages, 16 figures
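
As a rough illustration of the LCSS idea named in the abstract above, the following is a minimal textbook-style sketch of an LCSS similarity between two real-valued series. It is not the authors' implementation and does not include FTSE; the function names and the tolerance parameters `eps` (amplitude) and `delta` (time offset) are assumptions introduced here for illustration.

```python
def lcss_length(a, b, eps, delta):
    """Length of the longest common subsequence of two numeric series.

    Two samples a[i] and b[j] are treated as matching when they differ by
    at most `eps` in value and their indices differ by at most `delta`.
    Both thresholds are illustrative assumptions, not values from the paper.
    """
    n, m = len(a), len(b)
    # table[i][j] holds the LCSS length of the prefixes a[:i] and b[:j].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if abs(a[i - 1] - b[j - 1]) <= eps and abs(i - j) <= delta:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[n][m]


def lcss_similarity(a, b, eps, delta):
    """LCSS length normalised by the shorter series, giving a value in [0, 1]."""
    if not a or not b:
        return 0.0
    return lcss_length(a, b, eps, delta) / min(len(a), len(b))
```

A distance such as `1 - lcss_similarity(...)` could then be passed to a clustering routine; how the paper couples LCSS to its generalized k-means is not specified in the abstract.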

    Statistical mechanics of transcription-factor binding site discovery using Hidden Markov Models

    Hidden Markov Models (HMMs) are a commonly used tool for inference of transcription factor (TF) binding sites from DNA sequence data. We exploit the mathematical equivalence between HMMs for TF binding and the "inverse" statistical mechanics of hard rods in a one-dimensional disordered potential to investigate learning in HMMs. We derive analytic expressions for the Fisher information, a commonly employed measure of confidence in learned parameters, in the biologically relevant limit where the density of binding sites is low. We then use techniques from statistical mechanics to derive a scaling principle relating the specificity (binding energy) of a TF to the minimum amount of training data necessary to learn it. Comment: 25 pages, 2 figures, 1 table; V2: typos fixed and new references added

    Temperature effects on zoeal morphometric traits and intraspecific variability in the hairy crab Cancer setosus across latitude

    Phenotypic plasticity is an important but often ignored ability that enables organisms, within species-specific physiological limits, to respond to gradual or sudden extrinsic changes in their environment. In the marine realm, the early ontogeny of decapod crustaceans is among the best-known examples of a temperature-dependent phenotypic response. Here, we present morphometric results for larvae of the hairy crab Cancer setosus, the embryonic development of which took place at different temperatures at two different sites (Antofagasta, 23°45′ S; Puerto Montt, 41°44′ S) along the Chilean coast. Zoea I larvae from Puerto Montt were significantly larger than those from Antofagasta when considering embryonic development at the same temperature. Larvae from Puerto Montt reared at 12 and 16°C did not differ morphometrically, but sizes of larvae from Antofagasta kept at 16 and 20°C did, being larger at the colder temperature. Zoea II larvae reared in Antofagasta at three temperatures (16, 20, and 24°C) showed the same pattern, with larger larvae at colder temperatures. Furthermore, larvae reared at 24°C showed deformations, suggesting that 24°C, which coincides with temperatures found during strong El Niño events, is indicative of the upper larval thermal tolerance limit. Cancer setosus is exposed to a wide temperature range across its distribution range of about 40° of latitude. Phenotypic plasticity in larval offspring furthermore enables this species to respond locally to the inter-decadal warming induced by El Niño. Morphological plasticity in this species supports previously reported energetic trade-offs with temperature throughout its early ontogeny, indicating that plasticity may be key to a species' success in occupying a wide distribution range and/or thriving under highly variable habitat conditions.

    On finite p-groups whose automorphisms are all central

    An automorphism α of a group G is said to be central if α commutes with every inner automorphism of G. We construct a family of non-special finite p-groups having abelian automorphism groups. These groups provide counterexamples to a conjecture of A. Mahalanobis [Israel J. Math., 165 (2008), 161-187]. We also construct a family of finite p-groups having non-abelian automorphism groups and all automorphisms central. This solves a problem of I. Malinowska [Advances in group theory, Aracne Editrice, Rome 2002, 111-127]. Comment: 11 pages, Counterexamples to a conjecture from [Israel J. Math., 165 (2008), 161-187]; This paper will appear in Israel J. Math. in 201

    Food and welfare in India, c. 1900–1950

    In 2001, the People's Union for Civil Liberties submitted a writ petition to the Supreme Court of India on the “right to food.” The petitioner was a voluntary human rights organization; the initial respondents were the Government of India, the Food Corporation of India, and six state governments. The petition opens with three pointed questions posed to the court: * A. Does the right to life mean that people who are starving and who are too poor to buy food grains ought to be given food grains free of cost by the State from the surplus stock lying with the State, particularly when it is reported that a large part of it is lying unused and rotting? * B. Does not the right to life under Article 21 of the Constitution of India include the right to food? * C. Does not the right to food, which has been upheld by the Honourable Court, imply that the state has a duty to provide food especially in situations of drought, to people who are drought affected and are not in a position to purchase food

    Automatic Network Fingerprinting through Single-Node Motifs

    Complex networks have been characterised by their specific connectivity patterns (network motifs), but their building blocks can also be identified and described by node-motifs, a combination of local network features. One technique to identify single node-motifs has been presented by Costa et al. (L. D. F. Costa, F. A. Rodrigues, C. C. Hilgetag, and M. Kaiser, Europhys. Lett., 87, 1, 2009). Here, we first suggest improvements to the method, including how its parameters can be determined automatically. Such automatic routines make high-throughput studies of many networks feasible. Second, the new routines are validated in different network-series. Third, we provide an example of how the method can be used to analyse network time-series. In conclusion, we provide a robust method for systematically discovering and classifying characteristic nodes of a network. In contrast to classical motif analysis, our approach can identify individual components (here: nodes) that are specific to a network. Such special nodes, like hubs before them, might be found to play critical roles in real-world networks. Comment: 16 pages (4 figures) plus supporting information 8 pages (5 figures)

    Radio-loud Narrow-Line Type 1 Quasars

    We present the first systematic study of (non-radio-selected) radio-loud narrow-line Seyfert 1 (NLS1) galaxies. Cross-correlation of the 'Catalogue of Quasars and Active Nuclei' with several radio and optical catalogues led to the identification of 11 radio-loud NLS1 candidates, including 4 previously known ones. Most of the radio-loud NLS1s are compact, steep-spectrum sources accreting close to, or above, the Eddington limit. The radio-loud NLS1s of our sample are remarkable in that they occupy a previously rarely populated regime in NLS1 multi-wavelength parameter space. While their [OIII]/Hβ and FeII/Hβ intensity ratios cover almost the whole range observed in NLS1 galaxies, their radio properties extend the range of radio-loud objects to those with small widths of the broad Balmer lines. Among the radio-detected NLS1 galaxies, the radio index R is distributed quite smoothly up to the critical value of R ~ 10 and covers about 4 orders of magnitude in total. Statistics show that ~7% of the NLS1 galaxies are formally radio-loud, while only 2.5% exceed a radio index R > 100. Several mechanisms are considered as explanations for the radio loudness of the NLS1 galaxies and for the lower frequency of radio-loud objects among NLS1s than among quasars. While the properties of most sources (with 2-3 exceptions) generally do not favor relativistic beaming, the combination of accretion mode and spin may explain the observations. (abbreviated) Comment: Astronomical Journal (first submitted in Dec. 2005); 45 pages incl. 1 colour figure

    Subfamily specific conservation profiles for proteins based on n-gram patterns

    BACKGROUND: A new algorithm has been developed for generating conservation profiles that reflect the evolutionary history of the subfamily associated with a query sequence. It is based on n-gram patterns (NP{n,m}), which are sets of n residues and m wildcards in windows of size n+m. The generation of conservation profiles is treated as a signal-to-noise problem where the signal is the count of n-gram patterns in target sequences that are similar to the query sequence and the noise is the count over all target sequences. The signal is differentiated from the noise by applying singular value decomposition to sets of target sequences rank-ordered by similarity with respect to the query. RESULTS: The new algorithm was used to construct 4,248 profiles from 120 randomly selected Pfam-A families. These were compared to profiles generated from multiple alignments using the consensus approach. The two profiles were similar whenever the subfamily associated with the query sequence was well represented in the multiple alignment. It was possible to construct subfamily-specific conservation profiles using the new algorithm for subfamilies with as few as five members. The speed of the new algorithm was comparable to the multiple alignment approach. CONCLUSION: Subfamily-specific conservation profiles can be generated by the new algorithm without a priori knowledge of family relationships or domain architecture. This is useful when the subfamily contains multiple domains with different levels of representation in protein databases. It may also be applicable when the subfamily sample size is too small for the multiple alignment approach.
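
To make the NP{n,m} definition above concrete, here is a minimal sketch that counts every pattern of n residues and m wildcards over sliding windows of size n+m. It covers only the pattern-counting step, not the singular value decomposition. The function name, the `*` wildcard symbol, and the choice to allow wildcards at any window position (including the ends) are assumptions made here, not details taken from the abstract.

```python
from collections import Counter
from itertools import combinations


def ngram_patterns(seq, n, m, wildcard="*"):
    """Count NP{n,m}-style patterns in a sequence.

    Each window of length n + m yields one pattern per way of replacing
    m of its positions with the wildcard symbol (an assumed convention).
    """
    width = n + m
    counts = Counter()
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        # Choose which m positions inside the window become wildcards.
        for wild in combinations(range(width), m):
            pattern = "".join(
                wildcard if k in wild else residue
                for k, residue in enumerate(window)
            )
            counts[pattern] += 1
    return counts
```

Under these assumptions, the "signal" and "noise" described in the abstract would correspond to such counts taken over the similar target sequences and over all target sequences, respectively.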

    Importance of data structure in comparing two dimension reduction methods for classification of microarray gene expression data

    BACKGROUND: With the advance of microarray technology, several methods for gene classification and prognosis have already been designed. However, under various denominations, some of these methods have similar approaches. This study evaluates the influence of gene expression variance structure on the performance of methods that describe the relationship between gene expression levels and a given phenotype through projection of data onto discriminant axes. RESULTS: We compared Between-Group Analysis and Discriminant Analysis (with prior dimension reduction through Partial Least Squares or Principal Components Analysis). A geometric approach showed that these two methods are strongly related, but differ in the way they handle data structure. Yet data structure helps explain the predictive efficiency of these methods. Three main structure situations may be identified. When the clusters of points are clearly split, both methods perform equally well. When the clusters overlap, both methods fail to give useful predictions. In intermediate situations, the configuration of the clusters of points has to be handled by the projection to improve prediction. For this, we recommend Discriminant Analysis. In addition, an innovative simulation approach generated the three main structures by modelling different partitions of the whole variance into within-group and between-group variances. These simulated datasets were used to complement some well-known public datasets in order to investigate the methods' behaviour in a large diversity of structure situations. To examine the structure of a dataset before analysis and preselect an a priori appropriate method for its analysis, we proposed a two-graph preliminary visualization tool: plotting patients on the Between-Group Analysis discriminant axis (x-axis) and on the first and the second within-group Principal Components Analysis component (y-axis), respectively.
    CONCLUSION: Discriminant Analysis outperformed Between-Group Analysis because it accounts for the dataset structure. A priori knowledge of that structure may guide the choice of the analysis method. Simulated datasets with known properties are valuable for assessing and comparing the performance of analysis methods; implementation on real datasets then checks and validates the results. Thus, we warn against the use of unchallenging datasets for method comparison, such as the Golub dataset, because their structure is such that any method would be efficient.
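
The partition of the whole variance into within-group and between-group parts mentioned above rests on the standard sum-of-squares identity (total = within + between). The sketch below checks that identity for a single variable; it is a generic illustration, not the authors' simulation procedure, and the function name is an assumption introduced here.

```python
def variance_partition(groups):
    """Return (total, within, between) sums of squares for grouped values.

    `groups` is a list of non-empty lists of numbers, one list per group.
    The three values satisfy total == within + between up to rounding.
    """
    all_values = [x for group in groups for x in group]
    grand_mean = sum(all_values) / len(all_values)
    within = 0.0
    between = 0.0
    for group in groups:
        group_mean = sum(group) / len(group)
        # Spread of the points around their own group mean.
        within += sum((x - group_mean) ** 2 for x in group)
        # Spread of the group mean around the grand mean, weighted by size.
        between += len(group) * (group_mean - grand_mean) ** 2
    total = sum((x - grand_mean) ** 2 for x in all_values)
    return total, within, between
```

A dataset whose between-group share is large relative to the within-group share corresponds to the "clearly split" situation described in the abstract, and a small share to overlapping clusters.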