List of publications

5 results found

2016 Journal article, metadata-only access

Natural vs. random protein sequences: Discovering combinatorics properties on amino acid words

Random mutations and natural selection have driven the evolution of the protein amino acid sequences that we observe in nature today. Which of the two is the dominant force in protein evolution still lacks an unambiguous answer. Random mutations tend to randomize protein sequences, whereas selection mechanisms are expected to impose rigid constraints on amino acid sequences so that proteins function correctly. Moreover, the space of all possible amino acid sequences is so astonishingly large that a well-tuned amino acid sequence could reasonably be indistinguishable from a random one. To study whether random and natural amino acid sequences can be discriminated, we introduce different measures of association between pairs of amino acids in a sequence and apply them to a dataset of 1,047 natural protein sequences and 10,470 random sequences, carefully generated to preserve the relative length and amino acid distribution of the natural proteins. We analyze the multidimensional measures with machine learning techniques and show that, to a reasonable extent, natural protein sequences can be differentiated from random ones. © 2015 Elsevier Ltd. All rights reserved.

Protein sequence; Random sequence; Combinatorics of words; Amino acid association
2015 Journal article, metadata-only access

Natural vs. Random Protein Sequences: Discovering Combinatorics Properties on Amino Acid Words

Random mutations and natural selection have driven the evolution of the protein amino acid sequences that we observe in nature today. Which of the two is the dominant force in protein evolution still lacks an unambiguous answer. Random mutations tend to randomize protein sequences, whereas selection mechanisms are expected to impose rigid constraints on amino acid sequences so that proteins function correctly. Moreover, the space of all possible amino acid sequences is so astonishingly large that a well-tuned amino acid sequence could reasonably be indistinguishable from a random one. To study whether random and natural amino acid sequences can be discriminated, we introduce different measures of association between pairs of amino acids in a sequence and apply them to a dataset of 1,047 natural protein sequences and 10,470 random sequences, carefully generated to preserve the relative length and amino acid distribution of the natural proteins. We analyze the multidimensional measures with machine learning techniques and show that, to a reasonable extent, natural protein sequences can be differentiated from random ones.

Protein sequence; Random sequence; Combinatorics of words; Amino acid association
2014 Journal article, metadata-only access

Mathematical Desk for Italian Industry: An Applied and Industrial Mathematics Project

In this paper we introduce the Mathematical Desk for Italian Industry, a project based on applied and industrial mathematics developed by a team of researchers from the Italian National Research Council in collaboration with two major Italian associations for applied mathematics, SIMAI and AIRO. The aim of this paper is to clarify the motivations for this project and to present an overview of the activities, context, and organization of the Mathematical Desk, whose mission is to build a concrete bridge of common interests between the Italian applied mathematics community and the world of Italian enterprises. Some final considerations on the strategy for the future development of the Mathematical Desk project complete the paper.

2006 Journal article, metadata-only access

Mining Relevant Information on the Web: A Clique Based Approach

The role of information management and retrieval in production processes has been gaining in importance in recent years. In this context, the ability to search for and quickly find the small piece of information needed among the huge amount of information available is of crucial importance. One category of tools devoted to this task is search engines. Satisfying the basic needs of the Web user has led to the search for new tools that aim to help more sophisticated users (communities, companies, interest groups) with more elaborate methods. An example is the use of clustering and classification algorithms or other specific data mining techniques. In such a context, a properly used thematic search engine is a crucial tool for supporting and orienting many activities. Several practical and theoretical problems arise in developing such tools, and we address some of them in this paper, extending previous work on Web mining. Here we consider two related problems: how to select an appropriate set of keywords for a thematic engine, taking into account the semantic and linguistic extensions of the search context, and how to select and rank a subset of relevant pages given a set of search keywords. Both problems are solved using the same framework, based on a graph representation of the available information and on the search for particular node subsets of that graph. Such subsets are effectively identified by a maximum-weight clique algorithm customized ad hoc for the specific problems. The methods were developed within a funded research project on new Web search tools; they have been tested on real data and are currently being implemented in a prototype thematic search engine. The Web mining method presented in this paper can be applied to Web-based design and manufacturing.

Data mining; Maximum clique
2004 Journal article, metadata-only access

Improving Search Results with Data Mining in a Thematic Search Engine

The problem of obtaining relevant results in web searching has been tackled with several approaches. Although very effective techniques are currently used by the most popular search engines when no a priori knowledge of the user's desires besides the search keywords is available, in different settings it is conceivable to design search methods that operate on a thematic database of web pages that refer to a common body of knowledge or to specific sets of users. We have considered such premises to design and develop a search method that deploys data mining and optimization techniques to provide a more significant and restricted set of pages as the final result of a user search. We adopt a vectorization method based on search context and user profile to apply clustering techniques that are then refined by a specially designed genetic algorithm. In this paper we describe the method, its implementation, and the algorithms applied, and discuss some experiments that have been run on test sets of web pages.

Search engines; Web mining; Clustering; Genetic algorithms