2010. Presentation / Unpublished communication (conference, event, webinar...). Metadata-only access.
Intercomparison of two different statistical approaches to the initialization of the physical inversion of IASI radiances for temperature, water vapour and ozone
In recent years, the introduction of massively parallel sequencing platforms for Next Generation Sequencing (NGS), able to sequence hundreds of thousands of DNA fragments simultaneously, has dramatically changed the landscape of genetic studies. RNA-Seq for transcriptome studies, ChIP-Seq for DNA-protein interactions and CNV-Seq for large-scale copy-number variations in the genome are only some of the intriguing new applications supported by these innovative platforms. Among them, RNA-Seq is perhaps the most complex NGS application. RNA-Seq experiments can accurately determine gene expression levels, differential splicing and allele-specific expression of transcripts, and can therefore address many biological questions. None of these attributes is readily achievable with the previously widespread hybridization-based or tag sequence-based approaches. However, the unprecedented sensitivity and the large amount of data produced by NGS platforms bring new challenges and issues along with clear advantages. While this technology makes many new biological observations and discoveries possible, it also requires considerable effort in developing new bioinformatics tools to handle these massive data files. This paper surveys the RNA-Seq methodology, focusing on the challenges that this application presents from both a biological and a bioinformatics point of view.
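Quantifying expression levels from RNA-Seq reads usually requires normalizing raw read counts for transcript length and sequencing depth. The sketch below computes transcripts per million (TPM), one common such normalization; it is an illustration rather than a procedure taken from the survey, and the function name `tpm` and its inputs are hypothetical.

```python
def tpm(read_counts, lengths_bp):
    """Transcripts per million (TPM) from raw read counts and transcript lengths in bp.

    Hypothetical helper for illustration; not taken from the surveyed paper.
    """
    if len(read_counts) != len(lengths_bp):
        raise ValueError("read_counts and lengths_bp must have the same length")
    # 1) Reads per kilobase: corrects for longer transcripts attracting more reads.
    rpk = [count / (length / 1000.0) for count, length in zip(read_counts, lengths_bp)]
    total = sum(rpk)
    if total == 0:
        raise ValueError("at least one transcript must have reads")
    # 2) Rescale so the values sum to one million: corrects for sequencing depth.
    return [r * 1e6 / total for r in rpk]
```

Because length is divided out before the depth rescaling, two transcripts with the same number of reads but different lengths receive different TPM values, and TPM values from different libraries always sum to the same total.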
2010. Research report / Scientific report. Metadata-only access.
Consolidation of scientific baseline for the development of a MTG-IRS L2 processor: role of Optimal Estimation with background state and associated error from climatology
Here we discuss the dilemma posed by biological high-throughput data: how should replicated experiments and data from closely related species be integrated? Should each species be treated as a single, self-contained source of data when replicated experiments are available or, vice versa, should we try to collect information from the large number of closely related species analyzed in different laboratories? In this paper we argue, and justify, that experimental replicates and phylogenetic data can be combined to strengthen the evidence for identifying transcriptional motifs and networks, a task that appears to be quite difficult with other currently used methods. In particular, we discuss the use of phylogenetic inference and the potential of the Bayesian variable selection procedure for data integration. To illustrate the proposed approach, we present a case study based on sequence and microarray data from fungal species. We also focus on interpreting the results with respect to experimental and biological noise.
Clustering is one of the most important unsupervised learning problems: it deals with finding structure in a collection of unlabeled data. However, different clustering algorithms applied to the same data set produce different solutions. In many applications the problem of multiple solutions becomes crucial, and providing a limited group of good clusterings is often more desirable than a single solution. In this work we propose Least Square Consensus clustering, which allows a user to extract a small number of different clustering solutions from an initial (large) set of solutions obtained by applying any clustering algorithm to a given data set. Two different implementations are presented. In both, each consensus is accompanied by a quality measure defined in terms of the least-squares error, and a graphical visualization is provided to make the result immediately interpretable. Numerical experiments are carried out on both synthetic and real data sets.
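As a rough illustration of a least-squares consensus, the sketch below encodes each clustering as a binary co-association (connectivity) matrix, averages these into a consensus matrix, and scores every input clustering by its mean squared deviation from that average. This is an assumed simplification, not either of the authors' two implementations: it returns a single representative rather than a small group of solutions, and the names `connectivity` and `least_squares_consensus` are hypothetical.

```python
import numpy as np


def connectivity(labels):
    """n x n binary matrix: entry (i, j) is 1 when points i and j share a cluster."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)


def least_squares_consensus(labelings):
    """Pick the input clustering closest, in least squares, to the mean co-association matrix.

    Hypothetical simplification for illustration. Returns the index of the chosen
    clustering and the mean squared error of every clustering against the consensus
    (lower is better).
    """
    mats = np.stack([connectivity(lab) for lab in labelings])
    # Consensus: fraction of input clusterings that put points i and j together.
    consensus = mats.mean(axis=0)
    # Least-squares quality of each clustering with respect to the consensus.
    errors = ((mats - consensus) ** 2).mean(axis=(1, 2))
    return int(np.argmin(errors)), errors
```

For example, with labelings `[[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 0, 1]]` the first two describe the same partition under permuted labels, so they receive equal errors and the first is selected. Working on connectivity matrices rather than raw labels is what makes the comparison invariant to label permutations.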
Remote sensing of the atmosphere is changing rapidly thanks to the development of high spectral resolution infrared space-borne sensors. The aim is to provide increasingly accurate information on the lower atmosphere, as requested by the World Meteorological Organization (WMO), in order to improve the reliability and range of weather forecasts as well as Earth monitoring. In this paper we present the results obtained on a set of Infrared Atmospheric Sounding Interferometer (IASI) observations using a new statistical strategy based on dimension reduction. The retrievals of temperature, water vapor and ozone have been compared with time- and space-colocated ECMWF analyses.
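Comparisons of this kind are typically summarized level by level. A minimal sketch, assuming the retrieved and colocated reference profiles have already been interpolated onto a common vertical grid and stacked as `(n_profiles, n_levels)` arrays; the function name `profile_statistics` is hypothetical, and the abstract does not state which statistics were actually used.

```python
import numpy as np


def profile_statistics(retrieved, reference):
    """Per-level bias and RMSE of retrieved profiles against colocated reference profiles.

    Hypothetical helper. Both inputs have shape (n_profiles, n_levels), e.g. temperature in K.
    """
    retrieved = np.asarray(retrieved, dtype=float)
    reference = np.asarray(reference, dtype=float)
    if retrieved.shape != reference.shape:
        raise ValueError("retrieved and reference must have the same shape")
    diff = retrieved - reference
    bias = diff.mean(axis=0)                 # systematic error at each level
    rmse = np.sqrt((diff ** 2).mean(axis=0))  # total error at each level
    return bias, rmse
```

Reporting bias and RMSE separately distinguishes a systematic offset of the retrieval from random scatter around the reference.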
We consider the problem of testing for additivity in the standard multiple nonparametric regression model. We derive optimal (in the minimax sense) non-adaptive and adaptive hypothesis testing procedures for additivity against the composite nonparametric alternative that the response function involves interactions of second or higher order, separated away from zero in the $L_2([0,1]^d)$-norm, and also possesses some smoothness properties. To shed light on the theoretical results, we carry out an extensive simulation study that examines the finite-sample performance of the proposed testing procedures and compares them with a series of other tests for additivity available in the literature.
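One way to write the testing problem down, assuming a design on $[0,1]^d$ and independent standard Gaussian noise (assumptions on our part; the symbols $\sigma$, $\xi_i$, $c$, $f_j$, $\mathcal{F}_s$, $\Pi_{\mathrm{add}}$ and $r_n$ are introduced here only for illustration):

```latex
Y_i = f(\mathbf{X}_i) + \sigma \xi_i, \qquad \mathbf{X}_i \in [0,1]^d, \quad i = 1, \dots, n,

H_0:\; f(\mathbf{x}) = c + \sum_{j=1}^{d} f_j(x_j)
\qquad \text{versus} \qquad
H_1:\; f \in \mathcal{F}_s, \quad \| f - \Pi_{\mathrm{add}} f \|_{L_2([0,1]^d)} \ge r_n .
```

Here $\Pi_{\mathrm{add}} f$ denotes the $L_2$ projection of $f$ onto additive functions, so $f - \Pi_{\mathrm{add}} f$ collects the interaction terms of second and higher order; $\mathcal{F}_s$ is a smoothness class and $r_n$ the separation radius. Minimax optimality concerns how fast $r_n$ may shrink with $n$ while a test still keeps both error probabilities small.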
A retrieval algorithm that uses a statistical strategy based on dimension reduction is proposed. The methodology and the implementation details of the new algorithm are presented and discussed. The algorithm has been applied to high-resolution spectra measured by the Infrared Atmospheric Sounding Interferometer (IASI) to retrieve atmospheric profiles of temperature, water vapour and ozone. The performance of the inversion strategy has been assessed by comparing the retrieved profiles with space- and time-colocated profiles from the European Centre for Medium-Range Weather Forecasts (ECMWF) analysis.
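The abstract does not spell out the dimension-reduction strategy. As a generic illustration only, the sketch below uses principal component regression: spectra are compressed onto their leading principal components and profiles are regressed linearly on the resulting scores. This is a swapped-in textbook technique, not the authors' algorithm; the class name `PCRRetrieval`, the number of components and the synthetic data are all assumptions.

```python
import numpy as np


class PCRRetrieval:
    """Illustrative principal component regression retrieval (hypothetical, not the paper's method).

    Spectra are projected onto their leading principal components and the
    atmospheric profiles are regressed linearly on those scores.
    """

    def __init__(self, n_components):
        self.n_components = n_components

    def fit(self, spectra, profiles):
        X = np.asarray(spectra, dtype=float)
        Y = np.asarray(profiles, dtype=float)
        self.x_mean = X.mean(axis=0)
        self.y_mean = Y.mean(axis=0)
        Xc = X - self.x_mean
        # Leading right singular vectors of the centred spectra are the principal components.
        _, _, vt = np.linalg.svd(Xc, full_matrices=False)
        self.components = vt[: self.n_components].T  # (n_channels, k)
        scores = Xc @ self.components                 # (n_samples, k)
        # Least-squares map from PC scores to centred profiles.
        self.coef, *_ = np.linalg.lstsq(scores, Y - self.y_mean, rcond=None)
        return self

    def predict(self, spectra):
        Xc = np.asarray(spectra, dtype=float) - self.x_mean
        return (Xc @ self.components) @ self.coef + self.y_mean


# Synthetic, noise-free example: 3 latent factors drive 50 "channels" and 10 "levels".
rng = np.random.default_rng(0)
B = rng.normal(size=(3, 10))   # latent factors -> profile levels
C = rng.normal(size=(3, 50))   # latent factors -> spectral channels
Z_train = rng.normal(size=(200, 3))
Z_test = rng.normal(size=(20, 3))
model = PCRRetrieval(n_components=3).fit(Z_train @ C, Z_train @ B)
err = np.abs(model.predict(Z_test @ C) - Z_test @ B).max()
print(f"max abs error on held-out samples: {err:.2e}")
```

With noise-free data generated from three latent factors, three components are enough to reproduce held-out profiles up to rounding error. Real radiances would need explicit noise handling and a principled choice of `n_components`, for example by cross-validation.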