CNR IAC - Products

2016 Journal article metadata only access

Advantages and limits in the adoption of reproducible research and R-tools for the analysis of omic data

Russo Francesco ; Righelli Dario ; Angelini Claudia

Reproducible (computational) Research is crucial to producing transparent, high-quality scientific papers. First, we illustrate the benefits that the scientific community can gain from adopting Reproducible Research standards in the analysis of high-throughput omic data. Then, we describe several tools that help researchers increase the reproducibility of their work. Moreover, we discuss the advantages and limits of reproducible research and how the limits could be addressed. Overall, this paper should be considered a proof of concept of how, and with which characteristics, in our opinion, a study should be conducted in the spirit of Reproducible Research. The scope of this paper is therefore two-fold. The first goal is to present and discuss some easy-to-use instruments that data analysts can use to promote reproducible research in their analyses. The second is to encourage developers to incorporate automatic reproducibility features in their tools.

Big-data R Reproducible research

CNR IRIS

2016 Preface/Afterword metadata only access

Preface

Angelini Claudia ; Rancoita Paola Maria Vittoria ; Rovetta Stefano

Preface to the CIBB2015 volume published by Springer

CIBB2015 Selection of papers

CNR IRIS

2016 Journal article restricted access

Overproduction of indole-3-acetic acid in free-living rhizobia induces transcriptional changes resembling those occurring in nodule bacteroids.

Roberto Defez ; Roberta Esposito ; Claudia Angelini ; Carmelina Bianco

Free-living bacteria grown under aerobic conditions were used to investigate, by next-generation RNA sequencing analysis, the transcriptional profiles of Sinorhizobium meliloti wild-type 1021 and its derivative, RD64, which overproduces the main auxin indole-3-acetic acid (IAA). Among the upregulated genes in RD64 cells, we detected the main nitrogen-fixation regulator fixJ, the two intermediate regulators fixK and nifA, and several other genes known to be FixJ targets. The gene coding for the sigma factor RpoH1 and other genes involved in stress response, regulated in an RpoH1-dependent manner in S. meliloti, were also induced in RD64 cells. Under microaerobic conditions, quantitative real-time polymerase chain reaction analysis revealed that the genes fixJL and nifA were upregulated in RD64 cells as compared with 1021 cells. This work provided evidence that the overproduction of IAA in S. meliloti free-living cells induced many of the transcriptional changes that normally occur in nitrogen-fixing root nodules.

RNA-seq

CNR IRIS

2016 Conference proceedings contribution metadata only access

An introduction to next generation sequencing for studying omic-environment interactions.

Angelini Claudia

In this talk, we first review the concept of gene-environment interaction in light of emerging results and the use of modern high-throughput technologies, and we illustrate its impact on the understanding of complex human diseases. Then, we provide an overview of the methods available to process NGS data, with particular emphasis on the detection of genomic variants and the analysis of epigenomic and transcriptomic data produced by modern sequencers. Finally, we discuss how multi-omic data can be used to improve our understanding of complex diseases and provide novel research perspectives.

NGS GWAS multi-omic data analysis gene-environment interaction

CNR IRIS

2016 Conference proceedings contribution metadata only access

GeenaR: a flexible approach to pre-process, analyse and compare MALDI-ToF mass spectra

E Del Prete ; A Facchiano ; A Profumo ; C Angelini ; P Romano

Mass spectrometry is a set of technologies with many applications in characterizing biological samples. Because of the huge quantity of data, often biased and contaminated by different sources of error, and the amount of results that it is possible to extract, an easy-to-learn and complete workflow is essential. GeenaR is a robust web tool for pre-processing, analysing, visualizing and comparing a set of MALDI-ToF mass spectra. It combines the PHP, Perl and R languages and allows different levels of control over the parameters, so that the work can be adapted to the needs and expertise of the users.

Mass Spectrometry Proteomics Statistical Analysis Web tool

CNR IRIS

2015 Editorial, Commentary, Forum contribution in journal metadata only access

Preface: BITS2014, the Annual Meeting of the Italian Society of Bioinformatics

Facchiano A ; Angelini C ; Bosotti R ; Guffanti A ; Marabotti A ; Marangoni R ; Pascarella S ; Romano P ; Zanzoni A ; Helmer-Citterich M

bioinformatics

CNR IRIS

2015 Abstract in conference proceedings metadata only access

Combining pathway identification and survival prediction via screening-network analysis

Iuliano A ; Occhipinti A ; Angelini C ; De Feis I ; Liò P

Motivation: Gene expression data from high-throughput assays, such as microarrays, are often used to predict cancer survival. However, available datasets consist of a small number of samples (n patients) and a large number of gene expression measurements (p predictors). Therefore, the main challenge is to cope with the high dimensionality, i.e. p >> n, and an appealing novel approach is to use screening procedures to reduce the size of the feature space to a moderate scale (Wu & Yin 2015, Song et al. 2014, He et al. 2013). In addition, genes are often co-regulated and their expression levels are expected to be highly correlated. Genes that are involved in the same biological process are grouped in pathway structures. In order to incorporate the pathway information of genes, network-based methods have been applied (Zhang et al. 2013, Sun et al. 2013). Motivated by the most recent models based on variable screening techniques and the integration of pathway information into penalized Cox methods, we propose a new procedure to obtain more accurate predictions. First, we identify the high-risk genes by using variable screening techniques; then, we perform Cox regression analysis integrating network information associated with the selected high-risk genes. By combining these two approaches, we present a new method to select important core pathways and genes that are related to the survival outcome, and we show the benefits of our proposal in both simulation and real studies.
Methods: In our study, we combine variable screening techniques and network methods to identify genes and pathways highly associated with the disease and to better predict patient risk. We propose a new method for survival analysis based on the following steps: (i) we perform variable screening, such as sure independence screening (Fan et al. 2008) and its extensions (Gorst-Rasmussen & Scheike 2013, Zhao & Li 2012, Fan et al. 2010), to select the active set of variables strongly correlated with the survival response, and then (ii) we apply network-based Cox regression models, such as Net-Cox and AdaLnet, which use a network based on the number of selected signature genes to predict survival probability. In order to build our a priori network information, we use the human gene functional linkage approach (Huttenhower et al. 2009). This network contains maps of functional activity and interaction networks in over 200 areas of human cellular biology, with information from 30,000 genome-scale experiments. The functional linkage network summarizes information from a variety of biologically informative perspectives: prediction of protein function and functional modules, cross-talk among biological processes, and association of novel genes and pathways with known genetic disorders. In particular, our gene network is built by using the HEFalMp tool to determine the edge weight w between two nodes (i.e. genes). The resulting network consists of a fixed number of unique genes (about 2000 genes), where w describes how strong the relation between two genes is and takes values in [0,1]. Hence, while the screening methods recruit the features with the best marginal utility to reduce the dimensionality of the data, the network incorporates the pathway information, used as a prior knowledge network, into the survival analysis.
Results: We combine variable screening procedures and network-penalized Cox models for high-dimensional survival data, with the aim of determining pathway structures and biomarkers involved in cancer progression. By using this approach, it is possible to obtain a deeper insight into the gene-regulatory networks and to investigate the gene signatures related to cancer survival time, in order to understand how patient features (molecular and clinical information) can influence cancer treatment and detection. In particular, we show the results obtained in simulation and real cancer studies, along with screening rules. The simulated data are designed to illustrate two different biological scenarios. In the first setting, we examine the situation where all genes within the same module belong to different groups or pathways. In the second one, the pathways are not independent of one another (as in genomic studies); instead, the activation of some groups is conditional on other pathways. We use specificity, sensitivity and the Matthews Correlation Coefficient to compare the prediction performance. We also predict patient survival using molecular data of different cancer types, such as ovarian and breast cancer. We investigate the set of active signature genes and the corresponding pathways involved in the cancer disease process. Then, using the biological network as the prior information network, we perform network-based Cox modelling, including Kaplan-Meier curves and log-rank tests. Overall this study shows that the new screening-network analysis is useful for improving

Survival analysis Statistical Screening Cox-Regression Networks
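
The abstract above names specificity, sensitivity and the Matthews Correlation Coefficient as the measures used to compare performance. As a minimal sketch of how these could be computed for a gene-selection result, assuming the truly relevant genes are known (as in a simulation), the following Python uses hypothetical function names and gene labels that do not come from the original work:

```python
import math


def confusion_counts(true_genes, selected_genes, all_genes):
    """Count TP, FP, FN, TN for a gene-selection result.

    Assumes every gene in true_genes and selected_genes also
    appears in all_genes (hypothetical input layout).
    """
    true_set, sel_set = set(true_genes), set(selected_genes)
    tp = len(true_set & sel_set)   # relevant genes that were selected
    fp = len(sel_set - true_set)   # irrelevant genes that were selected
    fn = len(true_set - sel_set)   # relevant genes that were missed
    tn = len(set(all_genes)) - tp - fp - fn
    return tp, fp, fn, tn


def selection_metrics(tp, fp, fn, tn):
    """Return (sensitivity, specificity, MCC); 0.0 where undefined."""
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc


if __name__ == "__main__":
    genes = [f"g{i}" for i in range(1, 11)]   # illustrative gene labels
    truth = ["g1", "g2", "g3", "g4"]
    picked = ["g1", "g2", "g3", "g5"]
    print(selection_metrics(*confusion_counts(truth, picked, genes)))
```

Returning 0.0 when a denominator vanishes is a convention chosen here for convenience; other conventions (for example, reporting the value as undefined) are equally defensible.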

CNR IRIS

2015 Abstract in conference proceedings metadata only access

Cripto is essential to capture mouse EpiSC and human ESC pluripotency

A Fiorenzano ; E Pascale ; C D'Aniello ; F Russo ; M Biffoni ; F Francescangeli ; A Zeuner ; C Angelini ; EJ Patriarca ; Chazaud C ; A Fico ; G Minchiotti

stem cells cripto

CNR IRIS

2015 Journal article metadata only access

Short interspersed DNA elements and miRNAs: a novel hidden gene regulation layer in zebrafish?

Scarpato M ; Angelini C ; Cocca E ; Pallotta MM ; Morescalchi MA ; Capriglione T

In this study, we investigated by in silico analysis the possible correlation between microRNAs (miRNAs) and Anamnia V-SINEs (a superfamily of short interspersed nuclear elements), which belong to those retroposon families that have been preserved in vertebrate genomes for millions of years and are actively transcribed because they are embedded in the 3′ untranslated region (UTR) of several genes. We report the results of the analysis of the genomic distribution of these mobile elements in zebrafish (Danio rerio) and discuss their involvement in generating miRNA gene loci. The computational study showed that the genes predicted to bear V-SINEs can be targeted by miRNAs with a very high hybridization E-value. Gene ontology analysis indicates that these genes are mainly involved in metabolic, membrane, and cytoplasmic signaling pathways. Nearly all the miRNAs that were predicted to target the V-SINEs of these genes, i.e., miR-338, miR-9, miR-181, miR-724, miR-735, and miR-204, have been validated in similar regulatory roles in mammals. The large number of genes bearing a V-SINE involved in metabolic and cellular processes suggests that V-SINEs may play a role in modulating cell responses to different stimuli and in preserving the metabolic balance during cell proliferation and differentiation. Although they need experimental validation, these preliminary results suggest that in the genome of D. rerio, as in other TE families in vertebrates, the preservation of V-SINE retroposons may also have been favored by their putative role in gene network modulation.

3′UTR miRNA Retrotransposons SINEs

CNR IRIS

2015 Journal article metadata only access

Is this the right normalization? A diagnostic tool for ChIP-seq normalization

Angelini C ; Heller R ; Volkinshtein R ; Yekutieli D

Background: ChIP-seq experiments are becoming a standard approach for genome-wide profiling of protein-DNA interactions, such as detecting transcription factor binding sites, histone modification marks and RNA Polymerase II occupancy. However, when comparing a ChIP sample versus a control sample, such as Input DNA, normalization procedures have to be applied in order to remove experimental sources of bias. Despite the substantial impact that the choice of the normalization method can have on the results of a ChIP-seq data analysis, the assessment of such methods is not fully explored in the literature. In particular, there are no diagnostic tools that show whether the applied normalization is indeed appropriate for the data being analyzed. Results: In this work we propose a novel diagnostic tool to examine the appropriateness of the estimated normalization procedure. By plotting the empirical densities of log relative risks in bins of equal read count, along with the estimated normalization constant, after logarithmic transformation, the researcher is able to assess the appropriateness of the estimated normalization constant. We use the diagnostic plot to evaluate the appropriateness of the estimates obtained by CisGenome, NCIS and CCAT on several real data examples. Moreover, we show the impact that the choice of the normalization constant can have on standard tools for peak calling such as MACS or SICER. Finally, we propose a novel procedure for controlling the FDR using sample swapping. This procedure makes use of the estimated normalization constant in order to gain power over the naive choice of constant (used in MACS and SICER), which is the ratio of the total number of reads in the ChIP and Input samples. Conclusions: Linear normalization approaches aim to estimate a scale factor, r, to adjust for different sequencing depths when comparing ChIP versus Input samples. The estimated scaling factor can easily be incorporated in many peak caller algorithms to improve the accuracy of the peak identification. The diagnostic plot proposed in this paper can be used to assess how adequate ChIP/Input normalization constants are, and thus it allows the user to choose the most adequate estimate for the analysis.
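
The naive constant described in the abstract above, the ratio of the total number of reads in the ChIP and Input samples, can be sketched in a few lines. The Python below is only an illustration under stated assumptions: reads are assumed to be already counted per genomic bin, the function names are hypothetical, and the per-bin log ratio is a simplified stand-in rather than the diagnostic procedure proposed in the paper.

```python
import math


def naive_scaling_factor(chip_counts, input_counts):
    """Naive scale factor r: total ChIP reads divided by total Input reads."""
    total_input = sum(input_counts)
    if total_input == 0:
        raise ValueError("Input sample has no reads")
    return sum(chip_counts) / total_input


def bin_log_ratios(chip_counts, input_counts):
    """Per-bin log(ChIP / Input), skipping bins with a zero count.

    In bins without enrichment these values are expected to lie
    near log(r) when r is an adequate normalization constant.
    """
    return [
        math.log(c / i)
        for c, i in zip(chip_counts, input_counts)
        if c > 0 and i > 0
    ]


if __name__ == "__main__":
    chip = [10, 20, 30, 0]      # illustrative per-bin read counts
    control = [5, 10, 15, 0]
    r = naive_scaling_factor(chip, control)
    print(r, math.log(r), bin_log_ratios(chip, control))
```

Comparing the distribution of the per-bin log ratios with log(r) gives a rough sense of whether a candidate constant matches the background bins; an estimate from a dedicated method could be substituted for the naive r in the same comparison.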

Chip-Seq Diagnostic plots Normalization

CNR IRIS

2015 Journal article metadata only access

Applications of network-based survival analysis methods for pathway detection in cancer

A Iuliano ; A Occhipinti ; C Angelini ; I De Feis ; P Liò

Gene expression data from high-throughput assays, such as microarrays, are often used to predict cancer survival. Available datasets consist of a small number of samples (n patients) and a large number of genes (p predictors). Therefore, the main challenge is to cope with the high dimensionality. Moreover, genes are co-regulated and their expression levels are expected to be highly correlated. To address these two issues, network-based approaches can be applied. In our analysis, we compared the most recent network-penalized Cox models for high-dimensional survival data, which aim to determine pathway structures and biomarkers involved in cancer progression. Using these network-based models, we show how to obtain a deeper understanding of the gene-regulatory networks and investigate the gene signatures related to prognosis and survival in different types of tumors. Comparisons are carried out on three different real cancer datasets.

Survival Analysis microarray cancer

CNR IRIS

2015 Journal article metadata only access

ZFP57 recognizes multiple and closely spaced sequence motif variants to maintain repressive epigenetic marks in mouse embryonic stem cells.

Anvar Z ; Cammisa M ; Riso V ; Baglivo I ; Kukreja H ; Sparago A ; Girardot M ; Lad S ; De Feis I ; Cerrato F ; Angelini C ; Feil R ; Pedone PV ; Grimaldi G ; Riccio A

Imprinting Control Regions (ICRs) need to maintain their parental allele-specific DNA methylation during early embryogenesis despite genome-wide demethylation and subsequent de novo methylation. ZFP57 and KAP1 are both required for maintaining the repressive DNA methylation and H3-lysine-9-trimethylation (H3K9me3) at ICRs. In vitro, ZFP57 binds a specific hexanucleotide motif that is enriched at its genomic binding sites. We now demonstrate in mouse embryonic stem cells (ESCs) that SNPs disrupting closely-spaced hexanucleotide motifs are associated with lack of ZFP57 binding and H3K9me3 enrichment. Through a transgenic approach in mouse ESCs, we further demonstrate that an ICR fragment containing three ZFP57 motif sequences recapitulates the original methylated or unmethylated status when integrated into the genome at an ectopic position. Mutation of Zfp57 or the hexanucleotide motifs led to loss of ZFP57 binding and DNA methylation of the transgene. Finally, we identified a sequence variant of the hexanucleotide motif that interacts with ZFP57 both in vivo and in vitro. The presence of multiple and closely located copies of ZFP57 motif variants emerges as a distinct characteristic that is required for the faithful maintenance of repressive epigenetic marks at ICRs and other ZFP57 binding sites.

Imprinting ChIP-seq

CNR IRIS

2015 Contribution in volume (Chapter or Essay) metadata only access

A walking tour in Reproducible Research and Big Data Management with RNASeqGUI and R.

F Russo ; D Righelli ; C Angelini

In this paper, we discuss the concept of Reproducible Research and its importance in producing transparent, high-quality scientific papers. In particular, we illustrate the advantages that both paper authors and readers can gain from the adoption of Reproducible Research, and we discuss a strategy to develop computational tools supporting such a feature. We present a novel version of RNASeqGUI, a user-friendly computational tool capable of handling and analysing RNA-Seq data. This tool exploits the Reproducible Research feature to produce RNA-Seq analyses that are easy to read, inspect, understand, study, reproduce and modify. Overall, this paper is a proof of concept of how it is possible to develop complex and interactive tools in the spirit of Reproducible Research.

Rna-Seq Reproducible Research R

CNR IRIS

2014 Journal article open access

Computational approaches for isoform detection and estimation: good and bad news

Angelini Claudia ; De Canditiis Daniela ; De Feis Italia

Background: The main goal of whole transcriptome analysis is to correctly identify all expressed transcripts within a specific cell/tissue, at a particular stage and condition, to determine their structures and to measure their abundances. RNA-seq data promise to allow identification and quantification of the transcriptome at an unprecedented level of resolution and accuracy, and at low cost. Several computational methods have been proposed to achieve such purposes. However, it is still not clear which promises are already met and which challenges are still open and require further methodological developments. Results: We carried out a simulation study to assess the performance of five widely used tools: CEM, Cufflinks, iReckon, RSEM, and SLIDE. All of them were used with default parameters. In particular, we considered the effect of three different scenarios: the availability of complete annotation, incomplete annotation, and no annotation at all. Moreover, comparisons were carried out using the methods in three different modes of action. In the first mode, the methods were forced to deal only with those isoforms that are present in the annotation; in the second mode, they were allowed to detect novel isoforms using the annotation as a guide; in the third mode, they operated in a fully data-driven way (although with the support of the alignment on the reference genome). In the latter modality, precision and recall are quite poor. In contrast, results are better with the support of the annotation, even when it is not complete. Finally, the abundance estimation error often shows a very skewed distribution. The performance strongly depends on the true abundance of the isoforms. Lowly (and sometimes also moderately) expressed isoforms are poorly detected and estimated. In particular, lowly expressed isoforms are identified mainly if they are provided in the original annotation as potential isoforms.
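
Precision and recall, the detection measures mentioned in the abstract above, can be computed directly when the set of truly expressed isoforms is known, as it is in a simulation. The following Python is a minimal sketch with hypothetical isoform identifiers; it assumes an isoform counts as detected when its identifier matches exactly, which is a simplification of how reconstructed transcripts would be matched in practice.

```python
def precision_recall(true_isoforms, detected_isoforms):
    """Return (precision, recall) for a set of detected isoforms.

    precision = correctly detected / all detected
    recall    = correctly detected / all truly expressed
    Either value is reported as 0.0 when its denominator is zero.
    """
    true_set, det_set = set(true_isoforms), set(detected_isoforms)
    correct = len(true_set & det_set)
    precision = correct / len(det_set) if det_set else 0.0
    recall = correct / len(true_set) if true_set else 0.0
    return precision, recall


if __name__ == "__main__":
    simulated = ["t1", "t2", "t3", "t4"]   # isoforms expressed in the simulation
    reported = ["t1", "t2", "t5"]          # isoforms reported by a tool
    print(precision_recall(simulated, reported))
```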

Rna seq, simulation

CNR IRIS

2014 Journal article metadata only access

AnaLysis of Expression on human chromosome 21, ALE-HSA21: a pilot integrated web resource

Scarpato Margherita ; Esposito Roberta ; Evangelista Daniela ; Aprile Marianna ; Ambrosio Maria Rosaria ; Angelini Claudia ; Ciccodicola Alfredo ; Costa Valerio

Transcriptome studies have shown the pervasive nature of transcription, demonstrating that almost all genes undergo alternative splicing. Accurately annotating all transcripts of a gene is crucial. It is needed to understand the impact of mutations on phenotypes, to shed light on genetic and epigenetic regulation of mRNAs and, more generally, to widen our knowledge about cell functionality and tissue diversity. RNA-sequencing (RNA-Seq), like the other applications of next-generation sequencing, provides valuable data to improve the accuracy of annotations, while creating issues related to the variety, complexity and size of the data produced. In this scenario, the lack of user-friendly resources that are easily accessible to researchers with limited bioinformatics skills makes it difficult to retrieve complete information about one or a few genes without browsing a jungle of databases. Likewise, the increasing amount of data from 'omics' technologies requires the development of integrated databases that merge different data formats coming from distinct but complementary sources. In light of these considerations, and given the wide interest in studying Down syndrome, a genetic condition due to the trisomy of human chromosome 21 (HSA21), we developed an integrated relational database and a web interface, named ALE-HSA21 (AnaLysis of Expression on HSA21), accessible at http://bioinfo.na.iac.cnr.it/ALE-HSA21. This comprehensive and user-friendly web resource integrates, for all coding and noncoding transcripts of chromosome 21, existing gene annotations and transcripts identified de novo through RNA-Seq analysis with predictive computational analysis of regulatory sequences. Given the role of noncoding RNAs and untranslated regions of coding genes in key regulatory mechanisms, ALE-HSA21 is also an interesting web-based platform to investigate such processes. The 'transcript-centric' and easily accessible nature of ALE-HSA21 makes this resource a valuable tool to rapidly retrieve data at the isoform level, rather than at the gene level, useful to investigate any disease, molecular pathway or cell process involving chromosome 21 genes.

CNR IRIS

2014 Journal article metadata only access

RNASeqGUI: a GUI for analysing RNA-Seq data

Russo Francesco ; Angelini Claudia

Summary: We present the RNASeqGUI R package, a graphical user interface (GUI) for the identification of differentially expressed genes across multiple biological conditions. This R package includes some well-known RNA-Seq tools, available at www.bioconductor.org. The RNASeqGUI package is not just a collection of known methods and functions; it is designed to guide the user during the entire analysis process. RNASeqGUI is mainly addressed to users who have little experience with command-line software: thanks to its simple graphical interface, they can conduct analyses analogous to those run from the command line. Moreover, RNASeqGUI is also helpful for expert R users because it drastically speeds up the use of the included RNA-Seq methods.

CNR IRIS

2014 Conference proceedings contribution metadata only access

Network-based survival analysis methods for pathway detection in cancer.

Antonella Iuliano ; Annalisa Occhipinti ; Haouming ; Claudia Angelini ; Italia De Feis ; Pietro Liò

CNR IRIS

2014 Abstract in journal metadata only access

Analysing RNA-Seq data with RNASeqGUI

Claudia Angelini ; Francesco Russo

RNA-SEQ GUI R programming Reproducible Research

CNR IRIS

2014 Journal article metadata only access

Understanding gene regulatory mechanisms by integrating ChIP-seq and RNA-seq data: statistical solutions to biological problems.

Angelini Claudia ; Costa Valerio

The availability of omic data produced from international consortia, as well as from worldwide laboratories, is offering the possibility both to answer long-standing questions in biomedicine/molecular biology and to formulate novel hypotheses to test. However, the impact of such data is not fully exploited due to a limited availability of multi-omic data integration tools and methods. In this paper, we discuss the interplay between gene expression and epigenetic markers/transcription factors. We show how integrating ChIP-seq and RNA-seq data can help to elucidate gene regulatory mechanisms. In particular, we discuss the two following questions: (i) Can transcription factor occupancies or histone modification data predict gene expression? (ii) Can ChIP-seq and RNA-seq data be used to infer gene regulatory networks? We propose potential directions for statistical data integration. We discuss the importance of incorporating underestimated aspects (such as alternative splicing and long-range chromatin interactions). We also highlight the lack of data benchmarks and the need to develop tools for data integration from a statistical viewpoint, designed in the spirit of reproducible research.

CNR IRIS

2014 Abstract in conference proceedings metadata only access

Pathways identification in cancer survival analysis by network-based Cox models

A Iuliano ; A Occhipinti ; C Angelini ; I De Feis ; P Liò

Gene expression data from high-throughput assays, such as microarray, are often used to predict cancer survival. However, available datasets consist of a small number of samples (n patients) and a large number of gene expression data (p predictors). Therefore, the main challenge is to cope with the high-dimensionality. Moreover, genes are co-regulated and their expression levels are expected to be highly correlated. In order to face these two issues, network based approaches have been proposed. In our analysis, we compare four network penalized Cox models for high-dimensional survival data aimed to determine pathway structures and biomarkers involved in cancer progression. Using these network-based models, it is possible to obtain a deeper understanding of the gene-regulatory networks and investigate the gene signatures related to the cancer survival time. We evaluate cancer survival prediction to illustrate the benefits and drawbacks of the network techniques and to understand how patient features (i.e. age, gender and coexisting diseases-comorbidity) can influence cancer treatment, detection and outcome. In particular, we show results obtained in simulation and real cancer datasets using the Functional Linkage network, as network prior information.

cox regression high dimensional penalization

CNR IRIS
