Sini Junttila
BIOCOMP, Bioinformatics & Scientific Computing, VBCF - Vienna Biocenter Core Facilities GmbH, Austria
Current next-generation sequencing (NGS) methods enable the sequencing of large numbers of samples from diverse sources. A comprehensive data analysis workflow is essential for efficient primary analysis of large data sets. It also facilitates downstream analysis, in which, for example, potential lead molecules for drug discovery can be identified.
We have implemented a streamlined RNA-sequencing data analysis workflow for differential expression studies. It includes data preprocessing, quality control, statistical testing for differential expression, and functional enrichment analysis. By detecting differentially expressed genes that are shared across various preclinical samples, genes that contribute to the studied condition can be identified. The functional enrichment analysis assigns functional annotations to the identified differentially expressed genes using various public annotation databases, and it can also give clues as to the genes’ biological mechanisms of action. Identifying genes whose expression levels are linked to the studied condition can help pinpoint potential lead molecules suitable for further downstream analyses. It is also important that all tools and methods used in the analysis are easily traceable and that different data sets are analyzed in the same way whenever their results are compared to one another.
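The detection of shared differentially expressed genes described above can be illustrated with a minimal sketch. It is not the workflow's actual implementation: the function name `shared_de_genes`, the input layout (gene mapped to a log2 fold change and an adjusted p-value), the thresholds, and the example gene and sample names are all assumptions made for illustration.

```python
def shared_de_genes(results, padj_cutoff=0.05, min_abs_log2fc=1.0):
    """Return the genes called differentially expressed in every comparison.

    `results` maps a comparison name to a dict of
    gene -> (log2 fold change, adjusted p-value).
    The cutoffs are illustrative defaults, not values from the abstract.
    """
    significant_sets = []
    for table in results.values():
        # Keep genes that pass both the significance and effect-size cutoffs.
        significant = {
            gene
            for gene, (log2fc, padj) in table.items()
            if padj < padj_cutoff and abs(log2fc) >= min_abs_log2fc
        }
        significant_sets.append(significant)

    if not significant_sets:
        return set()
    # Genes present in every per-comparison set are the shared ones.
    return set.intersection(*significant_sets)


# Hypothetical test results for two preclinical samples.
example_results = {
    "model_A": {"GENE1": (2.1, 0.001), "GENE2": (-1.5, 0.02), "GENE3": (0.3, 0.6)},
    "model_B": {"GENE1": (1.8, 0.004), "GENE2": (-0.4, 0.30), "GENE3": (1.2, 0.01)},
}

shared = shared_de_genes(example_results)
print(sorted(shared))  # prints ['GENE1']
```

Applying the same cutoffs to every comparison reflects the requirement that data sets be analyzed in the same way before their results are compared.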