Research Projects

Early onset preeclampsia in a model for human placental trophoblast

We describe a model for early onset preeclampsia (EOPE) that uses induced pluripotent stem cells (iPSC) generated from umbilical cords of EOPE and control (CTL) pregnancies. These were converted to trophoblast (TB) and tested for parameters thought to be disturbed in EOPE, including invasive potential. Under 5 % O2, CTL-TB and EOPE-TB lines showed no differences, but under 20 % O2, invasiveness of EOPE-TB was reduced (P = 0.008). RNAseq analysis revealed only two differentially expressed genes (RPS17, FDR = 0.0005; MTRNR2L2, FDR = 0.005), both significantly down-regulated in EOPE-TB under 20 % O2. A weighted correlation network analysis revealed two gene modules in CTL-TB significantly correlated with both TB invasion and O2 responsiveness (modules CTL4 and CTL9). Of these two, CTL9 was positively correlated with 20 % O2 and negatively correlated with TB invasion, and was enriched in ontology terms related to “cell migration”, “angiogenesis”, and “pre-eclampsia”. Two EOPE-TB modules, EOPE1 and EOPE2, correlated with O2 conditions but only weakly with invasion; they largely contain the same sets of genes present in CTL modules CTL4 and CTL9. Our experiments suggest that in EOPE, the initial step precipitating disease is a reduced capacity of placental TB to invade, caused by a dysregulation of O2 response mechanisms.

TissueEnrich: A tool to calculate tissue-specific gene enrichment

Analysis of RNA-Sequencing data results in lists of genes that may have similar function, based on differential gene expression analysis or co-expression network analysis. Multiple tools have been developed that use gene ontologies to identify biological processes that are enriched in these gene sets. While these tools provide insights into the biological processes, they give no information about the tissue specificity of the genes, which is important when studying human disease. Therefore, we developed TissueEnrich, a tool that calculates tissue-specific gene enrichment in an input gene set. We demonstrated that TissueEnrich is robust in identifying the lineage of single-cell clusters and differentiated embryonic stem cells. TissueEnrich is available as a user-friendly, interactive web application as well as an R package that allows additional flexibility in usage.
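The idea of tissue-specific enrichment can be illustrated with a small sketch. It assumes a one-sided hypergeometric test on the overlap between the input gene set and each tissue's list of tissue-specific genes; the description above does not state which test TissueEnrich uses, so the test, the function names, and the toy gene lists below are illustrative assumptions rather than the package's API.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) when n genes are drawn from N, of which K are tissue-specific."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def tissue_enrichment(input_genes, tissue_specific, background):
    """Return {tissue: (overlap, p_value)} for an input gene set.

    tissue_specific maps a tissue name to its tissue-specific genes
    (hypothetical data; real lists would come from an expression atlas).
    """
    background = set(background)
    input_genes = set(input_genes) & background
    results = {}
    for tissue, genes in tissue_specific.items():
        genes = set(genes) & background
        overlap = len(input_genes & genes)
        p_value = hypergeom_sf(overlap, len(background), len(genes), len(input_genes))
        results[tissue] = (overlap, p_value)
    return results


# Toy example with made-up gene identifiers.
background = [f"g{i}" for i in range(20)]
tissue_specific = {
    "placenta": ["g0", "g1", "g2", "g3", "g4"],
    "liver": ["g5", "g6", "g7", "g8", "g9"],
}
for tissue, (overlap, p) in tissue_enrichment(
    ["g0", "g1", "g2", "g3", "g10"], tissue_specific, background
).items():
    print(tissue, overlap, p)
```

In practice the p-values would also be corrected for multiple testing across tissues.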

Deciphering transcriptional regulation in human embryonic stem cells specified towards a trophoblast fate

Differentiated human embryonic stem cells (hESC) continue to provide a model for studying early trophoblast cells (TB), but many questions have been raised regarding their true identity. Therefore, we carried out a global and unbiased analysis of previously published transcriptomic profiles for hESC differentiated to TB by means of bone morphogenetic protein-4 and inhibitors of activin A and fibroblast growth factor-2 signaling (BAP treatment). Our results confirm that BAP-treated hESC (ESCd) lack a mesoderm signature and are a subtype of placental cells unlike those present at term. ESCd display a high level of expression of genes implicated in migration and invasion compared to commonly used, immortalized TB cell lines and primary cells from term placenta. Co-expression network analysis also identified gene modules involved in cell migration and adhesion, processes that are likely critical during the beginning stages of placentation. Finally, protein-protein interaction analysis predicted several additional genes that may play important roles in early stages of placental development. Together, our analyses provide novel insights into the transcriptional programs that are active in ESCd.

Placenta-SeqDB: Curation and annotation of publicly available placenta sequencing datasets

In recent years, placental development research has attracted growing attention from the research community. Improvements in NGS techniques have led to an increase in the number of sequencing samples submitted to public databases. GEO is one of the largest repositories for publicly available experimental datasets. However, it has no strict guidelines for submitting dataset metadata. As a result, the metadata annotations contain many discrepancies, including in cell/tissue type, experiment type, and gestational age, which makes it very difficult to search for specific datasets. ArrayExpress is another such repository; it has a good search engine but lacks well-annotated placenta-specific terms and features. Because of these issues, searching and retrieving data from these databases poses a great challenge for public data reuse. A few other databases have good annotations but do not contain placental datasets. To address this problem, we built a data repository by combining manually curated sequencing datasets, including RNA-Seq, ChIP-Seq, and DNase-Seq, from different data repositories including GEO, ArrayExpress, ENA, and CistromeMap. The datasets are linked to dataset-specific genes and diseases through manual curation from research papers. We also developed a new controlled vocabulary based on the curated sample metadata, including cell/tissue type, sequencing platform, associated genes, disease, and developmental time points. This resource will help researchers search placenta datasets quickly and efficiently.
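To make the role of a controlled vocabulary concrete, here is a minimal sketch of mapping inconsistent free-text metadata labels onto canonical terms. The vocabulary entries and the function name are hypothetical examples and are not taken from Placenta-SeqDB itself.

```python
# Hypothetical controlled vocabulary: canonical term -> known free-text variants.
CONTROLLED_VOCABULARY = {
    "RNA-Seq": {"rna-seq", "rnaseq", "rna seq", "mrna-seq"},
    "ChIP-Seq": {"chip-seq", "chipseq", "chip seq"},
    "DNase-Seq": {"dnase-seq", "dnaseseq", "dnase seq"},
}


def normalize_term(raw, vocabulary=CONTROLLED_VOCABULARY):
    """Map a free-text experiment label to its canonical term, or None if unknown."""
    # Lowercase and collapse whitespace so trivial variations compare equal.
    key = " ".join(raw.strip().lower().split())
    for canonical, variants in vocabulary.items():
        if key == canonical.lower() or key in variants:
            return canonical
    return None


for label in ["  RNAseq ", "DNAse-Seq", "microarray"]:
    print(repr(label), "->", normalize_term(label))
```

Labels that return None would be flagged for manual curation rather than guessed.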

SPINNER

SPINNER (Seeded Protein Interaction Network Neighborhood Expansion and Ranking Algorithm) is an automated tool that expands a set of initial seed proteins related to a particular disease through network neighborhood expansion and creates a protein interaction network for protein network analysis. It expands the network using protein-protein interaction (PPI) data from databases such as STRING and HAPPI. It uses a heuristic scoring function to calculate the initial rank score and a modified PageRank algorithm to calculate the iterative rank score of the proteins in the network. The tool also calculates the effect on the network of removing a protein from the sub-network, which is called protein perturbation. Protein networks contain certain hub nodes that play a key role in the activation of the disease. The perturbation feature helps to identify those key hubs, which in turn helps in identifying potential drug targets. Using this tool, I carried out analyses of amyotrophic lateral sclerosis (ALS) and breast cancer.
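The three steps described above (seed expansion, iterative ranking, and perturbation) can be sketched as follows. This is not SPINNER's implementation: it assumes a one-step expansion over confidence-filtered edges, a standard seed-personalized PageRank in place of the tool's heuristic initial score and modified PageRank, and a simple sum of rank changes as the perturbation effect. All function names, the 0.7 confidence cutoff, and the toy edges are illustrative assumptions.

```python
def expand_seeds(edges, seeds, min_score=0.7):
    """One-step expansion: keep confident PPI edges that touch a seed protein."""
    graph = {s: {} for s in seeds}
    for a, b, score in edges:
        if score >= min_score and (a in seeds or b in seeds):
            graph.setdefault(a, {})[b] = score
            graph.setdefault(b, {})[a] = score
    return graph


def rank_proteins(graph, seeds, damping=0.85, iterations=100):
    """Seed-personalized PageRank over a weighted, undirected graph."""
    nodes = sorted(graph)
    seed_nodes = [n for n in nodes if n in seeds] or nodes  # fall back to uniform
    restart = {n: (1.0 / len(seed_nodes) if n in seed_nodes else 0.0) for n in nodes}
    rank = dict(restart)
    for _ in range(iterations):
        new = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            total = sum(graph[n].values())
            if total == 0:
                # Isolated node: send its rank back to the restart distribution.
                for m in nodes:
                    new[m] += damping * rank[n] * restart[m]
            else:
                for m, weight in graph[n].items():
                    new[m] += damping * rank[n] * weight / total
        rank = new
    return rank


def perturbation_effect(graph, seeds, protein):
    """Total absolute rank change among remaining proteins after removing one."""
    baseline = rank_proteins(graph, seeds)
    reduced = {
        n: {m: w for m, w in nbrs.items() if m != protein}
        for n, nbrs in graph.items()
        if n != protein
    }
    perturbed = rank_proteins(reduced, set(seeds) - {protein})
    return sum(abs(baseline[n] - perturbed[n]) for n in perturbed)


# Toy PPI edges (protein_a, protein_b, confidence); identifiers are made up.
edges = [("A", "B", 0.9), ("A", "C", 0.8), ("C", "D", 0.9), ("A", "E", 0.4)]
network = expand_seeds(edges, {"A"})
print(rank_proteins(network, {"A"}))
print(perturbation_effect(network, {"A"}, "B"))
```

Proteins whose removal produces a large rank shift are the hub-like candidates that the perturbation analysis is meant to surface.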

GEMINE (Gene Expression Mutation Interaction Neighborhood Exploration)

We developed a database that stores data on the significance of genes in cancers based on their gene expression and mutations. The raw data were taken from COSMIC and TCGA, and the differentially expressed genes were filtered and prioritized on the basis of their SPINNER rank. We also calculate the similarity between genes on the basis of gene expression data from TCGA and mutation data from COSMIC. In addition, we developed a web application, built with PHP and JavaScript, for browsing the database.

Smart Health

Smart Health is an Electronic Medical Record (EMR) project developed in collaboration with Wenzhou Medical Centre, China. The project involves efficient storage of health examination data provided by Wenzhou Medical Centre. We designed and developed the data model and stored the health examination data in an Oracle database. As part of this process, we developed automated tools for preprocessing and normalizing the data. We also developed a scheme for scoring individuals on the basis of their test results. The scoring scheme can be used to develop models that predict an individual's potential disease risk from their current health condition.