Software

AAOCA: Webtool for Anomalous Aortic Origin of a Coronary Artery Reporter

A web tool that generates a descriptor for Anomalous Aortic Origin of a Coronary Artery (AAOCA) based on the standardized nomenclature proposed by the International Coronary Artery Anomalies Collaborative group.

scDown: A Pipeline for Single-Cell RNA-Seq Downstream Analysis

Single-cell transcriptomics data are analyzed using two popular tools, Seurat and Scanpy. Multiple separate tools are used downstream of Seurat and Scanpy cell annotation to study cell differentiation and communication, including cell proportion difference analysis between conditions, pseudotime and trajectory analyses to study cell transition, and cell–cell communication analysis. To automate the integrative cell differentiation and communication analyses of single-cell RNA-seq data, we developed a single-cell RNA-seq downstream analysis pipeline called “scDown”. This R package includes cell proportion difference analysis, cell–cell communication analysis, pseudotime analysis, and RNA velocity analysis. Both Seurat and Scanpy annotated single-cell RNA-seq data are accepted in this pipeline. We applied scDown to a published dataset and identified a unique, previously undiscovered signature of neuronal inflammatory signaling associated with a rare genetic neurodevelopmental disorder. These findings were not identified with a simple implementation of Seurat differential gene expression analysis, illustrating the value of our pipeline in biological discovery. scDown can be broadly utilized in downstream analyses of scRNA-seq data, particularly in rare diseases.

MAFDash: An easy-to-use dashboard builder for mutation data

Characterizing the somatic mutation landscape of a cohort of patients has become a routine task in cancer research in recent years. Such studies are often highly interdisciplinary, requiring iterative analysis that must be evaluated at each step by many researchers. Therefore, there is a growing need for reporting tools that can easily generate interactive reports for sharing data and results with collaborators. Here we present an R package, MAFDash, that tries to simplify summarization and visualization of mutation data from Mutation Annotation Format (MAF) files. The output HTML dashboard is a self-contained report that can be used for downstream analysis and sharing results.

PlacentaCellEnrich: A tool to characterize gene sets using placenta cell-specific gene enrichment analysis

Single-cell RNA-Sequencing (scRNA-Seq) has improved our understanding of individual cell types in the human placenta. However, placental scRNA-Seq data has not been analyzed to identify genes with cell-specific gene expression patterns, which would be useful to understand how expression patterns in other model systems correspond to those in humans. Therefore, we developed PlacentaCellEnrich, a tool that takes a gene set as input, and then reports if the input set is enriched for genes with placenta cell-specific expression patterns, based on human placenta scRNA-Seq data.

TissueEnrich: A tool to calculate tissue-specific gene enrichment

Analysis of RNA-Sequencing data results in lists of genes that may have similar function, based on differential gene expression analysis or co-expression network analysis. Multiple tools have been developed that use gene ontologies to identify biological processes that are enriched in the gene sets. While these tools provide insights into the biological processes, they give no information about the tissue specificity of the genes, which is important when studying human disease. Therefore, we developed TissueEnrich, a tool that calculates tissue-specific gene enrichment in an input gene set. We demonstrated that TissueEnrich is very robust in identifying the lineage of single-cell clusters and differentiated embryonic stem cells. TissueEnrich is available as a user-friendly, interactive web application as well as an R package (Bioconductor), allowing additional flexibility in usage.
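To make the idea of enrichment in an input gene set concrete, here is a minimal Python sketch. The description above does not say which statistical test TissueEnrich uses, so the one-sided hypergeometric test below is an assumption chosen because it is a common approach to gene set enrichment. The function names and the toy gene identifiers (`G1` to `G10`) are hypothetical and are not part of the TissueEnrich package.

```python
from math import comb


def hypergeom_upper_tail(population, successes, draws, observed):
    """Return P(X >= observed) for a hypergeometric distribution.

    population: number of background genes
    successes:  number of background genes that are tissue-specific
    draws:      number of input genes (restricted to the background)
    observed:   number of input genes that are tissue-specific
    """
    max_overlap = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(observed, max_overlap + 1)
    )
    return tail / total


def tissue_enrichment(input_genes, tissue_specific_genes, background_genes):
    """Return (overlap size, p-value) for one tissue (assumed test, see lead-in)."""
    background = set(background_genes)
    tissue = set(tissue_specific_genes) & background
    query = set(input_genes) & background
    overlap = len(query & tissue)
    p_value = hypergeom_upper_tail(len(background), len(tissue), len(query), overlap)
    return overlap, p_value


# Hypothetical toy data: 10 background genes, 5 of them tissue-specific.
background = [f"G{i}" for i in range(1, 11)]
tissue_genes = ["G1", "G2", "G3", "G4", "G5"]
query_genes = ["G1", "G2", "G3", "G4", "G5"]

overlap, p_value = tissue_enrichment(query_genes, tissue_genes, background)
print(overlap, p_value)
```

In practice the test would be repeated for every tissue and the resulting p-values corrected for multiple testing; that step is left out of this sketch.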

Shiny application for IPL T20 cricket data analysis

I developed a Shiny application to visualize and analyze IPL T20 cricket data as part of my STAT585 class. I used IPL data from 2008 to 2016. The raw data is available at the Cricsheet website and consists of ball-by-ball details for every match, one file per match (577 files). The processed data was downloaded from Kaggle. We further processed the data and incorporated details about the venue locations for visualization. The resulting Shiny application lets users analyze the data by plotting and changing various parameters, including year, team, and player. More details about the project are on the GitHub page.

GBEER Analysis Pipeline

This is a semi-automated pipeline to run the GBEER tool. GBEER, which is being developed by the Friedberg lab, is used to quantify and visualize the evolutionary changes that occur in gene blocks.

SPINNER

SPINNER stands for Seeded Protein Interaction Network Neighborhood Expansion and Ranking Tool. It is an automated software tool that ranks and compares genes or proteins from constructed phenotype-specific biomolecular interaction networks. Given a user-supplied list of phenotype-specific genes, the tool automatically queries the STRING protein-protein interaction database to retrieve interactions among the input genes, with user-specified network expansion levels, and constructs a phenotype-specific network. Each sub-network is ranked and evaluated statistically to obtain a P-value for its index of aggregation before subsequent analysis. To compare the contribution of each protein, we consider its node degree of connectivity, the protein interaction quality of its surrounding interacting partners (including both directly and indirectly connected partners through iterations), the protein's significance in both the unfiltered global network and the phenotype-specific network, and other network characteristics. The tool also provides the gene/protein PubMed reference citation count for the specific phenotype to help users evaluate the ranked proteins. A family-wise adjusted P-value of all significant ranks against randomized topology-preserving networks is also provided to help assess the rank.
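The seeded expansion and ranking idea can be illustrated with a small Python sketch. It is not SPINNER's implementation: it works on a hypothetical in-memory edge list instead of querying STRING, and it ranks nodes by degree only, which is just one of the factors listed above. The function names and the toy node labels are assumptions made for illustration.

```python
from collections import defaultdict


def expand_network(edges, seeds, levels):
    """Grow a sub-network from seed nodes by `levels` hops.

    Returns the set of nodes reached and the set of edges among them.
    """
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    nodes = set(seeds)
    frontier = set(seeds)
    for _ in range(levels):
        # Neighbors of the current frontier that are not yet in the network.
        frontier = {nb for node in frontier for nb in adjacency[node]} - nodes
        nodes |= frontier

    sub_edges = {
        tuple(sorted((a, b))) for a, b in edges if a in nodes and b in nodes
    }
    return nodes, sub_edges


def rank_by_degree(sub_edges):
    """Rank nodes by degree in the sub-network (ties broken by name)."""
    degree = defaultdict(int)
    for a, b in sub_edges:
        degree[a] += 1
        degree[b] += 1
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy interaction list: a simple chain A-B-C-D-E.
toy_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]

nodes, sub_edges = expand_network(toy_edges, seeds=["A"], levels=2)
ranking = rank_by_degree(sub_edges)
print(sorted(nodes), ranking)
```

With zero expansion levels the network contains only interactions among the seeds themselves; each additional level pulls in one more ring of interaction partners.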

GEMINE (Gene Expression Mutation Interaction Neighborhood Exploration)

We developed a database that stores data on the significance of genes in cancers based on their gene expression and mutations. The raw data were taken from COSMIC and TCGA, and the differentially expressed genes were filtered and prioritized on the basis of their SPINNER rank. We also calculate the similarity between genes on the basis of gene expression data from TCGA and mutation data from COSMIC. In addition, we developed a web application, built with PHP and JavaScript, for browsing the database.

Vocab World

"VOCAB WORLD" is an English vocabulary Android application published on Google Play. The application is built on a database containing a large pool of words. It offers an Antonym and Synonym quiz consisting of 10 multiple-choice questions with a time limit of 2 minutes. The final score, based on the number of correct answers, is displayed and saved in the history. Users can also review their performance, with correct answers highlighted in green and wrong answers highlighted in red. The review section also shows each word's meaning.