Bioinformatics and Computational Genomics
Our research focuses on regulatory molecules and their interactions: regulatory proteins and their DNA/RNA target sites, small silencing RNAs and their RNA targets, and protein-protein interactions. Our lab has three main projects:
Gene Regulation
We aim to develop computational methods for understanding the molecular mechanisms of gene regulation, in particular novel ways to discover transcription factor binding sites in genomic DNA. Because the sequences of these sites have low information content, and therefore match many genomic positions by chance, we pursue multiple approaches: better characterizing transcriptional start sites and alternative proximal promoters, detecting clusters of transcription factor binding sites with probabilistic models, and identifying co-regulated genes and exploiting the enrichment of sequence motifs in their promoters. We take an integrative approach that draws on extensive high-throughput genomic and epigenomic data, such as chromatin immunoprecipitation of transcription factors, nucleosome positioning, histone modifications, DNA methylation, and DNA replication.
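To make "low information content" concrete, here is a minimal sketch of a position weight matrix built from a handful of aligned sites, its information content in bits, and a scan of a sequence for the best-scoring window. The sites, the uniform background, the pseudocount, and all function names are assumptions made for illustration; this is a generic textbook construction, not the lab's own method.

```python
import math

BASES = "ACGT"
# Hypothetical aligned binding sites (toy data, not from a real factor).
SITES = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGACTAA", "TGAGTCA", "TTACTCA"]

def count_matrix(sites):
    """Count each base at each position of equal-length aligned sites."""
    return [{b: sum(site[i] == b for site in sites) for b in BASES}
            for i in range(len(sites[0]))]

def information_content(counts):
    """Total information content in bits against a uniform background."""
    bits = 0.0
    for column in counts:
        n = sum(column.values())
        for b in BASES:
            p = column[b] / n
            if p > 0:
                bits += p * math.log2(p / 0.25)
    return bits

def log_odds_matrix(counts, pseudocount=0.5):
    """Log2-odds score per base and position; the pseudocount avoids log(0)."""
    pwm = []
    for column in counts:
        n = sum(column.values()) + 4 * pseudocount
        pwm.append({b: math.log2(((column[b] + pseudocount) / n) / 0.25)
                    for b in BASES})
    return pwm

def scan(sequence, pwm):
    """Score every window of the sequence; return (start, score) pairs."""
    width = len(pwm)
    return [(start, sum(pwm[i][sequence[start + i]] for i in range(width)))
            for start in range(len(sequence) - width + 1)]

counts = count_matrix(SITES)
pwm = log_odds_matrix(counts)
hits = scan("CCCTGACTCACCC", pwm)
best_start, best_score = max(hits, key=lambda hit: hit[1])
print(f"information content: {information_content(counts):.2f} bits")
print(f"best window starts at {best_start} (score {best_score:.2f})")
```

A seven-position motif can carry at most 14 bits under a uniform background, which is why a sequence match alone is weak evidence and additional signals such as site clustering or co-regulation are useful.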
Protein Docking
We develop methods to compute binding affinities between protein molecules. Combining these with a fast Fourier transform (FFT)-based search algorithm, we develop computational methods for predicting the structures of protein complexes. We take a multi-stage approach: an initial-stage algorithm, ZDOCK, performs an exhaustive search of the translational and rotational space, and subsequent refinement algorithms such as ZRANK refine and rerank the resulting structures. We participate in CAPRI, the community-wide blind test of protein docking algorithms.
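The idea behind an FFT-based translational search can be sketched in one dimension: the score of every shift of a ligand against a receptor is a correlation, and the correlation theorem lets all shifts be scored together in the frequency domain. The grid values below (a reward for surface contact, a penalty for core overlap) and all names are hypothetical and are not ZDOCK's actual scoring function; real docking uses three-dimensional grids and repeats the search for each sampled rotation.

```python
import cmath

def fft(values, inverse=False):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def fft_correlation(receptor, ligand):
    """score[s] = sum_x receptor[(x + s) % n] * ligand[x], computed via FFT."""
    n = len(receptor)
    fr, fl = fft(receptor), fft(ligand)
    product = [a * b.conjugate() for a, b in zip(fr, fl)]
    # The inverse transform above is unnormalized, so divide by n here.
    return [v.real / n for v in fft(product, inverse=True)]

def direct_correlation(receptor, ligand):
    """The same scores by brute force, for checking the FFT result."""
    n = len(receptor)
    return [sum(receptor[(x + s) % n] * ligand[x] for x in range(n))
            for s in range(n)]

# Toy 1D "grids" of 16 cells (hypothetical values).
receptor = [0.0] * 16
for cell in range(4, 8):
    receptor[cell] = -9.0   # receptor core: overlap is penalized
for cell in (3, 8, 9):
    receptor[cell] = 1.0    # receptor surface: contact is rewarded
ligand = [0.0] * 16
ligand[0] = ligand[1] = 1.0  # ligand occupies two adjacent cells

scores = fft_correlation(receptor, ligand)
best_shift = max(range(len(scores)), key=lambda s: scores[s])
print(f"best shift: {best_shift}, score: {scores[best_shift]:.1f}")
```

For a grid of N cells the brute-force version costs on the order of N² operations while the FFT version costs on the order of N log N, which is what makes an exhaustive translational search practical on large grids.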
Small Silencing RNAs
We develop computational methods to understand the biogenesis and regulatory mechanisms of small silencing RNAs (microRNAs or miRNAs, small interfering RNAs or siRNAs, and PIWI-interacting RNAs or piRNAs). We build computational pipelines to analyze high-throughput sequencing data of small silencing RNAs. We map tens of millions of sequence reads to the genome; quantify their length and nucleotide properties, genomic localization, relative abundance in different cell types and/or genotypes, and evolutionary conservation; and look for any other features that can shed light on the biogenesis and target recognition of small silencing RNAs.
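As an illustration of the quantification step, the following minimal sketch summarizes a set of adapter-trimmed reads by length distribution, 5' nucleotide composition, and relative abundance in reads per million. The function name, the 18 to 32 nt size window, and the toy reads are assumptions for illustration only; genome mapping, which a real pipeline would delegate to a dedicated aligner, is not shown.

```python
from collections import Counter

def summarize_reads(reads, min_len=18, max_len=32):
    """Summarize trimmed small RNA reads (DNA alphabet, so T stands for U)."""
    cleaned = (read.strip().upper() for read in reads)
    kept = [read for read in cleaned if min_len <= len(read) <= max_len]
    total = len(kept)
    species = Counter(kept)  # identical sequences collapsed into species
    return {
        "total": total,
        "length_counts": Counter(len(read) for read in kept),
        "first_nt_counts": Counter(read[0] for read in kept),
        # Reads per million kept reads, so libraries of different depth
        # can be compared.
        "rpm": {seq: count * 1_000_000 / total
                for seq, count in species.items()},
    }

# Hypothetical toy reads; the 4-nt read falls outside the size window.
toy_reads = ["T" + "A" * 21] * 3 + ["ACGT"] + ["G" * 25]
summary = summarize_reads(toy_reads)
for length, count in sorted(summary["length_counts"].items()):
    print(f"{length} nt: {count} reads")
```

Length and 5' nucleotide profiles are useful first-pass diagnostics because the small RNA classes differ in typical size, and a 5' uridine bias is often reported for piRNAs.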