Introduction to CGIDLA
CGIDLA is an online web service for CpG island related density analysis and LAUPs analysis. It hosts LAUPs data and provides CpG island related analysis functions. CGIDLA offers analysis services to investigate the relationship among CpG island density, TATA-box features, and the expression breadth of human genes. It also stores data for 32 representative species, including bacteria, humans, animals, and plants, and offers LAUPs analysis services for this dataset. In addition, CGIDLA provides its source code for download, together with related K-mer counting functions.
Citations:
[1] Zhang L, Dai Z, Yu J, Xiao M *. CpG-island-based annotation and analysis of human housekeeping genes. Briefings in Bioinformatics, 2021, 22(1): 515-525. DOI: 10.1093/bib/bbz134
[2] Xiao M, Yang X, Yu J, Zhang L *. CGIDLA: Developing the Web Server for CpG Island Related Density and LAUPs (Lineage-Associated Underrepresented Permutations) Study. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2019, 17(6): 2148-2154. DOI: 10.1109/TCBB.2019.2935971
How to annotate CpG island density?
We annotate all human CpG islands into three categories: HCGI (high-density CpG island), ICGI (intermediate-density CpG island), and LCGI (low-density CpG island) [1]. We compute the observed/expected CpG ratio (O/E) of each CpG island by Eq. 1.1 [2], and then annotate the density of each CpG island by Eq. 1.2.
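Eq. 1.1 and Eq. 1.2 are not reproduced on this page. The sketch below assumes Eq. 1.1 is the standard CpG observed/expected ratio, O/E = (number of CpG dinucleotides × sequence length) / (number of C × number of G). Because the Eq. 1.2 cut-offs are not given here, the hypothetical parameters low_cut and high_cut stand in for them; they are not the values used by CGIDLA.

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio, assuming the standard definition:
    O/E = (count of "CG" * length) / (count of C * count of G)."""
    s = seq.upper()
    n_c = s.count("C")
    n_g = s.count("G")
    # "CG" cannot overlap itself, so str.count gives the exact dinucleotide count.
    n_cg = s.count("CG")
    if n_c == 0 or n_g == 0:
        return 0.0
    return n_cg * len(s) / (n_c * n_g)


def annotate_density(oe: float, low_cut: float, high_cut: float) -> str:
    """Map an O/E value to HCGI / ICGI / LCGI.

    low_cut and high_cut are hypothetical placeholders for the Eq. 1.2
    thresholds, which are not reproduced here.
    """
    if low_cut > high_cut:
        raise ValueError("low_cut must not exceed high_cut")
    if oe >= high_cut:
        return "HCGI"
    if oe >= low_cut:
        return "ICGI"
    return "LCGI"
```

For example, cpg_obs_exp("CCGG") is 1 × 4 / (2 × 2) = 1.0; whether that counts as high density depends entirely on the real Eq. 1.2 thresholds.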
How to annotate CpG+/- genes?
Following previous studies [3, 4], we annotate a protein-coding gene as a CpG+ gene if at least one of its transcription start sites (TSSs) is located in a CpG island; otherwise, the gene is annotated as CpG-.
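The CpG+/CpG- rule reduces to a point-in-interval test for each TSS. The following is a minimal sketch, assuming 0-based, half-open island coordinates [start, end) and non-overlapping islands within each chromosome; the function names and input layout are hypothetical and not part of CGIDLA.

```python
import bisect
from collections import defaultdict
from typing import Dict, List, Tuple

Island = Tuple[str, int, int]  # (chromosome, start, end), half-open [start, end)
Tss = Tuple[str, int]          # (chromosome, position), 0-based


def build_index(islands: List[Island]) -> Dict[str, Tuple[List[int], List[int]]]:
    """Group islands by chromosome and sort them by start coordinate."""
    grouped = defaultdict(list)
    for chrom, start, end in islands:
        grouped[chrom].append((start, end))
    index = {}
    for chrom, ivs in grouped.items():
        ivs.sort()
        index[chrom] = ([s for s, _ in ivs], [e for _, e in ivs])
    return index


def in_island(index, chrom: str, pos: int) -> bool:
    """True if pos falls inside an island (assumes islands do not overlap)."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    # The only candidate is the island with the largest start <= pos.
    i = bisect.bisect_right(starts, pos) - 1
    return i >= 0 and pos < ends[i]


def annotate_cpg(genes: Dict[str, List[Tss]], islands: List[Island]) -> Dict[str, str]:
    """Label each gene CpG+ if any of its TSSs lies in a CpG island, else CpG-."""
    index = build_index(islands)
    return {
        gene: "CpG+" if any(in_island(index, c, p) for c, p in tss_list) else "CpG-"
        for gene, tss_list in genes.items()
    }
```

Binary search keeps each lookup logarithmic in the number of islands per chromosome, which matters when annotating all protein-coding genes at once.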
How to annotate TATA+/- genes?
Following previous studies [3, 4], we annotate a protein-coding gene as a TATA+ gene if at least one of its TSSs has a TATA box in the upstream region [-50, -10] relative to the TSS; otherwise, the gene is annotated as TATA-.
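How the TATA box is detected is not stated here. The sketch below assumes a simple scan for the consensus TATAWAW (W = A or T) that must lie entirely within positions -50 to -10, where position 0 is the TSS and position -1 is the base immediately upstream. The regular expression and function name are illustrative assumptions, not CGIDLA's actual detector.

```python
import re

# Assumed TATA-box consensus TATAWAW, where W is A or T (illustrative only).
TATA_RE = re.compile(r"TATA[AT]A[AT]")


def has_tata_box(seq: str, tss: int, start: int = -50, end: int = -10) -> bool:
    """Check the upstream window [start, end] (inclusive, relative to the TSS).

    seq must be oriented 5'->3' on the gene's own strand, and tss is the
    0-based index of the TSS in seq. A match must fit inside the window.
    """
    lo = tss + start
    hi = tss + end + 1  # convert the inclusive end to an exclusive slice bound
    if lo < 0:
        raise ValueError("sequence does not cover the upstream window")
    window = seq[lo:hi].upper()
    return TATA_RE.search(window) is not None


def is_tata_plus(seq: str, tss_positions) -> bool:
    """A gene is TATA+ if any of its TSSs has a TATA box in the window."""
    return any(has_tata_box(seq, t) for t in tss_positions)
```

A real pipeline would more likely use a position weight matrix or a curated promoter database; the consensus scan only shows how the [-50, -10] window is applied.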
How to compute tissue expression breadth?
We define the gene expression breadth (expBreadth) as the number of tissues in which a gene is expressed [3] (Eq. 2.1), and then compute the expression breadth of every human protein-coding gene.
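Eq. 2.1 is not reproduced here, and the page does not say what level counts as "expressed". The sketch below therefore takes a hypothetical threshold parameter and counts the tissues whose expression value exceeds it. The gene and tissue names in the usage are toy placeholders.

```python
from typing import Dict, Mapping


def expression_breadth(expr: Mapping[str, float], threshold: float) -> int:
    """Number of tissues whose expression value exceeds `threshold`.

    `threshold` is a hypothetical stand-in for the expression criterion
    of Eq. 2.1, which is not given on this page.
    """
    return sum(1 for value in expr.values() if value > threshold)


def breadth_table(
    matrix: Mapping[str, Mapping[str, float]], threshold: float
) -> Dict[str, int]:
    """Compute expBreadth for every gene in a gene -> {tissue: value} matrix."""
    return {gene: expression_breadth(tissues, threshold) for gene, tissues in matrix.items()}


# Toy values, not real expression data.
toy = {
    "geneA": {"liver": 120.0, "brain": 95.0, "heart": 80.0},
    "geneB": {"liver": 300.0, "brain": 0.0, "heart": 0.2},
}
print(breadth_table(toy, threshold=1.0))
```

A broadly expressed gene such as geneA reaches the maximum breadth, while a tissue-specific gene such as geneB scores close to 1.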
What are LAUPs?
We empirically define sequence permutations that do not occur in any well-known public database as lineage-associated underrepresented permutations (LAUPs).
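In the simplest reading of this definition, LAUPs of length k are the 4^k possible nucleotide permutations that never appear in a given sequence collection. The sketch below assumes a small in-memory list of sequences standing in for "well-known public databases", counts only the forward strand, and skips windows containing N or other non-ACGT symbols. The function name is hypothetical.

```python
from itertools import product
from typing import Iterable, Set


def absent_kmers(sequences: Iterable[str], k: int) -> Set[str]:
    """Return every length-k permutation of A/C/G/T absent from all sequences."""
    all_kmers = {"".join(p) for p in product("ACGT", repeat=k)}
    seen = set()
    for seq in sequences:
        s = seq.upper()
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            # Only record windows made purely of A/C/G/T (skips N, gaps, etc.).
            if kmer in all_kmers:
                seen.add(kmer)
    return all_kmers - seen
```

Enumerating all 4^k permutations is only practical for small k; genome-scale analyses instead rely on a dedicated k-mer counter.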
How to compute LAUPs?
The LAUPs counting procedure follows the workflow described in [7].
[7] Zhang L, Xiao M, Zhou J, et al. Lineage-associated underrepresented permutations (LAUPs) of mammalian genomic sequences based on a Jellyfish-based LAUPs analysis application (JBLA). Bioinformatics, 2018, 34(21): 3624-3630.
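The workflow itself is not reproduced on this page. Assuming it follows the pattern suggested by the name JBLA (count k-mers with a counter such as Jellyfish, then report the permutations with zero count), the toy sketch below uses collections.Counter in place of Jellyfish and raises k until some permutation is missing. The function names and the k_max cap are hypothetical, and only the forward strand is counted.

```python
from collections import Counter
from itertools import product
from typing import List, Optional, Tuple

NUCLEOTIDES = set("ACGT")


def count_kmers(sequences: List[str], k: int) -> Counter:
    """Count forward-strand k-mers made purely of A/C/G/T (Jellyfish stand-in)."""
    counts: Counter = Counter()
    for seq in sequences:
        s = seq.upper()
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            if set(kmer) <= NUCLEOTIDES:
                counts[kmer] += 1
    return counts


def shortest_absent(sequences: List[str], k_max: int) -> Optional[Tuple[int, List[str]]]:
    """Return (k, sorted zero-count permutations) for the smallest k that has any.

    Returns None if every permutation is present for all k up to k_max.
    """
    for k in range(1, k_max + 1):
        counts = count_kmers(sequences, k)
        # Counter returns 0 for unseen keys without inserting them.
        missing = sorted(
            "".join(p) for p in product("ACGT", repeat=k) if counts["".join(p)] == 0
        )
        if missing:
            return k, missing
    return None
```

Separating counting from reporting mirrors a count-then-query design: the expensive pass over the sequences produces a count table, and the zero-count permutations are then read off it for each k.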
CGIDLA provides several tools for LAUPs calculation and analysis; see the left navigation panel for details.
For a complete user guide, please see the Guide document.