A growing number of studies show that non-coding RNAs (ncRNAs) play important roles in gene regulation and nuclear organization, and many ncRNAs have been shown to interact with chromatin. One approach to understanding the function of these ncRNAs is to identify the genomic sites they interact with. Hybridization capture methods based on oligonucleotide probes have been used for years to study chromatin-associated RNA. This group of methods (RAP, CHART-seq, ChIRP-seq, ChOP-seq and CHIRT-seq [1-5]) finds the contacts of a single predetermined RNA. More recently, several groups have developed methods based on proximity ligation that investigate RNA–chromatin interactions at a genome-wide scale (GRID-seq, ChAR-seq, iMARGI, RADICL-seq and Red-C [6-10]). These methods are designed to capture all possible RNA-DNA contacts in a cell, and their result is a set of contacts between various RNAs and DNA loci. For several of these techniques the authors have provided protocols for handling raw reads, but there is no standard, unified protocol for processing all these types of data, which makes a comparative analysis of RNA–chromatin contacts difficult.

We developed NF-RNA-Chrom, a complete data processing pipeline that improves the reproducibility of analysis by standardizing the data processing steps originally implemented in RNA-Chrom DB [11]. It is flexible and universal, and it can be run without installing the required bioinformatic tools manually, by using Docker images. The pipeline is written in Nextflow [12], and each step consists of subworkflows with processing scripts. Data of this type pose several challenges and limitations: noise in the data, experiment-specific technical sequences that require dedicated preprocessing, and, in some experiments, a mixed-up cDNA strand. The pipeline is adjusted to account for these issues. It is designed to improve efficiency and to ensure the reliability and transparency of the analysis.
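To make the preprocessing of technical sequences more concrete, the following minimal Python sketch splits a chimeric read at a bridge sequence into its two flanking parts. It is an illustration only and not the NF-RNA-Chrom implementation: the bridge sequence, the function name `split_at_bridge` and the `min_len` threshold are hypothetical, the match is exact (a real tool would typically tolerate sequencing errors), and which flank is the RNA part and which is the DNA part depends on the experimental protocol.

```python
from typing import Optional, Tuple

# Hypothetical bridge sequence, used only for illustration.
# Each experimental protocol defines its own bridge/linker.
BRIDGE = "AGTCGGAGCGTTGC"


def split_at_bridge(
    read: str, bridge: str = BRIDGE, min_len: int = 14
) -> Optional[Tuple[str, str]]:
    """Split a chimeric read into the parts flanking the bridge.

    Returns (left_part, right_part), or None when the bridge is absent
    or either part is shorter than min_len. Assigning the parts to RNA
    and DNA is left to the caller because it is protocol-dependent.
    """
    pos = read.find(bridge)  # exact match of the first occurrence only
    if pos == -1:
        return None
    left = read[:pos]
    right = read[pos + len(bridge):]
    if len(left) < min_len or len(right) < min_len:
        return None
    return left, right
```

Reads without a recognizable bridge, or with a flank too short to map reliably, are discarded in this sketch by returning `None`.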
Simple installation steps for running the pipeline locally are given in the documentation. The pipeline writes the output files and statistics from each step into the corresponding directories. The main routine performs several consecutive steps. The first step processes the raw sequencing reads, which includes removing the bridge/linker sequence and separating the RNA and DNA parts. PCR duplicates are then identified and removed. This step uses our own software for removing duplicate reads, which, in contrast to programs such as FastUniq [13], uses memory efficiently and fixes other problems. After that, the RNA and DNA reads are mapped to a reference genome separately using HISAT2. By default, only reads that were uniquely mapped with 2 or fewer mismatches are kept for further analysis. The filtered BAM files are then converted to BED format and joined into a resulting contacts table. Next, chromatin-associated RNAs are identified. The clusters of unannotated RNA parts of contacts that are found are named X-RNAs. A nonspecific background model is then calculated from the contacts of protein-coding RNAs with all chromosomes except the chromosome from which they are transcribed. Each contact is normalized by the value of the background signal, producing background-corrected contacts. Several additional normalization procedures take into account the dependence of contact density on the distance between the RNA source gene and the chromatin target locus ('scaling'). In addition, we are developing a dedicated peak caller for RNA-DNA interaction data (the BaRDIC tool), which will be able to identify regions of statistically significant enrichment of the true signal over the background. Unlike the MACS2 peak caller, it accounts for scaling.

The work was funded by the Russian Science Foundation, grant No. 23-14-00136.

1. Engreitz J.M., Pandya-Jones A., McDonel P. et al. (2013) The Xist lncRNA exploits three-dimensional genome architecture to spread across the X chromosome. Science, 341, 1–8.
2. Simon M.D., Wang C.I., Kharchenko P.V. et al. (2011) The genomic binding sites of a noncoding RNA. Proc. Natl. Acad. Sci. U.S.A., 108, 20497–20502.
3. Chu C., Qu K., Zhong F.L. et al. (2011) Genomic maps of long noncoding RNA occupancy reveal principles of RNA-chromatin interactions. Mol. Cell, 44, 667–678.
4. Mondal T., Subhash S., Vaid R. et al. (2015) MEG3 long noncoding RNA regulates the TGF-β pathway genes through formation of RNA-DNA triplex structures. Nat. Commun., 6, 1–17.
5. Chu H.P., Cifuentes-Rojas C., Kesner B. et al. (2017) TERRA RNA antagonizes ATRX and protects telomeres. Cell, 170, 86–101.
6. Li X., Zhou B., Chen L. et al. (2017) GRID-seq reveals the global RNA-chromatin interactome. Nat. Biotechnol., 35, 940–950.
7. Bell J.C., Jukam D., Teran N.A. et al. (2018) Chromatin-associated RNA sequencing (ChAR-seq) maps genome-wide RNA-to-DNA contacts. eLife, 7, 1–28.
8. Yan Z., Huang N., Wu W. et al. (2019) Genome-wide colocalization of RNA–DNA interactions and fusion RNA pairs. Proc. Natl. Acad. Sci. U.S.A., 116, 3328–3337.
9. Bonetti A., Agostini F., Suzuki A.M. et al. (2020) RADICL-seq identifies general and cell type–specific principles of genome-wide RNA-chromatin interactions. Nat. Commun., 11, 1–14.
10. Gavrilov A.A., Zharikova A.A., Galitsyna A.A. et al. (2020) Studying RNA-DNA interactome by Red-C identifies noncoding RNAs associated with various chromatin types and reveals transcription dynamics. Nucleic Acids Res., 48, 6699–6714.
11. Ryabykh G.K., Kuznetsov S.V., Korostelev Y.D., Sigorskikh A.I., Zharikova A.A., Mironov A.A. (2023) RNA-Chrom: a manually curated analytical database of RNA-chromatin interactome. Database, 2023, baad025.
12. Di Tommaso P., Chatzou M., Floden E.W., Barja P.P., Palumbo E., Notredame C. (2017) Nextflow enables reproducible computational workflows. Nat. Biotechnol., 35, 316–319.
13. Xu H., Luo X., Qian J., Pang X., Song J., Qian G., Chen J., Chen S. (2012) FastUniq: a fast de novo duplicates removal tool for paired short reads. PLoS ONE, 7, e52249.
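Supplementary sketch. The default mapping filter described in the pipeline overview (uniquely mapped reads with 2 or fewer mismatches) could be expressed on SAM records roughly as follows. This is an assumption-laden illustration, not the pipeline's code: it assumes that uniqueness is read from the `NH:i` tag and the mismatch count from the `XM:i` tag, both of which HISAT2 writes as optional SAM fields, and the function names are hypothetical.

```python
from typing import Dict, List, Union

TagValue = Union[int, str]


def parse_tags(fields: List[str]) -> Dict[str, TagValue]:
    """Parse optional SAM fields of the form TAG:TYPE:VALUE."""
    tags: Dict[str, TagValue] = {}
    for field in fields:
        name, typ, value = field.split(":", 2)
        tags[name] = int(value) if typ == "i" else value
    return tags


def keep_alignment(sam_line: str, max_mismatches: int = 2) -> bool:
    """Return True for a uniquely mapped record with few mismatches.

    Assumes NH:i (number of reported alignments) marks uniqueness and
    XM:i (number of mismatches) is present, as in HISAT2 output.
    """
    if sam_line.startswith("@"):  # header line, not an alignment
        return False
    cols = sam_line.rstrip("\n").split("\t")
    flag = int(cols[1])
    if flag & 0x4:  # SAM flag bit 0x4: read is unmapped
        return False
    tags = parse_tags(cols[11:])
    if tags.get("NH") != 1:  # keep only reads with one alignment
        return False
    return "XM" in tags and tags["XM"] <= max_mismatches
```

Records lacking either tag are rejected here, which is a conservative choice for a sketch; a production filter would more likely operate on BAM files through a dedicated library.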