This review briefly characterizes CRISPR loci, which occur in roughly half of bacteria and in most archaea. It describes their typical organization, a key element of which is the CRISPR array (cassette), made up of unique spacers interspersed with identical direct repeats. Specialized programs that search for CRISPR arrays in sequenced microbial genomes and in metagenomic data by detecting repeated regions are briefly reviewed. Current web pages for these programs are given, and their purposes and capabilities are summarized in a table. Databases of CRISPR loci are listed together with their web addresses. The review covers nearly all of the available literature on the subject, as well as the relevant internet resources.
Keywords:
CRISPR, Cas, PAM, guide RNA, spacer, protospacer, computer program, web resource, database
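The array organization summarized in the abstract (identical direct repeats separated by unique spacers) can be illustrated with a small Python sketch of repeat-based detection. This is only a toy illustration under strong assumptions, not the algorithm of any program covered by the review: it looks for exact repeats of a fixed, known length, and the function names, parameter values and the test sequence are all hypothetical.

```python
from collections import defaultdict


def find_repeat_arrays(seq, repeat_len, min_spacer, max_spacer, min_repeats=3):
    """Find exact k-mers that recur with spacer-sized gaps between copies.

    Returns a list of (repeat, start_positions) tuples. Toy assumption:
    repeats are exact and their length (repeat_len) is known in advance.
    """
    # Index every k-mer of length repeat_len by its start positions.
    positions = defaultdict(list)
    for i in range(len(seq) - repeat_len + 1):
        positions[seq[i:i + repeat_len]].append(i)

    arrays = []
    for kmer, starts in positions.items():
        chain = [starts[0]]
        for pos in starts[1:]:
            gap = pos - chain[-1] - repeat_len  # length of the candidate spacer
            if min_spacer <= gap <= max_spacer:
                chain.append(pos)
            else:
                if len(chain) >= min_repeats:
                    arrays.append((kmer, chain))
                chain = [pos]
        if len(chain) >= min_repeats:
            arrays.append((kmer, chain))
    return arrays


def extract_spacers(seq, starts, repeat_len):
    """Return the sequences lying between consecutive repeat copies."""
    return [seq[a + repeat_len:b] for a, b in zip(starts, starts[1:])]


# Hypothetical toy sequence: four copies of one repeat, three unique spacers.
repeat = "GTTTCAGA"
spacers = ["ACGTACCTGA", "TTGACCGGTC", "CATGGCATCG"]
seq = "ACCT" + repeat + spacers[0] + repeat + spacers[1] + repeat + spacers[2] + repeat + "GGAT"

arrays = find_repeat_arrays(seq, repeat_len=8, min_spacer=5, max_spacer=15)
for found_repeat, starts in arrays:
    print(found_repeat, starts, extract_spacers(seq, starts, len(found_repeat)))
```

Real CRISPR repeats are longer, may carry mismatches, and their length is not known beforehand, so a practical tool would need approximate matching and a scan over candidate repeat lengths; the sketch only shows the underlying idea of chaining regularly spaced copies of a repeat and reading off the spacers between them.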