Interrogating the viral dark matter of the rumen ecosystem with a global virome database

Jun 18, 2023

Nature Communications, volume 14, Article number: 5254 (2023)

The diverse rumen virome can modulate the rumen microbiome, but it remains largely unexplored. Here, we mine 975 published rumen metagenomes for viral sequences, create a global rumen virome database (RVD), and analyze the rumen virome for diversity, virus–host linkages, and potential roles in affecting rumen functions. RVD contains 397,180 species-level viral operational taxonomic units (vOTUs) and substantially increases the detection rate of rumen viruses from metagenomes compared with IMG/VR V3. Most of the classified vOTUs belong to Caudovirales and differ from those found in the human gut. The rumen virome is predicted to infect the core rumen microbiome, including fiber degraders and methanogens, and carries diverse auxiliary metabolic genes. It therefore likely affects the rumen ecosystem in both a top-down and a bottom-up manner. RVD and these findings provide useful resources and a framework for future research on how viruses may affect the rumen ecosystem and digestive physiology.

Recent virus-focused metagenomic studies have produced very large catalogs and databases of viral genomes for several ecosystems, including the ocean1,2, the human gut3,4,5, and soil6. These studies revealed extremely diverse viromes, identified numerous auxiliary metabolic genes, and shed new light on the ecological impact of viruses. Moreover, model system-based studies have begun to show how viruses can reprogram the metabolism of their prokaryotic hosts, forming distinct virocells that alter the ecological fitness and metabolism of the hosts7. Emerging evidence supports potential impacts of viruses on ocean biogeochemistry1,8, human physiology4, and disease states9. No comparable studies of the rumen virome, and no rumen-specific virome database, are available.

The rumen harbors a diverse, multi-kingdom ecosystem comprising bacteria, archaea, fungi, protozoa, and viruses. Collectively, the rumen microbiome digests and ferments otherwise indigestible feeds. It supplies most of the energy (as volatile fatty acids) and metabolizable nitrogen (as microbial protein) that ruminants need for growth and for meat and milk production. Strong associations of rumen bacteria, archaea, and protozoa with feed efficiency, methane (CH4) emissions, and animal health have been documented10. Rumen viruses, however, remain poorly understood despite their abundance, even though virus-focused studies have contributed to characterizing the rumen virome11,12. Early studies using electron microscopy documented morphologically diverse bacteriophages and revealed a predominance of tailed phages13,14. Early culture-based studies found bacteriophages that can infect a wide range of rumen bacterial species or strains, including common species of Prevotella, Ruminococcus, and Streptococcus. Based on morphology, these studies classified most of the phages into the families Myoviridae, Siphoviridae, Podoviridae, and Inoviridae (reviewed by Gilbert and Klieve15). These studies provided valuable information on rumen viruses. However, phage morphology alone does not allow robust taxonomic classification, and the International Committee on Taxonomy of Viruses (ICTV: https://ictv.global/taxonomy) therefore no longer recognizes morphology-based classification of viruses.

Genomics, metagenomics, and metatranscriptomics have become the main technologies for studying viromes, including the rumen virome. Recent culture-dependent whole-genome sequencing revealed 10 phages infecting Prevotella ruminicola, Ruminococcus albus, Streptococcus bovis, and Butyrivibrio fibrisolvens16,17, bacteria that play important roles in feed digestion and fermentation. These phage genomes show a modular organization and conserved viral genes, and the phages are potentially both lytic and lysogenic17. Rumen viruses have also been studied using metagenomes of virus-like particles (VLPs) (reviewed in ref. 11). However, the reference genome databases used underrepresent rumen viruses. This limits the identification and classification of rumen viruses as well as the prediction of their hosts. For example, genotypically diverse rumen viruses were detected, but most of them could not be classified because they lacked matches to known viral sequences18,19,20. Miller et al.18 detected clustered regularly interspaced short palindromic repeats (CRISPR)/CRISPR-associated (Cas) protein elements in some rumen microbial genomes and metagenomes. However, they found few spacer sequences matching rumen viral sequences that could be used for host prediction. It has therefore been difficult to characterize the rumen virome, especially novel viruses.

[…] >12-fold) and IMG/VR V3 and improving the identification of viral sequences from rumen metagenomes, RVD will be useful as a new community resource. It will provide new insights for future studies of the rumen virome and its implications for feed digestion, microbial protein synthesis, feed efficiency, and CH4 emissions.

[…] >5 kb each and clustered them into 411,125 vOTUs. After validation with VIBRANT23, we constructed a rumen virome database (RVD, available for download at https://zenodo.org/record/7412085#.ZDsE2XbMK5c) representing 397,180 vOTUs (Supplementary Fig. 1), 193,327 of which are >10 kb. Checking with CheckV21 identified 4400 complete, 4396 high-quality, and 32,942 medium-quality vOTUs. The completeness and quality of the RVD vOTUs were probably underestimated. CheckV is database dependent, and the databases it uses are derived primarily from other ecosystems. All the vOTUs in RVD meet the Minimum Information about an Uncultivated Virus Genome (MIUViG) standards25.

[…] >50% completeness from the current study and the two largest human gut virome databases (MGV4 and GPD5). For better visualization, only one representative vOTU (the longest and most complete) was included for each genus-level vOTU (714 in total). The branches are color-coded: green, Caudovirales lineages found exclusively in the human virome; red, lineages found exclusively in the rumen virome of the current study; blue, lineages found in both the rumen and the human viromes. Lysogeny rates (proportions), calculated with VIBRANT, are shown as the inner ring. The number of vOTUs representing each lineage is shown as a bar plot, with red and black bars distinguishing rumen and human viruses. d Proportion of Caudovirales lineages unique to the human intestine, unique to the rumen, and shared. e Rarefaction curve of the vOTUs identified in the rumen virome. The upward trend of the curve indicates that more rumen viruses remain to be identified at the species level.

[…] >1 phage per host genome.
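A rarefaction curve like the one in panel e can be approximated by adding samples in random order and recording how many distinct vOTUs have been seen after each addition. The sketch below assumes presence/absence data stored as a mapping from sample ID to a set of vOTU IDs. The function name and data layout are hypothetical. The authors' exact procedure (for example, the number of permutations, or whether samples or reads were subsampled) is not stated here.

```python
import random
from statistics import mean


def rarefaction_curve(sample_votus, n_permutations=10, seed=0):
    """Mean cumulative number of distinct vOTUs as samples are added in random order.

    sample_votus maps a sample ID to the set of vOTU IDs detected in that sample
    (a hypothetical presence/absence layout, not the authors' data format).
    """
    rng = random.Random(seed)
    samples = list(sample_votus)
    curves = []
    for _ in range(n_permutations):
        order = samples[:]
        rng.shuffle(order)
        seen = set()
        curve = []
        for sample in order:
            seen |= sample_votus[sample]  # pool the vOTUs detected in this sample
            curve.append(len(seen))       # distinct vOTUs observed so far
        curves.append(curve)
    # Average the cumulative counts across permutations at each sampling depth
    return [mean(depth) for depth in zip(*curves)]


toy = {"s1": {"v1", "v2"}, "s2": {"v2", "v3"}, "s3": {"v4"}}
print(rarefaction_curve(toy))
```

A curve that is still rising at the largest sampling depth, as described for the rumen virome, indicates that sampling has not yet saturated vOTU diversity.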
The percentage of lysogenic viruses varied among host genera and was low for most of them (Fig. 3c). Most ciliate single-cell amplified genomes (SAGs) contained multiple endogenous viral elements (EVEs). All five SAGs of Isotricha sp. YL-2021b and Dasytricha ruminantium contained the most EVEs (>50 per SAG) (Supplementary Fig. 5). Little is known about viruses infecting ciliates, and no EVEs have been reported even for model ciliate species (e.g., Tetrahymena thermophila). However, EVEs have recently been found in Entamoeba and Giardia in human stool metagenomes32. It is therefore plausible that rumen ciliates also carry EVEs. The large number of EVEs per ciliate SAG may reflect the high polyploidy and the enormous numbers of chromosomes found in many rumen ciliates (e.g., >10,000 in Entodinium caudatum33).

[…] >12-fold). Based on the gene-sharing network, most rumen vOTUs clustered into four groups (Fig. 3b). Groups I (the largest) and IV (the smallest) contained more classified vOTUs than groups II and III. Groups I and IV had a broader host range across bacterial phyla, including both gram-positive and gram-negative bacteria with different niches and capacities, but few of their genera or families were predominant in the rumen. Groups II and III were predicted to mainly infect Bacteroidota and Methanobacteriota, respectively (Fig. 3c). Most viruses in these two groups could not be classified with any of the current virome databases, so they represent new viral lineages. The narrow host range (a single phylum) of groups II and III supports the notion that phages with a high degree of gene sharing generally infect phylogenetically related hosts.

[…] >2400) and bacteriophages (>40,000) down to the species level, and many of the host species are known to play important roles in feed digestion, fermentation, and methane emissions. Advances in the prediction of hosts and virus–host linkages will aid in understanding the ecological roles of rumen viruses.
Such information will be especially useful when both the rumen metagenome and the virome are investigated for their association with major rumen functions. Among the rumen vOTUs with a predicted host, 99.5% were inferred to infect prokaryotes found primarily in the rumen. Most of the reference prokaryote genomes used came from other environments, so this result indicates the rigor and low false-positive rate of our host prediction pipeline.

[…] >5 kb were verified using VirSorter2 (ref. 22; option: --min-score 0.5). The contigs that passed verification were input to CheckV21 to trim off host sequences flanking prophages. We chose only viral contigs >5 kb because currently available bioinformatics tools show a relatively high false-positive rate when identifying viral contigs <5 kb30. Only the contigs falling into categories Keep1 and Keep2 were retained as putative viral contigs (708,580 in total) for further analyses.

[…] >10 kb to genus-level viral taxa based on a gene-sharing network using vConTACT2 (ref. 26), which uses NCBI RefSeq Viral (release 88) as reference genomes. The vOTUs that clustered with the reference genomes of a viral genus were assigned to that genus according to the vConTACT2 workflow. We assigned vOTUs that could not be assigned to a viral genus, as well as those <10 kb, to family-level viral taxa using the majority rule, as applied previously4. Briefly, we predicted the ORFs of each vOTU using Prodigal56 and aligned the ORF sequences against NCBI RefSeq Viral using BLASTp with a bit score of ≥50. A vOTU was assigned to a viral family if >50% of its protein sequences aligned with NCBI RefSeq Viral genomes of that family.
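The majority rule for family assignment can be sketched in Python as below. This is a minimal sketch that assumes BLASTp hits have already been parsed into (query protein, RefSeq protein, bit score) tuples and that a lookup from each RefSeq protein to its viral family is available. The function name, input layout, and toy family labels are hypothetical and do not come from the authors' pipeline. A protein counts toward a family if it has at least one hit to that family with a bit score ≥50. The vOTU is assigned when more than half of its predicted proteins support that family.

```python
from collections import defaultdict


def assign_families(n_proteins, hits, protein_family,
                    min_bitscore=50.0, min_fraction=0.5):
    """Majority-rule family assignment for vOTUs (illustrative sketch).

    n_proteins:     vOTU ID -> number of ORFs predicted for it (e.g., by Prodigal)
    hits:           iterable of (query_protein, subject_protein, bitscore) from BLASTp
    protein_family: RefSeq protein ID -> viral family of its source genome
    Query IDs are assumed to follow Prodigal's '<contig>_<n>' naming.
    """
    # vOTU -> family -> distinct vOTU proteins with a qualifying hit to that family
    supported = defaultdict(lambda: defaultdict(set))
    for query, subject, bitscore in hits:
        if bitscore < min_bitscore or subject not in protein_family:
            continue
        votu = query.rsplit("_", 1)[0]
        supported[votu][protein_family[subject]].add(query)

    assignments = {}
    for votu, families in supported.items():
        # Take the family supported by the most proteins, then apply the >50% rule
        family, proteins = max(families.items(), key=lambda kv: len(kv[1]))
        if len(proteins) / n_proteins[votu] > min_fraction:
            assignments[votu] = family
    return assignments


# Toy data: hypothetical IDs and family labels
n_proteins = {"votuA": 4, "votuB": 4}
hits = [("votuA_1", "p1", 80), ("votuA_2", "p2", 60), ("votuA_3", "p1", 55),
        ("votuB_1", "p3", 90), ("votuB_2", "p1", 30)]
protein_family = {"p1": "FamilyX", "p2": "FamilyX", "p3": "FamilyY"}
print(assign_families(n_proteins, hits, protein_family))  # {'votuA': 'FamilyX'}
```

A protein may hit genomes from several families. Taking the family with the most supporting proteins before applying the >50% cutoff is one reasonable way to handle this; the text does not say how such cases were resolved.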
We identified crAss-like phages using BLASTn against 2,478 crAss-like phage genomes identified in previous studies57,58,59. The threshold was ≥80% sequence identity over ≥50% of the length of the previously identified crAss-like vOTUs.

[…] >50% were included in the search. We then aligned each marker gene from the three databases using MAFFT62, removed alignment positions with >50% gaps using trimAl63, concatenated the aligned marker genes, and filled with gaps wherever a marker gene was absent. We retained only the concatenated marker genes that each contained >3 marker genes and were found in >5% of all the aligned concatemers. This yielded 10,203 Caudovirales marker gene concatemers, each with 13,573 alignment columns. These concatemers were clustered into genus-level vOTUs as described previously5, where benchmarking was performed with NCBI RefSeq Viral genomes to achieve high taxonomic homogeneity. We built a phylogenetic tree of Caudovirales viruses with FastTree v.2.1.9 (options: -mlacc 2 -slownni -wag)64. The tree was built from the aligned concatenated marker genes of the representative vOTU sequences of all genus-level vOTUs with genome completeness >50% (based on CheckV analysis). The Caudovirales tree was visualized using iTOL65. The vOTUs identified as prophages or encoding an integrase were considered lysogenic. The lysogenic rate (%) was calculated from the VIBRANT results as the percentage of lysogenic viruses among all the viruses predicted to infect each host genus.

[…] >2,500 bp of a host genome or MAG matched a vOTU sequence at >90% sequence identity over 75% of the vOTU sequence length4. We predicted probable protozoal hosts of the rumen viruses by searching the 52 high-quality ciliate SAGs68 for EVEs using BLASTn and the above criteria.

[…] >10 kb (5912 in total) for AMG identification using the criteria recommended in a benchmarking paper30.
The selected vMAGs were processed with VirSorter2 using the --prep-for-dramv option and then subjected to AMG identification and genome annotation using DRAMv72. Second, AMG-carrying vMAGs were removed if the AMGs were at either end of the vMAG. They were also removed if the AMGs were not flanked either by one viral hallmark gene and one viral-like gene or by two viral hallmark genes (categories 1 and 2 as determined by DRAMv). Third, the remaining vMAGs were manually curated based on the criteria specified in the VirSorter2 SOP (https://doi.org/10.17504/protocols.io.bwm5pc86; see also https://github.com/yan1365/RVD/blob/main/vmags_check_helper/readme.txt). This yielded 1,880 vMAGs. To further minimize false identification, we manually checked the genomic context of these vMAGs and found that some of them could still be genomic islands. We therefore filtered the 1,880 vMAGs based on the criteria established by Sun and Pratama et al. (unpublished data). Briefly, vMAGs with only integrases/transposases, tail fiber genes, or any nonviral genes were removed. The remaining vMAGs were filtered again. We removed those lacking at least one viral structural gene (i.e., capsid protein, portal protein, phage coat protein, baseplate, head protein, tail protein, virion structural protein, or terminase). We also removed those containing genes encoding an endonuclease, plasmid stability protein, lipopolysaccharide biosynthesis enzyme, glycosyltransferase (GT) family 11 or 25, nucleotidyltransferase, carbohydrate kinase, or nucleotide sugar epimerase. We eventually obtained 504 vMAGs free of genomic islands. To benchmark our curation pipeline, 100 of these vMAGs were randomly selected for detailed manual curation based on their genomic context. Based on the benchmarking results, we were confident that only complete vMAGs were retained for AMG prediction.
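The structural-gene and exclusion rules in the final filtering step can be expressed as a simple annotation screen. The sketch below represents each vMAG as a list of free-text gene annotations and uses case-insensitive substring matching. The keyword lists paraphrase the criteria above. The function name, input format, and matching strategy are illustrative assumptions, not the authors' implementation. The sketch does not reproduce the DRAMv flanking check or the manual curation steps.

```python
import re

# Keyword lists paraphrased from the filtering criteria described above; matching
# on free-text annotations is a simplifying assumption, not the authors' code.
STRUCTURAL_KEYWORDS = (
    "capsid", "portal", "coat protein", "baseplate", "head protein",
    "tail protein", "virion structural", "terminase",
)
EXCLUDED_KEYWORDS = (
    "endonuclease", "plasmid stability", "lipopolysaccharide biosynthesis",
    "nucleotidyltransferase", "carbohydrate kinase", "nucleotide sugar epimerase",
)
EXCLUDED_GT = re.compile(r"\bGT(?:11|25)\b")  # glycosyltransferase families 11 and 25


def passes_gene_content_filter(annotations):
    """Return True if a vMAG has at least one structural gene and no excluded gene.

    annotations: list of gene annotation strings for one vMAG (hypothetical input).
    """
    lowered = [a.lower() for a in annotations]
    has_structural = any(k in a for a in lowered for k in STRUCTURAL_KEYWORDS)
    has_excluded = any(k in a for a in lowered for k in EXCLUDED_KEYWORDS)
    has_excluded_gt = any(EXCLUDED_GT.search(a) for a in annotations)
    return has_structural and not (has_excluded or has_excluded_gt)


# Toy vMAGs with hypothetical annotations
vmags = {
    "vMAG_1": ["major capsid protein", "DNA polymerase", "hypothetical protein"],
    "vMAG_2": ["integrase", "hypothetical protein"],
    "vMAG_3": ["terminase large subunit", "HNH endonuclease"],
    "vMAG_4": ["portal protein", "glycosyltransferase GT25"],
}
kept = [name for name, genes in vmags.items() if passes_gene_content_filter(genes)]
print(kept)  # ['vMAG_1']
```

Substring matching is deliberately coarse. For example, an annotation such as "tail tube protein" does not match "tail protein". A production screen would likely need curated synonym lists or annotation-database identifiers.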
Detailed results of each curation step, full annotations of the final vMAGs, and annotations of the identified AMGs are presented in Supplementary Data 4. We compared the AMGs identified in the rumen virome with previously identified AMGs from other viromes, which are available in an expert-curated AMG database (https://github.com/WrightonLabCSU/DRAM/blob/master/data/amg_database.tsv). For the newly identified AMGs, we double-checked the annotations and searched the literature to confirm that they were true AMGs.

[…] >50% concentrate). First, we transformed the raw abundance table into a binary (presence/absence) matrix. Then, the prevalence of each vOTU across samples was calculated. A vOTU was included in the core rumen virome if its prevalence exceeded 50% within a given concentrate level or across all cattle. Based on prevalence, the vOTUs were assigned to four categories:
individualized (observed in only one sample);
one concentrate level (observed in more than one sample, but exclusively from a single concentrate level);
two concentrate levels (observed in animals from two concentrate levels);
three concentrate levels (observed in animals from all three concentrate levels).
The numbers of vOTUs shared by the core viromes of the three concentrate levels were visualized with a Venn diagram in R. Using subsets of data from Stewart et al.78 and Li et al.79, respectively, we examined whether animals fed the same diet or of the same breed share more vOTUs than animals fed different diets or of different breeds. The Kruskal–Wallis test in R was used to compare the numbers of shared vOTUs among groups.

[…] >12 metagenomes were retained for the analysis. The number of vOTUs shared between two studies was computed for every study pair, and the results were subjected to hierarchical clustering. The hierarchical clustering results were visualized in R with the ComplexHeatmap package81 and annotated according to the metadata.
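The prevalence-based categories and the core-virome rule can be sketched from a binary presence/absence table as follows. This minimal sketch holds the table as a mapping from sample ID to the set of detected vOTUs and gives each sample one of three concentrate-level labels. The function names, level labels, and toy data are hypothetical. Prevalence is the fraction of samples in which a vOTU is present, and a vOTU joins a level's core virome when that fraction exceeds 0.5.

```python
from collections import defaultdict


def categorize_votus(presence, level_of):
    """Prevalence categories of vOTUs across concentrate levels.

    presence: sample ID -> set of vOTUs detected (binary presence/absence)
    level_of: sample ID -> concentrate level label (three levels assumed)
    """
    samples_with = defaultdict(set)
    for sample, votus in presence.items():
        for votu in votus:
            samples_with[votu].add(sample)
    labels = {1: "one concentrate level", 2: "two concentrate levels",
              3: "three concentrate levels"}
    categories = {}
    for votu, samples in samples_with.items():
        if len(samples) == 1:
            categories[votu] = "individualized"
        else:
            n_levels = len({level_of[s] for s in samples})
            categories[votu] = labels[n_levels]
    return categories


def core_virome(presence, level_of, threshold=0.5):
    """Per level, the vOTUs present in more than `threshold` of that level's samples."""
    by_level = defaultdict(list)
    for sample, level in level_of.items():
        by_level[level].append(sample)
    core = {}
    for level, samples in by_level.items():
        counts = defaultdict(int)
        for sample in samples:
            for votu in presence[sample]:
                counts[votu] += 1
        core[level] = {v for v, c in counts.items() if c / len(samples) > threshold}
    return core


# Toy data: hypothetical samples, vOTUs, and level labels
presence = {"a1": {"v1", "v2"}, "a2": {"v1"}, "b1": {"v1", "v3"},
            "b2": {"v1", "v3"}, "c1": {"v1", "v4"}}
level_of = {"a1": "low", "a2": "low", "b1": "mid", "b2": "mid", "c1": "high"}
print(categorize_votus(presence, level_of))
core = core_virome(presence, level_of)
print(set.intersection(*core.values()))  # vOTUs in the core of every level
```

To get the core virome across all cattle, pass a mapping that puts every sample in one group, for example `{s: "all" for s in presence}`. Intersecting the per-level core sets gives the center of the Venn diagram.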