Oncogenic viruses account for about one sixth of tumorigenesis cases. A clear and detailed understanding of where viruses integrate, how integration sites are distributed, and which specific viral sequences are present in the human genome helps in treating cancers caused by viral infection. Investigating the integration of viruses into host genes or sequences from a genomic perspective is therefore important for screening virus-susceptible populations, preventing viral infection, developing new therapeutics, and optimizing treatment. In this study, we develop VISDB (Virus Integration Site DataBase) to provide a knowledge base of integration-site information for viruses in the human genome and to benefit researchers who study viruses and related malignant diseases.
VISDB covers 9 main viruses, including 5 DNA oncoviruses (HBV, HPV, EBV, MCV, AAV2) and 4 RNA retroviruses (HIV, MLV, HTLV, XMRV). The current version of VISDB holds 77,602 integration sites carefully curated from 108 publications (some articles cover multiple viruses).
HBV, 20558 VISes, 45 publications
HPV, 5118 VISes, 31 publications
EBV, 1144 VISes, 7 publications
MCV, 55 VISes, 9 publications
AAV2, 24 VISes, 3 publications
HIV, 16797 VISes, 10 publications
HTLV-1, 33845 VISes, 4 publications
MLV, 32 VISes, 1 publication
XMRV, 29 VISes, 2 publications
Figure 1 shows an overview of VISDB. Publications reporting VISes, the VISes themselves, and VIS-related data such as virus sequences, genes, miRNAs, fragile sites and other annotations are curated and stored in VISDB. We first extract VISes from literature downloaded from public databases such as PubMed and ScienceDirect. Some publications provide target genes, nearby genes, fragile sites and miRNAs, while others provide only part of this information. The missing items can still be curated if the publication gives the exact integration position, including the genome assembly, the chromosome and the location on the chromosome. Therefore, we discard studies that do not contain exact integration position information. Because VISDB already holds a large number of VISes, discarding this small number does not affect the integrity of VIS coverage. Oncogenes and tumor suppressor genes are further screened out from the target genes and nearby genes, and the associations between genes and miRNAs are also curated. Finally, functions such as browse, search, curation, gene feature, microRNA feature, download, statistics, help, and feedback are provided for users.
The integration position on the human reference genome is critical for curating a VIS. In the best case, we can extract the chromosome, cytoband, location on the chromosome and genome assembly from the original article. The sequences into which the virus integrated are also extracted and deposited in the database. For a VIS with a single location, we record the sequences both upstream and downstream of the insertion site. If both a start position and an end position are provided, we curate the sequence upstream of the start position, the sequence downstream of the end position, and the sequence between the two positions, all according to the genome assembly declared in the article.
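The two cases described above (a single location versus a start/end pair) can be sketched as follows. This is a minimal illustration, not VISDB's actual code: the function name `extract_flanks`, the `flank` length, the 1-based inclusive coordinate convention, and the choice to treat a single location as an insertion immediately after the given base are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VisFlanks:
    upstream: str
    between: Optional[str]  # None when only a single location is known
    downstream: str


def extract_flanks(chrom_seq: str, start: int, end: Optional[int] = None,
                   flank: int = 10) -> VisFlanks:
    """Return the sequences around a VIS on one chromosome.

    Assumption: positions are 1-based and inclusive, and `chrom_seq` is the
    chromosome sequence from the assembly declared in the article.
    """
    if end is None:
        # Single location: assume the insertion lies right after base `start`.
        upstream = chrom_seq[max(0, start - flank):start]
        downstream = chrom_seq[start:start + flank]
        return VisFlanks(upstream, None, downstream)
    if end < start:
        raise ValueError("end must not be smaller than start")
    # Start/end pair: upstream of start, the span itself, downstream of end.
    upstream = chrom_seq[max(0, start - 1 - flank):start - 1]
    between = chrom_seq[start - 1:end]
    downstream = chrom_seq[end:end + flank]
    return VisFlanks(upstream, between, downstream)
```

For a toy sequence `"AAAACCCCGGGGTTTT"`, `extract_flanks(seq, 5, 8, flank=4)` yields the three parts `AAAA`, `CCCC` and `GGGG`.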
The junction sequence plays a significant role in analyzing integration patterns. Although we aim to store the FASTA file of the junction sequence together with its annotation, coverage and mapped reads for each VIS, junction sequences are available for fewer than 5 percent of VISes. Furthermore, we mark all endpoints belonging to the human sequence or the virus sequence and map them to specific positions in the reference genome so that the integration event can be visualized, as shown in Figure 3.
We develop a visualization tool to display rich information about the features of integration sites. Virus integration into the human genome may follow many different patterns. In the simplest pattern, a segment of the virus sequence breaks off and is inserted into the host genome with no other process involved in the integration event. However, reverse insertions, rearrangements, microhomology and mutations may occur during integration, making the integration event complex. In this study, we consider a virus integrated within a human sequence to have the form "human sequence" + "virus-mixed sequence" + "human sequence". In other words, a junction sequence is composed of a human sequence preceding the integration region, a mixed sequence that contains virus sequences and unknown sequences but no human sequence, and a human sequence following the integration region. Notably, an overlap between the human sequence and the virus sequence, and an unknown sequence between the human sequence and the virus sequence, are both allowed. However, no human sequence can exist within the mixed sequence; otherwise, the integration event is divided into two events.
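The rule that a human sequence inside the mixed region splits one integration into two events can be illustrated with a short sketch. The segment representation (a list of `(origin, sequence)` pairs), the function name `split_events`, and the requirement that the mixed part contain at least one virus segment are assumptions made for this example; overlaps between human and virus sequence are not modeled here.

```python
from typing import Dict, List, Tuple

# (origin, sequence); origin is "human", "virus" or "unknown" (assumed labels)
Segment = Tuple[str, str]


def split_events(segments: List[Segment]) -> List[Dict]:
    """Split an ordered list of segments into human + mixed + human events."""
    events = []
    left = None   # most recent human segment seen so far
    mixed = []    # virus/unknown segments collected since `left`
    for origin, seq in segments:
        if origin == "human":
            # A human segment closes the current event, but only if the
            # mixed part actually contains virus sequence.
            if left is not None and any(o == "virus" for o, _ in mixed):
                events.append({"left": left, "mixed": mixed, "right": seq})
            # The same human segment becomes the left flank of the next event.
            left, mixed = seq, []
        elif left is not None:
            mixed.append((origin, seq))
    return events
```

With this representation, a read of the form human, virus, unknown, human, virus, human produces two events that share the middle human segment as right flank of the first and left flank of the second.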
All scientific literature containing VISes was downloaded from PubMed, ScienceDirect, Google Scholar and Wiley with the authorization of the University of Texas Health Science Center at Houston. We searched these data sources using different combinations of the following keywords: virus integration, viral integration, integration, integration site, full name of the virus, abbreviation of the virus, etc. The articles retrieved formed the initial literature set. For example, articles related to HBV are retrieved by the following statement:
(( viral integration[Title/Abstract] or virus integration[Title/Abstract] ) AND (HBV[Title/Abstract] or hepatitis B virus[Title/Abstract]))
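Queries of this shape can be generated per virus rather than typed by hand. The helper below is a hypothetical sketch (the name `build_query` and its parameters are not part of VISDB); it only assembles the search string and does not contact any search service. It writes the Boolean operators in upper case.

```python
from typing import Sequence


def build_query(virus_terms: Sequence[str],
                integration_terms: Sequence[str] = ("viral integration",
                                                    "virus integration"),
                field: str = "Title/Abstract") -> str:
    """Combine integration keywords and virus names into one search string."""
    def group(terms: Sequence[str]) -> str:
        # Tag every term with the search field and join the terms with OR.
        return "(" + " OR ".join(f"{t}[{field}]" for t in terms) + ")"

    return f"({group(integration_terms)} AND {group(virus_terms)})"


# Example: the HBV query, rebuilt from the virus abbreviation and full name.
hbv_query = build_query(["HBV", "hepatitis B virus"])
```

Calling the function with a different pair of names, for example an abbreviation and full name of another virus, gives the corresponding query for that virus.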
Search the whole article to extract objective information such as the human genome, the virus genome and the experimental assay
Copy the text from literature in PDF format, save it to a text file and import it into Excel
Use the string functions provided by Excel (such as concat, find, left, right and len) to extract or normalize data items
Use the sort and replace functions provided by Excel to remove duplicates or group rows to speed up data compilation.
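The extraction and de-duplication steps above are described for Excel. As a rough equivalent, the sketch below does the same kind of normalization in Python. The accepted input formats, the regular expression, and the function names `normalize_coordinate` and `unique_coordinates` are assumptions for illustration; real publications use more varied notations.

```python
import re
from typing import List, Optional, Tuple

Coordinate = Tuple[str, int, Optional[int]]  # (chromosome, start, end)

# Accepts e.g. "chr8:128,748,315-128,748,400", "Chr X : 1500", "12:5000"
_COORD = re.compile(
    r"^(?:chr(?:omosome)?)?\s*([0-9]{1,2}|X|Y|MT?)\s*:\s*"
    r"(\d[\d,]*)(?:\s*-\s*(\d[\d,]*))?$",
    re.IGNORECASE,
)


def normalize_coordinate(text: str) -> Optional[Coordinate]:
    """Parse one free-text position; return None if it is not recognized."""
    match = _COORD.match(text.strip())
    if match is None:
        return None
    chrom = match.group(1).upper()
    if chrom == "MT":
        chrom = "M"
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", "")) if match.group(3) else None
    return f"chr{chrom}", start, end


def unique_coordinates(raw_values: List[str]) -> List[Coordinate]:
    """Normalize all values and drop duplicates, keeping first-seen order."""
    seen = set()
    result = []
    for raw in raw_values:
        coord = normalize_coordinate(raw)
        if coord is not None and coord not in seen:
            seen.add(coord)
            result.append(coord)
    return result
```

Unrecognized strings (for example a bare cytoband such as `8q24`) return `None`, so they can be set aside for manual review instead of being silently altered.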
Curating VISes with public biological databases
After extracting VIS information from the literature, we curate VISes with public biological databases such as NCBI GenBank, KEGG, ENCODE, Genecards, RID, ONGene, TSGene, HumCFS, miTarBase, miRNA, etc.
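Validation of target genes
Validation of nearby genes for VISes that do not target any gene
Annotation of VISes with oncogenes from the Oncogene database
Annotation of VISes with tumor suppressor genes from the TSGene database
Extraction of upstream and downstream sequences
Extraction of the integrated virus sequence and the human sequence if two positions or breakpoints are provided
Extraction of fragile sites from HumCFS and their association with VISes
Extraction of miRNAs from miTarBase and miRNA and their association with VISes
Set links to GenBank, KEGG, ENCODE, Genecards, etc.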
Use GRCh38/hg38 as the reference genome to visualize the integration event for literature-curated VISes (for imported VISes, we navigate to the source database)
State VISes detected by NGS-related technology (no need for an experimental assay)
Use bp as the unit of measurement when calculating the distance to genes
Use mm as the unit of measurement for sample size
Give each virus a default code if no specific virus reference genome is provided; this code does not link to any genome in the Nucleotide database
Give all VISes without a specific reference genome a default code that does not link to UCSC or NCBI.
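Two of the steps above, deciding whether a VIS targets a gene and otherwise finding the nearest gene with its distance in bp, can be sketched as follows. The `Gene` record, the function name `annotate_vis`, and the rule that the first overlapping gene is reported as the target are assumptions for this example; in practice the gene coordinates would come from an annotation matching the assembly of the VIS.

```python
from typing import List, NamedTuple, Optional, Tuple


class Gene(NamedTuple):
    symbol: str
    chrom: str
    start: int  # 1-based, inclusive (assumed convention)
    end: int


def annotate_vis(chrom: str, pos: int, genes: List[Gene]
                 ) -> Tuple[Optional[str], Optional[str], Optional[int]]:
    """Return (target_gene, nearest_gene, distance_bp) for one VIS."""
    best = None  # (symbol, distance) of the closest non-overlapping gene
    for gene in genes:
        if gene.chrom != chrom:
            continue
        if gene.start <= pos <= gene.end:
            # The VIS falls inside this gene: report it as the target gene.
            return gene.symbol, None, 0
        # Distance in bp from the VIS to the closer end of the gene.
        dist = gene.start - pos if pos < gene.start else pos - gene.end
        if best is None or dist < best[1]:
            best = (gene.symbol, dist)
    if best is None:
        return None, None, None  # no gene annotated on this chromosome
    return None, best[0], best[1]
```

A VIS inside a gene is returned with distance 0 and no nearby gene; a VIS in an intergenic region is returned with no target gene, the closest gene on the same chromosome, and the gap between them in bp.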
Imported VISes
Downloading VISes and literature
RID (Retrovirus Integration Database, https://rid.ncifcrf.gov/) is a relational database containing information about retrovirus integration sites in host genomes and is sponsored by the HIV Dynamics and Replication Program (HIV DRP), National Cancer Institute, NIH. It collects about 4 million VISes from 18 papers on HIV, HTLV, MLV and ALV. It presents the insertion position on the host chromosome, the target or nearest genes, and the distance to them. In addition, locations in the human host genome are mapped to hg19. However, the website does not list the reference virus genome or the details of the samples and experimental assays, let alone the sequence around the integration site or the junction sites. To provide more details about those VISes, we download some VISes and the corresponding literature from RID and curate the VIS information that RID does not provide.
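After downloading VISes from RID and consulting the literature, we curated those VISes according to the original papers and public biological databases.
Extract the experimental assay from the original paper and associate it with the VISes
Extract sample information from the original paper and associate it with the corresponding VISes
Extract disease information related to the samples and associate it with the samples
Extract the virus reference genome to which the junction sequence is aligned
Supplement the gene IDs of genes that lack one in RID and normalize the genes by official symbol.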
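The last step, filling in missing gene IDs and normalizing to official symbols, amounts to two table lookups. The sketch below shows the idea only: the record layout, the function name `normalize_genes`, and both lookup tables are hypothetical, and the gene names in the usage note are placeholders rather than real genes.

```python
from typing import Dict, List, Optional


def normalize_genes(records: List[Dict[str, Optional[str]]],
                    alias_to_official: Dict[str, str],
                    symbol_to_id: Dict[str, str]) -> List[Dict[str, Optional[str]]]:
    """Return copies of the records with official symbols and gene IDs filled in.

    `alias_to_official` maps upper-cased aliases to official symbols and
    `symbol_to_id` maps official symbols to gene IDs; both are assumed to be
    built beforehand from a gene reference resource.
    """
    normalized = []
    for record in records:
        rec = dict(record)  # leave the imported record untouched
        symbol = (rec.get("gene") or "").strip()
        official = alias_to_official.get(symbol.upper(), symbol)
        rec["gene"] = official
        if not rec.get("gene_id"):
            # Only fill the ID when the source left it empty.
            rec["gene_id"] = symbol_to_id.get(official)
        normalized.append(rec)
    return normalized
```

For instance, with `alias_to_official = {"OLDNAME1": "GENE1"}` and `symbol_to_id = {"GENE1": "1001"}`, a record `{"gene": "oldname1", "gene_id": None}` becomes `{"gene": "GENE1", "gene_id": "1001"}`, while a record that already has an ID keeps it.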
Contact Us
We appreciate your feedback. Please send an email if you wish to make a request, leave a comment, or report a bug.
Zhongming Zhao, PhD, MS; Chair Professor for Precision Health; Professor of Biomedical Informatics and Human Genetics; Schools of Biomedical Informatics and Public Health; Founding Director, Center for Precision Health; Director, UTHealth Cancer Genomics Core
Phone: 713-500-3631 Email: zhongming.zhao@uth.tmc.edu
To cite the VISDB website in a publication, please cite the following: Tang D, Li B, Xu T, Hu R, Tan D, Song X, Jia P, Zhao Z (2020) VISDB: a manually curated database of viral integration sites in the human genome. Nucleic Acids Research 48(D1):D633-D641 https://www.ncbi.nlm.nih.gov/pubmed/31598702