
Publications

For a full list of publications, visit Dr. Cui Tao's Google Scholar Profile

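Selected Papers
  • Issues in Melanoma Detection: Semisupervised Deep Learning Algorithm Development via a Combination of Human and Artificial Intelligence

    Xinyuan Zhang, Ziqian Xie, Yang Xiang, Imran Baig, Mena Kozman, Carly Stender, Luca Giancardo, Cui Tao; JMIR Dermatol 2022; 5(4):e39113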
    Funded by: This research was partially supported by UTHealth Innovation for Cancer Prevention Research Training Program Pre-doctoral Fellowship (Cancer Prevention and Research Institute of Texas Grant No. RP160015 and No. RP210042)

    Abstract

    Automatic skin lesion recognition has shown to be effective in increasing access to reliable dermatology evaluation; however, most existing algorithms rely solely on images. Many diagnostic rules, including the 3-point checklist, are not considered by artificial intelligence algorithms, which comprise human knowledge and reflect the diagnosis process of human experts. In this paper, we aimed to develop a semisupervised model that can not only integrate the dermoscopic features and scoring rule from the 3-point checklist but also automate the feature-annotation process. We first trained the semisupervised model on a small, annotated data set with disease and dermoscopic feature labels and tried to improve the classification accuracy by integrating the 3-point checklist using ranking loss function. We then used a large, unlabeled data set with only disease label to learn from the trained algorithm to automatically classify skin lesions and features. After adding the 3-point checklist to our model, its performance for melanoma classification improved from a mean of 0.8867 (SD 0.0191) to 0.8943 (SD 0.0115) under 5-fold cross-validation. The trained semisupervised model can automatically detect 3 dermoscopic features from the 3-point checklist, with best performances of 0.80 (area under the curve [AUC] 0.8380), 0.89 (AUC 0.9036), and 0.76 (AUC 0.8444), in some cases outperforming human annotators. Our proposed semisupervised learning framework can help with the automatic diagnosis of skin disease based on its ability to detect dermoscopic features and automate the label-annotation process. The framework can also help combine semantic knowledge with a computer algorithm to arrive at a more accurate and more interpretable diagnostic result, which can be applied to broader use cases.

    View details: doi:10.2196/39113

  • Toward a standard formal semantic representation of the model card report

    Muhammad Tuan Amith, Licong Cui, Degui Zhi, Kirk Roberts, Xiaoqian Jiang, Fang Li, Evan Yu & Cui Tao; BMC Bioinformatics 23 (Suppl 6), 281 (2022)
    Funded by: This research was partially supported by NIH award No. RF1AG072799.

    Abstract

    Model card reports aim to provide informative and transparent description of machine learning models to stakeholders. This report document is of interest to the National Institutes of Health’s Bridge2AI initiative to address the FAIR challenges with artificial intelligence-based machine learning models for biomedical research. We present our early undertaking in developing an ontology for capturing the conceptual-level information embedded in model card reports. Sourcing from existing ontologies and developing the core framework, we generated the Model Card Report Ontology. Our development efforts yielded an OWL2-based artifact that represents and formalizes model card report information. The current release of this ontology utilizes standard concepts and properties from OBO Foundry ontologies. Also, the software reasoner indicated no logical inconsistencies with the ontology. With sample model cards of machine learning models for bioinformatics research (HIV social networks and adverse outcome prediction for stent implantation), we showed the coverage and usefulness of our model in transforming static model card reports to a computable format for machine-based processing. The benefit of our work is that it utilizes expansive and standard terminologies and scientific rigor promoted by biomedical ontologists, as well as, generating an avenue to make model cards machine-readable using semantic web technology. Our future goal is to assess the veracity of our model and later expand the model to include additional concepts to address terminological gaps. We discuss tools and software that will utilize our ontology for potential …

    View details: DOI 10.1186/s12859-022-04797-6

  • Mining on Alzheimer's disease related knowledge graph to identify potential AD-related semantic triples for drug repurposing

    Yi Nian, Xinyue Hu, Rui Zhang, Jingna Feng, Jingcheng Du, Fang Li, Larry Bu, Yuji Zhang, Yong Chen & Cui Tao; BMC Bioinformatics 23 (Suppl 6), 407 (2022)

    Abstract

    To date, there are no effective treatments for most neurodegenerative diseases. Knowledge graphs can provide comprehensive and semantic representation for heterogeneous data, and have been successfully leveraged in many biomedical applications including drug repurposing. Our objective is to construct a knowledge graph from literature to study the relations between Alzheimer’s disease (AD) and chemicals, drugs and dietary supplements in order to identify opportunities to prevent or delay neurodegenerative progression. We collected biomedical annotations and extracted their relations using SemRep via SemMedDB. We used both a BERT-based classifier and rule-based methods during data preprocessing to exclude noise while preserving most AD-related semantic triples. The 1,672,110 filtered triples were used to train with knowledge graph completion algorithms (i.e., TransE, DistMult, and ComplEx) to predict candidates that might be helpful for AD treatment or prevention. Among three knowledge graph completion models, TransE outperformed the other two (MR = 10.53, Hits@1 = 0.28). We leveraged the time-slicing technique to further evaluate the prediction results. We found supporting evidence for most highly ranked candidates predicted by our model which indicates that our approach can inform reliable new knowledge. This paper shows that our graph mining model can predict reliable new relationships between AD and other entities (i.e., dietary supplements, chemicals, and drugs). The knowledge graph constructed can facilitate data-driven knowledge discoveries and the generation of novel hypotheses.

    View details: DOI 10.1186/s12859-022-04934-1
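
The abstract above reports TransE results as MR and Hits@1. As a rough illustration only (not the authors' code), the sketch below shows the standard TransE scoring idea, where a triple is plausible when head + relation lies close to tail, and how rank-based metrics are computed from it. It assumes MR denotes mean rank; the 2-D embeddings and the names `drug_a`, `drug_b`, `drug_c`, `ad`, and `treated_by` are invented for the example.

```python
import math

def transe_score(head, relation, tail):
    """TransE plausibility: negative L2 distance between (head + relation) and tail."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))

def rank_of_true_tail(head, relation, true_tail, candidates):
    """1-based rank of the true tail among all candidate tails, best score first."""
    scores = {name: transe_score(head, relation, vec) for name, vec in candidates.items()}
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(true_tail) + 1

def mean_rank(ranks):
    """Average rank of the true entities (lower is better)."""
    return sum(ranks) / len(ranks)

def hits_at_k(ranks, k):
    """Fraction of test triples whose true entity is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

# Toy embeddings, hypothetical and for illustration only.
candidates = {
    "drug_a": [1.0, 1.0],
    "drug_b": [3.0, 0.0],
    "drug_c": [0.0, 4.0],
}
ad = [0.0, 0.0]          # hypothetical embedding of the disease entity
treated_by = [1.0, 1.0]  # hypothetical embedding of a relation

# Rank each held-out true tail against every candidate.
ranks = [rank_of_true_tail(ad, treated_by, t, candidates) for t in ("drug_a", "drug_b")]
print(ranks, mean_rank(ranks), hits_at_k(ranks, 1))
```

In practice the embeddings are learned from the training triples and the ranks are computed over the full entity vocabulary; only the metric definitions carry over from this toy setting.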

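  • Understanding public perceptions of measles on Twitter using multi-task convolutional neural networks

    Samuel Wang, Jingcheng Du, Lu Tang; Studies in Health Technology and Informatics, 2022 Jun 6; 290:607-611.

    Abstract

    Measles is a highly contagious cause of febrile illness typically seen in young children. Recent years have witnessed a resurgence of measles cases in the United States. Rapid understanding of public perceptions of measles will allow public health agencies to respond appropriately in a timely manner. We proposed a multi-task convolutional neural network (MT-CNN) model to classify measles-related tweets along three characteristics: type of message (6 subclasses), emotion expressed (6 subclasses), and attitude toward vaccination (3 subclasses). A gold standard corpus containing 2,997 tweets annotated along these dimensions was manually curated. Various conventional machine learning and deep learning models were evaluated as baseline models. The MT-CNN model performed better than the baseline conventional machine learning and single-task CNN models. It was then applied to predict unlabeled measles-related Twitter discussions crawled from 2007 to 2019, and trends in public perceptions were analyzed along the three dimensions.

    View details: doi:10.3233/shti220149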

  • Application of artificial intelligence and machine learning for HIV prevention interventions

    Yang Xiang, Jingcheng Du, Kayo Fujimoto, Fang Li, John Schneider, Cui Tao; The Lancet HIV, November 8, 2021
    Project: Predicting HIV transmission risk among MSM populations using big data and deep learning
    Funded by: NIH award 1R56AI150272-01A1

    Abstract

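    In 2019, the US Government announced its goal to end the HIV epidemic within 10 years, mirroring the initiatives set forth by UNAIDS. Public health prevention interventions are a crucial part of this ambitious goal. However, numerous challenges to this goal exist, including improving HIV awareness, increasing early HIV infection detection, ensuring rapid treatment, optimizing resource allocation, and providing effective prevention services for vulnerable populations. Artificial intelligence has played a pivotal role in revolutionizing health care and has shown great potential in developing effective HIV prevention intervention strategies. Although artificial intelligence has been used in a few HIV prevention intervention areas, there are still challenges to address and opportunities to explore.

    View details: doi 10.1016/s2352-3018(21)00247-2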

  • COVID-19 trial graph: a linked graph for COVID-19 clinical trials

    Jingcheng Du, Qing Wang, Jingqi Wang, Prerana Ramesh, Yang Xiang, Xiaoqian Jiang, Cui Tao; Journal of the American Medical Informatics Association, Volume 28, Issue 9, September 2021, Pages 1964–1969
    Funded by: This research was partially supported by NIH award Nos. R56AI150272 and R01AI130460

    Abstract

    Clinical trials are an essential part of the effort to find safe and effective prevention and treatment for COVID-19. Given the rapid growth of COVID-19 clinical trials, there is an urgent need for a better clinical trial information retrieval tool that supports searching by specifying criteria, including both eligibility criteria and structured trial information. We built a linked graph for registered COVID-19 clinical trials: the COVID-19 Trial Graph, to facilitate retrieval of clinical trials. Natural language processing tools were leveraged to extract and normalize the clinical trial information from both their eligibility criteria free texts and structured information from ClinicalTrials.gov. We linked the extracted data using the COVID-19 Trial Graph and imported it to a graph database, which supports both querying and visualization. We evaluated trial graph using case queries and graph embedding. The graph currently (as of October 5, 2020) contains 3392 registered COVID-19 clinical trials, with 17 480 nodes and 65 236 relationships. Manual evaluation of case queries found high precision and recall scores on retrieving relevant clinical trials searching from both eligibility criteria and trial-structured information. We observed clustering in clinical trials via graph embedding, which also showed superiority over the baseline (0.870 vs 0.820) in evaluating whether a trial can complete its recruitment successfully. The COVID-19 Trial Graph is a novel representation of clinical trials that allows diverse search queries and provides a graph-based visualization of COVID-19 clinical trials. High-dimensional vectors mapped by graph embedding for clinical trials would be potentially beneficial for many downstream applications, such as trial end recruitment status prediction and trial similarity comparison. Our methodology also is generalizable to other clinical trials.

    View details: DOI 10.1093/jamia/ocab078

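  • Extracting postmarketing adverse events from safety reports in the Vaccine Adverse Event Reporting System (VAERS) using deep learning

    Jingcheng Du, Yang Xiang, Madhuri Sankaranarayanapillai, Meng Zhang, Jingqi Wang, Yuqi Si, Huy Anh Pham, Hua Xu, Yong Chen, Cui Tao; Journal of the American Medical Informatics Association, Volume 28, Issue 7, July 2021, Pages 1393–1400
    Project: Dynamic learning for post-vaccine event prediction using temporal information in VAERS
    Funded by: This research was supported by NIH awards R01AI130460 and R01LM011829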

    Abstract

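    Automated analysis of vaccine postmarketing surveillance narrative reports is important for understanding the progression of rare but severe vaccine adverse events (AEs). This study implemented and evaluated state-of-the-art deep learning algorithms for named entity recognition to extract nervous system disorder-related events from vaccine safety reports. We collected Guillain-Barré syndrome (GBS) related influenza vaccine safety reports from the Vaccine Adverse Event Reporting System (VAERS) from 1990 to 2016. VAERS reports were selected and manually annotated with major entities related to nervous system disorders, including investigation, nervous_AE, other_AE, procedure, social_circumstance, and temporal_expression. A variety of conventional machine learning and deep learning algorithms were then evaluated for the extraction of the above entities. We further pretrained a domain-specific BERT (Bidirectional Encoder Representations from Transformers) using VAERS reports (VAERS BERT) and compared its performance with existing models. Ninety-one VAERS reports were annotated, resulting in 2512 entities. The corpus was made publicly available to promote community efforts on vaccine AE identification. Deep learning-based methods (e.g., bidirectional long short-term memory and BERT models) outperformed conventional machine learning-based methods (i.e., conditional random fields with extensive features). The BioBERT large model achieved the highest exact match F-1 scores on nervous_AE, procedure, social_circumstance, and temporal_expression, while the VAERS BERT large model achieved the highest exact match F-1 scores on investigation and other_AE. An ensemble of these 2 models achieved the highest exact match microaveraged F-1 score at 0.6802 and the second highest lenient match microaveraged F-1 score at 0.8078 among peer models.

    View details: doi 10.1093/jamia/ocab014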
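
For readers unfamiliar with the exact match microaveraged F-1 score reported above, here is a minimal sketch of how such a score is typically computed for named entity recognition. It is not the authors' evaluation script: the `(start, end, type)` span representation and the toy annotations are assumptions made for illustration. Under exact matching, a predicted entity counts as correct only if its boundaries and type both agree with a gold annotation, and microaveraging pools the counts over all reports and entity types before computing precision and recall.

```python
def micro_f1_exact(gold, predicted):
    """Microaveraged exact-match F1.

    gold, predicted: lists (one item per report) of sets of
    (start, end, entity_type) tuples.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        tp += len(g & p)   # spans with identical boundaries and type
        fp += len(p - g)   # predicted spans with no exact gold match
        fn += len(g - p)   # gold spans that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Toy annotations for two reports (hypothetical offsets).
gold = [
    {(0, 5, "nervous_AE"), (10, 18, "procedure")},
    {(3, 9, "other_AE")},
]
predicted = [
    {(0, 5, "nervous_AE"), (10, 17, "procedure")},  # second span is off by one
    {(3, 9, "other_AE")},
]
score = micro_f1_exact(gold, predicted)
print(round(score, 4))
```

A lenient match variant would instead count a prediction as correct when it overlaps a gold span of the same type, which is why lenient scores are usually higher than exact ones.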

Conference Proceedings
  • Chemical-Protein Relation Extraction with Pre-trained Prompt Tuning

    Jianping He, Fang Li, Xinyue Hu, Jianfu Li, Yi Nian, Jingqi Wang, Yang Xiang, Qiang Wei, Hua Xu, Cui Tao;2022 IEEE 10th International Conference on Healthcare Informatics (ICHI)

    Abstract

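    Biomedical relation extraction plays a critical role in the construction of high-quality knowledge graphs and databases, which can further support many downstream applications. As a new paradigm, pre-trained prompt tuning has shown great potential in many natural language processing (NLP) tasks. By inserting a piece of text into the original input, prompting converts NLP tasks into masked language problems, which can be better addressed by pre-trained language models (PLMs). In this study, we applied pre-trained prompt tuning to chemical-protein relation extraction using the BioCreative VI ChemProt dataset. The experimental results showed that pre-trained prompt tuning outperformed the baseline approach in chemical-protein interaction classification. We conclude that prompt tuning can improve the efficiency of PLMs on chemical-protein relation extraction tasks.

    View details: doi:10.1109/ICHI54592.2022.00120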
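
The general idea behind prompt-based relation extraction, turning a classification task into a masked language problem by appending a template, can be sketched as follows. This is only an illustration under stated assumptions: the template wording, the `[MASK]` token string, the label words, and the relation names in `VERBALIZER` are all hypothetical and are not taken from the paper. No language model is called here; `decode` just shows how a word predicted for the mask would be mapped back to a relation class.

```python
MASK = "[MASK]"  # assumed mask token; the actual token depends on the PLM's tokenizer

# Hypothetical verbalizer: label word predicted at the mask -> relation class.
VERBALIZER = {
    "inhibitor": "downregulation",
    "activator": "upregulation",
    "substrate": "substrate",
    "unrelated": "no_relation",
}

def build_prompt(sentence, chemical, protein):
    """Append a cloze-style template so the PLM fills in the relation word."""
    return f"{sentence} In this sentence, {chemical} is a {MASK} of {protein}."

def decode(predicted_word):
    """Map the word the PLM predicts for the mask back to a relation class."""
    return VERBALIZER.get(predicted_word, "no_relation")

sentence = "Aspirin irreversibly blocks COX-1 activity."
prompt = build_prompt(sentence, "Aspirin", "COX-1")
print(prompt)
print(decode("inhibitor"))
```

In a full pipeline, the prompt would be passed to a masked language model, and the scores of the label words at the mask position would determine the predicted class.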
