Chinese Journal of Pharmacovigilance, 2025, Vol. 22, Issue (9): 1040-1044.
DOI: 10.19803/j.1672-8629.20250291

• Safety and Rational Drug Use •

An Automatic Generation Platform for Adverse Drug Reaction Reports from Literature Sources Based on the Generative Artificial Intelligence Large Language Model

LIU Qin1, ZHANG Jiayi1, WU Xuan1,*, SOANES Nigel2

  1. AstraZeneca Investment (China) Ltd., Shanghai 201203, China;
  2. ASTRAZENECA UK LIMITED, Cambridge CB2 0AA, United Kingdom
  • Received: 2025-05-09 Published: 2025-09-22
  • Corresponding author: *WU Xuan, male, master's degree, pharmacovigilance. E-mail: henry.wu1@astrazeneca.com
  • About the author: LIU Qin, female, master's degree, pharmacovigilance.

Abstract: Objective To explore how generative artificial intelligence can automate the generation of adverse drug reaction (ADR) reports from the medical literature, in order to improve the efficiency and accuracy of report processing. Methods Building on a mature, industry-established foundation large language model together with table parsing, optical character recognition (OCR) and related techniques, the basic elements of ADR reports were extracted from about 700 Chinese-language articles downloaded from CNKI and Wanfang Data. The resulting annotated training set (6 925 records) was used to train the model for the ADR report generation scenario, yielding a customized model that can understand academic publications and automatically generate individual case safety reports (ICSRs). Model performance was optimized through multiple rounds of algorithm iteration combined with manual review. The results were evaluated using three algorithm performance metrics (recall, precision and F1 score) and a comparison of report processing time before and after adoption of the platform. Results After four rounds of algorithm iteration, the recall, precision and F1 score of the customized model reached 97.1%, 90.1% and 93.5%, respectively, all meeting the predefined acceptance criteria. The average processing time per report fell from 80 min with the conventional manual workflow to 45 min, a 77.8% gain in efficiency (80/45 ≈ 1.778 times as many reports per unit of time). Conclusion Generative artificial intelligence offers a new tool for the intelligent identification and efficient generation of literature-sourced ADR reports, and a reference for the automation and intelligent transformation of pharmacovigilance. The platform is still limited by its inability to handle complex layouts and abnormal data; the datasets and algorithms need continued optimization, and ethical and regulatory requirements must be strictly followed to ensure data compliance.
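The reported figures can be cross-checked with a few lines of arithmetic. The sketch below is not part of the authors' platform; the function names are hypothetical, and reading the 77.8% efficiency figure as a throughput gain (old time divided by new time, minus one) is an assumption, chosen because it reproduces the stated value from the 80 min and 45 min processing times.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def throughput_gain(time_before: float, time_after: float) -> float:
    """Relative increase in reports handled per unit of time (assumed reading of 'efficiency')."""
    return time_before / time_after - 1


def time_reduction(time_before: float, time_after: float) -> float:
    """Relative decrease in processing time per report."""
    return (time_before - time_after) / time_before


if __name__ == "__main__":
    # Values taken from the abstract: precision 90.1%, recall 97.1%, 80 min -> 45 min.
    print(f"F1 score:        {f1_score(0.901, 0.971):.1%}")
    print(f"Throughput gain: {throughput_gain(80, 45):.1%}")
    print(f"Time reduction:  {time_reduction(80, 45):.2%}")
```

Under these assumptions the F1 score computed from the stated precision and recall agrees with the reported 93.5%, and the same 80 min to 45 min change corresponds to a 43.75% reduction in time per report.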

Key words: Adverse Drug Reactions, Individual Case Safety Reports, Generative Artificial Intelligence, Large Language Models, Machine Learning
