Chinese Journal of Pharmacovigilance ›› 2023, Vol. 20 ›› Issue (6): 639-645.
DOI: 10.19803/j.1672-8629.20220645

• Special column: Quantitative statistical methods for post-marketing adverse drug reaction monitoring •

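Machine learning method in the detection of adverse drug reaction signals of brigatinib based on FAERS database

CHEN Xiao, GUO Xiaojing, XU Jinfang, WEI Lianhui, CHEN Chenxin, LIANG Jizhou, ZHENG Yi, YE Xiaofei*

  1. Department of Health Statistics, Naval Medical University, Shanghai 200433, China
  • Received: 2022-11-09 Online: 2023-06-15 Published: 2023-06-15
  • Corresponding author: * YE Xiaofei, male, PhD, associate professor, pharmacoepidemiology. E-mail: yexiaofei@smmu.edu.cn
  • About the author: CHEN Xiao, male, master's student, pharmacoepidemiology.
  • Funding:
    National Natural Science Foundation of China (82073671); Shanghai Three-Year Action Plan for Public Health System Construction, Key Discipline Construction Project on Big Data and Artificial Intelligence Applications (GWV-10.1-XK05); Shanghai Municipal Commission of Health and Family Planning Outstanding Young Medical Talent Training Program (2018YQ47); Shanghai Public Health Academic Leader Program (GWV-10.2-XD22); Shanghai Public Health Outstanding Young Talent Program (GWV-10.2-YQ33); Research Project of the Professional Committee on Clinical Drug Evaluation Research, Chinese Pharmaceutical Association (CPA-CDCER-2021-001); Military Dual-Key Construction Project-03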

Abstract: Objective To evaluate the performance of machine learning algorithms in detecting adverse drug reaction (ADR) signals for brigatinib. Methods Data on brigatinib ADR signals collected in the U.S. Food and Drug Administration (FDA) Adverse Event Reporting System (FAERS) from April 1, 2017 to March 31, 2022 were used. An input dataset was first constructed for the study drug, comprising known ADRs listed on the drug label and unknown ADRs. On the known ADRs, four machine learning algorithms were trained: random forest (RF), extreme gradient boosting (XGBoost), logistic regression, and k-nearest neighbor (kNN). Their performance, measured by the area under the curve (AUC), was compared with that of traditional disproportionality analysis (DPA) based on the reporting odds ratio (ROR) and the information component (IC). Results The kNN algorithm had the highest AUC, with a mean of 0.875, followed by logistic regression (0.852), XGBoost (0.722), RF (0.662), and DPA (0.548). In the unknown ADR dataset, the machine learning model built with the kNN algorithm detected 6 additional signals (15.8%), whereas the traditional DPA method detected 4 additional signals (10.5%). Conclusion The machine learning model built with the kNN algorithm performed better than the traditional DPA method in detecting ADR signals.

Key words: disproportionality analysis, brigatinib, signal detection, adverse drug reaction, machine learning algorithm, FAERS
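
For readers unfamiliar with the quantities named in the abstract, the following is a minimal Python sketch of how ROR, IC, and AUC are commonly computed. It is not the authors' code. The 2x2 table layout, the shrinkage form of the IC point estimate, the signal criteria (at least 3 reports and a lower 95% confidence limit of the ROR above 1), and all function names are assumptions made for illustration; the abstract does not state which thresholds or implementations were used.

```python
import math

# Assumed 2x2 table layout for one drug-event pair:
#   a = reports with the study drug and the event of interest
#   b = reports with the study drug and any other event
#   c = reports with other drugs and the event of interest
#   d = reports with other drugs and any other event


def ror_with_ci(a, b, c, d, z=1.96):
    """Reporting odds ratio with a Wald 95% confidence interval.

    All four counts must be non-zero, otherwise the ratio is undefined.
    """
    ror = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(ROR)
    lower = math.exp(math.log(ror) - z * se)
    upper = math.exp(math.log(ror) + z * se)
    return ror, lower, upper


def information_component(a, b, c, d):
    """IC point estimate with +0.5 shrinkage (one commonly used form)."""
    n = a + b + c + d
    expected = (a + b) * (a + c) / n  # expected count if drug and event were independent
    return math.log2((a + 0.5) / (expected + 0.5))


def is_ror_signal(a, b, c, d, min_reports=3):
    """Assumed criterion: enough reports and lower 95% limit of ROR above 1."""
    _, lower, _ = ror_with_ci(a, b, c, d)
    return a >= min_reports and lower > 1


def auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Made-up counts, for illustration only (not taken from FAERS).
    a, b, c, d = 20, 980, 200, 98800
    print("ROR and 95% CI:", ror_with_ci(a, b, c, d))
    print("IC:", information_component(a, b, c, d))
    print("ROR signal:", is_ror_signal(a, b, c, d))
    # Toy scores against known-ADR labels (1 = listed on the label).
    print("AUC:", auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

The pairwise AUC above is quadratic in the number of drug-event pairs, which is acceptable for a few hundred pairs; for larger sets a rank-based implementation in a statistics library would be the usual choice.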

CLC number: