The article below was written by Liang Peng, AI expert and Deputy Minister of Evaluation Department I of CMDE (Center for Medical Device Evaluation), and Lei Sun, CMDE’s director, and was translated by China Med Device, LLC.
Also, see our technical analysis on AI-aided Software Guideline. The article was published on BioWorld, a Hong Kong-based biotech magazine.
Since the “Key Points of Deep Learning Aided Decision-Making Medical Device Software” were issued in July 2019, more than twenty Class III AI software products have been approved.
AI medical devices can be divided into AI independent software (software as a medical device, SaMD) and AI software components (software in a medical device, SiMD). Because the quality control principles of the software life cycle process are the same for both, the regulatory requirements are essentially the same.
Countries differ in their national conditions and in the scope, models, resources, and conditions of medical device supervision. International regulatory experience can therefore serve as a reference, but it cannot simply be copied.
For example, the US FDA is developing a “predetermined change control plan” to control updates of AI independent software, to be extended to AI software components once the approach matures. The core idea is to remove the original “algorithm lock” requirement so that manufacturers can perform software updates under an FDA-approved predetermined change control plan without re-registration.
Because a predetermined change control plan may cover major software updates, which under China’s current laws and regulations require an application for change registration, the plan conflicts with current Chinese law and is difficult to implement in full.
As another example, the US FDA is piloting a “software pre-certification” program that tries to shift the supervision of independent software from a product-based model to one based on the manufacturer’s Culture of Quality and Organizational Excellence (CQOE). The program also applies to AI independent software and will later be extended to software components.
Although “software pre-certification” has reference value for optimizing the product marketing process and in other respects, from the perspective of product marketing it resembles China’s discontinued inspection-exempt product program and is not suited to current national conditions.
Whether independent software is managed as a medical device is usually judged based on its intended use and core functions, and the category is mainly judged based on the risk level.
The risk level can be refined along two dimensions: intended use and algorithm maturity. Intended use is divided into assisted decision-making and non-assisted decision-making. The former provides medical decision-making advice and the latter provides medical reference information, so the former carries higher risk. Algorithm maturity is divided into mature algorithms and new algorithms. A mature algorithm is one whose safety and effectiveness have been fully proven in medical applications. A new algorithm is one that has not been marketed, or whose safety and effectiveness have not been fully proven in medical applications, so it carries more potential risk.
If a new algorithm is used for assisted decision-making, the product is managed as a Class III medical device; if it is used for non-assisted decision-making, it is managed as a Class II medical device. For a mature algorithm, the management category remains unchanged regardless of intended use, to ensure continuity of supervision.
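The classification rule described above can be sketched as a small function. This is an illustrative reading of the paragraph, not an official decision procedure: the names `IntendedUse`, `Maturity`, and `management_class` are hypothetical, and the existing category of a mature algorithm is assumed to be supplied by the caller.

```python
from enum import Enum
from typing import Optional


class IntendedUse(Enum):
    ASSISTED_DECISION = "assisted decision-making"
    NON_ASSISTED_DECISION = "non-assisted decision-making"


class Maturity(Enum):
    MATURE = "mature algorithm"
    NEW = "new algorithm"


def management_class(use: IntendedUse, maturity: Maturity,
                     existing_class: Optional[str] = None) -> str:
    """Return the management category implied by the rule in the text.

    Hypothetical helper: `existing_class` is the category a mature
    algorithm already holds, which the rule leaves unchanged.
    """
    if maturity is Maturity.NEW:
        # New algorithm: Class III for assisted decision-making, else Class II.
        return "III" if use is IntendedUse.ASSISTED_DECISION else "II"
    if existing_class is None:
        raise ValueError("a mature algorithm needs its existing category")
    # Mature algorithm: category is unchanged regardless of intended use.
    return existing_class


print(management_class(IntendedUse.ASSISTED_DECISION, Maturity.NEW))
print(management_class(IntendedUse.NON_ASSISTED_DECISION, Maturity.NEW))
print(management_class(IntendedUse.ASSISTED_DECISION, Maturity.MATURE, "II"))
```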
The technical review of artificial intelligence medical devices must consider not only the requirements of the guiding principles for artificial intelligence medical devices, but also those of related guiding principles in areas such as digital medical care, including but not limited to medical device software, medical device network security, medical device human factors design, mobile medical devices, generic naming of medical devices, clinical evaluation of medical devices, and medical software.
The technical review mainly combines algorithm characteristics and product characteristics, comprehensively weighs risks and benefits, and systematically evaluates safety and effectiveness. The characteristics of the algorithms are different, and the evaluation focus is also different.
For example, a black-box algorithm is less interpretable than a white-box algorithm, so attention must be paid to improving its interpretability. Supervised learning places higher demands on data labeling than unsupervised learning, so attention must be paid to labeling quality. A data-based algorithm requires more training data than a model-based algorithm, so attention must be paid to data quality control.
Products differ in intended use and usage scenarios, so even when the same algorithm is used, product characteristics differ and the evaluation focus differs accordingly. Risk considerations center on algorithmic risks such as overfitting and underfitting, and on medical decision-making risks such as false negatives and false positives. For imported products, the risks arising from differences between China and other countries must also be considered.
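As a rough illustration of how the medical decision-making risks named above are usually quantified, the sketch below computes false negative and false positive rates from binary reference labels and algorithm outputs. The function name and the sample data are made up for the example and do not come from the article.

```python
from typing import Dict, Sequence


def decision_risk_rates(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute false negative and false positive rates for binary labels.

    1 means disease present, 0 means disease absent (an assumed encoding).
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1 and pred == 0:
            fn += 1  # missed disease: false negative
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fp += 1  # false alarm: false positive
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative case")
    return {
        "false_negative_rate": fn / (tp + fn),
        "false_positive_rate": fp / (tn + fp),
    }


# Made-up example: 4 diseased and 4 healthy cases.
reference = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 1]
print(decision_risk_rates(reference, predicted))
```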
The systematic evaluation combines the results of algorithm training, algorithm performance evaluation, clinical evaluation, and other activities to define and restrict the product’s scope of application, usage scenarios, and core functions. For products that were developed earlier and do not meet the requirements, gap analysis and remedial measures are allowed.
In terms of algorithm update control, algorithm updates are divided into algorithm-driven updates and data-driven updates, with different requirements for each. An algorithm-driven update is a substantial change to, or retraining of, the algorithm; it is a major software update and requires an application for change registration. A data-driven update is one that results only from an increase in the amount of training data. If the algorithm performance evaluation results differ statistically from those of the previous registration, it is a major software update and requires an application for change registration. Otherwise it is a minor software update that requires no application for change registration: it is controlled through the quality management system, and the corresponding registration application materials are submitted at the next change registration. In other words, “algorithm locking” is not required.
At the same time, algorithm updates are controlled through the software version naming rules, which need to cover both algorithm-driven and data-driven updates and list the typical situations that constitute major algorithm updates. But implemented earlier.
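The update rule above can be sketched in code. The article does not say which statistical test establishes a "statistical difference", so this sketch assumes a two-sided two-proportion z-test on accuracy with a 0.05 significance level. Both choices, and the function names, are assumptions made only for illustration.

```python
import math
from typing import Tuple

Counts = Tuple[int, int]  # (correctly classified cases, total cases)


def two_sided_p_value(registered: Counts, updated: Counts) -> float:
    """Two-proportion z-test p-value (an assumed choice of test)."""
    (c1, n1), (c2, n2) = registered, updated
    pooled = (c1 + c2) / (n1 + n2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if std_err == 0:
        return 1.0  # identical, degenerate proportions: no detectable difference
    z = (c1 / n1 - c2 / n2) / std_err
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def classify_update(kind: str, registered: Counts, updated: Counts,
                    alpha: float = 0.05) -> str:
    """Return 'major' (change registration needed) or 'minor' (QMS control)."""
    if kind == "algorithm-driven":
        return "major"  # substantial change or retraining is always major
    if kind == "data-driven":
        p_value = two_sided_p_value(registered, updated)
        return "major" if p_value < alpha else "minor"
    raise ValueError(f"unknown update kind: {kind}")


print(classify_update("data-driven", (900, 1000), (905, 1000)))
print(classify_update("data-driven", (900, 1000), (960, 1000)))
```

In practice the evaluation metric, the test, and the threshold would be whatever the manufacturer committed to in its registration materials; the z-test here only makes the branching logic concrete.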
To ensure the generalization ability of the algorithm, the training data should reflect the epidemiological characteristics of the target disease and come, as far as possible, from multiple representative clinical institutions in different regions and at different levels, and from representative acquisition equipment of multiple types and with multiple parameter settings. This improves the adequacy and diversity of the data and secures the generalization ability of the algorithm at the source. Algorithm training needs to provide evidence such as a curve of training data volume against evaluation metrics, and the generalization ability of the algorithm should be monitored continuously.
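A curve of training data volume against an evaluation metric can be produced along the following lines. This is a toy sketch on synthetic one-dimensional data with a simple threshold classifier standing in for a real model. The data generator, the classifier, and all names are assumptions made for illustration.

```python
import random
from statistics import mean
from typing import List, Tuple

Sample = Tuple[float, int]  # (feature value, label)


def make_samples(n: int, rng: random.Random) -> List[Sample]:
    """Synthetic data: label 1 is centred at +1.0, label 0 at -1.0."""
    samples = []
    for _ in range(n):
        label = rng.randint(0, 1)
        samples.append((rng.gauss(1.0 if label else -1.0, 1.0), label))
    return samples


def fit_threshold(train: List[Sample]) -> float:
    """Toy 'training': threshold halfway between the two class means."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    if not pos or not neg:
        return mean(x for x, _ in train)  # fallback if one class is absent
    return (mean(pos) + mean(neg)) / 2


def accuracy(threshold: float, data: List[Sample]) -> float:
    return mean(1.0 if (x > threshold) == (y == 1) else 0.0 for x, y in data)


def learning_curve(pool: List[Sample], held_out: List[Sample],
                   sizes: List[int]) -> List[Tuple[int, float]]:
    """Train on growing prefixes of the pool, score on a fixed held-out set."""
    return [(n, accuracy(fit_threshold(pool[:n]), held_out)) for n in sizes]


rng = random.Random(0)
train_pool = make_samples(2000, rng)
held_out = make_samples(1000, rng)
curve = learning_curve(train_pool, held_out, [10, 50, 200, 1000, 2000])
for size, acc in curve:
    print(f"{size:5d} training samples -> accuracy {acc:.3f}")
```

A curve that flattens as data volume grows suggests that more data of the same kind adds little, while a curve that is still rising suggests the training set is not yet adequate.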
The test set used for algorithm verification must be different from the training set so that the generalization ability of the algorithm can be evaluated objectively; stress testing and adversarial testing can be added to evaluate it in depth. For algorithm validation, the clinical evaluation data set must be different from the training data set, with as many institutions and as wide a geographical distribution as possible, so that generalization ability is evaluated comprehensively. After the product is marketed, research on the algorithm’s generalization ability in the real world also needs to continue.
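One simple way to check that an evaluation set is separate from the training set is to compare case identifiers, as in the sketch below. It assumes that each record carries a patient identifier and an institution name. The `Record` type and field names are hypothetical.

```python
from typing import Dict, Iterable, List, NamedTuple, Union


class Record(NamedTuple):
    patient_id: str   # hypothetical field
    institution: str  # hypothetical field


def check_independence(train: Iterable[Record],
                       test: Iterable[Record]) -> Dict[str, Union[List[str], int]]:
    """Report patients present in both sets and the institutions in the test set."""
    test = list(test)
    train_patients = {r.patient_id for r in train}
    leaked = sorted(train_patients & {r.patient_id for r in test})
    return {
        "leaked_patients": leaked,  # should be empty
        "test_institution_count": len({r.institution for r in test}),
    }


training_set = [Record("P001", "Hospital A"), Record("P002", "Hospital A"),
                Record("P003", "Hospital B")]
evaluation_set = [Record("P003", "Hospital B"), Record("P104", "Hospital C"),
                  Record("P105", "Hospital D")]
print(check_independence(training_set, evaluation_set))
```

Here the check flags `P003`, which appears in both sets and would make the evaluation overly optimistic.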
To improve the interpretability of a black-box algorithm, the algorithm design needs to analyze the factors that affect the algorithm’s performance, identify the main factors and their degree of influence, and, based on the results of that analysis, clarify the restrictions on product use and provide warnings and prompts in the manual. At the same time, the quality control requirements of the algorithm development life cycle process should be clarified to improve the transparency of the algorithm. In addition, it is recommended to establish links with existing medical knowledge to further improve algorithm interpretability.
The system verification of artificial intelligence medical devices needs to be based on the quality management standards for medical device production, the independent software appendix (applied by reference to software components, and including network security), and its on-site inspection guidelines, and can refer to the relevant requirements of the guidelines for artificial intelligence medical devices.
Taking supervised deep learning as an example, the guidelines clarify the quality control requirements for the life cycle process of artificial intelligence medical devices, covering requirements analysis, data collection, algorithm design, verification and validation, and update control.
Data quality control is very important for ensuring product quality, especially for data-based algorithms. Efforts have therefore been made to standardize data quality control requirements, clarifying and refining the requirements for data collection, data organization, data labeling, data set construction, and so on, covering personnel, process, results, and so on.
Algorithm update quality control is a focus of system verification, especially for data-driven updates, because minor software updates are controlled mainly through the quality management system. Checking whether an algorithm update matches the software version naming rules will be the basic method for verifying algorithm update control.
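Checking an update against the version naming rules might look like the sketch below. The article does not specify a naming scheme, so a hypothetical `MAJOR.MINOR` scheme is assumed: a major update increments the first field and resets the second, while a minor update increments only the second.

```python
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int]:
    """Parse a hypothetical 'MAJOR.MINOR' version string."""
    major, minor = version.split(".")
    return int(major), int(minor)


def version_matches_update(old: str, new: str, update_class: str) -> bool:
    """Check that the version change is consistent with the update class."""
    old_major, old_minor = parse_version(old)
    new_major, new_minor = parse_version(new)
    if update_class == "major":
        # Assumed rule: major updates bump MAJOR and reset MINOR.
        return new_major == old_major + 1 and new_minor == 0
    if update_class == "minor":
        # Assumed rule: minor updates keep MAJOR and increase MINOR.
        return new_major == old_major and new_minor > old_minor
    raise ValueError(f"unknown update class: {update_class}")


print(version_matches_update("2.3", "3.0", "major"))  # consistent
print(version_matches_update("2.3", "2.4", "major"))  # inconsistent: flag it
print(version_matches_update("2.3", "2.4", "minor"))  # consistent
```

A mismatch, such as a major update released under a minor version change, is the kind of finding an inspection would follow up on.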
As an important method of algorithm quality assurance, algorithm traceability analysis is also the focus of system verification. It is necessary to trace the relationship between algorithm requirements, algorithm design, algorithm implementation (source code), algorithm testing, and algorithm risk management. Algorithm updates also require algorithm traceability analysis.
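The traceability relationship described above can be pictured as a matrix keyed by algorithm requirement, with a check for missing links. The structure, the identifiers, and the file names in this sketch are hypothetical and serve only to show the idea.

```python
from typing import Dict, List

# Link types every algorithm requirement is expected to trace to (per the text).
REQUIRED_LINKS = ("design", "implementation", "test", "risk")

TraceMatrix = Dict[str, Dict[str, List[str]]]


def traceability_gaps(matrix: TraceMatrix) -> Dict[str, List[str]]:
    """Return, per requirement, the link types that have no artifact."""
    gaps = {}
    for requirement_id, links in matrix.items():
        missing = [kind for kind in REQUIRED_LINKS if not links.get(kind)]
        if missing:
            gaps[requirement_id] = missing
    return gaps


# Hypothetical identifiers for illustration only.
matrix: TraceMatrix = {
    "REQ-001": {
        "design": ["DES-010"],
        "implementation": ["segmentation/model.py"],
        "test": ["TST-101"],
        "risk": ["RISK-007"],
    },
    "REQ-002": {
        "design": ["DES-011"],
        "implementation": ["detection/infer.py"],
        "test": [],  # no test case linked yet
    },
}
print(traceability_gaps(matrix))
```

Running the same check after each algorithm update shows whether new or changed requirements have been carried through to design, source code, testing, and risk management.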