Software in Medical Devices, by MD101 Consulting



Team-NB questionnaire on artificial intelligence in medical devices

Team-NB published a questionnaire on artificial intelligence in medical devices in November 2024. This questionnaire is simply a (very long) list of questions that a Notified Body may take one by one when reviewing the technical file of a medical device containing AI.

We’ve seen in a previous post how the “monster” FDA guidance on AI tells manufacturers how they should design, validate and monitor such medical devices. The FDA’s approach is: “You should do it this way”.
The Team-NB AI checklist takes the opposite approach. It doesn’t tell manufacturers how to do things; it checks whether things were done. The Team-NB approach is thus: “Show me how you did it”.

Both approaches are equivalent; in the end, the FDA will also review what the manufacturer did. But the FDA’s approach is more didactic, as it sets a common ground on how to do things.
Following the FDA guidance on AI in MD goes some way toward answering the Team-NB checklist!

Validation or Validation?

As a preamble, we'd like to warn the reader about the word "validation".
The checklist uses the word with the meaning of AI model validation, performed after training.

E.g., with this question:
6.1.4 Input for risk management and clinical evaluation
14. Does the manufacturer assess the risks related to splitting the data into training, validation and test data?

These words should be understood as follows:

  • Training: part of AI model design (no confusion here),
  • Validation: AI model validation, also part of AI model design,
  • Test: the software testing phase, part of device verification.


Thus, don't misuse the word validation. To keep it unambiguous, it should always come with context:

  • AI validation, part of AI model design,
  • Device validation, part of classical V&V.
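
To make the vocabulary concrete, here is a minimal sketch of such a three-way split, assuming a Python/scikit-learn workflow; the 70/15/15 proportions and the placeholder data are our own illustration, not a checklist requirement:

```python
# Minimal sketch of the three data sets named in the checklist
# (assumed 70/15/15 proportions, placeholder data, for illustration only).
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(seed=42)
X = rng.normal(size=(1000, 10))    # placeholder features
y = rng.integers(0, 2, size=1000)  # placeholder binary labels

# Split off the test set first: it is used later, in device verification.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.15, stratify=y, random_state=42)

# Split the remainder into training and (AI) validation sets, both used
# inside AI model design (fit on train, tune and select on validation).
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.15 / 0.85, stratify=y_rest, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # approx. 700, 150, 150
```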

Note: this question could be misleading:
6.4.4 Documentation
2. Can the manufacturer reproduce the test and validation results?
As this question references ISO 13485 7.3.6, we know that we are in verification, not device validation.

Overview

The Team-NB checklist is quite a long list of questions. It covers the medical device lifecycle with:

  • General Requirements: 7 questions
  • Intended use and stakeholder requirements: 33 questions
  • Software requirements: 24 questions
  • Data management: 38 questions
  • Model development: 30 questions
  • Product Development: 35 questions (including 9 in clinical evaluation)
  • Product release: 4 questions
  • Requirements for the post development phase: 18 questions

A total of 189 questions!
Questions that come on top of the other long checklists covering the rest of the Technical File. This is a typical Notified Body approach. Good luck, manufacturers, if you want to reduce the burden by marking some questions as not applicable.

Regulations, guides and standards

Since there is no harmonized standard dedicated to the AI lifecycle, Team-NB cannot do otherwise than quote IEC 62304, ISO 14971, and the like.

To be more precise, here is the list of regulations, guides and standards referenced by the checklist:

Regulations:

  • Regulation (EU) 2017/745 (MDR)
  • Regulation (EU) 2017/746 (IVDR)
  • Regulation (EU) 2016/679 (GDPR)
  • Regulation (EU) 2021/2226 (e-IFU)

MDR and IVDR are frequently quoted (no surprise). GDPR is quoted once, on patient health data privacy. e-IFU is quoted once, on the availability of the latest version.
Guidances:

  • MDCG 2019-16
  • MDCG 2020-1
  • MEDDEV 2.7/1 revision 4

These guides are sparsely quoted, in relevant questions about cybersecurity and clinical evaluation.
Medical device standards:

  • EN ISO 13485
  • EN ISO 14971
  • EN ISO 14155
  • EN ISO 20417
  • EN 62366-1

These standards are, without surprise, frequently quoted in the checklist, except ISO 14155, quoted a few times on questions related to good clinical practice, and ISO 20417, quoted once on accompanying documents.
Medical device standards specific to software:

  • EN 60601-1
  • EN 62304
  • EN 82304-1
  • EN/IEC 80001-1
  • IEC 81001-5-1

Not surprising: IEC 62304 and IEC 82304-1 are quoted a lot.
IEC 60601-1 (of which only clause 14 is specific to software) is quoted once, on its clause 4.4 about expected service life. Why this clause? Because it is the only text that defines the expected service life (try searching for lifetime or shelf life in the regulations!).
The cybersecurity standards are quoted in a few questions on AI security risks.
Medical device standards specific to software and AI:

  • BS/AAMI 34971

Here we feel a bit alone. The only medical device standard on AI is this AAMI 34971, on AI risks.
This is a good start. We’ll see that other standards are on the way.
Standards specific to AI, but not specific to medical devices:

  • ISO/IEC 23894
  • ISO/IEC 5259-2
  • ISO/IEC 5259-3
  • ISO/IEC 5259-4
  • ISO/IEC 5338
  • ISO/IEC 5339
  • ISO/IEC TR 24027
  • ISO/IEC TR 24028
  • ISO/IEC TS 4213

We see here the biggest limitation of this checklist: in the absence of medical device standards, it has to summon standards coming from other industries.
Even if this looks like a good idea at first, it leaves an impression of unfinished work. See the discussion below.

General software standards:

  • ISO/IEC 25010
  • ISO/IEC/IEEE 12207

Why quote these general software standards?
In particular, why quote IEEE 12207 only once, on its clause about software architecture, and not on its other clauses? Each time IEC 62304 or IEC 82304-1 is quoted, IEEE 12207 could be quoted on its equivalent clause.
Likewise, why quote only ISO 25010 from the SQuaRE series, and not the other SQuaRE standards?
This looks like cherry-picking some good practices from other industries. Why these ones and not others? Once again, this leaves an impression of unfinished work.

If you don't know the AI standards quoted above, here are their titles, with some comments:

ISO/IEC 23894, Information technology — Artificial intelligence — Guidance on risk management
This standard appears twice, notably in this question:
8. Has the manufacturer identified and evaluated the gaps between ISO/IEC 23894 and EN ISO 14971 in his risk management documentation considering the device under assessment?
Is it really necessary? ISO 23894 isn't an appropriate standard for safety: its definition of risk is the one from ISO 31000, "effect of uncertainty on objectives".
We'd better rely on ISO 14971 and AAMI 34971, and ignore ISO 23894.

ISO/IEC 5259-2, Artificial intelligence — Data quality for analytics and machine learning (ML) — Part 2: Data quality measures
This standard appears only once, as a supplementary reference. It is an interesting one for measuring data quality and data consistency.

ISO/IEC 5259-3, Artificial intelligence — Data quality for analytics and machine learning (ML) — Part 3: Data quality management requirements and guidelines
This standard appears 7 times as a supplementary reference. It is an interesting one on data quality management and setting up a data management process.

ISO/IEC 5259-4, Artificial intelligence — Data quality for analytics and machine learning (ML) — Part 4: Data quality process framework
This standard appears 4 times as a supplementary reference. It is an interesting one as well, on data quality management and setting up a data management process.

ISO/IEC 5338, Information technology — Artificial intelligence — AI system life cycle processes
This standard appears 6 times as a supplementary reference. ISO/IEC 5338 supplements IEC 12207 on artificial intelligence. Maybe you know IEC 12207: it is referenced by IEC 62304 as a supplemental source of information.

ISO/IEC 5339, Information technology — Artificial intelligence — Guidance for AI applications
This standard appears 3 times as a supplementary reference. ISO/IEC 5339 supplements IEC 15288 (itself referenced by IEC 12207) on artificial intelligence.

ISO/IEC TR 24027, Information technology — Artificial intelligence (AI) — Bias in AI systems and AI aided decision making
This standard appears twice, notably in this question:
5. Does the manufacturer examine the data sets that predicted particularly well and those that predicted particularly poorly?
This technical report is an interesting one for its methods to assess bias and fairness.
Note: another standard about bias is quoted in the references at the bottom of the document: ISO/IEC TS 12791, on the treatment of unwanted bias in classification and regression machine learning tasks.

ISO/IEC TR 24028, Information technology — Artificial intelligence — Overview of trustworthiness in artificial intelligence
This standard appears three times, notably for its clause 8.8.2.23 in this question:
4. Does the manufacturer identify means to reduce the risk of training procedure related effects such as overfitting?
This technical report contains lots of information. Maybe one to read, but not to follow by the book.

ISO/IEC TS 4213, Information technology — Artificial intelligence — Assessment of machine learning classification performance
This standard appears several times, on data collection and AI model evaluation.
To read when the AI model is a classifier; a minimal sketch of typical classification metrics follows.
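
Since ISO/IEC TS 4213 is about classification performance, here is a sketch of the kind of metrics it covers, assuming a scikit-learn workflow with placeholder predictions; this is our own illustration, not the standard's normative protocol:

```python
# Minimal sketch of classification performance reporting (illustrative
# metrics with placeholder data, not the normative protocol of TS 4213).
from sklearn.metrics import accuracy_score, confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]  # placeholder ground truth
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]  # placeholder model outputs

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)  # missed diagnoses drive this one down
specificity = tn / (tn + fp)  # false alarms drive this one down

print(f"accuracy={accuracy_score(y_true, y_pred):.2f}, "
      f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```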


That's 11 new standards whose scope isn't medical devices! Don't buy them too quickly.
First, they're mostly quoted as supplemental sources of information. Second, some MD-specific standards on AI are being cooked and should be published soon. We'll see that in a future post.

The Team-NB policy was probably to reference existing standards outside the MD scope, to justify the questions and to point the reader towards a way to build answers.

Other remarks

Questions without reference

23 questions are left without any reference, or with a supplementary reference only. This raises the question: are these questions legitimate?
Obviously yes, when you read them. The absence of reference just reminds us of the pressing need for standards or guidance on AI in MD.

IEC 62304

Our good old IEC 62304 is quoted 37 times. That's a bit surprising for a standard written before the emergence of modern AI.
In the absence of MD-specific standards, we still have to rely on IEC 62304. Needless to say, IEC 62304 isn't adapted to AI: its requirements need extensive interpretation when applied to the AI model lifecycle.

Performative prediction

The following question addresses a phenomenon involving software users. Its note is very interesting:
6.1.4 Input for risk management and clinical evaluation
16. Does the manufacturer assess the risks by making predictions that change the predicted outcomes themselves, if applicable?
Note: This phenomenon applies, when the model switches from being an observer to an actor. It is referred to as "performative prediction" (...).

Such a concern isn't specific to AI. A classical algorithm could produce the same effect with unaware users, who may see the software as an "oracle" they blindly trust.
As we've seen in the comments on the FDA guidance on MD with AI, such concerns aren't specific to AI, but they are exacerbated by AI.
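
To picture the phenomenon, here is a toy simulation, entirely of our own making: an idealized model flags high-risk patients, clinicians act on the flag, and the intervention changes the very outcomes the model predicted:

```python
# Toy simulation of performative prediction (our own illustration, not from
# the Team-NB document): acting on a prediction changes the predicted outcome.
import numpy as np

rng = np.random.default_rng(seed=0)
n = 10_000
true_risk = rng.uniform(0.0, 1.0, size=n)  # latent probability of an adverse event
predicted_high = true_risk > 0.7           # an idealized model flags high-risk patients

# Clinicians intervene on flagged patients, halving their risk.
effective_risk = np.where(predicted_high, true_risk * 0.5, true_risk)
outcome = rng.uniform(size=n) < effective_risk

# The observed event rate among flagged patients is now far below the model's
# prediction: the model switched from observer to actor.
print(f"predicted event rate (flagged): {true_risk[predicted_high].mean():.2f}")
print(f"observed event rate (flagged):  {outcome[predicted_high].mean():.2f}")
```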

Data representative of target population

The following question addresses the representativeness of data:
6.3.1 Collection of the training, validation and test data sets
6. Does the manufacturer justify where it collects e.g. training, test and validation data and why it is representative of the target population? Where appropriate, has the manufacturer compared these with data from the Federal Statistical Office, scientific publications and registries?

Such a concern is actually specific to AI during training and AI validation. However, data representativeness is also a concern in device validation for classical algorithms. Like the previous question, this phenomenon is pointed out by the FDA guidance. Such concerns aren't totally specific to AI, but they are exacerbated by AI.
For those who don't know what the Federal Statistical Office is: you'll find this institution in Germany. We see here the parentage of this checklist in the work of German Notified Bodies. The working group overlooked this reference to a local institution when the Team-NB document was written. You can disregard it and replace it with another relevant local institution.
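
As an illustration of such a comparison, here is a sketch of a goodness-of-fit check of a data set's age distribution against reference population shares; the numbers are made up, and in practice the reference would come from a statistics institute, scientific publications or registries:

```python
# Sketch of a representativeness check: compare the age distribution of the
# collected data set with reference population shares (made-up numbers here).
import numpy as np
from scipy.stats import chisquare

age_bins = ["18-39", "40-59", "60-79", "80+"]
dataset_counts = np.array([210, 340, 380, 70])         # patients per bin in the data set
population_share = np.array([0.30, 0.33, 0.28, 0.09])  # reference distribution

expected = population_share * dataset_counts.sum()
stat, p_value = chisquare(f_obs=dataset_counts, f_exp=expected)

# A small p-value suggests the data set deviates from the target population;
# the deviation should then be justified in the technical file.
print(f"chi2={stat:.1f}, p={p_value:.4f}")
```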

In-field self-learning AI models

In section 4, Team-NB reminds us that in-field self-learning AI models aren't "certifiable" unless the manufacturer takes measures to ensure the safe operation of the device within the scope of the validation described in the technical documentation.
Needless to say, nobody has ever tried to certify such an AI model (at the time this blog post is written). It is already challenging to have a Predetermined Change Control Plan (PCCP) accepted by the FDA; an in-field self-learning AI model is one step beyond in complexity.

QMS

Section 4 also mentions that standard operating procedures (SOPs) are affected by the AI lifecycle. The FDA guidance mentions this too, in its section named QMS.
The Team-NB document explicitly mentions the Data Management Process, but with "Customer Property" in parentheses. However, a data management process isn't limited to customer property. Implementing a data management process can be a way to answer the 38 questions on data management found in the checklist.
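
As a sketch of what such a process can trace, here is a hypothetical data set record; the field names are our own illustration, not taken from the checklist or from the ISO/IEC 5259 series:

```python
# Hypothetical data set provenance record, the kind of traceability a data
# management process can maintain (field names are our own illustration).
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    name: str
    version: str
    source: str                 # collection site, device, registry...
    collection_period: str
    intended_split: str         # "training", "validation" or "test"
    inclusion_criteria: list[str] = field(default_factory=list)
    known_biases: list[str] = field(default_factory=list)
    approved_by: str = ""       # data review sign-off


record = DatasetRecord(
    name="chest-xray-cohort",
    version="1.2.0",
    source="Hospital A, PACS export",
    collection_period="2022-01 to 2023-06",
    intended_split="training",
    inclusion_criteria=["adults 18+", "PA view"],
    known_biases=["single-site collection"],
)
```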

Unfinished work

Yes, this work is unfinished. Not in its questions, but in its references. As we said, lots of standards outside the MD scope were summoned, in the absence of standards on AI in MD. The writers of this checklist couldn't do otherwise.
Likewise, no reference is made to the AI Act: a regulation too new to have been fully digested by Notified Bodies.
This is exactly what the working group wrote in V1.1 of this document, stating that it shall be updated to take AI Act requirements into account, once European standards are available.

On the QMS side, as of today, we only have ISO 42001 to implement an AI-aware QMS. But ISO 42001 is not considered adequate for presumption of conformity under the AI Act and cannot be considered state of the art. Thus, we have to wait for new AI standards specific to MD, both product standards and QMS standards.

That leaves the reader in an uncomfortable situation: should I buy and read these standards? Should I wait for MD-specific standards?
We cannot answer this question right now. Some of these new standards may or may not reference the ones currently quoted in this document.
But if a manufacturer is requested by a Notified Body to fill in this checklist, they will be left with no choice but to follow at least the clauses referenced in these standards, unless they have strong alternative references.

This situation is the result of the AI technological breakthrough in MD, which left regulators, certification bodies, and standardization committees behind. Standards take time to write. In the absence of standards, Team-NB is just trying to fill the gap with this checklist.

Note: ISO 42001, despite lacking requirements on design and being built to manage risks according to ISO 31000, is still a good start to define new AI-aware SOPs. A white paper on the impact of ISO 42001 on an ISO 13485-compliant QMS is available here.

Conclusion

This checklist is quite long, with its 189 questions. It will also cost manufacturers a lot of man-hours to fill in. Imagine you need half an hour per question: that makes about 95 hours, or approx. 12 days (at 8 hours per day). This checklist isn't lowering the already exaggerated costs of CE marking.

One solution would be to add it to the reference documents of the technical file, without following it by the book. However, some Notified Bodies may not accept that solution.


