Technical documentation under Annex IV of the EU AI Act is not optional for high-risk AI systems: it is a legal prerequisite for placing them on the market or putting them into service. Market surveillance authorities can request it, and notified bodies review it where a third-party conformity assessment applies. It must also be kept up to date for the entire operational life of the system.

This checklist groups the Annex IV documentation requirements into eight sections. For each one it covers what regulators look for, the most common gaps companies leave, and how to maintain documentation as your systems evolve. Use it alongside our risk classification guide to confirm which of your systems require full Annex IV compliance.

Who Needs Annex IV Documentation

Any provider of a high-risk AI system must maintain Annex IV documentation. A provider is an organization that develops an AI system, or has one developed, and places it on the EU market or puts it into service under its own name or trademark.

Deployers (organizations that use a high-risk AI system for their own purposes) are not directly required to produce Annex IV documentation, and conformity assessment itself is the provider's responsibility. Deployers do, however, depend on the provider's documentation, at minimum the instructions for use, to meet their own obligations. If you are deploying a third-party high-risk AI system, ask the provider early what documentation it can share. If the provider cannot confirm that Annex IV documentation exists, treat that as a red flag about your vendor's own compliance status.

The 8 Required Sections of Annex IV Documentation

Section 1: General Description of the AI System

This section establishes what the system does, who it is for, and how it fits into its operating environment. Required content includes:

Intended purpose: A precise description of the task the AI system performs and the context in which it will be used. “Intended purpose” in EU AI Act terms has a specific legal meaning — it is what the provider has designed and marketed the system to do. Ambiguous descriptions invite regulatory scrutiny.
System version and version history: The specific version being documented, plus a log of material changes across versions.
Hardware and software requirements: The infrastructure the system runs on, including processing requirements, operating systems, and dependencies.
Interaction with other systems: How the AI system connects to other software, databases, or processes — including data inputs and outputs.
Instructions for use: What users and operators need to know to use the system correctly, including any known limitations or conditions under which accuracy degrades.

What regulators look for here: Whether the intended purpose is defined narrowly enough to be meaningful, and whether the instructions for use honestly reflect system limitations rather than marketing language.

Common gap: Intended purpose descriptions that are broad enough to apply to almost any AI system (“to improve business outcomes”) rather than specific enough to support conformity assessment.

Section 2: Development Process

This section documents how the system was built. It must cover:

Design choices and development methodology: The overall approach taken, including choices about model architecture, learning paradigm (supervised, unsupervised, reinforcement learning), and why those choices were made.
Key design decisions: Particularly decisions that affect accuracy, robustness, or potential for bias — and the reasoning behind them.
Third-party components: Any pre-trained models, datasets, or development tools sourced from third parties, along with their provenance.
Changes over time: How the system has been modified since initial development, and what triggered those changes.

What regulators look for here: Evidence that design choices were deliberate and documented at the time, not rationalized after the fact, and traceability between design decisions and the system's actual behavior.

Common gap: Missing documentation for third-party model components, particularly where foundation models or pre-trained components are incorporated without adequate provenance records.

Section 3: Training Data and Data Governance

Data documentation is one of the most scrutinized sections. Requirements include:

Training, validation, and test datasets: Their origin, collection methodology, and any pre-processing applied.
Data governance practices: How data quality was assured, including processes for identifying and handling errors, biases, or gaps.
Demographic and representational characteristics: Evidence that datasets adequately represent the population the system will operate on — and documentation of known limitations in representation.
Bias examination: What testing was done to identify discriminatory patterns in training data or outputs, and how identified biases were addressed.
Data protection compliance: How personal data used in training complied with GDPR — including the legal basis for processing and any data minimization measures.

What regulators look for here: Whether the organization has genuinely examined training data for bias and representational gaps, or whether this section is a formality. Regulators expect specific demographic breakdowns, not generalizations.

Common gap: Documenting that bias testing was conducted without documenting what it found and what was done in response. Regulators are interested in the results and remediation, not just the activity.
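One way to make a representation check produce recordable findings is sketched below. It is a minimal illustration, not a prescribed method: the function name, the age bands, the counts, the reference shares, and the 5-point tolerance are all hypothetical, and the Act does not set a numeric tolerance.

```python
def representation_gaps(dataset_counts, reference_shares, tolerance=0.05):
    """Compare each group's share of the dataset with a reference population share.

    Returns one finding per group whose share deviates by more than `tolerance`.
    """
    total = sum(dataset_counts.values())
    findings = []
    for group, expected in reference_shares.items():
        observed = dataset_counts.get(group, 0) / total
        if abs(observed - expected) > tolerance:
            findings.append({
                "group": group,
                "observed_share": round(observed, 3),
                "expected_share": expected,
                "remediation": None,  # to be filled in once a response is decided
            })
    return findings


# Hypothetical training set and reference population (illustrative numbers only).
dataset_counts = {"18-34": 600, "35-54": 350, "55+": 50}
reference_shares = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}

findings = representation_gaps(dataset_counts, reference_shares)
```

Each finding carries an empty remediation field on purpose: the record is only complete once it states what was found and what was done about it.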

Section 4: Testing, Validation, and Verification

This section demonstrates that the system actually performs as claimed. It must include:

Metrics used to evaluate performance: Accuracy, precision, recall, F1 score, or domain-specific metrics — along with the thresholds defined as acceptable and why.
Testing methodology: How testing was conducted, including train/test splits, cross-validation approaches, and evaluation datasets.
Testing results: The actual performance figures achieved, including performance disaggregated by relevant subgroups (demographic categories, operating conditions, geographic contexts).
Known limitations: Conditions under which the system’s performance degrades below its declared metrics.
Cybersecurity testing: Evidence that the system has been tested for adversarial attacks, data poisoning, and other AI-specific security vulnerabilities.

What regulators look for here: Whether performance claims are substantiated by rigorous methodology, and whether limitations are disclosed honestly. Disaggregated performance data is increasingly a baseline expectation, not an advanced requirement.

Common gap: Reporting only aggregate performance metrics without subgroup disaggregation. A system that achieves 92% accuracy overall but 71% accuracy for a specific demographic group has a significant undisclosed limitation.
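The 92% versus 71% case can be reproduced with a few lines of code. The sketch below is illustrative only: the function names, the group labels, the synthetic records, and the 10-point gap threshold are assumptions made for the example, not figures or methods taken from the Act.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Compute overall and per-group accuracy.

    `records` is an iterable of (group, y_true, y_pred) tuples.
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for group, y_true, y_pred in records:
        totals[group] += 1
        correct[group] += int(y_true == y_pred)
    overall = sum(correct.values()) / sum(totals.values())
    per_group = {g: correct[g] / totals[g] for g in totals}
    return overall, per_group


def groups_below_overall(overall, per_group, max_gap=0.10):
    """Return groups whose accuracy trails the overall figure by more than max_gap."""
    return sorted(g for g, acc in per_group.items() if overall - acc > max_gap)


# Synthetic records constructed to give 92% overall and 71% for group_b.
records = (
    [("group_a", 1, 1)] * 849 + [("group_a", 1, 0)] * 51
    + [("group_b", 1, 1)] * 71 + [("group_b", 1, 0)] * 29
)
overall, per_group = accuracy_by_group(records)
flagged = groups_below_overall(overall, per_group)
```

Because group_b makes up only 10% of the records, its weaker performance barely moves the aggregate figure, which is why the aggregate alone hides the limitation.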

Section 5: Standards Compliance and Conformity Assessment

This section documents which technical standards the system complies with and how conformity was assessed. Key elements:

Harmonized standards applied: Reference to any European harmonized standards (EN series) relevant to the system’s domain.
Common specifications: Any common specifications issued by the European Commission applied to the system.
Conformity assessment procedure: Whether self-assessment or third-party notified body assessment was used, and the basis for that choice. Most high-risk systems can use self-assessment; some (biometric systems, certain safety-critical applications) require notified body involvement.
EU Declaration of Conformity: A reference to the declaration, which is a separate document but must cross-reference the technical documentation.

What regulators look for here: That the correct conformity assessment procedure was used, and that standards application is specific (citing the actual standard number and version) rather than generic.

Common gap: Citing standards compliance without demonstrating how the system meets specific standard requirements.

Section 6: Human Oversight Measures

The EU AI Act places significant weight on human oversight as a safeguard for high-risk systems. This section must document:

Technical measures enabling oversight: How the system is designed to allow human operators to monitor outputs, intervene, override decisions, or stop the system.
Interfaces for oversight: The specific controls, dashboards, alerts, or mechanisms through which oversight is exercised.
Operator and user roles: Who is responsible for oversight, what they are expected to do, and what authority they have.
Procedures for escalation: What happens when the system produces an output that a human operator believes is incorrect or inappropriate.

What regulators look for here: Whether oversight is genuine — embedded in system design and operational procedures — or nominal. The question is whether a human operator can practically and meaningfully review AI outputs before they have effect, not just whether a theoretical override exists.

Common gap: Documenting that a human approval step exists without documenting whether operators have sufficient information, time, and training to exercise that approval meaningfully.

Section 7: Robustness, Accuracy, and Cybersecurity

This section addresses the system’s resilience against technical failures and adversarial conditions:

Accuracy specifications: Declared accuracy levels for the intended operating conditions, with substantiation from Section 4 testing.
Robustness measures: How the system behaves under edge cases, degraded inputs, or unusual operating conditions — and what safeguards exist to prevent erroneous outputs from causing harm.
Error handling: Mechanisms for detecting and managing errors, including fail-safe behaviors.
Cybersecurity measures: Technical controls protecting the system from manipulation, data poisoning, model inversion attacks, and adversarial inputs.
Resilience against misuse: Measures to prevent the system from being used in ways outside its intended purpose.

What regulators look for here: Specificity. General statements about “robust security practices” are insufficient. Regulators want specific controls mapped to specific threat vectors.

Common gap: Treating cybersecurity documentation as a standard IT security checklist rather than addressing AI-specific attack surfaces.
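A simple way to test whether the cybersecurity section is specific enough is to check that every AI-specific threat it names has at least one documented control. The sketch below assumes a hypothetical control register; the control names are invented for illustration and the threat list is taken from the items above.

```python
def unmapped_threats(threats, controls):
    """Return the threats that no documented control claims to address.

    `controls` maps a control name to the list of threats it addresses.
    """
    covered = {threat for addressed in controls.values() for threat in addressed}
    return [t for t in threats if t not in covered]


threats = ["data poisoning", "model inversion", "adversarial inputs"]

# Hypothetical control register (names are illustrative only).
controls = {
    "training data provenance checks": ["data poisoning"],
    "input perturbation testing": ["adversarial inputs"],
}

missing = unmapped_threats(threats, controls)
```

In this example model inversion has no mapped control, so the documentation would need either a control or a stated reason why the threat does not apply.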

Section 8: Post-Market Monitoring Plan

This section is forward-looking — it documents how the system will be monitored after deployment:

Monitoring methodology: How performance, accuracy, and behavior will be tracked in production, including the metrics monitored and the frequency of review.
Incident reporting process: How serious incidents or unexpected behaviors will be identified, documented, and reported to national market surveillance authorities.
Feedback loops: How information from production monitoring feeds back into the system’s development, documentation, and risk management.
Documentation update triggers: The conditions under which the technical documentation will be reviewed and updated.

What regulators look for here: A credible, operational plan — not a promise. Specific metrics, monitoring frequencies, named responsible parties, and defined thresholds for escalation.

Common gap: Treating post-market monitoring as a formality. The EU AI Act expects demonstrable monitoring activity in production, with records to show for it.
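The difference between a plan and a promise can be shown with a small sketch in which each monitored metric carries its own threshold, review frequency, and named owner. All names and values here (the class, the recall metric, the 0.85 floor, the weekly review, the owner label) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MonitoredMetric:
    name: str
    threshold: float        # minimum acceptable value in production
    review_every_days: int  # how often the metric is reviewed
    owner: str              # named party responsible for escalation


def evaluate(metric, observed):
    """Build a monitoring record; escalate to the owner if the value is below threshold."""
    breached = observed < metric.threshold
    return {
        "metric": metric.name,
        "observed": observed,
        "threshold": metric.threshold,
        "escalate_to": metric.owner if breached else None,
    }


recall_check = MonitoredMetric("recall", 0.85, 7, "ml-ops-lead")
breach_record = evaluate(recall_check, 0.80)
ok_record = evaluate(recall_check, 0.90)
```

Keeping the records that such a check produces is what gives you evidence of monitoring activity when an authority asks for it.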

Maintaining Documentation as Systems Evolve

Technical documentation is not filed once and forgotten. The EU AI Act requires documentation to remain accurate and current throughout the system’s operational life. Specifically, you must update documentation when:

The system’s training data changes: New data versions, updated datasets, or significant changes to data sources require Section 3 updates.
Model versions change: Updates that affect performance, behavior, or capabilities require updates to Sections 2, 4, and potentially 7.
Intended purpose changes: If the system starts to be used in ways not covered by the original documentation, the documentation must be updated, and the conformity assessment may need to be revisited.
Post-market monitoring reveals issues: Problems identified in production must be documented and addressed, with the documentation updated to reflect both the finding and the remediation.

Practically, this means designating clear documentation ownership within your organization. Someone needs to be accountable for knowing when systems change and ensuring documentation reflects those changes.
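The update triggers above can be encoded as a lookup that a documentation owner runs whenever a change is logged. This is a sketch under stated assumptions: the section numbers follow this checklist's own numbering, the mappings for training data and model versions come from the list above, and the mappings for intended purpose and monitoring findings are our own reading rather than something the Act specifies.

```python
SECTIONS_TO_REVIEW = {
    "training_data": {3},
    "model_version": {2, 4, 7},   # Section 7 only "potentially", per the list above
    "intended_purpose": {1, 5},   # assumption: purpose description and conformity assessment
    "monitoring_finding": {8},    # assumption: other sections may also be affected
}


def sections_to_review(changes):
    """Return the sorted checklist sections to review for a set of logged changes.

    Raises KeyError for a change type that has no mapping, so gaps are not silent.
    """
    result = set()
    for change in changes:
        result |= SECTIONS_TO_REVIEW[change]
    return sorted(result)


release_sections = sections_to_review(["training_data", "model_version"])
```

A release that retrains on new data and ships a new model version would therefore prompt a review of Sections 2, 3, 4, and 7.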

Getting Started with Documentation Generation

The documentation burden for a single high-risk AI system is substantial but manageable with the right tooling. Aikraft’s documentation generator produces structured Annex IV documentation from a guided intake process — covering all eight sections with regulatory language and leaving your team to supply the system-specific content rather than working from a blank page.

If you are new to Aikraft, the getting started guide walks through connecting your first AI system and generating your initial compliance documentation.

Ready to generate your Annex IV documentation? Aikraft’s documentation generator produces compliant, structured technical documentation for high-risk AI systems — starting free for your first system.
