European AVM Alliance

The EAA sets the benchmark for AVM quality, transparency, and governance across Europe. Its ESSVM standards define how automated valuation models should be built, tested, and reported — and how lenders and regulators should evaluate them.

What is the EAA?

The European AVM Alliance (EAA) is an industry body for automated valuation model providers operating in European property markets. Founded to promote standards, transparency, and professional governance in the AVM sector, the EAA brings together AVM developers, lenders, regulators, and academic researchers to establish common benchmarks for model quality.

The Alliance addresses a fundamental problem in the AVM market: without common standards, every provider defines accuracy, confidence, and coverage differently, making it impossible for lenders and regulators to compare models on equal terms. A provider claiming “95% accuracy” might be measuring something entirely different from a competitor making the same claim — different benchmarks, different error definitions, different test methodologies.

The EAA’s primary output is the European Standards for Statistical Valuation Methods (ESSVM), now in its 3rd Edition (2022). This document is the closest thing the European AVM industry has to a single, authoritative quality framework. It defines what an AVM must disclose, how accuracy must be measured, and what constitutes acceptable performance.

ESSVM 3rd Edition (2022)

The European Standards for Statistical Valuation Methods set out precise requirements for how AVMs should report their performance. This is not a vague set of principles — it specifies exact metrics, segmentation requirements, testing frequencies, and disclosure obligations. Any AVM provider claiming ESSVM alignment should be able to demonstrate compliance with all of the following.

Required performance metrics

The ESSVM mandates that every AVM provider publish the following metrics from independent testing:

Hit rate

The percentage of properties for which the model produces a valuation. An AVM that declines 40% of requests may have excellent accuracy on the remaining 60%, but that is not the same as a model that values 95% of the stock. Hit rate prevents cherry-picking.

Median Error (MdAPE)

The median absolute percentage error between the AVM estimate and the benchmark value. Using the median rather than the mean prevents a small number of extreme outliers from flattering the headline figure.

Average Absolute Error (MAPE)

The mean absolute percentage error. Sensitive to outliers, but useful as a complement to the median for understanding the tail distribution of errors.

Forecast Standard Deviation (FSD)

A measure of the dispersion of valuation errors, used as the basis for the confidence scale. The FSD tells you how spread out the model’s errors are — a low FSD means tightly clustered predictions.

PE10 and PE20

The percentage of valuations falling within ±10% and ±20% of the benchmark value respectively. These are the most intuitive accuracy metrics: what proportion of valuations are “close enough” to the real price?
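The metrics above can be sketched in a few lines of Python. This is an illustrative implementation under stated assumptions (a `None` estimate counts as a no-hit, and FSD is taken as the standard deviation of signed percentage errors with the benchmark as denominator), not the ESSVM's normative definitions:

```python
from statistics import median, pstdev

def essvm_metrics(estimates, benchmarks):
    """Sketch of the core ESSVM reporting metrics from paired AVM
    estimates and benchmark values. A None estimate is a no-hit."""
    pairs = [(e, b) for e, b in zip(estimates, benchmarks) if e is not None]
    # Signed percentage errors, benchmark value as the denominator
    errors = [(e - b) / b for e, b in pairs]
    abs_errors = [abs(x) for x in errors]
    return {
        "hit_rate": len(pairs) / len(estimates),
        "mdape": median(abs_errors),                # median absolute % error
        "mape": sum(abs_errors) / len(abs_errors),  # mean absolute % error
        "fsd": pstdev(errors),                      # dispersion of signed errors
        "pe10": sum(x <= 0.10 for x in abs_errors) / len(abs_errors),
        "pe20": sum(x <= 0.20 for x in abs_errors) / len(abs_errors),
    }
```

Reporting the hit rate from the full request set, while computing the error metrics only over produced valuations, is what prevents a selective model from quietly inflating its accuracy figures.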

Segmentation requirements

A national headline number is not sufficient. The ESSVM requires that all metrics are reported segmented by:

  • Property type — detached, semi-detached, terraced, flat/maisonette
  • Geography — at minimum regional, ideally sub-regional
  • Price range — performance in the £100K–200K bracket may be very different from the £500K–1M bracket
  • Confidence level — high-confidence valuations should demonstrably outperform low-confidence ones

This segmentation requirement is critical. An AVM might achieve 65% PE10 nationally, but if it delivers 80% in London and 40% in rural Wales, the headline figure conceals a material problem. Lenders making decisions for specific property types in specific regions need to know the model’s accuracy for that segment, not just the national average.
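As a sketch of what segmented reporting involves, the following hypothetical snippet groups valuation records by property type and region and computes PE10 per segment (the record field names are illustrative, not an ESSVM-defined schema):

```python
from collections import defaultdict

def pe10_by_segment(records):
    """Compute PE10 per (property_type, region) segment. Each record
    is a dict with 'property_type', 'region', 'estimate', 'sale_price'."""
    groups = defaultdict(list)
    for r in records:
        err = abs(r["estimate"] - r["sale_price"]) / r["sale_price"]
        groups[(r["property_type"], r["region"])].append(err)
    return {seg: sum(e <= 0.10 for e in errs) / len(errs)
            for seg, errs in groups.items()}
```

The same grouping would be repeated for price band and confidence level; the point is that each segment gets its own figure rather than being averaged into a national headline.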

Additional requirements

  • Performance reports at least quarterly — accuracy is not static; models drift as markets move. The ESSVM requires fresh test results at minimum every three months.
  • Historical depth minimum 5 years — models should be trained and validated across at least one full market cycle, not just a period of rising prices.
  • Comparable evidence where possible — the ESSVM encourages AVM outputs to include comparable evidence that a human valuer can review and challenge.
  • No “solely black box” techniques — outputs must be produced “in a replicable, explainable, traceable manner.” The model’s reasoning must be interpretable, not opaque.

The FSD confidence scale

The ESSVM defines a common confidence scale based on Forecast Standard Deviation (FSD). The FSD measures the dispersion of a model’s percentage errors — essentially, how widely scattered the model’s predictions are around the true values. A lower FSD means the model’s errors are tightly clustered and predictable; a higher FSD means they are spread out and unreliable.

The FSD is expressed as a percentage, and the confidence interpretation follows a normal distribution assumption: an FSD of 13% means there is approximately a 68% probability that the true value falls within ±13% of the AVM estimate (one standard deviation).
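The one-standard-deviation reading can be checked directly from the normal CDF. This sketch assumes, as the interpretation above does, that percentage errors are approximately normally distributed:

```python
from math import erf, sqrt

def coverage_probability(k):
    """Probability that a normally distributed error falls within
    +/- k standard deviations of its mean: P(|Z| <= k) = erf(k / sqrt(2))."""
    return erf(k / sqrt(2))

# Under the normality assumption, an FSD of 13% implies roughly a 68%
# chance the true value lies within +/-13% of the estimate (k = 1),
# and roughly a 95% chance within +/-26% (k = 2).
print(round(coverage_probability(1), 3))  # ~0.683
print(round(coverage_probability(2), 3))  # ~0.954
```

In practice AVM error distributions have fatter tails than the normal, so these probabilities are an approximation rather than a guarantee.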

  • FSD ≤13% (High): 68% probability that the true value falls within ±13% of the estimate. Suitable for most lending decisions with appropriate due diligence.
  • FSD 13–20% (Medium): Acceptable for lower-risk applications such as portfolio monitoring, remortgage screening, or desktop review support. The prediction interval is wider.
  • FSD 20–25% (Low): Generally rejected by lenders for individual lending decisions. May still be useful for indicative pricing, market analysis, or flagging properties that require physical inspection.
  • FSD >25% (No-hit): Treated as an AVM failure. The model’s uncertainty is so high that the estimate carries no decision-making value; a physical valuation is required.

The FSD scale provides a common language between AVM providers, lenders, and regulators. When a lender specifies that they require “FSD ≤13%” for automated lending decisions, every ESSVM-compliant AVM provider interprets that threshold identically. Without this shared framework, one provider’s “high confidence” might correspond to another’s “medium.”

Accuracy reporting standards

The ESSVM specifies a family of “PE” (Percentage within Error) metrics as the standard way to report AVM accuracy. These metrics answer a simple question: what proportion of the model’s valuations fell within a given tolerance of the benchmark value?

  • PE5 — % of valuations within ±5% of the benchmark. The strictest test; differentiates the best models in high-volume, homogeneous markets.
  • PE10 — % of valuations within ±10% of the benchmark. The industry standard; most lender thresholds and academic benchmarks reference PE10.
  • PE15 — % of valuations within ±15% of the benchmark. A broader tolerance that captures the “near miss” population; useful for understanding the error distribution.
  • PE20 — % of valuations within ±20% of the benchmark. The widest standard tolerance; valuations outside ±20% are typically considered material misvaluations.
  • MdAPE — Median Absolute Percentage Error. The single best summary statistic; robust to outliers and easy to interpret: “half of all valuations are within X% of the true price.”

How PE metrics should be calculated

The ESSVM is specific about calculation methodology. PE metrics must be computed as the absolute percentage deviation between the AVM estimate and the benchmark value, where the benchmark is the denominator. For PE10, a valuation of £220,000 for a property that sold for £200,000 produces a 10% error and is included in the PE10 count. The same £220,000 estimate for a property that sold for £240,000 produces an 8.3% error — also within the PE10 band.

This matters because symmetric percentage thresholds are not symmetric in absolute terms. A ±10% band around a £500,000 property (£450K–550K) is wider in absolute terms than the same band around a £150,000 property (£135K–165K). The ESSVM requires that this asymmetry is understood and disclosed.
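The two worked examples above can be expressed as a small check (the `within_pe10` helper is illustrative, not a function defined by the ESSVM):

```python
def within_pe10(estimate, benchmark):
    """PE10 membership: absolute percentage deviation with the
    benchmark (sale price) as the denominator, at most 10%."""
    return abs(estimate - benchmark) / benchmark <= 0.10

# A GBP 220,000 estimate against a GBP 200,000 sale: 10.0% error, in band.
print(within_pe10(220_000, 200_000))  # True
# The same estimate against a GBP 240,000 sale: 8.3% error, also in band.
print(within_pe10(220_000, 240_000))  # True
# The +/-10% band is wider in absolute terms for higher-priced properties:
print(0.10 * 500_000, 0.10 * 150_000)  # 50000.0 15000.0
```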

Walk-forward testing

The ESSVM requires that AVM accuracy be measured using out-of-sample, out-of-time testing — commonly called walk-forward or temporal validation. This is the single most important methodological requirement in the entire document, and the one most commonly violated by providers making inflated accuracy claims.

How it works

The model is trained on historical transaction data up to a defined cutoff date. It is then asked to predict the prices of properties that sold after that cutoff — transactions the model has never seen during training. The predictions are compared to the actual sale prices to produce accuracy metrics.

Why it matters

Without temporal separation, accuracy figures can be dramatically inflated through data leakage. If a model is trained on transactions from 2020–2024 and then “tested” on a random subset of those same transactions, it has already seen information about the market conditions, price levels, and comparable evidence from the test period. This is not a genuine test of the model’s predictive ability — it is a test of how well the model memorises its training data.

Walk-forward testing prevents this. By training on data that ends before the test period begins, the model faces the same challenge it will face in production: predicting the price of a property using only information available at the time of the valuation, with no knowledge of what subsequently happened in the market.
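A minimal sketch of such a temporal split, assuming each transaction record carries a sale date (the field name is illustrative):

```python
from datetime import date

def walk_forward_split(transactions, cutoff):
    """Temporal out-of-sample split: train on sales completed strictly
    before the cutoff date, test on sales completed on or after it.
    `transactions` is a list of dicts with a 'sale_date' key."""
    train = [t for t in transactions if t["sale_date"] < cutoff]
    test = [t for t in transactions if t["sale_date"] >= cutoff]
    return train, test
```

Rolling the cutoff forward period by period yields a series of genuinely out-of-time test sets, which is how a quarterly reporting cadence can be backed by fresh walk-forward results each time.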

Overfitting and inflated claims

Walk-forward testing is the primary defence against overfitting — a model that has learned the noise in its training data rather than the underlying market signal. An overfit model can produce PE10 figures above 90% on in-sample data while performing at 50% on genuinely unseen transactions.

When evaluating an AVM provider’s published accuracy, the critical question is: were these figures produced from walk-forward testing, or from random train-test splits? The ESSVM is unambiguous: only temporal out-of-sample results are valid for accuracy claims.

Benchmark methodology

Accuracy metrics are only meaningful relative to a benchmark — the “truth” against which the AVM is tested. The ESSVM addresses two common benchmarks and their respective strengths and weaknesses.

Surveyor valuations

Comparing AVM estimates to RICS surveyor valuations is the softer benchmark. Surveyor valuations are themselves estimates with a typical variability of ±5–10% from the eventual sale price. Testing against a noisy benchmark inflates apparent accuracy because the AVM is being compared to another estimate rather than to ground truth. However, surveyor benchmarks are useful for understanding how well an AVM replicates professional opinion, which is the relevant comparison for lending applications where the AVM replaces a desktop valuation.

Sale prices

Comparing AVM estimates to actual completed sale prices (e.g. from Land Registry) is the harder benchmark. Sale prices are objective and final — but they incorporate factors invisible to any model, such as negotiation dynamics, buyer urgency, vendor motivation, and chain circumstances. An AVM tested against sale prices will always show lower headline accuracy than one tested against surveyor valuations, even if the underlying model is identical.

The ESSVM acknowledges both benchmarks but requires providers to disclose which one they use. Direct comparison of accuracy figures across providers is only valid when the same benchmark methodology is used. An AVM claiming 75% PE10 against sale prices may be more accurate than one claiming 85% PE10 against surveyor valuations — but the headline numbers suggest the opposite.

EAA AVM Label accreditation

The EAA operates a formal accreditation programme — the AVM Label — that certifies an AVM provider meets the ESSVM requirements. The Label is not a rubber stamp; it requires demonstrable compliance across multiple dimensions.

Key requirements

Market penetration

The AVM must be in active commercial use by a minimum of two financial institutions. This is not a theoretical model or a research prototype — it must be deployed and relied upon in production lending decisions.

Required reporting metrics

The provider must publish all ESSVM-mandated metrics (hit rate, MdAPE, MAPE, FSD, PE10, PE20) segmented by property type, geography, price range, and confidence level. Partial reporting is not sufficient.

Quarterly bulk test reports

The provider must submit bulk test results at least quarterly, demonstrating ongoing model performance against fresh transaction data. Historical accuracy from two years ago is not evidence of current model quality.

Model documentation

Full methodology documentation must be available for review, covering data sources, model architecture, feature engineering, training procedures, validation methodology, and known limitations.

Explainability

The model must produce interpretable outputs — not just a point estimate, but confidence indicators, prediction intervals, and ideally comparable evidence that enables human review and challenge.

The AVM Label programme signals to lenders and regulators that a provider has submitted to external scrutiny and met an independent quality standard. For lenders conducting vendor due diligence under PRA SS1/23, the AVM Label provides third-party validation of the model quality claims made by the provider.

How Gadsden Valuations reports against ESSVM standards

We build our accuracy reporting to ESSVM standards. Our goal is not merely to claim alignment but to exceed the minimum requirements on the metrics that matter most to lenders and model risk teams.

Walk-forward backtesting against Land Registry sale prices

We test against actual completed sale prices from HM Land Registry — the hardest available benchmark. Our model is trained on historical transactions and tested on subsequent unseen sales, exactly as the ESSVM requires. We do not use random train-test splits, and we do not test against surveyor valuations.

PE10, PE20, and MdAPE published transparently

Our headline accuracy metrics are published on our accuracy page, updated with each model version. We report PE10, PE20, MdAPE, and test set size. The numbers are not hidden behind a login wall or available only on request.

Segmented by property type, price band, and region

We publish accuracy breakdowns by property type (detached, semi-detached, terraced, flat), by price band, and by region. A lender evaluating our model for a specific use case can see the relevant segment accuracy, not just the national average.

Confidence tiers with prediction intervals

Every valuation includes a confidence tier and prediction interval, enabling risk-based decision-making. Our confidence model is validated: higher-tier valuations empirically demonstrate lower error rates than lower-tier ones. This satisfies the ESSVM requirement that confidence indicators must be calibrated and meaningful, not decorative.

Explainable, not black-box

Every valuation is decomposed into per-feature contributions using SHAP values, and accompanied by comparable evidence from nearby recent sales. A RICS Registered Valuer can review, understand, and challenge every element of the estimate. This directly satisfies the ESSVM prohibition on solely black-box techniques.

For further detail on our methodology, data sources, validation approach, and known limitations, see the technical summary and what are AVMs pages. For a broader view of the regulatory landscape governing AVMs in the UK, see our guide to RICS and UK regulation.

See how we measure up

We publish our walk-forward backtest results openly — PE10, PE20, MdAPE, and segmented breakdowns by property type, price band, and region. No claims without evidence.