Transparency · Methodology

How we rank schools

Every prestige_score on School Path AI is computed from a single, auditable formula. No paywalled data sources, no opaque editorial weighting — just public datasets (DfE Performance Tables, GIAS, Ofsted, ISI inspection PDFs) and a small set of transparent, hand-curated reference lists.

Senior independent prestige_score (age 16+)

Each ISC-tier independent senior school gets a continuous score. Final value is the sum of these components:

Manual tier anchor (range 0–100)

Hand-curated tier from data/seeds/prestige_tiers.csv. Tier 1 (Clarendon Nine + GSA flagship + closely-associated HMC tops) = +100. Tier 2 (established HMC / GSA member schools, ~210 currently) = +60. Tier 3 = +30.

Why: Captures national prestige that no algorithm can fully measure (centuries of academic tradition, alumni network, scholarship density).

A-Level AAB % (range 0–90)

Percentage of A-Level students achieving grades AAB or better (DfE 16-18 dataset, latest period). Falls back to MAX(DfE, manual_estimate × 0.90) when DfE under-reports (Pre-U / IB / Bedales Assessed Course schools).

Why: The strongest single academic signal. Our manual fallback corrects systematic under-reporting for ~30 elite schools that opt out of standard A-Level.

GCSE Attainment 8 (range 0–30)

DfE KS4 Attainment 8 / 90 × 30. Same MAX-with-manual fallback rule.

Why: Lower weight than A-Level since prestige typically derives from sixth-form outcomes, but still informative about broad academic strength.
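The MAX-with-manual fallback shared by the AAB and Attainment 8 components can be sketched roughly as follows. This is an illustration, not the pipeline's actual code: the names `effective_metric` and `att8_points` are hypothetical, and treating a school with no data at all as 0 points is an assumption. Only the 0.90 discount and the / 90 × 30 scaling come from the description above.

```python
from typing import Optional

MANUAL_DISCOUNT = 0.90  # 10% confidence discount applied to manual estimates


def effective_metric(dfe_value: Optional[float],
                     manual_estimate: Optional[float]) -> Optional[float]:
    """Return MAX(DfE, manual_estimate x 0.90), ignoring missing inputs."""
    candidates = []
    if dfe_value is not None:
        candidates.append(dfe_value)
    if manual_estimate is not None:
        candidates.append(manual_estimate * MANUAL_DISCOUNT)
    return max(candidates) if candidates else None


def att8_points(dfe_att8: Optional[float],
                manual_att8: Optional[float] = None) -> float:
    """Scale Attainment 8 (0-90) onto the 0-30 component range."""
    value = effective_metric(dfe_att8, manual_att8)
    if value is None:
        return 0.0  # assumption: no data means no points
    return min(30.0, value / 90 * 30)


print(att8_points(60.0))        # DfE value only
print(att8_points(20.0, 70.0))  # discounted manual estimate (63.0) beats DfE (20.0)
```

Because the rule is a MAX, a manual estimate can only raise a school's value, never lower it below what DfE reports.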

ISI inspection rating (range 0–20)

Excellent = +20, Good = +12, Sound = +5, Unsatisfactory = 0. Scraped from public ISI inspection report PDFs (isi.net) — currently 918 schools.

Why: Independent third-party quality check, complements academic outcome data. Capped at 20 so it never dominates the academic signal.

School size (range 0–24)

log10(total_pupils) × 8, capped at 24. ~200 pupils = 18, 1000 = 24.

Why: A proxy for institutional scale. Keeps one-form-entry specialists from outranking major day schools on other signals alone.

Boarding option (range 0–8)

Full / mainly boarding = +8. Mixed boarding (day + flexi) = +5. Day-only = 0.

Why: Boarding correlates with national catchment, fee selectivity, and traditional public-school identity.

Destination edges (range 0–12)

log10(university_destinations + 1) × 5. Captures published UCAS leaver destinations.

Why: Real-outcome signal, but most schools haven't published structured destination data yet — currently only 21 senior schools contribute.
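The two logarithmic components (school size and destination edges) follow the same pattern, which might be sketched like this. The helper `log_points` is a hypothetical name, and capping destination points at 12 is an assumption inferred from the stated range; the text only spells out the cap for school size.

```python
import math


def log_points(count: float, multiplier: float, cap: float,
               offset: float = 0.0) -> float:
    """Return log10(count + offset) x multiplier, clamped to [0, cap]."""
    value = count + offset
    if value <= 0:
        return 0.0
    return max(0.0, min(cap, math.log10(value) * multiplier))


def size_points(total_pupils: int) -> float:
    # log10(total_pupils) x 8, capped at 24
    return log_points(total_pupils, multiplier=8, cap=24)


def destination_points(university_destinations: int) -> float:
    # The +1 offset keeps schools with no published destinations at 0 points.
    return log_points(university_destinations, multiplier=5, cap=12, offset=1)


for pupils in (200, 1000, 3000):
    print(pupils, round(size_points(pupils), 1))
print(destination_points(0))
```

The logarithm compresses large counts, so the cap for size is reached at 1,000 pupils and anything larger earns no extra points.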

Has sixth form (range 0–4)

+4 if has_sixth_form = TRUE, else 0.

Why: Filters out junior-only schools from the senior ranking pipeline.

No-DfE compensation (range 0–35)

When a Tier 1 or Tier 2 school has no AAB figure above 25 (i.e. it doesn't fit the DfE A-Level metric), Tier 1 gets +35 and Tier 2 gets +20.

Why: Top schools such as St Paul's School (Boys), KCS Wimbledon and Bedales opt out of DfE A-Level reporting. Without compensation they'd sink unfairly.
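One possible reading of the compensation rule, as a hedged sketch: the function name `no_dfe_compensation` is hypothetical, and so is the assumption that a missing AAB value is treated the same way as a value of 25 or below.

```python
from typing import Optional

COMPENSATION_BY_TIER = {1: 35, 2: 20}  # Tier 3 and untiered schools get nothing
AAB_THRESHOLD = 25.0


def no_dfe_compensation(tier: Optional[int], aab_pct: Optional[float]) -> int:
    """Return the bonus for tiered schools that lack a usable AAB figure."""
    if aab_pct is not None and aab_pct > AAB_THRESHOLD:
        return 0  # the school fits the DfE A-Level metric, so no bonus
    return COMPENSATION_BY_TIER.get(tier, 0)


print(no_dfe_compensation(1, None))   # 35
print(no_dfe_compensation(2, 12.0))   # 20
print(no_dfe_compensation(1, 60.0))   # 0
```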

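Summing the component caps above gives a theoretical ceiling of 288 without the No-DfE compensation, which only applies to schools with little or no DfE A-Level data. Top schools cluster around 200–245; the bottom of Tier 2 sits around 130–160.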
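Putting the pieces together, the final sum could look like the sketch below. It is an illustration under assumptions, not the code in derive_prestige_score.py: the `SeniorInputs` container, its field names and the boarding labels (`full`, `mixed`, `day`) are hypothetical, and the numeric components are passed in already computed. The example school is invented.

```python
from dataclasses import dataclass
from typing import Optional

TIER_ANCHOR = {1: 100, 2: 60, 3: 30}
ISI_POINTS = {"Excellent": 20, "Good": 12, "Sound": 5, "Unsatisfactory": 0}
BOARDING_POINTS = {"full": 8, "mixed": 5, "day": 0}
SIXTH_FORM_POINTS = 4


@dataclass
class SeniorInputs:
    tier: Optional[int] = None
    isi_rating: Optional[str] = None
    boarding: str = "day"
    has_sixth_form: bool = False
    aab_points: float = 0.0          # 0-90, computed elsewhere
    att8_points: float = 0.0         # 0-30
    size_points: float = 0.0         # 0-24
    destination_points: float = 0.0  # 0-12
    compensation: float = 0.0        # 0, 20 or 35


def prestige_score(s: SeniorInputs) -> float:
    """Sum every component; unknown categories contribute zero."""
    return (
        TIER_ANCHOR.get(s.tier, 0)
        + ISI_POINTS.get(s.isi_rating, 0)
        + BOARDING_POINTS.get(s.boarding, 0)
        + (SIXTH_FORM_POINTS if s.has_sixth_form else 0)
        + s.aab_points
        + s.att8_points
        + s.size_points
        + s.destination_points
        + s.compensation
    )


example = SeniorInputs(tier=2, isi_rating="Good", boarding="day",
                       has_sixth_form=True, aab_points=40.0,
                       att8_points=22.0, size_points=23.0)
print(prestige_score(example))  # 60 + 12 + 0 + 4 + 40 + 22 + 23 = 161.0
```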

Prep school prep_prestige_score (age ≤ 14)

Prep schools don't take A-Level / GCSE / KS2 SATs — those signals are zero for the entire population, so the senior formula is meaningless for prep ranking. Instead we use a prep-specific formula:

Manual IAPS tier anchor (range 0–100)

Tier 1 IAPS top-of-mind (Dragon, Ludgrove, Summer Fields, Hill House etc.) = +100. Tier 2 established IAPS = +60.

Why: Same role as senior tier anchor.

ISI rating (range 0–30)

Excellent = +30, Good = +20, Sound = +10. Higher weight than senior since ISI is the primary prep quality signal.

Why: Preps don't have DfE academic data — ISI carries more of the weight.

Ofsted (fallback) (range 0–22)

Outstanding = +22, Good = +15, Requires Improvement (RI) = +8. Applied only when no ISI rating exists.

Why: Covers ~11% of preps inspected by Ofsted instead of ISI.

Senior school destinations (range 0–50)

log10(senior_outflow + 1) × 22. Counts pupils sent to top senior schools (Eton, Westminster, Winchester etc.).

Why: The gold-standard prep signal. Currently sparse data (3 preps).

Size (range 0–12)

log10(pupils) × 4. Lower max than senior since 1-form-entry preps are common.

Why: Avoid penalising small traditional preps.

Boarding bonus (range 0–8)

Same as senior.

Why: Boarding preps tend to be national catchment 13+ feeders.

Age pattern bonus (range 0–5)

age_max = 13 → +5 (traditional CE13+ feeder), 14 → +4, 11 → +3 (modern 11+).

Why: Recognises that the traditional 4–13 prep model is distinct from the modern 4–11 pre-prep + prep blend.

Choir school bonus (range 0–12)

+12 for Cathedral Choir / Abbey Choir schools (St Paul's Cathedral School, Westminster Cathedral Choir, Westminster Abbey Choir, etc.).

Why: Unique high-prestige niche with their own pipeline (choir scholarships).
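The prep formula can be sketched in the same style. Again this is a rough illustration rather than the contents of derive_prep_prestige_score.py: the function and parameter names, the boarding labels, and the decision to cap the two log terms at the top of their stated ranges are all assumptions.

```python
import math
from typing import Optional

IAPS_TIER = {1: 100, 2: 60}
ISI_PREP = {"Excellent": 30, "Good": 20, "Sound": 10}
OFSTED_FALLBACK = {"Outstanding": 22, "Good": 15, "Requires Improvement": 8}
AGE_PATTERN = {13: 5, 14: 4, 11: 3}
BOARDING = {"full": 8, "mixed": 5, "day": 0}
CHOIR_BONUS = 12


def inspection_points(isi_rating: Optional[str],
                      ofsted_rating: Optional[str]) -> int:
    """Use ISI when a rating exists; fall back to Ofsted only otherwise."""
    if isi_rating is not None:
        return ISI_PREP.get(isi_rating, 0)
    return OFSTED_FALLBACK.get(ofsted_rating, 0)


def capped_log(value: float, multiplier: float, cap: float) -> float:
    """log10(value) x multiplier, clamped to [0, cap]; 0 for values below 1."""
    if value < 1:
        return 0.0
    return min(cap, math.log10(value) * multiplier)


def prep_prestige_score(tier: Optional[int], isi_rating: Optional[str],
                        ofsted_rating: Optional[str], senior_outflow: int,
                        pupils: int, boarding: str, age_max: int,
                        is_choir_school: bool) -> float:
    return (
        IAPS_TIER.get(tier, 0)
        + inspection_points(isi_rating, ofsted_rating)
        + capped_log(senior_outflow + 1, 22, 50)
        + capped_log(pupils, 4, 12)
        + BOARDING.get(boarding, 0)
        + AGE_PATTERN.get(age_max, 0)
        + (CHOIR_BONUS if is_choir_school else 0)
    )


# An invented Tier 1 boarding prep with 100 pupils and no destination data.
print(prep_prestige_score(1, "Excellent", None, 0, 100, "full", 13, False))
```

For the invented school the components are 100 (tier) + 30 (ISI) + 0 (destinations) + 8 (size) + 8 (boarding) + 5 (age pattern), which comes to about 151.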

Pathway probabilities (primary → senior → university)

On the Pathways page, when you focus on a single prep school we compute the chance that one of its leavers ends up at each university. We model the journey as a 3-state Markov chain and apply the law of total probability:

P(uni = u | primary = p)
   = Σ_seniors  P(senior = s | primary = p) × P(uni = u | senior = s)
  • Rate normalisation. Every count in our database carries a period_years annotation (1 for a single-year cohort, 5 for a 5-year cumulative total). We divide each count by its period before the ratio, so 5-year cumulative and single-year sources sit on the same scale.
  • 11+ vs 13+ branches. A child entering a senior school at 11+ ends up at a different set of seniors than one entering at 13+. The Stage filter on the Pathways page partitions the prep's leavers by entry stage so you can compare the two pathways separately.
  • Coverage badge. If a senior school in your prep's pipeline does not yet have a verified university destination row, that branch contributes zero to the rolled-up probability. We surface a “X% of pathways covered” badge so you know how much of the cohort is represented. Read all conditional probabilities together with the coverage figure.
  • Compounded confidence. Each chain step has a confidence label (verified / verified-list / training-knowledge). The end-to-end pathway inherits the weaker of the two — a route through a verified prep edge but a training-knowledge senior edge surfaces as training-knowledge.
  • Markov assumption. Given the senior school, we treat the choice of university as conditionally independent of the primary. Without per-pupil tracking this is the most tractable model: a real St Paul's boy from Cathedral Choir vs Sussex House may have slightly different university outcomes, but our published numbers cannot tell them apart.
  • Conditional on going to university. The probability you see is for “which university”, not for “will my child go to university at all”. Apprenticeships, gap years, and direct work are not in our data.
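The roll-up described in the points above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Pathways page's actual code: the function names, the `{target: (count, period_years)}` edge layout and the placeholder school names are hypothetical, and the confidence ordering (verified above verified-list above training-knowledge) is inferred from the order in which the labels are listed.

```python
from collections import defaultdict

# Assumed ordering, weakest first.
CONFIDENCE_RANK = {"training-knowledge": 0, "verified-list": 1, "verified": 2}


def to_rates(edges):
    """{target: (count, period_years)} -> {target: count per year}."""
    return {t: count / period for t, (count, period) in edges.items()}


def normalise(rates):
    """Turn annual rates into a probability distribution."""
    total = sum(rates.values())
    return {t: r / total for t, r in rates.items()} if total > 0 else {}


def pathway_probabilities(prep_edges, senior_edges):
    """Return (P(uni | prep), share of the prep's leavers that is covered)."""
    p_senior = normalise(to_rates(prep_edges))
    p_uni = defaultdict(float)
    covered = 0.0
    for senior, p_s in p_senior.items():
        uni_dist = normalise(to_rates(senior_edges.get(senior, {})))
        if not uni_dist:
            continue  # no destination row: this branch contributes zero
        covered += p_s
        for uni, p_u in uni_dist.items():
            p_uni[uni] += p_s * p_u  # law of total probability
    return dict(p_uni), covered


def weaker(a, b):
    """End-to-end confidence is the weaker of the two step labels."""
    return min(a, b, key=CONFIDENCE_RANK.__getitem__)


prep_edges = {"Senior A": (25, 5), "Senior B": (3, 1), "Senior C": (2, 1)}
senior_edges = {
    "Senior A": {"Uni X": (50, 5), "Uni Y": (10, 1)},
    "Senior B": {"Uni X": (3, 1), "Uni Z": (1, 1)},
    # "Senior C" has no destination data yet.
}
probs, coverage = pathway_probabilities(prep_edges, senior_edges)
print({u: round(p, 3) for u, p in probs.items()}, round(coverage, 2))
print(weaker("verified", "training-knowledge"))
```

In this invented example the rolled-up probabilities sum to the coverage figure (0.8) rather than to 1, which is why the two numbers are best read together.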

Data sources

  • GIAS (Get Information About Schools) — DfE bulk export, 27,222 schools
  • DfE Performance Tables — KS2 / KS4 / 16-18 datasets via EES API
  • Ofsted — latest inspections CSV, ratings + dates
  • ISI — public inspection report PDFs scraped from isi.net/reports via their public API. 1,108 matched schools, 918 with extracted ratings.
  • Curated tier lists: data/seeds/prestige_tiers.csv (315 hand-tiered schools), built from public HMC / GSA / IAPS membership rosters
  • Manual performance overrides: data/seeds/manual_performance_overrides.csv (42 schools, conservative AAB/Att8 estimates from each school's published annual results announcement)
  • Fee estimates — 30 manually-seeded actual fees, 1,274 rule-based estimates (boarding × region × tier multipliers)

Known limitations

  • Scotland is missing. GIAS only covers England + Wales. Famous Scottish schools (Fettes, Gordonstoun, Edinburgh Academy, Hutchesons', etc.) are not in our database.
  • ISI Excellent is under-reported. ISI's 2024 framework reform abolished the four-point Excellent / Good / Sound / Unsatisfactory scale. Recent ROU reports get a synthetic "Good" default for compliant schools, so only 57 schools currently have explicit Excellent ratings (from legacy EQI reports).
  • Prep destinations data is sparse. Only 3 prep schools currently have meaningful prep_to_senior_edges data. The senior_dest signal will only become decisive once we've scraped each prep's "Where do leavers go?" page.
  • Manual estimates are not externally audited. Our 42 manual performance values get a 10% confidence discount in the formula, and DfE data is used whenever it is at least as high as the discounted estimate.
  • Fee estimates are coarse. Real fees vary by year, day vs boarding rate, and scholarship discounts. Use the figure for rough budgeting, not as a quoted price.
  • This isn't the Times Schools Guide. The Times uses paywalled, school-submitted data, including Pre-U / IB grade conversions. Our formula reaches roughly 85% overlap with their Top 50 using public sources alone, but the methodology differs.

Reproducibility

Every score is reproducible from the source code. Run the pipeline yourself:

# 1. Refresh ISC membership flags (needs ~1300 rows)
python -m app.db.derive_isc_membership --permissive --apply

# 2. Re-import manual tiers + performance overrides
python -m app.db.import_prestige_tiers
python -m app.db.import_manual_performance

# 3. Re-scrape ISI ratings (~30 min, 1374 schools)
python -m app.db.scrape_isi_index
python -m app.db.scrape_isi_reports
python -m app.db.match_isi_to_urn
python -m app.db.extract_isi_ratings --matched-only
python -m app.db.finalize_isi_import

# 4. Re-derive scores
python -m app.db.derive_prestige_score --apply
python -m app.db.derive_prep_prestige_score --apply
python -m app.db.derive_estimated_fees --apply
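
If you prefer to run the steps from one place, a small wrapper along these lines may help. It is only a sketch: the module names and flags are copied from the commands above, while `PIPELINE_STEPS`, `build_commands` and `run_pipeline` are hypothetical names that are not part of the repository. It defaults to a dry run that only prints the commands.

```python
import subprocess
import sys

# Module names and flags copied from the commands above, in the same order.
PIPELINE_STEPS = [
    ["app.db.derive_isc_membership", "--permissive", "--apply"],
    ["app.db.import_prestige_tiers"],
    ["app.db.import_manual_performance"],
    ["app.db.scrape_isi_index"],
    ["app.db.scrape_isi_reports"],
    ["app.db.match_isi_to_urn"],
    ["app.db.extract_isi_ratings", "--matched-only"],
    ["app.db.finalize_isi_import"],
    ["app.db.derive_prestige_score", "--apply"],
    ["app.db.derive_prep_prestige_score", "--apply"],
    ["app.db.derive_estimated_fees", "--apply"],
]


def build_commands(python=sys.executable):
    """Expand each step into a full `python -m ...` argument list."""
    return [[python, "-m", *step] for step in PIPELINE_STEPS]


def run_pipeline(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    commands = build_commands()
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing step
    return commands


if __name__ == "__main__":
    run_pipeline(dry_run=True)
```

Using `check=True` means a failed scrape or import halts the run before any scores are re-derived from stale data.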

Source code: apps/api/app/db/derive_prestige_score.py, apps/api/app/db/derive_prep_prestige_score.py