Stephane Budel

Methodology & Limitations

How these analyses are built

The Signal analyses turn public data — published papers, registered clinical trials, and public-market prices — into a read on where precision-medicine technologies sit. This page documents the sources, definitions, calculations, the AI-assisted steps, and the limitations, so any figure can be traced and judged. None of it uses confidential or client data.

What kind of claim is what

Throughout the site, four kinds of statement are deliberately distinct:

  • Direct observations — counts pulled straight from a source (e.g. a paper appeared in Nature; a trial is recruiting).
  • Calculated metrics — defined arithmetic on those counts (the diffusion score, growth CAGRs, interventional %).
  • AI-assisted classifications — categories assigned by a language model (a paper’s application area or research theme).
  • Expert interpretation — the written conclusions and directional calls. These are judgment, not measurement.

Where a number is an estimate or a directional signal rather than a precise measurement, the surrounding text says so.

The Scholarly Diffusion Index (publication-based)

Question. How far has each technology diffused through the research community, and how much novelty premium does it still command?

Source & cutoff. Papers from PubMed via the NCBI E-utilities API, counted by publication year. Data are current as of June 2026; 2026 is a partial year and is annualized or flagged where shown.
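
The page does not say how a partial year is annualized. A minimal sketch, assuming simple linear pro-rating by months elapsed (the function name `annualize` is hypothetical):

```python
def annualize(ytd_count: int, months_elapsed: int) -> float:
    """Scale a partial-year paper count to a full-year estimate.

    Assumption: linear pro-rating by months elapsed. This ignores seasonality
    and the lag before recent papers are indexed in PubMed.
    """
    if not 1 <= months_elapsed <= 12:
        raise ValueError("months_elapsed must be between 1 and 12")
    return ytd_count * 12 / months_elapsed


# Hypothetical: 450 papers counted over six months of data.
print(annualize(450, 6))  # 900.0
```

Because indexing lags publication, a pro-rated figure for the current year is best read as a directional estimate rather than a measurement.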

Inclusion. Each technology is a fixed PubMed query (e.g. NGS = “next-generation sequencing” OR “high-throughput sequencing”). High-plex multi-omics deliberately requires a named method (CITE-seq, 10x Multiome, Olink + sequencing, spatial multi-omics, proteogenomics) rather than the loose term. Full queries are in each index’s methodology note.
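
For illustration, here is a minimal sketch of how a per-year count for one fixed query could be requested from the public E-utilities esearch endpoint. The query is the NGS example above without any field tags, not the production query. The function name `yearly_count_url` is hypothetical.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities esearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Illustrative only: the NGS example from the text, not the full production query.
NGS_QUERY = '"next-generation sequencing" OR "high-throughput sequencing"'


def yearly_count_url(query: str, year: int) -> str:
    """Build an esearch URL that returns only the PubMed hit count for one publication year."""
    params = {
        "db": "pubmed",
        "term": query,
        "datetype": "pdat",  # filter on publication date
        "mindate": str(year),
        "maxdate": str(year),
        "rettype": "count",  # ask for the count instead of a PMID list
        "retmode": "json",
    }
    return f"{ESEARCH}?{urlencode(params)}"


print(yearly_count_url(NGS_QUERY, 2025))
```

Fetching that URL (for example with `urllib.request.urlopen`) returns JSON whose `esearchresult.count` field holds the hit count. Repeated pulls should stay within NCBI's rate limits, which allow three requests per second without an API key.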

Definitions. For a given year, the diffusion score = top-tier papers ÷ total papers × 1,000. Top-3 = Nature, Science, Cell; Tier 1+2 adds the leading specialist journals. The novelty premium is that score; it is a leading indicator that falls before adoption peaks. Adoption phase (Rogers: Innovators → Early Adopters → Early Majority → Late Majority → Laggards) is a separate, concurrent axis: a field can be Early Majority by adoption while its score is near the floor.
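
The definition reduces to a one-line calculation. In this Python sketch, each paper is represented by its journal name. Matching against the Top-3 set is exact, which is an assumption, so Nature Medicine does not count as Nature. The function name and sample data are hypothetical. Passing a larger set of journal names gives the Tier 1+2 variant.

```python
TOP3 = frozenset({"Nature", "Science", "Cell"})


def diffusion_score(journals: list[str], top_tier: frozenset[str] = TOP3) -> float:
    """Top-tier papers per 1,000 papers for one technology in one year.

    `journals` holds one journal name per paper. Matching is exact (assumption),
    so "Nature Medicine" is not counted as "Nature".
    """
    if not journals:
        raise ValueError("no papers for this technology-year")
    top = sum(1 for j in journals if j in top_tier)
    return top * 1000 / len(journals)


# Hypothetical year: 4 Top-3 papers out of 2,000 total.
papers = ["Nature"] * 3 + ["Cell"] + ["Nature Medicine"] * 6 + ["PLoS One"] * 1990
print(diffusion_score(papers))  # 2.0
```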

Phase calculation. A forward-only state machine combines two signals: publication-growth momentum (the median of the 1-, 3-, and 5-year CAGRs) and the novelty premium (normalized to each technology’s own peak). Early phases are volume-gated; the chasm is crossed when the premium falls below 30% of peak or a field exceeds 750 papers/year; late phases trigger on growth deceleration relative to the ambient growth of the literature (~4%/yr). Thresholds were calibrated against expert ground truth, then held fixed across all technologies. The Adoption Index is the single source of truth for phase; pages and notes derive their phase labels from it.
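
The momentum signal and the chasm rule are specific enough to sketch. The Python below is a partial, hypothetical illustration and not the Adoption Index's implementation. The function names are illustrative. The "peak" is taken as the running maximum to date, which is an assumption. The early-phase volume gates and the late-phase deceleration rules are left out because their exact thresholds are not given here.

```python
from statistics import median

PHASES = ["Innovators", "Early Adopters", "Early Majority", "Late Majority", "Laggards"]
EARLY_MAJORITY = PHASES.index("Early Majority")

CHASM_PREMIUM_FRACTION = 0.30  # premium below 30% of peak...
CHASM_VOLUME = 750             # ...or more than 750 papers/year


def cagr(counts: dict[int, int], year: int, span: int) -> float:
    """Compound annual growth rate of paper counts over `span` years ending at `year`."""
    return (counts[year] / counts[year - span]) ** (1 / span) - 1


def momentum(counts: dict[int, int], year: int) -> float:
    """Publication-growth momentum: median of the 1-, 3-, and 5-year CAGRs."""
    return median(cagr(counts, year, span) for span in (1, 3, 5))


def normalized_premium(scores: dict[int, float], year: int) -> float:
    """Diffusion score as a fraction of the technology's own peak.

    Assumption: the peak is the running maximum up to `year`, so no future data is used.
    """
    peak = max(s for y, s in scores.items() if y <= year)
    return scores[year] / peak if peak > 0 else 0.0


def step(phase: int, counts: dict[int, int], scores: dict[int, float], year: int) -> int:
    """Advance one year using only the chasm rule (other transitions omitted)."""
    crossed = (normalized_premium(scores, year) < CHASM_PREMIUM_FRACTION
               or counts[year] > CHASM_VOLUME)
    # Forward-only: crossing lifts a field to at least Early Majority,
    # and a field already past that phase never moves back.
    return max(phase, EARLY_MAJORITY) if crossed else phase


# Hypothetical technology history.
counts = {2019: 100, 2020: 150, 2021: 200, 2022: 300, 2023: 400, 2024: 800}
scores = {2019: 8.0, 2020: 10.0, 2021: 9.0, 2022: 6.0, 2023: 4.0, 2024: 2.5}
print(PHASES[step(PHASES.index("Early Adopters"), counts, scores, 2024)])  # Early Majority
```

Taking the median of three horizons damps a single noisy year without ignoring a genuine recent acceleration. In the example, the one-year CAGR of 100% is set aside in favour of the three-year figure.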

AI-assisted classification. Application categories and research themes are assigned by Anthropic’s Claude (model: claude-haiku-4.5, via the Batch API) from each paper’s title and abstract, into a fixed, tech-specific taxonomy. Country attribution is keyword matching on the first-author affiliation string, not a model. Aggregate distributions were reviewed for face validity; individual classifications were not hand-checked.
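
Keyword-based country attribution can be sketched in a few lines. The keyword table is hypothetical, because the site's actual list is not published here. Returning the first match in table order is a tie-breaking choice made by the sketch, not one the page specifies.

```python
import re
from typing import Optional

# Hypothetical keyword table for illustration; the site's actual list is not published here.
COUNTRY_KEYWORDS = {
    "United States": ("USA", "United States"),
    "China": ("China",),
    "United Kingdom": ("United Kingdom", "UK", "England", "Scotland"),
    "Germany": ("Germany",),
}

# One whole-word pattern per country, so "UK" does not match inside a longer token such as "UKE".
_PATTERNS = {
    country: re.compile(r"\b(?:" + "|".join(map(re.escape, kws)) + r")\b")
    for country, kws in COUNTRY_KEYWORDS.items()
}


def attribute_country(first_author_affiliation: str) -> Optional[str]:
    """Return the first country (in table order) whose keyword appears, else None."""
    for country, pattern in _PATTERNS.items():
        if pattern.search(first_author_affiliation):
            return country
    return None


print(attribute_country("Broad Institute, Cambridge, MA, USA."))  # United States
```

Affiliations that name only a city or an institution fall through to `None`. This illustrates why country splits based on keyword matching undercount.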

Limitations. This measures scholarly diffusion, not clinical or commercial value — a low or falling score is silent on whether a technology matters in the clinic. Keyword queries miss papers that don’t name a method and may catch tangential ones. Term commoditization (e.g. “multi-omics”) inflates volume without implying novelty. Some years hit PubMed’s result cap (flagged on those pages). The atlas/method and clinical/translational classification boundaries are where the model is softest, so per-category splits are directional; the macro shape is robust.

Update frequency. Refreshed when the underlying PubMed pulls are re-run (roughly quarterly); the data-cutoff label updates with them.

The clinical-trial trackers (MRD, Clinical Pull, ctDNA SoC, MCED)

Source. Queries to the ClinicalTrials.gov API (v2) run at page-render time; responses are cached and refreshed daily, so the counts you see are at most a day old. The written conclusions describe the data at the last fetch and can shift as trials update.
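
Below is a minimal sketch of the query-plus-daily-cache pattern. The endpoint and the query parameters `query.term`, `filter.overallStatus`, `countTotal`, and `pageSize` come from the public ClinicalTrials.gov v2 API, and should be checked against its current documentation. `DailyCache`, the function names, and the search term are hypothetical. The tracker's actual caching layer is not described here.

```python
import json
import time
from urllib.parse import urlencode
from urllib.request import urlopen

CTGOV_STUDIES = "https://clinicaltrials.gov/api/v2/studies"
ONE_DAY = 24 * 60 * 60  # seconds


def recruiting_count_url(term: str) -> str:
    """URL that asks for the total number of recruiting studies matching `term`."""
    params = {
        "query.term": term,
        "filter.overallStatus": "RECRUITING",
        "countTotal": "true",  # include totalCount in the response
        "pageSize": "1",       # only the count is needed, not the records
    }
    return f"{CTGOV_STUDIES}?{urlencode(params)}"


def fetch_recruiting_count(term: str) -> int:
    """Network call: read totalCount from the v2 JSON response."""
    with urlopen(recruiting_count_url(term)) as resp:
        return json.load(resp)["totalCount"]


class DailyCache:
    """Hypothetical helper: reuse a fetched value until it is a day old."""

    def __init__(self, fetch, ttl: float = ONE_DAY, clock=time.time):
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._store = {}  # key -> (fetched_at, value)

    def get(self, key):
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        value = self._fetch(key)
        self._store[key] = (now, value)
        return value


counts = DailyCache(fetch_recruiting_count)
# counts.get("minimal residual disease") hits the API at most once per day.
```

Injecting the clock keeps the expiry logic testable without waiting a day or touching the network.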

Status & denominators. Unless stated otherwise, counts are recruiting trials only. Every percentage states its denominator (e.g. interventional % = interventional ÷ that group’s recruiting trials). The MRD tracker’s per-cancer counts use broad keyword searches that can overlap (a trial spanning two cancers is counted in each), so the headline total is the sum across tracked types, not a de-duplicated global figure.
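
The denominator rule and the overlap caveat are easier to see on a toy example. The trial IDs, group membership, and study types below are hypothetical, and every trial in the sets is assumed to be recruiting. The function names are illustrative. The example shows how a headline total, which sums across cancer types, can exceed the number of distinct trials.

```python
# Hypothetical data: recruiting trial IDs matched by each cancer's keyword search.
PER_CANCER = {
    "breast": {"T1", "T2", "T3", "T4"},
    "colorectal": {"T4", "T5"},  # T4 matched both searches
    "lung": {"T6"},
}
STUDY_TYPE = {
    "T1": "INTERVENTIONAL", "T2": "INTERVENTIONAL", "T3": "OBSERVATIONAL",
    "T4": "INTERVENTIONAL", "T5": "OBSERVATIONAL", "T6": "INTERVENTIONAL",
}


def interventional_pct(trial_ids: set[str], study_types: dict[str, str]) -> float:
    """Interventional % = interventional trials ÷ all recruiting trials in that group."""
    if not trial_ids:
        raise ValueError("empty group: the percentage has no denominator")
    n = sum(1 for t in trial_ids if study_types[t] == "INTERVENTIONAL")
    return 100 * n / len(trial_ids)


def headline_total(per_cancer: dict[str, set[str]]) -> int:
    """Sum across tracked cancer types: a trial matched by two searches counts twice."""
    return sum(len(ids) for ids in per_cancer.values())


def deduplicated_total(per_cancer: dict[str, set[str]]) -> int:
    """Distinct trials across all tracked types (not what the headline reports)."""
    return len(set().union(*per_cancer.values()))


print(interventional_pct(PER_CANCER["breast"], STUDY_TYPE))  # 75.0
print(headline_total(PER_CANCER), deduplicated_total(PER_CANCER))  # 7 6
```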

Curated vs live. Spotlight trials, decision-type breakdowns, and milestone lists are hand-classified from registry entries and are point-in-time; they sit alongside the live counts and are labeled as such.

Limitations. Registry data is self-reported and uneven. “Interventional study type” does not guarantee the assay drives a treatment decision. Keyword searches over- and under-count. These trackers measure registered clinical activity, not real-world use or reimbursement.

The Pulse (market scorecard)

Source. Public-market prices via Yahoo Finance, refreshed hourly. Baskets are equal-weighted averages of per-ticker returns, indexed to a common start; the denominator is the count of tickers returning data. Era bands and event annotations are editorial.
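
Here is one plausible reading of the basket construction, sketched in Python. The sketch averages each ticker's cumulative price return since the common start with equal weights. Only tickers that returned data are included, and the result is scaled to 100. The ticker names and prices are placeholders. The sketch also assumes the series arrive pre-aligned on the same dates.

```python
from typing import Optional


def basket_index(prices: dict[str, Optional[list[float]]],
                 base: float = 100.0) -> tuple[list[float], int]:
    """Equal-weighted basket indexed to `base` at the common start date.

    Assumptions: each series is a list of closing prices aligned on the same
    dates, beginning at the common start; a ticker with no data is None or [].
    Cumulative price returns are averaged with equal weight (no rebalancing).
    Returns the index path and the denominator (tickers that returned data).
    """
    live = [p for p in prices.values() if p]  # only tickers that returned data
    if not live:
        raise ValueError("no tickers returned data")
    length = len(live[0])
    if any(len(p) != length for p in live):
        raise ValueError("price series are not aligned")
    path = []
    for i in range(length):
        mean_return = sum(p[i] / p[0] - 1 for p in live) / len(live)
        path.append(base * (1 + mean_return))
    return path, len(live)


index, n_tickers = basket_index(
    {"tickerA": [10, 12, 15], "tickerB": [50, 45, 55], "tickerC": None}
)
# n_tickers is 2; index is approximately [100.0, 105.0, 130.0]
```

Averaging daily returns and compounding them (daily rebalancing) would give a slightly different path. The page does not specify which convention the Pulse uses. Dividends are ignored, consistent with price-only returns.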

Limitations. Equal-weighting and basket membership are editorial choices; returns are price-only (not total-return) and not investment advice.

Corrections & versioning

These analyses are living instruments and will contain errors. If you spot one — a number that doesn’t reconcile, a misclassified paper, a stale claim — please flag it and it will be fixed and noted.

Methodology last updated June 12, 2026.