Beyond the Massive Text Embedding Benchmark (MTEB): A Topology-Aware Embedder Bake-Off Across Two Corpora
Cosine-similarity leaderboards rank embedders by training objective, not by corpus fit. Adding a Mapper-TDA stability layer and linguistic anchors changes which embedder wins. We test a fourteen-encoder roster spanning two training-regime axes: objective (contrastive vs. MLM) and pretraining domain (general vs. biomedical). Results indicate the topology ranking is structured by compatibility between training regime and pooling convention, not by cosine benchmark performance.
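The exact Mapper protocol isn't spelled out in this summary, so the following is only a minimal sketch of what a Mapper-TDA stability layer could look like, assuming the KeplerMapper library (`kmapper`) and scikit-learn: a Mapper graph is rebuilt on bootstrap subsamples of an encoder's embedding matrix, and the variance of the resulting node count serves as a crude stability proxy. The PCA lens, DBSCAN clusterer, cover parameters, and coefficient-of-variation statistic are all illustrative assumptions, not the authors' choices.

```python
import numpy as np
import kmapper as km
from sklearn.cluster import DBSCAN
from sklearn.decomposition import PCA

def mapper_graph(X, n_cubes=10, overlap=0.5):
    """Build a Mapper graph over an embedding matrix X (n_docs x dim).

    Lens = 2-D PCA projection (an assumed choice); the cover splits the
    lens space into overlapping hypercubes, and DBSCAN clusters the
    pre-image of each cube to form the graph's nodes.
    """
    mapper = km.KeplerMapper(verbose=0)
    lens = mapper.fit_transform(X, projection=PCA(n_components=2))
    return mapper.map(
        lens,
        X,
        cover=km.Cover(n_cubes=n_cubes, perc_overlap=overlap),
        clusterer=DBSCAN(eps=0.5, min_samples=3),
    )

def stability_score(X, n_runs=20, frac=0.8, seed=0):
    """Coefficient of variation of Mapper node counts across bootstrap
    subsamples; lower means the graph's coarse shape survives resampling.
    The statistic is a hypothetical stand-in for the paper's stability
    measure.
    """
    rng = np.random.default_rng(seed)
    counts = []
    for _ in range(n_runs):
        # Resample a fraction of the corpus and rebuild the Mapper graph.
        idx = rng.choice(len(X), size=int(frac * len(X)), replace=False)
        graph = mapper_graph(X[idx])
        counts.append(len(graph["nodes"]))
    counts = np.asarray(counts, dtype=float)
    return counts.std() / max(counts.mean(), 1.0)
```

In the bake-off framing, a score like this, computed per encoder and per corpus, rather than cosine benchmark accuracy, is what would re-rank the fourteen encoders.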
