GeoLIP Spectral Encoder — Test Manifest
Geometric Primitives for Constellation-Anchored Classification
Target: CIFAR-10 (baseline), then generalize
Constraint: zero or minimal learned encoder params; all learning lives in constellation anchors, patchwork, classifier
Metric: val accuracy, CV convergence, anchor activation, InfoNCE lock, train/val gap
Baseline to beat: 88.0% (conv encoder + SquaredReLU + full trainer, 1.6M params)
Current best spectral: 46.8% (STFT + Cholesky + SVD, v4, 137K params, CE-only carry)
STATUS KEY
- [ ] — Not started
- [R] — Running
- [X] — Completed
- [F] — Failed (with reason)
- [S] — Skipped (with reason)
- [P] — Partially completed
COMPLETED EXPERIMENTS (prior sessions + this session)
Conv Encoder Baselines (Form 1 Core)
- Linear baseline, 100 epochs → 67.0%, 422K params, overfits at E31
- MLP baseline, 100 epochs → 65.0%, 687K params, overfits at E10
- Core CE-only, 100 epochs → 63.4%, 820K params, CV=0.70, never converges
- Core CE+CV, 100 epochs → 62.7%, 820K params, CV=0.61, worse than CE-only
- Core 32 anchors, interrupted E20 → 59.2%, 1.8M params, slow convergence
- Full trainer GELU, 100 epochs → 88.0%, 1.6M params (original proven result)
- Full trainer SquaredReLU, 100 epochs → 88.0%, 1.6M params, E96 best
Spectral Encoder Experiments
- [F] Spectral v1: flat FFT → 768-d → single constellation → collapsed
- Cause: concat norm √48≈6.93 vs anchor norm 1, not on same sphere
- [F] Spectral v2: per-band constellation (48×64=3072 anchors) → ~35%
- Cause: 3072 tri dims too diffuse, InfoNCE dead at 0.45, no cross-band structure
- [F] Spectral v3: FFT → 8 channels (spherical mean) → 128 anchors → 27%
- Cause: cos≈0.99, spherical mean collapsed all images to same point
- [P] Spectral v4: STFT + Cholesky + SVD → S^43 → 64 anchors → 46.8% (still running)
- CE carrying alone, CosineEmbeddingLoss frozen at 0.346, InfoNCE dead at 0.15
- Cholesky+SVD signature IS discriminative, contrastive losses unable to contribute
CATEGORY 1: SIGNAL DECOMPOSITION TO GEOMETRY
1.1 Wavelet Scattering Transform (Mallat)
Formula: S_J[p]x(u) = |||x * ψ_{λ₁}| * ψ_{λ₂}| ⋯ | * φ_{2^J}(u)
Library: kymatio (pip install kymatio)
GitHub: https://github.com/kymatio/kymatio
Expected output: ~10K-dim feature vector for 32×32
Literature baseline: ~82% CIFAR-10 with SVM, ~70.5% with linear
Properties: Deterministic, Lipschitz-continuous, approximately energy-preserving
- 1.1a Scattering order 2, J=2, L=8 → L2 normalize → flat constellation on S^d
- Hypothesis: scattering features are rich enough that flat constellation should work
- Compare: direct linear classifier on scattering vs constellation pipeline
- 1.1b Scattering → JL projection to S^127 → constellation (64 anchors)
- JL preserves distances; S^127 matches our proven dim
- 1.1c Scattering → JL → S^43 → Cholesky/SVD signature → constellation
- Stack v4's geometric signature on top of scattering features
- 1.1d Scattering order 1 vs order 2 ablation
- Order 1 is ~Gabor magnitude; order 2 adds inter-frequency structure
- 1.1e Scattering + InfoNCE: does augmentation invariance help or hurt?
- Scattering is already translation-invariant; InfoNCE may be redundant
- 1.1f Scattering hybrid: scattering front-end + lightweight learned projection + constellation
- Test minimal learned params needed to bridge the 82→88% gap
1.2 Gabor Filter Banks
Formula: g(x,y) = exp(−(x′² + γ²y′²)/(2σ²)) · exp(i(2πx′/λ + ψ))
Expected: S scales × K orientations → S×K magnitude responses
Properties: Deterministic, O(N·S·K), first-order scattering ≈ Gabor modulus
- 1.2a Gabor bank (4 scales × 8 orientations = 32 filters) → L2 norm → S^31
- Each filter response is a spatial map; pool to scalar per filter
- 1.2b Gabor → per-filter spatial statistics (mean, std, skew, kurtosis) → S^127
- 32 filters × 4 stats = 128-d, matches conv encoder output dim
- 1.2c Gabor vs scattering order 1 A/B test
- Validate that scattering order 1 ≈ Gabor + modulus
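The 1.2a bank follows directly from the formula above. A minimal numpy sketch; kernel size 11, σ=3.0, γ=0.5 are untuned placeholder choices, and FFT convolution is circular, which is acceptable for pooled statistics:

```python
import numpy as np

def gabor_kernel(size, theta, lam, sigma=3.0, gamma=0.5, psi=0.0):
    """Real part of g(x,y) at orientation theta, wavelength lam."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xp = x * np.cos(theta) + y * np.sin(theta)    # rotated coords x', y'
    yp = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(xp**2 + gamma**2 * yp**2) / (2 * sigma**2)) \
        * np.cos(2 * np.pi * xp / lam + psi)

# 4 scales x 8 orientations = 32 deterministic filters
bank = [gabor_kernel(11, th, lam)
        for lam in (2.0, 4.0, 8.0, 16.0)
        for th in np.linspace(0, np.pi, 8, endpoint=False)]

def filter_response(img, k):
    """FFT-based (circular) convolution magnitude map."""
    H, W = img.shape
    return np.abs(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(k, s=(H, W))))

def gabor_features(img, bank):
    """Mean |response| per filter, L2-normalized -> one point on S^31 (1.2a)."""
    f = np.array([filter_response(img, k).mean() for k in bank])
    return f / np.linalg.norm(f)

feats = gabor_features(np.random.default_rng(0).random((32, 32)), bank)
```

For 1.2b, swap the single `.mean()` pool for (mean, std, skew, kurtosis) per response map to reach 128-d.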
1.3 Radon Transform
Formula: Rf(ω,t) = ∫ f(x) δ(x·ω − t) dx
Properties: Deterministic, exactly invertible via filtered back-projection
- 1.3a Radon at K angles → sinogram → L2 norm per angle → K points on S^d
- K angles = K geometric addresses, constellation measures the cloud
- 1.3b Radon → 1D wavelet per projection (= ridgelet) → aggregate to S^d
- Composition: Radon → Ridgelet, captures linear singularities
1.4 Curvelet Transform
Formula: c_{j,l,k} = ⟨f, φ_{j,l,k}⟩, parabolic scaling: width ≈ length²
Properties: Deterministic, exactly invertible (tight frame), O(N² log N)
- 1.4a Curvelet energy per (scale, orientation) band → L2 norm → S^d
- Captures directional frequency that scattering misses
- 1.4b Curvelet + scattering concatenation → JL → constellation
- Test complementarity of isotropic (scattering) + anisotropic (curvelet) features
1.5 Persistent Homology (TDA)
Formula: Track birth/death of β₀ (components), β₁ (loops) across filtration
Library: giotto-tda or ripser
Properties: Deterministic, O(n³), captures topology no other transform sees
- 1.5a Sublevel set filtration on grayscale → persistence image → L2 norm → S^d
- 1.5b PH on scattering feature maps (topology of the representation)
- Captures whether scattering features form clusters, loops, voids
- 1.5c PH Betti curve as additional channel in multi-signature pipeline
- 1.5d PH standalone classification baseline on CIFAR-10
- Literature suggests ~60-70% standalone; valuable as complementary signal
1.6 STFT Variants (improving v4)
- 1.6a 2D STFT via patch-wise FFT (overlapping patches) instead of row/col STFT
- True spatial-frequency decomposition vs row+col approximation
- 1.6b STFT with larger n_fft=32 (current: 16) → more frequency resolution
- 1.6c STFT preserving phase (not just magnitude) via analytic signal
- Phase encodes spatial structure; current pipeline discards it
- 1.6d Multi-window STFT (different window sizes for different frequency ranges)
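A sketch of 1.6a's patch-wise 2D STFT; the Hann window, 8×8 patch, and stride 4 are placeholder choices, not v4's settings:

```python
import numpy as np

def patch_stft2d(img, patch=8, stride=4):
    """Overlapping windowed patches -> 2D FFT magnitude per patch.

    A true spatial-frequency decomposition, unlike the row/col STFT in v4.
    """
    win = np.hanning(patch)[:, None] * np.hanning(patch)[None, :]
    H, W = img.shape
    mags = []
    for i in range(0, H - patch + 1, stride):
        for j in range(0, W - patch + 1, stride):
            spec = np.fft.fft2(img[i:i + patch, j:j + patch] * win)
            mags.append(np.abs(spec))          # 1.6c would keep phase here
    return np.stack(mags)                      # (n_patches, patch, patch)

spec = patch_stft2d(np.random.default_rng(0).random((32, 32)))
```

On a 32×32 input this yields a 7×7 grid of patches, i.e. 49 local spectra.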
CATEGORY 2: MANIFOLD STRUCTURES
2.1 Hopf Fibration
Formula: h(z₁,z₂) = (2z̄₁z₂, |z₁|² − |z₂|²) : S³ → S²
Properties: Deterministic, O(1), hierarchical (base + fiber)
- 2.1a Encode 4-d feature vectors on S³ → Hopf project to S² + fiber coordinate
- Coarse triangulation on S², fine discrimination in fiber
- 2.1b Quaternionic Hopf S⁷ → S⁴ for 8-d features
- Natural for 8-channel spectral decomposition (v3/v4 channel count)
- 2.1c Hopf foliation spherical codes for anchor initialization
- Replace uniform_hypersphere_init with Hopf-structured codes
- 2.1d Hierarchical constellation: coarse anchors on base S², fine anchors per fiber
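The 2.1a Hopf projection is a two-line computation. Sketch below; the phase of z₁ as fiber coordinate is one arbitrary (gauge-dependent) choice, not the only one:

```python
import numpy as np

def hopf(q):
    """Hopf map S^3 -> S^2: read q = (x1, y1, x2, y2) as (z1, z2) in C^2."""
    z1, z2 = q[0] + 1j * q[1], q[2] + 1j * q[3]
    w = 2 * np.conj(z1) * z2                      # complex -> two real coords
    return np.array([w.real, w.imag, abs(z1)**2 - abs(z2)**2])

rng = np.random.default_rng(0)
q = rng.standard_normal(4)
q /= np.linalg.norm(q)                 # a point on S^3
base = hopf(q)                         # its image on S^2 (coarse address)
fiber = np.angle(q[0] + 1j * q[1])     # one choice of fiber coordinate (fine)
```

Since |2z̄₁z₂|² + (|z₁|²−|z₂|²)² = (|z₁|²+|z₂|²)², unit input guarantees unit output.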
2.2 Grassmannian Class Representations
Formula: Class = k-dim subspace of ℝⁿ, distances via principal angles
Properties: Requires SVD, O(nk²)
- 2.2a Replace class vectors with class subspaces on Gr(k,n)
- Each class owns a k-dim subspace; classification = nearest subspace
- Literature: +1.3% on ImageNet over single class vectors
- 2.2b Grassmannian distance metrics ablation: geodesic vs chordal vs projection
- 2.2c Per-class anchor subspace: each anchor defines a subspace, not a point
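Nearest-subspace classification in 2.2a reduces to principal angles, computed via QR + SVD; a minimal sketch using the geodesic (arc-length) metric from the 2.2b ablation list:

```python
import numpy as np

def principal_angles(A, B):
    """Principal angles between subspaces spanned by the columns of A and B."""
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)   # cosines of the angles
    return np.arccos(np.clip(s, -1.0, 1.0))

def grassmann_geodesic(A, B):
    """Geodesic distance on Gr(k, n): L2 norm of the principal angle vector."""
    return np.linalg.norm(principal_angles(A, B))

I4 = np.eye(4)
```

Chordal and projection distances (2.2b) are different functions of the same angle vector, so this one computation covers the whole ablation.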
2.3 Flag Manifold (Nested Subspace Hierarchy)
Formula: V₁ ⊂ V₂ ⊂ ... ⊂ Vₖ, nested subspaces
Properties: Generalizes Grassmannian, natural for multi-resolution
- 2.3a Flag decomposition of frequency channels (DC ⊂ low ⊂ mid ⊂ high)
- Test whether nesting constraint improves spectral encoder
- 2.3b Flag-structured anchors: coarse-to-fine anchor hierarchy
2.4 Von Mises-Fisher Mixture
Formula: f(x; μ, κ) = C_p(κ) exp(κ μᵀx), soft clustering on S^d
Properties: Natural density model for hyperspherical data
- 2.4a Replace hard nearest-anchor assignment with vMF soft posteriors
- p(j|x) = α_j f(x;μ_j,κ_j) / Σ α_k f(x;μ_k,κ_k)
- Learned κ per anchor = adaptive influence radius
- 2.4b vMF mixture EM for anchor initialization (replace uniform hypersphere init)
- 2.4c vMF concentration κ as a diagnostic: track per-class κ convergence
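A sketch of the 2.4a soft posterior. Simplification to flag loudly: the dimension-dependent normalizer log C_p(κ_j) is omitted, which is exact only when all κ_j are equal; the full model needs it once κ is learned per anchor:

```python
import numpy as np

def vmf_posteriors(x, mus, kappas, log_alphas=None):
    """p(j|x) ∝ α_j exp(κ_j μ_jᵀ x): softmax over vMF log-densities.

    NOTE: log C_p(kappa_j) is dropped -- exact only for shared kappa.
    """
    logits = kappas * (mus @ x)                # κ_j μ_jᵀ x per anchor
    if log_alphas is not None:
        logits = logits + log_alphas
    logits = logits - logits.max()             # numerical stability
    w = np.exp(logits)
    return w / w.sum()

# unit test case: x sits exactly on the first of three orthogonal anchors
p = vmf_posteriors(np.array([1.0, 0.0, 0.0]), np.eye(3), np.full(3, 5.0))
```

High κ sharpens toward hard nearest-anchor assignment; κ → 0 recovers uniform weights, so 2.4a strictly generalizes the current hard assignment.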
2.5 Optimal Anchor Placement
- 2.5a E₈ lattice anchors for 8-d constellation (240 maximally separated points)
- 2.5b Spherical t-design initialization vs uniform hypersphere init
- 2.5c Thomson problem solver for N anchors on S^d (energy minimization)
- Compare: QR + iterative repulsion (current) vs Coulomb energy minimization
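The 2.5c Coulomb alternative can be sketched as projected gradient descent on pairwise 1/r energy; step size and iteration count below are untuned placeholders:

```python
import numpy as np

def thomson_anchors(n, d, steps=200, lr=0.05, seed=0):
    """Spread n anchors on S^{d-1} by Coulomb (1/r^2) repulsion."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    for _ in range(steps):
        diff = X[:, None, :] - X[None, :, :]          # pairwise displacement
        dist = np.linalg.norm(diff, axis=-1)
        np.fill_diagonal(dist, np.inf)                # ignore self-interaction
        force = (diff / dist[..., None] ** 3).sum(axis=1)   # sum of 1/r^2 pushes
        X += lr * force
        X /= np.linalg.norm(X, axis=1, keepdims=True)       # project back to sphere
    return X

X = thomson_anchors(8, 3)
```

Comparing its minimum pairwise separation against the current QR + iterative repulsion init is the 2.5c experiment.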
CATEGORY 3: COMPACT REPRESENTATIONS
3.1 Random Fourier Features
Formula: z(x) = √(2/D) [cos(ω₁ᵀx + b₁), ..., cos(ω_Dᵀx + b_D)]
Properties: Pseudo-deterministic, preserves kernel structure, maps to S^d via cos/sin
- 3.1a RFF on raw pixels → S^d → constellation
- Baseline: how much does nonlinear kernel approximation help raw pixels?
- 3.1b RFF on scattering features → constellation
- Composition: scattering (linear invariants) → RFF (nonlinear kernel)
- 3.1c Fourier feature positional encoding (Tancik/Mildenhall style)
- γ(v) = [cos(2πBv), sin(2πBv)]ᵀ explicitly maps to hypersphere
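A minimal RFF sketch for 3.1a/3.1b, targeting the RBF kernel exp(−γ‖x−y‖²); D and γ below are placeholder values to be swept:

```python
import numpy as np

def rff(X, D=2048, gamma=0.05, seed=0):
    """Random Fourier features: z(x)ᵀz(y) ≈ exp(-gamma ||x - y||^2)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, D))   # ω ~ N(0, 2γI)
    b = rng.uniform(0, 2 * np.pi, D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 16))
Z = rff(X)
K_true = np.exp(-0.05 * np.sum((X[:, None] - X[None, :]) ** 2, axis=-1))
```

The approximation error decays as O(1/√D), so D is the accuracy/compute knob for the 3.1 experiments.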
3.2 Johnson-Lindenstrauss Projection
Formula: f(x) = (1/√k)Ax, preserves distances with k = O(ε⁻² log n)
Properties: Pseudo-deterministic, near-isometric
- 3.2a JL from scattering (~10K) to 128-d → L2 norm → constellation
- Test: does JL + L2 norm preserve enough structure?
- 3.2b JL target dimension sweep: 32, 64, 128, 256, 512
- Find minimum k where constellation accuracy saturates
- 3.2c Fast JL (randomized Hadamard) vs Gaussian JL speed/accuracy tradeoff
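The Gaussian JL projection in 3.2a is one matrix multiply; sketch below with a 5000-d input standing in for the ~10K-d scattering vector:

```python
import numpy as np

def jl_project(X, k, seed=0):
    """Gaussian JL map (1/sqrt(k)) A x: near-isometric for k = O(eps^-2 log n)."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((X.shape[1], k)) / np.sqrt(k)
    return X @ A

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5000))   # stand-in for scattering features
Y = jl_project(X, 256)                # one point in the 3.2b dimension sweep
```

Measuring the worst-case pairwise distance ratio at each k in {32, ..., 512} is exactly the 3.2b sweep.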
3.3 Compressed Sensing on Scattering Coefficients
Formula: y = Φx, recover via ℓ₁ minimization if x is k-sparse
Properties: Exact recovery for sparse signals, O(k log(N/k)) measurements
- 3.3a Measure sparsity of scattering coefficients (how many are near-zero?)
- If sparse: CS can compress much more than JL
- 3.3b CS measurement matrix → L2 norm → constellation
- Compare: CS vs JL at same target dimension
3.4 Spherical Harmonics
Formula: Y_l^m(θ,φ), complete basis on S², (l_max+1)² coefficients
Properties: Deterministic, native Fourier on sphere, exactly invertible
- 3.4a Expand constellation triangulation profile in spherical harmonics
- Which angular frequencies carry discriminative info?
- 3.4b Spherical harmonic coefficients of embedding distribution as class signature
- 3.4c Hyperspherical harmonics for S^15 and S^43 (higher-dim generalization)
CATEGORY 4: INVERTIBLE GEOMETRIC TRANSFORMS
4.1 Stereographic Projection
Formula: σ(x) = x_{1:n}/(1−x_{n+1}), σ⁻¹(y) = (2y, ‖y‖²−1)/(‖y‖²+1)
Properties: Conformal bijection S^n \ {pole} ↔ ℝⁿ, preserves angles
- 4.1a Stereographic → Euclidean scattering → inverse stereographic → S^d
- Apply scattering in flat space, project back to sphere
- 4.1b Stereographic projection as constellation readout alternative
- Instead of triangulation distances, read local coordinates via stereographic
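Both 4.1 directions need the projection pair; a direct transcription of the formulas above, with a guard away from the north pole where σ is undefined:

```python
import numpy as np

def stereo(x):
    """Stereographic projection S^n \\ {north pole} -> R^n."""
    return x[:-1] / (1.0 - x[-1])

def stereo_inv(y):
    """Inverse projection R^n -> S^n: (2y, ||y||^2 - 1) / (||y||^2 + 1)."""
    n2 = y @ y
    return np.concatenate([2.0 * y, [n2 - 1.0]]) / (n2 + 1.0)

rng = np.random.default_rng(0)
x = rng.standard_normal(5)
x /= np.linalg.norm(x)
if x[-1] > 0.9:          # stay away from the pole, where stereo blows up
    x = -x
x_back = stereo_inv(stereo(x))
```

The round trip is exact away from the pole, which is what 4.1a relies on when moving scattering into flat space and back.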
4.2 Exponential / Logarithmic Maps
Formula: exp_p(v) = cos(‖v‖)·p + sin(‖v‖)·v/‖v‖
Formula: log_p(q) = arccos(⟨q,p⟩) · (q−⟨q,p⟩p)/‖q−⟨q,p⟩p‖
Properties: Deterministic, locally invertible, O(n)
- 4.2a Replace triangulation (1−cos) with log map coordinates at each anchor
- Log map gives direction + distance in tangent space (richer than scalar distance)
- Each anchor contributes d-dim tangent vector instead of 1-d distance
- 4.2b Log map triangulation → parallel transport to common tangent space → aggregate
- Geometrically principled alternative to patchwork concatenation
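The 4.2a readout only needs these two maps; a direct numpy transcription of the formulas above (degenerate q = ±p cases are guarded, not handled rigorously):

```python
import numpy as np

def log_map(p, q):
    """Tangent vector at p pointing toward q, with length = geodesic distance."""
    c = np.clip(p @ q, -1.0, 1.0)
    u = q - c * p                       # component of q orthogonal to p
    nu = np.linalg.norm(u)
    if nu < 1e-12:                      # q = ±p: log map degenerate
        return np.zeros_like(p)
    return np.arccos(c) * u / nu

def exp_map(p, v):
    """Shoot from p along tangent v; inverse of log_map within one hemisphere."""
    nv = np.linalg.norm(v)
    if nv < 1e-12:
        return p.copy()
    return np.cos(nv) * p + np.sin(nv) * v / nv

rng = np.random.default_rng(0)
p = rng.standard_normal(8)
q = rng.standard_normal(8)
p /= np.linalg.norm(p)
q /= np.linalg.norm(q)
v = log_map(p, q)
```

Per anchor this yields a d-dim tangent vector rather than the scalar 1−cos, which is the extra information 4.2a bets on.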
4.3 Parallel Transport
Formula: Γ^q_p(v) = v − (⟨v,q⟩/(1+⟨p,q⟩))·(p+q) on S^n
Properties: Isometric between tangent spaces, exactly invertible
- 4.3a Compute log maps at K anchors → parallel transport all to north pole → aggregate
- Creates a canonical tangent-space representation independent of anchor positions
- 4.3b Parallel transport as inter-anchor communication in constellation
- How does the same input look from different anchor tangent spaces?
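The transport in 4.3a is a single rank-one correction; sketch, with the two properties worth asserting (result is tangent at q, norm is preserved):

```python
import numpy as np

def transport(p, q, v):
    """Parallel transport of tangent v at p to the tangent space at q on S^n."""
    return v - ((q @ v) / (1.0 + p @ q)) * (p + q)

rng = np.random.default_rng(0)
p = rng.standard_normal(6)
q = rng.standard_normal(6)
p /= np.linalg.norm(p)
q /= np.linalg.norm(q)
v = rng.standard_normal(6)
v -= (p @ v) * p                 # make v tangent at p
w = transport(p, q, v)           # same vector, now living in T_q S^5
```

This is what lets 4.3a pool K anchor-local log maps in one canonical tangent space (e.g. at the north pole) instead of concatenating incompatible coordinates.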
4.4 Möbius Transformations
Formula: h_ω(z) = ((1−‖ω‖²)/‖z−ω‖²)·(z−ω) − ω, for ‖ω‖ < 1
Properties: Conformal automorphism of S^d, invertible, O(d)
- 4.4a Möbius "geometric attention": transform sphere to zoom into anchor regions
- Expand region near anchor, compress far regions
- Each anchor applies its own Möbius transform before measuring distance
- 4.4b Composition of Möbius transforms as normalizing flow on S^d
- Learned flow that warps embedding distribution toward better separation
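A sketch of the 4.4a transform; one can verify algebraically that ‖h_ω(z)‖² = (1−‖ω‖²) + ‖ω‖² = 1 whenever ‖z‖ = 1, so the map keeps embeddings on the sphere:

```python
import numpy as np

def mobius(z, w):
    """Conformal automorphism of S^d centered at w (||w|| < 1).

    Zooms into the sphere region near w's antipodal direction while
    compressing the rest -- the 'geometric attention' warp of 4.4a.
    """
    d2 = np.sum((z - w) ** 2)
    return (1.0 - w @ w) / d2 * (z - w) - w

rng = np.random.default_rng(0)
z = rng.standard_normal(5)
z /= np.linalg.norm(z)
w = rng.standard_normal(5)
w *= 0.3 / np.linalg.norm(w)     # attention center, strictly inside the ball
hz = mobius(z, w)
```

For 4.4a, each anchor would own its own w; for 4.4b, compositions of these maps form the flow.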
4.5 Procrustes + Polar Decomposition
Formula: R* = argmin_R ‖RA−B‖_F = UVᵀ from SVD(BAᵀ)
Formula: A = UP (rotation × stretch)
- 4.5a Procrustes-align channel cloud to canonical pose before Cholesky/SVD
- Remove rotation variability, isolate shape information
- 4.5b Polar decomposition of channel matrix: U (rotation) + P (stretch) as separate features
- U encodes orientation of frequency cloud; P encodes shape/scale
- Both are geometric, both are deterministic from the channel matrix
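Both 4.5 items are a few SVD lines; sketch below, with the Kabsch-style sign flip to keep the Procrustes solution a proper rotation:

```python
import numpy as np

def procrustes_rotation(A, B):
    """R* minimizing ||R A - B||_F over rotations, via SVD of B A^T."""
    U, _, Vt = np.linalg.svd(B @ A.T)
    if np.linalg.det(U @ Vt) < 0:
        U[:, -1] *= -1               # constrain to det = +1 (proper rotation)
    return U @ Vt

def polar(A):
    """Polar decomposition A = U P: U orthogonal (pose), P sym. PSD (stretch)."""
    W, s, Vt = np.linalg.svd(A)
    U = W @ Vt
    P = Vt.T @ np.diag(s) @ Vt
    return U, P

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 10))
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
if np.linalg.det(Q) < 0:
    Q[:, 0] *= -1                    # random proper rotation as ground truth
B = Q @ A                            # rotated copy of the channel cloud
R = procrustes_rotation(A, B)
A2 = rng.standard_normal((4, 4))
U, P = polar(A2)
```

For 4.5a, R aligns the channel cloud before the Cholesky/SVD signature; for 4.5b, U and P are themselves the two separate features.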
CATEGORY 5: MATRIX DECOMPOSITION SIGNATURES
5.1 Already Tested
- Cholesky of Gram matrix → 36 lower-tri values (in v4, working)
- SVD singular values → 8 values (in v4, working)
- Concatenated 44-d signature on S^43 → 46.8% with CE-only
5.2 Remaining Decompositions
- 5.2a QR decomposition: Q (rotation) and R diagonal (scale per channel)
- R diagonal = per-channel magnitude; Q = inter-channel angular structure
- 5.2b Schur decomposition: T diagonal = eigenvalues, T off-diagonal = coupling
- For the Gram matrix: Schur gives eigenstructure in triangular form
- 5.2c Eigendecomposition of Gram: eigenvalues as spectral signature
- Compare: eigenvalues vs SVD singular values vs Cholesky diagonal
- These are related but not identical (λ_i = σ_i² for Gram = AᵀA)
- 5.2d NMF of magnitude spectrum: parts-based decomposition
- Requires iterative optimization (not fully deterministic)
- But finds additive, non-negative parts — texture components
- 5.2e Tucker tensor decomposition of spatial×frequency×channel tensor
- 3D structure: (H, W, freq_bins) per color channel
- Core tensor encodes interactions between spatial, frequency, channel modes
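The 5.2c relationship (λ_i(G) = σ_i(A)² for G = AAᵀ) is worth pinning down numerically before the ablation; sketch, with the 8×20 channel-matrix shape as an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 20))   # 8 channels x 20 samples (shape assumed)
G = A @ A.T                        # Gram matrix of the channel cloud

eigvals = np.sort(np.linalg.eigvalsh(G))[::-1]    # eigenvalues, descending
svals = np.linalg.svd(A, compute_uv=False)        # singular values of A
L = np.linalg.cholesky(G)                         # v4's lower-triangular factor

# lambda_i(G) = sigma_i(A)^2 and L L^T = G: the three signatures are
# algebraically linked but expose different coordinates of the same geometry.
```

So the 5.2c ablation is not testing independent information sources; it is testing which parameterization of the same spectrum the constellation can use best.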
CATEGORY 6: INFORMATION-THEORETIC LOSSES
6.1 Already Tested
- InfoNCE (self-contrastive, two augmented views) — dead at 0.15 in spectral v4
- CosineEmbeddingLoss — frozen at 0.346 (margin-saturated)
- CV loss (Cayley-Menger volume) — running but not in 0.18-0.25 band
6.2 Loss Modifications
- 6.2a Drop contrastive losses entirely, CE-only + geometric losses
- v4 shows CE is the only contributor; contrastive is dead weight
- Hypothesis: removing dead losses may speed convergence
- 6.2b Class-conditional InfoNCE: positive = same class, not same image
- Requires labels but gives much stronger supervision signal
- 6.2c vMF-based contrastive loss: replace dot-product similarity with vMF log-likelihood
- κ-adaptive: high-κ for nearby pairs, low-κ for far pairs
- 6.2d Fisher-Rao distance as loss: d_FR(p,q) = 2·arccos(∫√(pq))
- Natural distance for distributions on the sphere
- 6.2e Sliced spherical Wasserstein distance as distribution matching loss
- Matches embedding distribution to target (e.g., uniform on sphere)
- 6.2f Geometric autograd (from GM3): tangential projection + separation preservation
- Adam + geometric autograd > AdamW on geometric tasks (proven)
- Operates on gradient direction, not loss value
6.3 Anchor Management
- 6.3a Anchor push frequency sweep: every 10, 25, 50, 100, 200 batches
- 6.3b Anchor push with vMF-weighted centroids instead of hard class centroids
- 6.3c Anchor birth/death: add anchors where density is high, remove where unused
- 6.3d Anchor dropout sweep: 0%, 5%, 15%, 30%, 50%
CATEGORY 7: COMPOSITE PIPELINE TESTS
7.1 The Reference Pipeline (from research article)
- 7.1a Scattering(J=2,L=8) → JL(128) → L2 norm → constellation(64) → classify
- The "canonical" pipeline; expected ~75-80% based on literature
- 7.1b Same as 7.1a but with learned 2-layer projection replacing JL
- Minimal learned params (~16K), test if projection adaptation matters
- 7.1c Scattering → curvelet energy → concat → JL → constellation
- Test complementarity
7.2 Hybrid: Spectral + Scattering
- 7.2a STFT channels (v4) + scattering features → concat → JL → S^d → constellation
- STFT gives spatial-frequency; scattering gives multi-scale invariants
- 7.2b Scattering → Cholesky Gram + SVD signature → constellation
- Apply v4's geometric signature to scattering output instead of STFT
7.3 Multi-Signature Constellation
- 7.3a Parallel extraction: scattering + Gabor + Radon → separate constellations → fusion
- Each primitive captures different geometric aspect
- Fusion: concatenate patchwork outputs → shared classifier
- 7.3b Hierarchical constellation: scattering → coarse anchors → residual → fine anchors
- Two-stage: first stage identifies broad category, second refines
7.4 Minimal Learned Params Tests
- 7.4a Best deterministic pipeline + 1 learned linear layer (d_in → 128) before constellation
- Measure: how much does a single projection layer help?
- Count: exact learned param count
- 7.4b Same as 7.4a but with SquaredReLU + LayerNorm (the proven patchwork block)
- 7.4c Sweep learned projection sizes: 0, 1K, 5K, 10K, 50K, 100K params
- Find the elbow where adding params stops helping
PRIORITY QUEUE (recommended execution order)
Tier 1: Highest Expected Impact
- 1.1a — Scattering + flat constellation (the literature leader)
- 1.1b — Scattering + JL → S^127 + constellation
- 6.2a — Drop dead contrastive losses from v4, measure CE-only ceiling
- 2.4a — vMF soft assignment replacing hard nearest-anchor
- 4.2a — Log map triangulation (richer than scalar distance)
Tier 2: High Expected Impact
- 7.1a — Full reference pipeline
- 1.1f — Scattering hybrid with minimal learned projection
- 1.2b — Gabor spatial statistics → S^127
- 5.2c — Eigendecomposition vs SVD vs Cholesky ablation
- 2.1b — Quaternionic Hopf S⁷→S⁴ for 8-channel data
Tier 3: Exploratory
- 1.5a — Persistent homology standalone
- 3.1b — RFF on scattering features
- 4.4a — Möbius geometric attention
- 7.3a — Multi-signature parallel constellations
- 2.2a — Grassmannian class subspaces
Tier 4: Deep Exploration
- 1.3a — Radon cloud on S^d
- 1.4b — Curvelet + scattering concat
- 2.3a — Flag decomposition of frequency channels
- 4.3a — Parallel transport aggregation
- 3.4c — Hyperspherical harmonics analysis
RUNNING SCOREBOARD
| Experiment | Val Acc | Params (learned) | CV | Anchors Active | InfoNCE | Key Finding |
|---|---|---|---|---|---|---|
| Linear baseline | 67.0% | 423K | — | — | — | Overfits E31 |
| MLP baseline | 65.0% | 687K | — | — | — | Overfits E10 |
| Core CE-only | 63.4% | 820K | 0.70 | — | — | CV never converges |
| Core CE+CV | 62.7% | 820K | 0.61 | — | — | CV hurts accuracy |
| Full GELU | 88.0% | 1.6M | 0.14-0.17 | 64/64 | 1.00 | Reference |
| Full SquaredReLU | 88.0% | 1.6M | 0.15 | 64/64 | 1.00 | Matches GELU |
| Spectral v1 (flat FFT) | FAIL | — | — | 1/64 | — | Norm mismatch |
| Spectral v2 (per-band) | ~35% | 1.2M | 0.17-0.19 | 900/3072 | 0.45 | Too diffuse |
| Spectral v3 (sph mean) | ~27% | 130K | 0.27-0.34 | 110/128 | 0.35 | Collapsed to point |
| Spectral v4 (STFT+Chol+SVD) | 46.8% | 137K | 0.52-0.66 | 53/64 | 0.15 | CE-only carry |
| Scattering baseline | ~82%* | 0 | — | — | — | Literature (SVM) |
Entries marked with * are literature values, not our runs
NOTES & INSIGHTS
Why contrastive losses die on deterministic encoders
The STFT/FFT faithfully reports every pixel-level difference between augmented views. Two crops of the same image produce signatures as different as two different images. Without a learned layer to absorb augmentation variance, InfoNCE has nothing to align. Solutions: (a) augmentation-invariant features (scattering), (b) thin learned projection, (c) class-conditional contrastive (6.2b), (d) drop contrastive entirely (6.2a).
The Cholesky insight
L diagonal encodes "new angular information per tier given all lower tiers." This IS discriminative (proved by v4 reaching 46.8% with CE alone). The 44-d signature on S^43 carries real inter-channel geometry. Next question: is the STFT front-end the bottleneck, or the 44-d signature?
Scattering is the clear next step
82% on CIFAR-10 with zero learned params (literature) vs our 46.8%. Scattering is translation-invariant AND deformation-stable (Lipschitz). This directly addresses the augmentation sensitivity problem. kymatio provides GPU-accelerated PyTorch implementation.
The dimension question
S^15 (band_dim=16) vs S^43 (signature) vs S^127 (conv encoder output)
E₈ lattice gives 240 optimal anchors on S^7
Proven CV attractor at ~0.20 is on S^15
Need to test which target sphere dimension is optimal for spectral features
Last updated: 2026-03-18, session with Opus Next: run scattering baseline (1.1a), then decide pipeline direction