GeoLIP Spectral Encoder — Test Manifest
Geometric Primitives for Constellation-Anchored Classification
Target: CIFAR-10 (baseline), then generalize
Constraint: zero or minimal learned encoder params; all learning lives in constellation anchors, patchwork, classifier
Metric: val accuracy, CV convergence, anchor activation, InfoNCE lock, train/val gap
Baseline to beat: 88.0% (conv encoder + SquaredReLU + full trainer, 1.6M params)
Current best spectral: 46.8% (STFT + Cholesky + SVD, v4, 137K params, CE-only carry)
STATUS KEY
- [ ] — Not started
- [R] — Running
- [X] — Completed
- [F] — Failed (with reason)
- [S] — Skipped (with reason)
- [P] — Partially completed
COMPLETED EXPERIMENTS (prior sessions + this session)
Conv Encoder Baselines (Form 1 Core)
- Linear baseline, 100 epochs → 67.0%, 422K params, overfits at E31
- MLP baseline, 100 epochs → 65.0%, 687K params, overfits at E10
- Core CE-only, 100 epochs → 63.4%, 820K params, CV=0.70, never converges
- Core CE+CV, 100 epochs → 62.7%, 820K params, CV=0.61, worse than CE-only
- Core 32 anchors, interrupted E20 → 59.2%, 1.8M params, slow convergence
- Full trainer GELU, 100 epochs → 88.0%, 1.6M params (original proven result)
- Full trainer SquaredReLU, 100 epochs → 88.0%, 1.6M params, E96 best
Spectral Encoder Experiments
- [F] Spectral v1: flat FFT → 768-d → single constellation → collapsed
- Cause: concat norm √48≈6.93 vs anchor norm 1, not on same sphere
- [F] Spectral v2: per-band constellation (48×64=3072 anchors) → ~35%
- Cause: 3072 tri dims too diffuse, InfoNCE dead at 0.45, no cross-band structure
- [F] Spectral v3: FFT → 8 channels (spherical mean) → 128 anchors → 27%
- Cause: cos≈0.99, spherical mean collapsed all images to same point
- [P] Spectral v4: STFT + Cholesky + SVD → S^43 → 64 anchors → 46.8% (still running)
- CE carrying alone, CosineEmbeddingLoss frozen at 0.346, InfoNCE dead at 0.15
- Cholesky+SVD signature IS discriminative, contrastive losses unable to contribute
CATEGORY 1: SIGNAL DECOMPOSITION TO GEOMETRY
1.1 Wavelet Scattering Transform (Mallat)
Formula: S_J[p]x(u) = |||x * ψ_{λ₁}| * ψ_{λ₂}| ⋯ | * φ_{2^J}(u)
Library: kymatio (pip install kymatio)
GitHub: https://github.com/kymatio/kymatio
Expected output: ~10K-dim feature vector for 32×32
Literature baseline: ~82% CIFAR-10 with SVM, ~70.5% with linear
Properties: Deterministic, Lipschitz-continuous, approximately energy-preserving
- 1.1a Scattering order 2, J=2, L=8 → L2 normalize → flat constellation on S^d
- Hypothesis: scattering features are rich enough that flat constellation should work
- Compare: direct linear classifier on scattering vs constellation pipeline
- 1.1b Scattering → JL projection to S^127 → constellation (64 anchors)
- JL preserves distances; S^127 matches our proven dim
- 1.1c Scattering → JL → S^43 → Cholesky/SVD signature → constellation
- Stack v4's geometric signature on top of scattering features
- 1.1d Scattering order 1 vs order 2 ablation
- Order 1 is ~Gabor magnitude; order 2 adds inter-frequency structure
- 1.1e Scattering + InfoNCE: does augmentation invariance help or hurt?
- Scattering is already translation-invariant; InfoNCE may be redundant
- 1.1f Scattering hybrid: scattering front-end + lightweight learned projection + constellation
- Test minimal learned params needed to bridge the 82→88% gap
1.2 Gabor Filter Banks
Formula: g(x,y) = exp(−(x′² + γ²y′²)/(2σ²)) · exp(i(2πx′/λ + ψ))
Expected: S scales × K orientations → S×K magnitude responses
Properties: Deterministic, O(N·S·K), first-order scattering ≈ Gabor modulus
- 1.2a Gabor bank (4 scales × 8 orientations = 32 filters) → L2 norm → S^31
- Each filter response is a spatial map; pool to scalar per filter
- 1.2b Gabor → per-filter spatial statistics (mean, std, skew, kurtosis) → S^127
- 32 filters × 4 stats = 128-d, matches conv encoder output dim
- 1.2c Gabor vs scattering order 1 A/B test
- Validate that scattering order 1 ≈ Gabor + modulus
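The 1.2a bank follows directly from the formula above. A minimal numpy sketch; kernel size 11, σ=3.0, γ=0.5 are untuned placeholder choices, and FFT convolution is circular, which is acceptable for pooled statistics:

```python
import numpy as np

def gabor_kernel(size, theta, lam, sigma=3.0, gamma=0.5, psi=0.0):
    """Real part of g(x,y) at orientation theta, wavelength lam."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xp = x * np.cos(theta) + y * np.sin(theta)    # rotated coords x', y'
    yp = -x * np.sin(theta) + y * np.cos(theta)
    return np.exp(-(xp**2 + gamma**2 * yp**2) / (2 * sigma**2)) \
        * np.cos(2 * np.pi * xp / lam + psi)

# 4 scales x 8 orientations = 32 deterministic filters
bank = [gabor_kernel(11, th, lam)
        for lam in (2.0, 4.0, 8.0, 16.0)
        for th in np.linspace(0, np.pi, 8, endpoint=False)]

def filter_response(img, k):
    """FFT-based (circular) convolution magnitude map."""
    H, W = img.shape
    return np.abs(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(k, s=(H, W))))

def gabor_features(img, bank):
    """Mean |response| per filter, L2-normalized -> one point on S^31 (1.2a)."""
    f = np.array([filter_response(img, k).mean() for k in bank])
    return f / np.linalg.norm(f)

feats = gabor_features(np.random.default_rng(0).random((32, 32)), bank)
```

For 1.2b, swap the single `.mean()` pool for (mean, std, skew, kurtosis) per response map to reach 128-d.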
1.3 Radon Transform
Formula: Rf(ω,t) = ∫ f(x) δ(x·ω − t) dx
Properties: Deterministic, exactly invertible via filtered back-projection
- 1.3a Radon at K angles → sinogram → L2 norm per angle → K points on S^d
- K angles = K geometric addresses, constellation measures the cloud
- 1.3b Radon → 1D wavelet per projection (= ridgelet) → aggregate to S^d
- Composition: Radon → Ridgelet, captures linear singularities
1.4 Curvelet Transform
Formula: c_{j,l,k} = ⟨f, φ_{j,l,k}⟩, parabolic scaling: width ≈ length²
Properties: Deterministic, exactly invertible (tight frame), O(N² log N)
- 1.4a Curvelet energy per (scale, orientation) band → L2 norm → S^d
- Captures directional frequency that scattering misses
- 1.4b Curvelet + scattering concatenation → JL → constellation
- Test complementarity of isotropic (scattering) + anisotropic (curvelet) features
1.5 Persistent Homology (TDA)
Formula: Track birth/death of β₀ (components), β₁ (loops) across filtration
Library: giotto-tda or ripser
Properties: Deterministic, O(n³), captures topology no other transform sees
- 1.5a Sublevel set filtration on grayscale → persistence image → L2 norm → S^d
- 1.5b PH on scattering feature maps (topology of the representation)
- Captures whether scattering features form clusters, loops, voids
- 1.5c PH Betti curve as additional channel in multi-signature pipeline
- 1.5d PH standalone classification baseline on CIFAR-10
- Literature suggests ~60-70% standalone; valuable as complementary signal
1.6 STFT Variants (improving v4)
- 1.6a 2D STFT via patch-wise FFT (overlapping patches) instead of row/col STFT
- True spatial-frequency decomposition vs row+col approximation
- 1.6b STFT with larger n_fft=32 (current: 16) → more frequency resolution
- 1.6c STFT preserving phase (not just magnitude) via analytic signal
- Phase encodes spatial structure; current pipeline discards it
- 1.6d Multi-window STFT (different window sizes for different frequency ranges)
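A sketch of 1.6a's patch-wise 2D STFT; the Hann window, 8×8 patch, and stride 4 are placeholder choices, not v4's settings:

```python
import numpy as np

def patch_stft2d(img, patch=8, stride=4):
    """Overlapping windowed patches -> 2D FFT magnitude per patch.

    A true spatial-frequency decomposition, unlike the row/col STFT in v4.
    """
    win = np.hanning(patch)[:, None] * np.hanning(patch)[None, :]
    H, W = img.shape
    mags = []
    for i in range(0, H - patch + 1, stride):
        for j in range(0, W - patch + 1, stride):
            spec = np.fft.fft2(img[i:i + patch, j:j + patch] * win)
            mags.append(np.abs(spec))          # 1.6c would keep phase here
    return np.stack(mags)                      # (n_patches, patch, patch)

spec = patch_stft2d(np.random.default_rng(0).random((32, 32)))
```

On a 32×32 input this yields a 7×7 grid of patches, i.e. 49 local spectra.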
CATEGORY 2: MANIFOLD STRUCTURES
2.1 Hopf Fibration
Formula: h(z₁,z₂) = (2z̄₁z₂, |z₁|² − |z₂|²) : S³ → S²
Properties: Deterministic, O(1), hierarchical (base + fiber)
- 2.1a Encode 4-d feature vectors on S³ → Hopf project to S² + fiber coordinate
- Coarse triangulation on S², fine discrimination in fiber
- 2.1b Quaternionic Hopf S⁷ → S⁴ for 8-d features
- Natural for 8-channel spectral decomposition (v3/v4 channel count)
- 2.1c Hopf foliation spherical codes for anchor initialization
- Replace uniform_hypersphere_init with Hopf-structured codes
- 2.1d Hierarchical constellation: coarse anchors on base S², fine anchors per fiber
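The 2.1a Hopf projection is a two-line computation. Sketch below; the phase of z₁ as fiber coordinate is one arbitrary (gauge-dependent) choice, not the only one:

```python
import numpy as np

def hopf(q):
    """Hopf map S^3 -> S^2: read q = (x1, y1, x2, y2) as (z1, z2) in C^2."""
    z1, z2 = q[0] + 1j * q[1], q[2] + 1j * q[3]
    w = 2 * np.conj(z1) * z2                      # complex -> two real coords
    return np.array([w.real, w.imag, abs(z1)**2 - abs(z2)**2])

rng = np.random.default_rng(0)
q = rng.standard_normal(4)
q /= np.linalg.norm(q)                 # a point on S^3
base = hopf(q)                         # its image on S^2 (coarse address)
fiber = np.angle(q[0] + 1j * q[1])     # one choice of fiber coordinate (fine)
```

Since |2z̄₁z₂|² + (|z₁|²−|z₂|²)² = (|z₁|²+|z₂|²)², unit input guarantees unit output.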
2.2 Grassmannian Class Representations
Formula: Class = k-dim subspace of ℝⁿ, distances via principal angles
Properties: Requires SVD, O(nk²)
- 2.2a Replace class vectors with class subspaces on Gr(k,n)
- Each class owns a k-dim subspace; classification = nearest subspace
- Literature: +1.3% on ImageNet over single class vectors
- 2.2b Grassmannian distance metrics ablation: geodesic vs chordal vs projection
- 2.2c Per-class anchor subspace: each anchor defines a subspace, not a point
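Nearest-subspace classification in 2.2a reduces to principal angles, computed via QR + SVD; a minimal sketch using the geodesic (arc-length) metric from the 2.2b ablation list:

```python
import numpy as np

def principal_angles(A, B):
    """Principal angles between subspaces spanned by the columns of A and B."""
    Qa, _ = np.linalg.qr(A)
    Qb, _ = np.linalg.qr(B)
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)   # cosines of the angles
    return np.arccos(np.clip(s, -1.0, 1.0))

def grassmann_geodesic(A, B):
    """Geodesic distance on Gr(k, n): L2 norm of the principal angle vector."""
    return np.linalg.norm(principal_angles(A, B))

I4 = np.eye(4)
```

Chordal and projection distances (2.2b) are different functions of the same angle vector, so this one computation covers the whole ablation.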
2.3 Flag Manifold (Nested Subspace Hierarchy)
Formula: V₁ ⊂ V₂ ⊂ ... ⊂ Vₖ, nested subspaces
Properties: Generalizes Grassmannian, natural for multi-resolution
- 2.3a Flag decomposition of frequency channels (DC ⊂ low ⊂ mid ⊂ high)
- Test whether nesting constraint improves spectral encoder
- 2.3b Flag-structured anchors: coarse-to-fine anchor hierarchy
2.4 Von Mises-Fisher Mixture
Formula: f(x; μ, κ) = C_p(κ) exp(κ μᵀx), soft clustering on S^d
Properties: Natural density model for hyperspherical data
- 2.4a Replace hard nearest-anchor assignment with vMF soft posteriors
- p(j|x) = α_j f(x;μ_j,κ_j) / Σ α_k f(x;μ_k,κ_k)
- Learned κ per anchor = adaptive influence radius
- 2.4b vMF mixture EM for anchor initialization (replace uniform hypersphere init)
- 2.4c vMF concentration κ as a diagnostic: track per-class κ convergence
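A sketch of the 2.4a soft posterior. Simplification to flag loudly: the dimension-dependent normalizer log C_p(κ_j) is omitted, which is exact only when all κ_j are equal; the full model needs it once κ is learned per anchor:

```python
import numpy as np

def vmf_posteriors(x, mus, kappas, log_alphas=None):
    """p(j|x) ∝ α_j exp(κ_j μ_jᵀ x): softmax over vMF log-densities.

    NOTE: log C_p(kappa_j) is dropped -- exact only for shared kappa.
    """
    logits = kappas * (mus @ x)                # κ_j μ_jᵀ x per anchor
    if log_alphas is not None:
        logits = logits + log_alphas
    logits = logits - logits.max()             # numerical stability
    w = np.exp(logits)
    return w / w.sum()

# unit test case: x sits exactly on the first of three orthogonal anchors
p = vmf_posteriors(np.array([1.0, 0.0, 0.0]), np.eye(3), np.full(3, 5.0))
```

High κ sharpens toward hard nearest-anchor assignment; κ → 0 recovers uniform weights, so 2.4a strictly generalizes the current hard assignment.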
2.5 Optimal Anchor Placement
- 2.5a E₈ lattice anchors for 8-d constellation (240 maximally separated points)
- 2.5b Spherical t-design initialization vs uniform hypersphere init
- 2.5c Thomson problem solver for N anchors on S^d (energy minimization)
- Compare: QR + iterative repulsion (current) vs Coulomb energy minimization
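The 2.5c Coulomb alternative can be sketched as projected gradient descent on pairwise 1/r energy; step size and iteration count below are untuned placeholders:

```python
import numpy as np

def thomson_anchors(n, d, steps=200, lr=0.05, seed=0):
    """Spread n anchors on S^{d-1} by Coulomb (1/r^2) repulsion."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    for _ in range(steps):
        diff = X[:, None, :] - X[None, :, :]          # pairwise displacement
        dist = np.linalg.norm(diff, axis=-1)
        np.fill_diagonal(dist, np.inf)                # ignore self-interaction
        force = (diff / dist[..., None] ** 3).sum(axis=1)   # sum of 1/r^2 pushes
        X += lr * force
        X /= np.linalg.norm(X, axis=1, keepdims=True)       # project back to sphere
    return X

X = thomson_anchors(8, 3)
```

Comparing its minimum pairwise separation against the current QR + iterative repulsion init is the 2.5c experiment.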
CATEGORY 3: COMPACT REPRESENTATIONS
3.1 Random Fourier Features
Formula: z(x) = √(2/D) [cos(ω₁ᵀx + b₁), ..., cos(ω_Dᵀx + b_D)]
Properties: Pseudo-deterministic, preserves kernel structure, maps to S^d via cos/sin
- 3.1a RFF on raw pixels → S^d → constellation
- Baseline: how much does nonlinear kernel approximation help raw pixels?
- 3.1b RFF on scattering features → constellation
- Composition: scattering (linear invariants) → RFF (nonlinear kernel)
- 3.1c Fourier feature positional encoding (Tancik/Mildenhall style)
- γ(v) = [cos(2πBv), sin(2πBv)]ᵀ explicitly maps to hypersphere
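A minimal RFF sketch for 3.1a/3.1b, targeting the RBF kernel exp(−γ‖x−y‖²); D and γ below are placeholder values to be swept:

```python
import numpy as np

def rff(X, D=2048, gamma=0.05, seed=0):
    """Random Fourier features: z(x)ᵀz(y) ≈ exp(-gamma ||x - y||^2)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, D))   # ω ~ N(0, 2γI)
    b = rng.uniform(0, 2 * np.pi, D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

rng = np.random.default_rng(1)
X = rng.standard_normal((5, 16))
Z = rff(X)
K_true = np.exp(-0.05 * np.sum((X[:, None] - X[None, :]) ** 2, axis=-1))
```

The approximation error decays as O(1/√D), so D is the accuracy/compute knob for the 3.1 experiments.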
3.2 Johnson-Lindenstrauss Projection
Formula: f(x) = (1/√k)Ax, preserves distances with k = O(ε⁻² log n)
Properties: Pseudo-deterministic, near-isometric
- 3.2a JL from scattering (~10K) to 128-d → L2 norm → constellation
- Test: does JL + L2 norm preserve enough structure?
- 3.2b JL target dimension sweep: 32, 64, 128, 256, 512
- Find minimum k where constellation accuracy saturates
- 3.2c Fast JL (randomized Hadamard) vs Gaussian JL speed/accuracy tradeoff
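The Gaussian JL projection in 3.2a is one matrix multiply; sketch below with a 5000-d input standing in for the ~10K-d scattering vector:

```python
import numpy as np

def jl_project(X, k, seed=0):
    """Gaussian JL map (1/sqrt(k)) A x: near-isometric for k = O(eps^-2 log n)."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((X.shape[1], k)) / np.sqrt(k)
    return X @ A

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5000))   # stand-in for scattering features
Y = jl_project(X, 256)                # one point in the 3.2b dimension sweep
```

Measuring the worst-case pairwise distance ratio at each k in {32, ..., 512} is exactly the 3.2b sweep.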
3.3 Compressed Sensing on Scattering Coefficients
Formula: y = Φx, recover via ℓ₁ minimization if x is k-sparse
Properties: Exact recovery for sparse signals, O(k log(N/k)) measurements
- 3.3a Measure sparsity of scattering coefficients (how many are near-zero?)
- If sparse: CS can compress much more than JL
- 3.3b CS measurement matrix → L2 norm → constellation
- Compare: CS vs JL at same target dimension
3.4 Spherical Harmonics
Formula: Y_l^m(θ,φ), complete basis on S², (l_max+1)² coefficients
Properties: Deterministic, native Fourier on sphere, exactly invertible
- 3.4a Expand constellation triangulation profile in spherical harmonics
- Which angular frequencies carry discriminative info?
- 3.4b Spherical harmonic coefficients of embedding distribution as class signature
- 3.4c Hyperspherical harmonics for S^15 and S^43 (higher-dim generalization)
CATEGORY 4: INVERTIBLE GEOMETRIC TRANSFORMS
4.1 Stereographic Projection
Formula: σ(x) = x_{1:n}/(1−x_{n+1}), σ⁻¹(y) = (2y, ‖y‖²−1)/(‖y‖²+1)
Properties: Conformal bijection S^n \ {pole} ↔ ℝⁿ, preserves angles
- 4.1a Stereographic → Euclidean scattering → inverse stereographic → S^d
- Apply scattering in flat space, project back to sphere
- 4.1b Stereographic projection as constellation readout alternative
- Instead of triangulation distances, read local coordinates via stereographic
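Both 4.1 directions need the projection pair; a direct transcription of the formulas above, with a guard away from the north pole where σ is undefined:

```python
import numpy as np

def stereo(x):
    """Stereographic projection S^n \\ {north pole} -> R^n."""
    return x[:-1] / (1.0 - x[-1])

def stereo_inv(y):
    """Inverse projection R^n -> S^n: (2y, ||y||^2 - 1) / (||y||^2 + 1)."""
    n2 = y @ y
    return np.concatenate([2.0 * y, [n2 - 1.0]]) / (n2 + 1.0)

rng = np.random.default_rng(0)
x = rng.standard_normal(5)
x /= np.linalg.norm(x)
if x[-1] > 0.9:          # stay away from the pole, where stereo blows up
    x = -x
x_back = stereo_inv(stereo(x))
```

The round trip is exact away from the pole, which is what 4.1a relies on when moving scattering into flat space and back.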
4.2 Exponential / Logarithmic Maps
Formula: exp_p(v) = cos(‖v‖)·p + sin(‖v‖)·v/‖v‖
Formula: log_p(q) = arccos(⟨q,p⟩) · (q−⟨q,p⟩p)/‖q−⟨q,p⟩p‖
Properties: Deterministic, locally invertible, O(n)
- 4.2a Replace triangulation (1−cos) with log map coordinates at each anchor
- Log map gives direction + distance in tangent space (richer than scalar distance)
- Each anchor contributes d-dim tangent vector instead of 1-d distance
- 4.2b Log map triangulation → parallel transport to common tangent space → aggregate
- Geometrically principled alternative to patchwork concatenation
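The 4.2a readout only needs these two maps; a direct numpy transcription of the formulas above (degenerate q = ±p cases are guarded, not handled rigorously):

```python
import numpy as np

def log_map(p, q):
    """Tangent vector at p pointing toward q, with length = geodesic distance."""
    c = np.clip(p @ q, -1.0, 1.0)
    u = q - c * p                       # component of q orthogonal to p
    nu = np.linalg.norm(u)
    if nu < 1e-12:                      # q = ±p: log map degenerate
        return np.zeros_like(p)
    return np.arccos(c) * u / nu

def exp_map(p, v):
    """Shoot from p along tangent v; inverse of log_map within one hemisphere."""
    nv = np.linalg.norm(v)
    if nv < 1e-12:
        return p.copy()
    return np.cos(nv) * p + np.sin(nv) * v / nv

rng = np.random.default_rng(0)
p = rng.standard_normal(8)
q = rng.standard_normal(8)
p /= np.linalg.norm(p)
q /= np.linalg.norm(q)
v = log_map(p, q)
```

Per anchor this yields a d-dim tangent vector rather than the scalar 1−cos, which is the extra information 4.2a bets on.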
4.3 Parallel Transport
Formula: Γ^q_p(v) = v − (⟨v,q⟩/(1+⟨p,q⟩))·(p+q) on S^n
Properties: Isometric between tangent spaces, exactly invertible
- 4.3a Compute log maps at K anchors → parallel transport all to north pole → aggregate
- Creates a canonical tangent-space representation independent of anchor positions
- 4.3b Parallel transport as inter-anchor communication in constellation
- How does the same input look from different anchor tangent spaces?
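The transport in 4.3a is a single rank-one correction; sketch, with the two properties worth asserting (result is tangent at q, norm is preserved):

```python
import numpy as np

def transport(p, q, v):
    """Parallel transport of tangent v at p to the tangent space at q on S^n."""
    return v - ((q @ v) / (1.0 + p @ q)) * (p + q)

rng = np.random.default_rng(0)
p = rng.standard_normal(6)
q = rng.standard_normal(6)
p /= np.linalg.norm(p)
q /= np.linalg.norm(q)
v = rng.standard_normal(6)
v -= (p @ v) * p                 # make v tangent at p
w = transport(p, q, v)           # same vector, now living in T_q S^5
```

This is what lets 4.3a pool K anchor-local log maps in one canonical tangent space (e.g. at the north pole) instead of concatenating incompatible coordinates.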
4.4 Möbius Transformations
Formula: h_ω(z) = ((1−‖ω‖²)/‖z−ω‖²)·(z−ω) − ω, for ‖ω‖ < 1
Properties: Conformal automorphism of S^d, invertible, O(d)
- 4.4a Möbius "geometric attention": transform sphere to zoom into anchor regions
- Expand region near anchor, compress far regions
- Each anchor applies its own Möbius transform before measuring distance
- 4.4b Composition of Möbius transforms as normalizing flow on S^d
- Learned flow that warps embedding distribution toward better separation
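A sketch of the 4.4a transform; one can verify algebraically that ‖h_ω(z)‖² = (1−‖ω‖²) + ‖ω‖² = 1 whenever ‖z‖ = 1, so the map keeps embeddings on the sphere:

```python
import numpy as np

def mobius(z, w):
    """Conformal automorphism of S^d centered at w (||w|| < 1).

    Zooms into the sphere region near w's antipodal direction while
    compressing the rest -- the 'geometric attention' warp of 4.4a.
    """
    d2 = np.sum((z - w) ** 2)
    return (1.0 - w @ w) / d2 * (z - w) - w

rng = np.random.default_rng(0)
z = rng.standard_normal(5)
z /= np.linalg.norm(z)
w = rng.standard_normal(5)
w *= 0.3 / np.linalg.norm(w)     # attention center, strictly inside the ball
hz = mobius(z, w)
```

For 4.4a, each anchor would own its own w; for 4.4b, compositions of these maps form the flow.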
4.5 Procrustes + Polar Decomposition
Formula: R* = argmin_R ‖RA−B‖_F = UVᵀ from SVD(BAᵀ)
Formula: A = UP (rotation × stretch)
- 4.5a Procrustes-align channel cloud to canonical pose before Cholesky/SVD
- Remove rotation variability, isolate shape information
- 4.5b Polar decomposition of channel matrix: U (rotation) + P (stretch) as separate features
- U encodes orientation of frequency cloud; P encodes shape/scale
- Both are geometric, both are deterministic from the channel matrix
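Both 4.5 items are a few SVD lines; sketch below, with the Kabsch-style sign flip to keep the Procrustes solution a proper rotation:

```python
import numpy as np

def procrustes_rotation(A, B):
    """R* minimizing ||R A - B||_F over rotations, via SVD of B A^T."""
    U, _, Vt = np.linalg.svd(B @ A.T)
    if np.linalg.det(U @ Vt) < 0:
        U[:, -1] *= -1               # constrain to det = +1 (proper rotation)
    return U @ Vt

def polar(A):
    """Polar decomposition A = U P: U orthogonal (pose), P sym. PSD (stretch)."""
    W, s, Vt = np.linalg.svd(A)
    U = W @ Vt
    P = Vt.T @ np.diag(s) @ Vt
    return U, P

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 10))
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))
if np.linalg.det(Q) < 0:
    Q[:, 0] *= -1                    # random proper rotation as ground truth
B = Q @ A                            # rotated copy of the channel cloud
R = procrustes_rotation(A, B)
A2 = rng.standard_normal((4, 4))
U, P = polar(A2)
```

For 4.5a, R aligns the channel cloud before the Cholesky/SVD signature; for 4.5b, U and P are themselves the two separate features.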
CATEGORY 5: MATRIX DECOMPOSITION SIGNATURES
5.1 Already Tested
- Cholesky of Gram matrix → 36 lower-tri values (in v4, working)
- SVD singular values → 8 values (in v4, working)
- Concatenated 44-d signature on S^43 → 46.8% with CE-only
5.2 Remaining Decompositions
- 5.2a QR decomposition: Q (rotation) and R diagonal (scale per channel)
- R diagonal = per-channel magnitude; Q = inter-channel angular structure
- 5.2b Schur decomposition: T diagonal = eigenvalues, T off-diagonal = coupling
- For the Gram matrix: Schur gives eigenstructure in triangular form
- 5.2c Eigendecomposition of Gram: eigenvalues as spectral signature
- Compare: eigenvalues vs SVD singular values vs Cholesky diagonal
- These are related but not identical (λ_i = σ_i² for Gram = AᵀA)
- 5.2d NMF of magnitude spectrum: parts-based decomposition
- Requires iterative optimization (not fully deterministic)
- But finds additive, non-negative parts — texture components
- 5.2e Tucker tensor decomposition of spatial×frequency×channel tensor
- 3D structure: (H, W, freq_bins) per color channel
- Core tensor encodes interactions between spatial, frequency, channel modes
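The 5.2c relationship (λ_i(G) = σ_i(A)² for G = AAᵀ) is worth pinning down numerically before the ablation; sketch, with the 8×20 channel-matrix shape as an assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 20))   # 8 channels x 20 samples (shape assumed)
G = A @ A.T                        # Gram matrix of the channel cloud

eigvals = np.sort(np.linalg.eigvalsh(G))[::-1]    # eigenvalues, descending
svals = np.linalg.svd(A, compute_uv=False)        # singular values of A
L = np.linalg.cholesky(G)                         # v4's lower-triangular factor

# lambda_i(G) = sigma_i(A)^2 and L L^T = G: the three signatures are
# algebraically linked but expose different coordinates of the same geometry.
```

So the 5.2c ablation is not testing independent information sources; it is testing which parameterization of the same spectrum the constellation can use best.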
CATEGORY 6: INFORMATION-THEORETIC LOSSES
6.1 Already Tested
- InfoNCE (self-contrastive, two augmented views) — dead at 0.15 in spectral v4
- CosineEmbeddingLoss — frozen at 0.346 (margin-saturated)
- CV loss (Cayley-Menger volume) — running but not in 0.18-0.25 band
6.2 Loss Modifications
- 6.2a Drop contrastive losses entirely, CE-only + geometric losses
- v4 shows CE is the only contributor; contrastive is dead weight
- Hypothesis: removing dead losses may speed convergence
- 6.2b Class-conditional InfoNCE: positive = same class, not same image
- Requires labels but gives much stronger supervision signal
- 6.2c vMF-based contrastive loss: replace dot-product similarity with vMF log-likelihood
- κ-adaptive: high-κ for nearby pairs, low-κ for far pairs
- 6.2d Fisher-Rao distance as loss: d_FR(p,q) = 2·arccos(∫√(pq))
- Natural distance for distributions on the sphere
- 6.2e Sliced spherical Wasserstein distance as distribution matching loss
- Matches embedding distribution to target (e.g., uniform on sphere)
- 6.2f Geometric autograd (from GM3): tangential projection + separation preservation
- Adam + geometric autograd > AdamW on geometric tasks (proven)
- Operates on gradient direction, not loss value
6.3 Anchor Management
- 6.3a Anchor push frequency sweep: every 10, 25, 50, 100, 200 batches
- 6.3b Anchor push with vMF-weighted centroids instead of hard class centroids
- 6.3c Anchor birth/death: add anchors where density is high, remove where unused
- 6.3d Anchor dropout sweep: 0%, 5%, 15%, 30%, 50%
CATEGORY 7: COMPOSITE PIPELINE TESTS
7.1 The Reference Pipeline (from research article)
- 7.1a Scattering(J=2,L=8) → JL(128) → L2 norm → constellation(64) → classify
- The "canonical" pipeline; expected ~75-80% based on literature
- 7.1b Same as 7.1a but with learned 2-layer projection replacing JL
- Minimal learned params (~16K), test if projection adaptation matters
- 7.1c Scattering → curvelet energy → concat → JL → constellation
- Test complementarity
7.2 Hybrid: Spectral + Scattering
- 7.2a STFT channels (v4) + scattering features → concat → JL → S^d → constellation
- STFT gives spatial-frequency; scattering gives multi-scale invariants
- 7.2b Scattering → Cholesky Gram + SVD signature → constellation
- Apply v4's geometric signature to scattering output instead of STFT
7.3 Multi-Signature Constellation
- 7.3a Parallel extraction: scattering + Gabor + Radon → separate constellations → fusion
- Each primitive captures different geometric aspect
- Fusion: concatenate patchwork outputs → shared classifier
- 7.3b Hierarchical constellation: scattering → coarse anchors → residual → fine anchors
- Two-stage: first stage identifies broad category, second refines
7.4 Minimal Learned Params Tests
- 7.4a Best deterministic pipeline + 1 learned linear layer (d_in → 128) before constellation
- Measure: how much does a single projection layer help?
- Count: exact learned param count
- 7.4b Same as 7.4a but with SquaredReLU + LayerNorm (the proven patchwork block)
- 7.4c Sweep learned projection sizes: 0, 1K, 5K, 10K, 50K, 100K params
- Find the elbow where adding params stops helping
PRIORITY QUEUE (recommended execution order)
Tier 1: Highest Expected Impact
- 1.1a — Scattering + flat constellation (the literature leader)
- 1.1b — Scattering + JL → S^127 + constellation
- 6.2a — Drop dead contrastive losses from v4, measure CE-only ceiling
- 2.4a — vMF soft assignment replacing hard nearest-anchor
- 4.2a — Log map triangulation (richer than scalar distance)
Tier 2: High Expected Impact
- 7.1a — Full reference pipeline
- 1.1f — Scattering hybrid with minimal learned projection
- 1.2b — Gabor spatial statistics → S^127
- 5.2c — Eigendecomposition vs SVD vs Cholesky ablation
- 2.1b — Quaternionic Hopf S⁷→S⁴ for 8-channel data
Tier 3: Exploratory
- 1.5a — Persistent homology standalone
- 3.1b — RFF on scattering features
- 4.4a — Möbius geometric attention
- 7.3a — Multi-signature parallel constellations
- 2.2a — Grassmannian class subspaces
Tier 4: Deep Exploration
- 1.3a — Radon cloud on S^d
- 1.4b — Curvelet + scattering concat
- 2.3a — Flag decomposition of frequency channels
- 4.3a — Parallel transport aggregation
- 3.4c — Hyperspherical harmonics analysis
RUNNING SCOREBOARD
| Experiment | Val Acc | Params (learned) | CV | Anchors Active | InfoNCE | Key Finding |
|---|---|---|---|---|---|---|
| Linear baseline | 67.0% | 423K | — | — | — | Overfits E31 |
| MLP baseline | 65.0% | 687K | — | — | — | Overfits E10 |
| Core CE-only | 63.4% | 820K | 0.70 | — | — | CV never converges |
| Core CE+CV | 62.7% | 820K | 0.61 | — | — | CV hurts accuracy |
| Full GELU | 88.0% | 1.6M | 0.14-0.17 | 64/64 | 1.00 | Reference |
| Full SquaredReLU | 88.0% | 1.6M | 0.15 | 64/64 | 1.00 | Matches GELU |
| Spectral v1 (flat FFT) | FAIL | — | — | 1/64 | — | Norm mismatch |
| Spectral v2 (per-band) | ~35% | 1.2M | 0.17-0.19 | 900/3072 | 0.45 | Too diffuse |
| Spectral v3 (sph mean) | ~27% | 130K | 0.27-0.34 | 110/128 | 0.35 | Collapsed to point |
| Spectral v4 (STFT+Chol+SVD) | 46.8% | 137K | 0.52-0.66 | 53/64 | 0.15 | CE-only carry |
| Scattering baseline | ~82%* | 0 | — | — | — | Literature (SVM) |
Entries marked with * are literature values, not our runs
NOTES & INSIGHTS
Why contrastive losses die on deterministic encoders
The STFT/FFT faithfully reports every pixel-level difference between augmented views. Two crops of the same image produce signatures as different as two different images. Without a learned layer to absorb augmentation variance, InfoNCE has nothing to align. Solutions: (a) augmentation-invariant features (scattering), (b) thin learned projection, (c) class-conditional contrastive (6.2b), (d) drop contrastive entirely (6.2a).
The Cholesky insight
L diagonal encodes "new angular information per tier given all lower tiers." This IS discriminative (proved by v4 reaching 46.8% with CE alone). The 44-d signature on S^43 carries real inter-channel geometry. Next question: is the STFT front-end the bottleneck, or the 44-d signature?
Scattering is the clear next step
82% on CIFAR-10 with zero learned params (literature) vs our 46.8%. Scattering is translation-invariant AND deformation-stable (Lipschitz). This directly addresses the augmentation sensitivity problem. kymatio provides GPU-accelerated PyTorch implementation.
The dimension question
S^15 (band_dim=16) vs S^43 (signature) vs S^127 (conv encoder output)
E₈ lattice gives 240 optimal anchors on S^7
Proven CV attractor at ~0.20 is on S^15
Need to test which target sphere dimension is optimal for spectral features
Last updated: 2026-03-18, session with Opus Next: run scattering baseline (1.1a), then decide pipeline direction