Overview
An experiment in organizing information using machine learning text embeddings and 3D visualization. The project automatically organizes Wikipedia articles by their meaning, creating an explorable 3D graph where similar articles cluster together naturally.
Features
- Visualizes 1,000 or 10,000 Wikipedia articles in 3D space
- Automatic organization using text embeddings (no manual tagging)
- Interactive pan, rotate, and zoom controls
- Articles colored by their position in 3D space to make groupings visible
- Readable text labels: billboarding (labels always face the camera) and distance-based sizing
- Browser-based 3D rendering with Three.js
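The source does not say exactly how colors are derived from position. One simple possibility is to min-max normalize each coordinate axis and use the x, y, and z values directly as red, green, and blue, so articles that sit close together get similar colors. The Python sketch below works under that assumption, and the function names `position_colors` and `to_hex` are hypothetical.

```python
import numpy as np


def position_colors(coords: np.ndarray) -> np.ndarray:
    """Map each article's (x, y, z) position to an (r, g, b) color in [0, 1].

    Assumption: each axis is min-max normalized independently and used
    directly as one color channel, so nearby articles get similar colors.
    """
    coords = np.asarray(coords, dtype=float)
    lo = coords.min(axis=0)
    span = coords.max(axis=0) - lo
    # Avoid division by zero when every point shares the same value on an axis.
    span[span == 0] = 1.0
    return (coords - lo) / span


def to_hex(rgb: np.ndarray) -> list[str]:
    """Convert [0, 1] RGB rows to '#rrggbb' strings a web frontend can use."""
    ints = np.rint(np.asarray(rgb) * 255).astype(int)
    return [f"#{r:02x}{g:02x}{b:02x}" for r, g, b in ints]
```

For example, the points `(0, 0, 0)`, `(1, 2, 4)` and `(0.5, 1, 2)` map to black, white, and mid-gray, because each axis is scaled by its own range.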
Technology Stack
Python Backend:
- sentence-transformers for text embeddings (all-MiniLM-L6-v2 model)
- UMAP for dimensionality reduction (embeddings → 3D coordinates)
- Wikipedia API for article content
Frontend:
- Three.js for 3D rendering
- WebGL for performance with thousands of nodes
- HTML5 Canvas for text rendering
How It Works
- Text Embeddings: Each Wikipedia article is converted into a 384-dimensional vector representing its “meaning”
- Dimensionality Reduction: The UMAP algorithm reduces each 384-dimensional vector to 3D coordinates while approximately preserving the relationships between articles
- 3D Visualization: Articles are positioned in 3D space based on semantic similarity
- Exploration: Users can navigate the space to discover related topics
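The first three steps above can be sketched end to end in Python. This is a minimal sketch, not the project's actual code: it assumes the `sentence-transformers` and `umap-learn` packages, and the helper names (`layout_articles`, `default_encoder`, `default_reducer`) as well as the UMAP settings `metric="cosine"` and `random_state=42` are illustrative assumptions. The encoder and reducer are passed in as functions so the layout logic can be exercised without downloading the model.

```python
from typing import Callable, Optional, Sequence

import numpy as np

Encoder = Callable[[Sequence[str]], np.ndarray]
Reducer = Callable[[np.ndarray], np.ndarray]


def default_encoder(texts: Sequence[str]) -> np.ndarray:
    # Imported lazily so the rest of the sketch runs without the package installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")
    return np.asarray(model.encode(list(texts)))  # shape (n, 384)


def default_reducer(embeddings: np.ndarray) -> np.ndarray:
    import umap  # provided by the umap-learn package

    # Assumed settings, not documented project parameters.
    reducer = umap.UMAP(n_components=3, metric="cosine", random_state=42)
    return reducer.fit_transform(embeddings)  # shape (n, 3)


def layout_articles(
    titles: Sequence[str],
    texts: Sequence[str],
    encode: Optional[Encoder] = None,
    reduce: Optional[Reducer] = None,
) -> list[dict]:
    """Embed each article, project to 3D, and return one point record per article."""
    if len(titles) != len(texts):
        raise ValueError("titles and texts must have the same length")
    encode = encode or default_encoder
    reduce = reduce or default_reducer
    embeddings = encode(texts)   # step 1: text -> high-dimensional vectors
    coords = reduce(embeddings)  # step 2: vectors -> 3D coordinates
    # Step 3: positions the frontend can place in the scene.
    return [
        {"title": t, "x": float(x), "y": float(y), "z": float(z)}
        for t, (x, y, z) in zip(titles, coords)
    ]
```

Called as `layout_articles(titles, texts)`, the sketch returns one `{"title", "x", "y", "z"}` record per article, which could be written out with `json.dump` for a Three.js frontend to load; that JSON hand-off is likewise an assumption.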
Key Insights
- Dynamic Organization: Unlike rigid folders, articles can exist between categories
- Automatic Updates: New articles automatically reorganize the entire graph
- Discovery: Machine-suggested groupings reveal unexpected connections
- 3D Advantage: Projecting into 3D rather than 2D preserves more of the relational structure of the original embeddings
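Semantic similarity between embeddings is commonly measured with cosine similarity. As an illustration of the Discovery point, the Python sketch below ranks the articles closest to a chosen one in the original embedding space, which the 3D layout only approximates. The `nearest_neighbors` helper is hypothetical and not part of the project.

```python
import numpy as np


def nearest_neighbors(embeddings, titles, query_index, k=5):
    """Return the k articles most similar to titles[query_index] by cosine similarity."""
    X = np.asarray(embeddings, dtype=float)
    # Unit-normalize each row so a dot product equals cosine similarity
    # (rows with zero norm would produce NaN and should be filtered beforehand).
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    sims = X @ X[query_index]
    order = np.argsort(-sims)  # most similar first
    return [(titles[i], float(sims[i])) for i in order if i != query_index][:k]
```

With real article embeddings, the top results for a page would be the topics a user is most likely to find clustered around it.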
The Dataset
Uses two datasets drawn from Wikipedia’s “Vital Articles” lists:
- 1K articles: Level 3 vital articles (essential topics)
- 10K articles: Level 4 vital articles (comprehensive coverage)
Each article is split into 500-word chunks, each chunk is embedded, and the chunk embeddings are averaged per page to capture the “average meaning” of the article.
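The chunk-and-average step can be sketched as follows, assuming chunks are formed by splitting on whitespace; that tokenization rule and the helper names `chunk_words` and `article_embedding` are assumptions. The `encode` argument is any function that maps a list of strings to an `(n_chunks, dim)` array, for example the `encode` method of a `SentenceTransformer("all-MiniLM-L6-v2")` model.

```python
from typing import Callable, Sequence

import numpy as np


def chunk_words(text: str, chunk_size: int = 500) -> list[str]:
    """Split text into consecutive chunks of at most chunk_size whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def article_embedding(
    text: str,
    encode: Callable[[Sequence[str]], np.ndarray],
    chunk_size: int = 500,
) -> np.ndarray:
    """Embed each chunk, then average the chunk vectors into one vector for the page."""
    chunks = chunk_words(text, chunk_size)
    if not chunks:
        raise ValueError("article has no text to embed")
    chunk_vectors = np.asarray(encode(chunks), dtype=float)  # shape (n_chunks, dim)
    return chunk_vectors.mean(axis=0)
```

One design consideration: the all-MiniLM-L6-v2 model card states that input longer than 256 word pieces is truncated by default, so a 500-word chunk may be only partially represented in its embedding. Smaller chunks would avoid that, at the cost of more encoder calls.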
Design Philosophy
Traditional organization methods (folders, tags) require continuous manual effort. This experiment shows how machine learning can create self-organizing information systems that scale with your content, not your maintenance effort.
As spatial computing devices like Apple Vision Pro become mainstream, 3D interfaces for information will become increasingly important.