Wikipedia 3D Embeddings

Embed thousands of Wikipedia articles, project them into 3D with UMAP, and fly through the result in the browser. Articles cluster by what they're about, not by what folder they live in.

Tags: Python, Three.js, Machine Learning, UMAP, Text Embeddings, Data Visualization

Overview

This started as an experiment in self-organising knowledge: instead of categorising Wikipedia articles by hand, embed them with a sentence-transformer, run UMAP to drop the high-dimensional vectors into 3D, and render the result with Three.js. Articles end up positioned by meaning. The planets and astronomy stuff sits in one cloud, the World War II stuff sits in another, and you can fly through it in the browser.

Features

  • Renders 1,000 or 10,000 Wikipedia articles in 3D
  • No manual categorisation. Positions come entirely from the embedding model and the UMAP projection.
  • Pan, rotate, zoom
  • Articles coloured by region of the cloud to make groupings legible
  • Distance-based label sizing and billboarding so labels don’t pile up (sizing rule sketched after this list)
  • Pure browser rendering with Three.js
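
A note on the label sizing: under a perspective camera, on-screen size falls off with distance, so growing a label's world-space size linearly with its camera distance keeps it readable at any depth. A minimal sketch of that rule, with illustrative constants rather than the project's actual values:

```python
# Hedged sketch of distance-based label sizing. Scaling a label's world
# size linearly with camera distance keeps its on-screen size roughly
# constant under perspective projection. The constant k and the clamp
# bounds are assumptions, not values taken from this project.
def label_scale(distance: float, k: float = 0.05,
                min_scale: float = 0.5, max_scale: float = 10.0) -> float:
    return min(max(k * distance, min_scale), max_scale)
```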

Technology Stack

Python pipeline:

  • sentence-transformers (all-MiniLM-L6-v2) for embeddings
  • UMAP for the 384-D → 3-D reduction
  • Wikipedia API for article text (fetch sketched after this list)
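
For the article text, here is a hedged sketch of a fetch against the MediaWiki API. The endpoint and the `action=query` / `prop=extracts` / `explaintext` parameters are real MediaWiki features; the helper name and the one-title-per-request approach are my assumptions, not necessarily what this project does:

```python
import requests

# Hypothetical helper: pull one article's plain text from the MediaWiki API.
def fetch_article_text(title: str, lang: str = "en") -> str:
    resp = requests.get(
        f"https://{lang}.wikipedia.org/w/api.php",
        params={
            "action": "query",
            "prop": "extracts",
            "explaintext": 1,   # plain text rather than HTML
            "titles": title,
            "format": "json",
        },
        headers={"User-Agent": "wiki-3d-embeddings-demo/0.1"},
        timeout=30,
    )
    pages = resp.json()["query"]["pages"]
    return next(iter(pages.values())).get("extract", "")

text = fetch_article_text("World War II")
```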

Frontend:

  • Three.js
  • WebGL
  • Canvas-based label rendering

How It Works

  1. Embed: each article is turned into a 384-D vector via a sentence-transformer
  2. Reduce: UMAP collapses 384 dimensions to 3 while preserving local structure
  3. Render: articles are placed at the resulting (x, y, z) coordinates in a Three.js scene (the data side of steps 1–3 is sketched after this list)
  4. Explore: you fly around with mouse and keyboard
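
A minimal sketch of the data side of steps 1–3. Only the model name and the libraries come from this post; the function name, the `points.json` filename, and the `metric="cosine"` setting are assumptions:

```python
import json

import umap
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def build_points(titles, texts, out_path="points.json"):
    # Embed (step 1): one 384-D vector per article.
    embeddings = model.encode(texts)   # shape: (n_articles, 384)

    # Reduce (step 2): UMAP down to 3 components. metric="cosine" is an
    # assumption that suits sentence embeddings, not a confirmed setting.
    coords = umap.UMAP(n_components=3, metric="cosine").fit_transform(embeddings)

    # Render prep (step 3): dump (x, y, z) per article for the Three.js scene.
    points = [
        {"title": t, "x": float(x), "y": float(y), "z": float(z)}
        for t, (x, y, z) in zip(titles, coords)
    ]
    with open(out_path, "w") as f:
        json.dump(points, f)
    return points
```

The browser side then only needs to load a JSON file like this and place one point, plus a billboarded label, per entry.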

Notes

  • Articles can sit between categories; nothing forces a single tag.
  • Adding new articles re-positions the cloud. The layout is emergent, not pre-baked.
  • Some of the unexpected adjacencies (biology articles next to chemistry, certain historical figures clustered by era rather than nationality) are more interesting than the categorical groupings I’d have written by hand.
  • 3D preserves more of the original neighbourhood structure than 2D, which matters once the data set gets big (one way to measure this is sketched below).
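
That last claim can be quantified. One way, my illustration rather than anything this project does, is scikit-learn's trustworthiness score, which measures how well each point's local neighbours survive a projection (1.0 = perfectly preserved):

```python
import numpy as np
import umap
from sklearn.manifold import trustworthiness

# Stand-in for the real article embeddings; random data just to show the shape
# of the comparison.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 384))

for dims in (2, 3):
    proj = umap.UMAP(n_components=dims, random_state=0).fit_transform(X)
    score = trustworthiness(X, proj, n_neighbors=15)
    print(f"{dims}D trustworthiness: {score:.3f}")
```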

The Dataset

Wikipedia’s Vital Articles, in two sizes:

  • 1K: Level 3 vital articles (essential topics)
  • 10K: Level 4 vital articles (broader coverage)

Articles are split into 500-word chunks and the chunk embeddings are averaged per article, so each point represents a single article’s “average meaning.”
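
A sketch of that chunk-and-average step, assuming naive whitespace splitting (the project's actual chunking and the helper name are my assumptions):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def article_vector(text: str, chunk_words: int = 500) -> np.ndarray:
    # Split on whitespace into ~500-word chunks; fall back to one empty
    # chunk so empty articles still produce a vector.
    words = text.split()
    chunks = [
        " ".join(words[i : i + chunk_words])
        for i in range(0, len(words), chunk_words)
    ] or [""]
    # Embed each chunk, then mean-pool into the article's "average meaning".
    chunk_embeddings = model.encode(chunks)   # shape: (n_chunks, 384)
    return chunk_embeddings.mean(axis=0)
```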

Demo & Code