Brian Sunter

Wikipedia 3D Embeddings

Using machine learning text embeddings to organize and explore Wikipedia articles in an interactive 3D space.

Python, Three.js, Machine Learning, UMAP, Text Embeddings, Data Visualization

Overview

An experiment in organizing information using machine learning text embeddings and 3D visualization. The project automatically organizes Wikipedia articles by their meaning, creating an explorable 3D graph where similar articles cluster together naturally.

Features

  • Visualizes 1,000 or 10,000 Wikipedia articles in 3D space
  • Automatic organization using text embeddings (no manual tagging)
  • Interactive pan, rotate, and zoom controls
  • Articles colored by their position in 3D space to make groupings visible
  • Readable text labels (billboarding so labels always face the camera, plus distance-based sizing)
  • Browser-based 3D rendering with Three.js
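
The page doesn't say how colors are derived from location. One simple scheme maps each article's normalized x, y and z coordinates directly to red, green and blue, so articles that sit near each other get similar colors. This is a sketch of that idea only. The function name `position_colors` is hypothetical, and the sketch assumes colors are computed in Python before the coordinates are handed to Three.js.

```python
import numpy as np


def position_colors(coords: np.ndarray) -> np.ndarray:
    """Map (n, 3) point coordinates to (n, 3) RGB values in [0, 1].

    Hypothetical coloring scheme (an assumption, not the project's confirmed
    method): each axis is min-max normalized independently, then
    x -> red, y -> green, z -> blue. Nearby points get similar colors.
    """
    coords = np.asarray(coords, dtype=float)
    mins = coords.min(axis=0)
    spans = coords.max(axis=0) - mins
    spans[spans == 0] = 1.0  # a flat axis maps to 0 instead of dividing by zero
    return (coords - mins) / spans


if __name__ == "__main__":
    points = np.array([[0.0, 0.0, 0.0], [1.0, 2.0, 4.0], [0.5, 1.0, 2.0]])
    print(position_colors(points))
```

Three.js colors take r, g and b as floats in [0, 1], so each row could feed a `THREE.Color` or a per-vertex color buffer.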

Technology Stack

Python Backend:

  • sentence-transformers for text embeddings (all-MiniLM-L6-v2 model)
  • UMAP for dimensionality reduction (embeddings → 3D coordinates)
  • Wikipedia API for article content
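
The page doesn't specify which Wikipedia endpoint supplies article content. One option is the MediaWiki Action API with the TextExtracts module (`prop=extracts`, `explaintext`), which returns a page's plain text. The sketch below assumes that endpoint and uses only the standard library. The helper names and the User-Agent string are placeholders; Wikimedia asks API clients to send a descriptive User-Agent.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wikipedia.org/w/api.php"


def extract_url(title: str) -> str:
    """Build a TextExtracts query URL for one article (assumed endpoint)."""
    params = {
        "action": "query",
        "prop": "extracts",   # TextExtracts module
        "explaintext": 1,     # plain text instead of HTML
        "titles": title,
        "format": "json",
        "formatversion": 2,   # "pages" comes back as a list
    }
    return API_URL + "?" + urllib.parse.urlencode(params)


def parse_extract(payload: dict) -> str:
    """Pull the plain-text extract out of a decoded API response."""
    pages = payload["query"]["pages"]
    return pages[0].get("extract", "")  # missing pages have no "extract" key


def fetch_article_text(title: str) -> str:
    """Download one article's plain text (performs a network request)."""
    request = urllib.request.Request(
        extract_url(title),
        headers={"User-Agent": "wiki-embeddings-sketch/0.1 (placeholder contact)"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return parse_extract(json.load(response))
```

For example, `fetch_article_text("Albert Einstein")` would return the article body as a single string, ready for chunking and embedding.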

Frontend:

  • Three.js for 3D rendering
  • WebGL for GPU-accelerated rendering of thousands of nodes
  • HTML5 Canvas for text rendering

How It Works

  1. Text Embeddings: Each Wikipedia article is converted into a 384-dimensional vector representing its “meaning”
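  2. Dimensionality Reduction: The UMAP algorithm reduces the 384D vectors to 3D coordinates while preserving local neighborhood structure as much as possible, so articles that are close in 384D stay close in 3D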
  3. 3D Visualization: Articles are positioned in 3D space based on semantic similarity
  4. Exploration: Users can navigate the space to discover related topics
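
Steps 1 and 2 can be sketched with the two libraries named above. The model name comes from this page. The UMAP settings (`n_neighbors`, `metric="cosine"`, the seed) are assumptions, because the page doesn't list the parameters the project used. Imports happen inside each function so the module loads even where the packages aren't installed.

```python
import numpy as np


def embed_articles(texts: list[str]) -> np.ndarray:
    """Encode each text as a 384-dimensional vector with all-MiniLM-L6-v2."""
    from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

    model = SentenceTransformer("all-MiniLM-L6-v2")
    # normalize_embeddings=True returns unit-length vectors (dot product == cosine similarity)
    return model.encode(texts, normalize_embeddings=True)


def reduce_to_3d(embeddings: np.ndarray, n_neighbors: int = 15, seed: int = 42) -> np.ndarray:
    """Project (n, 384) embeddings to (n, 3) coordinates with UMAP."""
    import umap  # pip install umap-learn

    reducer = umap.UMAP(
        n_components=3,           # output dimensionality: x, y, z
        n_neighbors=n_neighbors,  # size of the local neighborhood UMAP tries to preserve
        metric="cosine",          # compare embeddings by angle (assumed setting)
        random_state=seed,        # fixed seed so the layout is reproducible
    )
    return reducer.fit_transform(embeddings)
```

With a hypothetical list `article_texts`, the whole layout is `coords = reduce_to_3d(embed_articles(article_texts))`. Each row of `coords` is one article's (x, y, z) position in the scene.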

Key Insights

  • Dynamic Organization: Unlike rigid folders, articles can exist between categories
  • Automatic Updates: Adding new articles and re-running the pipeline reorganizes the entire graph with no manual work
  • Discovery: Machine-suggested groupings reveal unexpected connections
  • 3D Advantage: 3D space preserves more relational information than 2D
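
The 3D-versus-2D point can be tested on your own data. Fit UMAP once with `n_components=3` and once with `n_components=2`. Then measure how many of each article's nearest neighbors in the original embedding space are still nearest neighbors after projection. The k-nearest-neighbor overlap below is a simple metric chosen for illustration, not necessarily what this project used. The function names are hypothetical.

```python
import numpy as np


def knn_indices(points: np.ndarray, k: int) -> np.ndarray:
    """Indices of each point's k nearest neighbors (Euclidean), excluding itself."""
    sq = np.einsum("ij,ij->i", points, points)                   # squared norm of each row
    dists = sq[:, None] + sq[None, :] - 2.0 * points @ points.T  # all-pairs squared distances
    np.fill_diagonal(dists, np.inf)                              # a point is not its own neighbor
    return np.argsort(dists, axis=1)[:, :k]


def neighbor_overlap(high: np.ndarray, low: np.ndarray, k: int = 10) -> float:
    """Average fraction of each point's k high-dimensional neighbors that remain
    among its k neighbors after projection (1.0 means fully preserved)."""
    hi = knn_indices(np.asarray(high, dtype=float), k)
    lo = knn_indices(np.asarray(low, dtype=float), k)
    shared = [len(set(a) & set(b)) for a, b in zip(hi, lo)]
    return float(np.mean(shared)) / k
```

Comparing `neighbor_overlap(embeddings, coords_3d)` with `neighbor_overlap(embeddings, coords_2d)` shows which layout keeps more of the original structure. For unit-normalized embeddings, Euclidean neighbor rankings match cosine rankings. The function builds an all-pairs distance matrix, so a random subsample is advisable for the 10K dataset.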

The Dataset

The project uses two datasets from Wikipedia’s “Vital Articles” lists:

  • 1K articles: Level 3 vital articles (essential topics)
  • 10K articles: Level 4 vital articles (comprehensive coverage)

Each article is split into 500-word chunks. Each chunk is embedded separately, and the chunk embeddings are averaged per article to capture the “average meaning” of the whole page.
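
This is a sketch of the chunk-and-average step. It assumes chunks are split on whitespace and that every chunk counts equally in the average; the page doesn't say whether shorter final chunks are weighted differently. `embed` is any function that maps a list of strings to an (n, d) array, such as the `encode` method of a sentence-transformers model. The function names are hypothetical.

```python
from typing import Callable

import numpy as np


def chunk_words(text: str, chunk_size: int = 500) -> list[str]:
    """Split text into consecutive chunks of at most chunk_size words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def article_embedding(
    text: str,
    embed: Callable[[list[str]], np.ndarray],
    chunk_size: int = 500,
) -> np.ndarray:
    """Embed each chunk, then take the unweighted mean as the article vector."""
    chunks = chunk_words(text, chunk_size)
    if not chunks:
        raise ValueError("article contains no words")
    vectors = np.asarray(embed(chunks), dtype=float)  # shape (n_chunks, dim)
    return vectors.mean(axis=0)                       # shape (dim,)
```

The all-MiniLM-L6-v2 model card states that input longer than 256 word pieces is truncated by default. A 500-word chunk may therefore be only partially embedded, and a smaller `chunk_size` would avoid that.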


Design Philosophy

Traditional organization methods (folders, tags) require continuous manual effort. This experiment shows how machine learning can create self-organizing information systems that scale with your content, not your maintenance effort.

As spatial computing devices like Apple Vision Pro become mainstream, 3D interfaces for information will become increasingly important.