Overview
Biped Copilot is a wearable robotics platform that enables blind and visually impaired users to navigate independently. Using stereo depth cameras worn on the chest, it provides real-time obstacle detection, ground analysis, and spatial audio feedback through bone conduction headphones.
Over 3.5 years as a core engineer, I contributed across the full stack, from low-level camera drivers and point cloud processing to AI-powered scene descriptions and multi-language support, and shipped 45+ releases across 3 hardware platforms to users in 12+ languages.
Development Timeline
- Multiprocessing architecture with shared memory and inter-process communication
- Obstacle detection with YOLOv5, ground plane detection, SORT tracker
- 3D spatialized audio feedback with priority management
- BLE communication with companion mobile app
- Debian packaging and production service deployment
- Rerun 3D visualizer integration for real-time debugging
- Remote diagnostics upload, camera calibration and assignment scripts
- Multi-language audio assets (French, German)
- Edge 2 hardware support with obstacle tracking and risk model
- Image polling/processing pipeline split for performance
- Hemispatial neglect priority, navigation commands, guide dog rule
- Point cloud stabilization and 3D world-to-camera projection
- 3D Kalman tracking with speed estimation
- Hole detection using KD-tree ground analysis
- Small obstacle detection with Cython optimization
- VIM3 hardware platform support, copilot as installable package
- GPS integration for outdoor positioning
- Risk model with multi-Gaussian collision probability
- Hand and occlusion detection, motion-adaptive detection range
- Elliptical corridor rework for obstacle proximity
- OpenAI-powered scene descriptions triggered by button press
- Text-to-speech generation with audio streaming
- App state machine, prompts managed server-side
- Developer tooling: monorepo merge, benchmarking integration
- NOA hardware platform: button strip, menu system, AI detection classes
- Sentry error tracking, JWT authentication, Docker builds
- End-to-end testing framework, Webots simulator integration
- Obstacle audio sharpness, Gemini-generated assets
- 12+ language support with online TTS and auto-translate
- Local occupancy grid for spatial mapping
- BLE and audio system rework
- Token-to-audio streaming, JPEG compression for AI requests
- Real-time video description with route following
- Structured AI output with file caching
- Rich navigation instructions, favorite destinations, GPX recording
- Person finding, obstacle scanning, concurrent scene/video descriptions
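The 3D Kalman tracking with speed estimation mentioned above can be sketched as a constant-velocity filter over a 6-dimensional state. The matrices and noise values below are illustrative tuning choices, not the production ones:

```python
import numpy as np

DT = 0.1  # hypothetical frame interval in seconds

# State: [x, y, z, vx, vy, vz]; constant-velocity motion model.
F = np.eye(6)
F[:3, 3:] = DT * np.eye(3)
H = np.hstack([np.eye(3), np.zeros((3, 3))])  # we observe position only
Q = 1e-3 * np.eye(6)   # process noise (illustrative tuning value)
R = 1e-2 * np.eye(3)   # measurement noise (illustrative tuning value)

def kf_step(x, P, z):
    # Predict forward one frame
    x = F @ x
    P = F @ P @ F.T + Q
    # Correct with the measured 3D position z
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(6) - K @ H) @ P
    return x, P

# Track a point moving at 1 m/s along x; speed falls out of the state.
x, P = np.zeros(6), np.eye(6)
for k in range(50):
    z = np.array([k * DT * 1.0, 0.0, 0.0])
    x, P = kf_step(x, P, z)

speed = float(np.linalg.norm(x[3:]))  # estimated speed in m/s, ~1.0
```

Estimating velocity in the state (rather than differencing positions) is what makes the speed estimate robust to per-frame depth noise.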
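The KD-tree ground analysis used for hole detection can be illustrated as a neighbor-count test: a ground cell is a hole candidate when too few reconstructed ground points support it within a radius. The grid size, radius, and `min_support` threshold here are made-up values for the sketch:

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical ground grid with a missing patch, standing in for a
# hole in the reconstructed ground plane.
xs, ys = np.meshgrid(np.linspace(0, 2, 41), np.linspace(0, 2, 41))
pts = np.column_stack([xs.ravel(), ys.ravel()])
missing = (np.abs(pts[:, 0] - 1.0) < 0.25) & (np.abs(pts[:, 1] - 1.0) < 0.25)
ground = pts[~missing]

tree = cKDTree(ground)

def is_hole(cell, radius=0.15, min_support=5):
    # Flag the cell when too few ground points fall within `radius`
    # of it (both thresholds are illustrative).
    return len(tree.query_ball_point(cell, radius)) < min_support

in_hole = is_hole([1.0, 1.0])      # center of the missing patch
on_ground = is_hole([0.2, 0.2])    # well-supported ground
```

The KD-tree keeps each query logarithmic in the number of ground points, which matters when this runs per frame on an embedded board.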
Technical Scope
The system runs a Python multiprocessing architecture on embedded ARM boards (Khadas Edge 2, VIM3), managing concurrent tasks for image acquisition, obstacle detection, feedback generation, BLE communication, localization, and AI descriptions.
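A minimal sketch of one shared-memory channel such an architecture relies on, assuming fixed-size frames; the frame shape and dtype are placeholders, and in the real system the producer and consumer sides run in separate processes, attached to the segment by name:

```python
import numpy as np
from multiprocessing.shared_memory import SharedMemory

FRAME_SHAPE = (480, 640)   # hypothetical depth-frame size
FRAME_DTYPE = np.uint16

# Producer side: create the segment and write a frame into it.
nbytes = int(np.prod(FRAME_SHAPE)) * np.dtype(FRAME_DTYPE).itemsize
shm = SharedMemory(create=True, size=nbytes)
frame = np.ndarray(FRAME_SHAPE, dtype=FRAME_DTYPE, buffer=shm.buf)
frame[:] = 42              # stand-in for a camera grab

# Consumer side: attach to the same segment by name and read the
# frame zero-copy (no pickling of image data between processes).
shm_view = SharedMemory(name=shm.name)
seen = np.ndarray(FRAME_SHAPE, dtype=FRAME_DTYPE, buffer=shm_view.buf)
first_pixel = int(seen[0, 0])

shm_view.close()
shm.close()
shm.unlink()
```

The point of the design is that only a segment name crosses the process boundary, so full-resolution frames move between acquisition and detection without serialization overhead.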
- Perception: stereo depth processing, point clouds, ground plane estimation, Kalman tracking, hole detection
- Audio: 3D spatialized feedback, priority management, obstacle sharpness, TTS integration
- AI/Cloud: OpenAI and Gemini scene descriptions, video analysis, structured output, server-side prompt engineering
- Infrastructure: Debian packaging, Docker, Sentry, GCloud, CI/CD with GitHub Actions
- Libraries: biped_ble (BLE comms), biped_language (i18n), biped_data (cloud sync), benchmarking tools
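One plausible shape for the multi-Gaussian collision-probability model named in the timeline: each tracked obstacle contributes an isotropic Gaussian kernel, and risk at a query point is their capped sum. The kernel width and the cap at 1.0 are illustrative choices, not the production formulation:

```python
import math

def collision_risk(obstacles, point, sigma=0.5):
    """Risk at `point` from a list of (x, y) obstacle positions:
    sum of isotropic Gaussian kernels, capped at 1.0.
    `sigma` is a made-up tuning value."""
    risk = 0.0
    for ox, oy in obstacles:
        d2 = (point[0] - ox) ** 2 + (point[1] - oy) ** 2
        risk += math.exp(-d2 / (2.0 * sigma ** 2))
    return min(risk, 1.0)

# Risk directly at an obstacle is maximal; far away it decays to ~0.
near = collision_risk([(0.0, 1.0)], (0.0, 1.0))
far = collision_risk([(0.0, 1.0)], (5.0, 5.0))
```

A smooth field like this is convenient for audio feedback, since feedback intensity can vary continuously with proximity instead of switching at a hard distance threshold.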