mirror of https://github.com/Frierenclaw/fern.git synced 2026-06-22 03:20:04 +00:00

Strict and reliable pipecat-ai orchestrator. Handles real-time audio streams, VAD, and schedules animation frames.

ladev d301b1fd30 include license		2026-06-14 02:13:30 +03:00
src	update dockerfile and unharcode whisper language	2026-06-11 04:21:53 +03:00
.gitignore	feat: add frieren hub (character marketplace)	2026-06-09 16:13:28 +03:00
docker-compose.yml	fix docker compose	2026-06-11 02:14:16 +03:00
LICENSE	include license	2026-06-14 02:13:30 +03:00
README.md	fix cover upload issue and update readme	2026-06-09 21:12:29 +03:00

README.md

🦋 Fern (The Core Pipeline)

"A strict, reliable orchestrator. It doesn't tolerate delays."

Fern is the central nervous system and real-time orchestrator of the Frieren AI Ecosystem. Built on top of pipecat-ai and driven by asyncio, it manages the entire multimodal pipeline with a strict focus on low latency.

Important

Development Reboot: This repository marks a complete overhaul of the core pipeline. The previous implementation was highly synchronous and relied on subpar workarounds. This new iteration transitions fully to pipecat-ai, enforcing strict asynchronous handling, shifting CPU-bound workloads entirely onto Heiter, and establishing proper concurrency. The legacy codebase remains accessible in the old branch.

🔮 Responsibilities

The Gatekeeper: Establishes and manages bidirectional WebRTC connections with frieren-desktop.
The Transcriber: Captures raw PCM audio from the client and routes it through fast VAD and STT.
The Catalyst: Streams context to heiter (Inference Server) and processes incoming text tokens on the fly.
The Alchemist: Generates real-time audio streams via TTS and simultaneously extracts speech visemes (mouth animation data) to push back to the 3D client.
Interruption Handling: Instantly purges active buffers the moment the user starts speaking, ensuring natural conversation flow.
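
The interruption behavior described above can be sketched with plain asyncio. This is a minimal, framework-independent illustration and not Fern's actual code: `PlaybackBuffer`, `speak`, and `demo` are hypothetical names, the audio chunks are silent placeholders, and the "user started speaking" event is simulated with a short sleep rather than a real VAD signal. pipecat-ai has its own interruption mechanism, which this sketch does not use.

```python
import asyncio


class PlaybackBuffer:
    """Hypothetical buffer of synthesized audio chunks awaiting delivery."""

    def __init__(self) -> None:
        self._queue: asyncio.Queue[bytes] = asyncio.Queue()

    async def put(self, chunk: bytes) -> None:
        await self._queue.put(chunk)

    def pending(self) -> int:
        return self._queue.qsize()

    def purge(self) -> int:
        """Drop every pending chunk and return how many were discarded."""
        dropped = 0
        while True:
            try:
                self._queue.get_nowait()
            except asyncio.QueueEmpty:
                return dropped
            dropped += 1


async def speak(buffer: PlaybackBuffer, chunks: list[bytes]) -> None:
    """Stand-in for a TTS stage that keeps producing audio chunks."""
    for chunk in chunks:
        await buffer.put(chunk)
        await asyncio.sleep(0.01)


async def demo() -> tuple[int, int, bool]:
    buffer = PlaybackBuffer()
    tts_task = asyncio.create_task(speak(buffer, [b"\x00" * 320] * 100))

    # Let some audio accumulate, then pretend VAD reported user speech.
    await asyncio.sleep(0.05)

    # Step 1: stop the producer so no new audio is generated.
    tts_task.cancel()
    try:
        await tts_task
    except asyncio.CancelledError:
        pass

    # Step 2: discard whatever was already buffered for playback.
    dropped = buffer.purge()
    return dropped, buffer.pending(), tts_task.cancelled()


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Cancelling the producer before purging matters: purging first would leave a window in which the still-running task could enqueue another chunk.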

📐 Architecture Flow

From	Direction / Protocol	To	Data Transferred
`frieren-desktop`	`---> (Raw Audio PCM) --->`	`VAD / STT`	Captures raw microphone input from the client.
`VAD / STT`	`---> (Audio/Text Chunks) --->`	`heiter` (LLM) Service	Decoded voice tokens routed to the inference backend.
`heiter` (LLM) Service	`---> (Text Tokens) --->`	TTS Engine	Text tokens stream directly into the synthesis layer as they are generated.
TTS Engine	`---> (Audio & Visemes) --->`	`frieren-desktop`	Final synthesized audio frames and real-time lip-sync JSON pushed back to the 3D client.
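
The four hops in the table can be sketched as chained async generators, so each stage starts working as soon as the previous one yields. This is an illustrative sketch using only the standard library, not the pipecat-ai API. Every stage here is a stub: `mic`, `stt`, `llm`, `tts`, and `run_pipeline` are hypothetical names, and the viseme JSON shape is an assumption made for the example.

```python
import asyncio
import json
from collections.abc import AsyncIterator


async def mic() -> AsyncIterator[bytes]:
    """Stand-in for frieren-desktop: yields two frames of silent PCM."""
    for _ in range(2):
        yield b"\x00" * 320
        await asyncio.sleep(0)  # hand control back to the event loop


async def stt(frames: AsyncIterator[bytes]) -> AsyncIterator[str]:
    """Stand-in for VAD / STT: one fake transcript per audio frame."""
    async for frame in frames:
        yield f"utterance({len(frame)} bytes)"


async def llm(transcripts: AsyncIterator[str]) -> AsyncIterator[str]:
    """Stand-in for heiter: streams reply tokens for each transcript."""
    async for text in transcripts:
        for token in ("Heard", " ", text, "."):
            yield token


async def tts(tokens: AsyncIterator[str]) -> AsyncIterator[tuple[bytes, str]]:
    """Stand-in TTS: emits (audio chunk, viseme JSON) per token."""
    async for token in tokens:
        audio = b"\x00" * (len(token) * 2)  # placeholder audio, not real speech
        visemes = json.dumps({"viseme": "aa", "text": token})  # assumed shape
        yield audio, visemes


async def run_pipeline() -> list[tuple[bytes, str]]:
    # desktop -> VAD/STT -> heiter -> TTS; results go back to the client.
    return [item async for item in tts(llm(stt(mic())))]


if __name__ == "__main__":
    for audio, visemes in asyncio.run(run_pipeline()):
        print(len(audio), visemes)
```

Because every stage is a generator, the first audio chunk is produced after the first token arrives rather than after the whole reply is complete, which is the property the table describes as streaming "on the fly".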

🛠 Tech Stack & Spells

Core Engine: Python 3.14 (Optimized for asyncio)
Pipeline Framework: pipecat-ai
Network Protocol: LiveKit (WebRTC-based) for audio
Dependencies: pydantic, httpx, aiohttp

Part of the Frieren AI Ecosystem