00 · Install

Python 3.11 or newer. httpx powers the HuggingFace Neuron source. Grab a token at huggingface.co/settings/tokens — read scope is enough.

# Python 3.11+. httpx powers the HuggingFace Neuron source.
$ pip install cosmonapse httpx

# Read scope is enough — the token grants access to the public
# Inference Providers router at https://router.huggingface.co.
$ export HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxx
01 · The Neuron

An LLM, behind the same interface.

A Neuron is anything that satisfies async fn(input, context) → output. Neuron(source="huggingface", ...) is the unified factory: it returns an async callable with that shape, wrapped around any OpenAI-compatible chat endpoint. Switch source="ollama" or source="flask" and the rest of the program is unchanged.

greeter.py
# A Neuron is anything that satisfies the NeuronFn contract:
#   async fn(input, context) -> output
#
# The unified factory wraps any source behind that interface. Here it's
# a HuggingFace endpoint; it could equally be Ollama, a Flask app, or an
# MCP server — the Axon never knows the difference.
import os
from cosmonapse import Neuron

greeter = Neuron(
    source="huggingface",
    endpoint="https://router.huggingface.co",
    model="meta-llama/Llama-3.1-8B-Instruct",
    api_key=os.environ["HF_TOKEN"],
    use_chat_api=True,
    max_new_tokens=128,
    temperature=0.7,
)

# Input the orchestrator sends: {"prompt": "..."} or {"messages": [...]}
# Output the Neuron returns:    {"response": "<text>", "meta": <raw>}
02 · The Axon

Identity, capabilities, validation.

The Axon wraps the Neuron, gives it an addressable id on the bus, and turns return values into protocol-valid AGENT_OUTPUT Signals. It never touches the Synapse itself — that boundary is enforced in code, not convention. This snippet is identical whether the Neuron is an LLM, a function, or a Flask app.

axon.py
# The Axon declares identity + capabilities and owns the Neuron.
# It doesn't know it's wrapping an LLM — this code is byte-for-byte the
# same as it would be for a hand-written async function.
from cosmonapse import Axon

axon = Axon(
    neuron_id="greeter",
    neuron_fn=greeter,
    capabilities=["text-generation", "chat", "greet"],
)
03 · The Dendrite

The only thing that touches the Synapse.

The Dendrite hosts Axons, emits REGISTER / HEARTBEAT / DEREGISTER on their behalf, routes inbound TASKs, and exposes the dispatch API. We build two — a role="worker" that serves requests, and an orchestrator (default role) that sends them. Both share the same in-memory Synapse.

dendrite.py
# The Dendrite is the only component that touches the Synapse.
# role="worker" is a protocol guard: workers can serve TASKs and bid,
# but cannot emit orchestration signals (TASK / FINAL / etc.).
from cosmonapse import Dendrite, MemorySynapse

synapse = MemorySynapse()         # in-process — no socket
await synapse.connect()

worker = Dendrite(
    synapse=synapse,
    namespace="demo",
    role="worker",
)
worker.attach_axon(axon)

orchestrator = Dendrite(synapse=synapse, namespace="demo")
04 · The dispatch

One TASK, one Pathway, one reply.

dispatch_and_wait is sugar over a Pathway: emit a TASK on a new trace_id, open a Pathway scoped to that trace, await the first terminal Signal, close the Pathway, and return the Signal. The LLM Neuron returns {"response": "...", "meta": {...}}, so the answer lives at reply.payload["output"]["response"].

dispatch.py
# dispatch_and_wait is sugar over a Pathway:
#   1. emit a TASK on this trace_id
#   2. open a Pathway scoped to the trace
#   3. await the first terminal Signal (AGENT_OUTPUT here)
#   4. close the Pathway, return the Signal
async with worker, orchestrator:
    reply = await orchestrator.dispatch_and_wait(
        neuron="greeter",
        input={"prompt": "Say hello to a project called Cosmonapse in one line."},
        timeout_s=30.0,
    )
    print(f"[{reply.type.value}] {reply.payload['output']['response']}")
05 · Putting it together

The whole program.

About 25 lines of real code, including the LLM. Save as main.py and run.

main.py
import asyncio, os
from cosmonapse import Axon, Dendrite, MemorySynapse, Neuron


greeter = Neuron(
    source="huggingface",
    endpoint="https://router.huggingface.co",
    model="meta-llama/Llama-3.1-8B-Instruct",
    api_key=os.environ["HF_TOKEN"],
    use_chat_api=True,
    max_new_tokens=128,
    temperature=0.7,
)


async def main():
    synapse = MemorySynapse()
    await synapse.connect()
    try:
        axon = Axon(
            neuron_id="greeter",
            neuron_fn=greeter,
            capabilities=["text-generation", "chat", "greet"],
        )
        worker = Dendrite(synapse=synapse, namespace="demo", role="worker")
        worker.attach_axon(axon)

        orchestrator = Dendrite(synapse=synapse, namespace="demo")

        async with worker, orchestrator:
            reply = await orchestrator.dispatch_and_wait(
                neuron="greeter",
                input={"prompt": "Say hello to a project called Cosmonapse in one line."},
                timeout_s=30.0,
            )
            print(f"[{reply.type.value}] {reply.payload['output']['response']}")
    finally:
        await synapse.close()


asyncio.run(main())
$ python main.py
[AGENT_OUTPUT] Hello, Cosmonapse! Welcome aboard — let's build something cool.

Exact text varies — the model is stochastic.

06 · Swap the model

One line moves between providers.

The endpoint is the only HuggingFace-specific line. Point it at a dedicated HF endpoint, a local TGI / vLLM server, or LM Studio — the Neuron, Axon, and Dendrite code never changes. For Ollama, swap the source.

providers.py
# The endpoint is the only HF-specific line. Point it elsewhere for any
# OpenAI-compatible chat server — your Neuron code never changes.
endpoint="https://router.huggingface.co"                          # default
endpoint="https://<your-endpoint>.endpoints.huggingface.cloud"   # dedicated HF endpoint
endpoint="http://localhost:8080"                                 # local TGI / vLLM / LM Studio

# For Ollama, switch source — same Axon, same Dendrite.
greeter = Neuron(source="ollama", model="llama3")