Building a Neuron.
The smallest possible Cosmonapse program: one LLM Neuron backed by Hugging Face, one Axon, two Dendrites (a worker and an orchestrator), one TASK, one reply. Single process, in-memory Synapse, no broker to start. Read this first; every other example adds something on top of this shape, and the LLM doesn't add any boilerplate.
Python 3.11 or newer. httpx powers the HuggingFace Neuron source. Grab a token at huggingface.co/settings/tokens — read scope is enough.
# Python 3.11+. httpx powers the HuggingFace Neuron source.
$ pip install cosmonapse httpx

# Read scope is enough: the token grants access to the public
# Inference Providers router at https://router.huggingface.co.
$ export HF_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxx
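The examples on this page read the token with os.environ["HF_TOKEN"], so a missing token shows up as a bare KeyError. A small pre-flight check can make that failure easier to read. This is a minimal sketch: require_hf_token is a hypothetical helper, not part of Cosmonapse, and the hf_ prefix check is an assumption based on the placeholder token shown above.

```python
import os


def require_hf_token(env=None):
    """Return the Hugging Face token from `env`, or raise a readable error.

    Hypothetical helper for illustration; not part of Cosmonapse.
    """
    env = os.environ if env is None else env
    token = env.get("HF_TOKEN", "").strip()
    if not token:
        raise RuntimeError("HF_TOKEN is not set; export it before running")
    if not token.startswith("hf_"):
        # Assumption: tokens carry the hf_ prefix, as in the placeholder
        # above. Relax this check if your token format differs.
        raise RuntimeError("HF_TOKEN does not look like a Hugging Face token")
    return token
```

Calling require_hf_token() once at startup and passing the result as api_key keeps the rest of the program unchanged.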
An LLM, behind the same interface.
A Neuron is anything that satisfies async fn(input, context) → output. Neuron(source="huggingface", ...) is the unified factory: it returns an async callable with that shape, wrapped around any OpenAI-compatible chat endpoint. Switch source="ollama" or source="flask" and the rest of the program is unchanged.
# A Neuron is anything that satisfies the NeuronFn contract:
#     async fn(input, context) -> output
#
# The unified factory wraps any source behind that interface. Here it's
# a HuggingFace endpoint; it could equally be Ollama, a Flask app, or an
# MCP server; the Axon never knows the difference.
import os
from cosmonapse import Neuron

greeter = Neuron(
    source="huggingface",
    endpoint="https://router.huggingface.co",
    model="meta-llama/Llama-3.1-8B-Instruct",
    api_key=os.environ["HF_TOKEN"],
    use_chat_api=True,
    max_new_tokens=128,
    temperature=0.7,
)

# Input the orchestrator sends:  {"prompt": "..."} or {"messages": [...]}
# Output the Neuron returns:     {"response": "<text>", "meta": <raw>}
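Because the contract is just async fn(input, context) → output, a hand-written function can stand in for the LLM, which is handy for offline tests. The sketch below uses only the standard library. The name echo_greeter is hypothetical, the context argument is ignored because its exact type is not described here, and the assumption that messages are OpenAI-style role/content dictionaries is mine.

```python
import asyncio


async def echo_greeter(input, context):
    """A hand-written Neuron: async fn(input, context) -> output.

    Accepts both input shapes listed above and returns the same
    {"response", "meta"} shape as the LLM Neuron.
    """
    if "messages" in input:
        # Assumption: OpenAI-style entries, {"role": ..., "content": ...}.
        prompt = input["messages"][-1]["content"]
    else:
        prompt = input["prompt"]
    return {"response": f"Hello! You said: {prompt}", "meta": {"source": "echo"}}


if __name__ == "__main__":
    # The context value here is a placeholder; this function never reads it.
    print(asyncio.run(echo_greeter({"prompt": "hi"}, context=None)))
```

A function with this shape should be usable wherever greeter is used on this page, for example as neuron_fn when constructing the Axon.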
Identity, capabilities, validation.
The Axon wraps the Neuron, gives it an addressable id on the bus, and turns return values into protocol-valid AGENT_OUTPUT Signals. It never touches the Synapse itself — that boundary is enforced in code, not convention. This snippet is identical whether the Neuron is an LLM, a function, or a Flask app.
# The Axon declares identity + capabilities and owns the Neuron.
# It doesn't know it's wrapping an LLM; this code is byte-for-byte the
# same as it would be for a hand-written async function.
from cosmonapse import Axon

axon = Axon(
    neuron_id="greeter",
    neuron_fn=greeter,
    capabilities=["text-generation", "chat", "greet"],
)
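To make "turns return values into AGENT_OUTPUT Signals" concrete, here is a toy model of the idea using only the standard library. It is not the Cosmonapse implementation: ToySignal, ToyAxon, and the handle method are hypothetical names, and the field layout is an assumption that only mirrors what this page shows (a type with a .value, and the Neuron's return value under payload["output"]).

```python
import asyncio
import uuid
from dataclasses import dataclass, field
from enum import Enum


class SignalType(Enum):
    AGENT_OUTPUT = "AGENT_OUTPUT"


@dataclass
class ToySignal:
    type: SignalType
    sender: str
    trace_id: str
    payload: dict = field(default_factory=dict)


class ToyAxon:
    """Illustrative stand-in, not the real cosmonapse.Axon."""

    def __init__(self, neuron_id, neuron_fn, capabilities):
        self.neuron_id = neuron_id
        self.neuron_fn = neuron_fn
        self.capabilities = list(capabilities)

    async def handle(self, task_input, trace_id):
        # Run the Neuron, then wrap whatever it returns in a Signal so the
        # caller finds the answer at payload["output"].
        output = await self.neuron_fn(task_input, {"trace_id": trace_id})
        return ToySignal(SignalType.AGENT_OUTPUT, self.neuron_id, trace_id,
                         {"output": output})


async def shout(input, context):
    # Any async fn(input, context) -> output works as the wrapped Neuron.
    return {"response": input["prompt"].upper(), "meta": {}}


async def demo():
    axon = ToyAxon("shouter", shout, ["text-generation"])
    return await axon.handle({"prompt": "hello"}, trace_id=uuid.uuid4().hex)
```

Note that the toy wrapper never publishes anything itself; it only returns a Signal, which reflects the boundary described above where sending is left to another component.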
The only thing that touches the Synapse.
The Dendrite hosts Axons, emits REGISTER / HEARTBEAT / DEREGISTER on their behalf, routes inbound TASKs, and exposes the dispatch API. We build two — a role="worker" that serves requests, and an orchestrator (default role) that sends them. Both share the same in-memory Synapse.
# The Dendrite is the only component that touches the Synapse.
# role="worker" is a protocol guard: workers can serve TASKs and bid,
# but cannot emit orchestration signals (TASK / FINAL / etc.).
from cosmonapse import Dendrite, MemorySynapse

synapse = MemorySynapse()  # in-process, no socket
await synapse.connect()

worker = Dendrite(
    synapse=synapse,
    namespace="demo",
    role="worker",
)
worker.attach_axon(axon)

orchestrator = Dendrite(synapse=synapse, namespace="demo")
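The role guard can be pictured as a check made before anything is sent. The sketch below is a toy model, not the real Dendrite: ToyDendrite, its emit method, the "orchestrator" default string, and the use of PermissionError are all assumptions made for illustration. Only TASK and FINAL are listed as orchestration signals because those are the two named above.

```python
# Only the two orchestration signals named in the text; the real set is larger.
ORCHESTRATION_SIGNALS = frozenset({"TASK", "FINAL"})


class ToyDendrite:
    """Illustrative stand-in, not the real cosmonapse.Dendrite."""

    def __init__(self, namespace, role="orchestrator"):
        self.namespace = namespace
        self.role = role
        self.sent = []  # records what would have gone onto the bus

    def emit(self, signal_type, payload=None):
        # Guard: a worker may answer, but it may not orchestrate.
        if self.role == "worker" and signal_type in ORCHESTRATION_SIGNALS:
            raise PermissionError(f"role 'worker' cannot emit {signal_type}")
        self.sent.append((signal_type, payload or {}))
```

With this toy, ToyDendrite("demo", role="worker").emit("TASK") raises, while the same call on a default-role instance succeeds.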
One TASK, one Pathway, one reply.
dispatch_and_wait is sugar over a Pathway: emit a TASK on a new trace_id, open a Pathway scoped to that trace, await the first terminal Signal, close the Pathway, and return the Signal. The LLM Neuron returns {"response": "...", "meta": {...}}, so the answer lives at reply.payload["output"]["response"].
# dispatch_and_wait is sugar over a Pathway:
#   1. emit a TASK on this trace_id
#   2. open a Pathway scoped to the trace
#   3. await the first terminal Signal (AGENT_OUTPUT here)
#   4. close the Pathway, return the Signal
async with worker, orchestrator:
    reply = await orchestrator.dispatch_and_wait(
        neuron="greeter",
        input={"prompt": "Say hello to a project called Cosmonapse in one line."},
        timeout_s=30.0,
    )
    print(f"[{reply.type.value}] {reply.payload['output']['response']}")
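The four steps above can be modelled with plain asyncio queues. This is a toy sketch of the pattern, not the library's code: ToyBus, toy_worker, and toy_dispatch_and_wait are hypothetical names, Signals are plain dictionaries, and AGENT_OUTPUT is treated as the only terminal type because it is the only one named here.

```python
import asyncio
import uuid

TERMINAL = {"AGENT_OUTPUT"}  # assumption: the only terminal type modelled here


class ToyBus:
    """In-memory fan-out: each published signal goes to every subscriber."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self):
        queue = asyncio.Queue()
        self.subscribers.append(queue)
        return queue

    def unsubscribe(self, queue):
        self.subscribers.remove(queue)

    def publish(self, signal):
        for queue in self.subscribers:
            queue.put_nowait(signal)


async def toy_worker(bus, inbox, neuron_fn):
    # Answer every TASK with an AGENT_OUTPUT on the same trace_id.
    while True:
        sig = await inbox.get()
        if sig["type"] == "TASK":
            output = await neuron_fn(sig["payload"]["input"], {})
            bus.publish({"type": "AGENT_OUTPUT", "trace_id": sig["trace_id"],
                         "payload": {"output": output}})


async def toy_dispatch_and_wait(bus, input, timeout_s):
    trace_id = uuid.uuid4().hex
    # 1. Emit a TASK on a new trace_id.
    bus.publish({"type": "TASK", "trace_id": trace_id, "payload": {"input": input}})
    # 2. Open a "pathway". No await separates steps 1 and 2, so the worker
    #    cannot reply before the pathway is listening.
    pathway = bus.subscribe()

    async def first_terminal():
        while True:
            sig = await pathway.get()
            if sig["trace_id"] == trace_id and sig["type"] in TERMINAL:
                return sig

    try:
        # 3. Await the first terminal signal, bounded by the timeout.
        return await asyncio.wait_for(first_terminal(), timeout_s)
    finally:
        # 4. Close the pathway whether we got a reply or timed out.
        bus.unsubscribe(pathway)


async def demo():
    bus = ToyBus()

    async def shout(input, context):
        return {"response": input["prompt"].upper(), "meta": {}}

    worker = asyncio.create_task(toy_worker(bus, bus.subscribe(), shout))
    try:
        return await toy_dispatch_and_wait(bus, {"prompt": "hello"}, timeout_s=1.0)
    finally:
        worker.cancel()
```

In this toy, a missing worker surfaces as asyncio.TimeoutError from asyncio.wait_for; how the real dispatch_and_wait reports a timeout is not stated on this page.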
The whole program.
About 25 lines of real code, including the LLM. Save as main.py and run.
import asyncio, os
from cosmonapse import Axon, Dendrite, MemorySynapse, Neuron

greeter = Neuron(
    source="huggingface",
    endpoint="https://router.huggingface.co",
    model="meta-llama/Llama-3.1-8B-Instruct",
    api_key=os.environ["HF_TOKEN"],
    use_chat_api=True,
    max_new_tokens=128,
    temperature=0.7,
)

async def main():
    synapse = MemorySynapse()
    await synapse.connect()
    try:
        axon = Axon(
            neuron_id="greeter",
            neuron_fn=greeter,
            capabilities=["text-generation", "chat", "greet"],
        )
        worker = Dendrite(synapse=synapse, namespace="demo", role="worker")
        worker.attach_axon(axon)
        orchestrator = Dendrite(synapse=synapse, namespace="demo")

        async with worker, orchestrator:
            reply = await orchestrator.dispatch_and_wait(
                neuron="greeter",
                input={"prompt": "Say hello to a project called Cosmonapse in one line."},
                timeout_s=30.0,
            )
            print(f"[{reply.type.value}] {reply.payload['output']['response']}")
    finally:
        await synapse.close()

asyncio.run(main())
$ python main.py
[AGENT_OUTPUT] Hello, Cosmonapse! Welcome aboard — let's build something cool.
Exact text varies; the model is stochastic.
One line moves between providers.
The endpoint is the only HuggingFace-specific line. Point it at a dedicated HF endpoint, a local TGI / vLLM server, or LM Studio — the Neuron, Axon, and Dendrite code never changes. For Ollama, swap the source.
# The endpoint is the only HF-specific line. Point it elsewhere for any
# OpenAI-compatible chat server; your Neuron code never changes.
endpoint="https://router.huggingface.co"                        # default
endpoint="https://<your-endpoint>.endpoints.huggingface.cloud"  # dedicated HF endpoint
endpoint="http://localhost:8080"                                # local TGI / vLLM / LM Studio

# For Ollama, switch source: same Axon, same Dendrite.
greeter = Neuron(source="ollama", model="llama3")
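Since only the endpoint changes between providers, one option is to read it from the environment so the same script runs against the router or a local server. A minimal sketch follows; the helper neuron_kwargs and the variable names LLM_ENDPOINT and LLM_MODEL are hypothetical and not defined by Cosmonapse. The returned keys are limited to keyword arguments already used on this page.

```python
import os

DEFAULT_ENDPOINT = "https://router.huggingface.co"
DEFAULT_MODEL = "meta-llama/Llama-3.1-8B-Instruct"


def neuron_kwargs(env=None):
    """Build keyword arguments for Neuron(...) from environment variables.

    LLM_ENDPOINT and LLM_MODEL are hypothetical names chosen for this
    sketch; they are not read by Cosmonapse itself.
    """
    env = os.environ if env is None else env
    return {
        "source": "huggingface",
        # Strip a trailing slash so both URL spellings behave the same.
        "endpoint": env.get("LLM_ENDPOINT", DEFAULT_ENDPOINT).rstrip("/"),
        "model": env.get("LLM_MODEL", DEFAULT_MODEL),
        # Assumption: an empty key is acceptable for local servers.
        "api_key": env.get("HF_TOKEN", ""),
        "use_chat_api": True,
    }
```

Usage would then be greeter = Neuron(**neuron_kwargs()), with LLM_ENDPOINT=http://localhost:8080 exported to target a local server.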
Integrating an Engram
Bind shared memory and call recall() / imprint() from inside the Neuron.
Pathway — three shapes
The full surface dispatch_and_wait is built on. Sequential, reactive, streaming.
Round-robin orchestrator
Split worker and orchestrator across processes, load-balance HuggingFace workers.