We propose the Distributed Autonomous Neuron (DAN) Theory — a theoretical framework in which individual neurons operate as autonomous probabilistic agents within a self-modifying graph, collectively producing cognition, memory, and meaning through local interaction without centralized orchestration. Each neuron maintains a local probability distribution over possible next-state activations, propagates signals only to contextually relevant neighbors, and participates in continuous error-correction via bidirectional feedback. Memory is modeled not as stored content but as compressed reconstruction functions distributed across overlapping neuron subsets. The same neuron participates simultaneously in multiple thought-graphs, with meaning emerging from traversal context rather than node identity. We formalize this architecture mathematically, describe the correction and elimination model, present complexity analysis, and propose how this framework can serve as a foundation for neuromorphic computing and next-generation AGI architectures.
The dominant computational metaphor for the human brain — a central processor coordinating memory retrieval and execution — has guided decades of AI research yet remains fundamentally insufficient. The brain operates on approximately 20 watts within a mass of roughly 1.4 kilograms, achieving cognitive feats that no engineered system of comparable energy budget approaches. This disparity is not merely quantitative; it suggests a qualitatively different computational paradigm.
This paper formalizes an alternative framework arising from first-principles reasoning: that the neuron is not a passive node in a centrally managed network but an autonomous agent — a local decision-maker that receives signals, generates a probability distribution over possible next activations, propagates selectively, and corrects continuously based on bidirectional feedback. No neuron knows what thought it is participating in. Each neuron only knows its local function, its current data, and what signal it received.
Meaning, memory, and cognition are therefore not properties of any individual neuron or even any fixed subgraph — they are emergent properties of traversal patterns across a massively parallel, self-modifying graph. A single neuron participates in thousands of distinct cognitive operations simultaneously, contributing different meaning in each by virtue of which other neurons are co-activated — a property we term contextual superposition.
We further propose that memory is not stored as content but as compressed generative functions: the brain stores the minimal instruction set needed to reconstruct an experience, not the experience itself. This explains both the extraordinary storage efficiency of biological neural systems and the well-documented reconstructive nature of human memory.
No single neuron encodes any complete concept, memory, or percept. All meaningful representations arise from activation patterns across large neuron populations. Individual neurons are semantically inert — analogous to individual letters in an alphabet, which carry no meaning until arranged in sequence and context.
For any neuron ni in the graph G, the semantic content of ni in isolation is undefined. Meaning is defined only for activation sets S ⊆ N where |S| ≥ 2, and is a function of the traversal pattern over S, not of any element of S independently.
Each neuron is modeled as an autonomous agent with three capabilities: (1) receiving and interpreting an input signal, (2) generating a ranked probability distribution over possible next neurons to activate, and (3) propagating a transformed signal forward while accepting error feedback from downstream neurons. Crucially, no neuron has visibility beyond its immediate neighborhood.
Each neuron ni operates exclusively on: its received input signal xi, its internal weight vector wi, and its local adjacency set A(ni). No global state, no central orchestrator signal, and no non-local information participates in any single neuron's computation.
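The locality axiom above can be made concrete in code. The following is a minimal sketch, not a formal implementation: the class name, the softmax scoring rule, and the weight initialization are illustrative assumptions, but the constraint they demonstrate is the paper's — each neuron touches only its input signal, its own weights, and its adjacency set.

```python
import math
import random

class DANNeuron:
    """Sketch of a locally autonomous neuron: it sees only its input xi,
    its weight vector wi, and its adjacency set A(ni) -- no global state."""

    def __init__(self, neuron_id, neighbors, seed=0):
        self.id = neuron_id
        # wi: one local weight per neighbor in the adjacency set A(ni)
        self.weights = {n: 1.0 for n in neighbors}
        self.rng = random.Random(seed)

    def distribution(self, x):
        """Local probability distribution over possible next activations
        (softmax over weight * input -- an illustrative scoring choice)."""
        scores = {n: w * x for n, w in self.weights.items()}
        z = sum(math.exp(s) for s in scores.values())
        return {n: math.exp(s) / z for n, s in scores.items()}

    def propagate(self, x):
        """Select one neighbor stochastically; no non-local information
        participates in the computation."""
        r, acc = self.rng.random(), 0.0
        for n, p in self.distribution(x).items():
            acc += p
            if r <= acc:
                return n
        return n
```

With equal initial weights the distribution is uniform over the adjacency set; learning (Section on error correction) would differentiate it.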
Memory is not a stored data record. It is a compressed function fm such that given a retrieval key k, the function reconstructs the memory state m with sufficient fidelity for the cognitive task at hand. This explains why: (a) memories degrade and drift over time as function parameters shift, (b) different retrieval keys (a smell, a sound, a context) can reconstruct the same memory via different paths, and (c) memories cannot be surgically erased without corrupting overlapping reconstruction functions.
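The distinction between stored content and a compressed reconstruction function can be sketched as follows. Everything here is an illustrative toy — the "summary" statistics and the seed mechanism are assumptions — but it exhibits the three properties listed above: reconstruction rather than retrieval, key-dependent paths, and drift when parameters shift.

```python
import random

def make_memory(experience, seed=42):
    """Sketch: store a compact generative recipe (a coarse summary plus a
    seed), not the experience itself. Illustrative, not the paper's formal fm."""
    summary = (min(experience), max(experience), len(experience))

    def reconstruct(key=0):
        lo, hi, n = summary
        rng = random.Random(seed + key)  # different retrieval keys -> different paths
        return [rng.uniform(lo, hi) for _ in range(n)]

    return reconstruct
```

Note that the original list is never stored: two calls with different keys reconstruct plausible but non-identical versions of the experience, mirroring the reconstructive (and drift-prone) character of recall.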
The neural system is modeled as a directed weighted dynamic graph G = (N, E, W, T) where N is the set of neuron nodes, E is the set of directed edges representing synaptic connections, W is the weight function over edges, and T represents the temporal dimension encoding spike timing.
Unlike static graphs, G is self-modifying: the graph rewires its own topology based on activation history. Edges strengthen with repeated co-activation (Hebbian principle), weaken with disuse, and new edges form when persistent co-activation patterns are detected. This means the graph structure is itself a form of long-term memory.
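The three rewiring rules — strengthen on co-activation, decay with disuse, create on persistent co-activation — can be written as one local update step. The learning rate, decay rate, and pruning threshold below are illustrative parameters, not values the framework specifies.

```python
def hebbian_step(weights, coactive, lr=0.1, decay=0.02, prune_at=0.05):
    """One self-modification step on the edge set (illustrative sketch).
    weights: dict mapping (pre, post) edges to strengths.
    coactive: set of edges whose endpoints fired together this step."""
    out = {}
    for edge, w in weights.items():
        # strengthen with co-activation, decay with disuse
        w = w + lr if edge in coactive else w * (1 - decay)
        if w >= prune_at:          # edges below threshold vanish: pruning
            out[edge] = min(w, 1.0)
    for edge in coactive:          # persistent co-activation forms new edges
        out.setdefault(edge, lr)
    return out
```

Iterating this step makes the surviving topology a record of activation history — the sense in which graph structure is itself long-term memory.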
Each neuron ni maintains an internal state vector and computes a transformation on its input: yi = φ(xi, wi, di), where xi is the received input signal, wi the internal weight vector, di the neuron's local data, and φ the local transformation function (left abstract here; see the open problems below).
A thought-traversal Tq is a time-ordered sequence of activated nodes initiated by a query or sensory input. Unlike classical graph traversal, this traversal is: (a) non-deterministic due to stochastic selection, (b) bidirectional — nodes can send correction signals upstream, (c) non-exclusive — the same node may be active in multiple concurrent traversals, and (d) self-terminating when prediction error across the active subgraph drops below a threshold ε.
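Properties (a) and (d) — stochastic selection and self-termination below ε — can be sketched as a random walk that halts when local prediction error falls under threshold. The error values here are supplied as a lookup table purely for illustration; in the full model they would be computed by the nodes themselves.

```python
import random

def thought_traversal(adjacency, errors, start, eps=0.05, max_steps=50, seed=0):
    """Sketch of a self-terminating stochastic traversal Tq.
    adjacency: node -> list of neighbor nodes.
    errors: node -> illustrative prediction-error value."""
    rng = random.Random(seed)
    path, node = [start], start
    for _ in range(max_steps):
        if errors.get(node, 1.0) < eps:      # termination: error below eps
            break
        node = rng.choice(adjacency[node])   # non-deterministic selection
        path.append(node)
    return path
```

Bidirectionality and concurrency (properties b and c) are omitted here; they require the feedback and superposition machinery described in later sections.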
When a neuron ni receives input, it does not fire to all neighbors. It generates a ranked distribution over its adjacency set and selects — with controlled stochasticity — which neighbor to activate. This is analogous to an autocomplete system: given the letter "A", the system generates all possible continuations ranked by probability, and selects one. That selection becomes the new context for the next node's distribution.
This mechanism has four components operating simultaneously, which together account for a broad range of human cognitive behavior:
| Component | Mechanism | Cognitive Effect |
|---|---|---|
| Greedy selection | argmax over Pi | Habitual, fast, predictable responses |
| Stochastic sampling | Sample from Pi with temperature τ | Creativity, insight, divergent thinking |
| Contextual gating | Pi modulated by concurrent traversal context | Mood, attention, situational judgment |
| Emotional bias | Global neuromodulator signals shift Pi | Emotional reasoning, fear responses, motivation |
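The four components in the table compose into a single selection rule. The sketch below is an assumed formulation — the multiplicative gate, additive bias, and softmax-with-temperature are illustrative choices — but each parameter maps onto one row of the table.

```python
import math
import random

def select_next(base_scores, temperature=1.0, gate=None, bias=None,
                greedy=False, seed=0):
    """One next-node selection combining the four components (illustrative).
    gate: contextual gating factors per node (concurrent traversal context).
    bias: additive neuromodulator shifts per node (emotional bias)."""
    scores = dict(base_scores)
    if gate:
        scores = {n: s * gate.get(n, 1.0) for n, s in scores.items()}
    if bias:
        scores = {n: s + bias.get(n, 0.0) for n, s in scores.items()}
    if greedy:                               # habitual fast path: argmax
        return max(scores, key=scores.get)
    # stochastic sampling: temperature controls divergence/creativity
    z = sum(math.exp(s / temperature) for s in scores.values())
    probs = {n: math.exp(s / temperature) / z for n, s in scores.items()}
    r, acc = random.Random(seed).random(), 0.0
    for n, p in probs.items():
        acc += p
        if r <= acc:
            return n
    return n
```

High temperature flattens the distribution (divergent thinking); temperature near zero approaches the greedy, habitual path.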
The DAN model is fundamentally a prediction-correction system. Each node predicts the expected downstream signal and, upon receiving feedback that deviates from this prediction, initiates one of three responses: weight adjustment, path rerouting, or upstream error propagation. This produces a self-correcting cascade that converges toward a stable, low-error activation pattern — which we identify as the emergence of coherent meaning.
Feedback flows in the reverse direction of activation. When node nj receives an input that does not match its predicted input (based on its current weights and local model), it computes an error signal εj and propagates it back to the calling node ni. Node ni then decides whether to adjust its own weights, select a different next node, or propagate the error further upstream.
When a traversal path consistently produces error above threshold, the path is not immediately deleted — edge weights are progressively reduced until they fall below an activation threshold, effectively eliminating the path from future traversals. This is the neural analog of synaptic pruning. It explains why it is impossible to truly erase a memory: the underlying connections degrade but the node itself remains, potentially reactivatable through sufficiently strong input.
Upon receiving error feedback δ, a node updates its internal weights using a local gradient rule. This is not centrally coordinated — each node performs its own update based solely on the feedback it receives from its immediate downstream neighbor.
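A purely local correction step of this kind can be sketched with a delta-rule update — an illustrative stand-in, since the framework leaves the node function φ and its gradient unspecified. The node compares downstream feedback with its own prediction and adjusts only its own weights.

```python
def local_update(w, x, predicted, observed, lr=0.1):
    """One local prediction-correction step (illustrative delta rule).
    w: this node's weight vector; x: its input vector.
    predicted: the node's expected downstream signal.
    observed: the feedback actually received from downstream."""
    error = observed - predicted          # the error signal, computed locally
    w_new = [wi + lr * error * xi for wi, xi in zip(w, x)]
    return w_new, error
```

No central coordinator appears anywhere in the update: the same rule run independently at every node yields the self-correcting cascade described above.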
A critical property of the DAN model is that neurons are not exclusively allocated to any single cognitive operation. The same neuron participates in thousands of distinct traversals simultaneously, contributing different functional roles in each by virtue of context. We call this property contextual superposition, and it is the primary source of the brain's extraordinary efficiency.
A neuron ni is said to be in contextual superposition if it belongs to k ≥ 2 distinct active thought-traversals T1, T2, ..., Tk simultaneously, where its functional contribution φ(ni | Tj) differs across traversals as a function of concurrent activation context.
This has a profound implication for memory erasure: since neurons participate in many overlapping functions, modifying a neuron's weights to eliminate one memory necessarily perturbs all other cognitive operations involving that neuron. True targeted deletion is therefore computationally infeasible — the system can only perform weight dampening that reduces one memory's reconstructibility while degrading related functions.
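Contextual superposition can be demonstrated with a toy contribution function: the same fixed weights produce different functional outputs under different co-activation contexts, and perturbing those weights to suppress one context's output necessarily changes the others. The linear form of φ(ni | Tj) below is an illustrative assumption.

```python
def contribution(neuron_weights, context):
    """phi(ni | Tj): one neuron's functional contribution under a given
    co-activation context (illustrative linear form)."""
    return sum(w * context.get(k, 0.0) for k, w in neuron_weights.items())

w = {"a": 0.8, "b": -0.3, "c": 0.5}   # one neuron's fixed local weights
t1 = {"a": 1.0, "b": 1.0}             # co-activation context of traversal T1
t2 = {"b": 1.0, "c": 1.0}             # co-activation context of traversal T2
```

Dampening the weight on "b" to weaken the T2 contribution also shifts the T1 contribution — the computational reason targeted memory deletion degrades overlapping functions.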
Consider the cognitive task of driving a familiar route while holding a conversation. Two major thought-traversals operate concurrently: Tdrive handling sensorimotor loop execution (largely automatic, low surprise), and Ttalk handling language processing (higher attention, higher surprise budget). Both traversals share neurons in the prefrontal cortex region of the graph — responsible for sequential planning — and in the motor cortex region — responsible for coordinated movement.
Because Tdrive has been heavily reinforced (familiar route → low prediction error → high weight edges → fast traversal), it consumes minimal computational resources. Ttalk, operating simultaneously on shared neurons, receives higher temperature allocation — allowing more stochastic, creative language generation. When Tdrive encounters a novel obstacle (unexpected car), prediction error spikes, the system redirects attention (raises drive-traversal temperature, reduces talk-traversal priority), and the conversation naturally pauses.
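The attention redirection in this scenario amounts to redistributing a temperature budget in proportion to prediction error. The proportional allocation rule below is an assumption — the framework does not specify the exact function — but it reproduces the qualitative behavior: when the drive traversal's error spikes, its temperature share rises and the talk traversal's share falls.

```python
def reallocate_temperature(traversals, total_temp=1.0):
    """Sketch: attention as temperature redistribution. Each concurrent
    traversal receives a share of the budget proportional to its current
    prediction error (illustrative allocation rule)."""
    total_err = sum(t["error"] for t in traversals.values()) or 1.0
    return {name: total_temp * t["error"] / total_err
            for name, t in traversals.items()}
```

Running the driving example: with errors {drive: 0.1, talk: 0.4} the talk traversal holds most of the budget; after an unexpected obstacle raises the drive error, the allocation inverts and the conversation's budget (and fluency) drops.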
The DAN framework, as formalized above, deliberately leaves one parameter undefined: the contextual relevance function R(nj, context) in equation (3.3). This function determines how a node's probability distribution is modulated by the global traversal context — yet no node has access to global context. This is the computational instantiation of the classical binding problem in neuroscience.
We propose that the binding function is not a separate computational module but an emergent property of synchronized oscillation across active subgraphs. Neurons participating in the same traversal fire in coordinated temporal patterns — their spike timing encodes membership in the same cognitive operation. The temporal dimension T in the graph definition G = (N, E, W, T) carries this binding information implicitly.
The function R(nj, context) — which modulates local probability distributions using non-local traversal context without violating the local autonomy axiom — remains formally undefined. We hypothesize it is implemented via temporal spike synchrony (γ-oscillations, ~40Hz), but the precise mathematical mapping from spike timing patterns to probability modulation is not yet known. This is the primary open problem of the DAN framework.
An important corollary: if R emerges from oscillatory synchrony rather than centralized computation, then the brain's extraordinary energy efficiency follows directly. Synchronizing oscillations is metabolically cheap — far cheaper than routing all signals through a central processor. The 20-watt budget becomes not a constraint to explain but a predicted consequence of this architecture.
The DAN framework prescribes a specific computational architecture that differs fundamentally from all currently dominant paradigms. The key requirements are:
| DAN Requirement | Current Best Approximation | Gap |
|---|---|---|
| Autonomous per-node computation | Actor model systems | No local probability distribution; no self-modification |
| Sparse async message passing | Neuromorphic chips (Loihi 2) | Fixed topology; no online weight learning per node |
| Analog temporal coding | Spiking neural networks | Training methods immature; spike timing not fully exploited |
| Self-modifying graph topology | Neural architecture search | Offline; not continuous online rewiring |
| Contextual superposition | Transformer attention (partial) | Not truly simultaneous; sequential approximation only |
| Online error correction per node | Online learning algorithms | Catastrophic forgetting unsolved; global not local |
The architecture that would fully implement DAN does not yet exist. It would require analog computing substrate (for continuous signal gradients), event-driven sparse processing (for energy efficiency), per-node adaptive weights (for local learning), and a physical implementation of temporal coding (for the binding signal). Optical computing and chemical computing substrates are the most promising candidates for meeting these requirements simultaneously.
The DAN framework presented here is consistent with the major established findings of computational neuroscience — distributed representation, Hebbian learning, predictive coding, sparse activation, and the binding problem — while providing a unified architectural model that subsumes them. Several open problems remain:
The formal termination condition (Equation 3.4) requires a threshold ε on aggregate prediction error. What computes this aggregate across thousands of simultaneously active nodes without centralized coordination? One hypothesis: the threshold is not computed but physically manifested — the oscillatory synchrony that implements binding naturally decoheres when error is minimized, and this decoherence is itself the termination signal.
Each neuron's local transformation function φ(xi, wi, di) is left abstract in this framework. Determining the specific computational form of φ for different neuron types (pyramidal cells, interneurons, granule cells) is a primary empirical research direction implied by this theory.
The DAN framework describes a mature, operating neural graph. How does the initial graph structure emerge during development? We hypothesize that genetic encoding provides a sparse initial topology with broad probability distributions, and that learning (experience-driven error correction) progressively sharpens distributions and rewires topology — explaining both the universality of basic cognitive architecture across humans and the individuality of learned skills.
The most speculative implication of the DAN model: conscious experience may correspond to a stable, high-coherence superposition state — a configuration in which a large subgraph of nodes is simultaneously active, synchronized, and mutually error-corrected below threshold. On this view, consciousness is not located in any region or node but is a property of a particular kind of traversal state: maximally coherent, low-surprise, high-superposition.
We have presented the Distributed Autonomous Neuron (DAN) Theory — a first-principles framework modeling cognition as the emergent product of autonomous probabilistic agents (neurons) operating local functions on a self-modifying weighted graph, without centralized coordination. The framework formalizes: distributed memory as compressed reconstruction functions, thought as directed graph traversal with stochastic node selection, learning as continuous local weight adjustment from bidirectional error feedback, and the attention system as temperature redistribution across concurrent traversals.
The framework makes several testable predictions: that prediction error minimization is the universal termination signal for cognitive operations; that memory degradation follows a specific function of overlapping reconstruction path disruption; that attention capture events correspond to measurable temperature redistribution in concurrent traversal systems; and that the binding function correlates with γ-band oscillatory synchrony in a mathematically precise way yet to be determined.
Most significantly, the DAN framework implies that the gap between current AI and biological cognition is not primarily a scale problem — more parameters will not close it. The gap is architectural: current systems lack local autonomy, continuous self-modification, true temporal coding, and the unknown binding mechanism. Closing this gap requires not larger models but a fundamentally different substrate.
The theory presented here is necessarily incomplete. The binding problem remains formally open. The node function φ is unspecified. The developmental initialization question is unaddressed. These are not weaknesses to apologize for — they are the research agenda this framework generates. A theory that produces clear open problems is more valuable than a complete theory that produces none.
Preprint. Not peer reviewed. Submitted as a theoretical framework for community discussion and empirical investigation.
Correspondence: Bharat Rawat · India
© 2026 Bharat Rawat. This work may be freely cited with attribution.