Independent Research · Vol. 03 · Issue 13 · May 2026
Crypto × AI · Base · DePIN Inference

Why Dolphin is the most asymmetric bet in decentralized inference

A seven-person lab built the uncensored models that power Venice.AI, trained them on Bittensor compute, then issued a token that routes 100% of network revenue into buybacks. This is a look at what Dolphin actually is, how its verification engineering works, and why, against the economics of the inference-and-GPU niche it operates in, the current valuation is worth a closer look.

Network: Base (token, staking, and buyback all settle on Base L2)
Token Contract: 0xeD66…1dF8F (verifiable on BaseScan; migrated from DPHN on Mar 2)
Total Supply: 500M POD (~56M circulating, an 11% float)
Revenue → Buyback: 100% (all network revenue buys POD on the open market)
Verification: Live-weight proofs (plus logprob fingerprinting, bonds, and software integrity)
Network Status: V1 live (supply side active, public API expected soon)
Anchor Customer: Venice.AI (Dolphin powers all uncensored chat; 3M+ users)
Training Compute: Bittensor SN4 (Targon: 1,500+ H200s used for model training)

Most decentralized inference projects are supply looking for demand. They incentivise operators to bring GPUs online, build a marketplace, then hope someone shows up to buy the compute. Dolphin started from the other end. The team spent two years building uncensored open-source AI models that became good enough to power all uncensored chat on Venice.AI, a consumer product serving roughly three million users. Only after the models had real distribution did they issue a token and stand up a peer-to-pool inference network to serve those same models.

That ordering is the single most useful fact for understanding the project. It means Dolphin is not, primarily, a bet on whether a decentralized inference network can find demand. The demand already exists, on Venice and across five million monthly model downloads on Hugging Face. The open questions are narrower and more concrete: whether the network can convert that distribution into on-chain revenue, and whether the token mechanics route enough of that revenue back to holders to matter.

This piece works through four things. What Dolphin actually is, across the model layer, the network, and the Venice relationship. The verification stack, which is the part most decentralized inference projects get wrong and the part Dolphin has invested most heavily in. The token economy and how value is meant to flow. And, at the end, how all of that sits against the current valuation and the near-term catalysts.

In Brief

Dolphin is an AI lab whose uncensored models power Venice.AI, now running its own peer-to-pool inference network on idle consumer GPUs. The token, POD, captures 100% of network revenue through automatic open-market buybacks, against an unusually tight circulating float. The verification engineering, built around live-weight proofs, is among the most complete in the category. The principal caveat is timing: the public API has not opened yet, so revenue has not switched on. This is therefore a bet on near-term execution rather than on trailing cash flows, and the rest of this piece tries to weigh both sides of that honestly.

What Dolphin actually is

Dolphin is three things stacked on top of each other: a model lab, an inference network, and a token that coordinates the network. Understanding the project means understanding how those three layers connect.

The model layer came first

The team has spent since 2023 producing uncensored, unaligned versions of leading open-source models: Llama, Mistral, Mixtral, Gemma, Qwen, and others. The technique is fine-tuning a base model on a dataset scrubbed of refusals and bias, then training it the same way the original instruct model was trained, producing a version that follows the system prompt closely and does not arbitrarily refuse requests. The flagship, Dolphin Mistral 24B Venice Edition, became the default uncensored model on Venice.AI. The lab reports over five million monthly downloads across its models on Hugging Face, and its newest model is claimed to score zero refusals on Venice's 45-question refusal benchmark.

Whatever one thinks of the uncensored-AI thesis, the distribution is real and verifiable. Venice.AI is a known product with a real user base, and the Dolphin models are demonstrably the uncensored layer behind it. That distribution is the single most important fact about POD, because it is the thing that separates Dolphin from the dozens of decentralized inference projects that have a network but no customer. The alignment also runs deeper than a typical supplier relationship: Erik Voorhees, Venice's founder, is an investor in the project and a POD holder.

The network came second

The Dolphin Network is a peer-to-pool distributed inference system. Operators run node software on idle consumer or enterprise GPUs and get paid in POD for processing inference requests. The "peer-to-pool" framing is the central architectural choice and the team's main differentiation claim. In most decentralized GPU networks, a consumer rents a specific node for a fixed session. That model cannot capture the largest pool of compute on the planet, idle gaming GPUs, because a gamer will not commit their machine to uninterrupted multi-hour rentals.

Dolphin's answer is to pool all nodes running a given model and route individual requests across the pool. Any node can join or leave at any time without penalty. A gamer runs the node software while not playing, turns it off the instant they want the machine back, and keeps using their normal OS while it runs, since the software only consumes a portion of the GPU and does not require Linux. The pool absorbs the churn instead of being broken by it. This is a genuinely sensible design for the supply they are targeting, because inference is stateless and short: a request lasts seconds, so a node dropping offline between requests costs nothing, and the disappearing-node problem that kills traditional multi-hour compute rental mostly does not apply.
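To make the peer-to-pool idea concrete, here is a minimal Python sketch of a single model pool. It is an illustration under stated assumptions, not Dolphin's node software: the `ModelPool` class, its round-robin routing, and the node names are all hypothetical. It shows only the property the paragraph relies on: because each request is short and stateless, a node that leaves between requests simply drops out of rotation without breaking anything.

```python
from dataclasses import dataclass, field


@dataclass
class ModelPool:
    """Hypothetical pool of interchangeable nodes that all serve the same model."""

    model: str
    nodes: list[str] = field(default_factory=list)
    _cursor: int = 0  # position for round-robin selection

    def join(self, node_id: str) -> None:
        """A node comes online (e.g. a gamer starts the node software)."""
        if node_id not in self.nodes:
            self.nodes.append(node_id)

    def leave(self, node_id: str) -> None:
        """A node goes offline at any time; there is no session to break and no penalty."""
        if node_id in self.nodes:
            self.nodes.remove(node_id)

    def route(self) -> str:
        """Send one short, stateless request to whichever node is next in rotation."""
        if not self.nodes:
            raise RuntimeError(f"no nodes online for {self.model}")
        node = self.nodes[self._cursor % len(self.nodes)]
        self._cursor += 1
        return node
```

Round-robin is used here only for simplicity; a production router would more plausibly weigh latency, load, and whether a requester has asked for bonded nodes only.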

The training relationship

Dolphin's models are trained using Targon (Bittensor subnet 4), which the team describes as providing access to 1,500-plus H200s, and additional compute from Lium. This gives POD an indirect exposure to the Bittensor ecosystem on the training side, while the inference side runs on its own network. The team trained Dolphin 405B, a fine-tune of Meta's largest model, on a single B200 node, which is a credible engineering signal.

The verification stack

The hardest problem in decentralized inference is verification. If you pay a stranger to run a model on hardware you cannot inspect, how do you know they ran the model they claimed, at the quality they advertised, rather than quietly swapping in a smaller or more aggressively quantized version and pocketing the difference? Every prior attempt at this has run into the same wall, and it is the problem Dolphin has invested the most engineering into. The stack has five distinct layers.

Live-weight proofs, the core innovation

The centerpiece is what Dolphin calls encrypted live-weight proofs. Rather than verifying a file hash at load time (which a node can pass and then swap models in memory afterward), live-weight proofs sample the tensors actually resident in the serving runtime at the moment a response is produced and compare them against the approved model's manifest. The team puts the overhead at roughly 0.1%, versus 100% for full re-inference, and frames this as a 100- to 1,000-fold efficiency edge. Replay attacks are countered with signed challenge bundles, short time-to-live windows, per-challenge nonces, and randomized tensor selection.

The significant property is that this approach verifies what is loaded rather than what is emitted, which means it does not require re-running the prompt. That sidesteps the prompt-privacy problem that affects re-execution-based verification approaches, and it extends to non-text models because it samples weights rather than analysing token outputs.
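The challenge-response shape of a live-weight proof can be sketched in a few lines of Python. This is a minimal illustration under loud assumptions, not Dolphin's protocol: tensors are modelled as raw byte strings, the validator is assumed to hold a reference copy of the approved weights (standing in for the manifest), the sampled region is one random byte slice of each randomly chosen tensor, and the real system's signing and encryption of challenge bundles are omitted. The nonce, the short TTL, and the replay check correspond to the anti-replay measures described above. Note that no prompt appears anywhere in the exchange.

```python
import hashlib
import hmac
import os
import random
import time
from dataclasses import dataclass

TTL_SECONDS = 5.0  # assumed short time-to-live for each challenge


@dataclass(frozen=True)
class Challenge:
    nonce: bytes              # fresh per challenge, so an old answer cannot be replayed
    tensors: tuple[str, ...]  # randomly selected tensor names
    offset: int               # start of the sampled byte slice in each tensor
    length: int               # size of the sampled byte slice
    expires_at: float         # wall-clock deadline for the answer


def issue_challenge(reference: dict[str, bytes], k: int = 2, length: int = 8) -> Challenge:
    """Validator side: pick k random tensors and one random slice, bound to a new nonce."""
    tensors = tuple(random.sample(sorted(reference), k))
    smallest = min(len(reference[name]) for name in tensors)
    offset = random.randrange(0, smallest - length + 1)
    return Challenge(os.urandom(16), tensors, offset, length, time.time() + TTL_SECONDS)


def weight_digest(weights: dict[str, bytes], ch: Challenge) -> bytes:
    """Node side (and validator re-check): hash the nonce with the sampled weight slices."""
    h = hashlib.sha256(ch.nonce)
    for name in ch.tensors:
        h.update(weights[name][ch.offset : ch.offset + ch.length])
    return h.digest()


class Validator:
    def __init__(self, reference: dict[str, bytes]):
        self.reference = reference  # approved weights, standing in for the model manifest
        self.seen_nonces: set[bytes] = set()

    def verify(self, ch: Challenge, answer: bytes) -> bool:
        """Reject expired or replayed challenges, then compare against the approved weights."""
        if time.time() > ch.expires_at or ch.nonce in self.seen_nonces:
            return False
        self.seen_nonces.add(ch.nonce)
        return hmac.compare_digest(answer, weight_digest(self.reference, ch))
```

Because the digest mixes a fresh nonce with a randomly chosen slice, a node cannot precompute answers, and one that has swapped or re-quantized its resident weights produces a digest that no longer matches.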

The other four layers

Live-weight proofs do not stand alone. Dolphin layers four additional signals on top:

  1. Multimodal verification, which extends weight sampling to image, audio, and video models via attention projections and MLP blocks, plus golden-sample checks such as CLIP embeddings and spectrogram statistics.
  2. Logprob fingerprinting, where validators re-score sampled responses against the approved model within a tight tolerance, catching model substitution while allowing for hardware numerical noise.
  3. Software integrity, through encrypted, signed, and obfuscated binaries plus hardware fingerprinting to prevent sock-puppet nodes.
  4. Tokenized output verification plus performance bounds, which flag nodes running outside the expected token-count and tokens-per-second ranges.
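Two of those signals, logprob fingerprinting and performance bounds, reduce to simple threshold checks once a validator has re-scored a sample. The sketch below is hypothetical throughout: the function names are illustrative, and the numbers (a 0.05 per-token logprob tolerance, a 10% allowance for tokens outside it, and whatever tokens-per-second band is passed in) are assumptions, not Dolphin's parameters.

```python
from dataclasses import dataclass

LOGPROB_TOLERANCE = 0.05  # assumed per-token tolerance for hardware numerical noise
MAX_MISS_RATE = 0.10      # assumed share of tokens allowed to fall outside the tolerance


@dataclass(frozen=True)
class ThroughputBounds:
    min_tps: float  # lower edge of the expected tokens/sec range (hypothetical)
    max_tps: float  # faster than this hints at a smaller or more heavily quantized model


def logprobs_match(reported: list[float], rescored: list[float]) -> bool:
    """Compare a node's reported token logprobs with the validator's re-scored ones."""
    if not reported or len(reported) != len(rescored):
        return False
    misses = sum(abs(r - s) > LOGPROB_TOLERANCE for r, s in zip(reported, rescored))
    return misses / len(reported) <= MAX_MISS_RATE


def within_bounds(tokens: int, seconds: float, bounds: ThroughputBounds) -> bool:
    """Check that observed throughput sits inside the expected range."""
    tps = tokens / seconds
    return bounds.min_tps <= tps <= bounds.max_tps


def flag_reasons(reported, rescored, tokens, seconds, bounds) -> list[str]:
    """Return the reasons (if any) a response should be flagged for review."""
    reasons = []
    if not logprobs_match(reported, rescored):
        reasons.append("logprob mismatch")
    if not within_bounds(tokens, seconds, bounds):
        reasons.append("throughput out of bounds")
    return reasons
```

The tolerance is what lets honest nodes on different GPUs pass despite small floating-point differences, while a substituted model drifts well outside it on most tokens.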

The multimodal coverage is the strategically important piece. Dolphin's own benchmarks claim consumer GPUs are 5.5 to 8 times more cost-efficient than H100s for image and video, and 12.8 to 18.2 times for audio. Those are the highest-margin modalities in AI (ElevenLabs is reportedly approaching $500M in annual recurring revenue on audio alone), and they happen to be the modalities most decentralized networks cannot verify. If Dolphin can reliably verify image and audio inference where competitors cannot, the verification stack and the revenue stack are the same moat.

The cryptoeconomic layer

On top of the technical verification sits a bonding system, recently restructured to be account-wide rather than per-node. Every node now carries a bond, but an operator does not need to buy POD and bond it in order to start mining. Instead, the mechanism runs through how rewards are claimed. An operator can claim earnings two ways: as bonded POD (the standard route) or as liquid POD subject to a 20% fee that is routed directly to xPOD stakers. The crucial rule is that an operator must accumulate four weeks of expected earnings as bond before the liquid-claim option unlocks at all. New miners simply mine their bond for the first four weeks before any liquid rewards become available.

That single rule closes the most dangerous attack vector in decentralized inference. The cheat-and-cash-out path, where a node passes initial checks, swaps in a fake or quantized model, extracts liquid rewards, and disappears, cannot function, because no operator can extract anything liquid until they have already built up four weeks of slashable bond. There is nothing to grab and run with early. Bonds are slashable on confirmed malicious activity, and validators can freeze a node's exit for 30 days on a detected model mismatch (reversible if it turns out to be an honest hardware or software fault), with a multisig executing slashing on review.
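The claim rule is easiest to see as state on an account. The sketch below is a hypothetical model, not the protocol's contract: it tracks a weekly expected-earnings figure, refuses liquid claims until the bond reaches four weeks of that figure, and splits liquid claims 80/20 with the 20% fee going to xPOD stakers, as described above. Slashing, the 30-day exit freeze, and multisig review are not modelled.

```python
from dataclasses import dataclass

LIQUID_FEE = 0.20        # share of a liquid claim routed to xPOD stakers
BOND_WEEKS_REQUIRED = 4  # weeks of expected earnings that must be bonded first


@dataclass
class Account:
    weekly_expected_earnings: float
    bond: float = 0.0        # slashable bonded POD
    unclaimed: float = 0.0   # rewards earned but not yet claimed

    def liquid_unlocked(self) -> bool:
        return self.bond >= BOND_WEEKS_REQUIRED * self.weekly_expected_earnings

    def claim_bonded(self) -> float:
        """Standard route: rewards are added to the account's bond."""
        amount, self.unclaimed = self.unclaimed, 0.0
        self.bond += amount
        return amount

    def claim_liquid(self) -> tuple[float, float]:
        """Returns (POD paid to the operator, fee paid to xPOD stakers)."""
        if not self.liquid_unlocked():
            raise PermissionError("bond is below four weeks of expected earnings")
        amount, self.unclaimed = self.unclaimed, 0.0
        fee = amount * LIQUID_FEE
        return amount - fee, fee
```

A new operator earning 100 POD a week therefore has nothing liquid to extract until its bond reaches 400 POD, which is exactly the window in which a cheat-and-cash-out strategy would have needed to operate.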

The bond also drives a Curve-style reward boost, calculated account-wide: the total bond an account holds relative to its expected earnings determines its multiplier, in the same way a veCRV balance applies across all of an account's positions. Bonding six months of expected earnings earns a 1.5x boost on a linear, non-competitive basis; over-bonding beyond that competes for up to 2x. Requesters who want maximum economic security can choose to route only to bonded nodes, and governance can raise the boost if cheating is ever detected, pulling more of the network into bonded operation.
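A minimal sketch of the non-competitive part of the boost, assuming a straight line from 1.0x with no bond to 1.5x at six months of expected earnings (approximated here as 26 weeks). The text does not specify the curve's starting point, so the 1.0x base is an assumption, and the competitive range above 1.5x, up to 2x, is deliberately left out because its formula is not described. The function name is illustrative.

```python
SIX_MONTHS_WEEKS = 26  # assumption: six months approximated as 26 weeks
BASE_BOOST = 1.0       # assumption: boost with zero bond
LINEAR_CAP = 1.5       # boost reached at six months of expected earnings bonded


def linear_boost(account_bond: float, weekly_expected_earnings: float) -> float:
    """Account-wide boost from total bond relative to expected earnings, capped at 1.5x."""
    target = SIX_MONTHS_WEEKS * weekly_expected_earnings
    ratio = min(account_bond / target, 1.0) if target > 0 else 0.0
    return BASE_BOOST + (LINEAR_CAP - BASE_BOOST) * ratio
```

Because the input is the account's total bond, bonds held against several nodes are simply summed before the boost is computed, mirroring how a veCRV balance applies across all of an account's positions.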

On The Verification Debate

Among the live decentralized inference projects, Dolphin's verification design is the most complete that has been published. The live-weight-proof approach is genuinely differentiated: it verifies the loaded model rather than the emitted output, which makes it cheap enough to run on every response, privacy-preserving because it never needs the prompt, and extensible to image and audio where token-based methods cannot reach. This is the part of the project where the engineering most clearly leads, and it is not the kind of work that is quick to replicate.

The token economy

POD's token design is unusually aggressive on value capture and unusually dependent on emissions. Both halves of that sentence matter.

How value flows

There is one valuable asset in the Dolphin ecosystem: the token. The team operates as a DAO with no equity structure, and has stated that if they ever create one to accept fiat payments, it would be a non-profit structured like Morpho's, explicitly so that the token remains the only thing that captures value. The mechanism is blunt: 100% of network revenue is automatically used to buy back POD on the open market.

The flow works in two decoupled halves. On the demand side, API users buy credit from the protocol by depositing stablecoins or other accepted assets, and their requests are routed to whichever node in the relevant model pool is available. On the supply side, nodes are paid in POD from a node-rewards pool based on the inference work they process. The two sides are deliberately decoupled, which means the protocol can pay out more or less POD to nodes than it takes in as revenue.

This decoupling is the engine of the whole design. Early on, the network can run at a deliberate loss, emitting more POD to operators than revenue supports, in order to bootstrap supply and offer cheap or free inference while it scales. As nodes approach higher utilisation, the design targets the crossover where revenue outpaces emissions, buybacks exceed the new POD paid to operators, and the token tips into net deflation. The mechanism is built so that growth in real usage translates directly and automatically into buy pressure on POD, with no governance vote or discretionary buyback schedule in between. Every paid inference request becomes a market bid for POD.
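The crossover can be written as a one-line balance per period. The sketch assumes a single POD price for the whole period, ignores slippage, and treats every emitted POD as potential market supply; those simplifications, and the function names, are ours rather than the team's.

```python
def net_market_flow(emitted_pod: float, revenue_usd: float, pod_price_usd: float) -> float:
    """POD emitted to operators minus POD bought back with revenue in one period.

    A negative result means buybacks absorbed more POD than operators received.
    """
    bought_back_pod = revenue_usd / pod_price_usd
    return emitted_pod - bought_back_pod


def crossover_revenue_usd(emitted_pod: float, pod_price_usd: float) -> float:
    """Revenue at which buybacks exactly offset emissions at the given price."""
    return emitted_pod * pod_price_usd
```

One consequence worth noticing: for a fixed POD emission, a lower token price means the same dollar revenue buys back more tokens, so the crossover arrives at a lower revenue level.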

The pricing math

The team has published concrete numbers for its planned OpenRouter listing of a Qwen 35B-class model. They are worth laying out because they show both the margin and the dependence on volume.

Line item                           | Per 1M tokens
Cheapest comparable on OpenRouter   | $1.00
Dolphin list price                  | $0.70
Paid to node operator               | $0.50
Network spread → POD buyback        | $0.20

Undercutting the cheapest centralised provider by 30% while still extracting $0.20 of buyback pressure per million tokens is a real edge, and it rests on the structural cost advantage of consumer GPUs. Because Nvidia segments the market (data centers cannot buy consumer cards in bulk, and consumer cards are hard to integrate into multi-GPU servers), consumer hardware stays cheap for gamers while enterprise accelerators carry steep margins. Dolphin's benchmark puts a 4090 at roughly 10 times the performance-per-dollar of an H100 for small-model LLM inference, before accounting for power. The margin is real. The question is purely whether the volume materialises.
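The per-token arithmetic in the table is simple enough to check directly. The sketch keeps prices in integer cents per million tokens to avoid floating-point rounding; the function names are illustrative. It also makes the volume dependence explicit: at the published spread, a billion paid tokens funds $200 of buybacks.

```python
LIST_PRICE_CENTS = 70            # Dolphin list price per 1M tokens
NODE_PAYOUT_CENTS = 50           # paid to the serving node per 1M tokens
CHEAPEST_COMPARABLE_CENTS = 100  # cheapest comparable OpenRouter price per 1M tokens


def spread_cents() -> int:
    """Network spread per 1M tokens, all of which goes to POD buybacks."""
    return LIST_PRICE_CENTS - NODE_PAYOUT_CENTS


def buyback_usd(million_tokens: int) -> float:
    """Dollars of buyback generated by a given paid volume (in millions of tokens)."""
    return million_tokens * spread_cents() / 100


def undercut_pct() -> float:
    """How far the list price sits below the cheapest comparable provider."""
    return 100 * (CHEAPEST_COMPARABLE_CENTS - LIST_PRICE_CENTS) / CHEAPEST_COMPARABLE_CENTS
```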

There is reason to think it can, because inference is a liquid, easy-to-sell commodity in a way most decentralized-network output is not. OpenRouter alone routes on the order of a billion dollars a year in inference demand, and a buyer routing through an aggregator does not particularly care whose GPUs served the request, only that the model is good and the price is low. This is why Dolphin's primary commercial target is less the uncensored-model niche and more the OpenRouter and broader inference market for the best small open-source coding models, specifically Qwen 3.6 35B and 27B. The uncensored models Dolphin builds for Venice are served by Venice directly; the open coding models are where the volume sits.

The quality-to-price gap on those models is the entire demand-side argument, and it is stark. By Qwen's published benchmarks, Qwen 3.6 27B reaches roughly 98.7% of Claude Opus 3.5 and 82-88% of the far newer Opus 4.7 across coding and agentic tasks. On price, Opus runs about $25 per million output tokens; first-party Qwen 27B is around $1; and Dolphin serving that same model at a discount targets roughly $0.70. That is a model delivering 82-88% of frontier-tier intelligence at something like 97% below the frontier price. Good-enough quality at that kind of discount, sold into an already-liquid marketplace, is a straightforward thing to move. That is the demand-side bet in one sentence.
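Using the per-million-token prices quoted above, and treating them as directly comparable the way the paragraph does (the Opus figure is an output-token price), the discounts work out as:

```latex
\begin{aligned}
\text{vs. Opus:} \quad & 1 - \frac{0.70}{25} = 1 - 0.028 = 0.972 \approx 97\% \text{ below} \\
\text{vs. first-party Qwen 27B:} \quad & 1 - \frac{0.70}{1.00} = 0.30 = 30\% \text{ below}
\end{aligned}
```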

How it compares to the incumbent

The natural comparison in decentralized compute is Akash, the category's most established name and a useful benchmark for what "working DePIN" looks like today. Akash is a real, live network with genuine traction: it processes meaningful compute spend, recently launched a Burn-Mint Equilibrium model that burns AKT to mint a stablecoin credit for settlement, and through AkashML is reportedly serving on the order of a billion-plus tokens per day on OpenRouter. It even added a consumer-GPU beta to reach the same hardware pool Dolphin targets. It is not a weak comparison. It is the strongest one available, which is exactly why the contrast is instructive.

Dimension | Dolphin (POD) | Akash (AKT)
Approx. market cap | ~$15M | ~$210M+
What it sells | Verified inference (model output) | Raw compute rental, plus AkashML inference
Demand origin | Model distribution + revenue via Venice (3M users) | Supply-side marketplace, demand sourced externally
Supply model | Peer-to-pool, idle gamer GPUs, no uptime commitment | Session rental (consumer beta added later)
Verification | Live-weight proofs + 4 more layers; covers multimodal | No comparable inference-integrity primitive
Value capture | 100% of revenue buys POD on open market | BME burns AKT against compute spend

The differences that matter are structural, not about price. Akash sells raw compute and has to source demand for it externally; Dolphin's distribution runs through Venice, which hosts the uncensored models Dolphin builds and provides both revenue and model-layer exposure, alongside the five million monthly downloads of its open models. Akash has no inference-integrity primitive of the kind Dolphin built, and verification is the exact thing that determines whether decentralized inference can be trusted enough to charge for at all. Both now route value to the token through usage, but Dolphin's 100%-of-revenue buyback is a more direct mechanism than burning against a settlement stablecoin. And the peer-to-pool design reaches the idle-gamer-GPU supply that session-based rental structurally cannot.

There is also a revealing pricing detail. AkashML's inference is reportedly priced higher than Alibaba's own first-party Qwen API, which is an odd place to be: a decentralized network is supposed to undercut the centralized incumbent, not charge more than the model's own creator. It suggests the inference is not being optimised aggressively, and it points to how much room a leaner consumer-GPU network has to compete on price. Dolphin's whole pricing thesis depends on being the cheapest credible source of a given model, and the incumbent leaving that gap open is exactly the opening it is built to exploit.

The point is not that Akash is a weak project. It is a strong one, and its traction is part of what validates the category. The point is that the category's most established name expresses a less vertically-integrated version of the inference thesis, without the model-layer distribution and without the verification layer, which is what makes Dolphin's particular combination of pieces stand out.

The staking and bonding mechanics

POD borrows well-tested DeFi primitives. Holders can stake into xPOD, an auto-compounding vault in the style of xSUSHI: the xPOD balance stays static while its redemption ratio back to POD rises as buyback rewards flow in, which is tax-efficient because no claims are made. Staking carries a 3-month cooldown plus a short withdrawal window. The reward-boost system is modelled on Curve's LP boost, and crucially borrows Convex's vlCVX "constant power" mechanic, so stakers keep full rewards without needing to perpetually re-lock.
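The xSUSHI-style mechanic is a share-based vault. A minimal sketch, assuming buyback rewards are simply added to the vault's POD balance; the class and method names are hypothetical, and the 3-month cooldown, withdrawal window, and boost logic are not modelled. What it shows is that a staker's xPOD balance never changes while the amount of POD it redeems for grows.

```python
from dataclasses import dataclass, field


@dataclass
class XPodVault:
    """xSUSHI-style vault: share balances stay fixed while POD per share rises."""

    total_pod: float = 0.0
    shares: dict[str, float] = field(default_factory=dict)

    @property
    def total_shares(self) -> float:
        return sum(self.shares.values())

    def ratio(self) -> float:
        """POD redeemable per xPOD share (1.0 before anyone has deposited)."""
        return self.total_pod / self.total_shares if self.total_shares else 1.0

    def deposit(self, who: str, pod: float) -> float:
        """Mint shares at the current ratio so existing stakers are not diluted."""
        minted = pod / self.ratio()
        self.shares[who] = self.shares.get(who, 0.0) + minted
        self.total_pod += pod
        return minted

    def add_buyback_rewards(self, pod: float) -> None:
        # Rewards land in the vault without minting shares, so every share appreciates.
        self.total_pod += pod

    def redeemable(self, who: str) -> float:
        return self.shares.get(who, 0.0) * self.ratio()
```

Because rewards accrue through the exchange rate rather than as separate payouts, there is nothing for a staker to claim, which is the property the paragraph above describes.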

Holding xPOD is designed to carry utility beyond price exposure. Stakers receive a share of network revenue as an auto-compounding POD balance, gated access to product subscription tiers without paying, and a daily free inference allowance across the network's models. The allowance is delegable: xPOD includes a delegation function for both governance and daily inference allocations, which opens the door to a vlCVX-style vote-market where parties bid to receive delegated inference allocations from stakers who do not use theirs, turning unused allowance into extra yield. Staked xPOD can in turn be deposited into the bonding contract to mint bxPOD, a non-transferable, slashable version used as node-operator collateral.

The on-chain picture

POD migrated from its old DPHN contract on March 2, 2026, so the current token has roughly eleven weeks of trading history. Several things are visible directly on Base.

The staking contract holds a large share of supply, and recent activity has skewed toward accumulation rather than exit. Over a representative recent eight-day window, deposits into the staking vault outnumbered cooldown activations by roughly two and a half to one, with the activity spread across many named wallets rather than concentrated in a few. The supply structure outside protocol allocations is not whale-dominated: the largest individual holder controls under 1% of supply, and the top ten individuals combined hold under 4%. The largest accumulating wallets across the token's life have been methodical buyers that have held rather than flipped.

The supply-side network metrics have grown quickly. Cumulative tokens processed through the data-generation workload (the network's pre-revenue stress test) have climbed into the tens of billions, with active concurrent GPU counts ramping several-fold over a matter of weeks, and aggregate vRAM across the network reaching into the tens of terabytes. This is real capacity coming online, even though none of it is yet serving paid demand.

What The On-Chain Data Cannot Tell You Yet

All of the network throughput to date is synthetic data generation, not paid inference. Revenue is zero by design until the API opens. The buyback engine, which is the entire value-capture thesis, has therefore not yet processed meaningful volume. The supply side is demonstrably real. The demand side is still a promise with a date attached.

The risks, stated plainly

POD ran sharply during the recent Venice-driven move and has since corrected. The bullish case is genuinely strong, but it is a case about what happens if the team executes, so the honest version has to name what could get in the way.

Revenue has not switched on yet

The single most important caveat. The network is live on the supply side but has not yet opened its public API, which the team signals is expected soon. Until it does, 100% of revenue is still 100% of zero. The buyback flywheel that justifies the token's value-capture framing starts spinning with real volume only once the API ships and developers route paid inference through it. The OpenRouter listing is the catalyst that converts internal stress-testing into external revenue. The capacity is demonstrably there; the conversion to paid demand is the step that remains.

The emissions-to-revenue crossover

Operators are paid in POD, and during the bootstrap phase the protocol intentionally emits more than revenue supports in order to grow supply and offer cheap inference. The design is built to flip net-deflationary once utilisation drives revenue past emissions, and the buyback mechanism is automatic rather than discretionary, so the flip happens mechanically as usage scales. The thing to watch is the slope toward that crossover. The faster real demand ramps after the API opens, the sooner the buy pressure from buybacks outweighs the sell pressure from operators recouping costs. This is the central variable in the bull case, and it is observable in real time on-chain.

Unstaking dynamics after the run

The staking vault uses a multi-month cooldown, so anyone deciding to exit does not hit the market immediately, and the exit queue is a useful forward indicator to watch rather than an alarm. Recent on-chain activity has actually skewed toward staking inflows rather than exits, which is the constructive read. The team has discussed adjusting cooldown lengths, and bonded operators have a fee-dampened path to liquid POD, so these are mechanics worth monitoring as the post-run picture develops. None of it is acute on the data available today.

It is an execution bet, not a cash-flow bet

At its current fully-diluted valuation against revenue that has not started, POD is priced on the strength of its product distribution and the credibility of its roadmap rather than on trailing cash flows. The tight float amplifies moves in both directions: it is part of why the token ran so hard on the Venice narrative, and thin liquidity can cut the other way just as fast. The valuation embeds an expectation that the API launch and the buyback flywheel deliver. The case for it being underpriced is precisely a case that they will. If the timeline slips, the market waits, and the float that powered the move works in reverse.

Near-Term Catalysts

Without putting dates on things the team has not committed to publicly, the catalyst path over the coming stretch is reasonably clear. The events worth watching:

Each of these is a discrete, observable event rather than a vague milestone, and several plausibly land in the same window.

What would confirm the thesis

The conditions that would convert POD from a credible product story into a working revenue-and-buyback machine are specific and observable:

  1. The public API ships. This is the gating event. Everything downstream depends on it, and the team signals it is expected soon.
  2. The OpenRouter listing goes live with the $0.70 price holding. Listing on the most liquid inference marketplace, at a price that undercuts centralised providers while preserving the $0.20 buyback spread, is the moment internal capacity becomes external revenue.
  3. The first automated buybacks execute on-chain with visible volume. The buyback is the entire value-capture thesis. The first meaningful on-chain execution is the proof it works as designed.
  4. Revenue approaches the emissions crossover. The design target is revenue greater than emissions at high utilisation. Visible progress toward that crossover is what turns POD from net inflationary to net deflationary.
  5. Multimodal inference launches with verification intact. Image and audio are where the cost advantage and the margins are largest. Shipping verified multimodal inference is what makes the highest-revenue modalities addressable.

What this is, structurally

Dolphin is a product-first project trading at a network-first discount. The product is genuine: real uncensored models, a real anchor customer in Venice, five million monthly model downloads, and verification engineering that leads the category. That foundation is what separates it from the long list of AI-DePIN networks that incentivise idle GPUs and then wait for demand that never arrives. Dolphin already has the demand. The task in front of it is conversion.

The bull case rests on a few durable facts. The token captures 100% of network revenue through automatic buybacks. The float is tight. The consumer-GPU cost advantage is structural, rooted in Nvidia's own market segmentation rather than a temporary subsidy. The highest-margin modalities, audio and image, are the ones Dolphin can verify and most competitors cannot. And the market cap sits at a fraction of the category's incumbent, which has weaker distribution and no comparable anchor customer. The honest counterweight is timing: revenue has not switched on, the emissions-to-revenue crossover still has to happen, and the same tight float that can lift the token can cut against it. These are the risks of being early to something real, not the risks of a project still hunting for product-market fit.

The thing to watch is not the chart. It is the V2 worker release, the API opening, the first buyback executions, and the slope of paid volume after launch, all visible on-chain. If those land the way the team has telegraphed, the distance between what Dolphin is and where POD trades is the opportunity. If they slip, the market simply waits. Either way, the next few weeks resolve most of the question.