On June 6th, 2025, Inference.net launched Devnet Epoch 3, introducing significant protocol changes designed to improve network scalability, operator experience, and economic alignment. This post details the technical improvements, new staking mechanisms, and operational changes that will define this next phase of Inference.net.
The Inference.net Staking Protocol is being tested on Solana Devnet with test tokens. These tokens have no monetary value and should not be used for real-world transactions or bought or sold by anyone.

Setting Up Your Solana Wallet for Devnet

All of these test features are deployed on Solana Devnet, so you’ll need to configure your wallet to use Solana Devnet.
This step is very important. If not completed correctly you will experience issues trying to submit transactions to our programs.
If you are using a different wallet, or a mobile wallet, please refer to the wallet’s documentation for instructions on how to switch to Solana Devnet. It’s recommended to only use one of the above wallets via a desktop browser.

Preparing for Epoch 3

  1. Verify your hardware meets the minimum requirements.
  2. Review the new quickstart documentation to deploy your Epoch 3 node.
  3. Link your Solana wallet on the dashboard to your Inference.net account before June 13th.
  4. Join our Discord to monitor the Epoch 3 rollout and get support.

Epoch 3 Feature Timeline

Epoch 3 will be rolled out in phases, with each phase bringing new features and improvements to the network. Please follow Discord for updates.

Core Protocol Improvements

Automatic Node Updates

Based on overwhelming operator feedback from previous epochs, we’ve implemented a comprehensive auto-update system for the Inference.net node software. This system works across all deployment types (CLI, Docker, and Desktop applications) and handles minor version updates without operator intervention. The auto-update mechanism performs health checks before and after updates, ensuring nodes remain operational throughout the process. If an update fails, nodes automatically roll back to the previous stable version and report the issue to our monitoring systems. This significantly reduces the operational burden on GPU operators, who previously needed to update nodes manually during each release cycle.

Unified Inference Engine Management

Previous versions of Inference.net required operators to select and deploy specific Docker containers based on their intended inference engine (SGLang or vLLM). This created complexity for operators who needed to understand the technical differences between engines and make deployment decisions based on their hardware capabilities. In Epoch 3, we’ve consolidated these into a single container that automatically detects hardware specifications and selects the optimal inference engine. The selection algorithm considers:
  • GPU architecture and compute capability
  • Available VRAM
  • CPU specifications
  • System memory
This abstraction layer reduces setup complexity while ensuring optimal performance for each hardware configuration.

Enhanced GPU Detection and Validation

Starting June 6th, the network will enforce strict GPU detection requirements. Only GPUs that can be properly identified by our detection system will be permitted to join the network. This change from Epoch 2’s permissive approach ensures:
  • Accurate job routing based on hardware capabilities
  • Proper performance benchmarking
  • Prevention of misrepresented hardware specifications
Common detection failures typically stem from outdated drivers or incorrect permissions. Operators experiencing detection issues should ensure they’re running the latest GPU drivers and that their Inference.net node has appropriate system permissions. If you’re having trouble identifying your GPU, please open a ticket on our Discord.

Stake-Weighted Job Routing

The cornerstone of Epoch 3 is our new stake-weighted routing system, which fundamentally changes how inference jobs are distributed across the network. This system creates economic incentives for reliable operation while ensuring efficient resource utilization.

Priority Score Calculation

Each instance operating on the network receives a priority score that determines its probability of receiving inference jobs. The priority score is calculated as:
Priority Score = 1 + k × ((Device VRAM / Total Operator VRAM) × Total INT Stake × Reputation Weight)
Where:
  • Device VRAM / Total Operator VRAM: Normalizes stake across operators with different numbers of machines
  • Total INT Stake: The sum of operator-owned and delegated tokens in their pool
  • Reputation Weight: A multiplier between 0 and 1 based on performance metrics
  • k: A network parameter that adjusts based on utilization
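
As a minimal sketch, the formula above can be written as a small Python function. The function name, argument names, and the example value of k are illustrative assumptions, not part of the Inference.net node software or its API.

```python
def priority_score(device_vram_gb, operator_total_vram_gb,
                   total_int_stake, reputation_weight, k):
    """Priority Score = 1 + k * (VRAM share * total stake * reputation weight).

    All names here are hypothetical; only the formula comes from the post.
    """
    if operator_total_vram_gb <= 0:
        raise ValueError("operator_total_vram_gb must be positive")
    if not 0.0 <= reputation_weight <= 1.0:
        raise ValueError("reputation_weight must be between 0 and 1")
    if k < 0:
        raise ValueError("k must be non-negative")

    # Share of the operator's total VRAM that this device contributes.
    vram_share = device_vram_gb / operator_total_vram_gb
    return 1.0 + k * (vram_share * total_int_stake * reputation_weight)


# One 24 GB device out of 96 GB, 100,000 INT in the pool, reputation 0.9.
# k = 0.0001 is an arbitrary value chosen for illustration.
score = priority_score(24, 96, 100_000, 0.9, k=0.0001)
baseline = priority_score(24, 96, 100_000, 0.9, k=0.0)  # always 1.0 when k = 0
```

With k = 0 every instance scores exactly 1 regardless of stake, which matches the round-robin behaviour described in the next section.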

Understanding the k Parameter

The parameter k dynamically adjusts to optimize network efficiency:
  • When k = 0: Routing becomes round-robin, giving equal probability to all instances
  • When k is large: Routing heavily favors staked operators
  • During low network utilization: k increases to reward staked operators
  • During high network utilization: k decreases to leverage all available capacity
For operators, this means that during periods of low demand, having stake becomes increasingly important for receiving jobs. During peak demand, even operators with minimal stake can contribute and earn rewards.
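
The effect of k can be illustrated with a short sketch. It assumes that an instance's probability of receiving a job is proportional to its priority score; the post says the score "determines" that probability but does not give the exact mapping, so the proportional rule and all names below are assumptions.

```python
import random


def routing_probabilities(stake_terms, k):
    """Map each instance to its share of total priority score.

    stake_terms holds the bracketed part of the formula for each instance
    (VRAM share * total stake * reputation weight). Proportional selection
    is an assumption made for this sketch.
    """
    scores = {name: 1.0 + k * term for name, term in stake_terms.items()}
    total = sum(scores.values())
    return {name: score / total for name, score in scores.items()}


def pick_instance(stake_terms, k, rng=random):
    """Draw one instance at random, weighted by priority score."""
    probs = routing_probabilities(stake_terms, k)
    names = list(probs)
    return rng.choices(names, weights=[probs[n] for n in names], k=1)[0]


terms = {"staked": 25_000.0, "unstaked": 0.0}
equal = routing_probabilities(terms, k=0.0)      # both instances get 0.5
skewed = routing_probabilities(terms, k=0.001)   # "staked" gets about 26/27
```

Under this assumption, an unstaked instance always keeps a non-zero chance of receiving jobs because of the constant 1 in the score, and that chance grows as k falls during high utilization.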

VRAM Normalization

Since operators manage varying numbers of GPUs with different VRAM capacities, the routing system normalizes stake based on VRAM. For example:
  • Operator A: 100,000 INT staked, running 4x RTX 4090s (96GB total VRAM)
  • Operator B: 100,000 INT staked, running 1x A100 (80GB VRAM)
Despite equal stake, Operator A’s individual 4090s each receive: 100,000 × (24/96) = 25,000 effective stake per device, while Operator B’s A100 receives the full 100,000 stake allocation.
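
The arithmetic in this example can be reproduced with a few lines of Python. The helper name is hypothetical; the calculation simply applies the Device VRAM / Total Operator VRAM ratio to the operator's total stake.

```python
def effective_stake_per_device(total_stake, device_vram_gb):
    """Split an operator's total stake across devices by VRAM share."""
    total_vram = sum(device_vram_gb)
    if total_vram <= 0:
        raise ValueError("operator must have at least one device with VRAM")
    return [total_stake * vram / total_vram for vram in device_vram_gb]


operator_a = effective_stake_per_device(100_000, [24, 24, 24, 24])  # 4x RTX 4090
operator_b = effective_stake_per_device(100_000, [80])              # 1x A100

print(operator_a)  # [25000.0, 25000.0, 25000.0, 25000.0]
print(operator_b)  # [100000.0]
```

The per-device values always sum back to the operator's total stake, so adding machines spreads stake across devices rather than multiplying it.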

Reputation Scoring System

We employ a reputation system to evaluate GPU operator quality and determine job routing and rewards. Reputation combines three components: Verification, Uptime, and Reliability. Together, these reflect integrity, availability, and request completion performance and allow pool delegators to make more informed delegation decisions.

Verification Score

Determines operator honesty and is used to determine job routing.
  • We use a proprietary inference verification system to ensure that operators are running inference jobs honestly.
  • This system runs pass/fail verifications on all processed requests.
  • A score is computed based on a rolling window of verifications and operators are promoted/demoted from our trusted lane based on a verification failure threshold.

Verification Guidelines

New workers are placed in the evaluation lane and receive test traffic to build trust. This traffic does not earn rewards/points. Once a worker has passed a specific number of verifications, they are promoted to the trusted lane and begin receiving verified inference traffic. If a trusted worker begins failing too many verifications, they can be moved back to the evaluation lane until performance improves.

Verification Thresholds

Note that these thresholds may be adjusted over time. The following is an example:
  • Promotion: Untrusted operators with ≤5 failures in their last 500 verifications are promoted to the trusted lane.
  • Demotion: Trusted operators with ≥10 failures in their last 500 verifications are demoted to the evaluation lane.
Operators that consistently fail verifications may be halted and slashed, according to the network security guidelines.
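
The example thresholds above can be sketched as a rolling-window tracker. This is an illustration only: the verification system itself is proprietary, the class and constant names are hypothetical, and requiring a full window of 500 results before promotion is an assumption the post does not confirm.

```python
from collections import deque

WINDOW = 500               # verifications kept in the rolling window
PROMOTE_MAX_FAILURES = 5   # example threshold from the post
DEMOTE_MIN_FAILURES = 10   # example threshold from the post


class LaneTracker:
    """Track one operator's lane from pass/fail verification results."""

    def __init__(self):
        self.lane = "evaluation"              # new workers start here
        self.results = deque(maxlen=WINDOW)   # True = pass, False = fail

    def record(self, passed):
        """Store one verification result and return the resulting lane."""
        self.results.append(passed)
        failures = sum(1 for ok in self.results if not ok)
        if self.lane == "evaluation":
            # Assumption: promotion waits until the window is full.
            if len(self.results) == WINDOW and failures <= PROMOTE_MAX_FAILURES:
                self.lane = "trusted"
        elif failures >= DEMOTE_MIN_FAILURES:
            self.lane = "evaluation"
        return self.lane
```

Because the promotion and demotion thresholds differ (5 versus 10 failures), an operator hovering between them stays in its current lane instead of flipping back and forth on every result.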

Reliability Score

Captures how consistently requests complete successfully and on time.
  • Inference request completion rates are measured over a fast and slow window and combined into a single score.
  • The score calculation uses a weighted average of the fast and slow window scores, e.g. 0.5 for fast (1 hour) and 0.5 for slow (72 hours).
  • Scores are recomputed for all operators hourly.
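
As a rough sketch, the weighted average described above might look like the following. The function name and the sample completion rates are assumptions; only the 0.5/0.5 example weights and the fast (1 hour) and slow (72 hours) windows come from the text.

```python
def reliability_score(fast_rate, slow_rate, fast_weight=0.5, slow_weight=0.5):
    """Combine fast-window and slow-window completion rates into one score.

    Rates are fractions between 0 and 1. Dividing by the weight sum keeps
    the result in that range even if the weights do not add up to 1.
    """
    for rate in (fast_rate, slow_rate):
        if not 0.0 <= rate <= 1.0:
            raise ValueError("completion rates must be between 0 and 1")
    weight_sum = fast_weight + slow_weight
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive number")
    return (fast_weight * fast_rate + slow_weight * slow_rate) / weight_sum


# A recent dip (90% over the last hour) against a strong 72-hour record (98%).
score = reliability_score(fast_rate=0.90, slow_rate=0.98)  # approximately 0.94
```

Using two windows lets a recent outage show up quickly in the score, while a long record of good performance keeps one bad hour from dominating it.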

Uptime Score

Measures how consistently instances are online and responsive.
  • The network periodically checks whether instances are online by sending a health check inference request to the instance.
  • If the instance completes the request, it is considered online for the interval.
  • If the instance does not complete the request, it is considered offline for the interval.
  • To prevent gaming, checks begin only after an instance has been running for a short buffer period following startup.
Uptime scores are incorporated into network reward emission calculations, which allows us to incentivize specific types of hardware to join the network. Rewards are normalized across all operators by the number of points earned. More hardware means more points, and more points means more rewards.
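
The health-check procedure in this section can be sketched as follows. The post does not state how interval results are turned into a score, so treating uptime as the fraction of counted intervals that were online is an assumption, as are the function name and the 60-second buffer used in the example.

```python
def uptime_score(checks, started_at, buffer_seconds):
    """Fraction of counted health checks that the instance completed.

    checks is a list of (timestamp_seconds, completed) pairs. Checks that
    fall inside the startup buffer are ignored. Returns None when no
    checks have been counted yet.
    """
    counted = [completed for ts, completed in checks
               if ts >= started_at + buffer_seconds]
    if not counted:
        return None
    return sum(counted) / len(counted)


checks = [(10, False), (70, True), (130, True), (190, False)]
# The check at t=10 falls inside the 60-second buffer and is not counted,
# leaving two online intervals out of three.
score = uptime_score(checks, started_at=0, buffer_seconds=60)
```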

Solana-Based Staking Protocol

Technical Implementation

The Inference.net staking protocol is implemented as a Solana program using the SPL token standard for $INT-DEV tokens. The protocol manages:
  • Operator pool creation and configuration
  • Stake delegation and undelegation
  • Commission rate management
  • Reward distribution and revenue payout
  • Slashing mechanisms (to be enabled in later phases)

Operator Pools

Each operator can create a staking pool with the following configurable parameters:
  • Reward Commission Rate: Percentage of epoch rewards retained by the operator (0-100%)
  • USDC Commission Rate: Percentage of USDC revenue retained by the operator (0-100%)
  • Delegation Status: Whether to accept external delegations
  • Minimum Self-Stake: Operators must maintain a globally-set minimum stake amount
Operators must maintain the minimum stake amount to ensure alignment between pool performance and operator incentives. Pool creation requires a one-time registration fee and each wallet can create at most one operator pool.
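
For illustration, the pool parameters listed above could be modeled and checked like this. This is not the on-chain account layout: the field names, the validation messages, and the minimum stake value in the example are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatorPoolConfig:
    """Hypothetical, off-chain model of an operator pool's settings."""
    reward_commission_pct: float   # share of epoch rewards kept by the operator
    usdc_commission_pct: float     # share of USDC revenue kept by the operator
    accepts_delegations: bool      # whether external delegations are accepted
    self_stake: float              # operator-owned stake in the pool

    def validate(self, global_min_self_stake):
        """Return a list of problems; an empty list means the config is valid."""
        errors = []
        for name in ("reward_commission_pct", "usdc_commission_pct"):
            value = getattr(self, name)
            if not 0 <= value <= 100:
                errors.append(f"{name} must be between 0 and 100, got {value}")
        if self.self_stake < global_min_self_stake:
            errors.append(
                f"self_stake {self.self_stake} is below the global minimum "
                f"{global_min_self_stake}"
            )
        return errors


config = OperatorPoolConfig(10, 25, True, self_stake=5_000)
problems = config.validate(global_min_self_stake=10_000)  # made-up minimum
```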

Delegation Mechanics

Token holders can delegate $INT-DEV to operator pools to earn a share of rewards without running hardware. The delegation process operates as follows:
  1. Tokens can be delegated immediately without cooldown
  2. Delegators earn rewards proportional to their share of the pool
  3. Rewards are calculated after operator commission
  4. Undelegating requires a cooldown period before tokens and rewards can be withdrawn
  5. No rewards accrue during the cooldown period
This design encourages stable, long-term delegation relationships while preventing rapid stake movements that could destabilize routing.
Token delegators are not exposed to slashing risk, even if a slashing event occurs.
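
A minimal sketch of the reward split described in this section is shown below. It assumes the operator's own stake earns a pool share alongside delegated stake and that stake in cooldown is left out of the calculation. The function name and all figures are illustrative, and the on-chain program may compute this differently.

```python
def split_pool_rewards(epoch_rewards, commission_rate, active_stakes):
    """Split a pool's epoch rewards between operator commission and stakers.

    commission_rate is a fraction between 0 and 1. active_stakes maps each
    staker to their stake; stake in cooldown should be excluded because no
    rewards accrue during cooldown.
    """
    if not 0.0 <= commission_rate <= 1.0:
        raise ValueError("commission_rate must be between 0 and 1")
    total_stake = sum(active_stakes.values())
    if total_stake <= 0:
        raise ValueError("pool has no active stake")

    # Commission is taken first; the remainder is shared pro rata.
    commission = epoch_rewards * commission_rate
    distributable = epoch_rewards - commission
    payouts = {staker: distributable * stake / total_stake
               for staker, stake in active_stakes.items()}
    return commission, payouts


commission, payouts = split_pool_rewards(
    epoch_rewards=1_000,
    commission_rate=0.10,
    active_stakes={"operator": 50_000, "alice": 30_000, "bob": 20_000},
)
# Commission is about 100; the remaining ~900 is shared 50% / 30% / 20%.
```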

Dual Token System

During Epoch 3, operators will interact with two distinct reward mechanisms:

$INT Points (Off-chain)

  • Accumulated in real-time as jobs are processed
  • Calculated based on computational work performed
  • May be awarded for non-compute contributions (guides, community help)
  • Serves as the primary performance metric during the testing phase

$INT-DEV Tokens (On-chain - Solana Devnet only)

  • Distributed for testing purposes via airdrop and reward distributions
  • Based on stake-weighted job completion
  • Required for staking to receive job allocations
  • Used for pool creation registration fees

USDC Revenue Sharing

  • Operators earn USDC for processing inference jobs
  • Operators can share USDC revenue with delegators by setting a USDC commission rate
  • Delegators receive proportional USDC earnings based on their stake
  • USDC accrues on-chain and can be withdrawn without affecting stake

Long-term Implications

The architectural changes in Epoch 3 establish the foundation for Inference.net’s evolution from a points-based test network to a fully decentralized, economically sustainable inference protocol. The stake-weighted routing system creates a market mechanism for quality assurance, while the delegation system enables broader participation beyond hardware operators. As we progress through Epoch 3, we’ll gather data on:
  • Optimal k parameter values for different network conditions
  • Stake distribution patterns and delegation preferences
  • Performance improvements from unified engine management
  • Economic efficiency of the dual-token model
This data will inform the eventual transition to mainnet, where $INT tokens will replace the current devnet implementation.

Moving Forward

We encourage all operators to thoroughly review the new documentation, prepare their systems for the June 6th launch, and participate actively in testing these new features. Your feedback during this phase is crucial for refining these systems before our eventual mainnet deployment. For technical support, detailed documentation, and community discussions, please visit: