The State of EVM Indexing

A Guide on Data Infrastructure for Blockchain Apps

Editor’s Notes

Welcome to this comprehensive guide on the evolving landscape of blockchain indexing tools.

In this guide, we aim to give a clear, comprehensive overview of today’s indexing ecosystem, including strengths, tradeoffs, and real-world use of various tools. This report is a collaborative effort from Dune and developers who use these tools every day.

Special thanks to our contributors and dedicated reviewers:

Their insights ensure this guide reflects authentic user experiences.

The indexing space is evolving rapidly. Existing solutions are continually adding new features, and new solutions emerge every few months. While this guide attempts to reflect the space as of mid-2025, we encourage you to supplement it with your own research when selecting a solution.

We hope this resource empowers builders to navigate the indexing landscape with confidence.

—The Dune Team

I. Introduction

Why Index?

In a perfect world, app developers could plug directly into the blockchain and instantly retrieve the data their app needs—whether that’s the current balances of a wallet, the price of a token, or the latest activity from a protocol. Broadly speaking, these needs fall into three categories:

  1. Account data (e.g., balances, transfer history)
  2. Asset data (e.g., token prices, metadata)
  3. Protocol data (e.g., contract events, financial state)

While account and asset data can often be sourced from API providers, protocol data usually requires custom indexing.

Whether you’re running a node or using a provider, plugging into the blockchain typically means making remote procedure calls (RPCs). RPCs give you data access by letting you call the functions and pull the events that smart contract developers include in their protocols. But unless the smart contract developer included a function that precisely answers your question, you’ve got some work to do.

In fact, that’s usually the case. Blockchain state is designed to allow contract devs to securely write and manage the state required for their protocols. All of this costs gas, after all–shout out to the optimizoors. It’s certainly not optimized for the data access requirements of blockchain app builders.

At minimum, that work involves pulling logs across ranges of blocks and performing transformation and/or aggregation. Depending on the specific data point, it might include scanning millions of blocks, decoding event signatures, interpreting complex transaction traces, and performing intricate state reconstructions. Attempting to do this by reading directly from a node in realtime for every request would be so slow that you would abandon your app before it ever loaded. This performance gap is why every successful blockchain app, from Uniswap to Aave to OpenSea, relies on a critical piece of infrastructure: an indexer.
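
To make this concrete, here is a minimal TypeScript sketch of the "pull logs over a block range" step, using viem (one of several client libraries; the RPC endpoint, token address, and chunk size are placeholders, not recommendations):

```typescript
import { createPublicClient, http, parseAbiItem } from "viem";
import { mainnet } from "viem/chains";

// Placeholder RPC endpoint; in practice this is your node or provider URL.
const client = createPublicClient({
  chain: mainnet,
  transport: http("https://eth.example-rpc.com"),
});

const transferEvent = parseAbiItem(
  "event Transfer(address indexed from, address indexed to, uint256 value)"
);

// Scan a block range for ERC-20 Transfer logs from one token contract.
async function scanTransfers(token: `0x${string}`, fromBlock: bigint, toBlock: bigint) {
  const CHUNK = 2_000n; // most providers cap the span per eth_getLogs call
  const transfers: { from: string; to: string; value: bigint }[] = [];

  for (let start = fromBlock; start <= toBlock; start += CHUNK) {
    const end = start + CHUNK - 1n < toBlock ? start + CHUNK - 1n : toBlock;
    const logs = await client.getLogs({
      address: token,
      event: transferEvent,
      fromBlock: start,
      toBlock: end,
    });
    for (const log of logs) {
      transfers.push({ from: log.args.from!, to: log.args.to!, value: log.args.value! });
    }
  }
  return transfers;
}
```

Even this toy version has to chunk requests, decode logs, and accumulate results; an indexer does the same work continuously, at scale, and stores the output somewhere queryable.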

Here’s a standard design pattern we observe for building apps on blockchain data using RPC-based indexing:

Building an App with RPC-based Indexing

What are Indexers?

An indexer converts a blockchain’s raw, append-only logs into a structured, queryable database tailored to an application’s needs. It turns a 30-second (or 30-hour, in some cases) scan into a 30-millisecond API response by continuously tracking onchain activity, parsing data, and maintaining a realtime view of the relevant state.

The Graph pioneered event-driven indexing with subgraphs, setting the standard for how developers interact with blockchain data. Since then, the ecosystem has evolved into a diverse landscape of tools and philosophies. Some prioritize developer experience, while others focus on performance, flexibility, or verifiability. Each makes different tradeoffs in how they extract, transform, and serve onchain data.

Choosing the right indexer is a critical architectural decision. A poor fit can lead to costly migrations and mounting technical debt. Blockchains generate vast volumes of new data. While a raw Ethereum node may only grow by hundreds of gigabytes per year, derived and indexed datasets—especially those designed for apps—can balloon by terabytes annually. The more structured, queryable, and granular the view, the more data needs to be processed and stored.

If that weren’t hard enough, new primitives and patterns like account abstraction and intent-based protocols break traditional assumptions about how onchain activity can and should be indexed. As applications expand across chains, additional complexity compounds.

Navigating this requires more than comparing features. It demands understanding core architectural tradeoffs. How much indexing latency can you tolerate for reorg protection? Do you need access to raw traces or full contract state, or are logs enough? Are you optimizing for developer velocity, data integrity, or performance at scale? Do you need cross-chain data in a single database? Are hosted APIs and managed databases essential for your team, or do you prefer full control? Is vendor lock-in acceptable for faster time-to-market?

This report is a clear, practical guide to the modern indexing stack. Shaped by teams and data experts who run production-grade systems every day, it surfaces the real tradeoffs and lessons learned.

II. Indexing Fundamentals and Tradeoffs

Data Sources

Your choice of indexing solution depends heavily on what data you need. For some use cases, you can skip indexing entirely and use APIs from providers like CoinGecko, Sim API, or Zerion. These are out of scope for this report, but often a pragmatic choice. Often you only need logs emitted from a few known contracts, but some use cases require deeper access to traces, raw storage, or full contract state.

Suppose you're building an app that includes Uniswap trades. In V2 and V3, Swap events emit key details—amounts, sender, recipient, pool address—in a single, structured log. V4 introduced custom hooks, enabling arbitrary logic before, during, and after swaps. As a result, Swap events may no longer reflect the full asset flow. Indexing now requires trace access and logic to separate core behavior from hook effects.
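
For the V2/V3 case, the Swap log really is self-contained. A hedged sketch of decoding one such log with viem, using the Uniswap V2 pair Swap signature (the log itself would come from an eth_getLogs scan of a known pair contract):

```typescript
import { decodeEventLog, parseAbi } from "viem";

// Uniswap V2 pair Swap event; V3 pools emit a similar but differently shaped Swap.
const v2SwapAbi = parseAbi([
  "event Swap(address indexed sender, uint256 amount0In, uint256 amount1In, uint256 amount0Out, uint256 amount1Out, address indexed to)",
]);

// Decode a raw log (data + topics) into named swap fields.
function decodeV2Swap(log: {
  data: `0x${string}`;
  topics: [`0x${string}`, ...`0x${string}`[]];
}) {
  const { args } = decodeEventLog({ abi: v2SwapAbi, data: log.data, topics: log.topics });
  // Amounts and counterparties all live in this single log.
  return args; // { sender, amount0In, amount1In, amount0Out, amount1Out, to }
}
```

With V4 hooks, a single log like this no longer tells the whole story, which is exactly why trace access starts to matter.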

By UniswapX, the optimizoors have taken over, and even traces fall short. Much of the logic lives offchain in quote engines and signed intents, with relayers submitting final settlements. Indexing this means parsing calldata and reconstructing the execution path from outside the chain.

Realtime and Backfill Performance

Most apps need real-time data: sub-second updates triggered by new blocks. If you’ve looked at Dune.com and wondered whether you could just use that data in your app, you’re not alone. But Dune is optimized for flexible analytics, not realtime lookups.

Realtime performance is only half the story. Backfill speed is often the single biggest factor affecting developer velocity. If it takes two weeks to reindex, you’ll hesitate to make changes. If it takes two hours, you’ll iterate constantly. Fast backfills not only help you recover from outages, they enable tight feedback loops, faster feature launches, and more experimentation.

The ideal indexer keeps up with the latest block and can tear through years of historical data on demand.

Chain Support

As apps expand across chains, data volume and architectural complexity increase. Indexers must support multiple EVM chains (sometimes non-EVM ones) and offer a consistent way to query across them. For some use cases, indexing each chain in isolation is fine. For others you’ll want to unify crosschain data in the same system.

Data Transformation & Aggregation

Friends don’t let friends do math in Solidity. Whether you’re scaling a raw balance by its token’s decimals (dividing by 10^decimals) or aggregating thousands of swaps to compute P&L, your app needs transformation and aggregation.
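
The decimals case is the simplest example. A minimal sketch using viem’s formatUnits (the raw value and decimals shown are illustrative; in practice decimals comes from the token’s decimals() call and is usually cached):

```typescript
import { formatUnits } from "viem";

// Raw ERC-20 values are integers; divide by 10^decimals for a human-readable amount.
const rawBalance = 1234567890000000000n; // e.g. from balanceOf or an indexed Transfer
const decimals = 18;                     // from the token's decimals() call

const displayBalance = formatUnits(rawBalance, decimals); // "1.23456789"
```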

Many protocols split their logic across multiple contracts. For example, Uniswap V3 liquidity provisioning involves the NonfungiblePositionManager and individual pool contracts, each emitting different pieces of the puzzle. To reconstruct full context, your indexer needs to support cross-event joins and delayed emissions.

Where transformations happen varies: some pipelines handle them inline, others in ETL/ELT stages, others at query time. The right answer depends on your use case and your stack.

Query & API Layer

Your database schema and indexing strategy directly shape query performance. In high-throughput, append-heavy environments like blockchain, poorly indexed tables can bottleneck even simple queries.

Tools also vary in how they expose data: REST, GraphQL, SQL. Devs may use ORMs like Drizzle for type safety, or query builders like Kysely. Each offers different tradeoffs in control, flexibility, and developer familiarity. Some indexers let you define custom endpoints; others provide a standard schema out of the box. Regardless, your API layer must scale with usage, not collapse under it.
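
As an illustration of the query side, here is a hedged sketch using the Kysely query builder against a Postgres table of indexed transfers (the table, column names, and connection string are hypothetical; a similar shape works with Drizzle or raw SQL):

```typescript
import { Kysely, PostgresDialect } from "kysely";
import { Pool } from "pg";

// Hypothetical schema for an indexed ERC-20 transfers table.
interface TransfersTable {
  tx_hash: string;
  block_number: number;
  from_address: string;
  to_address: string;
  value: string; // stored as numeric/text to avoid JS precision loss
}
interface Database {
  transfers: TransfersTable;
}

const db = new Kysely<Database>({
  dialect: new PostgresDialect({
    pool: new Pool({ connectionString: process.env.DATABASE_URL }),
  }),
});

// Latest 20 inbound transfers for a wallet; assumes an index over (to_address, block_number).
export async function recentTransfers(wallet: string) {
  return db
    .selectFrom("transfers")
    .select(["tx_hash", "from_address", "value", "block_number"])
    .where("to_address", "=", wallet)
    .orderBy("block_number", "desc")
    .limit(20)
    .execute();
}
```

Whether a query like this returns in milliseconds or seconds depends far more on your schema and indexes than on the API flavor in front of it.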

Hosting, Control & Cost

There may be some hardcore teams running their own nodes, full pipelines, and custom infra—but most teams buy at least part of the stack:

  • Node/RPC access
  • Indexing engine
  • Hosted database
  • Hosted API service

Indexers differ in what they abstract. Some are fully managed. Others require you to manage infra, storage, and scaling. Your choice depends on your tolerance for devops, need for control, and internal expertise.

Cost is another critical factor. Self-hosted solutions incur expenses for compute, storage, RPC bandwidth, and engineering time. Managed indexers typically charge by query volume, compute time, data volume, or indexing hours. Free tiers help with prototyping, but costs can scale quickly. If you’re indexing many contracts or chains, pricing models can become a deciding factor. Understand what you're paying for, and how those costs evolve with usage.

Developer Experience

Developer experience shapes how quickly teams can ship, iterate, and debug. Key factors include:

  • The language used to express indexing logic (TypeScript, Solidity, YAML)
  • Local dev ergonomics (hot reloading, testability)
  • Observability (logs, metrics, error surfacing)

Some tools integrate with Git workflows, enabling fast deploys and preview environments. Others require more manual setup. The ability to test and iterate quickly, especially in early product stages, can be the difference between shipping in days vs weeks.

III. The Current Indexing Landscape: Solutions Deep Dive

As outlined in the previous section, the design of an indexing pipeline is a complex decision rooted in tradeoffs around execution environment, data sources, transformation needs, and developer ergonomics. With that foundation in place, we now turn to the actual tools in the market, each tailored to a different set of priorities and developer needs.

This section offers a deep dive into prominent indexing solutions available today. We explore what each tool is, how it approaches the challenges described earlier, and how it compares across the key dimensions defined in the previous section. The goal is to provide a grounded, unbiased look at the current indexing landscape, without prescribing a one-size-fits-all solution.

The Graph

Subgraphs are a widely adopted framework for indexing onchain data. They define how to track smart contract events and transform them into queryable GraphQL APIs. While originally created as part of The Graph protocol, subgraphs are now also supported by third-party platforms like Alchemy and Goldsky, and can be run in self-hosted environments as well.

The Graph refers to the broader decentralized indexing protocol and hosted service that pioneered subgraphs. Developers write mappings in AssemblyScript (a TypeScript-like language), define entity schemas in GraphQL, and describe data sources in a YAML manifest. The Graph runs these in a decentralized network of indexers or in a centralized hosted version via Subgraph Studio.

Historically, subgraphs were seen as inflexible for advanced joins or slow to support new chains. In recent years, The Graph has introduced tools like Substreams and Firehose, which provide higher-throughput data extraction and more efficient subgraph execution.

Compared to other solutions, The Graph abstracts away most of the infrastructure and is resilient to single points of failure. Its decentralized model, broad chain support, and mature tooling make it a popular default for many indexing needs.
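
On the consumption side, apps query a deployed subgraph’s GraphQL endpoint over HTTP. A minimal sketch with plain fetch (the endpoint URL and entity fields are hypothetical; they depend on the subgraph’s schema and where it is hosted):

```typescript
// Hypothetical Subgraph Studio endpoint; real URLs come from Studio or a gateway.
const SUBGRAPH_URL = "https://api.studio.thegraph.com/query/<id>/<name>/<version>";

const query = /* GraphQL */ `
  {
    swaps(first: 5, orderBy: timestamp, orderDirection: desc) {
      id
      amountUSD
      timestamp
    }
  }
`;

// POST the query and return the decoded entities.
async function latestSwaps() {
  const res = await fetch(SUBGRAPH_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query }),
  });
  const { data } = await res.json();
  return data.swaps; // shape follows the entities defined in schema.graphql
}
```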

Comparison Notes

  • Data Sources: Primarily based on event logs and state mappings via ABI decoding.  
  • Performance: Historically slow due to block-by-block processing. Additions like Firehose and Substreams have enabled faster queries and rapid backfills. Near-instant updates are possible on well-supported chains.
  • Chain Support: There are over 60 networks listed with Subgraph Studio support, and over 90 across the product suite. No built-in support for cross-chain joins within a single subgraph.
  • Data Transformation & Aggregation: Transformations are defined in AssemblyScript (similar to TypeScript). Aggregations can be implemented inside mappings, but complex operations may require offchain post-processing. 
  • Query & API Layer: GraphQL interface. Schema-driven and strongly typed. Limited support for custom endpoints beyond GraphQL schema.
  • Hosting, Control & Cost: Three options: Hosted (via The Graph's Subgraph Studio), Decentralized (via The Graph Network), Self-hosted (via open-source tooling). Subgraph Studio starts with 100K free monthly queries, then $2 per 100,000.
  • Developer Experience: Strong documentation, CLI tooling, and subgraph explorer. Requires learning Graph-specific schema definitions and AssemblyScript. Subgraph Studio and Graph CLI support previewing and deploying quickly.

“For me, I have an optimistic trust in the data from The Graph. I trust that they believe their social value is based on giving exact results. If they lose social trust, then their product will not be used. I believe what they are reporting is true until somebody is incentivized to prove that it is wrong.”

– User of The Graph, DevRel at an oracle protocol.

Ponder

Ponder is a self-hosted TypeScript indexing framework that gives developers full control over how blockchain data is ingested, transformed, and stored. It uses event-driven logic combined with tools like viem to perform onchain reads and trigger custom transformations during indexing. Ponder is especially well suited to developers who want to tightly integrate their indexing logic with the rest of their app stack.

Unlike managed platforms, Ponder imposes no infrastructure constraints and no vendor lock-in. Teams can define their schema, manage their own database, and fine-tune performance across the pipeline. It’s a strong fit for complex, performance-sensitive applications—especially DeFi protocols that require custom calculations or use nonstandard smart contract architectures.

Ponder is also popular among teams building real-time dashboards and analytics apps, where latency and iteration speed matter. Local development features like hot reloading, clear error surfacing, and test-friendly structure make it easy to ship quickly.
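
For a feel of the developer experience, here is a rough sketch of a Ponder-style event handler (the contract name, table, and field names are hypothetical, and the exact generated imports and store API differ between Ponder releases):

```typescript
// Hypothetical handler for an ERC-20 "Transfer" event in a Ponder project.
// Assumes a contract named "ERC20" in ponder.config.ts and a TransferEvent
// table in the schema; treat the API shape as illustrative, not exact.
import { ponder } from "@/generated";

ponder.on("ERC20:Transfer", async ({ event, context }) => {
  // Persist one row per Transfer log via Ponder's store abstraction.
  await context.db.TransferEvent.create({
    id: `${event.log.transactionHash}-${event.log.logIndex}`,
    data: {
      from: event.args.from,
      to: event.args.to,
      amount: event.args.value,
      blockNumber: Number(event.block.number),
    },
  });
});
```

Because handlers are plain TypeScript, teams can share types, utilities, and viem clients between the indexer and the rest of their app.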

Comparison Notes

  • Data Sources: Reads event logs and can call onchain functions via RPC using viem. Full control over what data is extracted and how it is transformed. Can index any data accessible through standard contract interfaces or public methods.
  • Performance: Realtime with fast backfills if you have the right RPC setup. Benchmarks show Ponder can be 10–15x faster than Graph subgraphs for certain tasks, but performance depends on RPC latency and tuning.
  • Chain Support: Supports all EVM chains via RPC. Multi-chain setups require manual configuration. No built-in support for unified cross-chain views.
  • Data Transformation & Aggregation: Transformations in TypeScript. Supports inline aggregation, async reads, and conditional logic.
  • Query & API Layer: No built-in query layer. Developers expose data via their own app servers. Can integrate with ORMs like Drizzle or query builders like Kysely. Good for teams building a tightly integrated backend.
  • Hosting, Control & Cost: Fully self-hosted but often deployed on platforms like Railway or Render. Free to use, but infra costs scale with usage.
  • Developer Experience: Strong local dev experience with hot reload, CLI, and TypeScript support. Built with modern tooling and conventions. Easy to integrate into full-stack TypeScript apps.

“If you’re tracking a few specific contracts, need to make various API calls during your transformation, and want to store this data in your own database, then Ponder will be the simplest and lowest cost option when combined with an RPC provider like Quicknode or Alchemy.”

– Andrew Hong, Founder of Herd, in the Crypto Data Engineering Guide

Envio

Envio is a high-speed indexing platform optimized for EVM chains. It uses a proprietary framework called HyperIndex, which operates on a pre-indexed data layer maintained by Envio. This model enables rapid event lookups and fast backfills without needing to parse raw blockchain data yourself.

Rather than pulling data directly from a node or RPC, Envio pre-processes blockchain events and lays them out in an internal format that allows extremely fast access. This enables use cases like wildcard indexing, where you can extract all events of a certain type across all contracts, without predefining specific addresses. It’s particularly useful for monitoring new protocols, token standards, or dynamic contract deployments.

HyperIndex supports both local development and production use, but the data source remains Envio’s managed infrastructure. You can host the indexer logic yourself, but the underlying data layer is centralized and accessed via Envio’s APIs. This introduces a tradeoff: you gain speed and ease of use, but give up some control over the raw extraction layer.

Comparison Notes

  • Data Sources: Operates on pre-indexed onchain data from Envio’s internal API. Access to event logs and receipts; limited visibility into state or storage reads. Not suited for use cases requiring full traces or arbitrary onchain function calls.
  • Performance: Backfills can exceed 5,000 events/sec. Real-time support via HyperRPC for fresh block ingestion. Performance benefits come from reading from preprocessed disk layouts rather than RPC.
  • Chain Support: Broad EVM chain support. No built-in support for non-EVM or unified cross-chain views.
  • Data Transformation & Aggregation: Supports inline transformations in the indexing framework. Developers can define event handlers using Envio’s SDK and emit structured data. No support for stateful onchain reads or custom contract joins.
  • Query & API Layer: Ships with a GraphQL API to expose indexed data. Designed for frontend use with minimal additional configuration. Custom endpoints and queries possible through the SDK.
  • Hosting, Control & Cost: Developers can run their indexer logic locally or in production, but must query Envio’s managed preindexed backend. Development tier includes 750 free indexing hours. Paid plans start at $70/month, scaling with indexing time and support requirements.
  • Developer Experience: Clean CLI and SDK, minimal setup required. Quick onboarding for simple event-based use cases. Less flexible for deep customization or integration with nonstandard data flows.

Subsquid

Subsquid (SQD) offers a modular indexing architecture that separates data extraction from transformation. Its core model involves fetching blockchain data through a distributed archive network and transforming it via the Squid SDK, which outputs into custom data sinks like PostgreSQL or BigQuery.

Unlike traditional event-by-event indexers, Subsquid processes data in large batches. This model is optimized for speed, scale, and analytics use cases. It excels in scenarios where developers need access to broad datasets, across many contracts or blocks, and where latency is less critical than throughput. Subsquid also stands out for its support of Substrate-based and non-EVM chains, making it well-suited for multichain analytics and cross-ecosystem dashboards.

Because Subsquid allows direct output into user-managed databases, teams can integrate blockchain data into familiar infrastructure and query it using standard tools. This reduces dev overhead for teams building complex analytics or integrating onchain data with offchain sources.
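
A rough sketch of that batch-processing shape with the Squid SDK follows; method names, gateway URLs, and addresses are illustrative and vary across SDK versions, so treat this as an assumption-laden outline rather than a reference:

```typescript
import { EvmBatchProcessor } from "@subsquid/evm-processor";
import { TypeormDatabase } from "@subsquid/typeorm-store";

// Illustrative only: gateway URL, RPC endpoint, address, and topic are placeholders.
const processor = new EvmBatchProcessor()
  .setGateway("https://v2.archive.subsquid.io/network/ethereum-mainnet")
  .setRpcEndpoint("https://rpc.example.com")
  .setFinalityConfirmation(75)
  .addLog({
    address: ["0x0000000000000000000000000000000000000000"],
    topic0: ["0xabc0000000000000000000000000000000000000000000000000000000000000"],
  });

processor.run(new TypeormDatabase(), async (ctx) => {
  // Data arrives in large batches of blocks rather than event by event.
  for (const block of ctx.blocks) {
    for (const log of block.logs) {
      // decode logs and accumulate entities here, then write them in bulk
      // to the connected database once per batch.
    }
  }
});
```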

Comparison Notes

  • Data Sources:  Supports logs, receipts, traces, and state diffs. Enables deep access to transaction internals. Extracts raw chain data through archive nodes (no reliance on standard RPCs).
  • Performance: Batch-based processing achieves high-speed backfills (tens of thousands of blocks per second). Not designed for sub-second latency or real-time UI updates.
  • Chain Support: Supports 200+ networks including EVM, Substrate, and other non-EVM chains. Well-suited for developers building multichain analytics.
  • Data Transformation & Aggregation: Transformations handled in the Squid SDK; outputs can be directed to custom databases. Best for teams comfortable defining custom schemas and handling analytics-oriented workflows.
  • Query & API Layer: No built-in API layer; developers define their own via connected databases. Offers maximum flexibility, but more engineering effort required compared to plug-and-play GraphQL endpoints.
  • Hosting, Control & Cost: Requires running your own indexers (squids), but relies on hosted archive nodes for data ingestion. Free shared-tier supports most early-stage use cases. Dedicated nodes start around $140/month; pricing scales with compute/storage.
  • Developer Experience: Fast setup for large-scale backfills. Familiar output targets (Postgres, BigQuery) reduce onboarding friction for data teams. Less ergonomic for frontend-focused teams without existing data infra.

Goldsky

Goldsky is a managed indexing platform that builds on the subgraph model, offering developers a fast and reliable way to ship blockchain APIs without handling infrastructure themselves. It is fully compatible with The Graph's subgraph schema and tooling, but optimizes for speed, support, and developer control. 

Rather than requiring teams to run their own graph nodes, Goldsky handles deployment, scaling, and infrastructure, with strong SLAs and responsive support. Developers can define subgraphs using familiar Graph CLI tooling, and Goldsky handles the rest—making it an appealing alternative for teams that want predictable performance and rapid iteration without DevOps overhead.

For teams that want to work with onchain data in their own infrastructure, Goldsky also offers “Mirror,” which streams subgraph outputs to user-managed databases like Postgres or Kafka. While not the focus of this guide, this can be valuable for analytics or backend integration workflows.

Comparison Notes

  • Data Sources: Supports contract events, logs, and state from subgraph mappings. Data access aligned with what’s available through The Graph’s indexing logic.
  • Performance: Significantly faster indexing than vanilla subgraphs due to custom infra optimizations. Performance improvements are most noticeable on high-volume contracts and fast chains.
  • Chain Support: Supports 90+ networks, mostly EVM. Maintains chain parity with The Graph while adding deployment improvements.
  • Data Transformation & Aggregation: Uses the same TypeScript-based mapping model as subgraphs. Aggregations and transformations are written as part of subgraph logic.
  • Query & API Layer: Exposes data through GraphQL, compatible with schema generated by subgraph definitions. 
  • Hosting, Control & Cost: Startup-friendly pricing tiers with variable costs based on subgraph compute and storage units. A team like Aave pays ~$300/month while Uniswap pays ~$2800/month. There is a pricing calculator to estimate costs.
  • Developer Experience: Compatible with Graph CLI and Subgraph Studio formats. Simple deployment and monitoring UI.

Homegrown Solutions

Some teams choose to build fully custom indexing pipelines in-house. This involves running their own blockchain nodes (often including archive nodes), extracting raw chain data, designing custom schemas, and writing bespoke logic for transformation, storage, and serving. The appeal is full control: every design decision is tailored to the specific needs of the project.

This model is most common among High-Frequency Trading (HFT) firms, analytics platforms, or protocol developers launching new chains with non-standard execution environments. When no existing indexer supports your use case—or when performance and correctness cannot be compromised—homegrown solutions offer unmatched flexibility.

The tradeoff is complexity. These systems are costly to build and maintain, often requiring a dedicated engineering team and long-term investment. But when done well, they can scale efficiently and become a lasting strategic advantage.

Comparison Notes

  • Data Sources: No limitations beyond what your own infrastructure can extract.
  • Performance: Depends entirely on implementation. Capable of low-latency ingestion and parallelized backfills with sufficient engineering effort.
  • Chain Support: Multichain support is possible but adds significant overhead.
  • Data Transformation & Aggregation: Enables specialized transforms, novel aggregations, and tight integration with application logic.
  • Query & API Layer: Can be built to match any API shape or backend requirement. Often tightly coupled to internal data platforms.
  • Hosting, Control & Cost: Fully self-managed. Significant engineering and infra cost, often running into millions of dollars over time.
  • Developer Experience: Maximum flexibility, but high initial setup and maintenance burden. Tooling, observability, and iteration speed are entirely your responsibility.

“In the past, I maintained my own nodes, but this approach became impractical due to the growing volume of data and the expanding number of chains. In this context, selecting the right indexing solution is critical for my work, as it allows me to focus on data analysis and answering the research questions that matter most in my work.”

– Johnnatan Messias, Research Scientist (MPI-SWS)


Sim IDX

Sim IDX is a high-performance, fully managed indexing platform developed by Dune, designed for teams that want fast, reliable access to rich onchain data without managing infrastructure. Unlike most solutions that extract data post-execution, Sim embeds indexing logic directly into Solidity listener contracts. These contracts are executed inside Dune’s custom instrumented EVM (iEVM), which enables realtime filtering, parallel backfills, and deep state access not typically available through logs alone.

Because the indexing logic runs during execution, Sim can skip irrelevant blocks, execute jobs in parallel, and expose intra-transaction state changes. This architecture allows for high-throughput backfills and granular indexing, particularly valuable for complex DeFi protocols and realtime applications.

Sim IDX also ships with a Git-based development workflow: developers write listener contracts in Solidity, commit them to GitHub, and deploy via pull requests. On top of the indexing layer, Sim provides a TypeScript API stack using SQL and/or Drizzle to define auto-scaling Hono endpoints, balancing structure and flexibility for consumer-facing APIs.

Comparison Notes

  • Data Sources: Full EVM state access, including calling other contracts, accessing contract storage, and traces. Indexing logic is defined at execution time via Solidity listeners, enabling intra-tx visibility and selective block filtering.
  • Performance: Realtime; indexing runs as blocks execute. Backfills are highly parallelized, with selective filtering based on contract activity, enabling rapid ingestion of historical data. Query layer is built for high RPS.
  • Chain Support: Supports 10+ EVM chains, with more added monthly. Crosschain indexing and querying supported natively.
  • Data Transformation & Aggregation: Available within both the indexing and query layers.
  • Query & API Layer: TypeScript + SQL-based REST API layer, enabling rapid iteration and flexible output formats.
  • Hosting, Control & Cost: Fully managed service. Custom usage-based enterprise pricing. Self-hosted data streaming coming by end of July for teams needing data custody.
  • Developer Experience: Unique model with Solidity-defined indexing logic, giving onchain clarity and composability. Listener and API logic are committed and deployed via GitHub. Tight feedback loops and familiar dev tooling (VS Code / Cursor, GitHub, SQL, TypeScript). No need to manage RPCs, infra, or storage.

IV. Why Dune built Sim IDX

Sim IDX was created to address long-standing limitations in traditional blockchain indexing. Most existing systems, as described above, extract data from full nodes after execution and transform it externally. This model introduces tradeoffs: developers must often choose between real-time responsiveness and reorg safety, accept limited visibility into execution details, or take on the operational burden of managing infrastructure.

Dune built Sim IDX to challenge this model by embedding indexing logic directly into a custom instrumented EVM (iEVM). This architecture, developed initially by the smlXL team and advanced by Dune since its acquisition in November 2024, removes the need for developers to manage node infrastructure or complex ETL pipelines. It reverses the typical flow, processing relevant blocks as they execute rather than after the fact, which enables fast, parallelized backfills and sub-second data availability. Indexing becomes an execution-layer operation, allowing for much finer-grained access to blockchain state.

This architecture was motivated by the need for more precise data, particularly for complex protocols like DeFi apps, where intermediate state changes within a single transaction can be just as important as final state. For example, capturing each price update in a volatile swap, rather than just the end result, can unlock new types of analysis and real-time responsiveness that event-based systems can miss.

Sim was also designed to simplify protocol-wide observability. Many projects need to track a large and dynamic set of contracts—such as every ERC-721 or every Uniswap V3 fork. With Sim, developers can define a single indexing job that targets all contracts implementing a given interface, eliminating the need for long, static address lists or multiple pipelines.

Finally, Sim IDX reflects a belief that developers should have both control and velocity. It supports Git-based workflows with preview environments and requires no node or DB maintenance. By moving indexing closer to execution and reducing operational friction, Sim is built to serve teams that need speed, accuracy, and flexibility as they scale.

Currently, Sim is available only as a fully managed service, meaning zero devops overhead for teams managing the pipeline and building with Sim. In the coming month, Sim will also roll out a self-hosted database option for those who need more control over their data.

VII. Contributor Perspectives

Andrew Hong, Founder of Herd 

"In blockchain data, where you do your table transformations will heavily impact the latency and cost of your data pipeline. Sim IDX allows you to move those transformations to the start of the pipeline, avoiding giant tables and expensive joins later on. It's a fresh take on data engineering in crypto that we leverage at Herd."

Billy, Developer Relations at Api3

“The costs of RPC service can be quite high unless you run your own infrastructure, so the barrier to entry can be high. 

Arbitrage and liquidation opportunities are very fast paced, where every second counts. So any delays on indexers costs you opportunities, especially during times of volatility in the market.

Teams need to factor in the cost of doing business from abstracting some of the in-house maintenance versus the technical debt cost of maintaining it in house.  Sometimes, you just have to launch.

Sim is VERY interesting. I'm actually excited to use it for real-time updates that can trigger on function calls and realtime collateral at the smart contract level.  A whole new level of services can come from this level of indexing as well as advantages in timing.” 

CryptoFede, Ex-Head of DevRel at Mode Network

“I just love when a team decides to take on a problem that seemed to be solved and just break down every part of the problem to come up with a solution. Sim IDX is innovative and I'm honestly excited to see its performance.”

Danning Sui, Data Scientist at Flashbots

“The data indexing landscape in crypto today is both mature and competitive. For most crypto companies starting a data team, it’s generally more cost-effective and strategic to subscribe to an enterprise-grade streaming service rather than building and maintaining an in-house real-time indexing pipeline and data warehouse. Unless your core business involves building data products, using a vendor will reduce your cost of hiring a full in-house team down to 10%.

Operating your own pipeline also means grappling with countless technical challenges: ensuring RPC/node reliability, managing real-time freshness vs. chain reorg consistency, and more. With onchain data indexed, the primary role of the internal data team often shifts to orchestration: integrating external onchain and internal offchain sources and enabling analytics through visualization and BI tools of choice.

Crypto venture or research firms often rely almost entirely on Dune, but for most companies with an actual product, regardless of chain or dapp, internal pipelines remain essential for tying user context to onchain behavior, and lots of them today run a hybrid data setup. Based on teams I’ve built and worked with, the common stack includes a vendor-streamed ClickHouse, orchestrated via Airflow, with analysis in tools like Hex.tech. This allows teams to combine offchain sensitive user data with onchain behavior—critical for growth and product insights. For visibility, teams also need to decode contracts, or push their protocol data to Dune—via Spellbook or public uploads—because Dune has become the go-to destination for industry metrics and dashboards.”

Johnnatan Messias, Research Scientist at MPI-SWS

“As a blockchain data researcher, I often need to access data across multiple chains. This introduces considerable complexity and consumes time that could otherwise be used for deeper analysis or exploring new research directions. Even working with a single chain can be time-intensive, and the challenge increases significantly when dealing with multiple chains, especially high-throughput ones like rollups.

In the past, I maintained my own nodes, but this approach became impractical due to the growing volume of data and the expanding number of chains. In this context, selecting the right indexing solution is critical for my work, as it allows me to focus on data analysis and answering the research questions that matter most in my work.

Sim IDX offers a powerful framework for indexing and transforming blockchain data. It’s particularly valuable for analyzing complex on-chain behaviors that static SQL queries cannot easily capture. It can be used for vote tracking, airdrop eligibility, DeFi transaction flows, and DAO governance patterns. This capability is highly useful to my day-to-day research.”

Pool, Data at Berachain 

“Onchain indexing is a core part of applications, but it's often overlooked by developers entering the space. First-time builders often focus on smart contract and frontend design, but quickly discover that querying blockchain data directly is prohibitively slow and expensive. Indexing platforms serve as critical infrastructure, powering the backends of everything from DeFi dashboards to NFT marketplaces.

The new Sim IDX makes it easier for users to leverage onchain data, abstracting away the complexities of indexing infrastructure. I look forward to seeing how builders use them.”

IX. Conclusion: The Path Forward

The EVM indexing landscape is no longer a one-solution domain; it's a vibrant ecosystem of specialized tools addressing diverse needs. Indexing is also table-stakes infrastructure, not a nice-to-have add-on. As this guide illustrates, choosing the right indexer hinges on carefully evaluating your application's core requirements: the criticality of realtime latency, the complexity of data transformations, the depth of state access needed (events, traces, contract classifications), the scale of historical backfilling, multi-chain support, and your team's tolerance for infrastructure management versus the desire for developer velocity.

From the decentralized philosophy of The Graph to the granular control of Ponder, the speed of Envio's HyperIndex, the bespoke power of homegrown solutions, and the novel execution-layer approach of Sim IDX – each tool makes distinct trade-offs. There is no universal "best," only the "best fit."

The key takeaway is intentionality. Understand your data needs deeply, prioritize your non-negotiables, and leverage the comparisons and insights from practitioners shared here. This dynamic space will continue evolving rapidly, driven by new chain architectures, data-intensive applications (like intent-based systems and AI onchain), and demands for verifiable indexing. By selecting a solution aligned with your fundamentals today, you build a resilient foundation ready to adapt tomorrow. 

The realtime data must flow.
