In this post we will focus on the off-chain parts of the code. We’ll discuss the need for “indexers” and walk through some parts of the implementation of the indexer in this example. You can find the complete repository with all the code we’ll discuss today on my GitHub.
Indexers, what they are and why we need them
In the blockchain space an indexer is a service which consumes raw data from a source (typically a co-located full-node instance for that blockchain) and parses it into a format which is more useful for a specific application. For example, in the case of our chat app, the indexer consumes a stream of Near blocks and produces a stream of events (e.g. received messages and contact requests).
Indexers are important because the databases used to operate the blockchain itself are generally not optimized for the kinds of queries applications care about. For example, getting a user’s balance for an ERC-20 token on Ethereum is usually done by running the query through the EVM, because that is the only way the information is available from a typical Ethereum node. This is an extremely expensive operation compared with looking up an entry in a traditional relational database. A simple optimization for any application that needs fast access to ERC-20 balances is therefore to run an indexer over the raw Ethereum data which populates a traditional database with the balances the application cares about. The application then uses this database as its source for balances instead of an Ethereum node directly. This is how the Etherscan block explorer works under the hood: Etherscan runs an indexer to populate a database, which in turn populates the fields in the web pages Etherscan serves.
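To make the idea concrete, here is a toy sketch of what such an indexer does: it folds a stream of hypothetical `Transfer` events into an in-memory balance table standing in for the relational database. The event shape and names are illustrative, not Ethereum’s actual log format.

```rust
use std::collections::HashMap;

// Hypothetical Transfer event, as an indexer might decode it from a log.
struct Transfer {
    from: String,
    to: String,
    amount: u128,
}

// Apply a stream of transfer events to an in-memory balance table.
// A real indexer would write to a relational database instead.
fn apply_transfers(events: &[Transfer]) -> HashMap<String, u128> {
    let mut balances: HashMap<String, u128> = HashMap::new();
    for t in events {
        // saturating_sub keeps this toy version safe for senders we
        // haven't seen before (e.g. a mint address).
        let from_bal = balances.entry(t.from.clone()).or_insert(0);
        *from_bal = from_bal.saturating_sub(t.amount);
        *balances.entry(t.to.clone()).or_insert(0) += t.amount;
    }
    balances
}

fn main() {
    let events = vec![
        Transfer { from: "mint".into(), to: "alice".into(), amount: 100 },
        Transfer { from: "alice".into(), to: "bob".into(), amount: 40 },
    ];
    let balances = apply_transfers(&events);
    // Balance lookups are now a cheap map/database read, not an EVM call.
    assert_eq!(balances["alice"], 60);
    assert_eq!(balances["bob"], 40);
    println!("alice: {}", balances["alice"]);
}
```

The point of the sketch is the shape of the pipeline: raw events in, query-friendly state out.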
Indexers are not just important for Ethereum; any high-performance dapp on any blockchain will need an indexer somewhere in its architecture. The example chat app we have been discussing on Near is no exception, so let’s dive into how its indexer is implemented.
Getting the raw data
Indexers only process raw blockchain data into a format the associated application can use; they do not generate the data in the first place. Therefore, the first question we need to answer when creating an indexer is: where does the blockchain data come from?
Near provides a few different data sources, as described below.
Running a nearcore node
The best source of data (in terms of decentralization and security) for any blockchain is the peer-to-peer network of the blockchain itself. To access this source of data you must run a node that understands the blockchain’s protocol. In the case of Near, the node implementation is called nearcore. Its source code is open on GitHub, and there is documentation available on how to run your own nearcore node. The main barrier to entry is the amount of disk space required: it is recommended you have 1 TB of dedicated storage for your node, and it takes a while to synchronize with the chain because of all the data that must be downloaded.
Once you have a nearcore node set up, Near provides a convenient indexer framework in Rust that can be used to build indexers with nearcore as the data source. For a real project this would be the best way to create an indexer. However, our example is just a demo, so we don’t want to spend hours downloading chain data to a dedicated 1 TB server. Fortunately, there are other options.
Near data lake
To make it easier for developers to get their projects started, Near created the data lake framework as an alternative source of data for indexers to use. The data lake framework is built on top of the indexer framework mentioned above, using a nearcore node as the source of data. The indexer feeding the data lake is trivial in the sense that it is not processing the data for a specific application; it is just passing the data along to be stored in AWS S3. This enables developers to access the data using their own AWS account and then build their own (non-trivial) indexers using this S3 storage as the data source.
The advantage of this for developers is that this method is much faster to get working. The disadvantage, though, is that the data is coming from a centralized source and is therefore theoretically easier to corrupt than using the peer-to-peer network directly.
Accessing the data lake requires you to pay for the AWS resources you use in delivering that data to you. Once again, for the purposes of the chat app example, I didn’t want to make people sign up for AWS and spend money to run the indexer. Therefore, I chose the final data source option.
Public RPC nodes
The final way to access blockchain data if you are not running your own node or accessing someone else’s pre-built data store is to use someone else’s nodes. RPC nodes are nodes in the blockchain network that are meant to serve the requests of users. Every blockchain has RPC node providers (some free, some paid). A list of the RPC providers for Near can be found here.
This is the least efficient way to access blockchain data because it takes multiple RPC requests to get the data a typical indexer uses. Each RPC request incurs network latency, making the indexer sluggish to respond to events happening on-chain. The only advantage of this approach is that it is free to set up a demo, as long as there is a free RPC provider for the chain (which is the case with Near). Therefore, this is the source of data the indexer in our example uses.
All that said, the indexer itself does not care where its data comes from. Therefore even though our example is using the worst data source, it’s worth exploring its implementation because the concepts this indexer uses are the same as those in an indexer that is built using Near’s data lake or node-based indexer frameworks.
Our indexer is built as a tokio app in Rust. Tokio is a Rust framework for writing high-performance applications where I/O operations are the main bottleneck. Our indexer is such an application because the actual computation it performs is extremely fast compared to the time it takes to request data from the RPC nodes. The main features of tokio are that it uses non-blocking asynchronous primitives and has multithreading built in to enable parallel execution. This is in addition to it being in Rust, so it naturally has the concurrency-safety and memory-safety guarantees Rust provides.
If tokio is the stage on which our application is set, then what follows are the actors in the play (pun intended; this application follows the actor model, but I chose to do it in tokio directly instead of using a library like actix because I think tokio’s channels provide stronger typing than the generic messages used in most actor frameworks).
The indexer has four main roles: the manager, the block downloader, the chunk downloader, and the receipt handler.
The manager
The manager process supervises the whole indexer. It is responsible for delegating work to the other processes and for telling them to shut down when the program is being closed (e.g. when an error is encountered). For example, the manager handles the load balancing of the chunk downloaders by cycling through them when assigning a chunk to download.
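Here is a simplified sketch of that round-robin delegation, using std threads and channels in place of tokio tasks. All names here are illustrative, not the real indexer’s types.

```rust
use std::sync::mpsc;
use std::thread;

// Which worker gets each chunk: the manager simply cycles through them.
fn round_robin_assign(num_chunks: usize, num_workers: usize) -> Vec<usize> {
    (0..num_chunks).map(|i| i % num_workers).collect()
}

fn main() {
    let num_workers = 3;
    let chunk_heights: Vec<u64> = (100..106).collect();
    let assignments = round_robin_assign(chunk_heights.len(), num_workers);

    let (result_tx, result_rx) = mpsc::channel();
    let mut worker_txs = Vec::new();
    for id in 0..num_workers {
        let (tx, rx) = mpsc::channel::<u64>();
        worker_txs.push(tx);
        let result_tx = result_tx.clone();
        // Each "chunk downloader" reports which chunk it handled.
        thread::spawn(move || {
            for height in rx {
                result_tx.send((id, height)).unwrap();
            }
        });
    }
    drop(result_tx);

    // The manager cycles through the workers when assigning chunks.
    for (height, &worker) in chunk_heights.iter().zip(&assignments) {
        worker_txs[worker].send(*height).unwrap();
    }
    drop(worker_txs);

    let mut handled: Vec<(usize, u64)> = result_rx.iter().collect();
    handled.sort();
    // Worker 0 handled chunks 100 and 103, and so on around the cycle.
    assert_eq!(
        handled,
        vec![(0, 100), (0, 103), (1, 101), (1, 104), (2, 102), (2, 105)]
    );
    println!("{handled:?}");
}
```

In the real indexer the same cycling happens over tokio channel senders rather than std ones, but the delegation logic is the same.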
The block downloader
As the name implies, the purpose of the block downloader process is to download blocks. It periodically polls the Near RPC to check if there are any new blocks and if there are then downloads them and sends them to the manager. If we were not using the RPC as our data source then this process would be replaced with a connection to a Near node or data lake instead.
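In outline, the polling loop looks something like the following sketch, where a `FakeRpc` stub stands in for the real async JSON-RPC client (the real loop awaits HTTP requests; this simplified version is synchronous):

```rust
use std::collections::VecDeque;

// Stand-in for the Near JSON-RPC client; each call simulates one poll.
struct FakeRpc {
    pending: VecDeque<u64>, // block heights the "chain" will report
}

impl FakeRpc {
    fn latest_block(&mut self) -> Option<u64> {
        self.pending.pop_front()
    }
}

// Poll the RPC and forward only blocks newer than the last one we saw.
fn poll_new_blocks(rpc: &mut FakeRpc, last_seen: &mut u64, out: &mut Vec<u64>) {
    while let Some(height) = rpc.latest_block() {
        if height > *last_seen {
            *last_seen = height;
            out.push(height); // in the real indexer: send to the manager
        }
    }
}

fn main() {
    // Repeated heights model polling faster than blocks are produced.
    let mut rpc = FakeRpc { pending: VecDeque::from([5, 5, 6, 7, 7]) };
    let mut last_seen = 0;
    let mut downloaded = Vec::new();
    poll_new_blocks(&mut rpc, &mut last_seen, &mut downloaded);
    // Duplicate heights from repeated polls are skipped.
    assert_eq!(downloaded, vec![5, 6, 7]);
    println!("{downloaded:?}");
}
```

Tracking the last-seen height is what keeps repeated polls from producing duplicate blocks downstream.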
The chunk downloader(s)
On Near, blocks do not contain transaction data; chunks do. Blocks only record which new chunks are available. This is a consequence of Near’s sharding design (you can read more about that here). Therefore, we need separate processes to download the chunk data for each block; the chunk downloaders fulfil this role. Our indexer runs multiple chunk downloader instances so that chunks can be downloaded in parallel.
If we were not using the RPC as our data source then, depending on how the data is factored in that source, these processes might not need to exist (for example, near-indexer-framework includes all block and chunk data in a single message). But in our case, since we are using the RPC, these processes are necessary.
The receipt handler
Chunks contain “receipts”, which are created when a transaction is processed. When the manager receives a new chunk from a chunk downloader, it sends all the receipts to the receipt handler process (we could have multiple receipt handler instances to process receipts in parallel, just as we have multiple chunk downloaders, but receipt processing is fast enough that I didn’t think this would add much of a performance improvement). This process filters the receipts down to only the ones we care about, then downloads the execution outcome for those receipts, and finally processes the events from those outcomes. In this example we simply write the events to a file (for a live demo you can watch the file with something like the tail -f Unix command to see the events come in), but you can imagine a production implementation forwarding these events as push notifications to a mobile version of the app.
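The filtering step can be sketched like this; the `Receipt` struct and the account names are illustrative stand-ins for the real Near types, and the rendered lines model what the example appends to its output file:

```rust
// Illustrative stand-in for a Near receipt after the execution outcome
// has been fetched and its events decoded.
#[derive(Clone)]
struct Receipt {
    receiver_id: String,
    event: String,
}

// Keep only receipts addressed to the contract we index, and render
// their events as lines to append to the output file.
fn process_receipts(receipts: &[Receipt], contract: &str) -> Vec<String> {
    receipts
        .iter()
        .filter(|r| r.receiver_id == contract)
        .map(|r| format!("event: {}", r.event))
        .collect()
}

fn main() {
    let receipts = vec![
        Receipt { receiver_id: "messenger.near".into(), event: "new_message".into() },
        Receipt { receiver_id: "other.near".into(), event: "transfer".into() },
    ];
    // Receipts for unrelated contracts are dropped before any further work.
    let lines = process_receipts(&receipts, "messenger.near");
    assert_eq!(lines, vec!["event: new_message".to_string()]);
    println!("{lines:?}");
}
```

Filtering first matters because it avoids paying the RPC round-trip for execution outcomes of receipts the app doesn’t care about.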
You may notice throughout the indexer code that there is some complexity around sending chunks/receipts together with the hash of a block that comes after the block which included them. This is a quirk of the Near RPC: it wants proof that you are aware of a later block before it will serve the execution outcome. Again, this would be handled much more smoothly with a better data source.
It is intentional that there are no panics in any of the actor functions. When an actor encounters an error, it logs it and sends a shutdown message to the manager (and the manager relays the shutdown to all other actors). This is important because panicking in a multithreaded application can cause unexpected behaviour (in general tokio is pretty good about bringing the whole app down gracefully, but it’s still better to code defensively against it).
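A stripped-down sketch of this convention, using std threads and channels rather than tokio (the message types are illustrative, not the real indexer’s):

```rust
use std::sync::mpsc;
use std::thread;

// An actor reports errors instead of panicking; the manager broadcasts
// shutdown to every actor in response.
enum ToManager { Error(String) }
enum FromManager { Shutdown }

// An actor blocks until the manager tells it to stop, then exits cleanly.
fn run_actor(id: usize, rx: mpsc::Receiver<FromManager>) -> usize {
    let FromManager::Shutdown = rx.recv().unwrap();
    id
}

fn main() {
    let (mgr_tx, mgr_rx) = mpsc::channel::<ToManager>();
    let mut shutdown_txs = Vec::new();
    let mut handles = Vec::new();

    for id in 0..2 {
        let (tx, rx) = mpsc::channel::<FromManager>();
        shutdown_txs.push(tx);
        handles.push(thread::spawn(move || run_actor(id, rx)));
    }

    // Some actor hits an error: it logs and notifies the manager.
    mgr_tx.send(ToManager::Error("rpc timeout".into())).unwrap();

    // The manager receives the error and broadcasts shutdown.
    let ToManager::Error(msg) = mgr_rx.recv().unwrap();
    for tx in &shutdown_txs {
        tx.send(FromManager::Shutdown).unwrap();
    }

    // Every actor thread exits cleanly; nothing panicked.
    let mut stopped: Vec<usize> =
        handles.into_iter().map(|h| h.join().unwrap()).collect();
    stopped.sort();
    assert_eq!(stopped, vec![0, 1]);
    println!("all actors stopped after: {msg}");
}
```

The important property is that every thread’s `join` succeeds: the error propagates as data through channels, never as a panic.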
In this post we discussed why indexers are important for real-world dapps and looked at some of the details of the example indexer implemented for the chat dapp. As with the previous post, there are exercises in the indexer code included in comments tagged as EXERCISE. I encourage you to try out these exercises if you want some hands-on experience with the code-base.
About the Series
This is the final post in this series. In Part 1 we took a look at general principles of smart contract development and how they apply to an example contract for a chat dapp. In Part 2 we did a deep dive into how to use near-sdk to write smart contracts for Near in Rust. Finally, this post discussed how indexers are needed to integrate blockchain data with the off-chain components of our app.
One final piece of the code I did not cover is the integration testing. It uses the near-workspaces library to simulate the blockchain locally, and it uses the same async Rust style as the indexer. Even though integration tests are not especially flashy or interesting, they are important to ensure your contract works correctly. I encourage you to take a look at the integration tests for the messenger contract and try the exercise there to get some hands-on experience in that area too.
If you have enjoyed this series of blog posts, please get in touch with us at Type-Driven consulting. We are happy to provide software development services for dapps, as well as training materials for your own engineers.