One of the benefits of blockchains is the level of observability they provide. Nearly every question about protocol activity, balances, and user behavior can be answered with blockchain data. A frequently asked question is: What other tokens do holders of one specific token also possess across different projects?
Answering this question matters because it shows which other projects the top holders of a token are also exposed to. It also helps filter out noise, since we can focus on the holders with the most significant stakes. A useful analogy comes from traditional finance: imagine being able to see the portfolio composition of every major TSLA holder. With on-chain data, we can begin to answer questions like these.
In this example, we will analyze the holders of ENA, the governance token behind the Ethena protocol (you can substitute any other token of interest). The Ethena protocol issues a synthetic dollar (USDe), backed by crypto assets and corresponding short futures positions. USDe’s peg stability is supported by delta-hedging the protocol-held collateral with derivatives positions.
We will walk through the analysis of ENA holders to understand the other assets that all ENA holders have in common. This approach allows us to identify other assets that more sophisticated DeFi users also have exposure to. Once this data is isolated, we can delve deeper into the other protocol tokens they are using.
Let’s jump into some code. All the code to recreate this can be found here.
First, we get all current holders of the ENA token by fetching data from the `blockchain/tokens/{address}/holders/latest` endpoint. Once all of this data has been fetched, we can begin some data cleanup.
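The fetch-and-collect step can be sketched as follows. This is a minimal illustration and not the original notebook code: only the endpoint path comes from the text, while the `fetch_page` callable, the page-number pagination scheme, and the `holderAddress` field name are assumptions made for the example. The caller is expected to supply the base URL and authentication inside `fetch_page`.

```python
from typing import Callable, Dict, List

# Endpoint path as named in the text; base URL and auth are left to the caller.
HOLDERS_PATH = "blockchain/tokens/{address}/holders/latest"

# Hypothetical signature: fetch_page(path, page_number) -> list of holder records.
FetchPage = Callable[[str, int], List[Dict]]


def collect_holders(token_address: str, fetch_page: FetchPage,
                    max_pages: int = 10_000) -> List[Dict]:
    """Request pages until an empty page comes back, then return all records."""
    path = HOLDERS_PATH.format(address=token_address)
    records: List[Dict] = []
    for page in range(max_pages):
        batch = fetch_page(path, page)
        if not batch:
            break
        records.extend(batch)
    return records


def unique_holder_addresses(records: List[Dict],
                            key: str = "holderAddress") -> List[str]:
    """Lowercase and de-duplicate addresses, keeping first-seen order.

    The default key name is an assumption about the response shape.
    """
    seen: Dict[str, None] = {}
    for record in records:
        address = str(record.get(key, "")).strip().lower()
        if address:
            seen.setdefault(address, None)
    return list(seen)
```

Passing the fetcher in as an argument keeps the pagination loop testable without network access and makes it easy to add rate limiting or retries in one place.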
Now that we have a list of all the wallets that hold ENA, let’s pull each wallet's holdings of all other tokens.
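One way to structure this loop is sketched below. The text does not name the endpoint used for wallet balances, so the `fetch_balances` callable and the `name` and `amount` field names are hypothetical placeholders. The sketch flattens every wallet's balances into one list of rows and keeps track of wallets whose request failed so they can be retried later.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical signature: fetch_balances(wallet) -> list of token balance records.
FetchBalances = Callable[[str], List[Dict]]


def collect_wallet_holdings(
    wallets: Iterable[str], fetch_balances: FetchBalances
) -> Tuple[List[Dict], List[str]]:
    """Return (rows, failed_wallets).

    Each row has the shape {"wallet", "token", "amount"}. The "name" and
    "amount" keys read from the response are assumed, not confirmed.
    """
    rows: List[Dict] = []
    failed: List[str] = []
    for wallet in wallets:
        try:
            balances = fetch_balances(wallet)
        except Exception:
            # Remember the wallet and keep going so one bad request
            # does not abort a long-running collection.
            failed.append(wallet)
            continue
        for balance in balances:
            rows.append({
                "wallet": wallet,
                "token": balance.get("name"),
                "amount": float(balance.get("amount") or 0),
            })
    return rows, failed
```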
After we iterate over all the wallets that hold ENA, we can start cleaning the data to isolate a population of interest. First, we filter out tokens that are most likely spam. To do this, we keep only tokens whose names consist entirely of letters, numbers, and underscores. This removes suspicious tokens with `.com` or other special characters in the name. There are other approaches to filtering out spam, but for now this will do. We then apply a few more transformations to clean up the data for the final analysis.
Next, for each wallet and token we record a value of 1 if the wallet holds a balance of that token, and then count the wallets per token. After we create our pivot table, we apply a few more transformations and cleanup steps. We start by defining a minimum number of wallets that must hold an asset for it to be included. This helps us focus on just the assets with high wallet overlap.
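The name-based spam filter described above can be sketched with a regular expression. The example assumes each row is a dictionary with a `token` key holding the token name; that row shape is an assumption for illustration.

```python
import re
from typing import Dict, List

# Letters, digits, and underscores only. ASCII is spelled out explicitly
# because \w in Python 3 also matches non-ASCII letters.
_PLAIN_NAME = re.compile(r"[A-Za-z0-9_]+")


def looks_legitimate(name) -> bool:
    """True if the whole name is made of letters, digits, and underscores."""
    return isinstance(name, str) and _PLAIN_NAME.fullmatch(name) is not None


def drop_spam_rows(rows: List[Dict], key: str = "token") -> List[Dict]:
    """Keep only rows whose token name passes the character check."""
    return [row for row in rows if looks_legitimate(row.get(key))]
```

Note that this rule is deliberately blunt: it also drops names containing spaces or punctuation, so some legitimate tokens may be removed along with the spam.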
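The presence flags and the wallet threshold can be sketched as follows. The text describes a pivot table, which suggests a dataframe library; this sketch swaps in plain dictionaries so it has no dependencies. The row shape (`wallet`, `token`, `amount`) and the `min_wallets` parameter name are assumptions for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def wallets_per_token(rows: List[Dict]) -> Counter:
    """Count the distinct wallets holding a positive balance of each token."""
    holders = defaultdict(set)
    for row in rows:
        if row["amount"] > 0:
            holders[row["token"]].add(row["wallet"])
    return Counter({token: len(wallets) for token, wallets in holders.items()})


def presence_matrix(rows: List[Dict], min_wallets: int) -> Dict[str, Dict[str, int]]:
    """Build a wallet -> {token: 0 or 1} table.

    Only tokens held by at least `min_wallets` wallets are kept as columns.
    """
    counts = wallets_per_token(rows)
    kept = sorted(token for token, n in counts.items() if n >= min_wallets)
    matrix: Dict[str, Dict[str, int]] = {}
    for row in rows:
        flags = matrix.setdefault(row["wallet"], dict.fromkeys(kept, 0))
        if row["token"] in flags and row["amount"] > 0:
            flags[row["token"]] = 1
    return matrix
```

Raising `min_wallets` narrows the table to widely shared assets, while lowering it surfaces long-tail holdings at the cost of more noise.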
Finally, we can plot this data to get a view into the composition of larger ENA holders' wallets.
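As a stand-in for a real plotting library, the sketch below renders per-token wallet counts as a horizontal text bar chart. It is a dependency-free illustration and not the chart from the original analysis; the function name and the `top_n` and `width` parameters are made up for the example.

```python
from typing import Dict


def render_bar_chart(counts: Dict[str, int], top_n: int = 10, width: int = 40) -> str:
    """Render the top_n largest counts as text bars scaled to `width` characters."""
    # Sort by count descending, then by label, so ties are stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:top_n]
    if not ranked:
        return ""
    largest = ranked[0][1]
    scale = width / largest if largest > 0 else 0
    label_width = max(len(label) for label, _ in ranked)
    lines = []
    for label, value in ranked:
        bar = "#" * round(value * scale)
        lines.append(f"{label.ljust(label_width)} | {bar} {value}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up counts, purely to show the output format.
    print(render_bar_chart({"TOKEN_A": 4, "TOKEN_B": 2, "TOKEN_C": 1}, width=8))
```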
We can see some broad categories of tokens that stand out.
We can apply the same logic with one change: instead of counting the number of wallets that hold each asset, we sum the number of tokens they hold to see where holdings are more concentrated.
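The amount-based variant can be sketched by summing balances instead of counting wallets. The row shape (`wallet`, `token`, `amount`) is assumed for illustration, and `top_holder_share` is a hypothetical helper added here to give one simple measure of concentration; it is not taken from the original analysis.

```python
from collections import defaultdict
from typing import Dict, List


def total_amount_per_token(rows: List[Dict]) -> Dict[str, float]:
    """Sum the token amounts held across all wallets, per token."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["token"]] += row["amount"]
    return dict(totals)


def top_holder_share(rows: List[Dict], token: str) -> float:
    """Fraction of the summed amount of `token` held by its largest wallet."""
    amounts = [row["amount"] for row in rows
               if row["token"] == token and row["amount"] > 0]
    total = sum(amounts)
    return max(amounts) / total if total else 0.0
```

Raw token amounts are not directly comparable across assets with different prices and decimals, so totals like these are most useful within a single token or after converting to a common unit.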
Now, we can apply these same ideas across any other project of interest and start to find a good portfolio weighting or build indexes based on the weight of the tokens. Or, you can use this data to find areas of deeper research and analysis.
In the next write-up, we will dive deeper into data filtering and cleaning addresses. A big piece of analyzing wallet activity is contextualizing what wallets are doing and “who” they are. We will explore EOAs, exchanges, whales, multisigs, and other smart contracts, and what it means to watch balances for each of these on-chain identities.