Guide to Simple Blockchain Analysis using the Dogechain.info API
The aim of this article is to show how easy it is, with a little bit of python, to analyse the dogecoin blockchain and hopefully learn something in the process. The dogechain.info API allows for easy access to dogecoin blockchain data. In this post I will have a quick look at submitting requests and analysing the returned data.
Setup
As with all python data science endeavours, first we have to import all libraries we will be using.
import datetime
import io
import json
import pprint
from typing import Union
import matplotlib.pyplot as plt
import pandas as pd
import requests
import seaborn as sns
Request Block Information From Doge Coin
To send a request to dogechain.info I am using the python requests library. The get_block function below allows us to send a simple https request to get information about an individual block by just specifying the block height. To find the current height we can use get_current_height, which is just a simple http request.
def get_current_height() -> Union[int, None]:
    url = "http://dogechain.info/chain/Dogecoin/q/getblockcount"
    with requests.get(url) as r:
        if r.ok:
            return int(r.content)

def get_block(height: int) -> Union[dict, None]:
    url = f"https://dogechain.info/api/v1/block/{height}"
    with requests.get(url) as r:
        if r.ok:
            return json.loads(r.content)
Using these functions we can get the current height and retrieve the data for a given block.
current_height = get_current_height()
print(f"Current dogecoin height: {current_height:,}")
pprint.pprint(get_block(current_height))

Current dogecoin height: 3,641,435
{'block': {'average_coin_age': '395.32024690',
'confirmations': 1,
'difficulty': 5357404.901646454,
'hash': '32bce595d10cbea0d7a897c38f9c0a8629df24524aead7478e799c9ea05646e2',
'height': 3641435,
'is_orphan': False,
'merkleroot': '5d07643de6e1784ed5a2ba1169a9b8761116bd12b8f8f59bb5d494c9fccb2083',
'next_block_hash': None,
'nonce': 0,
'num_txs': 14,
'previous_block_hash': '0182961c8c22b153f659936e7c34432031410d45a7e1266e07217a9fe6688eeb',
'time': 1615460006,
'txs': ['eae8fe99edf143b2564b750add70eca6d07381f38a75a1c226a4be1117cdd4a7',
'8f17d5e66025a6331310076d5d8f217b6e52a7aa12f7ea5b5735816fc2097b90',
'67922ff4f84a4cfabf526bf0009e2b06d67cf0427971e351e2e9969cfd1c4e29',
'5d6338526282f741b340fd59964101e9be3d9378bd97ad5ed46cc4c30e6d94b8',
'f4503911cb02335e423639fccf512b40cf3170a579826a2440bfe2432a92a95d',
'4c9009ecf1edb0f964a0f3619cb64c34513dfe4e0843f63168b6cbfb9181ddae',
'ab7b5c984ee8d7752c0098183b5e212aa006471ca153407aeea852dc2cc5267d',
'754c193e93b07b779b18b81066898ebfd1d39571b680f5413618cfe6a7dacfe7',
'61bd926707e7c89095b0e260871105fc9dd1d144bd0dc046a3831d7cf07a5b92',
'68948d6e0b139449a31276c61228a51ad750b221b87b2103df87a6f866be2d55',
'7d819208f26c04fb1d8b3cf2556c0b8b24e845534d4c68d9f326537ace1dfdb3',
'c7fd9604f4f2a276f05b6ed9adb8e1d421d37ae8370d3f47e850545dd7e06ccc',
'd9023bd35020d9978101bacd2173bfeed077bb215a939f620c659267269c55a4',
'31479f8638475c04abbce84871bf0ebaedfa0b33d04f3f3a00e1430896710b5a'],
'value_in': '17542518.49462023',
'value_out': '17552518.49462023',
'version': 6422788},
'success': 1}
There is a lot of data in each block. It is worth noting the data is nested in the JSON object under the block key.
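Because get_block returns None when a request fails, and the useful fields sit one level down, a small helper can make the unpacking explicit. This is only a sketch: extract_block is a hypothetical name, and treating success == 1 as the marker of a valid payload is an assumption based on the single response shown above.

```python
from typing import Optional


def extract_block(response: Optional[dict]) -> Optional[dict]:
    """Return the inner block dict, or None if the response is unusable."""
    # get_block returns None when the HTTP request was not successful.
    if not response:
        return None
    # Assumption: success == 1 marks a valid payload, as in the output above.
    if response.get("success") != 1:
        return None
    return response.get("block")


# Trimmed copy of the response printed above.
sample_response = {
    "block": {"height": 3641435, "num_txs": 14, "time": 1615460006},
    "success": 1,
}

block = extract_block(sample_response)
print(block["height"], block["num_txs"])  # 3641435 14
```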
Sampling Blocks
Dogecoin adds a block roughly every minute, so querying every block would take a long time. Instead I sample by requesting every 10,000th block. To get all the relevant blocks I just loop over the block heights and store the results in a dictionary.
sample_every_n_blocks = int(1e4)
sample_block_heights = range(0, current_height, sample_every_n_blocks)
sample_blocks = {height: get_block(height) for height in sample_block_heights}
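To get a feel for what this sampling means, here is a quick back-of-the-envelope check. It uses the height printed earlier (3,641,435) and assumes exactly one block per minute, which is only approximately true.

```python
# Height taken from the output earlier in this article.
current_height_example = 3_641_435
sample_every_n_blocks = 10_000

heights = range(0, current_height_example, sample_every_n_blocks)

# Number of API requests the sampling loop would make.
n_requests = len(heights)

# Assumption: one block per minute, so 10,000 blocks is 10,000 minutes.
minutes_per_block = 1
days_between_samples = sample_every_n_blocks * minutes_per_block / (60 * 24)

print(f"{n_requests} requests, last sampled height {heights[-1]:,}")
print(f"about {days_between_samples:.2f} days between sampled blocks")
```

So instead of millions of requests we make a few hundred, and each sample is roughly a week apart.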
Convert to Pandas
To further analyse the data I convert the JSON objects to a pandas dataframe. First I remove any empty responses. Then, as I mentioned earlier, the actual block data is nested in the JSON object under the key block. These block data are then passed to pandas as a list of dictionaries.
df = pd.DataFrame.from_records([block.get('block') for block in sample_blocks.values() if block])
print(list(df.columns))

['hash', 'height', 'previous_block_hash', 'next_block_hash', 'is_orphan', 'difficulty', 'time', 'confirmations', 'merkleroot', 'num_txs', 'value_in', 'value_out', 'version', 'average_coin_age', 'nonce', 'txs']
All the fields are present in the dataframe, but for now I am only going to focus on a few.
df = df[['height', 'num_txs', 'value_in', 'value_out', 'average_coin_age', 'time', 'difficulty']]
df.tail()
Handling Time
The creation time of each block is in the time column as a Unix timestamp (seconds). This can easily be converted to a python datetime. Given the sampled blocks are roughly 7 days apart (10,000 blocks at about one per minute), I have just rounded to the nearest date to simplify further analysis.
# Replace the column outright so its dtype can change from integer to datetime
df['time'] = pd.to_datetime(df['time'], unit='s').dt.round('1D')
df = df.set_index('time')
df.tail()
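To make the rounding concrete, here is a small standalone sketch. The first timestamp is the time field from the block printed earlier. The second is a made-up value, 13:00 UTC on the same day, added to show that round goes to the nearest midnight rather than simply dropping the time of day.

```python
import pandas as pd

# 1615460006 comes from the block output above; 1615467600 is a made-up
# timestamp later on the same day (13:00 UTC) for comparison.
timestamps = pd.Series([1615460006, 1615467600])

# unit="s" tells pandas the integers are seconds since the Unix epoch.
as_datetime = pd.to_datetime(timestamps, unit="s")

# Round to the nearest whole day: before noon rounds down, after noon rounds up.
rounded = as_datetime.dt.round("1D")

print(as_datetime.tolist())  # 10:53:26 and 13:00:00 on 2021-03-11
print(rounded.tolist())      # 2021-03-11 and 2021-03-12
```

If you would rather keep the calendar date on which the block was mined, dt.floor('1D') is an alternative. With samples a week apart the difference is unlikely to matter much here.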
Price Data
As part of most blockchain analysis we probably want to include the value of doge against the US dollar. There are many APIs and websites that provide price data, but for simplicity I use Yahoo Finance. By simply specifying the currency pair (in this case DOGE/USD) and the start and stop times you can get the open, high, low and close prices trivially.
Using the requests library again we can pull the prices. By using the min and max times of our blockchain data we can request the relevant timespan from Yahoo as well. The Yahoo Finance data is returned as bytes and we can use io.BytesIO to push the data into pandas.
def get_doge_price_data(start_timestamp: int, end_timestamp: int) -> Union[bytes, None]:
    url = f"https://query1.finance.yahoo.com/v7/finance/download/DOGE-USD?period1={start_timestamp}&period2={end_timestamp}&interval=1d&events=history"
    with requests.get(url) as r:
        if r.ok:
            return r.content

price_data = get_doge_price_data(int(df.index.min().timestamp()), int(df.index.max().timestamp()))
df_price = pd.read_csv(io.BytesIO(price_data), parse_dates=['Date'])
df_price.head()
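If the io.BytesIO step looks a bit magic, the sketch below shows it working on a tiny hand-written CSV instead of a live response. The column names and all the numbers are assumptions made up for illustration, chosen to resemble the open, high, low and close fields mentioned above.

```python
import io

import pandas as pd

# Made-up CSV bytes standing in for the body of the HTTP response.
csv_bytes = (
    b"Date,Open,High,Low,Close,Adj Close,Volume\n"
    b"2021-03-10,0.050,0.060,0.050,0.056,0.056,1000\n"
    b"2021-03-11,0.056,0.057,0.054,0.055,0.055,1200\n"
)

# io.BytesIO wraps the raw bytes in a file-like object that read_csv can
# consume; parse_dates turns the Date column into datetimes.
df_example = pd.read_csv(io.BytesIO(csv_bytes), parse_dates=["Date"])

print(df_example.dtypes)
print(df_example[["Date", "Close"]])
```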
Joining Price Data and Blockchain Data
We now have two dataframes: one of blockchain data and one of prices, but it would be better if we could combine them. Given they both contain a date column we can use this for the joining.
df = pd.merge(df, df_price[['Date','Close']], left_index=True, right_on='Date')
df.set_index('Date', inplace=True)
df.tail()
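One thing worth knowing about pd.merge is that it does an inner join by default, so any block whose date has no matching price row is silently dropped. The toy example below (all dates, heights and prices are made up) shows this: the first block has no price and does not survive the merge.

```python
import pandas as pd

# Made-up block data indexed by date, mimicking the blockchain dataframe.
blocks = pd.DataFrame(
    {"height": [0, 10_000, 20_000]},
    index=pd.to_datetime(["2021-03-01", "2021-03-08", "2021-03-15"]),
)

# Made-up price data that only starts on 2021-03-08.
prices = pd.DataFrame(
    {
        "Date": pd.to_datetime(["2021-03-08", "2021-03-15", "2021-03-16"]),
        "Close": [0.050, 0.060, 0.058],
    }
)

# Same call shape as above: left index joined against the right Date column.
merged = pd.merge(blocks, prices[["Date", "Close"]], left_index=True, right_on="Date")
merged = merged.set_index("Date")

dropped = len(blocks) - len(merged)
print(merged)
print(f"{dropped} block row(s) had no matching price")
```

If you want to keep the unmatched blocks, passing how='left' to pd.merge would keep them with a missing Close value.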
Correlations
Now that we have our dataframe with all the data of interest, let's have a quick look at correlations between columns. For this I am using pandas corr() functionality and the seaborn heatmap. Before the correlation a few of the columns need to be converted to a numeric data type. This goes back to the format of the JSON returned by dogechain.info, as some of the numeric values are returned as strings.
df['value_in'] = pd.to_numeric(df['value_in'])
df['value_out'] = pd.to_numeric(df['value_out'])
df['average_coin_age'] = pd.to_numeric(df['average_coin_age'])
fig, ax = plt.subplots(figsize=(12,6))
sns.heatmap(df.corr(), annot=True, ax=ax);
Unsurprisingly the value_in and value_out are perfectly correlated. Aside from this the strongest correlation found is between blockchain height and difficulty.
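The perfect correlation makes sense if value_out is just value_in plus a fixed amount. In the block printed earlier the two differ by exactly 10,000. The sketch below assumes that offset holds for every row (it is only confirmed for that one block) and uses otherwise made-up values to show that a constant offset gives a Pearson correlation of 1.

```python
import pandas as pd

# value_in arrives as strings, as in the API response; the last value is the
# one from the block shown earlier, the others are made up.
toy = pd.DataFrame({"value_in": ["100.5", "2500.0", "17542518.49462023"]})

# Convert the strings to floats so they can be used in a correlation.
toy["value_in"] = pd.to_numeric(toy["value_in"])

# Assumption: value_out is always value_in plus a constant 10,000.
toy["value_out"] = toy["value_in"] + 10_000

corr = toy["value_in"].corr(toy["value_out"])
print(f"Pearson correlation: {corr:.6f}")
```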
Time Series
Let's have a look at how difficulty varies over time. Above we used height, but given the near-constant rate of block creation we can switch to time and in the process get a more interpretable feature.
Below we have made use of pandas and matplotlib to visualise the DOGE/USD price and the difficulty. To smooth out the spikes we have used the pandas rolling method to calculate a rolling mean. As doge spent most of its initial life at a very low price and has jumped significantly in the last couple of years, we have a very wide range of values to represent. To achieve this we use a log y-scale.
fig, (ax1, ax2) = plt.subplots(figsize=(12,12), nrows=2)
df['Close'].plot(ax=ax1, label='Close')
df['Close'].rolling(7).mean().plot(ax=ax1, label='rolling mean Close')
ax1.set_ylabel('Closing price DOGE / USD')
ax1.set_yscale('log')
ax1.legend()
df['difficulty'].plot(ax=ax2, label='difficulty')
df['difficulty'].rolling(7).mean().plot(ax=ax2, label='rolling mean difficulty')
ax2.set_ylabel('difficulty')
ax2.set_yscale('log')
ax2.legend();
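A note on the window: rolling(7) counts rows, not days. Since our rows are roughly a week apart, the rolling mean above averages over about seven weeks of data. The toy series below (made-up values on a weekly index) shows this, along with the missing values at the start where the window is not yet full.

```python
import pandas as pd

# Made-up weekly series standing in for the sampled difficulty or price.
weekly = pd.Series(
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0],
    index=pd.date_range("2021-01-03", periods=8, freq="7D"),
)

# A window of 7 means 7 rows; the first 6 results are NaN because the
# window is not yet full.
smoothed = weekly.rolling(7).mean()

# Calendar time covered by the first full window (first to seventh row).
window_span = weekly.index[6] - weekly.index[0]

print(smoothed)
print(f"first full window spans {window_span.days} days")
```

On a datetime index pandas also accepts a time-based window, for example rolling('49D'), if you prefer to think in days rather than rows.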
Both plots show low and rather flat values between 2015 and 2017, followed by a step-change in both. In the last few months we have another step-change in price, but have yet to see the same in difficulty. It would be interesting to recreate this plot in a couple of months to see if difficulty increases and whether miners have moved to doge.
Conclusion
With just a little bit of python we have pulled blockchain data and price data from two APIs. We then converted this raw data into something usable (a pandas dataframe). Finally, using some simple visualisations, we have maybe found out some new information about the dogecoin blockchain!
This article is available as a jupyter notebook on github.