Predictive analytics uses data, statistical algorithms, machine learning and artificial intelligence techniques to identify trends and make predictions about unknown future events. The goal is to mine current and historical data and apply predictive modelling techniques to provide the best possible assessment of what will happen in the future.
Predictive analytics is used in marketing, business management, sports, insurance, policing, retail, travel, mobility, healthcare, social networking, and several other fields. Credit scoring is one of the best-known applications of predictive analytics, and almost everyone with a bank account has interacted with a predictive analytics system at some point in their banking journey.
In this post we are going to use predictive analytics to predict the propensity of an Art Blocks NFT to be resold, using Flipside Crypto, Google BigQuery ML and AI Platform.
Objectives & Approach
The objective is to predict which Art Blocks NFTs are going to be resold in August 2021 by analyzing historical NFT sales data on the Ethereum blockchain. The solution involves the following steps:
- Explore Art Blocks NFT sales and resales historical data
- Extract and label NFT sales & resales data
- Choose a Machine Learning & AI model
- Create, train, and deploy Machine Learning & AI Model on Google BigQuery ML platform
- Evaluate the model and understand its performance
- Use the trained Machine Learning & AI model to generate predictions
- Evaluate accuracy of predictions
What are Art Blocks, Flipside Crypto and Google BigQuery ML?
- Art Blocks – Art Blocks revived Generative Art, which has been around for a long time, in the world of NFTs. Demand for Art Blocks NFTs is sky high, and several artworks from artist Tyler Hobbs's collection are selling for millions of US dollars. Art Blocks NFTs became some of the most popular and prestigious NFTs to own after CryptoPunks! Check out Art Blocks over here
- Flipside Crypto – If you are a regular reader of this blog, Flipside Crypto needs no introduction. In case you don't know it, Flipside Crypto is an amazing organization with a goal of making blockchain data easily accessible for everyone to explore and derive insights from. The best part is that it costs nothing. By joining Flipside Crypto you can explore well organized and labelled data from the Ethereum blockchain, the Terra blockchain and the Polygon sidechain, plus curated data related to NFTs, Uniswap, Aave, Polygon, Compound, etc.
- Google BigQuery ML – Google's BigQuery ML is changing the way data scientists use Machine Learning & AI to predict the future from historical data. BigQuery ML lets any SQL developer create and execute machine learning models in BigQuery using standard SQL queries. With BigQuery ML, machine learning on large datasets does not require extensive programming or knowledge of ML frameworks or Python.
From the Art Blocks Twitter account (@artblocks_io), February 23, 2021:
1/ Art Blocks is a first of its kind generative art project. It represents fundamental innovation in the combination of art and technology.
What do you do? Unlock unique art on the Ethereum blockchain, each project produced by different artists.
How do you do it? See below
Exploring Art Blocks NFT Sales & Resales till July 2021
Exploring Art Blocks sales and resales on the Ethereum blockchain is amazingly easy with the help of Flipside Crypto. Before we start working on Machine Learning models to predict future resales, let's explore and understand sales up to July 2021. Here are a few charts to help us understand the Art Blocks sales trend.
The first chart shows daily sales transactions from late 2020 to July 2021. This chart includes the initial sale during the minting process as well as subsequent sales, referred to as resales from now on.
This chart shows daily resales on a logarithmic scale, as the number of resale events per day ranges from 2 to 10,000 up to July 2021.
This chart shows resale transactions by Art Blocks collection.
Extracting Art Blocks NFT Sales Data and Labelling It
One of the most important steps in predictive analysis using machine learning is to extract the right data and label it properly. Data labelling is the process of identifying raw data and adding meaningful, informative labels that provide context, so that machines can learn from historical data. To predict future resales, we need to extract raw data from Flipside Crypto and properly label "resale" events. Here is a query that extracts historical Art Blocks NFT sales with the "resale" label required for machine learning.
WITH sales AS (
    SELECT
        evt.block_timestamp,
        mt.token_metadata:collection_name::string AS collection,
        mt.token_metadata:name::string AS token_name,
        evt.price_usd,
        evt.token_id,
        -- Number each token's sales chronologically (1 = first sale in the window)
        ROW_NUMBER() OVER (PARTITION BY evt.token_id ORDER BY evt.block_timestamp) AS sale_seq,
        -- Any sale after the first one is labelled as a resale
        CASE WHEN sale_seq > 1 THEN 1 ELSE 0 END AS is_resold
    FROM ethereum.nft_events AS evt
    INNER JOIN ethereum.nft_metadata AS mt
        ON evt.contract_address = mt.contract_address
        AND evt.token_id = mt.token_id
    WHERE evt.event_type = 'sale'
        AND evt.block_timestamp IS NOT NULL
        AND evt.block_timestamp >= CURRENT_DATE - INTERVAL '6 Months'
        AND evt.project_name = 'art_blocks'
        AND evt.price_usd > 0
        AND mt.token_metadata:platform::string = 'Art Blocks Curated'
)
SELECT * FROM sales;
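To make the labelling logic in the query above concrete, here is a minimal Python sketch of what the row_number() and CASE expressions do. The token IDs and dates are made up for illustration and are not taken from the Flipside Crypto extract.

```python
from collections import defaultdict

# Hypothetical toy sales as (token_id, sale_date); values are invented for illustration.
sales = [
    ("token_a", "2021-07-03"),
    ("token_b", "2021-07-01"),
    ("token_a", "2021-07-01"),
    ("token_a", "2021-07-09"),
]


def label_resales(rows):
    """Mimic row_number() over (partition by token_id order by timestamp)."""
    seen = defaultdict(int)  # number of sales counted so far for each token
    labelled = []
    # ISO-formatted dates sort chronologically when compared as strings.
    for token_id, sale_date in sorted(rows, key=lambda r: (r[0], r[1])):
        seen[token_id] += 1
        sale_seq = seen[token_id]
        labelled.append({
            "token_id": token_id,
            "sale_date": sale_date,
            "sale_seq": sale_seq,
            # The first sale of a token is not a resale; every later sale is.
            "is_resold": 1 if sale_seq > 1 else 0,
        })
    return labelled


labelled = label_resales(sales)
for row in labelled:
    print(row)
```

Each token's first sale gets is_resold = 0 and every later sale of the same token gets is_resold = 1, which is the label the model is trained on.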
The following table shows a sample of the data extracted from Flipside Crypto; the full extract contained more than 50K records.
Machine Learning & AI Model for Predicting NFT Resales
Machine Learning models can be classified into the following three groups:
- Binary Classification Models - these models are used to predict a binary outcome like "Is this customer going to buy a product or not" or "Is this NFT going to be resold or not", etc.
- Multiclass Classification Models - these models are used to predict which one of three or more classes a given record belongs to. Examples include "What is the genre of the movie?", "What is the age group of a customer", etc.
- Regression Models - these models are used to predict a numeric value such as future sales, price, etc.
Predicting whether an NFT is going to be resold or not is a binary classification problem (yes/no), and we will be using a logistic regression model to determine which NFTs are going to be resold in August 2021 based on historical sales data.
In statistics, the logistic model (or logit model) is used to model the probability of a certain class or event existing such as pass/fail, win/lose, alive/dead or healthy/sick. This can be extended to model several classes of events such as determining whether an image contains a cat, dog, lion, etc. Each object being detected in the image would be assigned a probability between 0 and 1, with a sum of one.
(from Wikipedia)
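As a rough illustration of the logistic model described above, the sketch below turns a weighted sum of features into a probability between 0 and 1 and then into a yes/no prediction. The weights, bias and feature values are hypothetical and are not the ones BigQuery ML learns for this dataset.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, weights, bias):
    """Probability of the positive class (for example, 'resold')."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def predict_label(features, weights, bias, threshold=0.5):
    """Turn the probability into a binary prediction using a cut-off."""
    return 1 if predict_proba(features, weights, bias) >= threshold else 0


# Hypothetical model parameters and one example record, for illustration only.
weights = [0.8, -0.3]
bias = -0.5
example = [2.0, 1.0]

p = predict_proba(example, weights, bias)
print(f"probability={p:.3f}, label={predict_label(example, weights, bias)}")
```

The 0.5 threshold is a common default; a different cut-off trades false positives against false negatives.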
Create, train, and deploy Machine Learning & AI Model on Google BigQuery ML platform
We have identified the Machine Learning & AI model for predicting NFT resales and labelled the historical data with the help of the Flipside Crypto warehouse. Now it is time to create, train and deploy a logistic regression model on the Google BigQuery ML platform.
Let us start by loading the data extracted from Flipside Crypto into a Google BigQuery table named "art_blocks.sales". Once the data is loaded, it is time to create a prediction model named "art_blocks.resale_model" using the following statement.
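CREATE OR REPLACE MODEL `art_blocks.resale_model`
OPTIONS(
    model_type = 'logistic_reg',
    -- Column the model learns to predict
    input_label_cols = ['is_resold']
) AS
SELECT
    *
FROM `art_blocks.sales`
-- Train only on sales that happened before August 2021
WHERE block_timestamp < '2021-08-01';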
The above statement creates a logistic regression (logistic_reg) model and specifies that the label column the model is trained to predict is "is_resold". After the model is successfully created, select the model and explore its training statistics.
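For a feel of what training a logistic regression model involves, here is a minimal sketch that fits one weight and one bias with plain gradient descent on a tiny made-up dataset. This is only an illustration of the general technique; the optimizer BigQuery ML actually uses, and its default settings, may differ.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y  # derivative of log loss w.r.t. the logit
            grad_w += error * x
            grad_b += error
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b


# Hypothetical single-feature training data: larger x tends to mean label 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0, 0, 1, 0, 1, 1]

w, b = train_logistic(xs, ys)
print(f"w={w:.3f}, b={b:.3f}")
print(f"p(y=1 | x=1) = {sigmoid(w * 1.0 + b):.3f}")
print(f"p(y=1 | x=6) = {sigmoid(w * 6.0 + b):.3f}")
```

The learned weight is positive, so the predicted probability rises with x, which is the kind of relationship the training statistics in BigQuery summarise for the real features.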
Evaluate the NFT Resale Model - How Confident is Our Prediction Model?
To get a sense of the model's performance we can look at its AUC (Area Under the ROC Curve) metric. A model's AUC ranges between 0 and 1: a value of 1 means the model ranks every resale above every non-resale, while 0.5 is no better than random guessing. Running the following query shows how our model scores.
SELECT
    roc_auc,
    # evaluating the auc value based on the scale at http://gim.unmc.edu/dxtests/roc3.htm
    CASE
        WHEN roc_auc > .9 THEN 'excellent'
        WHEN roc_auc > .8 THEN 'good'
        WHEN roc_auc > .7 THEN 'fair'
        WHEN roc_auc > .6 THEN 'poor'
        ELSE 'fail'
    END AS modelquality
FROM ML.EVALUATE(MODEL `art_blocks.resale_model`);
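Wow! The AUC score is 0.968, which rates as excellent on the scale used in the query and is about as good as a data scientist can hope for. Strictly speaking, AUC measures how well the model ranks resales above non-resales rather than the share of correct predictions, so it is not the same thing as 96% accuracy. Keep in mind too that sale_seq is one of the input features and is_resold is derived directly from it, which makes the label easy for the model to recover. But is the model really going to be that good? Let us ask it to predict resales for August 2021.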
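For readers curious how an AUC value comes about, here is a small Python sketch that computes it as the fraction of (resale, non-resale) pairs the model ranks correctly, and then applies the same quality scale as the CASE expression in the query. The labels and scores are invented for illustration, and this pairwise method is just one simple way to compute AUC, not necessarily how BigQuery ML does it internally.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


def model_quality(auc):
    """Same thresholds as the CASE expression in the evaluation query."""
    if auc > 0.9:
        return "excellent"
    if auc > 0.8:
        return "good"
    if auc > 0.7:
        return "fair"
    if auc > 0.6:
        return "poor"
    return "fail"


# Hypothetical actual labels and predicted probabilities.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

auc = roc_auc(labels, scores)
print(auc, model_quality(auc))
```

In this toy example three of the four positive/negative pairs are ranked correctly, so the AUC is 0.75, which lands in the "fair" band.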
Predicting August 2021 Art Blocks NFT Resales
It's time to predict the future! Run the following query to get batch predictions for the month of August 2021.
SELECT
    block_timestamp,
    collection,
    token_id,
    token_name,
    is_resold,
    predicted_is_resold
FROM ML.PREDICT(MODEL `art_blocks.resale_model`,
    (
        SELECT
            token_id,
            block_timestamp,
            collection,
            token_name,
            sale_seq,
            price_usd,
            is_resold
        FROM `art_blocks.sales`
        -- Half-open range so that sales at any time on August 31 are included
        WHERE block_timestamp >= '2021-08-01'
            AND block_timestamp < '2021-09-01'
    ));
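A small aside on the date filter: when a timestamp column is compared against a bare date such as '2021-08-31', the date is usually read as midnight at the start of that day, so an inclusive upper bound on the last day of the month can miss sales that happened later that day. The Python sketch below assumes that midnight interpretation and uses a made-up sale time to contrast a closed range with a half-open one.

```python
from datetime import datetime

# Hypothetical sale that happened in the afternoon of the last day of August.
sale_time = datetime(2021, 8, 31, 15, 30)

month_start = datetime(2021, 8, 1)
last_day_midnight = datetime(2021, 8, 31)  # a bare date bound read as 00:00:00
next_month_start = datetime(2021, 9, 1)

# Closed range ending at midnight on August 31: the afternoon sale falls outside.
in_closed_range = month_start <= sale_time <= last_day_midnight

# Half-open range ending before September 1: the whole of August 31 is covered.
in_half_open_range = month_start <= sale_time < next_month_start

print(in_closed_range, in_half_open_range)
```

With these assumptions the closed range excludes the afternoon sale while the half-open range includes it.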
Here is a sample of the predictions generated by the model, along with the actual values, for the Art Blocks collection "Spectron by Simon De Mai".
Art Blocks NFT Resale Predictions by Google BigQuery ML are 95%+ Accurate
One of the surprising things I learnt while going through this exercise is how accurate the Google BigQuery ML & AI framework is. It correctly predicted whether an Art Blocks NFT was going to be resold more than 95% of the time. Let us explore the charts and compare predicted vs actual.
The chart below shows actual resale transactions for the Top 10 Art Blocks NFT collections in August 2021 and compares them with the predictions generated by Google BigQuery ML. Predictions for Geometry Runners & Phase were spot on.
The chart below shows the accuracy percentage of resale predictions for the Top 10 Art Blocks NFT collections in August 2021.
The following two charts show the predictions for all Art Blocks collections.
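The accuracy percentages in charts like these can be computed by comparing the is_resold and predicted_is_resold columns returned by the prediction query. Here is a minimal sketch with made-up rows and placeholder collection names; the real figures come from the August 2021 predictions, not from this toy data.

```python
from collections import defaultdict

# Hypothetical rows: (collection, actual is_resold, predicted_is_resold).
rows = [
    ("Collection A", 1, 1),
    ("Collection A", 0, 0),
    ("Collection A", 1, 1),
    ("Collection B", 1, 0),
    ("Collection B", 0, 0),
]


def accuracy_by_collection(prediction_rows):
    """Share of rows per collection where the prediction matches the actual label."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for collection, actual, predicted in prediction_rows:
        total[collection] += 1
        if actual == predicted:
            correct[collection] += 1
    return {name: correct[name] / total[name] for name in total}


per_collection = accuracy_by_collection(rows)
overall = sum(1 for _, actual, predicted in rows if actual == predicted) / len(rows)

for name, acc in per_collection.items():
    print(f"{name}: {acc:.0%}")
print(f"Overall: {overall:.0%}")
```

Collections with only a handful of rows swing a lot with each wrong prediction, which is worth remembering when comparing per-collection accuracy.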
Closing Thoughts
This exercise was a lot more fun than I anticipated, mostly because of how easy it was to explore Art Blocks transactions on Flipside Crypto and use that data to make high-accuracy predictions with the Google BigQuery ML platform. Flipside Crypto provides clean data with proper labels, and Google's BigQuery ML provides a SQL interface for building Machine Learning & AI models. As a SQL enthusiast I could not ask for more. Here are a few insights from this exercise:
- Art Blocks resales have been extremely hot in the past few weeks, and several NFTs are selling for millions. Predicting which NFTs are going to be resold in the future is a good opportunity for investors (whales).
- The Google BigQuery ML model used in this exercise achieved a stunning overall accuracy rate of 96%
- The accuracy of the model is 100% for collections like Geometry Runners and Phase.
- The accuracy of the model drops drastically for Art Blocks collections that had fewer sale transactions
- The data required for training Machine Learning & AI models is quite easy to gather from the Flipside Crypto warehouse (it's FREE!)