ARM

1. Overview

Association rule mining (ARM) is commonly used to find relationships between items that occur together in a dataset. Here I use an online retail dataset as the data source and briefly describe the common terms used in ARM.

Support is the fraction of transactions in the dataset that contain both the antecedent and the consequent products.

Confidence is the probability that a customer buys the consequent product, given that they have bought the antecedent product.

Support-and-Confidence-for-Itemset-A-and-B

@image source: https://www.softwaretestinghelp.com/apriori-algorithm/

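Lift is the ratio of the support of the antecedent and consequent together to the product of the individual supports of the antecedent and the consequent. A lift above 1 means the items are bought together more often than would be expected if they were independent.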
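To make the three measures concrete, here is a minimal Python sketch that computes them on a handful of toy transactions. The transactions and the function names are hypothetical and only for illustration; they are not taken from the retail dataset or from the R code used later.

```python
# Toy transactions (hypothetical, not taken from the retail dataset).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from the transactions."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Support of the union divided by the product of the single supports."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / (
        support(antecedent, transactions) * support(consequent, transactions)
    )


# Metrics for the rule {bread} => {milk} on the toy data.
print(round(support({"bread", "milk"}, transactions), 4))
print(round(confidence({"bread"}, {"milk"}, transactions), 4))
print(round(lift({"bread"}, {"milk"}, transactions), 4))
```

In this toy data, bread and milk appear together in 3 of 5 transactions (support 0.6), and each appears alone in 4 of 5 (support 0.8). The confidence of {bread} => {milk} is therefore 0.6 / 0.8 = 0.75, and the lift is 0.6 / (0.8 × 0.8), which is slightly below 1.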

A rule expresses the relation between antecedent and consequent items. In the retail dataset, a rule describes how likely a customer is to buy a certain product given that they buy other products.

The Apriori algorithm finds the association rules that satisfy given thresholds for support, confidence, lift, and other conditions. In the retail example, Apriori starts by finding the single products that are bought frequently. Based on the preset minimum support, it keeps only the products that satisfy the condition and uses them to generate larger candidate product sets. It repeats this loop until no new frequent sets are found, and then keeps the rules that also meet the confidence condition.

apriori
@image source: https://imgbin.com/download/tFwH2qpi
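The level-by-level loop described above can be sketched in plain Python as follows. This is a simplified illustration under my own assumptions (the function name `apriori_frequent_itemsets` and the toy transactions are hypothetical), not the implementation used by the R package in section 3. It covers only the frequent-itemset step, where the minimum support is applied.

```python
from itertools import combinations


def apriori_frequent_itemsets(transactions, min_support):
    """Return {frozenset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: keep the single items that are frequent enough.
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = {}
    k = 1
    while current:
        frequent.update(current)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count support and keep only the candidates that pass the threshold.
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
    return frequent


# Hypothetical toy data, not taken from the retail dataset.
toy_transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]
frequent = apriori_frequent_itemsets(toy_transactions, min_support=0.4)
for itemset, s in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

The prune step is what keeps the search manageable: a set can only be frequent if all of its subsets are frequent, so any candidate with an infrequent subset is dropped before its support is counted.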

2. Data Prep

Raw data from Kaggle (Online Retail Transaction): https://www.kaggle.com/datasets/mathchi/online-retail-data-set-from-ml-repository
sample_raw_data
Resulting transaction data:
sample_transaction_data_arm

In general, I converted the row-based transaction data to a column-based layout and removed the unused columns. I then tested several sample sizes to find a suitable amount of data: not so large that it causes an out-of-memory error, and not so small that the plots lack data points. Finally, I transformed the data into the ‘transaction data’ format used in R.
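As a rough illustration of this kind of reshaping, the sketch below groups item rows by invoice and writes one comma-separated line per transaction. It uses only the Python standard library, and the column names (`InvoiceNo`, `Description`), the sample rows, and the helper names are assumptions made for the example. The linked notebook may do this differently.

```python
from collections import defaultdict

# Hypothetical rows in the long layout (one row per purchased item).
# The column names and product names are assumptions for this example.
rows = [
    {"InvoiceNo": "1001", "Description": "TEA CUP", "Quantity": "6"},
    {"InvoiceNo": "1001", "Description": "TEA POT", "Quantity": "1"},
    {"InvoiceNo": "1002", "Description": "TEA CUP", "Quantity": "2"},
    {"InvoiceNo": "1002", "Description": "", "Quantity": "1"},
    {"InvoiceNo": "1003", "Description": "CANDLE", "Quantity": "3"},
]


def rows_to_baskets(rows, id_col="InvoiceNo", item_col="Description"):
    """Group item rows by invoice so that each invoice becomes one basket."""
    baskets = defaultdict(set)
    for row in rows:
        item = row[item_col].strip()
        if item:  # skip rows with a missing item name
            baskets[row[id_col]].add(item)
    return dict(baskets)


def to_basket_lines(baskets, max_transactions=None):
    """Render baskets as one comma-separated line per transaction."""
    lines = [",".join(sorted(items)) for items in baskets.values()]
    # Optionally cap the number of transactions to limit memory use later.
    return lines[:max_transactions]


baskets = rows_to_baskets(rows)
basket_lines = to_basket_lines(baskets, max_transactions=200)
print("\n".join(basket_lines))
```

A file with one line per transaction like this is a common input layout for association rule tools. Note that item names containing commas would need a different separator or quoting.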

Detailed process (Python):
https://github.com/BraydenZheng/Product_Recommendation/blob/master/arm/retail_data_clean.ipynb

3. Code ARM (R)

https://braydenzheng.github.io/arm/skip_render/arm.html

4. Result

Here I set the minimum support to 0.1% and the minimum confidence to 0.2 as arguments for the Apriori algorithm, which generated 766911 rules. Most of the rules contain one or two items in both the antecedent and the consequent. The association between product purchases is low in general, but there are still many connections in buying behaviour.
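To show how the two thresholds act on the rule set, here is a small brute-force sketch in Python. It is an illustration under my own assumptions (the function `generate_rules`, the `max_len` parameter, and the toy data are hypothetical), not the R code linked in section 3, and it would not scale to the real dataset. The thresholds in the usage lines are chosen for the toy data and are not the ones used above.

```python
from itertools import combinations


def generate_rules(transactions, min_support, min_confidence, max_len=3):
    """Brute-force rule search for small data; a sketch, not real Apriori."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    rules = []
    for size in range(2, max_len + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            s = support(itemset)
            if s == 0 or s < min_support:
                continue  # the itemset is too rare to produce rules
            # Split the itemset into every antecedent/consequent pair.
            for k in range(1, size):
                for lhs in combinations(combo, k):
                    lhs = frozenset(lhs)
                    rhs = itemset - lhs
                    conf = s / support(lhs)
                    if conf >= min_confidence:
                        rules.append({
                            "lhs": lhs,
                            "rhs": rhs,
                            "support": s,
                            "confidence": conf,
                            "lift": conf / support(rhs),
                        })
    return rules


# Hypothetical toy data, not taken from the retail dataset.
toy_transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]
strict = generate_rules(toy_transactions, min_support=0.4, min_confidence=0.7)
loose = generate_rules(toy_transactions, min_support=0.2, min_confidence=0.2)
print(len(strict), len(loose))
```

Lowering either threshold can only add rules, which is one reason a low minimum support such as 0.1% can produce a very large rule set.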

**Top 15 rules for lift:**
lift_15_group

Each of the top 15 rules by lift has a single item on each side. This makes sense for online retail transactions, where a simple pair (e.g. bread + milk) is more common than a combination of multiple itemsets. The lift values are around 15, which indicates a strong connection between the antecedent and consequent items.

**Top 15 rules for confidence:**
confidence_15
The top 15 rules by confidence look scattered in the plot, but they are actually close together, with only small differences in value. Most of the high-confidence rules have a support over 0.06, which is quite high for this dataset with only 200 transactions selected.

**Top 15 rules for support:**
support_15_group
The highest support reaches 0.2, with a lift of around 6. There are some grey dots in the bottom-right corner with high support but low lift, even though they are single-item rules. Compared with the two plots above, support appears to be the least influential factor for the strength of association.
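The three "top 15" views above come down to sorting the same rule set by a different measure. A minimal Python sketch of that ranking is shown below. The helper `top_rules` and the metric values are made up for illustration and are not results from the retail data.

```python
def top_rules(rules, by, n=15):
    """Return the n rules with the largest value of the measure named `by`."""
    return sorted(rules, key=lambda r: r[by], reverse=True)[:n]


# Made-up values for illustration only; not results from the retail data.
example_rules = [
    {"rule": "A => B", "support": 0.20, "confidence": 0.60, "lift": 1.5},
    {"rule": "C => D", "support": 0.05, "confidence": 0.90, "lift": 12.0},
    {"rule": "E => F", "support": 0.10, "confidence": 0.75, "lift": 6.0},
]

for measure in ("support", "confidence", "lift"):
    best = top_rules(example_rules, by=measure, n=1)[0]
    print(measure, "->", best["rule"])
```

In this made-up example the rule with the highest support has the lowest lift, which mirrors the observation that high support alone does not imply a strong association.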

5. Conclusion

From the practice and observations above, ‘support’, ‘confidence’, and ‘lift’ each play a different role in measuring association. Depending on the scenario, we may look at ‘support’ to find buying associations backed by a large order base, and consider ‘lift’ to find items bought together with a strong connection. Finally, these rules can be used to help with product promotion and recommendation.

Author

Bofan Zheng

Posted on

2023-02-28

Updated on

2023-03-03
