NN
1. Overview
Neural networks are a branch of machine learning that teaches computers to process data in a way inspired by the human brain. A typical neural network consists of interconnected neurons organized into layers: an input layer, one or more hidden layers, and an output layer. Each neuron receives and processes input data, then passes its result along to the next layer.
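To make the layer-by-layer flow concrete, here is a minimal NumPy sketch of a forward pass. The layer sizes, random weights, and input values are made up for illustration and are not taken from the project code.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation, squashing any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """Pass input x through each layer in turn: a = sigmoid(W @ a + b)."""
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)
    return a

rng = np.random.default_rng(0)
# Hypothetical sizes: 4 input features, 3 hidden neurons, 1 output neuron.
weights = [rng.normal(size=(3, 4)), rng.normal(size=(1, 3))]
biases = [np.zeros(3), np.zeros(1)]

x = np.array([0.5, -1.2, 0.3, 2.0])
output = forward(x, weights, biases)
print(output)  # a single value between 0 and 1
```

Training adjusts the weights and biases so that this output moves toward the correct label; the sketch only shows how data passes from layer to layer.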
@Image Source: https://raw.githubusercontent.com/BraydenZheng/img/main/uPic/Artificial-Intelligence-Neural-Network-Nodes-1024x670.jpg
2. Data Prep
This section uses the same Amazon store sales data as before.
I took the following steps to clean and prepare the data:
- Keep only numerical columns as input features for training
- Clean and format the numerical data
- Normalize the data with a standard scaler
- Remove outliers from some columns
- Discretize the rating column into 2 buckets and use it as the label for classification
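The steps above could look roughly like the following sketch. It is not the notebook's code: the column names, the sample values, the 1.5 * IQR outlier rule, and the 4.0 rating cutoff are all assumptions chosen for illustration.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Tiny made-up sample; column names and values are illustrative only.
raw = pd.DataFrame({
    "product_name": ["a", "b", "c", "d", "e", "f"],
    "discounted_price": ["$399", "$1,299", "$249", "$599", "$99,999", "$349"],
    "rating_count": ["1,200", "350", "89", "4,100", "15", "640"],
    "rating": ["4.2", "3.9", "4.5", "3.6", "4.0", "4.4"],
})

def to_number(series):
    """Strip currency symbols and thousands separators, then parse as float."""
    cleaned = series.astype(str).str.replace(r"[^0-9.]", "", regex=True)
    return pd.to_numeric(cleaned, errors="coerce")

# Keep only the numerical columns and convert them to floats.
numeric_cols = ["discounted_price", "rating_count", "rating"]
df = raw[numeric_cols].apply(to_number).dropna()

# Remove outliers in one column with the 1.5 * IQR rule (an assumed criterion).
q1, q3 = df["discounted_price"].quantile([0.25, 0.75])
iqr = q3 - q1
df = df[df["discounted_price"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

# Discretize rating into two buckets; the 4.0 cutoff and 0/1 coding are assumptions.
label = (df["rating"] >= 4.0).astype(int)

# Standardize the remaining features to zero mean and unit variance.
features = df.drop(columns="rating")
X = StandardScaler().fit_transform(features)
print(X.shape, label.tolist())
```

In this sample the row with the extreme price is dropped by the outlier filter, and the remaining features end up on a comparable scale, which generally helps gradient-based training.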
Code Step: https://github.com/BraydenZheng/Product_Recommendation/blob/master/nn/data_prepare.ipynb
The data is split into training and testing sets in an 80% / 20% proportion.
The label is the rating column, which takes the values 0 and 1 to distinguish good ratings from bad ratings.
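An 80/20 split can be done with scikit-learn's `train_test_split`. The sketch below uses synthetic stand-in data; the `random_state` value and the `stratify` option are my assumptions, not settings confirmed by the notebook.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 100 rows, 3 numeric features, binary label.
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 3))
y = rng.integers(0, 2, size=100)

# test_size=0.2 gives the 80/20 split; stratify keeps the label ratio
# similar in the training and testing parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape)
```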
Cleaned Data
3. Code
Model training and evaluation: https://github.com/BraydenZheng/Product_Recommendation/blob/master/nn/nn.ipynb
4. Results
This is a basic 3-layer neural network using the sigmoid activation function. Overall, the library model MLPClassifier achieves a good F1 score of 0.70, a measure that balances precision and recall. The accuracy is comparatively low, at only 62%.
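For reference, a model of this kind can be set up as in the sketch below. It assumes that "3 layers" means input, one hidden layer, and output; the hidden layer size, `max_iter`, and the synthetic data from `make_classification` are placeholders, so the printed scores will not match the figures reported here.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic binary classification data standing in for the real features.
X, y = make_classification(n_samples=500, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# One hidden layer (input + hidden + output = 3 layers);
# activation="logistic" is scikit-learn's name for the sigmoid.
clf = MLPClassifier(hidden_layer_sizes=(8,), activation="logistic",
                    max_iter=2000, random_state=0)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
print(f"accuracy={accuracy:.2f}  F1={f1:.2f}")
```

Note that `f1_score` reports the F1 of label 1 by default, so it can be higher than the overall accuracy when that label is predicted often and makes up a large share of the test set.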
As for recognition of individual labels, predictions for label 0 are clearly more accurate than those for label 1, possibly because of the limited amount of data and the basic structure of the network.
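One way to compare labels is a confusion matrix together with per-label recall. The labels and predictions below are made up to show the calculation and are not the project's actual results.

```python
from sklearn.metrics import confusion_matrix, recall_score

# Made-up labels and predictions, for illustration only.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 0, 0, 1, 0, 0, 1, 1]

# Rows are true labels, columns are predicted labels.
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])

# Per-label recall: the fraction of each true label predicted correctly.
per_label = recall_score(y_true, y_pred, average=None, labels=[0, 1])
print(cm)
print(per_label)
```

In this toy case label 0 is recovered 5 times out of 6 and label 1 only 2 times out of 4, the same kind of gap described above.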
5. Conclusion
The neural network is a powerful tool for classification tasks. Even a simple NN with few layers and parameters can achieve good performance. However, NNs can also be resource-intensive and time-consuming to train. Striking a balance between resource usage and performance is an important consideration when choosing a NN model, particularly for retail stores with large datasets.