Machine Learning: Sudoku Benchmark - Batch size
Context
During training, a machine learning model is fed a dataset. Although the entire dataset is used for training and/or validation, it is first split into several chunks called batches. There is no single batch size that is optimal for every model; it mostly depends on the model's design and the underlying hardware.
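To make the splitting concrete, here is a minimal sketch in plain Python. The helper name `iter_batches` and the toy dataset are assumptions made for illustration; a real training pipeline would normally rely on the data loader of its framework.

```python
import math
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(samples: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive chunks of at most `batch_size` samples."""
    if batch_size <= 0:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(samples), batch_size):
        # The last batch may be smaller when the dataset size
        # is not a multiple of the batch size.
        yield samples[start:start + batch_size]


# Toy dataset of 10 samples (hypothetical, for illustration only).
samples = list(range(10))
batches = list(iter_batches(samples, batch_size=4))

# Number of batches needed to go through the dataset once.
steps_per_epoch = math.ceil(len(samples) / 4)
```

A larger batch size means fewer, bigger batches per pass over the dataset, which is why it changes both the amount of data transferred per step and the number of weight updates.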
The batch size hyperparameter has two main implications:
- A large batch allows faster computation, since fewer data transfers are involved
- A well-suited size improves the quality of the learning
Observation: What is the optimal batch size?
We iterate over different batch sizes and find the best one by cross-referencing train speed, loss, and inference score.
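Such a sweep could be organized along the following lines. This is only a sketch: the `train_fn` callback, its return values, and the `dummy_train` placeholder are hypothetical and do not reflect the actual benchmark code.

```python
import time
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical signature: trains with the given batch size and returns
# (number of samples processed, final loss, inference score).
TrainFn = Callable[[int], Tuple[int, float, float]]


def sweep_batch_sizes(
    batch_sizes: Iterable[int], train_fn: TrainFn
) -> Dict[int, Dict[str, float]]:
    """Run one training per batch size and collect the three metrics."""
    results: Dict[int, Dict[str, float]] = {}
    for batch_size in batch_sizes:
        start = time.perf_counter()
        n_samples, loss, score = train_fn(batch_size)
        elapsed = time.perf_counter() - start
        results[batch_size] = {
            # Train speed expressed as samples processed per second.
            "train_speed": n_samples / elapsed if elapsed > 0 else float("inf"),
            "loss": loss,
            "inference_score": score,
        }
    return results


def dummy_train(batch_size: int) -> Tuple[int, float, float]:
    """Placeholder for a real training run; the values are not measurements."""
    time.sleep(0.001)
    return 1000, 0.0, 0.0


results = sweep_batch_sizes([32, 64, 128], dummy_train)
```

Replacing `dummy_train` with a function that trains and evaluates the real model would yield one set of metrics per batch size, ready to be compared.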
Chart: inference score, loss, and train speed as a function of batch size (values rounded)

Batch size   Inference score   Loss     Train speed
32           1.0000            0.2633   8549
64           1.0000            0.2646   15680
128          1.0000            0.2733   23745
256          0.9995            0.2880   33286
512          0.9986            0.3079   38571
1024         0.9953            0.3364   42428
2048         0.7525            0.4698   39961
4096         0.1280            0.9736   39342
Assertions
- Loss and inference score are inversely correlated: once the loss exceeds 0.35, the inference score drops
- The train speed no longer increases beyond a batch size of 1024
- The loss is too high beyond a batch size of 1024
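These observations can be turned into a simple selection rule. The sketch below uses the measurements reported above (rounded) and assumes one possible rule: keep the batch sizes whose loss stays at or below the 0.35 threshold, then pick the fastest one. The names `MAX_LOSS` and `best_batch_size` are illustrative.

```python
# Measurements from the benchmark, rounded for readability.
batch_sizes = [32, 64, 128, 256, 512, 1024, 2048, 4096]
loss = [0.2633, 0.2646, 0.2733, 0.2880, 0.3079, 0.3364, 0.4698, 0.9736]
train_speed = [8549, 15680, 23745, 33286, 38571, 42428, 39961, 39342]

# Threshold taken from the first assertion: above it, the inference score drops.
MAX_LOSS = 0.35

# Keep only the batch sizes whose loss is acceptable.
candidates = [
    (speed, size)
    for size, run_loss, speed in zip(batch_sizes, loss, train_speed)
    if run_loss <= MAX_LOSS
]

# Among the acceptable runs, choose the one with the highest train speed.
best_speed, best_batch_size = max(candidates)
print(best_batch_size)  # 1024
```

Under this rule, 1024 comes out as the best trade-off for this model and hardware; a different model or machine may shift the result.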