Machine Learning: Sudoku Benchmark - Batch size

Context

In machine learning, a model is trained by feeding it a dataset. Although the entire dataset is used for training and/or validation, it is first split into several chunks called batches. There is no single batch size that is optimal for every model; the best value mostly depends on the model's design and the underlying hardware.

The batch size hyperparameter has two main implications:

A large batch allows faster computation, since less time is spent on data transfers
A well-adapted batch size improves the quality of the learning
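
To make the notion of a batch concrete, here is a minimal sketch of how a dataset can be cut into chunks of a given size. The helper names `iter_batches` and `steps_per_epoch` are hypothetical and are not taken from the benchmark code; a real training pipeline would typically rely on the data loader of its framework.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(samples: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive chunks of at most `batch_size` samples."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(samples), batch_size):
        # The last chunk may be smaller when the dataset size
        # is not a multiple of the batch size.
        yield list(samples[start:start + batch_size])


def steps_per_epoch(n_samples: int, batch_size: int) -> int:
    """Number of batches needed to see every sample once."""
    return -(-n_samples // batch_size)  # ceiling division


if __name__ == "__main__":
    print(list(iter_batches(list(range(10)), 4)))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
    print(steps_per_epoch(10, 4))                  # 3
```

A larger batch size means fewer steps per epoch, which is why it tends to reduce the per-step overhead mentioned above.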

Observation: What is the optimal batch size?

We iterate over different batch sizes and select the best one by comparing train speed, loss and inference score.
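
The sweep itself could look like the sketch below. It is only an illustration: `train` and `evaluate` are hypothetical callbacks standing in for the benchmark's actual training and inference code, and train speed is assumed here to mean samples processed per second, which the source does not state.

```python
import time
from typing import Any, Callable, Dict, Iterable, Tuple


def sweep_batch_sizes(
    train: Callable[[int], Tuple[Any, int, float]],
    evaluate: Callable[[Any], float],
    batch_sizes: Iterable[int],
) -> Dict[int, Dict[str, float]]:
    """Train once per batch size and collect the three metrics.

    `train(batch_size)` (hypothetical) returns (model, samples_seen, final_loss).
    `evaluate(model)` (hypothetical) returns an inference score in [0, 1].
    """
    results: Dict[int, Dict[str, float]] = {}
    for batch_size in batch_sizes:
        # Only the training call is timed, so evaluation cost
        # does not distort the train speed.
        start = time.perf_counter()
        model, samples_seen, loss = train(batch_size)
        elapsed = time.perf_counter() - start

        results[batch_size] = {
            # Assumed unit: samples per second.
            "train_speed": samples_seen / elapsed if elapsed > 0 else float("inf"),
            "loss": loss,
            "inference_score": evaluate(model),
        }
    return results
```

It would be called with something like `sweep_batch_sizes(train, evaluate, [32, 64, 128, 256, 512, 1024, 2048, 4096])`, the batch sizes used in this benchmark.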

Figure: inference score, loss and train speed as a function of batch size (values from the chart data, rounded).

Batch size   Inference score   Loss     Train speed
32           1.0000            0.2633    8549.1
64           1.0000            0.2646   15680.1
128          1.0000            0.2733   23745.1
256          0.9995            0.2880   33286.3
512          0.9986            0.3079   38571.2
1024         0.9953            0.3364   42428.3
2048         0.7525            0.4698   39961.4
4096         0.1280            0.9736   39342.4

Assertions

Loss and inference score are inversely correlated. Once the loss exceeds roughly 0.35, the inference score drops.
The train speed stops increasing beyond a batch size of 1024.
The loss becomes too high beyond a batch size of 1024.
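
These assertions can be combined into a simple selection rule, sketched below: discard every batch size whose loss exceeds the 0.35 threshold, then keep the fastest remaining one. The rule and the function name `pick_batch_size` are assumptions made for illustration and not necessarily the author's method; the numbers are the chart values, rounded.

```python
from typing import Sequence

# Measurements from the chart above (rounded).
BATCH_SIZES = [32, 64, 128, 256, 512, 1024, 2048, 4096]
LOSS = [0.2633, 0.2646, 0.2733, 0.2880, 0.3079, 0.3364, 0.4698, 0.9736]
TRAIN_SPEED = [8549.1, 15680.1, 23745.1, 33286.3, 38571.2, 42428.3, 39961.4, 39342.4]

MAX_LOSS = 0.35  # threshold taken from the first assertion


def pick_batch_size(
    batch_sizes: Sequence[int],
    losses: Sequence[float],
    speeds: Sequence[float],
    max_loss: float,
) -> int:
    """Return the fastest batch size whose loss stays at or below `max_loss`."""
    candidates = [
        (speed, batch_size)
        for batch_size, loss, speed in zip(batch_sizes, losses, speeds)
        if loss <= max_loss
    ]
    if not candidates:
        raise ValueError("no batch size satisfies the loss threshold")
    # Tuples compare on speed first, so max() picks the fastest candidate.
    return max(candidates)[1]


if __name__ == "__main__":
    print(pick_batch_size(BATCH_SIZES, LOSS, TRAIN_SPEED, MAX_LOSS))  # prints 1024
```

With these measurements the rule selects 1024, the point where train speed peaks while the loss is still below the threshold.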