Gathered the images of hands with the products.
To hands without products.
And sequentially saw if the hand came first or a hand with product came first?
Depending on that identified which product it was and used that to assess the number of products in the cart and measure their weights.
This was parallel to identifying what the product was which was done in Created a Visual image Text-based search engine.