AI has gotten really, REALLY good at static image recognition. Telling a latte apart from a glass of milk is pretty much ‘done and dusted’.
The new frontier in computer vision is video. Being able to analyze video and recognize what is happening in it has enormous, far-reaching potential. Just consider real-world applications like self-driving cars and surveillance cameras.
One benchmark of performance is the ActivityNet dataset, which contains nearly 650 hours of footage from a total of 20,000 videos.
Of the 200 everyday activities in that dataset, the one AI systems had the greatest difficulty recognizing, in both 2019 and 2020, was the simple act of drinking coffee!
This seems like a major problem, since that cup of ‘Joe’ is the fundamental activity from which all other human activities flow 🙂