What does it take for a computer to see you? Better yet, to see your face and work out where it sits in the frame so it can be cropped out?
That is a problem I ran into while working on Kynto's face mapping, a feature suggested by Thomas. Let me fill you in a little on how it works.
Kynto is built on top of WebRTC – which in our case means clients can speak to each other directly without going through the Kynto server, the backbone that handles who owns what, account authentication, and so on. Why would I want to bypass the server? Well, I'm looking to arm users with functionality beyond just text. That is why I introduced screen sharing, video, audio, and DJ hosting modes all in one lift.
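For anyone unfamiliar with what "peer-to-peer in the browser" looks like, here is a minimal sketch of opening a WebRTC data channel. The STUN server, channel name, and signaling callback are illustrative placeholders, not Kynto's actual configuration:

```ts
// Minimal WebRTC sketch: open a direct data channel to a peer.
// The STUN server and signaling path below are placeholders.
const pc = new RTCPeerConnection({
  iceServers: [{ urls: 'stun:stun.l.google.com:19302' }],
});

// A data channel carries arbitrary messages peer-to-peer once connected.
const channel = pc.createDataChannel('kynto-demo');
channel.onopen = () => channel.send('hello, peer!');

// The offer/answer exchange still needs *some* signaling path
// (e.g. a websocket), but media and data flow peer-to-peer afterwards.
async function startCall(sendToPeer: (sdp: string) => void): Promise<void> {
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  sendToPeer(offer.sdp!); // deliver to the other side out of band
}
```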
That brings me to my next problem: how can I map the webcam user's face onto their virtual avatar's face, and keep that face in frame even though users sit at different distances and positions from moment to moment?
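To make the goal concrete, here is a rough sketch of the cropping math, assuming a face detector has already handed us a bounding box. The padding factor and helper names are hypothetical, purely for illustration:

```ts
// Hypothetical crop calculation: given a detected face bounding box,
// produce a padded square crop so the face stays a consistent size
// regardless of how close the user sits to the camera.
interface Box { x: number; y: number; width: number; height: number; }

function faceCrop(face: Box, frameW: number, frameH: number, pad = 0.4): Box {
  // Expand the box by `pad` on each side, keeping it square,
  // but never larger than the video frame itself.
  const size = Math.min(
    Math.max(face.width, face.height) * (1 + pad * 2),
    frameW,
    frameH,
  );
  const cx = face.x + face.width / 2;
  const cy = face.y + face.height / 2;
  // Clamp so the crop never leaves the video frame.
  const x = Math.min(Math.max(cx - size / 2, 0), frameW - size);
  const y = Math.min(Math.max(cy - size / 2, 0), frameH - size);
  return { x, y, width: size, height: size };
}
```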
There are off-the-shelf server components such as Amazon's Rekognition and Microsoft's Azure offerings, but those would necessitate a server component. For me that is unacceptable – there is a time and a place to invoke such measures, but the browser is more than capable of powering a solution, given the developer is willing to learn how.
That brings me to TensorFlow.js.
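As a teaser, here is roughly what in-browser face detection looks like with TensorFlow.js and the BlazeFace model – a minimal sketch rather than Kynto's actual pipeline, and it assumes `video` is already wired to the user's webcam stream:

```ts
import * as tf from '@tensorflow/tfjs';
import * as blazeface from '@tensorflow-models/blazeface';

// Sketch: detect a face in a <video> element entirely in the browser.
// Assumes `video` is already playing the user's webcam stream.
async function detectFace(video: HTMLVideoElement) {
  await tf.ready();                               // make sure a backend is up
  const model = await blazeface.load();           // downloads model weights once
  const faces = await model.estimateFaces(video); // returns an array of detections
  if (faces.length === 0) return null;

  // topLeft/bottomRight are [x, y] pairs when tensors aren't requested.
  const [x1, y1] = faces[0].topLeft as [number, number];
  const [x2, y2] = faces[0].bottomRight as [number, number];
  return { x: x1, y: y1, width: x2 - x1, height: y2 - y1 };
}
```

A bounding box like this is exactly what the crop sketch above would consume, and everything runs on the user's machine with no server round trip.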