Tue, Dec 11, 2018
With clients and partners located around the globe, we've always had a culture of remote collaboration here at Tryolabs. We are used to joining meetings no matter where we are, using tools such as Slack, Google Hangouts, and Zoom. A sweet consequence of this is a generous work from home policy, allowing us to work from home whenever we want.
Trouble is, working remotely leads to us missing out on all the fun that takes place outside of meetings when we’re not connected. Wouldn’t it be cool to have a robot representing us at the office, showing us what goes on while we’re not there?
As a group of computer vision, IoT, and full-stack specialists, we got really enthusiastic about the idea and went on a mission to create a robot that could be remotely controlled from home and show us what takes place at the office.
We spent the 2018 edition of the Tryolabs Hackathon facing that challenge with only three rules in place:
It all started with the design of the mini-robot’s mechanical structure, which is present at the office while the remote worker isn't.
To have a robot that can easily move around the office, it must be mobile, stable, small enough to pass through doors, and big enough to not be overseen and trampled on by the team working at the office.
We went through several iterations of its structural design before settling on this one:
We chose aluminum as the main material for the components since it's light, robust, and cheap.
Once we defined the design and selected the materials, we cut the aluminum parts and put them together with small screws. Since we had to work with the tools available at the office and from the store around the corner, this was a rather improvised and humorous process. We used heavy books to shape the aluminum and sunglasses as safety glasses while drilling into the components, just to give you an idea. 🙈
The main hardware components we settled on were:
While some of us continued working on the hardware and assembling the pieces, the rest of the team started building the software that would control all the components mentioned above.
The aim of the robot's software was to enable real-time communication between the remote workers and the teams at the office. In other words, the robot needed to be able to transmit video and audio from the office to the people working remotely and vice versa.
While evaluating various approaches to solving this problem, we came across WebRTC, which promised to be the tool we were looking for:
Specifically, we used the WebRTC extension included in UV4L. This tool allowed us to create bidirectional communication with extremely low latency between the robot and the remote worker’s computer.
Running the UV4L server with the WebRTC extension enabled, we were able to serve a web app from the RaspberryPi, then simply access it from the remote worker’s browser establishing real-time bidirectional communication; amazing!
This allowed us to set up a unidirectional channel for the video from the PiCamera to the browser, a bidirectional channel for the audio, and an extra unidirectional channel to send the commands from the browser to the robot.
To be able to see the data and send the commands in a user-friendly way for the remote worker, we researched how to integrate those functionalities into an accessible and practical front-end.
Inspired by the web app example from the UV4L project, we integrated the data channels mentioned above into a basic but functional front-end, including the following components:
To handle the movement commands the robot would receive from the remote worker, we developed a controller written in Python, that runs like a system service. This service translates commands that control the robot’s motors by:
Here's a snippet of the controller classes:
As a result, we were able to control the robot, “walk” it through the office, and enable remote workers to see their teams and approach them via the robot.
However, it wasn’t enough for our enthusiastic team and we continued to pursue the ultimate goal: having an autonomous robot.
We thought, wouldn't it be awesome if the robot could recognize people and react to their gestures and actions (and in this way have a certain amount of personality)?
A recently released project called PoseNet surfaced fast. It’s presented as a "machine learning model, which allows for real-time human pose estimation in the browser". So, we dug deeper into that.
The performance of that neural net was astounding and it was really attractive as we ran it over TensorFlowJS in the browser. This way, we were able to get a higher accuracy and FPS rate than by running it from the RaspberryPi, and also less latency than if we had run it on a third server instead.
Rushed by the parameters of the hackathon, we skimmed the project’s documentation and demo web app source code. Once we identified which files we were going to need, we imported them and immediately jumped to integrating these functionalities into our web app.
We wrote a basic detectBody
function to infer the pose estimation key points that invoked thenet.estimateMultiplePoses
with these params:
Said detectBody
was invoked at a rate of 3 times per second to refresh the pose estimation key points.
Then, we adapted some util functions in order to print the detected body key points and plot its skeleton above the video, arriving at a demo like this:
This was a very quick proof of concept which added a wonderful feature and hugely expanded the potential capabilities of our robot.
If you’d like to know how this model works under the hood, you can read more here.
48 hours and an unknown amount of coffee led to the construction of a mini-robot with the ability to walk through the office, enable real-time communication between the remote worker and their office mates, and even transport an LP. 😜
We managed to build the hardware, implement the communication software, and build a PoC for an additional feature using computer vision, which facilitates the robot's interaction with people. Future enhancements could include object detection features that would allow the robot to recognize objects and interact with them without human help using Luminoth, our open source toolkit for computer vision, for example.
Though we normally prototype for longer than two days, this hackathon project reflects how we work at Tryolabs. We often build prototypes and solutions with state-of-the-art technologies to enhance operational and organizational processes.
© 2024. All rights reserved.