Controlling a Quadcopter with Hand Gestures

Disclaimer: I work for Microsoft as a software engineer on the team responsible for HoloLens and Mixed Reality devices. This is entirely my own project, completed in my spare time, and is not sponsored by Microsoft. Furthermore, any opinions about products, technologies, coding advice, and API usage patterns are entirely my own and do not reflect those of Microsoft.

Introduction and Concept

Typical uses of augmented reality devices involve inserting and manipulating digital content in the physical world. I wanted to go one step further and use an augmented reality device to manipulate a physical object. A quadcopter seemed like a natural choice, given my interest in the FPV hobby. This project uses two devices: a Microsoft HoloLens, and a Crazyflie 2.0 Nano Quadcopter.

Microsoft’s website describes the HoloLens as “the first self-contained, holographic computer, enabling you to engage with your digital content and interact with holograms in the world around you.” It uses hand gestures as a primary input for interaction with digital content. This project uses the same gestures to interact with a physical device using the on-board Bluetooth radio. You can read more about the HoloLens at https://www.microsoft.com/en-us/hololens.

The Crazyflie 2.0 is an open-source nano quadcopter produced by Bitcraze AB of Sweden. An onboard Nordic nRF51 SoC enables communication via Bluetooth. For more details, check out the links above, or read my full product review.

Here’s a video demonstration of the app on YouTube:

Implementation Details

Crazyflie

The Crazyflie runs open-source flight control firmware developed and maintained by Bitcraze. It uses an over-the-air protocol called CRTP for wireless communication via Bluetooth LE. I’ve also created and maintain a Crazyflie target in Betaflight that uses the same CRTP protocol. Both firmware options are compatible with this project. There is extensive documentation about the Crazyflie on the Bitcraze Wiki if you’d like to learn more.
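To make the protocol concrete, here is a sketch of how a commander (setpoint) packet could be packed, assuming the standard CRTP layout documented on the Bitcraze wiki: one header byte encoding port and channel, followed by little-endian roll, pitch, and yaw floats and a 16-bit thrust. The class and method names here are my own, not from the actual firmware or client code:

```csharp
using System;

static class CrtpCommander
{
    // CRTP header byte: port in the high nibble, channel in the low bits.
    // The commander (setpoint) port is 3, channel 0, giving 0x30.
    private const byte CommanderHeader = 0x30;

    // Pack a setpoint into a 15-byte CRTP commander packet.
    // Assumes a little-endian host, matching the wire format.
    public static byte[] PackSetpoint(float roll, float pitch, float yaw, ushort thrust)
    {
        var packet = new byte[15];
        packet[0] = CommanderHeader;
        BitConverter.GetBytes(roll).CopyTo(packet, 1);
        BitConverter.GetBytes(pitch).CopyTo(packet, 5);
        BitConverter.GetBytes(yaw).CopyTo(packet, 9);
        BitConverter.GetBytes(thrust).CopyTo(packet, 13);
        return packet;
    }
}
```

On the HoloLens side, a packet like this is written over the Bluetooth LE link on every control update.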

HoloLens Application

The code on the HoloLens responsible for consuming hand gestures, translating them to setpoints, and transmitting them to the Crazyflie lives inside a Windows UWP application. The Universal Windows Platform (UWP) lets a single application run on all Windows 10 devices. A couple of years ago I wrote a very basic UWP client (the platform was called UAP at the time) in C#. The source code for that app is hosted on Bitcraze’s GitHub. Since the HoloLens supports UWP, I used this app as a starting point and added a new class implementing IFlightController that parses hand gestures and converts them to setpoints. You can find the source code here: GitHub: crazyflie2-windows-uap-client (branch “HoloLens”). The majority of the HoloLens-specific code is in the GestureController class.

Hand Gestures

Parsing hand gestures on the HoloLens turned out to be pretty straightforward. There is a set of APIs (Windows.UI.Input.Spatial) for handling various forms of input. The API defines several specific gestures at a fairly high level. These include tap, tap-and-hold, navigation (tap and drag, used for scrolling), and manipulation (tap and drag, used for positioning digital content in the world).

The app needs two objects/classes to handle gestures. A SpatialInteractionManager is required to register the application for handling all interactions, and a SpatialGestureRecognizer is used to register the application for start/update/complete/cancel events for specific gestures.

        public GestureController()
        {
            spatialLocator = SpatialLocator.GetDefault();

            gestureRecognizer = new SpatialGestureRecognizer(
                SpatialGestureSettings.Tap |
                SpatialGestureSettings.ManipulationTranslate);

            gestureRecognizer.ManipulationCanceled += OnManipulationCanceled;
            gestureRecognizer.ManipulationCompleted += OnManipulationCompleted;
            gestureRecognizer.ManipulationStarted += OnManipulationStarted;
            gestureRecognizer.ManipulationUpdated += OnManipulationUpdated;
            gestureRecognizer.Tapped += OnTapped;

            interactionManager = SpatialInteractionManager.GetForCurrentView();
            interactionManager.InteractionDetected += OnInteractionDetected;
        }

The handler for SpatialInteractionManager’s InteractionDetected event must route interactions to the gesture recognizer:

        private void OnInteractionDetected(
            object sender,
            SpatialInteractionDetectedEventArgs e)
        {
            gestureRecognizer.CaptureInteraction(e.Interaction);
        }

Manipulation Gesture

The goal is to map three degrees of freedom of a gesture to three flight setpoints. Specifically, gestures along the Y axis (vertical) control thrust, the X axis (side to side) maps to roll angle, and the Z axis (forward and backward) sets the pitch angle. The “Manipulation Gesture” fits this goal well. On each gesture update event, the API provides the offset of the hand from the starting point of the gesture, relative to a specified coordinate system, in absolute units (meters, with roughly centimeter resolution).

The coordinate system is important: it is how the app accounts for head/device movements during an active gesture. I used a SpatialStationaryFrameOfReference from the Windows.Perception.Spatial namespace. This class provides a coordinate system fixed at a position in space, with the HoloLens taking care of accounting for changes in the device’s position. Without a stationary frame of reference, the system cannot tell the difference between a head movement in one direction and a hand movement in the other.
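A toy one-dimensional example (hypothetical numbers, not from the actual API) illustrates the ambiguity: if offsets were measured relative to the moving head, a 10 cm head movement would masquerade as a 10 cm hand movement in the opposite direction.

```csharp
static class FrameOfReferenceDemo
{
    // Returns the apparent hand offset measured two ways, given 1-D world-space
    // positions in meters: relative to the (possibly moving) head, and relative
    // to a stationary frame anchored where the gesture began.
    public static (float headRelative, float stationary) ApparentOffsets(
        float handAtStart, float handNow, float headAtStart, float headNow)
    {
        // Head-relative: subtract the head position from each hand sample,
        // so head motion leaks into the measurement.
        float headRelative = (handNow - headNow) - (handAtStart - headAtStart);

        // Stationary frame: head motion cancels out entirely.
        float stationary = handNow - handAtStart;

        return (headRelative, stationary);
    }
}
```

With a stationary hand and a head that drifts 10 cm left (headNow = -0.10), the head-relative measurement reports a spurious +0.10 m hand offset, while the stationary frame correctly reports zero.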

The app calls CreateStationaryFrameOfReferenceAtCurrentLocation() at the start of a gesture to create a SpatialStationaryFrameOfReference object corresponding to the device’s current location.

        private void OnManipulationStarted(
            object sender,
            SpatialManipulationStartedEventArgs e)
        {
            // Manipulation has started - obtain the frame of reference relative
            // to when the gesture began
            stationaryFrameOfReference = 
                spatialLocator.CreateStationaryFrameOfReferenceAtCurrentLocation();
        }

On each manipulation update, the SpatialStationaryFrameOfReference object is passed in to TryGetCumulativeDelta to obtain an offset (in meters) relative to the point in space where the gesture began.

        private void OnManipulationUpdated(
            object sender,
            SpatialManipulationUpdatedEventArgs e)
        {
            // Get the manipulation delta relative to the frame of reference from
            // when the manipulation began
            //
            // Using a stationary frame of reference prevents movements of the device 
            // from affecting the gesture offset
            SpatialManipulationDelta manipulationDelta = 
                e.TryGetCumulativeDelta(stationaryFrameOfReference.CoordinateSystem);

            // Store the offset
            lastGestureOffset = manipulationDelta.Translation;
        }

Setpoint Calculation

Offsets are given in absolute distance (meters). The app normalizes them against a physical range (±25 cm feels about right to me) and clamps the result. This is only done on demand, when the control layer requests an updated setpoint.

//
// Summary:
//      Scaling factor for gesture ranges (in meters)
//      Gesture offsets are divided by this scalar to map the physical range
//      (-gestureRangeScale, gestureRangeScale) to (-1, 1)
private const float gestureRangeScale = 0.25f;

public FlightControlAxes GetFlightControlAxes()
{
    FlightControlAxes axes;
    // Populate axes - normalize and clamp to (-1,1) for RPY and (0,1) for T
    axes.roll = Clamp(lastGestureOffset.X / gestureRangeScale, -1, 1);
    axes.pitch = Clamp(lastGestureOffset.Z / gestureRangeScale, -1, 1);
    axes.yaw = 0; // No yaw support currently
    axes.thrust = Clamp(lastGestureOffset.Y / gestureRangeScale, 0, 1);
    axes.isSelfLevelEnabled = isSelfLevelEnabled;
    axes.isArmed = isArmed;

    return axes;
}
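GetFlightControlAxes relies on a small Clamp helper (System.Math.Clamp wasn’t available on UWP targets of that era). A minimal version, written here as a standalone helper class, might look like this; the exact implementation in the real app may differ:

```csharp
static class ControlMath
{
    // Minimal saturating clamp used when normalizing gesture offsets.
    public static float Clamp(float value, float min, float max)
    {
        return value < min ? min : (value > max ? max : value);
    }
}
```

For example, a hand held 30 cm right of the gesture start yields 0.30 / 0.25 = 1.2, which clamps to a full-scale roll command of 1.0.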

Tap Gesture

The app also makes use of the “Tap” gesture as a way to toggle the ‘armed’ state. Arming is a safety feature to prevent the motors from spinning unintentionally (Betaflight only for now). The manipulation completed/canceled handlers clear this variable (and the setpoints) to stop the motors any time the gesture ends or the hand travels outside the detectable range.

        private void OnTapped(
            object sender,
            SpatialTappedEventArgs e)
        {
            isArmed = !isArmed;
        }        
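For completeness, the manipulation completed/canceled handlers described above might look like the following sketch. The event argument types come from Windows.UI.Input.Spatial; the ResetControls helper name is mine, and the real code may organize this differently:

```csharp
private void OnManipulationCompleted(
    object sender,
    SpatialManipulationCompletedEventArgs e)
{
    // Gesture ended normally - zero the setpoints and disarm
    ResetControls();
}

private void OnManipulationCanceled(
    object sender,
    SpatialManipulationCanceledEventArgs e)
{
    // Hand left the trackable volume mid-gesture - fail safe
    ResetControls();
}

private void ResetControls()
{
    lastGestureOffset = System.Numerics.Vector3.Zero;
    isArmed = false;
}
```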

Usage

The app is still under development — use with caution! If anything goes wrong, hit the “disconnect” button in the app to kill the Bluetooth link. The link will eventually (1-2 seconds) time out and stop the motors.

  1. Update the Crazyflie firmware to the latest release (both the STM32 and nRF51 — updated nRF51 firmware is required even if Betaflight is used as the main flight controller).
  2. Build and deploy the UWP to the HoloLens (from GitHub)
  3. Turn on the Crazyflie and set it down on a level surface
  4. On the HoloLens, go to Settings->Bluetooth and pair with the Crazyflie
  5. Launch the app and place it somewhere in your space
  6. Click the “connect” button to start the Bluetooth communication link
  7. If using Betaflight, perform a tap gesture to arm
  8. (Recommended) stand behind the Crazyflie for proper orientation
  9. Begin flying by tapping and holding. Hand movements up and down control thrust. Side to side controls roll. Forward and back controls pitch.

Conclusion & Future Work

Flying with hand gestures is pretty challenging (though not as challenging as I originally expected). I think the biggest hurdle is the latency on the Bluetooth link: it’s very noticeable in the video at the beginning of this post. Latency makes it easy to overcorrect and get out of control. It’s also hard to hold the hand steady in three dimensions, and having no control over yaw certainly doesn’t help.

Flying manually is just the proof of concept, though. I’m more excited about the potential of coupling hand gestures with extra tracking and stabilization. The last part of the demo video shows flight with the flow deck. The Flow Deck is an add-on expansion board for the Crazyflie 2.0 that features a laser range sensor which stabilizes the height and an optical flow sensor which stabilizes lateral movement parallel to the floor. With this deck, the Crazyflie can hold its position in the air pretty reliably. This lets the user move around and make smaller adjustments to position, and is overall a much more compelling experience.

The next step (work in progress) is to add support for Bitcraze’s Loco Positioning System, which uses a set of fixed anchors that exchange ultra-wideband radio signals and measure time of flight. You can think of it as an indoor GPS for tracking the position of the Crazyflie. Synchronizing the HoloLens’ coordinate system with the Loco Positioning System opens up many scenarios, including:

  • A “follow me” mode where the copter follows behind the user wearing the HoloLens
  • Setting of waypoints using tap gestures, and rendering them with holograms like in a video game
  • Making use of HoloLens’ powerful surface reconstruction engines to incorporate obstacle avoidance and environment awareness to a quadcopter, without needing to put a full sensor payload on board

Thanks for reading! I hope this was useful and/or interesting. Be sure to follow @thejumperwire on Instagram, Twitter, and Facebook to stay up to date with my progress on this effort! I’d also love to hear any comments, thoughts, questions, or ideas you may have related to this concept! Go ahead and leave a comment below.

4 thoughts on “Controlling a Quadcopter with Hand Gestures”

    • I just read through the technical document on GitHub — very cool! The classifier work is impressive. I haven’t seen that specific project, but I was aware of some work using Leap Motion with the Crazyflie on the Bitcraze wiki (https://wiki.bitcraze.io/misc:hacks:leapmotion?s%5B%5D=leap&s%5B%5D=motion), though it seems to be unrelated to your project.

      I’d be interested to hear some more about the stabilization challenges you highlight towards the end of the paper – the “flow deck” does a fantastic job at fusing with the IMU for position hold. What’s the downward facing camera solution you’re using and trying to calibrate?

      • The flow deck does a great job indoors; we are looking at having this unit be flyable outdoors as well, but we found the flow deck doesn’t work as well outdoors. The setup we are trying is a downward-facing FPV camera mounted like a deck on the bottom of the Crazyflie. By running semi-visual odometry on the base station using the camera feed and the IMU data from the Crazyflie, we hope to replicate something similar to what the flow deck provides, but more robustly and generally. Since cameras of this size are generally analog and rolling shutter, calibration has been a challenge.

