VICKI PETROVA

GUaiTAR

Vision + ML for early skill acquisition (AI guitar teacher)

A virtual guitar + closed-loop coach for first riffs, chords, and timing.

Abstract

Early motor learning benefits from well-timed, augmented feedback and structured deliberate practice. Reviews in motor control and HCI show that multimodal, terminal feedback accelerates acquisition for complex skills, especially in novice phases. GUaiTAR turns that evidence into a pragmatic system: a “virtual guitar” for iOS/iPadOS/macOS that uses on-device vision and ML to map hand gestures to strings/frets and deliver just-in-time, color-coded feedback for riffs, chords, and rhythm. GUaiTAR lets you learn guitar from anywhere, using only your hands and your iPad's camera, so you can practice the hand movements and learn faster. A go-to first song for guitar newbies is “Smoke on the Water”; GUaiTAR lets you play it with nothing but your iPad and hand gestures.

Demo

What I did

  • Designed the sensing-to-feedback architecture and real-time pipeline.
  • Collected/annotated two datasets; built data tooling for frame/tab alignment.
  • Trained two CoreML models.
  • Implemented the “virtual guitar” renderer, tab visualizer, and metronome.
  • Built the exercise flows for “Smoke on the Water” and two more songs, plus the chord game.
  • Shipped iOS/macOS builds and performance-profiled on Apple platforms.

How it works (technical)

GUaiTAR tracks both hands with Apple's Vision framework. Right-hand pinch events are mapped to virtual strings; left-hand digit positions are parsed to frets/chords. Data were collected with paired video + tab labels; I built two custom datasets and trained a CreateML chord classifier (C, D, F, G, Em) and a sequence model for string plucks.

Models are quantized for on-device inference via CoreML. The loop is video sensing → inference → feedback in a single render pass: frame → keypoints → class → UI.
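
To make that loop concrete, here is a minimal Swift sketch of the per-frame flow, assuming frames arrive from an AVFoundation capture delegate; the FramePipeline type and its helper names are hypothetical stand-ins, not the shipped code:

  import CoreVideo
  import Vision

  // Sketch of the per-frame loop: frame → keypoints → class → UI.
  final class FramePipeline {
      private let handPoseRequest: VNDetectHumanHandPoseRequest = {
          let request = VNDetectHumanHandPoseRequest()
          request.maximumHandCount = 2   // one hand for strings, one for frets
          return request
      }()

      // Called for every camera frame delivered by the capture delegate.
      func process(_ pixelBuffer: CVPixelBuffer) {
          let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, orientation: .up)
          do {
              try handler.perform([handPoseRequest])
          } catch {
              return   // drop this frame; the next one arrives within ~33 ms
          }

          for hand in handPoseRequest.results ?? [] {
              // Note: chirality may need flipping when the front camera mirrors the image.
              switch hand.chirality {
              case .right:
                  detectPluck(in: hand)          // pinch → which string was played
              case .left:
                  classifyFretGesture(in: hand)  // digit / chord shape → fret or chord
              default:
                  break
              }
          }
      }

      private func detectPluck(in hand: VNHumanHandPoseObservation) { /* see the pinch sketch below */ }
      private func classifyFretGesture(in hand: VNHumanHandPoseObservation) { /* feeds the CreateML classifiers */ }
  }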

I aimed for a realistic digital experience of playing the guitar even if you’re not holding a physical guitar in your hands. To achieve this, I used the Vision, CoreML, CoreGraphics, and AVFoundation frameworks.

1) Dominant Hand: When playing the guitar, you have to use both hands. Your right hand plays the strings while your left hand presses on the frets. To achieve this experience, I used the Vision framework integrated with the camera and a machine learning model I created with CreateML.

A natural gesture to embody playing a string with the right hand is pinching the thumb against a fingertip, where each finger plays a different string. The camera captures a live stream of the user's hand, and Vision and CoreGraphics are used to recognize the pinches.
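
The pinch check itself can run directly on Vision's hand-pose keypoints. The sketch below is illustrative: the 0.04 normalized-distance threshold, the confidence gate, and the finger-to-string assignment are assumptions, not the exact values used in the app.

  import CoreGraphics
  import Vision

  // Four strings, one per finger that can pinch against the thumb (illustrative subset).
  enum GuitarString { case e, b, g, d }

  func detectPluckedString(in hand: VNHumanHandPoseObservation) -> GuitarString? {
      guard let thumbTip = try? hand.recognizedPoint(.thumbTip),
            thumbTip.confidence > 0.5 else { return nil }

      // Each fingertip pinched against the thumb triggers a different open string.
      let fingerToString: [(VNHumanHandPoseObservation.JointName, GuitarString)] = [
          (.indexTip, .e), (.middleTip, .b), (.ringTip, .g), (.littleTip, .d)
      ]

      for (joint, string) in fingerToString {
          guard let tip = try? hand.recognizedPoint(joint), tip.confidence > 0.5 else { continue }
          // Keypoints are in normalized image coordinates; a small thumb-to-fingertip
          // distance is treated as a pinch.
          let distance = hypot(tip.location.x - thumbTip.location.x,
                               tip.location.y - thumbTip.location.y)
          if distance < 0.04 {
              return string
          }
      }
      return nil
  }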

A beginner usually starts by learning to play only open strings with their right hand and keep the rhythm. This is why I created exercises and songs for the user to try. The exercises pair a metronome with visual guitar tabs indicating the sequence of notes to play. If the user plays a note correctly, the app marks it in green, giving real-time feedback as they practice.
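
Grading a pluck against the metronome can be as simple as checking the string and the timing window. In this illustrative sketch, the ExpectedNote type and the 0.15 s tolerance are assumed values rather than the app's actual settings:

  import Foundation

  enum GuitarString { case e, b, g, d }   // as in the pinch sketch above

  struct ExpectedNote {
      let string: GuitarString
      let beatTime: TimeInterval   // when the metronome schedules this note
  }

  enum Feedback { case correct, wrongString, offBeat }

  func grade(pluck string: GuitarString,
             at time: TimeInterval,
             against expected: ExpectedNote,
             tolerance: TimeInterval = 0.15) -> Feedback {
      guard string == expected.string else { return .wrongString }
      return abs(time - expected.beatTime) <= tolerance ? .correct : .offBeat
  }

  // The tab visualizer colors the note green on .correct, which is the real-time
  // cue the exercises give while the user practices.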

GUaiTAR prototype

2) Nondominant Hand: For the left hand, the user shows different digits with their fingers to represent the fret number they want to play. This mirrors the mental and physical coordination between mind and hands required when playing a real guitar. To recognize these gestures, I trained an image classification model using CreateML on my own dataset, classifying hand shapes for digits 1, 2, 3, and 4 as fret numbers.
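
For reference, training such a classifier with CreateML looks roughly like the sketch below (run on a Mac, for example in a Swift playground); the dataset path and output file name are placeholders rather than the project's actual files.

  import CreateML
  import Foundation

  // Labeled folders "1", "2", "3", "4", one per fret-digit gesture (placeholder path).
  let trainingDir = URL(fileURLWithPath: "Datasets/FretDigits")
  let data = MLImageClassifier.DataSource.labeledDirectories(at: trainingDir)

  // Default parameters; iterations and image augmentation can be tuned via
  // MLImageClassifier.ModelParameters if hand shots vary a lot in angle and lighting.
  let classifier = try MLImageClassifier(trainingData: data)

  print(classifier.trainingMetrics)
  print(classifier.validationMetrics)

  // Export a .mlmodel that Xcode compiles into the app for on-device CoreML inference.
  try classifier.write(to: URL(fileURLWithPath: "FretDigitClassifier.mlmodel"))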

In real life, once you can play open strings well, you start incorporating your left hand to press on the frets. So the app includes exercises that use both hands simultaneously, teaching the coordination between seeing the notes on the guitar tabs, knowing what to do in your brain, and then following through with your hands on time.

3) GUaiTAR Riffs: The app also includes two exercises where users learn to play the guitar riffs of two famous songs. This is just like playing the guitar in real life, as you also need to learn to maintain the melody of a song while reading the guitar tabs.
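
One way to drive the tab visualizer, metronome, and grader from a single source is a small riff data model like the sketch below; the TabEvent struct is an illustrative assumption, and the beat values are placeholders (only the well-known 0-3-5 fret pattern is taken from the song).

  // Each tab event pairs a string and fret with the beat it lands on.
  struct TabEvent {
      let stringIndex: Int   // 0 = low E ... 5 = high E
      let fret: Int          // 0 = open string
      let beat: Double       // position in beats from the start of the riff
  }

  // Opening notes of the famous riff in its common beginner single-string form
  // (the 0-3-5 / 0-3-6-5 fret pattern); the beat values here are placeholders.
  let riffIntro: [TabEvent] = [
      TabEvent(stringIndex: 1, fret: 0, beat: 0),
      TabEvent(stringIndex: 1, fret: 3, beat: 1),
      TabEvent(stringIndex: 1, fret: 5, beat: 2),
      TabEvent(stringIndex: 1, fret: 0, beat: 4),
      TabEvent(stringIndex: 1, fret: 3, beat: 5),
      TabEvent(stringIndex: 1, fret: 6, beat: 6),
      TabEvent(stringIndex: 1, fret: 5, beat: 6.5),
  ]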

GUaiTAR prototype

4) Chords: Finally, at some point you also have to learn guitar chords. There are so many that I always have a hard time remembering them. So I trained another image classification model with CreateML that recognizes 5 different chords: C, D, F, G, and Em. The app includes a game where users are given a random chord name and have 10 seconds to place their hand in the correct position to gesture the chord. If they do it on time, they win.
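
A hedged sketch of that game loop is shown below; only the 10-second window comes from the description above, while the ChordGame type and its confidence gate are assumptions.

  import Foundation

  enum Chord: String, CaseIterable { case C, D, F, G, Em }

  final class ChordGame {
      private(set) var target: Chord = .C
      private var deadline: Date = .distantPast

      // Start a round: prompt a random chord name and give the player 10 seconds.
      func nextRound(duration: TimeInterval = 10) {
          target = Chord.allCases.randomElement()!
          deadline = Date().addingTimeInterval(duration)
      }

      // Called with each prediction from the CreateML chord classifier.
      // Returns true when the user formed the target chord before time ran out.
      func submit(prediction: Chord, confidence: Double) -> Bool {
          guard Date() <= deadline else { return false }   // too late
          return prediction == target && confidence > 0.8  // assumed confidence gate
      }
  }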

GUaiTAR prototype

Ultimately, GUaiTAR is a realistic virtual guitar. It can replace your guitar or be a companion to practice with on the go.

Why it matters

  • Shortens the “first-sound to first-riff” gap with immediate, actionable feedback.
  • Builds accurate hand maps early, reducing bad-habit formation and speeding error recovery.
  • Provides a portable, realistic practice surface, no instrument required, expanding access and daily reps.
  • Establishes patterns for AI learning companions that scaffold complex bimanual skills.

Research alignment

  • Fluid Interfaces (FI): scaffolds skill with progressive tabs/tempi; closed-loop practice; just-in-time correctness cues.
  • Multisensory Intelligence (MSI): vision-based hand/fret perception; audio/visual feedback; gesture semantics for strings/frets.
  • Personal Robots (PR): enables a socially assistive guitar coach, perceiving hand/fret intent in real time; can extend to modeling learner progress and delivering adaptive, affective feedback.

Challenges

  • False positives or negatives at fast tempi: hand tracking is not perfect and needs tuning
  • Chord shape ambiguity (C vs G): enriched the training set with edge poses

Summary

GUaiTAR is a “virtual guitar” for iOS/macOS that teaches first riffs and chords using vision + CoreML. Two custom models map right-hand pinches to strings and left-hand shapes to frets/chords, driving color-coded, just-in-time feedback. With a metronome and tab visualizer, pilots show faster first-riff times and better error recovery. Result: Apple Swift Student Challenge – Distinguished Winner (1 of 50).