I'm trying to detect when a user claps their hands while music is playing. This works for music that is not too dynamic: I measure the deviation in amplitude and register spikes. The problem is music that has sudden spikes of its own.
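To make the current approach concrete, here is a rough sketch of the kind of spike detection I mean. It is written in Python for illustration and is not my actual code; the function names, the frame size, and the threshold factor `k` are all placeholders I made up for this example.

```python
import math
import statistics

def frame_rms(samples, frame_size):
    """Split samples into non-overlapping frames and return the RMS of each full frame."""
    rms = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        rms.append(math.sqrt(sum(s * s for s in frame) / frame_size))
    return rms

def detect_spikes(samples, frame_size=4, k=2.0):
    """Return indices of frames whose RMS exceeds mean + k * standard deviation."""
    levels = frame_rms(samples, frame_size)
    threshold = statistics.mean(levels) + k * statistics.pstdev(levels)
    return [i for i, level in enumerate(levels) if level > threshold]

# Toy signal (made-up values): ten frames of quiet "music" with a loud burst in frame 5.
quiet = [0.1, -0.1, 0.1, -0.1]
burst = [1.0, -1.0, 1.0, -1.0]
signal = quiet * 5 + burst + quiet * 4
spike_frames = detect_spikes(signal)
```

On this toy signal only the loud frame is flagged, but a drum hit in the music would be flagged in exactly the same way, which is the problem.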
Of course, the mic also picks up the music being played through the speakers. I've tried simply subtracting the original music track from the recorded audio by inverting its phase with a gain node, but this doesn't work.
I've been looking into acoustic echo cancellation. I figure the recorded audio is shaped by the room's impulse response (IR), and there's also a difference in loudness. This is where I'm kinda getting stuck. How do I obtain the impulse response of a room? As far as I can tell, I would still need to subtract the original audio from the audio with the IR applied to get the actual IR.
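From what I've read so far, echo cancellers often sidestep this by using an adaptive filter such as NLMS (normalized least mean squares), which estimates the IR gradually from the original track and the mic signal. Below is my rough understanding of it as a Python sketch. The function name, the tap count, the step size `mu`, and the simulated "room" are assumptions for illustration, and I haven't tried it on real recordings.

```python
import random

def nlms_estimate_ir(reference, mic, taps=8, mu=0.5, eps=1e-8):
    """Estimate an FIR approximation of the room IR with NLMS.

    Returns the filter weights and the residual (mic minus predicted music).
    """
    weights = [0.0] * taps
    residual = []
    for n in range(len(mic)):
        # Most recent `taps` reference samples, newest first, zero-padded at the start.
        x = [reference[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        predicted = sum(w * xk for w, xk in zip(weights, x))
        error = mic[n] - predicted
        power = sum(xk * xk for xk in x) + eps  # eps avoids division by zero
        weights = [w + mu * error * xk / power for w, xk in zip(weights, x)]
        residual.append(error)
    return weights, residual

# Simulated test data (not a real room): noise as "music", a short made-up IR.
rng = random.Random(0)
music = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
room_ir = [0.0, 0.6, 0.3, -0.1]
mic = [
    sum(room_ir[k] * music[n - k] for k in range(len(room_ir)) if n - k >= 0)
    for n in range(len(music))
]
estimated_ir, residual = nlms_estimate_ir(music, mic)
```

In this noise-free simulation the weights converge to the made-up IR and the residual goes to roughly zero. I assume a real room would need far more taps, and that a clap would then show up as a spike in the residual, but I'm not sure about either.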
I think that when I have the IR I could apply it to the music track during analysis and make the two signals more alike.
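In other words, something like the sketch below: convolve the music track with the IR to predict what the mic should hear, then subtract that prediction from the recording. This is Python for illustration only; `convolve`, `remove_music`, and the tiny IR and sample values are hypothetical, and it assumes the IR is already known and the two signals are time-aligned.

```python
def convolve(signal, ir):
    """Apply an FIR impulse response to a signal (output truncated to len(signal))."""
    out = [0.0] * len(signal)
    for n in range(len(signal)):
        for k, h in enumerate(ir):
            if n - k >= 0:
                out[n] += h * signal[n - k]
    return out

def remove_music(mic, music, ir):
    """Subtract the IR-filtered music from the mic signal; what remains is everything else."""
    predicted = convolve(music, ir)
    return [m - p for m, p in zip(mic, predicted)]

# Made-up example: the mic hears the filtered music plus a one-sample "clap".
music = [1.0, -2.0, 4.0, -1.0, 2.0, -4.0]
room_ir = [0.5, 0.25]
clap = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
mic = [p + c for p, c in zip(convolve(music, room_ir), clap)]
residual = remove_music(mic, music, room_ir)
```

With a perfect IR the residual here is exactly the clap. With a real, imperfect IR I'd expect some music to leak through, so I would probably still run the spike detection on the residual.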
Also, is there anything else I should take into account so that I end up with two comparable signals?
Would really appreciate some help :)