Nice post. I think this may have uses for SDET/QA beyond the streaming audio/media space; the technique may prove valuable in telecom hardware/software/services testing as well.
Great post! Thanks for sharing. :)
Just some queries on the big picture here: this looks like a one-off project, and the challenge is quite unique rather than the usual day-to-day question we encounter during testing. It's interesting that the team managed to find so many metrics and formulas for measuring the quality of the application. Does that mean the test team also spent effort understanding the formulas, coding them into the tests for calculation, and so on? Or did the test team engage an SME (subject matter expert) to advise? I'd love to know how the Google test team approached this situation. Also, would you mind sharing how long it took to complete this project?
Great post, thanks!
Are there any articles/posts analyzing WebRTC video quality? Many thanks.
Great post! I managed to run the test; however, I am getting scores at or below 2. And yes, I used the 'rebasing' technique with the right sampling rate, etc. I would like to try with your file and see if it makes things any better. Could you please share the wav file at pyauto_private/webrtc/human-voice-linux.wav, since we (the public) don't seem to have access to it (or I couldn't figure out how). One thing to note is that I am trying this on a virtual machine (an Ubuntu flavor).
Thanks!
I'm a newbie to WebRTC. Could someone explain to me in detail how I can set up my machine to run the test? I was able to compile the Chrome nightly build using ninja. What should be the next step to run the above-mentioned test (i.e., which file or command should I execute to perform the above-mentioned tasks)?
Ok, here's how you do it. First, get the code (http://dev.chromium.org/developers/how-tos/get-the-code) or update your existing checkout so you get the latest code (I landed some patches very recently). Instead of building chrome, build browser_tests. Configure your machine as instructed in https://code.google.com/p/chromium/codesearch#chromium/src/chrome/browser/media/chrome_webrtc_audio_quality_browsertest.cc&q=chrome_webrtc_a&sq=package:chromium&l=49 (if you're on Mac you're out of luck; the test isn't implemented there).
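In case the update-and-build step itself is unclear, it's roughly this on Linux (a sketch assuming a checkout set up per the get-the-code guide; your output directory may be named differently):

# Update the checkout and its dependencies.
cd src
gclient sync
# Build the test binary instead of chrome itself.
ninja -C out/Debug browser_tests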
Then add this to the solutions list in your .gclient file (it's in the folder above your chrome src/ folder):
{
  "name": "webrtc.DEPS",
  "url": "svn://svn.chromium.org/chrome/trunk/deps/third_party/webrtc/webrtc.DEPS",
  "managed": True,
},
That will download the resources you need. It will probably fail when downloading from the webrtc-chrome-resources bucket, so you need to comment that part out of the hooks in webrtc.DEPS/DEPS. Then you need to get hold of PESQ (and, if you're on Windows, sox.exe), build those yourself, and put them in src/chrome/test/data/webrtc/resources/tools. We can't redistribute those binaries, but they're readily available on the web.
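Note that after editing .gclient you have to make gclient run the hooks again for the download to actually happen; something like this, assuming a standard depot_tools setup:

# Run from the directory containing the .gclient file.
gclient runhooks
# Or do a full sync, which runs the hooks too.
gclient sync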
Then run out/Debug/browser_tests --gtest_filter="WebRtcAudio*" --run-manual. If it works you'll get a PESQ score printed out.
Good luck! :)
Unknown: Sure, that file is private only for historical reasons. I'm going to try to pull it out to the public world when I get time. For now I'll attach it in comment #3 here: https://crbug.com/279195.
The best way to figure out why the score is low is to listen to the recorded file (just disable the DeleteFile on trimmed_recording so the file is kept). Load it up in Audacity and compare it to the source file. Often you will find that the volume of the recording is too low because the system's input or playback volume levels are wrong. If they're too low you'll get a bad recording, and if they're too high you'll get distortion. Read the comments on the test for how to set up your machine carefully, and look at the volume levels in pavucontrol. I think they're all at 100% on our machines, but this may or may not be appropriate for your sound hardware.
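If you'd rather check or set the levels from the command line than in pavucontrol, something like this should do it (a sketch assuming PulseAudio's pactl; the device index 0 is just an example and will differ between machines):

# Inspect the current playback and capture volumes.
pactl list sinks | grep -i volume
pactl list sources | grep -i volume
# Force playback and capture to 100%.
pactl set-sink-volume 0 100%
pactl set-source-volume 0 100%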
Thanks for the answer Patrik - I think the volume levels are okay. Looking at the waveforms in Audacity, the recorded waveform is always somewhat expanded. To give an example, I shared a photo below where the original signal is 5.181 seconds long and the recorded signal is 5.581 seconds long.
https://plus.google.com/105460497673148445079/posts?banner=pwa
From what I know, PESQ deals well with silence between talkspurts even if it differs between the original and recorded files; i.e., one may want to reduce the silence period and play out more quickly at the receiving side without much quality compromise. However, if my memory serves me right, PESQ cannot deal with expanded waveforms, which could be the cause of the low PESQ results in my environment.
But this doesn't explain why your tests score around 4. I wonder if the virtual machine environment is a contributing factor here. Do you run your tests on physical machines? Thanks again.
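For reference, the durations are easy to compare with soxi from the sox package (which the test already uses); the file names here are just placeholders for my original and recorded files:

# Prints each file's duration in seconds.
soxi -D original.wav     # 5.181 in my case
soxi -D recording.wav    # 5.581 in my case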
BloodyArmy: We do have very skilled audio engineers on the WebRTC team, which is how I learned about PESQ, sampling rates and so on. I wouldn't have been able to pull this test off without help from those experts for sure.
I would say this test took about three man-weeks to implement. It was very hard with all the OS-specific quirks, and it took a lot of testing and tweaking to get it to run well.
Akmal Nishanov: Yes! I did a tech talk at GTAC 2013. Find the video and slides here: http://www.youtube.com/watch?v=IbLNm3LsMaw&list=SPSIUOFhnxEiCODb8XQB-RUQ0RGNZ2yW7d
There's no article like this one though; I might write one in the future.
Hope this doesn't double post; I tried posting and my browser crashed... Anyway, I found this article really relevant to the WebRTC work I'm testing. Very helpful. I saw in your future considerations that you're looking at latency models. I did some work on that. Before WebRTC, I was doing web automation, and I built a latency generator using a Linux VM and netem. I put Squid on there to open a port and then routed the browser calls through the proxy. Before the test starts, I make a call that remotely sets the bandwidth/latency/packet-loss profile on the VM, then run the test through the VM. It worked very well for us. If you are interested, I have a write-up on it at my blog: http://sdet.us/simulating-real-world-latency-during-automation/
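The heart of the latency generator is just netem on the VM's network interface; a minimal sketch (the interface name and numbers are illustrative):

# Add 100ms +/- 20ms of delay and 1% packet loss on eth0.
sudo tc qdisc add dev eth0 root netem delay 100ms 20ms loss 1%
# Remove the profile again after the test run.
sudo tc qdisc del dev eth0 root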
Hi, any progress on the tests under simulated network conditions that you listed in the future work?
Where is the source code for chrome_webrtc_audio_quality_browsertest.cc? Has it moved, or been deleted from the Chromium project?
The file name changed: https://cs.chromium.org/chromium/src/chrome/browser/media/webrtc/webrtc_audio_quality_browsertest.cc?sq=package:chromium
Thanks for sharing your honest experience. When I first took a look at my headshots, I wasn’t too thrilled with mine, but you’ve given me a new perspective!
What about some kind of fake device driver that makes a microphone-like device appear at the device level? Thanks for the post.