How extensively does Google use Selenium for test automation and in what ways? Thanks
Chris
Why couldn't you just preload the HTML and all associated files? Does the rendering really take that much time?
It sounds like you are hinting at setting up regression tests of application-generated DOMs, with the differences screened by humans. The keys would be (1) creating/choosing test scripts that generate plenty of coverage, maximizing true positives and minimizing false positives, (2) possibly masking out the parts of the DOM that are likely to change most of the time, and (3) making it easy and fast for humans to review the possible regressions and, of course, report the true regressions.
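A minimal sketch of that kind of DOM regression check, assuming Selenium's Python bindings; the ignored selectors, file names, and snapshot format are illustrative, not any particular tool's.

```python
# Sketch of a DOM regression check: capture a layout snapshot, mask out
# volatile regions, and diff against a stored baseline for human review.
# Selectors and file names are illustrative assumptions.
import json
from selenium import webdriver

IGNORE_SELECTORS = ["#ads", ".timestamp", ".session-id"]  # volatile regions to mask

def capture_snapshot(driver, url):
    driver.get(url)
    # Record tag, id, class, and bounding box for every element not inside a masked region.
    return driver.execute_script("""
        var ignore = arguments[0].map(function(s) {
            return Array.prototype.slice.call(document.querySelectorAll(s));
        }).reduce(function(a, b) { return a.concat(b); }, []);
        var out = [];
        var all = document.getElementsByTagName('*');
        for (var i = 0; i < all.length; i++) {
            var el = all[i];
            if (ignore.some(function(ig) { return ig.contains(el); })) continue;
            var r = el.getBoundingClientRect();
            out.push({tag: el.tagName, id: el.id,
                      x: r.left, y: r.top, w: r.width, h: r.height});
        }
        return out;
    """, IGNORE_SELECTORS)

def diff_snapshots(baseline, current):
    # Naive positional diff; real tooling would match elements more robustly.
    changes = [(b, c) for b, c in zip(baseline, current) if b != c]
    changes += [("missing", b) for b in baseline[len(current):]]
    changes += [("new", c) for c in current[len(baseline):]]
    return changes

if __name__ == "__main__":
    driver = webdriver.Chrome()
    snap = capture_snapshot(driver, "https://staging.example.com/")
    driver.quit()
    try:
        baseline = json.load(open("baseline.json"))
        for change in diff_snapshots(baseline, snap):
            print(change)          # queue these for human review
    except FileNotFoundError:
        json.dump(snap, open("baseline.json", "w"))  # first run establishes the baseline
```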
Are there any plans to open source this utility? I would be interested to see how I could use something like this in my work.
Hello.
That quality bot you mention sounds really cool. Do you know if this would ever be made available? It sounds like it would make a good companion to Selenium/WebDriver.
Python + Selenium (screenshots!) + ImageMagick.
1) Get the page, save it to a db/archive.
2) Compare against the last/baseline capture.
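A rough sketch of that pipeline, assuming Selenium's Python bindings and ImageMagick's `compare` command-line tool; the paths, metric, and threshold are placeholders, not a reference implementation.

```python
# Grab a page screenshot with Selenium, archive it, and compare it to the
# previous capture with ImageMagick's `compare` tool.
import os
import shutil
import subprocess
from selenium import webdriver

ARCHIVE = "screenshots"

def capture(url, name):
    driver = webdriver.Chrome()
    driver.get(url)
    os.makedirs(ARCHIVE, exist_ok=True)
    path = os.path.join(ARCHIVE, name + ".png")
    driver.save_screenshot(path)
    driver.quit()
    return path

def pixel_diff(baseline, current, diff_out="diff.png"):
    # `compare -metric AE` prints the number of differing pixels to stderr.
    result = subprocess.run(
        ["compare", "-metric", "AE", baseline, current, diff_out],
        capture_output=True, text=True)
    return float(result.stderr.split()[0])

if __name__ == "__main__":
    current = capture("https://staging.example.com/", "current")
    baseline = os.path.join(ARCHIVE, "baseline.png")
    if not os.path.exists(baseline):
        shutil.copy(current, baseline)            # first run becomes the baseline
    elif pixel_diff(baseline, current) > 500:     # arbitrary noise threshold
        print("Possible visual regression; see diff.png")
```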
Chris, Selenium (and WebDriver) are used very heavily at Google, and we have a centralized farm of Selenium machines to execute these tests around the clock.
BlackTigerX, rendering does take time and every millisecond is interesting :) Also, some of the larger, script-driven, AJAXy sites need the full DOM loaded to complete rendering.
Kazumatan, you are right. We are also working to make it easy for the human raters to label the uninteresting but frequently changing portions of the DOM (think Google Feedback-style region selection) for later filtering.
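A small sketch of how such marked regions could feed a filter: drop any element whose bounding box falls inside a rater-drawn rectangle before diffing. The rectangle format and element-record shape follow the snapshot sketch above and are assumptions, not the actual Bots data model.

```python
def inside(elem, rect):
    """True if the element's bounding box is fully inside the marked rectangle."""
    return (elem["x"] >= rect["x"] and elem["y"] >= rect["y"]
            and elem["x"] + elem["w"] <= rect["x"] + rect["w"]
            and elem["y"] + elem["h"] <= rect["y"] + rect["h"])

def apply_ignore_regions(snapshot, regions):
    """Filter a layout snapshot down to elements outside every marked region."""
    return [e for e in snapshot if not any(inside(e, r) for r in regions)]

# Example: a rater drew one rectangle over a rotating news widget.
regions = [{"x": 0, "y": 600, "w": 300, "h": 250}]
# filtered = apply_ignore_regions(snapshot, regions)  # then diff `filtered` instead
```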
Dojann, Open Sourcing is definitely on the map, only the timing is a question. We've designed most of it so that it could run outside of Google's infrastructure for just this reason :) We are also looking at hosting options that let folks easily run on hosted machines they own, with VPN access to their staging environments. I can't speak to the timing, but it is partly dependent on the level of interest from the community in these options.
Ben, yup, we are hoping to share the service and code 'soon'. The more interest we see, the faster this will happen.
cheers!
Expect major updates, and perhaps even open sourcing, at GTAC in October.
I have two questions:
1. Comparing page rendering with Instant Pages turned off and on seems cool. But there should have been some tool/automation that verified the rendering of pages even before the Instant Pages feature was introduced. How was that being done, and why wasn't that used here?
2. How did the pixel/DOM comparison solve the problem of dynamically generated ads? Did you just verify the placeholders/DOM elements and not the content?
Great questions, Raghav. Chrome and many internal teams at Google use a variety of tools, including Quality Bots, to automatically verify rendering and catch layout issues. The reason we had to use Quality Bots here is that they work at scale automagically, whereas most traditional automation tools require a custom test for each page, which is hard to scale. Also, we need to keep in mind that the page is hidden until it is made visible, and the only way to know a page was prerendered is to have injected JavaScript while the page was in the prerendered state.
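The injected script itself isn't shown in the thread; here is a hedged sketch of what such a probe could look like using the Page Visibility API (which was vendor-prefixed as webkitVisibilityState/webkitvisibilitychange in Chrome at the time). The window property names are made up for illustration.

```python
# A small script added to the page under test records whether it was ever in
# the prerendered state; an automation harness reads the flags later.
PRERENDER_PROBE = """
(function () {
  function state() {
    return document.visibilityState || document.webkitVisibilityState;
  }
  window.__wasPrerendered = (state() === 'prerender');
  document.addEventListener('visibilitychange', function () {
    // When the hidden, prerendered page is finally shown, note the transition.
    if (window.__wasPrerendered && state() === 'visible') {
      window.__prerenderShownAt = Date.now();
    }
  });
})();
"""

# Later, a Selenium harness could read the recorded flags:
# was_prerendered = driver.execute_script("return window.__wasPrerendered;")
```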
Re: dynamic ads. You are right. In general, Bots verify information about the elements, but not the content. On top of that, we have an ad-detection mechanism in place to detect and ignore ads while comparing.
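The ad-detection mechanism isn't described in the thread; as a naive stand-in, one could drop elements whose content is served from known ad domains before diffing. The domain list and record shape here are illustrative assumptions.

```python
# Drop elements (e.g. ad iframes) whose source URL points at a known
# ad-serving domain before running the DOM diff.
from urllib.parse import urlparse

AD_DOMAINS = {"doubleclick.net", "googlesyndication.com", "adservice.example.com"}

def looks_like_ad(element_record):
    """Heuristic: the element loads content from a known ad-serving domain."""
    src = element_record.get("src") or ""
    host = urlparse(src).hostname or ""
    return any(host == d or host.endswith("." + d) for d in AD_DOMAINS)

def strip_ads(snapshot):
    return [e for e in snapshot if not looks_like_ad(e)]
```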
Will there be any tutorials on building these kinds of bots?
Could you use Sikuli to compare the images?
Regards,
Thiago Peçanha
We use pixel comparison (against a defined baseline) at our company in our automated regression testing. We have found that dynamically generated ads have caused a lot of problems. To start with, we just set the success threshold lower, but this was not satisfactory, so now we only do the pixel comparison for pages without web ads.
I'm thinking about solving this problem by asserting that partial images can be found within the page being tested.
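One way to realize that "partial image" idea is template matching, asserting that a known fragment (say, the site logo or a stable widget) appears somewhere in the full-page screenshot. OpenCV is my assumption about tooling here, not something the commenter or the post specifies.

```python
import cv2

def fragment_present(screenshot_path, fragment_path, threshold=0.95):
    page = cv2.imread(screenshot_path, cv2.IMREAD_GRAYSCALE)
    fragment = cv2.imread(fragment_path, cv2.IMREAD_GRAYSCALE)
    # Normalized cross-correlation; a best score of 1.0 is a pixel-perfect match.
    scores = cv2.matchTemplate(page, fragment, cv2.TM_CCOEFF_NORMED)
    _, best, _, location = cv2.minMaxLoc(scores)
    return best >= threshold, location

# ok, where = fragment_present("current.png", "logo.png")
# assert ok, "expected fragment not found on the page"
```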
Photographer-Anairda, likely no tutorials soon; we hope to do one better and just open source it, document it, and let folks re-host their own instances if they like. I'm happy to chat about the details. The crawler is relatively easy (think lots of elementFromPoint() calls); the problems are in scale, reporting, and rendering the data.
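For a concrete sense of the elementFromPoint() approach, here is a minimal sketch of a grid scan run through Selenium's execute_script; the step size and the element-key format are assumptions.

```python
# Scan the viewport on a grid and record which element is topmost at each
# point, building a coarse layout map of the page.
LAYOUT_SCAN = """
var step = arguments[0];
var map = [];
function keyFor(el) {
  var parts = [];
  while (el && el.nodeType === 1) {
    parts.unshift(el.tagName + (el.id ? '#' + el.id : ''));
    el = el.parentNode;
  }
  return parts.join('>');
}
for (var y = 0; y < window.innerHeight; y += step) {
  for (var x = 0; x < window.innerWidth; x += step) {
    var el = document.elementFromPoint(x, y);
    map.push({x: x, y: y, key: el ? keyFor(el) : null});
  }
}
return map;
"""

def scan_layout(driver, step=20):
    return driver.execute_script(LAYOUT_SCAN, step)
```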
Hi Tiago. Sikuli is very cool. Some folks have used it at Google, some have built similar approaches, and there are even some commercial products that work this way. We have fundamentally focused on the DOM diff, instead of the pixel diff, for three reasons. 1. When you detect and file a 'bug' based on a screenshot, it is a significant amount of work to repro and debug the underlying issue that caused the pixels to be off, so why not just grab that data while you are on the site? 2. If you know the structure of the web page, you don't need fancy, probabilistic approaches to identify elements that have scaled, translated, or failed to appear; you know exactly which DOM elements have failed. 3. We are building a corpus of which elements tend to cause differences, so we can hopefully correlate failures across many sites/runs to determine if there are underlying issues in the browsers, tooling, or DOM usage. That's the ultimate goal. Great question; this is fundamental to the what and why of Bots.
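To make reason 2 concrete, here is a small sketch of how a DOM-level layout diff could classify elements as translated, scaled, or missing, keyed by a stable identifier like the one in the scan above. The keys and tolerances are assumptions, not the Bots implementation.

```python
def classify(baseline, current, tolerance=2):
    """baseline/current: dicts mapping element key -> (x, y, w, h)."""
    report = {}
    for key, (bx, by, bw, bh) in baseline.items():
        if key not in current:
            report[key] = "failed to appear"
            continue
        cx, cy, cw, ch = current[key]
        moved = abs(cx - bx) > tolerance or abs(cy - by) > tolerance
        scaled = abs(cw - bw) > tolerance or abs(ch - bh) > tolerance
        if scaled:
            report[key] = "scaled"
        elif moved:
            report[key] = "translated"
    for key in set(current) - set(baseline):
        report[key] = "new element"
    return report
```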
Hi Cithan. False positives and noise from ads were the reason a lot of people avoided this area, thinking it couldn't yield useful data :) We use an 'ignore' filter for data from common ad-like sites during our crawl. We are also working on a way for our first-line crowd-sourced evaluators to mark page areas as 'don't care', on a per-site basis, to add to the filter set. Most significantly, though, we also have the notion of a 'baseline' for a site. If the site permutes all the time, but within a range, you can choose to only flag it when it goes outside of that normal range. Many top portal sites' data looks like this: the URLs and divs shift around a bit day to day, but the amount of entropy day over day stays within a normal range/band.
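A minimal sketch of that "normal range" idea, assuming a per-site history of diff scores; the scoring function and the width of the band are assumptions layered on the classify() sketch above.

```python
# Score each run's diff against the baseline, keep a history of scores per
# site, and only flag a run whose score falls outside the site's usual band
# (here mean +/- 3 standard deviations).
import statistics

def diff_score(report, baseline_size):
    """Fraction of baseline elements reported as changed or missing."""
    changed = sum(1 for v in report.values() if v != "new element")
    return changed / baseline_size if baseline_size else 0.0

def is_anomalous(score, history, k=3.0):
    if len(history) < 5:                 # not enough runs to know the normal band yet
        return False
    mean = statistics.mean(history)
    stdev = statistics.pstdev(history) or 1e-9
    return abs(score - mean) > k * stdev
```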