Investigating Google's Ad Business
Here's a methodology for identifying piracy sites making money with Google ads
Welcome back to Digital Investigations, the newsletter about digging into digital content and systems!
In my previous newsletter, I outlined 5 free tools you can use to investigate digital ads. Then I kind of disappeared for a year...
My excuse is I spent a good part of that time investigating the biggest digital ad business in the world: Google. My colleagues and I tried to crack open the black box that is Google’s ad business. Here’s what we found:
Porn, Piracy, Fraud: What Lurks Inside Google’s Black Box Ad Empire
How Google’s Ad Business Funds Disinformation Around the World
Methodology article: How We Determined Which Disinformation Publishers Profit From Google’s Ad Systems
Google Says It Bans Gun Ads. It Actually Makes Money From Them
Google Allowed a Sanctioned Russian Ad Company to Harvest User Data for Months
In this newsletter I’ll explain how I used one of the tools mentioned in my previous newsletter (plus a couple of others) to uncover a large piracy scheme making money with Google ads. I’ll outline the methodology I used to find hundreds of apparent piracy sites working with a single company, and how I used Google’s own data to show that these sites engage in mass copyright infringement.
Hopefully this reinforces why digital ads are worth investigating and provides a concrete example of how to do it. In case it’s not already clear, I’m on a mission to get journalists and researchers to dig into the murky, fraud-filled business of digital ads. Contact me if I can help!
PapayAds and Manga Pirates
One of our reporting goals was to identify networks of websites that earn money from Google ads while breaking the company’s rules.
We scanned more than 7 million websites looking for Google ad activity. This built a dataset of sites to investigate. Then we needed to identify possible violations among them. I knew manga piracy was a problem within Google’s ad network thanks to the previous excellent work of DeepSee.io. So I searched for manga sites among our set of sites monetizing with Google. This was easy because many manga sites have the word “manga” in their domain name. When I found a site, I needed to do five things:
Identify if the site has pirated material.
Confirm the presence of Google ads.
Quantify the site’s traffic.
See which partners, if any, the site works with to get Google ads.
See if those partners and/or Google ad IDs are connected to a larger network of piracy sites.
Let’s break down each step and look at the tools I used and the resulting findings.
1. Identify if the site has pirated material.
One thing I learned about manga is that the industry, which does billions of dollars in annual sales, has been slow to embrace digital. It’s still focused on selling printed books. If you find a site filled with scanned pages from manga books, there’s a good chance it’s not licensed material. The challenge was to find a reliable data source that could indicate whether the site is engaged in piracy.
Google’s online Transparency Report has a section called “Content delistings due to copyright.” It discloses data about URLs the company removed from Google search due to copyright violations.
Manga publishers hire copyright enforcement companies to scan the web for their copyrighted material. The companies file copyright notices asking Google to delist infringing URLs from its search engine. Google’s Transparency Report reveals the number of URLs it has removed per domain.
If Google has removed thousands of URLs from a site due to copyright infringement, that’s a good signal it’s a piracy site. And even more important, it’s a strong indicator that Google’s ad network should not be placing ads on it.
Let’s use mangaclash.com as an example. Here’s where you can view Google’s copyright delisting info for the site.
The page shows that since 2020 Google received takedown notices for more than 100,000 URLs on the site, and removed over 90% of them (including duplicates) for copyright infringement. It’s a strong signal Google considers the site to be a mass infringer of copyright. Now we have a reliable, repeatable approach to identify piracy sites. (Mangaclash.com did not respond to a request for comment.)
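If you are checking many sites, the same signal can be written down as a simple rule. Here is a minimal Python sketch, assuming you have copied the requested and removed URL counts from the Transparency Report by hand. The record layout, the example domain and both thresholds are my own illustrative assumptions, not something Google publishes or something we used in our reporting.

```python
from dataclasses import dataclass


@dataclass
class DelistingRecord:
    """Counts copied by hand from a Transparency Report page (hypothetical layout)."""
    domain: str
    urls_requested: int  # URLs named in copyright takedown notices
    urls_removed: int    # URLs Google actually delisted


def removal_rate(record: DelistingRecord) -> float:
    """Share of requested URLs that were removed (0.0 if none were requested)."""
    if record.urls_requested == 0:
        return 0.0
    return record.urls_removed / record.urls_requested


def looks_like_mass_infringer(record: DelistingRecord,
                              min_removed: int = 1000,
                              min_rate: float = 0.5) -> bool:
    """Flag a site when removals are both numerous and a large share of requests.

    Both thresholds are assumptions chosen for illustration; tune them yourself.
    """
    return record.urls_removed >= min_removed and removal_rate(record) >= min_rate


# Placeholder numbers for a made-up domain, not real Transparency Report data.
example = DelistingRecord("example-manga-site.test", 100_000, 90_000)
print(example.domain, looks_like_mass_infringer(example))
```

A flag from a rule like this is only a prompt to look closer. You still need to visit the site and read the delisting page yourself.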
2. Confirm the presence of Google ads.
We built a tool to assist with this process, and you can read more about it in our methodology story. But it’s possible to manually identify Google ads without building a scraper.
Turn off your ad blocker, load a site and look at the ads on the page. For example, here’s a Nike ad I was served on a manga site. There is a blue triangle next to a blue “x” in the upper right-hand corner. Click on it.
If it’s a Google ad, clicking the triangle will show you something like this:
And/or take you to a page like this:
Now you know the ad was placed by Google. If you visit mangaclash.com now, you’re unlikely to see Google ads; it appears they were removed as a result of our story.
3. Quantify the site’s traffic.
So far we know the site contains pirated content and is making money from Google ads. But how popular is it? This is important because more traffic = more potential ad revenue.
SimilarWeb is my preferred tool for checking a site’s traffic. You get a snapshot of three months of data for free. That’s enough to get a sense of how large a site is, where its audience is based, and how the audience finds its way to the site (via search, social media, etc.).
When I checked SimilarWeb at the end of last year, it showed mangaclash.com had 7.3 million visitors in September. If you look now, it shows the site racked up over 9 million visits in November. Also note that SimilarWeb says people view an average of 9 pages per visit, and spend just over seven and a half minutes per session. That’s incredible engagement!
Thanks to SimilarWeb, we know this site is popular and could be showing lots of ads and earning revenue from them.
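To make "more traffic = more potential ad revenue" a bit more concrete, you can multiply visits by pages per visit to get a rough page view count. The sketch below uses the rounded SimilarWeb figures quoted above (9 million visits, 9 pages per visit). SimilarWeb's numbers are estimates, and the function name is just mine. I'm deliberately not estimating revenue, because the number of ads per page and the ad rates are unknown.

```python
def estimated_page_views(visits: int, pages_per_visit: int) -> int:
    """Rough monthly page views: visits multiplied by average pages per visit."""
    return visits * pages_per_visit


# Rounded figures from the SimilarWeb snapshot described above.
monthly_page_views = estimated_page_views(9_000_000, 9)
print(f"{monthly_page_views:,} page views")  # prints "81,000,000 page views"
```

Each of those page views is a chance to show one or more ads, which is why the engagement numbers matter as much as the raw visit count.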
4. See which partners, if any, the site works with to get Google ads.
Here’s where Well-Known.dev comes in. This is one of the free tools I described in my previous newsletter. If you want to understand more about the data Well-Known.dev collects and displays, watch this video of a workshop I gave at the International Journalism Festival. For now, suffice it to say the site aggregates data from websites and digital ad systems and gives you an easy way to see their relationships and partners. Go sign up for a free account! It shows more data if you are logged in.
A search for mangaclash.com on Well-Known.dev returns a page listing a huge number of ad systems and partners the site appears to be working with. Our reporting was focused on Google, so we zeroed in on the Google info. Here’s what I saw when I looked at the Google section in late November (it’s different now):
Each of the rows lists a different Google ad account and, if you’re lucky, the person or company associated with it, and their domain. These are the people or companies who have accounts in Google’s ad system and who mangaclash.com says it works with to get Google ads.
One thing stood out: most of these accounts belong to the same company/site, papayads.net. PapayAds works with publishers to maximize their ad revenue. And it has a lot of Google accounts.
Andrea De Donatis, the CEO of PapayAds, told me that he used fake names on many of the company’s Google accounts. I redacted them to avoid any unintentional overlap with real people. I also redacted info for companies other than PapayAds because we didn’t examine them in our reporting.
Thanks to Well-Known.dev we see PapayAds is the key Google partner for mangaclash.com. Does it work with other manga piracy sites?
5. See if those partners and/or ad IDs are connected to a larger network of piracy sites.
Yes, it does! Let’s take another look at a few of the Google ad account entries from the mangaclash.com data:
Each ad account has a number listed (in brackets) next to the magnifying glass icon. The first row shows that Well-Known.dev has identified 691 websites that say they work with PapayAds via that Google account ID. Click on the magnifying glass and Well-Known.dev will show you all 691 sites. Nice! You can repeat this for each ad account.
We grabbed the site lists for every PapayAds Google account we could find. Ruth Talbot, the data journalist I worked with on this series, combined the lists to identify common sites across the Google ad accounts. I looked through the list to identify sites with the word “manga” in their domain, or other indicators they could be manga piracy sites. I visited them and reran these steps:
Identify if the site has pirated material.
Confirm the presence of Google ads.
Identify the size of the site’s traffic.
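The list-combining step described above can be sketched in a few lines of Python. This is not the code Ruth wrote. It's a minimal illustration that assumes you have saved one list of domains per Google ad account. The account labels and domains are made up, and the "appears under at least two accounts" cutoff is an assumption of mine.

```python
from collections import Counter


def combine_site_lists(sites_by_account: dict[str, list[str]]) -> Counter:
    """Count how many ad accounts each domain is listed under."""
    counts: Counter = Counter()
    for sites in sites_by_account.values():
        # Normalize and de-duplicate so a domain counts once per account.
        counts.update({site.strip().lower() for site in sites})
    return counts


def manga_candidates(domains, keyword: str = "manga") -> list[str]:
    """Return domains containing the keyword, sorted for manual review."""
    return sorted(d for d in domains if keyword in d)


# Hypothetical account labels and domains, for illustration only.
sites_by_account = {
    "account-a": ["manga-one.test", "news-site.test", "Manga-Two.test"],
    "account-b": ["manga-one.test", "recipes.test"],
    "account-c": ["manga-one.test", "manga-two.test"],
}

counts = combine_site_lists(sites_by_account)
shared = [domain for domain, n in counts.items() if n >= 2]
print(manga_candidates(shared))
```

A keyword match only produces candidates. Plenty of piracy sites don't have "manga" in their domain, and some legitimate sites do, so every hit still needs the manual checks listed above.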
Soon we had a spreadsheet with a sample of 50 manga sites that were working with PapayAds, attracted roughly 750 million visits in September, and had close to 2 million URLs delisted by Google for copyright infringement. Yet Google was placing ads on the sites, in apparent violation of its own policy against monetizing pirated content. There’s more about this scheme and about PapayAds in our story.
After we shared our findings, Google said it uses a combination of human oversight, automation and self-serve tools to protect ad buyers. It said it terminated PapayAds’ ad accounts and removed ads from many of the manga sites we identified.
PapayAds CEO De Donatis said he was not aware that sites his company worked with were filled with pirated content. He said he’s not responsible for the content of the sites, and that nearly all of the manga sites were approved by Google to receive ads before signing on with him.
A final note on methodology: My five-step workflow used free tools and did not require advanced coding or deep technical knowledge. It enabled us to uncover a previously unreported company helping manga piracy sites make money from Google ads, and it used Google’s own data to show that its ad business funnelled money to sites engaged in mass infringement. Read the resulting story here.
Thanks for reading. In case you didn’t notice, this newsletter is now on Substack, which means you can add comments. I’d love to hear from you!
Read previous issues of Digital Investigations here: digitalinvestigations.substack.com. If you enjoyed this newsletter, please encourage others to read and subscribe. It’s totally free and I have no plans to charge for a subscription.