Digital Investigations

Investigating Google's Ad Business

Here's a methodology for identifying piracy sites making money with Google ads

Craig Silverman
Feb 1
Screenshot of ProPublica article with headline "Porn, Piracy, Fraud: What Lurks Inside Google’s Black Box Ad Empire"
A story we published late last year

Welcome back to Digital Investigations, the newsletter about digging into digital content and systems!

In my previous newsletter, I outlined 5 free tools you can use to investigate digital ads. Then I kind of disappeared for a year...

My excuse is I spent a good part of that time investigating the biggest digital ad business in the world: Google. My colleagues and I tried to crack open the black box that is Google’s ad business. Here’s what we found:

  • Porn, Piracy, Fraud: What Lurks Inside Google’s Black Box Ad Empire

  • How Google’s Ad Business Funds Disinformation Around the World

  • Methodology article: How We Determined Which Disinformation Publishers Profit From Google’s Ad Systems

  • Google Says It Bans Gun Ads. It Actually Makes Money From Them

  • Google Allowed a Sanctioned Russian Ad Company to Harvest User Data for Months

In this newsletter I’ll explain how I used one of the tools mentioned in my previous newsletter (plus a couple of others) to uncover a large piracy scheme making money with Google ads. I’ll outline the methodology I used to find hundreds of apparent piracy sites working with a single company, and how I used Google’s own data to show that these sites engage in mass copyright infringement.

Hopefully this reinforces why digital ads are worth investigating and provides a concrete example of how to do it. In case it’s not already clear, I’m on a mission to get journalists and researchers to dig into the murky, fraud-filled business of digital ads. Contact me if I can help!

PapayAds and Manga Pirates

One of our reporting goals was to identify networks of websites earning money from Google ads that are breaking the company’s rules.

We scanned more than 7 million websites looking for Google ad activity. This built a dataset of sites to investigate. Then we needed to identify possible violations among them. I knew manga piracy was a problem within Google’s ad network thanks to the previous excellent work of DeepSee.io. So I searched for manga sites among our set of sites monetizing with Google. This was easy because many manga sites have the word “manga” in their domain name. When I found a site, I needed to do five things:

  1. Identify if the site has pirated material.

  2. Confirm the presence of Google ads.

  3. Quantify the site’s traffic.

  4. See which partners, if any, the site works with to get Google ads.

  5. See if those partners and/or Google ad IDs are connected to a larger network of piracy sites.
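
The keyword search that produced the candidate list can be sketched in a few lines of Python. This is a minimal illustration, not the tooling used for the story: the function names and the sample domains (all under the reserved .test suffix) are made up, and it only checks whether the keyword appears in the hostname.

```python
from urllib.parse import urlparse

def hostname(site: str) -> str:
    """Return the lowercase hostname for a URL or a bare domain."""
    # urlparse only fills in the host when a scheme is present, so add one if missing.
    if "://" not in site:
        site = "http://" + site
    return (urlparse(site).hostname or "").lower()

def find_keyword_sites(sites, keyword="manga"):
    """Keep the sites whose hostname contains the keyword, preserving order."""
    return [s for s in sites if keyword in hostname(s)]

# Made-up sample input; these are placeholders, not real findings.
sample = [
    "https://www.example-manga.test/chapter/1",
    "news.example.test",
    "MangaReader.example.test",
    "https://example.test/manga/list",  # keyword only in the path, so it is skipped
]
candidates = find_keyword_sites(sample)
print(candidates)
```

A match only says a site is worth a look. Every candidate still has to go through the five steps above.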

Let’s break down each step and look at the tools I used and the resulting findings.

1. Identify if the site has pirated material.

One thing I learned about manga is that the industry, which does billions of dollars in annual sales, has been slow to embrace digital. It’s still focused on selling printed books. If you find a site filled with scanned pages from manga books, there’s a good chance it’s not licensed material. The challenge was to find a reliable data source that could indicate whether the site is engaged in piracy.

Google’s online Transparency Report has a section called “Content delistings due to copyright.” It discloses data about URLs the company removed from Google search due to copyright violations.

Manga publishers hire copyright enforcement companies to scan the web for their copyrighted material. The companies file copyright notices asking Google to delist infringing URLs from its search engine. Google’s Transparency Report reveals the number of URLs it has removed per domain.

If Google has removed thousands of URLs from a site due to copyright infringement, that’s a good signal it’s a piracy site. And even more important, it’s a strong indicator that Google’s ad network should not be placing ads on it.

Let’s use mangaclash.com as an example. Here’s where you can view Google’s copyright delisting info for the site.

Chart showing URLs from mangaclash.com delisted by Google
Google’s URL delisting data for mangaclash.com

The page shows that since 2020 Google received takedown notices for more than 100,000 URLs on the site, and removed over 90% of them (including duplicates) for copyright infringement. It’s a strong signal Google considers the site to be a mass infringer of copyright. Now we have a reliable, repeatable approach to identify piracy sites. (Mangaclash.com did not respond to a request for comment.)
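
To apply the same test consistently across many sites, the delisting numbers can be recorded and checked against a fixed rule. The sketch below assumes the two counts are copied by hand from each site's Transparency Report page. The thresholds (at least 1,000 removed URLs and a removal rate of at least 50%) are hypothetical cut-offs chosen for illustration, not figures from the reporting, and the sample numbers are made up.

```python
from dataclasses import dataclass

@dataclass
class DelistingStats:
    domain: str
    urls_requested: int  # URLs named in takedown notices
    urls_removed: int    # URLs Google actually delisted

    @property
    def removal_rate(self) -> float:
        """Share of requested URLs that were removed (0.0 if none were requested)."""
        if self.urls_requested == 0:
            return 0.0
        return self.urls_removed / self.urls_requested

def looks_like_piracy(stats, min_removed=1_000, min_rate=0.5):
    """Hypothetical rule of thumb: many removals AND a high removal rate."""
    return stats.urls_removed >= min_removed and stats.removal_rate >= min_rate

# Made-up numbers on a placeholder domain.
big = DelistingStats("example-manga.test", urls_requested=100_000, urls_removed=92_000)
small = DelistingStats("example-blog.test", urls_requested=40, urls_removed=38)

print(big.domain, f"{big.removal_rate:.0%}", looks_like_piracy(big))
print(small.domain, f"{small.removal_rate:.0%}", looks_like_piracy(small))
```

Requiring both conditions keeps a site with a handful of successful takedowns from being flagged alongside one with tens of thousands.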

2. Confirm the presence of Google ads.

We built a tool to assist with this process, and you can read more about it in our methodology story. But it’s possible to manually identify Google ads without building a scraper.

Turn off your ad blocker, load a site and look at the ads on the page. For example, here’s a Nike ad I was served on a manga site. There is a blue triangle next to a blue “x” in the upper right-hand corner. Click on it.

Screenshot of Nike ad on manga piracy site, with blue triangle in the upper right hand corner
Note the blue triangle in the upper right corner of the Nike ad

If it’s a Google ad, clicking the triangle will show you something like this:

Image showing a Google ad disclosure

And/or take you to a page like this:

Screenshot of a webpage Google shows you if you click on the blue triangle in an ad
A Google ads info page

Now you know the ad was placed by Google. If you visit mangaclash.com now, you’re unlikely to see Google ads; it appears they were removed as a result of our story.
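
If you have a saved copy of a page's HTML, a rough first pass is to look for script or iframe tags loaded from hostnames commonly used by Google's ad tags. This is a sketch under assumptions, not the tool described in the methodology story: the hostname list is illustrative and incomplete, the sample HTML is made up, and finding a tag does not prove an ad was actually served. The manual check above remains the confirmation.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumed, incomplete list of domains commonly seen in Google ad tags.
GOOGLE_AD_DOMAINS = ("googlesyndication.com", "doubleclick.net", "googletagservices.com")

class AdTagFinder(HTMLParser):
    """Collect hostnames of <script>/<iframe> sources that match the list above."""

    def __init__(self):
        super().__init__()
        self.hits = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("script", "iframe"):
            return
        src = dict(attrs).get("src") or ""
        host = urlparse(src).hostname or ""
        # Match the domain itself or any subdomain of it.
        if any(host == d or host.endswith("." + d) for d in GOOGLE_AD_DOMAINS):
            self.hits.append(host)

def google_ad_hosts(html):
    finder = AdTagFinder()
    finder.feed(html)
    return finder.hits

# Made-up page source for illustration.
page = """
<html><head>
<script async src="https://pagead2.googlesyndication.com/pagead/js/adsbygoogle.js"></script>
<script src="//securepubads.g.doubleclick.net/tag/js/gpt.js"></script>
<script src="/static/app.js"></script>
</head><body></body></html>
"""
found = google_ad_hosts(page)
print(found)
```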

3. Quantify the site’s traffic.

So far we know the site contains pirated content and is making money from Google ads. But how popular is it? This is important because more traffic = more potential ad revenue.

SimilarWeb is my preferred tool for checking a site’s traffic. You get a snapshot of three months of data for free. That’s enough to get a sense of how large a site is, where its audience is based, and how the audience finds its way to the site (via search, social media, etc.).

When I checked SimilarWeb at the end of last year, it showed mangaclash.com had 7.3 million visitors in September. If you look now, it shows the site racked up over 9 million visits in November. Also note that SimilarWeb says people view an average of 9 pages per visit, and spend just over seven and a half minutes per session. That’s incredible engagement!

Screenshot of mangaclash.com's traffic statistics from SimilarWeb
Mangaclash.com’s stats on SimilarWeb

Thanks to SimilarWeb, we know this site is popular and could be showing lots of ads and earning revenue from them.
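
The two SimilarWeb figures can be multiplied to get a rough sense of how many pages, and therefore ad opportunities, that traffic represents. This is back-of-the-envelope arithmetic on SimilarWeb's estimates, using the rounded November numbers above. It says nothing about actual revenue, which depends on ad rates the data does not show.

```python
def estimated_page_views(monthly_visits: int, pages_per_visit: float) -> float:
    """Rough monthly page views: visits multiplied by average pages per visit."""
    return monthly_visits * pages_per_visit

# Rounded figures quoted above: about 9 million visits, about 9 pages per visit.
views = estimated_page_views(9_000_000, 9)
print(f"{views:,}")  # 81,000,000
```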

4. See which partners, if any, the site works with to get Google ads.

Here’s where Well-Known.dev comes in. This is one of the free tools I described in my previous newsletter. If you want to understand more about the data Well-Known.dev collects and displays, watch this video of a workshop I gave at the International Journalism Festival. For now, suffice it to say the site aggregates data from websites and digital ad systems and gives you an easy way to see their relationships and partners. Go sign up for a free account! It shows more data if you are logged in.

A search for mangaclash.com on Well-Known.dev returns a page listing a huge number of ad systems and partners the site appears to be working with. Our reporting was focused on Google, so we zeroed in on the Google info. Here’s what I saw when I looked at the Google section in late November (it’s different now):

Screenshot from well-known.dev showing a selection of the Google ad accounts used by mangaclash.com to earn money
Some of the PapayAds Google accounts used by Mangaclash.com to earn money

Each of the rows lists a different Google ad account and, if you’re lucky, the person or company associated with it, and their domain. These are the people or companies who have accounts in Google’s ad system and who mangaclash.com says it works with to get Google ads.
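
One public place where a site declares its ad partners is an ads.txt file, an IAB Tech Lab format with one comma-separated record per line: ad system domain, account ID, then DIRECT or RESELLER. The sketch below assumes declarations like the ones shown above can be read from such a file. This newsletter does not say exactly which files Well-Known.dev collects, and the names attached to accounts come from other sources this sketch does not cover. The account IDs and domains in the sample are made up.

```python
from typing import NamedTuple

class AdsTxtRecord(NamedTuple):
    ad_system: str     # e.g. google.com
    account_id: str    # the seller's account ID in that ad system
    relationship: str  # DIRECT or RESELLER

def parse_ads_txt(text):
    """Parse ads.txt content into records, skipping comments and variable lines."""
    records = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or "=" in line.split(",", 1)[0]:
            continue  # blank line, or a variable such as contact=...
        fields = [f.strip() for f in line.split(",")]
        if len(fields) < 3:
            continue  # malformed record
        records.append(AdsTxtRecord(fields[0].lower(), fields[1], fields[2].upper()))
    return records

def google_accounts(records):
    """Return the account IDs declared for google.com."""
    return [r.account_id for r in records if r.ad_system == "google.com"]

# Made-up file content for illustration.
sample = """# ads.txt sample with invented IDs
google.com, pub-1111111111111111, DIRECT, f08c47fec0942fa0
google.com, pub-2222222222222222, RESELLER, f08c47fec0942fa0  # via a partner
example-exchange.test, 12345, DIRECT
contact=ads@example.test
"""
records = parse_ads_txt(sample)
print(google_accounts(records))
```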

One thing stood out: most of these accounts belong to the same company/site, papayads.net. PapayAds works with publishers to maximize their ad revenue. And it has a lot of Google accounts.

Andrea De Donatis, the CEO of PapayAds, told me that he used fake names on many of the company’s Google accounts. I redacted them to avoid any unintentional overlap with real people. I also redacted info for companies other than PapayAds because we didn’t examine them in our reporting.

Thanks to Well-Known.dev we see PapayAds is the key Google partner for mangaclash.com. Does it work with other manga piracy sites?

5. See if those partners and/or ad IDs are connected to a larger network of piracy sites.

Yes it does! Let’s take another look at a few of the Google ad account entries from the mangaclash.com data:

Close up of screenshot from well-known.dev showing three of the Google ad accounts used by mangaclash.com

Each ad account has a number listed (in brackets) next to the magnifying glass icon. The first row shows that Well-Known.dev has identified 691 websites that say they work with PapayAds via that Google account ID. Click on the magnifying glass and Well-Known.dev will show you all 691 sites. Nice! You can repeat this for each ad account.

We grabbed the site lists for every PapayAds Google account we could find. Ruth Talbot, the data journalist I worked with on this series, combined the lists to identify common sites across the Google ad accounts. I looked through the list to identify sites with the word “manga” in their domain, or other indicators they could be manga piracy sites. I visited them and reran these steps:

  1. Identify if the site has pirated material.

  2. Confirm the presence of Google ads.

  3. Identify the size of the site’s traffic.
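
Combining the per-account site lists amounts to counting how many accounts each site appears under. Here is a minimal sketch of that idea. It is not the analysis code used for the series, and the account IDs and domains are placeholders.

```python
from collections import Counter

def sites_by_account_count(account_sites):
    """Map each site to the number of ad accounts whose list includes it."""
    counts = Counter()
    for sites in account_sites.values():
        counts.update(set(sites))  # set() so a duplicate within one list counts once
    return counts

# Made-up site lists, one per ad account.
account_sites = {
    "pub-1111111111111111": ["manga-a.test", "manga-b.test", "news.test"],
    "pub-2222222222222222": ["manga-a.test", "manga-c.test"],
    "pub-3333333333333333": ["manga-a.test", "manga-b.test", "manga-b.test"],
}

counts = sites_by_account_count(account_sites)
shared = sorted(site for site, n in counts.items() if n >= 2)
manga_shared = [s for s in shared if "manga" in s]
print(shared)
print(manga_shared)
```

Sites that show up under several accounts belonging to the same company are the natural ones to check first.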

Soon we had a spreadsheet with a sample of 50 manga sites that were working with PapayAds, attracted roughly 750 million visits in September, and had close to 2 million URLs delisted by Google for copyright infringement. Yet Google was placing ads on the sites, in apparent violation of its own policy against monetizing pirated content. There’s more about this scheme and about PapayAds in our story.
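
Totals like these come from summing two columns of the spreadsheet. The sketch below shows one way to do that from a CSV export. The column names and the three rows are invented for illustration and are not data from the story.

```python
import csv
import io

# Made-up export; column names are assumptions.
SHEET = """domain,visits_september,urls_delisted
manga-a.test,1200000,40000
manga-b.test,800000,15000
manga-c.test,500000,5000
"""

def summarize(csv_text):
    """Count the sites and total the visits and delisted-URL columns."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return {
        "sites": len(rows),
        "total_visits": sum(int(r["visits_september"]) for r in rows),
        "total_delisted": sum(int(r["urls_delisted"]) for r in rows),
    }

summary = summarize(SHEET)
print(summary)
```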

After we shared our findings, Google said it uses a combination of human oversight, automation and self-serve tools to protect ad buyers. It said it terminated PapayAds’ ad accounts and removed ads from many of the manga sites we identified.

PapayAds CEO De Donatis said he was not aware that sites his company worked with were filled with pirated content. He said he’s not responsible for the content of the sites, and that nearly all of the manga sites were approved by Google to receive ads before signing on with him.

A final note on methodology: My five-step workflow used free tools and did not require advanced coding or deep technical knowledge. It enabled us to uncover a previously unreported company helping manga piracy sites make money from Google ads, and used Google’s own data to show its ad business funnelled money to sites engaged in mass infringement. Read the resulting story here.


Thanks for reading. In case you didn’t notice, this newsletter is now on Substack, which means you can add comments. I’d love to hear from you!

Read previous issues of Digital Investigations here: digitalinvestigations.substack.com. If you enjoyed this newsletter, please encourage others to read and subscribe. It’s totally free and I have no plans to charge for a subscription.

© 2023 Craig Silverman