Tools and tips round up: Goodbye Google cache and LinkedIn X-ray search
It's not all bad news! Learn how to investigate live streaming platforms, work with whistleblowers, and more.
LinkedIn X-ray search is what sourcing experts like Irina Shamaeva call the ability to use search engines to search for LinkedIn profiles using the “site:” operator. Here’s a basic example of an X-ray search to find people at ProPublica who might report on health:
site:linkedin.com/in “propublica” “health”
Plug the above query into Google or Bing and the results will show LinkedIn profiles that contain the keywords “propublica” and “health” in their About or Experience sections. This is different from LinkedIn’s native search, which has a field to search for keywords in someone’s job title.
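If you run a lot of these, it can help to build the query in code. Here's a minimal Python sketch of how the query above could be assembled and turned into a search URL. The function names `build_xray_query` and `build_search_url` are my own illustrative choices, not part of any tool mentioned here, and the sketch uses straight quotes for the exact-phrase matches. It only builds strings and doesn't fetch or scrape anything.

```python
from urllib.parse import quote_plus

# Hypothetical helpers for illustration only; the names are assumptions.

def build_xray_query(keywords, site="linkedin.com/in"):
    """Return a site:-restricted query with each keyword as a quoted phrase."""
    quoted = " ".join(f'"{kw}"' for kw in keywords)
    return f"site:{site} {quoted}"

def build_search_url(query, engine="google"):
    """URL-encode the query and append it to the chosen engine's search URL."""
    bases = {
        "google": "https://www.google.com/search?q=",
        "bing": "https://www.bing.com/search?q=",
    }
    return bases[engine] + quote_plus(query)

query = build_xray_query(["propublica", "health"])
print(query)
print(build_search_url(query, "bing"))
```

Paste the printed URL into a browser to run the search. Swapping the keyword list is all it takes to try a different newsroom or beat.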
Concerns about the state of X-ray search recently emerged in an article by Marcel van der Meer of Sourcing Training. He demonstrated that LinkedIn appeared to be restricting the amount of profile information available via search engines. Shamaeva dug in and came up with an equally concerning lack of results.
I reached out to LinkedIn to get some clarity. Is this temporary or did LinkedIn change the information it allows search engines to index? I’m sorry to say it’s the latter. Here’s a statement I received from LinkedIn:
Unauthorized data scraping is getting more sophisticated. It occurs when personal information is taken by unauthorized third parties from your social network profiles and used in ways you didn’t agree to. On LinkedIn, we regularly test new ways to help keep the control of your data where it belongs: in your hands. Read more about our latest efforts here.
Here’s a relevant passage from a LinkedIn help center article that was updated late last year:
In order to protect our members' data and our website, we don't permit the use of any third party software, including "crawlers", bots, browser plug-ins, or browser extensions that scrape, modify the appearance of, or automate activity on LinkedIn's website.
X-ray as we knew it appears to be dead.
As noted above, LinkedIn has a fairly powerful native search tool with lots of filters. But X-ray searching is more flexible and turned up profiles that LinkedIn native search missed. So, yeah, it’s bad news.
Last thing: If you’re a journalist, I strongly encourage you to take advantage of LinkedIn’s offer of a free Premium account. Just sign up and attend the upcoming LinkedIn for Journalists webinar and you’ll receive a code for a profile upgrade. It lasts for a year.
Here’s another reminder of the fickle nature of online tools/platforms: Google cache is dead.
Google search liaison Danny Sullivan announced on Twitter that the ability to view a cached version of a webpage has been removed from search results.
Sullivan said cache “was meant for helping people access pages when way back, you often couldn't depend on a page loading. These days, things have greatly improved. So, it was decided to retire it.”
It’s strange, and disappointing, that Google killed a longtime feature just because it no longer serves its original purpose. Cache has clear utility as an archive. But Google doesn’t value that enough to keep it.
What to do? Here’s a quick list of other online archives from Nico Dekens and a reminder from Micah Hoffman that Bing and Yandex still cache pages. The recent edition of the OSINT Newsletter also highlighted WayMore, a tool that can search across multiple services that archive URLs.
Latest from me
Forgive the self plug, but my new investigation looks at how more than $1 billion was stolen from fraud victims and routed through Walmart stores. Peter Elkind and I interviewed more than 50 people, dug into court and public records, and obtained exclusive internal docs to reveal how Walmart has long facilitated consumer fraud on a mass scale.
Read the story to learn about China-based gift card laundering rings, money transfer fraud, and how crooks continue to use Walmart gift cards to steal money from unsuspecting consumers.
I’m always happy to receive story tips and suggestions. Anything you send is just between us and doesn’t need to be detailed. I love to hear from people so please email! You can also reach out and ask for my Signal # if you’re more comfortable going that route.
Free virtual events
🖥️ The Tow Center for Digital Journalism at Columbia University is hosting “Working with whistleblowers: What journalists should know” on Feb. 15 at 4 pm ET. From the event page: “Panelists will discuss their experience blowing the whistle, including the aftermath and perspectives on working with journalists.”
🖥️ Authentic8 is hosting “OSINTUp: a virtual skill-sharing event” on Feb. 22 at 11 am ET. It “will bring together some of the leading OSINT experts to share tips, insights and resources for researchers.”
🖥️ SkopeNow is hosting “OSINT for Live Streaming Platforms” on Feb. 22 at 11 am ET. (I’m sad that these are at the same time!) You will learn “Techniques to gather and analyze intelligence from live streaming content; Third-party tools for channel analytics and downloading videos; Workflows for navigating Twitch and fundamental elements of the platform.”
Tools
📍 Paul Myers of the BBC highlighted Tiny Scan, “a really useful, simple to use tool that takes a technical look at a webaddress.”
📍 GPTZero is a tool that analyzes text to see if it may have been generated using one of several AI tools.
📍 A Bellingcat community member created a ChatGPT custom prompt called OSINVGPT that’s “trained on open source investigation guides and cases to help you out with your open source investigations.”
📍 Henk Van Ess created a ChatGPT custom prompt “that helps you to come up with the right Google Dorks. No subscription to @openai is needed.”
📍 Also shared: a list of tools you can use to view Instagram and TikTok accounts without being logged in, a nice list of Twitter/X advanced search operators maintained by Igor Brigadir, and a browser plugin you can use to review top-performing tweets from an account, among other data.
📍 Ismael/GONZOsint created fingerprinted, a web app (and GitHub repo) that “presents a variety of identifiers created through different methods, offering detailed insights into your browser's unique digital profile.” Nice quick way to test your level of browser privacy.
📍 Sarah Wormer wrote a case study of how you can use FouAnalytics, a free tool, to analyze the ad trackers, external links, and other details of a website.
📍 Google rolled out new AI-powered features in Pinpoint. It announced “new generative AI features that help reporters evaluate documents or a collection of documents by asking questions to better understand key points.” Pinpoint can also extract data tables from multiple documents and combine them into a single spreadsheet. Finally, Google launched a free online lesson, “Introduction to AI for Journalists.”
Worth reading
📚 Meta announced it will label AI-generated images on Facebook, Instagram and Threads. It’s already in place for images created with Meta’s AI tools, and the company said it plans to support labeling images generated with tools from Google, OpenAI, Microsoft, Adobe, Midjourney, and Shutterstock. Read this Wired article about some of the limitations of Meta’s approach. (I’ll have more on AI and OSINT in an upcoming edition of this newsletter.)
📚 The Markup and Consumer Reports published, “Each Facebook User is Monitored by Thousands of Companies.” Really cool how they recruited volunteers to download their own Facebook data in order to enable the analysis.
📚 Faine Greenwood and Konrad Iturbe published, “Identifying Small Drones from Screenshots and Displays.”
📚 The Global Investigative Journalism Network published a breakdown of how reporters and researchers “used 3D modeling, satellite imagery, vessel tracking data, and official sources” to reconstruct the Pylos shipwreck.
📚 Andrew Keh and Stuart Thompson of the New York Times wrote a piece about online “obituary pirates” and how their hunger for web traffic can flood search results with false information.
📚 Alberto Fittarelli of Citizen Lab wrote “PAPERWALL: Chinese Websites Posing as Local News Outlets Target Global Audiences with Pro-Beijing Content.” (Nice passive DNS work here!)
📚 Ryan McGrady wrote “What We Discovered on ‘Deep YouTube’” for The Atlantic.
📚 Kate Knibbs wrote a piece for Wired about how women’s website The Hairpin became an AI clickbait farm after the site was bought by a Serbian DJ/web entrepreneur. The same DJ was the subject of a 2019 BuzzFeed News article by Dean Sterling Jones (which I edited). It detailed how the DJ bought women’s website The Frisky and turned it into “a vampire website feeding off the property’s former popularity and brand name to sell pay-for-play articles in order to influence search engine rankings.”
📚 “A 1-minute way to geolocate road signs that show the distance to the nearest cities.”
📚 Researchers at New York University, Stanford, and the University of Central Florida published an article in Nature, “Online searches to evaluate misinformation can increase its perceived veracity.”
That’s it for this edition of Digital Investigations! Thanks for reading. You can find me on Threads, Bluesky, Mastodon, and LinkedIn. I’m not very active on Twitter these days.