Key tools and approaches for using AI in OSINT and investigations
AI and LLMs pose significant risk to information integrity. But they also offer a lot of promise for OSINT and investigations. Here's a look at some encouraging areas.
1. Photo/Video analysis
Geolocation
Geolocation can be labor intensive. You might view a video frame by frame to look for landmarks and other location clues, or extract dozens of screenshots to run through reverse image search engines, among many other steps.
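The screenshot-extraction step can be partly scripted. Here's a minimal sketch, assuming you have ffmpeg installed and want one still every few seconds. The file names, the sampling interval, and the helper names are my own illustrative choices, not part of any tool mentioned here. The script only builds the ffmpeg commands; it doesn't run them.

```python
from pathlib import Path


def sample_timestamps(duration_s, every_s):
    """Return evenly spaced timestamps (seconds) that fall before the end of the video."""
    if duration_s <= 0 or every_s <= 0:
        raise ValueError("duration_s and every_s must be positive")
    stamps = []
    i = 0
    while i * every_s < duration_s:
        stamps.append(round(i * every_s, 3))
        i += 1
    return stamps


def ffmpeg_commands(video, timestamps, out_dir):
    """Build one ffmpeg command per timestamp; each command grabs a single JPEG frame."""
    out = Path(out_dir)
    commands = []
    for n, t in enumerate(timestamps):
        frame = out / f"frame_{n:04d}.jpg"  # hypothetical naming scheme
        commands.append([
            "ffmpeg",
            "-ss", str(t),        # seek to the timestamp before reading the input
            "-i", str(video),
            "-frames:v", "1",     # write exactly one video frame
            "-q:v", "2",          # high JPEG quality
            str(frame),
        ])
    return commands


# Hypothetical 10-second clip, one frame every 4 seconds.
for cmd in ffmpeg_commands("clip.mp4", sample_timestamps(10.0, 4.0), "frames"):
    print(" ".join(cmd))
```

Each printed command can be pasted into a terminal (create the output folder first), or passed to subprocess.run if you'd rather run everything from Python. The resulting stills are what you'd upload to a reverse image search engine.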
Unless you possess the memory and/or ability to travel the world like Rainbolt to learn about local highways, streetscapes, and landscapes, you probably can’t identify locations by sight alone. But AI can learn from Google Street View, OpenStreetMap, and other sources to potentially become proficient at geolocation.
GeoSpy is a free tool to identify the location of an uploaded photo. Here’s more on how it works. There’s also a plan for a paid version, GeoSpy Pro, which will analyze large sets of images and generate insights from the data.
What else? Try Picarta and this Geolocation Estimation tool. There’s also a model on HuggingFace that can analyze an image, though my initial testing found that it didn’t attempt geolocation. HuggingFace is also home to a model “that can answer questions about small details in high-resolution images.”
Facial recognition
Yandex’s image search tool is good at matching faces. Paid tools such as PimEyes, FaceCheck.id and the highly controversial Clearview AI are creepily effective. Face++ performs a variety of facial recognition functions.
Suggested reading:
“Finding people with their faces,” by Techjournalist
AI Detection
Warning: No AI detection tool will ever be 100% accurate or able to detect every model.
Image
Video
Text
Audio
Suggested readings:
“How to Identify and Investigate AI Audio Deepfakes, a Major 2024 Election Threat,” by Rowan Philp for GIJN
“How to think about deepfakes and emerging manipulation technologies,” by Sam Gregory for the Verification Handbook
2. Search
Investigator/trainer Henk van Ess launched AI Search Whisperer, a tool that uses ChatGPT 4.0 to suggest Google Dorks. Here’s what Henk says about it:
The tool forecasts patterns in texts and integrates Google Dorks into its functionality and explains why it did it. It is not build in the normal ChatGPT. I choose the best of the best, GPT-4 Turbo with 128K memory. It can read and understand a longer piece of text in one go. This allows it to provide more relevant and coherent responses.
Along with Henk’s tool, you can test DorkGPT from PredictaLab.
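If you haven't used Google Dorks before, it helps to see what these tools are generating. Here's a rough sketch that assembles a query from a few standard Google operators (quoted phrases, site:, filetype:, intitle:, and the minus sign for exclusions). The function names and example values are my own; this isn't how AI Search Whisperer or DorkGPT work internally.

```python
from urllib.parse import quote_plus


def build_dork(phrase=None, site=None, filetype=None, intitle=None, exclude=()):
    """Assemble a Google query string from common search operators."""
    parts = []
    if phrase:
        parts.append(f'"{phrase}"')          # exact-phrase match
    if site:
        parts.append(f"site:{site}")         # restrict results to one domain
    if filetype:
        parts.append(f"filetype:{filetype}") # restrict results to one file type
    if intitle:
        parts.append(f'intitle:"{intitle}"') # phrase must appear in the page title
    for term in exclude:
        parts.append(f"-{term}")             # drop results containing this term
    return " ".join(parts)


def search_url(query):
    """Turn a query string into a Google search URL."""
    return "https://www.google.com/search?q=" + quote_plus(query)


# Hypothetical example: PDF reports on a placeholder domain.
dork = build_dork("annual report", site="example.com", filetype="pdf", exclude=["draft"])
print(dork)
print(search_url(dork))
```

The value of the AI tools is in picking the right operators and keywords for your question. The string itself is simple once you know what you're after.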
3. Research and Analysis
Pattern/Entity Recognition
There’s a big opportunity to use AI to identify objects and patterns in maps, images and videos, as well as in large datasets. Christiaan Triebert of the New York Times shared a powerful example of how the paper used Picterra to assist with an investigation into bombings by the Israeli military in Gaza:
Document/Data Analysis
This isn’t exactly a new application of AI. Journalists and investigators have been using machine learning to analyze datasets for a long time, relatively speaking. For example, in 2021 my colleagues used machine learning to help analyze and classify a large dataset of Facebook groups and their content:
What’s new is how easy LLMs like ChatGPT make it to collect, analyze and generate insights from datasets. There’s a cottage industry of custom GPTs built to assist with document and data analysis (you need a premium subscription to ChatGPT to use these):
EmbedAI lets you upload web links, PDFs, and text documents and train a chatbot on them.
Data Analyst says you can “Drop in any files and I can help analyze and visualize your data.”
PDF Insight generates “insights and summaries of PDF documents uploaded by users.”
PDF.ai does the same.
OSINVGPT helps you pursue open-source investigations.
Consensus can search across 200 million academic papers to “get science-based answers, and draft content with accurate citations.”
AI OSINT assists with open-source research. You can ask it to help investigate a domain, email, name etc. The same developer also created Cylect.io.
Fintool is a custom GPT for searching company data in the SEC’s EDGAR database.
You can also try Claude AI, which has a free beta but is only available in certain countries. And Google recently announced new AI-based features in Pinpoint, its document analysis tool:
First, we’re announcing new generative AI features that help reporters evaluate documents or a collection of documents by asking questions to better understand key points. For example, if a reporter was looking at a collection of historical documents, they could ask Pinpoint questions to help them better understand what’s in them, such as their key points and main themes.
Pinpoint also has a new feature that combines data tables from different documents into a single spreadsheet.
Suggested readings/resources:
Mike Reilly of Journalist’s Toolbox AI has a page listing custom GPTs and lists of AI search (and SEO) tools and AI browser plugins. Reilly is giving a series of workshops on using AI in journalism. They’re free for ONA members and $25 for non-members. Details here.
Nico Dekens (aka Dutch OSINT Guy) wrote, “Using AI for extracting Usernames, Emails, Phone Numbers, and Personal Names from large datasets.”
I highly recommend reading Generative AI in the Newsroom.
Filipino journalist Jaemark Tordecilla recently wrote about how he “created an AI tool to help investigative journalists find stories in audit reports.”
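For comparison with the AI-assisted extraction covered in the readings above, here's a plain regex baseline for pulling emails, @handles, and phone numbers out of a block of text. To be clear, this is not the method from any of those articles. It's a simple non-AI sketch, and the patterns are deliberately loose assumptions that will miss some formats and produce some false positives.

```python
import re

# Loose patterns chosen for illustration; tune them for your own data.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# An @handle that is not the domain part of an email address.
HANDLE_RE = re.compile(r"(?<![\w@.])@([A-Za-z0-9_]{2,30})\b")
# Optional leading +, then digits mixed with spaces, dots, dashes, or parentheses.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def extract_entities(text):
    """Return de-duplicated, sorted emails, handles, and phone numbers found in text."""
    return {
        "emails": sorted(set(EMAIL_RE.findall(text))),
        "handles": sorted(set(HANDLE_RE.findall(text))),
        "phones": sorted(set(PHONE_RE.findall(text))),
    }


# Made-up sample text.
sample = "Contact jane.doe@example.org or @jane_osint, phone +1 (555) 010-9999."
print(extract_entities(sample))
```

Regexes like these are fast and predictable, but they can't recognize personal names or messy free text. That gap is where an LLM can help.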
Transcription/Translation/Summarization
AI is pretty good at transcription and summarization. Here’s a list of YouTube summarization tools. There’s also a custom ChatGPT called Video Insights that does video and audio summarization and transcription. Transcription tools such as Otter and Whisper also use AI to extract entities, tags, and themes from transcripts.
What did I miss? Please share your thoughts in the comments.
That’s it for this edition of Digital Investigations! Thanks for reading. You can find me on Threads, Bluesky, Mastodon, and LinkedIn. I’m not very active on Twitter these days.