What is this?
This checklist is a reporting tool to help journalists and researchers find out who published a website. It is meant to be used alongside offline reporting techniques.
Following this checklist does not guarantee that you can unmask an owner of a website who does not want to be found, but it can help surface crucial clues and connections that can act as leads for further reporting.
🌟 Strong recommendation: while running through this checklist, create a data diary—it can be a TextEdit doc, a Google Doc, just the Notes app, whatever. It is important to be able to retrace your steps.
✍️ Are there any authors listed?
- If the site runs on WordPress, try this wildcard search on Google to reveal the author list: “https://yourwebsite.com/author/*/”
📫 Are there any email addresses or contact information?
- If there are email addresses, do those share the domain with the website?
- Does the email show up on haveibeenpwned.com?
- Check to see if there is a Gravatar associated with that address.
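One way to run the Gravatar check is to build the avatar URL yourself. The sketch below assumes Gravatar's documented scheme of hashing the trimmed, lowercased address (SHA-256 here; older guides use MD5). The function name and the sample address are illustrative only.

```python
import hashlib


def gravatar_url(email: str) -> str:
    """Build the Gravatar avatar URL for an email address."""
    # Gravatar hashes the address after trimming whitespace and lowercasing.
    normalized = email.strip().lower()
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    # d=404 asks for a 404 response instead of a placeholder image,
    # so a missing avatar is easy to tell apart from a real one.
    return f"https://gravatar.com/avatar/{digest}?d=404"


# Placeholder address; swap in the one found on the site.
print(gravatar_url("someone@example.com"))
```

Open the printed URL in a browser: an image suggests a Gravatar profile exists for that address, while a 404 suggests none does.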
🕑 What’s the server’s local time?
- Look at the datetime attribute of <time> elements on WordPress sites. The timestamp's GMT offset can reveal the time zone the site is set to:
<time class="updated" datetime="2022-03-04T10:21:40+06:00">March 4, 2022</time>
- Look at the
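The offset check can be scripted once you have a page's HTML saved. This is a minimal sketch using only the Python standard library; the class and function names are made up for illustration, and it only handles ISO 8601 timestamps like the one in the example above.

```python
from datetime import datetime
from html.parser import HTMLParser


class TimeTagParser(HTMLParser):
    """Collect the datetime attribute of every <time> element."""

    def __init__(self):
        super().__init__()
        self.stamps = []

    def handle_starttag(self, tag, attrs):
        if tag == "time":
            value = dict(attrs).get("datetime")
            if value:
                self.stamps.append(value)


def utc_offsets(html: str):
    """Return the UTC offset of each parseable <time datetime=...> value."""
    parser = TimeTagParser()
    parser.feed(html)
    offsets = []
    for stamp in parser.stamps:
        try:
            parsed = datetime.fromisoformat(stamp)
        except ValueError:
            continue  # skip formats this sketch does not understand
        if parsed.utcoffset() is not None:
            offsets.append(parsed.utcoffset())
    return offsets


sample = '<time class="updated" datetime="2022-03-04T10:21:40+06:00">March 4, 2022</time>'
for offset in utc_offsets(sample):
    print("Hours from GMT:", offset.total_seconds() / 3600)
```

An offset narrows things down to a band of time zones rather than a single place, so treat it as a lead, not proof.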
📡 Does the website have an RSS feed?
- Does the RSS feed give any additional information about authors / stories that aren’t visible on the site?
- You can pull RSS article links into Google Sheets using the IMPORTFEED function.
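If you would rather inspect the feed outside a spreadsheet, the sketch below parses RSS 2.0 XML with the Python standard library and pulls out the fields most likely to name an author. It assumes a standard RSS layout plus the common dc:creator extension that WordPress feeds tend to use; the sample feed is invented.

```python
import xml.etree.ElementTree as ET

# Namespace used by the Dublin Core <dc:creator> element.
DC_NS = {"dc": "http://purl.org/dc/elements/1.1/"}


def feed_items(rss_xml: str):
    """Return title, link, author and date for each <item> in an RSS feed."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        author = item.findtext("author") or item.findtext("dc:creator", namespaces=DC_NS)
        items.append({
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "author": author,
            "published": item.findtext("pubDate"),
        })
    return items


SAMPLE_FEED = """<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>Example</title>
    <item>
      <title>Hello</title>
      <link>https://yourwebsite.com/hello</link>
      <dc:creator>jdoe</dc:creator>
      <pubDate>Fri, 04 Mar 2022 04:21:40 +0000</pubDate>
    </item>
  </channel>
</rss>"""

for entry in feed_items(SAMPLE_FEED):
    print(entry)
```

Compare the author names and links that come back with what the site itself displays; anything that only appears in the feed is worth a note in your data diary.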
Features and functionality
- 🗞 Does the website have a newsletter?
- Check for the physical postal address—required by the CAN-SPAM Act in the US
- 💸 Does the website collect donations?
- 🛒 Does the website have an e-commerce store? Or, does it sell products?
- Try walking through the checkout process (without paying). Sometimes the real payee name is revealed just before you confirm the payment.
- 🔗 What domains does the website link to most? (Requires scraping)
- ❤️ Who links to the domain most often?
- Google search operator: “link:yourwebsite.com” (Google has deprecated this operator, so treat the results as incomplete)
- Check backlinks on ahrefs.com 💵
- Do the links have UTM codes?
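For the outbound-link and UTM questions, a rough sketch like the one below can tally which outside domains a saved page links to and which UTM parameters appear on those links. It works on HTML you have already downloaded; the function name and the sample markup are illustrative.

```python
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import parse_qsl, urlparse


class LinkParser(HTMLParser):
    """Collect the href of every <a> element."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def summarize_links(html: str, own_domain: str):
    """Count external link domains and (utm key, value) pairs in a page."""
    parser = LinkParser()
    parser.feed(html)
    domains = Counter()
    utm_values = Counter()
    for href in parser.hrefs:
        parts = urlparse(href)
        host = (parts.hostname or "").lower()
        # Relative links have no host; the site's own subdomains are skipped.
        if host and host != own_domain and not host.endswith("." + own_domain):
            domains[host] += 1
        for key, value in parse_qsl(parts.query):
            if key.lower().startswith("utm_"):
                utm_values[(key.lower(), value)] += 1
    return domains, utm_values


SAMPLE_PAGE = (
    '<a href="/about">About</a>'
    '<a href="https://shop.example.net/item?utm_source=newsletter&amp;utm_campaign=spring">Shop</a>'
    '<a href="https://shop.example.net/other">Other</a>'
    '<a href="https://www.yourwebsite.com/post">Post</a>'
)

domains, utm_values = summarize_links(SAMPLE_PAGE, "yourwebsite.com")
print(domains.most_common(5))
print(utm_values.most_common(5))
```

Run it over many pages and add the counters together to see which domains the site favors and whether the same campaign names recur elsewhere.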
Photos, images, and documents
- 📸 Are there author photos?
- 🔎 Do the images have EXIF data?
- Instructions here.
- 👀 Do the images have any other identifying information?
- Run through the list here
- 🪣 Where are the images hosted?
- If on AWS S3, the bucket name can be revealing—or you might find the bucket isn’t secure.
- 📄 Are there PDFs hosted on the site?
- On a search engine, try “filetype:pdf site:<yourwebsite.com>”
- If you find some, check the metadata with “Get Info” in your PDF viewer.
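On the image-hosting question, the bucket name can be read straight out of an S3 URL. The sketch below assumes the two common URL layouts (bucket in the hostname, or bucket as the first path segment); the patterns are approximate and the sample URLs are invented.

```python
import re
from urllib.parse import urlparse

# Bucket in the hostname, e.g. bucket.s3.amazonaws.com or
# bucket.s3.us-east-1.amazonaws.com
VIRTUAL_HOSTED = re.compile(r"^(?P<bucket>.+)\.s3[.-](?:[a-z0-9-]+\.)?amazonaws\.com$")
# Bucket in the path, e.g. s3.amazonaws.com/bucket/key
PATH_STYLE = re.compile(r"^s3[.-](?:[a-z0-9-]+\.)?amazonaws\.com$")


def s3_bucket_name(url: str):
    """Return the S3 bucket name in an image URL, or None if not on S3."""
    parts = urlparse(url)
    host = (parts.hostname or "").lower()
    match = VIRTUAL_HOSTED.match(host)
    if match:
        return match.group("bucket")
    if PATH_STYLE.match(host):
        first_segment = parts.path.lstrip("/").split("/", 1)[0]
        return first_segment or None
    return None


for image_url in (
    "https://acme-media.s3.amazonaws.com/img/a.jpg",
    "https://s3.us-east-1.amazonaws.com/acme-media/img/a.jpg",
    "https://yourwebsite.com/img/a.jpg",
):
    print(image_url, "->", s3_bucket_name(image_url))
```

Sites that front S3 with a CDN or a custom domain will not match these patterns, so a None result does not rule out S3.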
If there are any social media profiles mentioned on the site, they are worth investigating.
- 👤 Are there any social media accounts in the <meta> section of the HTML?
- 📅 When were the individual accounts created? Does it line up with the site history?
- 📊 What platform has the biggest reach?
- 📣 Is the messaging different across platforms?
- 📇 Do they have completely distinct account names across social media platforms or are they more or less the same?
- Note: just because you find the same account name across platforms doesn’t necessarily mean they belong to the same person!
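To check the <meta> section for social accounts without reading the source by eye, a short parser can list the tags that usually carry them. The set of tag names below is an assumption covering common Twitter card and Open Graph conventions, not an exhaustive list, and the sample markup is invented.

```python
from html.parser import HTMLParser

# Commonly used meta names/properties that point at social accounts.
# Illustrative, not exhaustive.
SOCIAL_KEYS = {
    "twitter:site",
    "twitter:creator",
    "article:publisher",
    "article:author",
    "fb:app_id",
    "fb:pages",
}


class SocialMetaParser(HTMLParser):
    """Collect the content of social-account <meta> tags."""

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        key = (attributes.get("name") or attributes.get("property") or "").lower()
        if key in SOCIAL_KEYS and attributes.get("content"):
            self.found.setdefault(key, []).append(attributes["content"])


def social_meta(html: str):
    parser = SocialMetaParser()
    parser.feed(html)
    return parser.found


SAMPLE_HEAD = (
    '<head><meta name="twitter:site" content="@examplehandle">'
    '<meta property="article:publisher" content="https://www.facebook.com/examplepage" />'
    '<meta name="description" content="Not a social tag"></head>'
)

print(social_meta(SAMPLE_HEAD))
```

Handles found this way sometimes differ from the accounts linked in the visible page, which is itself worth noting.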
On the Facebook profile, go to Page Transparency:
- ☎️ Is there an address and phone number for the page?
- ⏪ Does the page history reveal a different name?
- Has the page shifted topics?
- 🐣 When was the Facebook page created?
- Is the page running any groups?
- 🗳 Has the page run any ads? Has the page run political ads?
- 🤖 Does Facebook flag any “related pages” for the given page? Rely on Facebook’s algorithms to find connections!
On Twitter, the account might be part of a pod or network that boosts it. Using en.whotwi.com, it’s worth checking:
- 👯‍♀️ Who is the account engaging with?
- 🐦 What are the account’s tweeting patterns?
- #️⃣ What hashtags are associated with the account?
- Who were the account’s first follows / followers?
- Find this here: https://en.whotwi.com/
Don’t forget to check whether the site has accounts on YouTube, Instagram, Reddit, GitHub…
🗄 Have you archived the website? (You always should!)
- You can do this on archive.org or with their browser extension.
- You can grab the whole website from the Terminal with:
wget -mpEk <yourwebsite.com>
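Before archiving, it can help to see whether the Internet Archive already holds a copy. The sketch below assumes the Wayback Machine availability endpoint at archive.org/wayback/available and its JSON layout; the function names are illustrative, and only check_archive touches the network.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def availability_url(site: str) -> str:
    """Build the Wayback Machine availability query for a site."""
    return "https://archive.org/wayback/available?" + urlencode({"url": site})


def closest_snapshot(payload: dict):
    """Return (snapshot URL, timestamp) from a response, or None."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url"), closest.get("timestamp")
    return None


def check_archive(site: str):
    """Query the endpoint over the network and return the closest snapshot."""
    with urlopen(availability_url(site), timeout=30) as response:
        return closest_snapshot(json.load(response))


# Offline example using a response shaped like the endpoint's output.
sample_response = {
    "archived_snapshots": {
        "closest": {
            "available": True,
            "url": "http://web.archive.org/web/20220304000000/http://yourwebsite.com/",
            "timestamp": "20220304000000",
            "status": "200",
        }
    }
}
print(closest_snapshot(sample_response))
```

Call check_archive("yourwebsite.com") for a live lookup. A None result means no snapshot was reported, which is a good reason to save one yourself.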
🖥 What is the website using?
- Is it using WordPress, Squarespace, something else?
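A quick first pass at the platform question is to look for tell-tale strings in the page source. The sketch below reads the generator meta tag, which many content management systems emit, and checks a couple of path hints. The hint list is a rough assumption and easy to defeat, so tools like builtwith.com remain the more thorough option.

```python
from html.parser import HTMLParser

# Substrings that often appear in pages built on a given platform.
# Illustrative heuristics only.
PATH_HINTS = {
    "/wp-content/": "WordPress",
    "squarespace.com": "Squarespace",
}


class GeneratorParser(HTMLParser):
    """Collect the content of <meta name="generator"> tags."""

    def __init__(self):
        super().__init__()
        self.generators = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        is_generator = (attributes.get("name") or "").lower() == "generator"
        if tag == "meta" and is_generator and attributes.get("content"):
            self.generators.append(attributes["content"])


def platform_clues(html: str):
    """Return generator strings plus any matching path hints."""
    parser = GeneratorParser()
    parser.feed(html)
    clues = list(parser.generators)
    lowered = html.lower()
    for needle, platform in PATH_HINTS.items():
        if needle in lowered:
            clues.append(f"{platform} (found '{needle}')")
    return clues


SAMPLE_SOURCE = (
    '<meta name="generator" content="WordPress 6.1">'
    '<link rel="stylesheet" href="/wp-content/themes/example/style.css">'
)
print(platform_clues(SAMPLE_SOURCE))
```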
☁️ Where is it hosted?
- Is it on Google Cloud, AWS, Cloudflare, something else?
🪳 Are there any trackers present?
- You can check Blacklight to begin with.
🛍 How is the site monetized?
- Are there any affiliate links (Amazon, etc.)?
🧬 What are the various tracking identifiers, and are those shared with other domains?
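Tracking identifiers can be pulled out of saved page source with a few regular expressions and then compared across domains. The patterns below are approximations of common Google Analytics, Tag Manager, and AdSense ID formats, so expect some false positives and misses; the sample snippets are invented.

```python
import re

# Approximate patterns for common identifier formats.
ID_PATTERNS = {
    "google_analytics_ua": re.compile(r"\bUA-\d{4,10}-\d{1,4}\b"),
    "google_analytics_ga4": re.compile(r"\bG-[A-Z0-9]{8,12}\b"),
    "google_tag_manager": re.compile(r"\bGTM-[A-Z0-9]{4,9}\b"),
    "google_adsense": re.compile(r"\bca-pub-\d{10,20}\b"),
}


def tracking_ids(html: str):
    """Return the unique identifiers of each kind found in the HTML."""
    return {
        label: sorted(set(pattern.findall(html)))
        for label, pattern in ID_PATTERNS.items()
    }


def shared_ids(html_a: str, html_b: str):
    """Return identifiers that appear in both pages, grouped by kind."""
    ids_a = tracking_ids(html_a)
    ids_b = tracking_ids(html_b)
    shared = {}
    for label in ID_PATTERNS:
        overlap = sorted(set(ids_a[label]) & set(ids_b[label]))
        if overlap:
            shared[label] = overlap
    return shared


SITE_A = "<script>gtag('config', 'UA-12345678-1');</script> ca-pub-1234567890123456"
SITE_B = "<script>ga('create', 'UA-12345678-1', 'auto');</script>"

print(tracking_ids(SITE_A))
print(shared_ids(SITE_A, SITE_B))
```

A shared identifier suggests common ownership or a common vendor, but copied templates can carry IDs over by accident, so confirm with other evidence.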
Are there any relevant subdomains?
- Use Farsight Security DNSDBScout Flexible.
📜 Are there historic WHOIS records?
⌛️ Has the site changed over time?
- Look at archive.org to see whether the domain shifted tremendously—and if so, when.
🗑 Did the earlier version of the site have more information?
- People can remove info when a site’s been up for a while.
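To see when a page actually changed rather than clicking through captures one by one, the Wayback Machine's CDX listing can be queried for capture timestamps and content digests. The sketch below assumes the CDX server's JSON output, where the first row holds the field names; the function names and sample rows are illustrative, and only fetch_captures touches the network.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def cdx_url(site: str) -> str:
    """Build a CDX query returning timestamp and content digest per capture."""
    query = urlencode({
        "url": site,
        "output": "json",
        "fl": "timestamp,digest",
        "collapse": "digest",  # drop adjacent captures with identical content
    })
    return "https://web.archive.org/cdx/search/cdx?" + query


def content_changes(rows):
    """Return timestamps at which the captured content digest changed."""
    if not rows:
        return []
    header, *records = rows
    ts_index = header.index("timestamp")
    digest_index = header.index("digest")
    changes = []
    previous = None
    for record in records:
        if record[digest_index] != previous:
            changes.append(record[ts_index])
            previous = record[digest_index]
    return changes


def fetch_captures(site: str):
    """Download the capture list over the network."""
    with urlopen(cdx_url(site), timeout=60) as response:
        return json.load(response)


# Offline example shaped like the CDX JSON output.
sample_rows = [
    ["timestamp", "digest"],
    ["20190101000000", "AAA"],
    ["20190601000000", "AAA"],
    ["20200301000000", "BBB"],
]
print(content_changes(sample_rows))
```

The timestamps returned are the captures to open on archive.org first, since each marks a point where the page content differed from the capture before it.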
Resources & Tools
Open Source Intelligence Techniques – Michael Bazzell https://inteltechniques.com/book1.html
Verification Handbook – edited by Craig Silverman https://datajournalism.com/read/handbook/verification-3
- Blacklight: The Markup’s real-time website privacy inspector.
- builtwith.com: gives you the infrastructure of the site, including IP addresses, analytics codes, tech stack, etc. Freemium model.
- DNSDBScout: allows you to search and “flexible search” for passive DNS lookups including IP <-> domain mapping.
- Dnslytics: offers a range of tools including reverse Analytics and reverse DNS lookups, as well as WHOIS data. Freemium.
- RiskIQ: a “threat intelligence” tool that allows you to get reverse IP, reverse analytics, WHOIS, SSL, subdomains, etc.
- Whoxy: a tool that lets you see historical WHOIS registrations. Free.
- The Internet Archive browser extension.
Social Media Accounts
- Sensity AI: check if an image is GAN-generated or not. Freemium.
- whotwi.com: create a profile-at-a-glance for any account on Twitter. Free.
View this checklist on GitHub.
TOP IMAGE: Hana Joy