Unless you like to look at old websites or video games, you probably don’t think much about the Internet Archive, a project founded by tech pioneer Brewster Kahle in 1996 in an ambitious attempt to back up as much of the internet as possible. But the site and its efforts are becoming more and more important, as links rot and authoritarian governments around the world crack down on free speech. To take just one example, the Archive is working on backing up as many Brazilian websites as it can, after a number of requests from people worried about the impact the new government of Jair Bolsonaro could have on certain kinds of information. In a tweet this week, Jason Scott, who works as a curator for the Archive, put a call out to anyone with Brazilian content, asking them to either upload it directly to the Archive or mail hard drives to the San Francisco-based organization.
I don't speak Portuguese, so spread as you see fit:
If you have Brazilian culture, content or data, please upload it directly to the Internet Archive, or mail hard drives out to us to store your material. An incredible amount of material is going to be lost. We will host it.
— Jason Scott (@textfiles) January 2, 2019
The Archive has software programs or “bots” that continually crawl and index hundreds of millions of websites every day, in much the same way Google does, and users can submit suggestions for more immediate backups via the Archive’s site (which includes the Wayback Machine, a search engine for old or missing web content). In an interview with CJR, Scott says he also periodically reaches out to a variety of different organizations and groups in order to archive important web content. And over the past month or so, Scott said about a dozen people had reached out to him personally, asking the Archive to back up as much of the Brazilian internet as possible, because of concerns that Bolsonaro’s government might remove or censor news sites and other important resources.
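For readers who want to check whether the Wayback Machine already holds a copy of a page, the lookup can be sketched in a few lines of Python. This is a minimal sketch, not something described in the interview: it assumes the Archive's public availability endpoint at archive.org/wayback/available and the JSON shape that endpoint is documented to return, and the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the Wayback Machine's public availability endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(page_url, timestamp=None):
    """Build the query URL for a page, optionally near a YYYYMMDD timestamp."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(payload):
    """Return the URL of the closest archived copy, or None if there is none.

    Assumes a response shaped like:
    {"archived_snapshots": {"closest": {"available": true, "url": "..."}}}
    """
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup(page_url, timestamp=None):
    """Ask the Archive over the network and return a snapshot URL or None."""
    with urlopen(availability_url(page_url, timestamp), timeout=30) as response:
        return closest_snapshot(json.load(response))
```

Calling `lookup("example.com", "20190102")` would request the snapshot closest to the date of Scott's tweet; whether anything comes back depends on what the crawlers have captured.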
Freedom House, which ranks countries based on the amount of freedom they provide for speech and information, said Brazil’s ranking declined last year, in part because of government restrictions on content referring to political candidates (the country’s Congress passed a law last year that requires social-media platforms to immediately remove any anonymous content that is deemed to be offensive or defamatory). Much like the Trump campaign and administration have in the United States, the government of Bolsonaro has attacked journalists and the media for being “fake news,” and the aid group Reporters Without Borders has expressed concern about what kinds of action a Bolsonaro government might take against news organizations and journalism in general.
Of course, even if the Internet Archive backs up Brazilian content and websites, the government could easily block access to the Archive’s site within Brazil if it wanted to, or it could launch legal action in the United States to have content removed from the Archive—orders that the non-profit organization would likely be forced to honor. But in the meantime, providing a backup of news and other content at least allows for it to be seen, even if the sites in Brazil that are currently hosting it decide to remove it for legal or other reasons.
It’s not just countries like Brazil that need to worry about content disappearing. The Archive’s team of volunteers (which you can join) is currently busy trying to back up all the adult content on Tumblr, after its new owner said it was cracking down on such material. And after tech billionaire Peter Thiel successfully bankrupted Gawker Media by financing a $140-million defamation lawsuit on behalf of former wrestler Hulk Hogan, there was widespread fear Thiel or some other buyer might delete the archives. But the Archive formed a partnership with the Freedom of the Press Foundation to create a service designed specifically for endangered news outlets like Gawker. The preamble for the collection says that it “focuses on news outlets we deem to be especially vulnerable to billionaire problem, aiming to preserve sites in their entirety before their archives can be taken down or manipulated.” The service also captured the archives of LA Weekly, after the magazine was acquired by a then-unknown group of financial backers with unknown motives.
Parker Higgins, who helped set up the Archive partnership while at the Freedom of the Press Foundation, also created a Twitter bot called @LinkArchiver in 2017 that used the Archive to automatically back up any links tweeted by accounts that it followed. But Higgins was forced to shut the service down earlier this year after Twitter turned off the API it used (software that allows third-party services to extract data from Twitter automatically). In the year or so that it was active, Higgins said the automated archiver backed up more than 7 million links from over 9,000 users. Although the Twitter bot no longer functions, there is a site called Save My News—created by data journalist Ben Welsh in partnership with the Internet Archive and a number of other archiving solutions, including Archive.is and WebCite—that allows anyone to enter a URL and have the site automatically backed up by the Archive.
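The idea of entering a URL and having it backed up can be sketched as a small Python script. How Save My News itself works is not described here, so this is only an illustration: it assumes the Wayback Machine's "Save Page Now" convention of prefixing a page address with web.archive.org/save/, and the function names and User-Agent string are hypothetical.

```python
from urllib.parse import urlsplit
from urllib.request import Request, urlopen

# Assumption: the Wayback Machine's "Save Page Now" URL prefix.
SAVE_PREFIX = "https://web.archive.org/save/"


def save_request_url(page_url):
    """Return the address that asks the Archive to capture page_url."""
    parts = urlsplit(page_url)
    # Only absolute web addresses make sense as capture targets.
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError("expected an absolute http(s) URL: %r" % page_url)
    return SAVE_PREFIX + page_url


def request_capture(page_url):
    """Send the capture request over the network and return the HTTP status."""
    request = Request(
        save_request_url(page_url),
        headers={"User-Agent": "archive-sketch/0.1"},  # hypothetical client name
    )
    with urlopen(request, timeout=60) as response:
        return response.status
```

A bot like the one Higgins describes would, in principle, run something like `request_capture` on every link it saw; a real service would also need rate limiting and error handling, which this sketch leaves out.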
Protecting the archives of news and other websites from angry billionaires and state governments is also one of the features promoted by Civil, the blockchain-powered platform for journalism that has launched more than a dozen independent newsrooms (as well as an ill-fated cryptocurrency token sale). Maria Bustillos, who runs a Civil-based magazine called Popula, talked to CJR last year about how the blockchain can help maintain a website’s archives, since all of the information in an article can be distributed along with the blockchain’s decentralized ledger of financial data. Just before Christmas, Bustillos archived what she said is the first ever news story to be automatically protected from deletion by being included in the Ethereum blockchain, which Civil is based on. In addition, Bustillos said the story is also backed up via the so-called Interplanetary File System or IPFS, an open-source attempt to create a peer-to-peer file storage and sharing system for media.
Mathew Ingram is CJR’s chief digital writer. Previously, he was a senior writer with Fortune magazine. He has written about the intersection between media and technology since the earliest days of the commercial internet. His writing has been published in the Washington Post and the Financial Times as well as by Reuters and Bloomberg.