Technology

QuickSort

Where the Globe divides the high from the
low elements of Internet and digital culture

The race to archive Twitpic before 800 million pictures vanish

Right now, a collective of Internet archivists and programmers is trying to do the impossible: save more than 800 million pictures uploaded to the Twitter photo-sharing service Twitpic before they disappear down the memory hole after the company’s scheduled shutdown on October 25.

For this group of digital librarians, saving a bunch of strangers’ pictures is about keeping alive a key piece of our digital culture.

“With hundreds of millions of photos at stake – going back to the earliest days of Twitter – we recognized it as a vital part of online history,” said Jason Scott, a member of the Archive Team.

“With the growth of Twitter came the growth of the Twitpic service, and it was used as a way to show images both newsworthy and socially relevant across the last seven years,” said Scott, calling the potential loss of those pictures “unbelievable.”

As the Archive Team is slowly attempting to download the entire cache of photos and comments, it is faced with yet another obstacle: Twitpic blocked the Archive Team from accessing its website directly.

Twitpic is letting individual users export the pictures they uploaded; however, many are reporting bugs and long delays in the process.

“There are still many cases of people downloading .zip files and finding that they are empty or corrupted,” said Scott, adding that many users are not even aware of the upcoming shutdown.

Some individuals even offered to pay for the storage and transfer costs, but both they and the Archive Team were ignored by Twitpic founder Noah Everett.

Everett did not respond to The Globe’s interview requests.

Twitpic is just the latest example of a website filled with user-uploaded content getting shut down before any archives are made.

The Archive Team collective started in 2009. Soon after, Yahoo! announced it was shutting down Geocities, a service allowing Yahoo users to host their own website for free.

While a lot of websites hosted on the Geocities platform are by today’s standards old and ugly, they give a glimpse into the early days of the Internet, when millions of people suddenly discovered it.

“What we were facing, you see, was the wholesale destruction of the still-rare combination of words and digital heritage, the erasing and silencing of hundreds of thousands of voices, voices representing the dawn of what one might call ‘regular people’ joining the World Wide Web,” wrote the collective at the time.

“A surprising amount of people came forward to help, everyone from coders and archivists through designers and supporters,” said Scott.

“Over the last five years, we’ve been involved in hundreds of smaller projects and dozens of larger ones to provide at least some record of websites that are going away.”

In 2010, the Archive Team managed to grab 10 terabytes (TB) of data from Friendster, a precursor to Facebook created in 2002.

More recently, they retrieved 8 TB from Google Reader accounts after Google decided to shut down the RSS feed reader.

Keeping archives is tremendously important, says Marie-Pierre Aubé, Director of Records Management and Archives at Concordia University in Montreal. “It explains where we’re coming from and how our society works.”

But as the amount of data generated every day has increased exponentially, archivists are finding themselves constrained by the resources available.

“In one day we’re creating as much data as was created in one year during the 1900s,” Aubé says.

As the Archive Team celebrates its fifth birthday, only a minority of companies are being proactive or allowing their users to export their own archives.

Ultimately, it may be up to national institutions to do the work of archiving history, says Aubé.

In 2010, Twitter announced it had started to give the U.S. Library of Congress the full archive of public tweets.

“We can’t ask a private company that is shutting down to keep their archives (online) indefinitely,” says Aubé. “But we can ask them to send those to the proper institution.”

For Scott, last-minute scrambles to save disappearing sites have been a feature since the dawn of the Web in the 1990s.

“I wish things were different now, but they appear to be very much the same,” he says. “Except for the case that kilobytes of data lost have given way to gigabytes or terabytes.”
