The New York Public Library may evoke 19th-century Manhattan, but the Beaux-Arts landmark is at the forefront of one of the biggest challenges of the 21st century: how to digitally store information forever.
“We were the first to take it on, on this scale,” says Canadian Barbara Taranto, managing director of the New York Public Library Labs. It’s a tricky undertaking even for this data-systems whiz with her degrees in computer science and digital information science. But by pairing technology with volunteer know-how, the library has scored huge successes.
For example, the library was given a massive collection of 40,000 handwritten restaurant menus spanning the 18th to 21st centuries.
Because the menus were too dense to be read by computers, last April the library created the “What’s on the Menu?” project, which asked volunteers to transcribe them. Crowd-sourcing, or harnessing collective wisdom, has been a godsend for institutions that want to digitize their collections.
By the end of January, 728,219 dishes had been transcribed from 11,876 menus.
Famous chef and entrepreneur Mario Batali has endorsed the project and described it as “a resource for future chefs, sociologists, historians and everyone who loves food. It’s not just ‘What’s on the Menu,’ it reveals so much more.”
Marine biologists want to know which summers saw plentiful oysters and which did not. Novelists want to know what their characters might have eaten in a New York restaurant in 1942. Chef Rich Torrisi has called it an inspiration in developing his own restaurant.
The library also has the largest collection of photographs in New York City, bigger than that of the Museum of Modern Art or the Metropolitan Museum of Art, and soon it will be crowd-sourcing those as well. Volunteers will be asked to help create the photos’ metadata, which makes the photographs searchable. This means that a 1930s photo of a mother and two children eating ice cream in Brooklyn will be found in the NYPL system when someone searches for “ice cream” or “baby pram” or even “stroller and children.” The user won’t have to know the names of the people, the location or the photographer.
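For readers curious how descriptive tags can make a photograph findable, here is a minimal sketch in Python. It is not the NYPL's system: the record IDs, the tags and the `build_index` and `search` functions are all invented for illustration, and it assumes a search term must match a volunteer's tag exactly.

```python
from collections import defaultdict

# Hypothetical volunteer-supplied metadata. The IDs and tags are invented
# for illustration and do not come from the NYPL catalogue.
photos = {
    "photo-001": ["ice cream", "baby pram", "stroller", "children", "Brooklyn", "1930s"],
    "photo-002": ["oysters", "restaurant", "Manhattan"],
}

def build_index(records):
    """Map each lowercased tag to the set of photo IDs that carry it."""
    index = defaultdict(set)
    for photo_id, tags in records.items():
        for tag in tags:
            index[tag.lower()].add(photo_id)
    return index

def search(index, query):
    """Return the sorted IDs of photos tagged with every term in the query.

    Terms are separated by the word "and"; each term must equal a tag exactly.
    """
    terms = [term.strip() for term in query.lower().split(" and ")]
    matches = [index.get(term, set()) for term in terms]
    return sorted(set.intersection(*matches))

index = build_index(photos)
print(search(index, "ice cream"))
print(search(index, "stroller and children"))
```

Both searches turn up the same Brooklyn photo, even though the searcher supplied no name, location or photographer. A real catalogue would likely also handle synonyms, misspellings and partial matches, which this sketch leaves out.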
Digital materials are the fastest-growing part of the library. The NYPL Digital Gallery has more than 700,000 images. Ms. Taranto calls it a “universe of content” in all formats, including archives and large media collections. NYPL’s site gets 28 million visits annually from more than 200 countries.
There’s pressure to digitize collections, which Ms. Taranto happily notes also reduces “the barrier to primary source materials.” Also, a digital surrogate “reduces the wear and tear on the physical object,” ensuring its longevity. But she asks, “How will disabled folks find what they need? What about non-English speakers?”
Then there’s managing the digital rights to old materials now newly available. Researchers may want online access to, say, author Jack Kerouac’s papers, and the heirs may agree, but they may not want the papers blasted across the World Wide Web.
Libraries are good at helping people find material, but the real trick is figuring out how to store it. The new standard for much of this work is the set of rules established by the Library of Congress. They are setting the course for the NYPL, many universities, government agencies and even other countries. But they’re not the only ones. Ms. Taranto notes that this is “an international conversation in the Western world.”
Since the technology to retrieve and maintain digital information is constantly evolving, every decision is an expensive one when dealing with information held in one of the biggest public library systems in the world, with 90 locations, three research centres, 16 million patrons and more than 50 million items.
When she started this process in 1997, Ms. Taranto says, “no single organization had taken on what the NYPL was taking on.”
A few preservation staff had experimented, but the process really geared up in 1999 with a grant from Atlantic Philanthropies. Ms. Taranto set a goal which she admits was “very ambitious”: to digitize 600,000 items over five years. Content was chosen with careful parameters: high research value, unique, rare, single source, complete collections. They had to build a metadata authoring tool, a digitization lab, and several content management systems to handle website development. At the height of the project, 40 people were on staff, she says. It took seven years and the results are on Digitalgallery.nypl.org and www.Inmotionaame.org.
In 2005 they joined the Library of Congress as a lead participant in the National Digital Newspaper Program, digitizing microfilm of local newspapers such as the New York World (1890-1910) and the New York Daily Sun (1890-1910). Currently there are about 100 people on NYPL’s digital staff.
Ms. Taranto has been in New York for more than 25 years and is a recognized expert in her field, which also makes her a valuable resource for local non-profit groups. She’s met with the NY Botanical Society, the NY Police Department archives and the Institute for the Blind. She tells them “how to get up to speed, how to do it right, how to make smart choices, how to spend [their] dollars.”
More than ever, libraries have one eye on the future and one on the past. Every day information arrives at the NYPL in digital form, but what to do with donated legacies that include outdated floppy discs or cassettes? How to play audio etched onto wax cylinders? Or store legacies held on a flash drive? Creating new backups and storing them is time-consuming, expensive and, frankly, daunting. Not for Ms. Taranto.
“We never saved everything anyway,” she dryly notes. “We are doing our best due diligence, we are taking this as seriously as we possibly can and we’re doing what our resources will allow us to do. Will we lose things? Of course. That’s the history of humanity.”