| INNODATA ISOGEN CASE STUDY |
SMITHSONIAN COLLECTION OPENS VIRTUAL DOOR ON AMERICA'S FIRST SCIENTIFIC EXPEDITION

ACCESS TO DATA FROM PACIFIC VOYAGES FOSTERS MODERN RESEARCH AND BURNISHES SMITHSONIAN'S IMAGE

CHALLENGE

The U.S. Exploring Expedition of the Pacific, led by Captain Charles Wilkes from 1838 to 1842, produced a veritable ocean of data. The leading scientists and artists of the day sailed on a mission to collect, preserve and document anything of value to natural historians throughout the Pacific Ocean. They logged volumes of notes and drawings, collecting nearly 2,400 anthropological artifacts and 50,000 plant specimens. Crisscrossing the Pacific, Wilkes's expedition established that Antarctica is a continent, mapped South America's coast and the Columbia River basin, charted several Pacific Island groups and researched Hawaii's volcanoes. The accuracy of the maps helped guide U.S. forces in the Pacific to victory during World War II.

Although the expedition has been largely forgotten, the volume of data was staggering: five volumes of narrative descriptions, 15 volumes of published scientific and anthropological documents, plus four additional volumes that had never been published. In all, the Smithsonian received 1,600 pieces in 1858. Now, the Smithsonian wanted to make these 160-year-old records of flora, fauna, geography and meteorology available to modern researchers through its Galaxy of Knowledge portal.

SOLUTION

Because much of the vast collection required labor-intensive transcription and document linking, the Smithsonian knew that it needed to partner with an offshore content services provider to create the digital archive. Moreover, the documents needed to be converted with a high level of accuracy to ensure the material's usefulness to scholars. Given those requirements, partnering with Innodata Isogen, a leader in digitizing content, was a logical choice.

IMPLEMENTATION

Each page of the printed volumes was scanned and processed with optical character recognition (OCR) software, and the Innodata Isogen team checked the resulting text files against the originals to ensure the accuracy of the conversion. The text files were then converted into accessible XML data files. Data elements within the pages were tagged and coded so that the information could be matched to the document type definition (DTD) that the Smithsonian has established for its on-line content. Photos of more than 2,000 artifacts and hundreds of pages of drawings and illustration plates were digitized, tagged and coded using the Smithsonian's DTD so that the entire collection could be searched with key words.

Throughout the process, Smithsonian scholars checked each page and illustration for accuracy and established an order for the on-line presentation of the collection. Drawing on the collection and other resources, they also developed descriptions of the sailing vessels and the crew of more than 600 sailors and scientists.
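To make the conversion step concrete, the sketch below shows one way a proofread page of optical character recognition (OCR) text might be wrapped in XML and tagged with key words. It is an illustration only: the element names (`page`, `p`, `keywords`, `term`), the `page_to_xml` helper and the sample text are all hypothetical, and none of them comes from the Smithsonian's actual DTD or from Innodata Isogen's tooling.

```python
import xml.etree.ElementTree as ET


def page_to_xml(volume, page_number, text, keywords):
    """Wrap one proofread page of OCR text in a small XML record.

    Element and attribute names are invented for this sketch; a real
    project would use the names its DTD requires.
    """
    page = ET.Element("page", attrib={"volume": str(volume), "n": str(page_number)})

    # One <p> element per blank-line-separated paragraph of the page text.
    for para in text.split("\n\n"):
        para = " ".join(para.split())  # collapse OCR line breaks and stray spaces
        if para:
            ET.SubElement(page, "p").text = para

    # Key-word tags are the kind of data a search engine could later index.
    keyword_list = ET.SubElement(page, "keywords")
    for word in keywords:
        ET.SubElement(keyword_list, "term").text = word

    return page


# Placeholder input, not text from the expedition volumes.
sample_text = "First paragraph of\nproofread text.\n\nSecond paragraph."
record = page_to_xml(1, 12, sample_text, ["sample", "page"])
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Python's standard library builds and parses XML but does not validate it against a DTD, so a real pipeline would need a separate validation step before the files were published.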
| CHALLENGE | Provide scholars and the general public with ready access to records from the first U.S.-sponsored exploration of the globe |
| SOLUTION | Partner with Innodata Isogen to create a virtual library of interactive text and images on the Internet |
| BENEFITS | Scientists and historians worldwide can mine the data for hidden discoveries, and the Smithsonian's stature grows in the emerging field of digital archives |
When the site launched in early 2004, visitors could read an overview of the expedition and then choose whether to explore the narrative texts, scientific texts, plates or supplemental material and resources. Narrative and scientific texts and plates can be viewed as JPG files or as printable PDF pages that are exact copies of the originally published reports. In addition, the supplemental material and resources section contains photos of more than 2,000 artifacts, powered by a search engine that lets researchers review the entire collection using key words. For example, a herpetologist can compare the salamanders of South America and Samoa as easily as a geologist can compare rock strata from the Columbia River with those of Antarctica.

IMPACT

Completed in eight weeks, the project put crumbling, yellowed pages, once off-limits to all but dedicated scholars, just a mouse click away for all researchers. Each month, more than three million people visit the Smithsonian's Galaxy of Knowledge portal, giving this oft-forgotten expedition the public spotlight it deserves. Scientists can now compare 160-year-old descriptions with current data to identify changes in the flora, fauna, geology and meteorology throughout the Pacific.

Digitizing the entire collection also ensures its preservation and establishes protocols that will help other Smithsonian archivists create similar virtual museums of the institution's collections, which can now be cross-referenced to facilitate multi-disciplinary research. Moreover, the steady stream of online visitors exploring the expedition's discoveries furthers the Smithsonian's reputation as a leader in the effort to digitize historical records.