Archiving the Web: How to Support Research of Future Heritage?

NWO-CATCH Meeting, hosted by WebART
April 19, 2013
William G. LeFurgy
Library of Congress
We Are in The Big Data Business
Focus from Items
Born Digital Cultural Heritage Data Challenge
Infrastructure Improvement: Perpetually Iterative
White House RFI Input Instructive
• Request for Information on Public Access to Federally
Funded Scientific Research Data, Nov. 2011
• Interested organizations comment on how best to ensure
long-term stewardship and public access
• Input provided to inform development of agency policies
and standards for managing big data
Summary of Responses
• 118 individual responses
– 50% from academic research departments, professional
– 35% from libraries, repositories and allied organizations
– 10% from publishers and commercial organizations
– 5% other
• Excellent (unstructured!) data set to analyze current
thinking on big data stewardship
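As a rough illustration, the percentages above can be turned into approximate response counts. This is a minimal sketch, assuming the slide's shares are rounded figures applied to the stated total of 118. The short category labels and the function name are hypothetical, introduced only for this example.

```python
# Total and shares are taken from the slide; the short labels are
# hypothetical stand-ins for the categories listed there.
TOTAL_RESPONSES = 118
SHARES = {
    "academic": 0.50,
    "libraries_repositories": 0.35,
    "publishers_commercial": 0.10,
    "other": 0.05,
}


def approximate_counts(total, shares):
    """Return the nearest whole number of responses for each share."""
    return {label: round(total * share) for label, share in shares.items()}


counts = approximate_counts(TOTAL_RESPONSES, SHARES)
for label, count in counts.items():
    print(f"{label}: about {count} responses")
```

Because the slide's percentages are presumably rounded, these counts are estimates rather than the exact tallies.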
Top-Level Policy Recommendations
• Remarkable degree of congruence among
– Broadly allocate resources for data stewardship
– Extend national digital stewardship infrastructure
– Institute and enforce a data preservation mandate
– Encourage policies to support secondary use
• But… conflicted about IP, copyright, privacy
Need: Path to Greater Resources
• Funders to include money in awards for data
• Need cost models, other guidance for estimating
data life cycle costs
• Allocate expanded resources to support national
data repositories
Need: National Digital Stewardship
• Leverage current institutional efforts to define
best practices, tools, services
• Extend community of practice for data
stewardship through collaborative action
across disciplines
• Develop a skilled workforce with data
stewardship expertise
Support: Secondary Use, Respect for Data
• Broadly apply a citation mechanism for data sets
(e.g., DataCite, DOIs)
• Criteria for evaluating grant applications tied to
secondary use of data
• Give equal credit for publishing articles and data
• Develop robust metrics to track data publication and
How We Feel About Our Work
How Some Perceive Our Work
“If you go down to the British Library
today, you’re sure of a big surprise.”
“‘Capturing the nation’s digital memory’ –
that’s the phrase they are using about the
venture. Your first response might be: …”
‘The internet archives
itself, doesn’t it?
It’s called Google.’
People Love Libraries (In Principle)
How is Our Brand Different?
Promoting Web Archiving at Library of Congress
• Post on our blog, The Signal
• Tweets via @ndiipp
• Media mentions: Mashable, The Economist,
New York Times
• Active involvement in IIPC; supporting theme for
this year’s meeting, Scholarly Access to Web
Archives: Progress, Requirements, and
Library of Congress Digital Preservation
YouTube Channel
Cultivating Brand Awareness
“A city’s conserved historic core can also differentiate that
city from competing locations—branding it nationally and
internationally—thus helping the city attract investment
and talented people.”
Related Approaches Are Good, Too
Researcher Engagement
A new model for Research IT services, University of Melbourne
Entwined: Branding and Supporting
Cultural Heritage Data Research
Slide 2: 1970s Glam Rock
Slide 3: Forging a Digital Roadmap: The Preservation, Curation and Stewardship Nexus
Slide 6: Estimate of Internet users worldwide
Slide 7: International Internet Preservation Consortium
Slide 8: Company Information System
Slide 9: Pearson Think Tank Publications; Enter the Matrix
Slide 10: Library Challenges in a 2.0 world
Slide 11: Elliot computer
Slide 12: Request for Information: Public Access to Digital Data Resulting From Federally Funded Scientific Research; White House, DC
Slide 13: Your Comments on Access to Federally Funded Scientific Research Results
Slide 18: Cite: White House Memo, Increasing Access to the Results of Federally Funded Scientific Research
Slide 19: Graphic: brand
Slide 20: Graphic: opportunities
Slide 21: Graphic: Pop!Tech 2007; Graphic: Pop!Tech 2007
Slide 22: Graphic: ?; Graphic: bored
Slide 29: Cite and graphic: America's Young Archivists: The K-12 Web Archiving Program; Tim O'Reilly - Keynote for 2011 NDIIPP/NDSA Partners Meeting
Slide 30: Cite and graphic: European Public Sector Information Platform, Topic Report No. 2012/04, Open Data in Cultural Heritage Institutions; Framework for the development of a brand strategy for The European Library, Europeana and Europeana Libraries; What is DPLA?; Investing in Historical Assets for Sustainable Development
Slide 31: Cite and graphics: Measuring the Impact of Digital Resources: The Balanced Value Impact Model; WebART; IIPC, Web Archiving Use Cases; Altmetrics; Common Crawl
Slide 32: Cite and graphics: A new model for Research IT services
Slide 33: Graphic: Tendril #2