Bringing Computational Access to Book-length Documents Via an ETD Pilot.

Published in CNI: Coalition for Networked Information Fall 2019 Membership Meeting, 2019

Recommended citation: William A. Ingram. Bringing Computational Access to Book-length Documents Via an ETD Pilot. CNI: Coalition for Networked Information Fall 2019 Membership Meeting. December 9-10, 2019. Washington, DC. https://www.cni.org/topics/electronic-theses-dissertations-etds/bringing-computational-access-to-book-length-documents-via-an-etd-pilot

Virginia Polytechnic Institute and State University (Virginia Tech) Libraries, in collaboration with the Virginia Tech Department of Computer Science and the Old Dominion University Department of Computer Science, has received an IMLS National Leadership Grant for Libraries award to fund research into bringing computational access to book-length documents, piloted on electronic theses and dissertations (ETDs). The three-year project is motivated by the following library and community needs:

(1) Despite huge volumes of book-length documents in digital libraries, there is a lack of models offering effective and efficient computational access to these long documents.
(2) Nationwide open-access services for ETDs generally function at the metadata level. Much important knowledge and scientific data lie hidden in ETDs, and we need better tools to mine the content and facilitate the identification, discovery, and reuse of these important components.
(3) A wide range of audiences can potentially benefit from this research, including but not limited to librarians, students, authors, educators, researchers, and other interested readers.

Our research focuses on extracting and analyzing segments of long documents (chapters, reference lists, tables, figures), and on methods for automated classification and summarization of individual chapters of longer texts to increase findability. The project applies machine learning and deep learning techniques to advance the discovery, use, and potential reuse of the knowledge hidden in the text of books and book-length documents. By focusing on libraries’ ETD collections, the research will enhance ETD programs by devising effective and efficient methods for opening the knowledge currently hidden in the rich body of graduate research and scholarship.