Publications

A New Annotation Method and Dataset for Layout Analysis of Long Documents

Published in Companion Proceedings of the ACM Web Conference 2023, 2023

Parsing long documents, such as books, theses, and dissertations, is an important component of information extraction from scholarly documents …

Recommended citation: Aman Ahuja, Kevin Dinh, Brian Dinh, William A. Ingram, and Edward Fox. 2023. A New Annotation Method and Dataset for Layout Analysis of Long Documents. In Companion Proceedings of the ACM Web Conference 2023 (Austin, TX, USA) (WWW ’23 Companion). Association for Computing Machinery, New York, NY, USA, 834–842. https://doi.org/10.1145/3543873.3587609 https://dl.acm.org/doi/abs/10.1145/3543873.3587609

Applications of data analysis on scholarly long documents

Published in 2022 IEEE International Conference on Big Data (Big Data), 2022

Theses and dissertations record the work of graduate students and are typically a requirement at the culmination of the graduate degree …

Recommended citation: Bipasha Banerjee, William A. Ingram, Jian Wu, and Edward A. Fox. 2022. Applications of data analysis on scholarly long documents. In IEEE International Conference on Big Data, Big Data 2022, Osaka, Japan, December 17-20, 2022. IEEE, 2473–2481. https://doi.org/10.1109/BigData55660.2022.10020935

Parsing Electronic Theses and Dissertations Using Object Detection

Published in Proceedings of the first Workshop on Information Extraction from Scientific Publications, 2022

Electronic theses and dissertations (ETDs) contain valuable knowledge that can be useful for a wide range of purposes. To effectively utilize the knowledge contained in ETDs …

Recommended citation: Aman Ahuja, Alan Devera, and Edward Alan Fox. 2022. Parsing Electronic Theses and Dissertations Using Object Detection. In Proceedings of the first Workshop on Information Extraction from Scientific Publications. Association for Computational Linguistics, Online, 121–130. https://aclanthology.org/2022.wiesp-1.14

Analyzing and Navigating ETDs Using Topic Models

Published in 25th International Symposium on Electronic Theses and Dissertations (ETD 2022), Novi Sad, Serbia, September 7–9, 2022

Electronic theses and dissertations (ETDs) contain valuable knowledge that can be useful in a wide range of research areas…

Recommended citation: Aman Ahuja, William A. Ingram, Chenyu Mao, Chongyu He, Jianchi Wei, and Edward A. Fox. 2022. Analyzing and Navigating ETDs Using Topic Models. In 25th International Symposium on Electronic Theses and Dissertations (ETD 2022), September 7-9, 2022, Novi Sad, Serbia. https://hdl.handle.net/10919/109986

A Study of Computational Reproducibility using URLs Linking to Open Access Datasets and Software

Published in Companion Proceedings of the ACM Web Conference 2022, 2022

Datasets and software packages are considered important resources that can be used for replicating computational experiments. …

Recommended citation: Lamia Salsabil, Jian Wu, Muntabir Hasan Choudhury, William A. Ingram, Edward A. Fox, Sarah Michele Rajtmajer, and C. Lee Giles. 2022. A Study of Computational Reproducibility using URLs Linking to Open Access Datasets and Software. In Companion of The Web Conference 2022, Virtual Event / Lyon, France, April 25–29, 2022. ACM, 784–788. https://doi.org/10.1145/3487553.3524658

Building A Large Collection of Multi-domain Electronic Theses and Dissertations

Published in IEEE International Conference on Big Data (Big Data), 2021

In this work, we report our progress on building a collection containing over 450k Electronic Theses and Dissertations (ETDs), including full-text and metadata …

Recommended citation: Sami Uddin, Bipasha Banerjee, Jian Wu, William A. Ingram, and Edward A. Fox. 2021. Building A Large Collection of Multi-domain Electronic Theses and Dissertations. In 2021 IEEE International Conference on Big Data (Big Data), Orlando, FL, USA, December 15-18, 2021. IEEE, 6043–6045. https://doi.org/10.1109/BigData52589.2021.9672058

Applications of Mining ETDs

Published in 24th International Symposium on Electronic Theses and Dissertations (ETD 2021), 2021

Theses and dissertations contain a wealth of knowledge reflecting graduate students' exploration in a scholarly domain. Although print submission was common practice early on …

Recommended citation: Bipasha Banerjee, William A. Ingram, Jian Wu, and Edward A. Fox. 2021. Applications of Mining ETDs. In 24th International Symposium on Electronic Theses and Dissertations (ETD 2021), November 15-17, 2021, United Arab Emirates. https://doi.org/10.26226/morressier.614c9b8c87a68d83cb5d59b2

Automatic Metadata Extraction Incorporating Visual Features from Scanned Electronic Theses and Dissertations

Published in ACM/IEEE Joint Conference on Digital Libraries in 2021, 2021

Electronic Theses and Dissertations (ETDs) contain domain knowledge that can be used for many digital library tasks, such as analyzing citation networks and predicting research …

Recommended citation: Muntabir Hasan Choudhury, Himarsha R. Jayanetti, Jian Wu, William A. Ingram, and Edward A. Fox. 2021. Automatic Metadata Extraction Incorporating Visual Features from Scanned Electronic Theses and Dissertations. In ACM/IEEE Joint Conference on Digital Libraries, JCDL 2021, Champaign, IL, USA, September 27-30, 2021. IEEE, 230–233. https://doi.org/10.1109/JCDL52503.2021.00066

ScanBank: A Benchmark Dataset for Figure Extraction from Scanned Electronic Theses and Dissertations

Published in ACM/IEEE Joint Conference on Digital Libraries in 2021, 2021

We focus on electronic theses and dissertations (ETDs), aiming to improve access and expand their utility, since more than 6 million are publicly available, and they constitute an important corpus to …

Recommended citation: Sampanna Yashwant Kahu, William A. Ingram, Edward A. Fox, and Jian Wu. 2021. ScanBank: A Benchmark Dataset for Figure Extraction from Scanned Electronic Theses and Dissertations. In ACM/IEEE Joint Conference on Digital Libraries, JCDL 2021, Champaign, IL, USA, September 27-30, 2021. IEEE, 180–191. https://doi.org/10.1109/JCDL52503.2021.00030

Mining ETDs for Trends in Graduate Research

Published in CNI: Coalition for Networked Information Fall 2020 Membership Meeting, 2020

Our ongoing research project applies computational analysis and text mining techniques to a large corpus of electronic theses and dissertations (ETDs) in order to gain insight into the evolution of graduate research topics. …

Recommended citation: William A. Ingram. Mining ETDs for Trends in Graduate Research. CNI: Coalition for Networked Information Fall 2020 Membership Meeting, November 12, 2020. Virtual. https://www.cni.org/topics/electronic-theses-dissertations-etds/mining-etds-for-trends-in-graduate-research

Figure Extraction from Scanned Electronic Theses and Dissertations

Published in VTechWorks: Virginia Tech ETD, 2020

The ability to extract figures and tables from scientific documents can solve key use-cases such as their semantic parsing, summarization, or indexing. …

Recommended citation: Sampanna Yashwant Kahu. 2020. Figure Extraction from Scanned Electronic Theses and Dissertations. Thesis. Virginia Tech. https://vtechworks.lib.vt.edu/handle/10919/100113 http://hdl.handle.net/10919/100113

A Heuristic Baseline Method for Metadata Extraction from Scanned Electronic Theses and Dissertations

Published in ACM/IEEE Joint Conference on Digital Libraries in 2020, 2020

This paper accompanies a poster accepted to the ACM/IEEE Joint Conference on Digital Libraries 2020; the poster received a Best Poster Award Honorable Mention.

Recommended citation: Muntabir Hasan Choudhury, Jian Wu, William A. Ingram, and Edward A. Fox. 2020. A Heuristic Baseline Method for Metadata Extraction from Scanned Electronic Theses and Dissertations. In JCDL ’20: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries in 2020, Virtual Event, China, August 1-5, 2020. ACM, 515–516. https://doi.org/10.1145/3383583.3398590 https://dl.acm.org/doi/10.1145/3383583.3398590

Increasing Accessibility of Electronic Theses and Dissertations (ETDs) Through Chapter-level Classification

Published in VTechWorks: Virginia Tech ETD, 2020

Great progress has been made to leverage the improvements made in natural language processing and machine learning to better mine data from journals, …

Recommended citation: Palakh Mignonne Jude. 2020. Increasing Accessibility of Electronic Theses and Dissertations (ETDs) Through Chapter-level Classification. MS thesis, Computer Science, Virginia Tech (June, 2020). http://hdl.handle.net/10919/99294

Classification and extraction of information from ETD documents

Published in CS6604: Digital Libraries, 2020

In recent years, advances in natural language processing, machine learning, and neural networks have led to powerful tools for digital libraries, allowing library collections to be discovered, used, and reused in exciting new ways. …

Recommended citation: John Aromando, Bipasha Banerjee, William A. Ingram, Palakh Jude, and Sampanna Kahu. 2020. "Classification and extraction of information from ETD documents." http://hdl.handle.net/10919/96645

Teaching Natural Language Processing through Big Data Text Summarization with Problem-Based Learning

Published in Data and Information Management, 2020

Natural language processing (NLP) covers a large number of topics and tasks related to data and information management, leading to a complex and challenging teaching process. Meanwhile, problem-based learning is a teaching…

Recommended citation: Liuqing Li, Jack H. Geissinger, William A. Ingram, and Edward A. Fox. 2020. Teaching Natural Language Processing through Big Data Text Summarization with Problem-Based Learning. Data and Information Management 4, 1 (2020), 18–43. https://doi.org/10.2478/dim-2020-0003

Summarizing ETDs with deep learning

Published in ETD Conference 2019, Porto, Portugal, 2020

Inspired by the millions of Electronic Theses and Dissertations (ETDs) openly available online, we describe a novel use of ETDs as data for text summarization. We use a large corpus of ETDs to evaluate techniques for …

Recommended citation: William A. Ingram, Bipasha Banerjee, and Edward A. Fox. 2020. Summarizing ETDs with deep learning. Cadernos de Biblioteconomia, Arquivística e Documentação 1 (Mar. 2020), 46–52. https://doi.org/10.48798/cadernosbad.2014 https://bad.pt/publicacoes/index.php/cadernos/article/viewFile/2014/pdf

Bringing Computational Access to Book-length Documents Via an ETD Pilot

Published in CNI: Coalition for Networked Information Fall 2019 Membership Meeting, 2019

Virginia Polytechnic Institute and State University (Virginia Tech) Libraries, in collaboration with Virginia Tech Department of Computer Science and Old Dominion University Department of Computer Science, is the recipient of an IMLS National Leadership Grant for Libraries award…

Recommended citation: William A. Ingram. Bringing Computational Access to Book-length Documents Via an ETD Pilot. CNI: Coalition for Networked Information Fall 2019 Membership Meeting. December 9-10, 2019. Washington, DC. https://www.cni.org/topics/electronic-theses-dissertations-etds/bringing-computational-access-to-book-length-documents-via-an-etd-pilot

Big Data Text Summarization: Using Deep Learning to Summarize Theses and Dissertations

Published in CS4984: Special Topics, 2018

Team 16 in the fall 2018 course “CS 4984/5984 Big Data Text Summarization,” in partnership with the University Libraries and the Digital Library Research Laboratory, prepared a corpus of electronic theses and dissertations (ETDs) for …

Recommended citation: Naman Ahuja, Ritesh Bansal, William A. Ingram, Palakh Jude, Sampanna Kahu, and Xinyue Wang. 2018. "Big Data Text Summarization: Using Deep Learning to Summarize Theses and Dissertations." http://hdl.handle.net/10919/86406