RDM & TDM in JupyterHub with Newspapers

Leddy Library, University of Windsor

Event Information

Join us for a series of research data management (RDM) and text data mining workshops funded through a Compute Ontario grant: Building Capacity for Research Data Management and Text Data Mining in a JupyterHub Advanced Research Computing Environment.

We will introduce researchers to JupyterLab using SHARCNET's and the Digital Research Alliance of Canada's JupyterHub portals. Jupyter is an open-source electronic lab notebook (ELN) that replicates a paper lab notebook in digital form. It allows researchers to record observations and protocols and to annotate code, among other uses. ELNs support good RDM practices by helping researchers write reproducible code and by facilitating data and code sharing. More information about Project Jupyter can be found here.

RDM at the University of Windsor is a joint commitment among many campus partners, including the Leddy Library, the Office of Research & Innovation Services, and IT Services.


We are most grateful to Compute Ontario for funding our event series in support of training for RDM in an advanced research computing environment.




Paul Preney, SHARCNET High Performance Computing Technical Consultant, University of Windsor
Art Rhyno, Systems Librarian, Leddy Library, University of Windsor
Tim Ribaric, Digital Scholarship Librarian, Brock University
Cal Murgu, Instructional Design Librarian, Brock University

Schedule and Registration

Please make sure to register. We will need to gauge room capacity restrictions and finalize catering numbers for coffee and lunches for in-person attendance. Also, for in-person attendees, please bring your own laptop.

Tuesday, February 21, 2023 (Hybrid)

Opening Remarks

Karen Pillon, Associate University Librarian, Leddy Library, University of Windsor
Berenica Vejvoda, Research Data Librarian, Leddy Library, University of Windsor

Introduction to JupyterHub 

Time: 9:15AM-11:00AM
Description: This workshop will introduce researchers to JupyterLab using SHARCNET's and the Digital Research Alliance of Canada's JupyterHub portals. It will cover what a Digital Research Alliance of Canada account is and how to obtain one, the purposes of electronic notebooks, and how to use JupyterLab on these portals, including how to find and load pre-installed software for use with your notebooks. (Attendees will not need a Digital Research Alliance of Canada account for this workshop or for the next two workshops in the RDM & TDM in JupyterHub with Newspapers series. Guest account logins and passwords will be provided to in-person attendees.)
Presenter: Paul Preney (SHARCNET High Performance Computing Technical Consultant, University of Windsor)
Bio: Mr. Preney holds an Honours B.Sc. (Biology and Computer Science), an M.Sc. (Computer Science), and a B.Ed. (Teachables: Biology and Computer Science), and is an Ontario Certified Teacher (OCT). He is also a member of the Standards Council of Canada (SCC) Mirror Committee to SMC/JTC 1/SC 22 (Programming languages) and is a Subject Matter Expert of the SCC Mirror Committee to SMC/JTC 1/SC 22/WG 21 (C++). He has taught at the secondary level and, as a sessional instructor, in Computer Science and Education at the University of Windsor. Mr. Preney is currently the University of Windsor's on-campus SHARCNET staff member, supporting researchers with their high-performance, advanced, storage, and cloud computing needs.
In-person location: Collaboratory (Main floor, Leddy Library, 401 Sunset Ave, Windsor)
In-person registration: https://www.eventbrite.com/e/523594082997
Virtual registration: https://www.eventbrite.com/e/523691444207

Wednesday, February 22, 2023 (Hybrid)

Text Data Mining of Newspapers in JupyterHub

Time: 9:00AM-11:00AM
Description: This session provides an introduction to textual analysis and data mining with newspaper text. Using a historical community newspaper from Essex County, participants will work in Jupyter notebooks to explore digitized content in a browser, without installing specialized software. Participants will be provided with guest credentials for Digital Research Alliance of Canada research computing resources.
Presenter: Art Rhyno (University of Windsor, Leddy Library, Systems Librarian)
Bio: Art Rhyno has worked in information technology since the late 1980s and recently completed a 5-year stint as Chair of the ALTO Board, which is the leading body for encoding newspaper text. He has worked on newspaper digitization projects with many organizations, including the Internet Archive and the World Bank. Art is a former co-owner and publisher of a community newspaper and is Associate Chair of OurOntario/OurDigitalWorld, a position he has held since 2004.

In-person location: Collaboratory (Main floor, Leddy Library, 401 Sunset Ave, Windsor)
In-person registration: https://www.eventbrite.com/e/523595396927
Virtual registration: https://www.eventbrite.com/e/523691865467

Tuesday, February 28, 2023 (Virtual)

RDM in Jupyter: The Importance of Keeping your Data Reproducible

Time: 9:00AM-11:00AM
Description: This session will take a deep dive into research data management best practices for developing in a Jupyter environment. The focus will be on ensuring that analyses are reproducible and on bundling up code and data for use by others. This will be examined in two ways: moving your project to GitHub, and remixing or extending work that already exists. Participants will need a GitHub account for the session; one can be created here.
Presenters: Tim Ribaric (Brock University) and Cal Murgu (Brock University) 

Tim Ribaric is the Digital Scholarship Librarian at Brock University. He holds a Masters of Computer Science and is currently completing his PhD in Educational Studies. He is a big fan of GitHub as a resource for researchers and uses Jupyter notebooks for almost all of his research, including his most recent web archives analysis platform, All Our Yesterdays. He is the instructor for the Library Juice Academy course Python for Librarians, which is, of course, written entirely in Jupyter notebooks.

Cal Murgu is the Instructional Design Librarian at Brock. Prior to Brock, he was the Digital Humanities Librarian at the New College of Florida (Florida’s Honour College). He also teaches Digital Humanities and LIS at Western University in the Fall semesters. His current research project investigates how interactive digital notebooks can be used to help non-programmers learn to use and manipulate web archives. Cal is something of a jack-of-all-trades technologist who works with the Brock Digital Scholarship Lab whenever possible.

Virtual registration: https://www.eventbrite.com/e/523692547507

Event GitHub Repository

All code used for the workshops will be available in the Leddy Library's Academic Data Centre GitHub repository.
