Agenda

Monday, Feb 10, 2025

Time
Activity
10:00am-10:50am Designing Clinical Trial Protocol Extraction Workflows: Unleashing Large Language Models (LLMs) and Retrieval Augmented Generation (RAG)
Speaker: Ramya Sri Baluguri (UC Davis)

This session will demonstrate a process for reducing time and labor in operational oversight by submitting reports to clinicaltrials.gov from IRB protocols.
11:00am-11:50am Tips and Tools for Dealing with Large Datasets at the Command Line
Speaker: Leigh Phan (UCLA), Kristian Allen (UCLA)

Do you lie awake at night waiting for your R script to finish? Is your laptop running out of memory due to merging million-row data frames? Do you panic at the thought of opening your data file in Excel? This presentation is for you! We'll cover some low-hanging fruit and basic approaches to squeeze the most out of your local machine when dealing with larger data sets. We will walk through some efficient command line tools and file formats to get the most out of your laptop.
1:00pm-1:50pm Rclone: What is it? Why do I need it?
Speaker: Jamie Jamison (UCLA)

Rclone is a command-line tool that allows users to transfer and synchronize files between cloud storage services, servers, and local machines. Rclone is a cross-platform application that runs on Linux, Windows, and macOS.
12:00pm-1:30pm Wikipedia Editathon
Speaker: Corliss Lee (UC Berkeley), Bee Lehman (UC Berkeley)

Wikipedia is among the most visited websites in the world, but even Wikipedia's staff acknowledge that its content reflects its editorship (75.6 percent of US editors are white and 90 percent are male). These editors decide which topics are 'notable' and therefore worthy of inclusion, which often excludes genderqueer and female individuals and topics from the global South. Read more in our blog post, The Bias of Notability in Wikipedia. You can be part of the solution! Join us on February 10 on Zoom for a tutorial for the beginner Wikipedian, then edit some Wikipedia entries yourself. Sign up for a free Wikipedia account in advance (optional). You may bring topic ideas or articles you'd like to edit, or we'll help you find a way to make a difference in an area of interest to you.
Click here to register.
2:00pm-2:50pm Unlocking Data: A Gentle Introduction to APIs
Speaker: Jairo Melo (UC Santa Barbara), Jose Nino (UC Santa Barbara)

APIs (Application Programming Interfaces) serve as powerful tools for accessing and interacting with data in a structured and automated manner, providing capabilities far beyond what is possible through traditional interfaces. Despite their ubiquity in software development, APIs remain underutilized in humanities research, where they offer immense potential for innovative data collection and analysis. In this session, we’ll demystify APIs and showcase their potential through the Library of Congress API. Participants will see step-by-step how to identify if a collection has API access, construct requests, and interpret responses, all with tools like Postman. We’ll walk through concepts like authentication, endpoints, parameters, and how to transform JSON responses into tabular data for visualization. This session is designed as a show-and-tell introduction—perfect for researchers and students with no prior coding or data analysis experience. By the end, attendees will have a clear understanding of how APIs work, their practical applications, and how to start integrating them into their research workflows. Get ready to see what APIs can do for your projects!
Click here to register.
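As a rough illustration of the last step described above (turning a JSON response into tabular data), here is a minimal Python sketch. It makes no network request: the sample payload and its field names ("results", "title", "date", "subjects") are hypothetical stand-ins, not the actual Library of Congress API schema, and the helper names are invented for this example.

```python
import csv
import io
import json

# Hypothetical sample payload; the field names are illustrative assumptions,
# not the actual Library of Congress API schema.
SAMPLE_RESPONSE = """
{
  "results": [
    {"title": "Map of California", "date": "1885", "subjects": ["maps", "california"]},
    {"title": "Harbor photograph", "date": "1910", "subjects": ["photographs"]}
  ]
}
"""


def json_to_rows(payload, fields):
    """Flatten each record in payload["results"] into a dict of strings."""
    rows = []
    for record in payload.get("results", []):
        row = {}
        for field in fields:
            value = record.get(field, "")
            # Join list values so each table cell holds a single string.
            if isinstance(value, list):
                value = "; ".join(str(v) for v in value)
            row[field] = str(value)
        rows.append(row)
    return rows


def rows_to_csv(rows, fields):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


FIELDS = ["title", "date", "subjects"]
rows = json_to_rows(json.loads(SAMPLE_RESPONSE), FIELDS)
csv_text = rows_to_csv(rows, FIELDS)
print(csv_text)
```

The resulting CSV text can be saved to a file and opened in a spreadsheet or visualization tool. With a real API, the parsed response would replace the sample payload, and the field list would need to match whatever that API actually returns.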
3:00pm-4:50pm Working with Large Language Models (LLMs) on Your Laptop
Speaker: Carl Stahmer (UC Davis), Pamela Reynolds (UC Davis)

Avoid the privacy issues of sending prompts to cloud-based LLMs (like ChatGPT) by installing and managing a local instance of Ollama, an open-source tool for downloading and running LLMs locally that also provides an API. We’ll set up a chatbot and practice directly querying the LLM for generative outputs, text clustering and classification.

Tuesday, Feb 11, 2025

Time
Activity
10:00am-10:50am Data Research Repositories, Digital Scholarly Ecosystems, Open Science & AI Possibilities
Speaker: Ray Uzwyshyn (UC Riverside), Brendon Wheeler (UC Riverside)

Learn about the research possibilities through the lens of the UCR Library Research Services Department. Topics such as data repositories, digital scholarship ecosystems and research with AI will be discussed to help illustrate relationships between data and artificial intelligence.
Click here to register.
11:00am-11:50am GIS & Mapping: Where to Start
Speaker: Susan Powell (UC Berkeley)

Interested in digital mapping and GIS (geographic information science), but not sure where to start? Have some experience, but want to learn more about what the campus has to offer? This virtual workshop is for you! We'll provide an overview of the GIS and digital mapping landscape as a whole, including: which tools are out there and how to choose the right one for your needs, common terms used in the field, resources for learning how to get started mapping, and where to go to find data to create your first project. No experience or special software is required to participate in this workshop. The Zoom link to the workshop will be sent to registrants 24 hours in advance.
Click here to register.
12:00pm-12:50pm Critical and Creative Applications of Data Centric Research: Critical Data Lab Projects in Progress Conversation
Speaker: Cindy Nguyen (UCLA)

Critical Data Lab is a collaborative convergence network with UCLA scholars to explore critical and creative applications within sociocultural data-centric research. Through invited speakers and workshopping problems and projects, we seek to cultivate an inclusive community at UCLA of transdisciplinary experimentation built on ethical engagement and thoughtful exchange. This presentation and discussion will share works in progress from some of the projects at UCLA as well as the conversation from our fall convening focused on broader questions related to data curation, production, circulation, and publishing in the realms of teaching and research. Some topics and projects discussed include sustainable and ethical collaborations and reciprocity, multilingual and multimodal data analysis and access, humanistic and literary data, and global South and colonial knowledge. Critical Data Lab is convened by Cindy Nguyen with support from UCLA DataX, the Institute for Digital Research and Education, and the Office of Advanced Research Computing.
1:00pm-1:50pm Historical Maps as Data (and how to use them)
Speaker: Maggie Tarmey (UCLA)

Historical maps are fascinating objects and often beautiful to look at. However, they are often underutilized as research tools. Did you know that maps are data visualizations? In this 50-minute session, we’ll discuss the hows and whys behind historical maps as data and then discuss how you can use them in your own research. No prior experience with maps is required, and this session welcomes those from any and all research backgrounds.
2:00pm-2:50pm To Be or Not To Be: Photogrammetry and Virtual Reality Environments
Speaker: Alvaro Alvarez (UC Riverside), Brendon Wheeler (UC Riverside)

Learn how to store captured 3D data with the help of the CreatR Lab Makerspace and Innovative Media Librarian at the UCR Library. We will go over how to store 3D data and how to engage with the data in virtual reality for use in personal or academic projects. Learn about scene making in Unity and how to successfully set up environments without overburdening your resources.
Click here to register.
3:00pm-3:50pm OSPOs in Higher Ed Discussion Group
Speaker: Tim Dennis (UCLA), Reid Otsuji (UC San Diego), Todd Grappone (UCLA), David Minor (UC San Diego)

Open Source Program Offices (OSPOs) are transforming how universities support open-source innovation. With support from the Sloan Foundation, UC campuses are building a first-of-its-kind multi-campus OSPO network inspired by UCSC's leadership and collaboration with UCLA, UCSD, UCB, UCD, and UCSB. This session invites researchers to explore how "open" practices intersect with their work, from open science to open educational resources. We'll also discuss opportunities to collaborate and build a vibrant, inclusive OSPO network across the UC system.

Wednesday, Feb 12, 2025

Time
Activity
10:00am-10:50am Safeguarding Privacy in Qualitative Research with QualiAnon
Speaker: Renata Curty (UC Santa Barbara)

Join us for practical tips on qualitative data de-identification and strategies to tackle the unique challenges researchers face when working with human subject data. This session will feature QualiAnon, an open-source tool designed to help researchers securely anonymize and pseudonymize sensitive qualitative data with precision and flexibility. Unlike automated tools that rely on blind substitutions, QualiAnon gives you full control over how and when data is anonymized, allowing you to manually define replacement rules for specific datasets or variables. This approach ensures a higher level of accuracy and privacy by stripping identifying information without compromising the richness of the data. In this session, we’ll demonstrate how QualiAnon can streamline your data protection efforts, improve your workflow, and ensure compliance with privacy standards. Don’t miss the chance to discover how this tool can simplify the de-identification process while safeguarding sensitive information in your research!
Click here to register.
11:00am-11:50am DataPlanet: Data for Data Science
Speaker: Jingbo Shang (UC San Diego), TBD (UC San Diego)

Coming soon!
Click here to register.
12:00pm-12:50pm Quarto Showcase
Speaker: Greg Janée (UC Santa Barbara), Julien Brun (UC Santa Barbara), Sam Csik (NCEAS), Renata Curty (UC Santa Barbara)

Quarto is a publication tool built into the RStudio multilanguage data analysis and visualization platform, often used in conjunction with code notebooks. Quarto enables a more reproducible approach to data-driven research by integrating documentation, code, and results in one document. But it’s also a great publication platform on its own! In this session a panel of Quarto users will showcase what Quarto is capable of, especially when used in concert with GitHub Pages hosting, including creating websites (course websites, personal websites, blogs), presentations, dashboards, academic papers, and books. We will also demonstrate, in real-time, how easy it is to create such research deliverables. Prepare to be sold on using Quarto!
Click here to register.
1:00pm-1:50pm Oral History Methodology for the Qualitative Researcher
Speaker: Ann Glusker (UC Berkeley)

Qualitative research encompasses many methodological approaches, and one of the most powerful is oral history. As described by the UC Berkeley Oral History Center, this approach involves producing "first-person narrative topical and life histories that explore the narrator’s understanding of events through a recorded interview...[this] results in a lightly edited transcript of the interview that the narrator reviews, which is then published." This differs from the more familiar qualitative research interview in that it is a two-directional and transparent process... which comes with its own challenges. In this workshop you will learn more about these synergies and contrasts between and within methods, hear more about best practices for conducting oral history projects, and have time to ask questions about project ideas of your own.
Click here to register.
2:00pm-3:50pm Distilling a research domain with bibliometric analysis in R
Speaker: Colton Baumler (UC Davis), Pamela Reynolds (UC Davis)

Coming soon!
4:00pm-4:50pm Get credit for your work with ORCID
Speaker: Megan Van Noord (UC Davis), Pamela Reynolds (UC Davis)

This session discusses how to curate your scholarly identity and increase your research impact using ORCID, a tool for connecting researchers with their scholarship and collaborators across disciplines, borders, and time.

Thursday, Feb 13, 2025

Time
Activity
10:00am-10:50am Integrating SQL into R analytical workflows using duckDB & dbplyr
Speaker: Julien Brun (UC Santa Barbara), Greg Janée (UC Santa Barbara)

In this workshop, we will discuss why you might consider relational databases to store research data. We will go over how to insert data in and retrieve data from a database using R and duckDB. We will focus on how to use the R dbplyr package to integrate databases into tidyverse-focused analytical workflows. This workshop will cover three main points:
  • Introduce the concepts of a database and discuss why you might want to use one
  • How to integrate the use of a database into an R analytical workflow
  • Hands-on exercise using duckDB and dbplyr, and how they can be used to learn some SQL basics

Click here to register.
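For a sense of what inserting data into and retrieving data from a relational database looks like, here is a minimal sketch. It is an assumption-laden stand-in rather than workshop material: it uses Python's built-in sqlite3 module instead of R, duckDB, and dbplyr, and the table names, column names, and values are invented for illustration.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Two related tables (hypothetical schema): sites, and species counts per site.
conn.execute("CREATE TABLE sites (site_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE counts (site_id INTEGER, species TEXT, n INTEGER)")

# Insert rows using parameterized statements.
conn.executemany(
    "INSERT INTO sites VALUES (?, ?)",
    [(1, "North"), (2, "South")],
)
conn.executemany(
    "INSERT INTO counts VALUES (?, ?, ?)",
    [(1, "heron", 3), (1, "egret", 5), (2, "heron", 2)],
)

# Retrieve data: join the tables and total the counts for each site.
totals = conn.execute(
    """
    SELECT s.name, SUM(c.n) AS total
    FROM counts AS c
    JOIN sites AS s ON s.site_id = c.site_id
    GROUP BY s.name
    ORDER BY s.name
    """
).fetchall()

print(totals)
conn.close()
```

The join and the grouped sum are the kind of operations that a tidyverse-style workflow expresses with verbs such as joins and summaries, while the database does the work instead of an in-memory data frame.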
12:00pm-12:50pm AI Readiness, AI Reproducibility, and Data Stewardship
Speaker: Christine Kirkpatrick (San Diego Supercomputer Center)

This session will discuss lessons learned and emerging community practices as they relate to AI Readiness and AI Reproducibility. Insights into ways to leverage Gen-AI and LLMs for data stewardship will be shared, as well as open research questions, and gaps in practices at the intersection of AI and data. Please bring your own practices and insights to the session so they can inform the NSF-funded FAIR in Machine Learning, AI Readiness, (AI) Reproducibility Research Coordination Network (FARR).
Click here to register.
1:00pm-2:30pm From "Could we…?" to "Should we…?": Developing a Critical Framework for Engaging Artificial Intelligence
Speaker: Shelby Hallman (UCLA), Alexandra Solodkaya (UCLA), Ashley Peterson (UCLA)

No matter whether you love it, fear it, or are sick of hearing about it, artificial intelligence impacts our lives. In this workshop, we will go beyond what AI tools do (or their creators say they do) and take a hard look at the broader implications of developing and using artificial intelligence. We will explore the power structures that inform the human labor behind AI, as well as the impacts of AI development and use on natural ecosystems, systematic oppression, and information economies. We will also discuss the agency we retain as students, teachers, and lifelong learners to co-create critical frameworks that challenge the inevitability of robot overlords.
3:00pm-3:50pm Sharing and Locating Research Data in Humanities and Social Science Studies
Speaker: Tianji Jiang (UCLA)

Sharing and reusing data can effectively reduce redundant efforts in data collection. It also enhances the efficiency of scientific research investments by preventing the reinvention of the wheel. Building a sustainable data-reuse process and culture requires frameworks that include policies, standards, roles, and responsibilities, all of which must address the diverse needs of data providers, curators, and (re)users alike. A critical step in the data sharing and (re)use cycle involves researchers depositing their research data into data-sharing infrastructures, keeping their data in a sharable and usable form, making other researchers aware of their data, and identifying data that are (re)usable for their needs. In this session, I aim to introduce some ideas and techniques for preparing research data for sharing. I will also introduce 2 to 3 data repositories ideal for researchers depositing their data and seeking (re)usable data for their studies. Additionally, I will offer several tips and strategies for identifying and utilizing (re)usable data effectively. The session is structured to last approximately 50 minutes, comprising a 40-minute presentation followed by a 10-minute question-and-answer segment.
4:00pm-4:50pm Visualizing Islamic Manuscripts data at UCLA Library
Speaker: Saad Shaukat (UCLA)

UCLA Library holds one of the largest collections of Islamic manuscripts (roughly 10,000 items), covering multiple languages including Arabic, Persian, Turkish, and Urdu, and spanning nearly nine centuries (12th-20th). Currently, UCLA Library Special Collections is in the midst of a multi-year effort to create digital records for its Islamic manuscript holdings. At HumTech, our team is engaged in a project to visualize the cataloged manuscripts through Tableau. The goal of this project is to present a bird's-eye view of the Islamic manuscript holdings for interested researchers. By representing the multiple data points that the library cataloging process captured from the manuscripts, the visualization aims to give researchers an easy way to dissect the data in multiple ways and to help them draw useful insights into the Islamic manuscript holdings at UCLA. In this presentation, we will discuss the challenges we faced in data cleaning, wrangling, and visualization that are specific to manuscript collections, especially one that is primarily in non-Latin scripts and extends back to the pre-modern period.

Friday, Feb 14, 2025

Time
Activity
10:00am-11:50am Finding Geospatial Data
Speaker: Michele Tobias (UC Davis)

Coming soon!
12:00pm-12:50pm Data in Context: Strategies for Evaluating and Utilizing Existing Datasets for Research
Speaker: Wasila Dahdul (UC Irvine), Pamela Reynolds (UC Davis)

Join us for a workshop designed to equip you with the skills to responsibly assess and use existing datasets for new research. Using the U.S. National Parks Visit Data as a real-world example, we will consider key issues in reusing data, including historical context, data collection methods, and ethical aspects of the dataset. This workshop will underscore the importance of documentation, provenance, and context in utilizing existing datasets for research, and participants will gain skills in evaluating data for reuse and suitability for their work.
Click here to register.
1:00pm-2:50pm Tableau Workshop
Speaker: Madison Bautista (UCLA)

Learn how to use Tableau Prep Builder and Tableau to showcase your data. From cleaning data and hiding blank fields to filtering data, Tableau Prep can be used to organize your data more efficiently. Once the data is clean, Tableau can be used to visualize it as charts, graphs, and even GPS coordinates. This small workshop will show you how anyone can easily use Tableau Prep and Tableau for data organization and analysis.
3:00pm-4:15pm Unlocking image, audio, and video data in the Industry Documents Library: a Python-based, open-source stack for audio transcription, text extraction, sentiment analysis, and topic classification
Speaker: Geoffrey Boushey (UC San Francisco), Kate Tasker (UC San Francisco)

The Industry Documents Library (IDL) is a digital archive of documents created by industries which influence public health, hosted by the University of California, San Francisco Library. This archive contains millions of video, audio, and image files from the tobacco, opioids, fossil fuel, drug, and food industries, including advertisements, legal depositions, internal marketing documents, public health campaigns, and other historical records. This session will start with a presentation and overview of the contents of the IDL and its search interface. Next, we will introduce a Python-based, open-source stack researchers can use to analyze, transcribe, and categorize data in IDL video, audio, and image files. Although participants will have an opportunity to try out these technologies during the workshop, the primary focus will be an overview of available tools and data, and participation in the programming sections is optional.
Click here to register.