Agenda

Monday, Feb 10, 2025

Time	Activity
10:00am-10:50am	REAP: Retrieval Enhanced AI for Research Protocol Extraction and Submission Speaker: Ramya Sri Baluguri (UC Davis Health Clinical and Translational Science Center) Join us for a novel research use case to design a clinical trial protocol extraction workflow using pre-trained Large Language Models (LLMs) and Retrieval Augmented Generation (RAG).
11:00am-11:50am	Tips and Tools for Dealing with Large Datasets at the Command Line Speaker: Leigh Phan (UCLA), Kristian Allen (UCLA) Do you lie awake at night waiting for your R script to finish? Is your laptop running out of memory due to merging million-row data frames? Do you panic at the thought of opening your data file in Excel? This presentation is for you! We'll cover some low-hanging fruit and basic approaches to squeeze the most out of your local machine when dealing with larger data sets, and walk through some efficient command line tools and file formats.
12:00pm-1:30pm	Wikipedia Editathon Speaker: Corliss Lee (UC Berkeley), Bee Lehman (UC Berkeley) Wikipedia is among the most visited websites in the world, but even Wikipedia's staff acknowledge that its content reflects its editorship (75.6 percent of US editors are white and 90 percent are male). These editors decide which topics are 'notable' and therefore worthy of inclusion, which often excludes genderqueer and female individuals and topics from the global South. Read more in our blog post, The Bias of Notability in Wikipedia. You can be part of the solution! Join us on February 10 on Zoom for a tutorial for the beginner Wikipedian, then edit some Wikipedia entries yourself. Sign up for a free Wikipedia account in advance (optional). You may bring topic ideas or articles you'd like to edit, or we'll help you find a way to make a difference in an area of interest to you.
1:00pm-1:50pm	Rclone: What is it? Why do I need it? Speaker: Jamie Jamison (UCLA) Rclone is a command-line tool that allows users to transfer and synchronize files between cloud storage services, servers, and local machines. Rclone is a cross-platform application that runs on Linux, Windows, and macOS.
2:00pm-2:50pm	Unlocking Data: A Gentle Introduction to APIs Speaker: Jairo Melo (UC Santa Barbara), Jose Nino (UC Santa Barbara) APIs (Application Programming Interfaces) serve as powerful tools for accessing and interacting with data in a structured and automated manner, providing capabilities far beyond what is possible through traditional interfaces. Despite their ubiquity in software development, APIs remain underutilized in humanities research, where they offer immense potential for innovative data collection and analysis. In this session, we’ll demystify APIs and showcase their potential through the Library of Congress API. Participants will see step-by-step how to identify if a collection has API access, construct requests, and interpret responses, all with tools like Postman. We’ll walk through concepts like authentication, endpoints, parameters, and how to transform JSON responses into tabular data for visualization. This session is designed as a show-and-tell introduction—perfect for researchers and students with no prior coding or data analysis experience. By the end, attendees will have a clear understanding of how APIs work, their practical applications, and how to start integrating them into their research workflows. Get ready to see what APIs can do for your projects!
3:00pm-4:50pm	Working with Large Language Models (LLMs) on Your Laptop Speaker: Carl Stahmer (UC Davis), Pamela Reynolds (UC Davis) Avoid the privacy issues of sending prompts to cloud-based LLMs (like ChatGPT) by installing and managing a local instance of Ollama, an open-source, downloadable tool and API for running LLMs on your own machine. We’ll set up a chatbot and practice directly querying the LLM for generative outputs, text clustering, and classification.

Tuesday, Feb 11, 2025

Time	Activity
10:00am-10:50am	Data Research Repositories, Digital Scholarly Ecosystems, Open Science & AI Possibilities Speaker: Ray Uzwyshyn (UC Riverside), Brendon Wheeler (UC Riverside) Learn about the research possibilities through the lens of the UCR Library Research Services Department. Topics such as data repositories, digital scholarship ecosystems and research with AI will be discussed to help illustrate relationships between data and artificial intelligence.
11:00am-11:50am	GIS & Mapping: Where to Start Speaker: Susan Powell (UC Berkeley) Interested in digital mapping and GIS (geographic information science), but not sure where to start? Have some experience, but want to learn more about what the campus has to offer? This virtual workshop is for you! We'll provide an overview of the GIS and digital mapping landscape as a whole, including: which tools are out there and how to choose the right one for your needs, common terms used in the field, resources for learning how to get started mapping, and where to go to find data to create your first project. No experience or special software is required to participate in this workshop. The Zoom link to the workshop will be sent to registrants 24 hours in advance.
12:00pm-12:50pm	Critical and Creative Applications of Data Centric Research: Critical Data Lab Projects in Progress Conversation Speaker: Cindy Anh Nguyen (UCLA), Meredith Cohen (UCLA), Anna Robinson-Sweet (UCLA) Critical Data Lab is a collaborative convergence network of UCLA scholars exploring critical and creative applications within sociocultural data-centric research. Through invited speakers and workshopping problems and projects, we seek to cultivate an inclusive community at UCLA of transdisciplinary experimentation built on ethical engagement and thoughtful exchange. This presentation and discussion will share works in progress from some of the projects at UCLA as well as the conversation from our fall convening, which focused on broader questions related to data curation, production, circulation, and publishing in the realms of teaching and research. Topics and projects discussed include sustainable and ethical collaborations and reciprocity, multilingual and multimodal data analysis and access, humanistic and literary data, and Global South and colonial knowledge. Presentations of projects by Cindy Anh Nguyen, Meredith Cohen, and Anna Robinson-Sweet. Critical Data Lab is convened by Cindy Nguyen with support from UCLA DataX, the Institute for Digital Research and Education, and the Office of Advanced Research Computing.
1:00pm-1:50pm	Historical Maps as Data (and how to use them) Speaker: Maggie Tarmey (UCLA) Historical maps are fascinating objects and often beautiful to look at. However, they are often underutilized as research tools. Did you know that maps are data visualizations? In this 50-minute session, we’ll discuss the hows and whys behind historical maps as data and then discuss how you can use them in your own research. No prior experience with maps is required, and this session welcomes those from any and all research backgrounds.
2:00pm-2:50pm	To Be or Not To Be: Photogrammetry and Virtual Reality Environments Speaker: Alvaro Alvarez (UC Riverside), Brendon Wheeler (UC Riverside) Learn how to store captured 3D data with the help of the CreatR Lab Makerspace and Innovative Media Librarian at the UCR Library. We will go over how to store 3D data and how to engage with that data in virtual reality for personal or academic projects. Learn about scene making in Unity and how to successfully set up environments without overburdening your resources.
3:00pm-3:50pm	OSPOs in Higher Ed Discussion Group Speaker: Tim Dennis (UCLA), Reid Otsuji (UC San Diego), Todd Grappone (UCLA), David Minor (UC San Diego) Open Source Program Offices (OSPOs) are transforming how universities support open-source innovation. With support from the Sloan Foundation, UC campuses are building a first-of-its-kind multi-campus OSPO network inspired by UCSC's leadership and collaboration with UCLA, UCSD, UCB, UCD, and UCSB. This session invites researchers to explore how "open" practices intersect with their work, from open science to open educational resources. We'll also discuss opportunities to collaborate and build a vibrant, inclusive OSPO network across the UC system.

Wednesday, Feb 12, 2025

Time	Activity
10:00am-10:50am	Safeguarding Privacy in Qualitative Research with QualiAnon Speaker: Renata Curty (UC Santa Barbara) Join us for practical tips on qualitative data de-identification and strategies to tackle the unique challenges researchers face when working with human subject data. This session will feature QualiAnon, an open-source tool designed to help researchers securely anonymize and pseudonymize sensitive qualitative data with precision and flexibility. Unlike automated tools that rely on blind substitutions, QualiAnon gives you full control over how and when data is anonymized, allowing you to manually define replacement rules for specific datasets or variables. This approach ensures a higher level of accuracy and privacy by stripping identifying information without compromising the richness of the data. In this session, we’ll demonstrate how QualiAnon can streamline your data protection efforts, improve your workflow, and ensure compliance with privacy standards. Don’t miss the chance to discover how this tool can simplify the de-identification process while safeguarding sensitive information in your research!
11:00am-11:50am	DataPlanet: Data for Data Science Speaker: Jingbo Shang (UC San Diego), Shuheng Li (UC San Diego), Alessandro D'Amico (UC San Diego) Join us for two presentations from the fellows of the UC San Diego Data Planet Initiative, a new data sharing resource to make it easier to find and use research-ready datasets. Speaker: Shuheng Li. Title: The Collection and Application of MASD, a Multimodal Activity Sensing Dataset. Description: This project collects and presents MASD, an open-source Multimodal Activity Sensing Dataset designed to advance machine learning research in human activity recognition (HAR). MASD integrates data collected from three daily sensing devices: Inertial Measurement Unit (IMU) data in smartwatches and smartphones, WiFi Channel State Information (CSI) extracted by routers, and body skeleton data captured by a Kinect camera. MASD enables various HAR research directions, including WiFi-based HAR, multimodal integration, sensing domain adaptation, and foundation model development. We highlight its potential by introducing a recently accepted paper that learns skeleton representation for downstream HAR tasks. Speaker: Alessandro D'Amico. Title: Towards Large-Scale EEG Data Collection. Description: Novel insights into human cognition can benefit from advancements in machine learning and artificial intelligence. However, these modern models depend on large volumes of high-quality data. This talk describes some groundwork conducted on validating low-cost EEG amplifiers for studying human cognition, as well as pilot studies into data collection in non-lab environments in order to explore the feasibility of large-scale EEG data acquisition.
12:00pm-12:50pm	Quarto Showcase Speaker: Greg Janée (UC Santa Barbara), Julien Brun (UC Santa Barbara), Sam Csik (NCEAS), Renata Curty (UC Santa Barbara) Quarto is a publication tool built into the RStudio multilanguage data analysis and visualization platform, often used in conjunction with code notebooks. Quarto enables a more reproducible approach to data-driven research by integrating documentation, code, and results in one document. But it’s also a great publication platform on its own! In this session a panel of Quarto users will showcase what Quarto is capable of, especially when used in concert with GitHub Pages hosting, including creating websites (course websites, personal websites, blogs), presentations, dashboards, academic papers, and books. We will also demonstrate, in real time, how easy it is to create such research deliverables. Prepare to be sold on using Quarto!
1:00pm-1:50pm	Oral History Methodology for the Qualitative Researcher Speaker: Ann Glusker (UC Berkeley) Qualitative research encompasses many methodological approaches, and one of the most powerful is oral history. As described by the UC Berkeley Oral History Center, this approach involves producing "first-person narrative topical and life histories that explore the narrator’s understanding of events through a recorded interview...[this] results in a lightly edited transcript of the interview that the narrator reviews, which is then published." This differs from the more familiar qualitative research interview in that it is a two-directional and transparent process... which comes with its own challenges. In this workshop you will learn more about these synergies and contrasts between and within methods, hear more about best practices for conducting oral history projects, and have time to ask questions about project ideas of your own.
2:00pm-3:50pm	Distilling a research domain with bibliometric analysis in R Speaker: Colton Baumler (UC Davis), Pamela Reynolds (UC Davis) This is the third in a 3-part Modern Practical Scholar workshop series designed to share inclusive, equitable, and sustainable practices for knowledge acquisition. We'll focus on setting up and using free, open source software to swiftly gather, organize, and curate research literature. These workshops are applicable for any level of scholarship, and may be attended as a series or separately. This specific workshop is being offered in person on the UC Davis campus with a Zoom broadcast for remote participants. After this workshop on bibliometrics in R, learners will be able to apply a bibliometric analysis framework to quickly gather the most relevant documents in a domain of science, and specifically: (1) describe the base principles and purpose of using bibliometric analysis; (2) recognize and apply R packages and functions to perform bibliometric analysis; (3) analyze bibliometric data to identify patterns and relationships in a corpus; and (4) create and rapidly review a subset of most important documents from a corpus. Prerequisites: All members of the UC Davis and UC Love Data Week community are welcome to attend. If you'd like to code along, bring your laptop with internet access and the latest version of R installed. Note that this workshop is not an introduction to R and learners are expected to have already acquired basic fluency in the language.
4:00pm-4:50pm	Get credit for your work with ORCID Speaker: Megan Van Noord (UC Davis), Pamela Reynolds (UC Davis) This session discusses how to curate your scholarly identity and increase your research impact using ORCID, a tool for connecting researchers with their scholarship and collaborators across disciplines, borders, and time.

Thursday, Feb 13, 2025

Time	Activity
10:00am-10:50am	Integrating SQL into R analytical workflows using DuckDB & dbplyr Speaker: Julien Brun (UC Santa Barbara), Greg Janée (UC Santa Barbara) In this workshop, we will discuss why you might consider relational databases to store research data. We will go over how to insert data into and retrieve data from a database using R and DuckDB. We will focus on how to use the R dbplyr package to integrate databases into tidyverse-focused analytical workflows. This workshop will cover three main points: (1) an introduction to database concepts and a discussion of why you might want to use one; (2) how to integrate a database into an R analytical workflow; and (3) a hands-on exercise using DuckDB and dbplyr, including how dbplyr can be used to learn some SQL basics.
12:00pm-12:50pm	AI Readiness, AI Reproducibility, and Data Stewardship Speaker: Christine Kirkpatrick (San Diego Supercomputer Center) This session will discuss lessons learned and emerging community practices as they relate to AI Readiness and AI Reproducibility. Insights into ways to leverage Gen-AI and LLMs for data stewardship will be shared, as well as open research questions, and gaps in practices at the intersection of AI and data. Please bring your own practices and insights to the session so they can inform the NSF-funded FAIR in Machine Learning, AI Readiness, (AI) Reproducibility Research Coordination Network (FARR).
1:00pm-2:30pm	From "Could we…?" to "Should we…?": Developing a Critical Framework for Engaging Artificial Intelligence Speaker: Shelby Hallman (UCLA), Alexandra Solodkaya (UCLA), Ashley Peterson (UCLA) No matter whether you love it, fear it, or are sick of hearing about it, artificial intelligence impacts our lives. In this workshop, we will go beyond what AI tools do (or their creators say they do) and take a hard look at the broader implications of developing and using artificial intelligence. We will explore the power structures that inform the human labor behind AI, as well as the impacts of AI development and use on natural ecosystems, systematic oppression, and information economies. We will also discuss the agency we retain as students, teachers, and lifelong learners to co-create critical frameworks that challenge the inevitability of robot overlords.
3:00pm-3:50pm	Sharing and Locating Research Data in Humanities and Social Science Studies Speaker: Tianji Jiang (UCLA) Sharing and reusing data can effectively reduce redundant efforts in data collection. It also enhances the efficiency of scientific research investments by preventing the reinvention of the wheel. Building a sustainable data-reuse process and culture requires frameworks that include policies, standards, roles, and responsibilities, all of which must address the diverse needs of data providers, curators, and (re)users alike. A critical step in the data sharing and (re)use cycle involves researchers depositing their research data into data-sharing infrastructures, keeping their data in a sharable and usable form, making other researchers aware of their data, and identifying data that are (re)usable for their needs. In this session, I aim to introduce some ideas and techniques for preparing research data for sharing. I will also introduce 2 to 3 data repositories ideal for researchers depositing their data and seeking (re)usable data for their studies. Additionally, I will offer several tips and strategies for identifying and utilizing (re)usable data effectively. The session is structured to last approximately 50 minutes, comprising a 40-minute presentation followed by a 10-minute question-and-answer segment.
4:00pm-4:50pm	Visualizing Islamic Manuscripts data at UCLA Library Speaker: Saad Shaukat (UCLA) UCLA Library holds one of the largest collections of Islamic manuscripts (roughly 10,000 items), covering multiple languages including Arabic, Persian, Turkish, and Urdu, and spanning nearly nine centuries (12th-20th). Currently, UCLA Library Special Collections is in the midst of a multi-year effort to create digital records for its Islamic manuscript holdings. At HumTech, our team is engaged in a project to visualize the cataloged manuscripts through Tableau. The goal of this project is to present a bird's-eye view of the Islamic manuscript holdings for interested researchers. By representing multiple data points that the library cataloging process captured from the manuscripts, the visualization aims to give researchers an easy way to dissect the data in multiple ways and to draw useful insights into the Islamic manuscript holdings at UCLA. In this presentation, we will discuss the challenges we faced in data cleaning, wrangling, and visualization that are specific to manuscript collections, especially one that is primarily in non-Latin script and extends back to the pre-modern period.

Friday, Feb 14, 2025

Time	Activity
10:00am-11:50am	Finding Spatial Data Speaker: Michele Tobias (UC Davis) We'll discuss how people approach finding spatial data to identify which methods work for certain kinds of data. Participants should be prepared to discuss their own process and to ask questions. This will be more of a discussion than an instructor-led workshop. We'll work as a group to document and share what we discover together.
12:00pm-12:50pm	Data in Context: Strategies for Evaluating and Utilizing Existing Datasets for Research Speaker: Wasila Dahdul (UC Irvine), Pamela Reynolds (UC Davis) Join us for a workshop designed to equip you with the skills to responsibly assess and use existing datasets for new research. Using the U.S. National Parks Visit Data as a real-world example, we will consider key issues in reusing data, including historical context, data collection methods, and ethical aspects of the dataset. This workshop will underscore the importance of documentation, provenance, and context in utilizing existing datasets for research, and participants will gain skills in evaluating data for reuse and suitability for their work.
1:00pm-2:50pm	Tableau Workshop Speaker: Madison Bautista (UCLA) Learn how to utilize Tableau Prep Builder and Tableau to showcase your data. From data cleaning and hiding blank fields to data filtering, Tableau Prep can be used to organize your data more efficiently. Once the data is clean, Tableau can be used to visualize it as charts, graphs, and even GPS coordinates. This small workshop will show you how anyone can easily use Tableau Prep and Tableau for data organization and analysis.
3:00pm-4:15pm	Unlocking image, audio, and video data in the Industry Documents Library: a Python-based, open-source stack for audio transcription, text extraction, sentiment analysis, and topic classification. Speaker: Geoffrey Boushey (UC San Francisco), Kate Tasker (UC San Francisco) The Industry Documents Library is a digital archive of documents created by industries which influence public health, hosted by the University of California, San Francisco Library. This archive contains millions of video, audio, and image files from the tobacco, opioids, fossil fuel, drug, and food industries, including advertisements, legal depositions, internal marketing documents, public health campaigns, and other historical records. This session will start with a presentation and overview of the contents of the IDL and search interface. Next, we will introduce a Python-based, open-source stack researchers can use to analyze, transcribe, and categorize data in IDL video, audio, and image files. Although participants will have an opportunity to try out these technologies during the workshop, the primary focus will be an overview of available tools and data, and participation in the programming sections is optional.