Agenda

Monday, Feb 9, 2026

Time
Activity
11:00am-11:50am Intro to OpenRefine for Data Cleaning
Speaker: Megan Van Noord (UC Davis)

Learn how to clean, standardize, and transform messy data using OpenRefine, a powerful open-source tool for working with large or inconsistent datasets. This hands-on session will introduce key features such as faceting, clustering, data transformation, and enrichment. OpenRefine runs locally in your web browser and works especially well with CSV and TSV files.

What You’ll Learn
  • How to explore and assess a dataset
  • Techniques for identifying and resolving inconsistencies
  • Ways to split, transform, and standardize data
Prerequisites
  • Install OpenRefine before the workshop
  • Basic familiarity with Excel or Google Sheets
Who Should Attend

Researchers, students, librarians, and anyone who works with structured data and wants faster, more reliable cleanup workflows.
Click here to register.

12:00pm-12:50pm Five Ways to Love Your Data
Speaker: Stephanie Labou (UC San Diego), Reid Otsuji (UC San Diego), Kat Koziar (Fresno State Library)

Kick off UC Love Data Week with a fun and interactive session about five of the ways you can love your data. Learn how to write love notes to your future self in the form of robust documentation, how to share the love through citation and licensing, and other ways to give your data the quality time and attention it deserves. Whether you’re an undergraduate with a couple of spreadsheets or a researcher with gigabytes of data, this session will provide useful tips and tricks to ensure a healthy long-term relationship with your data.
Click here to register.
1:00pm-1:50pm One Metric to Fool Yourself - A Cautionary Tale in Machine Learning Evaluation
Speaker: Emil Hvitfeldt (Posit)

When fitting a model, statistical or machine learning, we often want to evaluate its performance. We have a wealth of different methods for all types of scenarios, from classification and regression to survival analysis. While these performance metrics work as intended, we can often get more out of models by carefully combining and using these metrics to capture what we really care about in our models: optimal performance and minimal bias.

UC Love Data Week guest speaker: Emil Hvitfeldt from Posit.

2:00pm-2:50pm How to Use Data Repository Finders
Speaker: Sadie Davenport (UC Davis), Emily Atkinson (UC Davis)

Choosing a repository is an important task that can impact the discoverability and long-term maintenance of your research data. There are several interactive tools designed to help with the selection process. These include filterable repository 'finders,' which allow researchers to identify and compare relevant repositories. In this session, participants will learn when to use repository finders, see demonstrations of two repository finders, and take part in evaluating repositories based on their characteristics. Participants will receive a summary handout after attending this session.
Click here to register.
3:00pm-4:00pm UC Loves UC Data
Speaker: Stephanie Labou (UC San Diego), Wasila Dahdul (UC Irvine), Jade Yonehiro (California Digital Library)

Have you ever wondered:
  • How many incoming UC freshmen are California residents?
  • How many students are majoring in mechanical engineering at UC San Diego?
  • How many patients are served by UC’s six medical centers?
  • Which countries publish less research each year than UC does?
These questions, and many others, are answerable – if you know where to find the data! In this workshop, we'll learn how to find and access raw data and aggregate statistics, using UC-specific questions as examples. Attendees will leave with a better understanding of where to locate UC data, levels of data availability, common data privacy considerations, and how UC librarians can help you get data!
Click here to register.

Tuesday, Feb 10, 2026

Time
Activity
10:00am-10:50am Can AI Tell a Cigarette from a High Tech Gadget?
Speaker: Geoffrey Boushey (UC San Francisco)

Modern GenAI vision models can describe scenes and identify objects impressively well, but their classifications can still shift depending on how they are prompted. This talk explores a GenAI system’s ability to detect smoking imagery in a visually busy ’80s sci-fi video by breaking the video into 1-second frames and evaluating two prompts: one that encourages detection and one that emphasizes neutrality. Multiple runs of each prompt are compared to examine changes in precision, recall, accuracy, and overall consistency. The session offers a practical look at how prompt phrasing and model variability affect real-world use cases, and what to keep in mind when using GenAI tools for content analysis.
Click here to register.
11:00am-11:50am Publishing a Quarto website on GitHub
Speaker: Julien Brun (UC Santa Barbara), Greg Janée (UC Santa Barbara)

Are you looking for a simple, professional way to showcase your research and CV, or to complement your publications with more interactive content? What if we told you that you can create a free website with Quarto and GitHub Pages to promote your work? Join us for a hands-on workshop on building websites using Quarto and GitHub Pages. If you can write in Markdown, you can build a website; no web development knowledge is needed. In this session, you will learn how to turn your analysis and writing into a sleek website hosted for free on GitHub. Whether you use R, Python, or just plain text, this workflow allows you to maintain a dynamic academic portfolio with minimal effort.
Click here to register.
11:00am-11:50am Web Scraping, Crawling and Spoofing [Part 1]
Speaker: Jake Anderson (UCLA)

This workshop connects applied social science research with practical data skills in a two-part format. The first part presents a large-scale research project on discrimination in the labor market, focusing on how novel data are accessed, constructed, and managed to answer substantive empirical questions. The second part is a hands-on session on web scraping with Selenium, where participants learn how to responsibly collect structured data from public websites using reproducible workflows. The session emphasizes good data practices, from acquisition and documentation to downstream analysis, and is designed to be accessible to researchers working with a wide range of data types.

Note: This workshop has two parts. It will run for one hour starting at 11:00am, break for one hour, and resume at 1:00pm for the final hour.

12:00pm-12:50pm Data Rescue Project
Speaker: Lena Bohman (Steering Committee Member, Data Rescue Project)

Public data from governmental agencies like the Bureau of Labor Statistics and the Department of Education is increasingly at risk. This talk details the Data Rescue Project (DRP), a concerted effort by librarians and archivists to preserve at-risk public U.S. governmental data. We will cover the DRP's progress as a data rescue clearinghouse: running volunteer training, utilizing ICPSR's DataLumos repository, and operating the Data Rescue Portal to track rescued datasets. Learn what we are doing and how you can contribute to public data preservation efforts.
Click here to register.
1:00pm-1:50pm Web Scraping, Crawling and Spoofing [Part 2]
Speaker: Jake Anderson (UCLA)

This workshop connects applied social science research with practical data skills in a two-part format. The first part presents a large-scale research project on discrimination in the labor market, focusing on how novel data are accessed, constructed, and managed to answer substantive empirical questions. The second part is a hands-on session on web scraping with Selenium, where participants learn how to responsibly collect structured data from public websites using reproducible workflows. The session emphasizes good data practices, from acquisition and documentation to downstream analysis, and is designed to be accessible to researchers working with a wide range of data types.

Note: This workshop has two parts. It will run for one hour starting at 11:00am, break for one hour, and resume at 1:00pm for the final hour.

1:00pm-1:50pm Prototyping Data Solutions with Arduino
Speaker: Brendon Wheeler (UC Riverside)

This hands-on introductory workshop uses Tinkercad to show how to work with the Arduino microcontroller and how data plays a key role in its usefulness. Attendees will learn to use the Arduino with code and simulations, skills that, with a little imagination, can be applied to research projects. This workshop will be recorded.
Click here to register.
2:00pm-2:50pm Collecting Cultural Heritage Data with the DPLA API
Speaker: Jairo Melo (UC Santa Barbara)

Are you wondering if there’s a better way to collect information from digital archives, libraries, and museum collections than downloading items one by one? In this workshop, we will explore how to take advantage of API services provided by cultural institutions using Python to collect and structure data and digital artifacts. Using open cultural heritage data from the Digital Public Library of America (DPLA), participants will retrieve a sample of digital objects and relevant metadata, simulating a real research scenario. The session introduces key concepts such as pagination, filtering, and basic data inspection, with a focus on realistic data collection workflows.
Click here to register.
2:00pm-2:50pm Accessible Data for Everyone
Speaker: Kristin Briney (Caltech Library)

How do you format a dataset so it can be understood by a blind person, deaf person, or other disabled individual? This question is increasingly relevant given new requirements from the US Department of Justice to make all public university websites and web content accessible to people with disabilities in the next year. This workshop reviews why accessible research data matters, general accessibility principles, and how to make various data files (spreadsheets, images, etc.) more accessible for everyone.
Click here to register.

Wednesday, Feb 11, 2026

Time
Activity
10:00am-10:50am How I Slacked at Work and Accidentally Built an Automated, AI-Driven ETL Pipeline
Speaker: Beth Tweedy (UC Davis)

Learning new technologies like Artificial Intelligence can feel risky when you are working with critical research data. It started when I was daydreaming at work about my Dungeons & Dragons campaign and realized the best way to learn these tools (for me) was to take the 'work' out of the equation.

Join me for a case study on how I used a personal D&D database as a 'sandbox' to learn high-impact data skills. By working with data I knew intimately (and didn't mind breaking), I was able to experiment with AI tools, fail safely, and eventually build a working automated pipeline. This isn't a talk about being a coding genius; it's about how a librarian harnessed curiosity and personal passion to gain a better understanding of how these often intimidating black-box tools actually work.

This session is designed for:

  • Researchers, Scholars, and Practitioners interested in 'Ground Truth' testing—using known data to safely audit the strengths, weaknesses, and limits of AI tools before trusting them with your scholarship.
  • Librarians who want a practical look at how a researcher might approach the steep learning curve of this landscape. This session offers a viable path for building the technical vocabulary to translate user needs to technical experts/developers as well as applying these skills to your own work.
  • Technical Experts & Data Scientists who support research. By watching a 'research hacker' navigate the messy reality of AI integration, you will see how users who 'know enough to be dangerous' actually approach these problems—offering insights on how to better anticipate pitfalls and guide the campus community.

11:00am-11:50am Moving Your Data Around with Rclone
Speaker: Jamie Jamieson (UCLA), Tim Dennis (UCLA)

Coming soon!
12:00pm-12:50pm Making Research Software Citable and Discoverable: Practical FAIR4RS Wins
Speaker: Tim Dennis (UCLA), Laura Langdon (UC Davis), Reid Otsuji (UC San Diego), Leigh Phan (UCLA)

This workshop is intended for researchers, graduate students, and librarians who work with research code and want practical ways to make software easier to find, cite, and reuse.
1:00pm-2:30pm Wikipedia Editathon
Speaker: Bee Lehman (UC Berkeley), Ann Glusker (UC Berkeley)

Wikipedia is among the most visited websites in the world, but even Wikipedia's staff acknowledge that its content reflects its editorship (75.6 percent of US editors are white and 90 percent are male). These editors decide which topics are 'notable' and therefore worthy of inclusion, which often excludes genderqueer and female individuals and topics from the global South. You can be part of the solution! Join us on February 11 on Zoom for a tutorial for the beginner Wikipedian, then edit some Wikipedia entries yourself. Sign up for a free Wikipedia account in advance (optional). You may bring topic ideas or articles you'd like to edit, or we'll help you find a way to make a difference in an area of interest to you.
2:00pm-2:50pm Publication-Ready Data Tables in R
Speaker: Stephanie Labou (UC San Diego), Julien Brun (UC Santa Barbara)

Are you tired of spending your time manually formatting tables for article submission? Does the idea of pasting regression coefficients from R to Excel make you wince? This session will demo how to make publication-quality (and reproducible!) data tables in R. We'll use the `gt` package to showcase options for table customization and the `gtsummary` package for ways to combine model outputs with table creation. Note that this session assumes a working knowledge of R; we will dive straight into main table content without a general overview of R.
Click here to register.
3:00pm-3:50pm Introduction to Public Health Data
Speaker: Elisa Cortez (UC Riverside)

Are you looking for public health data but don't know where to start? Join us for a practical guide on locating the specific public health data and information you need.
Click here to register.

Thursday, Feb 12, 2026

Time
Activity
10:00am-11:50am Visualizing Data with Ggplot for R Users
Speaker: Pamela Reynolds (UC Davis), Nick Ulle (UC Davis)

If a picture tells a thousand words then a data visualization can convey a million! Join us to learn how to apply data visualization principles and best practices to generate compelling plots with ggplot in R. Through a series of case studies, we'll make static plots and practice iterating to improve upon them and the data stories they tell.

This workshop is for current R users who want to expand their plotting skills, and thus is not an introduction to the R programming language. It is part of DataLab's 'Intermediate R' training series, which you can learn more about on our training website.

After completing this workshop, learners should be able to:

  • write code to generate plots in R using the ggplot2 package
  • develop a variety of plot types (dot plots, histograms, bar charts, boxplots, time series, etc.)
  • employ ggplot's grammar of graphics syntax to customize plots
  • effectively use layers and aesthetics to make plots more informative and meaningful (including adding groupings, scaling, color, faceting, labels, and trendlines)

This workshop is offered in person and as a virtual broadcast as part of UC Love Data Week. Local learners are encouraged to attend in person. All members of the University of California system are welcome to register. Advance registration required.

Prerequisites

This workshop is intended for motivated learners from all domains at UC Davis and the other UCs who want to hone their existing R skills. Thus, we won't cover introductory R questions during this session. Participants should be comfortable with basic R syntax, and have the latest version of R and RStudio pre-installed and running on their laptops. If you're new to R, or want to brush up, we suggest you start with our 'R Basics' workshop curriculum.
Click here to register.

12:00pm-12:30pm From Static to Dynamic: Exploring Interactive Data Visualization
Speaker: Xiuqi "Jade" Li (UC Santa Barbara), Julien Brun (UC Santa Barbara)

Most of us are used to making figures that are meant to be read, not explored – plots designed for PDFs and journal pages. Interactive graphics open up new possibilities by letting viewers zoom in, filter, and engage with data in ways static figures cannot. In this session, we’ll explore what makes a visualization interactive, when it’s worth using, and how simple interactions can add insight. We’ll look at a few short examples using tools such as DataTables, Plotly, and Shiny. You will leave with practical tips on creating, sharing, and sustaining interactive graphics.

No prior experience with interactive visualization is required.

Note: This session is part of Data to Discovery, a biweekly virtual series hosted by the UCSB Library. This series highlights practical tools and core principles for working with data and engaging in open science. Each 30-minute session is designed to be concise, focused, and practical.


Click here to register.
1:00pm-1:50pm Getting Started with Raspberry Pi and the Sense HAT
Speaker: Alvaro Alvarez (UC Riverside)

This hands-on workshop introduces beginners to the Raspberry Pi and its versatile Sense HAT add-on. You'll learn the essentials of Python programming to control an LED matrix and collect real-world, high-quality data on temperature, humidity, and device orientation (gyroscope). These foundational skills are crucial for building academic and research projects, from environmental monitoring to robotics and physics experiments. No prior coding experience is necessary!
Click here to register.
2:00pm-2:50pm Introduction to Data Cleaning with Python and Pandas
Speaker: Barbara Martinez Neda (UC Riverside)

Join us to learn how to transform messy, raw spreadsheets into clean, usable data. This workshop will guide you through the essentials of the Pandas library, a key tool for data manipulation. By the end, you’ll be able to load messy files, understand the structure of a DataFrame, and produce the clean data needed for analysis and visualization.
Click here to register.
2:00pm-3:50pm Building Decision Support Web Apps with R-Shiny
Speaker: Andy Lyons (UC ANR)

Although R is an incredibly powerful programming language for data analysis and decision support, it is not widely known among the general public. The Shiny package enables R users to turn their data analysis code into web applications with user-friendly interfaces that are easy to deploy. This hands-on workshop will provide an introduction to Shiny, including the principles of reactive programming, fundamentals of UI design, and deployment options. By the end of the workshop, participants will be able to convert a spreadsheet-based model into a Shiny web app. Some experience with R is expected; see the registration page for details.

Prerequisites

Some experience with R is expected (at least enough to run prepared scripts). The hands-on exercises (recommended but not required) will require participants to have a free account on Posit Cloud (i.e., so you can run prepared scripts in RStudio in a browser). Experienced R users are welcome to run the exercises in RStudio Desktop or Positron, but support will be limited. Parts of the workshop will require you to follow along as the instructor demonstrates code to solve a problem. It is therefore highly recommended that you have a computer with two monitors, so you can view the instructor's screen on one monitor, and work in RStudio in the other. Participants who only have a small laptop with no external monitor may find it challenging to complete the hands-on exercises, and may just want to watch.
Click here to register.

Friday, Feb 13, 2026

Time
Activity
10:00am-10:50am AI-Assisted Data Visualization and Dashboards
Speaker: Raymond Uzwyshyn (UC Riverside)

This interactive seminar introduces participants to the transformative potential of AI-assisted data visualization. Drawing on emerging practices in 'vibe coding' (an intuitive, natural language approach to AI-assisted development), attendees will explore best practices for generating sophisticated, data-driven dashboards through prompting and iterative collaboration with AI. The session will give an overview of key principles for effective interactive data visualization, demonstrate practical workflows for AI-assisted coding, and showcase real-world applications ranging from exploratory data analysis to publication-ready research dashboards. Participants will learn how to leverage AI tools to rapidly prototype visualizations and iterate on design concepts while maintaining analytical rigor and aesthetic sophistication. Whether you're looking to extract deeper insights from existing datasets, build interactive dashboards for research, or develop a comprehensive analytic framework for ongoing work, this seminar will equip you with practical strategies for integrating AI assistance into your data visualization workflows. No advanced coding experience is required, just curiosity about how AI can augment your data practice.
Click here to register.
11:00am-11:50am Critical Integration of GenAI in Qualitative Analysis
Speaker: Renata Curty (UC Santa Barbara)

Join us for an exploration of how Generative AI can be strategically integrated into qualitative research workflows. This session introduces the Guided AI Thematic Analysis (GAITA) framework, a structured approach to using AI for coding and theme development while upholding rigorous analytical standards. Participants will then experience a live demonstration of an AI assistant embedded in QualCoder, showcasing how AI can support qualitative data analysis without replacing researcher judgment. Attendees may follow along step by step or observe the workflow in action, gaining practical insights and best practices for critically and effectively incorporating AI tools into qualitative research.
Click here to register.
1:00pm-2:30pm Deep Learning with Drone Imagery in ArcGIS Pro
Speaker: Maggi Kelly (UC ANR), Shane Feirer (UC ANR)

Drone images have allowed researchers to investigate novel questions due to their incredible spatial resolution. Ironically, however, the level of detail in drone imagery renders traditional image analysis methods largely ineffective. This webinar will describe how Deep Learning methods can be used for object detection and segmentation in high-resolution imagery. A workflow for doing Deep Learning in ArcGIS Pro will be explained, with a comparison of the results of two commonly used Deep Learning algorithms (Mask R-CNN and SAM) applied to drone imagery of orchard trees taken at different times of the year.
Click here to register.
2:00pm-2:50pm Measuring Research Impact with OpenAlex: A Data-Driven DIY Approach
Speaker: Jing Han (UC Riverside)

This session begins with an introduction to OpenAlex, then guides participants through exploring the web interface and working with the API to access, clean, and interpret open data to measure institutional research impact.
Click here to register.