Agenda

Monday, Feb 8, 2021

Time
Activity
9:00 - 10:00 am Publishing Data at Dryad
Speakers: Wasila Dahdul (UC Irvine), Daniella Lowenberg (California Digital Library)

Dryad is a multidisciplinary open data publishing platform that allows UC researchers to archive and publish their data for free. Join the workshop to learn how to use Dryad to share research data for re-use and to meet publisher and funder data availability requirements. The workshop will also cover how to prepare data and documentation for archiving and to enable research reproducibility.
11:00 am - Noon A Gentle and Friendly Introduction to APIs
Speakers: Tim Dennis (UCLA), Kat Koziar (UC Riverside), Stephanie Labou (UC San Diego), Leigh Phan (UCLA)

Have you wanted to use APIs to gather data, but aren’t quite sure where to start? This is the workshop for you. During this 60-minute workshop, we will cover the basic functionality and terminology of APIs while walking through three examples. We will also describe and share resources available to help you use them. We will demonstrate that APIs are not as big, scary, and inaccessible as they might originally seem.
12:00 - 12:30 pm HathiTrust Research Center demo
Speaker: Eleanor Dickson Koehl (HathiTrust Research Center)

This session will introduce the HathiTrust Research Center and its tools and services for text and data mining the HathiTrust Digital Library. Come learn how you can get started with computational analysis of the 17 million-volume HathiTrust corpus and hear how other researchers have made use of the HTRC toolkit!
1:00 - 2:00 pm SQL in 60 minutes
Speaker: Colin Jemmott (UC San Diego)

SQL is the most common way to select data or answer questions with databases. It is also straightforward enough that you can learn enough to start doing basic analysis in only an hour! No programming experience is necessary, but you will need access to a keyboard and microphone for this interactive session.
2:00 - 3:00 pm Working with GIS Data
Speaker: Janet Reyes (UC Riverside)

A map in a GIS (geographic information system) may dazzle us with its visual appeal and complexity, but if the data it contains is out-of-date, offset, incomplete, or too generalized, how useful is it?

In this workshop you will learn what to look out for while discovering and managing GIS data. Topics include key parameters of geospatial data, acquiring data, file formats, and including tabular data in a GIS.
3:00 - 5:00 pm Excelling with Excel: Best Practices for Keeping Your Data Tidy
Speakers: Pamela Reynolds (UC Davis), Victoria Farrar (UC Davis)

Have you been frustrated by inconsistent data codes or struggled to import your data from a spreadsheet to a downstream analysis? Ever wonder if there’s a better way to enter and organize your data? This workshop covers best practices for managing and entering your data in Microsoft Excel (and similar software including Google Sheets and LibreOffice). We will talk about how to structure your data project, review case studies of common pitfalls, and practice using built-in tools for data validation to help you get organized and make the most out of your data-driven project.

Tuesday, Feb 9, 2021

Time
Activity
9:00 - 10:00 am Introduction to Locating Secondary Data & Searching Data Repositories: Social Sciences Edition
Speaker: Elizabeth Salmon (UC Merced)

Access to research data is essential for reproducing pre-existing findings, facilitating collaboration, and developing new insights through secondary analysis. As a researcher, how do you locate data sets that are relevant to your research? This workshop will provide an introduction to the data repository landscape and strategies for navigating data sources to discover relevant and usable data with a focus on supporting research in the social sciences.
10:00 am - Noon Geocoding Address Data: Approaches to Personally Identifiable Data
Speaker: Michele Tobias (UC Davis)

Geocoding is the process of estimating a real-world location (typically we think of latitude & longitude) from a postal address. In this workshop, we’ll discuss the concepts needed to geocode data, understand options for working with personally identifiable data and non-identifiable data, and gain some hands-on experience with geocoding address data using QGIS, a GIS with a graphical user interface.
12:00 - 12:30 pm ProQuest TDM Studio demo
Speakers: John Dillon (ProQuest), Sara Randall (ProQuest)

ProQuest’s TDM Studio is a text and data mining solution that opens up millions of newspaper articles, dissertations, and primary sources to text and data mining. It provides both Python and R interfaces alongside push-button data visualizations for research as well as teaching and learning. This presentation will walk through a few TDM examples and possibilities.
1:00 - 2:00 pm Introduction to Digital Humanities and Humanities Data
Speaker: Rachel Starry (UC Riverside)

Interested in learning where to look for historical, cultural, or other kinds of humanities datasets? This workshop will share resources for discovering textual, visual, spatial, and social network data of use to students and scholars across humanities disciplines. We will also look at a variety of digital tools and methodologies that humanities researchers are using to computationally answer research questions and generate new ones. Whether you are new to Digital Humanities or have been engaged in digital research or teaching for years, join us to share your questions and connect with other UCR community members interested in DH.
2:00 - 3:00 pm Tidy Tuesday: clean up a data set
Host: Tim Dennis (UCLA)

Tidy Tuesday is a weekly data project aimed at the R ecosystem. As this project was born out of the R4DS Online Learning Community and the R for Data Science textbook, an emphasis was placed on understanding how to summarize and arrange data to make meaningful charts with ggplot2, tidyr, dplyr, and other tools in the tidyverse ecosystem. However, any code-based methodology is welcome - just please remember to share the code used to generate the results.
3:00 - 4:00 pm Ethical considerations in data
Speakers: Leigh Phan (UCLA), Ibraheem Ali (UCLA), Erin Foster (UC Berkeley), Stephanie Labou (UC San Diego)

In this one-hour interactive workshop, data librarians from UCLA, UCB, and UCSD will host an open discussion around data ethics focusing on case studies of participant data privacy, algorithmic bias, and the social justice components of research. The discussion will also include an overview of related resources available at various UC campuses, including further reading available through the libraries and groups on campus conducting research in these areas.

Wednesday, Feb 10, 2021

Time
Activity
9:00 - 10:00 am Data "Ownership": Rights and Responsibilities
Speaker: Michael Ladisch (UC Davis)

This talk will discuss the nuances of "ownership" and copyright implications for scholarly research data. We'll discuss various types of responsibilities when working with data, and identify relevant tools and services. After this talk researchers should be able to make more informed decisions about sharing and accessing data.
10:00 - 11:00 am Data sharing 101
Speakers: Wasila Dahdul (UC Irvine), Ho Jung Yoo (UC San Diego), Derek Devnich (UC Merced)

Do you need to make your data publicly available to meet funder or publisher requirements? How can you share your data in ways that increase the impact and reproducibility of your research results? Join this workshop to learn best practices for effective data sharing, including how to prepare data and documentation for sharing and how to select an appropriate data repository. This workshop is part of UC Love Data Week, a week-long offering of presentations and workshops focused on data access, management, security, sharing, and preservation. All members of the University of California community are welcome to attend.
11:00 am - Noon ICPSR presentation
Speaker: Anna Shelton (ICPSR)

Do you need data for the gazillion papers you have to write this quarter? Or maybe you're going up for tenure and need to publish? How about teaching students to use data (but you don't have time to create that lesson plan, we're in a pandemic after all!)? Don't worry, ICPSR has you covered. Join us for this presentation to learn about the classroom resources, data training, and over 15,000 datasets that are available to you for free. With data on economics, political science, public health, and so much more, there is something here for you.
12:00 - 12:30 pm Gale Digital Scholar demo
Speaker: Sarah Ketchley (Gale)

This session will provide an overview of the main text mining features of the Gale Digital Scholar Lab and give examples of research output using platform tools for topic modeling, sentiment analysis, and named entity recognition. The import and export options provide flexibility and extensibility, and the demo will showcase use cases drawing on both integrated Gale Primary Source archives and the researcher’s own uploaded datasets.
1:00 - 2:00 pm Data science student lightning talks
Moderator: Stephanie Labou (UC San Diego)

Personal projects are a common way for data science students to gain experience and build up a project portfolio. Unlike projects for a course or capstone sequence, personal projects are entirely self-directed by students and can be on any topic - and topics can get quite creative! Join us for an hour of lightning talks from UC students as they share their personal data science projects, with a focus on data access, data cleaning, and any data privacy concerns.
2:00 - 3:00 pm Discovery and Evaluation of Social Science Data Sources
Speaker: David Michalski (UC Davis)

Governments, non-governmental organizations, and social scientists have amassed vast collections of data about the social world. Learn how to discover, evaluate, and access sources available for new analyses and visualizations.
2:00 - 3:00 pm Cybersecurity for Graduate Students in Research

In this session for Love Data Week 2021, UC Berkeley’s Chief Information Security Officer and Research IT staff will talk about current trends in cybersecurity and campus resources to support graduate students and researchers at large. Members of the Berkeley Information Security Office will also discuss best practices for securing devices to keep your information safe.
3:00 - 5:00 pm Introduction to Command Line
Speakers: Tyler Shoemaker (UC Davis), Carl Stahmer (UC Davis)

Learn and practice how to talk directly to your computer via the command line. The shell is a very powerful tool for using scientific software and working with large data sets. It is primarily used to manage files and run programs, and it allows for automation of repetitive tasks. No prior coding experience is necessary. This workshop is a prerequisite for DataLab’s workshops on Introduction to Version Control with Git and Reproducible Research for Teams with GitHub.

Thursday, Feb 11, 2021

Time
Activity
10:00 am - Noon Basic Statistics in R (part 1)
Speaker: Wesley Brooks (UC Davis)

This 2-part workshop explores basic, practical applied statistics using the R statistical programming language. On the first day we’ll focus on common procedures like assessing the distribution of your data and calculating differences between groups. On the second day we’ll focus on common linear models (linear regression, ANOVA, etc.) in R. We will also calculate the power of a simple study. This workshop is appropriate for learners with various data types, but will most emphasize continuous numerical data. Prior experience with R and RStudio is required.
11:00 am - 1:00 pm Hacky Hours: Drop in and talk about code
Host: Kat Koziar (UC Riverside)

Take your computer coding to the next level. Whether you are a novice interested in learning to code or an experienced programmer comfortable with many languages and platforms, this is the place for you to learn from and share with others. Come talk with us about a variety of coding languages and topics, data management, data cleaning, data visualization, and data science methodologies.
1:00 - 2:00 pm Getting started with LastPass/VeraCrypt

Join members of the UC Berkeley Information Security Office and Research IT in this one-hour session to learn more about password managers, specifically LastPass Premium, and free tools for encrypting your devices.
2:00 - 3:00 pm New NIH Policy for Data Management and Sharing - Get the Scoop!
Speakers: Maria Praetzellis (California Digital Library), Ariel Deardorff (UC San Francisco)

In late 2020 the NIH announced their new Policy for Data Management and Sharing. Starting in 2023, this policy will require all NIH researchers to prospectively plan for how their scientific data will be preserved and shared through submission of a Data Management and Sharing Plan. While these plans might not be required by the NIH for two years, we encourage researchers to start writing data management plans now, as they are a useful tool for planning and tracking research data. This session will give an overview of data management plans (DMPs), dive deep into the new NIH requirements, and share local resources and tools like DMPTool.
3:00 - 4:30 pm Harvesting Twitter data with twarc and best practices for social media research
Speakers: Torin White (UC Santa Barbara), Jon Jablonski (UC Santa Barbara)

twarc is a command line tool and Python library for gathering and working with Twitter JSON data via the public Twitter API. Emerging from the Black Lives Matter and Document the Now movements, twarc comes with a loose set of community standards that respect the autonomy of the user and align with the EU's GDPR and Twitter's terms of service. As part of supporting social media research, the UCSB Library Collaboratory will get you started with what you need to use twarc and present current ethical best practices that we use to orient researchers to social media research.

Friday, Feb 12, 2021

Time
Activity
10:00 am - Noon Intro to Text Mining and NLP for Health Data
Speakers: Wesley Brooks (UC Davis), Arthur Koehl (UC Davis)

This workshop covers an introduction to natural language processing (NLP) and caveats for its application to health data. Using the R programming language, we will introduce the basics of text processing and demo how to calculate common metrics including word frequencies, term frequency-inverse document frequency (TF-IDF), and principal component analysis (PCA) to explore important words and group similar documents. We will also introduce more advanced NLP topics (sentiment analysis, topic modeling, etc.) and discuss classical versus deep learning approaches, as time permits. Learners with proficient R skills are encouraged to code along.
12:00 - 1:00 pm Web scraping Using XPath and Chrome Extension
Speakers: Greg Janée (UC Santa Barbara), Renata Curty (UC Santa Barbara)

Have you ever wanted to harvest data off a website that was not already in an analysis-friendly format? If so, this web scraping workshop is for you! Web scraping can be done entirely manually, but it is usually faster, more efficient, and less error-prone if automated. UCSB Research Data Services will guide you through some basic web scraping techniques, including how to use XPath and the Chrome web browser with the Scraper extension to extract data from the web with little technical knowledge. We will also discuss the ethical issues of web scraping.
1:00 - 3:00 pm Critical Approach to Data Visualization
Speakers: Lindsay Poirier (UC Davis), Emily Merchant (UC Davis), Pamela Reynolds (UC Davis)

This workshop will unpack the subjective process of data visualization and its relationship to concepts of diversity, equity and inclusion. We'll critically explore how data can be used to uphold and perpetuate, or to quantify and demonstrate, structural oppression. Through this workshop learners will practice the technique of "data visceralization," the process of experiencing differences in data and understanding them viscerally. This workshop is led by UC Davis DataLab's Data Feminism research and learning cluster, which focuses on thinking about data science and data ethics as informed by the ideas of intersectional feminism. Explore our Data Feminism reading list and activities.