Data Education Modules

We’ve collected hundreds of data education modules to help you learn data science & analysis skills. Begin by selecting a module format that works for you:

Webinar recordings

The Lab Down? Skill Up! webinar series ran from March to June 2020. Take a look at the topics below and get started! Each module includes a video recording (30 minutes to 1 hour) and may include slides and additional materials.

Browse by topics

Arcus 101

Welcome to Arcus!
What is Arcus? This webinar provides an introduction to Arcus for a wide audience.

Getting started with coding: R

Demystifying R – R for the absolute beginner
No previous experience in R required! This webinar begins your journey with R - what is the R language? Why is it useful in contexts like medical research and operations? How can you get started learning more?

The Art of Learning R: Getting Started in Data with Zero Prior Experience
Leila will help you get started learning R from the beginning - absolutely zero prior experience with data analysis + R required!

Getting started with coding: Python

Demystifying Python – Python for the absolute beginner
No previous experience in Python required! Heard of Python but not sure what it really is or if it would prove useful to you? Join Joy as she walks you through the very basics of what Python is and why you might find it useful in your workflow.

Hello World! Your very first computer program
This is a great webinar for novices to programming! Follow Jeff Pennington in this hands-on session, in which Jeff builds a simple computer program using the Python programming language. No prior experience required.

Getting started with coding: SQL database queries

Intro to SQL with Clinical Data - 101
This lecture is an introduction to SQL as a whole, in which we teach the basic concepts and apply them to start writing real SQL queries. During this session we use both data from Arcus and “fake clinical data” from a public source (this second source can be used by anyone, not just those with Arcus access). (See 102 below to keep going!)

Intro to SQL with Clinical Data - 102
A continuation of SQL 101 above, this lesson moves on to more advanced SQL techniques (for summarizing and aggregating results) and goes into more detail on SQL JOINs (i.e., writing queries that combine results from more than one table). During this session we use a publicly available SQLite dataset (which contains “fake” clinical data that we can run our queries against).
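
As a rough illustration of the two ideas named above (aggregating results and joining tables), here is a minimal sketch using Python's built-in sqlite3 module. The table names, column names, and rows are hypothetical and made up for this example; they are not the dataset used in the session.

```python
import sqlite3

# In-memory SQLite database with two small, made-up tables.
# Table and column names are hypothetical, chosen only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patients (patient_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE encounters (
        encounter_id INTEGER PRIMARY KEY,
        patient_id INTEGER REFERENCES patients(patient_id),
        visit_type TEXT
    );
    INSERT INTO patients VALUES (1, 'Patient A'), (2, 'Patient B'), (3, 'Patient C');
    INSERT INTO encounters VALUES
        (10, 1, 'outpatient'),
        (11, 1, 'inpatient'),
        (12, 2, 'outpatient');
""")

# LEFT JOIN keeps patients that have no encounters at all;
# COUNT with GROUP BY then summarizes the joined rows per patient.
query = """
    SELECT p.name, COUNT(e.encounter_id) AS n_encounters
    FROM patients AS p
    LEFT JOIN encounters AS e ON e.patient_id = p.patient_id
    GROUP BY p.patient_id, p.name
    ORDER BY p.patient_id
"""
rows = conn.execute(query).fetchall()
conn.close()

print(rows)  # [('Patient A', 2), ('Patient B', 1), ('Patient C', 0)]
```

Counting `e.encounter_id` rather than `*` is what lets the patient with no encounters show up with a count of 0, since COUNT skips the NULLs produced by the LEFT JOIN.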

REDCap and Survey Research

Survey Methodology and REDCap
Creating your own survey instruments for use in research can be challenging. Wording your questions well, providing appropriate response choices and paying attention to your survey’s visual design can greatly improve the quality of the data you get back. In this session we go through some evidence-based survey question and answer design basics, including easy-to-implement tips and tricks for designing your questions in REDCap.

REDCap API workflows in R vs. Python
This webinar is a realistic workshop on using REDCap with survey response data, taught bilingually in R and Python. Come to learn more about REDCap, stay for a fun, gently competitive exploration of differences between R and Python!

Making Dashboards out of your REDCap Data
Tired of recreating the same visualizations and reports as you gather more data? Want to increase your skill manipulating REDCap data in R or Python? This session will introduce dashboards as a powerful, time-saving, and visually compelling method for analyzing REDCap data. As with previous sessions, Joy and Cass will alternate showing you useful approaches in R and Python (respectively). We’ll also connect you to resources from the REDCap team to keep expanding your survey methods skillset!

Introductory Statistics and Machine Learning

WTH is Linear Regression? A friendly intro for the complete novice.
Regression is a fancy term for explaining variability in your data. We will take a conceptual, visual, layperson’s approach to the linear model. This will help you understand plots, trendlines, and correlation, and set the foundation for understanding the concepts behind “machine learning”.

WTH Is Linear Regression? Part 2 – A linear model from CHOP clinical data in an Arcus Lab
In part 2 of this week’s series, we will construct a linear model of the relationship between birth weight and length from CHOP clinical data, building on concepts in WTH Is Linear Regression? This is a simple-to-follow, straightforward treatment of the linear model plus the supporting concepts of data extraction, cleaning, transformation, data structures, exploratory data analysis, and visualization.
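
For readers who want a feel for what fitting a linear model involves, the following is a minimal sketch of ordinary least squares in plain Python. The length and weight values are invented for illustration and are not CHOP data; the function names are hypothetical, and the sessions themselves may use different tools.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel in the ratio).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variability in ys that the fitted line explains."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up example values: birth length (cm) and birth weight (kg).
lengths_cm = [46, 48, 50, 52, 54]
weights_kg = [2.6, 3.0, 3.3, 3.8, 4.1]

slope, intercept = fit_line(lengths_cm, weights_kg)
r2 = r_squared(lengths_cm, weights_kg, slope, intercept)
print(f"weight is about {intercept:.2f} + {slope:.2f} * length (R^2 = {r2:.3f})")
```

The slope is the trendline's steepness, and R² is one way to put a number on "explaining variability": values near 1 mean the line accounts for most of the spread in the data.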

Introduction to Machine Learning
What are the foundational ideas in machine learning, and how are they relevant to medical research? Join Cass as they scope out ML for the research community, with a focus on framing a research question as a regression or classification task.

Computing tools and techniques

Unix Command Line I
Learn about interacting directly with the command line interface (CLI) of your computer. This session provides a hands-on introduction to using the popular Bash Terminal (a type of Unix shell) to gain access to a variety of extremely helpful methods and tools. (Part 1 of 2)

Unix Command Line II
Continue learning about interacting directly with the command line interface (CLI) of your computer! (Part 2 of 2)

Intermediate Bash Scripting
This session looks at intermediate-level Bash scripting, including setting options in a shell script to create an interactive command-line tool. It is appropriate for learners with some basic familiarity with Bash (e.g., those who have gone through the Software Carpentry beginner sequence).

Going deeper: R

R Markdown
What is R Markdown? If you’ve worked a little with R (or would like to) and have heard this term, or just seen nicely formatted reports, dashboards, articles, or books that include R, you’ve encountered R Markdown! Make your very first R Markdown file and learn how to use R Markdown to communicate with data more effectively. This webinar is tailored for people who have at least a bit of experience of R (3-10 hours) but the R-curious are welcome as well.

Intermediate Data Visualization in R
This presentation covers some advanced concepts of data visualization in R, specifically focusing on the ggplot2 package. Topics include tips and tricks (e.g., facet_grid margins), common troubles with ggplot() (e.g., sorting in descending order), and best practices. Familiarity with R and ggplot2 is helpful, but not required.

Census Data 101 with R
Learn what data the decennial Census and American Community Survey collect, and how to fetch and use data via the free Census API. This session uses R to fetch and parse data.

Philly Public Data in R
Interested in using publicly available data to understand child health? In this webinar, we’ll work with public data sourced from Philadelphia’s open data portal, discuss how to work with it effectively, and map that data, all in R. We’ll share code and talk about missing data, interpolation, points, polygons, R Markdown, and more. This is a great end-to-end project that will provide you with a template for similar projects you can do on your own.

Going deeper: Python

Python II - Using pandas for tabular data
How do you use Python for data analysis? pandas is a popular scientific computing library for Python that adds in some essential features for importing, transforming, and analyzing data, especially tabular data. This session introduces the main features in the pandas library. We recommend you watch Demystifying Python and/or spend a little time with Python before this session, but it is not required.

I have the data, now what? Exploratory Data Analysis of Public Data in a Jupyter Notebook + Python
In this video, Joy will take you through exploratory analysis of a NYC Open Data Portal dataset using Python and a Jupyter Notebook running in Google Colab. This is a great end-to-end project that will provide you with a template for similar projects you can do on your own.

Visualize tabular data with Python + pandas + seaborn
How do you effectively use the Python ecosystem of data analysis tools to visualize your data? In this session, we will introduce some of the key methods for exploring and visualizing tabular data. Appropriate for learners with some beginner-level experience with Python syntax - previous experience with pandas helpful but not required.

What is a workflow? Introduction to snakemake and HPC
In this intermediate session, Perry introduces workflows in Python using the snakemake library (“a scalable bioinformatics workflow engine”). He focuses on bioinformatics workflows and HPC (high-performance computing) use cases. This is a great session to learn how to bridge Python scripting with real-world, cutting-edge data science infrastructure.

Research data management and project management

Introduction to Data Management Principles
In this video, Joy discusses best practices for storing research data that is neat, tidy, and future-proof. No previous experience required!

Scientific Project Management with Arcus
Juan introduces key principles of project management (“a project is a temporary individual or collaborative effort done to create a service, product or capacity not previously existing. It involves planning and design”) and shows how this methodology works with Arcus Scientific Project teams.

Intro to Bioinformatics Technology Collaboration
Arcus aims to empower all staff at the Research Institute to become confident in working with data. This course provides tips and tricks for researchers of all backgrounds to become adept at using GitHub, to communicate functional requirements for tooling and software, and to understand the importance of computational pipelines. The development of an integrated metadata management system for research data collections will be highlighted to demonstrate the implementation of these skills.


Intro to Tableau for Data Visualization
Data visualization helps you explore, analyze, and communicate insights from your data. This beginner-focused session will provide a crash course in data visualization best practices, and also show you what it looks like to build visualizations in Tableau Desktop or Tableau Public, which are popular data visualization applications in research and business settings. No previous experience required!

Web development

CSS Styling with Codepen
Want to learn about using color and styling in web design? Need to match a branding requirement? Want to design something that is pretty and effective? This session in the iSTEM Creative Coding series will focus on CSS Styling using the Codepen platform. Appropriate for learners with no experience or beginner-level HTML/web development experience.

More coming soon!

Tutorials and articles

Find over 80 tutorials and articles on a range of topics below! (You can also use the search feature or view recently published articles.)


Glossary of terms

Just starting? Trying to figure out if this site will be relevant to you? A good place to begin is the glossary of terms; then choose “Arcus Orientation” as your “educational pathway” below to get a selection of articles that will probably make the most sense for what you need.

