Using the REDCap API
Tired of working with downloaded .csv files? Want to reach into your REDCap database in a scripted way that always gets the freshest copy of data? Read on!
There’s an equation in this paper! Now what?
Anyone reading scientific papers will eventually, if not frequently, come to a line of mathematical symbols that do not make sense. This article is about how to keep reading!
Version control your writing
You’ve heard of using version control for code, but have you thought about what version control could do for your writing?
Getting more from R-Markdown
You may have heard that R-Markdown files are great for reproducible research, but what do they actually do? This post provides an overview of what you can do with R-Markdown (including properly-formatted journal articles, slide decks, websites, dashboards, Word documents, and more) and links with resources to get you started.
How do I collapse data from several columns into one?
Learn how to take similar data stored across several different columns and combine it into a single column in R.
Getting Started with lasso regression in R
Links to great resources to learn how to conduct lasso regression in R, and other related techniques.
R 4 Beginners Chapter 7 - Reading Tabular Data
Learn to import tabular data with the readr package.
ANOVA tables in R
This post shows how to generate an ANOVA table from your R model output that you can then use directly in your manuscript draft.
Linear Regression in R: Annotated Output
This post shows how to run a linear regression model in R, and how to interpret the output produced, line by line.
Understanding Interactions in Linear Models
This post will walk you through how adding an interaction term between a continuous and a categorical predictor changes your model, both statistically and in terms of your real-world interpretation of results.
Arcus Data Repository: A Fast Track to Research
Why has Arcus created a new clinical data source? Find out more about the Arcus Data Repository.
R 4 Beginners Chapter 6 - Reproducible Programming
Learn the tools for coding reproducibly with R scripts and R Markdown.
R 4 Beginners Chapter 5 - Data Transformation
Learn how to get your data in the form you need using dplyr.
Josh Taban’s CHOP Internship
How an internship at CHOP led to a love of data analysis with R.
R 4 Beginners Chapter 4 - Data Visualization with ggplot2, Part II
Continue learning data visualization with ggplot2, using statistical transformations, plot labels, position adjustments, and coordinate systems.
R 4 Beginners Chapter 3 - Data Visualization with ggplot2
Getting started using ggplot2 for data visualization.
R 4 Beginners Chapter 2 - Coding Basics
Walking through some basics of writing code for the first time.
R 4 Beginners Chapter 1 - Introduction and Installation
Interested in learning R from the very beginning? This is a great place to start!
The UMLS Metathesaurus
Confused by all the potential ways to code or describe medical terms? ICD, OMOP, RxNorm, SNOMED – what are all these acronyms? Learn more about an exceptionally helpful tool in this post by librarian and data guru Hannah Calkins.
What Type of Machine Learning Should I Use?
Let’s demystify machine learning! In this article, you’ll learn 1) the fundamentals of machine learning and 2) how to translate your research questions into well-scoped ML tasks.
The REDCap API and Windows
Have you tried our hints and tips for accessing your REDCap data via the REDCap API but have run into strange ‘TLS’ or ‘SSL’ errors? Do you use Windows? You’ll want to read this piece!
The U.S. Census Bureau and Child Health
What does the U.S. Census Bureau have to do with child health? How can you discover and use Census data in your research? Find out more in this first of a series of articles on Census data.
Data Sharing and Privacy: A Very Cursory Overview
Can you see / use given data? What are the rules around privacy? When do you need IRB approval? Find out more in this flowchart-driven article!
User Groups at CHOP
Are you an R or Python user? Are you interested in either language? You might find CHOP’s User Groups interesting and useful!
Cloud Tools for the Unconvinced
Curious about R and Python, but not sure where to start? Not ready to commit to a download? Read on!
The Spreadsheet Betrayal
Somewhere along the line that trust that you had fostered soured.
Arcus Annotations and cTAKES
A given cTAKES configuration, called a pipeline, has multiple stages (annotators) that perform specific tasks like part-of-speech tagging or detecting term negation. The steps are run in sequence.
Arcus Annotations and RDoC
Rather than classifying symptoms into disorders as in the DSM system, any aspect of human experience can be measured, multidimensionally, by scoring it within a carefully designed system of domains, constructs within domains, and sub-constructs on the one hand and units of analysis on the other. These factors are examined ‘in a context emphasizing developmental trajectories and the individual’s interactions with his or her environment.’
Arcus Annotations: Harvesting Data from Text Notes
Investigators at CHOP have not had a method for gleaning large amounts of note data for their studies. But they soon will.
Feasibility Analysis Using Arcus Cohort Discovery
How can you decide, more quickly, whether your idea for a study is feasible, given CHOP’s patient population? Arcus has a new solution that gives you more autonomy. Learn more about how you can discover cohorts using Arcus Cohort Discovery!
Meet the Arcus Library Science Team
Meet the information scientists who are helping Arcus create exciting new solutions for the organization, preservation, and interconnection of CHOP’s research efforts: our Library Science team!
Why Archivists and Librarians?
Why does Arcus have a team of highly-trained archivists and librarians as part of our efforts to create a more nimble, innovative, and interconnected research environment at CHOP? Learn about how the original information science, library science, is revolutionizing the way we work with data at CHOP.
The Argument Against Aggregation
Not everybody thought taking the average was a good idea.
Swirl: Learn R in R
Detailed instructions on how to start learning R using R’s swirl package.
Statistics Chapter 1: Measures of Central Tendency and Dispersion
Start here to learn in depth about the theory and practice of statistics.
Variable Types
Variable types commonly used in statistical analysis and their basic properties.
Do Patterns in Missing Data Matter?
Do missing data represent information from which we might draw conclusions?
Tiny Munge
An example of a typical small data munging problem with its solution.
Descriptive Statistics: The Bullet
Follow these (very) basic rules about displaying variable types.
Date Pairing in R
Do you have data that includes two or more repeated measurements? Need to figure out which ones to use? It gets complicated! Find out how to avoid multiple rows in your merge and choose the right pair of measures for your research.
Comparing Parts of Speech with NLTK
In this lab, we compare parts of speech using natural language processing (NLP) via NLTK. Do presidents differ by party in their State of the Union language? Let’s find out!
Data Preparation
How do you get from raw data to something you can do statistical analysis on? And who’s responsible for cleaning your data? Read more about it in this post.
Clinical Data in R
Need to work with clinical data? Not sure how to start with the big, messy, raw data you were given? Give this article a read!
Best Practices for REDCap Variables and Instruments
Making sure your fields and instruments are named well and properly set up in REDCap will ensure that your data collection goes smoothly. Find out more in this article!
Collecting Sex and Gender Data
Making sure your fields and instruments are named well and properly set up in REDCap will ensure that your data collection goes smoothly. Find out more in this article!
REDCap Race and Ethnicity Data Collection
The collection of race and ethnicity can be surprisingly inconsistent across studies. Find out the right way to collect this data according to federal standards.
REDCap: PHI and Permissions
How can you include PHI in your REDCap database in a safe way? Read this article to find out.
REDCap Data Collection Overview
Do you use REDCap to collect data, or to track subjects? You’ll want to read about some common pitfalls and how to avoid them in this series of articles!
REDCap Free Text Collection
Allowing free text fields may seem attractive and simple, but this data collection strategy comes at a cost. Find out more about REDCap’s free text fields in this article.
REDCap Field Types
Getting field types wrong can mean data ends up difficult or impossible to use at analysis time. Find out how to ensure your REDCap field types are correct in this article.
REDCap Free Text Collection
Are you inadvertently combining two or more data points into one field? This is hard to tease out at the end of data collection, and will complicate your research. Find out what we mean by data combining in this article, one of a series of articles to improve your experience of REDCap.
My File is Over There: File Paths for Data Scientists
Let’s say that you have some .csv files locally on your computer, and you want to load them into R or Python. You’re working in RStudio or a Jupyter notebook, and you’re not sure how to point to the file you want to bring in. This can be considerably painful if you are new to the concept of file paths. If you’re new to writing code, or you’ve encountered problems with this, read on!
FIPs and the Belmont Report: Divergence
The FIPs and the Belmont Report treat bias and discrimination differently. Want to learn more? Read this third of a series of three articles on FIPs and the Belmont Report.
Ordinary Linear Regression in R
Want to learn how to do ordinary linear regression in R? Read on!
Null Hypothesis Statistical Testing (NHST)
If it’s been a while since you had statistics, or you’re brand new to research, you might need to brush up on some basic topics. In this article, we’ll take on hypothesis testing.
Python Lab for Beginners
In this lab, we’ll walk you through what to do when you get a .csv – how to bring it into Python, do some data cleaning, gather summary statistics for reporting, and do some initial data visualizations. This is a great place to start if you’re brand new to Python!
FIPs and the Belmont Report: Similarities
Both the FIPs and the Belmont Report emphasize the importance of obtaining a subject’s proper consent, although each accomplishes this through different means. Want to learn more? Read this second of a series of three articles on FIPs and the Belmont Report.
Linear Algebra, a Geometric Approach
Linear Algebra, what even is it? If you’re baffled by linear algebra notation or the role of linear algebra in statistical metrics, or you just want to improve your basic grasp of linear algebra, check out this article.
The p Value Controversy
Why are p values under fire? In this article, we try to explain a bit of why the p value is often considered insufficient evidence by many statisticians and researchers.
Customizing ggplot2 Visualizations With ggThemeAssist
Ever struggle with getting your ggplot2 visualization to meet all of your needs? Tired of having to go to Stack Overflow every time you prepare a graph for publication? Read this piece.
FIPs and the Belmont Report: Principles
The Fair Information Practices (FIPs) and the Belmont Report are two key sets of principles that have been incorporated into laws both nationally and internationally since their respective publications in 1973 and 1976. These principles have helped shape the regulations that guide researchers. Want to learn more? Read this first of a series of three articles on FIPs and the Belmont Report.
Social Justice and Data Science
Researchers have a special duty to ensure that their research is just. Data scientists, also, need to keep social justice in mind when analyzing and preparing data.
Intro to the Linux Command Line
Need to work with the Linux command line, and you’ve never done it? Not sure where to start? Start here!
Mapping Environmental Exposures
Ever want more detail about the environmental context of your subjects? Want to map outcomes, disparities, or healthcare realities? This hands-on code lab shows you how to map Philadelphia’s recent shooting history alongside your research data.
What is an API?
Why is Arcus interested in creating APIs for researcher use? What even is an API, and why would you use it?
What is Metadata?
The Arcus Data Catalog will use metadata to facilitate data discovery. Why is metadata so important in research?
Arcus Clinical Cohorts
Defining a clinical cohort can be challenging, and can represent many hours of work. How can we make this effort more productive and benefit other researchers? Find out how Arcus is working to improve clinical cohort definition in this article.
Arcus Data Catalog
Ever resort to Google to find CHOP researchers? Wonder what else is going on in the Research Institute, but not sure how to find out? An important tool Arcus will bring to the research process at CHOP is a Data Catalog that helps you find research that’s pertinent to you.
Code Readability
How can I make my code more readable? A few tips and tricks for those who want to make sure their code is understandable.
Getting to one row
Do you have a pile of data about patients or research subjects, and you want one row per person, but can’t seem to get there easily? Find out why and how to solve this conundrum!
What is SQL?
Do you have data that’s held in a SQL database? Find out more about this important database technology and what you really need to know to get started.
Data Combining in R
Do you need to combine a few different datasets into one? R excels at this. Find out about merging, column binding, and row binding here!
ggplot overview
Do you want to make useful, attractive data visualizations in R? ggplot is the visualization solution you’re looking for!
Intro to Machine Learning: Trees
What is predictive, supervised machine learning? Can you do it in R? Find out more by examining one machine learning algorithm here!
Natural Language Processing with NLTK
Natural language processing (NLP) will come in handy if you analyze things like physician notes or language samples from research subjects. It allows you to examine language in various ways. Try a brief lab in working with a language sample!
Excel, if you must…
The use of Excel in science is a hotly debated issue with strong feelings. If you choose to use Excel, you should understand why it’s so controversial and how to use it as safely as possible!
Regex 101
Regex is a way to find strings that match a pattern you’re looking for. It’s handy in data processing, as well as in writing scripts. Read more about this skill (some would say arcane art) in this post.
High Performance Computing
What is high-performance computing? What resources does CHOP have available for your computing needs, when your laptop just won’t cut it? Find out more in this article.
Arcus’s Virtual Biobank
What is a virtual biobank? How does Arcus support genome-phenome studies?
Understanding Pearson’s r
Ever feel like you don’t have an intuitive grasp of what Pearson’s r correlation score is? You might know that scores with an absolute value close to 1 are useful, but why? And what’s the relationship between correlation and a linear model? Find out more here!
Statistical Programming Languages
What statistical programming language should you use?
Intro to NetworkX
NetworkX is a Python package you can use to do graph analysis or construct network diagrams. Read more and run code in Python 3 to see how this module works!
Why Use Literate Statistical Programming?
Why does literate statistical programming matter to a biomedical researcher? Learn how this programming paradigm can increase your productivity and scientific rigor.
R Markdown 101
In this video lab, you’ll make your first R Markdown document. This is an application of literate statistical programming.
Clinical Data at CHOP
How is clinical data at CHOP stored? How can it be accessed by researchers?
Recording Consent
It’s important to track your consent in a digital way, just as you record your research data in a digital way. Check out some suggestions on how to work with consent data in this article.
Distributed Humaning
When should you use human effort for things like coding data, versus developing automated solutions? A few thoughts.
Privacy Risks
What legal and financial risks do you undertake when you work with data? Learn more about the consequences of data carelessness in this article by attorney and privacy expert Dianna Reuter!
Interrogating the Data Until it Confesses
A big part of reproducible research is the responsible, rigorous conduct of research with regard to statistical methods. There are a number of ways to get a publishable manuscript which do not actually give reproducible results. Learn more about p-hacking, HARKing, and other shady practices here!
Statistical Intervals and Visualizations: Difference Between Means
One criticism that seems to have gained traction in the quest for reproducible research is that traditional methods rely too much on point estimates instead of interval estimates. In a 2014 article by Geoff Cumming, a number of visualizations are offered as improvements in demonstrating interval estimates. Learn how to reproduce those graphics in ggplot2. This article shows how to do a difference-between-means graph.
Data Dictionaries
What’s a data dictionary? What do you put in it, and why does it matter?
R Lab for Beginners
In this lab, we’ll walk you through what to do when you get a .csv – how to bring it into R, do some data cleaning, gather summary statistics for reporting, and do some initial data visualizations. This is a great place to start if you’re brand new to R!
Scripted Analysis for Reproducibility
Lots of digital ink has been spilled over the topic of reproducibility in science. This article addresses the technology that supports reproducible analysis of data, using scripted analysis.
Base R Plotting
Base R plotting is great for fast data exploration. While these graphics aren’t likely to be attractive enough for publication, they’re perfect for checking out hypotheses and understanding your data more easily.
Flat File Data Storage
Flat file data storage includes file types like .csv, .json, and .xml. Learn how these differ from one another and their relative advantages and disadvantages.
Cartesian Result Sets
Why does my result set after a data pull have more than one row per subject? If you’ve ever wondered why a simple data query turns so complicated, with multiple rows per person, this article is for you!
Literate Statistical Programming
What is literate statistical programming? How is it different from just commenting some code well, or documenting all the changes you do to a dataset in a separate file? Read more here about how literate statistical programming can streamline your scientific production.
Sparklines in ggplot2
Sparklines are a great way to show trends over time. In this article we gradually build up a ggplot2 sparkline visualization.
Welcome to the Tidyverse!
What is tidy data? This post explores the set of R tools called ‘the Tidyverse’ as well as explaining a bit about tidy data generally.
Jupyter 101
If you’re new to Python, working in an interactive environment like a Jupyter notebook might help you hone your code more easily.
Git 102
Learn how to use CHOP GitHub by creating your first repository.
Git 101
Explaining what version control is and beginning an exploration of git / GitHub.
Writing Functions in R
If you’re not an experienced programmer, understanding why writing functions is important may seem very abstract. This article gives you some code examples to explain why writing functions makes your code stronger and easier to use.
When R Gets Too Helpful
Sometimes R oversteps its bounds by assuming you want something you really don’t. Learn about a few cases where this might happen, and how to avoid it!
Version Control Curriculum
Introduction to Version Control using Git and GitHub
Glossary of Terms
Educational Pathways