List Your Installed R Packages

    Sometimes you’ll need to export your list of installed packages to help a colleague or to memorialize your build before updating R or RStudio. Here’s how.

    User Groups at CHOP

    Are you an R or Python user? Are you interested in either language? You might find CHOP’s User Groups interesting and useful!

    Arcus Annotations and cTAKES

    A given cTAKES configuration, called a pipeline, has multiple stages (annotators) that perform specific tasks like part-of-speech tagging or detecting term negation. The steps are run in sequence.

    Arcus Annotations and RDoC

    Rather than classifying symptoms into disorders as in the DSM system, any aspect of human experience can be measured, multidimensionally, by scoring it within a carefully designed system of domains, constructs within domains, and sub-constructs on the one hand and units of analysis on the other. These factors are examined ‘in a context emphasizing developmental trajectories and the individual’s interactions with his or her environment.’

    Feasibility Analysis Using Arcus Cohort Discovery

    How can you decide, more quickly, whether your idea for a study is feasible, given CHOP’s patient population? Arcus has a new solution that gives you more autonomy. Learn more about how you can discover cohorts using Arcus Cohort Discovery!

    Meet the Arcus Library Science Team

    Meet the information scientists who are helping Arcus create exciting new solutions for the organization, preservation, and interconnection of CHOP’s research efforts: our Library Science team!

    Why Archivists and Librarians?

    Why does Arcus have a team of highly trained archivists and librarians as part of our efforts to create a more nimble, innovative, and interconnected research environment at CHOP? Learn how library science, the original information science, is revolutionizing the way we work with data at CHOP.

    Variable Types

    Variable types commonly used in statistical analysis and their basic properties.

    Tiny Munge

    An example of a typical small data munging problem with its solution.

    Using the REDCap API

    Tired of working with downloaded .csv files? Want to reach into your REDCap database in a scripted way that always gets the freshest copy of data? Read on!

    Date Pairing in R

    Do you have data that includes two or more repeated measurements? Need to figure out which ones to use? It gets complicated! Find out how to avoid multiple rows in your merge and choose the right pair of measures for your research.
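The article works in R. The following is only a minimal Python sketch of one common pairing rule, under assumptions of our own: each subject has one visit date, the lab value closest in time to that visit is the one to keep, and anything more than 30 days away is too far to count. The data, the `pair_nearest` name, and the 30-day window are all hypothetical.

```python
from datetime import date

# Hypothetical example data: one visit date per subject, several lab draws each.
visits = {"S1": date(2020, 1, 10), "S2": date(2020, 3, 5)}
labs = [
    ("S1", date(2020, 1, 2), 5.1),
    ("S1", date(2020, 1, 12), 5.4),
    ("S1", date(2020, 2, 20), 6.0),
    ("S2", date(2020, 3, 1), 4.2),
    ("S2", date(2020, 4, 30), 4.8),
]

def pair_nearest(visits, labs, max_days=30):
    """Pair each subject's visit with the single closest lab draw.

    Returns {subject: (visit_date, lab_date, value)}, one entry per subject.
    Labs more than max_days from the visit are ignored, and subjects with
    no lab inside the window are left out rather than given a bad match.
    """
    best = {}
    for subject, lab_date, value in labs:
        visit_date = visits.get(subject)
        if visit_date is None:
            continue  # lab for a subject with no visit on record
        gap = abs((lab_date - visit_date).days)
        if gap > max_days:
            continue
        # Strict "<" means that on a tie the lab listed first is kept.
        if subject not in best or gap < best[subject][0]:
            best[subject] = (gap, visit_date, lab_date, value)
    return {s: (v, d, val) for s, (_, v, d, val) in best.items()}

pairs = pair_nearest(visits, labs)
print(pairs)
```

Keeping at most one match per subject is what prevents the duplicate rows that a plain merge on subject ID would produce. A different study might instead require the lab to fall before the visit, or pair every visit rather than one per subject.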

    Comparing Parts of Speech with NLTK

    In this lab, we compare parts of speech using natural language processing (NLP) via NLTK. Do presidents differ by party in their State of the Union language? Let’s find out!

    Data Preparation

    How do you get from raw data to something you can do statistical analysis on? And who’s responsible for cleaning your data? Read more about it in this post.

    Clinical Data in R

    Need to work with clinical data? Not sure how to start with the big, messy, raw data you were given? Give this article a read!

    Collecting Sex and Gender Data

    Making sure your fields and instruments are named well and properly set up in REDCap will ensure that your data collection goes smoothly. Find out more in this article!

    REDCap Data Collection Overview

    Do you use REDCap to collect data, or to track subjects? You’ll want to read about some common pitfalls and how to avoid them in this series of articles!

    REDCap Free Text Collection

    Allowing free text fields may seem attractive and simple, but this data collection strategy comes at a cost. Find out more about REDCap’s free text fields in this article.

    REDCap Field Types

    Getting field types wrong can mean data ends up difficult or impossible to use at analysis time. Find out how to ensure your REDCap field types are correct in this article.

    REDCap Data Combining

    Are you inadvertently combining two or more data points into one field? This is hard to tease out at the end of data collection, and will complicate your research. Find out what we mean by data combining in this article, one of a series of articles to improve your experience of REDCap.

    My File is Over There: File Paths for Data Scientists

    Let’s say that you have some .csv files locally on your computer, and you want to load them into R or Python. You’re working in RStudio or a Jupyter notebook, and you’re not sure how to point to the file you want to bring in. This can be painful if you are new to the concept of file paths. If you’re new to writing code, or you’ve run into problems with this, read on!
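A quick way to see why a file "isn't found" is to print where a relative path actually points. This Python sketch uses the standard-library `pathlib` module; the `data/visits.csv` location is a hypothetical example, not a file from the article.

```python
from pathlib import Path

# A relative path is interpreted against the current working directory,
# which is often (but not always) the folder your notebook or script lives in.
relative = Path("data") / "visits.csv"  # hypothetical file location
print("Working directory:", Path.cwd())
print("Relative path resolves to:", relative.resolve())

# An absolute path starts from the filesystem root and does not depend on
# the working directory.
absolute = Path.cwd() / "data" / "visits.csv"
print("Is absolute?", relative.is_absolute(), absolute.is_absolute())

# Checking before loading turns a confusing error into a clear message.
if not relative.exists():
    print(f"Not found: {relative.resolve()}")
```

If the "resolves to" line shows a folder you did not expect, either change the working directory or build the path from a known absolute starting point.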

    Python Lab for Beginners

    In this lab, we’ll walk you through what to do when you get a .csv – how to bring it into Python, do some data cleaning, gather summary statistics for reporting, and do some initial data visualizations. This is a great place to start if you’re brand new to Python!

    FIPs and the Belmont Report: Similarities

    Both the FIPs and the Belmont Report emphasize the importance of obtaining a subject’s proper consent, although each accomplishes this through different means. Want to learn more? Read this second of a series of three articles on FIPs and the Belmont Report.

    Linear Algebra, a Geometric Approach

    Linear algebra: what even is it? If you’re baffled by linear algebra notation or the role of linear algebra in statistical metrics, or you just want to improve your basic grasp of linear algebra, check out this article.

    The p Value Controversy

    Why are p values under fire? In this article, we try to explain why the p value is often considered insufficient evidence by statisticians and researchers.

    FIPs and the Belmont Report: Principles

    The Fair Information Practices (FIPs) and the Belmont Report are two key sets of principles that have been incorporated into laws both nationally and internationally since their respective publications in 1973 and 1979. These principles have helped shape the regulations that guide researchers. Want to learn more? Read this first of a series of three articles on FIPs and the Belmont Report.

    Social Justice and Data Science

    Researchers have a special duty to ensure that their research is just. Data scientists also need to keep social justice in mind when analyzing and preparing data.

    Mapping Environmental Exposures

    Ever want more detail about the environmental context of your subjects? Want to map outcomes, disparities, or healthcare realities? This hands-on code lab shows you how to map Philadelphia’s recent shooting history alongside your research data.

    What is an API?

    Why is Arcus interested in creating APIs for researcher use? What even is an API, and why would you use it?

    What is Metadata?

    The Arcus Data Catalog will use metadata to facilitate data discovery. Why is metadata so important in research?

    Arcus Clinical Cohorts

    Defining a clinical cohort can be challenging, and can represent many hours of work. How can we make this effort more productive and benefit other researchers? Find out how Arcus is working to improve clinical cohort definition in this article.

    Arcus Data Catalog

    Ever resort to Google to find CHOP researchers? Wonder what else is going on in the Research Institute, but not sure how to find out? An important tool Arcus will bring to the research process at CHOP is a Data Catalog that helps you find research that’s pertinent to you.

    Code Readability

    How can I make my code more readable? A few tips and tricks for those who want to make sure their code is understandable.

    Getting to one row

    Do you have a pile of data about patients or research subjects, and you want one row per person, but can’t seem to get there easily? Find out why and how to solve this conundrum!
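One way to get to one row per person is to decide, for each variable, how multiple rows should be summarized. The Python sketch below (standard library only) keeps each patient's most recent weight and counts their encounters. The data, the field names, the `one_row_per_patient` helper, and the "latest value" rule are illustrative assumptions; the right rule depends on your research question.

```python
from collections import defaultdict

# Hypothetical long-format data: several encounter rows per patient.
rows = [
    {"patient_id": 1, "encounter_date": "2021-01-04", "weight_kg": 20.1},
    {"patient_id": 1, "encounter_date": "2021-06-11", "weight_kg": 21.3},
    {"patient_id": 2, "encounter_date": "2021-02-19", "weight_kg": 35.0},
    {"patient_id": 2, "encounter_date": "2020-11-30", "weight_kg": 34.2},
    {"patient_id": 2, "encounter_date": "2021-08-02", "weight_kg": 36.4},
]

def one_row_per_patient(rows):
    """Collapse to one row per patient: latest weight plus an encounter count."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["patient_id"]].append(row)
    result = []
    for pid, patient_rows in sorted(grouped.items()):
        # ISO-format date strings (YYYY-MM-DD) sort chronologically as text.
        latest = max(patient_rows, key=lambda r: r["encounter_date"])
        result.append({
            "patient_id": pid,
            "n_encounters": len(patient_rows),
            "latest_date": latest["encounter_date"],
            "latest_weight_kg": latest["weight_kg"],
        })
    return result

for summary in one_row_per_patient(rows):
    print(summary)
```

Other reasonable summaries include the first value, the mean, or the value closest to a reference date; each answers a different question.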

    What is SQL?

    Do you have data that’s held in a SQL database? Find out more about this important database technology and what you really need to know to get started.

    Data Combining in R

    Do you need to combine a few different datasets into one? R excels at this. Find out about merging, column binding, and row binding here!

    ggplot overview

    Do you want to make useful, attractive data visualizations in R? ggplot is the visualization solution you’re looking for!

    Natural Language Processing with NLTK

    Natural language processing (NLP) will come in handy if you analyze things like physician notes or language samples from research subjects. It allows you to examine language in various ways. Try a brief lab in working with a language sample!

    Excel, if you must…

    The use of Excel in science is a hotly debated issue with strong feelings. If you choose to use Excel, you should understand why it’s so controversial and how to use it as safely as possible!

    Regex 101

    Regex is a way to find strings that match a pattern you’re looking for. It’s handy in data processing, as well as in writing scripts. Read more about this skill (some would say arcane art) in this post.
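As a small illustration, here is a pattern for ISO-style dates (YYYY-MM-DD) using Python's built-in `re` module. The sample note text is made up, and other tools (R, grep, SQL) use similar but not always identical pattern syntax.

```python
import re

# Hypothetical free-text note; we want dates written as YYYY-MM-DD.
notes = "Seen 2021-03-04, follow-up 2021-04-01. Phone call on 3/15 not counted."

# \b = word boundary, \d{4} = four digits, "-" = a literal hyphen, \d{2} = two digits.
date_pattern = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")

found = date_pattern.findall(notes)
print(found)  # ['2021-03-04', '2021-04-01']

# The same pattern can replace matches, e.g. to redact dates.
redacted = date_pattern.sub("[DATE]", notes)
print(redacted)  # Seen [DATE], follow-up [DATE]. Phone call on 3/15 not counted.
```

This pattern checks only the shape of the text, so a string like 2021-99-99 would also match; validating real dates needs an extra step.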

    High Performance Computing

    What is high-performance computing? What resources does CHOP have available for your computing needs, when your laptop just won’t cut it? Find out more in this article.

    Understanding Pearson’s r

    Ever feel like you don’t have an intuitive grasp of what Pearson’s r correlation coefficient is? You might know that coefficients with an absolute value close to 1 indicate a strong linear relationship, but why? And what’s the relationship between correlation and a linear model? Find out more here!
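One way to see the link between correlation and a linear model: if you convert both variables to z-scores, the least-squares slope equals r, and for a simple regression with an intercept, r squared equals R², the share of variance explained. The Python sketch below checks both facts on a small made-up dataset using only the standard library; the helper names are our own.

```python
import math

def mean(v):
    return sum(v) / len(v)

def pearson_r(x, y):
    """Pearson's r: sum of cross-deviations over the root of the sums of squares."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ols_slope(x, y):
    """Least-squares slope of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def standardize(v):
    """Convert values to z-scores using the sample standard deviation."""
    m = mean(v)
    sd = math.sqrt(sum((a - m) ** 2 for a in v) / (len(v) - 1))
    return [(a - m) / sd for a in v]

def r_squared(x, y):
    """Share of the variance in y explained by the least-squares line."""
    b = ols_slope(x, y)
    a = mean(y) - b * mean(x)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean(y)) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Made-up data with a strong positive linear trend.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x, y)
slope_z = ols_slope(standardize(x), standardize(y))
print(r, slope_z, r ** 2, r_squared(x, y))
```

Because r is the slope in standardized units, |r| close to 1 means a one-SD change in x goes with nearly a one-SD change in y, and the line leaves little variance unexplained.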

    Intro to NetworkX

    NetworkX is a Python package you can use to do graph analysis or construct network diagrams. Read more and run code in Python 3 to see how this module works!

    R Markdown 101

    In this video lab, you’ll make your first R Markdown document. This is an application of literate statistical programming.

    Recording Consent

    It’s important to track your consent in a digital way, just as you record your research data in a digital way. Check out some suggestions on how to work with consent data in this article.

    Distributed Humaning

    When should you use human effort for things like coding data, versus developing automated solutions? A few thoughts.

    Privacy Risks

    What legal and financial risks do you undertake when you work with data? Learn more about the consequences of data carelessness in this article by attorney and privacy expert Dianna Reuter!

    Interrogating the Data Until it Confesses

    A big part of reproducible research is the responsible, rigorous conduct of research with regard to statistical methods. There are a number of ways to get a publishable manuscript that do not actually give reproducible results. Learn more about p-hacking, HARKing (hypothesizing after the results are known), and other shady practices here!

    Statistical Intervals and Visualizations: Difference Between Means

    One criticism that seems to have gained traction in the quest for reproducible research is that traditional methods rely too much on point estimates instead of interval estimates. In a 2014 article, Geoff Cumming offers a number of visualizations as improvements in demonstrating interval estimates. Learn how to reproduce those graphics in ggplot2. This article shows how to make a difference-between-means graph.

    R Lab for Beginners

    In this lab, we’ll walk you through what to do when you get a .csv – how to bring it into R, do some data cleaning, gather summary statistics for reporting, and do some initial data visualizations. This is a great place to start if you’re brand new to R!

    Scripted Analysis for Reproducibility

    Lots of digital ink has been spilled over the topic of reproducibility in science. This article addresses the technology that supports reproducible analysis of data, using scripted analysis.

    Base R Plotting

    Base R plotting is great for fast data exploration. While these graphics aren’t likely to be attractive enough for publication, they’re perfect for checking out hypotheses and understanding your data more easily.

    Flat File Data Storage

    Flat file data storage includes file types like .csv, .json, and .xml. Learn how these differ from one another and their relative advantages and disadvantages.
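To make the differences concrete, the Python sketch below writes one hypothetical record in each of the three formats using standard-library modules (`csv`, `json`, `xml.etree.ElementTree`). The record and its field names are made up for illustration.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# One hypothetical record, written three ways.
record = {"subject_id": "S001", "visit": "baseline", "age_months": 42}

# CSV: a header row plus one row per record; every value is stored as text.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(record))
writer.writeheader()
writer.writerow(record)
as_csv = buf.getvalue()

# JSON: keeps the key/value structure and basic types such as numbers.
as_json = json.dumps(record)

# XML: tagged (and potentially nested) elements; values are stored as text.
root = ET.Element("record")
for key, value in record.items():
    ET.SubElement(root, key).text = str(value)
as_xml = ET.tostring(root, encoding="unicode")

print(as_csv, as_json, as_xml, sep="\n")
```

One practical difference shows up when reading the data back: the JSON version returns `age_months` as the number 42, while the CSV and XML versions return the text "42", which you then have to convert.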

    Cartesian Result Sets

    Why does my result set after a data pull have more than one row per subject? If you’ve ever wondered why a simple data query turns so complicated, with multiple rows per person, this article is for you!
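A common cause is joining one subject table to two or more tables that each have several rows per subject. The sketch below reproduces this with Python's built-in `sqlite3` module and an in-memory database; the tables and values are hypothetical. Subject S1 has 2 diagnoses and 3 medications, so the join returns every combination: 2 × 3 = 6 rows.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE subjects (subject_id TEXT PRIMARY KEY);
CREATE TABLE diagnoses (subject_id TEXT, dx TEXT);
CREATE TABLE medications (subject_id TEXT, med TEXT);
INSERT INTO subjects VALUES ('S1'), ('S2');
INSERT INTO diagnoses VALUES ('S1', 'asthma'), ('S1', 'eczema'), ('S2', 'asthma');
INSERT INTO medications VALUES
    ('S1', 'albuterol'), ('S1', 'fluticasone'), ('S1', 'cetirizine'),
    ('S2', 'albuterol');
""")

# Joining two one-to-many tables on the same key pairs every diagnosis
# with every medication for that subject.
rows = con.execute("""
SELECT s.subject_id, d.dx, m.med
FROM subjects s
JOIN diagnoses d ON d.subject_id = s.subject_id
JOIN medications m ON m.subject_id = s.subject_id
""").fetchall()

counts = {}
for subject_id, _, _ in rows:
    counts[subject_id] = counts.get(subject_id, 0) + 1
print(sorted(counts.items()))  # [('S1', 6), ('S2', 1)]

# One way back to one row per subject: summarize each child table first.
per_subject = con.execute("""
SELECT s.subject_id,
       (SELECT COUNT(*) FROM diagnoses d WHERE d.subject_id = s.subject_id) AS n_dx,
       (SELECT COUNT(*) FROM medications m WHERE m.subject_id = s.subject_id) AS n_med
FROM subjects s
ORDER BY s.subject_id
""").fetchall()
print(per_subject)  # [('S1', 2, 3), ('S2', 1, 1)]
```

Counting is just one possible summary; which value to keep from each child table is a research decision, not a technical one.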

    Literate Statistical Programming

    What is literate statistical programming? How is it different from just commenting some code well, or documenting all the changes you make to a dataset in a separate file? Read more here about how literate statistical programming can streamline your scientific production.

    Sparklines in ggplot2

    Sparklines are a great way to show trends over time. In this article we gradually build up a ggplot2 sparkline visualization.

    Welcome to the Tidyverse!

    What is tidy data? This post explores the set of R tools called ‘the Tidyverse’ as well as explaining a bit about tidy data generally.

    Jupyter 101

    If you’re new to Python, working in an interactive environment like a Jupyter notebook might help you hone your code more easily.

    Git 102

    Learn how to use CHOP GitHub by creating your first repository.

    Git 101

    An explanation of what version control is, and a first look at git and GitHub.

    Writing Functions in R

    If you’re not an experienced programmer, understanding why writing functions is important may seem very abstract. This article gives you some code examples to explain why writing functions makes your code stronger and easier to use.

    When R Gets Too Helpful

    Sometimes R oversteps its bounds by assuming you want something you really don’t. Learn about a few cases where this might happen, and how to avoid it!