How to Get Started Using R

Feb 27, 2023

Open up RStudio and let’s run some code!

R is an interpreted language, meaning R takes whatever code you type and translates it into instructions the computer can run. To run a piece of code, type it into the Console pane and press Enter. In the Console pane, `>` means R is ready for a new command and `+` means R is waiting for more information. If you get stuck with `+`, press the ESC key to get back to `>`.
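
To see the `+` prompt in action, here is the `1 + 2` calculation from the next section split across two lines, as if you had pressed Enter right after the plus sign. This is a small sketch of how the Console behaves, with the prompts described in comments.

```r
# Typing "1 +" and pressing Enter leaves the expression unfinished,
# so the Console changes its prompt from > to + and waits for more input.
1 +
  2
# Once the second number arrives, R completes the calculation and prints [1] 3
```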

Calculations

R can do calculations. Type the following into the Console pane and press Enter.

`1 + 2`

What did you get back?

You should see `[1] 3`. The `[1]` is an index marking the first value of the output, and `3` is the answer to `1 + 2`.

Let’s try a few more calculations. Type the following into the Console pane and press Enter after each line.

`2 * 3`
`10 / 2`

Notice again that R prints the index `[1]` along with each answer.
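
Here are the two calculations above, plus subtraction and exponentiation (`-` and `^`, which are also standard R arithmetic operators). Each comment shows what the Console prints for that line.

```r
2 * 3    # [1] 6
10 / 2   # [1] 5
7 - 4    # [1] 3
2 ^ 3    # [1] 8  (^ raises 2 to the power of 3)
```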

Functions

R also uses something called a function, and we will be using many functions as we create plots and process data in R. Functions accept inputs (called arguments) and create output. Functions are useful for performing tasks in R. You can write your own functions, but in this book we will focus on using existing functions.
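
As a small illustration of input and output, here are two functions that come with base R: `sqrt()` takes a number and returns its square root, and `abs()` takes a number and returns its absolute value.

```r
sqrt(16)   # input 16, output [1] 4
abs(-7)    # input -7, output [1] 7
```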

One function in R is `c()`. This function combines values into a vector, which you can think of as a list. Let’s see how it works. Type the following code into the Console pane and press Enter.

`c(1, 2, 3, 4, 5)`

What does R return? You should see this in your Console:

`[1] 1 2 3 4 5`

As before, `[1]` shows the index of the first value on that line. In this case, there are actually five indices because there are five values, but we won’t worry too much about that at the moment. For now, just know that `[1]` is an index, not one of the values. After it, R prints the five numbers we put inside the `c()` function.
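
Because the numbers in square brackets are positions, you can also use square brackets yourself to pull out a single value by its position. This sketch asks for the third value of a five-value vector; the output starts with `[1]` because the result is a new vector holding one value.

```r
# Square brackets after a vector select values by position
c(10, 20, 30, 40, 50)[3]   # prints [1] 30
```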

Objects

What if we wanted to save the output of this function (or the output of our earlier calculations) somewhere?

This is where objects come in. Objects are a way to save data or output in the current session or environment. When you create an object, it will appear in the Environment pane in RStudio. Objects and code typed into the Console pane only exist in the current session, which means if you close RStudio, they will be deleted. Later in this chapter, I’ll discuss R scripts, which are a way to save your code so you can run it again later. For now, let’s talk more about objects.

Objects must have a name, and there are several conventions for creating object names. It’s important to know that names in R are case sensitive, meaning `myData` is distinct from `mydata` and `MyData`. The naming convention that you use can be based on your personal preference, but it’s helpful to stick to one.

Here are some different naming conventions:

• All lowercase - words are put together in all lowercase. For example: `mydata` and `example2`

• Period separated - words are separated by a period. For example: `my.data`

• Underscore separated - words are separated by an underscore. For example: `my_data`

• Lower Camel Case - the first word is all lowercase and each following word starts with an uppercase letter. For example: `myData`

• Upper Camel Case - each word starts with an uppercase letter. For example: `MyData`

I use underscore separated, which you will see throughout this book. But it’s up to you which naming convention you’d like to use. Again, I recommend picking one and sticking with it.
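
Here is what each convention looks like in practice, using the assignment operator (`<-`) covered below. The object names and values are made up for illustration. Because R is case sensitive, `mydata`, `myData`, and `MyData` are three separate objects.

```r
mydata  <- 1   # all lowercase
my.data <- 2   # period separated
my_data <- 3   # underscore separated
myData  <- 4   # lower camel case
MyData  <- 5   # upper camel case

# Names that differ only in capitalization are different objects
mydata == myData   # prints [1] FALSE
```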

Certain object names are not allowed. Some words, such as `TRUE`, `function`, and `NA`, already have meaning in R (they are reserved words), so they cannot be used as names. Names also cannot start with a number or an underscore, and they cannot contain spaces.
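
If you are unsure whether a name is allowed, base R’s `make.names()` can help: it converts text into a valid name, so a name is valid when `make.names()` returns it unchanged. The helper `is_valid_name()` below is a hypothetical convenience written for this example, not a built-in R function.

```r
# Hypothetical helper: a name is valid if make.names() leaves it unchanged
is_valid_name <- function(name) {
  make.names(name) == name
}

is_valid_name("my_data")    # TRUE
is_valid_name("function")   # FALSE: reserved word
is_valid_name("TRUE")       # FALSE: reserved word
is_valid_name("2data")      # FALSE: starts with a number
is_valid_name("my data")    # FALSE: contains a space
```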

To create an object, you use the assignment operator, `<-`. The name of the object goes on the left side of the assignment operator, and the code or information to be stored in the object goes on the right side.

In RStudio, you can use a keyboard shortcut to generate `<-` rather than typing the two symbols: press Alt + - (the minus key) on Windows or Option + - on a Mac.

The following code assigns the number `1` to the object `a_number`.

`a_number <- 1`

Unlike the calculations earlier, running this code in the Console does not print any output. However, it creates an object called `a_number`, which you can now see in the Environment pane.
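
Once `a_number` exists, you can use it anywhere you would use the value it holds. Assigning to the same name again replaces the old value, as this short sketch shows.

```r
a_number <- 1
a_number        # prints [1] 1
a_number + 2    # prints [1] 3

# Reusing the name overwrites the previous value
a_number <- 10
a_number        # prints [1] 10
```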

We can assign a vector of numbers to an object like this:

`some_numbers <- c(1, 2, 3, 4, 5)`

Again, if you type this into the Console and press Enter, you won’t see any output in that pane, but a new object called `some_numbers` will appear in the Environment pane.

To see the values in `some_numbers`, type `some_numbers` in the Console and press Enter.

`some_numbers`
`[1] 1 2 3 4 5`

Once you have an object, you can perform operations on it and pass it to functions. For example, we can multiply all of the numbers in `some_numbers` by 2.

`some_numbers * 2`
`[1]  2  4  6  8 10`

We can also find the mean of the five numbers stored in `some_numbers`.

`mean(some_numbers)`
`[1] 3`

Notice that `some_numbers * 2` multiplied each individual number by 2, while `mean(some_numbers)` took the average of all five numbers and returned a single number. `mean()` is a function: we provide it with an input (in this case `some_numbers`), it completes an operation (in this case computing the average), and it returns the output.
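
The same distinction holds for other functions. In this sketch, `some_numbers + 1` works on each value separately, while `sum()`, `max()`, and `length()` (all base R functions) summarize the whole vector as a single value.

```r
some_numbers <- c(1, 2, 3, 4, 5)

some_numbers + 1        # element by element: [1] 2 3 4 5 6
sum(some_numbers)       # one value: [1] 15
max(some_numbers)       # one value: [1] 5
length(some_numbers)    # number of values: [1] 5
```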

We can also store the output of the `mean()` function in a new object called `avg_number` and then view the contents of `avg_number`.

`avg_number <- mean(some_numbers)`
`avg_number`
`[1] 3`

As we saw earlier, functions take inputs called arguments. These arguments have names as well, although we don’t always have to use them. In the `mean()` function, the first argument is named `x`, and it’s the object whose average the function computes. We could have written `mean(x = some_numbers)` earlier. If we don’t provide the name of an argument, R matches inputs to arguments by the order in which we supply them. In the case of the `mean()` function, R assumes we mean `x = some_numbers` when we just put `some_numbers` inside the parentheses.
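
Here are the named and unnamed forms side by side. The second pair uses `round()`, whose first argument is `x` (the number to round) and whose second is `digits` (how many decimal places to keep); matching by position and matching by name give the same result.

```r
some_numbers <- c(1, 2, 3, 4, 5)

# Named and unnamed: both compute the mean of some_numbers
mean(x = some_numbers)   # [1] 3
mean(some_numbers)       # [1] 3

# round(x, digits): the first input is matched to x and the second to digits
round(3.14159, 2)                 # [1] 3.14
round(x = 3.14159, digits = 2)    # [1] 3.14
```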

So far we’ve been working with single numbers and vectors of numbers. When plotting data, we will be working with a whole table of information of different types: numbers, words, dates, etc. The format we will use for storing data is a type of R object called a data frame. A data frame stores data in rows and columns, similar to the way data appears in a table or spreadsheet. For now, we’re going to move on to a few more key topics for general R use. Then we’ll come back to working with data in R.
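
As a brief preview, here is a tiny data frame built with base R’s `data.frame()` function. The column names and values are made up purely for illustration; each column is a vector, and each row describes one pet.

```r
# Hypothetical example data: three pets described by name, species, and age
pets <- data.frame(
  name    = c("Rex", "Tom", "Polly"),
  species = c("dog", "cat", "bird"),
  age     = c(3, 5, 1)
)

pets             # prints the table of rows and columns
nrow(pets)       # number of rows: [1] 3
ncol(pets)       # number of columns: [1] 3
mean(pets$age)   # $ picks out one column; average age: [1] 3
```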

Packages

Everything you have done so far uses what’s called base R, which is what was installed when you installed R from the Comprehensive R Archive Network (CRAN). Because R is open source, people can add onto it by creating packages, which add new functionality to R.

Packages allow us to create visualizations, do more advanced analyses, make processing data easier, and provide complex graphing capabilities.

To use packages, you have to install them - just like how you had to install R (or any other software) on your computer in order to use it. This installation is only required one time - again like installing R or other software.

Once a package is installed, it has to be loaded in each new R session where you want to use it. Just as you have to open RStudio to use it, you have to load (i.e., open) a package to use it.

To install a package, we use the function `install.packages()` and to load a package, we use the function `library()`. Inside the parentheses of these functions, you put the package name.
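
The two steps might look like the sketch below. The package `lattice` is only a stand-in chosen for this example (it is a plotting package that usually comes bundled with R, so the install step is normally skipped); replace it with whichever package you need. The `requireNamespace()` check simply avoids reinstalling a package that is already there.

```r
# Step 1 (once per computer): install the package if it is not already installed
if (!requireNamespace("lattice", quietly = TRUE)) {
  install.packages("lattice")   # the package name goes in quotes here
}

# Step 2 (every session): load the package so its functions can be used
library(lattice)
```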

R Scripts

Most often, you will be loading packages using the `library()` function at the top of an R script in the Source pane.

Here are three ways to create a new R script in RStudio:

1. Click on File > New File > R Script

2. Click on the icon of a white box with a green plus sign then click R Script

3. Press CTRL/COMMAND + SHIFT + N

R scripts are where I code most often, so that I can save my work. Earlier, in the Console, you ran code by typing or copying and pasting it next to the `>` symbol and pressing Enter. In an R script, you run code by placing your cursor on a line (or highlighting a section of code) and pressing CTRL/COMMAND + ENTER, or by clicking the Run button at the top of the Source pane.
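
A short script might look like the sketch below: packages loaded at the top, followed by code like the examples in this chapter. The file name in the first comment is hypothetical, and `lattice` stands in for whatever packages your script needs. Running it line by line with CTRL/COMMAND + ENTER prints results as in the Console, but when a whole file is run with `source()`, results are not printed automatically, which is why the last line uses `print()`.

```r
# my_first_script.R  (hypothetical file name)

# Load packages at the top of the script (lattice is only an example)
library(lattice)

# Create objects
some_numbers <- c(1, 2, 3, 4, 5)
avg_number <- mean(some_numbers)

# print() shows the result even when the whole file is run with source()
print(avg_number)
```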

Getting Help

When R gets something it does not expect, it returns an error. For example:

`function <- "test"`
```
Error: <text>:1:10: unexpected assignment
1: function <-
            ^
```

This returns an error because `function` is a reserved word in R, which means we cannot use it as an object name. If we change the name of the object to `word`, the code works just fine and creates a new object called `word` with the value `"test"`.

`word <- "test"`
`word`
`[1] "test"`

Error messages can be difficult to understand, especially when you’re first getting started in R. When you receive an error message, first see if you can understand it. In the example above, “unexpected assignment” tells us that there’s something R doesn’t understand about our desire to assign `"test"` to an object called `function`, so we can try changing the name of the object to something else.

Using a search engine to find solutions can also help when you’ve run into an error. It usually helps to add “R” or “RStats” to your search words, and if you received an error with a specific function or package, add the name of the function or package too. Then you can copy and paste part of the error message.

In our example, the `function <-` part of the error message is unique to our code, so including it in the search text won’t help. But the first part of the message, `unexpected assignment`, is not unique to our code, so we can put that into a search engine with “R” in front. The first search result does not solve our exact problem, but it does hint at the issue by telling us that this error means there is a syntax error or a typo in our code, so we can go back and try the code with a different word. You might also notice that in the code `function <- "test"`, the word `function` is blue and, in the book text, bold. This is another hint that something about this word is different from a normal object name, which does not appear in blue, bold font.

All this to say, errors in R are expected: code will not run perfectly every single time. The key is to double-check the code you ran. Are you using a reserved name? Is there a missing parenthesis or quotation mark? Then, explore the error message and see if you can interpret its meaning (this does get easier as you code more in R). Next, try searching online using part of the error message to see if someone else has had a similar problem and solved it.

Another thing to remember if your code doesn’t seem to be running as expected is to double check the Console - remember `+` means R is waiting for more information, and if you get stuck with `+`, press the ESC key to get back to `>`.

Help Pages

Finally, R functions have help pages, which you can access by typing a `?` and then the function name in R. This will open up the Help pane in the bottom right corner of the RStudio software. Try typing `?mean` in the Console and then pressing enter.

A help page includes the name of the function and its package. In this case, at the top of the page you’ll see `mean {base}` because the function is named `mean` and it’s part of base R. Then you’ll see a short explanation of what the function does, in this case “Arithmetic Mean”. Under that, there’s a Description of the function and a Usage section showing how to call it. I usually focus on the Arguments section, which appears next, and the Examples section, which appears at the bottom of the page.

The Arguments section of a help page lists each input or argument for that function along with a definition explaining the possible values for that argument.

The Examples section provides example code of how to use the function, which can be very helpful to copy over into the Console pane to get a better idea of how the function works.

The Value section in the middle of the page explains what the function returns - in this case “the arithmetic mean of the values in x is computed…”.

Some help pages are better than others, but in general, they are a useful source for information when you’re trying to figure out what arguments you can use in a function and how a function operates.
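
Two base R functions complement the `?` help pages: `args()` prints just the arguments a function accepts, and `example()` runs the code from a help page’s Examples section for you, so you don’t have to copy it into the Console yourself.

```r
# Show only the arguments of mean() (x and ...)
args(mean)

# Run the code from the Examples section of the mean() help page
example(mean)
```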

This is an excerpt from my upcoming book, Data Visualization in R.