# A quick introduction to R

This page is part of a series under tag r-at-school. The series is about improving how we teach Science in schools. Find out more: Using R at school

This note provides an introduction to the R environment. In the context of the Using R at school initiative, this is where teachers can learn the basics of R so they can go ahead and try the workshops I have already published and will publish in the coming weeks.

You can, correctly, argue that this note should have been the first one to go out. That would be a fair argument! However, the introduction to R comes now because I first wanted to show, in the two workshops I posted before this note, that R workshops with students are not difficult. I also wanted to provide the big picture of what it means, in practical terms, to integrate programming in every subject and every aspect of learning.

A key part of my vision for changing the way we teach science and coding in schools is that teachers are required to learn a bit of R before running the recipes. Students, on the other hand, will learn it on the fly, during the workshops and assignments that are the main topics of my posts.

So in this note, my aim is mainly to address teachers who are willing to integrate some of the R recipes I publish into their programs. Don’t worry, here we will cover just some basics of R, as not much is actually required to get the ABC of it.

## Installing R and RStudio

There are so many great articles out there about getting set up with R and RStudio that I am not going to add redundancy to the Internet and publish yet another post about how to install what you need. I will rather point you to some of the best resources I could find.

Before installing the components you need, it is a good idea to read the official (and short) page About R. After doing so, please jump to the following content and make sure you install R. RStudio is optional: it is a separate program that runs on top of R, so it needs R to be installed first and does not install R for you.

### Installing R

R is nothing more than a big collection of libraries and a basic user interface consisting of a simple command line [1] tool. You can read more about the R Environment on the official CRAN manual page. After that, you should install R. The official installation guide, R Installation and Administration, is very detailed.

Use one of the links above to jump straight to the installation steps (depending on your system).

### Installing RStudio

Most users do not feel comfortable with the command line experience that the R installation provides and prefer to use a GUI [2] instead. If you also feel that way, you can install RStudio.

RStudio comes in different versions. The desktop release is open source and free of charge; you should install that version.

Please note that RStudio is not essential to work with R and will not be required in the upcoming recipes I will publish.

## Basic R concepts

Whether you installed plain vanilla R or RStudio, you will interact with R via the command line (RStudio adds more graphic elements on top, so the experience will be more comfortable for GUI users). Using a command line is fast, which is why many developers prefer it for interacting with systems. When you open the R console or RStudio, the first thing you see is a window with a blinking cursor ready to accept your commands.

When you interact with R using the command line or RStudio, everything you do is part of the same session until you exit R.

Let’s now get started and cover the most basic concepts to get familiar with this environment and learn about its features.

### Working Directory

The first thing we should learn about is the concept of the Working Directory or, in short, WD. R is always started from a certain location on your computer, and that directory becomes the WD of your current R session. Knowing the WD is important when you load files from, or save files to, your system, because relative paths are resolved against your WD. Try the following:

1. Start R. On Windows and Mac, you can simply click on the program icon in the menu; on Unix, you can use the shell.
2. In the command line, type the following command and press Enter:

       getwd()

The output will give your current WD. You can set a different WD in R by using the setwd command:

    setwd("<path-to-new-wd>")

After executing the command, if you run getwd again, it will show the updated WD. The path you specify can be absolute, or relative to the current WD.
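
To see how relative paths follow the WD, here is a minimal sketch. It assumes nothing beyond base R; the file name wd-demo.txt is made up for the illustration, and tempdir() is used so that nothing is written into your own folders.

```r
# Remember where we started so we can come back at the end.
original_wd <- getwd()

# tempdir() is a scratch folder that R creates for every session.
setwd(tempdir())

# A relative path is resolved against the current WD,
# so this file is created inside the scratch folder.
writeLines("hello", "wd-demo.txt")
file_found <- file.exists(file.path(getwd(), "wd-demo.txt"))
print(file_found)

# Clean up and restore the original WD.
file.remove("wd-demo.txt")
setwd(original_wd)
```

Saving the original WD in a variable before changing it is a handy habit: it lets you return to where you were without retyping the path.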

The WD is a concept that you will understand better as you start using it. We are going to demonstrate more features, and some of them will help you understand better what the WD is used for.

### Variables

A very important concept in R, and in all programming languages out there, is that of the variable. A variable is a place, a box, where we can store something and retrieve it later. We are going to work with a lot of variables in our R projects. The syntax to create a variable is very simple; in your current R session, create a variable called neper and give it a value:

    neper <- 2.71

By using the assignment operator <-, we assign whatever expression is on the RHS [3] (in this case a simple number) to the variable on the LHS [4]. If we want to get the value of neper, we can just type the name of the variable in the R console and hit Enter, and R will print its value.

We can store many things inside a variable, not just numbers:

    my_number <- 10
    my_decimal_number <- 3.14
    my_string <- "This is a text"
    my_array <- c(1, 2, 3, 4)

We can store all kinds of numbers, strings [5] and also arrays [6] (R calls a simple sequence like the one created with c a vector).

What are variables useful for? They come in handy when we need to (re)use them. The idea is that we create a variable and place a value inside it that we can use again later.

    name <- "Mr. Anderson"
    print(paste0("Hello ", name))

    name <- "Neo"
    print(paste0("Hello ", name))

If you run the code above (type each line in order and press Enter every time), you will get "Hello Mr. Anderson" first and then "Hello Neo" second. A quick note: function paste0 concatenates [7] the strings it receives; unlike in some other languages, the + operator in R works only on numbers and cannot be used to join strings. Once created, you can access the value of a variable by simply using the variable itself.
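
Variables can also feed other variables. The sketch below reuses stored values in a small calculation; the names radius, pi_approx, circumference and area are just illustrative. It also shows a detail that often surprises beginners: a result is computed at the moment of assignment and is not updated when the inputs change later.

```r
radius <- 3
pi_approx <- 3.14

# Both results reuse the same two variables.
circumference <- 2 * pi_approx * radius
area <- pi_approx * radius^2
print(area)

# Changing radius now does NOT change area:
# area keeps the value it was given above.
radius <- 10
print(area)
```

To get the area for the new radius, you would have to run the line that computes area again.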

### Understanding the workspace

All the variables you create are kept in your current workspace. If you want to see which variables you have created so far and are currently available, you can query your workspace to get the list:

    objects()

If you run the command above in your R console, it will give you an overview of your workspace. If you are using RStudio, the workspace is also displayed in the top right corner of the RStudio window (in the default layout), inside a tab called Environment.
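
Besides listing the workspace, you can also check for a single variable and remove the ones you no longer need. A minimal sketch using base R only (the variable names are made up for the example):

```r
answer <- 42
greeting <- "hello"

# ls() is another name for objects(): both return the variable names.
had_answer <- "answer" %in% ls()
print(had_answer)

# rm() deletes a variable from the workspace.
rm(answer)

# exists() tells us whether a variable with that name is still around.
still_there <- exists("answer", inherits = FALSE)
print(still_there)
```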

#### Saving and restoring the workspace

R will always try to keep you in a safe spot. When you end your R session (by exiting the R console or closing RStudio), you will be prompted to save your workspace image. If you do so, next time you open the R command line or RStudio in the same working directory, R will automatically restore your previous workspace! This means that R will load all the variables and the command history you had when you left last time.
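
You do not have to wait for the exit prompt: the same save-and-restore idea can be tried by hand. The sketch below uses the base functions save and load on a file in the scratch folder; the file name demo-workspace.RData and the variable favourite_number are made up for the illustration.

```r
# Keep the demo file in R's scratch folder, away from your own files.
workspace_file <- file.path(tempdir(), "demo-workspace.RData")

favourite_number <- 7

# save() writes the named variables to a file.
# (save.image() would write the whole workspace, which is what
# happens when you answer "yes" to the prompt on exit.)
save(favourite_number, file = workspace_file)

# Simulate a fresh session by deleting the variable...
rm(favourite_number)

# ...and bring it back from the file.
load(workspace_file)
print(favourite_number)
```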

## Basic R language features

Now that we are familiar with the basics of the R environment, we can move on to introduce some very basic features of the language.

The simplest thing in a programming language is commenting. A comment is a piece of text which R will simply ignore. Comments are useful for adding annotations to the code.

    # Comments start with a '#' and run to the end of the line
    v <- "I am creating a variable" # And next to it I can also write a comment

Also, please remember one important rule:

Don’t explain what the code is doing, explain why!

The essence of the prescription above is the following: if you write good code, you will not need comments to explain what the code is doing (the code itself tells you that). You should need comments only to explain why you chose a certain approach over another or to justify your strategies.
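
Here is a small illustration of that rule. The data and the variable names are invented for the example; the point is only the difference between the two comments.

```r
scores <- c(12, 15, 11, 98, 14)

# Unhelpful comment, it only repeats the code:
# compute the median of scores
typical_score <- median(scores)

# Helpful comment, it explains the choice:
# median rather than mean, because the single very high score (98)
# would pull the mean far above what most students achieved
typical_score <- median(scores)

print(typical_score)
```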

### Variables and types

We know what variables are and how to create them. Let’s spend a little more time on variables, especially in relation to types! R is a dynamically (loosely) typed language, and this characteristic comes with certain consequences. The first and most important is that R variables do not have a fixed type! To understand this, consider the following:

    v1 <- 3.14
    v1 <- 1

    v2 <- 1024
    v2 <- "Hello World"

In the example above, we create two variables and reassign their values. v1 is first assigned a number and then gets another number. But v2, while initially given a number, is then assigned a string. What is the lesson here? A variable in R can hold any value; it is not bound to a type. This is not true in all languages: some of them are statically typed, which means that, when creating a variable, the programmer also has to specify its type, and only values of that type can be assigned to that variable. This is not the case in R.
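
You can watch the type change by asking R what a variable currently holds. A minimal sketch using the base function class (the names type_before and type_after are only there to keep the two answers):

```r
v2 <- 1024
type_before <- class(v2)
print(type_before)   # "numeric"

# Same variable, new value of a different type.
v2 <- "Hello World"
type_after <- class(v2)
print(type_after)    # "character"
```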

### Functions

In programming, a very important principle is reuse. Programmers are obsessed with duplication and with all possible ways to avoid it. A function is a way to remove duplication in code and enhance reuse. A function is a piece of code that you can write once and then use later as many times as you want.

The concept is similar to that of a variable, with one difference: a variable stores a value, a function stores code! You have already used a function today, right here in this quick tutorial a few minutes ago: print is a function!

A function can be thought of as a black box.

A function accepts an input (the arguments to the function), and returns an output. Consider this example:

    my_fun <- function(name) { paste0("Hello ", name) }

This function takes a string as input, in an argument variable called name, and returns a string as output. Inside it, function paste0 joins "Hello " and the name into a single string (in R, the + operator works only on numbers).

The code above demonstrates how to define a function, but how do we use (or, in programming jargon, call or invoke) it?

    my_fun("Morpheus")

Writing parentheses () after the name of a function invokes it. Also note how we pass the value "Morpheus" to the parameter name: we place it inside the parentheses.
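
Functions can take more than one argument, and an argument can have a default value that is used when the caller does not provide one. The sketch below is a hypothetical variation on the greeting function; greet and its greeting parameter are made up for the illustration, and paste0 is the base R function for joining strings.

```r
# 'greeting' has a default value, so it is optional when calling.
greet <- function(name, greeting = "Hello") {
  # The value of the last expression is what the function returns.
  paste0(greeting, " ", name)
}

# Only the required argument: the default greeting is used.
default_result <- greet("Morpheus")
print(default_result)

# Both arguments: the second one is passed by name.
custom_result <- greet("Trinity", greeting = "Welcome")
print(custom_result)
```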

### Packages

Have I already mentioned that developers are obsessed with reusing code and avoiding duplication? Probably yes. Packages are another mechanism that allows us to reuse stuff that we write. Variables allow us to reuse values. Functions allow us to reuse code. Packages allow us to reuse a whole collection of code files.

We will not go into the details of how to create packages, but we will talk about how to use them. The community around R is very big, and its members have contributed a vast spectrum of packages, each providing functionality for specific classes of problems. For example, package stringr provides functions to manipulate strings. Package reshape2 provides tools for reshaping data. And there are many more packages we will discover in the next notes I will publish.

To use a package, the flow is, typically, the following:

1. If the package is not part of the base R installation, it must be downloaded and installed.
2. When we want to use a package, we must load it into the current R session to make it accessible.
3. We can start using the package.

Let’s work through an example and try to use package stringr in our R session.

#### Installing a package

We are assuming you have just installed R, which implies that package stringr should not be available because it does not come with the base installation [8]. So we need to download and install it. However, let’s just confirm that. To get the list of installed packages, just use:

    installed.packages()

The above command will return a table with the names and versions of the packages you have installed, along with other details. It can be difficult and a bit time-consuming to go through that table, so a more compact list can be generated by using:

    rownames(installed.packages())

Function rownames takes the table as input and returns only its row names, which are the package names, so that we get a more concise list. The list might still be difficult to read, so to check whether a specific package is installed, just use this:

    "<package-name>" %in% rownames(installed.packages())

In our case, we would do:

    "stringr" %in% rownames(installed.packages())

The command above checks whether a specific value can be found inside the list. Be careful because the comparison is case-sensitive, which means that you must type the package name on the left respecting uppercase and lowercase characters. On a fresh installation, the command should return FALSE, which means that the package is not installed. The next step is to install it. To do that, just use the following command:

    install.packages("stringr")

The command automatically fetches the package from the Internet (this of course requires that you are connected) and copies it into your R library folder.
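
The check and the installation can be combined so that a package is downloaded only when it is actually missing. The helper below, ensure_package, is a hypothetical name of my own and not part of R; it is just a sketch built from the two commands shown above. The demo call uses stats, a package that ships with every R installation, so nothing is downloaded when you try it.

```r
# Hypothetical helper: install a package only if it is missing.
# Returns TRUE if the package was already installed, FALSE otherwise.
ensure_package <- function(pkg) {
  already_installed <- pkg %in% rownames(installed.packages())
  if (!already_installed) {
    install.packages(pkg)  # needs an Internet connection
  }
  already_installed
}

# "stats" comes with R itself, so this call installs nothing.
was_installed <- ensure_package("stats")
print(was_installed)
```

To follow along with this note, you would call ensure_package("stringr") instead.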

#### Using a package

Be aware that installing a package does not mean you can use it right away. After you install a package, it becomes locally available. To use it in code, you need to load it into your session.

    library(stringr)

The command above makes the package named in parentheses directly available in your current session, so that you can use everything that package exposes directly in your code. To see this in action, let’s first have a look at the package documentation:

    ?stringr

When you type ?<something>, R will try to fetch the documentation for whatever you specified after the question mark. In this case, we get redirected to the stringr package documentation. Inside there, you can find the functions the package provides; one of those is function str_split, and in its documentation page we see that it splits an input string into pieces around a separator. Let’s use it.

    str_split("Follow the white rabbit", "white")
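
The result of the call is a list with one entry per input string, and that entry holds the pieces. The sketch below shows how to get at them. So that it runs even if stringr is not installed, it uses base R's strsplit as a stand-in (my substitution, not part of stringr); for this input it produces the same pieces.

```r
# Base R's strsplit stands in for stringr::str_split here.
# fixed = TRUE treats the separator as plain text rather than a pattern.
pieces <- strsplit("Follow the white rabbit", "white", fixed = TRUE)

# The result is a list with one element per input string;
# [[1]] extracts the character vector of pieces for our single input.
first_input <- pieces[[1]]
print(first_input)

# trimws() removes the spaces left over around the separator.
clean_pieces <- trimws(first_input)
print(clean_pieces)
```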

Since we have loaded package stringr, we can now use all its functions inside our R session. If we hadn’t run library(stringr) first, we couldn’t have called str_split directly. Instead, we would have had to use the package name as a prefix for the function, like this:

    stringr::str_split("Follow the white rabbit", "white")

The syntax above is used when we do not want to load a whole package into our R session. There are many reasons for doing that (for example, when two packages we need both contain a function with the same name).
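
The name clash is easy to reproduce without installing anything. In the sketch below we define our own function called median, which hides the one from the stats package (part of every R installation); the prefix stats:: still reaches the original. The replacement function is deliberately silly and exists only for the demonstration.

```r
# Our own function that happens to share its name with stats::median.
median <- function(x) "not the real median"

# The plain name now finds our version first.
ours <- median(c(1, 5, 9))
print(ours)

# The package prefix says exactly which one we mean.
theirs <- stats::median(c(1, 5, 9))
print(theirs)

# Remove our version so the plain name points to stats::median again.
rm(median)
```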

## That’s it

Believe it or not, this introduction is over. The topics we covered are the most basic things you need to know about R to start playing with it!

1. A command line is the most basic type of user interface that applications use to interact with users. It simply consists of a window where the user can type and submit commands, and a message box where the program displays results.

2. Graphical User Interface.

3. Right Hand Side.

4. Left Hand Side.

5. A string is a sequence of characters.

6. An array is an ordered sequence of elements.

7. Concatenating two strings is a typical operation of generating a third string which is the result of attaching the second to the first. For example, concatenating "Hello " (notice the space at the end) and "world" will result in "Hello world".

8. When installing R, only some base packages are downloaded. This is called the base distribution of R. The packages inside the base distribution are necessary to carry out the most basic tasks; for more advanced operations, one can download and install other packages from the Internet.