Wednesday, October 24, 2018

R - Working with Vectors


Complex datasets are usually broken down into components that are vectors. For example, in a data frame such as the CO2 emissions data seen in the earlier post, each column is a vector.


Creating Numeric and Character Vectors
There are different ways to create a vector and one of them is the concatenate function c():
> country <- c("Italy", "canada", "egypt")
> codes <- c(100, 200, 300)
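As a quick check on the two vectors created above, the following sketch inspects their type and length. The calls to class() and length() are additions for illustration and are not part of the original post.

```r
# Build one character vector and one numeric vector with c()
country <- c("Italy", "canada", "egypt")
codes <- c(100, 200, 300)

# class() reports the kind of data a vector holds
class(country)   # "character"
class(codes)     # "numeric"

# length() reports how many elements it has
length(codes)    # 3
```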
  

Name the elements of a Vector
We can assign names to the elements of codes. Here the number 100 is associated with the name "Italy", 200 with "canada", and so on, and the object codes continues to be a numeric vector. Names can be supplied when the vector is created, or assigned afterwards with the names() function:
> names(codes) <- country
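The two ways of naming a vector can be compared side by side. This is a minimal sketch; the variable name codes_inline is made up for the comparison and does not appear in the original post.

```r
# Option 1: give the names while creating the vector
codes_inline <- c(Italy = 100, canada = 200, egypt = 300)

# Option 2: create the vector first, then attach names with names()
country <- c("Italy", "canada", "egypt")
codes <- c(100, 200, 300)
names(codes) <- country

# Both routes give the same named numeric vector
identical(codes, codes_inline)   # TRUE
class(codes)                     # "numeric"
names(codes)                     # "Italy" "canada" "egypt"
```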



Generate Numeric Sequences
Another important function for generating sequences is seq(). Its first argument is the start of the sequence and its second argument is the end:
> seq(1, 10)
The shorthand notation for this is:
> 1:10
A third argument tells seq() how much to jump by. In the example below it is 2, so the result is all the odd numbers from 1 to 10:
> seq(1, 10, 2)
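The sketch below spells out the step argument by name (by) and also shows length.out, another standard seq() argument that the post does not cover. The variable names are chosen only for illustration.

```r
# Step of 2 starting at 1: the odd numbers up to 10
odds <- seq(1, 10, by = 2)                 # 1 3 5 7 9

# Step of 2 starting at 2: the even numbers up to 10
evens <- seq(2, 10, by = 2)                # 2 4 6 8 10

# length.out asks for a fixed number of evenly spaced values instead of a step
five_points <- seq(0, 1, length.out = 5)   # 0.00 0.25 0.50 0.75 1.00
```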



Access specific elements or parts of a vector
You will often need to access a specific element or part of a vector. This is called subsetting, and it is a very important topic. For example, to access the second element of our vector codes, use square brackets and the index:
> codes[2]

You can access more than one entry by using a multi-entry vector as an index. Suppose we want to access the first and third elements of the vector codes:
> codes[c(1, 3)]

Suppose you want to access a sequence of elements:
> codes[1:2]

We can also access the elements using names:
> codes["canada"]

> codes[c("egypt", "Italy")]
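The subsetting forms above can be run together on a self-contained copy of codes. This is a sketch; the last two lines (a name with different capitalization, and unname()) are additions for illustration and do not appear in the original post.

```r
codes <- c(Italy = 100, canada = 200, egypt = 300)

second <- codes[2]                       # by position: canada 200
first_third <- codes[c(1, 3)]            # by several positions: Italy 100, egypt 300
first_two <- codes[1:2]                  # by a sequence: Italy 100, canada 200
by_name <- codes[c("egypt", "Italy")]    # by name, in the order requested

# Names are case-sensitive: "Canada" does not match "canada", so the result is NA
missing_name <- codes["Canada"]
is.na(missing_name)                      # TRUE

# unname() drops the label and leaves just the value
unname(codes["canada"])                  # 200
```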

Coerce data into different data types as needed
If we try to combine characters and numbers in a vector, you might expect an error, but R behaves differently from most languages. If we assign 1, "Canada" and 3 to a variable, say x, we don't get an error. We don't even get a warning.
> x <- c(1, "Canada", 3)


If we print x, we can see that R has converted the numbers 1 and 3 to character strings, and the class of x is character. This process is called coercion. Here R has guessed that we meant 1 and 3 to be characters. R also offers functions to force a specific coercion; for example, you can turn numbers into characters using as.character():
> x <- seq(1, 5)
> y <- as.character(x)
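The coercion described above can be verified directly. In this sketch the names mixed and z are made up for illustration, and the round trip through as.numeric() is an addition to the original example.

```r
# Mixing numbers and strings: R silently converts everything to character
mixed <- c(1, "Canada", 3)
mixed          # "1" "Canada" "3"
class(mixed)   # "character"

# Explicit coercion from numbers to character strings
x <- seq(1, 5)
y <- as.character(x)   # "1" "2" "3" "4" "5"

# And back again with as.numeric()
z <- as.numeric(y)     # 1 2 3 4 5
```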

Missing data
Missing data is very common in practice. R has a special value for missing data: NA. We can get NAs from coercion. For example, when R fails to coerce something, we get NA (along with a warning):
> x <- c(1, "b", 3)

> y <- as.numeric(x)

As a data scientist, you will encounter NA often, because it is used to mark missing data, which is very common in real-life datasets. So be sure to know what NA means and be ready to see a lot of them.
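To see what an NA from a failed coercion looks like and how it affects later calculations, here is a minimal sketch. The use of suppressWarnings(), is.na() and the na.rm argument of mean() goes beyond the original post; they are standard base R features, included only as one common way to inspect and work around missing values.

```r
x <- c(1, "b", 3)

# "b" cannot be read as a number, so it becomes NA.
# suppressWarnings() hides the "NAs introduced by coercion" warning here.
y <- suppressWarnings(as.numeric(x))   # 1 NA 3

# is.na() flags the missing entries
is.na(y)        # FALSE TRUE FALSE
sum(is.na(y))   # 1

# Most summaries return NA if any input is NA...
mean(y)                 # NA
# ...unless missing values are explicitly removed first
mean(y, na.rm = TRUE)   # 2
```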

Refer to the R manual for more details on vectors and vector arithmetic.
