Complex datasets are usually broken down into
components that are vectors. For example, in a data frame such as the CO2
emissions data seen in the earlier post, each column is a vector.
Creating Numeric and Character Vectors
There are different ways to create a vector,
and one of them is the concatenate function c():
>country <- c("Italy", "canada", "egypt")
>codes <- c(100, 200, 300)
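To check what kind of vector c() produced, the base R functions class() and length() can be used. This is a minimal sketch that reuses the two vectors from above; the calls to class() and length() are an addition for illustration.

```r
# Create a character vector and a numeric vector with c()
country <- c("Italy", "canada", "egypt")
codes <- c(100, 200, 300)

class(country)  # "character"
class(codes)    # "numeric"
length(codes)   # 3
```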
Name the elements of a Vector
As shown in the screenshot, we can assign
names to the entries of codes. Here we have associated the number 100 with the name "Italy", 200
with "canada", and so on, and the object codes continues to be a numeric vector. We
can also use the names() function to assign names.
>names(codes) <- country
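The two ways of naming entries can be compared side by side. The screenshot is not reproduced here, so the first form below (name = value pairs inside c()) is an assumption about what it showed; it is standard R syntax either way. The name codes2 is introduced only for this sketch.

```r
# Assumed form: give each entry a name directly inside c()
codes <- c(Italy = 100, canada = 200, egypt = 300)
class(codes)   # still "numeric"

# Same result with names(): build the vector first, then attach the names
codes2 <- c(100, 200, 300)
country <- c("Italy", "canada", "egypt")
names(codes2) <- country

identical(codes, codes2)  # TRUE
names(codes)              # "Italy" "canada" "egypt"
```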
Generate Numeric Sequences
Another important function for generating sequences
is seq(). Its first argument is the start of the sequence and its second argument is the end of the sequence:
>seq(1,10)
The shorthand notation for this is:
>1:10
There is also a third argument that tells the sequence how much to jump by. In the example below it is 2, so the result is all the odd numbers from 1 to 10:
>seq(1,10,2)
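The three calls can be run together to compare their results. This is a small sketch; the variable names a, b and odds are chosen here for illustration, and the last line uses the named arguments from, to and by of seq() to make the roles explicit.

```r
a <- seq(1, 10)         # 1 2 3 4 5 6 7 8 9 10
b <- 1:10               # same values, shorthand form
all(a == b)             # TRUE

odds <- seq(1, 10, 2)   # 1 3 5 7 9
odds

# Equivalent call with named arguments
seq(from = 1, to = 10, by = 2)
```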
Access specific elements or parts of a vector
You will often have to access a specific
element or several elements of a vector. This is called subsetting, and it is a very important
topic. For example, to access the second element of our vector codes,
use square brackets and the index:
> codes[2]
You can access more than one entry by using a
multi-entry vector as an index. Suppose we want to access the first and third
elements of the vector codes:
>codes[c(1,3)]
Suppose you want to access a sequence of
elements:
>codes[1:2]
We can also access the elements using names:
>codes["canada"]
>codes[c("egypt","Italy")]
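The subsetting forms above can be tried on one named vector. This sketch defines codes with names in a single step so that it runs on its own; the values match the earlier examples.

```r
codes <- c(Italy = 100, canada = 200, egypt = 300)

codes[2]                    # canada = 200
codes[c(1, 3)]              # Italy = 100, egypt = 300
codes[1:2]                  # Italy = 100, canada = 200
codes["canada"]             # canada = 200
codes[c("egypt", "Italy")]  # egypt = 300, Italy = 100
```

Note that the result follows the order of the index, not the order of the original vector, which is why "egypt" comes before "Italy" in the last call.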
Coerce data into different data types as needed
If we try to combine characters and
numbers in a vector, you might expect an error, but R behaves differently
from most languages. If we assign 1, "Canada" and 3 to a variable, say x,
we don't get an error. We don't even get a warning.
>x <- c(1, "Canada", 3)
If we print x, we can see that R has converted
the numbers 1 and 3 to character strings, and the class of x is character. We call
this process coercion. Here R has guessed that we meant 1 and 3 to be
characters. R also offers functions to force a specific coercion; for example, you can
turn numbers into characters using as.character().
>x <- seq(1,5)
>y <- as.character(x)
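Both kinds of coercion can be inspected directly. This is a minimal sketch; the variable names n and z, and the round trip back through as.numeric(), are additions for illustration.

```r
# Implicit coercion: mixing numbers and strings gives a character vector
x <- c(1, "Canada", 3)
x          # "1" "Canada" "3"
class(x)   # "character"

# Explicit coercion: numbers to characters and back
n <- seq(1, 5)
y <- as.character(n)   # "1" "2" "3" "4" "5"
z <- as.numeric(y)     # 1 2 3 4 5
class(y)               # "character"
class(z)               # "numeric"
```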
Missing data
Missing data in datasets is very common in
practice. In R, we have a special value for missing data: NA. We can get
NAs from coercion. For example, when R fails to coerce something, we get NA.
>x <- c(1, "b", 3)
>y <- as.numeric(x)
As a data scientist, you will encounter
NA often, because it is used to mark missing data. This is very common in real-life data
sets. So be sure to know what NA means and be ready to see a lot of them.
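A short sketch of what the failed coercion looks like and how the resulting NA can be detected. The use of is.na() and the na.rm argument of mean() goes beyond the text above; both are base R, shown here only as one common way to work around missing values.

```r
x <- c(1, "b", 3)
y <- as.numeric(x)   # R warns that NAs were introduced by coercion
y                    # 1 NA 3

is.na(y)             # FALSE TRUE FALSE
sum(is.na(y))        # 1 missing value

mean(y)                  # NA, because one entry is missing
mean(y, na.rm = TRUE)    # 2, computed from the non-missing entries
```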