Variables in R can have different types. We need to distinguish numbers from
character strings, and tables from simple lists of numbers. To tell characters
apart from numbers, we always wrap characters in quotes (""). The function class() helps us determine the type of an object.
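As a small illustration of the point above, the same digits are treated differently with and without quotes. The variable names n and s are made up for this sketch.

```r
# Illustrative variables; the names are arbitrary.
n <- 42      # a number
s <- "42"    # the same digits in quotes form a character string

class(n)     # "numeric"
class(s)     # "character"
```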
Storing Data in R – Data Frames
Up to now we have defined individual variables in R,
but the most common way of storing data in R is in data frames. We can
think of a data frame as a table: rows represent observations, and
columns represent variables. R reports the class of such
objects as data.frame, and to find out more about them we can use the
function str(), which stands for structure. In our first post we saw
how to load datasets. The example here uses the CO2 dataset, which records carbon dioxide uptake
in grass plants. Calling str(CO2) displays the structure of this dataset.
The output shows the number of observations (rows), the number of variables (columns), the variable names, and the type of each variable. This
is what will help us answer data analysis questions about this data. We can show the first
six rows of this data frame using the function head().
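The inspection steps described above might look like the sketch below. It assumes CO2 is the dataset that ships with base R (in the datasets package), so no loading step is shown. Note that class() lists several classes for this particular object, and data.frame is one of them.

```r
# Assumption: CO2 is the built-in dataset from the "datasets" package.
class(CO2)   # several classes are listed; "data.frame" is among them
str(CO2)     # compact summary: observations, variables, and their types
head(CO2)    # first six rows
dim(CO2)     # 84 rows and 5 columns
```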
Data Accessor - '$'
For our analysis, we will need to access the
different variables in this data. We use the accessor symbol '$' to access
these variables. To access the variable Plant in this dataset, we type the dataset name (CO2), followed by the accessor ($) and the variable
name (Plant), that is, CO2$Plant.
We can also use double square brackets to access a variable:
>b <- CO2[["Plant"]]
The output shown above is what is called a
vector. It is not a single value: a vector may have several entries, and the
function length() tells you how many entries it has. The vector above has length 84.
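To check that the two access styles give the same result, a quick sketch is shown here. The name a is introduced only for this example; b follows the assignment used earlier.

```r
a <- CO2$Plant         # accessor form
b <- CO2[["Plant"]]    # double-bracket form

identical(a, b)        # TRUE: both return the same vector
length(b)              # 84, one entry per row of CO2
```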
Other data types
Logicals
Besides numeric and character vectors, we also have logical vectors, which store the values TRUE or FALSE. We will see
examples of these in later posts.
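As a brief preview, a comparison is one common way to end up with a logical vector. The values and names below are made up for illustration.

```r
x <- c(2, 5, 8)    # a small numeric vector (made-up values)
big <- x > 4       # the comparison returns one TRUE/FALSE per entry

big                # FALSE TRUE TRUE
class(big)         # "logical"
sum(big)           # 2, because TRUE counts as 1 and FALSE as 0
```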
Factors
There is one more important data type, called a factor. In the CO2 dataset we have the columns Type and Treatment.
Looking at the data, we might expect the class of these columns to be character, but it is actually factor. This data type
appears frequently in R and data science. Factors are useful for storing categorical data. The variables Type and Treatment
each have only two categories. Storing categorical data this way can be more memory efficient: in the background, R stores
each value as an integer code that points to one of the levels, and integers take less memory than character strings. To see the different categories, we use:
>levels(CO2$Type)
>levels(CO2$Treatment)
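To make the integer codes behind a factor visible, here is a rough sketch. The first part uses the built-in CO2 dataset; the vector size is a made-up example and not part of that dataset.

```r
class(CO2$Type)                 # "factor"
nlevels(CO2$Type)               # 2
levels(CO2$Type)                # "Quebec" "Mississippi"
unique(as.integer(CO2$Type))    # 1 2: the integer codes behind the levels

# Building a factor from a character vector (made-up values):
size <- factor(c("small", "large", "small"))
levels(size)                    # "large" "small" (sorted alphabetically by default)
as.integer(size)                # 2 1 2
```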