Tuesday, October 16, 2018

R programming - Introduction and Basics


Introduction
This is my first post on R programming. In this series of posts, we will cover the building blocks of R, i.e. the different data types, vectors, operations on vectors such as vector arithmetic, sorting and indexing, plots, and the programming basics of R such as conditional operators, for loops and functions. I will not be covering installation, since many detailed sources on it already exist.

Why R ?
1.     There are a number of languages that can be used for data analysis. However, since data analysis is an interactive process, an interactive language like R proves highly beneficial.
2.     R has good facilities for working with data structures. It is easy to generate graphics in R, which is central to data analysis. Missing values and objects are an integral part of R and come in very handy when one must deal with real data.
3.     R also includes a package system that allows users to add their individual functionality in a manner that is indistinguishable from the core of R.
4.     R is actively used for statistical computing and graphics, and it has become a popular tool for big data and data analytics. It is one of the most widely used languages in data science, and large companies such as Google, LinkedIn and Facebook have used R for parts of their operations.

The R environment
R is an integrated suite of software facilities for data manipulation, calculation and graphical display. It is very much a vehicle for newly developing methods of interactive data analysis. R is an environment within which many classical and modern statistical techniques have been implemented. A few of these are built into the base R environment, but many are supplied as packages. There are about 25 packages supplied with R (called “standard” and “recommended” packages) and many more are available through the CRAN family of Internet sites. R has developed rapidly and has been extended by a large collection of packages. Among other things, it has:

1.     an effective data handling and storage facility,
2.     a suite of operators for calculations on arrays, in particular matrices,
3.     a large collection of intermediate tools for data analysis,
4.     graphical facilities for data analysis and display, and
5.     a well-developed, simple and effective programming language that includes conditionals, loops, user-defined recursive functions and input and output facilities.

I will be using DataCamp to run the R commands in my browser. You can either install R locally or use DataCamp. Once you have installed R, all you have to do is run simple commands at the R prompt.

Variables and Assignment operator
Before we start working with larger datasets, let's look at how we work with variables in R. We can define variables in R as in any other programming language and use them to hold values. In the example below, the value 1 is assigned to a using the assignment operator =. You can also use the operator <-, which we will be using more frequently.

>a = 1  
>a <- 1

Notice that the assignment operator (‘<-’) points to the object receiving the value of the expression. In most contexts the ‘=’ operator can be used as an alternative. Assignment can also be made using the function assign().
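A short sketch of the assignment forms described above. The variable names are arbitrary examples; get() is included only to show the counterpart of assign(), which takes the variable name as a character string.

```r
# Three equivalent ways to store the value 1
a = 1          # '=' works for top-level assignment
b <- 1         # '<-' is the conventional R style
1 -> c1        # '->' also exists; the arrow points at the receiving name

# assign() takes the name of the variable as a string
assign("d", a + b + c1)

# get() does the reverse: it looks a variable up by its name
total <- get("d")
print(total)   # prints [1] 3
```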

Packages and Libraries
As mentioned earlier, a lot of functionality in R is provided by packages, and R makes it very easy to install them from within R itself. For example, to install a package, you would just type:

>install.packages("<packagename>")

Once you hit Return, R will automatically download and install the package; you need to be connected to the internet for this. Once the package is installed, we can load it into our R session using the library function. A package only needs to be installed once; after that, it just needs to be loaded with library in each new session.

>library(<packagename>)
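A common pattern is to install a package only when it is missing and then load it. The sketch below wraps that in a hypothetical helper, load_or_install(), which is not part of R itself. The package "stats" is used only because it ships with every R installation, so the example never needs to download anything; substitute the package you actually want.

```r
# Hypothetical helper: install a package only if it is missing, then load it
load_or_install <- function(pkg) {
  # requireNamespace() returns FALSE (instead of raising an error) if pkg is absent
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # needs an internet connection
  }
  # character.only = TRUE lets library() accept a name stored in a variable
  library(pkg, character.only = TRUE)
  invisible(pkg)
}

load_or_install("stats")
"package:stats" %in% search()  # TRUE once the package is attached
```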

Functions in R
Data analysis can usually be described as a process of applying a series of functions to the available data. R includes several predefined functions:

For example, to compute the square root of a number:
>sqrt(16)
To compute the base-2 logarithm of a number:
>log2(8)
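A few more predefined functions, with the values R prints shown as comments. The last line shows that calls can be nested: the result of the inner call becomes the argument of the outer one.

```r
sqrt(16)            # [1] 4
log2(8)             # [1] 3
log10(1000)         # [1] 3
exp(1)              # Euler's number: [1] 2.718282

# Nested call: log2(65536) is 16, and sqrt(16) is 4
sqrt(log2(65536))   # [1] 4
```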

R documents its functions. Help files are like user manuals for the predefined functions. They show the information required for using a function, such as the arguments (required and optional) that can be passed in the function call. To see the documentation:
>help("log")
>?log    # shorthand notation




From the help page, we can see that the log function expects x, a numeric value, and that the base argument is optional and defaults to exp(1) if not specified, so log(x) returns the natural logarithm. We can also create our own functions in R, which we will see in later posts.
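The default and optional arguments described in the help page can be checked directly at the prompt. args() prints a function's arguments and their defaults; the rest of the example compares the default natural logarithm with explicit bases.

```r
args(log)              # shows the arguments: x, and base = exp(1)

log(100)               # natural logarithm (base e): [1] 4.60517
log(100, base = 10)    # [1] 2
log(8, base = 2)       # [1] 3, same as log2(8)

# Arguments can be matched by name or by position
log(x = 8, 2)          # [1] 3
```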

Datasets
There are pre-built datasets included with R that users can practice on and test functions with. You can see the available datasets using the function:
>data()
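As an example, the sketch below uses mtcars, one of the datasets listed by data(). It loads the data frame, checks its size and looks at the first rows and the average of one column.

```r
data(mtcars)        # load the built-in mtcars data frame
dim(mtcars)         # [1] 32 11  (32 cars, 11 variables)
head(mtcars, 3)     # first three rows
mean(mtcars$mpg)    # average fuel economy, in miles per gallon
```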

There are also other pre-built mathematical objects, like the value of pi. Refer to the documentation (?Constants) for more built-in constants.
>pi
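Besides pi, the Constants help page documents a few character vectors that are handy for labelling data. The printed values are shown as comments.

```r
pi               # [1] 3.141593
LETTERS[1:5]     # [1] "A" "B" "C" "D" "E"
letters[24:26]   # [1] "x" "y" "z"
month.name[1]    # [1] "January"
month.abb[12]    # [1] "Dec"
```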

In the next post we will take a look at the different data types in R and the various operations we can perform on them.

