Posted in Big Data

Big Data Post #6 – R (*σωσ)シ

R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R.
R provides a wide variety of statistical (linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering, …) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participate in that activity.
One of R’s strengths is the ease with which well-designed publication-quality graphical plots can be produced, including mathematical symbols and formulae where needed. Great care has been taken over the defaults for the minor design choices in graphics, but the user retains full control.
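To make the point about mathematical symbols in plots a bit more concrete, here is a minimal sketch using base R graphics. The sine curve, the labels, and the temporary PDF file are my own choices for illustration, not something from the R documentation quoted above.

```r
# Draw into a temporary PDF so the sketch also runs without a display.
out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 6, height = 4)

x <- seq(-pi, pi, length.out = 200)

# expression() lets titles and axis labels contain mathematical notation.
plot(x, sin(x), type = "l",
     main = expression(paste("Plot of ", sin(theta))),
     xlab = expression(theta),
     ylab = expression(sin(theta)))
abline(h = 0, lty = 2)  # dashed reference line at y = 0

invisible(dev.off())    # close the device so the file is written
```

In an interactive session you can drop the pdf() and dev.off() lines and the plot should simply open in a graphics window.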
R is available as Free Software under the terms of the Free Software Foundation’s GNU General Public License in source code form. It compiles and runs on a wide variety of UNIX platforms and similar systems (including FreeBSD and Linux), Windows and MacOS.

This programming language was named R after the first letter of the first names of its two authors (Robert Gentleman and Ross Ihaka), and partly as a play on the name of the Bell Labs language S.

Features of R

As stated earlier, R is a programming language and software environment for statistical analysis, graphics representation, and reporting. The following are the important features of R:
  • R is a well-developed, simple and effective programming language which includes conditionals, loops, user-defined recursive functions, and input and output facilities.
  • R has an effective data handling and storage facility.
  • R provides a suite of operators for calculations on arrays, lists, vectors, and matrices.
  • R provides a large, coherent and integrated collection of tools for data analysis.
  • R provides graphical facilities for data analysis and display, either directly on screen or printed on paper.
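A few of the features listed above can be sketched in a handful of lines. This is only a small illustration: the names (factorial_rec, squares, person, and so on) and the sample values are made up by me.

```r
# A user-defined recursive function that uses a conditional
factorial_rec <- function(n) {
  if (n <= 1) {
    return(1)
  }
  n * factorial_rec(n - 1)
}

# A loop that fills a preallocated vector
squares <- numeric(5)
for (i in 1:5) {
  squares[i] <- i^2
}

# Operators work on whole vectors and matrices at once
v <- c(1, 2, 3)
doubled <- v * 2               # element-wise multiplication
m <- matrix(1:4, nrow = 2)     # filled column by column
m_sq <- m %*% m                # matrix product

# A list can hold values of different types
person <- list(name = "Ada", scores = c(90, 85))
mean_score <- mean(person$scores)

print(factorial_rec(5))
print(squares)
print(doubled)
print(m_sq)
print(mean_score)
```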
In conclusion, R is one of the world's most widely used languages for statistics. It is a popular choice among data scientists and is supported by a vibrant and talented community of contributors. R is taught in universities and deployed in mission-critical business applications.
Or at least that was my conclusion after my search, and I feel it is my duty to recommend this language to you, because knowing only Java and Hadoop will not make the cut in this tech-crazed world.
( ⚆ _ ⚆ )
Installation for Windows: the installer comes with documentation to help you… But this link is for Windows only, and I highly recommend you change your OS to Linux. Actually, anything other than Windows is totally okay for Big Data learning and implementation…
Binaries for Linux: use the yum command, then check for a proper installation by typing R in the terminal. If the installation worked, you will end up at the R prompt (the '>' symbol in the terminal/console), and you can install the necessary packages using install.packages("package-name").
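As a rough sketch of the package step: the helper below installs a package only if it is missing. Note that ensure_package is a hypothetical helper name I made up, and the mirror URL is just an assumption (any CRAN mirror should do). The example call uses "stats", which ships with R, so nothing actually gets downloaded here.

```r
# Hypothetical helper: install a package only when it is not already available.
ensure_package <- function(pkg, repos = "https://cloud.r-project.org") {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    # Needs network access; the repos URL above is an assumed CRAN mirror.
    install.packages(pkg, repos = repos)
  }
  # Report whether the package can be loaded now
  invisible(requireNamespace(pkg, quietly = TRUE))
}

ok <- ensure_package("stats")  # "stats" is bundled with R
print(ok)
```

Swap "stats" for whatever package you actually need, and then load it with library() as usual.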
R is used much like a scripting language, somewhat similar to Python, which I have already discussed.

//Directly at the R prompt

> myString <- "Hello, World!"
> print(myString)

[1] "Hello, World!"

//RScript file
//save as file.R

# R Programming
myString <- "Hello, World!"
print(myString)

//In the terminal

$ Rscript file.R

[1] "Hello, World!"

Okay, so that being said, I am just going to give the basic syntax of the R language and then dive into how to use it to manipulate the various elements necessary for Big Data Analytics…

Okay, I changed my mind… Down here is the link to an R file with as much syntax in one file as possible. So yeah, download it and have a look, maybe…

Okay, so what will you find in that link?
  • The basic syntax (just like how I have dealt with every other language), starting with the different kinds of variables and going on to how to use them to apply logic with looping structures, functions, and so on…
  • Then the R-specific operations, which I happened to learn under two major classifications: the Data Interface, and Charts and Graphs
  • And then some implementation-oriented examples, if I have time to upload them
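To give a taste of the two R-specific topics before you open the file, here is a small sketch of my own (it is not taken from the linked file). The sales data is made up purely for illustration, and the files are written to temporary locations.

```r
# Made-up sample data, used only for illustration
sales <- data.frame(
  region = c("North", "South", "East"),
  units  = c(120, 95, 143)
)

# Data Interface: write the data to a CSV file and read it back
csv_file <- tempfile(fileext = ".csv")
write.csv(sales, csv_file, row.names = FALSE)
loaded <- read.csv(csv_file, stringsAsFactors = FALSE)

# Charts and Graphs: save a bar chart of the loaded data as a PDF
chart_file <- tempfile(fileext = ".pdf")
pdf(chart_file)
barplot(loaded$units, names.arg = loaded$region,
        main = "Units sold by region", ylab = "Units")
invisible(dev.off())  # close the device so the file is written

print(loaded)
```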

