Learning Library

Choosing Python vs R for Data Science

Key Points

  • Your choice between Python and R should depend on factors like prior programming experience, the importance of visualizations, the type of analysis (ML vs. statistical), and what your teammates are already using.
  • Python, whose development began in 1989 (first public release in 1991), is a general‑purpose, object‑oriented language prized for readability, backed by popular libraries such as NumPy, pandas, and TensorFlow, and commonly used in Jupyter notebooks.
  • R, whose development began in 1992 (first public release in 1993), is purpose‑built for statistical analysis and graphics, offering a rich ecosystem of CRAN packages, strong data‑modeling tools, and the RStudio IDE for reporting and visualization.
  • Both languages are open source with vibrant communities, so the real advantage lies in leveraging each where it excels—e.g., using R for customer‑behavior analytics and Python for building machine‑learning or computer‑vision applications.
  • Ultimately, rather than crowning one language “the best,” most data scientists benefit from a hybrid approach, selecting the tool that best fits the specific problem at hand.
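
The first key point lists the deciding factors, and the video's opening quiz attaches a language to each answer: no or lots of programming experience points to R and some points to Python; caring about visualizations points to R; machine learning points to Python and statistical learning to R; and for colleagues, "use that." As a rough sketch only, the quiz can be written down in Python. The function name `quiz_answers` and its parameters are hypothetical, and the function deliberately returns one pick per question rather than inventing a rule for combining them.

```python
from typing import Dict, Optional

# The video's answer for each level of programming experience.
EXPERIENCE_PICKS = {"none": "R", "some": "Python", "lots": "R"}


def quiz_answers(
    experience: Optional[str] = None,        # "none", "some", or "lots"
    wants_great_visuals: Optional[bool] = None,
    problem: Optional[str] = None,           # "machine learning" or "statistical learning"
    team_language: Optional[str] = None,     # whatever most colleagues use
) -> Dict[str, str]:
    """Hypothetical helper: map each answered quiz question to the suggested language."""
    picks: Dict[str, str] = {}
    if experience is not None:
        # Raises KeyError for an experience level the quiz does not cover.
        picks["experience"] = EXPERIENCE_PICKS[experience]
    if wants_great_visuals:
        # The video only gives a recommendation for a "yes" answer.
        picks["visuals"] = "R"
    if problem == "machine learning":
        picks["problem"] = "Python"
    elif problem == "statistical learning":
        picks["problem"] = "R"
    if team_language is not None:
        # "What do most of your colleagues use? Use that."
        picks["team"] = team_language
    return picks


print(quiz_answers(experience="some", problem="machine learning"))
```

Returning every pick, instead of a single verdict, keeps the sketch faithful to the video, which never says how to break a tie between answers.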

**Source:** [https://www.youtube.com/watch?v=4lcwTGA7MZw](https://www.youtube.com/watch?v=4lcwTGA7MZw)

**Duration:** 00:07:08

## Sections

- [00:00:00](https://www.youtube.com/watch?v=4lcwTGA7MZw&t=0s) **Choosing Between Python and R** - A quick decision guide helps listeners pick Python or R for data science based on their programming background, visualization needs, problem type, and team usage, while also discussing how to leverage both languages together.

## Full Transcript

[0:00] Python is an open-source programming language commonly used in data science, as is R. Which one should you be using? At this point you might be expecting a fence-sitting "well, it depends" kind of answer, but no, I'm going to tell you exactly which one to pick, right now. So here goes: I ask you a question, and based on your answer you'll know which language to go for. Ready?

[0:34] Okay, so do you have much in the way of programming experience? None? Use R. Some? Go for Python. Lots? R again. I'll explain. Question two: do you care about awesome-looking visualizations and graphics? If yes, go with R. What about the problem you're trying to solve? Machine learning stuff? Go with Python. Statistical learning? R is your best bet. And finally, what do most of your colleagues use? Use that.

[1:10] Glad to get that off my chest. Now, we could all just finish here and go about our day, but I'd like to explain a little bit more about what these two languages are and how they're best put to use, because increasingly the question isn't which to choose but how to make the best use of both programming languages for your specific use cases.

[1:32] So let's start with the slightly older of the two, which is Python. Python was released in 1989, and it's a general-purpose, object-oriented programming language that emphasizes code readability through its oh-so-generous use of white space. And it's super popular, just behind Java and C in popularity. In fact, there are some awesome libraries that support data science tasks. So, for example, we have numpty... It's actually NumPy. Numpty: that's British slang for an idiot. Numpty. And NumPy is used for large, multi-dimensional arrays. Then, for data manipulation, we have pandas. There are also specialized tools for deep learning, so you can use things like TensorFlow, and you'll often find yourself working with Python in Jupyter notebooks as your IDE.

[2:49] Now let's compare that to R, which is optimized for statistical analysis and data visualization. It was developed just a little later, in 1992, and it has a rich ecosystem with complex data models and elegant tools for data reporting. There are thousands of packages available via the Comprehensive R Archive Network, otherwise known as CRAN, and these are for deep analytical tasks. R provides a broad variety of libraries and tools for things like cleansing data, creating visualizations, and training deep learning algorithms. And R is commonly used with RStudio, which is an integrated development environment for simplified statistical analysis, visualization, and reporting.

[3:41] So both R and Python are open source and are supported by large communities continuously extending their libraries and tools. Really, the biggest differentiator is how they are used. R, as I've mentioned, is mainly used for statistical analysis, while Python provides a more general approach to data wrangling. You might use R for customer behavior analysis, and then you might use Python to build a facial recognition application.

[4:09] Now, right up front I said that if you have no programming experience, or quite a lot of programming experience, R was the better bet, and that if you fall somewhere in between, Python is easier to pick up. But how can that be? Well, Python is considered a multi-purpose language, much like C++ and Java are, and it has a readable syntax that's easy to learn. It's considered a good language for beginner programmers or those with experience in similar languages. R, on the other hand, is built by statisticians and leans heavily into statistical models and specialized analytics. Novices can be running data analysis tasks within minutes with just a few lines of code using R, but the complexity of R's advanced functionality makes it more difficult to develop expertise.

[5:13] Now, a few other considerations to keep in mind, and they all relate specifically to data.

[5:21] When it comes to data collection, so actually gathering the data in the first place, Python supports all kinds of data formats, from comma-separated values (CSV) files to JSON sourced from the web. In contrast, R is designed for data analysts to import data from things like Excel and text files.

[5:44] Now, for data exploration, you can use the pandas library to filter, sort, and display data in a matter of seconds if you use Python. R, on the other hand, is optimized for statistical analysis, so you can build probability distributions or apply different statistical models.

[6:02] And then, finally, data modeling has some differences too. Python has libraries for data modeling like NumPy; in R, you'll sometimes have to rely on packages outside of R's core functionality.

[6:17] Did I say finally? There's one more, and that's visualization. With visualization, R has the clear edge, with a base graphics module allowing you to easily create basic charts and plots, and you can use ggplot2 for more advanced plots, such as complex scatter plots with regression lines.

[6:37] R and Python have their strengths, but in truth most organizations use a combination of both languages. You might conduct early-stage data analysis and exploration in R and then switch to Python when it's time to ship some data products. So which should you use? Both. You're probably going to use a bit of both. And if you want to see more videos like this in the future, please like and subscribe. Thanks for watching.
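
On data collection, the transcript says Python supports all kinds of formats, from CSV files to JSON sourced from the web. A minimal sketch using only the standard library's `csv` and `json` modules follows. The inline sample data and the field names `customer` and `spend` are made up for illustration, and a real web source would be fetched first (for example with `urllib.request`) rather than embedded as a string.

```python
import csv
import io
import json
from typing import Dict, List

# Hypothetical sample data standing in for a CSV file and a JSON web response.
CSV_TEXT = """customer,spend
alice,120.50
bob,75.00
"""

JSON_TEXT = '[{"customer": "carol", "spend": 210.0}, {"customer": "dan", "spend": 15.25}]'


def load_csv(text: str) -> List[Dict]:
    """Parse CSV text into a list of dicts, converting spend to float."""
    reader = csv.DictReader(io.StringIO(text))
    return [{"customer": row["customer"], "spend": float(row["spend"])} for row in reader]


def load_json(text: str) -> List[Dict]:
    """Parse a JSON array of objects into the same record shape as load_csv."""
    return [{"customer": r["customer"], "spend": float(r["spend"])} for r in json.loads(text)]


# Both sources end up as one uniform list of records.
records = load_csv(CSV_TEXT) + load_json(JSON_TEXT)
print(len(records), "records loaded")
```

Normalizing both formats into the same record shape is what lets later steps (filtering, sorting, modeling) ignore where the data came from.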
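
For data exploration, the transcript credits pandas with filtering, sorting, and displaying data in seconds, and describes NumPy as the library for large, multi-dimensional arrays. A minimal sketch of both, assuming NumPy and pandas are installed; the customer names, column names, and the spend threshold of 50 are invented for illustration.

```python
import numpy as np
import pandas as pd

# A 2-D NumPy array: 4 rows (customers) x 2 columns (visits, spend).
# NumPy stores the whole array with one dtype, so visits become floats here.
values = np.array([[3, 120.5], [1, 75.0], [7, 210.0], [2, 15.25]])
print(values.shape)

# Wrap the array in a pandas DataFrame with hypothetical labels.
df = pd.DataFrame(values, columns=["visits", "spend"],
                  index=["alice", "bob", "carol", "dan"])

# Filter: keep customers who spent more than 50.
big_spenders = df[df["spend"] > 50]

# Sort: highest spend first.
ranked = big_spenders.sort_values("spend", ascending=False)

# Display the result.
print(ranked)
```

The boolean mask `df["spend"] > 50` selects rows, and `sort_values` returns a new, reordered DataFrame without modifying `big_spenders`.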
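
The transcript names NumPy for data modeling in Python and mentions scatter plots with regression lines as an R strength. As one small Python-side illustration of simple data modeling, here is an ordinary least-squares straight-line fit with `numpy.polyfit`. The choice of a linear fit and the data (constructed to follow y = 2x + 1 exactly) are assumptions for illustration, not taken from the video.

```python
import numpy as np

# Hypothetical data that follow y = 2x + 1 exactly.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Least-squares fit of a degree-1 polynomial; coefficients come back highest power first.
slope, intercept = np.polyfit(x, y, deg=1)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")

# Use the fitted line to predict a new point.
predicted = slope * 5.0 + intercept
```

Because the data lie exactly on a line, the fitted slope and intercept recover 2 and 1 up to floating-point rounding; with noisy real data they would be the best-fit estimates instead.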