Let's get into it!

A quick note before we start: there is a Jupyter Notebook containing all the code used in this tutorial available in this GitHub repository. The database and SQL code used here is all from my previous Introduction to SQL series posted on Towards Data Science (contact me if you have any problems viewing the articles and I can send you a link to see them for free). If you are not familiar with SQL and the concepts behind relational databases, I would point you towards that series (plus there is of course a huge amount of great stuff available here on freeCodeCamp!)

Why Python with SQL?

For Data Analysts and Data Scientists, Python has many advantages. A huge range of open-source libraries makes it an incredibly useful tool for any Data Analyst. We have pandas, NumPy and Vaex for data analysis, Matplotlib, seaborn and Bokeh for visualisation, and TensorFlow, scikit-learn and PyTorch for machine learning applications (plus many, many more). That is a lot of very useful and very cool stuff.

With its (relatively) easy learning curve and versatility, it's no wonder that Python is one of the fastest-growing programming languages out there. So if we're using Python for data analysis, it's worth asking - where does all this data come from?

While there is a massive variety of sources for datasets, in many cases - particularly in enterprise businesses - data is going to be stored in a relational database. Relational databases are an extremely efficient, powerful and widely-used way to create, read, update and delete data of all kinds.

The most widely used relational database management systems (RDBMSs) - Oracle, MySQL, Microsoft SQL Server, PostgreSQL, IBM DB2 - all use the Structured Query Language (SQL) to access and make changes to the data. Note that each RDBMS uses a slightly different flavour of SQL, so SQL code written for one will usually not work in another without (normally fairly minor) modifications. But the concepts, structures and operations are largely identical.

This means that for a working Data Analyst, a strong understanding of SQL is hugely important. Knowing how to use Python and SQL together will give you even more of an advantage when it comes to working with your data. The rest of this article will be devoted to showing you exactly how we can do that.

Getting Started

Requirements & Installation

To code along with this tutorial, you will need your own Python environment set up. I use Anaconda, but there are lots of ways to do this. Just google "how to install Python" if you need further help. You can also use Binder to code along with the associated Jupyter Notebook.

We will be using MySQL Community Server as it is free and widely used in the industry. If you are using Windows, this guide will help you get set up. Here are guides for Mac and Linux users too (although it may vary by Linux distribution).

Once you have those set up, we will need to get them to communicate with each other. For that, we need to install the MySQL Connector Python library. To do this, follow the instructions, or just use pip:

pip install mysql-connector-python

We are also going to be using pandas, so make sure that you have that installed as well:

pip install pandas

Importing Libraries

As with every project in Python, the very first thing we want to do is import our libraries. It is best practice to import all the libraries we are going to use at the beginning of the project, so people reading or reviewing our code know roughly what is coming up so there are no surprises. For this tutorial, we are only going to use two libraries - MySQL Connector and pandas.
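A minimal sketch of those two imports is below. The try/except guard and the `pd` alias are my own additions (the alias is a common convention, not something the installation steps prescribe) - the guard simply gives a friendlier hint if the pip step above was skipped.

```python
# Import the two libraries used in this tutorial.
try:
    import mysql.connector  # installed via "pip install mysql-connector-python"
    connector_available = True
except ImportError:
    # The driver is not installed yet - run the pip command above first.
    connector_available = False

import pandas as pd  # installed via "pip install pandas"

print("MySQL Connector importable:", connector_available)
print("pandas version:", pd.__version__)
```

If `connector_available` prints as False, re-run the pip install commands in the previous section before continuing.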