How to Get Started with Supervised Learning

Hello! In this article, I will give you a brief introduction to supervised learning and then show you a very simple six-line program that implements it.

Theory

Supervised learning occurs when we are given a labeled dataset and train our model on that data. The dataset is generally divided into two sets, called training data and test data. As the names suggest, we use the training data to train our model and then measure the accuracy of the trained model on the test data. In supervised learning, the training data is already labeled, unlike in unsupervised learning.

With supervised learning, each piece of data is passed to the model as a pair: an input, also called a sample, and its label. The model learns a mapping from the given inputs to the corresponding outputs based on the labels it sees. Once the model is trained, it is given the test data so that we can measure its accuracy.
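The split into training and test data can be sketched in a few lines. This is a minimal illustration, assuming scikit-learn is installed; the sample values are hypothetical, and the decision tree is only a stand-in, since any classifier would do here.

```python
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical samples: [weight in grams, texture], with texture 1 = smooth, 0 = bumpy.
samples = [[140, 1], [130, 1], [135, 1], [145, 1],
           [150, 0], [170, 0], [160, 0], [165, 0]]
# One label per sample: 0 = apple, 1 = orange.
labels = [0, 0, 0, 0, 1, 1, 1, 1]

# Hold out 25% of the data for testing; stratify keeps both fruits in each split.
X_train, X_test, y_train, y_test = train_test_split(
    samples, labels, test_size=0.25, stratify=labels, random_state=0
)

# Train on the training data only.
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)

# Measure accuracy on data the model has never seen.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"train size: {len(X_train)}, test size: {len(X_test)}, accuracy: {accuracy:.2f}")
```

With only eight samples the accuracy figure means very little; the point is that the model never sees the test samples while it is being trained.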

That is supervised learning in summary. Many different algorithms can be used to implement it. Some of the most commonly used are:

  • Nearest Neighbor
  • Naïve Bayes
  • Decision Trees
  • Linear Regression
  • Support Vector Machines
  • Neural Networks

These are just a few of the many algorithms in the machine learning arsenal.

Experiment

Let’s build a model that uses supervised learning and is easy to make. In this example, I will use the decision tree algorithm.

I will write code that helps us tell apples from oranges. It takes only 6 lines of code. This is the magic of machine learning. If we wrote the rules for telling the two apart by hand, the code would be really long, because we would have to think of many test cases and conditions. Even then, there would be cases where our code breaks. Also, if we later needed to tell apart two different kinds of fruit, we would have to start over, whereas with a model we just provide new data, give it some time to train, and we have a new classifier. Let’s get started.
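To make the contrast concrete, here is a rough sketch of what the hand-written approach might look like. The function name and the 150-gram threshold are hypothetical, chosen only for illustration.

```python
def classify_by_rules(weight_grams, texture):
    """Hand-written classifier; the threshold is a hypothetical guess."""
    if texture == "bumpy":
        return "orange"
    if texture == "smooth" and weight_grams < 150:
        return "apple"
    # Every case not anticipated above needs yet another rule.
    return "unknown"


print(classify_by_rules(140, "smooth"))  # apple
print(classify_by_rules(180, "smooth"))  # unknown
```

A heavy, smooth fruit already falls outside the rules, and each such gap means another condition to write by hand.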

I will be using Anaconda as my environment and coding in a Jupyter notebook.

I will be using TensorFlow and scikit-learn in this code.

Dataset: Our dataset is really small. It has two features that differentiate apples from oranges: weight and texture. The dataset also has labels, which tell us which fruit each sample is.

 

Here is an example of our dataset:

Before writing the code, we have to convert the strings into numbers so that they can be fed to our model. The first two columns are the features and the last one is the label. I encode smooth as 1 and bumpy as 0, and apple as 0 and orange as 1.
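The encoding described above could be done along these lines. The rows are hypothetical values used only for illustration, not the article's original table, and the variable names are my own.

```python
# Hypothetical rows in the form (weight in grams, texture, fruit).
raw_rows = [
    (140, "smooth", "apple"),
    (130, "smooth", "apple"),
    (150, "bumpy", "orange"),
    (170, "bumpy", "orange"),
]

# Encoding from the text: bumpy = 0, smooth = 1; apple = 0, orange = 1.
TEXTURE_CODES = {"bumpy": 0, "smooth": 1}
LABEL_CODES = {"apple": 0, "orange": 1}

# The first two columns become the features, the last column becomes the label.
features = [[weight, TEXTURE_CODES[texture]] for weight, texture, _ in raw_rows]
labels = [LABEL_CODES[fruit] for _, _, fruit in raw_rows]

print(features)  # [[140, 1], [130, 1], [150, 0], [170, 0]]
print(labels)    # [0, 0, 1, 1]
```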

Next, we train our model using a decision tree classifier. This is how it will work:
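One way to see what the tree has learned is to print its rules as text. This is a sketch assuming scikit-learn's DecisionTreeClassifier and export_text, with hypothetical data values; the rule the tree picks may differ from run to run, because on this tiny dataset either feature separates the fruits perfectly.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical encoded data: [weight in grams, texture], texture 1 = smooth, 0 = bumpy.
features = [[140, 1], [130, 1], [150, 0], [170, 0]]
labels = [0, 0, 1, 1]  # 0 = apple, 1 = orange

clf = DecisionTreeClassifier(random_state=0)
clf.fit(features, labels)

# Print the learned decision rules as an indented if/else outline.
rules = export_text(clf, feature_names=["weight", "texture"])
print(rules)
```

The printout is a single question about one feature followed by two leaves, one for each fruit.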

This is what our final code looks like:
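The original listing is not reproduced here, so the following is a minimal six-line sketch rather than the exact code. It assumes scikit-learn's DecisionTreeClassifier and hypothetical data values; this particular sketch does not need TensorFlow.

```python
from sklearn import tree
features = [[140, 1], [130, 1], [150, 0], [170, 0]]  # [weight in grams, texture]
labels = [0, 0, 1, 1]                                # 0 = apple, 1 = orange
clf = tree.DecisionTreeClassifier()
clf = clf.fit(features, labels)
print(clf.predict([[160, 0]]))  # [1], i.e. orange
```

A 160-gram bumpy fruit is classified as an orange, which matches what we would expect from the training data.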

Since this is our first time, we used a small dataset. Let’s hope that we will be able to make some really cool models that use huge datasets.

I hope you liked this article. If you have any questions, please let me know. I hope to find some more really cool algorithms and the time to share them with you. Thank you for reading to the end, and let’s grow together.
