Supervised learning is a popular branch of machine learning applied to real-life situations. As the name suggests, we need to supervise our machine while it is learning, or getting trained, to work on its own. For this, we require datasets (labeled training data) for making predictions. These datasets consist of input and output values, and predictions are made on the basis of them. We build a model that makes predictions for new data based on past data. Suppose we provide pictures of animals together with their labels. Once our machine learns to make accurate predictions, we provide new pictures of animals, and the machine makes predictions for these new pictures based on its past training.
So, basically, it is a method of enabling machines to classify objects, situations or problems based on the data fed into them. We feed the machines with patterns, colors, dimensions of objects, people, situations or other information as data until the machines perform classifications accurately. We use supervised learning when we have a specified target value that we want to predict.
Supervised learning has two types: classification and regression.
Let us talk about them in detail.
1. Classification: In classification, we look for an output in the form of discrete categories, such as ‘yes’ or ‘no’, or ‘black’ or ‘white’. When we build a classification model, we try to draw a conclusion from the observations: the model determines which category a particular thing belongs to. In classification, the target values are labeled, categorical data. The Gmail spam filtering method falls under this category: with the help of spam filtering, Gmail easily classifies mails as spam or not spam. E-mail clients use spam filters to keep users away from spam mail, and these filters are updated from time to time.
Similarly, we can classify objects into different categories on the basis of their features or properties. We pass pictures of objects along with their labels to train our model; this is known as image classification. After that, we pass new/unseen pictures of objects, and we get the targeted output. Classification has different algorithms. Some of them are:
· Logistic Regression: Though it has the word ‘regression’ in its name, it is not used for regression; it is used for classification. Using logistic regression we can predict discrete values. It is a simple machine learning approach for predicting the value of a categorical variable on the basis of its relationship with the predictor variables. There are a few types of logistic regression:
- Binary Logistic Regression: It has only two possible results. Example: Spam or Not Spam, Yes or No.
- Ordinal Logistic Regression: Its results have an ordering. Example: Ratings from 1 to 10.
- Multinomial Logistic Regression: It has three or more categories, but no ordering. Example: Which color will be the most preferred — black, blue, purple or brown.
For predicting which category a particular data point belongs to, we set a threshold on the predicted probability, and the classification is done on the basis of that threshold. For example: if the predicted probability of spam is greater than 0.5, mark the mail as spam, else not spam.
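The spam example can be sketched in a few lines of Python. The logistic (sigmoid) function squashes a real-valued score into a probability, and a threshold turns that probability into a class label; the scores and the 0.5 threshold here are illustrative assumptions, not values from any particular trained model.

```python
import math

def sigmoid(z):
    # Squash a real-valued score into a probability between 0 and 1.
    return 1.0 / (1.0 + math.exp(-z))

def classify(score, threshold=0.5):
    # Label as "spam" when the predicted probability crosses the threshold.
    return "spam" if sigmoid(score) > threshold else "not spam"

print(classify(3.2))   # a high score gives a probability above 0.5
print(classify(-1.5))  # a low score gives a probability below 0.5
```

In a real model the score would be a weighted sum of the input features, learned from the training data; here it is supplied directly to keep the sketch short.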
· Decision Tree: It is a simple structure whose non-terminal nodes represent tests on one or more attributes and whose terminal nodes reflect decision outcomes. It is built by selecting a subset of instances from the training set; the remaining instances test the accuracy of the constructed tree. If the decision tree classifies these instances correctly, the procedure terminates. If some instances are incorrectly classified, they are added to the selected subset and a new tree is constructed. This process continues until a tree that correctly classifies all non-selected instances is created, or a decision tree is built from the entire training set.
For this, let us take an example. Suppose a man wants to know whether a guy will watch his show or not. He will collect information about what kinds of shows the guy watched in the past, along with attributes that describe the guy's choices. The man will put this information into a decision tree, from which he will get some rules. Using these rules, the man can predict whether the guy will watch his show or not.
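A learned decision tree is, in effect, a nest of attribute tests. As a rough sketch of the show-watching example (the attributes and rules are made up for illustration — a real tree would be induced from training data):

```python
def will_watch(genre, length_minutes, has_favorite_actor):
    # Non-terminal nodes test attributes; terminal nodes return decisions.
    if genre == "comedy":
        if length_minutes <= 60:
            return True          # short comedies: always watched
        return has_favorite_actor
    return has_favorite_actor    # other genres: depends on the cast

print(will_watch("comedy", 45, False))
print(will_watch("drama", 90, False))
```

Each path from the root to a leaf corresponds to one rule, e.g. "if genre is comedy and length is at most 60 minutes, then watch".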
· Naïve Bayes: The Naïve Bayes algorithm is based upon Bayes’ Theorem and can be quite useful for large datasets. It predicts a probability for every class: the probability that a given record or data point belongs to a particular class. The class with the highest probability is taken as the most likely class. The assumption in the Naïve Bayes classifier is that all features are unrelated to each other, so the presence or absence of one attribute does not affect the presence or absence of another.
Suppose an animal can be considered a cow based on its four legs, tail, height, color and horns. Though these features may depend on each other or on other features, the Naïve Bayes classifier treats them as independent features, each contributing separately to the probability that the animal is a cow.
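The independence assumption makes the computation simple: the score for each class is just the prior multiplied by the per-feature likelihoods. A minimal sketch of the cow example, with all probabilities invented purely for illustration:

```python
# Toy conditional probabilities P(feature | class) — made up for illustration.
p_feature_given_cow = {"four_legs": 0.99, "tail": 0.95, "horns": 0.8}
p_feature_given_not = {"four_legs": 0.60, "tail": 0.70, "horns": 0.1}
p_cow, p_not = 0.5, 0.5   # assumed equal priors

def score(prior, likelihoods, features):
    # Independence assumption: just multiply the per-feature probabilities.
    s = prior
    for f in features:
        s *= likelihoods[f]
    return s

observed = ["four_legs", "tail", "horns"]
cow_score = score(p_cow, p_feature_given_cow, observed)
not_score = score(p_not, p_feature_given_not, observed)
print("cow" if cow_score > not_score else "not cow")
```

In practice these likelihoods would be estimated by counting feature frequencies per class in the training data, not written by hand.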
· Random Forest:
As the name says, it makes a forest out of a number of trees. Just as a forest is considered good only when it has a large number of trees, the higher the number of trees in a random forest, the more accurate the result generated by this approach. Using random forest we can also handle missing values. Suppose there is a lady called Ria who wants to start reading novels. She asks a friend who reads novels; the friend asks Ria about the genres she likes and, on that basis, suggests a few titles, telling her which were the best among those she has read. Ria then consults a few more friends, who also ask questions about her taste and recommend a few novels based on her interests. Finally, Ria chooses the novel that was suggested most often. This is the random forest algorithm: many independent ‘trees’ each give a prediction, and the majority answer wins.
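The "ask many friends, take the most common answer" idea is just majority voting over trees. A minimal sketch, where each "tree" is stubbed out as a simple rule (the attributes and rules are hypothetical — a real random forest trains each tree on a random sample of the data and features):

```python
from collections import Counter

def forest_predict(trees, x):
    # Each tree votes; the forest returns the majority class.
    votes = [tree(x) for tree in trees]
    return Counter(votes).most_common(1)[0][0]

# Three toy "trees", each answering like one of Ria's friends.
trees = [
    lambda x: "mystery" if x["likes_suspense"] else "romance",
    lambda x: "mystery" if not x["reads_nonfiction"] else "romance",
    lambda x: "romance",
]
ria = {"likes_suspense": True, "reads_nonfiction": False}
print(forest_predict(trees, ria))
```

Two of the three trees vote "mystery", so the forest recommends a mystery novel even though one tree disagrees.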
· K Nearest Neighbor: In K Nearest Neighbor, predictions are based on how similar the training observations are to the new observation. In this, we put a new data point into its most likely neighboring group. The K in KNN is a positive integer, usually small. Whenever we have a new data point that we want to classify, we compute which neighboring group it is closest to.
Imagine a plot containing triangles and stars, and a green square that has to be put into one of the two groups. We draw a circle around the square enclosing its nearest neighbors. If the triangles inside the circle outnumber the stars, the square is put into the triangle group.
The KNN has some assumptions like:
§ The dataset has little noise.
§ The dataset has relevant features and is labeled.
The KNN algorithm is used in databases where data points have to be separated into various classes in order to predict the classification of a new data point. We use KNN in stock price prediction and recommendation systems.
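The triangles-and-stars example can be sketched directly: sort the training points by distance to the new point and take a majority vote among the k nearest. The coordinates below are invented to mirror that example.

```python
import math
from collections import Counter

def knn_predict(points, labels, query, k=3):
    # Sort training points by Euclidean distance to the query point...
    ranked = sorted(zip(points, labels),
                    key=lambda pl: math.dist(pl[0], query))
    # ...then take a majority vote among the k nearest neighbors.
    nearest = [label for _, label in ranked[:k]]
    return Counter(nearest).most_common(1)[0][0]

points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8)]
labels = ["triangle", "triangle", "triangle", "star", "star"]
print(knn_predict(points, labels, (2, 2), k=3))
```

The new point at (2, 2) sits among the triangles, so all three of its nearest neighbors are triangles and it is assigned to the triangle group.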
2. Regression: In regression, the target variable is numerical data. We use past data to predict a numerical value rather than a class. As in classification, the training dataset contains both input and output values, but here the output is a continuous number rather than a categorical label. For example: suppose you are moving to a new city and you want to know the price of flats in that particular city. Regression gives us values indicating the prices of the flats.
In this approach we have an independent variable x, and on the basis of x we calculate the value of a dependent variable y.
· Linear Regression:
It is a linear modeling approach to find the relationship between one or more independent variables (predictors), denoted x, and a dependent variable (target), denoted y. We call it linear because the model equation has no non-linear component. It can be called a statistical machine learning method that we use to make predictions based on the relationship between numerical variables. It is all about finding the best-fit line. But what is this best-fit line?
If we plot the data points on a graph, we see many scattered points. If we can draw a line that has the minimum total distance from all the points, that line is known as the best-fit regression line.
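For a single predictor, the best-fit line can be computed in closed form by ordinary least squares. A minimal sketch of the flat-price example, with hypothetical sizes and prices chosen to lie exactly on a line so the result is easy to check:

```python
def fit_line(xs, ys):
    # Closed-form least squares for y = slope * x + intercept.
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical flat sizes (sq. metres) vs prices (lakhs).
xs = [30, 40, 50, 60]
ys = [20, 26, 32, 38]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)        # 0.6 and 2.0 for this perfectly linear data
print(slope * 70 + intercept)  # predicted price of a 70 sq. m flat: 44.0
```

Once the slope and intercept are fitted, predicting the price of a new flat is just plugging its size into the line equation.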
Reference: Data Mining, Richard Roiger and Michael Geatz, Pearson