The Spark Foundation

By: Karan

Task-2 Prediction using Unsupervised ML, Visulization

Problem statment

● predict the optimum number of clusters and represent it visually.

● dataset: https://bit.ly/3kXTdox

In [1]:
import pandas as pd
from matplotlib import pyplot as plt
import numpy as np

Data analysis

In [2]:
data=pd.read_csv("iris.csv")
data
Out[2]:
Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
0 1 5.1 3.5 1.4 0.2 Iris-setosa
1 2 4.9 3.0 1.4 0.2 Iris-setosa
2 3 4.7 3.2 1.3 0.2 Iris-setosa
3 4 4.6 3.1 1.5 0.2 Iris-setosa
4 5 5.0 3.6 1.4 0.2 Iris-setosa
... ... ... ... ... ... ...
145 146 6.7 3.0 5.2 2.3 Iris-virginica
146 147 6.3 2.5 5.0 1.9 Iris-virginica
147 148 6.5 3.0 5.2 2.0 Iris-virginica
148 149 6.2 3.4 5.4 2.3 Iris-virginica
149 150 5.9 3.0 5.1 1.8 Iris-virginica

150 rows × 6 columns

we haveto predict the class lable ie. Species wrt the 4 data points | Specied id dependent variable

(SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm)

Data cleaning

In [4]:
# no of indepenent data for each Species   
data["Species"].value_counts()
Out[4]:
Iris-virginica     50
Iris-versicolor    50
Iris-setosa        50
Name: Species, dtype: int64

its a balanced dataset with each of 50 points

In [6]:
data.plot(x='SepalLengthCm',y='SepalWidthCm')
data
Out[6]:
Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
0 1 5.1 3.5 1.4 0.2 Iris-setosa
1 2 4.9 3.0 1.4 0.2 Iris-setosa
2 3 4.7 3.2 1.3 0.2 Iris-setosa
3 4 4.6 3.1 1.5 0.2 Iris-setosa
4 5 5.0 3.6 1.4 0.2 Iris-setosa
... ... ... ... ... ... ...
145 146 6.7 3.0 5.2 2.3 Iris-virginica
146 147 6.3 2.5 5.0 1.9 Iris-virginica
147 148 6.5 3.0 5.2 2.0 Iris-virginica
148 149 6.2 3.4 5.4 2.3 Iris-virginica
149 150 5.9 3.0 5.1 1.8 Iris-virginica

150 rows × 6 columns

Instead using default line graph, we use scatter plot

visilisation

In [8]:
data.plot(kind='scatter',x='SepalLengthCm',y='SepalWidthCm');
data         # 2-d array , sl and sw
Out[8]:
Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
0 1 5.1 3.5 1.4 0.2 Iris-setosa
1 2 4.9 3.0 1.4 0.2 Iris-setosa
2 3 4.7 3.2 1.3 0.2 Iris-setosa
3 4 4.6 3.1 1.5 0.2 Iris-setosa
4 5 5.0 3.6 1.4 0.2 Iris-setosa
... ... ... ... ... ... ...
145 146 6.7 3.0 5.2 2.3 Iris-virginica
146 147 6.3 2.5 5.0 1.9 Iris-virginica
147 148 6.5 3.0 5.2 2.0 Iris-virginica
148 149 6.2 3.4 5.4 2.3 Iris-virginica
149 150 5.9 3.0 5.1 1.8 Iris-virginica

150 rows × 6 columns

we have 4 independent variables ie we have 4c2 = 6 (4=independent variable 2= x,y axis)

sl=sw, sl=pw, sl=pl, sw=pl, sw=pw, pl=pw

Pair plot

In [10]:
import seaborn as sns
sns.pairplot(data,hue="Species" ,vars = ["SepalLengthCm", "SepalWidthCm","PetalLengthCm","PetalWidthCm"])
plt.show()

we have much clear picture in 12 plot ie (pl and pw)

ploting histogram, pdf cdf, boxplot, violon for better understanding

observation in cm

if PetalLength is<=2 and PetalWidth is<=1 result is Iris Setosa

if PetalLength is<=5 and >=3 and PetalWidth is<=2 and >=1 result is Iris Versicolor

if PetalLength is<=7 and >=5 and PetalWidth is<=3 and >=1-5 result is Iris Virginica

In [13]:
sns.pairplot(data,hue="Species" ,vars = ["PetalWidthCm"])     

plt.show()

data pre processing

In [14]:
data.corr()       #corelation
Out[14]:
Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm
Id 1.000000 0.716676 -0.397729 0.882747 0.899759
SepalLengthCm 0.716676 1.000000 -0.109369 0.871754 0.817954
SepalWidthCm -0.397729 -0.109369 1.000000 -0.420516 -0.356544
PetalLengthCm 0.882747 0.871754 -0.420516 1.000000 0.962757
PetalWidthCm 0.899759 0.817954 -0.356544 0.962757 1.000000
In [15]:
corr=data.corr()
fig, axis=plt.subplots(figsize=(10,10))
sns.heatmap(corr,annot=True)              # ploting corelation 
Out[15]:
<matplotlib.axes._subplots.AxesSubplot at 0x1e015478>

observation= 0.96 or 96% is the highest corelation value


Machine Learning

In [16]:
# lable encoder used to convert the dependent column ie Specirs into numeric values to Machine readable form
In [17]:
data.describe()
Out[17]:
Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm
count 150.000000 150.000000 150.000000 150.000000 150.000000
mean 75.500000 5.843333 3.054000 3.758667 1.198667
std 43.445368 0.828066 0.433594 1.764420 0.763161
min 1.000000 4.300000 2.000000 1.000000 0.100000
25% 38.250000 5.100000 2.800000 1.600000 0.300000
50% 75.500000 5.800000 3.000000 4.350000 1.300000
75% 112.750000 6.400000 3.300000 5.100000 1.800000
max 150.000000 7.900000 4.400000 6.900000 2.500000
In [18]:
data.plot(kind='box')   # outliers
Out[18]:
<matplotlib.axes._subplots.AxesSubplot at 0x1e09be08>
In [19]:
#converting species into numeric max unique values

def train_species(x):
    if x == 'Iris-setosa':
        return 0
    if x =='Iris-versicolor':
        return 1
    if x =='Iris-virginica':
        return 2
In [20]:
data['output']=data['Species'].apply(train_species)
del data["Species"]
data
Out[20]:
Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm output
0 1 5.1 3.5 1.4 0.2 0
1 2 4.9 3.0 1.4 0.2 0
2 3 4.7 3.2 1.3 0.2 0
3 4 4.6 3.1 1.5 0.2 0
4 5 5.0 3.6 1.4 0.2 0
... ... ... ... ... ... ...
145 146 6.7 3.0 5.2 2.3 2
146 147 6.3 2.5 5.0 1.9 2
147 148 6.5 3.0 5.2 2.0 2
148 149 6.2 3.4 5.4 2.3 2
149 150 5.9 3.0 5.1 1.8 2

150 rows × 6 columns

In [21]:
from sklearn.preprocessing import MinMaxScaler
scaler=MinMaxScaler()
scaled_data=scaler.fit_transform(data)
scaled_data
Out[21]:
array([[0.        , 0.22222222, 0.625     , 0.06779661, 0.04166667,
        0.        ],
       [0.00671141, 0.16666667, 0.41666667, 0.06779661, 0.04166667,
        0.        ],
       [0.01342282, 0.11111111, 0.5       , 0.05084746, 0.04166667,
        0.        ],
       [0.02013423, 0.08333333, 0.45833333, 0.08474576, 0.04166667,
        0.        ],
       [0.02684564, 0.19444444, 0.66666667, 0.06779661, 0.04166667,
        0.        ],
       [0.03355705, 0.30555556, 0.79166667, 0.11864407, 0.125     ,
        0.        ],
       [0.04026846, 0.08333333, 0.58333333, 0.06779661, 0.08333333,
        0.        ],
       [0.04697987, 0.19444444, 0.58333333, 0.08474576, 0.04166667,
        0.        ],
       [0.05369128, 0.02777778, 0.375     , 0.06779661, 0.04166667,
        0.        ],
       [0.06040268, 0.16666667, 0.45833333, 0.08474576, 0.        ,
        0.        ],
       [0.06711409, 0.30555556, 0.70833333, 0.08474576, 0.04166667,
        0.        ],
       [0.0738255 , 0.13888889, 0.58333333, 0.10169492, 0.04166667,
        0.        ],
       [0.08053691, 0.13888889, 0.41666667, 0.06779661, 0.        ,
        0.        ],
       [0.08724832, 0.        , 0.41666667, 0.01694915, 0.        ,
        0.        ],
       [0.09395973, 0.41666667, 0.83333333, 0.03389831, 0.04166667,
        0.        ],
       [0.10067114, 0.38888889, 1.        , 0.08474576, 0.125     ,
        0.        ],
       [0.10738255, 0.30555556, 0.79166667, 0.05084746, 0.125     ,
        0.        ],
       [0.11409396, 0.22222222, 0.625     , 0.06779661, 0.08333333,
        0.        ],
       [0.12080537, 0.38888889, 0.75      , 0.11864407, 0.08333333,
        0.        ],
       [0.12751678, 0.22222222, 0.75      , 0.08474576, 0.08333333,
        0.        ],
       [0.13422819, 0.30555556, 0.58333333, 0.11864407, 0.04166667,
        0.        ],
       [0.1409396 , 0.22222222, 0.70833333, 0.08474576, 0.125     ,
        0.        ],
       [0.14765101, 0.08333333, 0.66666667, 0.        , 0.04166667,
        0.        ],
       [0.15436242, 0.22222222, 0.54166667, 0.11864407, 0.16666667,
        0.        ],
       [0.16107383, 0.13888889, 0.58333333, 0.15254237, 0.04166667,
        0.        ],
       [0.16778523, 0.19444444, 0.41666667, 0.10169492, 0.04166667,
        0.        ],
       [0.17449664, 0.19444444, 0.58333333, 0.10169492, 0.125     ,
        0.        ],
       [0.18120805, 0.25      , 0.625     , 0.08474576, 0.04166667,
        0.        ],
       [0.18791946, 0.25      , 0.58333333, 0.06779661, 0.04166667,
        0.        ],
       [0.19463087, 0.11111111, 0.5       , 0.10169492, 0.04166667,
        0.        ],
       [0.20134228, 0.13888889, 0.45833333, 0.10169492, 0.04166667,
        0.        ],
       [0.20805369, 0.30555556, 0.58333333, 0.08474576, 0.125     ,
        0.        ],
       [0.2147651 , 0.25      , 0.875     , 0.08474576, 0.        ,
        0.        ],
       [0.22147651, 0.33333333, 0.91666667, 0.06779661, 0.04166667,
        0.        ],
       [0.22818792, 0.16666667, 0.45833333, 0.08474576, 0.        ,
        0.        ],
       [0.23489933, 0.19444444, 0.5       , 0.03389831, 0.04166667,
        0.        ],
       [0.24161074, 0.33333333, 0.625     , 0.05084746, 0.04166667,
        0.        ],
       [0.24832215, 0.16666667, 0.45833333, 0.08474576, 0.        ,
        0.        ],
       [0.25503356, 0.02777778, 0.41666667, 0.05084746, 0.04166667,
        0.        ],
       [0.26174497, 0.22222222, 0.58333333, 0.08474576, 0.04166667,
        0.        ],
       [0.26845638, 0.19444444, 0.625     , 0.05084746, 0.08333333,
        0.        ],
       [0.27516779, 0.05555556, 0.125     , 0.05084746, 0.08333333,
        0.        ],
       [0.28187919, 0.02777778, 0.5       , 0.05084746, 0.04166667,
        0.        ],
       [0.2885906 , 0.19444444, 0.625     , 0.10169492, 0.20833333,
        0.        ],
       [0.29530201, 0.22222222, 0.75      , 0.15254237, 0.125     ,
        0.        ],
       [0.30201342, 0.13888889, 0.41666667, 0.06779661, 0.08333333,
        0.        ],
       [0.30872483, 0.22222222, 0.75      , 0.10169492, 0.04166667,
        0.        ],
       [0.31543624, 0.08333333, 0.5       , 0.06779661, 0.04166667,
        0.        ],
       [0.32214765, 0.27777778, 0.70833333, 0.08474576, 0.04166667,
        0.        ],
       [0.32885906, 0.19444444, 0.54166667, 0.06779661, 0.04166667,
        0.        ],
       [0.33557047, 0.75      , 0.5       , 0.62711864, 0.54166667,
        0.5       ],
       [0.34228188, 0.58333333, 0.5       , 0.59322034, 0.58333333,
        0.5       ],
       [0.34899329, 0.72222222, 0.45833333, 0.66101695, 0.58333333,
        0.5       ],
       [0.3557047 , 0.33333333, 0.125     , 0.50847458, 0.5       ,
        0.5       ],
       [0.36241611, 0.61111111, 0.33333333, 0.61016949, 0.58333333,
        0.5       ],
       [0.36912752, 0.38888889, 0.33333333, 0.59322034, 0.5       ,
        0.5       ],
       [0.37583893, 0.55555556, 0.54166667, 0.62711864, 0.625     ,
        0.5       ],
       [0.38255034, 0.16666667, 0.16666667, 0.38983051, 0.375     ,
        0.5       ],
       [0.38926174, 0.63888889, 0.375     , 0.61016949, 0.5       ,
        0.5       ],
       [0.39597315, 0.25      , 0.29166667, 0.49152542, 0.54166667,
        0.5       ],
       [0.40268456, 0.19444444, 0.        , 0.42372881, 0.375     ,
        0.5       ],
       [0.40939597, 0.44444444, 0.41666667, 0.54237288, 0.58333333,
        0.5       ],
       [0.41610738, 0.47222222, 0.08333333, 0.50847458, 0.375     ,
        0.5       ],
       [0.42281879, 0.5       , 0.375     , 0.62711864, 0.54166667,
        0.5       ],
       [0.4295302 , 0.36111111, 0.375     , 0.44067797, 0.5       ,
        0.5       ],
       [0.43624161, 0.66666667, 0.45833333, 0.57627119, 0.54166667,
        0.5       ],
       [0.44295302, 0.36111111, 0.41666667, 0.59322034, 0.58333333,
        0.5       ],
       [0.44966443, 0.41666667, 0.29166667, 0.52542373, 0.375     ,
        0.5       ],
       [0.45637584, 0.52777778, 0.08333333, 0.59322034, 0.58333333,
        0.5       ],
       [0.46308725, 0.36111111, 0.20833333, 0.49152542, 0.41666667,
        0.5       ],
       [0.46979866, 0.44444444, 0.5       , 0.6440678 , 0.70833333,
        0.5       ],
       [0.47651007, 0.5       , 0.33333333, 0.50847458, 0.5       ,
        0.5       ],
       [0.48322148, 0.55555556, 0.20833333, 0.66101695, 0.58333333,
        0.5       ],
       [0.48993289, 0.5       , 0.33333333, 0.62711864, 0.45833333,
        0.5       ],
       [0.4966443 , 0.58333333, 0.375     , 0.55932203, 0.5       ,
        0.5       ],
       [0.5033557 , 0.63888889, 0.41666667, 0.57627119, 0.54166667,
        0.5       ],
       [0.51006711, 0.69444444, 0.33333333, 0.6440678 , 0.54166667,
        0.5       ],
       [0.51677852, 0.66666667, 0.41666667, 0.6779661 , 0.66666667,
        0.5       ],
       [0.52348993, 0.47222222, 0.375     , 0.59322034, 0.58333333,
        0.5       ],
       [0.53020134, 0.38888889, 0.25      , 0.42372881, 0.375     ,
        0.5       ],
       [0.53691275, 0.33333333, 0.16666667, 0.47457627, 0.41666667,
        0.5       ],
       [0.54362416, 0.33333333, 0.16666667, 0.45762712, 0.375     ,
        0.5       ],
       [0.55033557, 0.41666667, 0.29166667, 0.49152542, 0.45833333,
        0.5       ],
       [0.55704698, 0.47222222, 0.29166667, 0.69491525, 0.625     ,
        0.5       ],
       [0.56375839, 0.30555556, 0.41666667, 0.59322034, 0.58333333,
        0.5       ],
       [0.5704698 , 0.47222222, 0.58333333, 0.59322034, 0.625     ,
        0.5       ],
       [0.57718121, 0.66666667, 0.45833333, 0.62711864, 0.58333333,
        0.5       ],
       [0.58389262, 0.55555556, 0.125     , 0.57627119, 0.5       ,
        0.5       ],
       [0.59060403, 0.36111111, 0.41666667, 0.52542373, 0.5       ,
        0.5       ],
       [0.59731544, 0.33333333, 0.20833333, 0.50847458, 0.5       ,
        0.5       ],
       [0.60402685, 0.33333333, 0.25      , 0.57627119, 0.45833333,
        0.5       ],
       [0.61073826, 0.5       , 0.41666667, 0.61016949, 0.54166667,
        0.5       ],
       [0.61744966, 0.41666667, 0.25      , 0.50847458, 0.45833333,
        0.5       ],
       [0.62416107, 0.19444444, 0.125     , 0.38983051, 0.375     ,
        0.5       ],
       [0.63087248, 0.36111111, 0.29166667, 0.54237288, 0.5       ,
        0.5       ],
       [0.63758389, 0.38888889, 0.41666667, 0.54237288, 0.45833333,
        0.5       ],
       [0.6442953 , 0.38888889, 0.375     , 0.54237288, 0.5       ,
        0.5       ],
       [0.65100671, 0.52777778, 0.375     , 0.55932203, 0.5       ,
        0.5       ],
       [0.65771812, 0.22222222, 0.20833333, 0.33898305, 0.41666667,
        0.5       ],
       [0.66442953, 0.38888889, 0.33333333, 0.52542373, 0.5       ,
        0.5       ],
       [0.67114094, 0.55555556, 0.54166667, 0.84745763, 1.        ,
        1.        ],
       [0.67785235, 0.41666667, 0.29166667, 0.69491525, 0.75      ,
        1.        ],
       [0.68456376, 0.77777778, 0.41666667, 0.83050847, 0.83333333,
        1.        ],
       [0.69127517, 0.55555556, 0.375     , 0.77966102, 0.70833333,
        1.        ],
       [0.69798658, 0.61111111, 0.41666667, 0.81355932, 0.875     ,
        1.        ],
       [0.70469799, 0.91666667, 0.41666667, 0.94915254, 0.83333333,
        1.        ],
       [0.7114094 , 0.16666667, 0.20833333, 0.59322034, 0.66666667,
        1.        ],
       [0.71812081, 0.83333333, 0.375     , 0.89830508, 0.70833333,
        1.        ],
       [0.72483221, 0.66666667, 0.20833333, 0.81355932, 0.70833333,
        1.        ],
       [0.73154362, 0.80555556, 0.66666667, 0.86440678, 1.        ,
        1.        ],
       [0.73825503, 0.61111111, 0.5       , 0.69491525, 0.79166667,
        1.        ],
       [0.74496644, 0.58333333, 0.29166667, 0.72881356, 0.75      ,
        1.        ],
       [0.75167785, 0.69444444, 0.41666667, 0.76271186, 0.83333333,
        1.        ],
       [0.75838926, 0.38888889, 0.20833333, 0.6779661 , 0.79166667,
        1.        ],
       [0.76510067, 0.41666667, 0.33333333, 0.69491525, 0.95833333,
        1.        ],
       [0.77181208, 0.58333333, 0.5       , 0.72881356, 0.91666667,
        1.        ],
       [0.77852349, 0.61111111, 0.41666667, 0.76271186, 0.70833333,
        1.        ],
       [0.7852349 , 0.94444444, 0.75      , 0.96610169, 0.875     ,
        1.        ],
       [0.79194631, 0.94444444, 0.25      , 1.        , 0.91666667,
        1.        ],
       [0.79865772, 0.47222222, 0.08333333, 0.6779661 , 0.58333333,
        1.        ],
       [0.80536913, 0.72222222, 0.5       , 0.79661017, 0.91666667,
        1.        ],
       [0.81208054, 0.36111111, 0.33333333, 0.66101695, 0.79166667,
        1.        ],
       [0.81879195, 0.94444444, 0.33333333, 0.96610169, 0.79166667,
        1.        ],
       [0.82550336, 0.55555556, 0.29166667, 0.66101695, 0.70833333,
        1.        ],
       [0.83221477, 0.66666667, 0.54166667, 0.79661017, 0.83333333,
        1.        ],
       [0.83892617, 0.80555556, 0.5       , 0.84745763, 0.70833333,
        1.        ],
       [0.84563758, 0.52777778, 0.33333333, 0.6440678 , 0.70833333,
        1.        ],
       [0.85234899, 0.5       , 0.41666667, 0.66101695, 0.70833333,
        1.        ],
       [0.8590604 , 0.58333333, 0.33333333, 0.77966102, 0.83333333,
        1.        ],
       [0.86577181, 0.80555556, 0.41666667, 0.81355932, 0.625     ,
        1.        ],
       [0.87248322, 0.86111111, 0.33333333, 0.86440678, 0.75      ,
        1.        ],
       [0.87919463, 1.        , 0.75      , 0.91525424, 0.79166667,
        1.        ],
       [0.88590604, 0.58333333, 0.33333333, 0.77966102, 0.875     ,
        1.        ],
       [0.89261745, 0.55555556, 0.33333333, 0.69491525, 0.58333333,
        1.        ],
       [0.89932886, 0.5       , 0.25      , 0.77966102, 0.54166667,
        1.        ],
       [0.90604027, 0.94444444, 0.41666667, 0.86440678, 0.91666667,
        1.        ],
       [0.91275168, 0.55555556, 0.58333333, 0.77966102, 0.95833333,
        1.        ],
       [0.91946309, 0.58333333, 0.45833333, 0.76271186, 0.70833333,
        1.        ],
       [0.9261745 , 0.47222222, 0.41666667, 0.6440678 , 0.70833333,
        1.        ],
       [0.93288591, 0.72222222, 0.45833333, 0.74576271, 0.83333333,
        1.        ],
       [0.93959732, 0.66666667, 0.45833333, 0.77966102, 0.95833333,
        1.        ],
       [0.94630872, 0.72222222, 0.45833333, 0.69491525, 0.91666667,
        1.        ],
       [0.95302013, 0.41666667, 0.29166667, 0.69491525, 0.75      ,
        1.        ],
       [0.95973154, 0.69444444, 0.5       , 0.83050847, 0.91666667,
        1.        ],
       [0.96644295, 0.66666667, 0.54166667, 0.79661017, 1.        ,
        1.        ],
       [0.97315436, 0.66666667, 0.41666667, 0.71186441, 0.91666667,
        1.        ],
       [0.97986577, 0.55555556, 0.20833333, 0.6779661 , 0.75      ,
        1.        ],
       [0.98657718, 0.61111111, 0.41666667, 0.71186441, 0.79166667,
        1.        ],
       [0.99328859, 0.52777778, 0.58333333, 0.74576271, 0.91666667,
        1.        ],
       [1.        , 0.44444444, 0.41666667, 0.69491525, 0.70833333,
        1.        ]])
In [22]:
# getting back my data
data_scale=pd.DataFrame(scaled_data)
data_scale.columns=['SepalLengthCm','SepalWidthCm','PetalLengthCm','PetalWidthCm','output','Id']
In [23]:
# seprating features and response
feature=['SepalLengthCm','SepalWidthCm','PetalLengthCm','PetalWidthCm','Id']
response=['output']
X=data_scale[feature]
y=data_scale[response]
In [24]:
# X= independent variable
# y= dependent variable
print(X,y)
     SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm   Id
0         0.000000      0.222222       0.625000      0.067797  0.0
1         0.006711      0.166667       0.416667      0.067797  0.0
2         0.013423      0.111111       0.500000      0.050847  0.0
3         0.020134      0.083333       0.458333      0.084746  0.0
4         0.026846      0.194444       0.666667      0.067797  0.0
..             ...           ...            ...           ...  ...
145       0.973154      0.666667       0.416667      0.711864  1.0
146       0.979866      0.555556       0.208333      0.677966  1.0
147       0.986577      0.611111       0.416667      0.711864  1.0
148       0.993289      0.527778       0.583333      0.745763  1.0
149       1.000000      0.444444       0.416667      0.694915  1.0

[150 rows x 5 columns]        output
0    0.041667
1    0.041667
2    0.041667
3    0.041667
4    0.041667
..        ...
145  0.916667
146  0.750000
147  0.791667
148  0.916667
149  0.708333

[150 rows x 1 columns]
In [25]:
# dividing into test and train data
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test= train_test_split(X,y,test_size=0.2,random_state=0)
# importing necessary packages
from sklearn.linear_model import LinearRegression
from sklearn import metrics
In [26]:
# filtering LR model
model=LinearRegression()
model.fit(X_train,y_train)
Out[26]:
LinearRegression()
In [27]:
# checking accuracy on test data
accuracy= model.score(X_test,y_test)
print(accuracy*100,'%')
91.69961348559065 %
In [28]:
# predicted data
X_train
Out[28]:
SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Id
137 0.919463 0.583333 0.458333 0.762712 1.0
84 0.563758 0.305556 0.416667 0.593220 0.5
27 0.181208 0.250000 0.625000 0.084746 0.0
127 0.852349 0.500000 0.416667 0.661017 1.0
132 0.885906 0.583333 0.333333 0.779661 1.0
... ... ... ... ... ...
9 0.060403 0.166667 0.458333 0.084746 0.0
103 0.691275 0.555556 0.375000 0.779661 1.0
67 0.449664 0.416667 0.291667 0.525424 0.5
117 0.785235 0.944444 0.750000 0.966102 1.0
47 0.315436 0.083333 0.500000 0.067797 0.0

120 rows × 5 columns

In [29]:
# actual data
y_train
Out[29]:
output
137 0.708333
84 0.583333
27 0.041667
127 0.708333
132 0.875000
... ...
9 0.000000
103 0.708333
67 0.375000
117 0.875000
47 0.041667

120 rows × 1 columns

In [30]:
model.predict(X_test) # predicting value on test data
Out[30]:
array([[0.76348892],
       [0.41484727],
       [0.09450299],
       [0.86186724],
       [0.05807366],
       [0.89378869],
       [0.05803313],
       [0.5604445 ],
       [0.54127268],
       [0.45968466],
       [0.81722898],
       [0.53223549],
       [0.55977648],
       [0.51392988],
       [0.56080564],
       [0.05991724],
       [0.54697691],
       [0.54305424],
       [0.04843725],
       [0.05253414],
       [0.74953694],
       [0.56646678],
       [0.13514951],
       [0.03743826],
       [0.70899115],
       [0.0313773 ],
       [0.16238032],
       [0.52088214],
       [0.39226226],
       [0.08419598]])
In [31]:
model.intercept_
Out[31]:
array([-0.0852818])
In [32]:
# deploying model
import pickle 
pickle.dump(model,open('model.pkl','wb'))
In [33]:
# reloading object
model=pickle.load(open('model.pkl','rb'))
In [34]:
# print(model.predict([[30000,24]]))
In [35]:
X_train['Id'].sort_values(ascending=False)
Out[35]:
110    1.0
125    1.0
143    1.0
129    1.0
139    1.0
      ... 
3      0.0
15     0.0
13     0.0
48     0.0
47     0.0
Name: Id, Length: 120, dtype: float64