东北露脸一级毛片,无码中文一区二区免费视频,日本精品中文字幕

logistic回歸是一種廣義的線(xiàn)性回歸，通過(guò)構(gòu)造回歸函數(shù)，利用機(jī)器學(xué)習(xí)來(lái)實(shí)現(xiàn)分類(lèi)或者預(yù)測(cè)。

原理

上一文簡(jiǎn)單介紹了線(xiàn)性回歸，與邏輯回歸的原理是類(lèi)似的。

預(yù)測(cè)函數(shù)（h）。該函數(shù)就是分類(lèi)函數(shù)，用來(lái)預(yù)測(cè)輸入數(shù)據(jù)的判斷結(jié)果。過(guò)程非常關(guān)鍵，需要預(yù)測(cè)函數(shù)的“大概形式”，比如是線(xiàn)性還是非線(xiàn)性的。本文參考機(jī)器學(xué)習(xí)實(shí)戰(zhàn)的相應(yīng)部分，看一下數(shù)據(jù)集。

// 兩個(gè)特征

-0.017612 14.053064 0

-1.395634 4.662541 1

-0.752157 6.538620 0

-1.322371 7.152853 0

0.423363 11.054677 0

0.406704 7.067335 1

如上圖，紅綠代表兩種不同的分類(lèi)。可以預(yù)測(cè)分類(lèi)函數(shù)大概是一條直線(xiàn)。Cost函數(shù)（損失函數(shù)）：該函數(shù)預(yù)測(cè)的輸出h和訓(xùn)練數(shù)據(jù)類(lèi)別y之間的偏差，（h-y）或者其他形式。綜合考慮所有訓(xùn)練數(shù)據(jù)的cost，將其求和或者求平均，極為J函數(shù)，表示所有訓(xùn)練數(shù)據(jù)預(yù)測(cè)值和實(shí)際值的偏差。

顯然，J函數(shù)的值越小，表示預(yù)測(cè)的函數(shù)越準(zhǔn)確（即h函數(shù)越準(zhǔn)確），因此需要找到J函數(shù)的最小值。有時(shí)需要用到梯度下降。

具體過(guò)程

構(gòu)造預(yù)測(cè)函數(shù)

邏輯回歸名為回歸，實(shí)際為分類(lèi)，用于兩分類(lèi)問(wèn)題。這里直接給出sigmoid函數(shù)。

接下來(lái)確定分類(lèi)的邊界，上面有提到，該數(shù)據(jù)集需要一個(gè)線(xiàn)性的邊界。不同數(shù)據(jù)需要不同的邊界。

確定了分類(lèi)函數(shù)，將其輸入記做z ，那么

向量x是特征變量，是輸入數(shù)據(jù)。此數(shù)據(jù)有兩個(gè)特征，可以表示為z = w0x0 + w1x1 + w2x2。w0是常數(shù)項(xiàng)，需要構(gòu)造x0等于1（見(jiàn)后面代碼）。向量W是回歸系數(shù)特征，T表示為列向量。之后就是確定最佳回歸系數(shù)w（w0， w1， w2）。cost函數(shù)

綜合以上，預(yù)測(cè)函數(shù)為：

這里不做推導(dǎo)，可以參考文章 Logistic回歸總結(jié)

有了上述的cost函數(shù)，可以使用梯度上升法求函數(shù)J的最小值。推導(dǎo)見(jiàn)上述鏈接。

綜上：梯度更新公式如下：

接下來(lái)是python代碼實(shí)現(xiàn)：

# sigmoid函數(shù)和初始化數(shù)據(jù)

def sigmoid（z）：

return 1 / （1 + np.exp（-z））

def init_data（）：

data = np.loadtxt（‘data.csv’）

dataMatIn = data［：， 0：-1］

classLabels = data［：， -1］

dataMatIn = np.insert（dataMatIn， 0， 1， axis=1） #特征數(shù)據(jù)集，添加1是構(gòu)造常數(shù)項(xiàng)x0

return dataMatIn， classLabels

復(fù)制代碼

// 梯度上升

def grad_descent（dataMatIn， classLabels）：

dataMatrix = np.mat（dataMatIn） #（m，n）

labelMat = np.mat（classLabels）.transpose（）

m， n = np.shape（dataMatrix）

weights = np.ones（（n， 1）） #初始化回歸系數(shù)（n， 1）

alpha = 0.001 #步長(zhǎng)

maxCycle = 500 #最大循環(huán)次數(shù)

for i in range（maxCycle）：

h = sigmoid（dataMatrix * weights） #sigmoid 函數(shù)

weights = weights + alpha * dataMatrix.transpose（） * （labelMat - h） #梯度

return weights

// 計(jì)算結(jié)果

if __name__ == ‘__main__’：

dataMatIn， classLabels = init_data（）

r = grad_descent（dataMatIn， classLabels）

print（r）

輸入如下：

［［ 4.12414349］

［ 0.48007329］

［-0.6168482 ］］

上述w就是所求的回歸系數(shù)。w0 = 4.12414349， w1 = 0.4800， w2=-0.6168 之前預(yù)測(cè)的直線(xiàn)方程0 = w0x0 + w1x1 + w2x2，帶入回歸系數(shù)，可以確定邊界。 x2 = （-w0 - w1*x1） / w2

畫(huà)出函數(shù)圖像：

def plotBestFIt（weights）：

dataMatIn， classLabels = init_data（）

n = np.shape（dataMatIn）［0］

xcord1 = ［］

ycord1 = ［］

xcord2 = ［］

ycord2 = ［］

for i in range（n）：

if classLabels［i］ == 1：

xcord1.append（dataMatIn［i］［1］）

ycord1.append（dataMatIn［i］［2］）

else：

xcord2.append（dataMatIn［i］［1］）

ycord2.append（dataMatIn［i］［2］）

fig = plt.figure（）

ax = fig.add_subplot（111）

ax.scatter（xcord1， ycord1，s=30， c=‘red’， marker=‘s’）

ax.scatter（xcord2， ycord2， s=30， c=‘green’）

x = np.arange（-3， 3， 0.1）

y = （-weights［0， 0］ - weights［1， 0］ * x） / weights［2， 0］ #matix

ax.plot（x， y）

plt.xlabel（‘X1’）

plt.ylabel（‘X2’）

plt.show（）

如下：

算法改進(jìn)

隨機(jī)梯度上升

上述算法中，每次循環(huán)矩陣都會(huì)進(jìn)行m * n次乘法計(jì)算，時(shí)間復(fù)雜度是maxCycles* m * n。當(dāng)數(shù)據(jù)量很大時(shí)，時(shí)間復(fù)雜度是很大。這里嘗試使用隨機(jī)梯度上升法來(lái)進(jìn)行改進(jìn)。隨機(jī)梯度上升法的思想是，每次只使用一個(gè)數(shù)據(jù)樣本點(diǎn)來(lái)更新回歸系數(shù)。這樣就大大減小計(jì)算開(kāi)銷(xiāo)。算法如下：

def stoc_grad_ascent（dataMatIn， classLabels）：

m， n = np.shape（dataMatIn）

alpha = 0.01

weights = np.ones（n）

for i in range（m）：

h = sigmoid（sum（dataMatIn［i］ * weights）） #數(shù)值計(jì)算

error = classLabels［i］ - h

weights = weights + alpha * error * dataMatIn［i］

return weights

進(jìn)行測(cè)試：

隨機(jī)梯度上升的改進(jìn)

def stoc_grad_ascent_one（dataMatIn， classLabels， numIter=150）：

m， n = np.shape（dataMatIn）

weights = np.ones（n）

for j in range（numIter）：

dataIndex = list（range（m））

for i in range（m）：

alpha = 4 / （1 + i + j） + 0.01 #保證多次迭代后新數(shù)據(jù)仍然有影響力

randIndex = int（np.random.uniform（0， len（dataIndex）））

h = sigmoid（sum（dataMatIn［i］ * weights）） # 數(shù)值計(jì)算

error = classLabels［i］ - h

weights = weights + alpha * error * dataMatIn［i］

del（dataIndex［randIndex］）

return weights

可以對(duì)上述三種情況的回歸系數(shù)做個(gè)波動(dòng)圖。可以發(fā)現(xiàn)第三種方法收斂更快。評(píng)價(jià)算法優(yōu)劣勢(shì)看它是或否收斂，是否達(dá)到穩(wěn)定值，收斂越快，算法越優(yōu)。

總結(jié)

這里用到的梯度上升和梯度下降是一樣的，都是求函數(shù)的最值，符號(hào)需要變一下。梯度意味著分別沿著x， y的方向移動(dòng)一段距離。（cost分別對(duì)x， y）的導(dǎo)數(shù)。

完整代碼請(qǐng)查看： github： logistic regression

參考文章：機(jī)器學(xué)習(xí)之Logistic回歸與Python實(shí)現(xiàn)

聲明：本文內(nèi)容及配圖由入駐作者撰寫(xiě)或者入駐合作網(wǎng)站授權(quán)轉(zhuǎn)載。文章觀(guān)點(diǎn)僅代表作者本人，不代表電子發(fā)燒友網(wǎng)立場(chǎng)。文章及其配圖僅供工程師學(xué)習(xí)之用，如有內(nèi)容侵權(quán)或者其他違規(guī)問(wèn)題，請(qǐng)聯(lián)系本站處理。舉報(bào)投訴

機(jī)器學(xué)習(xí)

機(jī)器學(xué)習(xí)

+關(guān)注

關(guān)注
66

文章
8546

瀏覽量
136520
Logistic

Logistic

+關(guān)注

關(guān)注
0

文章
11

瀏覽量
9056
線(xiàn)性回歸

線(xiàn)性回歸

+關(guān)注

關(guān)注
0

文章
41

瀏覽量
4534