Day 13 DataFrame operations
Pandas Cheat Sheet
https://pandas.pydata.or...
Grouping numeric data
https://images.plurk.com/4tTTvkMqSJDner1D9PLUKK.png
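np.inf = infinity
-np.inf to 0 is the first group, 1 to 2 is the second group (pd.cut bins are right-closed, so for integer counts the bin (0, 2] holds 1 and 2)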
https://images.plurk.com/18AbkP11jaaK12fWWxuK3t.png
Group by CNT_C_G and TARGET, then compute the mean income of each group
Visualization of the above
https://images.plurk.com/2WQpaLJ78GYCk6TKSMwcWk.png
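A minimal sketch of the bin-then-group steps above. The DataFrame is made-up toy data, and the column names (CNT_CHILDREN, TARGET, AMT_INCOME_TOTAL) are assumed stand-ins for the course dataset. CNT_CHILDREN_GROUP plays the role of CNT_C_G, and the bin edges are an example choice:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names are stand-ins for the course dataset
df = pd.DataFrame({
    "CNT_CHILDREN": [0, 1, 2, 3, 0, 6, 2, 1],
    "TARGET": [0, 1, 0, 0, 1, 0, 1, 0],
    "AMT_INCOME_TOTAL": [100.0, 150.0, 120.0, 90.0, 200.0, 80.0, 110.0, 130.0],
})

# Right-closed bins: (-inf, 0], (0, 2], (2, 5], (5, inf)
cut_rule = [-np.inf, 0, 2, 5, np.inf]
df["CNT_CHILDREN_GROUP"] = pd.cut(df["CNT_CHILDREN"], bins=cut_rule)

# Mean income for every (children group, TARGET) pair that actually occurs
grp = (df.groupby(["CNT_CHILDREN_GROUP", "TARGET"], observed=True)
         ["AMT_INCOME_TOTAL"].mean())
print(grp)
```

Using -np.inf / np.inf as the outer edges guarantees that no value falls outside every bin.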
Day 14 didn't have much in it, so I'm lumping it in here
Correlation coefficient https://images.plurk.com/3aCBsRZnG3NVB48p3xjNhH.png
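Pearson's r is the covariance of x and y divided by the product of their standard deviations. Here is a small sketch that checks the hand formula against pandas' DataFrame.corr (default method "pearson"), using made-up numbers:

```python
import numpy as np
import pandas as pd

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = 2 * x + 1          # perfectly linear, so r should be 1
z = -x                 # perfectly inverse, so r should be -1

# Pearson r by hand: cov(x, y) / (std(x) * std(y))
r_manual = np.sum((x - x.mean()) * (y - y.mean())) / np.sqrt(
    np.sum((x - x.mean()) ** 2) * np.sum((y - y.mean()) ** 2))

df = pd.DataFrame({"x": x, "y": y, "z": z})
print(r_manual)
print(df.corr())       # pairwise Pearson correlation matrix
```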
Day15 Correlation coefficient: combined exercises
Nothing new, skipping
Day16 Plot tweaks and KDE
https://images.plurk.com/1rZ8JN6L9yhuBZysAxN6Tp.png
https://images.plurk.com/3u809BGBw5uNLPLYhiCVpk.png
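A KDE places a kernel on every sample and averages all of them into a smooth density curve. Below is a NumPy-only sketch that assumes a Gaussian kernel and Silverman's rule-of-thumb bandwidth. Seaborn's kdeplot does all of this for you with its own default bandwidth rule, so its curves may look slightly different:

```python
import numpy as np

def gaussian_kde(samples, grid, bandwidth):
    """Density at each grid point = average of Gaussian bumps centred on the samples."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    u = (grid[:, None] - samples[None, :]) / bandwidth
    kernel = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)
    return kernel.sum(axis=1) / (len(samples) * bandwidth)

rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=1.0, size=500)

# Silverman's rule of thumb (assumed here; other bandwidth rules exist)
h = 1.06 * data.std(ddof=1) * len(data) ** (-1 / 5)
grid = np.linspace(-5, 5, 1001)
density = gaussian_kde(data, grid, h)

# A density should integrate to roughly 1 (simple Riemann sum)
area = density.sum() * (grid[1] - grid[0])
print(h, area)
```

A larger bandwidth gives a smoother curve, and a smaller one gives a spikier curve.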
Homework
https://images.plurk.com/13m4i8wRQJnWCw1AIz7yYI.png
https://images.plurk.com/5yCTw85P01MJMyToGnnr7R.png
https://images.plurk.com/6u6gz2OmWdkvA85JmqI9RA.png
Day17 Discretizing continuous data
(basically just binning)
https://images.plurk.com/12w3bfIxcZk5hqNYMVpt1u.jpg https://images.plurk.com/5aK7Upm7276MCLoEu3H7Ck.jpg
General workflow
https://images.plurk.com/1zsQao2vdXPwazvvEIE52B.jpg https://images.plurk.com/5md4CCsGgLqg45BUrv8OCk.jpg
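In practice, discretizing means one of three things: equal-width bins (pd.cut with an integer), equal-frequency bins (pd.qcut), or hand-picked edges. Here is a sketch of all three on a made-up age column:

```python
import numpy as np
import pandas as pd

ages = pd.Series([18, 22, 25, 27, 30, 35, 41, 48, 55, 70])

# Equal-width: 4 bins of the same width spanning [min, max]
equal_width = pd.cut(ages, bins=4)
# Equal-frequency: 4 bins at the quartiles, so each holds about the same count
equal_freq = pd.qcut(ages, q=4)
# Custom edges chosen by hand
custom = pd.cut(ages, bins=[0, 20, 40, 60, np.inf])

print(equal_width.value_counts().sort_index())
print(equal_freq.value_counts().sort_index())
print(custom.value_counts().sort_index())
```

Ten values don't split evenly into four quantile bins, so qcut ends up with 3/2/2/3 here rather than perfectly equal counts.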
Day18 Discretization: combined exercises
Day19 Subplot
https://images.plurk.com/5HaCWHNgvy7lX8BdLNJEl6.jpg https://images.plurk.com/49N64NLcim7ksaTpf7CD9o.jpg
Day20 Heatmap & Grid-plot
Advanced plotting (?)
https://images.plurk.com/4skUPkIJsxpazncPT1FxyO.jpg https://images.plurk.com/5hYb6GDl8xjjSIOMmP965Z.jpg https://images.plurk.com/2bTvT2vjXek6syjqfn8Cwp.jpg https://images.plurk.com/3eXS7GTk0rtFKu8u8cM8xI.jpg
I think I've seen grid plots in a statistics textbook
https://images.plurk.com/47ysjPX4Ayn7VUmDiHPV2X.jpg https://images.plurk.com/1frKPx7EJFKRUGBzEW2wqU.jpg https://images.plurk.com/1091w8xENPzLRBSHBwsVe6.jpg https://images.plurk.com/2M6GIFyyh56dDRl45ZYvz1.jpg
Scatter plots of large samples are so ugly
Day21
First time uploading to Kaggle (?)
Day22 Intro to feature engineering
https://images.plurk.com/7sbtDVeEhai5EqDhMap1OK.jpg https://images.plurk.com/45Pem7TDogYlujeMyCe6mS.jpg
https://images.plurk.com/6vr68apPE0HWWrr1FEYyz6.jpg
Day23 Removing skewness
https://images.plurk.com/3jE65UhLrWpzaxZeNLCEsg.jpg https://images.plurk.com/KBnkVVXj72SbuD4aUHn9b.jpg https://images.plurk.com/77Ey70p5v3KjPjmGwHXvFo.png
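Here is a sketch of two common de-skewing transforms, log1p and square root, compared using pandas' Series.skew. The log-normal data is synthetic and stands in for an income-like column. For strictly positive data, scipy.stats.boxcox is another common option:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
# Log-normal data is strongly right-skewed (synthetic stand-in for an income column)
income = pd.Series(rng.lognormal(mean=10, sigma=1.0, size=2000))

log_income = np.log1p(income)    # log(1 + x), safe when zeros are present
sqrt_income = np.sqrt(income)    # milder correction

print(income.skew(), sqrt_income.skew(), log_income.skew())
```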
Day24 Basic handling of categorical features
It's really just label encoding and one-hot encoding again
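Here is a minimal pandas sketch of both encodings on a made-up column. Label encoding turns each category into one integer (pandas' category codes follow sorted order). One-hot encoding creates one 0/1 column per category:

```python
import pandas as pd

df = pd.DataFrame({"city": ["Taipei", "Tainan", "Taipei", "Kaohsiung"]})

# Label encoding: category -> integer code (categories sorted alphabetically)
df["city_label"] = df["city"].astype("category").cat.codes

# One-hot encoding: one 0/1 column per category
one_hot = pd.get_dummies(df["city"], prefix="city", dtype=int)

print(df)
print(one_hot)
```

Label codes impose an artificial order, which tree models usually tolerate but linear models may misread. One-hot avoids that at the cost of more columns.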
Day25 Categorical data - mean (target) encoding
https://images.plurk.com/LiHRltEXZVSXVzKjZvJ17.jpg https://images.plurk.com/580eteZPt3QKKCvTc2Bs16.jpg https://images.plurk.com/74MBP7FjwxctAV5bJ8LmDf.jpg https://images.plurk.com/1J4hzAOMePA10obAKs7YW1.jpg https://images.plurk.com/4mT3fZE8FotVNdutRYrTxt.jpg
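Mean encoding replaces each category with the average target value of that category. The sketch below also shows a smoothed variant that shrinks small categories toward the global mean. The smoothing weight k and the formula (count * mean + k * prior) / (count + k) are one common choice, assumed here rather than taken from the course:

```python
import pandas as pd

df = pd.DataFrame({
    "city": ["A", "A", "B", "B", "B", "C"],
    "target": [1, 0, 1, 1, 0, 0],
})

# Plain mean encoding: category -> mean of the target within that category
means = df.groupby("city")["target"].mean()
df["city_mean"] = df["city"].map(means)

# Smoothed mean encoding (hypothetical k): rare categories lean toward the prior
k = 2
stats = df.groupby("city")["target"].agg(["mean", "count"])
prior = df["target"].mean()
smoothed = (stats["count"] * stats["mean"] + k * prior) / (stats["count"] + k)
df["city_mean_smooth"] = df["city"].map(smoothed)

print(df)
```

The target leaks into the feature unless the means are computed only on training data (ideally out-of-fold).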
Day26 Time data processing
Cyclical features: encode them with sin/cos
https://images.plurk.com/1lONxRKXMy2B0zk0FmgSYI.jpg https://images.plurk.com/1EpsTCMmCZU6nL8iOpmNmM.jpg https://images.plurk.com/6p9yascnzEP7BIsLagWkjq.jpg
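The sin/cos trick maps a cyclical value onto the unit circle, so the end of one cycle sits next to the start of the next (23:00 lands close to 00:00). Here is a sketch for hour of day, using a made-up column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"hour": [0, 6, 12, 18, 23]})

# One full cycle = 24 hours -> angle in radians
angle = 2 * np.pi * df["hour"] / 24
df["hour_sin"] = np.sin(angle)
df["hour_cos"] = np.cos(angle)

print(df)
```

The same pattern works for other cycles, such as day of week (period 7) or month (period 12).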
Oops, I mislabeled it: the time data above is actually Day27
The real Day 26 Categorical data - other encodings
Count encoding
Feature hashing
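Here is a sketch of both encodings on a made-up column. Count encoding replaces a category with its frequency. Feature hashing maps categories into a fixed number of buckets, so collisions are possible. The bucket count of 10 and the choice of MD5 are assumptions for illustration: Python's built-in hash() is randomized per process for strings, so a stable hash is used instead. scikit-learn's sklearn.feature_extraction.FeatureHasher is the library route:

```python
import hashlib

import pandas as pd

df = pd.DataFrame({"ticket": ["A1", "B2", "A1", "C3", "A1", "B2"]})

# Count encoding: category -> number of times it appears
df["ticket_count"] = df["ticket"].map(df["ticket"].value_counts())

def hash_bucket(value, n_buckets=10):
    """Stable hash of a category into one of n_buckets (hypothetical helper)."""
    digest = hashlib.md5(str(value).encode("utf-8")).hexdigest()
    return int(digest, 16) % n_buckets

df["ticket_hash"] = df["ticket"].map(hash_bucket)
print(df)
```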
Day28 Combining numeric features
How to combine them, and why, depends on domain knowledge
https://images.plurk.com/dhkRcgl8JuzOwnV8kXNXl.jpg https://images.plurk.com/1evve4H0negwRj6wTuZEci.jpg https://images.plurk.com/2XmOQ2l0qspWycov8EJSRV.jpg
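Here is an example of domain-driven combination. For taxi fares, the trip distance probably matters more than the raw coordinates, so it gets built from them. The column names and values below are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical taxi-trip coordinates
df = pd.DataFrame({
    "pickup_longitude": [121.50, 121.55],
    "pickup_latitude": [25.03, 25.05],
    "dropoff_longitude": [121.53, 121.55],
    "dropoff_latitude": [25.07, 25.02],
})

df["longitude_diff"] = df["dropoff_longitude"] - df["pickup_longitude"]
df["latitude_diff"] = df["dropoff_latitude"] - df["pickup_latitude"]
# Straight-line distance in degrees: a crude proxy for trip length
df["distance_2D"] = np.hypot(df["longitude_diff"], df["latitude_diff"])

print(df[["longitude_diff", "latitude_diff", "distance_2D"]])
```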
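Day29 Combining categorical and numeric features
Group-by encoding:
Similar to mean encoding, but it doesn't use the target (dependent variable)
Instead, it encodes a category with statistics of other correlated features (independent variables)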
https://images.plurk.com/78ScRS4NW1NxqcrxisRVbJ.jpg https://images.plurk.com/4bHfhd5PquZ3oLOxyPRvDw.jpg https://images.plurk.com/1903JKI6NOHfkvgroJprgd.jpg

https://images.plurk.com/2OGncNmjRyqjg96AHWGQVp.jpg https://images.plurk.com/4xfT1VKQm4j6Ymti2yuO9m.jpg https://images.plurk.com/1eqrXZ5Xl9U6G3Y7RLlXuG.jpg https://images.plurk.com/5dMb7sNhlpwI4ep2mmF0jY.jpg
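Here is a minimal sketch of group-by encoding. Each category gets summary statistics of a related numeric feature, and the target is never touched. The neighborhood/area columns are made up:

```python
import pandas as pd

df = pd.DataFrame({
    "neighborhood": ["N1", "N1", "N2", "N2", "N2"],
    "area": [30.0, 50.0, 20.0, 40.0, 90.0],
})

# For each row, attach statistics of `area` computed within its neighborhood
for stat in ["mean", "median", "max"]:
    df[f"area_{stat}_by_neighborhood"] = (
        df.groupby("neighborhood")["area"].transform(stat))

print(df)
```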
Day30 Feature selection
Drop the less useful variables
https://images.plurk.com/62SIfy0dpM27Z35D7pP9nB.jpg https://images.plurk.com/40VCNnBLnkXBjZHHjFTbMX.jpg https://images.plurk.com/7uZp1eP6D3A30DeOyu7LAa.jpg https://images.plurk.com/3cscoIXIOrXGL8b9FY5o6x.jpg https://images.plurk.com/2PrWHEW7CDlWyL9PmUy7fq.jpg https://images.plurk.com/60X9yBOKOcA9HKWtjMQKCA.jpg
Examples
https://images.plurk.com/4sUs9wVxmnaed9FmELs4ZJ.jpg https://images.plurk.com/A2YOpjTf1cLdpCuHXxtM6.jpg https://images.plurk.com/3N7KBb8rndgIdBWmmmnbgN.jpg https://images.plurk.com/3kDpkqs9cr5QfTF9Ptdbwp.jpg https://images.plurk.com/7vmflbtfH2v7IE0RsIexDF.jpg
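Here is a sketch of one simple filter method: keep only the features whose absolute Pearson correlation with the target passes a threshold. The data is synthetic and the 0.2 threshold is an assumption. Embedded methods, such as sklearn.feature_selection.SelectFromModel wrapped around a Lasso, are the other common route:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000
X = pd.DataFrame({
    "useful_1": rng.normal(size=n),
    "useful_2": rng.normal(size=n),
    "noise_1": rng.normal(size=n),
    "noise_2": rng.normal(size=n),
})
y = 3 * X["useful_1"] - 2 * X["useful_2"] + rng.normal(scale=0.5, size=n)

# Filter: |Pearson r| between each feature and the target
corr_with_target = X.corrwith(y).abs()
threshold = 0.2  # hypothetical cut-off
selected = corr_with_target[corr_with_target > threshold].index.tolist()

print(corr_with_target)
print(selected)
```

A correlation filter only sees linear, one-feature-at-a-time relationships, so it can drop features that matter only through interactions.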
Day31 Feature importance
https://images.plurk.com/6HFw9y9MZA5BlCGHduvECc.jpg https://images.plurk.com/6DtZE2hVFmFCA2Vcy60AyV.jpg https://images.plurk.com/3PoeSOyvAcXwemAVREemxq.jpg https://images.plurk.com/1BA68AisutjCx5eIlcMV9f.jpg https://images.plurk.com/5ZrgXEjVF2AD27qWEgT0fu.jpg https://images.plurk.com/77EjposZwECTWBlbx5uaNs.jpg

https://images.plurk.com/3B9cEhrQkUs4boFiHdBhOM.jpg https://images.plurk.com/5V3Z905vrbMyIijn0TAB0C.jpg https://images.plurk.com/BI9lkOeTOL0S1ZubALwGw.jpg https://images.plurk.com/1EhyZzvUkW0EEMph5wei3a.jpg https://images.plurk.com/2cAhzoZqwRuSaSIpWhNK3J.jpg
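Here is a sketch comparing two ways to read feature importance with scikit-learn, using synthetic data where feature 0 clearly dominates. The first is the impurity-based feature_importances_ of a tree ensemble, which sums to 1. The second is permutation importance on held-out data: how much the score drops when one column is shuffled:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 4))
y = 5 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=600)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

# Impurity-based importance, learned on the training data
print(model.feature_importances_)

# Permutation importance: score drop when a column is shuffled (test data)
perm = permutation_importance(model, X_test, y_test, n_repeats=5, random_state=0)
print(perm.importances_mean)
```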
Day32 Leaf encoding
https://images.plurk.com/4fZAyefbH1Qe2tu8iuxG6u.jpg https://images.plurk.com/1WuPLoSYyyrnCvZYr76pEO.jpg https://images.plurk.com/2UthQ02pjsl8YoBh8OoDM3.jpg https://images.plurk.com/2lfEqRCCEynEjoXWfkN8aZ.jpg
CTR预估[十一]: Algorithm-GBDT Encoder ("CTR prediction [11]"): honestly, I didn't fully understand this part
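Here is a sketch of the GBDT + logistic regression idea behind leaf encoding. Each sample is described by the leaf it lands in within every tree, and those leaf indices are one-hot encoded and fed to a linear model. The synthetic data, split sizes, and hyperparameters are all assumptions. GradientBoostingClassifier.apply returns an array of shape (n_samples, n_estimators, 1) for binary targets, hence the [:, :, 0]:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
# Fit the trees and the logistic regression on different data to limit overfitting
X_tree, X_lr, y_tree, y_lr = train_test_split(X, y, test_size=0.5, random_state=0)
X_lr_train, X_test, y_lr_train, y_test = train_test_split(
    X_lr, y_lr, test_size=0.4, random_state=0)

gbdt = GradientBoostingClassifier(n_estimators=20, max_depth=3, random_state=0)
gbdt.fit(X_tree, y_tree)

# Leaf index of each sample in each tree
leaves_train = gbdt.apply(X_lr_train)[:, :, 0]
leaves_test = gbdt.apply(X_test)[:, :, 0]

# Every (tree, leaf) pair becomes one binary feature
encoder = OneHotEncoder(handle_unknown="ignore")
lr = LogisticRegression(max_iter=1000)
lr.fit(encoder.fit_transform(leaves_train), y_lr_train)
proba = lr.predict_proba(encoder.transform(leaves_test))[:, 1]
print(proba[:5])
```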
Day33 Machine learning overview (?)
Define the model, set its parameters
Evaluate the model: loss func.
Find the best one

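Overfitting
The model soaks up too much noise and becomes overly twitchy
How do you know it's overfitting?
Hold out part of the data and check whether its error starts trending differently from the training error (e.g. training error keeps falling while held-out error stalls or rises)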
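Here is a sketch of the hold-out check described above, using polynomial fits of increasing degree on synthetic noisy sine data. Every other point is held out, which is an arbitrary split chosen for illustration. Training error can only go down as the degree grows. Overfitting shows up when the validation error stops improving or gets worse:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, 40))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)

# Hold out every other point as a validation set
train = np.arange(x.size) % 2 == 0
val = ~train

def train_val_mse(deg):
    """Fit a degree-`deg` polynomial on the training points, return (train MSE, val MSE)."""
    coefs = np.polyfit(x[train], y[train], deg)
    pred = np.polyval(coefs, x)
    return np.mean((pred[train] - y[train]) ** 2), np.mean((pred[val] - y[val]) ** 2)

results = {deg: train_val_mse(deg) for deg in (1, 3, 9)}
for deg, (tr, va) in results.items():
    print(f"degree {deg}: train MSE {tr:.3f}, validation MSE {va:.3f}")
```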
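Fixing overfitting or underfitting
Overfitting: add more data, reduce model complexity, add regularization
Underfitting: increase model complexity, weaken or drop regularization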

Overfitting is more likely when there is too little data, and decision tree models are also more prone to it
Finding the best model:
Options include Gradient Descent and Additive Training
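Here is a minimal gradient descent sketch for linear regression with mean squared error on synthetic data. At each step, it moves the weights against the gradient of the loss. The learning rate and step count are assumptions picked so that this example converges:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
true_w, true_b = np.array([2.0, -3.0]), 0.5
y = X @ true_w + true_b + rng.normal(scale=0.1, size=200)

w = np.zeros(2)
b = 0.0
learning_rate = 0.1
for step in range(500):
    err = X @ w + b - y                   # prediction minus target
    grad_w = 2 * X.T @ err / len(y)       # d(MSE)/dw
    grad_b = 2 * err.mean()               # d(MSE)/db
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(w, b)
```

If the learning rate is too large, the updates overshoot and diverge. If it is too small, convergence takes many more steps.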

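The Chinese term for underfitting, 欠擬合 (literally "owing a fit"), cracks me up
(looking at a model that refuses to bend where it should)
"Are you just begging for a fit?!"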
Day34 Splitting into train/validation/test sets

Why we need validation/test sets
To evaluate how training is going and check whether the model is overfitting

How to split:
train_test_split from Python's Scikit-learn

K-fold Cross-validation
Every fold gets used as the held-out set exactly once
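Here is a short sketch of both scikit-learn tools on toy arrays. train_test_split makes a single hold-out split. KFold rotates through the folds so every sample is held out exactly once:

```python
import numpy as np
from sklearn.model_selection import KFold, train_test_split

X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Single split: 80% train, 20% test
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 5-fold CV: each fold is the held-out set once
kf = KFold(n_splits=5, shuffle=True, random_state=42)
held_out = []
for fold, (train_idx, val_idx) in enumerate(kf.split(X)):
    held_out.extend(val_idx.tolist())
    print(fold, train_idx, val_idx)
```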

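Validation vs. testing
Validation is for tuning and checking the model; testing is the final evaluation of out-of-sample performance, so it should only be used at the very end (my own understanding)
Splitting imbalanced data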
https://images.plurk.com/6cIQbME8jktsbbkK1OmkPJ.png
https://images.plurk.com/7JipxPp8QJRqtD0KSQZEZO.png
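With imbalanced labels, a plain random split can leave a fold with few or no minority samples. Stratifying keeps the class ratio the same in every split. Here is a sketch on a made-up 90/10 label vector:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, train_test_split

y = np.array([0] * 90 + [1] * 10)     # 10% positives
X = np.arange(100).reshape(-1, 1)

# Stratified hold-out split: the test set keeps the 90/10 ratio
_, _, y_train, y_test = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)

# Stratified K-fold: every validation fold keeps the ratio too
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_pos = [int(y[val_idx].sum()) for _, val_idx in skf.split(X, y)]

print(y_test.sum(), fold_pos)
```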
Day35 Regression vs. classification problems
https://images.plurk.com/7a9UCPTSb2zkt71LpCex1A.jpg https://images.plurk.com/7mUTuOELKowvuZ6NMSFbuQ.jpg https://images.plurk.com/1TtgUcmrJ5qK2kmiCweQHX.jpg https://images.plurk.com/5GLhJAnqDns8APH1zS3i0H.jpg
Day36 Choosing evaluation metrics
https://images.plurk.com/2UzBLMabHDG9khUPTBeEkz.jpg https://images.plurk.com/5CjTsUck6NmkPHkorFmWIR.jpg https://images.plurk.com/4LptBQ3oSI1d7qajzbg4vi.jpg https://images.plurk.com/7gMASPPkyw8RpfXtkGOhuk.jpg https://images.plurk.com/4SMvtFXvsFtntAkpNkF6yK.jpg https://images.plurk.com/1bQMfaq9ZnaW48iqITvFPf.jpg https://images.plurk.com/3AmeKEU4V5Ubi4Kdx53mQC.jpg https://images.plurk.com/16nTL4PhByCfTvaiagAnYg.jpg https://images.plurk.com/cYPZFUdkiwGmobE8kyPV9.jpg https://images.plurk.com/1jx2w07YOkJz0JlmGrVUKN.jpg
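Here is a sketch of common scikit-learn metrics on tiny hand-made inputs, chosen so the numbers can be checked by hand. The first group is regression metrics (MAE, MSE, R²) and the second is classification metrics (F1, ROC AUC):

```python
from sklearn.metrics import (f1_score, mean_absolute_error, mean_squared_error,
                             r2_score, roc_auc_score)

# Regression
y_true, y_pred = [3.0, 5.0], [2.0, 7.0]
mae = mean_absolute_error(y_true, y_pred)   # (1 + 2) / 2 = 1.5
mse = mean_squared_error(y_true, y_pred)    # (1 + 4) / 2 = 2.5
r2 = r2_score(y_true, y_pred)               # 1 - 5 / 2 = -1.5, worse than predicting the mean

# Classification with hard labels: precision 1, recall 2/3
f1 = f1_score([1, 0, 1, 1], [1, 0, 0, 1])   # 2 * 1 * (2/3) / (1 + 2/3) = 0.8

# Classification with scores: 3 of 4 positive/negative pairs ranked correctly
auc = roc_auc_score([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])

print(mae, mse, r2, f1, auc)
```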
Day37, 38 Intro to regression models
I know this one well
https://images.plurk.com/37uacP4MUGJaTHwO9I5Tqs.jpg https://images.plurk.com/kDhMSXKgvjMRWu9bHdxbv.jpg
https://images.plurk.com/3pOU8q8YuBKkvPqXe6rnQY.jpg https://images.plurk.com/1HQX26vhg35Qgx9bHCcEwx.jpg
Day39 Lasso/Ridge regression
https://images.plurk.com/1wewWpHjwLJrQHdYueEAXz.jpg https://images.plurk.com/39oVUhn4yhwOAEXeFOQmt0.jpg https://images.plurk.com/6Z6GhRw24vEcouQkHuIpD6.jpg https://images.plurk.com/7sRqAu2p4yIJzHJBjlhXpY.jpg https://images.plurk.com/2gJwmvKwVOzLEMARfSmVOq.jpg
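What is the essential difference between linear least squares, Lasso, and ridge regression? ...
Ridge was designed to deal with strong multicollinearity
but it can't do feature selection
Lasso, by contrast, pushes the weights of poorly contributing variables to exactly 0
so it can do feature selection, but not group selection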
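Here is a sketch of the difference described above, using synthetic data where only feature 0 matters. Lasso's L1 penalty sets the irrelevant weights to exactly 0, while Ridge's L2 penalty only shrinks them. The alpha values are assumptions picked for this example:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3 * X[:, 0] + rng.normal(scale=0.1, size=200)   # only feature 0 matters

lasso = Lasso(alpha=0.5).fit(X, y)
ridge = Ridge(alpha=10.0).fit(X, y)

print("Lasso:", lasso.coef_)   # irrelevant weights end up exactly 0
print("Ridge:", ridge.coef_)   # weights shrink but stay non-zero
```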
Day40 Implementing Lasso/Ridge
https://images.plurk.com/2wRCzOqr8t4rVyEIHPakkX.jpg https://images.plurk.com/3nJHoP5qrQZBpDdI12m6fp.jpg Someone else already wrote it all up
https://images.plurk.com/1NAjwPg58obUM5kVIMPWh8.jpg https://images.plurk.com/7DdHEnAGbtjumtqrCskCYI.jpg
Day41 Decision tree basics
https://images.plurk.com/7sd1VEOiSoaCZo72DMoKgr.jpg https://images.plurk.com/6tfPQNlMGUrnNZ2hTOMU2u.jpg https://images.plurk.com/1p8D2hKgndFuV1uaEl8Som.jpg https://images.plurk.com/6RF2nxL1L83L1gBMrG1WaN.jpg https://images.plurk.com/6MvrqoCVUe5NytQqkaMo3r.jpg https://images.plurk.com/4TVS9GpOtLNs95p4nMva2.jpg
Homework reference
https://images.plurk.com/1IwWpaNxLsKzEexR0zESwA.jpg
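A decision tree picks the split that most reduces impurity. Here is a plain NumPy sketch of the two usual impurity measures, Gini and entropy, and of the information gain of a candidate split:

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 - sum(p_k^2)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    """Shannon entropy in bits: -sum(p_k * log2 p_k)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, left, right, impurity=gini):
    """Parent impurity minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = len(left) / n * impurity(left) + len(right) / n * impurity(right)
    return impurity(parent) - weighted

print(gini([0, 0, 1, 1]), entropy([0, 0, 1, 1]))
print(information_gain([0, 0, 1, 1], [0, 0], [1, 1]))   # a perfect split
```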
Day42 Decision tree implementation

https://images.plurk.com/6ApaxIsEN7q5AiYkxpAdC0.jpg https://images.plurk.com/2pmdnpnuIP8jwmrmAscXWI.jpg https://images.plurk.com/2EZDWqZNRI4Yurykr7RP6d.jpg https://images.plurk.com/5mAW1ITnR2a8AulBJUDt8y.jpg

https://images.plurk.com/8tud2QwcIZfNjL3UTGjlK.jpg https://images.plurk.com/32aJvtxWxopA6O9oL9Lexg.jpg https://images.plurk.com/5TXH2yxjjs3ijiho9yhPJD.jpg
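Here is a sketch with scikit-learn's DecisionTreeClassifier on synthetic data with some label noise (flip_y). With no depth limit, the tree memorizes the training set, which illustrates the overfitting risk noted on Day33. max_depth (and parameters like min_samples_leaf) are the usual brakes:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scores = {}
for depth in (None, 3):   # None = grow until every leaf is pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_train, y_train)
    scores[depth] = (tree.score(X_train, y_train), tree.score(X_test, y_test))
    print(depth, scores[depth])
```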
Day43 Random forest basics
https://images.plurk.com/2k0cAm12QypZpQxXYkAWs3.jpg https://images.plurk.com/4owJHtlrrSBQFltSDvni4m.jpg https://images.plurk.com/1B7vfhQKICWnFjfSTiaV6o.jpg
https://images.plurk.com/2MQSGiFipyjFkfFBVJhTaY.jpg
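My own understanding:

A single decision tree is trained on all of the data, so it tends to overfit; random forest is the improvement.
The "random" part: each tree sees only part of the data and features. Each tree's rows are a bootstrap sample (drawn from the data with replacement), and only a random subset of features is considered (in scikit-learn, re-drawn at every split). The final prediction for a sample is whatever most of the trees predict (majority vote; for regression, the average).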
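Here is a hand-rolled sketch of the idea above, built from scikit-learn decision trees. Each tree is trained on a bootstrap sample, max_features="sqrt" limits the features tried at each split, and the forest predicts by majority vote. The synthetic data and the tree count are assumptions. This is a simplification: scikit-learn's own RandomForestClassifier averages the trees' predicted probabilities instead of counting hard votes:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X_train, y_train, X_test, y_test = X[:400], y[:400], X[400:], y[400:]

trees = []
for i in range(25):                          # odd count, so votes never tie
    idx = rng.integers(0, len(X_train), size=len(X_train))   # bootstrap rows
    tree = DecisionTreeClassifier(max_features="sqrt", random_state=i)
    trees.append(tree.fit(X_train[idx], y_train[idx]))

votes = np.stack([t.predict(X_test) for t in trees])   # shape (n_trees, n_test)
pred = (votes.mean(axis=0) > 0.5).astype(int)          # majority vote
accuracy = (pred == y_test).mean()
print(accuracy)
```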
Day44 Random forest implementation

https://images.plurk.com/4TdDzbEnGi4hGOg20Lgmfk.jpg https://images.plurk.com/3c2oFWc4j6mjC9E0THVOVI.jpg https://images.plurk.com/3ZIMX5J96a6cB0wiGGEggK.jpg

https://images.plurk.com/3nIkznRrOXp0QOgyfxlnlb.jpg https://images.plurk.com/361YFVZrXTB5ZymR7lwn0u.jpg
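Here is a sketch of scikit-learn's RandomForestClassifier with the main knobs spelled out, on synthetic data. The parameter values are illustrative assumptions, not tuned settings. With bootstrap=True, oob_score=True scores each tree on the rows it never saw, which gives a free validation estimate:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(
    n_estimators=200,       # number of trees
    max_depth=None,         # let each tree grow fully
    min_samples_leaf=1,
    max_features="sqrt",    # features tried at each split
    bootstrap=True,
    oob_score=True,         # out-of-bag accuracy estimate
    random_state=0,
    n_jobs=-1,
)
forest.fit(X_train, y_train)
print(forest.oob_score_, forest.score(X_test, y_test))
```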
藍白拖的真諦
2 years ago
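Day45-46 Gradient boosting: concepts and implementation
For 好程序员Python教程:30 梯度提升树原理(一) ("Good Programmer Python Tutorial 30: How gradient boosted trees work, part 1"), I think just watching this teacher's video is enough
And this teacher has such a smooth way of talking (?)
Seems like a pretty fun person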
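Here is a from-scratch sketch of gradient boosting for regression with squared loss. The model starts from a constant, and each small tree is fitted to the current residuals (the negative gradient of squared loss, up to a factor of 2). Its prediction is added with a learning rate. The synthetic data, depth, learning rate, and round count are assumptions. Under these settings, training MSE can never increase from one round to the next:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=300)

learning_rate = 0.1
pred = np.full_like(y, y.mean())          # round 0: constant model
trees, history = [], [np.mean((y - pred) ** 2)]
for _ in range(100):
    residual = y - pred                   # what the ensemble still gets wrong
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    pred += learning_rate * tree.predict(X)
    trees.append(tree)
    history.append(np.mean((y - pred) ** 2))

print(history[0], history[-1])
```

scikit-learn's GradientBoostingRegressor packages the same loop, with more loss functions and options.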
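Day47 Hyperparameter tuning
Exhaustive: Grid Search, tries every combination one by one
Random: Random Search, searches any old way (kidding: it samples random combinations)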
https://images.plurk.com/1iJnj2pQdx0lqguBO0TPMX.png https://images.plurk.com/gZGz3sv90G1v6auM7w4Dg.png
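Here is a sketch contrasting the two searches on the same 3 x 3 grid, with synthetic data and an assumed parameter grid. GridSearchCV evaluates all 9 combinations. RandomizedSearchCV samples only n_iter of them, which is what makes it cheaper on large search spaces:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = make_classification(n_samples=300, n_features=6, random_state=0)
param_grid = {"n_estimators": [20, 50, 100], "max_depth": [2, 4, None]}

# Exhaustive: all 3 x 3 = 9 combinations, each with 3-fold CV
grid = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
grid.fit(X, y)

# Random: only 4 of the 9 combinations, sampled at random
rand = RandomizedSearchCV(RandomForestClassifier(random_state=0), param_grid,
                          n_iter=4, cv=3, random_state=0)
rand.fit(X, y)

print(grid.best_params_, grid.best_score_)
print(rand.best_params_, rand.best_score_)
```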
Day48 Kaggle intro and practice
What a hassle, this one takes more time
I'll finish this part tomorrow
Day49-50 Model ensembling
https://images.plurk.com/67k0pzxg1zQLBeu3mSsNnN.png https://images.plurk.com/59T6ByU2u2L9KpcCX3xKpM.png https://images.plurk.com/764FSgXoh8BAa6kNEVd4ZV.png https://images.plurk.com/6UZiQ8EMc3o95B00OjNqJS.png https://images.plurk.com/6s0JB1MsA82R1GwgwpdlbS.png https://images.plurk.com/3gk9TFvjuDYLwrK9fUne9Z.png https://images.plurk.com/v7zfEGdUI3grGRCJKkEvn.png https://images.plurk.com/6gWsMD8zzuEipKbDGvRU2g.png https://images.plurk.com/7mjw1jSWXImFCh26VYgUuj.png https://images.plurk.com/6T9gU3l0yfsfnU6GXiDfq0.png
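Here is a sketch of two common ensembling styles on synthetic data. Blending is a weighted average of the base models' predicted probabilities, and the 0.5/0.5 weights are an assumption. Stacking uses scikit-learn's StackingClassifier, where a meta-model learns how to combine the base models' out-of-fold predictions:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

lr = LogisticRegression(max_iter=1000).fit(X_train, y_train)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Blending: weighted average of probabilities (hypothetical equal weights)
blend_proba = 0.5 * lr.predict_proba(X_test)[:, 1] + 0.5 * rf.predict_proba(X_test)[:, 1]
blend_acc = ((blend_proba > 0.5).astype(int) == y_test).mean()

# Stacking: a logistic-regression meta-model on 5-fold out-of-fold predictions
stack = StackingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("rf", RandomForestClassifier(n_estimators=100, random_state=0))],
    final_estimator=LogisticRegression(),
    cv=5,
)
stack_acc = stack.fit(X_train, y_train).score(X_test, y_test)
print(blend_acc, stack_acc)
```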
https://images.plurk.com/51zBRTH0QBtg8hN2b24u2g.png https://images.plurk.com/1GwcfYa5XxJ1gRZvsgucg9.png