tsung is at
10 years ago
Real-time Streaming Classification with Storm – The Pinball system
Jason Lin
tsung
10 years ago
Yahoo! Taiwan EC Data Team
tsung
10 years ago
Norman Huang
tsung
10 years ago
Jason Lin [email protected]
tsung
10 years ago
Challenges => Solution: Pinball
tsung
10 years ago
Collect data from some past period and analyze it
tsung
10 years ago
So data from the most recent few hours is not included in the analysis
tsung
10 years ago
72% of users decide what they are going to buy on the same day.
tsung
10 years ago
If preference analysis takes several days, those users are lost
tsung
10 years ago
Real-time analysis and recommendation
tsung
10 years ago
Pinball: User -> more classifiers -> Profile B. Classify the user into a category, then predict the products they want
tsung
10 years ago
Real-time: Storm (Pinball is built on Apache Storm)
tsung
10 years ago
Algorithm: Buying Intention Detection
tsung
10 years ago
Buying Intention (BI)
tsung
10 years ago
Pinball: Buyer -> Storm -> Learning
tsung
10 years ago
Learning -> Is Potential Buyer? -> Promotions -> Visitor
tsung
10 years ago
Convert Visitors into Buyers
tsung
10 years ago
Storm (Tuple & Streams), Stream = (Tuple *)
tsung
10 years ago
Processing units: Spouts & Bolts (Spout: Tuple *), a Spout emits tuples that are fed to Bolts
tsung
10 years ago
Streams -> Spout (the faucet) -> Bolt -> Bolt (a Bolt can emit to multiple Bolts, or several Bolts can merge their output into a new Bolt)
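A rough sketch of the Spout -> Bolt -> Bolt chaining idea, using plain Python generators. This is not the Storm API; the names `view_spout`, `count_bolt`, `filter_bolt` and the `(user_id, category)` tuple layout are made up for illustration.

```python
from typing import Dict, Iterable, Iterator, Tuple

Event = Tuple[str, str]          # hypothetical tuple layout: (user_id, category)
Counted = Tuple[str, str, int]   # (user_id, category, running view count)

def view_spout(events: Iterable[Event]) -> Iterator[Event]:
    """Spout: the 'faucet' that turns a raw source into a stream of tuples."""
    for event in events:
        yield event

def count_bolt(stream: Iterable[Event]) -> Iterator[Counted]:
    """Bolt: consumes tuples and emits each one with its running view count."""
    counts: Dict[Event, int] = {}
    for user, category in stream:
        counts[(user, category)] = counts.get((user, category), 0) + 1
        yield user, category, counts[(user, category)]

def filter_bolt(stream: Iterable[Counted], min_views: int) -> Iterator[Counted]:
    """Bolt chained after another bolt: keeps tuples at or above min_views."""
    for user, category, n in stream:
        if n >= min_views:
            yield user, category, n

raw = [("u1", "tv"), ("u1", "tv"), ("u2", "cookies"), ("u1", "tv")]
result = list(filter_bolt(count_bolt(view_spout(raw)), min_views=2))
print(result)  # [('u1', 'tv', 2), ('u1', 'tv', 3)]
```

In real Storm the bolts would run in parallel across worker processes; the generators here only show how tuples flow from one stage to the next.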
tsung
10 years ago
Hadoop (Map & Reduce) = Storm (Spout & Bolt)
tsung
10 years ago
Storm: Nimbus -> ZooKeeper -> Supervisor
tsung
10 years ago
Supervisor (Worker processes)
tsung
10 years ago
Buying Intention: the number of views after which a user probably wants to buy (the more a user browses a category, the more likely they want to buy from it)
tsung
10 years ago
Cookies: users want to buy after 2 views. TVs: users want to buy after 6 views
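As a small illustration of the per-category threshold idea, assuming the thresholds are kept in a simple lookup table. `BI_THRESHOLDS` and `is_potential_buyer` are hypothetical names; only the 2 and 6 come from the talk.

```python
from typing import Dict

# Per-category Buying Intention thresholds (view counts). The two values are
# the examples from the talk; the dictionary itself is an assumed structure.
BI_THRESHOLDS: Dict[str, int] = {"cookies": 2, "tv": 6}

def is_potential_buyer(category: str, page_views: int,
                       thresholds: Dict[str, int] = BI_THRESHOLDS) -> bool:
    """Return True once the user's views in a category reach its threshold."""
    bi = thresholds.get(category)
    if bi is None:
        # Unknown category: no learned threshold yet, so do not flag the user.
        return False
    return page_views >= bi

print(is_potential_buyer("cookies", 2))  # True
print(is_potential_buyer("tv", 2))       # False
```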
tsung
10 years ago
Navigation Event Streaming -> Behavioral events -> Pinball (Buy - Learning) + View (Buying Intention Qualification)
tsung
10 years ago
Learn module breakdown: Adaptive Learning -> Learning Result
tsung
10 years ago
BI' = [BI + |PV - BI| x r], where [ ] rounds up. Ex: r = 0.1, BI = 3, PV = 6: BI' = [3 + |6 - 3| x 0.1] = [3.3] = 4
tsung
10 years ago
In the example above, 6 is the PV
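The update rule can be checked with a few lines of Python. This sketch assumes the brackets mean rounding up, which is what the worked example (3.3 -> 4) implies; the function name `update_bi` is made up.

```python
import math

def update_bi(bi: int, pv: int, r: float = 0.1) -> int:
    """Adaptive update BI' = ceil(BI + |PV - BI| * r).

    bi: current Buying Intention threshold
    pv: view count observed for a purchase
    r:  learning rate
    """
    raw = bi + abs(pv - bi) * r
    # Round away float noise (e.g. 3.3000000000000003) before taking the ceiling.
    return math.ceil(round(raw, 9))

print(update_bi(bi=3, pv=6, r=0.1))  # 4
```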
tsung
10 years ago
Buying Intention Qualification breakdown: View -> Classifier -> Buying Intention Qualification -> Clients
tsung
10 years ago
Buy -> User Buy History Bolt -> Learn bolt -> Classifier
tsung
10 years ago
Topology Design
tsung
10 years ago
View -> User History Bolt -> Classifier
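A toy, single-process sketch of how the two paths (Buy -> history -> learn -> classifier, and View -> history -> classifier) might share state. This is not Storm code and not the speakers' implementation: the class `PinballSketch`, its method names, the starting threshold of 3 and the round-up rule are all assumptions for illustration.

```python
import math
from collections import defaultdict

class PinballSketch:
    """Toy stand-in for the View and Buy paths sharing per-category thresholds."""

    def __init__(self, initial_bi: int = 3, r: float = 0.1) -> None:
        # "User History": (user, category) -> number of views seen so far.
        self.views = defaultdict(int)
        # "Learning Result": category -> current BI threshold (assumed default).
        self.bi = defaultdict(lambda: initial_bi)
        self.r = r

    def on_view(self, user: str, category: str) -> bool:
        """View path: record the view, then classify the user as potential buyer."""
        self.views[(user, category)] += 1
        return self.views[(user, category)] >= self.bi[category]

    def on_buy(self, user: str, category: str) -> int:
        """Buy path: look up the buyer's view count and adapt the threshold."""
        pv = self.views[(user, category)]
        bi = self.bi[category]
        self.bi[category] = math.ceil(round(bi + abs(pv - bi) * self.r, 9))
        return self.bi[category]

p = PinballSketch(initial_bi=3, r=0.1)
flags = [p.on_view("u1", "tv") for _ in range(6)]
print(flags)                 # [False, False, True, True, True, True]
print(p.on_buy("u1", "tv"))  # 4
```

After the purchase the threshold for "tv" moves from 3 to 4, so the next visitor is only flagged on their fourth view.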
tsung
10 years ago
Lambda Architecture: not "Batch -> Real-time processing" (x). The point is not to switch from one to the other but to combine them (hybrid batch and real-time processing)
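A minimal sketch of the hybrid idea: a batch view computed over older data and a speed view over recent data, merged at query time. The function names and the view-count example are assumptions, not something shown in the talk.

```python
from collections import Counter
from typing import Iterable, Tuple

Event = Tuple[str, str]  # hypothetical tuple layout: (user_id, category)

def batch_view(historical_events: Iterable[Event]) -> Counter:
    """Batch layer: recomputed periodically over the full historical data."""
    return Counter(historical_events)

def speed_view(recent_events: Iterable[Event]) -> Counter:
    """Speed layer: covers only events that arrived since the last batch run."""
    return Counter(recent_events)

def query(batch: Counter, speed: Counter, key: Event) -> int:
    """Serving side: combine both views so recent hours are not missed."""
    return batch[key] + speed[key]

batch = batch_view([("u1", "tv")] * 4)
speed = speed_view([("u1", "tv")] * 2)
print(query(batch, speed, ("u1", "tv")))  # 6
```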
tsung
10 years ago
Batch layer: Hadoop, Spark (Spark is fast)
tsung
10 years ago
Speed layer: Storm, Spark Streaming, Samza
tsung
10 years ago
Summingbird can generate both Storm and Hadoop MapReduce jobs from the same code
tsung
10 years ago
PV:4, BI:3, BI': ?
tsung
10 years ago
As more data is analyzed, BI' is recomputed and changes accordingly
tsung
10 years ago
Algorithms: Buying Intention, Fraud Detection (fraudulent accounts, account takeover)