Yahoo! Taiwan EC Data Team
Challenges => Solution: Pinball
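72% of users decide what to buy on the same day.
If preference analysis takes several days, those users are lost.
Pinball: User -> more classifiers -> Profile B: assign the user to a category, then predict the products they want
real-time: Storm (Pinball is built on Apache Storm)
Algorithm: Buying Intention Detection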
Pinball: Buyer -> Storm -> Learning
Learning -> Is Potential Buyer? -> Promotions -> Visitor
Storm (Tuple & Streams), Stream = (Tuple *)
Processing units: Spouts & Bolts (Spout: Tuple *), a Spout emits tuples to Bolts
Streams -> Spout (the faucet) -> Bolt -> Bolt (a Bolt can emit to several Bolts, or several Bolts can merge into a new Bolt)
Hadoop (Map & Reduce) = Storm (Spout & Bolt)
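
The Spout -> Bolt -> Bolt chain can be imitated with plain Python generators. This is only a sketch of the data flow, not the Storm API: the function names and the (user_id, category) tuple layout are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple

Event = Tuple[str, str]  # (user_id, category): a hypothetical tuple layout


def view_spout(events: Iterable[Event]) -> Iterator[Event]:
    """Spout: the faucet that turns an external source into a stream of tuples."""
    for event in events:
        yield event


def normalize_bolt(stream: Iterable[Event]) -> Iterator[Event]:
    """Bolt: consumes tuples and emits transformed tuples downstream."""
    for user_id, category in stream:
        yield user_id, category.strip().lower()


def count_bolt(stream: Iterable[Event]) -> Counter:
    """Terminal bolt: aggregates view counts per (user, category)."""
    counts: Counter = Counter()
    for user_id, category in stream:
        counts[(user_id, category)] += 1
    return counts


raw_events = [("u1", "Shoes"), ("u1", "shoes "), ("u2", "Bags")]
view_counts = count_bolt(normalize_bolt(view_spout(raw_events)))
print(view_counts[("u1", "shoes")])  # 2
```

In a real topology each stage runs as parallel tasks on different workers; the generators here only show how tuples move from one unit to the next.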
Storm: Nimbus -> ZooKeeper -> Supervisor
Supervisor (Worker processes)
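Buying Intention: after a certain number of browsing events, the user is probably someone who wants to buy (the more views in a category, the stronger the intent to buy)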
Navigation Event Streaming -> Behavioral events -> Pinball (Buy - Learning) + View (Buying Intention Qualification)
Learn module breakdown: Adaptive Learning -> Learning Result
BI' = [BI + |PV - BI| x r], ex: r = 0.1, BI = 3, PV = 6: BI' = [3 + |6 - 3| x 0.1] = [3.3] = 4 (the brackets round up)
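
The update rule can be checked with a few lines of Python. Two assumptions here are not stated in the notes: the square brackets are read as a ceiling (the only rounding that turns 3.3 into 4), and PV is taken to be the number of page views observed before the purchase. The function name is hypothetical.

```python
import math


def update_buying_intention(bi: int, pv: int, r: float) -> int:
    """Move the threshold BI toward the observed PV by learning rate r.

    Assumes the brackets in the formula mean "round up".
    """
    return math.ceil(bi + abs(pv - bi) * r)


print(update_buying_intention(3, 6, 0.1))  # 4
```

With r = 0.1 the threshold moves slowly, so a single unusual purchase shifts it by only a small step.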
Buying Intention Qualification breakdown: View -> Classifier -> Buying Intention Qualification -> Clients
Buy -> User Buy History Bolt -> Learn bolt -> Classifier
View -> User History Bolt -> Classifier
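
The two flows above (Buy feeds the learner, View is checked against the classifier) might fit together roughly as follows. This is a guess at the structure, not the team's implementation: the class name, the per-category threshold, the initial BI of 3, and resetting a user's view count after a purchase are all assumptions.

```python
import math
from collections import defaultdict


class PinballSketch:
    """Hypothetical stand-in for the history, learn, and classifier bolts."""

    def __init__(self, initial_bi: int = 3, rate: float = 0.1) -> None:
        self.bi = defaultdict(lambda: initial_bi)  # category -> BI threshold
        self.views = defaultdict(int)  # (user, category) -> views so far
        self.rate = rate

    def on_view(self, user: str, category: str) -> bool:
        """View path: record the view, then ask whether the user qualifies."""
        self.views[(user, category)] += 1
        return self.views[(user, category)] >= self.bi[category]

    def on_buy(self, user: str, category: str) -> None:
        """Buy path: learn from how many views preceded the purchase."""
        pv = self.views.pop((user, category), 0)
        bi = self.bi[category]
        self.bi[category] = math.ceil(bi + abs(pv - bi) * self.rate)


sketch = PinballSketch()
flags = [sketch.on_view("u1", "shoes") for _ in range(6)]
sketch.on_buy("u1", "shoes")
print(flags, sketch.bi["shoes"])  # [False, False, True, True, True, True] 4
```

The user is flagged as a potential buyer from the third view onward, and the purchase after six views raises the category threshold from 3 to 4.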
Lambda Architecture: Batch -> Realtime processing (x); the goal is not to switch from one to the other but to combine them (Hybrid Batch and real-time processing)
Batch layer: Hadoop, Spark (Spark is very fast)
Speed layer: Storm, Spark Streaming, Samza
Summingbird can compile the same logic into both Storm and Hadoop MapReduce jobs
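
A toy version of the hybrid idea: the batch layer and the speed layer each keep their own view, and a query merges the two. Reusing one counting function in both layers loosely echoes what Summingbird aims for. Everything here (names, the (user_id, category) key, the sample events) is invented for illustration and uses no Hadoop, Storm, or Summingbird API.

```python
from collections import Counter
from typing import Iterable, Tuple

Key = Tuple[str, str]  # (user_id, category): a hypothetical key layout


def count_views(events: Iterable[Key], into: Counter) -> Counter:
    """Shared counting logic, reused by both layers."""
    for key in events:
        into[key] += 1
    return into


# Batch layer: recomputed from the full history up to the last batch run.
historical = [("u1", "shoes"), ("u1", "shoes"), ("u2", "bags")]
batch_view = count_views(historical, Counter())

# Speed layer: covers only events that arrived after the last batch run.
recent = [("u1", "shoes")]
speed_view = count_views(recent, Counter())


def query(key: Key) -> int:
    """Serving side: merge the batch view with the real-time view."""
    return batch_view[key] + speed_view[key]


print(query(("u1", "shoes")))  # 3
```

When the next batch run absorbs the recent events, the speed view for that period can be discarded, which is why the two layers are combined rather than swapped.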
Algorithms: Buying Intention, Fraud Detection (fraudulent accounts, account takeover)