Walkinggeek
4 months ago
"reason across audio, vision, and text in real time"NEW GPT-4o: My Mind is Blown.
Walkinggeek
4 months ago
A conversation between two AIs, one with vision and the other without: Two GPT-4os interacting and singing
Walkinggeek
4 months ago
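The "expressiveness" of its speaking voice is already a match for people who have been trained in drama, singing, or public speaking since childhood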
Walkinggeek
4 months ago
Response time came down from 2-3 seconds to 0.2-0.3 seconds... it's now no different from talking with an ordinary person
Walkinggeek
4 months ago
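Unlike earlier multi-modal setups, which first converted the audio into text, this time the audio, images, and text were presumably all trained together in a single neural network https://images.plurk.com/5ynofzkaryEJ2rpn2ddbn7.png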