SWS 2014 Speech Signal Processing Workshop

The Speech Signal Processing Workshop is an annual academic gathering organized by the Association for Computational Linguistics and Chinese Language Processing (ACLCLP), bringing together renowned scholars and experts from Taiwan and abroad. This year's meeting will also host a presentation session for National Science Council projects funded in ROC year 101 (2012).

The invited speakers are Prof. Fei Chen (The University of Hong Kong), Dr. Yi-Hsuan Yang (Academia Sinica), Prof. Chi-Chun Lee (National Tsing Hua University), visiting postdoctoral fellow Dr. Payton Lin (Academia Sinica), and Prof. Cheng-Hsien Chen (National Changhua University of Education). Their talks cover speech signal processing for hearing aids and cochlear implants, music signal processing, prosodic information processing in fluent speech, and the recognition of human behavioral attributes.

In addition to the talks, the workshop will feature system demonstrations, offering rich content for researchers and practitioners working on speech, language, and music. We hope this meeting will introduce the results of NSC research projects to academia and industry, and at the same time provide a forum for exchanging techniques and experience with industry and for jointly exploring new research and application directions. The workshop is therefore expected to substantially raise the technical level of Taiwan's digital signal processing industry and to benefit the advancement of engineering technology.

Important Dates

06/15 Registration and payment open

06/20 Program announced

07/25 Registration deadline

07/28 Payment deadline

07/28 Talk slides published

08/01 SWS 2014!

Date and Venue

Friday, August 1, 2014
International Conference Hall, Public Affairs Building, National Taipei University
No. 151, University Rd., Sanxia District, New Taipei City

Organizers

Department of Communication Engineering, National Taipei University

The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)

Co-organizers

Research Center for Information Technology Innovation, Academia Sinica

Department of Engineering and Technologies, Ministry of Science and Technology

Keynote Speakers

陳霏 Fei CHEN
Research Assistant Professor, Division of Speech and Hearing Sciences, Faculty of Education, The University of Hong Kong

Dr. Chen received his B.S. and M.Phil. degrees in electronic engineering from Nanjing University, China, and his Ph.D. degree in electrical engineering from The Chinese University of Hong Kong in 2005. He worked in the cochlear implant laboratory at the University of Texas at Dallas from 2009 to 2011, and joined the HKU Division of Speech and Hearing Sciences at the end of 2011. His research interests include speech processing (speech enhancement and intelligibility modeling), speech perception (by normal-hearing and hearing-impaired listeners), and cochlear implants.

陳正賢 Cheng-Hsien CHEN
Assistant Professor, Department of English, National Changhua University of Education (NCUE)

My research interests center on corpus linguistics and spontaneous speech processing. In 2011, I completed my Ph.D. thesis at National Taiwan University, entitled Prosodic Phrasing in Mandarin Spontaneous Speech: A Computational-acoustic Perspective. The study was the first attempt to investigate prosodic units in conversation from a constructionist perspective while adopting a computational, corpus-based method. Based on our analyses of the acoustic patterns, we suggest that spontaneous speech shows clear evidence that conceptual planning proceeds on a clausal basis. In 2012, our findings were presented at the 6th International Conference on Speech Prosody as Conceptual Planning in Conversational Mandarin: Pitch Variation in Prosodic Phrasing (co-authored with Shu-Chuan Tseng). In 2013, we extended the project to a comparative study of French and Mandarin in collaboration with the French National Centre for Scientific Research (CNRS) and presented our preliminary observations at PACLIC 27 as A quantitative comparative study of prosodic and discourse units: the case of French and Taiwan Mandarin (co-authored with Laurent Prévot, Shu-Chuan Tseng, and Klim Peshkov). That paper presents our efforts toward establishing a spoken dataset for crosslinguistic studies by adding morpho-syntactic, chunking, prosodic, and discourse annotation, in order to carry out quantitative comparative studies of the syntax-discourse-prosody interface.

楊奕軒 Yi-Hsuan YANG
Assistant Research Fellow, Research Center for Information Technology Innovation, Academia Sinica

Yi-Hsuan Yang received his Ph.D. degree in Communication Engineering from National Taiwan University in 2010. Since 2011, he has been with the Research Center for Information Technology Innovation, Academia Sinica, as a tenure-track Assistant Research Fellow, where he founded and directs the Music and Audio Computing Lab. He is also an Adjunct Assistant Professor with National Cheng Kung University (NCKU). Dr. Yang's research concerns music information retrieval, music signal processing, affective computing, and machine learning. He received the 2011 IEEE Signal Processing Society (SPS) Young Author Best Paper Award and won First Prize in the 2012 ACM Multimedia Grand Challenge for his work on emotion-based music video generation. He was a tutorial speaker on music affect analysis at the International Society for Music Information Retrieval Conference (ISMIR 2012) and an author of the book Music Emotion Recognition (CRC Press, 2011). He received the Academia Sinica Career Development Award in 2012 and the Excellent Junior Research Investigator Award from the National Science Council of Taiwan in 2013. He is an affiliated member of the Multimedia Systems and Applications Technical Committee (MSA TC) of the IEEE Circuits and Systems Society, and has served as a workshop organizer, special session organizer, and technical program committee member for IEEE ICME, ACM MM, and APSIPA conferences. He is currently a technical program chair of ISMIR 2014 and a Guest Editor of the IEEE Transactions on Affective Computing Special Issue on Advances in Affective Analysis in Multimedia.

李祈均 Chi-Chun LEE
Assistant Professor, Department of Electrical Engineering, National Tsing Hua University

Chi-Chun Lee (Jeremy) is currently an Assistant Professor in the Department of Electrical Engineering at National Tsing Hua University. He received the B.S. degree with honors (magna cum laude) in Electrical Engineering, with a minor in Business Administration, from the University of Southern California (USC) in May 2007. From 2007 to 2012 he was a member of the Signal Analysis and Interpretation Laboratory (SAIL) directed by Prof. Shrikanth Narayanan, and he received his Ph.D. degree in Electrical Engineering in December 2012. From February to December 2013 he was a data scientist at the id:a lab at ID Analytics, San Diego. His research interests are in human-centered behavioral signal processing, emphasizing the development of computational frameworks for recognizing human behavioral attributes and internal states using machine learning and signal processing techniques. He has been involved in multiple interdisciplinary research projects and has conducted collaborative research with researchers across the behavioral sciences. He was awarded the USC Annenberg Fellowship (2007-2009), and he led the team that won the first Emotion Challenge Classifier Sub-Challenge at InterSpeech 2009. He is a member of the Institute of Electrical and Electronics Engineers (IEEE) and the International Speech Communication Association (ISCA).

林沛 Payton LIN
Postdoctoral Research Fellow, Research Center for Information Technology Innovation, Academia Sinica

Payton Lin was a premed student and received his B.S. in Cognitive Neuroscience from the University of California, San Diego. He then attended UC Irvine, obtaining his Ph.D. in Biomedical Engineering in order to help patients with hearing loss while experimenting with the first FDA-approved neuroprosthesis: the cochlear implant. Under Professor Fan-Gang Zeng, he published research on auditory masking between electric and acoustic stimulation (EAS) aimed at improving speech recognition in noise. After graduating, Dr. Lin worked on the hardware of cochlear implant microelectrode arrays in a microfabrication laboratory at City University of Hong Kong. Now at Academia Sinica's Research Center for Information Technology Innovation, he aims to improve the speech signal processing of cochlear implants by studying the human auditory system from a computational modeling perspective.

Program

09:00 ~ 09:30

Registration

09:30 ~ 09:40

Opening Remarks

陳裕賢, Dean, College of Electrical Engineering and Computer Science, National Taipei University
09:40 ~ 10:40

Speech Perception in Adverse Listening Conditions

陳霏 Fei CHEN

Speech perception in adverse listening conditions (e.g., in noise and in reverberation) is extremely challenging, especially for hearing-impaired listeners. Unfortunately, our understanding of this problem is so limited that we have not yet found a satisfactory solution. Such knowledge is also important for the design of novel speech processors in assistive hearing devices, e.g., hearing aids and cochlear implants. In this talk, I will introduce my recent experimental results on Mandarin speech perception, including the role of tone contour in Mandarin speech perception, the perceptual contributions of vowels and consonants in interrupted speech, and the impacts of spectral degradation and reduced dynamic range on speech recognition. I will also review speech processing techniques for cochlear implants and discuss how the design of present speech processing methods in cochlear implants can be improved.

Chair: 曹昱, Assistant Research Fellow, Research Center for Information Technology Innovation, Academia Sinica
10:40 ~ 11:00

Coffee Break

11:00 ~ 12:00

Prosodic Phrasing in Spontaneous Speech: A Linguistic Perspective

陳正賢 Cheng-Hsien CHEN

Prosodic phrasing is argued to provide significant information for spontaneous speech processing, and thus often serves as a fundamental unit for various NLP tasks. This study analyzes the grammatical configuration of prosodic phrasing boundaries with a comprehensive set of acoustic-prosodic measures. Its purpose is two-fold. First, the acoustic measures adopted will be introduced, with emphasis on their linguistic grounding. Second, a semantics-based analysis of prosodic phrasing boundaries will be presented to investigate how speech prosody interacts with our basic grammatical unit, the clause.
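To make the notion of an acoustic-prosodic boundary measure concrete, the sketch below computes one commonly used cue, the F0 (pitch) reset across a prosodic phrase boundary. The F0 values and the semitone formulation here are illustrative assumptions for this example, not the measures actually used in the study.

```python
import numpy as np

def pitch_reset(f0_prev_phrase, f0_next_phrase):
    """F0 reset across a boundary, in semitones."""
    end = f0_prev_phrase[-1]    # last F0 sample before the boundary
    start = f0_next_phrase[0]   # first F0 sample after the boundary
    return 12.0 * np.log2(start / end)

# Made-up F0 tracks (Hz): declination within a phrase, then a reset upward.
prev = np.array([180.0, 160.0, 140.0])
nxt = np.array([210.0, 190.0, 175.0])
print(round(pitch_reset(prev, nxt), 2))  # → 7.02 semitones
```

In real data such F0 tracks would come from a pitch tracker, and a boundary candidate would be scored with a battery of measures (pauses, lengthening, pitch range) rather than this single cue.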

Chair: 王逸如, Associate Professor, Department of Electrical Engineering, National Chiao Tung University
12:00 ~ 13:30

Lunch

13:30 ~ 14:30

Emotion-based Music Analysis and Beyond

楊奕軒 Yi-Hsuan YANG

Automatic recognition of the perceived emotion of music allows users to retrieve and organize their music collections in a fashion that is content-centric and intuitive. A typical approach to music emotion recognition categorizes emotions into a number of classes and applies machine learning techniques to train a classifier. This approach, however, faces a granularity issue: the number of emotion classes is too small in comparison with the richness of emotion perceived by humans. In this talk, I will introduce research that takes a very different perspective and views emotions as points in a 2-D space spanned by two latent dimensions: valence (how positive or negative) and arousal (how exciting or calming). In this approach, music emotion recognition becomes the prediction of a song's valence and arousal values, corresponding to a point in the emotion plane. The granularity and ambiguity issues associated with emotion classes then no longer exist, since no categorical classes are needed. Moreover, because the 2-D plane provides a simple basis for a user interface, new emotion-based music organization, browsing, and retrieval applications can easily be created, even for mobile devices with small displays.
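As a rough illustration of the dimensional approach described above, the sketch below treats music emotion recognition as regression onto the valence-arousal plane, fitting one least-squares regressor per dimension. The features and annotations are entirely synthetic assumptions for this example; real systems use acoustic features and human VA ratings.

```python
import numpy as np
from numpy.linalg import lstsq

rng = np.random.default_rng(0)
n_songs, n_feats = 200, 12
X = rng.normal(size=(n_songs, n_feats))        # acoustic features (synthetic)
W_true = rng.normal(size=(n_feats, 2))         # unknown mapping to VA
Y = X @ W_true + 0.1 * rng.normal(size=(n_songs, 2))  # (valence, arousal)

# Fit one linear regressor per dimension via least squares.
X1 = np.hstack([X, np.ones((n_songs, 1))])     # add bias column
W, *_ = lstsq(X1, Y, rcond=None)

def predict_va(x):
    """Map a song's feature vector to a point on the VA plane."""
    return np.append(x, 1.0) @ W

va = predict_va(X[0])
print("valence=%.2f arousal=%.2f" % (va[0], va[1]))
```

Because each song maps to a point rather than a class label, a retrieval interface can simply let the user tap a location on the VA plane and return the nearest songs.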

Chair: 王新民, Research Fellow, Institute of Information Science, Academia Sinica
14:30 ~ 15:30

Understanding Dyadic Human Spoken Interactions Using Speech Processing Techniques: Case Studies in Autism Spectrum Disorder (ASD) and Behavioral Couple Therapy

李祈均 Chi-Chun LEE

Imagine humans as complex dynamical systems: systems characterized by multiple interacting layers of hidden internal states (e.g., processes involving cognition, perception, production, emotion, and social interaction) that produce expressive, observable multimodal behavior signals (e.g., body gestures, facial expressions, physiology, and speech). Behavioral Signal Processing (BSP) is an emerging frontier of interdisciplinary research between the behavioral sciences and engineering built on this abstraction. It aims to mathematically model manifested behaviors using common observational data (e.g., audio-video recordings), with methods grounded in advanced signal processing and machine learning techniques. The tangible outcomes of BSP, i.e., behavioral informatics, offer novel computational methods for better human decision-making, with application domains ranging from mental health, education, and personalized human-machine interfaces to user-centric commercial applications.

In this talk, I will discuss the role of speech processing techniques in BSP, grounding the effort in mental health research and practice. Specifically, I will present two aspects of my research on utilizing speech processing technologies to analyze dyadic face-to-face interactions in mental health applications: 1) spontaneous prosody analysis of ADOS diagnostic sessions for autism spectrum disorder (ASD), and 2) signal-derived quantification of vocal behavior synchrony in interaction analysis for behavioral couple therapy. Each perspective is an attempt to quantitatively study spontaneous spoken interactions in order to advance the understanding of human behavior under different disordered conditions, and each methodology is grounded in tight interdisciplinary collaboration.

Chair: 陳柏琳, Professor, Department of Computer Science and Information Engineering, National Taiwan Normal University
15:30 ~ 15:50

Coffee Break

15:50 ~ 16:50

Automatic Speech Recognition with Primarily Temporal Information

林沛 Payton LIN

This study proposes HMM-based ASR as an effective screening system for optimizing speech processing strategies for individual cochlear implant users. Synthesized vocoded testing sets approximate the effects of reduced spectral analysis and source segregation in the auditory periphery and brainstem, while training sets were manipulated to approximate the effects of impaired lexical, prosodic, and phonetic analysis in the cortex. Postoperative language development for postlingually deaf cochlear implant users was simulated with large-vocabulary speech databases, context-dependent acoustic models, and tri-gram language models. All procedures were extended to tonal languages (i.e., Mandarin Chinese) to contrast the effects of F0 contours and alternative temporal filters. A bio-inspired approach to designing noise-robust speech recognition systems is discussed with the unveiling of the SOUND algorithm (spectral or undertone normalization decomposition), the first attempt to combine both the place code and the temporal code of phase-locked action potentials in the human auditory system.
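Vocoded testing sets of the kind mentioned above are typically built with a noise-excited channel vocoder: each band's temporal envelope is extracted and used to modulate bandlimited noise, discarding fine spectral structure while preserving primarily temporal information. The sketch below is a minimal illustration under my own assumptions (an FFT-mask filterbank, four log-spaced bands, a 50 Hz envelope cutoff), not the configuration used in the study.

```python
import numpy as np

def vocode(signal, fs, n_channels=4, env_cutoff=50.0):
    """Noise-excited channel vocoder (illustrative FFT-mask filterbank)."""
    n = len(signal)
    spec = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    edges = np.logspace(np.log10(100.0), np.log10(fs / 2.0), n_channels + 1)
    rng = np.random.default_rng(0)
    out = np.zeros(n)
    for lo, hi in zip(edges[:-1], edges[1:]):
        band_mask = (freqs >= lo) & (freqs < hi)
        band = np.fft.irfft(spec * band_mask, n=n)
        # Envelope: rectify, then lowpass by zeroing bins above env_cutoff.
        env_spec = np.fft.rfft(np.abs(band))
        env = np.clip(np.fft.irfft(env_spec * (freqs <= env_cutoff), n=n), 0.0, None)
        # Replace the band's fine structure with bandlimited noise.
        noise_band = np.fft.irfft(np.fft.rfft(rng.normal(size=n)) * band_mask, n=n)
        out += env * noise_band
    return out

fs = 8000
t = np.arange(fs) / fs
speechlike = np.sin(2 * np.pi * 300 * t) * (1 + np.sin(2 * np.pi * 3 * t))
v = vocode(speechlike, fs)
```

Running an ASR system on such vocoded speech, rather than on clean speech, is what lets the approach described above act as a screen for candidate implant processing strategies.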

Chair: 江振宇, Assistant Professor, Department of Communication Engineering, National Taipei University
16:50 ~ 17:00

Closing

Registration System

  • Registration is online only; to register, please enter the registrant's information in the system.
  • The registration fee covers handouts, lunch, and refreshments, and is collected by ACLCLP; a receipt will be issued after payment.
  • To pay by credit card, fill in the online registration form, then download the credit card form, complete it, and return it by fax (02-2788-1638) or e-mail to aclclp@hp.iis.sinica.edu.tw.
Directions

    National Taipei University, No. 151, University Rd., Sanxia District, New Taipei City

    By Car
    • From Taipei (southbound): take National Highway No. 3, exit at the Yingge Interchange, turn right at the first traffic light (the fire station) onto Long'en St., continue past the Hakka Culture Park, pass under the highway, and enter campus through the Long'en side gate.
    • From central or southern Taiwan (northbound): take National Highway No. 3, exit at the Sanxia Interchange, turn left onto Fuxing Rd. toward Yingge, turn right at the first traffic light (the fire station) onto Long'en St., continue past the Hakka Culture Park, pass under the highway, and enter campus through the Long'en side gate.
    • You may download and print the Speech Signal Processing Workshop parking permit; show it at the main gate for free parking.
    By Public Transit
    • MRT Jing'an Station → bus 921 to the "NTPU Main Gate" stop
    • MRT Yongning Station → bus 916 or 922 to the "NTPU Main Gate" stop
    Map

    Contact Us

    Organizer
    江振宇, Assistant Professor
    cychiang@mail.ntpu.edu.tw

    No. 151, University Rd., Sanxia District, New Taipei City 23741
    Room 8F08, Law Building, National Taipei University

    Tel: 02-8674-1111 ext. 67732

    Assistant
    洪宇平, Graduate Student
    sws2014.ntpu@gmail.com

    No. 151, University Rd., Sanxia District, New Taipei City 23741
    Room 5F24, Law Building, National Taipei University

    Tel: 02-8674-1111 ext. 67733

    Payment Inquiries
    Ms. 黃琪
    aclclp@hp.iis.sinica.edu.tw

    The Association for Computational Linguistics and Chinese Language Processing (ACLCLP)

    Tel: 02-2788-3799 ext. 1502