The Workshop on Speech Signal Processing is an annual academic event held by the Association for Computational Linguistics and Chinese Language Processing (ACLCLP), bringing together renowned scholars and experts from Taiwan and abroad. In addition, this year's meeting will also host the results presentation for National Science Council projects of fiscal year 101 (2012).
The invited speakers include Prof. Fei Chen of the University of Hong Kong, Dr. Yi-Hsuan Yang of Academia Sinica, Prof. Chi-Chun Lee of National Tsing Hua University, Dr. Payton Lin, visiting postdoctoral researcher at Academia Sinica, and Prof. Cheng-Hsien Chen of National Changhua University of Education. Their talks cover speech signal processing for hearing aids and cochlear implants, music signal processing, the processing of prosodic information in fluent speech, and the recognition of human behavioral attributes.
08/01 SWS 2014!
Dr. Chen received his B.S. and M.Phil. degrees in electronics engineering from Nanjing University, China, and his Ph.D. degree in electrical engineering from The Chinese University of Hong Kong in 2005. He worked in the cochlear implants laboratory at the University of Texas at Dallas from 2009 to 2011, and joined the HKU Speech & Hearing Science Division at the end of 2011. His research interests include speech processing (speech enhancement and intelligibility modeling), speech perception (by normal-hearing and hearing-impaired listeners), and cochlear implants.
My research interests center on corpus linguistics and spontaneous speech processing. In 2011, I completed my Ph.D. thesis at NTU, entitled Prosodic Phrasing in Mandarin Spontaneous Speech: A Computational-acoustic Perspective. The study was the first attempt to investigate prosodic units in conversation from a constructionist perspective while adopting a computational, corpus-based method. Based on our analyses of the acoustic patterns, we suggest that spontaneous speech shows clear evidence that conceptual planning proceeds on a clausal basis. In 2012, our findings were presented at the 6th International Conference on Speech Prosody under the title Conceptual Planning in Conversational Mandarin: Pitch Variation in Prosodic Phrasing (co-authored with Shu-Chuan Tseng). In 2013, we extended the project into a comparative study of French and Mandarin in collaboration with the French National Centre for Scientific Research (CNRS) and presented our preliminary observations at PACLIC 27 in A quantitative comparative study of prosodic and discourse units: the case of French and Taiwan Mandarin (co-authored with Laurent Prévot, Shu-Chuan Tseng, and Klim Peshkov). That paper presents our efforts to establish a spoken dataset for crosslinguistic studies by adding morpho-syntactic, chunking, prosodic, and discourse annotation, enabling quantitative comparative studies of the syntax-discourse-prosody interfaces.
Yi-Hsuan Yang earned his Ph.D. degree in Communication Engineering from National Taiwan University, Taiwan, in 2010. Since 2011, he has been with the Research Center for Information Technology Innovation, Academia Sinica, as a tenure-track Assistant Research Fellow. He is the founder and director of the Music and Audio Computing Lab at Academia Sinica, and an Adjunct Assistant Professor with National Cheng Kung University (NCKU). Dr. Yang's research concerns music information retrieval, music signal processing, affective computing, and machine learning. He received the 2011 IEEE Signal Processing Society (SPS) Young Author Best Paper Award and won first prize in the 2012 ACM Multimedia Grand Challenge for his work on emotion-based music video generation. He was a tutorial speaker on music affect analysis at the 2012 International Society for Music Information Retrieval Conference (ISMIR 2012) and an author of the book Music Emotion Recognition (CRC Press, 2011). He received the Academia Sinica Career Development Award in 2012 and the Excellent Junior Research Investigator Award from the National Science Council of Taiwan in 2013. He is an affiliated member of the Multimedia Systems and Applications Technical Committee (MSA TC) of the IEEE Circuits and Systems Society. He has served as a workshop organizer, special-session organizer, and technical program committee member at IEEE ICME, ACM MM, and APSIPA conferences, and is currently a technical program chair of ISMIR 2014 and a Guest Editor of the IEEE Transactions on Affective Computing Special Issue on Advances in Affective Analysis in Multimedia.
Chi-Chun Lee (Jeremy) is currently an Assistant Professor in the Department of Electrical Engineering at National Tsing Hua University. He received the Bachelor of Science (B.S.) degree with honors, magna cum laude, in Electrical Engineering (EE) with a minor in Business Administration from the University of Southern California (USC) in May 2007. From 2007 to 2012 he was a member of the Signal Analysis and Interpretation Laboratory (SAIL) directed by Prof. Shrikanth Narayanan, and he received his Ph.D. degree in Electrical Engineering in December 2012. From February to December 2013 he was a data scientist at the id:a lab at ID Analytics, San Diego. His research interests are in human-centered behavioral signal processing, emphasizing the development of computational frameworks for recognizing human behavioral attributes and internal states using machine learning and signal processing techniques. He has been involved in multiple interdisciplinary research projects and has conducted collaborative research with researchers across the behavioral sciences. He was awarded the USC Annenberg Fellowship (2007-2009), and he led the team that won the Classifier Sub-Challenge of the first Emotion Challenge at INTERSPEECH 2009. He is a member of the Institute of Electrical and Electronics Engineers (IEEE) and the International Speech Communication Association (ISCA).
Payton Lin was a premed student and received his B.S. in Cognitive Neuroscience from the University of California, San Diego. He then attended UC Irvine for his Ph.D. in Biomedical Engineering in order to help patients with hearing loss while experimenting with the first FDA-approved neuroprosthesis: the cochlear implant. Under Professor Fan-Gang Zeng, he published research on auditory masking between electric and acoustic stimulation (EAS) aimed at improving speech recognition in noise. After graduating, Dr. Lin worked on the hardware of cochlear implant microelectrode arrays in a microfabrication laboratory at City University of Hong Kong. Now at Academia Sinica's Research Center for Information Technology Innovation, Dr. Lin aims to improve the speech signal processing of cochlear implants by researching the human auditory system from a computational modeling perspective.
Speech perception in adverse listening conditions (e.g., in noise and reverberation) is extremely challenging, especially for hearing-impaired listeners. Unfortunately, our understanding of this problem is so limited that we have not yet found a satisfactory solution. Such knowledge is also important for the design of novel speech processors in assistive hearing devices, e.g., hearing aids and cochlear implants. In this talk, I will introduce my recent experimental results on Mandarin speech perception, covering the role of tone contour in Mandarin speech perception, the perceptual contributions of vowels and consonants in interrupted speech, and the impacts of spectral degradation and reduced dynamic range on speech recognition. I will also review speech processing techniques for cochlear implants and discuss how to improve the design of present speech processing methods in cochlear implants.
Prosodic phrasing is argued to provide significant information for spontaneous speech processing, and thus often serves as a fundamental unit for various NLP tasks. This study analyzes the grammatical configuration of prosodic phrasing boundaries with a comprehensive set of acoustic-prosodic measures. The purpose of the study is two-fold. First, the acoustic measures adopted will be introduced, highlighting their linguistic grounding. Second, a semantics-based analysis of the prosodic phrasing boundaries will investigate how speech prosody interacts with our basic grammatical unit, the clause.
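As a minimal illustration of the kind of acoustic-prosodic boundary measure involved (the function, window size, and toy F0 contour below are hypothetical, not the actual measures of this study), pitch reset across a candidate boundary can be computed as the jump from the pre-boundary F0 offset to the post-boundary F0 onset:

```python
import numpy as np

def pitch_reset(f0, boundary, span=10):
    """Mean F0 just after the boundary minus mean F0 just before it (Hz)."""
    before = np.mean(f0[max(0, boundary - span):boundary])
    after = np.mean(f0[boundary:boundary + span])
    return float(after - before)

# Toy contour: declining pitch, then a clear F0 reset at frame 50
f0 = np.concatenate([np.linspace(220, 180, 50), np.linspace(230, 200, 50)])
reset = pitch_reset(f0, boundary=50)  # clearly positive: a likely boundary
```

A large positive reset, combined with pause duration and final lengthening, is the sort of cue such boundary detectors typically aggregate.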
Automatic recognition of the perceived emotion of music allows users to retrieve and organize their music collections in a fashion that is content-centric and intuitive. A typical approach to music emotion recognition categorizes emotions into a number of classes and applies machine learning techniques to train a classifier. This approach, however, faces a granularity issue: the number of emotion classes is too small in comparison with the richness of emotion perceived by humans. In this talk, I will introduce research that takes a very different perspective and views emotions as points in a 2-D space spanned by two latent dimensions: valence (how positive or negative) and arousal (how exciting or calming). In this approach, music emotion recognition becomes the prediction of a song's valence and arousal values, corresponding to a point in the emotion plane. The granularity and ambiguity issues associated with emotion classes thereby disappear, since no categorical classes are needed. Moreover, because the 2-D plane provides a simple basis for a user interface, new emotion-based music organization, browsing, and retrieval applications can easily be created for mobile devices with small display areas.
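A minimal sketch of this dimensional formulation, using entirely made-up acoustic features and annotations (it is not Dr. Yang's actual system): two regularized least-squares regressors map a song's feature vector to a (valence, arousal) point in the emotion plane.

```python
import numpy as np

# Hypothetical data: 100 songs, each with 3 acoustic features (e.g., tempo,
# spectral centroid, loudness) and annotated valence/arousal values.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_W = np.array([[0.8, -0.1],   # assumed mapping: features -> (valence, arousal)
                   [0.2, 0.9],
                   [-0.4, 0.3]])
Y = X @ true_W + 0.05 * rng.normal(size=(100, 2))  # noisy annotations

# Fit both dimensions jointly with ridge regression:
# W = (X^T X + lam*I)^(-1) X^T Y
lam = 1e-2
W = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ Y)

valence, arousal = X[0] @ W  # a new song's predicted point in the 2-D plane
```

Retrieval then reduces to nearest-neighbor search around a point the user picks on the plane, which is what makes the interface suitable for small screens.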
Imagine humans as complex dynamical systems: systems characterized by multiple interacting layers of hidden internal states (e.g., processes involving cognition, perception, production, emotion, and social interaction) that produce expressive, observable multimodal behavioral signals (e.g., body gestures, facial expressions, physiology, and speech). Behavioral Signal Processing (BSP) is an emerging frontier of interdisciplinary research between behavioral science and engineering built on this abstraction. It aims to mathematically model manifested behaviors using common observational data (e.g., audio-video recordings), with methods grounded in advanced signal processing and machine learning techniques. The tangible outcomes of BSP, i.e., behavioral informatics, offer novel computational methods to support human decision-making, with application domains ranging from mental health, education, and personalized human-machine interfaces to user-centric commercial applications.
In this talk, I will discuss the role of speech processing techniques in the realm of BSP, grounding the effort in mental health research and practice. Specifically, I will present two aspects of my research on utilizing speech processing technologies for analyzing dyadic face-to-face interactions in mental health applications: 1) spontaneous prosody analysis of ADOS diagnostic sessions for autism spectrum disorder (ASD), and 2) signal-derived quantification of vocal behavior synchrony in interaction analysis for behavioral couple therapy. Each perspective is an attempt to quantitatively study spontaneous spoken interactions in order to advance our understanding of human behavior under different disordered conditions, and each methodology is grounded in a tight interdisciplinary collaboration.
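To give a flavor of what signal-derived quantification of vocal synchrony can look like in its simplest form (the function and the synthetic contours below are illustrative assumptions, not the measures used in this work), one can correlate two speakers' smoothed pitch contours:

```python
import numpy as np

def synchrony(f0_a, f0_b, win=5):
    """Pearson correlation of moving-average-smoothed F0 contours."""
    kernel = np.ones(win) / win
    a = np.convolve(f0_a, kernel, mode="valid")
    b = np.convolve(f0_b, kernel, mode="valid")
    return float(np.corrcoef(a, b)[0, 1])

# Synthetic dyad: two speakers with different pitch ranges but similar
# intonation dynamics (one slightly lagging the other)
t = np.linspace(0, 10, 500)
speaker_a = 120 + 10 * np.sin(t)
speaker_b = 180 + 12 * np.sin(t + 0.1)
sync = synchrony(speaker_a, speaker_b)  # close to +1 for entrained contours
```

Because correlation is invariant to each speaker's mean pitch and range, the score captures shared dynamics rather than absolute voice characteristics; real analyses would also handle turn-taking, voicing gaps, and time lags.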
The present study proposes HMM-based ASR as an effective screening system for optimizing speech processing strategies for individual cochlear implant users. Synthesized vocoded testing sets approximate the effects of reduced spectral analysis and source segregation in the auditory periphery and brainstem. Training sets were manipulated to approximate the effects of impaired lexical, prosodic, and phonetic analysis in the cortex. Postoperative language development for postlingually deaf cochlear implant users was simulated with large vocabulary speech databases, context-dependent acoustic models, and tri-gram language models. Procedures were all extended to tonal languages (i.e., Mandarin Chinese) to contrast the effects of F0 contours and alternative temporal filters. A bio-inspired approach to designing noise robust speech recognition systems is discussed with the unveiling of the SOUND algorithm (spectral or undertone normalization decomposition), the first attempt to combine both the place code and temporal code of phase-locked action potentials in human auditory systems.
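The spectral-degradation side of such vocoded simulations is commonly implemented with a noise vocoder. Below is a minimal, illustrative sketch (band edges, envelope smoothing, and the toy input are assumptions, not the processing chain of this study): each analysis band's envelope is imposed on a band-limited noise carrier, discarding the original fine structure, and the bands are summed.

```python
import numpy as np

def noise_vocode(signal, fs, n_channels=4):
    """Crude n-channel noise vocoder using FFT band splitting."""
    n = len(signal)
    freqs = np.fft.rfftfreq(n, 1 / fs)
    spectrum = np.fft.rfft(signal)
    edges = np.linspace(100, fs / 2, n_channels + 1)  # assumed band edges (Hz)
    rng = np.random.default_rng(0)
    noise_spec = np.fft.rfft(rng.normal(size=n))      # white-noise carrier
    out = np.zeros(n)
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (freqs >= lo) & (freqs < hi)
        band_sig = np.fft.irfft(np.where(mask, spectrum, 0), n=n)
        # Rectified, moving-average-smoothed envelope of this band
        env = np.convolve(np.abs(band_sig), np.ones(64) / 64, mode="same")
        carrier = np.fft.irfft(np.where(mask, noise_spec, 0), n=n)
        out += env * carrier
    return out

fs = 16000
t = np.arange(fs) / fs  # 1 second
# Toy "speech-like" input: a 220 Hz tone with a 4 Hz amplitude modulation
speech_like = np.sin(2 * np.pi * 220 * t) * (1 + 0.5 * np.sin(2 * np.pi * 4 * t))
vocoded = noise_vocode(speech_like, fs)
```

Lowering `n_channels` degrades spectral resolution the way fewer implant electrodes would, which is why vocoded test sets can screen recognition performance across simulated device settings.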
National Taipei University, No. 151, University Rd., Sanxia District, New Taipei City
No. 151, University Rd., Sanxia District, New Taipei City 23741
National Taipei University, College of Law Building, Room 8F08
Tel: 02-8674-1111 ext. 67732
No. 151, University Rd., Sanxia District, New Taipei City 23741
National Taipei University, College of Law Building, Room 5F24
Tel: 02-8674-1111 ext. 67733
Tel: 02-2788-3799 ext. 1502