Publications
2025
- [TASLP] Steering Language Model to Stable Speech Emotion Recognition via Contextual Perception and Chain of Thought. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2025
- [TASLP] FPO: Fine-grained Preference Optimization Improves Zero-shot Text-to-Speech. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2025
2024
- [Speech Communication] Whisper-SV: Adapting Whisper for low-data-resource speaker verification. Speech Communication, 2024
- [SPL] MMGER: Multi-modal and Multi-granularity Generative Error Correction with LLM for Joint Accent and Speech Recognition. IEEE Signal Processing Letters, 2024
2023
- [SLT]
2022
- [Interspeech] NPU-HC Speaker Verification System for Far-field Speaker Verification Challenge 2022. In Interspeech, 2022
- [ICASSP] TEA-PSE: Tencent-ethereal-audiolab personalized speech enhancement system for ICASSP 2022 DNS Challenge. In ICASSP, 2022
- [ISCSLP] The ISCSLP 2022 Intelligent Cockpit Speech Recognition Challenge (ICSRC): Dataset, Tracks, Baseline and Results. In ISCSLP, 2022
- [Neural Networks] Two-stage streaming keyword detection and localization with multi-scale depthwise temporal convolution. Neural Networks, 2022
- [Neural Networks] Neural speech enhancement with unsupervised pre-training and mixture training. Neural Networks, 2022
- [SPL] Cross-speaker Emotion Transfer through Information Perturbation in Emotional Speech Synthesis. IEEE Signal Processing Letters, 2022
2021
- [APSIPA ASC] Target Speaker Extraction for Customizable Query-by-Example Keyword Spotting. In APSIPA ASC, 2021
- [ICML] Efficient Gradient-Based Neural Architecture Search For End-to-End ASR. In ICML, 2021
- [ICML]
- [ICML] Noise Robust Singing Voice Synthesis Using Gaussian Mixture Variational Autoencoder. In ICML, 2021
- [ASRU] Boundary and Context Aware Training for CIF-based Non-Autoregressive End-to-end ASR. In ASRU, 2021
- [Speech Communication] Factorized WaveNet for voice conversion with limited data. Speech Communication, 2021
- [Computer Speech and Language] Effective and direct control of neural TTS prosody by removing interactions between different attributes. Computer Speech & Language, 2021
- [SPL] LET-Decoder: A WFST-based lazy-evaluation token-group decoder with exact lattice generation. IEEE Signal Processing Letters, 2021
2020
- [ICASSP] Mining Effective Negative Training Samples for Keyword Spotting. In ICASSP, 2020
- [ICASSP] Effective WaveNet Adaptation for Voice Conversion with Limited Data. In ICASSP, 2020
- [ICASSP] Time-Domain Neural Network Approach for Speech Bandwidth Extension. In ICASSP, 2020
- [TASLP] Fast Query-by-example Speech Search using Attention-based Deep Binary Embeddings. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020
- [Neural Networks] On the localness modeling for the self-attention based end-to-end speech synthesis. Neural Networks, 2020
- [TALLIP] Loanword Identification in Low-resource Languages with Minimal Supervision. ACM Transactions on Asian and Low-Resource Language Information Processing, 2020
2019
- [ASRU] WaveNet Factorization with Singular Value Decomposition for Voice Conversion. In ASRU, 2019
- [ASRU] Improving Mandarin End-to-End Speech Synthesis by Self-Attention and Learnable Gaussian Bias. In ASRU, 2019
- [ASRU] Verifying Deep Keyword Spotting Detection with Acoustic Word Embeddings. In ASRU, 2019
- [ASRU] Controlling Emotion Strength with Relative Attribute for End-To-End Speech Synthesis. In ASRU, 2019
- [ASRU] Learning Hierarchical Representations for Expressive Speaking Style in End-to-End Speech Synthesis. In ASRU, 2019
- [ASRU] Virtual Adversarial Training for DS-CNN Based Small-Footprint Keyword Spotting. In ASRU, 2019
- [ASRU] Incremental Lattice Determinization for WFST Decoders. In ASRU, 2019
- [ICMI] Deep Audio-visual System for Closed-set Word-level Speech Recognition. In ICMI, 2019
- [APSIPA ASC] Exploring RNN-Transducer for Chinese Speech Recognition. In APSIPA ASC, 2019
- [APSIPA ASC] Multiple Fixed Beamformers with a Spatial Wiener-form Postfilter for Far-Field Speech Recognition. In APSIPA ASC, 2019
- [Interspeech] Unsupervised Adaptation with Adversarial Dropout Regularization for Robust Speech Recognition. In Interspeech, 2019
- [Interspeech] Adversarial Regularization for End-to-end Robust Speaker Verification. In Interspeech, 2019
- [Interspeech] Towards Language-Universal Mandarin-English Speech Recognition. In Interspeech, 2019
- [ICASSP] Enhancing Hybrid Self-Attention Structure with Relative-Position-Aware Bias for Speech Synthesis. In ICASSP, 2019
- [ICASSP] Investigating End-To-End Speech Recognition for Mandarin-English Code-Switching. In ICASSP, 2019
- [ICASSP] Component Fusion: Learning Replaceable Language Model Component for End-To-End Speech Recognition System. In ICASSP, 2019
- [ICASSP] A Pitch-Aware Approach to Single-Channel Speech Separation. In ICASSP, 2019
- [ICASSP] Domain Adversarial Training for Improving Keyword Spotting Performance of ESL Speech. In ICASSP, 2019
- [ICASSP] An Attention-Based Neural Network Approach for Single Channel Speech Enhancement. In ICASSP, 2019
- [ICASSP] Adversarial Examples for Improving End-To-End Attention-Based Small-Footprint Keyword Spotting. In ICASSP, 2019
- [ICASSP] Robust Audio-Visual Speech Recognition Using Bimodal DFSMN with Multi-Condition Training and Dropout Regularization. In ICASSP, 2019
- [CHiME] The NWPU System for CHiME-5 Challenge. In CHiME, 2019
- [CHiME] Multiple Beamformers with ROVER for the CHiME-5 Challenge. In CHiME, 2019
- [TETCI] Improving Adversarial Neural Machine Translation for Morphologically Rich Language. IEEE Transactions on Emerging Topics in Computational Intelligence, 2019
- [TASLP] Adversarial Regularization for Attention Based End-to-End Robust Speech Recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2019
- [SPL] Region Proposal Network Based Small-Footprint Keyword Spotting. IEEE Signal Processing Letters, 2019
- [Access] Pre-Alignment Guided Attention for Improving Training Efficiency and Model Stability in End-to-End Speech Synthesis. IEEE Access, 2019
- [Access] Query-by-Example Speech Search Using Recurrent Neural Acoustic Word Embeddings With Temporal Context. IEEE Access, 2019
2018
- [ISCSLP] A Refined Query-by-Example Approach to Spoken-Term-Detection on ESL Learners’ Speech. In ISCSLP, 2018
- [ACM MM] A Kullback-Leibler Divergence Based Recurrent Mixture Density Network for Acoustic Modeling in Emotional Statistical Parametric Speech Synthesis. In ACM MM, 2018
- [ACM MM] A Comparison of Expressive Speech Synthesis Approaches based on Neural Network. In ACM MM, 2018
- [ICASSP] Unsupervised Domain Adaptation Via Domain Adversarial Training for Speaker Recognition. In ICASSP, 2018
- [Journal of Signal Processing Systems] Guest Editorial: Advances in Deep Learning for Speech Processing. Journal of Signal Processing Systems, 2018
2017
- [ASRU] Multilingual Bottle-Neck Feature Learning from Untranscribed Speech. In ASRU, 2017
- [ASRU] Extracting Bottleneck Features and Word-Like Pairs from Untranscribed Speech for Feature Representation. In ASRU, 2017
- [APSIPA ASC] An End-to-End Neural Network Approach to Story Segmentation. In APSIPA ASC, 2017
- [APSIPA ASC] Topic Embedding of Sentences for Story Segmentation. In APSIPA ASC, 2017
- [APSIPA ASC] A Segmental DNN/i-vector Approach for Digit-Prompted Speaker Verification. In APSIPA ASC, 2017
- [Interspeech] Denoising Recurrent Neural Network for Deep Bidirectional LSTM based Voice Conversion. In Interspeech, 2017
- [ICASSP] Pairwise learning using multi-lingual bottleneck features for low-resource query-by-example spoken term detection. In ICASSP, 2017
- [ICMI] The I2R-NWPU Text-to-Speech System for Blizzard Challenge 2017. In ICMI, 2017
- [Frontiers of Computer Science] Sound image externalization for headphone based real-time 3D audio. Frontiers of Computer Science, 2017
- [Journal of Signal Processing Systems] A Bidirectional LSTM Approach with Word Embeddings for Sentence Boundary Detection. Journal of Signal Processing Systems, 2017
- [J-STSP] Multi-Task Feature Learning for Low-Resource Query-by-Example Spoken Term Detection. IEEE Journal of Selected Topics in Signal Processing, 2017
- [Signal Processing] Learning Distributed Sentence Representations for Story Segmentation. Signal Processing, 2017
2016
- [ISCSLP] Investigating Neural Network based Query-by-Example Keyword Spotting Approach for Personalized Wake-up Word Detection in Mandarin Chinese. In ISCSLP, 2016
- [ISCSLP] A Bi-directional LSTM Approach for Polyphone Disambiguation in Mandarin Chinese. In ISCSLP, 2016
- [ISCSLP] Investigating LSTM for Punctuation Prediction. In ISCSLP, 2016
- [APSIPA ASC] Predicting Articulatory Movement From Text Using Deep Architecture with Stacked Bottleneck Features. In APSIPA ASC, 2016
- [Interspeech] Unsupervised Bottleneck Features for Low-Resource Query-By-Example Spoken Term Detection. In Interspeech, 2016
- [Interspeech] Learning Neural Network Representations Using Cross-Lingual Bottleneck Features with Word-Pair Information. In Interspeech, 2016
- [Interspeech] A DNN-HMM Approach to Story Segmentation. In Interspeech, 2016
- [Interspeech] Deep Bidirectional LSTM Modeling of Timbre and Prosody for Emotional Voice Conversion. In Interspeech, 2016
- [Interspeech] Toward High-Performance Language-Independent Query-By-Example Spoken Term Detection for MediaEval 2015: Post-Evaluation Analysis. In Interspeech, 2016
- [ICME] Deep Neural Network Derived Bottleneck Features for Accurate Audio Classification. In ICME, 2016
- [ICASSP] Exemplar-Based Sparse Representation of Timbre and Prosody for Voice Conversion. In ICASSP, 2016
- [ICASSP] Approximate Search of Audio Queries Using DTW with Phone Time Boundary and Data Augmentation. In ICASSP, 2016
- [ASRU] Automatic Prosody Prediction for Chinese Speech Synthesis Using BLSTM-RNN and Embedding Features. In ASRU, 2016
- [APSIPA ASC] A Waveform Representation Framework for High-Quality Statistical Parametric Speech Synthesis. In APSIPA ASC, 2016
- [APSIPA ASC] A Density Peak Clustering Approach to Unsupervised Acoustic Subword Units Discovery. In APSIPA ASC, 2016
- [APSIPA ASC] Non-Negative Matrix Factorization Using Stable Alternating Direction Method of Multipliers for Source Separation. In APSIPA ASC, 2016
- [Interspeech] Parallel Inference of Dirichlet Process Gaussian Mixture Models for Unsupervised Acoustic Modeling: A Feasibility Study. In Interspeech, 2016
- [Interspeech] An Alternating Optimization Approach for Phase Retrieval. In Interspeech, 2016
- [Interspeech] Articulatory Movement Prediction Using Deep Bidirectional Long Short-Term Memory Based Recurrent Neural Networks and Word/Phone Embeddings. In Interspeech, 2016
- [Interspeech] Regularized Non-Negative Matrix Factorization Using Alternating Direction Method of Multipliers and Its Application to Source Separation. In Interspeech, 2016
- [ICASSP] Photo-Real Talking Head with Deep Bidirectional LSTM. In ICASSP, 2016
- [ICASSP] Language independent query-by-example spoken term detection using N-best phone sequences and partial matching. In ICASSP, 2016
- [TASLP] Modeling Latent Topics and Temporal Distance for Story Segmentation of Broadcast News. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2016
- [Neurocomputing] An unsupervised deep domain adaptation approach for robust speech recognition. Neurocomputing, 2016
2015
- [MTA] A Deep Bidirectional LSTM Approach for Video-Realistic Talking Head. Multimedia Tools and Applications, 2015
2014
- [APSIPA ASC] Multi-View Features in a DNN-CRF Model for Improved Sentence Unit Detection on English Broadcast News. In APSIPA ASC, 2014
- [Interspeech] Speech-Driven Head Motion Synthesis Using Neural Networks. In Interspeech, 2014
- [Interspeech] A Deep Neural Network Approach for Sentence Boundary Detection in Broadcast News. In Interspeech, 2014
- [Interspeech] Intrinsic Spectral Analysis Based on Temporal Context Features for Query By Example Spoken Term Detection. In Interspeech, 2014
- [Interspeech] Stereo Acoustic Echo Suppression Using Widely Linear Filtering in the Frequency Domain. In Interspeech, 2014
- [ISCSLP] A Hybrid Virtual Bass System with Improved Phase Vocoder and High Efficiency. In ISCSLP, 2014
- [ISCSLP] Experimental Study on Dereverberation and Noise Reduction for Distant Speech Recognition. In ISCSLP, 2014
- [ICIP] An Ensemble of Deep Neural Networks for Object Tracking. In ICIP, 2014
- [ICASSP] Unsupervised Broadcast News Story Segmentation Using Distance Dependent Chinese Restaurant Processes. In ICASSP, 2014
- [CHINA SIP] Learning Optimal Features for Music Transcription. In CHINA SIP, 2014
- [CHINA SIP] Sentence Boundary Detection in Chinese Broadcast News Using Conditional Random Fields and Prosodic Features. In CHINA SIP, 2014
- [MTA] Multimodal Joint Information Processing in Human Machine Interaction: Recent Advances. Multimedia Tools and Applications, 2014
- [MTA] A Statistical Parametric Approach to Video-Realistic Text-Driven Talking Avatar. Multimedia Tools and Applications, 2014
- [SOFT COMPUT] Topic Segmentation on Spoken Documents Using Self-Validated Acoustic Cuts. Soft Computing, 2014
- [MTA] Head Motion Synthesis From Speech Using Deep Neural Networks. Multimedia Tools and Applications, 2014
- [TMM] Tennis Ball Tracking Using a Two-Layered Data Association Approach. IEEE Transactions on Multimedia, 2014
2013
- [ACM MM] Online Object Tracking Based on CNN with Metropolis-Hasting Re-Sampling. In ACM MM, 2013
- [YES] Filter Bank Design for Automatic Music Transcription. In YES, 2013
- [APSIPA ASC] Personalized 3-D Facial Expression Synthesis Based on Landmark Constraint. In APSIPA ASC, 2013
- [APSIPA ASC] Numerical Calculation of the Head-Related Transfer Functions with Chinese Dummy Head. In APSIPA ASC, 2013
- [ICASSP] A Tighter Lower Bound Estimate for Dynamic Time Warping. In ICASSP, 2013
- [ICASSP] A Two Layered Data Association Approach for Ball Tracking. In ICASSP, 2013
- [ICASSP] Broadcast News Story Segmentation Using Latent Topics on Data Manifold. In ICASSP, 2013
- [ICASSP] Measuring semantic similarity by contextual word connections in Chinese news story segmentation. In ICASSP, 2013
- [APSIPA ASC] Face Sketch-To-Photo Synthesis From Simple Line Drawing. In APSIPA ASC, 2013
- [ACL] Broadcast News Story Segmentation Using Manifold Learning on Latent Topic Distributions. In ACL, 2013
- [APSIPA ASC] Context-Dependent Deep Neural Networks for Commercial Mandarin Speech Recognition Applications. In APSIPA ASC, 2013
- [QHDXXB] Head Motion Generation for Speech-Driven Talking Avatar. Journal of Tsinghua University (Science and Technology), 2013
- [QHDXXB] Mandarin speech pattern discovery using segmental dynamic tim. Journal of Tsinghua University (Science and Technology), 2013
2012
- [Interspeech] Speech Pattern Discovery Using Audio-Visual Fusion and Canonical Correlation Analysis. In Interspeech, 2012
- [Interspeech] A Two Stage Mask Estimation Approach to Robust Speaker Verification. In Interspeech, 2012
- [Interspeech] Lexical Story Co-Segmentation of Chinese Broadcast News. In Interspeech, 2012
- [ISCSLP] Prosody-Based Sentence Boundary Detection in Chinese Broadcast News. In ISCSLP, 2012
- [APSIPA ASC] Detection of Ball Hits in a Tennis Game Using Audio and Visual Information. In APSIPA ASC, 2012
- [ICASSP] Acoustic TextTiling for Story Segmentation of Spoken Documents. In ICASSP, 2012
- [ICALIP] Dual-Microphone Based Binary Mask Estimation for Robust Speaker Verification. In ICALIP, 2012
- [ICALIP] Comprehensive Comparison of the Least Mean Square Algorithm and the Fast Deconvolution Algorithm for Crosstalk Cancellation. In ICALIP, 2012
2011
- [Interspeech] Probabilistic Latent Semantic Analysis for Broadcast News Story Segmentation. In Interspeech, 2011
- [APSIPA ASC] Broadcast News Story Segmentation Using Probabilistic Latent Semantic Analysis and Laplacian Eigenmaps. In APSIPA ASC, 2011
- [APSIPA ASC] Multiple Sparse Sources Separation Based on Multichannel Frequency Domain Adaptive Filtering. In APSIPA ASC, 2011
- [APSIPA ASC] A Block-Based Blind Source Separation Approach with Equilateral Triangular Microphone Array. In APSIPA ASC, 2011
- [Information Sciences] On the Effectiveness of Subwords for Lexical Cohesion Based Story Segmentation of Chinese Broadcast News. Information Sciences, 2011
- [Multimedia Syst] Pitch-Density-Based Features and an SVM Binary Tree Approach for Multi-Class Audio Classification in Broadcast News. Multimedia Systems, 2011
- [QHDXXB] Real-Time Speech Driven Talking Avatar. Journal of Tsinghua University (Science and Technology), 2011
- [QHDXXB] Semi-Blind Dual-Microphone Noise Reduction with Known Target Localization. Journal of Tsinghua University (Science and Technology), 2011
- [CJE] An Automatic Caption Generator for Mandarin Broadcast News. Chinese Journal of Electronics, 2011
- [TASLP] Laplacian Eigenmaps for Automatic Story Segmentation of Broadcast News. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2011
2010
- [ISCSLP] Multi-Modal Feature Integration for Story Boundary Detection in Broadcast News. In ISCSLP, 2010
- [APSIPA ASC] Modeling Broadcast News Prosody Using Conditional Random Fields for Story Segmentation. In APSIPA ASC, 2010
- [Interspeech] Maximum Lexical Cohesion for Fine-Grained News Story Segmentation. In Interspeech, 2010
- [Interspeech] Phoneme Lattice Based TextTiling Towards Multilingual Story Segmentation. In Interspeech, 2010
- [ICALIP] Integrating Acoustic and Lexical Features in Topic Segmentation of Chinese Broadcast News Using Maximum Entropy Approach. In ICALIP, 2010
- [ICALIP] Laplacian Eigenmaps for Automatic News Story Segmentation. In ICALIP, 2010
- [UIC] Speech and Auditory Interfaces for Ubiquitous, Immersive and Personalized Applications. In UIC, 2010
- [ICWMMN] An Experimental Comparison on KEMAR and Bhead210 Dummy Heads for HRTF-Based Virtual Auditory on Chinese Subjects. In ICWMMN, 2010
- [Information Sciences] Minimizing the Expected Complete Influence Time of a Social Network. Information Sciences, 2010
2009
- [ACCV] Multicue Graph Mincut for Image Segmentation. In ACCV, 2009
- [AIRS] A subword normalized cut approach to automatic story segmentation of Chinese broadcast news. In AIRS, 2009
- [ISCSLP] A Two-Stage Multi-Feature Integration Approach to Unsupervised Speaker Change Detection in Real-Time News Broadcasting. In ISCSLP, 2009
- [HHME] Anchor Labeling System for Broadcast News Using ALIZE Toolkit. In HHME, 2009
- [JVLC] Audio-Visual Human Recognition Using Semi-Supervised Spectral Learning and Hidden Markov Models. Journal of Visual Languages and Computing, 2009
- [Information Sciences] Cascade Markov Random Fields for Stroke Extraction of Chinese Characters. Information Sciences, 2009
- [IEICE TIS] Dynamic Bayesian Network Inversion for Robust Speech Recognition. IEICE Transactions on Information and Systems, 2009
2008
- [ISCSLP] Subword Latent Semantic Analysis for TextTiling-Based Automatic Story Segmentation of Chinese Broadcast News. In ISCSLP, 2008
- [PCM] Subword Lexical Chaining for Automatic Story Segmentation in Chinese Broadcast News. In PCM, 2008
- [AIRS] Multi-Scale TextTiling for Automatic Story Segmentation in Chinese Broadcast News. In AIRS, 2008
- [Multimedia Syst] Discovering Salient Prosodic Cues and Their Interactions for Automatic Story Segmentation in Mandarin Broadcast News. Multimedia Systems, 2008
2007
- [ICME] Noise Robust Features for Speech/Music Discrimination in Real-Time Telecommunication. In ICME, 2007
- [NCMMSC] Classification of Music and Speech in Mandarin News Broadcasts. In NCMMSC, 2007
- [Interspeech] Modeling the Statistical Behavior of Lexical Chains to Capture Word Cohesiveness for Automatic Story Segmentation. In Interspeech, 2007
2006
- [ICPR] Speech animation using coupled hidden Markov model. In ICPR, 2006
- [ICSMC] Lip assistant: Visualize speech for hearing impaired people in multin. In ICSMC, 2006
- [ISCSLP] A Cantonese Speech-Driven Talking Face Using Translingual Audio-to-Visual Conversion. In ISCSLP, 2006
- [ICMLC] Multi-Stream Articulator Model with Adaptive Reliability Measure for Audio Visual Speech Recognition. In ICMLC, 2006
- [NAACL] Combined Use of Speaker- and Tone-Normalized Pitch Reset with Pause Duration for Automatic Story Segmentation in Mandarin Broadcast News. In NAACL, 2006
- [ICASSP] An articulatory approach to video-realistic mouth animation. In ICASSP, 2006
- [TMM] Realistic Mouth-Synching for Speech-Driven Talking Face Using Articulatory Modelling. IEEE Transactions on Multimedia, 2006
- [PR] A Coupled HMM Approach for Video-Realistic Speech Animation. Pattern Recognition, 2006