Samuel Thomas

contact information

Speech Recognition
Thomas J. Watson Research Center, Yorktown Heights, NY USA
  +1-914-945-1142



2017

A Recorded Debating Dataset
Mirkin, Shachar and Jacovi, Michal and Lavee, Tamar and Kuo, Hong-Kwang and Thomas, Samuel and Sager, Leslie and Kotlerman, Lili and Venezian, Elad and Slonim, Noam
arXiv preprint arXiv:1709.06438, 2017

Efficient Knowledge Distillation from an Ensemble of Teachers
Fukuda, Takashi and Suzuki, Masayuki and Kurata, Gakuto and Thomas, Samuel and Cui, Jia and Ramabhadran, Bhuvana
Proc. Interspeech 2017, 3697--3701

Effective Joint Training of Denoising Feature Space Transforms and Neural Network Based Acoustic Models
Takashi Fukuda, Osamu Ichikawa, Gakuto Kurata, Ryuki Tachibana, Samuel Thomas, Bhuvana Ramabhadran
Proc. of ICASSP, pp. 5190--5194, 2017

English Conversational Telephone Speech Recognition by Humans and Machines
Saon, George and Kurata, Gakuto and Sercu, Tom and Audhkhasi, Kartik and Thomas, Samuel and Dimitriadis, Dimitrios and Cui, Xiaodong and Ramabhadran, Bhuvana and Picheny, Michael and Lim, Lynn-Li and others
arXiv preprint arXiv:1703.02136, 2017


2016

Domain Adaptation of CNN Based Acoustic Models Under Limited Resource Settings.
Suzuki, Masayuki and Tachibana, Ryuki and Thomas, Samuel and Ramabhadran, Bhuvana and Saon, George
INTERSPEECH, pp. 1588--1592, 2016

Multilingual Data Selection For Low Resource Speech Recognition
Thomas, Samuel and Audhkhasi, Kartik and Cui, Jia and Kingsbury, Brian and Ramabhadran, Bhuvana
2016

An Investigation on the Use of i-vectors for Robust ASR
Dimitriadis, Dimitrios and Thomas, Samuel and Ganapathy, Sriram
Interspeech 2016, 3828--3832

CNMF-based acoustic features for noise-robust ASR
Vaz, Colin and Dimitriadis, Dimitrios and Thomas, Samuel and Narayanan, Shrikanth
Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on, pp. 5735--5739

Invariant Representations for Noisy Speech Recognition
Serdyuk, Dmitriy and Audhkhasi, Kartik and Brakel, Philémon and Ramabhadran, Bhuvana and Thomas, Samuel and Bengio, Yoshua
arXiv preprint arXiv:1612.01928, 2016

On the importance of event detection for ASR
Haws, David and Dimitriadis, Dimitrios and Saon, George and Thomas, Samuel and Picheny, Michael
Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE International Conference on, pp. 5705--5709


2015

The IBM BOLT speech transcription system.
Thomas, Samuel and Saon, George and Kuo, Hong-Kwang Jeff and Mangu, Lidia
INTERSPEECH, pp. 3150--3153, 2015

Investigating factor analysis features for deep neural networks in noisy speech recognition.
Ganapathy, Sriram and Thomas, Samuel and Dimitriadis, Dimitrios and Rennie, Steven J
INTERSPEECH, pp. 1898--1902, 2015

Improvements to the IBM speech activity detection system for the DARPA RATS program
Thomas, Samuel and Saon, George and Van Segbroeck, Maarten and Narayanan, Shrikanth S
Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on, pp. 4500--4504


2014

Robust language identification using convolutional neural network features.
Ganapathy, Sriram and Han, Kyu Jeong and Thomas, Samuel and Omar, Mohamed Kamal and Van Segbroeck, Maarten and Narayanan, Shrikanth S
INTERSPEECH, pp. 1846--1850, 2014

Deep order statistic networks
Steven Rennie, Vaibhava Goel, Samuel Thomas
Proc. of the IEEE Workshop on Spoken Language Technology (SLT), 2014

Annealed dropout training of deep networks
Steven Rennie, Vaibhava Goel, Samuel Thomas
Spoken Language Technology (SLT), IEEE Workshop on. IEEE, 2014

Analyzing convolutional neural networks for speech activity detection in mismatched acoustic conditions
Samuel Thomas, Sriram Ganapathy, George Saon, Hagen Soltau
2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2519--2523


2013

A Summary of the 2012 JHU CLSP workshop on zero resource speech technologies and models of early language acquisition
Aren Jansen, Emmanuel Dupoux, Mike Seltzer, Pascal Clark, Ian McGraw, Balakrishnan Varadarajan, Erin Bennett, Benjamin Borschinger, Justin Chiu, Ewan Dunbar, others
ICASSP, 2013

Data-driven Neural Network Based Feature Front-ends for Automatic Speech Recognition
Samuel Thomas
2013 - old-site.clsp.jhu.edu

Developing a speaker identification system for the DARPA RATS project.
Oldrich Plchot, Spyros Matsoukas, Pavel Matejka, Najim Dehak, Jeff Z Ma, Sandro Cumani, Ondrej Glembek, Hynek Hermansky, Sri Harish Reddy Mallidi, Nima Mesgarani, others
ICASSP, pp. 6768--6772, 2013

Weak top-down constraints for unsupervised acoustic model training.
Aren Jansen, Samuel Thomas, Hynek Hermansky
ICASSP, pp. 8091--8095, 2013

The IBM speech activity detection system for the DARPA RATS program.
George Saon, Samuel Thomas, Hagen Soltau, Sriram Ganapathy, Brian Kingsbury
INTERSPEECH, pp. 3497--3501, 2013

Deep neural network features and semi-supervised training for low resource speech recognition
Samuel Thomas, Michael L Seltzer, Kenneth Church, Hynek Hermansky
Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on, pp. 6704--6708

A summary of the 2012 JHU CLSP workshop on zero resource speech technologies and models of early language acquisition.
Aren Jansen, Emmanuel Dupoux, Sharon Goldwater, Mark Johnson, Sanjeev Khudanpur, Kenneth Church, Naomi Feldman, Hynek Hermansky, Florian Metze, Richard C Rose, others
ICASSP, pp. 8111--8115, 2013


2012

Feature extraction using 2-d autoregressive models for speaker recognition.
Ganapathy, Sriram and Thomas, Samuel and Hermansky, Hynek
Odyssey, pp. 229--235, 2012

Adaptation transforms of auto-associative neural networks as features for speaker verification.
Thomas, Samuel and Mallidi, Sri Harish Reddy and Ganapathy, Sriram and Hermansky, Hynek
Odyssey, pp. 98--104, 2012

Exploiting Discriminative Point Process Models for Spoken Term Detection.
Atta Norouzian, Aren Jansen, Richard C Rose, Samuel Thomas
INTERSPEECH, 2012

The UMD-JHU 2011 speaker recognition system
Daniel Garcia-Romero, Xinhui Zhou, D Zotkin, B Srinivasan, Yuancheng Luo, Sriram Ganapathy, Samuel Thomas, S Nemala, Garimella SVS Sivaram, Majid Mirbagheri, others
Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on, pp. 4229--4232

Intrinsic Spectral Analysis for Zero and High Resource Speech Recognition.
Aren Jansen, Samuel Thomas, Hynek Hermansky
INTERSPEECH, 2012

Data-driven posterior features for low resource speech recognition applications
Samuel Thomas, Sriram Ganapathy, Aren Jansen, Hynek Hermansky
Interspeech, 2012

Acoustic and Data-driven Features for Robust Speech Activity Detection.
Samuel Thomas, Sri Harish Reddy Mallidi, Thomas Janu, Hynek Hermansky, Nima Mesgarani, Xinhui Zhou, Shihab A Shamma, Tim Ng, Bing Zhang, Long Nguyen, others
INTERSPEECH, 2012

Multilingual MLP features for low-resource LVCSR systems
Samuel Thomas, Sriram Ganapathy, Hynek Hermansky
Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on, pp. 4269--4272


2011

The subspace Gaussian mixture model—A structured model for speech recognition
Povey, Daniel and Burget, Lukáš and Agarwal, Mohit and Akyazi, Pinar and Kai, Feng and Ghoshal, Arnab and Glembek, Ondřej and Goel, Nagendra and Karafiát, Martin and Rastrow, Ariya and others
Computer Speech & Language 25(2), 404--439, Elsevier, 2011

Performance monitoring for robustness in automatic recognition of speech.
Hynek Hermansky, Nima Mesgarani, Samuel Thomas
MLSLP, pp. 31--34, 2011

MLP based phoneme detectors for automatic speech recognition
Samuel Thomas, Patrick Nguyen, Geoffrey Zweig, Hynek Hermansky
Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on, pp. 5024--5027

Mixture of Auto-Associative Neural Networks for Speaker Verification.
Garimella SVS Sivaram, Samuel Thomas, Hynek Hermansky
INTERSPEECH, pp. 2381--2384, 2011

Toward optimizing stream fusion in multistream recognition of speech
Nima Mesgarani, Samuel Thomas, Hynek Hermansky
The Journal of the Acoustical Society of America 130(1), EL14--EL18, Acoustical Society of America, 2011

Adaptive Stream Fusion in Multistream Recognition of Speech.
Nima Mesgarani, Samuel Thomas, Hynek Hermansky
INTERSPEECH, pp. 2329--2332, 2011

Rapid Evaluation of Speech Representations for Spoken Term Discovery.
Michael A Carlin, Samuel Thomas, Aren Jansen, Hynek Hermansky
INTERSPEECH, pp. 821--824, 2011

Speech recognition with segmental conditional random fields: A summary of the JHU CLSP 2010 summer workshop
Geoffrey Zweig, Patrick Nguyen, Dirk Van Compernolle, Kris Demuynck, Les Atlas, Pascal Clark, Gregory Sell, Meihong Wang, Fei Sha, Hynek Hermansky, others
Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on, pp. 5044--5047


2010

Speech recognition with segmental conditional random fields: final report from the 2010 JHU summer workshop
Geoffrey Zweig, Patrick Nguyen, Dirk Van Compernolle, Kris Demuynck, Hynek Hermansky, Damianos Karakos, Keith Kintzley, Samuel Thomas, Sivaram GSVS, Sam Bowman, others
Technical Report MSR-TR-2010-173, Microsoft Research, 2010

A phoneme recognition framework based on auditory spectro-temporal receptive fields.
Samuel Thomas, Kailash Patil, Sriram Ganapathy, Nima Mesgarani, Hynek Hermansky
INTERSPEECH, pp. 2458--2461, 2010

Robust spectro-temporal features based on autoregressive models of hilbert envelopes
Sriram Ganapathy, Samuel Thomas, Hynek Hermansky
Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on, pp. 4286--4289

Comparison of modulation features for phoneme recognition
Sriram Ganapathy, Samuel Thomas, Hynek Hermansky
Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on, pp. 5038--5041

Temporal envelope compensation for robust phoneme recognition using modulation spectrum
Sriram Ganapathy, Samuel Thomas, Hynek Hermansky
The Journal of the Acoustical Society of America 128(6), 3769--3780, Acoustical Society of America, 2010

A multistream multiresolution framework for phoneme recognition.
Nima Mesgarani, Samuel Thomas, Hynek Hermansky
INTERSPEECH, pp. 318--321, 2010

A novel estimation of feature-space MLLR for full-covariance models
Arnab Ghoshal, Daniel Povey, Mohit Agarwal, Pinar Akyazi, Lukas Burget, Kai Feng, Ondrej Glembek, Nagendra Goel, Martin Karafiát, Ariya Rastrow, others
Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on, pp. 4310--4313

Approaches to automatic lexicon learning with limited training examples
Nagendra Goel, Samuel Thomas, Mohit Agarwal, Pinar Akyazi, Lukas Burget, Kai Feng, Arnab Ghoshal, Ondrej Glembek, Martin Karafiát, Daniel Povey, others
Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on, pp. 5094--5097

Cross-lingual and multi-stream posterior features for low resource LVCSR systems.
Samuel Thomas, Sriram Ganapathy, Hynek Hermansky
INTERSPEECH, pp. 877--880, 2010

Multilingual acoustic modeling for speech recognition based on subspace Gaussian mixture models
Lukas Burget, Petr Schwarz, Mohit Agarwal, Pinar Akyazi, Kai Feng, Arnab Ghoshal, Ondrej Glembek, Nagendra Goel, Martin Karafiát, Daniel Povey, others
Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on, pp. 4334--4337

Subspace Gaussian mixture models for speech recognition
Daniel Povey, Lukas Burget, Mohit Agarwal, Pinar Akyazi, Kai Feng, Arnab Ghoshal, Ondrej Glembek, Nagendra K Goel, Martin Karafiát, Ariya Rastrow, others
Acoustics Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on, pp. 4330--4333


2009

Applications of signal analysis using autoregressive models for amplitude modulation
Sriram Ganapathy, Samuel Thomas, Petr Motlicek, Hynek Hermansky
Applications of Signal Processing to Audio and Acoustics, 2009. WASPAA'09. IEEE Workshop on, pp. 341--344

Tandem representations of spectral envelope and modulation frequency features for ASR.
Samuel Thomas, Sriram Ganapathy, Hynek Hermansky
INTERSPEECH, pp. 2955--2958, 2009

Temporal envelope subtraction for robust speech recognition using modulation spectrum
Sriram Ganapathy, Samuel Thomas, Hynek Hermansky
Automatic Speech Recognition & Understanding, 2009. ASRU 2009. IEEE Workshop on, pp. 164--169

Static and dynamic modulation spectrum for speech recognition.
Sriram Ganapathy, Samuel Thomas, Hynek Hermansky
INTERSPEECH, pp. 2823--2826, 2009

Phoneme recognition using spectral envelope and modulation frequency features
Samuel Thomas, Sriram Ganapathy, Hynek Hermansky
Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International Conference on, pp. 4453--4456

Modulation frequency features for phoneme recognition in noisy speech
Sriram Ganapathy, Samuel Thomas, Hynek Hermansky
The Journal of the Acoustical Society of America 125(1), EL8--EL12, Acoustical Society of America, 2009


2008

Spectro-temporal features for automatic speech recognition using linear prediction in spectral domain
Samuel Thomas, Sriram Ganapathy, Hynek Hermansky
Proceedings of the 16th European Signal Processing Conference (EUSIPCO 2008), Lausanne, Switzerland

Front-end for far-field speech recognition based on frequency domain linear prediction.
Sriram Ganapathy, Samuel Thomas, Hynek Hermansky
INTERSPEECH, pp. 984--987, 2008

Hilbert envelope based features for far-field speech recognition
Samuel Thomas, Sriram Ganapathy, Hynek Hermansky
Machine Learning for Multimodal Interaction, pp. 119--124, Springer, 2008

Hilbert envelope based spectro-temporal features for phoneme recognition in telephone speech.
Samuel Thomas, Sriram Ganapathy, Hynek Hermansky
INTERSPEECH, pp. 1521--1524, 2008

Recognition of reverberant speech using frequency domain linear prediction
Samuel Thomas, Sriram Ganapathy, Hynek Hermansky
Signal Processing Letters, IEEE 15, 681--684, IEEE, 2008


2007

SSML Extensions for Indian Languages
Samuel Thomas, Ashish Verma, Nitendra Rajput
W3C Workshop, 2007

Language identification of person names using CF-IOF based weighing function.
Samuel Thomas, Ashish Verma
INTERSPEECH, pp. 1769--1772, 2007

Natural sounding text-to-speech synthesis based on syllable-like units
Samuel Thomas
Ph.D. Thesis, 2007


2006

Natural sounding speech based on syllable-like units
Samuel Thomas, M Nageshwar Rao, Hema A Murthy, CS Ramalingam
EUSIPCO, Florence, Italy, 2006


2005

Distributed Text to Speech Synthesis for Embedded Systems--An analysis
Samuel Thomas, Hema A Murthy, C Chandra Sekhar
Proceedings of the Eleventh National Conference on Communications: NCC-2005, 28-30 January, 2005, pp. 273

Text-to-Speech Synthesis using syllable-like units
M Nageshwara Rao, Samuel Thomas, T Nagarajan, Hema A Murthy
Proceedings of National Conference on Communications, IIT, India, pp. 277--280, 2005