Samuel Thomas

Overview

Title

Senior Research Scientist - Speech Recognition and Spoken Language Understanding

Location

IBM Research - Yorktown Heights, Yorktown Heights, NY, USA

Bio

Samuel Thomas received his B.Tech. degree in Computer Engineering from the Cochin University of Science and Technology, India, and his M.S. degree in Computer Science and Engineering from the Indian Institute of Technology Madras, India, before earning his Doctor of Philosophy degree from the Johns Hopkins University, Baltimore. Since graduation, he has been with the Speech Technologies Group at the IBM T.J. Watson Research Center, New York. In the past, he has worked on several speech research projects and workshops with the Center for Language and Speech Processing (CLSP) at JHU, the Idiap Research Institute, Switzerland, and the TeNeT group, IIT Madras. His research interests include speech processing and machine learning for speech recognition, spoken language understanding, speech synthesis, and speaker recognition. Samuel is an IBM Master Inventor, a Senior Member of the IEEE, and an Associate Editor of the IEEE/ACM Transactions on Audio, Speech, and Language Processing. He is also an elected member of the IEEE Speech and Language Technical Committee (SLTC).

Publications

ConvKT: Conversation-Level Knowledge Transfer for Context Aware End-to-End Spoken Language Understanding
- Vishal Sunder, Eric Fosler-Lussier, et al.
- 2023
- INTERSPEECH 2023

Comparison of Multilingual Self-Supervised and Weakly-Supervised Speech Pre-Training for Adaptation to Unseen Languages
- Andrew Rouditchenko, Sameer Khurana, et al.
- 2023
- INTERSPEECH 2023

Effective Training of RNN Transducer Models on Diverse Sources of Speech and Text Data
- Takashi Fukuda, Samuel Thomas
- 2023
- ICASSP 2023

Fine-Grained Textual Knowledge Transfer to Improve RNN Transducers for Speech Recognition and Understanding
- Vishal Sunder, Samuel Thomas, et al.
- 2023
- ICASSP 2023

Multi-Speaker Data Augmentation for Improved end-to-end Automatic Speech Recognition
- Samuel Thomas, Hong-Kwang J. Kuo, et al.
- 2023
- ICASSP 2023

C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval
- Andrew Rouditchenko, Yung-Sung Chuang, et al.
- 2023
- ICASSP 2023

Extending RNN-T-based speech recognition systems with emotion and language classification
- Zvi Kons, Hagai Aronowitz, et al.
- 2022
- INTERSPEECH 2022

Global RNN Transducer Models For Multi-dialect Speech Recognition
- Takashi Fukuda, Samuel Thomas, et al.
- 2022
- INTERSPEECH 2022

Tokenwise Contrastive Pretraining for Finer Speech-to-BERT Alignment in End-to-End Speech-to-Intent Systems
- Vishal Sunder, Eric Fosler-Lussier, et al.
- 2022
- INTERSPEECH 2022

Everything at Once - Multi-modal Fusion Transformer for Video Retrieval
- Nina Shvetsova, Brian Chen, et al.
- 2022
- CVPR 2022


Patents

- 11 Mar 2024
- US
- 11929062
End-to-end Spoken Language Understanding Without Full Transcripts
- 19 Feb 2024
- US
- 11908454
Integrating Text Inputs For Training And Adapting Neural Network Transducer ASR Models
- 19 Feb 2024
- US
- 11907845
Training Teacher Machine Learning Models Using Lossless And Lossy Branches
- 12 Feb 2024
- US
- 11900922
Multilingual Intent Recognition
- 10 Jan 2024
- TW
- I829312
Integrating Text Inputs For Training And Adapting Neural Network Transducer ASR Models
- 28 Aug 2023
- US
- 11741355
Training Of Student Neural Network With Teacher Neural Networks
- 25 Jul 2023
- GB
- 2603573
Multi-modal Lung Capacity Measurement For Respiratory Illness Prediction
- 19 Jun 2023
- CN
- ZL202080079920.0
Using Closed Captions As Parallel Training Data For Customization Of Closed Captioning Systems
- 08 May 2023
- US
- 11645329
Constructing, Evaluating, And Improving A Search String For Retrieving Images Indicating Item Use
- 20 Mar 2023
- US
- 11610108
Training Of Student Neural Network With Switched Teacher Neural Networks

Top collaborators