Wen Liu

Project Name

iTranS (IBM Transcription Server)


The exponential growth of information has made accessing, managing, and integrating heterogeneous information a strategic priority for enterprises. However, most of today's information management systems are still text-based, which means that the content inside audio and video materials is very hard to access and use effectively. As a result, multiple industries have a strong need for a speech transcription platform:

1. University lecture management (education industry): Unless someone takes notes, a lecture leaves no written record. Speech transcription technology makes it possible to create a transcript that captures the lecture and can be used for purposes such as archiving, searching, and summarization. Once a transcript exists, text-based machine translation technology can also produce a translated version.

2. Multimedia retrieval (Internet or media industry): Audio and video are becoming a practical and common means of communication over the Internet, but this creates an obstacle for information search. If audio and video materials can be transcribed, then besides search, other value-added uses become possible. For example, the materials become "viewable" in noisy environments where audio output is not an option. In addition, the materials become accessible to persons with disabilities (PwD).

3. Quality monitoring (call center): Currently, quality monitoring in call centers is done by sampling a very small portion of the calls (e.g., 0.5%) and having human listeners assess them. This is highly inefficient because the majority of calls are uninteresting, so listening to them is a waste of time. Further, because listeners are subjective, there can be a high degree of inconsistency between monitors. With the help of speech transcription technology, conversations between agents and customers can be monitored and scored automatically in an objective manner.

4. Real-time analytics to assist agents (call center): In today's call centers, maintaining a service team with qualified skills is a challenge because of the high turnover rate of agents, dynamically changing service content, and other reasons. Transcribing the conversation between agent and customer in real time, detecting its topic, and suggesting ways to assist the agent with problem solving, cross-selling, up-selling, and so on will significantly improve a call center's business performance by reducing agent handle time and improving customer satisfaction.
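To make the quality-monitoring use case concrete, here is a minimal sketch of how a call could be scored once a transcript exists. It does not describe how iTranS scores calls: the `Turn` type, the phrase lists, and the `score_call` function are hypothetical names chosen for illustration, and the scoring rule (required-phrase coverage minus a penalty for forbidden phrases) is an assumed, deliberately simple stand-in for a real scoring model.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """One utterance in a transcribed call (hypothetical structure)."""
    speaker: str  # "agent" or "customer"
    text: str


# Hypothetical rules: phrases an agent is expected to use or to avoid.
REQUIRED_PHRASES = ["thank you for calling", "is there anything else"]
FORBIDDEN_PHRASES = ["that's not my problem"]


def score_call(turns):
    """Return (score, required phrases found, forbidden phrases found).

    The score is the percentage of required phrases the agent used,
    minus 50 points per forbidden phrase, floored at 0.
    """
    # Only the agent's side of the conversation is scored.
    agent_text = " ".join(t.text.lower() for t in turns if t.speaker == "agent")
    hits = [p for p in REQUIRED_PHRASES if p in agent_text]
    violations = [p for p in FORBIDDEN_PHRASES if p in agent_text]
    score = 100 * len(hits) / len(REQUIRED_PHRASES) - 50 * len(violations)
    return max(0.0, score), hits, violations


call = [
    Turn("agent", "Thank you for calling, how can I help?"),
    Turn("customer", "My bill is wrong."),
    Turn("agent", "I have corrected it."),
]
score, hits, violations = score_call(call)
print(score, hits, violations)
```

Because the same rules are applied to every transcript, every call can be scored rather than a small sample, and two calls with the same content receive the same score, which is the consistency that human sampling lacks.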



Figure 1.0 iTranS Architecture


As Figure 1.0 illustrates, iTranS leverages IBM's advanced speech transcription technology to convert audio documents into text documents, opening the door for enterprises to extract insight from massive amounts of audio and video material. Built on top of Attila, the state-of-the-art speech transcription engine developed by IBM Research and now jointly owned by IBM and Nuance, iTranS provides a scalable speech transcription cloud service with a set of easy-to-use APIs that make it easy for enterprises to integrate speech transcription capability into their business processes.

Looking forward, the iTranS team at IBM Research - China will extend iTranS's data connection capability so that it accepts requests not only in batch mode but also in streaming mode. The team will also add new analytics capabilities to iTranS, such as speaker diarization, speaker emotion detection, and human object identification, all of which discover human-related traits, to support advanced analytics of many types of complex human-related information. We believe iTranS will be an important building block for emerging applications that are strategically important for enterprises seeking to understand and reach people in the move from transaction to engagement.