Speech Technology



July/August 2002

“Smart” Call Centers

Building Natural Language Intelligence into Voice-Based Applications

By Amy Neustein, Ph.D.

Call centers present a ripe opportunity for trying out intelligence-based natural language application software – that is, software that responds to intuitive voice commands, instead of depending on the menu-driven, directed-dialog approach in which a caller is asked a set of tedious questions one at a time (city of origin, destination city, date of travel, etc.). One obvious benefit of this kind of software is that fully interactive conversational dialog can provide a call center with enough information to assess whether the caller needs to be transferred to a human agent – a very important determination for making proper use of an increasingly scarce and costly resource.

However, to optimally respond to users’ wide range of flexible, intuitive voice commands, a “smart” natural language application is required. In other words, natural language intelligence must be built into voice-based applications to compensate for some of the inherent weaknesses in natural language systems. Professor Roni Rosenfeld (School of Computer Science, Carnegie Mellon University), speaking at AVIOS this year, had this in mind when he pointed out that although “natural language systems…require no previous knowledge or training on the part of the user…this [same] flexibility obscures the functional limitations of the system and makes it difficult for the user to understand what the application can and cannot do (it also strains the recognition and understanding technology)” (1). Similarly, many system designers have warned us of the danger that users may become too chatty or digressive when using a speech system that permits fully interactive, flexible voice commands.

To design smart NL applications one must recognize that human speech is punctuated by idioms, colloquialisms, incomplete phrases and ellipses. In addition, human speakers often grope for words, trying out substitute words and phrases in place of the word they’re really looking for – and this tendency worsens with age. In human-to-human communication, we manage to understand one another, and to accomplish the task at hand, in spite of these kinds of linguistic deficiencies. But how do we design speech interfaces that truly understand human dialog that is just that – “human”?

The New Frontier of Artificial Intelligence in Voice-Based Applications

Conventional NL systems are designed to locate key words and phrases in the caller’s natural language commands. But what happens when a natural language application servicing a call center cannot locate any recognizable semantic entry in the dialog between the user and the automated agent? Solving this problem requires a new method, called Sequence Package Analysis (SPA). Context-free grammar (CFG) rules guide a speech recognizer at the lower sentence/utterance level, but SPA operates on a different plane, one especially useful in those instances when callers fail to utter the expected key word or phrase (2).

SPA works by examining a series of related turns and turn construction units, discretely packaged as a sequence of (conversational) interaction. Rather than parsing the dialog for key words or phrases, an SPA approach examines a whole dialog “package” (consisting of one entire speaking turn or more). Its focus is on the unit of interaction in its entirety rather than a single (or multiple) lexical item.

Using this approach, the natural language system can uncover the “key” word(s) residing outside the preset lexicon. The role of SPA is not to replace existing speech recognizers that use the conventional NL methods, but to add natural language intelligence to voice-based applications so that a natural language interface can better understand users who lapse into a relaxed, intuitive style of speaking.

Here is an example of a help-line desk call from a caller who is distressed over a technical matter. As a result, the caller lapses into a repetitive use of vague descriptors to explain his problem and what needs to be done to solve it. Yet the caller’s use of descriptors, albeit vague, occurs within the framework of well organized conversational sequence patterns, meaning that discourse grammars built on an SPA approach would be able to spot the sequence packages in the dialog by looking for those conversational sequence patterns that underlie the caller’s production of vague descriptions of a problem and his ambiguous, indirect request for help.

Caller: “I really can’t do this myself, I can’t get this to work without someone coming here. I really don’t know what to do with this.”

Notice how this caller, who is seeking a service call, clouds his talk with repeated pronouns and similar imprecise descriptors, never mentioning his specific need for a “service call” or a “technician.” NL interfaces that use standard grammars which are geared toward recognizing key words or phrases cannot make much sense out of such a complaint. By contrast, a speech recognizer using SPA would enable a “smart” call center to recognize the caller’s report of trouble and his accompanying request for assistance.

Here’s how. First, it would map out the sequential patterns of pronoun usage, identifying in this sequence package a high usage of pronouns in close proximity. The sequence package is divided into three parts: 1) a short, condensed complaint; 2) an expansion of the complaint; and 3) a recycle of the short complaint:

I really can’t do this myself.
I can’t get this to work without someone coming here.
I really don’t know what to do with this.
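The first step can be pictured with a minimal sketch. It is not the SPA grammar itself: the word list VAGUE_WORDS, the 0.2 threshold and the function names are assumptions made for illustration, and the three parts are taken as already segmented, exactly as listed above.

```python
import re

# Hypothetical list of pronouns and vague descriptors (not from the article).
VAGUE_WORDS = {"i", "this", "that", "it", "myself", "someone", "something", "here"}

def vague_density(part: str) -> float:
    """Share of the words in one part that are pronouns or vague descriptors."""
    words = re.findall(r"[a-z]+(?:['’][a-z]+)?", part.lower())
    if not words:
        return 0.0
    return sum(word in VAGUE_WORDS for word in words) / len(words)

def is_vague_package(parts: list[str], threshold: float = 0.2) -> bool:
    """True when every part of the package is dense in vague descriptors."""
    return bool(parts) and all(vague_density(p) >= threshold for p in parts)

PARTS = [
    "I really can't do this myself.",                         # condensed complaint
    "I can't get this to work without someone coming here.",  # expansion
    "I really don't know what to do with this.",              # recycled complaint
]

for part in PARTS:
    print(f"{vague_density(part):.2f}  {part}")
print(is_vague_package(PARTS))
```

A production grammar would of course have to find the part boundaries itself; the sketch only shows what "high usage of pronouns in close proximity" might look like as a measurable quantity.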

Second, the NL interface would use a heuristic procedure for identifying what the caller is actually requesting. Specifically, it would decipher what the caller embeds within his neatly ordered set of pronouns by going straight to the second of the three parts of the sequence package. What such a grammar would then uncover in the caller’s natural language dialog is his need to have “someone coming here.” The semantic meaning of that statement would be made transparent to the speech system since SPA would be built on top of conventional speech recognizers, whose probabilistic knowledge of conceptual relationships would dictate that in a call to a help-line desk the caller who refers to “someone coming here” is not offering an invitation for dinner, but rather is asking for a service call.
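A rough sketch of this second step follows. The phrase table HELP_DESK_PHRASES is a hypothetical stand-in for the conceptual knowledge that the article attributes to the underlying recognizer, and the function name and intent label are likewise invented for illustration.

```python
# Hypothetical phrase table standing in for the recognizer's knowledge of
# what certain phrases tend to mean in a help-desk call (not from the article).
HELP_DESK_PHRASES = {
    "someone coming here": "service_call",
    "send someone": "service_call",
}

def request_from_package(parts, phrase_table=HELP_DESK_PHRASES):
    """Go straight to the expansion (the second of three parts) and look
    for a phrase whose meaning is known in this call-center domain."""
    if len(parts) != 3:
        return None  # not the three-part package described in the text
    expansion = parts[1].lower()
    for phrase, intent in phrase_table.items():
        if phrase in expansion:
            return intent
    return None

PARTS = [
    "I really can't do this myself.",
    "I can't get this to work without someone coming here.",
    "I really don't know what to do with this.",
]

print(request_from_package(PARTS))
```

The point of the design is that the lookup is confined to the expansion, so the surrounding pronouns never have to be resolved on their own.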

Intelligent Audio Data Mining in Call Centers

Call centers that record thousands of hours of calls between customer service agents and callers for quality assurance purposes offer excellent data for testing the effectiveness of intelligent audio data mining NL application programs. Certainly the industry has reason to seek out more intelligent data mining: eager to improve customer relationship management, companies already do all they can to gather data about their customers, including the technical and service issues that confront them. At the same time, call centers have adopted their own ways of monitoring their agents’ skill in handling customer calls by looking, for example, at how many times these agents offer customers upgraded products.

Robert Weideman, vice president of global marketing for ScanSoft, summarizes the importance of audio data mining in call centers in this way: “Audio Mining…allows organizations to overcome barriers to productivity.” But while speed and accuracy in audio data mining are improving, thanks to the advanced technology of leading companies such as ScanSoft, audio data mining programs still need the addition of natural language intelligence to better understand what callers are trying to convey to call center agents, and to better understand those verbal interactions that go awry.

For example, in help-line calls, a customer may reject well-intentioned advice offered by agents; a caller may present his complaint unclearly and then become argumentative with the agent for not understanding him; or the caller may feel the agent has abruptly terminated the call, especially in those instances where customer service representatives do not automatically query the caller about additional service items.

Just as SPA adds NL intelligence to speech interfaces, it can add NL intelligence to audio data mining programs. One critical goal of intelligence-based audio data mining is to train call center agents how to better recognize some of the important subtleties in callers’ speech, such as early warning signs of caller frustration. By being able to recognize these warnings, call center agents can respond more quickly to customers’ concerns, averting the serious communication breakdowns that occur when customers feel agents do not understand what they are trying to say.

In these applications, too, SPA is particularly useful when callers fail to utter the expected key word(s) or phrase(s) pertaining to their problem. In fact, SPA is particularly well suited to intelligent audio mining in help-line dialog because such dialog is so highly scripted (human agents more or less read from a well formatted script) that sequence packages may be readily identified. To spot a sequence package – a unit of interaction – in help-line dialog, a grammar can begin simply by locating the predicted utterance components in the agent’s own scripted text, and then proceeding to identify the sequence package components in the caller’s utterances.
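The idea of anchoring on the agent's script can be sketched as follows. Everything here is an assumption made for illustration: the single script line in AGENT_SCRIPT, the exact-containment match (a real system would need something more forgiving), and the choice to collect only the caller turns immediately before and after the scripted line.

```python
import re

# Hypothetical script line; a real call center would supply its own script.
AGENT_SCRIPT = ["we do charge for support are you aware of that"]

def normalize(text):
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    return re.sub(r"\s+", " ", re.sub(r"[^a-z0-9 ]", " ", text.lower())).strip()

def find_candidate_packages(turns, script=AGENT_SCRIPT):
    """Locate scripted agent turns, then gather the neighboring caller turns."""
    packages = []
    for i, (speaker, text) in enumerate(turns):
        if speaker != "agent":
            continue
        spoken = normalize(text)
        if any(normalize(line) in spoken for line in script):
            before = [t for s, t in turns[max(0, i - 1):i] if s == "caller"]
            after = [t for s, t in turns[i + 1:i + 2] if s == "caller"]
            packages.append({"anchor": text, "before": before, "after": after})
    return packages

TURNS = [
    ("caller", "I'm wondering if I reinstall will I wipe out?"),
    ("agent", "Okay well look I could certainly have a technician look at the "
              "problem for you; we do charge for support are you aware of that?"),
    ("caller", "I'm just asking a question -- I'm just wondering whether or not "
               "I should uninstall Microsoft Word?"),
]

for package in find_candidate_packages(TURNS):
    print(package["before"], package["after"])
```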

Here is an example of a conversation between a caller and a call center agent at a computer software help desk. The caller presents her problem in a roundabout fashion, and soon becomes frustrated with the agent for not understanding what she is requesting of him. (The dialog extract below appeared in the published proceedings of the Symposium on Helplines, held in Aalborg, Denmark, in September 2000 (3).)

Caller: “I’ve installed Office ninety seven and…I was a bit stupid I went into uninstall and um pulled off a whole stack of items off the uninstall and it was a very silly thing to do so now when I start up my computer I get a screen um which says um a black- a black and white screen which says never delete this item it’s a message screen and every time I start up it comes up.”
[…dialog deleted…]
Caller: “I’m wondering if I reinstall will I wipe out?”

Agent: “Okay well look I could certainly have a technician look at the problem for you; we do charge for support are you aware of that?”

Caller: “I’m just asking a question -- I’m just wondering whether or not I should uninstall Microsoft Word?”

Notice that the core issue for the caller is whether or not she should uninstall Microsoft Word. However, before reducing this concern to a direct inquiry, she gives long-winded explanations in narrative form. The caller then becomes somewhat contentious, showing her frustration with the agent upon his attempts to arrange for a technician to examine the problem. Undoubtedly, the mention of cost might be vexing for any caller using a help-line. However, if the agent had been alerted to troublesome dialog sequences (before they escalate into defensive, argumentative assertions), some of the communication difficulties found here might have been avoided.

An SPA approach would break down the dialog above into four discrete sequence package parts, as follows:

1) A narrative, story-telling format for presenting the problem:

Caller: “I’ve installed Office ninety seven and…I was a bit stupid I went into uninstall and um pulled off a whole stack of items off the uninstall and it was a very silly thing to do so now when I start up my computer I get a screen um which says um a black- a black and white screen which says never delete this item it’s a message screen and every time I start up it comes up.”

2) An elliptical, unfocused format for presenting the problem:

Caller: “I’m wondering if I reinstall will I wipe out [my documents]…”

3) A direct offer of assistance and the associated payment conditions:

Agent: “Okay well look I could certainly have a technician look at the problem for you; we do charge for support are you aware of that?”

4) A recycled question, one that is more direct and succinct than its earlier version, marked by predicate phrases that may indicate frustration:

Caller: “I’m just asking a question -- I’m just wondering whether or not I should uninstall Microsoft Word?”

The kinds of data found in this sequence package could be easily overlooked by standard grammars that focus exclusively on finding subject-specific key words and phrases, such as language that would be relevant to the nuts and bolts of software installation. For example, the caller’s elliptical statements – “I’m wondering if I reinstall” (what?) “will I wipe out” (what?) – would not be recognized for what they are by standard natural language application programs. Since SPA is focused on the unit of interaction, rather than on purely lexical entries, the data that depict conversational sequence patterns would, however, be readily detected by a speech recognizer with the capability of spotting sequence packages.
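One crude way to picture the detection of such elliptical statements is to look for verbs that normally take an object but are followed by none. The sketch below is only that, a picture: the lists OBJECT_VERBS and NON_OBJECTS are hypothetical, and nothing in it is drawn from an actual SPA grammar.

```python
import re

# Hypothetical lists: verbs that normally take an object, and words that
# cannot serve as that object.
OBJECT_VERBS = [("reinstall",), ("uninstall",), ("wipe", "out")]
NON_OBJECTS = {"will", "if", "and", "or", "so", "then", "i"}

def missing_objects(utterance):
    """Return the object-taking verbs that are not followed by an object."""
    words = re.findall(r"[a-z]+(?:['’][a-z]+)?", utterance.lower())
    gaps = []
    for start in range(len(words)):
        for verb in OBJECT_VERBS:
            end = start + len(verb)
            if tuple(words[start:end]) != verb:
                continue
            # The verb is elliptical if the utterance ends right after it
            # or the next word cannot be its object.
            if end == len(words) or words[end] in NON_OBJECTS:
                gaps.append(" ".join(verb))
    return gaps

print(missing_objects("I'm wondering if I reinstall will I wipe out?"))
print(missing_objects("I'm just wondering whether or not I should "
                      "uninstall Microsoft Word?"))
```

On the caller's first question the sketch reports both "reinstall" and "wipe out" as lacking an object, while her recycled question, which names Microsoft Word, yields nothing.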

Note that in this example, the elliptical statements and the long-winded narratives are the harbingers of communication troubles that culminate in the caller’s recycling of her question, using predicate phrases that underscore her frustration. Intelligent audio data mining programs that use SPA can train agents to be aware of the sequence package features indicating possible communication problems before they escalate into an argument.
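As a simple illustration of how a data mining program might surface this warning sign, the sketch below flags caller turns that follow an agent turn and contain a phrase of the "I'm just…" kind. The marker list is an assumption built from the two phrases in the example; the article does not supply a list, and a real program would need a far richer description of the sequence package.

```python
# Hypothetical marker phrases, taken from the two "I'm just ..." phrases
# in the example dialog; the article itself gives no such list.
FRUSTRATION_MARKERS = ("i'm just asking", "i'm just wondering")

def frustration_warnings(turns):
    """Return the indexes of caller turns that directly follow an agent
    turn and contain one of the marker phrases."""
    flagged = []
    for i, (speaker, text) in enumerate(turns):
        if speaker != "caller" or i == 0 or turns[i - 1][0] != "agent":
            continue
        lowered = text.lower().replace("’", "'")
        if any(marker in lowered for marker in FRUSTRATION_MARKERS):
            flagged.append(i)
    return flagged

TURNS = [
    ("caller", "I'm wondering if I reinstall will I wipe out?"),
    ("agent", "Okay well look I could certainly have a technician look at the "
              "problem for you; we do charge for support are you aware of that?"),
    ("caller", "I'm just asking a question -- I'm just wondering whether or not "
               "I should uninstall Microsoft Word?"),
]

print(frustration_warnings(TURNS))
```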

It’s a reasonable forecast that natural language intelligence will play an increasing role in building voice-based applications for the call centers of the present and future. I also predict that Sequence Package Analysis will be part of that story, helping to make natural language application software programs “smart” enough to fulfill their promise to enterprise customers.

1. Stefanie Shriver and Roni Rosenfeld. “Keyword Selection, and the Universal Speech Interface Project.” AVIOS Speech Expo 2002.
2. Amy Neustein. “Using Sequence Package Analysis to Improve Natural Language Understanding.” International Journal of Speech Technology, 4(1) (March 2001).
3. Michael Emmison. “Calling for Help, Charging for Support: Some Features of the Introduction of Payment as a Topic in Calls to a Software Help-Line.” Symposium on Helplines, Aalborg, Denmark (September 8-10, 2000).

Amy Neustein, Ph.D. is the president and founder of Linguistic Technology Systems, a New York-area think tank for voice-enabled applications and services. She can be reached at lingtec@banet.net.


