Speech interaction system – how to increase its usability?
Fang Chen
Department of Computing Science,
Chalmers University of Technology, SE-412 96 Göteborg, Sweden
[email protected]
This paper discusses different issues related to the usability of
speech interaction systems, including the usability concept,
different design approaches, the design process and evaluation
questions. Usability is a very fuzzy concept, especially for
speech interaction systems: it is hard to measure and strongly
context dependent. The traditional user-centered design
approach may not be suitable for speech interaction system
design, since users might not know enough to see what the
technology can do. Usage-centered design may be a better
method, but there is as yet no comprehensive theory or
methodology for its design process and evaluation.
1. Introduction
Speech technology has made significant progress. Increased
automatic speech recognition (ASR) rates (99 to 100% accuracy
under laboratory conditions) with large vocabulary capacity,
more natural and robust speech recognition, humanized
synthetic voices, robust dialogue system design and language
understanding, together with increased computer speed and
memory capacity, have made speech-based interaction systems
possible for real-time applications [1-3]. The results of
Leduc's performance benchmarking survey [4] clearly indicate
that speech recognition is becoming a mass-market option.
When it comes to real-time applications, the usability of a
speech interaction system becomes important for user
acceptance. This poses new challenges for speech interface
design: the environment in which the system is used can be
dynamic, the user's vocal quality can be unstable, and speech
can vary. Much human factors research on speech interface
design was published in the late 1980s and early 1990s [5-7].
At that time ASR technology was very poor, and most studies
focused on feedback design and error correction. Their results
may not be valid for the design of present-day systems.
The continued interest in speech interaction design comes from
the commonly agreed advantage of speech as “a natural and
intuitive communication method” for human beings. But these
advantages do not automatically appear in a speech interaction
system unless the designers understand human cognitive
behavior, human needs and usability issues.
2. The usability concept
The usability requirements for speech interaction systems call
for new ways of measuring effectiveness, efficiency and
satisfaction, the three elements of usability evaluation. Of
these three, effectiveness and efficiency are closely related to
the functionality of the system and directly affect
satisfaction; they are therefore essential for the design.
Effectiveness can be measured in terms of the extent to which
a goal or task is achieved. Efficiency means the amount of
effort required to accomplish a goal. Gibbon et al. [8] have
described a large number of effectiveness and efficiency
measures.
Effectiveness can be understood as the acceptable performance
that should be achieved by a defined proportion of the user
population, over a specified range of tasks and in the specified
range of environments for which the system is designed.
Efficiency might be measured in terms of the time taken to
complete a task, the errors the user makes during performance,
and how much effort users must invest in learning and
understanding how the system works in order to be able to use
it. Acceptable performance should be achieved at acceptable
human cost in terms of fatigue, stress, frustration and
discomfort.
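As an illustration only, the effectiveness and efficiency measures described above can be computed from logged test sessions. The sketch assumes a hypothetical log record with three fields per task attempt (whether the task was completed, the time taken and the number of user errors); these names are not taken from Gibbon et al. [8]. Effectiveness is the proportion of completed attempts; efficiency is summarized as the mean completion time and the mean number of user errors.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskAttempt:
    """Hypothetical log record: one user attempting one task."""
    completed: bool     # was the task goal achieved?
    duration_s: float   # time taken, in seconds
    user_errors: int    # errors the user made during the attempt


def effectiveness(attempts: list[TaskAttempt]) -> float:
    """Proportion of attempts in which the task goal was achieved."""
    if not attempts:
        raise ValueError("no attempts logged")
    return sum(a.completed for a in attempts) / len(attempts)


def efficiency(attempts: list[TaskAttempt]) -> dict[str, float]:
    """Effort measures: mean time of completed tasks, mean errors per attempt."""
    done = [a for a in attempts if a.completed]
    if not done:
        raise ValueError("no completed attempts")
    return {
        "mean_time_s": mean(a.duration_s for a in done),
        "mean_errors": mean(a.user_errors for a in attempts),
    }


if __name__ == "__main__":
    # Toy log invented for illustration; not data from any study cited here.
    log = [
        TaskAttempt(True, 60.0, 1),
        TaskAttempt(True, 90.0, 0),
        TaskAttempt(False, 180.0, 4),
        TaskAttempt(True, 75.0, 1),
    ]
    print(effectiveness(log))  # 0.75
    print(efficiency(log))     # {'mean_time_s': 75.0, 'mean_errors': 1.5}
```

Time is averaged over completed attempts only, so that abandoned dialogues do not distort it; that is itself a design choice and should be stated when results are reported.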
Sometimes usability evaluations that measure these three
elements separately do not agree with each other. For example,
two dialogue systems for train timetable information (MIMI and
Tap&Talk) were compared in usability evaluations [9].
Effectiveness was measured by the number of dialogues that
were completed successfully, efficiency by task completion
time, and user satisfaction subjectively. The results showed
that MIMI was slightly better than Tap&Talk in effectiveness,
because recovering from recognition errors was easier with the
MIMI interface, while Tap&Talk was significantly better than
MIMI in efficiency: MIMI used spoken prompts and Tap&Talk
did not, so the latter was faster. Satisfaction was measured with
a questionnaire; most of the statements were judged about equal
for both interfaces, and some were in favor of the Tap&Talk
interface. Even though overall user satisfaction was not
significantly higher for the Tap&Talk interface, most people
preferred to use that system anyway.
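The point that separate measures can favour different systems can be made mechanical. The sketch below assumes each system is summarized by three hypothetical measure names with a stated direction (higher or lower is better) and reports which system wins on each measure; the numbers in the example are invented and are not the results reported in [9].

```python
# Direction of each measure (assumed): True if a higher value is better.
HIGHER_IS_BETTER = {
    "completion_rate": True,   # effectiveness
    "mean_time_s": False,      # efficiency: less time is better
    "satisfaction": True,      # e.g. mean questionnaire score
}


def winners(a: dict[str, float], b: dict[str, float]) -> dict[str, str]:
    """For each measure, say which system ('A', 'B' or 'tie') scores better."""
    result = {}
    for measure, higher_better in HIGHER_IS_BETTER.items():
        if a[measure] == b[measure]:
            result[measure] = "tie"
        elif (a[measure] > b[measure]) == higher_better:
            result[measure] = "A"
        else:
            result[measure] = "B"
    return result


def measures_agree(a: dict[str, float], b: dict[str, float]) -> bool:
    """True if no two measures favour different systems."""
    return len(set(winners(a, b).values()) - {"tie"}) <= 1


if __name__ == "__main__":
    # Toy summaries invented for illustration; NOT the results reported in [9].
    system_a = {"completion_rate": 0.90, "mean_time_s": 300.0, "satisfaction": 4.0}
    system_b = {"completion_rate": 0.85, "mean_time_s": 200.0, "satisfaction": 4.0}
    print(winners(system_a, system_b))
    # {'completion_rate': 'A', 'mean_time_s': 'B', 'satisfaction': 'tie'}
    print(measures_agree(system_a, system_b))  # False
```

A real comparison would also test whether each difference is statistically significant, as in the study above; this sketch compares raw values only.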
Hone and Graham [10] carried out a systematic study of user
satisfaction with speech input/output systems. They identified
six factors in user attitude: perceived system response
accuracy, likeability, cognitive demand, annoyance,
habitability and speed. For each factor, a set of questions can
be designed.
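One simple way to summarize such a questionnaire is to average the item ratings within each factor. The sketch assumes a 7-point agreement scale, a hypothetical item-to-factor mapping, and reverse scoring of negatively worded items so that a higher score always means a more favourable attitude; the actual items and scale of Hone and Graham's instrument [10] are not reproduced here.

```python
from statistics import mean

SCALE_MAX = 7  # assumed 1..7 agreement scale

# Hypothetical mapping: item id -> (factor, reverse_scored).
# Reverse-scored items are negatively worded, so 8 - rating is used.
ITEMS = {
    "q1": ("accuracy", False),
    "q2": ("accuracy", True),
    "q3": ("likeability", False),
    "q4": ("cognitive_demand", True),
    "q5": ("annoyance", True),
    "q6": ("habitability", False),
    "q7": ("speed", False),
}


def factor_scores(responses: dict[str, int]) -> dict[str, float]:
    """Mean score per factor; higher means a more favourable attitude."""
    by_factor: dict[str, list[int]] = {}
    for item, rating in responses.items():
        if not 1 <= rating <= SCALE_MAX:
            raise ValueError(f"{item}: rating {rating} outside 1..{SCALE_MAX}")
        factor, reverse = ITEMS[item]
        score = SCALE_MAX + 1 - rating if reverse else rating
        by_factor.setdefault(factor, []).append(score)
    return {factor: mean(scores) for factor, scores in by_factor.items()}


if __name__ == "__main__":
    print(factor_scores({"q1": 6, "q2": 2, "q3": 5, "q5": 7}))
```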
Satisfaction can be understood at different levels. It is human
nature never to be satisfied with what one has. Fulfilling the
functional requirements, i.e. being able to solve the problems,
is fundamental. The basic level is a feeling of comfort and
confidence when using the interface. The learnability and
flexibility of the system may affect the users' comfort [11]. As
soon as this need is fulfilled, people look for a higher level of
satisfaction, such as pleasure, excitement, fulfillment and
happiness. Figure 1 shows the pyramid of users' needs at
different levels.
Figure 1. The hierarchy of users' needs, suggested by Coe [12]. Levels named in the figure: self-actualization needs; belonging and love; psychological needs; basic needs. Labels shown: “How to do?”; feeling of being part of the user community; a sense of mastery and competence; user feels comfortable in learning; logical structure of the information presentation; accurate, reliable and predictable.
Usability is indeed a fuzzy concept. It is meaningful only
within a specific context: a system placed in one context will
probably display different usability characteristics when
placed in another [13]. Usability is a property of the
interaction among a product or system, a user, the task or set
of tasks, and the organization, society and environment in
which the system is used.
3. Usability in design process
To conceptualize usability in the design process, Don
Norman [14] and Ravden and Johnson [15] have proposed
several design principles:
• Visibility: the information presented should be clear, well
organized, unambiguous and easy to understand.
• Feedback: users should be given clear, informative
feedback on where they are in the system, what actions
they have taken, whether these actions have been
successful and what actions should be taken next.
• Consistency and compatibility: the way the system
looks and works should be consistent at all times and
compatible with user expectations.
• Explicitness: the way the system works and is
structured should be clear to users, so that they easily
know how to use it. It should show the relationship
between actions and their effects.
• Flexibility and constraints: the structure and the
information presentation should be sufficiently flexible
in terms of what users can do, to suit different user
needs and allow them to feel in control of the system. At
the same time, the system should restrict certain kinds
of user interaction that can take place at a given time.
• Error prevention and correction: the possibility of user
error should be minimized, errors should be detected
automatically, and those that do occur should be easy to
handle.
• User guidance and support: relevant guidance and
support that is easy to read and understand should be
provided to help the user understand the system.
There are very few studies on usability issues in the design of
speech interaction systems. Dybkjaer and Bernsen [16]
discussed various criteria for spoken language dialogue
system design. Their results match the principles above well:
• Learnability: the design should always take into account
the users' experience and knowledge of the system and how
quickly they can learn the interaction.
• Visibility: the system's output language should be
natural and should guide the users' input language so that
the input becomes manageable for the system.
• Explicitness: the system shall express its understanding
of the user's intention and provide information to the
user in a clear, unambiguous, correct and accurate way,
using language that is familiar to the user.
• Flexibility: multimodal interaction is always preferable,
but care must be taken to select an appropriate modality
for interaction in the specific task domain.
• Feedback: the user shall be informed of what is going
on in the system. The output voice shall have natural
intonation and prosody, with an appropriate speaking
rate.
• Error prevention and correction: error handling is
always important for speech interaction systems, as
errors may come from the system misrecognizing what
the user said, or from the users themselves.
• User guidance and support: interaction guidance is
necessary for users to feel in control during interaction.
A long and complicated “user manual” is not suitable for
first-time users.
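The explicitness, feedback and error-handling points above are often realized in spoken dialogue systems by choosing a confirmation strategy from the recognizer's confidence score. This is a commonly used technique, not one prescribed by Dybkjaer and Bernsen [16]; the thresholds and prompt wordings below are hypothetical and would have to be tuned for a given recognizer, task and user group.

```python
from enum import Enum


class Action(Enum):
    ACCEPT = "accept"                # act on the input directly
    IMPLICIT_CONFIRM = "implicit"    # act, but echo the understanding back
    EXPLICIT_CONFIRM = "explicit"    # ask a yes/no question before acting
    REJECT = "reject"                # ask the user to repeat or rephrase


# Hypothetical thresholds on a recognizer confidence score in [0, 1].
ACCEPT_AT = 0.90
IMPLICIT_AT = 0.70
EXPLICIT_AT = 0.40


def choose_action(confidence: float) -> Action:
    """Pick a confirmation strategy from the recognition confidence."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    if confidence >= ACCEPT_AT:
        return Action.ACCEPT
    if confidence >= IMPLICIT_AT:
        return Action.IMPLICIT_CONFIRM
    if confidence >= EXPLICIT_AT:
        return Action.EXPLICIT_CONFIRM
    return Action.REJECT


def feedback_prompt(action: Action, hypothesis: str) -> str:
    """Spoken feedback that makes the system's understanding explicit."""
    if action is Action.IMPLICIT_CONFIRM:
        return f"OK, {hypothesis}."
    if action is Action.EXPLICIT_CONFIRM:
        return f"Did you say {hypothesis}?"
    if action is Action.REJECT:
        return "Sorry, I did not understand. Please say that again."
    return ""  # ACCEPT: proceed without a confirmation turn


if __name__ == "__main__":
    for conf in (0.95, 0.75, 0.5, 0.1):
        act = choose_action(conf)
        print(conf, act.value, feedback_prompt(act, "Göteborg"))
```

Raising the thresholds costs extra confirmation turns (lower efficiency) but lets fewer misrecognitions pass unnoticed (higher effectiveness), which is the kind of trade-off between usability measures discussed in Section 2.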
Leduc, Dougherty et al. [4] have specifically pointed out
the importance of consistency in design. Consistency has two
aspects: consistency with previous usage and internal
consistency. The way a task is handled by the speech system
should match the users' previous experience of handling the
task with other systems, and similar tasks should be carried
out in a similar manner, using identical terms, throughout the
speech system.
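Internal consistency of terms can be checked mechanically once the command vocabulary of each dialogue state is written down. The sketch assumes a hypothetical specification mapping each dialogue state to its spoken commands and the actions they trigger, and reports actions reached by different words in different states and words that mean different actions in different states.

```python
from collections import defaultdict

# Hypothetical specification: dialogue state -> {spoken command: action}.
Spec = dict[str, dict[str, str]]


def inconsistencies(spec: Spec) -> dict[str, dict[str, set[str]]]:
    """Find terms used inconsistently across dialogue states."""
    words_per_action: dict[str, set[str]] = defaultdict(set)
    actions_per_word: dict[str, set[str]] = defaultdict(set)
    for commands in spec.values():
        for word, action in commands.items():
            words_per_action[action].add(word)
            actions_per_word[word].add(action)
    return {
        # one action, several words: candidates for a single identical term
        "synonyms": {a: w for a, w in words_per_action.items() if len(w) > 1},
        # one word, several actions: the same term means different things
        "homonyms": {w: a for w, a in actions_per_word.items() if len(a) > 1},
    }


if __name__ == "__main__":
    spec = {
        "main_menu": {"cancel": "abort", "help": "help"},
        "booking": {"stop": "abort", "help": "help"},
        "payment": {"back": "previous_step"},
        "review": {"back": "abort"},
    }
    print(inconsistencies(spec))
```

Accepting several words for one action is not wrong in itself; the report only lists candidates for a designer to review.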
It is necessary to adapt the usability design principles to
different speech interaction systems. The details of these
design principles for in-vehicle information systems will
differ somewhat from those for a spoken dialogue system
placed inside a house, since in a mobile environment the
user's attention must be kept on the road. The designer must
consider not only the information system itself but also
driving safety and the effects of stress.
4. Problems with the user-centred approach to design
The user-centered design (UCD) approach can enhance the
usability of products. In this approach, potential users are
involved in the entire design process from the very beginning.
The design process is more or less driven by the users and
focuses on user experience. The typical design process is
shown in Figure 2.
In practice, very few organizations manage to implement the
UCD process. The problems associated with UCD arise from
issues such as [13]: user issues, organizational commitment,
developer skills and resource constraints. User experience and
knowledge, user expectations, user contribution and
agreement, and user diversity are the factors that make user
involvement difficult. Users may expect the new system simply
to be an improved version of the old one; users may not be able
to step back from their daily practices to see how technology
can change the way they work; and/or they may not be familiar
with the design methods or the technology used, and may
simply feel overawed by the design process (which leads them
to feel unqualified to comment). It is hard to get users to
contribute to the quality of the design solution. At the same
time, it is not easy to collect all the information about the
users.
Figure 2. The interdependence of user-centred design activities (developed from ISO 13407). Activities shown:
1. Plan the human-centred process: design philosophy; identify design team and users; success criteria.
2. Specify the context of use: understand the characteristics of users, tasks and organization; task analysis.
3. Specify the user and organizational requirements: allocation of tasks among users; functional requirements; performance criteria; usability criteria.
4. Produce design solutions: collect knowledge for design; concrete design solution; prototypes and user tests.
5. Evaluate designs against user requirements: get feedback for design; assess the achievement of user and organizational objectives.
6. Context of evaluation: usability evaluation.
The process is iterated until the design meets all the requirements.
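The iteration in Figure 2 stops when the design meets all the requirements. A minimal sketch of that stopping test, assuming each success criterion (step 1) and usability criterion (step 3) is written as a measurable target with a direction; the measure names, targets and the `evaluate` callback are hypothetical stand-ins for building a prototype and testing it with users.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Criterion:
    """A measurable target, e.g. from the success or usability criteria."""
    measure: str
    target: float
    higher_is_better: bool = True

    def met_by(self, value: float) -> bool:
        return value >= self.target if self.higher_is_better else value <= self.target


def unmet(criteria: list[Criterion], measured: dict[str, float]) -> list[str]:
    """Measures whose criterion is not met; a missing measurement counts as unmet."""
    return [c.measure for c in criteria
            if c.measure not in measured or not c.met_by(measured[c.measure])]


def iterate_design(evaluate: Callable[[int], dict[str, float]],
                   criteria: list[Criterion],
                   max_rounds: int = 5) -> tuple[int, list[str]]:
    """Repeat produce-and-evaluate rounds until every criterion is met.

    Returns (rounds used, criteria still unmet after the last round).
    """
    failing = [c.measure for c in criteria]
    for round_no in range(1, max_rounds + 1):
        failing = unmet(criteria, evaluate(round_no))
        if not failing:
            return round_no, []
    return max_rounds, failing


if __name__ == "__main__":
    criteria = [Criterion("completion_rate", 0.9),
                Criterion("mean_time_s", 120.0, higher_is_better=False)]

    def fake_user_test(r: int) -> dict[str, float]:
        # Invented, steadily improving results; not real measurements.
        return {"completion_rate": 0.5 + 0.25 * r, "mean_time_s": 200.0 - 40.0 * r}

    print(iterate_design(fake_user_test, criteria))  # (2, [])
```

The round limit reflects the resource constraints mentioned above: a project may have to stop before every criterion is met, and the function then reports which ones are still failing.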
A speech interaction system is not simply a matter of replacing
keyboard input with speech input, which has unfortunately often
been the case in the history of speech technology applications.
Many studies have compared keyboard input with speech input
in different contexts, and the results for using speech as an
input modality have sometimes been positive and sometimes
negative.
A speech interaction system should take advantage of the
naturalness and intuitiveness of human speech communication,
so the interaction between human and system should be quite
different from that of traditional human-computer interaction
systems based on manual input and visual output. Even though
speech communication is part of people's daily lives, this does
not mean that users know what speech technology can do or
what the interface should look like, as their past experience
with human-computer interaction systems may not apply to
speech interaction systems.
The UCD process might not be the best choice for speech
interaction system design. Instead of focusing on the user,
Rakers [18] suggests focusing on the roles, goals and
responsibilities people have. This idea leads to the
usage-centred approach to design. The usage-centred approach
focuses on the interaction between humans and the system, its
environment and its social organization. Human behavior is
goal oriented, and events that happen in the living and working
environment have meaning to the user. The notion of meaning,
the constraints from technology and environment, and the goal
are closely related to the specific context of the task that the
user performs [19]. In this approach, potential users need not
be involved in the design process. What the technology can do
should be explored by technical experts; the usability of the
system should be essential for the design, while the user's
high-level fulfillment (as shown in Figure 1) shall be the final
goal.
The usage-centred approach to speech interaction system
design is a new concept and lacks comprehensive design
theories and methodologies for the design process, problem
analysis and evaluation.
5. Problems with user evaluation
In the usage-centered approach, some user tests should be
carried out during the design process. The purpose of such tests
is not to obtain users' opinions of the design, but to better
understand the users and the interaction between the users and
the product, so as to take the most benefit from the developed
technology and increase usability.
There is no applicable usability evaluation theory for
speech interaction systems. In general, any evaluation theory
should be able to handle at least the following problems [20]:
• What are the characteristics of overall system
performance? How should they be measured? At what
level should measurements be taken?
• How should we choose the test persons? Should they be
naïve users or experts?
• How much training is required to reach stable
performance, so that we can be sure we are evaluating the
properties of the interface and not just the learning and
adaptive behavior of the operators?
• How detailed and how high in fidelity should the
scenarios be? What should their properties be? Which
properties of the interface should be used? How can
context be specified?
• If general principles are being evaluated, what is the
minimal set of applications needed in the evaluation?
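For the training question, one pragmatic heuristic (an assumption here, not part of Moray's discussion [20]) is to continue training until a participant's task times stop improving appreciably, e.g. until the relative improvement between consecutive blocks of trials falls below a chosen tolerance. The block size and tolerance below are hypothetical.

```python
from statistics import mean
from typing import Optional


def trials_to_stability(times: list[float], block: int = 3,
                        tolerance: float = 0.05) -> Optional[int]:
    """Number of training trials after which task times have levelled off.

    Trials are grouped into consecutive full blocks. Performance counts as
    stable at the first block whose mean time improves on the previous
    block's mean by less than `tolerance` (a fraction). Returns None if no
    such block exists yet, i.e. more training is needed.
    """
    if block < 1:
        raise ValueError("block must be at least 1")
    means = [mean(times[i:i + block])
             for i in range(0, len(times) - block + 1, block)]
    for k in range(1, len(means)):
        improvement = (means[k - 1] - means[k]) / means[k - 1]
        if improvement < tolerance:
            return k * block  # trials completed before the stable block began
    return None


if __name__ == "__main__":
    # Invented task times (seconds) for one participant, in trial order.
    times = [100, 90, 80, 60, 55, 50, 48, 47, 46, 46, 45, 45]
    print(trials_to_stability(times))  # 9
```

Block means smooth out trial-to-trial noise; with real data the tolerance should be chosen relative to the variability observed between participants.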
The questions above relate more or less to the evaluation of
the basic usability requirements. How can the high-level
fulfillment of needs and satisfaction be measured? Since any
evaluation is context driven, is it possible to build a general
theory to guide usability evaluation for different speech
interaction systems? Further studies need to be carried out in
this area.
6. Problem with design guidelines
Usability design requires the integration of multiple
disciplines, such as speech technology, computer technology,
knowledge of the application domain, human cognition and
social/organizational knowledge. It is impossible for one
designer to have all this knowledge. Therefore many design
“guidelines” have been published in user interface design
books and journal papers [21-23]. It was believed that design
“guidelines” would be helpful and useful for engineers in their
design work. Many (if not all) guidelines vary in the extent to
which they are derived from specific research findings. Their
scope is rarely made explicit, and it is left to the designer to
judge the applicability of a guideline to a particular user
interface and to apply it accordingly. The body of guidelines is
incomplete: many design issues are simply not covered by
guidelines [24].
Guidelines from different sources may differ in detail, may
not state their application conditions clearly, and may even be
contradictory. People try to formulate guidelines as abstract
statements to show their “external validity”, or to give them a
high “concept level”, while any design is context dependent.
How can we make sure that the engineer uses the “right”
guideline? For example, consider this guideline for speech
interface design provided by Baber [23]:
• Match the type of work the recognizer is intended to
perform with the characteristics of the available
technology.
There are many problems with such “guidelines”: how should
the technology be matched to the type of work? What are the
criteria? How can the match be measured?
Life and Long [24] have suggested that guidelines would be
very handy if they could be presented as “IF (condition) THEN
(system performance consequence) BECAUSE (interaction
model constraint), HENCE (guideline, expressed as a system
design prescription)”. This is almost an impossible dream,
because nobody can cover all the possible conditions in which
a speech interaction system may be applied. Many application
conditions are unpredictable at present, and application
contexts may change with the development of technology and
people's needs. At the same time, there are almost unlimited
consequences and constraints that one can identify for a given
application context. Combining these three entities, each with
an enormous amount of variability, could yield millions of
detailed guidelines; designers would find it difficult to locate
the proper ones, and there would still be a danger of not
covering the situation the designer is working on.
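Life and Long's format can be written down as a record type, which also makes the combinatorial argument concrete: a guideline base indexed by condition, consequence and constraint has one potential entry per combination. The field names, the example guideline and the count of 100 distinguishable values per entity are hypothetical, chosen only to show the arithmetic.

```python
from dataclasses import dataclass
from math import prod


@dataclass(frozen=True)
class Guideline:
    condition: str     # IF ...
    consequence: str   # THEN (system performance consequence)
    constraint: str    # BECAUSE (interaction model constraint)
    prescription: str  # HENCE (system design prescription)

    def render(self) -> str:
        return (f"IF {self.condition} THEN {self.consequence} "
                f"BECAUSE {self.constraint}, HENCE {self.prescription}")


def applicable(guidelines: list[Guideline], condition: str) -> list[Guideline]:
    """Guidelines whose IF-part matches the designer's situation exactly."""
    return [g for g in guidelines if g.condition == condition]


def potential_entries(n_conditions: int, n_consequences: int,
                      n_constraints: int) -> int:
    """One potential guideline per combination of the three entities."""
    return prod((n_conditions, n_consequences, n_constraints))


if __name__ == "__main__":
    g = Guideline(  # invented example, not taken from [24]
        condition="in-vehicle use with high background noise",
        consequence="recognition errors increase",
        constraint="the driver's visual attention must stay on the road",
        prescription="confirm critical commands by speech rather than on a display",
    )
    base = [g]
    print(len(applicable(base, "in-vehicle use with high background noise")))  # 1
    print(len(applicable(base, "in-vehicle use with moderate noise")))          # 0
    print(potential_entries(100, 100, 100))  # 1000000
```

The empty result for a slightly different condition illustrates the coverage problem: exact matching finds nothing, and any looser matching rule reintroduces the designer's judgment about applicability.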
7. Discussion and conclusion
Usability is very much context dependent. The traditional
user-centered design approach may not be suitable for
speech interaction system design, owing to users' limited
knowledge and understanding of what the technology can do.
Usage-centered design focuses on the roles, goals and
responsibilities people have. Fulfilling high-level needs is
the final goal of design, while the usability of the system is
the fundamental requirement. Usage-centered design is a new
concept, and there is as yet no comprehensive theory or
methodology for its design process and evaluation.
Design “guidelines” are not a good solution for helping
designers increase the usability of a system. The integration
of knowledge from multiple disciplines is important for design,
and it is hard to find shortcuts.
8. References
[1] Steeneken, H. J. M., "Potentials of speech and language technology systems for military use: an application and technology oriented survey," NATO, Defence Research Group AC/243(Panel 3)TR/21, 1996.
[2] Weinstein, C. J., "Opportunities for advanced speech processing in military computer-based systems," Proceedings of the IEEE, vol. 79, 1991, pp. 1626-1641.
[3] Weinstein, C. J., "Military and government applications of human-machine communication by voice," Proc. Natl. Acad. Sci. USA, vol. 92, 1995, pp. 10011-10016.
[4] Leduc, N., Dougherty, M., Ankaitis, V., "Measuring the performance of speech applications: a user-centered approach," in Universal Access in HCI: Towards an Information Society for All, C. Stephanidis, Ed., Lawrence Erlbaum Associates, 2001, pp. 372-376.
[5] Jones, D., Hapeshi, K., Frankish, C., "Human factors and the problems of evaluation in the design of speech systems interfaces," People and Computers III: Proceedings of the Third Conference of the British Computer Society Human-Computer Interaction Specialist Group, 1987, pp. 41-49.
[6] Jones, D. M., "Automatic speech recognition in practice," Behav. & Inf. Tech., vol. 11, 1992, pp. 109-122.
[7] Baber, C., Hone, K. S., "Modelling error recovery and repair in automatic speech recognition," International Journal of Man-Machine Studies, vol. 39, 1993, pp. 495-515.
[8] Gibbon, D., "Handbook of multimodal and spoken dialogue systems: resources, terminology and product evaluation," in The Kluwer International Series in Engineering and Computer Science, Kluwer Academic, 2000.
[9] Sturm, J., Bakx, I., Cranen, B., Terken, J., "Comparing the usability of a user driven and a mixed initiative multimodal dialogue system for train timetable information," Eurospeech 2003, 2003, pp. 2245.
[10] Hone, K. S., Graham, R., "Subjective assessment of speech-system interface usability," Eurospeech 2001, 2001.
[11] Stanton, N., Human Factors in Consumer Products, Taylor & Francis, 1998.
[12] Coe, M., Human Factors for Technical Communicators, John Wiley & Sons, 1996.
[13] Smith, A., Human-Computer Factors: A Study of Users and Information Systems, The McGraw-Hill Companies, 1997.
[14] Norman, D., The Design of Everyday Things, New York, Basic Books, 1988.
[15] Ravden, S. J., Johnson, G. I., Evaluating Usability of Human-Computer Interfaces: A Practical Method, Chichester, Ellis Horwood, 1989.
[16] Dybkjaer, L., Bernsen, N. O., "Usability evaluation in spoken language dialogue system," Proceedings of the Workshop on Evaluation for Language and Dialogue Systems, Association for Computational Linguistics 39th Annual Meeting and 10th Conference of the European Chapter (ACL/EACL) 2001, 2001, pp. 9-18.
[17] Eason, K. D., "User-centred design: for users or by users?," Ergonomics, vol. 38, 1995, pp. 1667-1673.
[18] Rakers, G., "Interaction design process," in User Interface Design for Electronic Appliances, B. T. K. Baumann, Ed., London and New York, Taylor & Francis, 2001, pp. 7-47.
[19] Flach, J. M., Tanabe, F., Monta, K., Vicente, K. J., Rasmussen, J., "An ecological approach to interface design," Proceedings of the Human Factors and Ergonomics Society 42nd Annual Meeting, 1998, pp. 295-299.
[20] Moray, N., "Advanced displays can be hazardous: the problem of evaluation," pp. 59-62.
[21] Jones, D., Hapeshi, K., Frankish, C., "Design guidelines for speech recognition interfaces," Applied Ergonomics, vol. 20, 1989, pp. 47-52.
[22] Baber, C., "Automatic speech recognition in adverse environments," Human Factors, vol. 38, 1996, pp. 142-155.
[23] Baber, C., Noyes, J., "Speech control," in User Interface Design for Electronic Appliances, B. T. K. Baumann, Ed., London and New York, Taylor & Francis, 2001, pp. 190-208.
[24] Life, M. A., Long, J. B., "Providing human factors knowledge to non-specialists: a structured method for the evaluation of future speech interfaces," Ergonomics, vol. 37, 1994, pp. 1801-1842.