
Proc. of the 8th POSTECH-KYUTECH Joint Workshop on Neuroinformatics, Kitakyushu, Japan, 2008
Nonnegative Tensor Factorization for EEG Pattern Classification
Seungjin Choi
Department of Computer Science, POSTECH
[email protected]
Learning a fruitful representation from data is one of the fundamental problems in machine learning and
pattern recognition. Various methods have been developed, including factor analysis, principal component
analysis (PCA), independent component analysis (ICA), manifold learning, and so on. Among these,
nonnegative matrix factorization (NMF) has recently drawn extensive attention, since promising results
have been reported in handling nonnegative data such as documents, images, and audio spectrograms. NMF
seeks a decomposition of a nonnegative data matrix into a product of two factor matrices (a basis matrix
and an encoding matrix) such that both factors are forced to be nonnegative. In this talk, I will begin
with the useful behavior of NMF in EEG pattern classification, which plays a critical role in noninvasive
brain-computer interfaces (BCI).
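As a concrete illustration of the factorization described above (a minimal sketch, not the speaker's implementation), the classic Lee-Seung multiplicative updates can be written in a few lines; the names X, W, H are generic:

```python
import numpy as np

def nmf(X, r, n_iter=200, eps=1e-9):
    """Factor a nonnegative matrix X (m x n) into W (m x r) @ H (r x n),
    with W, H >= 0, using Lee-Seung multiplicative updates."""
    m, n = X.shape
    rng = np.random.default_rng(0)
    W = rng.random((m, r)) + eps   # nonnegative basis matrix
    H = rng.random((r, n)) + eps   # nonnegative encoding matrix
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)   # update encodings
        W *= (X @ H.T) / (W @ H @ H.T + eps)   # update basis
    return W, H

# Toy usage on random nonnegative data
X = np.abs(np.random.default_rng(1).random((20, 30)))
W, H = nmf(X, r=5)
```

Because the updates are multiplicative, nonnegativity of W and H is preserved automatically at every iteration.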
Next, I will introduce the multiway extension of NMF, known as “nonnegative tensor factorization” (NTF), and
will emphasize its useful behavior in EEG pattern classification. A tensor is nothing but a ‘multiway array’,
generalizing vectors and matrices in order to accommodate higher-order representations. For instance, a
vector is a 1-way tensor, a matrix is a 2-way tensor, a cube is a 3-way tensor, and so on. Throughout this talk, I
will stress why tensors are useful for learning fruitful representations, compared to existing matrix-based methods.
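To make the multiway extension concrete, here is a hedged sketch of a nonnegative CP (PARAFAC) decomposition of a 3-way tensor via multiplicative updates on the mode unfoldings; this is one standard formulation of NTF, not necessarily the exact algorithm of the talk:

```python
import numpy as np

def khatri_rao(U, V):
    # Column-wise Kronecker product; rows indexed by (row_of_U, row_of_V).
    return np.einsum('ir,jr->ijr', U, V).reshape(U.shape[0] * V.shape[0], -1)

def ntf(X, r, n_iter=200, eps=1e-9):
    """Nonnegative CP decomposition: X[i,j,k] ≈ sum_r A[i,r] B[j,r] C[k,r]."""
    I, J, K = X.shape
    rng = np.random.default_rng(0)
    A, B, C = (rng.random((d, r)) + eps for d in (I, J, K))
    for _ in range(n_iter):
        # Mode-0 unfolding: rows i, columns (j,k)
        M, P = X.reshape(I, J * K), khatri_rao(B, C)
        A *= (M @ P) / (A @ (P.T @ P) + eps)
        # Mode-1 unfolding: rows j, columns (i,k)
        M, P = X.transpose(1, 0, 2).reshape(J, I * K), khatri_rao(A, C)
        B *= (M @ P) / (B @ (P.T @ P) + eps)
        # Mode-2 unfolding: rows k, columns (i,j)
        M, P = X.transpose(2, 0, 1).reshape(K, I * J), khatri_rao(A, B)
        C *= (M @ P) / (C @ (P.T @ P) + eps)
    return A, B, C

# Toy usage: factor an exactly rank-3 nonnegative tensor
rng = np.random.default_rng(2)
X = np.einsum('ir,jr,kr->ijk', rng.random((6, 3)), rng.random((7, 3)), rng.random((8, 3)))
A, B, C = ntf(X, r=3)
```

Each EEG mode (e.g., channel × frequency × time) gets its own nonnegative factor matrix, which is exactly what makes the tensor view attractive compared with flattening the data into a single matrix.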
Sequence learning: a network model of the entorhinal cortex layer II with
hippocampal-entorhinal loop circuit
Hatsuo Hayashi
Department of Brain Science and Engineering,
Graduate School of Life Science and Systems Engineering,
Kyushu Institute of Technology
Email: [email protected]
The discovery of place cells, which encode place information in the phase of the theta rhythm, revealed that the hippocampus
has the ability to represent space. The firing phase with respect to the theta rhythm advances within a theta cycle as
the animal traverses a place field [1, 2]. Although phase precession was first found in the hippocampal CA3 and CA1,
it has also been found in the dentate gyrus [2], and more recently in grid cells in the medial entorhinal cortex [3]. Accordingly, a model
of the entorhinal-hippocampal system accounting for the phase precession has been proposed [4]. This model is rather
abstract: the several layers of the hippocampus and the entorhinal cortex were modeled by simple oscillator units
to describe local field theta rhythms. Another model of the entorhinal cortex layer II network that has the
entorhinal-hippocampal loop circuit has also been proposed [5]. This model consisted of conductance-based
multi-compartmental models of the stellate cell and the inhibitory interneuron, and the entorhinal-hippocampal loop
circuit was modeled by delay lines whose signal transmission delay was 20 ms, based on physiological data. In this
lecture, we will review this model, which has mechanisms of theta rhythm generation and theta phase coding, and show
how successfully sequence learning is achieved.
We also tested how the shape of the STDP rule affects the performance of the present model amid background noise
[6]. STDP rules observed in the hippocampus and the entorhinal cortex are classified into two types: temporally
symmetric and asymmetric STDP rules. The symmetric STDP rule that has LTD windows on both sides of the central
LTP window prevented irrelevant enhancement of loop connections caused by noise in cooperation with loop
connections having a larger signal transmission delay and the theta rhythm pacing the activity of the stellate cells.
However, the asymmetric STDP rule, which has no LTD window in the range of positive spike timing, did not prevent
such irrelevant enhancement, and the memory pattern was blurred by noise.
Important elements of the present model for sequence learning are (1) the symmetric STDP rule, (2) loop connections
with a large signal transmission delay, and (3) the theta rhythm pacing the activity of the stellate cells. These elements also
contribute to robust sequence learning amid background noise. Above all, the LTD window in the range of positive
spike timing is important for preventing the influence of noise as sequence learning progresses.
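The distinction between the two rule shapes can be made concrete as spike-timing kernels Δw(Δt), where Δt = t_post − t_pre; the amplitudes and time constants below are illustrative assumptions, not the parameters of the model reviewed here:

```python
import numpy as np

def asymmetric_stdp(dt, a_ltp=1.0, a_ltd=0.5, tau=20.0):
    """Classic asymmetric rule: LTP for positive spike timing (post after pre),
    LTD only for negative timing -- no LTD window at positive dt."""
    return np.where(dt >= 0, a_ltp * np.exp(-dt / tau), -a_ltd * np.exp(dt / tau))

def symmetric_stdp(dt, a_ltp=1.0, a_ltd=0.3, tau_p=10.0, tau_d=40.0):
    """Symmetric rule: a central LTP window flanked by LTD windows on both
    sides, so large |dt| pairings (e.g., noise-driven) are depressed."""
    return a_ltp * np.exp(-(dt / tau_p) ** 2) - a_ltd * np.exp(-(dt / tau_d) ** 2)
```

With these kernels, a noise-driven pairing at, say, Δt = +50 ms is depressed by the symmetric rule but still potentiated by the asymmetric rule, which is the mechanism the abstract attributes to the suppression of irrelevant enhancement.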
[1] O’Keefe J and Recce ML, “Phase relationship between hippocampal place units and the EEG theta rhythm,”
Hippocampus, vol. 3, pp. 317-330, 1993
[2] Skaggs WE, McNaughton BL, Wilson MA, and Barnes CA, “Theta phase precession in hippocampal neuronal
populations and the compression of temporal sequence,” Hippocampus, vol. 6, pp. 149-172, 1996
[3] Hafting T, Fyhn M, Bonnevie T, Moser M-B, and Moser E, “Hippocampus-independent phase precession in
entorhinal grid cells,” Nature, vol. 453, pp. 1248-1252, 2008
[4] Yamaguchi Y, “A theory of hippocampal memory based on theta phase precession,” Biol. Cybern., vol. 89, pp.
1-9, 2003
[5] Igarashi J, Hayashi H, and Tateno K, “Theta phase coding in a network model of the entorhinal cortex layer II
with entorhinal-hippocampal loop connections,” Cogn. Neurodyn., vol. 1, pp. 169-184, 2007
[6] Igarashi J and Hayashi H, “Theta phase coding and suppression of irrelevant plastic change through STDP in the
entorhino-hippocampal system amid background noise,” in “Advances in Cognitive Neurodynamics,” (Eds)
Wang R, Gu F, and Shen E, Springer, in press, 2008
Keiichi Kaneto, Hirotaka Suematsu, and Kentaro Yamato
Department of Biological Function and Engineering, Kyushu Institute of Technology,
Email: [email protected]
Present robots are heavy and noisy because
they are driven by motors made of copper and iron. In
order to build lightweight, human-friendly robots,
artificial muscles or soft actuators are in demand to
replace motors. Artificial muscles are required not only
to drive robots but also to mimic the motions of animals such as
snakes, fish, and insects for medical equipment and a
variety of other applications. Research on soft actuators [1-7] began 20 years ago, based on ionic polymers
(membrane), polymer gels, dielectric elastic polymers
and conducting polymers. Among them, conducting
polymers are the most promising material from the
viewpoints of low operating voltage, large strain, and
stress [8]. Conducting polymers are actuated, or
deformed, by electrochemical oxidation and reduction.
The deformation results from the insertion of bulky ions
as well as from the change of polymer conformation due to
the delocalization of π-electrons upon oxidation. This
deformation is called the electrochemomechanical
strain (ECMS).
Among the various conducting polymers, polypyrrole
(PPy)-based ECMS is currently studied, since PPy
shows excellent performance in strain, stress, and cycle
life compared with polyaniline [9] and
polythiophene [10]. Efforts have mostly been devoted to
improving the strain, stress, and response time [3,4];
however, the characteristics of ECMS under tensile loads
have not been investigated from the viewpoints of
training, fatigue, and aging.
Muscles can be strengthened by training with
heavy tasks, though they suffer fatigue. One might consider
training of an artificial muscle impossible; however, we
have found that training of an artificial muscle
fabricated with conducting polymers [11] is feasible.
The training effect was observed as an increased strain
after the experience of a heavier tensile stress. This
results from the relaxation of an anisotropic deformation,
which was induced along the tensile direction by the
realignment of polymer molecules during creep
under higher stresses.
In this paper, the ECMS responses of cation-driven
PPy actuators [11-14] operated under various tensile
stresses are described. After the application of large
tensile stresses, the recovery of strain due to creep
was studied to demonstrate the effects of training and
fatigue in artificial muscles. The results are discussed
in terms of a model of relaxation of the anisotropic deformation
induced by high tensile stress. The
energy conversion efficiency from electrical input
energy to mechanical output work as a function of
tensile stress is also reported.
Polypyrrole films were electrodeposited on a Ti
plate in an aqueous electrolyte solution of 0.15 M pyrrole
and 0.25 M DBS acid at a constant current of 1
mA/cm2 for 1500 s against a Ni counter electrode, as
described in previous papers [11,14]. The thickness and
conductivity of a typical film (named PPy/DBS) were
approximately 18 μm and 30-50 S/cm, respectively. The
Young's modulus of the film was 140 MPa in the oxidized
state and 70 MPa in the reduced state.
Cyclic voltammograms (CVs) of the PPy/DBS film in 1
M NaCl aqueous solution are shown in Fig. 1 for various
tensile stresses. The film was cycled 3 times at each
tensile stress, stepping sequentially from 0 to 5 MPa in
increments of 1 MPa, and the CVs obtained on the 3rd cycle are shown
in Fig. 1. The film expanded on application of negative voltage
(reduction) and contracted on application of positive
voltage (oxidation), as observed in cation-driven
films [11,14]. It was found that, with increasing
tensile stress, a shoulder at around 0 V in the oxidation
cycle grew and shifted to higher potential. Similarly,
the peak at around -400 mV in the reduction cycle also
shifted to higher potential. It is interesting to note
that after removal of the 5 MPa tensile stress, the CV at 0
MPa traced almost the same curve as that at 5 MPa, as shown in
Fig. 1 by the dotted line (0 after training).
Fig. 1 CV of PPy/DBS film at various tensile stresses
from 0 to 5 MPa in aqueous NaCl electrolytes.
The curves in Fig. 2 show the deformation of the film
during the CV measurement. The deformation is the
increment of the length of the 10 mm film upon cycling. It
is clearly observed that the film was elongated by each
electrochemical reduction and by creep, which was
enhanced at higher tensile stresses. However, after
removal of the 5 MPa stress, the film contracted upon cycling,
as a result of the relaxation of the anisotropic strain
induced during creep.
4. Summary
The cyclic voltammetry and electrochemical
actuation of PPy/DBS films have been studied as a
function of tensile stress. It was found that the stroke
of contraction was increased by training with a larger
tensile stress (training effect). This result is explained
by an anisotropic deformation of the film due to uniaxial
stress, or by creep, together with a polymer conformation memory
effect. The conversion efficiency was found to be lower
than 0.07% of the input electrical energy. It was also
found that the degradation of the films was significantly
accelerated under larger tensile stresses. It is interesting
to compare these results for conducting polymer artificial
muscles with natural muscles.
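For orientation, the order of magnitude of such an efficiency figure can be checked with a back-of-the-envelope calculation; every number below is an assumed, illustrative value, not a measurement from this paper:

```python
# Illustrative estimate of the electrical-to-mechanical conversion
# efficiency of an ECMS actuator film (all values assumed).
stress = 5e6                       # applied tensile stress, Pa (5 MPa)
strain = 0.02                      # contraction strain (2 %)
volume = 10e-3 * 1e-3 * 18e-6      # film 10 mm x 1 mm x 18 um, in m^3
work_mech = stress * strain * volume   # J, work done against the load

charge = 0.05                      # consumed charge per cycle, C (assumed)
voltage = 0.8                      # mean operating voltage, V (assumed)
energy_elec = charge * voltage     # J, electrical input energy

efficiency = work_mech / energy_elec   # dimensionless fraction
```

With these assumed values the efficiency comes out in the 0.01-0.1 % range, consistent with the sub-0.07 % figure reported in the summary.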
This work was supported by a Grant-in-Aid for
Scientific Research in a Priority Area, “Super-Hierarchical
Structures,” from the Ministry of Education, Culture,
Sports, Science and Technology, Japan.
Fig.2 Time dependence of deformation during the
CV measurement shown in Fig.1.
The charge dependence of the deformation of the
PPy/DBS film, derived from Fig. 2, is shown in Figs. 3(a)
and (b) for oxidation (contraction) and reduction
(elongation), respectively, at various tensile stresses up
to 5 MPa. In the oxidation process at 0 or low tensile
stresses, as shown in Fig. 3(a), the deformation was
proportional to the charge. At higher stress,
however, the deformation saturated at higher charge,
indicating that the film kept its length even as cations
left the film. In other words, the film shrank in an
anisotropic manner, like an ellipsoid stretched along the
tensile-stress direction. This idea is supported by the reverse
process, i.e., reduction, as shown in Fig. 3(b): at
higher tensile loads the film deformed slowly
at the beginning and then elongated steeply. The anisotropic
deformation is the result of competition between the
uniaxial stress and the elasticity of the polymer chains, or
thermal fluctuation. Consequently, one of the
mechanisms of creep is the anisotropic strain,
besides the slipping and breaking of chains. The anisotropic
strain can be considered a memory effect. It is noted that
the charge for reduction is larger than that
for oxidation, resulting in some degradation of the film.
[1] T. Mirfakhrai, J.D.W. Madden, and R.H. Baughman,
Materials Today, 10 (2007) p.30.
[2] M. Watanabe and T. Hirai, Appl. Phys. Lett., 74
(1999) p. 2717.
[3] S. Hara, T. Zama, W. Takashima and K. Kaneto,
Polym. J., 36 (2004) p.151.
[4] T. Zama, N. Tanaka, W. Takashima and K. Kaneto,
Polym. J., 38 (2006) p. 669.
[5] Y. Osada, H. Okuzaki and H. Hori, Nature, 355
(1992) p. 242.
[6] S. M. Ha, W. Y. Yuan, Q. Pei, R. Pelrine, and S.
Stanford, Advanced Materials, 18 (2006) 887.
[7] T. Fukushima, K. Asaka, A. Kosaka and T. Aida,
Angew. Chem. Int. Ed., 44 (2005) p. 22.
[8] K. Kaneto, Y. Sonoda and W. Takashima. Jpn. J.
Appl. Phys. 39 (2000) p. 5918.
[9] W. Takashima, M. Nakashima, S. S. Pandey, and K.
Kaneto, Electrochimica Acta, 49 (2004) p. 4239.
[10] M. Fuchiwaki, W. Takashima and K. Kaneto, Mol.
Cryst. Liq. Cryst., 374 (2002) p. 513.
[11] H. Suematsu, K. Yamato and K. Kaneto,
Bioinspiration and Biomimetics, submitted (2007).
[12] H. Fujisue, T. Sendai, K. Yamato, W. Takashima
and K. Kaneto, Bioinspiration and Biomimetics, 2 (2007)
[13] K. Yamato and K. Kaneto, Analytica Chimica Acta,
568 (2006) p.133.
[14] K. Kaneto, H. Fujisue, M. Kunifusa and W.
Takashima, J. Smart Mat. and Struct. 16 (2007) p.S250.
Fig.3 Deformations of PPyDBS film as the function
of charges for (a) oxidation and (b) reduction.
Motion Estimation in Sub-pixel Resolution
Using Fast Belief Propagation
Intae Na, Sungchan Park, and Hong Jeong
Department of Electronic and Electrical Engineering, POSTECH, Korea
{bluewing, mrzoo, [email protected]
In motion estimation, iterative global matching methods outperform existing local methods in terms of
robustness and accuracy. Unfortunately, global methods usually suffer from a large computational
burden, which obstructs high-speed processing. In this study, we suggest a novel method for motion
estimation in sub-pixel resolution. Hierarchical fast belief propagation is applied to reduce computational
and memory complexity. Furthermore, angular errors
arising from the coarse resolution of discrete motion vectors are
effectively reduced by dynamic message updates in a
two-layered MRF.
Experimental results show that our system achieves the lowest
angular-error performance and uses 40 times less memory.
Instead of assigning nodes to all sub-pixels, we use a
model consisting of two different 2-D MRFs, corresponding to motion vectors at coarse and fine resolution.
The two layers use different energy models and message update
schemes. In the coarse MRF, the whole process is the same as
standard belief propagation. The energy function of the coarse
MRF is

E(U) = Σ_{(p,q)∈N_E} V(u_p, u_q) + Σ_p D_p(u_p)

where N_E denotes the set of edges in the MRF and U denotes the set
of motion vectors. The first and second terms are
called the smoothness cost and the data cost, respectively. Assuming the image pair satisfies the intensity-constancy
condition, the data cost of node p is given as

D_p(u_p) = min(C_d |I_{t+1}(x + u_p) − I_t(x)|, K_d),

which is a truncated absolute difference. The data cost
indicates the degree of dissimilarity between pixels of the two
images. The smoothness cost function in the coarse MRF model is

V(u_p, u_q) = min(C_v |u_p − u_q|, K_v),
1. Introduction
Motion estimation has been, and continues to be, one of
the major tasks in computer vision. Recently, global methods such as graph cuts [1] and belief propagation [2] have
outperformed traditional local methods in the early vision
field. Global methods usually adopt a global energy minimization strategy with a probabilistic model such as a
Markov random field. They consider not only the local
evidence from two images, but also the spatial smoothness between neighboring pixels.
Although global methods give far more robust results
under discontinuities and occlusions, there are some
drawbacks. Since global methods generally require a huge amount
of computation, they are time-consuming in
most cases. Moreover, due to the computational complexity, only a limited number of discrete hidden states is
available, which may cause large angular errors in the motion vectors.
We present a novel sub-pixel resolution motion estimation method based on two-layered MRF and dynamic
message calculation, which relieves not only severe
angular errors from coarse resolution of motion vector,
but also large memory complexity of belief propagation.
which is a truncated linear function. The cost is lower
when u_p and u_q are similar; that is, node p is more likely to
have a hidden state similar to those of its neighboring nodes.
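The two truncated cost functions above can be sketched directly in code; the constants C_d, K_d, C_v, K_v are tuning parameters (the values below are placeholders, not those of the paper):

```python
import numpy as np

def data_cost(I_t, I_t1, x, y, u, Cd=1.0, Kd=20.0):
    """Truncated absolute intensity difference for motion vector u = (du, dv)
    at pixel (x, y): min(Cd * |I_{t+1}(x+u) - I_t(x)|, Kd)."""
    du, dv = u
    return min(Cd * abs(float(I_t1[y + dv, x + du]) - float(I_t[y, x])), Kd)

def smoothness_cost(u_p, u_q, Cv=1.0, Kv=10.0):
    """Truncated linear penalty on the difference of neighboring motion
    vectors: min(Cv * |u_p - u_q|, Kv), using the L1 norm here."""
    d = abs(u_p[0] - u_q[0]) + abs(u_p[1] - u_q[1])
    return min(Cv * d, Kv)
```

Truncation is what makes both costs robust: beyond the thresholds K_d and K_v, an occluded pixel or a motion discontinuity pays a bounded penalty instead of dominating the energy.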
Having constructed our coarse MRF model, we
need to find the set of optimal motion vectors that minimizes the energy function. As is well known, loopy belief
propagation gives an approximate MAP solution in an MRF
via message propagation. Here, we use the max-product
algorithm in the log domain, which is formulated as
m^t_pq(u_q) = min_{u_p} ( V(u_p, u_q) + D_p(u_p) + Σ_{r∈Nb(p)\q} (m^{t−1}_rp(u_p) − α) )   (4)

When the algorithm has converged, we estimate the coarse
motion vector û_q using

û_q = argmin_{u_q} ( D_q(u_q) + Σ_{p∈Nb(q)} m^L_pq(u_q) )
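As an illustrative sketch (not the authors' optimized implementation), the min-sum message update of Eq. (4) over a small discrete label set can be written as:

```python
import numpy as np

def message_update(D_p, V, incoming, alpha=0.0):
    """Min-sum message m_pq(u_q) = min_{u_p} [ V(u_p,u_q) + D_p(u_p)
    + sum of incoming messages - alpha ].
    D_p: (S,) data costs at node p; V: (S, S) smoothness costs;
    incoming: list of (S,) messages from neighbors of p except q."""
    total = D_p + sum(incoming) - alpha        # cost of each candidate u_p
    return (V + total[:, None]).min(axis=0)    # minimize over u_p for every u_q
```

The normalization constant α only shifts all message entries and therefore does not change the argmin in the belief computation; it is used to keep values numerically bounded over iterations.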
The basic energy model for the fine MRF is similar to the
coarse MRF, while its cost functions are slightly different. The cost functions of the fine MRF incorporate the state estimated in the coarse MRF. The data cost and smoothness
cost functions of the hidden state δu_p are defined using
û_q as
2. Optimization in two-layered MRF model
Let the size of the hidden motion-vector state space be S × S. To
estimate a sub-pixel resolution result using standard belief
propagation, we would need to use more hidden states,
which increases the number of nodes in the MRF as well as the number of hidden states.
D_p(δu_p) = min(C_sd |I_{t+1}(x + û_p + δu_p) − I_t(x)|, K_sd),
V(δu_p, δu_q) = min(C_sv |(δu_p − δu_q) + (û_p − û_q)|², K_sv),
The data cost is similar to that of the coarse MRF, while the smoothness
cost function is a truncated quadratic, which is
more sensitive to hidden-state variation between two
neighboring nodes. The final motion vector in sub-pixel resolution is calculated as

u_p = û_p + δu_p.
The overall memory reduction ratio is calculated using

R_m = [ Σ_{k=0}^{K−1} B(L_k + 1)(M/2^k) ] / [ Σ_{k=0}^{K−1} B(N/2^k)(M/2^k) ]
Because of the powerful memory reduction of the FBP sequence, parallel processing using a distributed memory architecture and individual parallel processors becomes
feasible. As a result, FBP effectively reduces memory
complexity as well as computational complexity.
The optimization process in the fine MRF follows the same iterative
message propagation scheme as standard belief propagation, while the message
calculation is changed. The message calculation for the fine MRF depends on the resulting û_q of the
coarse MRF. The whole process includes three steps, as follows.
· First step
m_sum(δu_p, û_p) = D_p(δu_p, û_p) + Σ_{r∈Nb(p)\q} (m^{l−1}_rp(δu_p, û_p) − α)
4. Experimental results
Middlebury motion data sets were used in our verification.
For quantitative analysis, the average angular error (AAE) and the
standard deviation of the angular error are calculated using

Ψ_p = cos^{−1}( (ũ_p · ũ_true) / (‖ũ_p‖ ‖ũ_true‖) ),
m_Ψ = Σ_p Ψ_p / (M × N),
σ_Ψ = sqrt( Σ_p (Ψ_p − m_Ψ)² / (M × N) )
· Second step
m_sum(δu_p, û_q) = m_sum(δu_p + (û_p − û_q), û_p)
· Third step
m^l_pq(δu_q, û_q) = min_{δu_p} ( V(δu_p + û_p, δu_q + û_q) + m_sum(δu_p, û_q) )
Figure 2 shows sample image (Yosemite) that is used for
verification. Overall results are shown in Table 1.
In the first step, messages from the neighbors and the data cost
are accumulated. Since neighboring nodes in the fine MRF
may have different coarse motion vectors, in the second step we shift the
values by the difference between the coarse motion
vectors. Finally, the message for the fine node is
calculated using the smoothness cost and m_sum. The remaining
optimization procedure is exactly the same as in the coarse MRF.
Figure 2. Sample frame from Yosemite sequence
3. FBP for sub-pixel resolution
Table 1. Overall results (AAE/STD) for the set of test
image data.
5. Conclusions
In this paper, we have presented a novel approach for motion estimation in sub-pixel resolution. The problem of a large
number of nodes is relieved by constructing a two-layered
MRF model. Moreover, by applying the multi-scale FBP sequence, our method dramatically reduces complexity
in terms of both memory and computation.
Figure 1. FBP sequence in multi-scale MRF
So far, we have discussed our two-layered MRF model
and the optimization scheme using belief propagation. Although our method successfully reduces the number of
nodes for sub-pixel resolution applications, the computational
and memory burden is still large.
To cope with this complexity problem, we adopt a memory-efficient sequence for the iterative optimization of belief
propagation, so-called fast belief propagation (FBP) [3]. FBP
represents the iterative message passing sequence for the whole
image as a layered 3-D graph whose layers correspond to iterations. FBP changes the direction of iteration in the
3-D graph by adopting slanted processing groups. The changed
architecture is shown in Figure 1. When the FBP sequence is
combined with multi-scale optimization, the required amount
of memory is reduced by a factor of 40.
[1] V. Kolmogorov and R. Zabih. Computing visual correspondence with occlusions using graph cuts. In ICCV, no.2,
pp.508-515, 2001.
[2] P.F. Felzenszwalb and D.R. Huttenlocher. Efficient
belief propagation for early vision, In CVPR, no.1,
pp.1261-1268, 2004.
[3] S.C. Park and H. Jeong. Stereo matching using FBP : A
memory efficient parallel architecture. In IPCV, no.2,
pp.550-556, 2008.
A Neural Network Based Retrainable
Framework for Robust Object Recognition with
Application to Mobile Robotics
Su-Yong An, Jeong-Gwan Kang, Won-Seok Choi, Se-Young Oh
Department of Electronic and Electrical Engineering, POSTECH, Korea
{grasshop, naroo1, onlydol, [email protected]
In this paper, we address object recognition for a
mobile robot which is deployed in a multistory building.
To move to another floor, a mobile robot should
recognize various objects related to an elevator, e.g.,
elevator control, call buttons, and LED displays. To this
end, we propose a neural network based retrainable
framework for object recognition, which consists of four
components: preprocessing, binary classification, object
identification, and outlier rejection. The binary classifier,
a key component of our system, is a neural network that
can be retrained, the motivation being to adapt to
varying environments, especially illumination. New
training samples for retraining are obtained at no extra
cost as a by-product of the outlier rejection component,
extracted on-line. To realize a practical system, we
adopt a parallel architecture integrating both recognition
and retraining processes for seamless object recognition,
and furthermore detect and cope with the deterioration
of a retrained neural network to ensure high reliability.
We demonstrate the positive effect of retraining on the
object recognition performance by conducting
experiments over hundreds of images obtained in
daytime and nighttime.
In brief, the main contribution of
this paper is to provide a simple but robust recognition
system for mobile robots that adapts to varying
environments through an online retrainable neural network.
2. System Overview
Fig. 1 Structure of the CPEs and two image queues.
The CPEs consist of four components and two
external image storages, defined as the Object Image
Queue (ObjIQ) and the Outlier Image Queue (OutIQ)
(Fig. 1). Each component has its unique function.
1. Introduction
To navigate in a multistory environment, the robot
should be able to exploit an elevator [1]. From a mobile
robot’s point of view, elevator access is a very
challenging task which has not yet been completely
resolved. At the outset, a preliminary question which
must be considered is an ability to recognize the status
of an elevator, such as moving up or down.
We propose a sequential framework, named the
Cascaded Processing Elements (CPEs). The CPEs
include: preprocessing, binary classification, object
identification, and outlier rejection. The first step of
preprocessing extracts object candidates from an input
image regardless of varying illumination conditions
using an adaptive thresholding and a size filtering
technique. The binary classifier, as the key component of
the proposed method, is implemented with a neural
network whose weights can adapt to changing
illumination conditions. Object identification is then
accomplished by template matching, giving an
identified index to each candidate. In the last step, graph
partitioning in conjunction with a priori knowledge is
utilized to reject outliers which were not removed by the
preceding three steps.
3. Preprocessing and Neural network
In the preprocessing component, an adaptive
thresholding technique is applied to the whole region of
the input image to extract object candidates under
varying illumination conditions. Then the
extracted object candidates go through size filtering
based on their geometric constraints (Fig. 2).
Fig. 2 Preprocessing
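A hedged sketch of this candidate-extraction step, using a local-mean adaptive threshold and connected-component size filtering; the window size, offset, and area bounds are assumed values, not the paper's parameters:

```python
import numpy as np
from scipy import ndimage

def extract_candidates(img, win=31, offset=5, min_area=30, max_area=2000):
    """img: 2-D uint8 grayscale. Threshold each pixel against its local mean
    (adaptive thresholding), then keep connected components whose area lies
    within [min_area, max_area] (size filtering).
    Returns a list of bounding-box slices for the surviving candidates."""
    local_mean = ndimage.uniform_filter(img.astype(float), size=win)
    binary = img.astype(float) < local_mean - offset   # dark objects, e.g. button labels
    labels, n = ndimage.label(binary)
    areas = ndimage.sum(binary, labels, index=np.arange(1, n + 1))
    keep = {i + 1 for i, a in enumerate(areas) if min_area <= a <= max_area}
    objs = ndimage.find_objects(labels)
    return [objs[i - 1] for i in sorted(keep)]
```

Thresholding against the local mean rather than a global value is what gives robustness to the uneven illumination mentioned above.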
In this study, the neural network, as the second
component of the CPEs, classifies each candidate into
an object or a non-object when an input image patch is
fed into it.
4. Outlier Rejection by Graph Partitioning and A
Priori Knowledge
Fig. 4 Performance improvement by the retrained neural
network (a) Recognition result at frame 42. (b)
Recognition result at frame 127.
Fig. 3 (a) Control panel model. (b) Graph representation
of object candidates
Usually a graph G is expressed by a set of vertices V
and edges E connecting them. An image space can be
transformed into a graph space; object candidates and
their relationships in the image space are mapped to
vertices and edges, respectively, in the graph space by
the control panel model shown in Fig. 3.
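Since the details of the control panel model are not given here, the following is only one plausible realization of such graph-based outlier rejection; the pitch-matching rule is an assumption for illustration:

```python
import numpy as np

def reject_outliers(centers, pitch, tol=0.25):
    """centers: list of (x, y) candidate centroids. Connect two vertices when
    their distance is close to the expected button pitch (a priori knowledge
    from the panel model), then keep the largest connected component as the
    inlier set. Returns the sorted indices of the inliers."""
    n = len(centers)
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(np.asarray(centers[i]) - np.asarray(centers[j]))
            if abs(d - pitch) <= tol * pitch:
                adj[i].append(j)
                adj[j].append(i)
    # Largest connected component via depth-first search
    seen, best = set(), []
    for s in range(n):
        if s in seen:
            continue
        comp, stack = [], [s]
        seen.add(s)
        while stack:
            v = stack.pop()
            comp.append(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        if len(comp) > len(best):
            best = comp
    return sorted(best)
```

Candidates that do not fit the expected geometric layout end up in small components and are discarded as outliers.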
In most cases, off-line training of the neural
network is not enough to cope with all possible
variations of the environment, including illumination
conditions; hence an on-line retraining
capability that can be performed in parallel with the
recognition process is required. We now discuss
four fundamental questions related to the on-line
retraining neural network:
 Generation and reorganization of a new training set
in real time for on-line retraining; this procedure has to
be embedded in the recognition process so as to reduce
the extra processing time needed to generate new training data. It
should also address the problem of forgetting
some of the previously learned knowledge in the neural network.
 How to determine when to retrain; the decision is
based on whether the environment has changed or not.
 Evaluation of the retrained neural network; validation is
necessary after every retraining to avoid
getting stuck in local minima.
 Simultaneous retraining and recognition; the
retraining process has to run in parallel with the
recognition process.
Fig. 5 Performance comparison with varying OutIQ size
The network performance was determined by the
total number of outliers N_outlier occurring in each image
set, or by the average number of outliers per image, which is
also called the outlier occurrence rate R_o (Fig. 5).
7. Conclusions
A retrainable framework approach to object recognition
in a visually dynamic environment has been presented.
First, in order to extract object candidates, basic image
processing techniques including adaptive thresholding,
connected component labeling, and template matching
were applied to an input image. An on-line retraining
neural network as a primary component of the proposed
method, serving as a binary classifier, has then been
employed as a practical tool for coping with varying
illumination conditions. The performance enhancement
of the retrained neural network over a non-retrained one
was verified by experiments over hundreds of images,
being represented by a quantitative difference in the
number of outliers that occurred.
This work was supported by grant No. RT104-02-06
from the Regional Technology Innovation Program of
the Ministry of Commerce, Industry and Energy
(MOCIE) of Korea.
6. Experimental Results
Fig. 4 directly illustrates the effect of retraining. The
red and green squares correspond to candidates that
pass the neural network and template matching stages;
the red squares indicate outliers removed by
the outlier rejection component of the CPEs, and the
green ones are the recognized objects.
[1] R. Simmons et al., “GRACE: An autonomous robot for the AAAI
Robot Challenge,” AI Magazine, vol. 24, no. 2, pp. 51-72, 2003.
Brain-inspired Emergence of Behaviors by Reinforcement Learning
Masumi Ishikawa1, Mikio Morita1, Takao Hagiwara2
Department of Brain Science and Engineering, Kyushu Institute of Technology
Email:[email protected], 2Cuebs
We aim at developing truly autonomous mobile
robots capable of behaving appropriately without being
told what to do, which is what mammals do in their daily
lives. We believe a developmental approach is effective
for realizing this. The desire for existence, specific
curiosity, diversive curiosity, boredom, and novelty are
important factors contributing to learning and development. We propose mathematical models for internal
rewards in reinforcement learning.
Schmidhuber proposed two curiosity principles based
on prediction errors and predictor improvements [1]. Oudeyer et al. proposed an architecture based
on the second curiosity principle [2]. Stout et al. used
both external and internal rewards, but the latter may be
interpreted as a subgoal provided by a designer to
accelerate learning [3]. If internal rewards are task
dependent and provided by a designer, the resulting
mobile robots can no longer claim to be autonomous.
In this paper we propose the following, which we
believe forms a novel framework for useful and effective
intelligent systems.
Firstly, we propose to use multiple sources of
rewards to endow mobile robots with the ability to behave
properly in the real world. This is because we think
external rewards and curiosity rewards alone are not
enough for mobile robots to behave properly in the
complex and context-dependent real world.
Secondly, we propose task-independent internal
rewards, because task-dependent rewards provided by a
designer are not applicable to other tasks as they are
and do not contribute much to genuine autonomy.
Thirdly, we propose to pursue the engineering merit of
internal rewards, in addition to their scientific interest.
These proposals share one characteristic: the use of
reinforcement learning and internal rewards. The reason
is that reward-based learning with trial and error
is, we believe, essential in developing intelligent
behaviors. On the other hand, hand-crafted programs
tend to impose a burden on humans, supervised
learning has difficulty providing teacher signals, and
self-organizing learning alone is not powerful enough.
A pursuit-evasion game comprising a predator and
its prey on a robotic field is selected as a testbed.
Simulation experiments as well as real experiments
using mobile robots, WITHs, are carried out to
demonstrate the utility and benefit of internal rewards
in reinforcement learning.
To evaluate the utility and benefit of internal
rewards, we use a predator and its prey on a robotic
field as a testbed. A task, here, is that the predator
learns to capture the prey for its survival, and the prey
also learns to escape from the predator for its survival.
The desire for existence is represented by an intrinsic
effort to keep the level of a battery of an agent higher
than a threshold. The agents also learn a map of the environment to make pursuit or evasion more efficient.
The following is a prospective task scenario of agents.
When the level of its battery is sufficiently high, an
agent efficiently learns the environment motivated by
curiosity, boredom and novelty [4]. Under this condition, curiosity, either specific or diversive, motivates
the agent to learn the environment through a desire to
increase accumulated future rewards. When some part
of the environment is sufficiently learned, the agent gets bored and moves away, driven by a negative reward caused by boredom, which provides a stronger motivation than
diversive curiosity. When a novel object is observed,
novelty motivates the agent to approach and closely
observe it, which in turn motivates specific curiosity.
On the other hand, when the level of its battery
becomes lower, the predator starts to pursue its prey for
its survival by taking advantage of the knowledge
acquired through curiosity and novelty.
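The scenario above combines several internal reward sources. As a rough illustration, they can be folded into one scalar signal; the weights, the boredom threshold, and the functional forms below are assumptions for this sketch, not the models actually proposed in the paper.

```python
def internal_reward(pred_error, visits, novel,
                    boredom_threshold=5,
                    w_curiosity=1.0, w_boredom=0.5, w_novelty=2.0):
    """Illustrative scalar combination of internal rewards (assumed forms).

    - curiosity: reward proportional to the prediction error of the
      agent's environment model
    - boredom: negative reward once a region has been visited often
      enough, pushing the agent to go elsewhere
    - novelty: bonus when a previously unseen object is observed
    """
    r = w_curiosity * pred_error
    if visits > boredom_threshold:
        r -= w_boredom * (visits - boredom_threshold)
    if novel:
        r += w_novelty
    return r
```

In this toy form, a well-learned region (low prediction error, many visits) yields a negative reward, while a novel object yields a strong positive one.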
Due to space limitations, only the desire for existence is presented here.
To keep the level of a battery of an agent higher
than a threshold, the predator must learn when to start
pursuit of the prey. This is because if the predator
captures the prey too early, i.e., when the battery level
is significantly higher than the threshold, the external reward becomes smaller, and hence the accumulated future reward becomes smaller. Therefore, this pursuit timing is an important target of learning.
As is well known, Q-learning suffers from large computational cost. We propose to decrease the
number of states by various approximations. The state
here is defined by the battery level (20-level) and the
distance to the prey (20-level), without absolute
coordinates. When there are obstacles, however, these variables are not enough to define the state, and the distance to an obstacle in the forward direction (10-level) is added to the definition of the state.
Proc. of the 8th POSTECH-KYUTECH Joint Workshop on Neuroinformatics, Kitakyushu, Japan, 2008
The speed of the agent (10-level) constitutes the action alternatives. The ego-centric orientation of motion (5-level) is added to the definition of the action in the presence of obstacles.
The speed of the predator is given by
speed = (specified relative speed) × (maximum speed) × (motion efficiency),
where motion efficiency decreases as the battery level
falls off. The prey moves slowly when the distance is large enough, but escapes at the maximum speed when the distance is less than a threshold, which is also to be learned.
When the predator captures the prey at a high
battery level, a negative reward is given to suppress too
early capture. Owing to these rewards, the predator is
able to learn appropriate timing for effective pursuit.
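With the discretized state (battery level, distance to the prey) and the speed as the action, the timing learner can be sketched as standard tabular Q-learning; the reward values and the capture threshold below are assumptions, since the text does not give them.

```python
import numpy as np

# Discretization taken from the text: battery level (20 levels),
# distance to the prey (20 levels); speed (10 levels) as the action.
N_BATTERY, N_DIST, N_SPEED = 20, 20, 10
Q = np.zeros((N_BATTERY, N_DIST, N_SPEED))

def reward(battery_level, captured, threshold=0.4):
    """Hypothetical reward: a capture at a high battery level is
    penalized so the predator learns to delay pursuit (the actual
    reward magnitudes are not given in the text)."""
    if not captured:
        return 0.0
    return 1.0 if battery_level <= threshold else -1.0

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.95):
    """Standard tabular Q-learning update; s = (battery_bin, dist_bin)."""
    td_target = r + gamma * Q[s_next].max()
    Q[s][a] += alpha * (td_target - Q[s][a])
```

A capture at 36% battery then earns the positive reward, while a capture at 75% is penalized, which is the mechanism behind the learned timing in Figs. 1-3.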
Fig. 1 illustrates that the remaining battery level at capture is about 36%, which is much lower than the roughly 75% observed in the case without learning of pursuit timing. Fig. 2
depicts that negative rewards are given to adjust the
pursuit timing, especially in the beginning stage of
learning. Fig. 3 illustrates that the predator captures the
prey with low battery level owing to learning of the
pursuit timing.
Fig. 1 The level of remaining battery at the time of capture vs. the number of episodes.
Fig. 2 The amount of reward at the time of capture vs. the number of episodes.
Fig. 3 The resulting Q-values (over the distance to the prey and the battery level) after training.
We carried out simulation experiments on the
predator and the prey task to demonstrate the utility and
benefit of introducing the desire for existence, specific
curiosity, diversive curiosity, boredom, and novelty as
internal rewards into reinforcement learning. Skill
acquired through learning by simulation was implemented on real mobile robots, WITHs.
Simulation experiments as well as real experiments using mobile robots on a robotic field demonstrated the effectiveness of introducing internal rewards in addition to external rewards in reinforcement learning.
What was done, however, is still a small step
towards truly autonomous mobile robots. Much
remains to be done for future study.
This research was supported by a COE program
(#J19) granted to KIT by MEXT. We also acknowledge
valuable discussions with Profs. Masahiro Nagamatsu,
Takeshi Yamakawa, Kazuo Ishii, Hideki Nakagawa,
and Hong Zhang at KIT.
[1] J. Schmidhuber, “Self-motivated development
through rewards for predictor errors/ improvements”,
Developmental Robotics AAAI Spring Symp., 2005.
[2] P.-Y. Oudeyer, F. Kaplan, and V. V. Hafner,
“Intrinsic Motivation Systems for Autonomous
Mental Development,” IEEE Trans. EC, Vol.11,
No.2, pp.265-286, 2007.
[3] A. Stout, G. D. Konidaris, A. G. Barto, “Intrinsically motivated reinforcement learning: A promising
framework for developmental robot learning,”
Developmental Robotics AAAI Spring Symp., 2005.
[4] M. Ishikawa, T. Hagiwara, N. Yamamoto, F.
Kiriake, “Brain-inspired emergence of behaviors in
mobile robots by reinforcement learning with
internal rewards,” HIS2008 (to appear).
Hyohyeong Kang, Seungjin Choi
Department of Computer Science, POSTECH, Korea
{paanguin, [email protected]
To reduce the number of support vectors of Ranking SVM, we derived a modified Ranking SVM algorithm and compared it with the original Ranking SVM. One approach for reducing the number of support vectors is to formulate the SVM with a non-convex loss function. We applied that approach to reformulate the objective function of Ranking SVM and derived an iterative training algorithm using the concave-convex procedure (CCCP). The number of support vectors of the proposed algorithm was reduced while the accuracy of ranking was retained. The algorithm improved the scalability of Ranking SVM at the prediction stage.
In this paper, we tried that approach to reduce the number of support vectors in Ranking SVM. We derived the CCCP for the ramp loss Ranking SVM and compared it with the Ranking SVM on synthetic data.
Let us briefly summarize the Ranking SVM formulation [2]. We are given a set of queries {q_k} and documents {d_i}. For each query, a partial ranking of the corresponding documents can be specified as ordered pairs (q_k, d_i) ≻ (q_k, d_j), which represents that d_i is more relevant than d_j for the query q_k. For simplicity, let us denote the vector corresponding to each query-document pair (q, d) as x and the set of all ordered pairs as R.
The ranking problem is to learn the ranking over all
items using training data. Recently, this problem has been
intensively considered in machine learning to develop web
search engines and recommendation systems. The problem
can be solved by finding the ranking function that returns
higher scores for items in higher ranks and lower scores for
items in lower ranks. Many algorithms such as Ranking
SVM [1], [2], RankNet [3] and RankBoost [4] have been
proposed to learn the ranking function from training data.
Ranking SVM [1], [2] is one algorithm that solves the
ranking problem. To formulate the ranking problem,
Ranking SVM applies a support vector machine (SVM) [5],
which is a well-known machine learning algorithm for
classification and regression. In a classification problem, SVM finds the separating hyperplane which maximizes the margin. The margin of a separating hyperplane is the distance to the nearest points which are separated into distinct classes by the hyperplane. Ranking SVM showed that the idea of maximizing the margin can also be applied to the ranking problem [1].
Although the Ranking SVM trains an accurate ranking
function, it has some problems with scalability in the
prediction stage. In SVM, the number of support vectors, which are a subset of the training data that construct the decision boundary, increases linearly with the number of training samples [6]. One approach in classification that reduces the number of support vectors is to replace the convex loss function with a non-convex one [7]. When the hinge loss function in the SVM objective function is replaced with a ramp loss function, the number of support vectors is reduced and grows more slowly than in the standard SVM.
the standard SVMs. To optimize the non-convex objective
vector corresponds to each query-document pair ( q, d )
as x and the set of all ordered pairs as R . The goal of
the ranking problem is to find the ranking function that
generalizes ranking to test samples from the given training
samples. The Ranking SVM finds a linear function f(x) = wᵀΦ(x) such that
f(x_i) > f(x_j),  ∀(x_i ≻ x_j) ∈ R,
where Φ(x) is a feature mapping function and w is a weight vector learned by the algorithm. The margin for Ranking SVM is given as the minimum difference f(x_i) − f(x_j) over ordered pairs (x_i ≻ x_j) ∈ R.
We can find the w that maximizes the margin by solving the constrained optimization problem
min_{w,ξ} ‖w‖² + C ∑ ξ_ij    (1)
subject to wᵀ(Φ(x_i) − Φ(x_j)) ≥ 1 − ξ_ij,  ξ_ij ≥ 0,  ∀(x_i ≻ x_j) ∈ R,
where C is a constant and ξ_ij is a slack variable.
By using the hinge loss function
H_s(t) = max(0, s − t),
we can express the optimization problem (1) as the minimization of
‖w‖² + C ∑ H_1(wᵀ(Φ(x_i) − Φ(x_j))).    (2)
The ramp loss function
R_s(t) = min(1 − s, max(0, 1 − t))
is a non-convex function which is used to reduce the number of support vectors in [7]. By replacing the hinge loss function in (2) with the ramp loss function, we obtain
‖w‖² + C ∑ R_s(wᵀ(Φ(x_i) − Φ(x_j)))    (3)
for a predefined parameter s < 1. Because the ramp loss function can be expressed as
R_s(t) = H_1(t) − H_s(t),
the objective function (3) can be expressed as
J(w) = ‖w‖² + C ∑ H_1(wᵀ(Φ(x_i) − Φ(x_j)))    [J_convex(w)]
     − C ∑ H_s(wᵀ(Φ(x_i) − Φ(x_j))).          [J_concave(w)]
Fig. 1 Number of support vectors vs. number of training samples, for Ranking SVM and the ramp loss Ranking SVM with s = −1, −0.5, and −0.3.
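The two loss functions and the decomposition R_s = H_1 − H_s can be checked numerically in a few lines:

```python
import numpy as np

def hinge(t, s=1.0):
    """Shifted hinge loss H_s(t) = max(0, s - t)."""
    return np.maximum(0.0, s - t)

def ramp(t, s=-1.0):
    """Ramp loss R_s(t) = min(1 - s, max(0, 1 - t))."""
    return np.minimum(1.0 - s, np.maximum(0.0, 1.0 - t))

# The decomposition R_s(t) = H_1(t) - H_s(t) holds pointwise:
t = np.linspace(-3.0, 3.0, 61)
assert np.allclose(ramp(t, s=-1.0), hinge(t, 1.0) - hinge(t, -1.0))
```

Unlike the hinge loss, the ramp loss is capped at 1 − s, which is what prevents badly-violated pairs from becoming support vectors.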
Since the objective function is decomposed as the sum of a convex function and a concave function, we can apply the concave-convex procedure (CCCP) [8] to derive an iterative optimization algorithm [7]. Let us denote
β_ij = ∂J_concave(w) / ∂f_w(Φ(x_i), Φ(x_j)) = C if wᵀ(Φ(x_i) − Φ(x_j)) < s, and 0 otherwise,
Fig. 2 Averaged performance of Ranking SVM and ramp loss Ranking SVM.
We implemented the ramp loss Ranking SVM using CCCP and compared its complexity and accuracy with those of the Ranking SVM on two ranking datasets. Compared to the Ranking SVM, our proposed algorithm reduced the number of support vectors while keeping the three performance measures approximately the same. The algorithm improved the scalability of Ranking SVM at the prediction stage.
where f_w(Φ(x_i), Φ(x_j)) = wᵀ(Φ(x_i) − Φ(x_j)). Then, at each iteration of CCCP for the ramp loss Ranking SVM, we solve the convex optimization problem
w = arg min_w [ ‖w‖² + C ∑ H_1(wᵀ(Φ(x_i) − Φ(x_j))) + ∑ β_ij wᵀ(Φ(x_i) − Φ(x_j)) ],
where β_ij is determined by the w from the previous iteration. This problem can be reformulated as a dual problem similar to that of the standard SVM, so we can solve it using the same method that solves the standard SVM.
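The overall iteration can be sketched as follows; a linear kernel and plain subgradient descent (in place of the dual QP solver), with losses averaged over pairs, are simplifying assumptions of this sketch.

```python
import numpy as np

def cccp_ramp_ranking_svm(D, C=1.0, s=-1.0, lr=0.05,
                          outer=5, inner=100, seed=0):
    """Sketch of CCCP for the ramp-loss Ranking SVM.

    D: rows are pairwise differences d_ij = x_i - x_j over all
       ordered pairs (x_i > x_j) in R.
    """
    rng = np.random.default_rng(seed)
    n, dim = D.shape
    w = rng.normal(scale=0.1, size=dim)
    for _ in range(outer):
        # beta_ij = C where w^T d_ij < s, else 0; held fixed while the
        # convex subproblem is (approximately) solved by subgradient steps
        beta = (C / n) * (D @ w < s)
        for _ in range(inner):
            margins = D @ w
            grad = 2.0 * w - (C / n) * D[margins < 1.0].sum(axis=0) + beta @ D
            w -= lr * grad
    return w
```

On easily separable synthetic pairs, the returned w ranks almost all training pairs correctly, matching the behavior reported for the synthetic experiment.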
[1] R. Herbrich, T. Graepel, and K. Obermayer, Large margin rank
boundaries for ordinal regression. MIT Press, Cambridge, MA, 2000.
[2] T. Joachims, “Optimizing search engines using clickthrough data,”
in KDD ’02: Proceedings of the eighth ACM SIGKDD international
conference on Knowledge discovery and data mining. New York, NY,
USA: ACM, 2002, pp. 133–142.
[3] C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N.
Hamilton, and G. Hullender, “Learning to rank using gradient
descent,” in Proceedings of the 22nd International Conference on
Machine Learning (ICML), Bonn, Germany, 2005, pp. 89–96.
[4] Y. Freund and R. E. Schapire, “A decision-theoretic generalization
of on-line learning and an application to boosting,” in European
Conference on Computational Learning Theory, 1995, pp. 23–37.
[5] V. N. Vapnik, The nature of statistical learning theory. New York,
NY, USA: Springer-Verlag New York, Inc., 1995.
[6] I. Steinwart, “Sparseness of support vector machines—some
asymptotically sharp bounds,” in Advances in Neural Information
Processing Systems 16, S. Thrun, L. Saul, and B. Schölkopf, Eds.
Cambridge, MA: MIT Press, 2004.
[7] R. Collobert, F. Sinz, J. Weston, and L. Bottou, “Trading
convexity for scalability,” in ICML ’06: Proceedings of the 23rd
international conference on Machine learning. New York, NY, USA:
ACM, 2006, pp. 201–208.
[8] A. L. Yuille and A. Rangarajan, “The concave-convex procedure
(cccp),” in NIPS, 2001, pp. 1033–1040.
[9] K. Järvelin and J. Kekäläinen, “IR evaluation methods for
retrieving highly relevant documents,” in SIGIR ’00: Proceedings of
the 23rd Annual International ACM SIGIR Conference on Research
and Development in Information Retrieval. New York, NY, USA:
ACM Press, 2000, pp. 41–48.
We randomly generated 10,000 10-dimensional vectors. Each dimension was drawn from a uniform distribution on [0, 1]. We assigned a relevance score using a linear function f(x) = wᵀx + e, where w was also randomly generated from the uniform distribution and e is Gaussian noise drawn from the standard normal distribution.
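This setup can be reproduced in a few lines (the seed is an arbitrary assumption):

```python
import numpy as np

rng = np.random.default_rng(42)

n, d = 10_000, 10
X = rng.uniform(0.0, 1.0, size=(n, d))   # 10,000 10-dimensional vectors on [0, 1]
w = rng.uniform(0.0, 1.0, size=d)        # random linear scoring function
e = rng.standard_normal(n)               # Gaussian noise, N(0, 1)
relevance = X @ w + e                    # relevance score f(x) = w^T x + e
```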
We compared the number of support vectors of Ranking SVM and the ramp loss Ranking SVM for three distinct s values: −1, −0.5, and −0.3. We compared the quality of ranking using three ranking measures: precision at n, mean average precision (MAP), and normalized discounted cumulative gain (NDCG) [9]. In the ramp loss Ranking SVM, the number of support vectors was reduced (Fig. 1) while the quality of the ranking was retained (Fig. 2).
Nonnegative Tensor Factorization for EEG-based BCI
Hyekyoung Lee, Yong-Deok Kim, Seungjin Choi
Department of Computer Science, POSTECH, Korea
{leehk, karma13, [email protected]
In this paper we present a method for continuous EEG
classification, where we employ nonnegative tensor
factorization (NTF) to determine discriminative spectral
features and use the Viterbi algorithm to continuously
classify multiple mental tasks. This is an extension of
our previous work on the use of nonnegative matrix
factorization (NMF) for EEG classification. Numerical
experiments with two data sets from BCI competitions confirm the useful behavior of the method for continuous EEG classification.
Multiway decomposition methods such as PARAFAC and higher-order SVD are extensions of factor analysis and SVD. The PARAFAC model has been exploited in EEG data analysis. Nonnegative tensor
factorization (NTF) incorporates nonnegativity constraints
into PARAFAC model, extending NMF in the framework
of tensor algebra. Various information divergences were
employed in NTF [1]. In this paper, we revisit the PARAFAC model and present an NTF algorithm in a compact form, in the framework of tensor algebra. Then we cast the time-frequency representation of multichannel EEG data into an N-way tensor and apply NTF to determine
discriminative spectral features. With these features, we
use the Viterbi algorithm for continuous EEG classification
with no trial structure.
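For concreteness, a generic nonnegative 3-way PARAFAC fit via multiplicative least-squares updates can be sketched as below; this is a textbook variant, not the compact algorithm of [1].

```python
import numpy as np

def ntf_parafac(X, R, n_iter=300, eps=1e-9, seed=0):
    """Sketch: nonnegative 3-way PARAFAC via multiplicative updates
    for the least-squares objective. X: nonnegative array of shape
    (I, J, K); R: number of components. Returns factors A, B, C with
    X approximated by sum_r a_r (outer) b_r (outer) c_r."""
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A, B, C = rng.random((I, R)), rng.random((J, R)), rng.random((K, R))
    for _ in range(n_iter):
        # each update multiplies by (unfolded X times Khatri-Rao product)
        # over (factor times Gram-product), keeping entries nonnegative
        A *= np.einsum('ijk,jr,kr->ir', X, B, C) / (A @ ((B.T @ B) * (C.T @ C)) + eps)
        B *= np.einsum('ijk,ir,kr->jr', X, A, C) / (B @ ((A.T @ A) * (C.T @ C)) + eps)
        C *= np.einsum('ijk,ir,jr->kr', X, A, B) / (C @ ((A.T @ A) * (B.T @ B)) + eps)
    return A, B, C
```

Applied to a spectral EEG tensor (e.g., frequency × time × channel), the columns of each factor play the role of the per-mode components shown later in Figure 2.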
Brain computer interface (BCI) is a system that is designed
to translate a subject's intention or mind into a control
signal for a device such as a computer, a wheelchair, or a
neuroprosthesis. BCI provides a new communication
channel between human brain and computer and adds a
new dimension to human computer interface (HCI). The
most popular sensory signal used for BCI is the electroencephalogram (EEG), which is multivariate time series data in which electrical potentials induced by brain activities are recorded on the scalp. Exemplary spectral
characteristics of EEG, in motor imagery tasks which are
considered in this paper, are mu rhythm (8-12 Hz) and beta
rhythm (18-25 Hz), which decrease during movement or in preparation for movement (event-related desynchronization, ERD) and increase after movement and in relaxation (event-related synchronization, ERS). However, those
phenomena could happen in different frequency bands,
depending on subjects. For instance, they might occur in
16-20 Hz, not in 8-12 Hz. Therefore, methods for
determining meaningful discriminative features become
more important. Nonnegative matrix factorization (NMF) is an interesting linear data model, which is more appropriate
for handling nonnegative data [3]. NMF allows only
non-subtractive combinations of nonnegative basis vectors,
providing a parts-based representation. The time-frequency
representation of EEG data computed by short-time
Fourier transform or wavelet transform, can be cast into a
nonnegative data matrix. Recently, NMF was shown to be
useful in determining discriminative basis vectors which
well reflect meaningful spectral characteristics without cross-validation in motor imagery EEG tasks.
Multiway analysis extends the aforementioned linear methods, working with multiway data arrays which are referred to as tensors; exemplary methods include PARAFAC and multiway SVD.
For our empirical study, we used two data sets: one is the
dataset III in BCI competition II, which was provided by
the Laboratory of Brain-Computer Interfaces (BCI-Lab),
Graz University of Technology, and the other is the dataset
V in BCI competition III, which was provided by the
IDIAP Research Institute. The Graz dataset involves
left/right imagery hand movements and consists of 140
labeled trials for training and 140 unlabelled trials for test.
It contains EEG acquired from two channels: C3 and C4.
The IDIAP dataset contains EEG data recorded from 3
normal subjects with eight channels C3, Cz, C4, CP1, CP2,
P3, Pz, and P4, and involves three tasks, including the
imagination of repetitive self-paced left/right hand
movements and the generation of words beginning with the
same random letter. EEG data is not split into trials, since the subjects continuously perform one of the mental tasks (i.e., there is no trial structure).
Preprocessing  We construct various types of data tensors from spectral EEG data obtained by the short-time Fourier transform or the wavelet transform, as listed in Table 1.
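The tensor construction can be sketched as below; the window length, hop size, and the use of a plain windowed FFT (rather than a wavelet transform) are illustrative assumptions.

```python
import numpy as np

def eeg_to_tensor(eeg, win=64, hop=32):
    """Sketch: build a nonnegative (channel x frequency x time) data
    tensor from multichannel EEG via a windowed FFT (a basic STFT).
    eeg: array of shape (n_channels, n_samples); win/hop assumed."""
    n_ch, n = eeg.shape
    starts = range(0, n - win + 1, hop)
    window = np.hanning(win)
    return np.stack([
        np.stack([np.abs(np.fft.rfft(ch[s:s + win] * window))  # magnitude
                  for s in starts], axis=-1)
        for ch in eeg])  # shape: (n_ch, win // 2 + 1, n_frames)
```

Stacking such channel tensors over trials (or classes) gives the higher-way tensors of Table 1.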
Figure 1. 3-way tensor (PARAFAC model). PARAFAC finds the meaningful component matrices for each mode.
Table 1. N-way data tensors used for NTF.
Figure 2. An Illustration for a four-way NTF of EEG data (frequency x time x channels x trials). In the left figure, the structure
of the spectral EEG data tensor is illustrated. To display four-way tensor structure, we place three-way tensors in a column (of trials).
Upper 70 trials are for imagination of left-hand movement, and lower 70 trials are for imagination of right-hand movement. In the
right figure, the results of four-way NTF with four components are shown. The core tensor G is a super-diagonal tensor. The
components of a factor matrix are displayed in a column. Each factor matrix represents the components of the frequency, time, space
(C3 and C4 channels) and trial domains, respectively. Component 1 is generated by the imagination of left-hand movement, and Component 2 by that of right-hand movement. Component 3 shows that the subject cannot concentrate on the imagination for the specific hand after a few seconds have passed in a trial. The theta rhythm (4-8 Hz), which is related to quiet concentration, is shown in Component 4.
Feature Extraction We extract discriminative spectral
features by applying NTF to preprocessed data. Depending
on the way of constructing the data tensor, classification results are slightly different. One exemplary result of NTF is shown in Figure 2.
Table 2. Classification Accuracy of IDIAP data
spectral characteristics) without cross-validation. NTF, a more general framework with multiway structure, can find hidden structures along additional dimensions such as time or class. Continuous EEG classification does not need a trial structure: the Viterbi algorithm can use the classifier to infer the hidden labels from an observed data sequence. Our experiments show that it can be applied successfully to uncued EEG classification.
Figure 3. Experimental results of Graz data (left) and IDIAP data (right) in the time domain.
Classification For the single-trial online classification for
Graz data (with trial structure), we use a Gaussian
probabilistic model-based classifier where Gaussian
class-conditional probabilities for a single point in time
are integrated temporally by taking the expectation of the
class probabilities with respect to the discriminative power
at each point in time [2]. For the on-line classification for
IDIAP data which consist of uncued EEG signals, we use
the Viterbi algorithm, a dynamic programming algorithm for finding the most probable sequence of hidden states that explains a sequence of observations. Classification results are shown in Figure 3 and Table 2.
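The Viterbi decoding step can be sketched as below, assuming per-frame class log-likelihoods from the classifier together with a transition matrix and prior (both are stand-ins here, not the values used in the experiments).

```python
import numpy as np

def viterbi(log_lik, log_trans, log_prior):
    """Most probable hidden label sequence given per-frame class
    log-likelihoods. log_lik: (T, K) log p(obs_t | class k);
    log_trans: (K, K) log transition matrix; log_prior: (K,)."""
    T, K = log_lik.shape
    delta = log_prior + log_lik[0]        # best score ending in each class
    psi = np.zeros((T, K), dtype=int)     # backpointers
    for t in range(1, T):
        scores = delta[:, None] + log_trans   # (from, to)
        psi[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_lik[t]
    path = np.empty(T, dtype=int)
    path[-1] = delta.argmax()
    for t in range(T - 2, -1, -1):        # backtrack
        path[t] = psi[t + 1][path[t + 1]]
    return path
```

With a "sticky" transition matrix, isolated noisy frames are smoothed over, which is what makes the decoder suitable for uncued, continuously performed mental tasks.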
This work was supported by KOSEF Basic Research
Program (grant R01-2006-000-11142-0) and National
Core Research Center for Systems Bio-Dynamics.
[1] Hyekyoung Lee, Yong-Deok Kim, Andrzej Cichocki, and
Seungjin Choi, Nonnegative tensor factorization for continuous
EEG classification, International Journal of Neural Systems, vol.
17, no. 4, pp. 305-317, August 2007.
[2] S. Lemm, C. Schafer, and G. Curio, BCI competition
2003-data set III: Probabilistic modeling of sensorimotor mu
rhythms for classification of imaginary hand movements, IEEE
Trans. Biomedical Engineering, 51(6), 2004.
[3] H. Lee, A. Cichocki, and S. Choi, Nonnegative matrix
factorization for motor imagery EEG classification, Proc. Int'l
Conf. Artificial Neural Networks, 2006.
We have presented an NTF-based method of feature extraction and the Viterbi algorithm for continuous EEG classification. The NMF-based method, a linear data model with nonnegativity constraints, could find discriminative and representative basis vectors (which reflected proper
Orthogonal Nonnegative Matrix Factorization:
Multiplicative Updates on Stiefel Manifolds
Jiho Yoo, Seungjin Choi
Department of Computer Science, POSTECH, Korea
{zentasis, [email protected]
Nonnegative matrix factorization (NMF) is a popular
method for multivariate analysis of nonnegative data,
the goal of which is to decompose a data matrix into a
product of two factor matrices with all entries in factor
matrices restricted to be nonnegative. NMF was shown
to be useful in a task of clustering (especially document
clustering). In this paper we present an algorithm for
orthogonal nonnegative matrix factorization, where an
orthogonality constraint is imposed on the nonnegative
decomposition of a term-document matrix. We develop
multiplicative updates directly from the true gradient on a Stiefel manifold, whereas existing algorithms consider
additive orthogonality constraints. Experiments on
several different document data sets show our
orthogonal NMF algorithms perform better in a task of
clustering, compared to the standard NMF and an
existing orthogonal NMF.
Nonnegative matrix factorization (NMF) is a
multivariate analysis method which is proven to be
useful in learning a faithful representation of
nonnegative data such as images, spectrograms, and
documents [1]. Incorporating extra constraints such as
locality and orthogonality was shown to improve the
decomposition, identifying better local features or
providing more sparse representation [2]. Orthogonality
constraints were imposed on NMF [3], where nice
clustering interpretation was studied in the framework
of NMF.
A prominent application of NMF is in document
clustering [4, 5], where a decomposition of a
term-document matrix was considered. In this paper we
consider orthogonal NMF and its application to
document clustering, where an orthogonality constraint
is imposed on the nonnegative decomposition of a
term-document matrix. We develop new multiplicative
updates for orthogonal NMF, which are directly derived from the true gradient on a Stiefel manifold, while existing
algorithms consider additive orthogonality constraints.
Experiments on several different document data sets
show our orthogonal NMF algorithms perform better in
a task of clustering, compared to the standard NMF and
an existing orthogonal NMF.
NMF seeks a decomposition of X ∈ ℝ^(m×N) that is of the form
X ≈ UVᵀ,    (1)
where U ∈ ℝ^(m×K) and V ∈ ℝ^(N×K) are restricted to be nonnegative matrices as well, and K corresponds to the number of clusters when NMF is used for clustering.
Applying NMF to a term-document matrix for
document clustering, each column of X is treated as a
data point in m -dimensional space. In such a case, the
factorization (1) is interpreted as follows.
- U_ij corresponds to the degree to which term t_i belongs to cluster c_j. In other words, column j of U, denoted by u_j, is associated with a prototype vector (center) for cluster c_j.
- V_ij corresponds to the degree to which document d_i is associated with cluster c_j. With appropriate normalization, V_ij is proportional to a posterior probability of cluster c_j given document d_i.
2.1 Multiplicative updates for NMF
With the least squares error function, NMF involves the
following optimization:
arg min_{U≥0, V≥0} ε = ‖X − UVᵀ‖².    (2)
Gradient descent learning (which is an additive update) can be applied to determine a solution to (2); however, nonnegativity of U and V is not preserved without further operations at each iteration.
On the other hand, a multiplicative method developed
in [6] provides a simple algorithm for (2). We give a
slightly different approach from [6] to derive the same
multiplicative algorithm. Suppose that the gradient of an
error function has a decomposition that is of the form
∇ε = [∇ε]⁺ − [∇ε]⁻,
where [∇ε]⁺ > 0 and [∇ε]⁻ > 0. Then the multiplicative update for parameters Θ has the form
Θ ← Θ ⊙ ([∇ε]⁻ / [∇ε]⁺)^(.η),    (4)
where ⊙ represents the Hadamard product (element-wise product), (·)^(.η) denotes the elementwise power, and η is a learning rate (0 < η ≤ 1). It can be easily seen
that the multiplicative update (4) preserves the nonnegativity of the parameter Θ, while ∇ε = 0 when convergence is achieved. With the gradient calculations of (2), the rule (4) with η = 1 yields the well-known Lee and Seung multiplicative updates [6].
performance of different algorithms. Because the
algorithms gave different results depending on the initial
conditions, we calculated the mean of 100 runs for
different initial conditions. Our orthogonal NMF algorithm
gave better performance than the standard NMF and DTPP
for most of the datasets (Table 1).
The orthogonality of the matrix V is also measured using ‖VᵀV − I‖. The changes of the orthogonality over the iterations were measured and averaged over 100 trials.
The Lee and Seung updates are
U ← U ⊙ (XV) / (U(VᵀV)),    V ← V ⊙ (XᵀU) / (V(UᵀU)),    (5)
where the division is also an elementwise operation.
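A minimal sketch of these multiplicative updates follows; the small eps in the denominators is a common numerical safeguard, not part of the original rules.

```python
import numpy as np

def nmf(X, K, n_iter=200, eps=1e-9, seed=0):
    """Lee-Seung multiplicative updates for min ||X - U V^T||^2
    with U, V >= 0 (minimal sketch)."""
    rng = np.random.default_rng(seed)
    m, N = X.shape
    U, V = rng.random((m, K)), rng.random((N, K))
    for _ in range(n_iter):
        U *= (X @ V) / (U @ (V.T @ V) + eps)    # elementwise ratio update
        V *= (X.T @ U) / (V @ (U.T @ U) + eps)
    return U, V
```

Because each update multiplies by a nonnegative ratio, nonnegativity of U and V is preserved automatically, as noted above.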
Orthogonal NMF with VᵀV = I is formulated as the following optimization problem:
arg min_{U,V} ε = ‖X − UVᵀ‖²
subject to VᵀV = I, U ≥ 0, V ≥ 0.
In general, the constrained optimization problem is solved by introducing a Lagrangian with a penalty term tr{Λ(VᵀV − I)}, where Λ is a symmetric matrix
Table 1. Mean clustering accuracies (n=100) of standard NMF, Ding's orthogonal NMF (DTPP), and our orthogonal NMF (ONMF) for six document datasets.
containing Lagrangian multipliers. Ding et al. [3] took this
approach with some approximation, developing
multiplicative updates.
Here we present a different approach, incorporating into (4) the gradient on the constraint surface on which VᵀV = I is satisfied. With U fixed in (2), we treat (2) as a
function of V. Minimizing (2) where V is constrained to the set of n × K matrices such that VᵀV = I was well studied in [7]. Here we incorporate nonnegativity constraints on V to develop multiplicative updates that preserve the orthogonality constraint VᵀV = I. The constraint surface, i.e., the set of n × K orthonormal matrices such that VᵀV = I, is known as the Stiefel manifold [8].
The gradient on the Stiefel manifold, ∇̃ε, can be computed as
∇̃ε = ∇_V ε − V(∇_V ε)ᵀV,
where ∇_V ε is the gradient computed in the Euclidean space. Thus, with the partial derivatives of (2), the gradient on the Stiefel manifold is calculated as
∇̃ε = (−XᵀU + VUᵀU) − V(−XᵀU + VUᵀU)ᵀV
    = VUᵀXV − XᵀU
    = [∇̃ε]⁺ − [∇̃ε]⁻.
We have developed multiplicative updates on Stiefel
manifold for orthogonal NMF. We have successfully
applied the algorithm to a task of document clustering,
empirically indicating performance gains over standard
NMF and existing orthogonal NMF.
※ This paper is a shortened version of a recently accepted paper [9].
[1] Lee, D.D., Seung, H.S.: Learning the parts of objects by
non-negative matrix factorization. Nature 401 (1999) 788–791
[2] Li, S.Z., Hou, X.W., Zhang, H.J., Cheng, Q.S.: Learning
spatially localized parts-based representation. In: Proceedings
of the IEEE International Conference on Computer Vision and
Pattern Recognition (CVPR), Kauai, Hawaii (2001) 207–212
[3] Ding, C., Li, T., Peng, W., Park, H.: Orthogonal
nonnegative matrix tri-factorizations for clustering. In:
Proceedings of the ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining (KDD),
Philadelphia, PA (2006)
[4] Xu, W., Liu, X., Gong, Y.: Document clustering based on
non-negative matrix factorization. In: Proceedings of the
ACM SIGIR International Conference on Research and
Development in Information Retrieval (SIGIR), Toronto,
Canada (2003)
[5] Shahnaz, F., Berry, M., Pauca, P., Plemmons, R.:
Document clustering using nonnegative matrix factorization.
Information Processing and Management 42 (2006) 373–386
[6] Lee, D.D., Seung, H.S.: Algorithms for non-negative
matrix factorization. In: Advances in Neural Information
Processing Systems (NIPS). Volume 13., MIT Press (2001)
[7] Edelman, A., Arias, T., Smith, S.T.: The geometry of
algorithms with orthogonality constraints. SIAM J. Matrix
Anal. Appl. 20(2) (1998) 303–353
[8] Stiefel, E.: Richtungsfelder und Fernparallelismus in
n-dimensionalen Mannigfaltigkeiten. Commentarii Math.
Helvetici 8 (1935–1936) 305–353
[9] Jiho Yoo and Seungjin Choi, "Orthogonal nonnegative
matrix factorization: Multiplicative updates on Stiefel
manifolds," in Proceedings of the 9th International Conference
on Intelligent Data Engineering and Automated Learning
( IDEAL-2008 ), Daejon, Korea, November 2-5, 2008.
Invoking the relation (4) with ∇ε replaced by ∇̃ε gives
V ← V ⊙ (XᵀU) / (VUᵀXV),
which is our ONMF algorithm. The updating rule for U is the same as (5).
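Putting the two rules together, the ONMF iteration can be sketched as below (initialization details, stopping criteria, and the normalized-cut weighting used in the experiments are omitted):

```python
import numpy as np

def onmf(X, K, n_iter=300, eps=1e-9, seed=0):
    """Sketch of orthogonal NMF: U follows the standard multiplicative
    rule, while V follows the Stiefel-gradient multiplicative rule
    V <- V * (X^T U) / (V U^T X V) described above."""
    rng = np.random.default_rng(seed)
    m, N = X.shape
    U, V = rng.random((m, K)), rng.random((N, K))
    for _ in range(n_iter):
        U *= (X @ V) / (U @ (V.T @ V) + eps)          # standard NMF rule for U
        V *= (X.T @ U) / (V @ (U.T @ X) @ V + eps)    # Stiefel-gradient rule for V
    return U, V
```

After convergence, ‖VᵀV − I‖ can be tracked, as in the experiments, to monitor how close V stays to the Stiefel manifold.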
We tested our orthogonal NMF algorithm on six standard document datasets (CSTR, k1a, k1b, re0, and re1) and compared the performance with the standard NMF and Ding's orthogonal NMF (DTPP) [3]. We applied stemming and stop-word removal to each dataset, and selected 1,000 terms based on the mutual information with the class labels. Normalized-cut weighting [4] was applied to the input data matrix.
We used the accuracy to compare the clustering
Facial Expression Recognition using Feature Tracking
Jongju Shin, Daijin Kim
Department of Computer Science, POSTECH, Korea
{jjshin, [email protected]
In this paper, we propose facial expression recognition using feature tracking. Facial features are tracked using the Inverse Compositional Image Alignment (ICIA) algorithm, which is faster than conventional tracking algorithms such as Lucas-Kanade [3]. To initialize the facial features, we use Active Appearance Models (AAMs). AAMs generate a model instance similar to the input face image and find the facial features automatically. A Support Vector Machine (SVM) is used to classify the facial expression from the shape and appearance parameters of the tracking result. The classification accuracy is more than 94%.
Fig 1. The face with 18 landmark points.
In figure (1), the 18 tracking regions are represented as squares; each square is 11-by-11 pixels.
Facial expression recognition is an important area of Human Computer Interaction (HCI). Methods for facial expression recognition are classified into two approaches: the holistic approach and the analytic approach. In the holistic approach, the face is represented as a whole unit; usually, the intensities of the whole face are used to recognize the facial expression. In the analytic approach, the face is modeled as a set of facial points. Active Appearance Models (AAMs) [1] and Active Shape Models (ASMs) [2] are used to model the face. AAM uses the whole face's intensities with 70 points to model the face. To recognize the facial expression, however, the whole face and the 70 feature points are not necessary. We need to know where the eyes, eyebrows, and mouth are, because they are related to the facial expression. We therefore use a feature tracking algorithm, Inverse Compositional Image Alignment (ICIA), on certain regions of interest (ROIs) for facial expression recognition. Before tracking the features of the face, AAM is used to detect the features in the first frame. The features of the face are then tracked in each frame.
3. Inverse Compositional Image Alignment
To track the features of the face, we use an image alignment algorithm. The Lucas-Kanade (LK) algorithm [3] is the most famous image alignment algorithm, but it needs to compute the Jacobian, the steepest-descent images, and the Hessian in each iteration. To reduce the computational cost of image alignment, the Inverse Compositional Image Alignment (ICIA) algorithm [4] was proposed by Matthews and Baker. The ICIA algorithm exchanges the roles of the model and the input image, so the Jacobian, the steepest-descent images, and the Hessian are computed only once, before iteration. The ICIA algorithm is therefore faster than the LK algorithm, and shows similar tracking results.
The minimization term of the ICIA algorithm is

∑_X [T(W(X; ΔP)) − I(W(X; P))]², (2)

where T is the model and I is the input image. The first-order Taylor expansion of equation (2) is

∑_X [T(W(X; 0)) + ∇T (∂W/∂P) ΔP − I(W(X; P))]². (3)

Then, the least-squares solution for ΔP is

ΔP = H⁻¹ ∑_X [∇T (∂W/∂P)]ᵀ [I(W(X; P)) − T(X)], (4)

where H is the Hessian matrix:

H = ∑_X [∇T (∂W/∂P)]ᵀ [∇T (∂W/∂P)]. (5)

ΔP is updated by (6) iteratively:

W(X; P) ← W(X; P) ∘ W(X; ΔP)⁻¹. (6)

2. 2D Face Model
The face is represented by 18 points, as in figure (1); 18 points are enough to recognize the facial expression. The face shape model is represented by the combination of the mean shape and the shape basis:

s = s₀ + ∑ pᵢ sᵢ, (1)

where pᵢ are the shape parameters and sᵢ is the shape basis.
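The ICIA structure (precompute the steepest-descent images and Hessian once, then iterate the ΔP solve and warp composition) can be illustrated with a minimal sketch for the simplest possible warp, pure translation, where the Jacobian ∂W/∂P is the identity. This is an illustrative toy under simplifying assumptions (nearest-neighbour warp, fixed stopping rule), not the paper's 18-region tracker:

```python
import numpy as np

def icia_translation(T, I, n_iter=50):
    """Sketch of ICIA for the translation warp W(X; P) = X + P.

    For translation the Jacobian dW/dP is the identity, so the
    steepest-descent images are just the template gradients, and the
    Hessian H and its inverse are precomputed once, before iteration.
    """
    gy, gx = np.gradient(T.astype(float))
    sd = np.stack([gx.ravel(), gy.ravel()], axis=1)   # steepest-descent images
    H_inv = np.linalg.inv(sd.T @ sd)                  # precomputed Hessian inverse
    p = np.zeros(2)
    for _ in range(n_iter):
        # warp I with the current P (integer nearest-neighbour shift, for simplicity)
        Iw = np.roll(np.roll(I, -int(round(p[1])), axis=0),
                     -int(round(p[0])), axis=1)
        err = (Iw - T).ravel()                        # I(W(X; P)) - T(X)
        dp = H_inv @ (sd.T @ err)                     # the least-squares step
        p -= dp                                       # W(X;P) <- W(X;P) o W(X;dP)^-1
        if np.linalg.norm(dp) < 1e-3:
            break
    return p
```

On a toy template this recovers a known integer shift; a real tracker would use subpixel interpolation and a richer warp.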
4. Tracking
Instead of the model T, we use the n−1 frame as the model to track. We suppose we know the parameters P on the n−1 frame. The minimization term is then

∑_X [I_{n−1}(W(X; ΔP)) − I_n(W(X; P))]², (7)

where I_{n−1} is the n−1 frame used as the model, and I_n is the n frame, the current input image. The error term from equation (7) is

[I_n(W(X; P)) − I_{n−1}(X)]. (8)

To be a robust tracker, we add the first frame:

[(1 − ε)(I_n(W(X; P)) − I_{n−1}(X)) + ε(I_n(W(X; P)) − I₁(X))]. (9)

This error term reduces the drift problem in the tracking.

5. Initialization using AAM
In the tracking problem, we supposed we know the parameters P on the n−1 frame, so a feature detection algorithm is needed in the first frame. We build Active Appearance Models on the face with 64 landmarks, as in figure (2).

6. Facial Expression Recognition using SVM
To recognize the facial expression, we use the shape parameters and the appearance parameters simultaneously. The shape parameters are obtained in the procedure of the tracking algorithm. The appearance parameters are computed by projection onto the appearance basis:

λ = A_basisᵀ (A − A₀),

where A is the appearance of the input image, A₀ is the mean appearance, and A_basis is the appearance basis. The two parameter sets are concatenated as [P λ] and treated as the input parameters of the RBF kernel function:

K(xᵢ, xⱼ) = exp(−(xᵢ − xⱼ)² / (2σ²)).

We classified 4 facial expressions, including neutral, angry, and happy.
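As a small illustration of the RBF kernel used for classification (the input vectors here stand in for the concatenated [P λ] parameters; σ is a free hyperparameter, and the values are toy inputs, not tracking output):

```python
import numpy as np

def rbf_kernel(xi, xj, sigma=1.0):
    """RBF kernel on two parameter vectors, K = exp(-|xi - xj|^2 / (2 sigma^2))."""
    d2 = np.sum((np.asarray(xi, float) - np.asarray(xj, float)) ** 2)
    return np.exp(-d2 / (2 * sigma ** 2))

print(rbf_kernel([1.0, 2.0], [1.0, 2.0]))  # 1.0 for identical inputs
print(rbf_kernel([0.0, 0.0], [3.0, 0.0]))  # decays toward 0 with distance
```

In an SVM this kernel replaces the inner product, so classification depends only on distances between parameter vectors.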
7. Experiments
We tested 200 images for each facial expression. The results of the classification are in table (1).
Table (1). The results of the facial expression classification.
Neutral and happy are well classified, but the classification result for angry is lower than the others. Angry is sometimes misclassified as neutral, because the shape parameters of angry are not much different from those of neutral.
8. Conclusions
We implemented the Inverse Compositional Image Alignment (ICIA) algorithm to track facial features, and a Support Vector Machine to classify the facial expressions from the shape parameters and the appearance parameters. Because the facial feature tracking result is better than AAM's, the facial expression recognition result using facial feature tracking is better than that using AAM.
Figure (2). The mean shape of AAM with 64 landmarks.
In figure (2), the mean shape doesn't contain the cheek, which is not needed for facial expression recognition. The AAM algorithm detects the facial feature points automatically, so by AAM we can track feature points. But AAM sometimes fails when the input face is statistically far from the mean shape; Figure (3) shows such a failure case. Due to this problem, AAM is used only at the first frame. The features from AAM are then tracked by the ICIA algorithm.
[1] T.F. Cootes, G.J. Edwards and C.J. Taylor, Active Appearance Models, in Proceedings of the European Conference on Computer Vision, Vol. 2, pp. 484-498, Springer, 1998.
[2] T.F. Cootes, D. Cooper, C.J. Taylor and J. Graham, Active Shape Models - Their Training and Application, Computer Vision and Image Understanding, Vol. 61, No. 1, Jan. 1995, pp. 38-59.
[3] B.D. Lucas and T. Kanade, An iterative image registration technique with an application to stereo vision, in Proceedings of the 7th International Joint Conference on Artificial Intelligence (IJCAI '81), April 1981, pp. 674-679.
[4] S. Baker, R. Gross, and I. Matthews, "Lucas-Kanade 20 years on: A unifying framework: Part 3," Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, Tech. Rep. CMU-RI-TR-03-35, November 2003.
Figure (3). An example of AAM failure: AAM fails to detect the features of the mouth.
A Generative/Discriminative Hybrid Model for Discriminative
Motif Discovery
Jong Kyoung Kim, and Seungjin Choi
Department of Computer Science, POSTECH, Korea
{blkimjk, [email protected]
We present a probabilistic model to identify the binding sites of transcription factors in a discriminative setting. We compare generative and discriminative approaches to learning and provide a hybrid generative/discriminative model to take advantage of both methods. In particular, our hybrid model can make use of unlabeled DNA sequences for semi-supervised learning. We demonstrate the performance of our hybrid model using both synthetic and real biological data for discovering DNA motifs. Our results show that the optimum performance is obtained between the purely generative and the purely discriminative extremes, and that semi-supervised learning improves the performance when the labeled sequences are limited.
Labeled positive and negative sequence sets can be obtained by combining additional experimental data. However, it is difficult to provide enough labeled sequences with high confidence, and so a semi-supervised learning approach which uses unlabeled sequences should be incorporated.
In this paper we propose a probabilistic model for discriminative motif discovery. Our model has two essential features that distinguish our approach from other existing methods. First, we develop a hybrid model combining generative and discriminative approaches to take advantage of both methods. Second, we can make use of unlabeled sequences to expand the training set of labeled sequences. We model a motif Θ as a position frequency matrix which specifies a probability distribution over the four nucleotides for each position within a motif site.
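As an illustration of how a position frequency matrix assigns scores to candidate sites (the uniform background model and the log-odds form are common conventions assumed here, not details taken from this paper):

```python
import numpy as np

BASES = {"A": 0, "C": 1, "G": 2, "T": 3}

def site_log_odds(site, theta, background=(0.25, 0.25, 0.25, 0.25)):
    """Log-odds score of a window `site` under a position frequency matrix
    `theta` (one row per motif position, one column per nucleotide),
    relative to a background distribution over the four nucleotides."""
    assert len(site) == theta.shape[0]
    score = 0.0
    for pos, base in enumerate(site):
        b = BASES[base]
        score += np.log(theta[pos, b]) - np.log(background[b])
    return score

# Toy motif of width 3, strongly preferring "TAT"
theta = np.array([[0.1, 0.1, 0.1, 0.7],
                  [0.7, 0.1, 0.1, 0.1],
                  [0.1, 0.1, 0.1, 0.7]])
print(site_log_odds("TAT", theta))  # positive: better than background
print(site_log_odds("GCC", theta))  # negative: worse than background
```

Scanning every window of a sequence with such a score is the basic building block that the generative and discriminative likelihoods are both built on.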
The transcription of protein-coding genes is tightly regulated by DNA-binding transcription factors (TFs). The TFs bind to specific DNA regulatory sequences which are often referred to as transcription factor binding sites (TFBSs). The precise identification of these regulatory sequences is the very first step toward a full understanding of the complex gene regulatory networks consisting of thousands of TFs and their target genes. The main bottleneck in searching for and predicting novel TFBSs is the fact that the binding sites of the same TF are short (typically less than 12-14 base pairs in length) and degenerate (allowing multiple mismatches) [1]. This property makes the prediction problem highly complicated, because it leads to high false positive and false negative rates in genome-scale applications. In addition, randomly occurring sequence patterns and repeating elements in eukaryotic genomes make the pattern detection problem even harder. To reduce the intrinsic error, a variety of integrative approaches using additional biological information have been proposed. Recent methods for discriminative motif discovery, a specific example of the integrative approaches, attempt to find binding sites of an unknown TF that are relatively over-represented in a positive set of sequences compared to a negative set. The concept of relative over-representation, in contrast to classic non-discriminative motif discovery in which only one set of sequences is considered to find over-represented binding sites, can help distinguish genuine TFBSs from repeating elements or uninteresting TFBSs, because the negative set provides an informative criterion to separate them. The positive and negative sets of sequences for discriminative motif discovery can be easily obtained by combining additional experimental data.
Recently, several hybrid models of generative and discriminative approaches have been proposed. The hybrid models can be divided into two groups according to the technique for combining the generative and discriminative likelihoods. The first approach considers the two likelihoods separately, and then combines the log-transformed likelihoods linearly [2,3]. The other approach defines two probabilistic models for the generative and discriminative components with two separate parameter sets, and then combines them by defining a joint prior distribution over the two parameters [4]. We adopt the second approach because it is based on the true likelihood of a well-defined model. We first consider a hybrid model with two separate motif models Θ_G and Θ_D for the generative and discriminative components, respectively. The two motif models are then treated as random variables in order to impose a prior distribution. We consider the following form of prior distributions.
Here, the variance s(α) of the conditional distribution is defined by
We observed improved performance on synthetic and real biological data by combining generative and discriminative approaches and augmenting the training data with unlabeled sequences.
where α ∈ [0,1]. For α → 0, our model corresponds to the generative model; conversely, for α → 1, we obtain the discriminative model. The joint distribution for this hybrid model is given by
The marginal likelihood of the input sequences can be interpreted as parameter regularization. We estimate the parameters of the hybrid model by maximizing the log likelihood using a gradient-based optimization algorithm.
In this section, we show experimental results that demonstrate the ability to discover motifs in synthetic and real biological data. In the first experiment, with synthetic data, we address the effect of α on identifying motif sites and the merit of semi-supervised learning. Figure 1 (a) depicts the effect of α on the performance coefficient; we ran 10 times with different random initializations and report the average values with ±1 standard deviation.
Fig. 1 Effect of α and semi-supervised learning
Table 1. Results on yeast ChIP-chip data. A: AlignACE, C: CONVERGE, D: MDScan, K: Kellis, M: MEME, N: MEME_c. α = 0.8.
From the results, we see that α strongly influences the performance of motif discovery, and the best performance is obtained between the generative and discriminative extremes. Figure 1 (b) shows that semi-supervised learning greatly improves the performance when the number of labeled sequences is very small. In the second experiment, with yeast ChIP-chip data [5], we compare our method to 6 popular motif discovery algorithms to see the merits of discriminative and semi-supervised learning for motif discovery. We define the positive set as the probe sequences found to bind a TF (p-value ≤ 0.001). The negative set consists of randomly selected probe sequences found not to bind the TF (p-value > 0.5), with twice as many sequences as the positive set. Unlabeled sequences are those with 0.001 < p-value ≤ 0.002. We selected 76 TFs which have known binding specificity and non-empty unlabeled sequence sets. Table 1 shows the number of successfully identified TF motifs out of the 76 TFs. The ability of our method to discover motifs is comparable to the other motif discovery algorithms, and using unlabeled sequences increases the performance of our method by successfully identifying motifs with small training data.
These gains come from combining generative and discriminative approaches and augmenting the training data with unlabeled sequences. In future work, we will incorporate the concept of evolutionary conservation of motifs by introducing a joint prior distribution on the motif models of closely related species (transfer learning) and by using strongly labeled sites obtained from sequence conservation.
[1] G. Pavesi, G. Mauri, and G. Pesole. In silico representation and discovery of transcription factor binding sites. Brief Bioinform, 5:217-236, 2004.
[2] O. Yakhnenko, A. Silvescu, and V. Honavar. Discriminatively trained Markov model for sequence classification. In ICDM, 2005.
[3] A. McCallum, C. Pal, G. Druck, and X. Wang. Multi-conditional learning: Generative/discriminative training for clustering and classification. In AAAI, 2006.
[4] J. A. Lasserre, C. M. Bishop, and T. P. Minka. Principled hybrids of generative and discriminative models. In CVPR, 2006.
[5] C. T. Harbison, D. B. Gordon, T. I. Lee, N. J. Rinaldi, K. D. Macisaac, T. W. Danford, N. M. Hannett, J. B. Tagne, D. B. Reynolds, J. Yoo, E. G. Jennings, J. Zeitlinger, D. K. Pokholok, M. Kellis, P. A. Rolfe, K. T. Takusagawa, E. S. Lander, D. K. Gifford, E. Fraenkel, and R. A. Young. Transcriptional regulatory code of a eukaryotic genome. Nature, 431:99-104, 2004.
We have presented a probabilistic model for discriminative motif discovery which can make use of unlabeled sequences and prior knowledge of known motif sites. We also provided a hybrid generative/discriminative model that takes advantage of both learning approaches. Our model was compared with six different probabilistic models for motif discovery in terms of a unifying probabilistic framework.
Robust lip feature tracking for lip-reading
Jin Lee, Daijin Kim
Department of Computer Science, POSTECH, Korea
{leejin, [email protected]
Speech recognition tools show more than a 90% recognition rate under limited quiet conditions, but the recognition rate falls rapidly as noise rises. Therefore, many researchers have studied audio-visual speech recognition systems, which are more stable in noisy environments. We studied robust visual feature tracking methods as a preliminary study for a visual speech recognition system. In this paper, we suggest a robust lip feature tracking method that uses an active appearance model to initialize the lip shape model and a Lucas-Kanade (LK) based feature tracking algorithm for lip feature tracking. In the experiments we observe stable and robust fitting performance over more than 2000 frames after initialization at the first frame.
The model has 24 points on the outer lip line and 20 points on the inner lip line (see Fig. 1) [1].
To make the shape model, we first marked points on the training images by hand to get the training data points Xᵢ. If Xᵢ = (x_{i1}, y_{i1}, x_{i2}, y_{i2}, ..., x_{i44}, y_{i44}), then two similar shapes X₁ and X₂ can be aligned by minimizing

E = (x₁ − M(s, θ)[x₂] − t)ᵀ W (x₁ − M(s, θ)[x₂] − t),

where the pose transform for scale s, rotation θ, and translation (t_x, t_y) in x and y is

M(s, θ)[x_jk, y_jk]ᵀ = ((s cos θ)x_jk − (s sin θ)y_jk, (s sin θ)x_jk + (s cos θ)y_jk)ᵀ,

t = (t_{x1}, t_{y1}, t_{x2}, t_{y2}, ..., t_{xN}, t_{yN}).
Audio-based speech recognition systems are very weak in noisy environments, so many researchers have studied audio-visual speech recognition systems for robust speech recognition under noise. Most studies on lip-reading have focused on lip feature tracking and analysis with recognition tools such as Neural Networks (NN), Support Vector Machines (SVM), or hidden Markov models (HMM). However, lip tracking is not a simple problem. The lip is the most variable part of the face: in general the head is always moving in arbitrary directions, the lighting conditions can change, and, furthermore, when a human speaks the lips move very rapidly. So a lip tracking algorithm must be very fast and robust to various lighting conditions and to movements of the head and the lips.
In this paper, we suggest an adaptive and robust lip feature tracking method that uses an active appearance model for the initial lip shape model fitting and a Lucas-Kanade (LK) based feature tracking algorithm.
W is a diagonal weight matrix whose weights are inversely proportional to the variance of each point.
Any valid shape can be approximated by adding a reduced subset, t, of the modes to the mean shape,

x_s = x̄_s + P_s b_s,

where P_s is the matrix of the first t eigenvectors, P_s = (p₁, p₂, ..., p_t), and b_s is a vector of t weights, b_s = (b₁, b₂, ..., b_t)ᵀ. b_s can be calculated from an example set of points,

b_s = P_sᵀ (x_s − x̄_s).
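The shape model above amounts to PCA on aligned shape vectors. A minimal sketch (the toy random "shapes" stand in for the hand-labeled training data, and 4 points per shape are used instead of 44):

```python
import numpy as np

def build_shape_model(shapes, t):
    """Sketch of the shape PCA: rows of `shapes` are aligned shape vectors
    (x1, y1, ..., xN, yN). Returns the mean shape and the first t
    eigenvectors Ps, so any shape is approximated as
    x ~ mean + Ps @ bs with bs = Ps.T @ (x - mean)."""
    mean = shapes.mean(axis=0)
    cov = np.cov(shapes, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
    Ps = vecs[:, ::-1][:, :t]            # keep the top-t shape modes
    return mean, Ps

rng = np.random.default_rng(0)
shapes = rng.normal(size=(100, 8))       # 100 toy "shapes" of 4 points each
mean, Ps = build_shape_model(shapes, t=3)
bs = Ps.T @ (shapes[0] - mean)           # shape parameters of one example
x_rec = mean + Ps @ bs                   # reconstruction from 3 modes
```

Choosing t so the kept eigenvalues cover most of the total variance is the usual rule for deciding how many modes to retain.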
We can get the shape model shown in Fig. 2.
figure 2. Mean shape and top 3 modes of shape
Each shape mode is obtained from Principal Component Analysis (PCA) of the shape data. The appearance modes of the AAM model are likewise calculated from PCA of the training data (see Fig. 3).
figure 1. (a) AAM lip model (b) Appearance parts
We made a 44-point lip shape model.
figure 3. Mean and top 3 modes of the appearance
4. Experiment Results
The AAM lip model shows reasonably good fitting performance on a single image (see Fig. 5), but AAM is very sensitive to the initial model position, and when we track the lip across sequential image frames, the error accumulates and the fitting points diverge (see Fig. 6). For a more detailed explanation of AAM, refer to Matthews' paper [1].
For fast and robust lip feature tracking, we use another lip model that consists of the 24 outer lip shape points; we do not use the inner lip shape points. From the experiments, we know that the appearance around the inner lip lines varies so much that stable lip model fitting is difficult there.
figure 5. AAM lip fitting progress
figure 6. AAM lip fitting diverges
figure 4. Lip shape model with 24 points; each point has a 10-by-10 pixel window.
The Lucas-Kanade (LK) method is a two-frame differential method for optical flow estimation. In our algorithm, each 10×10-pixel window is tracked with the LK algorithm (see Fig. 4).
The LK method calculates the motion between two image frames taken at times t and t + δt at every pixel position. It is based on local Taylor series approximations of the image signal, using partial derivatives with respect to the spatial and temporal coordinates. As a voxel at location (x, y, z, t) with intensity I(x, y, z, t) will have moved by δx, δy, δz and δt between the two frames, the image constraint equation is

I(x, y, z, t) = I(x + δx, y + δy, z + δz, t + δt). (6)

Assuming the movement is small enough, the right-hand side can be expanded as a Taylor series:
In contrast, the LK feature tracking method gives very stable tracking results. From the experiment, we found that the feature points are tracked exactly over more than 2000 frames (Fig. 7).
(a) 1st frame
figure 7. 1st and 2000th frame LK feature tracking
I(x + δx, y + δy, z + δz, t + δt) = I(x, y, z, t) + (∂I/∂x)δx + (∂I/∂y)δy + (∂I/∂z)δz + (∂I/∂t)δt + H.O.T.,

where H.O.T. means higher-order terms, which are small enough to be ignored. From these equations it follows that

(∂I/∂x)(δx/δt) + (∂I/∂y)(δy/δt) + (∂I/∂z)(δz/δt) + (∂I/∂t)(δt/δt) = 0.
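The pseudo-inverse solution for a constant-flow window can be sketched as follows, reduced to 2-D images for brevity (the window size, gradient operator, and toy one-pixel test shift are illustrative assumptions, not the paper's setup):

```python
import numpy as np

def lk_flow(I0, I1, center, half=5):
    """Sketch of one LK window: assuming the flow V is constant inside the
    window, solve A V = -It in the least-squares sense with the
    pseudo-inverse, where A stacks the spatial gradients [Ix Iy] of I0
    and It = I1 - I0 is the temporal derivative."""
    y, x = center
    win = (slice(y - half, y + half + 1), slice(x - half, x + half + 1))
    Iy, Ix = np.gradient(I0.astype(float))
    A = np.stack([Ix[win].ravel(), Iy[win].ravel()], axis=1)
    It = (I1.astype(float) - I0.astype(float))[win].ravel()
    return np.linalg.pinv(A) @ (-It)     # V = (Vx, Vy)
```

For a smooth blob translated by one pixel, the recovered V is close to the true shift; windows with degenerate gradients (the aperture problem) are where the shape-model constraint mentioned below helps.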
[1] I. Matthews, T. F. Cootes, J. A. Bangham, S. Cox, and R. Harvey, Extraction of visual features for lipreading, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002, pp. 198-213.
Thus ∇I · V = −Iₜ.
5. Conclusions
The LK-based feature tracking method shows the most stable tracking performance: over more than 2000 frames after initialization at the first frame, it did not miss the feature points. But it needs a separate initialization algorithm, and we solve that problem with the Active Appearance Model (AAM). The AAM lip model did not show very good fitting performance, but the AAM face model shows more stable performance compared to the lip model.
This work was supported by grant No. RTI04-02-06
from the Regional Technology Innovation Program of
the Ministry of Commerce, Industry and Energy
which results in

(∂I/∂x)Vx + (∂I/∂y)Vy + (∂I/∂z)Vz + ∂I/∂t = 0.
(b) 2000th frame
[2] S. B. Gokturk, J. Bouguet, C. Tomasi and B. Girod, Model-based face tracking for view-independent facial expression recognition, in Proceedings of the IEEE International Conference on Automatic Face and Gesture Recognition, 2002, pp. 287-293.
[3] I. Matthews, T. Ishikawa and S. Baker, The template update problem, IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004, Vol. 26, Issue 6, pp. 810-815.
Assuming that the flow V is constant in a small window of size m×m×m with m > 1, we can calculate V easily with the pseudo-inverse of ∇I. In addition, the center point of each window must follow the shape model variation condition, so we can track each point robustly.
For a more detailed explanation of the LK algorithm, refer to Matthews' paper [3].
Car/Human classification using intuitive feature
Jiman Kim and Daijin Kim
Department of Computer Science, POSTECH, Korea
{jmk, [email protected]
Object classification is necessary for outdoor surveillance systems. In particular, car/human classification is the most important issue. Many methods have been proposed for car/human classification, but most of them cannot satisfy both accuracy and processing-speed requirements. In this paper we propose an efficient feature, based on an intuitive property, for object classification. The feature is robust to external environmental change and to the rotation of objects. The experimental results show that the proposed feature is very simple and has high classification performance.
2.1. Edge extraction
We extract the edge components in the vertical and horizontal directions. The Sobel operator is used to detect edges; the operator consists of a pair of 3×3 convolution kernels, as shown in Fig. 1.
Figure 1. Sobel convolution kernels.
After vertical and horizontal edge detection on the gray image, we select the strong edges whose intensity values are greater than a threshold value. Then we compute the lengths of the strong edges and, for noise reduction, select the major edges which contain 75% of the total length for each direction. The edge detection processing within a bounding box is shown in Fig. 2.
1. Introduction
We consider the outdoor environment and assume that the detected objects are cars or humans. Object classification in an outdoor environment is difficult because the objects' properties change.
Many approaches have been proposed for car/human classification. The first category is the learning method [1, 2, 3]. The learning method consists of four steps: feature extraction, dimension reduction, classifier training, and search/decision. But the learning method is time-consuming because of training and search, and a different classifier must exist for each view of the object.
Another category is the intuitive method. Intuitive features represent overall properties such as aspect ratio, color, dispersedness, velocity, rigidity, and area ratio [4, 5, 6]. The intuitive method is simple and fast, but the features are not robust to the distance between camera and object or to the quality of the camera image. To solve this problem, we propose an efficient edge-based feature.
The proposed feature uses the lengths of the vertical and horizontal edges. The distributions of the vertical and horizontal edges of cars and humans differ because their appearances differ. We classify objects using this property.
Section 2 describes the edge detection processing and the edge length ratio (ELR) feature. Section 3 shows the experimental results for the proposed feature and compares it with other features. Section 4 concludes.
Figure 2. The edge detection processing.
2.2. Edge length ratio
The positions where vertical/horizontal edges are detected are as follows.
2. Car / Human classification
We assume that object detection is done in advance, so we can obtain the bounding box regions which contain foreground pixels. Then we perform edge extraction for each bounding box, compute the edge length ratio (ELR), and classify the object.
Figure 3. Vertical / Horizontal edge positions.
Fig. 3 shows that a car has more horizontal edges than vertical edges; on the other hand, a human has more vertical edges than horizontal edges. We classify objects using this property.
We compute the total length of all edges for each direction. Then we obtain the edge length ratio (ELR) as

ELR = (vertical edge length) / (horizontal edge length).
The ELR is robust to the distance between the object and the camera because the ELR considers the major edges. The major edge of a human at a distance is the silhouette; the major edges of a car at a distance are the silhouette and strong horizontal edges such as the windshield, side panel, and bumper. The ELR is also robust to rotation of the object as long as the object is not rotated by 45°. In addition, the ELR can classify people correctly.
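Under simplifying assumptions (edge "length" approximated by counting strong Sobel responses, the 75% major-edge selection omitted, and toy rectangle images in place of real detections), the ELR computation can be sketched as:

```python
import numpy as np

def edge_length_ratio(gray, thresh=0.25):
    """Sketch of the ELR feature: Sobel responses in each direction, keep
    the strong ones, and compare total vertical vs. horizontal edge length
    (approximated here by strong-pixel counts)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)  # vertical edges
    ky = kx.T                                                   # horizontal edges
    def conv2(img, k):
        out = np.zeros(img.shape, dtype=float)
        pad = np.pad(img.astype(float), 1)
        for dy in range(3):
            for dx in range(3):
                out += k[dy, dx] * pad[dy:dy + img.shape[0], dx:dx + img.shape[1]]
        return np.abs(out)
    gx, gy = conv2(gray, kx), conv2(gray, ky)
    scale = max(gx.max(), gy.max()) or 1.0
    v_len = (gx / scale > thresh).sum()   # vertical edge "length"
    h_len = (gy / scale > thresh).sum()   # horizontal edge "length"
    return v_len / max(h_len, 1)

# A standing rectangle (human-like) vs. a wide one (car-like)
tall = np.zeros((40, 40)); tall[5:35, 15:25] = 1.0
wide = np.zeros((40, 40)); wide[15:25, 5:35] = 1.0
print(edge_length_ratio(tall), edge_length_ratio(wide))
```

The tall shape yields an ELR above 1 and the wide shape an ELR below 1, which is the intuition behind the human/car threshold.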
Figure 5. The result of incorrect classification.
4. Conclusion
We proposed a simple and efficient intuitive feature for object classification. The feature uses the vertical and horizontal edges of the foreground bounding box. After edge detection for each direction, we select the strong edges using a threshold value, and then select the major edges from the strong edges by length. Finally, the edge length ratio (ELR) is computed from the total lengths of the vertical and horizontal edges. If the ELR is greater than a threshold value, the object is a human; otherwise, the object is a car.
The ELR is more robust to the distance and rotation of the object than other intuitive features, and the experimental results show that the ELR feature is more accurate than the other features.
We will try other approaches using the distribution of vertical and horizontal edges, and additional post-processing to eliminate the background effect is also planned.
3. Experimental results
We use the StreetScene database from the web. We decided the threshold value of the ELR from many arbitrary car/human images; the threshold value is about 0.9-1. The ELR of a car is less than the threshold value; contrarily, the ELR of a human is greater than the threshold value.
Fig. 4 shows the results of the edge detection processing for car/human images.
This work was supported by the Intelligent Robotics
Development Program, one of the 21st Century Frontier R&D
Programs funded by the Ministry of Commerce, Industry and
Energy (MOCIE) and the authors would like to thank the
Ministry of Education of Korea for its financial support
toward the Division of Mechanical and Industrial Engineering,
and the Division of Electrical and Computer Engineering at
POSTECH through BK21 program.
[1] B. Leung, Component-based car detection in street scene images, MIT, 2004.
[2] A. Mohan, C. Papageorgiou and T. Poggio, Example-based object detection in images by components, Pattern Analysis and Machine Intelligence, 2001.
[3] S. Agarwal, A. Awan and D. Roth, Learning to detect objects in images via a sparse, part-based representation, Pattern Analysis and Machine Intelligence, 2004.
[4] A. J. Lipton, H. Fujiyoshi and R. S. Patil, Moving target classification and tracking from real-time video, Applications of Computer Vision, 1998.
Figure 4. The results of edge detection.
We use other intuitive features, such as color histogram variance, velocity, dispersedness, and area ratio, in addition to the ELR for comparison. Table 1 shows the classification accuracy for each feature.
[5] Y. K. Wang and Y. S. Lin, A real-time approach for
classification and tracking of multiple moving objects,
Graphics and Image processing, 2003.
[6] P. Neskovic and L. N. Cooper, A recognition system
that uses saccades to detect cars from real-time video
streams, International Conference on Neural Information
Processing, 2002.
Table 1. The results of classification for each feature.
But there are incorrect classification cases with much background edge information, because there are many vertical edges in the background. In the future, we will apply additional processing to eliminate the background effect. Fig. 5 shows a result of incorrect classification.
A Graph-based Collaborative Filtering Algorithm for
Recommender System
Jinseok Lee, Daijin Kim
Department of Computer Science, POSTECH, Korea
{seogi81, [email protected]
Recommender systems apply knowledge discovery techniques to the problem of making personalized recommendations for information. The collaborative filtering algorithm is one of the most widely used techniques in recommender systems; it uses users' previous preferences for items. In this paper, we propose a graph-based recommendation algorithm, a kind of collaborative filtering algorithm that doesn't need pre-computation time. Our algorithm updates graphs in response to a user's expressed preference for an item, and it uses the graphs to serve a user's recommendation request. Our experimental results show that the quality of recommendation is as good as existing algorithms' quality, without pre-computation.
A graph is a well-studied structure in computer science, and it can be used to directly reflect similarities between items. The algorithm updates the graphs when a user expresses his preference for an item. When a user requests a recommendation of some items, the algorithm uses the graphs to find items that the user may prefer. Updating the graphs can be done in real time, so this algorithm does not need time to pre-compute similarities.
This paper is organized as follows. Section 2 explains how to construct the graphs. In sections 3 and 4, we describe how to use these graphs to recommend items. Section 5 explains the experimental results. Finally, section 6 presents the meaning of the results and the conclusion of this paper.
A procedure for making (maintaining) a graph. The
system maintains two graphs. The first is the preference
graph, and the second is the relation graph.
1) Preference Graph: The preference graph is a bipartite
graph. A bipartite graph has two disjoint sets of vertices, V1
and V2 such that every edge connects a vertex in $V_1$
and one in V2. In the preference graph, two sets of vertices
represent users and items respectably. Each edge represents
a rating value between the user and the item which the
edge connects. When a user rates his preference (generally
1 ~ 5, 1 means the worst and 5 means the best) for an item,
the system looks the preference graph to determine
whether there is an edge which connects the user and the
item. If there is not an edge, the system draws an edge
between the user and the item the user rated. If there is an
edge, the system changes the value.
2) Relation Graph: In our algorithm, its vertices
represent items. An edge between two items represents a
relation between the two items. Therefore the system has to
update the graph when a user rates his preference for an
item. At first, the system finds items which the user has
rated before. The system updates edges' weights with the
equation (1). Assume that a user ui rates an item ij with a
value r, and the user has already rated item ik with value
r(ui, ik). Then the system updates its graphs.
Collaborative filtering (CF) is one of the most promising
technologies to help distinguish useful information from
useless information. CF works by building a database of
user's preferences for items. Two classes of CF algorithm is
exist. One is the user-based CF algorithm. Each user is
matched against the database to discover neighbors.
Neighbors are other users who have similar preference to a
specific user. So, the algorithm recommends items
corresponding to items that neighbors prefer. The other
class is the item-based CF algorithm. This is similar to the
user-based CF algorithm, but differs in that the
neighborhood relationship is built between items, not users.
A user has his preferences and the algorithm recommends
items corresponding to that items' neighbors. CF has been
successful in both research and practice. However, some
challenges remain.
Existing CF algorithms cannot update their similarity in
real time because time required to compute similarity is
quite long. Therefore, many existing systems that use CF
algorithms pre-compute similarities regularly (every
midnight, etc.). So there could be a problem of similarity
consistence. Some kinds of item like music and movies are
changing rapidly, so new items occur frequently. Thus,
similarity consistency is one of the most important
problems of existing CF algorithms.
In this paper, we propose a new algorithm that resolves
the challenge of similarity consistency. Our new algorithm
is called "Graph-based recommendation algorithm", which
uses two graphs. Graph is well-known data structure in
The system performs this updating process for every item
that the user has already rated.
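The graph maintenance described above can be sketched as follows. Since equation (1) is not reproduced in this abstract, the sketch assumes a simple co-rating count as the relation-graph edge weight and a Jaccard-style similarity built from the three values named in Section 3; the class and method names are illustrative, not the authors' implementation.

```python
from collections import defaultdict

class GraphRecommender:
    """Sketch of the two graphs: a bipartite preference graph
    (user -> item -> rating) and a relation graph whose edge weight
    e(i, j) here counts co-ratings between items i and j (an
    assumption; equation (1) is not given in the abstract)."""

    def __init__(self):
        self.pref = defaultdict(dict)                      # preference graph
        self.rel = defaultdict(lambda: defaultdict(int))   # relation graph

    def rate(self, user, item, rating):
        # Update the relation graph for every item the user rated before.
        for other in self.pref[user]:
            if other != item:
                self.rel[item][other] += 1
                self.rel[other][item] += 1
        # Add or overwrite the edge in the preference graph.
        self.pref[user][item] = rating

    def e(self, item):
        # Sum of all edge weights connected with the item.
        return sum(self.rel[item].values())

    def sim(self, i, j):
        # Jaccard-style similarity from the three values named in the text.
        co = self.rel[i][j]
        denom = self.e(i) + self.e(j) - co
        return co / denom if denom else 0.0
```

Both updates touch only the edges incident to the rated item, which is why they can run in real time with no pre-computation pass.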
3. Similarity Computation
The first step the system performs when giving recommendations to a user is similarity computation. When the system computes the similarity between items ij and ik, it needs three values: the sum of all edge weights connected with ij, the same sum for ik, and
an edge weight between ij and ik. Formally, the similarity between ij and ik, denoted by sim(ij, ik), is given by equation (2):

sim(ij, ik) = e(ij, ik) / (e(ij) + e(ik) − e(ij, ik)),  (2)

where e(ij, ik) denotes the edge weight between ij and ik, and e(ij) denotes the sum of all edge weights connected with ij. The numerator of equation (2) is the number of co-ratings by the users in the system. The denominator is the number of all possible edge combinations from ij and ik, normalized by the numerator. Therefore, equation (2) measures the possibility that ij and ik are co-rated, given all possible combinations of edges from ij and ik.
4. Prediction Computation
The second step is prediction computation, in which the system computes a prediction value for how user ui will like item ij. It uses ui's rating values for the items ui has rated, as well as the items connected with ij. Let Ii be the set of items ui has rated and Ij the set of items connected with ij. The system computes similarities between ij and the items in the intersection of Ii and Ij.
The system computes a prediction value with a weighted sum, using the similarity explained above. The prediction value on item ij for user ui is the sum of the ratings given by ui to the items similar to ij, each rating weighted by the corresponding similarity sim(ij, ik) between items ij and ik. The prediction value of ui at ij, denoted by Pre(ui, ij), is

Pre(ui, ij) = ∑_{ik ∈ Ii ∩ Ij} sim(ij, ik) · r(ui, ik) / ∑_{ik ∈ Ii ∩ Ij} sim(ij, ik).

The system computes prediction values over all candidates, sorts them in descending order, and returns the items with the highest values to the user ui.
5. Experiment
We used experimental data from the MovieLens recommender system web-site to compare our algorithm with previous work (item-based collaborative filtering). The data set can be obtained from ''.
First, we constructed the two graphs with the training set. We then tested each user in the test set. Assume that the current user has rated n items (i1, i2, ..., in). For each item the user has rated, we pretend that the user has not rated it and get recommendations from the system. We then check whether the recommendation list contains the item; if it does, we increase the hit count. After testing all items the user has rated, we compute the ratio of the hit count to the number of items the current user has rated. We do this for all users in the test set.
MAE is a measure of the deviation of recommendations from their true user-specified values. For each rating-prediction pair (pi for the rating value, qi for the prediction value), this metric takes the absolute error between them, i.e., |pi − qi|. The MAE is computed by summing these absolute errors over the N (number of all ratings) rating-prediction pairs and then computing the average. The lower the MAE, the more accurately the recommendation engine predicts user ratings. Our results are divided into two parts: recommendation quality and neighborhood size.
1) Recommendation quality: MAE was 0.72 for CF and 0.76 for GB. The hit ratio was 67% for CF and 64% for GB.
2) Neighborhood size: The neighborhood size is the number of items the system uses when it recommends items to a user; we denote it by k. In the relation graph, the system keeps at most k items connected to an item, with respect to their edge weights. The size of the neighborhood has a significant impact on prediction quality and calculation time. To determine the sensitivity of this parameter, we performed an experiment in which we varied the number of neighbors used and computed the MAE and the calculation time of recommending items to a user. We observe that the size of the neighborhood does affect the quality of prediction (Fig. 1).

Figure 1. Change of MAE and calculation time over neighborhood size; clear circles correspond to MAE, the other markers to calculation time.

6. Conclusions
In this paper we presented and experimentally evaluated a new algorithm for recommender systems. Our results show that the GB algorithm holds the promise of allowing recommendation algorithms to scale to large data sets while producing high-quality recommendations.
This work was supported by the Intelligent Robotics
Development Program, one of the 21st Century Frontier R&D
Programs funded by the Ministry of Commerce, Industry and
Energy (MOCIE) and the authors would like to thank the
Ministry of Education of Korea for its financial support
toward the Division of Mechanical and Industrial Engineering,
and the Division of Electrical and Computer Engineering at
POSTECH through BK21 program.
[1] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, Item-Based Collaborative Filtering Recommendation Algorithms, in Proceedings of the International World Wide Web Conference, 2001.
[2] G. Linden, B. Smith, and J. York, Amazon.com Recommendations: Item-to-Item Collaborative Filtering, IEEE Internet Computing, 2003.
[3] C. C. Aggarwal, J. L. Wolf, K. Wu, and P. S. Yu, A Graph-theoretic Approach to Collaborative Filtering, in Proceedings of the ACM KDD'99 Conference, 1999.
Exploration in a Maze
Jungtae Kim, Daijin Kim
Department of Computer Science, POSTECH, Korea
{postman, [email protected]
In this paper we present a new exploration method that focuses on economical search of unknown areas, that is, the removal of redundant paths. There are many previous works on exploration and exploitation, but most of them focus on the trade-off between the two, not on exploration itself. Our proposed method, Remembering Exploration (RE), is competitive in the length of the final converged path and finds the optimal path faster than the previous methods we tested.
1. Introduction
The exploration problem is as important as the exploitation problem in planning. Exploration is trying to visit unknown areas; exploitation is using the already known areas as much as possible. In most cases exploration just spends resources and is not useful, but with small probability it yields high profit. Therefore, we cannot be satisfied with exploitation alone but must sometimes try to explore.
There are already several exploration methods [1, 2], but we want to explore economically rather than randomly. The simplest exploration is of course the random method. Because of its redundancy it is good for dynamic environments, but not for stationary ones. Other methods are rather similar to the random method.
To find the optimal exploration path, we use optimal search techniques such as breadth-first search, depth-first search or depth-limited search: we regard the unknown areas as our goals and search for the optimal paths toward them.
This paper consists of five sections: the first section is this introduction, the second section explains our proposed method in detail, the third section shows the outline of the algorithm, the fourth section describes our experiments and results, and the fifth section concludes the paper.
2. Remembering Exploration (RE)
Our proposed method focuses on optimizing the path of a mover, removing as many redundant paths as possible. We call it the Remembering Exploration method.
Let us explain the method using Fig 1. There is a maze, and each room is empty or marked as follows: a blue circle means there is a block, so the mover cannot go there without dying; dark green with a number marks the mover's path; light green without a number marks unknown lands around the mover's path, which the mover will visit later.
In (a) of Fig 1, the mover is located in the room with number 1, and among the four movable directions the upper direction is blocked. Because the distances from the mover's current position to the three unknown lands are equal, we choose one randomly; the result is (b) of Fig 1, where the mover chose the right direction. In (b) of Fig 1, one of the previous three unknown lands (the room with number 2) is now known, and three new unknown lands are added. From the room with number 2, among the five unknown lands the three unknown lands nearest to it (the upper, right, and lower rooms of room 2) are equally close if the metric is Euclidean; here we assume the maze is a Euclidean space. Among these three unknown lands, we choose one randomly. The result is (c) of Fig 1: the chosen unknown land becomes the known land with number 3. As before, we calculate which unknown lands are nearest from room 3 and choose one of them randomly. Even if no unknown land neighbors room 3, we choose one randomly among the unknown lands and go there through the known path (3-2).
By repeating this process we can explore the whole maze, and if we find the final position in the middle of exploration, we can choose whether to explore further or to stop and exploit. The following section summarizes this process.
3. Algorithm
The outline of our proposed algorithm is as follows:
1. Set the current position (xc, yc) randomly or to a fixed point, and set the target position (xt, yt) equal to (xc, yc).
2. Register the neighbor lands of (xc, yc) as unknown lands if they are not already registered.
3. If (xc, yc) is equal to (xt, yt), set (xt, yt) to the nearest unknown land.
4. Move to the neighbor which is on the path toward (xt, yt).
5. Update the current position (xc, yc) and repeat from step 2.
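The outline above can be sketched as follows, assuming a grid maze and breadth-first search through already-known rooms to reach the nearest unknown land; the maze encoding (0 = free, 1 = block) and all names are illustrative, not the authors' implementation.

```python
from collections import deque

def remembering_exploration(maze, start):
    """Sketch of RE: repeatedly walk to the nearest unknown land.
    maze: 2D list, 0 = free room, 1 = block; start: (row, col)."""
    rows, cols = len(maze), len(maze[0])
    known = {start}
    unknown = set()
    path = [start]  # the remembered path, including revisits

    def neighbors(pos):
        r, c = pos
        for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and maze[nr][nc] == 0:
                yield (nr, nc)

    def register(pos):
        # Register neighbor lands as unknown if not already known.
        for n in neighbors(pos):
            if n not in known:
                unknown.add(n)

    def route_to_nearest_unknown(cur):
        # BFS through known rooms; the first unknown room reached is nearest.
        prev, seen, q = {}, {cur}, deque([cur])
        while q:
            p = q.popleft()
            if p in unknown:
                route = [p]
                while route[-1] != cur:
                    route.append(prev[route[-1]])
                return route[::-1][1:]  # drop cur itself
            for n in neighbors(p):
                if n not in seen and (n in known or n in unknown):
                    seen.add(n)
                    prev[n] = p
                    q.append(n)
        return None  # everything reachable is explored

    register(start)
    cur = start
    while unknown:
        route = route_to_nearest_unknown(cur)
        if route is None:
            break
        path.extend(route)
        cur = route[-1]
        unknown.discard(cur)
        known.add(cur)
        register(cur)
    return path
```

Routing only through known rooms is what makes the method "remembering": redundant moves occur only when backtracking along the stored path is unavoidable.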
Fig 1. Exploration algorithm in a maze: a blue circle means a block, dark green with a number means the path, and light green without a number means an unknown land. Each time, the mover goes toward the nearest unknown land.
4. Experiments
To show the advantage over previous works, we compared with three well-known simple exploration methods: the random walk, the following-left-wall method, and counter-based exploration.
The random walk (RW) method [1] involves, as the name says, no specific planning: at each time step the mover chooses its next position at random.
The following-left-wall (FLW) method builds the path from the start position by following the connected wall in one fixed direction; the name FLW does not mean only the left direction, so even if we choose the right direction, the method is still called FLW.
The counter-based exploration (CE) method [2] follows the rule "go to the least visited state": whenever selecting the next move direction, the mover checks which neighboring room has been visited least often and goes there.
We made a test bed: a maze with 13x13 rooms containing randomly placed blocks that the mover cannot pass through (Fig 2). If the mover tries to enter a block, it must go back to the start position and continue; the number of deaths is then increased.
Table 1. Performance of the three previous methods and our method (number of moves, number of deaths, and final path length).
As the summary table shows, RW is the worst in this maze environment. FLW is very fast at finding the final position if the start position is connected with the final position by walls and blocks, but its final path is not optimal. CE found an almost optimal path, but with some redundant moves during the procedure. Our proposed method, RE, also found an almost optimal path like CE, and it even took fewer moves than CE. Therefore, our proposed method is very competitive with the previous methods.
5. Conclusions
We have presented a new exploration method which works well in a maze. This simple and fast method showed better performance than the previous methods.
We think this method can be used in more complex environments such as real 2D environments. As future work, we will study its combination with Simultaneous Localization and Map-building (SLAM), for example a SLAM-exploration method.
The authors would like to thank the Ministry of
Education of Korea for its financial support toward the
Division of Mechanical and Industrial Engineering, and
the Division of Electrical and Computer Engineering at
POSTECH through BK21 program.
Fig 2. Maze with 13x13 rooms. The red circle marks the start position, the red cross marks the final position, and the green rooms are blocked rooms.
[1] S. B. Thrun, The role of exploration in learning control, in D. A. White and D. A. Sofge (eds.), Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches. Florence, Kentucky: Van Nostrand Reinhold,
[2] S. B. Thrun, Efficient exploration in reinforcement learning, in Proceedings of the 8th International Workshop on Machine Learning, pp. 358-362, 1991.
Using the three previous methods and our proposed method, the mover finds the final position from the start position. Performance is compared by how fast, that is with how few moves, the mover finds the final position. Whenever a method finds the route, the number of steps is recorded and a new map is generated. In total, 100 maps are used to obtain average performance; the results are shown in Table 1.
Action Detection Using Shape Matching
Ju-young Lee,
Daijin Kim
Department of Computer Science, POSTECH, Korea
{zealota, [email protected]
In the real world, actions occur in crowded, dynamic environments. We propose a technique for action detection in cluttered video that reliably identifies actions. Our approach is based on two key ideas: (1) we efficiently match the volumetric representation of an action against over-segmented spatio-temporal video; (2) we augment our shape-based features using flow. We report experiments on human actions, such as jumping jacks or waving two hands, in cluttered video.
y(i, j+1) = ∑_{k=1}^{n} f_k · g(‖y(i, j) − f_k‖² / h²) / ∑_{k=1}^{n} g(‖y(i, j) − f_k‖² / h²),  (1)

where j = 1, 2, ..., with kernel g, typically a Gaussian kernel, and bandwidth h. Fig. 1 shows the result of segmentation using mean shift.
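The iteration in equation (1) can be sketched as follows, assuming a Gaussian kernel g(t) = exp(−t); the function and parameter names are illustrative, not the authors' implementation.

```python
import numpy as np

def mean_shift_mode(points, start, h=1.0, iters=50):
    """One mean-shift trajectory: starting from y = start, repeatedly
    move y to the kernel-weighted mean of the feature points, using a
    Gaussian kernel g(t) = exp(-t) and bandwidth h."""
    y = np.asarray(start, dtype=float)
    pts = np.asarray(points, dtype=float)
    for _ in range(iters):
        # Weights g(||y - f_k||^2 / h^2) from equation (1).
        w = np.exp(-np.sum((y - pts) ** 2, axis=1) / h ** 2)
        y_new = (w[:, None] * pts).sum(axis=0) / w.sum()
        if np.linalg.norm(y_new - y) < 1e-6:  # converged to a local mode
            break
        y = y_new
    return y
```

Segmentation then amounts to running this trajectory from every pixel's feature vector f_i and grouping pixels that converge to the same mode.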
The goal of action detection is to localize and identify specified spatio-temporal patterns in video, such as jumping jacks. A recent trend in action detection and recognition has been techniques based on the volumetric analysis of video [1][2]. These approaches have the benefit of not being limited to a specific set of actors or actions; in principle they extend to a variety of actions, given appropriate training data.
Flow-based methods for action detection operate directly on the spatio-temporal sequence, attempting to recognize the specified pattern by brute-force correlation without segmentation. Shechtman and Irani propose an algorithm for correlating spatio-temporal action templates, which can be noisy on object boundaries [4].
We take key ideas from these earlier approaches and propose an algorithm for action detection in real cluttered video. Section 2 describes the mean shift algorithm for image segmentation. Section 3 describes how to detect the action. Section 4 shows the experimental results. In the final section, we draw conclusions on action detection using shape matching.
Figure 1. Mean shift segmentation: (a) the segmented image, (b) the original input image.

The mean shift algorithm is an iterative procedure that finds local modes in the data and has been used to segment images [3]. Let f_i be a d-dimensional point in a set of n feature points f_1, ..., f_n. Using mean shift, we can find the mode near each f_i to cluster the points: starting from y(i, 1) = f_i, we calculate the iteration given in equation (1).
To detect the action, we consider the distance between the spatio-temporal template and the input image segmented with the mean shift algorithm described in Section 2. To calculate the distance, we use a shape matching metric.

Figure 2. Example showing how a template is matched to a segmented input.

The shape matching metric is based on the region intersection distance between the spatio-temporal template and the set of segmented input regions. The minimum set is defined as the set of regions closest to the template. There are four cases that determine the minimum set, as follows. If a region Vi is
completely enclosed by the template, such as V5, then it is always chosen for the minimum set. In the opposite case, if a region Vi does not intersect the template, such as V11, then it is never considered among the
This work was supported by the Intelligent Robotics
Development Program, one of the 21st Century Frontier
R&D Programs funded by the Ministry of Commerce,
Industry and Energy (MOCIE) and the authors would
like to thank the Ministry of Education of Korea for its
financial support toward the Division of Mechanical and
Industrial Engineering, and the Division of Electrical
and Computer Engineering at POSTECH through BK21
elements of the minimum set. The two interesting cases are when Vi partially intersects the template, such as V2. If the intersected region is less than half of the region, the distance is the number of intersected elements; otherwise, the distance is the number of un-intersected elements in the region. More formally, the distance between the template T and the input V at location l is defined as

d(T, V; l) = ∑_i d(T, Vi; l).
[1] Y. Ke, R. Sukthankar and M. Hebert, Event Detection in Crowded Videos, ICCV, 2007.
[2] Y. Ke, R. Sukthankar and M. Hebert, Spatio-temporal Shape and Flow Correlation for Action Recognition, Visual Surveillance Workshop, 2007.
d(T, Vi; l) = |T(l) ∩ Vi|, if |T(l) ∩ Vi| < |Vi| / 2; otherwise |Vi| − |T(l) ∩ Vi|,  (3)

where T(l) denotes the template T placed at location l.
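The per-region rule of equation (3) can be sketched with boolean masks as follows; the function names and mask representation are illustrative, not the authors' implementation.

```python
import numpy as np

def region_distance(template, region):
    """Per-region term of equation (3): template and region are boolean
    masks on the same grid (the template already placed at location l)."""
    inter = np.logical_and(template, region).sum()
    if inter < region.sum() / 2:
        return inter                 # mostly outside: penalize the overlap
    return region.sum() - inter      # mostly inside: penalize the uncovered part

def shape_distance(template, regions):
    """d(T, V; l): sum of per-region distances over the segmentation."""
    return sum(region_distance(template, r) for r in regions)
```

Note that a region fully enclosed by the template and a region disjoint from it both contribute zero, matching the V5 and V11 cases above.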
[3] D. Comaniciu and P. Meer, Mean shift: a robust approach toward feature space analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 5, 2002.
[4] E. Shechtman and M. Irani, Space-time behavior based correlation, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2005.
[5] H. Jiang and D. R. Martin, Finding Actions Using Shape Flows, European Conference on Computer Vision, 2008.
[6] C. Schuldt, I. Laptev, and B. Caputo, Recognizing human actions: a local SVM approach, in Proceedings of the IEEE International Conference on Pattern Recognition, 2004.
To evaluate the proposed action detection method, we used the KTH dataset [6] together with cluttered videos captured by a hand-held camera in environments with moving people or cars in the background [1][2]. The videos were downscaled to 160x120 in resolution. The templates are typically 60x80x30 in size.
Fig. 3 shows the result of action detection in cluttered video, where (a) shows the two-hand-waving action and (b) the jumping-jacks action.
(a) The action of two-hand waving
(b) The action of jumping jacks
Figure 3. The result of the experiment, showing the image template overlapped on the segmented input.
5. Conclusions
We proposed an action detection method using mean shift segmentation and shape matching. When an object similar to a human appears in the video, the detector can produce false detections; to overcome this limitation, we proposed to use not only the spatial shape but also the temporal shape in the segmented video.
New Distance Measure for Capturing Global Geometry of Data
Kye-Hyeon Kim, Seungjin Choi
Department of Computer Science, POSTECH, Korea
{fenrir, [email protected]
We present a method for determining k nearest
neighbors, which accurately recognizes the underlying
clusters in a data set. We formulate the neighborhood
construction as an optimization problem, leading to a
novel distance measure called minimax distance. We
further develop a message passing algorithm for
efficient computation of the minimax distance. For
several real data sets, our method outperformed the
k-nearest neighbor method.
Figure 1. The difference between the k-NN and our method.
(Left) The green star denotes a test point, and the solid circle
represents the standard k-nearest neighborhood of that point.
In this case, many incorrect points (blue squares) are in the
neighborhood, since the intrinsic partition (dotted ellipse) is
not considered. (Right) In contrast, our method exactly
reconstructs the desirable neighborhood by joining a number
of small circles with diameter ε.
Finding similar elements to an item plays a key role in most pattern recognition tasks, such as classification, clustering and information retrieval, since a group of similar items generally forms a meaningful pattern. The k-nearest neighbor method (k-NN) was proposed from this perspective and has been widely used. Given a test data point x*, the method finds the k nearest points to x* in a given set of training data points using a distance metric. The success of k-NN depends on how many neighbors are "actually similar" to the test data point.
Various methods for global or local metric learning have been developed to discover a proper distance measure for a given data set. However, they have failed to achieve success in practical applications because (1) a real data set generally consists of several arbitrarily shaped partitions, but those methods cannot reflect such nonlinearity and discontinuity; and (2) those methods take too much time compared to k-NN.
We present a new way for determining neighborhood
which adequately captures and reflects the intrinsic
partitions of a data set. To this end, first we represent a
neighborhood as a connected set of small local balls
while the standard k-nearest neighborhood is determined
by a single large ball, leading to the minimax distance
measure. We then derive a message passing algorithm
which scales with O(N) in time, where N is the number
of training data points. For details, see the full paper [1].
The main goal of our method is to find appropriate intrinsic clusters as neighborhoods, for successful nearest neighbor retrieval. The intuitive idea is shown in Fig. 1. The desirable neighborhood in Fig. 1 is constructed by joining several balls with diameter ε, following the rule:

A ball whose center corresponds to a data point must contact or overlap one or more other balls.

Let us start from a ball initially attached to the test point, and attach balls by this rule until no further ball can be attached. The set of data points covered by balls can then be seen as a cluster whose size and shape are determined by ε; by choosing a proper value of ε, we can obtain the correct result.
Let V be a set of data points and d(u, v) denote the Euclidean distance between two points u, v ∈ V. Then the above neighborhood construction rule can be rewritten as follows:
Definition 1 (ε-bounded pair). For ε ≥ 0, a pair u, v ∈ V is ε-bounded if d(u, v) ≤ ε.
Definition 2 (ε-bounded sequence). A sequence (s1, s2, …, sM) is ε-bounded if all adjoining pairs st, st+1 are ε-bounded.
Definition 3 (ε-neighbor). A pair u, v ∈ V are ε-neighbors of each other, denoted by u~v, if there exists an ε-bounded sequence (u, …, v).
Theorem 1 (ε-neighborhood construction rule). For a pair u, v ∈ V, if there exists w ∈ V such that w~v and d(u, w) ≤ ε, then u~v.
The theorem expresses the transitive closure of the neighborhood, that is, "my neighbor's neighbor is also my neighbor."
Now we formulate the problem of neighborhood construction as a minimax optimization. For a sequence S, denote by d(S) the distance between the farthest adjoining pair in S, i.e.,
d(S) = max_{(a,b) ∈ S} d(a, b);

then we can easily determine whether S is ε-bounded: if d(S) ≤ ε, there is no adjoining pair (a, b) in S such that d(a, b) > ε, so S is ε-bounded.
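Definitions 1-3 and Theorem 1 amount to growing the neighborhood by breadth-first search over ε-bounded pairs. A minimal sketch, with illustrative names:

```python
import math
from collections import deque

def epsilon_neighbors(points, test_point, eps):
    """Grow the neighborhood of a test point by repeatedly attaching
    any point within eps of an already attached point, i.e. by finding
    all points reachable through an eps-bounded sequence."""
    def d(u, v):
        return math.dist(u, v)

    neighborhood = set()
    attached = {test_point}
    queue = deque([test_point])
    while queue:
        u = queue.popleft()
        for v in points:
            # Theorem 1: v joins if it is eps-bounded with any member.
            if v not in attached and d(u, v) <= eps:
                attached.add(v)
                neighborhood.add(v)
                queue.append(v)
    return neighborhood
```

For ε smaller than the gap between clusters, the search cannot cross the gap, which is exactly how the intrinsic partition of Fig. 1 is respected.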
One can enumerate arbitrarily many sequences between a pair u, v. Among those sequences, if there is at least one sequence S = (u, …, v) such that d(S) ≤ ε, i.e.,

min_{S=(u,…,v)} d(S) ≤ ε,

then there is at least one ε-bounded sequence, so u and v are neighbors of each other.

Figure 2. Image retrieval precision on (left) Yale-B faces and (right) USPS handwritten digits. "MM N" denotes our algorithm, where N is the number of training points. The vertical lines indicate the standard deviation.
Now we define the minimax distance between u and v, denoted by dmm(u, v):

dmm(u, v) = min_{S=(u,…,v)} max_{(a,b) ∈ S} d(a, b).

By reformulating this formula recursively, we develop a message passing algorithm. Let E be the edge set of the MST and Ii the index set {j | (i, j) ∈ E}. Then all minimax distances dmm(x1, x*), …, dmm(xN, x*) can be computed on the MST in O(N) time (Algorithm 3).
For dmm(u, v), the corresponding ε-bounded sequence(s) are called minimax paths between u and v. A tree T is called a minimax spanning tree if all paths between pairs u, v in T are minimax paths.
dmm(u, v) means that there are one or more sequences between u and v whose bound d(S) equals dmm(u, v). Thus, if dmm(u, v) is small, u and v are in a dense region, i.e., in the same cluster; if dmm(u, v) is large, u and v tend to be in different clusters. Consequently, our neighbor selection method can capture the intrinsic clusters by using dmm as a distance measure.
Algorithm 3 (Minimax message passing).
1. For all edges (i, j) (note: N − 1 edges for a tree),
   m_{i→j} = max{ d(xi, xj), min[ d(xi, x*), min_k m_{k→i} ] }, where k ∈ Ii − {j}.
2. For all xi ∈ X,
   dmm(xi, x*) = min[ d(xi, x*), min_k m_{k→i} ], where k ∈ Ii.
For more details, see our full paper [1].
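Algorithm 3 itself runs in O(N); the sketch below instead illustrates the same quantities in a simpler O(N²) way: it builds an MST with Prim's algorithm, takes the maximum edge on each tree path as dmm(xi, xj), and applies the recurrence dmm(xi, x*) = min_j max[dmm(xi, xj), d(xj, x*)]. All names are illustrative, not the authors' implementation.

```python
import math

def minimax_distances(X, x_star):
    """Minimax distances from a test point x_star to all points in X."""
    n = len(X)
    d = lambda u, v: math.dist(u, v)

    # Prim's algorithm: parent pointers form the MST over X.
    in_tree = [False] * n
    parent = [-1] * n
    best = [math.inf] * n
    best[0] = 0.0
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        for v in range(n):
            if not in_tree[v] and d(X[u], X[v]) < best[v]:
                best[v] = d(X[u], X[v])
                parent[v] = u

    # Tree adjacency with edge weights.
    adj = [[] for _ in range(n)]
    for v in range(1, n):
        w = d(X[v], X[parent[v]])
        adj[v].append((parent[v], w))
        adj[parent[v]].append((v, w))

    # dmm between training points = max edge on the unique MST path.
    dmm = [[0.0] * n for _ in range(n)]
    for s in range(n):
        stack, seen = [(s, 0.0)], {s}
        while stack:
            u, m = stack.pop()
            dmm[s][u] = m
            for v, w in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append((v, max(m, w)))

    # Recurrence from the paper for the test point.
    return [min(max(dmm[i][j], d(X[j], x_star)) for j in range(n))
            for i in range(n)]
```

Points in the same dense cluster as the query get small minimax distances, while points across a gap inherit the large bridging edge, which is what makes dmm cluster-aware.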
We applied our method to image retrieval tasks with two real-world datasets, USPS and Yale-B. We evaluated performance in terms of precision, i.e., [the number of correctly retrieved points for a query image] / k, where k is the number of retrieved points. Fig. 2 shows that our minimax algorithm outperformed the standard k-NN method.
Algorithm 1 (ε-neighbor selection). For a test point x*, select xi if dmm(xi, x*) ≤ ε.
Algorithm 2 (k-nearest neighbor selection). For a test point x*, select the xi with the k smallest dmm(xi, x*).
Let X = {x1, …, xN} be a set of training points and x* a test point. We now derive an algorithm for computing dmm(xi, x*) for all xi ∈ X efficiently. One problem is that there are enormously many possible sequences between xi and x*. However, a minimum spanning tree (MST) is also a minimax spanning tree [2]; that is, every path between two nodes xi and xj in an MST, denoted by M(i, j), is also a minimax path. Hence we can obtain the minimax distance dmm(xi, xj) for any pair xi, xj ∈ X by simply computing d(M(i, j)), i.e.,

dmm(xi, xj) = max_{(a,b) ∈ M(i,j)} d(a, b).
We have presented a novel neighbor selection method
which can exploit global geometry of data by choosing
neighbors according to the minimax distance. An
efficient message passing algorithm was also presented,
which computes minimax distances between a query
point and N data points in O(N) time.
[1] K.-H. Kim and S. Choi, Neighbor search with global geometry: a minimax message passing algorithm, in Proceedings of the International Conference on Machine Learning (ICML), Corvallis, OR, 2007.
[2] J. D. Carroll, Minimax length links of a dissimilarity matrix and minimum spanning trees, Psychometrika, vol. 60, pp. 371-374, 1995.
Now we derive a formula for dmm(xi, x*). There are N types of paths between xi and x*: (xi, …, x1, x*), (xi, …, x2, x*), …, (xi, …, xN, x*); every path (xi, …, x*) must belong to one of them. If we have the MST of the training set X = {x1, …, xN}, there exist only N paths (M(i,1), x*), (M(i,2), x*), …, (M(i,N), x*), hence

dmm(xi, x*) = min_{j=1…N} { max[ dmm(xi, xj), d(xj, x*) ] }.
Neuro-Evolutionary Behavior Selection Network for Enhancing
Performance of Cleaning Robots
Sanghoon Baek, Sanghyuk Park, Se-Young Oh
Department of Electrical and Electronic Engineering, POSTECH, Korea
{tkeb100, psh9924, [email protected]
We propose a new approach to the coverage problem using neuro-evolutionary techniques. There are several complete-coverage algorithms for cleaning robots, but all of them require solving the localization problem, which is hard to achieve with common cleaning robot hardware. Here, we propose a no-localization, no-complete-coverage approach so that the algorithm can run on common cleaning robot hardware that is cheap enough to be sold on the market. The proposed approach is similar to the evolutionary robotics approach, but in ours only the behavior selection network is evolved, which selects among manually designed rule-based behavior modules. Input vectors accounting for odometry errors and a fitness function that represents 'good' cleaning performance are chosen carefully. To evaluate the behavior selection network, two other methods were chosen and tested at various places in simulation and in real robot experiments. The proposed approach shows better performance in large and partitioned places.
There has been research on cleaning robots using various methods, and the most important issue in this research is the coverage problem, which differentiates the abilities of cleaning robots. Cleaning robots are gaining commercial attention today, with their markets expanding and, at the same time, very intensive competition among manufacturers. To survive this competition, we have to get the best result at the least cost. Making robots cheaper forces the use of low-cost sensors and controllers, which restricts the robot's ability. That is why random navigation, which does not need many high-performance sensors, is widely used by most sweeping robots; it can even be realized with only one contact sensor on the front.

Following Howie Choset, who organized the coverage problem according to its coverage methods, we can categorize coverage approaches into simple methods and complex methods [1]. Simple methods consist of heuristic and randomized approaches. They have the advantage of low cost in computational power and sensors on a cleaning robot, but they cannot guarantee complete coverage. Complex methods form the other category and fall into three groups according to the cell decomposition method: approximate cellular, semi-approximate cellular, and exact cellular decomposition. The Backtracking Spiral Algorithm (BSA), one of the approximate cellular decomposition methods, for example, decomposes the environment into square cells of the robot's size [2]; if there is an obstacle in a specific area, that area is represented by an obstacle cell, and the robot moves along the free cells. In semi-approximate cellular decomposition, cells are defined as rectangular vertical slices of the environment [3]. In exact cellular decomposition, cells are rectangles of different sizes fitted to the local environment [4]. Most complex methods suffer from the localization problem, which requires high computational power and expensive sensors such as a laser rangefinder or a vision system. So we can say that complex methods are not yet appropriate.

In this paper, we propose a new method for enhancing coverage performance that is simple and applicable to a cleaning robot with limited sensing capability.
1) Overall system
Evolutionary robotics approaches usually evolve a controller that directly connects the sensor system to the motors [5]. In the cleaning problem, finding proper cleaning motions needs a more intelligent approach than a simple controller that adjusts motor velocities. The proposed algorithm therefore has a hierarchical structure, as depicted in Fig. 1.

Fig. 1. Overall block diagram of the behavior selection network

2) Behavior module
Generally, there are four motions for a cleaning robot, as depicted in Fig. 2. These can be divided into two functional groups: escaping & exploring, and filling.
Fig. 2. Cleaning motions: (a) random motion, (b) wall following motion, (c) back and forth motion, (d) spiral motion.
Fig. 4. Real-robot experimental results: (a), (b) test environments used to verify the proposed algorithm; (c) coverage rate; (d) overlap ratio.
Escaping & exploring, which consists of the random and wall following motions, is used to move to another region that is not yet covered or to escape a region that is already covered. Filling, which consists of the back-and-forth and spiral motions, is used to cover an empty region.
3) Evolution Process
The neural network for selecting behaviors consists of 20 input nodes, 2 output nodes, and one hidden layer. To find the best weight set of the neural network in our behavior selection network, we use an evolutionary programming technique, as shown in Fig. 3.

Fig. 3. Flow chart of the evolution process

The robot platform has 7 infrared sensors spanning -90° to +90° with a maximum range of 30 cm, and a bumper sensor. To verify the performance of the proposed algorithm, we carried out experiments comparing it with Roomba and with the random motion of our robot. Figure 4 shows the experimental results.

To achieve a complex task like cleaning, we have to choose a conceptually larger output, a behavior, rather than left and right motor speed values. The proposed neuro-evolutionary behavior selection network is a suitable algorithm for a cleaning robot with low-cost sensors and limited computational power.

This work was supported by Korea Ministry of Knowledge Economy under the Intelligent Robot program.

[1] H. Choset, "Coverage for robotics - A survey of recent results," in Annals of Mathematics and Artificial Intelligence, 2001, pp.113-126.
[2] E. González, O. Álvarez, Y. Díaz, C. Parra, and C. Bustacara, "BSA: A Complete Coverage Algorithm," in Proc. IEEE Int. Conf. on Robotics and Automation, Barcelona, Spain, April 2005.
[3] V. Lumelsky, S. Mukhopadhyay, and K. Sun, "Dynamic path planning in sensor-based terrain acquisition," IEEE Trans. Robotics and Automation, 6(4):462-472, 1990.
[4] H. Choset and P. Pignon, "Coverage path planning: The boustrophedon cellular decomposition," in Proceedings of the International Conference on Field and Service Robotics, Canberra, Australia, December 1997.
[5] A. L. Nelson and E. Grant, "Using direct competition for competent controllers in evolutionary robotics," Robotics and Autonomous Systems, 54:840-857, 2006.
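A minimal sketch of the evolutionary-programming loop described above. Only the structure (20 inputs, 2 outputs, one hidden layer, mutate-and-select over weight vectors) comes from the paper; the hidden size, population size, and the toy fitness function standing in for the cleaning-performance fitness are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

N_IN, N_HID, N_OUT, POP = 20, 8, 2, 12   # hidden size and population assumed
DIM = N_IN * N_HID + N_HID * N_OUT

# Fixed evaluation episodes; the real "good cleaning" fitness is replaced
# by a toy target: pick behavior 1 iff total sensor activity is positive.
EVAL_S = rng.normal(size=(40, N_IN))
EVAL_T = (EVAL_S.sum(axis=1) > 0).astype(int)

def select_behavior(w, sensors):
    """Flat weight vector -> index of the rule-based behavior module."""
    a = w[:N_IN * N_HID].reshape(N_IN, N_HID)
    b = w[N_IN * N_HID:].reshape(N_HID, N_OUT)
    return int(np.argmax(np.tanh(sensors @ a) @ b))

def fitness(w):
    hits = [select_behavior(w, s) == t for s, t in zip(EVAL_S, EVAL_T)]
    return float(np.mean(hits))

def evolve(generations=40, sigma=0.3):
    """Evolutionary programming: keep the elite, add mutated copies."""
    pop = [rng.normal(scale=0.5, size=DIM) for _ in range(POP)]
    history = []
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        history.append(fitness(pop[0]))
        elite = pop[:POP // 2]
        pop = elite + [w + rng.normal(scale=sigma, size=DIM) for w in elite]
    return max(pop, key=fitness), history
```

Because the elite is carried over unchanged, the best fitness per generation never decreases.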
Gaussian Processes for Source Separation:
Pseudo-likelihood Maximization
Sunho Park, Seungjin Choi
Department of Computer Science, POSTECH, Korea
{titan, [email protected]
In this paper we present a probabilistic method for source separation in the case where each source has an unknown temporal structure. We tackle the problem of source separation by maximum pseudo-likelihood estimation, representing the latent function which characterizes the temporal structure of each source by a random process with a Gaussian prior. The resulting pseudo-likelihood of the data is Gaussian, determined by the mixing matrix as well as by the predictive mean and covariance matrix that can be easily computed by Gaussian process (GP) regression. Gradient-based optimization is applied to maximize the log-pseudo-likelihood of the data. Numerical experiments confirm the useful behavior of our method, compared to existing source separation methods.

Source separation is a fundamental problem with wide applications in machine learning, pattern recognition, and signal processing. In the simplest form of source separation, the observation data x_t = [x_{1,t}, …, x_{n,t}]^T (x_{i,t} represents the i-th element of x_t ∈ R^n) are assumed to be generated by

x_t = A s_t,   (1)

where A ∈ R^{n×n} is the nonsingular mixing matrix and s_t ∈ R^n is the source vector whose elements are assumed to be statistically independent. The task of source separation is to restore the unknown independent sources s_t, up to scaling and permutation ambiguities, without knowledge of the invertible mixing matrix A, given an ensemble of data {x_t}_{t=1}^N. In other words, source separation aims to estimate a demixing matrix W = A^{-1} such that W A = P Λ, where P is a permutation matrix and Λ is an arbitrary invertible diagonal matrix.

The Gaussian process (GP) model has been widely used in machine learning because of its flexible nonparametric nature and computational simplicity [1]. In this paper we use a GP model to characterize the temporal structure of a source, representing the postulated relationship by a distribution over latent functions. The latent function, which relates the current sample of a source to its past samples, is represented by a random process with a Gaussian prior. Integrating out the latent functions is tractable, leading to a Gaussian pseudo-likelihood that is determined by the mixing matrix as well as by the predictive mean and covariance matrix, both easily computed by GP regression. The demixing matrix is estimated by maximizing the log-pseudo-likelihood of the data. The flexible nonparametric nature of the model allows the sources to be nonlinear time series.

Incorporating the temporal structure of each individual source, we model s_{i,t} by

s_{i,t} = f_i(s_{i,t-1}) + ε_{i,t},   (2)

where s_{i,t-1} = [s_{i,t-1}, s_{i,t-2}, …, s_{i,t-p}]^T ∈ R^p is a collection of the past p samples and ε_{i,t} ~ N(0,1) is white Gaussian noise with zero mean and unit variance. The function f_i(·), which relates the current sample s_{i,t} to the past samples s_{i,t-1}, is referred to as the latent function. In the case of a linear AR model, the latent function is written as

f_i(s_{i,t-1}) = Σ_{τ=1}^{p} h_{i,τ} s_{i,t-τ},   (4)

where {h_{i,τ}} are the AR coefficients.

The GP model represents the latent function f_i(·) by a random process with a Gaussian prior, instead of the parametric form (4). We place a GP prior over the function f_i(·), i.e.,

f_i ~ GP( 0, k(s_{i,t}, s_{i,τ}) ),   (5)

where k(s_{i,t}, s_{i,τ}) is a covariance function. We use the squared exponential covariance function

k(s_{i,t}, s_{i,τ}) = exp{ -λ_i || s_{i,t} - s_{i,τ} ||^2 },   (6)

where λ_i is a length-scale hyperparameter.

Taking the source generative model (2) with the GP prior (5) into account, the mixing model (1) can be written as

x_t = A f_{t-1} + A ε_t,   (7)

where f_{t-1} = [f_{1,t-1}, …, f_{n,t-1}]^T ∈ R^n (with f_{i,t-1} = f_i(s_{i,t-1})) and ε_t ∈ R^n follows an independent Gaussian distribution, p(ε_t) = N(0, I). The induced model (7) is one of the key ingredients of our method and will be used in the next section.

In maximum likelihood source separation, sources are treated as latent variables that are marginalized out. In our induced model (7) with the GP prior (5), we consider the pseudo-likelihood, which approximates the likelihood by a product of conditional distributions of x_t given the remaining data:

pseudo-likelihood = Π_{t=1}^{N} p(x_t | X^{(\t)}),

where X^{(\t)} = {x_1, …, x_{t-1}, x_{t+1}, …, x_N}. We estimate the demixing matrix W = A^{-1} by pseudo-likelihood maximization, integrating out the latent vector f_{t-1}. To this end, we consider a single factor of the pseudo-likelihood of the data,

p(x_t | X^{(\t)}) = ∫ p(x_t | f_{t-1}, X^{(\t)}) p(f_{t-1} | X^{(\t)}) df_{t-1},   (8)

p(x_t | f_{t-1}, X^{(\t)}) = N( A f_{t-1}, A A^T ),   (9)

p(f_{t-1} | X^{(\t)}) = N( μ_t, Σ_t ).   (10)

The predictive mean vector μ_t ∈ R^n and the diagonal covariance matrix Σ_t ∈ R^{n×n} are calculated by standard GP regression; one can refer to [2] for the detailed derivations. It follows from (8)-(10) that a single factor of the pseudo-likelihood follows a Gaussian distribution,

p(x_t | X^{(\t)}) = N( A μ_t, Γ_t ),  with  Γ_t = A (Σ_t + I) A^T.   (11)

Thus we write the log-pseudo-likelihood of the data as

L = Σ_{t=1}^{N} log p(x_t | X^{(\t)})
  = -(1/2) Σ_{t=1}^{N} { n log 2π + log |Γ_t| + (x_t - A μ_t)^T Γ_t^{-1} (x_t - A μ_t) },   (12)

where the quadratic term can equivalently be written as β_t^T (Σ_t + I)^{-1} β_t with β_t = W x_t - μ_t (see [2] for the detailed derivation). We estimate the demixing matrix W by maximizing the log-pseudo-likelihood (12). A gradient-based optimization is applied to find a maximizer of (12); fminunc was used in the Matlab implementation (see [2]).

We use nonlinear time series (Mackey-Glass MG30, the Santa Fe competition laser data, and the first variable of the chaotic Ikeda map) as sources. Our method shows the best performance, compared to SOBI [3] and AR-BSS [4] (see Fig. 1), although the performance difference is not large. SOBI does not assume any linear temporal model, since it exploits only the time-delayed covariance structure. Linear AR modeling works reasonably well even when the actual sources are nonlinear time series; however, its performance is degraded compared to our nonparametric source modeling.

Fig. 1. Performance index (PI) of each algorithm

We have presented a source separation method where each source is modeled by a GP and the demixing matrix is learned by maximizing the log-pseudo-likelihood of the data. Compared to source separation methods where a parametric model (e.g., an AR model) is used to capture the temporal structure of the sources, our method is more flexible in the sense that the sources are allowed to be nonlinear time series. We compared our method to two representative methods (SOBI and AR-BSS) which also exploit the temporal structure of the sources.

This work was supported by Korea MIC under the ITRC support program supervised by IITA (IITA-2007-C1090).

[1] C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning, MIT Press, 2006.
[2] S. Park and S. Choi, Gaussian processes for source separation, in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Las Vegas, USA, 2008.
[3] A. Belouchrani, K. Abed-Meraim, J. F. Cardoso, and E. Moulines, A blind source separation technique using second order statistics, IEEE Trans. Signal Processing, vol.45, pp.434-444, Feb. 1997.
[4] Y. M. Cheung, Dual auto-regressive modelling approach to Gaussian process identification, in Proceedings of the IEEE International Conference on Multimedia and Expo, 2001, pp.1256-1259.
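As a small illustration of the GP machinery used above, the following sketch runs GP regression with the squared exponential covariance (6) on lag vectors s_{i,t-1}, which is the computation from which the predictive mean μ_t in (10) is assembled. The hyperparameter values (λ = 1, noise 10^-2), lag p = 3, and the sinusoidal test series are illustrative only, not the paper's settings.

```python
import numpy as np

def se_kernel(X1, X2, lam=1.0):
    """Squared exponential covariance k(s, s') = exp(-lam * ||s - s'||^2)."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-lam * d2)

def gp_fit_predict(s, p=3, lam=1.0, noise=1e-2):
    """GP regression of s_t on the lag vector [s_{t-1}, ..., s_{t-p}].
    Returns the predictive mean at the training inputs and the targets."""
    S = np.stack([s[t - p:t][::-1] for t in range(p, len(s))])  # lag vectors
    y = s[p:]
    K = se_kernel(S, S, lam)
    alpha = np.linalg.solve(K + noise * np.eye(len(y)), y)
    mu = K @ alpha          # predictive mean mu_t at the training inputs
    return mu, y
```

For a smooth series the one-step-ahead predictive mean tracks the targets closely, which is the temporal structure the pseudo-likelihood exploits.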
Motion Recognition with 3-D Acceleration Signal
Sangki Kim, Gunhyuk Park, Seungjin Choi, Seungmoon Choi
Department of Computer Science, POSTECH, Korea
{ihistory, maharaga, seungjin, [email protected]
In this paper we present a motion recognition system for a hand-held controller. 3-D acceleration signals are generated by a 3-axis accelerometer in the controller. We extract motion segments from the continuous acceleration signals and score each against per-motion models trained in a training phase; a hidden Markov model is used to model each motion. We applied the system to three motion sets with performance good enough for a practical recognition system.
New kinds of interaction technology that understand a user's motion have emerged with the rapid development of sensor technology. An accelerometer measures the acceleration of a controller in motion, and analysis of the acceleration signals enables motion recognition. Several researchers have studied motion recognition with acceleration signals [1, 2]. In [1], each motion is modeled with a state machine whose transitions are driven by heuristic thresholds; this can recognize simple motions but cannot handle complex ones, and designing a state machine for each motion is not easy. In [2], 3-D trajectories of the controller are estimated with a 3-axis accelerometer and gyroscopes. Trajectory estimation is an effective approach to motion recognition, but the size and cost of gyroscopes limit its practical use.

In this paper we propose an HMM-based motion recognition system using 3-D acceleration signals. Figure 1 shows the overall structure of our motion recognition system.

Figure 1. Overall system structure

Acceleration of gravity removal
An offset filter is used to remove the acceleration of gravity. We update the offset, that is, the acceleration of gravity, whenever the acceleration signal changes only slightly for a certain time interval, under the assumption that a paused controller produces only small acceleration changes.

Figure 2. Acceleration of gravity removal (before and after)

Motion block extraction
The signals within a motion block have large acceleration values and variances, so we take contiguous regions with large values and variances to be motion blocks.

Figure 3. Motion block extraction
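A minimal sketch of the motion block extraction rule just described: threshold the windowed variance of the gravity-removed acceleration magnitude and keep sufficiently long contiguous active regions. The window size, threshold, and minimum length are illustrative values, not the paper's.

```python
import numpy as np

def extract_motion_blocks(acc, win=8, thresh=0.5, min_len=16):
    """Contiguous regions where the windowed variance of the acceleration
    magnitude is large are taken as motion blocks.
    acc: (T, 3) gravity-removed samples. Returns (start, end) index pairs."""
    mag = np.linalg.norm(acc, axis=1)
    pad = win // 2
    # sliding-window variance as an activity measure
    act = np.array([mag[max(0, t - pad):t + pad + 1].var()
                    for t in range(len(mag))])
    active = act > thresh
    blocks, start = [], None
    for t, a in enumerate(active):
        if a and start is None:
            start = t
        elif not a and start is not None:
            if t - start >= min_len:       # discard too-short bursts
                blocks.append((start, t))
            start = None
    if start is not None and len(active) - start >= min_len:
        blocks.append((start, len(active)))
    return blocks
```

Each extracted block would then be scored against the per-motion HMMs.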
Motion model
A continuous hidden Markov model is used to model each motion, with 15 hidden states and full-covariance Gaussian observation distributions. We assign an extracted motion block to the motion model with the largest likelihood.
Before comparing the likelihoods of the models, we add a weight constant to each likelihood. Assigning a weight constant to each model allows model tuning and helps reduce the recognition error rate, especially for similar motion pairs.

Acceleration of gravity
It is natural to regard the acceleration of gravity as noise, but this is not always appropriate. Consider two kinds of sword swing motions, an up-down swing and a left-right swing. Because accelerometers cannot detect rotation of the controller, these two swings produce similar acceleration signals once the acceleration of gravity is removed. In such cases we use the signals without gravity removal; without the acceleration of gravity, we cannot distinguish the 7 sword swing motions in Ghost House.

Early decision
For motions that take a long time, it is inconvenient to wait for motion block extraction to finish before recognition. If we already have enough acceleration signal, we can recognize the motion before it is finished; this is useful for the sword swing motions in Ghost House. Figure 4 shows the log-likelihood curves for the 7 sword swing motions; the blue curve comes from the correct model.

Figure 4. Time vs. log-likelihood for the 7 sword swing motions

We applied the proposed motion recognition system to 3 motion sets: 3-D Tetris, numbers, and Ghost House (a shooting game). 3-D Tetris consists of 12 motions, including 6 translations and 6 rotations. The numbers set consists of the 10 digits from 0 to 9. Ghost House is a game application built to demonstrate the motion recognition system; it consists of 9 motions, including 7 sword swings, gun reload, weapon change, and a special command.

Figure 5. 3-D Tetris motion set

Figure 6. Ghost House

Each motion model was trained on 30 training motions recorded by one person. For testing, we collected 40 test motions per motion from 4 people. Figure 7 shows the experimental results for the 3 motion sets, including training and test data.

Figure 7. Recognition rates (3-D Tetris, numbers, Ghost House)

We developed a motion recognition system based on 3-D acceleration signals and applied it to 3 motion sets with acceptable recognition rates.

[1] J. Baek, I.-J. Jang, H. D. Kim, J.-S. Cho, and B.-J. Yun, in Proc. KES 2006.
[2] S.-J. Cho et al., MagicWand: A hand-drawn gesture input device in 3-D space with inertial sensors, in Proc. IWFHR 2004.
[3] A. Wilson and S. Shafer, XWand: UI for intelligent spaces, in Proc. CHI 2003.
Pedestrian Detection using Omega Shape
Sunki Kim, Daijin Kim
Department of Computer and
Communication Engineering, POSTECH, Korea
{kimsk03, [email protected]
In this paper we present a vision-based pedestrian detection and counting system. In surveillance applications, human detection and counting are very important problems. We find pedestrians in indoor and outdoor scenes using the human head shape. We begin with shape-based searching to detect human heads of various sizes and orientations, under the assumption that head and shoulder contours form an omega shape. The final step is head region verification. We show that a high detection rate and high processing speed can be achieved.
1. Introduction
Human detection and counting for surveillance and security is a very active research area in the computer vision community. Applications range from pedestrian traffic management and detection of overcrowding in public buildings to tourist flow estimation.

Detecting humans is difficult because people wear various clothes and take various poses. The omega shape formed by the head and shoulders, however, is largely independent of clothing, hairstyle, and orientation, so it is robust to head rotation, illumination changes, and variable head sizes.

The details of head candidate region detection and head candidate region verification are described in Sections 2 and 3, respectively. We discuss the experimental results of the proposed method in Section 4. Finally, we draw conclusions in Section 5.

2. Head Candidate Detection
We assume that object detection is done in advance, so that bounding-box regions are available. We extract foreground pixels using background information and build a binary image in which black represents background and white represents foreground. We then perform head candidate region detection followed by candidate region verification. The proposed head shape detection system is shown in Fig. 1.

Figure 1. The proposed head shape detection system

2.1 Edge Detection and Angle Calculation
First, we extract edges from the input binary images, since we use only boundary information; we use the Canny edge detection algorithm. After edge detection, we calculate the angle of the normal vector along the edge outlines, obtaining an edge image and a corresponding angle image. Fig. 2 shows the normal vector directions of the mean shape.

Figure 2. Normal vector direction of the mean shape

Fig. 3 shows the templates for the Prewitt operator, from which we obtain the angle of the normal vector of the edge outlines as

θ(x, y) = tan^-1( M_y(x, y) / M_x(x, y) ).

Figure 3. Templates for the Prewitt operator

2.2 Candidate Region Detection
In the training step, we build a mean head-shoulder shape. The mean shape has 23 points (x_1, y_1, …, x_23, y_23), each with x and y coordinates, so the mean shape is a 46-dimensional vector; it is the average omega shape. Since there are differences between an edge shape and the mean shape, we additionally consider 5 pixels along the normal vector direction at each point of the mean shape to minimize the difference.

The weights (w_1, …, w_23) are used to assign more importance to the head points and to reduce the influence of the shoulder points of the model. O is the edge image map and R is the angle map; (x_s, y_s) is a point of the shape in the input image, and the local angle a_i of a model point is defined as the direction at the vertex. Fig. 4 shows head candidate region detection. Finally we obtain x_i, which represents the candidate vector.

Figure 4. Human head candidate region

We obtain the summation over the candidate vector as

S = (1/23) Σ_{i=1}^{23} w_i · max_{x_i} ( g(x, y) * ( O(x_i, y_i) × cos(a_i − R(x_i, y_i)) ) ).

If S is greater than the candidate threshold, we accept the candidate region.

3. Head Candidate Verification
Having found candidate regions in the previous step, we now determine whether a candidate region is a real human head shape region. A shape X consisting of the vertex points X = {x_1, y_1, x_2, y_2, …, x_23, y_23} can be approximated by X = X̄ + Eb, where X̄ is the mean shape, E is the eigenvector matrix, and b is the shape parameter. The shape parameter is computed as b_i = E^T(X_i − X̄), and we compute its norm ||b_i|| = ||E^T(X_i − X̄)||. If the norm of the shape parameter is greater than the verification threshold, we determine that the candidate region is a real human head shape region.

4. Experimental Results
We use the LG surveillance database. The candidate threshold is 5 and the verification threshold is 7.07. Fig. 5 shows the results of head shape detection for pedestrian detection.

Figure 5. The result of pedestrian detection

5. Conclusions
We proposed a human head shape detection algorithm using the omega shape. Our implementation works well for various camera positions and under sparse occlusion, but we do not consider occlusion in densely crowded scenes. In future work, we will improve the proposed algorithm to deal with densely crowded situations such as subway and railway stations.

This work was supported by the Intelligent Robotics Development Program, one of the 21st Century Frontier R&D Programs funded by the Ministry of Commerce, Industry and Energy (MOCIE). The authors would also like to thank the Ministry of Education of Korea for its financial support of the Division of Mechanical and Industrial Engineering and the Division of Electrical and Computer Engineering at POSTECH through the BK21 program.

[1] J. Garcia and M. Shah, Automatic detection of heads in colored images, in Proceedings of the 2nd Canadian Conference on Computer and Robot Vision, 2005, pp.276-281.
[2] J. Yuk and K. Wong, Real-time multiple head shape detection and tracking system with decentralized trackers, in Proceedings of the Sixth International Conference on Intelligent Systems Design and Applications, 2006, pp.384-389.
[3] O. Sidla and Y. Lypetskyy, Pedestrian detection and tracking for counting applications in crowded situations, in Proceedings of the IEEE International Conference on Advanced Video and Signal Based Surveillance, 2006.
[4] S. Cho and D. Kim, Pose robust human detection and tracking using multiple oriented 2D elliptical filters, Pohang University of Science and Technology, 2007.
[5] H. Yoon and D. Kim, A robust human head detection method for human tracking, in Proceedings of the IEEE International Conference on Intelligent Robots and Systems, 2006.
[6] T. Cootes and C. Taylor, Active shape models: their training and application, Computer Vision and Image Understanding, Vol.61, pp.38-59, January 1995.
[7] G. Hamarneh and R. Abu-Gharbieh, Active Shape Models - Part I: Modeling Shape and Gray Level Variations, in Proceedings of the Swedish Symposium on Image Analysis, 1998.
[8] G. Hamarneh and R. Abu-Gharbieh, Active Shape Models - Part II: Image Search and Classification, in Proceedings of the Swedish Symposium on Image Analysis, 1998.
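The verification step above amounts to projecting a candidate shape onto a PCA shape basis, b = E^T(X − X̄), and thresholding ||b||. Here is a minimal sketch with synthetic 23-point (46-D) shapes standing in for real head-shoulder annotations; the number of modes and all data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "head-shoulder" shapes: 23 landmarks = 46-D vectors, generated
# as a mean shape plus 3 modes of variation (stand-in for real omega shapes).
mean_shape = rng.normal(size=46)
modes = rng.normal(size=(46, 3))
train = mean_shape + rng.normal(size=(200, 3)) @ modes.T

# PCA on the training shapes: E holds the leading eigenvectors
X_bar = train.mean(axis=0)
_, _, Vt = np.linalg.svd(train - X_bar, full_matrices=False)
E = Vt[:3].T                        # 46 x 3 eigenvector matrix

def shape_param_norm(X):
    """||b|| with b = E^T (X - X_bar), the quantity compared against
    the verification threshold in the text."""
    return float(np.linalg.norm(E.T @ (X - X_bar)))
```

Shapes drawn from the learned modes score a much larger ||b|| than unstructured noise, which is the separation the verification threshold exploits.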
Complete Coverage Algorithm for a Robot Cleaner:
Enhanced Spiral Filling Algorithm
Tae-Kyeong Lee, Young-ho Choi, Sanghoon Baek, Se-Young Oh
Department of Electrical and Electronic Engineering, POSTECH, Korea
{devilee, rockboy, tkeb100, [email protected]
This paper presents an online complete coverage path planning algorithm suitable for cleaning robots equipped with low-cost sensors, guaranteeing that the robot completely covers a closed, unstructured planar environment. The proposed Enhanced Spiral Filling (ESF) algorithm belongs to the approximate cellular decomposition approach, which abstracts the environment as a set of grid cells. The algorithm generates a path composed of several simple spiral paths and linkage paths among them. To minimize the number of turns, we suggest aligning the x-axis of the map coordinates to one of the walls of the target space. In addition, we propose an efficient path planner that links simple spiral paths using a constrained inverse distance transform; this planner finds both the nearest non-contiguous uncovered cell and the shortest path to it, simultaneously. Experiments on a real robot and in simulation demonstrate the efficiency and robustness of the proposed algorithm.
we develop a map coordinate initialization scheme based
on the history of sensor readings to improve the
time-to-completion by reducing the number of turns on the
generated path. Second, we propose constrained inverse
distance transform (CIDT) to link simple spiral paths,
which can find both the nearest non-contiguous uncovered
cell and the shortest path to it, simultaneously.
Enhanced Spiral Filling (ESF) algorithm basically covers
simple region using spiral filling path, and uses constrained
inverse distance transform (CIDT) to link these simple
regions. Here, a simple region denotes an area that can be
filled using a single spiral path. To this end, the
environment is incrementally modeled by a coarse-grain
occupancy grid and each cell is set to one of five types:
unexplored, covered, partially covered, occupied and
candidate. The overall process of ESF is shown in Fig. 1.
1) Map Coordinate Initialization
ESF algorithm is focused on the indoor applications. In
general, indoor environments (e.g. home, office, etc.) have
a rectangular shape consisting of four walls. In these cases,
we can drastically reduce the number of turns by aligning
the x-axis of map coordinates to the one of the walls of a
target environment. In our algorithm, we assume that the
Among several research topics related to the cleaning
robot, an efficient path planning algorithm is described in
this paper. The original path planning algorithms focused
on the problem of finding a path between two points
minimizing the path length, journey time and energy
consumption. However, cleaning robot requires other kinds
of path planning algorithms which are capable of finding
paths which ensure the complete coverage of an
environment. This is called a coverage path planning [1]
whose goal is to find paths maximizing the coverage while
minimizing the time-to-completion. Some approaches in
coverage path planning rely on the heuristic and
randomized components [2,3]. They cannot guarantee
complete coverage. Other approaches are complete
coverage path planning algorithms, which generate paths
that completely cover the free space. They can achieve
lower time-to-completion than heuristic approaches. In this
kind of approach, some algorithms, such as Backtracking
Spiral Algorithm [4], Spiral-STC [5], are already proposed.
Backtracking spiral algorithm (BSA) is based on the
execution of spiral filling paths and assures the
completeness using backtracking mechanism to link simple
spiral paths. This paper has two main contributions. First,
Initialize map to
Wall following &
map coordinate
Does neighboring
candidate cell
inverse distance
Does candidate
cell exist?
Spiral filling rule
Move to the
target cell
Is the target cell
fully accessible?
Contour following
Fig. 1 Overall process of ESF
- 41 -
Proc. of the 8th POSTECH-KYUTECH Joint Workshop on Neuroinformatics, Kitakyushu, Japan, 2008
Using these rules, we can efficiently constrain the
searching boundary of IDT. Therefore, we call the
overall process of searching the
nearest candidate cell Constrained Inverse Distance
Transform (CIDT).
robot starts nearby the wall and ESF starts from following
this wall. Map coordinate initialization procedure is
activated in the initial stage of wall following. To follow
the wall with the desired distance, first of all, the robot
should access to the wall until satisfying the desired
distance (accessing phase) and then align its orientation to
the wall (aligning phase). When the robot’s orientation is
stably aligned to the wall, we then use this robot pose as
the reference frame of map coordinates
2) Wall Following and Contour Following
We adopt wall following and contour following
procedure to cover the partially occupied cells nearby the
walls and the obstacles. Therefore, spiral filling rules
guarantee the complete covering of a cell because the robot
moves via the centers of cells but wall and contour
following procedures do not guarantee it. To handle this
problem, these cells are marked as partially covered type
and then considered as cells of interest in applying both
spiral filling rules and CIDT and these cells are revisited to
be covered completely.
3) Constrained Inverse Distance Transform
Constrained Inverse Distance Transform (CIDT) is a key
component of our ESF algorithm to link simple spiral paths.
In the proposed Inverse Distance Transform, region growing starts from the known current cell (starting point) and terminates on meeting the unknown candidate cell (goal point) with the shortest distance; the transform then generates the inverse path from the goal back to the starting point. However, the shortest distance is not always guaranteed, because the distance wave propagates with the shape of a square in the IDT, as depicted in Fig. 2. The termination condition is therefore implemented using the following rules:
IF ( D > 2r )
THEN terminate_IDT
ELSE IF (nearer_candidate_cell_detected)
THEN renew_r
where D is the distance of a cell being evaluated and r
retains the distance of the nearest candidate cell so far.
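The termination rule above can be sketched as a breadth-first wavefront search. The grid encoding, function name, and 4-connected propagation below are illustrative assumptions of ours, not the authors' implementation:

```python
from collections import deque

def constrained_idt(grid, start):
    """Constrained Inverse Distance Transform sketch.

    grid[y][x] is 'free' (covered), 'cand' (uncovered candidate), or 'wall'.
    A distance wave grows from `start`; r keeps the distance of the nearest
    candidate found so far, and the search terminates once the wave distance
    D exceeds 2*r, as in the rules above. Returns the chosen candidate cell
    and the (inverse) path from start to it.
    """
    h, w = len(grid), len(grid[0])
    parent = {start: None}
    queue = deque([(start, 0)])
    best, r = None, float("inf")
    while queue:
        (x, y), d = queue.popleft()
        if d > 2 * r:                       # IF (D > 2r) THEN terminate_IDT
            break
        if grid[y][x] == "cand" and d < r:  # ELSE IF nearer candidate: renew r
            best, r = (x, y), d
            continue
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < w and 0 <= ny < h and (nx, ny) not in parent \
                    and grid[ny][nx] != "wall":
                parent[(nx, ny)] = (x, y)
                queue.append(((nx, ny), d + 1))
    # Inverse path: walk parent pointers back from the goal to the start.
    path, cell = [], best
    while cell is not None:
        path.append(cell)
        cell = parent[cell]
    return best, path[::-1]
```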
[Fig. 2 Definition of the boundary of IDT: (a) before detecting a nearer candidate cell, (b) after detecting a nearer candidate cell. The figure marks the current cell, the first found candidate cell, a nearer candidate cell, the successive (1st, 2nd, ..., (n-2)th, (n-1)th, nth) distance waves, the ideal distance wave, and the boundary of IDT.]

To show the robustness of ESF in various environments, we conducted experiments in four different real environments, each 4.2 m by 5 m in size, shown in the first column of Fig. 3.

[Fig. 3 Real-world experimental results. The first column shows the real environment setups, the second column shows the overall trajectories (with the initial and final robot positions marked), the third column shows the maps, and the last column shows the coverages.]

The proposed Enhanced Spiral Filling (ESF) algorithm drastically reduces the number of turns on the generated path by applying the map coordinate initialization scheme. In addition, we propose the constrained inverse distance transform (CIDT) to link simple spiral paths. CIDT is a unified solution that finds a non-contiguous uncovered cell with the minimal path length from the cell being swept, as well as the path to it, simultaneously. Last, we demonstrate that our algorithm is applicable to embedded systems.

This work was supported by the Korea Ministry of Knowledge Economy under the Intelligent Robot program.

[1] H. Choset, "Coverage for robotics - A survey of recent results," in Annals of Mathematics and Artificial Intelligence, 2001, pp. 113-126.
[2] R. A. Brooks, "A robust layered control system for a mobile robot," IEEE J. Robotics and Automation, 1986.
[3] D. Gage, "Randomized search strategies with imperfect sensors," in Proc. SPIE Mobile Robots VIII, Boston, MA, 1993, pp. 270-279.
[4] E. González, O. Álvarez, Y. Díaz, C. Parra, and C. Bustacara, "BSA: A Complete Coverage Algorithm," in Proc. IEEE Int. Conf. on Robotics and Automation, Barcelona, Spain, April 2005.
[5] Y. Gabriely and E. Rimon, "Spiral-STC: An On-Line Coverage Algorithm of Grid Environments by a Mobile Robot," in Proc. IEEE Int. Conf. on Robotics and Automation, Washington, DC, May 2002.
A MeanShift-Watershed embedded Particle Filter:
Variable number of Target Detection and Tracking
Taewan Kim, Daijin Kim
Department of Computer Science, POSTECH, Korea
{taey16, [email protected]
In the combined version, MS-WS uses mean shift, a visual pattern clustering method, to estimate the number and positions of the initial seeds, and then applies region-growing segmentation, so that we can surmount the over-segmentation problem.
Tracking a varying number of non-rigid objects is a difficult problem: there is no exact way to estimate the expected number of objects to track, and overlaps and visual ambiguities are present. To surmount these difficulties, we introduce a MeanShift-Watershed segmentation (MS-WS) embedded particle filter. Our approach combines a segmentation technique, which gives the regions into which particles should be scattered, with a particle filter that infers the best regions where the objects are supposed to be. In other words, the MS-WS segmentation directly estimates the inference region in feature space. Using this prior information, the state transition model, and a color cue, the particle filter approximates the best region via sequential Monte Carlo sampling.
2.1. MeanShift Segmentation
The input image is first transformed into a 3-dimensional vector in LUV space. The image I(x, y) is decomposed by singular value decomposition (SVD) as I = UΣV^T. The columns of U represent the principal components, which contain the major chromatic and shading information, following the formula

A_{m×n} = UΣV^T = EV(AA^T)_{m×m} · Σ · EV(A^T A)_{n×n}^T,  (1)

where A = (1/n)(I − μ) and EV(·) denotes the eigenvectors of the corresponding matrix. The eigen-image represented by the column of U gives better computational efficiency, because we only use a gray-scale image which still contains rich chromatic and shading information with little loss of detail. This eigen-image is reconstructed as a 1D histogram, and the mean shift algorithm then finds the modes (the centers of the regions of high concentration) of the histogram.
Tracking multiple objects that can appear and disappear in a scene is an open problem in many vision tasks. This problem in nature involves both complex nonlinear dynamics and uncertainty. To handle a varying number of targets, we detect them with MeanShift-Watershed (MS-WS) segmentation and then track them within a particle filter framework. The remainder of this paper is organized as follows. Theoretical details of MeanShift and Watershed are given in section 2. The particle filter framework is explained in section 3. Section 4 is devoted to presenting the experimental results of our tracking system as well as the experimental configuration. Finally, we conclude our approach and discuss aspects of improvement for future work in section 5.
As mentioned in section 1, MeanShift-Watershed segmentation is used for detecting all players in a scene; detecting targets correctly is critical to achieving better tracking. MeanShift segmentation [1] is a simple but powerful method that clusters coherent visual patterns in feature space. However, it does not provide locally separable regions. Watershed segmentation [2] is not a feature-space analysis method but a region-growing approach, so it provides regionally discriminative segments. It suffers, however, from over-segmentation, because the number of initial growing points is unknown. Hence, our MS-WS method combines the two.
Figure 1. A histogram of the eigen-image, separating players and background by using mean shift.
Figure 2. (Left) An example of an eigen-image; (right) separation of players from the background.
2.2. Watershed Segmentation
The watershed transform [2] is a method that regards any grey scale image as a topographic surface. As depicted in figure 3, the gray image can be visualized in three dimensions: two spatial coordinates versus gray level. Intuitively, we flood this topographic surface with water, and then every local minimum and the area it locally dominates are regarded as a perceptual region.

In the proposal distribution of section 3.1, α is a mixing proportion. Unlike a mixture particle filter, we know the number of targets and the corresponding likelihoods (observations) {p_j(y_t | x_t)}_{j=1,...,M} from the MS-WS segmentation. When one or more new objects appear in a scene, our MS-WS algorithm detects them and automatically starts to track them with the color-based measurement model.
3.2 State transition model: Autoregressive model
The autoregressive (AR) process is the simplest model for estimating the dynamic state transition, so we adopt

x_{t+1}^i = A x̃_t^i + B x̃_{t-1}^i + C v_t,  if s_t > 0.5,
x_{t+1}^i = x̃_{t-1}^i + C v_t,  otherwise,

where A, B, C are the AR coefficients and s_t ~ U(0, 1).
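This switching AR transition can be illustrated as follows; the coefficient values and the function name are our illustrative choices, not values from the paper:

```python
def ar_transition(x_t, x_tm1, v_t, s_t, A=1.8, B=-0.9, C=0.5):
    """Switching second-order AR state transition.

    If s_t > 0.5 the particle follows the AR dynamics
    A*x_t + B*x_{t-1} + C*v_t; otherwise it diffuses around the
    previous state x_{t-1}. v_t is a noise sample, s_t ~ U(0, 1).
    """
    if s_t > 0.5:
        return A * x_t + B * x_tm1 + C * v_t
    return x_tm1 + C * v_t
```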
Figure 3. An illustration of watershed segmentation: distance transform in 2D, distance transform in 3D (bottom-left), and segmentation result (bottom-right).
Figure 4 shows a segmentation result of our MS-WS method. The players and the background are separated by the mean shift algorithm, as shown on the right of figure 2. Even though the players are separated from the background, there is no perceptual homogeneity for each individual player, so we apply the watershed transform to segment each player with the region-growing approach. The result of our MS-WS segmentation is shown on the right of figure 4. The segmentation result (the positions of all players) is used explicitly for estimating the number of players and as an observation cue.
3.3 Observation model: Color based tracking
Our observation model is the color-histogram-based Bhattacharyya coefficient metric introduced in [3].

In figure 5, we can see that the result of our MS-WS embedded particle filter is acceptable.
Figure 4. Segmentation result by the MS-WS method. (Left): edge detection using Meanshift segmentation, (middle): distance transform, (right): labeling.
Figure 5. Tracking result of an object (x-axis: frame number, y-axis: position in the x direction). Blue line: estimated position, black line: ground truth.
The particle filter is a successful tracking method, well suited to motion with nonlinearity and uncertainty. The particle filter performs the following Bayesian inference with sequential Monte Carlo sampling:

p(x_t | y_{t-1}) = ∫ p(x_t | x_{t-1}) p(x_{t-1} | y_{t-1}) dx_{t-1},

p(x_t | y_t) = c p(y_t | x_t) p(x_t | y_{t-1}) ≈ Σ_{i=1}^{N} w̃_t^i δ(x_t − x̃_t^i),

where c^{-1} = ∫ p(y_t | x_t) p(x_t | y_{t-1}) dx_t, and the importance weights satisfy

w̃_t^i ∝ w_{t-1}^i p(y_t | x_t^i) p(x_t^i | x̃_{t-1}^i) / q(x_t^i | x̃_{t-1}^i, y_t),   Σ_i w̃_t^i = 1.

There is no global solution for tracking a varying number of targets, which naturally involves complex interactions with overlap and ambiguities. To overcome this problem, MS-WS segmentation is used as our detection method. Even though there is no generally applicable criterion for segmentation, the combination of feature-space analysis and a region-growing approach, together with the particle filter, shows better tracking results in cases such as the sports player sequences.
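The recursion above can be sketched as one generic bootstrap-style sequential Monte Carlo step. The transition and likelihood callables below are placeholders for the paper's AR model and color likelihood, and the function name is ours:

```python
import random

def particle_filter_step(particles, weights, transition, likelihood):
    """One SMC step: resample, propagate, reweight.

    `particles`/`weights` represent p(x_{t-1} | y_{t-1});
    `transition(x)` samples from p(x_t | x_{t-1}) (prediction integral),
    `likelihood(x)` evaluates p(y_t | x_t) (Bayes update). Here the
    proposal equals the transition prior, so the weight update reduces
    to w ∝ p(y_t | x_t), normalized to sum to one.
    """
    n = len(particles)
    resampled = random.choices(particles, weights=weights, k=n)
    predicted = [transition(x) for x in resampled]
    w = [likelihood(x) for x in predicted]
    total = sum(w) or 1.0
    return predicted, [wi / total for wi in w]
```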
[1] Y. Keselman and E. M. Tzanakou, "Extraction and Characterization of Regions of Interest in Biomedical Images", IEEE Int. Conf. on ITAB, pp. 87-90, 1998.
[2] J. B. T. M. Roerdink and A. Meijster, "The Watershed Transform: Definitions, Algorithms and Parallelization Techniques," Institute for Mathematics and Computer Science, University of Groningen, Groningen, The Netherlands, IWI 99-9-06, 1999.
[3] D. Comaniciu and P. Meer, "Real-Time Tracking of Non-Rigid Objects using Mean Shift," CVPR, 2000.
3.1 Acquiring optimal proposal distribution using MS-WS segmentation
The proposal distribution should be able to scatter the particles into regions with a high likelihood even when there is a big gap between the mode of the prior density and the likelihood. We adopt a two-component Gaussian mixture that combines the autoregressive prior and the MS-WS cue:

q(x_t | x̃_{t-1}, y_t) = α · q_{MS-WS}(x_t | x̃_{t-1}, y_t) + (1 − α) · p(x_t | x̃_{t-1}),  (2)
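A toy one-dimensional sketch of sampling from this two-component proposal; the α value, the Gaussian forms, and the names are illustrative assumptions, not the paper's exact densities:

```python
import random

def sample_mixture_proposal(x_prev, ms_ws_detections, alpha=0.3, sigma=1.0):
    """Draw one particle from the mixture proposal (2).

    With probability alpha, sample around a randomly chosen MS-WS
    detection (standing in for q_MS-WS); otherwise sample from the
    transition prior p(x_t | x_{t-1}), here a Gaussian random walk.
    """
    if ms_ws_detections and random.random() < alpha:
        center = random.choice(ms_ws_detections)
        return random.gauss(center, sigma)
    return random.gauss(x_prev, sigma)
```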
Nonnegative Tucker Decomposition with Alpha-Divergence
Yong-Deok Kim, Seungjin Choi
Department of Computer Science, POSTECH, Korea
{karma13,[email protected]
Nonnegative Tucker decomposition (NTD) is a recent multiway extension of nonnegative matrix factorization (NMF), where nonnegativity constraints are incorporated into Tucker model. In this paper
we consider α-divergence as a discrepancy measure and derive multiplicative updating algorithms for NTD. The proposed multiplicative algorithm includes some existing NMF and NTD algorithms as
its special cases, since α-divergence is a one-parameter family of
divergences which accommodates KL-divergence, Hellinger divergence, χ2 divergence, and so on. Numerical experiments on face
images show how different values of α affect the factorization results under different types of noise.
Nonnegative matrix factorization (NMF) is a widely-used multivariate analysis of nonnegative data [1] which has many potential applications in machine learning, pattern recognition, and signal processing. Successful applications of NMF result from its ability to learn a parts-based representation through a matrix factorization, X = AS, where X = [x_1, ..., x_l] ∈ R_+^{m×l} is a data matrix, A ∈ R_+^{m×r} is a basis matrix, and S ∈ R_+^{r×l} is an encoding variable matrix.
In many real applications, data has a multiway structure. Conventional methods preprocess multiway data, putting them into a
matrix. Recently, there has been a great deal of research on multilinear analysis which conserves the original multiway structure of
the data. Motivated by multiway extensions of SVD, NMF was
also extended to 2D-NMF, nonnegative tensor factorization (NTF),
NTF2, higher-order NMF, and nonnegative Tucker decomposition
(NTD) [2] which are based on Tucker2, CANDECOMP/PARAFAC,
PARAFAC2, and Tucker model [3], respectively.
The LS error function or KL-divergence has been widely used as a discrepancy measure in NMF, NTF, and NTD. Recently, various divergence measures, such as Csiszár's f-divergences, α-divergences [4], and Bregman divergences, were considered as discrepancy measures in NMF and NTF.
In this paper we further elaborate our recent work on NTD [2], considering the α-divergence.¹ We develop multiplicative updating algorithms, referred to as 'α-NTD', which iteratively minimize the α-divergence between the nonnegative data tensor and the Tucker model. Empirically, we investigate the role of α in image de-noising by α-NTD, confirming that the parameter α is related to the characteristics of a learning machine, where the α-divergence of q from p (D_α[p||q]) emphasizes the part where p is small as α increases [6].
2.1. α-Divergence
Let us consider the α-divergence as an error measure. The objective function for NMF is given by

D_α[X||AS] = (1/(α(1−α))) Σ_{i,j} { αX_ij + (1−α)[AS]_ij − X_ij^α [AS]_ij^{1−α} }.

The α-divergence includes the KL-divergence (KL[q||p] for α → 0 or KL[p||q] for α → 1), the Hellinger divergence (α = 1/2), and the χ² divergence (α = 2), as its special cases.

This objective function is iteratively minimized by the following multiplicative updating algorithms:

S ← S ⊙ [ (A^T (X/[AS])^.α) / (A^T 1 1^T) ]^{.1/α},  (1)

A ← A ⊙ [ ((X/[AS])^.α S^T) / (1 1^T S^T) ]^{.1/α},  (2)

where ⊙ is the Hadamard product, / is element-wise division, 1 = [1, ..., 1]^T, and [X]^.α = [X_ij^α]. Updating rules (1) and (2), referred to as 'α-NMF', can be easily derived using the same technique as used in [1].
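A minimal numpy sketch of updates (1)-(2); the small eps clamp and the monitoring of the objective are our additions for numerical safety, not part of the derivation:

```python
import numpy as np

def alpha_divergence(X, Y, alpha=0.5):
    """D_alpha[X || Y] as in the objective above (element-wise sum)."""
    return np.sum(alpha * X + (1 - alpha) * Y
                  - X**alpha * Y**(1 - alpha)) / (alpha * (1 - alpha))

def alpha_nmf_update(X, A, S, alpha=0.5, eps=1e-12):
    """One round of the multiplicative alpha-NMF updates for X ~ A S.

    All operations are element-wise except the matrix products;
    nonnegativity is preserved since every factor is nonnegative.
    """
    one = np.ones_like(X)                              # plays the role of 1 1^T
    R = (X / np.maximum(A @ S, eps)) ** alpha          # (X/[AS]).^alpha
    S = S * ((A.T @ R) / np.maximum(A.T @ one, eps)) ** (1.0 / alpha)
    R = (X / np.maximum(A @ S, eps)) ** alpha
    A = A * ((R @ S.T) / np.maximum(one @ S.T, eps)) ** (1.0 / alpha)
    return A, S
```

Iterating the update should monotonically decrease the α-divergence on positive data.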
2.2. Nonnegative Tucker Decomposition
An N -way tensor X ∈ RI1 ×I2 ×···×IN has N indices (i1 , i2 , . . . , iN )
and its elements are denoted by Xi1 i2 ...iN where 1 ≤ in ≤ In . The
mode-n matricization of X ∈ RI1 ×I2 ×···×IN rearranges the elements of X to form the matrix X (n) ∈ RIn ×In+1 In+2 ···IN I1 I2 ···In−1 ,
where In+1 In+2 · · · IN I1 I2 · · · In−1 is in a cyclic order.
Nonnegative Tucker decomposition (NTD) seeks a decomposition of a nonnegative N-way tensor X ∈ R_+^{I1×I2×···×IN} as mode products of a nonnegative core tensor S ∈ R_+^{J1×J2×···×JN} and N nonnegative mode matrices A^(n) ∈ R_+^{In×Jn}:

X ≈ X̂ = S ×₁ A^(1) ×₂ A^(2) ··· ×_N A^(N),  (3)

which can be written in an element-wise form as

X̂_{i1 i2···iN} = Σ_{j1,j2,...,jN} S_{j1 j2···jN} A^(1)_{i1 j1} A^(2)_{i2 j2} ··· A^(N)_{iN jN}.  (4)
The mode-n matricization of X in the Tucker model (3) is expressed by Kronecker products of the mode-n matricization of the core tensor and the mode matrices:

X^(n) ≈ A^(n) S^(n) (A^(n−1) ⊗ ··· ⊗ A^(2) ⊗ A^(1) ⊗ A^(N) ⊗ ··· ⊗ A^(n+2) ⊗ A^(n+1))^T = A^(n) S^(n) A^{(\n)T},  (5)

¹ In fact, this paper is a shorter version of [5].
where S (n) is the mode-n matricization of the core tensor S. The
representation (5) plays a crucial role in deriving multiplicative updating algorithms for NTD.
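The cyclic mode-n matricization can be realized in numpy as below. The exact ordering of the column indices within the cyclic group is a convention; this sketch fixes one consistent choice:

```python
import numpy as np

def unfold(T, n):
    """Mode-n matricization X_(n) with cyclic column ordering:
    rows are indexed by i_n, columns by (i_{n+1}, ..., i_N, i_1, ..., i_{n-1}).
    """
    N = T.ndim
    order = [n] + [(n + k) % N for k in range(1, N)]  # cyclic axis order
    return np.transpose(T, order).reshape(T.shape[n], -1)
```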
Multiplicative updating algorithms for NTD were elegantly derived in [2] when the LS error function or KL-divergence was used as an objective function. We derive multiplicative updates for NTD in a similar way when the α-divergence objective function is considered.

α-NTD considers the objective function that is the α-divergence of the Tucker model X̂ (given in (3)) from the N-way data tensor X:

D_α[X||X̂] = (1/(α(1−α))) Σ_{i1,i2,...,iN} { αX_{i1 i2···iN} + (1−α)X̂_{i1 i2···iN} − X_{i1 i2···iN}^α X̂_{i1 i2···iN}^{1−α} }.  (6)

The core idea in developing multiplicative updates for α-NTD is to use the mode-n matricization of the Tucker model X̂, given by

X^(n) ≈ A^(n) S^(n) A^{(\n)T} = A^(n) S_A,
vec(X^(n)) ≈ vec(A^(n) S^(n) A^{(\n)T}) = (A^{(\n)} ⊗ A^(n)) vec(S^(n)).

Note that these matricized equations have the same form as NMF. Thus, the update for the core tensor follows (1) and the updates for the mode matrices are derived from (2). With some calculations, the multiplicative updating rules for the core tensor and mode matrices of α-NTD are given by

S ← S ⊙ [ ((X/X̂)^.α ×₁ A^(1)T ··· ×_N A^(N)T) / (E ×₁ A^(1)T ··· ×_N A^(N)T) ]^{.1/α},

A^(n) ← A^(n) ⊙ [ ([(X/X̂)^.α]_(n) S_A^T) / (1 1^T S_A^T) ]^{.1/α},  for n = 1, ..., N,

where E is a tensor of the same size as X whose entries are all one.

Table 1. The relationship between types of noise and divergences (columns: pepper (black), pepper & salt, salt (white)). In the case of pepper (black) noise, a larger value of α gives a better result. In the case of salt (white) noise, a smaller value of α gives a better result.

The smaller α works better in the case of salt noise. In fact, these results are consistent with the characteristics of the α-divergence, where D_α[p||q] emphasizes the part where p is small as α increases.

In the case of pepper noise, noise-contaminated pixels are replaced by X_ij = 0, and the associated term in the error function is (1/α) X̂_ij. A larger value of α penalizes such terms, leading to better performance. In the case of salt noise, noise-contaminated pixels are given as X_ij = 255, and the associated term in the error function is (1/(α(1−α))) { α·255 + (1−α) X̂_ij − 255^α X̂_ij^{1−α} }. In this function, a smaller value of α de-emphasizes the outlier effect.
We have presented a method of nonnegative Tucker decomposition in the case where the α-divergence is considered as an error function. We have derived multiplicative updating rules for α-NTD which include various existing algorithms, such as NMF, nsNMF, NTF, and NTF2, as special cases. We have also investigated the role of α in α-NTD with experiments on image de-noising under 3 different types of noise. The empirical results matched the known characteristics of the α-divergence well.
We investigate the role of α ∈ {0.5, 1, 2} in the tensor factorization. To this end, we apply α-NMF and α-NTD to an image de-noising task with different types of noise or outliers. We use the ORL face DB (X ∈ R_+^{48×48×400}, collecting 400 face images of size 48 × 48) to generate noise-contaminated images, where 3 different types of noise are considered: pepper (black), salt (white), and pepper & salt (black and white). For each image, 5% of the pixels are randomly chosen and converted to black or white pixels. In α-NMF, the number of basis vectors is set to 25. In α-NTD, the dimension of the core tensor is set to 24 × 24 × 25. Note that α-NTD requires a smaller number of parameters (about 38% of those of α-NMF).

Experiments are carried out 20 times independently for each type of noise and each value of α ∈ {0.5, 1, 2}. As a performance measure, the averaged peak signal-to-noise ratio (PSNR) is used; higher PSNR values represent better results. The results are shown in Table 2. In most cases, α-NTD is more robust to noise than α-NMF. Interesting observations in these experiments are as follows: the larger α results in better performance in the case of pepper noise.
[1] D. D. Lee and H. S. Seung, “Learning the parts of objects by
non-negative matrix factorization,” Nature, vol. 401, pp. 788–
791, 1999.
[2] Y. -D. Kim and S. Choi, “Nonnegative Tucker decomposition,”
in Proceedings of the IEEE CVPR-2007 Workshop on Component Analysis Methods, Minneapolis, Minnesota, 2007.
[3] L. R. Tucker, “Some mathematical notes on three-mode factor
analysis,” Psychometrika, vol. 31, pp. 279–311, 1966.
[4] S. Amari and H. Nagaoka, Methods of Information Geometry,
Oxford University Press, 2000.
[5] Y. -D. Kim, A. Cichocki, and S. Choi, “Nonnegative Tucker
decomposition with alpha-divergence,” in Proceedings of the
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Las Vegas, Nevada, 2008.
[6] S. Amari, “Integration of stochastic models by minimizing α-divergence,” Neural Computation, vol. 19, no. 10, pp. 2780–2796, 2007.
Temporally Weighted Common Spatial Pattern for
Uncued EEG Application
Yunjun Nam, Seungjin Choi
Department of Computer Science, POSTECH, Korea
{druid, [email protected]
In this paper we propose a method for incorporating temporal structure into CSP, by measuring the adaptability between each time point of the multi-trial structure and the covariance matrices.
Each such matrix represents the EEG signal of a single trial. In an experiment, many trials are repeated to acquire sufficient information. If the trial is repeated N times, the total mean-centered data matrix X can be represented as X = [X̂_1, X̂_2, ..., X̂_N]. The original CSP does not take temporal structure into account, so the sequence of trials and the temporal sequence within each trial are ignored.
Brain computer interface (BCI) is a system designed to translate a subject's intention or mind into a control signal for various devices. Electroencephalogram (EEG), which captures electrical potentials from the surface of the scalp, is the most popular signal used for BCI because of its portability.

A large number of techniques and algorithms for signal processing and pattern recognition have been employed in the BCI community. Among them, the common spatial pattern (CSP) technique is one of the most popular methods since it was first introduced by Koles et al. in 1990 [1]. We try to address drawbacks of CSP, such as outlier sensitivity, and to extend its limits by adopting multi-class paradigms or local temporal structure.

Our current issue is the uncued classification problem. Most research aims at designing a good classifier for single-trial signals that are already partitioned in time, from the onset of the stimulus to the attenuation of brain activity. However, in a real-world environment, the machine cannot measure these time points or durations, because the EEG signal is the only input to the machine. This is also one of the prime issues of BCI competition 2007. In the previous competition, BCI competition III (2005), this problem was invalidated because only one team participated.
Geometrically, the discriminative features of CSP are obtained by finding directions that maximize the difference of the projected variance between two populations of EEG signals. CSP computes these directions by simultaneous diagonalization of the two covariance matrices associated with the two populations of EEG signals.

For the EEG analysis, the raw EEG signal of a single trial is a time course x_t = [x_1(t), x_2(t), ..., x_C(t)]^T, t = 1, ..., T, with T the number of time slices and C the number of channels (i.e., recording electrodes). Denote the C × T matrix X̂ as X̂ = [x_1, ..., x_T]. The normalized spatial covariance of the EEG is calculated as Σ = XX^T / tr(XX^T).

The goal of this study is to design a spatial filter that can classify two populations of EEG signals, from left hand and right hand movement. For that, denote by Σ_L the covariance matrix from left hand movement and by Σ_R the matrix from right hand movement. For simultaneous diagonalization, Σ_L and Σ_R are whitened by the composite spatial covariance matrix Σ_T = Σ_L + Σ_R. The whitening matrix P is formed as P = U D^{-1/2}, where D is the diagonal matrix of eigenvalues and U is the eigenvector matrix of Σ_T. Then, each whitened matrix K_L, K_R is factorized by eigenvalue decomposition:

D^{-1/2} U^T Σ_L U D^{-1/2} = K_L = Ψ Λ_L Ψ^T,
D^{-1/2} U^T Σ_R U D^{-1/2} = K_R = Ψ Λ_R Ψ^T.

They share common eigenvectors Ψ, and the eigenvalues have the relation Λ_L + Λ_R = I, where I is the C × C identity matrix and the eigenvalues in Λ_L and Λ_R are inversely ordered [3]. Then, the final projection matrix W is represented as W = U D^{-1/2} Ψ. The column vectors of W are the directions that maximize the difference of the projected variances of X_L and X_R.

Basically, CSP does not take temporal information into account in the estimation of the covariance matrices. A few studies have tried to incorporate local temporal structure into the covariance matrices by weighting adjacent time samples [4]. In this study, we design a temporal weighting vector by calculating the adaptability of each time slice with respect to the covariance matrix.

CSP analysis is based on the correlation structure of the data set. When the data set has a strong correlation structure, the projected covariance of the sample group is increased, and
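The whitening-plus-eigendecomposition pipeline above can be sketched as a generic CSP computation (not the authors' code); W simultaneously diagonalizes both covariance matrices:

```python
import numpy as np

def csp(cov_L, cov_R):
    """Compute the CSP projection matrix W = U D^{-1/2} Psi.

    cov_L, cov_R: normalized spatial covariance matrices (C x C).
    """
    cov_T = cov_L + cov_R
    d, U = np.linalg.eigh(cov_T)              # Sigma_T = U D U^T
    P = U @ np.diag(1.0 / np.sqrt(d))         # whitening: P^T Sigma_T P = I
    K_L = P.T @ cov_L @ P                     # whitened left covariance
    lam, Psi = np.linalg.eigh(K_L)            # K_L = Psi Lambda_L Psi^T
    # Lambda_L + Lambda_R = I holds automatically, since K_L + K_R = I.
    return P @ Psi                            # W = U D^{-1/2} Psi
```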
classification results also tend to be enhanced. By contrast, if the data from a specific time point across trials has a weak relation with the covariance matrix, it means that the time point has little relation with the target spatial pattern.
This adaptability between a specific time point and the spatial pattern can be measured by the distance between the sample points and their projections onto the eigenvector associated with the largest eigenvalue of the covariance matrix, as shown in <Fig. 1>. This eigenvector is the vector that best represents the correlation structure of the covariance matrix, and data points highly related to this covariance matrix have a low value of d, the distance to the eigenvector.
Figure 2. Projected variance by CSP
Figure 3. Topological representation of CSP
The following <Fig. 4> shows the inversed temporal adaptability, 1/w_t, for each group. For better representation, the inverse value was used for plotting. The horizontal axis spans the 0~3 sec. time range from the stimulus.
Figure 1. Distance between data point and eigenvector
When the total data matrix X has covariance matrix Σ = XX^T, Σ can be decomposed as Σ = VΦV^T, and the first eigenvector of V, associated with the largest eigenvalue of Φ, is denoted by v. Then the distance d between the eigenvector and a data point x can be represented as

d = ||x − v v^T x||.

And the temporal adaptability w_t of each time slice t can be measured by

w_t = Σ_{n=1}^{N} ||x_nt − v v^T x_nt||.

This measure will be adopted for predicting stimulus points and solving the uncued classification problem.

Figure 4. Inversed adaptability

The selected 1~2 sec. range shows lower distance, and the 0~1 and 2~3 sec. ranges show higher distance, so this measure can be used for weighting the temporal structure.
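The w_t computation can be sketched as follows, assuming trials stacked as an (N, C, T) array (the array layout and function name are our choices):

```python
import numpy as np

def temporal_adaptability(X_trials):
    """Adaptability w_t of each time slice.

    X_trials: shape (N, C, T) -- N trials, C channels, T time slices.
    Uses the leading eigenvector v of the pooled covariance and sums,
    over trials, the distance between each sample x_nt and its
    projection v v^T x_nt; a low w_t means high adaptability.
    """
    N, C, T = X_trials.shape
    X = X_trials.transpose(1, 0, 2).reshape(C, N * T)  # pool all samples
    cov = X @ X.T
    eigval, eigvec = np.linalg.eigh(cov)
    v = eigvec[:, -1]                                  # largest eigenvalue
    w = np.zeros(T)
    for t in range(T):
        for n in range(N):
            x = X_trials[n, :, t]
            w[t] += np.linalg.norm(x - v * (v @ x))    # d = ||x - v v^T x||
    return w
```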
The data set is from "BCI competition 2007" - data set I <motor imagery, uncued classifier application> [5]. This data set consists of 1905.94 seconds of continuous EEG signal, recorded at 100 Hz from 59 electrode channels. Marker data is also included, denoting the time points of the stimuli. Most stimuli were repeated every 8 seconds, and the subject imagined movement of the left or right hand as ordered by the stimulus.

As preprocessing, the signal is band-pass filtered to 8~16 Hz, and the signal 1 s~2 s after the stimulus was selected as the brain activity signal by heuristic selection.
<Fig. 2> shows the projected variance of the left and right covariances for the most discriminative column vector pairs of the common spatial pattern W. The red circles are the projection results of left trials and the blue 'x' marks are the results of right trials. The projected points are classified by LDA, showing 82% (164/200) correct classification. <Fig. 3> shows a topological representation of the used vector pairs of W. Heavily colored regions carry more discriminative information.
[1] Z. J. Koles, M. S. Lazar, and S. Z. Zhou, “Spatial patterns
underlying population differences in the background EEG,” Brain
Topogr., vol. 2, pp.275-284, 1990.
[2] Benjamin Blankertz, Klaus-Robert Müller, Dean
Krusienski, Gerwin Schalk, Jonathan R. Wolpaw, Alois
Schlögl, Gert Pfurtscheller, José del R. Millán, Michael
Schröder, and Niels Birbaumer. The BCI competition III:
Validating alternative approaches to actual BCI problems. IEEE
Trans Neural Sys Rehab Eng, 14(2):153-159, 2006.
[3] K. Fukunaga, “Introduction to Statistical Pattern Recognition”,
2nd ed. Boston, MA: Academic, 1990.
[4] Haixian Wang and Wenming Zheng, Local temporal common
spatial patterns for robust single-trial EEG classification, IEEE
Trans. Neural Systems and Rehabilitation Engineering, Vol. 16,
No. 2, April 2008.
[5] Benjamin Blankertz, Guido Dornhege, Matthias Krauledat,
Klaus-Robert Müller, and Gabriel Curio. The non-invasive
Berlin Brain-Computer Interface: Fast acquisition of effective performance in untrained subjects. NeuroImage, 37(2):539-550, 2007.
Commute Time Distance On Isomap For Pose Estimation
Yoonseop Kang, Seungjin Choi
Department of Computer Science, POSTECH, Korea
{e0en, [email protected]
In this paper we present a new application of isomap using commute time distance instead of geodesic distance. We used isomap with commute time distance to estimate pose from a pre-processed video sequence of the HUMANEVA-I data set. The same experiment was also done using the original isomap algorithm for comparison. Isomap using commute time distance showed more accurate results than the original isomap algorithm.
Isomap [1] is a widely used dimensionality reduction method. Isomap finds the embedding which best preserves the geodesic distances between data points. Embedding with isomap is done in three steps:
1. Construct a neighborhood graph. ε-neighborhoods or k-nearest neighbors can be used. The graph is represented as a sparse n × n matrix.
2. Compute the shortest-distance matrix D. Dijkstra's algorithm or Floyd's algorithm can be used to compute the shortest distances between nodes on the neighborhood graph.
3. Perform classical multidimensional scaling (MDS) using the calculated shortest distances. This is done by forming the matrix M = -(1/2) H D.² H, where H is a centering matrix and D.² is the element-wise square of D, and calculating the largest eigenvectors of M.
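The three steps can be sketched as a minimal isomap; this toy version (our sketch, not the authors' code) uses a k-NN graph and Floyd-Warshall for simplicity:

```python
import numpy as np

def isomap(X, k=4, dim=2):
    """Minimal isomap: k-NN graph, shortest paths, classical MDS."""
    n = len(X)
    # Step 1: symmetrized k-nearest-neighbor graph.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    G = np.full((n, n), np.inf)
    np.fill_diagonal(G, 0.0)
    for i in range(n):
        for j in np.argsort(d[i])[1:k + 1]:
            G[i, j] = G[j, i] = d[i, j]
    # Step 2: geodesic distances via Floyd-Warshall.
    for m in range(n):
        G = np.minimum(G, G[:, m:m + 1] + G[m:m + 1, :])
    # Step 3: classical MDS on M = -1/2 H D.^2 H.
    H = np.eye(n) - np.ones((n, n)) / n
    M = -0.5 * H @ (G ** 2) @ H
    w, V = np.linalg.eigh(M)
    idx = np.argsort(w)[::-1][:dim]
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0.0))
```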
However, isomap fails when the data is noisy or contains outliers. This is because isomap only considers the shortest path, and the shortest path on a graph can change dramatically with a slight change in the graph. This weakness of isomap can be mitigated by using commute time distance instead of the shortest distance on the graph.
Figure 1. The extracted silhouettes of the first subject from randomly chosen frames of the video sequence of the HUMANEVA-I data set.
where e_i is an n-dimensional vector filled with 0s except for a 1 in the i-th element.

We can use commute time distance for isomap. Isomap using commute time distance is also called a commute time embedding [2]. Performing classical MDS on the matrix D is equivalent to finding the largest eigenvectors of L^+, which are also the smallest eigenvectors of L.
Isomap with commute time distance has several known properties. The method has a tendency to cluster nodes [2]. As commute time distance takes account of every possible path between two nodes, isomap using commute time distance shows robust performance on data with noise or outliers. Here we seek an application of isomap using commute time distance: pose estimation.
We estimated the positions of the joints from the video sequence. The HUMANEVA-I data set [3] was used for our experiment. The data set contains the video sequences of four people performing various actions while wearing several markers for motion capture devices. The same actions were recorded by seven cameras: three of them were color, and the other four were black-and-white. The data set also contains the motion capture data for some of the video sequences. We extracted the silhouettes from the frames of the video sequence of the first subject walking, taken by the second color camera, and resized them to 100 by 75 pixels (Fig. 1).
The commute time between two nodes x_i and x_j on a graph is defined as the average number of steps required for a random walker to start from x_i, visit x_j, and return to x_i. Calculating the commute time distance between x_i and x_j on the graph can be done by using the pseudoinverse L^+ of the graph Laplacian L:

D_ij = d(x_i, x_j) = (e_i − e_j)^T L^+ (e_i − e_j),
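A compact numpy sketch of this definition, using the identity (e_i − e_j)^T L^+ (e_i − e_j) = L^+_ii + L^+_jj − 2 L^+_ij (the commute time proper also carries a constant graph-volume factor, which does not affect relative distances):

```python
import numpy as np

def commute_time_distances(W):
    """Commute-time distance matrix from a weighted adjacency matrix W.

    L = Deg - W is the graph Laplacian; D_ij = (e_i - e_j)^T L^+ (e_i - e_j).
    """
    L = np.diag(W.sum(axis=1)) - W
    Lp = np.linalg.pinv(L)               # pseudoinverse of the Laplacian
    diag = np.diag(Lp)
    return diag[:, None] + diag[None, :] - 2 * Lp
```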
Proc. of the 8th POSTECH-KYUTECH Joint Workshop on Neuroinformatics, Kitakyushu, Japan, 2008
Using Commute Times", IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1873-1890, 2007.
[3] L. Sigal and M. J. Black, "HumanEva: Synchronized Video and Motion Capture Dataset for Evaluation of Articulated Human Motion", Technical Report CS-06-28, 2006.
Figure 2. The embedding dimension (horizontal axis) vs. the mean squared error of pose estimation (vertical axis) for the original isomap (dashed line) and isomap with the commute time distance (solid line).
Our task was to estimate the positions of the markers using the original isomap and isomap using the commute time distance. For both algorithms a 4-nearest-neighbor graph was constructed. A total of 200 frames were randomly chosen for the experiment. Among these frames, 150 were used as a training set and the remaining 50 as a test set. The embedding dimension was varied from 2 to 150. The embedding was computed using both the training and test frames.
The marker positions of a test sample frame were estimated by averaging the marker positions of the 8 nearest neighbors of the embedded coordinate of the test sample frame. The estimated marker positions were compared to the actual marker positions from the HUMANEVA-I data set, and the difference between the two was recorded.
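The nearest-neighbor averaging step can be sketched as follows (names and shapes are illustrative; the actual pipeline first embeds training and test frames jointly):

```python
import numpy as np

def estimate_markers(test_point, train_embedded, train_markers, k=8):
    """Average the marker positions of the k nearest training frames
    in the embedded space."""
    dists = np.linalg.norm(train_embedded - test_point, axis=1)
    nearest = np.argsort(dists)[:k]          # indices of k nearest frames
    return train_markers[nearest].mean(axis=0)
```

With k = 8 this reproduces the estimation rule described above; the reported error is then the difference between the returned average and the ground-truth marker positions.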
Though the differences between the error rates of the two methods varied, isomap using the commute time distance showed a significantly lower error rate than the original isomap (Fig. 2). This seems to be due to the fact that the silhouettes automatically extracted from a video sequence contained considerable noise: the background was not removed perfectly, and the silhouettes of two frames with the same pose can differ. Using the commute time distance would have suppressed the effect of such noise.
4. Conclusions
We have applied isomap using the commute time distance to the problem of estimating a person's pose from a video sequence and compared it with the original isomap, which uses the geodesic distance. We found that using the commute time distance is helpful for this problem, and we expect that isomap with the commute time distance can also be helpful in other problems dealing with video sequence data.
[1] J. B. Tenenbaum, V. de Silva, and J. C. Langford, "A global geometric framework for nonlinear dimensionality reduction", Science, 290(5500), pp. 2319-2323, 2000.
[2] H. Qiu and E. R. Hancock, "Clustering and Embedding
Network Decomposition:
A New Framework for Analyzing Dynamic Microarray Data
Yongsoo Kim, Daehee Hwang, and Seungjin Choi
Department of Computer Science, POSTECH, Korea
{anoyaro, dhhwang, [email protected]
We present a new network-based analysis method for dynamic microarray data. Protein-protein interaction (PPI) data is weighted using DNA microarray data from various conditions. We applied orthogonal non-negative matrix factorization (ONMF), NMF with an orthogonality constraint on the basis matrix, to this matrix. The factorization gives sub-network clusters and the activation pattern of each cluster, which are easy to interpret.
Non-negative matrix factorization (NMF) [1] is a group of algorithms in multivariate analysis and linear algebra in which a matrix X is factorized into (usually) two matrices A and S:

X = AS. (1)
NMF has been successfully applied to analyzing microarray data, especially for clustering [2], and has shown more robust clustering results than other methods such as hierarchical clustering and self-organizing maps. Lately, non-smooth NMF has been successfully applied to bicluster (simultaneously cluster the rows and columns of) microarray data [3].
One drawback of NMF is the non-negativity constraint. Although microarray data is by nature non-negative, fold-change values, the differences of intensity values between different conditions, are not, so NMF cannot be applied to them directly. We suggest a framework that constructs a weighted global PPI matrix and decomposes it using a variant of NMF, orthogonal NMF, which results in sub-networks of the PPI data and their activation patterns that are easy to interpret.
The interactions between proteins are important for many biological functions. For example, signals from the exterior of a cell are mediated to the inside of that cell by protein-protein interactions of the signaling molecules. This process, called signal transduction, plays a fundamental role in many biological processes and in many diseases (e.g. cancer). Because of this importance, protein-protein interaction (PPI) data is frequently used in systems biology, a relatively new field that focuses on the systematic study of complex interactions in biological systems.
A DNA microarray is a high-throughput technology used in molecular biology and in medicine. It consists of an arrayed series of thousands of microscopic spots of DNA oligonucleotides, called features, each containing picomoles of a specific DNA sequence. Using a DNA microarray we can simultaneously measure the expression levels of many genes. A DNA microarray does not give the precise amounts of individual gene products, but it gives high-throughput measurements over a large set of genes, which is suitable for the systems biology approach.
However, because of its high dimensionality, there are several obstacles to analyzing DNA microarray data. Furthermore, as the importance of DNA microarrays has grown, many biologists use many chips on a single system to observe dynamic changes in the biological system (e.g. time-course microarray data). There is a need for a solid framework to analyze these data.
Global PPI data was obtained from two sources, NCBI and the KEGG pathway database. These data sets list the protein pairs that can physically interact. We assumed that the chance of interaction depends on the amounts of the interactants, and used fold-change values to weight the global PPI data. For the i'th interaction, we give the weight as follows:
I_i = Σ_{j=1}^{n} ( σ(log2(T_ij) − log2(C_ij)) − 0.5 ), (2)

where σ is the logistic function, log2(T_ij) − log2(C_ij) is the fold-change value of the j'th interactant in the i'th interaction, and n is the number of interactants (here, 2). More precisely, T_ij is the intensity value of the j'th interactant of the i'th interaction in the treatment sample, and C_ij is that of the control sample. This weighting function ranges from -1 to 1 according to the fold-change values. The weight will be close to 1 if both
interactants are highly overexpressed and close to -1 in the opposite case.
We can get this interaction vector I for every control-treatment pair of DNA microarrays, so we can build an n by m matrix H, where n is the number of interactions and m is the number of control-treatment sample pairs. However, H may have negative values. We resolve this problem by separating the negative and positive values. That is, let there exist non-negative matrices A and B such that

H = A − B. (3)

We finally construct a non-negative matrix by concatenating these two matrices; the final matrix X is given as

X = [A, B]. (4)

Now the number of columns is doubled and X is non-negative.
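Eqs. (2)-(4) can be sketched as follows, assuming the split of H into its positive and negative parts (array shapes and names are illustrative, not from the paper):

```python
import numpy as np

def interaction_weights(T, C):
    """Eq. (2): weight each interaction by the logistic of its
    interactants' fold changes. T and C are (interactions x interactants)
    intensity arrays for treatment and control."""
    fold = np.log2(T) - np.log2(C)
    sigma = 1.0 / (1.0 + np.exp(-fold))      # logistic function
    return (sigma - 0.5).sum(axis=1)         # in (-1, 1) for 2 interactants

def build_nonneg_matrix(H):
    """Eqs. (3)-(4): split H into nonnegative parts, so H = A - B, and
    concatenate them into a nonnegative X = [A, B]."""
    A = np.maximum(H, 0.0)
    B = np.maximum(-H, 0.0)
    return np.hstack([A, B])
```

When treatment equals control, every fold change is 0 and the weight is 0; strongly overexpressed pairs push the weight toward 1, strongly underexpressed ones toward -1.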
Fig. 1. Network activation patterns (matrix S in X = AS).
Applying NMF to the constructed matrix X, we get two matrices A and S. From A we can read off clusters of interactions, and S gives the activation pattern of each cluster. For interpretation, we want to make the clusters as disjoint as possible (not strictly disjoint, because that would not be natural for biological problems). An orthogonality constraint on A meets this requirement. [4] suggested a way to impose the orthogonality constraint using the true gradient on the Stiefel manifold, the parameter space with the orthogonality
Fig. 2. A sample sub-network, from basis 1 of matrix A.
constraint A^T A = I. The multiplicative update rule for S is the same as in standard NMF:

S ← S ∘ (A^T X) ⊘ (A^T A S),

where ∘ and ⊘ denote element-wise multiplication and division.
We have presented a new framework to perform network-based analysis of dynamic microarray data using ONMF. This framework successfully summarized large microarray data sets into several sub-networks, providing the activation pattern of each network.
The update rule for A, derived from the true gradient on the Stiefel manifold, is A ← A ∘ (X S^T) ⊘ (A S X^T A).
[1] D. D. Lee and H. S. Seung (1999), "Learning the parts of objects by non-negative matrix factorization", Nature 401 (6755), pp. 788-791.
[2] J.-P. Brunet, P. Tamayo, T. R. Golub, and J. P. Mesirov (2004), "Metagenes and molecular pattern discovery using matrix factorization", Proceedings of the National Academy of Sciences of the United States of America 101 (12), pp. 4164-4169.
[3] P. Carmona-Saez, R. D. Pascual-Marqui, F. Tirado, "Biclustering of gene expression data by non-smooth non-negative matrix factorization", BMC Bioinformatics 7, art. no. 78.
[4] Seungjin Choi (2008), "Algorithms for orthogonal nonnegative matrix factorization", in Proceedings of the International Joint Conference on Neural Networks (IJCNN-2008), Hong Kong, June 1-6, 2008.
Note that the update rule of S is the same as in standard NMF.
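A compact sketch of the ONMF iteration, using the standard NMF update for S and the Stiefel-manifold update for A from [4] as we read it (a sketch under these assumptions, not the authors' code):

```python
import numpy as np

def onmf(X, r, iters=200, eps=1e-9, seed=0):
    """Multiplicative updates for X ~ AS with A^T A ~ I."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    A = rng.random((n, r)) + 0.1
    S = rng.random((r, m)) + 0.1
    for _ in range(iters):
        S *= (A.T @ X) / (A.T @ A @ S + eps)      # standard NMF S update
        A *= (X @ S.T) / (A @ S @ X.T @ A + eps)  # Stiefel-manifold A update
    return A, S
```

Because the updates are multiplicative, A and S stay non-negative throughout, and the rows of S can be read directly as the activation patterns shown in Fig. 1.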
We used microarray data from the Gene Expression Omnibus (GEO): a data set that profiles MCF-7 breast cancer cells treated with various doses of epidermal growth factor (EGF) or heregulin (HRG) for up to 90 minutes (Accession: GDS252). There are 4 different doses and 7 time points each (28 samples in total).
Fig. 1 shows the result of ONMF (matrix S). The right half of the columns corresponds to down-regulation and the other half to up-regulation (A and B from (3)). ONMF separated the sub-network clusters according to 3 patterns: constantly active (Factors 5, 6), time-course (Factors 3, 4), and dose-dependent (Factors 1, 2).
The POSTECH Subtle Facial Expression Database 2007 (SFED07)
Sungsoo Park, Hyung-Soo Lee, Jongju Shin, Daijin Kim
Dept. of Computer Science and Engineering, POSTECH, Korea
{sungs, sooz, jjshin, [email protected]
Figure 1 shows the image set of a subject in the database. In this figure, the four image groups represent the different expressions.
We constructed the POSTECH Subtle Facial Expression Database 2007 (SFED07). SFED07 contains true-color face images of 20 people with 4 expression variations (neutral, subtle smile, subtle surprise, subtle anger) per person. All of the people in the database are Korean. We also present the results of a subtle facial expression recognition experiment using Support Vector Machines (SVMs) in order to provide an example evaluation protocol for the database. The database is expected to be used to evaluate subtle facial expression recognition algorithms.
Figure 1. Some images with subtle expressions in SFED07.
1. Introduction

Four kinds of facial expressions are considered: neutral, subtle smile, subtle surprise, and subtle anger. The data can be used for evaluating the robustness of subtle facial expression recognition algorithms.
Recently, facial expression analysis has attracted a great deal of interest in the area of computer vision because it plays an important role in human-machine communication [1]. Most existing facial expression analysis methods attempt to recognize extreme facial expressions. However, in real situations, people do not always express their facial expressions in an extreme way. To tackle this problem, several researchers have tried to recognize subtle facial expressions [2, 3], but there is not yet an appropriate database for evaluating the recognition performance on subtle facial expressions. To contribute to such evaluation, we have built a new database called the POSTECH Subtle Facial Expression Database 2007 (SFED07).
2. Overview of the POSTECH Subtle Facial Expression Database 2007 (SFED07)
3. Capturing Environment
We describe the capturing environment. To obtain images of a subject, we used CNB-AN202L CCD cameras. We asked the subject to sit in a chair and look at the frontal camera with one of four expressions: neutral, subtle smile, subtle surprise, or subtle anger. Then we captured images from the camera. For each expression we obtained an image sequence, from which we chose a pre-expression image and a well-expressed image for each subtle expression. Consequently, we obtained 4 kinds of expressions x 2 kinds of images (pre-expression image and well-expressed image) = 8 images per subject.
SFED07 includes 160 images: 20 subjects with 8 images per subject. Each image is recorded in BMP format, is of size 640 by 480, and is in true color.
4. Performance Evaluation
In order to show the difficulty of the database for subtle
facial expression recognition algorithms and to provide an example evaluation protocol, we ran a recognition experiment on the database. To recognize the facial expressions, we extracted facial expression features by applying Active Appearance Model (AAM) [4, 5] fitting to the SFED07 images, as shown in Figure 2.
Table 1: Recognition rates of the SVM-based facial expression recognition.
5. Conclusions
Figure 2. The mean shape of the AAM and an AAM fitting result.
In this paper, we described the SFED07 database, which contains 8 images each of 20 subjects with 4 kinds of subtle expressions. We also evaluated an SVM-based facial expression recognition algorithm on this database. We expect the database to be used to evaluate subtle facial expression recognition algorithms. The
database is now available for research purposes on our web site (
The AAM is a suitable method for extracting the important facial shape features by fitting the model to facial expression images. To classify the extracted facial expression features, we use a Support Vector Machine (SVM) based classifier [6], as shown in Figure 3.
Figure 3. The facial expression classification based on the SVM classifiers.
[1] B. Fasel and J. Luettin, Automatic Facial Expression
Analysis: A Survey, Pattern Recognition, vol. 36, no. 1, pp.
259-275, 2003.
[2] M. Song, H. Wang, J. Bu, C. Chen, and Z. Liu, Subtle
Facial Expression Modeling with Vector Field Decomposition,
Proceedings of IEEE International Conference on Image
Processing, pp. 2102-2104, 2006.
[3] J. R. Cowell and A. Ayesh, Extracting Subtle Facial
Expression for Emotional Analysis, Proceedings of IEEE
International Conference on Systems Man and Cybernetics,
vol. 1, pp. 677-681, 2004.
[4] G. Edwards, C. Taylor, and T. Cootes, Active Appearance
Models, IEEE Transactions on Pattern Analysis and Machine
Intelligence, vol. 23, no. 6, pp. 681-685, 2001.
[5] I. Matthews and S. Baker, Active Appearance Models
Revisited, International Journal of Computer Vision, vol. 60,
no. 2, pp. 135-164, 2004.
[6] V. N. Vapnik, Statistical Learning Theory, Wiley-Interscience Press, 1998.
It consists of 4 facial expression SVM classifiers, namely the neutral, smile, surprise and angry classifiers, and determines the facial expression of the input facial feature as the label of the classifier whose output value is maximal.
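The maximal-output decision can be sketched as follows (the classifier stubs here are placeholders standing in for the four trained SVMs):

```python
import numpy as np

LABELS = ["neutral", "smile", "surprise", "angry"]

def classify(feature, classifiers):
    """Return the label of the per-expression classifier whose output
    value on `feature` is maximal."""
    scores = [clf(feature) for clf in classifiers]
    return LABELS[int(np.argmax(scores))]
```

In practice each element of `classifiers` would be the decision function of one trained SVM evaluated on the AAM feature vector.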
5. Evaluation Results
To evaluate the subtle facial expression recognition performance on SFED07, we used half of the subjects as the training set and the remaining subjects as the test set for 2-fold cross validation. Table 1 shows the confusion matrix of the facial expression recognition when we use the SVM as the facial expression classifier.
Yasunori Takemura and Kazuo Ishii
Department of Brain Science and Engineering, KYUTECH
[email protected], ishii@}
RoboCup is an international joint project to promote Artificial Intelligence (AI), robotics, and related fields. It is an attempt to foster AI and intelligent robotics research by providing standard problems where a wide variety of technologies can be integrated and examined. In RoboCup, soccer is selected as the main research topic, aiming at innovations to be applied to socially significant problems and industries in the future [1].
"Hibikino-Musashi" is a joint RoboCup middle-size league (MSL) soccer team [2, 3]. Members of the team are from three different research and educational organizations located in the Kitakyushu Science and Research Park, Kitakyushu, Japan. The "Musashi" robot is developed based on the concepts of "omni-directionality" and "modularity" [4].
In this paper, we present a color recognition algorithm for our robot vision system, based on a color constancy algorithm using a Self-Organizing Map (SOM). The SOM is an unsupervised learning transformation from higher-dimensional vector data spaces to low-dimensional map spaces, and has become a powerful tool in many areas such as data mining, data analysis, data classification and data visualization [5, 6, 7]. Our robot vision system is based on the YUV and HSV color map spaces. The two color maps have different vector spaces, and some objects are recognized by combining thresholds on both the YUV and HSV values [8]. To realize a robot vision system that is robust against changes in the lighting environment, in our new robot vision algorithm both the recognition of the lighting color environment and the threshold parameters are estimated by using the SOM.
The conventional vision system of the "Musashi" robot consists of an omni-directional mirror and an IEEE 1394 digital camera. The robots recently acquired strong kicking devices that kick high loop shots, so the camera is protected with a plastic cover; an acrylic tube is used for this purpose.
Data transmission from the camera to the laptop PC is in YUV format. We convert the YUV format into HSV format. Upper and lower thresholds for the YUV and HSV formats are used to extract the colors, which are then combined in a logical conjunction of both formats. This method improves the color extraction robustness and accuracy under variable lighting conditions. Presently, we set the thresholds manually [8].
A. Basic input data and output data
In this paper, the SOM is used for recognition of the lighting environment condition. Four basic colors are prepared around the camera. The input vector x is (Y_g, U_g, V_g, Y_r, U_r, V_r, Y_b, U_b, V_b, Y_w, U_w, V_w), where Y, U and V are values of the 8-bit YUV color map and the subscripts denote the four prepared colors. The input data vector is normalized to the range -1.0 to 1.0, and x is then input to the SOM.
The output data consists of three kinds of color thresholds, for orange (ball), white (lines) and green (field), because the RoboCup Middle Size League (MSL) changed the rule this year, replacing the colored goals (blue and yellow) with colorless goals. Orange is recognized with the thresholds U_min, U_max, H_min = 0.0 (constant) and H_max. Green is recognized with Y_min = 0.0 (constant), Y_max, U_min = 0.0 (constant) and U_max. White is recognized with V_min and V_max = 255 (constant). H, S and V denote Hue, Saturation and Value (8 bit). The output vector is therefore (H_max^o, U_min^o, U_max^o, Y_max^g, U_max^g, V_min^w), where the upper index denotes o (orange), g (green) or w (white). All parameters are normalized to the range -1.0 to 1.0.
B. Color constancy algorithm using SOM
In a self-organizing map, the neurons are placed at the nodes of a lattice that is usually two-dimensional. Higher-dimensional maps are also possible but not as common. The neurons become selectively tuned to various input patterns (stimuli) or classes of input patterns in the course of a competitive learning process. The locations of the neurons so tuned (i.e., the winning neurons) become ordered with respect to each other in such a way that a meaningful coordinate system for different input features is created over the lattice [9]. A self-organizing map is therefore characterized by the formation of a topographic map of the input patterns, in which the spatial locations (i.e., coordinates) of the neurons in the lattice are indicative of intrinsic statistical features contained in the input patterns, hence the name "self-organizing map". The SOM algorithm has 4 processes: the evaluative, competitive, cooperative and adaptive processes. In the following, we describe each process.
Table I explains the variables used in the color constancy algorithm using the batch-type SOM. The learning data is defined by

θ_i = [x_i, y_i]. (1)
(a) Evaluative process
In the evaluative process, the distance between each learning datum and each reference vector is calculated:

E_ik = || w_k − x_i ||. (2)
(b) Competitive process
In the competitive process, to find the best match of the input vector x with the reference vectors w, the best matching unit k_i* is defined as the unit with the minimum distance E_ik (eq. (2)):

k_i* = arg min_k E_ik.
(c) Cooperative process
A typical choice of the neighbor function φ_ik that satisfies the usual requirements is the Gaussian function

φ_ik = exp( − d(k, k_i*)^2 / (2σ^2) ), (3)
where the parameter σ is the "effective width" of the topological neighborhood, called the neighbor radius. Its use also makes the SOM algorithm converge more quickly than a rectangular topological neighborhood. Another unique feature of the SOM algorithm is that the size of the topological neighborhood shrinks with time. A popular choice for the dependence of σ on the discrete time t is the exponential decay

σ(t) = σ_min + (σ_0 − σ_min) exp(−t / τ), (4)

where σ_0 and σ_min are the initial and final neighbor radii and τ is a time constant.
(Entries of Table I: Σ_i' φ_i'k: normalization in eq. (5); k_i*: Best Matching Unit (BMU); φ_ik: neighbor function; d(a, b): Euclidean distance between a and b; ψ_ik: learning rate; σ: neighbor radius; τ: time constant.)
In this color constancy algorithm, the learning data θ defined above contain both input and output vectors. In other words, the reference vectors have the same dimensions as the concatenated input and output vectors. Since we use the batch SOM, the reference vectors w are updated by

w_k(t + 1) = Σ_i ψ_ik θ_i. (7)
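One epoch of the batch SOM described above might look as follows (the array shapes and the 2-D lattice coordinates are our assumptions, not taken from the paper):

```python
import numpy as np

def batch_som_epoch(W, data, grid, sigma):
    """One batch-SOM epoch. W: (K, d) reference vectors, data: (I, d)
    learning vectors theta_i, grid: (K, 2) lattice coordinates of units."""
    # (a) evaluative: E_ik = ||theta_i - w_k|| for every datum/unit pair
    E = np.linalg.norm(data[:, None, :] - W[None, :, :], axis=2)
    # (b) competitive: best matching unit k_i* for each datum
    bmu = E.argmin(axis=1)
    # (c) cooperative: Gaussian neighborhood phi_ik, eq. (3)
    d2 = ((grid[bmu][:, None, :] - grid[None, :, :]) ** 2).sum(axis=2)
    phi = np.exp(-d2 / (2.0 * sigma ** 2))
    # (d) adaptive (batch), eqs. (5) and (7): w_k <- sum_i psi_ik theta_i
    psi = phi / phi.sum(axis=0, keepdims=True)
    return psi.T @ data
```

With a small σ the neighborhood collapses to the BMU and each unit moves to the mean of the data it wins, which is the limiting behavior of eq. (7).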
In this paper, we described a new color recognition algorithm with color constancy using the SOM. Experiments with this algorithm will be introduced at the POSTECH-KYUTECH joint workshop.
In the batch SOM, the learning rate of each unit is defined by ψ_ik, which is φ_ik normalized by its summation:

ψ_ik = φ_ik / Σ_i' φ_i'k. (5)

(Entries of Table I: θ_i: learning data; x_i: input vector; y_i: output vector; i: index expressing classes (i = 1, ..., I); w_k: reference vector; k: index expressing units (k = 1, ..., K); E_ik: distance between input vector and reference vector.)
(d) Adaptive process
In Kohonen's (online) SOM, in the adaptive process all unit vectors are adjusted by the update formula

w_k(t + 1) = w_k(t) + ψ_ik (x_i − w_k). (6)
[1] RoboCup homepage.
[2] H. Matsubara, M. Asada, et al., "The RoboCup: The Robot World Cup Initiative", in Proceedings of IJCAI-95.
[3] Y. Takemura, A. A. F. Nassiraei, et al., "HIBIKINO-MUSASHI", RoboCup 2008 Suzhou, CD-ROM Proc.
[4] A. A. F. Nassiraei, Y. Takemura, et al., "Concept of Mechatronics Modular Design for an Autonomous Mobile Soccer Robot", CIRA 2007, Jacksonville, pp. 178-183.
[5] T. Kohonen, "Self-organized formation of topologically correct feature maps", Biol. Cybernetics, vol. 43, pp. 59-69, 1982.
[6] A. Saalbach, G. Heidemann and H. Ritter, "Parametrized SOMs for Object Recognition and Pose Estimation", Proc. of ICANN 2002, pp. 902-907, 2002.
[7] G. A. Barreto, A. F. R. Araujo, et al., "A distribution of a simulator for robots and its application to position estimation by self-organization", Technical Report of IEICE, vol. 100, no. 687, NC2000-147, pp. 149-156.
[8] K. Azeura and I. Godler, "Color Sampling with Omni-Directional Camera in Soccer Robot", Proc. of RSJ 2006, Okayama, 1B25.pdf, 2006 (in Japanese).
[9] T. Kohonen, "The Self-organizing map", Proceedings of the Institute of Electrical and Electronics Engineers, vol. 78.
A Pixel-parallel Anisotropic Diffusion Algorithm
for Subjective Contour Generation
Youngjae Kim and Takashi Morie
Graduate School of Life Science and Systems Engineering, Kyushu Institute of Technology
[email protected]
Introduction: In natural-scene images obtained by image sensors, edge information is often missing due to illumination conditions and poor contrast with the background. Even in such cases, humans can recognize an object by generating imaginary contours, known as subjective contours. Typical figures that induce subjective contours are the Kanizsa figures shown in Fig. 1, where one perceives a white triangle or square in the center region [1].
Regarding subjective contour generation, physiological, psychophysical, and computational models have been proposed. Some physiological experiments revealed that neurons in the early visual cortex (V1 or V2) respond to subjective contours as well as to real contours [2]. On the other hand, in some computational models, the physically-existing points are connected with a spline function to realize a subjective contour [3]. These models are useful, but it is difficult to apply them to VLSI implementation because of their complexity. Thus we use a model that makes subjective contours grow from the physically-existing points.
In order to implement such a model, we have proposed a pixel-parallel directional (anisotropic) diffusion algorithm (DDA) to generate subjective contours, which is suitable for VLSI implementation because it is based on the framework of cellular neural networks (CNNs) and cellular automata (CA) with nearest-neighbor connections [4]. Furthermore, we have proposed a pixel-parallel anisotropic diffusion circuit based on a merged analog-digital circuit approach for subjective contour generation [5].
In this paper, we describe a pixel-parallel anisotropic diffusion algorithm and show numerical simulation results for subjective contour generation. Our algorithm consists of two parts: a directional (anisotropic) diffusion algorithm (DDA) and a ridge-line detection algorithm (RDA). The DDA generates analog ridge-like functions of the diffusion state by propagating the internal states of the processing units (PUs) in a given direction. The RDA generates binary line segments by tracing the ridge lines generated by the DDA.
V(i, j; t) = V1(i, j; t) + V2(i, j; t),
Vk(i, j; t) = Vkx(i, j; t) + Vky(i, j; t),

where V(i, j; t) represents the internal state of a PU at coordinates (i, j) at time t; Vk(i, j; t) (k = 1, 2) represent the two diffusion states propagating in opposite directions; and Vkx(i, j; t) and Vky(i, j; t) are their x- and y-directional components.
As the initial state, U(i, j; 0) is set to '1' at the pixel position of the diffusion source and to '0' at the other pixels. The updating rule is as follows; the number of updating iterations is determined by the diffusion distance.

Vkx(i, j; t + 1) = α Vk(i + ak, j + bk; t) + U(i + ak, j + bk; 0),
Vky(i, j; t + 1) = β Vk(i + ck, j + dk; t) + U(i + ck, j + dk; 0),
for k = 1, 2,
where α and β are non-negative constants which determine the diffusion direction, given by the following equations:

| tan θ | = β / α for θ ≠ 90 deg,
α = 0 for θ = 90 deg,
α + β = p,

so that α = p / (1 + | tan θ |), where p is a constant which affects the amplitude of the diffusion states but hardly affects the diffusion direction. Therefore, p is set to unity here. The indices are determined by the diffusion orientation as shown in Table 1.
Fig. 2 shows the relation between the variables and a
numerical simulation result (for θ=60 deg).
Fig. 1: Kanizsa figures: triangle(a) and square(b).
Directional diffusion algorithm (DDA): The DDA is a CNN-based algorithm that propagates the PUs' analog internal states in a given direction from a given diffusion source. The proposed algorithm is given as follows:
Fig. 2: (a) Relation between the variables and (b) a numerical simulation result.
Table 1: Relation between the diffusion direction θ and the indices (ak, bk, ck, dk) for the DDA, for the cases 0 ≤ θ < 90 deg and 90 ≤ θ < 180 deg.
orientation, directional diffusion is executed using the DDA, and ridge lines are generated using the RDA. If the ridge lines generated from two confronting diffusion sources are connected, the pair of ridge lines is considered a subjective contour.
After the above processing is performed for various orientations, if a subjective region is formed by the set of extracted subjective contours, perception of the Kanizsa figure is achieved. The subjective region can easily be extracted by ordinary labeling processing. Fig. 4 shows simulation results of subjective contour generation; using the proposed algorithm, the Kanizsa triangle and square have successfully been extracted.
Table 2: Relation between the diffusion orientation θ and the indices (mk, nk) for the RDA, for the cases 0 < θ < 90 deg and 90 < θ < 180 deg.
Ridge-line detection algorithm (RDA): The RDA is an algorithm for obtaining binary line-segment information U(i, j; t) along the diffusion directions from V(i, j) ≡ V(i, j; t) after executing the DDA (Fig. 2(b)). This is a CA-based algorithm, given as follows. At pixel (i, j), if all of the following conditions are satisfied, then U(i, j; t + 1) = 1:

U(i, j; t) = 0,
∃ (i', j') ∈ N_ij such that U(i', j'; t) = 1,
V(i, j) > V(i + m1, j + n1),
V(i, j) > V(i + m2, j + n2),

where N_ij is the set of neighboring pixels of (i, j), and the index sets (m1, n1) and (m2, n2) are given in Table 2. This updating is repeated for the given number of iterations. Fig. 3 shows an example of the result of ridge-line detection.
Fig. 3: Ridge-line extraction result.
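One CA update of the RDA rule might be sketched as follows (a straightforward, unoptimized loop; border pixels are skipped for simplicity, and the Table 2 offsets are assumed inputs):

```python
import numpy as np

def rda_step(U, V, off1, off2):
    """One CA update of the RDA: pixel (i, j) turns on iff it is off,
    has an on neighbor, and V(i, j) exceeds V at the two offsets."""
    Unew = U.copy()
    H, Wd = U.shape
    (m1, n1), (m2, n2) = off1, off2
    for i in range(1, H - 1):
        for j in range(1, Wd - 1):
            if U[i, j] == 1:
                continue
            if U[i-1:i+2, j-1:j+2].sum() == 0:   # no on pixel in N_ij
                continue
            if V[i, j] > V[i + m1, j + n1] and V[i, j] > V[i + m2, j + n2]:
                Unew[i, j] = 1
    return Unew
```

Repeated application grows the binary line segment along the ridge of V, one pixel per iteration, from the seed pixels.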
Generation of subjective contours: The algorithm described above has been applied to subjective contour generation in the Kanizsa figures shown in Fig. 1. In order to obtain the diffusion source positions from the given image, orientation-selective Gabor filtering is performed. Using the obtained diffusion source positions and
Fig. 4: Simulation results of subjective contour generation: (i) input image (100×100 pixels), (ii) diffusion sources, (iii) diffusion results by the DDA, (iv) ridge-line detection results, (v) subjective contour extraction, (vi) subjective region extraction.
Conclusion: We proposed a pixel-parallel anisotropic diffusion algorithm for subjective contour generation. The DDA generates analog ridge-like functions from the given diffusion source positions in the given direction, and the RDA traces the ridge lines. Therefore, even if the pair of diffusion ridge lines from confronting diffusion sources do not meet exactly, the connected line can be detected by the RDA. The proposed algorithms can be implemented in a pixel-parallel manner, and since they use only the neighboring pixel states, they are suitable for VLSI implementation.
[1] G. Kanizsa, Scientific American, 234, 48–52, 1976.
[2] Tai Sing Lee, Neuron, 33(5), 667–668, 2002.
[3] H. Yasuda, et al., IEICE Trans., J73-D-II(6), 906–913,
[4] Y. Kim and T. Morie, ISCAS2005, 4237–4240, 2005.
[5] Y. Kim and T. Morie, J. Signal Processing, 10, 259–
262, 2006.
Yuya Nishida and Masahiro Nagamatsu
Department of Brain Science and Engineering, Kyushu Institute of Technology
Email: [email protected]
Recently, the control and analysis of underactuated manipulators (UAMs), which are nonholonomic systems, have attracted the attention of researchers. A UAM has passive joints with no motors mounted, in contrast to a generic manipulator with a motor mounted on each joint. Overall weight saving and energy saving are expected by controlling a manipulator that has fewer actuators than its degrees of freedom. In a UAM, torque is transmitted to the passive joints by inertia, because there is no driving force at a passive joint. Thus, the control of a UAM is a difficult problem.
This research studies a control system that drives the state of a UAM from a random initial state to a target state. As control methods that can stop the movement of the manipulator when its state reaches the target state, we can consider trajectory planning and reinforcement learning [1][2]. Trajectory planning is a method that searches for a trajectory optimizing an objective function under the constraint of stopping at the target state, but it cannot handle changes of state. At the same time, reinforcement learning has a self-directive ability and handles unexpected changes of state, but it needs a relatively long time to learn an optimal trajectory by trial and error.
In this paper, we explain control method that combines
trajectory planning and reinforcement learning, and we
report trajectory planning in this control method.
We use the actor-critic method as the reinforcement learning algorithm. The actor-critic method learns by updating a value function (critic) through the interaction between the agent (actor) and the environment. The actor cannot learn the optimal action while the learning of the critic has not advanced, because the actor, which decides the current action, is evaluated and altered by the critic. Conversely, the critic cannot learn the optimal value function while the actor has never obtained a reward from the environment, because the critic is updated by the TD error. It is difficult for the actor to drive the state of a UAM to the target state and obtain a reward from the environment; therefore, controlling the attitude of a UAM by the actor-critic method alone needs a lot of time. We thus propose the following control method for a UAM.
Fig.1 Overview of the proposed control method
1. We obtain an optimal trajectory from a random initial state using trajectory planning.
2. We increase the value of the obtained trajectory in the value function of the actor-critic method.
3. We learn using the actor-critic method with this value function.
We expect that the convergence rate of learning is increased by giving the trajectory obtained by trajectory planning to the initial value function as above.
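Step 2 above, raising the value of the planned trajectory in the critic, can be sketched as follows. This is a minimal illustration under assumptions (a tabular value function, discretized states, a discounted bonus), not the authors' implementation.

```python
import numpy as np

def seed_critic(V, planned_states, bonus=1.0, discount=0.95):
    """Raise the critic's value estimates along a planned trajectory.

    V              -- tabular value function, dict mapping state -> value
    planned_states -- list of (discretized) states from trajectory planning
    States near the end of the trajectory (close to the target) get the
    largest boost, discounted backwards in time.
    """
    n = len(planned_states)
    for t, s in enumerate(planned_states):
        # discount ** (n - 1 - t): full bonus at the final (target) state
        V[s] = V.get(s, 0.0) + bonus * discount ** (n - 1 - t)
    return V

# usage: a 3-step planned trajectory ending at the target state "s2"
V = seed_critic({}, ["s0", "s1", "s2"])
```

Actor-critic learning then starts from this biased value function instead of a flat one, which is the intended source of the faster convergence.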
We consider neural networks concatenated in a time series, as shown in Fig. 2. Each neural network is a learned forward model of the UAM: it outputs the next state vector s_{t+1} given the current state vector s_t and the current input torque τ_t. With n concatenated networks, we obtain the state vector s_n after n steps and the following sets:
T = {τ_0, τ_1, ..., τ_{n-1}, τ_n}    (1)
S = {s_0, s_1, ..., s_{n-1}, s_n}    (2)
Equation (1) is the set of input torques T, and Eq. (2) is the set of states S. We obtain the set of state vectors S by feeding the set of input torques T to the concatenated neural networks. If the final state vector s_n output by the n concatenated networks equals the target state vector s*, we obtain the set of input torques T* = {τ*_0, τ*_1, ..., τ*_{n-1}, τ*_n} that drives the final state vector s_n to the target state vector s*, together with the set of state vectors S* = {s*_0, s*_1, ..., s*_{n-1}, s*_n} between the initial state vector s_0 and the target state s*.
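The rollout through the concatenated forward models can be sketched as follows. The forward model here is a stand-in: a damped linear map replaces the trained neural network of the UAM.

```python
import numpy as np

def rollout(f, s0, torques):
    """Apply the learned forward model f repeatedly: s_{t+1} = f(s_t, tau_t).

    Returns the full state sequence S = [s_0, ..., s_n], mirroring Eq. (2).
    """
    states = [s0]
    for tau in torques:
        states.append(f(states[-1], tau))
    return states

# placeholder "forward model": a damped linear response (an assumption of this
# sketch, not the trained neural network from the paper)
A = np.array([[1.0, 0.1], [0.0, 0.9]])
B = np.array([0.0, 0.1])

def f(s, tau):
    return A @ s + B * tau

S = rollout(f, np.array([0.0, 0.0]), [1.0, 1.0, 0.0])
# S has n + 1 = 4 states
```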
Fig.2 Concatenated Neural Networks in time series
In this research, trajectory planning means obtaining the set of state vectors S*, which is the trajectory to be given to the value function. This problem can be converted into the following optimization problem.
The constraint is
h = 0,    (3)
h = s* − s_n.    (4)
E(τ, w_LPPH) = w_LPPH^T h    (5)
Equations (3) and (4) are the constraint conditions of the optimization problem. We apply LPPH (Lagrange Programming neural network with Polarized High-order connections) to this optimization problem; LPPH is an algorithm developed by Nagamatsu's laboratory at KIT [3].
Here w_LPPH is the weight in LPPH, and Eq. (5) is the evaluation function. We need the set of input torques T that minimizes the evaluation function if we want to obtain the set of state vectors S* between the initial state and the target state. The updating equations for the evaluation function are given as follows:
dτ_t/dt = −η ∂E(τ, w_LPPH)/∂τ_t,
dw_LPPH/dt = ∂E(τ, w_LPPH)/∂w_LPPH − α w_LPPH,
where η is the learning rate and α is the cutting rate; the cutting rate is used so that the weights do not increase infinitely.
We planned a trajectory for ten seconds using the updating equations above for the UAM. A term for the total input torque was newly added to the evaluation function Eq. (5), since we want the trajectory that minimizes the total input torque.
With s = [θ1 θ2 θ̇1 θ̇2]^T, we set the initial state s_0 = [0 0 0 0]^T and the target state s* = [π 0 0 0]^T. The result of the trajectory planning is shown in Fig. 3. We obtained a trajectory that brings the state of the UAM near the target state s*; however, the final state included a steady-state error, and the velocity at the final state was not zero. Thus, we consider that the state changes after ten seconds.
Fig.3 Result of simulation (trajectory in the X-Y plane from the initial state to the final state)
We explained a control method that combines trajectory planning and reinforcement learning, and proposed the trajectory planning part of this method. We obtained a proper trajectory between the initial state and the target state. In future work, we will readjust the constraints or the evaluation function, and will run a control simulation.
This research was supported by a Grant-in-Aid for the 21st Century COE program.
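The Lagrangian update dynamics above can be sketched on a toy problem. This is an illustrative sketch only: a single-integrator "plant" replaces the neural-network forward model of the UAM, and the rates and step size are arbitrary choices.

```python
import numpy as np

# Toy plant (assumption): a single integrator, final state = sum of torques.
# The real UAM forward model would replace `final_state`.
def final_state(tau):
    return tau.sum()

s_star = 1.0          # target state
tau = np.zeros(5)     # torque sequence to optimize
w = 0.0               # LPPH-style Lagrange weight
eta, alpha, dt = 0.5, 1.0, 0.05

for _ in range(2000):
    h = s_star - final_state(tau)       # constraint residual, as in Eq. (4)
    # E = w * h; gradient descent on tau, gradient ascent on w with decay
    grad_tau = -w * np.ones_like(tau)   # dE/dtau_t = w * dh/dtau_t = -w
    tau -= eta * grad_tau * dt
    w += (h - alpha * w) * dt           # dw/dt = dE/dw - alpha * w

# after convergence the constraint h = 0 holds: sum(tau) is near s_star
```

The decay term `alpha * w` plays the role of the cutting rate: without it the Lagrange weight can grow without bound and the dynamics oscillate.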
[1] N. Shiroma, K. M. Lynch, H. Arai, and K. Tanie, "Motion Planning for a Three-Axis Planar Manipulator with a Passive Revolute Joint", Transactions of the Japan Society of Mechanical Engineers (C), Vol. 66, No. 642, pp. 545-552
[2] J. Yoshimoto, S. Ishii, and M. Sato, "Application of Reinforcement Learning Based on On-Line EM Algorithm to Balancing of Acrobot", The Institute of Electrical Engineers of Japan, Vol. J83-D-II, No. 3
[3] K. Zhang and M. Nagamatu, "Solving Satisfiability Problem by Parallel Execution of Neural Networks with Biases", Proceedings of the 15th International Conference on Artificial Neural Networks,
Daisuke Atuti, Kazuki Nakada, Takashi Morie
Department of Brain Science and Engineering, Kyushu Institute of Technology
E-mail: [email protected]
Nonlinear phenomena in ensembles of oscillatory elements in physical, chemical, and biological systems can be
modeled as coupled phase oscillators with periodic phase
variables and periodic nonlinear interaction [1]. For instance, the oscillatory neural network is a class of coupled
phase oscillators, in which each oscillator is regarded as
a spiking neuron in a periodic firing state. The dynamical
and statistical properties of the oscillatory neural networks
have been investigated in detail as a model of associative
memory [2, 3].
In previous work, we proposed a CMOS pulse-modulation circuit technique for coupled phase oscillators with arbitrary nonlinear interactions, in which states and signals are represented by pulse-width modulation (PWM) and pulse-phase modulation (PPM) schemes [4]. In this pulse-modulation circuit technique, analog signal processing can be achieved by binary pulses in the time domain, and therefore we can construct an analog dynamical system compatible with conventional digital systems.
In this study, we propose a phase oscillator circuit [5] using the pulse-modulation approach for implementing the PLL neural networks proposed by Hoppensteadt and Izhikevich [6]. We designed an element circuit and confirmed the fundamental operation of two coupled element circuits through SPICE simulations.
The PLL neural network is an oscillatory neural network that can generate arbitrary phase-locked patterns of output waveforms, equivalent to embedded memories. In a specific case, the dynamics of the PLL neural network is written as follows:
φ̇_i = ω − sin(φ_i) Σ_j s_ij cos(φ_j),    (1)
where φ_i denotes the phase variable of the i-th oscillator, ω the angular frequency of the oscillator, and s_ij the coupling strength between the PLLs. In order to obtain multiplication terms suitable for circuit implementation, we rewrite (1) as follows:
φ̇_i = ω + Σ_j (s_ij / 2) [sin(φ_j − φ_i) − sin(φ_j + φ_i)].    (2)
By using the pulse-modulation approach, we designed an element circuit for PLL neural networks. The block diagram of the element circuit is shown in Fig. 1, and the circuit timing diagram is shown in Fig. 2.
In our circuit implementation, the continuous-time dynamics is discretized by the Euler method. The phase variables φ_{i,j} are represented by PWM signals. A PWM signal is transformed into a PPM signal by a PWM-PPM converter (PPC). The PPM signal switches a pair of current sources, which perform a positive or negative update according to the nonlinear waveform Vnon_p or Vnon_n. After an update operation, φ_{i,j} are stored in capacitor Cφ as the terminal voltage Vφ of each element circuit. Vφ is transformed into a PWM signal again by a voltage-pulse converter (VPC). By repeating this update operation, the discretized dynamics is realized.
We simulated two coupled element circuits with HSPICE using the TSMC 0.25 µm technology. We set the coupling strength s_ij = 1 and -1, and changed the angular frequency ω from 10 to 2.5 kHz. The waveforms of the voltages Vφi and Vφj are shown in Fig. 3. The voltages synchronized in-phase or anti-phase depending on the coupling strength s_ij.
We designed an element circuit for implementing PLL neural networks using the pulse-modulation approach and verified the fundamental operation of two coupled element circuits with HSPICE. The results demonstrated that the circuits synchronize in-phase or anti-phase depending on the coupling strength.
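The in-phase and anti-phase locking predicted by Eq. (1) can be checked with a small Euler-discretized simulation, mirroring the circuit's discretized dynamics. This is a numerical illustration only; the angular frequency, time step, and initial phases are arbitrary choices.

```python
import numpy as np

def simulate(s, omega=1.0, dt=0.01, steps=5000, phi0=(0.0, 2.0)):
    """Euler-discretized dynamics of two PLL oscillators, Eq. (1):
       dphi_i/dt = omega - sin(phi_i) * s_ij * cos(phi_j)."""
    p1, p2 = phi0
    for _ in range(steps):
        d1 = omega - np.sin(p1) * s * np.cos(p2)
        d2 = omega - np.sin(p2) * s * np.cos(p1)
        p1 += d1 * dt
        p2 += d2 * dt
    # phase difference wrapped to [-pi, pi)
    return (p1 - p2 + np.pi) % (2 * np.pi) - np.pi

# s = +1 -> in-phase locking (phase difference -> 0)
# s = -1 -> anti-phase locking (phase difference -> +/- pi)
```

The phase difference ψ = φ_1 − φ_2 obeys ψ̇ = −s sin ψ, so ψ converges to 0 for s = 1 and to ±π for s = −1, matching the HSPICE results above.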
[1] Y. Kuramoto, Chemical Oscillations, Waves, and Turbulence, Springer, Berlin, 1984.
[2] T. Aoyagi and K. Kitano, “Retrieval dynamics in
oscillator neural networks,” Neural Computation,
vol. 10, pp. 1527–1546, 1998.
[3] T. Aonishi, K. Kurata, and M. Okada, “A statistical mechanics of an oscillator associative memory
Figure 1: Circuit configuration of two-coupled element circuits for PLL neural networks.
Figure 2: Timing diagram of the two coupled element circuits.
with scattered natural frequencies,” Phys. Rev. Lett.,
vol. 82, pp. 2800–2803, 1999.
[4] D. Atuti, T. Morie, T. Nakano, and K. Nakada,
“An element circuit for nonlinear dynamical systems using phase variables,” Abst. of The 3rd
Int. Conf. on Brain-Inspired Information Technology
(BrainIT2006), p. 66, Sept. 2006.
[5] D. Atuti, K. Nakada, and T. Morie, “CMOS pulsemodulation circuit implementation of phase-locked
Figure 3: HSPICE circuit simulation results.
loop neural networks,” IEEE Int. Symp. on Circuits
and Systems (ISCAS2008), pp. 2174–2177, May 2008.
[6] F. C. Hoppensteadt and E. M. Izhikevich, “Pattern recognition via synchronization in phase-locked
loop neural networks,” IEEE Trans. Neural Networks,
vol. 11, no. 3, pp. 734–738, 2000.
Underwater Sound Source Localization System using Particle Filter
Yu Eriguchi, Kazuo Ishii
Department of Brain Science and Engineering, LSSE, Kyushu Institute of Technology
[email protected], [email protected]
As tools for ocean surveys and searches for natural energy resources, underwater robots are expected to carry out missions that are too dangerous for humans. One of the key issues for underwater robots is the self-localization problem in the ocean, since self-localization is required to navigate them. There are various methods of self-localization, such as dead reckoning, GPS, and sound source localization. However, dead reckoning easily accumulates errors, and GPS is unusable underwater because of radio attenuation. Although Underwater Sound source Localization Systems (USLS) are often used for communication in the ocean because sound propagates well underwater, they suffer from problems caused by multipath. Accordingly, a robust USLS is desired for underwater robots.
In this paper, we describe the basic method and design of a USLS, and the results of experiments and simulations using a USLS with a Particle Filter introduced to predict locations from a series of observed data.
1) Calculation of Sound Source Direction
We obtain the sound source direction with a pair of hydrophones using the Super Short Base Line (SSBL) method [1], as shown in Fig. 1. The direction θ is given by Eq. (1), where f is the frequency of the sound wave, c is the velocity of acoustic waves underwater, d is the baseline length between the two hydrophones, and δt = δφ/2πf is the time lag:
θ = cos⁻¹(c·δt / d),  d ≤ λ/2 (= c/2f).    (1)
The Cross Correlation Function (CCF) is used for the time-lag estimation according to Eq. (2), where V1 and V2 are the acoustic voltages of the two hydrophones. The delay time τ is obtained so that R(k) becomes maximum:
R(k) = Σ_{i=0}^{N−1} V1(i) V2(i + k),  τ = argmax_k R(k).    (2)
2) Particle Filter
A Particle Filter (PF) [2]-[5] is introduced into the USLS to predict the direction from the series of measured data. The system model and the observation model of the PF are set as follows:
x_t = x_{t−1} + u_{t−1} + ν_t,  ν_t ~ N(μ_s, σ_s²),    (3)
y_t = h(x_t) + ε_t,  ε_t ~ N(μ_o, σ_o²),    (4)
where x_t is the sound source direction at time t and y_t is the observed value. In Eq. (3), u_{t−1} is the rotation angle of the underwater robot, and ν_t is the process noise, assumed to be normal random numbers with variance σ_s². In Eq. (4), the observation noise ε_t is assumed to follow a normal distribution with variance σ_o², and the observation function h(x_t) is decided from the results of the experimental observation. The weighting of particles is performed by p(ε_t): each particle corresponds to a target direction with estimated direction x̃_t^(i) and weight w_t^(i) given by
w_t^(i) = f(ε_t^(i)) = (1 / √(2π) σ_o) exp(−(ε_t^(i) − μ_o)² / (2σ_o²)).    (5)
This equation is the probability density function of the normal distribution with mean μ_o and variance σ_o². The PF is calculated recursively by the Sampling Importance Resampling (SIR) method [2][3].
The system architecture of the USLS is illustrated in Fig. 2. Each hydrophone (Reson TC4013) is connected to a small pressure-tight hull which contains an amplifier, a filter, and batteries. The small hulls are connected to a large pressure hull which contains the signal processing circuit board using a dsPIC. The signals that travel through the circuits are acquired by a fast 10-bit A/D converter and detected by a threshold value. The target direction is transmitted to a PC via RS232 serial communication.
Fig. 1 Calculation of the sound source direction
Fig.2 System architecture of USLS
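The SIR particle filter of Eqs. (3)-(5) can be sketched as follows. This is an illustrative sketch: the noise variances, the particle count, and the identity observation model h(x) = x are assumptions (the paper's h was determined experimentally).

```python
import numpy as np

rng = np.random.default_rng(0)

def sir_step(particles, u_prev, y_obs, h, var_s=0.01, var_o=0.01):
    """One Sampling Importance Resampling step for Eqs. (3)-(5).

    particles    -- direction hypotheses x_{t-1}^(i)
    u_prev       -- rotation angle of the robot (control input u_{t-1})
    y_obs        -- observed value y_t
    h            -- observation function h(x)
    var_s, var_o -- process / observation noise variances (illustrative values)
    """
    # prediction by the system model, Eq. (3)
    particles = particles + u_prev + rng.normal(0.0, np.sqrt(var_s), particles.size)
    # weighting by the Gaussian likelihood, Eq. (5) with mu_o = 0
    eps = y_obs - h(particles)
    w = np.exp(-eps**2 / (2 * var_o)) / np.sqrt(2 * np.pi * var_o)
    w /= w.sum()
    # resampling (SIR)
    return particles[rng.choice(particles.size, size=particles.size, p=w)]

# toy run: identity observation model, true direction pi/3, robot not rotating;
# starting particles uniformly distributed over [0, pi] as in the paper
true_dir = np.pi / 3
parts = rng.uniform(0.0, np.pi, 200)
for _ in range(20):
    y = true_dir + rng.normal(0.0, np.sqrt(0.01))
    parts = sir_step(parts, 0.0, y, h=lambda x: x)
estimate = parts.mean()
```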
1) Observations of Sound Source Direction
The developed hydrophone modules and a pinger were set up as shown in Fig. 3. The experimental results of the USLS at each observation point are shown in Fig. 4, where the observed directions of the pinger are plotted along the ordinate and the observation numbers along the abscissa. The observed directions at Points 2-4 coincided with the theoretical directions π/3, π/2, and 2π/3 [rad]. On the other hand, at Points 1 and 5, the observed directions differed from the theoretical directions; the errors are attributed to multipath from the water surface, the walls, and the bottom.
Table 1 Parameters for the PF simulations (particle number, system model, and observation model; the values are given with Eq. (6))
In this paper, we discussed the development of a basic Underwater Sound source Localization System designed for mounting on underwater robots. According to the experimental and simulation results, the detection ability of the developed modules lies within a region of ± π/3 [rad] from the forward direction, and estimation systems robust to noise are technically feasible.
2) Estimating Simulation using Particle Filter
Figures 5 and 6 show the simulated results of the USLS using the PF. The underwater sound source directions were simulated with the parameters shown in Table 1 and the arrangement shown in Fig. 3, where the average values of the likelihood were defined by Eq. (6). Eq. (6) was derived from the histogram of the direction errors between the observed directions and the theoretical directions in Fig. 4. In every simulation, the starting particles {x̃_0^(i)}_{i=1}^{N} were distributed between 0 [rad] and π [rad] by a uniform distribution. In the figures, the directions of the pinger are plotted along the ordinate and the observation number along the abscissa.
In Fig. 5 and Fig. 6, the pinger was fixed at each direction point, and the particle data appear only in Fig. 5. As a result, the estimated values converged on the theoretical values using the PF even though there were noises such as multipath.
[1] The Marine Acoustics Society of Japan, "Fundamentals and Applications of Marine Acoustics", 2004
[2] R. Karlsson, F. Gustafsson, "Particle Filter for Underwater Terrain Navigation", Statistical Signal Processing Workshop, 2003
[3] N. Gordon, D. Salmond, A. Smith, "Novel approach to nonlinear/non-Gaussian Bayesian state estimation",
[4] N. Ikoma, "Tutorial: Particle Filters", PF Study Group and Allied HIF Council, Lecture Document, KIT, Jan. 26, 2007, Japan
[5] T. Maki, H. Kondo, T. Ura, T. Sakamaki, "Observation of Breakwater Caissons by the AUV "Tri-Dog 1" - Dives by Autonomous Navigation based on Particle Filter -", Proc. of the 2005 JSME Conference on Robotics and Mechatronics, Kobe, Japan, June 9-11
(μ_ij) =
( 0    2.157   0.622 )
( 0    0.456     −   )
( 0      −       −   )    (6)
( 0      −       −   )
( 0   −2.157  −0.622 )
Table 1 values: particle number N = 50; number of observations T = 10 or 16; system model u_{t−1} = 0.0, σ_s² = 0.1; observation model μ_ij ~ Eq. (6), σ_o² = 0.1.
Fig.3 Experimental arrangements of a pinger and USLS
Fig.4 Sound source observation (5 points)
Fig.5 Simulated result of USLS at Point 1 (π/6 [rad]); abscissa: number of the observation
Fig.6 Simulated result of USLS at Point 5 (5π/6 [rad]); abscissa: number of the observation
Tomokazu HIRATSUKA∗ , Keiichi HORIO† and Takeshi YAMAKAWA‡
Department of Brain Science and Engineering, Kyushu Institute of Technology
E-mail: ∗ [email protected],
[email protected], ‡ [email protected]
I. Introduction
Various oral mucosal diseases, including carcinoma (cancer), often appear inside the mouth. Squamous cell carcinoma, which accounts for 2% of the cancer population, neighbors various maxillofacial organs; thus significant functional and aesthetic disabilities often occur at these organs.
One of the features of squamous cell carcinoma is that the mucosa surface has a concavo-convex shape, as shown in Fig. 1. Therefore, affected parts are diagnosed by measuring the roughness of the mucosa surface to determine whether the affected part is squamous cell carcinoma or not.
In this study, we present a new imaging device with circulating light sources (CLS) for surface roughness measurement. The features of surface roughness are extracted by a Wavelet Multi-Resolution Analysis (MRA) from the images produced by the proposed device and are categorized by a Self-Organizing Map (SOM). The roughness of unknown surfaces can be estimated by the SOM after categorization.
Fig. 1. Squamous cell carcinoma case.
II. Circulating Light Sources
The structure of the circulating light sources (CLS) is shown in Fig. 2. Multiple light sources aligned in a circle illuminate the object surface sequentially and produce shadow images along the concavo-convex shapes, as shown in Fig. 3. The shadow images produced by the CLS therefore contain concavo-convex information: if the shadow area is large, the degree of surface roughness is large, and if the shadow area is small, the degree of surface roughness is small. The features of surface roughness are extracted from these shadow images.
III. Foreign Pattern Extraction and Elimination
Blood vessel patterns also sometimes appear in the oral images produced by the CLS. These blood vessel patterns (foreign patterns) affect the surface roughness measurement based on the shadow images (original texture) produced by the CLS, so the foreign patterns should be eliminated from the original texture.
Figure 4 shows the framework of the foreign pattern extraction. The procedure for extracting the foreign pattern area is as follows:
1. Binarize the eight shadow images ('1': black, '0': white).
2. Focus on a pixel (Pn_{i,j}), where i and j are the pixel location, and sum its value over the eight images: S_{i,j} = Σ_{n=1}^{8} Pn_{i,j}.
3. Decide the pixel value of the foreign pattern image (F_{i,j}): F_{i,j} = 1 where the pixel is black in all eight images (foreign pattern), and F_{i,j} = 0 otherwise.
Finally, the brightness of the foreign pattern area is corrected, and the pattern is eliminated. Figure 5 shows an example of the foreign pattern extraction.
IV. Feature Extraction and Data Categorization
The Wavelet Multi-Resolution Analysis (MRA) [1] resolves the original signal into a high-frequency component and a low-frequency component; the resolved low-frequency component is then resolved again into high- and low-frequency components by the MRA. The features of surface roughness are extracted by the MRA as wavelet coefficients. Figure 6 shows an example of the MRA.
The Self-Organizing Map (SOM) [2] is biologically motivated by the ability of the brain to map the outside world onto the cortex, where nearby stimuli are coded on nearby cortical areas; the SOM algorithm is a simplified model of such a mapping process. The structure of the SOM is shown in Fig. 7. The wavelet coefficients of various surface roughnesses extracted by the MRA are categorized by the SOM, and the roughness of unknown surfaces can be estimated by the SOM after categorization.
This work was partially supported by a 21st Century COE program granted to Kyushu Institute of Technology by MEXT of Japan.
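A minimal NumPy sketch of the three-step foreign-pattern extraction (the binarization threshold and the toy images are assumptions of this sketch, not values from the paper):

```python
import numpy as np

def foreign_pattern_mask(images, threshold=128):
    """Extract the foreign-pattern (blood vessel) mask from eight CLS images.

    images -- list of eight grayscale images (2-D uint8 arrays)
    1. binarize each image ('1': black, i.e. below the threshold)
    2. sum the binary values per pixel: S_{i,j}
    3. F_{i,j} = 1 where the pixel is black in all eight images
    """
    binary = [(img < threshold).astype(np.uint8) for img in images]
    s = np.sum(binary, axis=0)                  # S_{i,j} in [0, 8]
    return (s == len(images)).astype(np.uint8)  # F_{i,j}

# toy example: a "vessel" pixel dark in all images, a "shadow" dark in one
imgs = [np.full((2, 2), 200, dtype=np.uint8) for _ in range(8)]
for img in imgs:
    img[0, 0] = 10          # vessel: dark in every image
imgs[3][1, 1] = 10          # shadow: dark in only one image
F = foreign_pattern_mask(imgs)
# F[0, 0] == 1 (foreign pattern), F[1, 1] == 0 (moving shadow, kept)
```

A shadow moves with the light source and so is dark in only some of the eight images, which is why the all-images condition isolates the vessels.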
[1] Charles K. Chui, An Introduction to Wavelets, Academic Press, 1992.
[2] T. Kohonen, Self-Organizing Maps, Springer-Verlag, 1995.
Fig. 5. An example of the foreign patterns extraction. (a) the shadow image
with foreign pattern, (b) extracted foreign pattern area, (c) eliminated result.
Fig. 2. The structure of the Circulating Light Sources (CLS).
Fig. 6. An example of the Multi-Resolution Analysis (MRA).
Fig. 3. An example of the shadow images produced by CLS. This object is a
sandpaper No.P16. The image (a)∼(h) are illuminated by L1∼L8, respectively.
Fig. 7. The structure of the Self-Organizing Map (SOM): input vector, input layer, weight vectors, and competitive layer.
Fig. 4. The framework of the foreign patterns extraction.
Atushi Kanda, Masanori Sato, and Kazuo Ishii
Department of Brain Science and Engineering, Kyushu Institute of Technology, Japan
[email protected]
In previous research, we developed a 6-wheeled mobile robot, "Zaurus", employing a passive linkage mechanism, to evaluate maneuverability in rough terrain environments. Figures 1 and 2 show the overview and the degrees of freedom of Zaurus, respectively. A neural network controller for velocity control was introduced into the robot, and controllers for rough terrain using a neural network showed better performance than a well-tuned PID controller [1]. To further improve maneuverability, a function that switches to the controller suited to the current environment is also indispensable [2].
In this paper, we propose an environment recognition system which selects the controller adjusted for each environment. The input data to the system are the linkage angles, which are fed to a Self-Organizing Map (SOM), the basic information processing engine of the environment recognition system; the SOM is compared with other clustering methods, PCA and k-means.
The environment recognition system needs a function to classify the environment using the linkage angle data and the posture of the robot to estimate the current terrain. Figure 3 shows the concept of the environment recognition system. The system recognizes various environments using the data of the passive link joints: the angles of the passive linkage mechanism and the attitude of the robot can express the current environment, that is, the robot itself can act as an environment recognition sensor.
In this paper, we compare the classification performance of three multiple classification analyses: Principal Component Analysis (PCA), a basic multiple classification analysis; the k-means method, a basic nonhierarchical cluster analysis; and the SOM [3].
1) Basic Environment Data
To realize a better recognition capability, previous sets of state variables are also included for classification; therefore, the input data to the system become high dimensional. We define the basic environment data for each bump height using the joint and attitude angle data. To make the basic data set, we recorded several time series of climbing over and down bumps: the front link angle (θf), the side link angle (θs), and the pitch attitude angle (θp). We chose four one-step bump heights, 0.0 [m], 0.06 [m], 0.12 [m], and 0.18 [m], as the basic environments, and a PID controller was employed for the data sampling.
2) Principal component analysis: PCA
Principal Component Analysis (PCA) is a feature extraction algorithm that compresses data with little loss of information. The advantages of PCA are the possibility of compressing the data and extracting features, and the extracted components can be evaluated independently; the method is used to detect characteristic points statistically.
Figure 4 shows the results of the analysis using PCA; the horizontal axis shows the first principal component. In Fig. 4, the climbing-up and climbing-down data are spread over a wide area of the graph. Therefore, PCA using principal components 1 and 2 is not suitable as the environment recognition system in this case.
3) k-means method
Fig.1 Overview of Zaurus
Fig.2 DOF of Zaurus
Fig.3 The concept of the proposed system
(a) 0.18 [m] (b) all heights
Fig. 4 The result of PCA
(a) cluster number 9 (b) cluster number 27
Fig. 5 The result of k-means
Among multiple classification analyses, the k-means method is one of the most popular nonhierarchical analyses. The number of clusters, k, is assumed in advance, and each input datum is classified into the most similar cluster.
Figure 5 shows the results of the analysis using k-means with cluster numbers 9 and 27. The horizontal axis shows the sampling step, and the vertical axis shows the cluster number. Cluster number 9 assumes that the robot has nine situations; cluster number 27 assumes nine situations for each of the bump heights.
The data are classified according to the attitude of the robot climbing up and down. However, the height of the bumps is not clustered, because the results for cluster numbers 9 and 27 are almost the same.
4) Self-Organizing Map
The Self-Organizing Map proposed by T. Kohonen is one of the well-known methods for classifying data while preserving a topological feature map. The SOM is trained by unsupervised learning to produce a low-dimensional representation of the training data.
Figure 6 shows the obtained feature map; the color of each unit indicates the distance between the reference vectors of neighboring units. White means the units are far apart, and black means that similar units are located close together.
As shown in Fig. 6(a), the map is classified based on the attitude of the robot, and the movements of climbing up and down are arranged on opposite sides. The feature map obtained by the SOM thus includes the dynamics of the robot.
In Fig. 6(b), we evaluate the interpolation capability using an unlearned experimental result (0.10 [m] height). The trajectory for 0.10 [m] is similar to those for 0.06 [m] and 0.12 [m], and the unlearned data are assigned in the middle of those lines. The areas where the trajectories overlap are the conditions where the differences are not clear.
As a result, the SOM showed the best performance in terms of bump height and posture classification, so the SOM is introduced as the basic processing engine.
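The SOM training used for the feature map can be sketched as follows (a minimal NumPy sketch; the map size, learning rate, and Gaussian neighborhood schedule are illustrative choices, not the paper's settings):

```python
import numpy as np

def train_som(data, rows=5, cols=5, epochs=200, lr0=0.5, sigma0=2.0, seed=0):
    """Train a small SOM on joint-angle vectors and return the weight grid."""
    rng = np.random.default_rng(seed)
    dim = data.shape[1]
    w = rng.uniform(data.min(), data.max(), (rows, cols, dim))
    grid = np.dstack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"))
    for t in range(epochs):
        lr = lr0 * (1 - t / epochs)              # decaying learning rate
        sigma = sigma0 * (1 - t / epochs) + 0.5  # shrinking neighborhood
        for x in rng.permutation(data):
            # best matching unit (BMU): nearest reference vector
            d = np.linalg.norm(w - x, axis=2)
            bmu = np.unravel_index(d.argmin(), d.shape)
            # Gaussian neighborhood pull toward the input
            g = np.exp(-np.sum((grid - np.array(bmu)) ** 2, axis=2) / (2 * sigma**2))
            w += lr * g[..., None] * (x - w)
    return w

# toy "climb up" / "climb down" angle data; their BMUs should end up on
# different units, mirroring the attitude separation seen in Fig. 6(a)
up = np.tile([0.3, 0.1, 0.2], (20, 1))
down = np.tile([-0.3, -0.1, -0.2], (20, 1))
som = train_som(np.vstack([up, down]))
```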
The developed environment recognition system was introduced into Zaurus to evaluate its classification ability online. Figure 7 shows the velocity of the robot and the environment category the system assigned to the current condition. The conditions are classified into three states: climbing up, climbing down, and flat. The velocity of the robot is controlled properly, and the environment recognition works online.
Since the input data to the system are high dimensional, we combined PCA and SOM for system miniaturization. We determined the number of dimensions from the contribution ratio of the basic environment data and compressed the input from 24 to 5 dimensions. As shown in Fig. 8, the PCA-SOM map is classified based on the attitude of the robot. For this reason, the PCA-SOM map is expected to give the same performance as the map using the original basic environment data.
In this research, we proposed an environment recognition system based on classifying the linkage angle data. The basic environment map can classify the attitude of the robot and the height of bumps, and the system is able to recognize simple traveled terrain while traveling. The environment recognition system using the compressed basic environment data shows the same performance as the proposed system using the original high-dimensional environment recognition data.
(a) attitudes of robot (b) interpolation function (0.10 [m])
Fig. 6 The result of SOM
Fig.7 The experimental result of environment recognition
Fig.8 The result of PCA-SOM: classification of the attitude of the robot
This work was partly supported by a grant of the Knowledge Cluster Initiative implemented by the Ministry of Education, Culture, Sports, Science and Technology (MEXT).
[1] Masanori Sato, Kazuo Ishii, "A neural network based controller for a wheel type mobile robot", Brain-Inspired IT 3, pp. 261-264, 2006
[2] Masanori Sato, Atushi Kanda, Kazuo Ishii, "A Switching Controller System for a Wheeled Mobile Robot", Journal of Bionic Engineering, Vol. 4, No. 4, pp. 281-289, 2007
[3] T. Kohonen, "Self-Organizing Maps", Springer-Verlag, 2001
Emotional Expression Model Inspired by the Amygdala
Satoshi Sonoh and Takeshi Yamakawa
Department of Brain Science and Engineering, Kyushu Institute of Technology
Email: [email protected], yamakawa@}
of the amygdala[2]. EMA consists of an architecture
which is illustrated in Figure 1, and can realize two
functions: one is a recognition of the sensory stimuli,
the other is a classical conditioning between the sensory
stimuli and emotional stimuli.
The recognition of the sensory stimulus can be
achieved by a function of the LA layer. In the LA
layer, many functional units with reference vectors are
arranged on two-dimensional array, and learn to get a
feature of the sensory stimulus by using an extended
Kohonen’s rule. The best matching unit (BMU) is selected when the sensory stimulus is presented, and the
reference vector of the BMU is used as a recognized
stimulus. In learning of EMA, an adaptation to environmental change can be achieved by using adjustment
of a learning rate and a neighboring width based on an
adaptation degree of the BMU. For example, Figure
2 shows a obtained feature map for the color sensory
stimuli. Neighboring units have similar reference vectors (color), thus a robustness to noises of the sensory
stimuli is enhanced.
A relationship between the recognized sensory stimulus and presented emotional stimulus is acquired by a
classical conditioning model. The emotional stimulus
unconditionally induces the emotional response, and
the sensory stimulus have no emotional value naturally.
The emotional value is output of EMA, and represents
a strength of the emotional response. The LA units
have emotional weights, and the emotional response is
calculated by product-sum of BMU and neighboring
units. Then, the emotional weights are updated based
on delta rule (difference between the emotional stimulus and the output emotional value).
Figure 3 shows a learning curve of the emotional
value on a computational simulation. This simulation
emulates the acquisition and extinction in a part of the
classical conditioning. Although fear emotion may be
processed mainly in the amygdala, many other emotions such as pleasure and surprise emotions can too
be handled in EMA. This extension is more effective
to generate various emotions and behavior on robot
Animals generate a wide variety of emotions based on interactions with their environments, and can adapt themselves to new environments. It is well known that emotions are controlled by the amygdala, which is a part of the cerebral limbic system[1]. In the amygdala, the sensory stimulus is evaluated according to its emotional value: how strongly does the stimulus evoke pleasure or displeasure? The emotional response is then transmitted from the amygdala to the corresponding systems. The emotional response relates very closely to the learning and memory systems of animals. This means that emotions play a significant role in animal behavior.
The aim of this research is the development of an emotional expression algorithm from the point of view of brain-inspired technology. We propose a novel neural network model based on neuroscience findings about the architecture and function of the amygdala. Along with the philosophy of the Kyutech COE program, "World of Brain Computing Interwoven out of Animals and Robots", we have proposed the emotional expression model of the amygdala and developed dedicated digital hardware for the proposed model as well. Furthermore, implementation of the proposed model was successfully achieved in order to make an autonomous robot generate emotions.
Neural Network Model of the Amygdala

The emotional expression model of the amygdala (EMA) proposed in this research is based on the neural circuit of the amygdala.

Figure 1: The architecture of EMA (sensory input layer, LA layer, and CE layer).

Implementation in Robot

In this research, we implemented the proposed model EMA on an autonomous robot. The autonomous robot "WITH" was developed by the Ishii laboratory as an experimental platform for brain-like information processing[3]. In order to establish an emotional interaction between the human and the robot, we extended "WITH" as follows:

1. Developing real-time hardware for EMA
2. Addition of a touch sensor array which can detect the emotional stimulus
3. Addition of ear- and tail-like devices which universally express the emotions

The EMA hardware (EMAHW), employing a massively parallel architecture, was developed using an FPGA (Xilinx Spartan-3E, xc3s1600E). Both speed-up of the calculations and small-scale integration were achieved by simplifying the calculation algorithms and replacing all multipliers with bit shifters. The proposed EMAHW can process the calculations of 81 units at a clock frequency of up to 35 MHz. The calculation speed is twenty times faster than that of a portable PC (Intel 1.1 GHz, 512 MB).

The touch sensor array on the head of "WITH" is able to detect emotional stimuli, such as hitting and gentle stroking, from the human. In addition, we added an image sensor for detection of the sensory stimulus (color marker).

The ear- and tail-like devices are emotional expression devices modeling the behavior of dogs (one of the most popular pets). If the emotions are represented by movements of the ears and tail, non-verbal interaction becomes available. Non-verbal interaction is independent of language, nationality, and age, and is therefore highly universal. Such universal communication is considered an effective method for reducing techno-stress.

Figure 2: A feature map of the sensory stimulus.
Figure 3: A learning curve in acquisition and extinction.
Figure 4: The emotional interaction between human and "WITH". (a) During conditioning. (b) After conditioning.

Experimental Results

An experiment on emotional interaction was carried out using the developed robot. The actions from the human to "WITH" are presenting a color marker as the sensory stimulus, and hitting or gentle stroking as the emotional stimulus. "WITH" detects those stimuli and induces the emotional response based on the emotional value that EMA outputs.

The emotional interaction is shown in Figure 4. In the beginning, "WITH" did not induce any emotional response against the presented red marker. However, as a result of the classical conditioning, "WITH" gradually induced the emotional response of fear against the presented red marker. In the end, when the emotional stimulus was changed from hitting to gentle stroking, another emotional response, pleasure, was induced.

Conclusion

In this research, we proposed a neural network model, the Emotional expression Model of the Amygdala (EMA), from a brain information engineering point of view. Emotion expression in EMA is achieved by cooperation between recognition of the sensory stimulus and classical conditioning. In addition, we implemented EMA as the emotion expression system on the autonomous robot "WITH". As a result, we successfully made the robot generate principal emotions from the interaction between the human and the robot.

References
[1] J. P. Aggleton, The Amygdala: A Functional Analysis, Oxford University Press, 2000.
[2] S. Sonoh, et al., "Emotional behavior and expression based on a neural network model of amygdala," Abstract of the 4th International Conference on Brain-Inspired Information Technology (BrainIT 2007), p.36, 2007.
[3] Y. Takemura, et al., "Toward Realization of Swarm Intelligence Mobile Robots," Lecture Notes in Computer Science, Vol.4232, pp.273-276.
Dynamics of Link Mechanisms Employing COG Jacobian Matrices
*Takashi Sonoda1, Kazuo Ishii1 and Daigoro Isobe2
Department of Brain Science and Engineering, Kyushu Institute of Technology
Department of Engineering Mechanics and Energy, University of Tsukuba
E-mail: [email protected], [email protected] and [email protected]
Analyzing the dynamics of robots is important for their control, development and design. However, the calculation cost and the complexity of the expressions increase when the analysis targets become complex. Therefore, a unified calculation method that does not depend on the target's structure is desired, and a parallel computation method using matrix forms has been proposed [1].
The proposed method expresses the equation of motion in a simple form and employs a Jacobian matrix (the COG Jacobian matrix) relating the centers of gravity (COGs) of the links and the active joints. In this paper, we derive the equation of motion employing the COG Jacobian matrix from the principle of virtual work. Moreover, we show simulations of inverse dynamics using the proposed method.
Fig. 1: Model of a closed-link mechanism (passive and active joints).

2.1 Deriving COG Jacobian matrices

Fig. 1 shows the relationship between the forces applied around the COGs of each link and the torques needed at the active joints in the motion of a closed-link mechanism. In this paper, we do not discuss non-holonomic mechanisms, because the extension is straightforward.

First, the number of links and the number of active joints are defined as N and M. The coordinates of the COG of link i (i = 1, ..., N) are expressed as qi = [pGi^T, θGi^T]^T, and all the coordinates are grouped as q = [q1^T, ..., qN^T]^T.

The coordinates θm ∈ R^M of all active joints uniquely determine the motion of the mechanism. Hence, the coordinates q of the COGs are expressed by a function g(θm) of θm as

q = g(θm).   (1)

Then, differentiating eq. (1) with respect to time, we obtain the velocities of the COGs as

q̇ = G θ̇m,   (2)

where

G ≡ G(θm) = ∂g(θm)/∂θm.   (3)

Eq. (3) expresses the COG Jacobian matrix of the link mechanism. Moreover, differentiating eq. (2) with respect to time, we derive the relationship of the accelerations of the COGs as

q̈ = G θ̈m + Ġ θ̇m.   (4)

In brief, COG Jacobian matrices are matrices which express the relation between the velocities of the COGs and the velocities of the active joints, as described in eq. (2).

2.2 Inverse dynamics employing the COG Jacobian

The relationship between the torque vector τm ∈ R^M applied to the active joints and the inertia forces originating from the acceleration vector q̈ obtained in the above section can be expressed by the following equations, using the principle of virtual work:

∂r^T F = 0,   (5)

where

∂r = [∂q^T, ∂h^T, ∂θm^T]^T,   (6)
F = [−(M q̈ + b)^T, −f^T, τm^T]^T.   (7)

∂q, ∂h and ∂θm are the virtual displacement vectors of the COGs of the links, of the arbitrary points, and of the active joints, respectively. b is the external force vector which does not proceed
from the active joints, such as gravity. f is the force vector applied to the arbitrary point coordinates h. M is the mass matrix of each link. Expanding eq. (5), we obtain

−∂q^T (M q̈ + b) − ∂h^T f + ∂θm^T τm = 0.   (8)

Then, solving for τm,

τm = (∂q/∂θm)^T (M q̈ + b) + (∂h/∂θm)^T f ≡ G^T (M q̈ + b) + J^T f.   (9)

J is a Jacobian matrix regarding the external forces which are applied to arbitrary points other than the COGs. For example, applying it to a robot arm, we can obtain the torques balancing the external forces when the coordinate vector h is defined at the end-effector of the arm.

Eq. (9) means that the inverse dynamics of link mechanisms can be solved by using COG Jacobian matrices. However, the virtual displacements have to satisfy the constraints when the mechanisms have constraints on displacements and motions.

Next, substituting eq. (4) into eq. (9), we obtain the following equation:

τm = G^T M G θ̈m + G^T M Ġ θ̇m + G^T b + J^T f = H(θm) θ̈m + B(θm, θ̇m) + J^T f,   (10)

where eq. (10) is the general form of the equation of motion, H(θm) is the inertia matrix and B(θm, θ̇m) is a term grouping together the gravity, Coriolis and centrifugal forces.

Fig. 2: Two types of link mechanisms for the dynamic analysis examples employing the COG Jacobian matrices. (a) Model of a mechanism without redundant actuators. (b) Model of a mechanism with a redundant actuator.
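Eq. (9) can be sanity-checked on the simplest possible mechanism. The sketch below applies τm = G^T(M q̈ + b) to a single pendulum actuated at its joint, a toy case chosen here for illustration (it is not one of the paper's closed-link models), and compares the result with the textbook pendulum dynamics τ = (I + m l²) θ̈ + m g l cos θ. The sign of b is taken as the gravity load the actuator must balance.

```python
import numpy as np

# Single pendulum: COG coordinates q = [x_G, y_G, theta_G], one active
# joint theta measured from the horizontal axis, COG at distance l.
m, l, I, g = 2.0, 0.5, 0.1, 9.81

def G(th):
    # COG Jacobian dq/dtheta
    return np.array([-l * np.sin(th), l * np.cos(th), 1.0])

def Gdot(th, thd):
    # time derivative of the COG Jacobian
    return np.array([-l * np.cos(th) * thd, -l * np.sin(th) * thd, 0.0])

def tau_cog(th, thd, thdd):
    M = np.diag([m, m, I])                   # mass matrix of the link
    b = np.array([0.0, m * g, 0.0])          # gravity load (assumed sign convention)
    qdd = G(th) * thdd + Gdot(th, thd) * thd # accelerations of the COG, eq. (4)
    return G(th) @ (M @ qdd + b)             # eq. (9) with f = 0

def tau_ref(th, thd, thdd):
    # textbook pendulum equation of motion
    return (I + m * l ** 2) * thdd + m * g * l * np.cos(th)

th, thd, thdd = 0.7, -1.3, 2.1
print(tau_cog(th, thd, thdd), tau_ref(th, thd, thdd))
```

The centrifugal terms cancel inside G^T M q̈, leaving exactly the inertia and gravity torques, which is the structure of eq. (10).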
In this section, we show an inverse dynamics simulation employing COG Jacobian matrices.

Fig. 2 shows two models of simple link mechanisms that have four joints. The model of Fig. 2(a) does not have redundant actuators; the model of Fig. 2(b) has a redundant actuator. Both models have the same sizes, masses and materials.

In case (a), the active joint of the model is θm ≡ θ3, and using the COG Jacobian matrix, the active joint torque is obtained by the following equation:

[τ3] = [G1(θ3)^T  G2(θ3)^T  G3(θ3)^T] [F1^T, F2^T, F3^T]^T.

In case (b), the active joints of the model are θm ≡ [θ1, θ3]^T, and the active joint torques are similarly obtained by the following equation:

[τ1, τ3]^T = [G1(θ1, θ3)^T  G2(θ1, θ3)^T  G3(θ1, θ3)^T] [F1^T, F2^T, F3^T]^T.

However, each joint displacement θi (i = 1, ..., 4) has to satisfy the closed-loop constraints in both cases.

Fig. 3 shows the target motion applied to both cases, and Fig. 4 shows the results of the inverse dynamics calculations.

Fig. 3: Target motion.
Fig. 4: Torque curves.

We showed inverse dynamics calculations employing the COG Jacobian matrix, which expresses the relationship between the velocities of the COGs and those of the active joints. Moreover, deriving the COG Jacobian matrix from the principle of virtual work, we showed that the equations of motion can be expressed in matrix form by using it.

References
[1] D. Isobe, "A Unified Numerical Scheme for Calculating Inverse Dynamics of Open/Closed Link Mechanisms," Proceedings of the 27th Annual Conference of the IEEE Industrial Electronics Society (IECON'01), pp.341-344, 2001.
Takayuki Matsuo, Takeshi Yokoyama, Daishi Ueno and Kazuo Ishii
Department of Brain Science and Engineering, Kyushu Institute of Technology
Email: {matsuo-takayuki, [email protected]
[email protected]
MLMR II consists of two kinds of cylinders: a cylinder for the head (head module) and seven cylinders for the joints (motor modules). A motor module has a motor to control the joint angle and a circuit board. The circuit board has an MPU, a potentiometer to measure the joint angle, an RS485 transceiver for communication, and a current sensor to measure the joint torque. The block diagram of the circuit is shown in Fig.2. The head module is the interface device between the robot and the host PC, and transfers the target behavior to the other modules.
Robots and robotics technologies are expected to serve as new tools for inspection and manipulation, especially in extreme environments. Robots for such extreme environments should be robust and strong against disturbance. As seen in animals, they adapt well to environmental changes over both short and long time periods. Here, we investigate the motion of snakes and eels, as they can move in various fields, on land and underwater, by wriggling their bodies. A snake's body is covered by special scales which have low friction in the tangential direction and high friction in the normal direction[1]. This feature enables thrust to be produced from wriggling motion. An eel swims underwater by generating impellent force using hydrodynamic forces[2]. We can see a certain phase difference along the body, and the amplitude of wriggling increases along the body.
In the nervous systems of animals, it has been proven that motion patterns, such as walking, respiration, and flapping, are controlled by rhythm-generator mechanisms called Central Pattern Generators (CPGs). A CPG consists of many neural oscillators, and rhythm patterns are generated by the neural oscillators affecting each other. In this research, we introduce the Matsuoka model[3] as the neuron model of the CPG. We realize phase differences and bias by adjusting parameters, and the periodic signals from the CPG are utilized as the target joint angles of the robot. We developed an amphibious multi-link mobile robot (MLMR II). A robot motion control system using CPG is investigated and applied to MLMR II.
Table 1 Specification of MLMR II (length, weight, number of joints, MPU: PIC18F452 x8, current sensors, DC motors x7)
Fig.1 Overview of MLMR II
The specification and overview of the amphibious multi-link mobile robot (MLMR II) are shown in Table 1 and Fig.1. MLMR II is able to move in both environments, on the ground and underwater. Therefore, waterproofing is a very important point in the development of the robot. The robot is constructed from 8 cylinders, each containing a joint that is able to rotate around the yaw axis using a DC motor, a gearbox and a control circuit. O-rings are fitted on the joint shafts and the cylinder lids for the waterproofing of the robot.
Underwater, hydrodynamic forces caused by the fins and body produce thrust forces, and passive wheels are used on the ground.
Fig.2 Block diagram of the circuit
The robot has 7 joints; therefore, a set of neural oscillators is assigned to each joint. The CPG network for MLMR II is shown in Fig.3. The Matsuoka model is expressed in Eqs. (1)-(3), where u_i is the membrane potential of the i-th neuron, v_i is
the variable that represents the degree of adaptation, u_0 is the external input with a constant rate, f_i is the feedback signal from a sensory input, β, τ_u and τ_v are parameters that specify the time constants for the adaptation, w_ij is the weight between neurons, and n is the number of neurons. Table 2 shows the parameters for forward motion, and Fig.4 shows the outputs of each neural oscillator. MLMR II can change heading and rotate by changing the parameter u_0 so as to shift the neutral position. If the parameters are set as in Table 3, the robot rotates and changes heading toward the right. Figure 5 shows the output of the CPG network in the right-turn motion. We analyzed the motion of the robot using a motion capture system underwater and on the ground. Figure 6(a) shows the motion capture data of forward motion on the ground and underwater, and Figure 6(b) shows the motion capture data of right-turn motions. In forward motion, the robot is able to go farther on the ground than underwater: the robot moved a distance of about 5.5 meters on the ground, but only about 2.0 meters underwater. In right-turn motion, the robot is able to go around on the ground, but not underwater. The turning radii are 1.6 meters on the ground and 1.2 meters underwater.
τ_u u̇_i = −u_i − β v_i − Σ_j w_ij y_j + u_0 + f_i   (1)
τ_v v̇_i = −v_i + y_i   (2)
y_i = max(0, u_i)   (3)
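A minimal two-neuron (extensor-flexor) instance of the Matsuoka model in Eqs. (1)-(3) can be integrated with the Euler method as below. The parameter values and step size are illustrative choices, not the ones in Tables 2 and 3, and the mutual inhibition is taken with a negative sign, −w y_j.

```python
import numpy as np

# Matsuoka half-center oscillator: two mutually inhibiting neurons.
tau_u, tau_v, beta, w, u0 = 1.0, 2.0, 2.5, 2.5, 1.0   # assumed parameters
dt, steps = 0.01, 60000

u = np.array([0.1, 0.0])   # membrane potentials (asymmetric start)
v = np.zeros(2)            # adaptation variables
out = []
for _ in range(steps):
    y = np.maximum(0.0, u)                        # Eq. (3): rectified output
    inhib = w * y[::-1]                           # mutual inhibition w_ij * y_j
    du = (-u - beta * v - inhib + u0) / tau_u     # Eq. (1), feedback f_i = 0
    dv = (-v + y) / tau_v                         # Eq. (2)
    u += dt * du
    v += dt * dv
    out.append(y[0] - y[1])                       # alternating joint-angle signal

out = np.array(out[steps // 2:])                  # discard the transient
sign_changes = int(np.sum(np.diff(np.sign(out)) != 0))
print(sign_changes)
```

The difference y_1 − y_2 alternates in sign, which is the periodic signal that can serve as a target joint angle; adding a constant to u_0 of one neuron shifts the neutral position, as used for turning.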
A multi-link mobile robot which has 7 joints and moves on the ground and underwater was developed. A distributed control system implementing a biomimetic motion control system using CPG was also developed. The system was applied to MLMR II, and motion control was carried out on the ground and underwater by adjusting the CPG parameters. Additionally, the motions of the robot were analyzed using a motion capture system.
Fig.6 Motion capture data of forward motion (a) and right-turn motion (b) on the ground and underwater using CPG
Table 2 Parameters of the CPG network in forward motion (τu, τv, β, wfe, w1, u0_e, u0_f)
Table 3 Parameters of the CPG network in right-turn motion
Fig.3 CPG Network for biomimetic motion control
This work was supported by the 21st Century Center of Excellence Program, "World of Brain Computing Interwoven out of Animals and Robots (PI: T. Yamakawa)" (center #J19), granted to Kyushu Institute of Technology by MEXT of Japan.
Fig.4 Output of the CPG network in forward motion
References
[1] S. Hirose, Bionic Machine Engineering, Kogyo Chosakai, 1987 (in Japanese).
[2] A. Azuma, The Subject-Book of Animal's Motion, Asakura, 1997.
[3] K. Matsuoka, "Sustained oscillations generated by mutually inhibiting neurons with adaptation," Biological Cybernetics, Vol.52, pp.367-376, 1985.
Fig.5 Output of the CPG network in right-turn motion
A Central Pattern Generator Network Model with Independent Controllability
Tatsuji TOKIWA, Takeshi YAMAKAWA
Department of Brain Science and Engineering, Kyushu Institute of Technology
Email: [email protected], [email protected]
We have proposed a central pattern generator (CPG) network model with independent controllability, in terms of the amplitude and period of each output signal and of the phase difference between CPGs [1]. In this paper, we introduce the CPG network with independent controllability as a quadruped locomotion signal generator. The proposed model is comprised of several CPGs and one Rhythm Generator (RG). Each CPG and the RG are described by the Van der Pol (VDP) equation. The VDP equation has a limit cycle and the feature of independent controllability in terms of the amplitude and period of the limit cycle. The amplitude and period of the output signal from each CPG and the RG are independently controlled by external signals, because the CPG and the RG are designed keeping this feature of the VDP equation. In order to control the phase shift between CPGs, the period of the output signal from each CPG is temporarily controlled through connections which are the only conjunction between the RG and each CPG.

The proposed CPG network is applied to a quadruped locomotion signal generator, and can generate the typical quadruped locomotion signals: walk, trot, bound and gallop modes.

Locomotion signals such as walking, swimming and flying are generated and controlled by a central nervous system, the so-called central pattern generator (CPG) [2].

The i-th CPG model (CPG_i) is written as follows,

ẍ_i − ε(A² − x_i²) ẋ_i + B_i² x_i = 0,   (1)

where the parameter ε is a small constant called the nonlinearity coefficient. The value of x_i is the output signal of CPG_i (i = 1, 2, ..., n). A and B_i determine the amplitude and period of x_i in the stable state, respectively [1][3][4]. In particular, the period is inversely proportional to the parameter B_i, and the amplitude reaches 2A in the stable state. n denotes the required number of CPGs in the CPG network. In order to control the phase difference between CPGs, the phase of each CPG must be temporally shifted. The parameter B_i, given in the following equation, is utilized to control the phase:

B_i = B + k b_i,   (2)

where B and k denote the natural frequency of the CPGs and a gain factor, respectively. The value of b_i determines the amount of phase shift of CPG_i. After the phase difference (x_i − X_i) becomes 0, the value of b_i approaches 0. The value of k controls the time until the stable state is reached. The block diagram of CPG_i is shown in Figure 1. The period of the output signal x_i is controlled by b_i so that the phase difference between the control and target signals becomes 0.

The Rhythm Generator (RG) is designed as the target signal generator, and is also designed based on the VDP equation, using the subscript R.

Figure 1: CPG model
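The amplitude and period properties stated above (amplitude ≈ 2A, period inversely proportional to B) can be checked numerically. The sketch below integrates a single VDP-type CPG, ẍ − ε(A² − x²)ẋ + B²x = 0, with the 4th-order Runge-Kutta method; ε = 0.2, A = 0.5 and B = 1 follow the walk-mode simulation in the text, while the initial condition and step size are my own choices.

```python
import numpy as np

eps, A, B = 0.2, 0.5, 1.0   # values used in the walk-mode simulation

def f(s):
    # state s = [x, dx/dt]; VDP equation of the CPG
    x, xd = s
    return np.array([xd, eps * (A**2 - x**2) * xd - B**2 * x])

def rk4(s, dt):
    # one 4th-order Runge-Kutta step
    k1 = f(s)
    k2 = f(s + 0.5 * dt * k1)
    k3 = f(s + 0.5 * dt * k2)
    k4 = f(s + dt * k3)
    return s + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6

s = np.array([0.1, 0.2])    # arbitrary initial condition
dt, steps = 0.01, 60000
xs = []
for _ in range(steps):
    s = rk4(s, dt)
    xs.append(s[0])

amp = float(np.max(np.abs(xs[-10000:])))  # steady-state amplitude, close to 2A
print(amp)
```

For small ε the limit-cycle amplitude settles near 2A = 1.0 regardless of the initial condition, which is the property the independent amplitude control relies on.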
Figure 2: Rhythm Generator

The RG is written, using the subscript R, as

τ² ẍ_R − ετ(A² − x_R²) ẋ_R + x_R = 0,   (4)

where τ arranges the dimension of x_R and dx_R/dt, and adjusts the amplitude of dx_R/dt to always make the amplitudes of x_R and dx_R/dt constant even if the parameters A and B are changed.

The target signal X_i is described by equation (5). The values of c_i1 and c_i2 take −1, 0, or 1 and are decided based on the gait transitions which we intend to control. Either c_i1 or c_i2 always takes 0.

The proposed CPG network can be composed of one RG and n CPGs based on a gait transition. The connections of the CPG network can be logically and uniquely decided based on the gait transition. The composition method is reported in [1].

In order to verify the effectiveness of the proposed method, the output signal of each CPG was investigated with the 4th-order Runge-Kutta method. In this paper, we aim to obtain the typical quadruped gaits, especially the walk mode. The parameters A and B are provided by external signals.

Figure 3 shows the simulation results of the walk mode. In this simulation, ε, k, and the initial values of x_R, x_1, x_2, x_3, x_4, dx_R/dt, dx_1/dt, dx_2/dt, dx_3/dt and dx_4/dt were 0.2, 1, 0.1, 0.1, −0.5, 0.3, 0.7, 0.1, 0.1, 0.3, 0.3 and 0.2, respectively. A and B were 0.5 and 1. The output values of the CPGs and b_n were measured for 1200 steps. In Figure 3(a), it is confirmed that the amplitude of the limit cycle becomes almost 2A and that the phase differences between CPGs are controlled to the desired phase differences. In Figure 3(b), it is found that b_i becomes 0 after the phase differences of each x_i are controlled by the target signal X_i. In this simulation, since the value of X_i did not always correspond to the value of x_i, b_i was limited to 0.3 or less. In Figure 3(b), a horizontal line marks the value b_i = 0.3.

Figure 3: Output signals of each CPG. (a) Transition of x_i. (b) Transition of b_i.

We introduced a Central Pattern Generator (CPG) network with independent controllability as a quadruped locomotion signal generator. The network could generate and control the output signals for the walk mode.

References
[1] Tatsuji Tokiwa and Takeshi Yamakawa, "Central Pattern Generator network model with independent controllability," RISP International Workshop on Nonlinear Circuits and Signal Processing (NCSP'08), pp.156-159, 2008.
[2] F. Delcomyn, "Neural basis of rhythmic behavior in animals," Science, Vol.210, pp.492-498, 1980.
[3] B. van der Pol, "On relaxation oscillations," Phil. Mag., No.2, pp.987-993, 1926.
[4] Louis A. Pipes and Lawrence R. Harvill, Mathematics for Engineers and Physicists, McGraw-Hill Education, Third Edition, 1970.