Proc. of the 8th POSTECH-KYUTECH Joint Workshop on Neuroinformatics, Kitakyushu, Japan, 2008

Nonnegative Tensor Factorization for EEG Pattern Classification
Seungjin Choi
Department of Computer Science, POSTECH, Korea
[email protected]

Abstract
Learning a fruitful representation from data is one of the fundamental problems in machine learning and pattern recognition. Various methods have been developed, including factor analysis, principal component analysis (PCA), independent component analysis (ICA), and manifold learning. Among these, nonnegative matrix factorization (NMF) has recently drawn extensive attention, since promising results have been reported in handling nonnegative data such as documents, images, and audio spectrograms. NMF seeks a decomposition of a nonnegative data matrix into a product of two factor matrices (a basis matrix and an encoding matrix), both of which are forced to be nonnegative. In this talk, I will begin with the useful behavior of NMF in EEG pattern classification, which plays a critical role in noninvasive brain-computer interfaces (BCI). Next, I will introduce a multiway extension of NMF, what is called "nonnegative tensor factorization" (NTF), and will emphasize its useful behavior in EEG pattern classification. A tensor is nothing but a multiway array, generalizing vectors and matrices to accommodate higher-order representations: a vector is a 1-way tensor, a matrix is a 2-way tensor, a cube is a 3-way tensor, and so on. Throughout this talk, I will stress why tensors are useful in learning fruitful representations, compared to existing matrix-based methods.
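As background for the decomposition described in the abstract, NMF factors a nonnegative matrix X into W (basis) times H (encoding) with both factors kept nonnegative. A minimal sketch using the standard Lee-Seung multiplicative updates follows; this is a generic illustration, not the speaker's implementation, and the dimensions and iteration count are arbitrary:

```python
import numpy as np

def nmf(X, r, n_iter=200, eps=1e-9):
    """Factor a nonnegative matrix X (m x n) into W (m x r) @ H (r x n),
    both nonnegative, via Lee-Seung multiplicative updates that locally
    minimize the Frobenius objective ||X - WH||_F^2."""
    m, n = X.shape
    rng = np.random.default_rng(0)
    W = rng.random((m, r)) + eps   # basis matrix, strictly positive init
    H = rng.random((r, n)) + eps   # encoding matrix
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)   # multiplicative update keeps H >= 0
        W *= (X @ H.T) / (W @ H @ H.T + eps)   # multiplicative update keeps W >= 0
    return W, H

# toy nonnegative data
X = np.abs(np.random.default_rng(1).random((20, 30)))
W, H = nmf(X, r=2)
assert (W >= 0).all() and (H >= 0).all()
```

Because the updates are multiplicative, nonnegativity of the initialization is preserved at every step, which is exactly the constraint that distinguishes NMF from unconstrained factorizations such as PCA.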
Sequence learning: a network model of the entorhinal cortex layer II with a hippocampal-entorhinal loop circuit
Hatsuo Hayashi
Department of Brain Science and Engineering, Graduate School of Life Science and Systems Engineering, Kyushu Institute of Technology
Email: [email protected]

The discovery of place cells, which encode place information in the phase of the theta rhythm, revealed that the hippocampus has the ability to represent space. The firing phase with respect to the theta rhythm advances within a theta cycle as the animal traverses a place field [1, 2]. Although phase precession was first found in hippocampal CA3 and CA1, it has also been found in the dentate gyrus [2], and subsequently in grid cells in the medial entorhinal cortex [3]. Accordingly, a model of the entorhinal-hippocampal system accounting for phase precession has been proposed [4]. That model is rather abstract: the several layers of the hippocampus and the entorhinal cortex were modeled by simple oscillator units describing local field theta rhythms. Another model, of the entorhinal cortex layer II network including the entorhinal-hippocampal loop circuit, has also been proposed [5]. This model consists of conductance-based multi-compartmental models of the stellate cell and the inhibitory interneuron, and the entorhinal-hippocampal loop circuit is modeled by delay lines with a signal transmission delay of 20 ms, based on physiological data. In this lecture, we review this model, its mechanisms of theta rhythm generation and theta phase coding, and show how successfully it performs sequence learning. We also tested how the shape of the STDP rule affects the performance of the model amid background noise [6]. STDP rules observed in the hippocampus and the entorhinal cortex are classified into two types: temporally symmetric and asymmetric STDP rules.
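The two window shapes can be illustrated as simple functions of the spike-timing difference Δt = t_post − t_pre. The sketch below is illustrative only; the functional forms and parameter values are assumptions, not those of the model in [5, 6]:

```python
import math

def asymmetric_stdp(dt, a_plus=1.0, a_minus=1.0, tau=20.0):
    """Classic asymmetric rule: LTP for positive spike timing (pre before
    post), LTD for negative timing; no LTD window on the positive side."""
    if dt >= 0:
        return a_plus * math.exp(-dt / tau)
    return -a_minus * math.exp(dt / tau)

def symmetric_stdp(dt, a=1.0, sigma=10.0, b=0.4, sigma_d=30.0):
    """Symmetric rule: a central LTP peak flanked by broader LTD windows
    on BOTH sides, so large positive timing differences (e.g. noise-driven
    late spikes) are depressed rather than potentiated."""
    ltp = a * math.exp(-(dt / sigma) ** 2)
    ltd = -b * math.exp(-(dt / sigma_d) ** 2)
    return ltp + ltd

# At dt = +25 ms the asymmetric rule still potentiates, while the
# symmetric rule depresses; this is the property discussed below that
# suppresses irrelevant enhancement of loop connections by noise.
print(asymmetric_stdp(25.0) > 0, symmetric_stdp(25.0) < 0)
```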
The symmetric STDP rule, which has LTD windows on both sides of the central LTP window, prevented irrelevant enhancement of loop connections caused by noise, in cooperation with loop connections having a large signal transmission delay and the theta rhythm pacing the activity of the stellate cells. In contrast, the asymmetric STDP rule, which has no LTD window in the range of positive spike timing, did not prevent such irrelevant enhancement, and the memory pattern was blurred by noise. The important elements of the present model for sequence learning are (1) the symmetric STDP rule, (2) loop connections with a large signal transmission delay, and (3) the theta rhythm pacing the activity of the stellate cells. These elements also contribute to robust sequence learning amid background noise. Above all, the LTD window in the range of positive spike timing is important for preventing the influence of noise as sequence learning progresses.

REFERENCES
[1] O'Keefe J and Recce ML, "Phase relationship between hippocampal place units and the EEG theta rhythm," Hippocampus, vol. 3, pp. 317-330, 1993
[2] Skaggs WE, McNaughton BL, Wilson MA, and Barnes CA, "Theta phase precession in hippocampal neuronal populations and the compression of temporal sequence," Hippocampus, vol. 6, pp. 149-172, 1996
[3] Hafting T, Fyhn M, Bonnevie T, Moser M-B, and Moser E, "Hippocampus-independent phase precession in entorhinal grid cells," Nature, vol. 453, pp. 1248-1252, 2008
[4] Yamaguchi Y, "A theory of hippocampal memory based on theta phase precession," Biol. Cybern., vol. 89, pp. 1-9, 2003
[5] Igarashi J, Hayashi H, and Tateno K, "Theta phase coding in a network model of the entorhinal cortex layer II with entorhinal-hippocampal loop connections," Cogn. Neurodyn., vol. 1, pp. 169-184, 2007
[6] Igarashi J and Hayashi H, "Theta phase coding and suppression of irrelevant plastic change through STDP in the entorhino-hippocampal system amid background noise," in Advances in Cognitive Neurodynamics, (Eds) Wang R, Gu F, and Shen E, Springer, in press, 2008

TRAINING AND FATIGUE OF POLYMER ARTIFICIAL MUSCLES
Keiichi Kaneto, Hirotaka Suematsu, and Kentaro Yamato
Department of Biological Function and Engineering, Kyushu Institute of Technology
Email: [email protected]

1. INTRODUCTION
Present robots are heavy and noisy because they are driven by motors made of copper and iron. To build lightweight and human-friendly robots, artificial muscles or soft actuators are needed to replace motors. Artificial muscles are required not only to drive robots but also to mimic biological motions, like those of snakes, fish, and insects, for medical equipment and a variety of other applications. Research on soft actuators [1-7] began 20 years ago, based on ionic polymers (membranes), polymer gels, dielectric elastic polymers, and conducting polymers. Among them, conducting polymers are the most promising materials from the viewpoints of low operating voltage and large strain and stress [8]. Conducting polymers are actuated, or deformed, by electrochemical oxidation and reduction. The deformation results from the insertion of bulky ions as well as from the change of polymer conformation due to the delocalization of π-electrons upon oxidation, and is called the electrochemomechanical strain (ECMS). Among the various conducting polymers, polypyrrole (PPy) based ECMS is currently the most studied, since PPy shows excellent performance in strain, stress, and cycle life compared with polyaniline [9] and polythiophene [10].
Efforts have mostly been devoted to improving the strain, stress, and response time [3, 4]; however, the characteristics of ECMS under tensile loads have not been investigated from the viewpoints of training, fatigue, and aging. Muscles can be strengthened by training through heavy tasks, though they feel fatigue. One might consider training of an artificial muscle impossible; however, we have found that training of an artificial muscle fabricated with conducting polymers [11] is feasible. The training effect was observed as an increased strain after the experience of a heavier tensile stress. This results from a relaxation of the anisotropic deformation that was induced along the tensile direction, through realignment of polymer molecules during creep under higher stresses. In this paper, the ECMS responses of cation-driven PPy actuators [11-14] operated under various tensile stresses are described. After the application of large tensile stresses, the recovery of strain due to creep was studied to demonstrate the effects of training and fatigue in artificial muscles. The results are discussed in terms of a model of relaxation of the anisotropic deformation induced by high tensile stress. The energy conversion efficiency from electrical input energy to mechanical output work as a function of tensile stress is also reported.

2. EXPERIMENTAL
Polypyrrole films were electrodeposited on a Ti plate in an aqueous electrolyte solution of 0.15 M pyrrole and 0.25 M DBS acid at a constant current of 1 mA/cm² for 1500 s vs. a Ni counter electrode, as described in previous papers [11, 14]. The thickness and conductivity of a typical film (denoted PPy/DBS) were approximately 18 μm and 30-50 S/cm, respectively. The Young's modulus of the film was 140 MPa in the oxidized state and 70 MPa in the reduced state.

3. RESULTS AND DISCUSSION
Cyclic voltammograms (CVs) of the PPy/DBS film in 1 M NaCl aqueous solution are shown in Fig. 1 for various tensile stresses.
The film was cycled 3 times at each tensile stress from 0 to 5 MPa, in sequential steps of 1 MPa, and the CVs obtained at the 3rd cycle are shown in Fig. 1. The film expanded under negative applied voltage (reduction) and contracted under positive applied voltage (oxidation), as observed in cation-driven films [11, 14]. It was found that, with increasing tensile stress, a shoulder at around 0 V in the oxidation cycle increased and shifted to higher potential. Similarly, the peak at around -400 mV in the reduction cycle also shifted to the higher-potential side. It is interesting to note that, after removal of the 5 MPa tensile stress, the CV at 0 MPa traced almost the same curve as that at 5 MPa, as shown by the dotted line in Fig. 1 (0 MPa after training).

Fig. 1 CVs of the PPy/DBS film at various tensile stresses from 0 to 5 MPa in aqueous NaCl electrolyte.

The curves in Fig. 2 show the deformation of the film during the CV measurements. The deformation is the increment of the film length (initially 10 mm) upon cycling. It is clearly observed that the film was elongated by each electrochemical reduction and by creep, which was enhanced at higher tensile stresses. However, after removal of the 5 MPa stress, the film contracted upon cycling, as a result of the relaxation of the anisotropic strain induced during creep.

4. SUMMARY
The cyclic voltammetry and electrochemical actuation of PPy/DBS films have been studied as a function of tensile stress. It was found that the stroke of contraction was increased by training with a larger tensile stress (training effect). The result is explained by an anisotropic deformation of the film due to uniaxial stress, or by creep and a polymer conformation memory effect. The conversion efficiency was found to be lower than 0.07 % of the input electrical energy.
It was also found that the degradation of the films was significantly accelerated under larger tensile stresses. It will be interesting to compare these results for conducting polymer artificial muscles with natural muscles.

Fig. 2 Time dependence of the deformation during the CV measurements shown in Fig. 1.

The charge dependence of the deformation of the PPy/DBS film, derived from Fig. 2, is shown in Fig. 3 (a) and (b) for oxidation (contraction) and reduction (elongation), respectively, at various tensile stresses up to 5 MPa. In the oxidation process at zero or low tensile stress, as shown in Fig. 3 (a), the deformation was proportional to the charge. At higher stress, however, the deformation saturated at higher charge, indicating that the film kept its length even as cations left the film. In other words, the film shrank in an anisotropic manner, stretching into an ellipsoid along the direction of the tensile stress. This idea is supported by the reverse process, i.e., reduction, as shown in Fig. 3 (b): at higher tensile loads the film deformed slowly at the beginning and then elongated steeply. The anisotropic deformation results from competition between the uniaxial stress and the elasticity of the polymer chains, or thermal fluctuation. Consequently, one of the mechanisms of creep is the anisotropic strain, besides slipping and breaking of chains. The anisotropic strain can be considered a memory effect. It is noted that the charge for reduction is larger than that for oxidation, resulting in some degradation of the film.

Acknowledgements
This work was supported by a Grant-in-Aid for Science Research in a Priority Area "Super-Hierarchical Structures" from the Ministry of Education, Culture, Sports, Science and Technology, Japan.

References
[1] T. Mirfakhrai, J.D.W. Madden, and R.H. Baughman, Materials Today, 10 (2007) p. 30.
[2] M. Watanabe and T. Hirai, Appl. Phys. Lett., 74 (1999) p. 2717.
[3] S. Hara, T. Zama, W. Takashima and K. Kaneto, Polym. J., 36 (2004) p. 151.
[4] T. Zama, N. Tanaka, W. Takashima and K. Kaneto, Polymer Journal, 38 (2006) p. 669.
[5] Y. Osada, H. Okuzaki and H. Hori, Nature, 355 (1992) p. 242.
[6] S. M. Ha, W. Y. Yuan, Q. Pei, R. Pelrine, and S. Stanford, Advanced Materials, 18 (2006) p. 887.
[7] T. Fukushima, K. Asaka, A. Kosaka and T. Aida, Angew. Chem. Int. Ed., 44 (2005) p. 22.
[8] K. Kaneto, Y. Sonoda and W. Takashima, Jpn. J. Appl. Phys., 39 (2000) p. 5918.
[9] W. Takashima, M. Nakashima, S. S. Pandey, and K. Kaneto, Electrochimica Acta, 49 (2004) p. 4239.
[10] M. Fuchiwaki, W. Takashima and K. Kaneto, Mol. Cryst. and Liq. Cryst., 374 (2002) p. 513.
[11] H. Suematsu, K. Yamato and K. Kaneto, Bioinspiration and Biomimetics, submitted (2007).
[12] H. Fujisue, T. Sendai, K. Yamato, W. Takashima and K. Kaneto, Bioinspiration and Biomimetics, 2 (2007) S1-S5.
[13] K. Yamato and K. Kaneto, Analytica Chimica Acta, 568 (2006) p. 133.
[14] K. Kaneto, H. Fujisue, M. Kunifusa and W. Takashima, J. Smart Mat. and Struct., 16 (2007) p. S250.

Fig. 3 Deformation of the PPy/DBS film as a function of charge for (a) oxidation and (b) reduction.

Motion Estimation in Sub-pixel Resolution Using Fast Belief Propagation
Intae Na, Sungchan Park, and Hong Jeong
Department of Electronic and Electrical Engineering, POSTECH, Korea
{bluewing, mrzoo, [email protected]

Abstract
In motion estimation, iterative global matching methods outperform existing local methods in terms of robustness and accuracy. Unfortunately, global methods usually suffer from a large computational burden, which obstructs high-speed processing. In this study, we propose a novel method for motion estimation at sub-pixel resolution. Hierarchical fast belief propagation is applied to reduce computational and memory complexity.
Furthermore, angular errors caused by the coarse resolution of the discrete motion vectors are effectively reduced by a dynamic message update in a two-layered MRF. Experimental results show that our system achieves the lowest angular error while using 40 times less memory.

1. Introduction
Motion estimation has been, and continues to be, one of the major tasks in computer vision. Recently, global methods such as graph cuts [1] and belief propagation [2] have outperformed traditional local methods in early vision. Global methods usually adopt a global energy minimization strategy with a probabilistic model such as a Markov random field (MRF); they consider not only the local evidence from the two images but also the spatial smoothness between neighboring pixels. Although global methods give far more robust results under discontinuities and occlusions, they have drawbacks. Since they generally require a huge amount of computation, they are time-consuming in most cases. Moreover, due to the computational complexity, only a limited number of discrete hidden states is available, which may cause large angular errors in the motion vectors. We present a novel sub-pixel resolution motion estimation method based on a two-layered MRF and dynamic message calculation, which relieves not only the severe angular errors caused by the coarse resolution of the motion vectors but also the large memory complexity of belief propagation.

2. Optimization in the two-layered MRF model
Let the size of the hidden motion vector state space be S × S. To estimate a sub-pixel resolution result with standard belief propagation, we would need more hidden states, which increases the number of nodes in the MRF as well as the number of hidden states. Instead of assigning nodes to all sub-pixels, we use a model consisting of two different 2D MRFs, corresponding to motion vectors at coarse and fine resolution. The two layers use different energy models and message update schemes. In the coarse MRF, the whole process is the same as standard belief propagation. The energy function of the coarse MRF is

E(U) = Σ_{(p,q)∈N_E} V(u_p, u_q) + Σ_{p∈P} D_p(u_p),    (1)

where N_E denotes the set of edges in the MRF and U denotes the set of motion vectors. The first and second terms are called the smoothness cost and the data cost, respectively. Assuming the image pair satisfies the intensity constancy condition, the data cost of node p is

D_p(u_p) = min(C_d |I_{t+1}(x + u_p) − I_t(x)|, K_d),    (2)

a truncated absolute difference; the data cost indicates the degree of dissimilarity between pixels of the two images. The smoothness cost function in the coarse MRF model is

V(u_p, u_q) = min(C_v |u_p − u_q|, K_v),    (3)

a truncated linear function. The cost is lower when u_p and u_q are similar; that is, node p is more likely to take a hidden state similar to those of its neighboring nodes. Having constructed the coarse MRF model, we need to find the set of motion vectors that minimizes the energy function. As is well known, loopy belief propagation gives an approximate MAP solution in an MRF by message propagation. Here we use the max-product algorithm in the log domain, formulated as

m^t_{pq}(u_q) = min_{u_p} [ V(u_p, u_q) + D_p(u_p) + Σ_{r∈N_b(p)\q} (m^{t−1}_{rp}(u_p) − α) ].    (4)

When the algorithm has converged, we estimate the coarse motion vector û_q by

û_q = arg min_{u_q} [ D_q(u_q) + Σ_{p∈N_b(q)} m^L_{pq}(u_q) ].    (5)

The basic energy model of the fine MRF is similar to that of the coarse MRF, but its cost functions differ slightly: they incorporate the state estimated in the coarse MRF. The data cost and smoothness cost of the hidden state δu_p are defined using û_q as

D_p(δu_p) = min(C_sd |I_{t+1}(x + û_p + δu_p) − I_t(x)|, K_sd),    (6)

V(δu_p, δu_q) = min(C_sv |(δu_p − δu_q) + (û_p − û_q)|², K_sv).    (7)

The data cost is similar to that of the coarse MRF, while the smoothness cost is a truncated quadratic function, which is more sensitive to hidden-state variation between neighboring nodes. The final motion vector at sub-pixel resolution is

u_p = û_p + δu_p.    (8)

Optimization in the fine MRF follows the same iterative message propagation scheme as standard belief propagation, but the message calculation changes: it depends on the result û_q of the coarse MRF. The whole process consists of three steps.

· First step:
m_sum(δu_p, û_p) = D_p(δu_p, û_p) + Σ_{r∈N_b(p)\q} (m^{l−1}_{rp}(δu_p, û_p) − α)    (9)

· Second step:
m_sum(δu_p, û_q) = m_sum(δu_p + (û_p − û_q), û_p)    (10)

· Third step:
m^l_{pq}(δu_q, û_q) = min_{δu_p} [ V(δu_p + û_p, δu_q + û_q) + m_sum(δu_p, û_q) ]    (11)

In the first step, the messages from the neighbors and the data cost are accumulated. Since neighboring nodes in the fine MRF may have different coarse motion vectors, the second step shifts the values by the difference between the coarse motion vectors. Finally, the message for the fine node is calculated using the smoothness cost and m_sum. The remaining optimization procedure is exactly the same as in the coarse MRF.

3. FBP for sub-pixel resolution
So far we have discussed our two-layered MRF model and its optimization by belief propagation. Although our method successfully reduces the number of nodes needed for sub-pixel resolution, the computational and memory burden are still large. To cope with this complexity, we adopt a memory-efficient sequence for the iterative optimization of belief propagation, so-called fast belief propagation (FBP) [3]. FBP represents the iterative message-passing sequence for the whole image as a layered 3D graph whose layers correspond to the iterations, and changes the direction of iteration in the 3D graph by adopting slanted processing groups. The changed architecture is shown in Figure 1.

Figure 1. FBP sequence in the multi-scale MRF

When the FBP sequence is combined with multi-scale optimization, the required amount of memory is reduced by a factor of 40. The overall memory reduction ratio is

R_m = [ Σ_{k=0}^{K−1} B (N/2^k)(M/2^k) ] / [ Σ_{k=0}^{K−1} B (L_k + 1)(M/2^k) ].    (12)

Because of the powerful memory reduction of the FBP sequence, parallel processing using a distributed memory architecture and individual parallel processors becomes feasible. As a result, FBP effectively reduces memory complexity as well as computational complexity.

4. Experimental results
The Middlebury motion data sets were used for verification. For quantitative analysis, the average angular error (AAE) and the standard deviation of the angular error were calculated as

Ψ_p = cos⁻¹( (ũ_p · ũ_true) / (‖ũ_p‖ ‖ũ_true‖) ),    (13)

m_Ψ = (1/(M×N)) Σ_{p∈P} Ψ_p,   σ_Ψ = √( (1/(M×N)) Σ_{p∈P} (Ψ_p − m_Ψ)² ).    (14)

Figure 2 shows a sample image (Yosemite) used for verification. The overall results are shown in Table 1.

Figure 2. Sample frame from the Yosemite sequence
Table 1. Overall results (AAE/STD) for the set of test image data

5. Conclusions
In this paper we have presented a novel approach to motion estimation at sub-pixel resolution. The problem of the large number of nodes is relieved by constructing a two-layered MRF model. Moreover, by applying the multi-scale FBP sequence, our method dramatically reduces complexity in terms of both memory and computation.

References
[1] V. Kolmogorov and R. Zabih, "Computing visual correspondence with occlusions using graph cuts," in ICCV, no. 2, pp. 508-515, 2001.
[2] P.F. Felzenszwalb and D.R. Huttenlocher, "Efficient belief propagation for early vision," in CVPR, no. 1, pp. 1261-1268, 2004.
[3] S.C. Park and H. Jeong, "Stereo matching using FBP: A memory efficient parallel architecture," in IPCV, no. 2, pp. 550-556, 2008.
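As an editorial illustration of the truncated-linear max-product update in Eqs. (3)-(4), the following sketch computes one min-sum message on a toy one-dimensional label set. The parameter values C_v, K_v and the toy costs are assumptions for illustration, not the paper's implementation:

```python
import numpy as np

def message_update(D_p, msgs_in, C_v=1.0, K_v=4.0, labels=None):
    """One max-product (min-sum, log-domain) message from node p to q:
    m_pq(u_q) = min over u_p of [ V(u_p, u_q) + D_p(u_p) + incoming messages ],
    with truncated-linear smoothness V = min(C_v |u_p - u_q|, K_v)."""
    S = len(D_p)
    if labels is None:
        labels = np.arange(S)
    h = D_p + sum(msgs_in)            # accumulate data cost and incoming messages
    h -= h.min()                      # normalization (plays the role of the alpha term)
    # pairwise truncated-linear smoothness cost over all label pairs
    V = np.minimum(C_v * np.abs(labels[:, None] - labels[None, :]), K_v)
    return (V + h[:, None]).min(axis=0)   # minimize over u_p for each u_q

# toy example: 5 motion-vector labels, data cost favoring label 2,
# and a single all-zero incoming message
D_p = np.array([3.0, 2.0, 0.0, 2.0, 3.0])
m = message_update(D_p, msgs_in=[np.zeros(5)])
print(m)   # lowest at label 2, rising linearly away from it
```

In a full implementation, messages like this would be passed iteratively over the grid until convergence, after which Eq. (5) selects the label minimizing the accumulated belief at each node.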
A Neural Network Based Retrainable Framework for Robust Object Recognition with Application to Mobile Robotics
Su-Yong An, Jeong-Gwan Kang, Won-Seok Choi, Se-Young Oh
Department of Electronic and Electrical Engineering, POSTECH, Korea
{grasshop, naroo1, onlydol, [email protected]

Abstract
In this paper, we address object recognition for a mobile robot deployed in a multistory building. To move to another floor, a mobile robot must recognize various objects related to an elevator, e.g., elevator control and call buttons and LED displays. To this end, we propose a neural network based retrainable framework for object recognition, which consists of four components: preprocessing, binary classification, object identification, and outlier rejection. The binary classifier, a key component of our system, is a neural network that can be retrained in order to adapt to varying environments, especially varying illumination. Without any extra process to prepare new training samples for retraining, they are obtained freely on-line as a result of the outlier rejection component. To realize a practical system, we adopt a parallel architecture integrating the recognition and retraining processes for seamless object recognition, and furthermore detect and cope with deterioration of a retrained neural network to ensure high reliability. We demonstrate the positive effect of retraining on object recognition performance through experiments over hundreds of images obtained in daytime and nighttime.

1. Introduction
To navigate in a multistory environment, a robot should be able to use an elevator [1]. From a mobile robot's point of view, elevator access is a very challenging task that has not yet been completely solved. At the outset, a preliminary requirement is the ability to recognize the status of an elevator, such as whether it is moving up or down. We propose a sequential framework named the Cascaded Processing Elements (CPEs), comprising preprocessing, binary classification, object identification, and outlier rejection. The first step, preprocessing, extracts object candidates from an input image regardless of varying illumination conditions, using adaptive thresholding and size filtering. The binary classifier, the key component of the proposed method, is implemented with a neural network whose weights can adapt to changing environments. Object identification is then accomplished by template matching, which assigns an index to each candidate. In the last step, graph partitioning in conjunction with a priori knowledge is used to reject outliers that were not removed by the preceding three steps. In brief, the main contribution of this paper is a simple but robust recognition system for mobile robots that adapts to varying environments through an on-line retrainable neural network.

2. System Overview
Fig. 1 Structure of the CPEs and two image queues

The CPEs consist of four components and two external image storages, the Object Image Queue (ObjIQ) and the Outlier Image Queue (OutIQ) (Fig. 1). Each component has its own unique function.

3. Preprocessing and Neural Network
In the preprocessing component, adaptive thresholding is applied to the whole region of the input image to extract object candidates under varying illumination conditions. The extracted candidates then go through size filtering based on their geometric constraints (Fig. 2).

Fig. 2 Preprocessing

In this study, the neural network, the second component of the CPEs, classifies each candidate image patch as an object or a non-object.
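The preprocessing stage (adaptive thresholding over the whole image followed by size filtering of the connected components) can be sketched as follows. The local-mean thresholding, window size, offset, and area bounds are illustrative assumptions, not the authors' parameters:

```python
import numpy as np
from scipy import ndimage

def extract_candidates(img, win=15, offset=10, min_area=20, max_area=500):
    """Preprocessing sketch: adaptive (local-mean) thresholding over the
    whole image, connected-component labeling, then size filtering of the
    components as a crude geometric constraint. Returns bounding boxes
    (y_min, x_min, y_max, x_max) of surviving candidates."""
    local_mean = ndimage.uniform_filter(img.astype(float), size=win)
    binary = img.astype(float) < local_mean - offset   # dark objects on a light panel
    labels, n = ndimage.label(binary)                  # connected components
    candidates = []
    for i in range(1, n + 1):
        ys, xs = np.nonzero(labels == i)
        if min_area <= ys.size <= max_area:            # size filtering
            candidates.append((ys.min(), xs.min(), ys.max(), xs.max()))
    return candidates

# toy example: one dark 8x8 "button" on a bright background
img = np.full((40, 40), 200, dtype=np.uint8)
img[10:18, 12:20] = 50
boxes = extract_candidates(img)
print(boxes)
```

Because the threshold is computed from the local mean rather than a global constant, the same code tolerates gradual illumination changes across the image, which is the property the CPEs rely on.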
4. Outlier Rejection by Graph Partitioning and A Priori Knowledge
Fig. 3 (a) Control panel model. (b) Graph representation of object candidates

A graph G is usually expressed by a set of vertices V and a set of edges E connecting them. An image space can be transformed into a graph space: object candidates and their relationships in the image space are mapped to vertices and edges, respectively, through the control panel model shown in Fig. 3.

5. Neural Network: From a Retraining Perspective
In most cases, off-line training of the neural network is not enough to cope with all possible variations of the environment, including illumination conditions; hence an on-line retraining behavior that can run in parallel with the recognition process is required. We address four fundamental questions related to on-line retraining of the neural network:
(1) Generation and reorganization of a new training set in real time. This procedure has to be embedded in the recognition process so as to avoid extra processing time for generating new training data; it should also address the problem of forgetting some of the previously learned knowledge in the neural network.
(2) When to retrain. The decision is based on whether the environment has changed.
(3) Evaluation of the retrained neural network. Validation after every retraining is necessary to avoid getting stuck in local minima.
(4) Simultaneous retraining and recognition. The retraining process has to run in parallel with the recognition process.

6. Experimental Results
Fig. 4 directly illustrates the effect of retraining. The red and green squares correspond to candidates that pass the neural network and template matching; the red squares indicate outliers that are removed by the outlier rejection component of the CPEs, and the green ones are the recognized objects.

Fig. 4 Performance improvement by the retrained neural network. (a) Recognition result at frame 42. (b) Recognition result at frame 127.
Fig. 5 Performance comparison with varying OutIQ size

The network performance was measured by the total number of outliers N_outlier occurring in each image set, or by the average number of outliers per image, also called the outlier occurrence rate R_o (Fig. 5).

7. Conclusions
A retrainable framework for object recognition in a visually dynamic environment has been presented. First, in order to extract object candidates, basic image processing techniques including adaptive thresholding, connected component labeling, and template matching were applied to the input image. An on-line retraining neural network, the primary component of the proposed method and serving as a binary classifier, was then employed as a practical tool for coping with varying illumination conditions. The performance improvement of the retrained neural network over a non-retrained one was verified by experiments over hundreds of images, quantified by the difference in the number of outliers that occurred.

Acknowledgments
This work was supported by grant No. RT104-02-06 from the Regional Technology Innovation Program of the Ministry of Commerce, Industry and Energy (MOCIE) of Korea.

References
[1] R. Simmons et al., "GRACE: An autonomous robot for the AAAI Robot Challenge," AI Magazine, vol. 24, no. 2, pp. 51-72, 2003.
Brain-inspired Emergence of Behaviors by Reinforcement Learning
Masumi Ishikawa1, Mikio Morita1, Takao Hagiwara2
1 Department of Brain Science and Engineering, Kyushu Institute of Technology, Email: [email protected]; 2 Cuebs

1. INTRODUCTION
We aim at developing truly autonomous mobile robots capable of behaving appropriately without being told what to do, as mammals do in their daily lives. We believe a developmental approach is effective for realizing this. The desire for existence, specific curiosity, diversive curiosity, boredom, and novelty are important factors contributing to learning and development, and we propose mathematical models for them as internal rewards in reinforcement learning. Schmidhuber proposed two curiosity principles, responding to prediction errors and to predictor improvements [1]. Oudeyer et al. proposed an architecture based on the second curiosity principle [2]. Stout et al. used both external and internal rewards, but the latter may be interpreted as subgoals provided by a designer to accelerate learning [3]. If internal rewards are task dependent and provided by a designer, the resulting mobile robots can no longer claim to be autonomous. In this paper we propose the following, which we believe forms a novel framework for useful and effective intelligent systems. Firstly, we propose using multiple sources of reward to endow mobile robots with the ability to behave properly in the real world, because we think external rewards and curiosity rewards alone are not enough in the complex and context-dependent real world. Secondly, we propose task-independent internal rewards, because task-dependent rewards provided by a designer are not applicable to other tasks as they are, and do not contribute much to genuine autonomy.
Thirdly, we propose to pursue the engineering merit of internal rewards, in addition to their scientific interest. These proposals share one characteristic: the use of reinforcement learning and internal rewards. The reason is that reward-based learning by trial and error is, we believe, essential in developing intelligent behaviors. By contrast, hand-crafted programs tend to impose a burden on humans, supervised learning has difficulty providing teacher signals, and self-organizing learning alone is not powerful enough. A pursuit-evasion game comprising a predator and its prey on a robotic field is selected as a testbed. Simulation experiments as well as real experiments using mobile robots, WITHs, are carried out to demonstrate the utility and benefit of internal rewards in reinforcement learning.

2. OVERVIEW OF THE TASK
To evaluate the utility and benefit of internal rewards, we use a predator and its prey on a robotic field as a testbed. The task is that the predator learns to capture the prey for its survival, while the prey learns to escape from the predator for its survival. The desire for existence is represented by an intrinsic effort to keep the battery level of an agent above a threshold. The agents also learn a map of the environment to make pursuit or evasion more efficient. The following is a prospective task scenario. When its battery level is sufficiently high, an agent efficiently learns the environment, motivated by curiosity, boredom, and novelty [4]. Under this condition, curiosity, either specific or diversive, motivates the agent to learn the environment through a desire to increase accumulated future rewards. When some part of the environment has been sufficiently learned, the agent gets bored and moves away, driven by a negative reward caused by boredom, which provides a stronger motivation than diversive curiosity.
When a novel object is observed, novelty motivates the agent to approach and closely observe it, which in turn arouses specific curiosity. On the other hand, when its battery level becomes low, the predator starts to pursue its prey for survival, taking advantage of the knowledge acquired through curiosity and novelty. Due to space limitations, only the desire for existence is presented here.

3. DESIRE FOR EXISTENCE

To keep the battery level of an agent higher than a threshold, the predator must learn when to start pursuing the prey. If the predator captures the prey too early, i.e., when the battery level is significantly higher than the threshold, the external reward becomes smaller, and hence the accumulated future reward becomes smaller. Therefore, this pursuit timing is an important target of learning.

As is well known, Q-learning suffers from large computational cost. We propose to decrease the number of states by various approximations. The state here is defined by the battery level (20 levels) and the distance to the prey (20 levels), without absolute coordinates. When there are obstacles, however, these are not enough to define the state, and the distance to an obstacle in the forward direction (10 levels) is added to the definition of the state.

The speed of the agent (10 levels) constitutes the action alternatives. The egocentric orientation of motion (5 levels) is added to the definition of the action in the presence of obstacles. The speed of the predator is given by

speed = (specified relative speed) * (maximum speed) * (motion efficiency),

where motion efficiency decreases as the battery level falls off. The prey moves slowly when the distance is large enough, but escapes at the maximum speed when the distance is less than a threshold, which is also to be learned.
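The state and action discretization above, and the speed rule, can be sketched as follows. This is an illustrative toy, not the authors' implementation: the value ranges, the linear motion-efficiency curve, and the greedy action selection are assumptions.

```python
import numpy as np

# Level counts follow the text (battery: 20, distance: 20, speed: 10).
N_BATTERY, N_DIST, N_SPEED = 20, 20, 10
MAX_SPEED = 1.0  # assumed maximum speed

def discretize(value, max_value, n_levels):
    """Map a continuous quantity in [0, max_value] to one of n_levels bins."""
    idx = int(value / max_value * n_levels)
    return min(idx, n_levels - 1)

def predator_speed(relative_speed, battery):
    """speed = (specified relative speed) * (maximum speed) * (motion efficiency);
    motion efficiency is assumed here to decay linearly with the battery level."""
    efficiency = battery  # battery in [0, 1]; linear decay is an assumption
    return relative_speed * MAX_SPEED * efficiency

# Q-table over (battery level, distance level) states and speed actions
# (obstacle-free case, so the extra obstacle-distance state is omitted).
Q = np.zeros((N_BATTERY, N_DIST, N_SPEED))

s = (discretize(0.36, 1.0, N_BATTERY),   # battery at 36%
     discretize(3.0, 10.0, N_DIST))      # 3 m to the prey, assumed 10 m range
a = int(np.argmax(Q[s]))                 # greedy action: index of a speed level
v = predator_speed((a + 1) / N_SPEED, battery=0.36)
```

The pursuit-timing behavior described in the text would then emerge from updating `Q` with rewards that penalize capture at high battery levels.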
4. EXPERIMENTAL RESULTS

When the predator captures the prey at a high battery level, a negative reward is given to suppress too-early capture. Owing to these rewards, the predator is able to learn appropriate timing for effective pursuit. Fig. 1 shows that the remaining battery level at capture is about 36%, which is much lower than the level of about 75% obtained without learning of pursuit timing. Fig. 2 shows that negative rewards are given to adjust the pursuit timing, especially in the early stage of learning. Fig. 3 shows that the predator captures the prey at a low battery level owing to learning of the pursuit timing.

Fig. 1. The level of remaining battery at the time of capture.
Fig. 2. The amount of reward at the time of capture.
Fig. 3. The resulting Q-values after training, over battery level and distance to the prey.

5. CONCLUSIONS

We carried out simulation experiments on the predator-and-prey task to demonstrate the utility and benefit of introducing the desire for existence, specific curiosity, diversive curiosity, boredom, and novelty as internal rewards in reinforcement learning. The skill acquired through learning in simulation was implemented on real mobile robots, WITHs. Simulation experiments as well as real experiments using mobile robots on a robotic field demonstrated the effectiveness of introducing internal rewards in addition to external rewards in reinforcement learning. What was done, however, is still a small step towards truly autonomous mobile robots. Much remains to be done in future study.

This research was supported by a COE program (#J19) granted to KIT by MEXT. We also acknowledge valuable discussions with Profs.
Masahiro Nagamatsu, Takeshi Yamakawa, Kazuo Ishii, Hideki Nakagawa, and Hong Zhang at KIT.

REFERENCES
[1] J. Schmidhuber, "Self-motivated development through rewards for predictor errors/improvements," Developmental Robotics AAAI Spring Symposium, 2005.
[2] P.-Y. Oudeyer, F. Kaplan, and V. V. Hafner, "Intrinsic motivation systems for autonomous mental development," IEEE Trans. Evolutionary Computation, vol. 11, no. 2, pp. 265-286, 2007.
[3] A. Stout, G. D. Konidaris, and A. G. Barto, "Intrinsically motivated reinforcement learning: A promising framework for developmental robot learning," Developmental Robotics AAAI Spring Symposium, 2005.
[4] M. Ishikawa, T. Hagiwara, N. Yamamoto, and F. Kiriake, "Brain-inspired emergence of behaviors in mobile robots by reinforcement learning with internal rewards," HIS 2008 (to appear).

NON-CONVEX RANKING SVM WITH FEWER SUPPORT VECTORS

Hyohyeong Kang, Seungjin Choi
Department of Computer Science, POSTECH, Korea
{paanguin, [email protected]

ABSTRACT

To reduce the number of support vectors of Ranking SVM, we derive a modified Ranking SVM algorithm and compare it with Ranking SVM. One approach to reducing the number of support vectors is to formulate the SVM with a non-convex loss function. We apply that approach to reformulate the objective function of Ranking SVM, and derive an iterative training algorithm using the concave-convex procedure (CCCP). The number of support vectors of the proposed algorithm is reduced while the ranking accuracy is retained. The algorithm improves the scalability of Ranking SVM at the prediction stage.

1. INTRODUCTION

The ranking problem is to learn a ranking over all items from training data. Recently, this problem has been intensively studied in machine learning for developing web search engines and recommendation systems. The problem can be solved by finding a ranking function that returns higher scores for items in higher ranks and lower scores for items in lower ranks. Many algorithms, such as Ranking SVM [1], [2], RankNet [3], and RankBoost [4], have been proposed to learn the ranking function from training data.

Ranking SVM [1], [2] is one algorithm that solves the ranking problem. To formulate the ranking problem, Ranking SVM applies a support vector machine (SVM) [5], a well-known machine learning algorithm for classification and regression. In the classification problem, the SVM finds the separating hyperplane that maximizes the margin, i.e., the distance to the nearest points separated into distinct classes by the hyperplane. Ranking SVM showed that the idea of maximizing the margin can also be applied to the ranking problem [1].

Although Ranking SVM trains an accurate ranking function, it has scalability problems at the prediction stage. In SVMs, the number of support vectors, which are the subset of the training data that constructs the decision boundary, increases linearly with the number of training samples [6]. One approach in classification that reduces the number of support vectors is to replace the convex loss function with a non-convex one [7]. When the hinge loss in the SVM objective function is replaced with a ramp loss, the number of support vectors is reduced and grows more slowly than in standard SVMs. To optimize the non-convex objective function, the concave-convex procedure (CCCP) [8] is applied, and an iterative optimization procedure is derived. In this paper, we apply that approach to reduce the number of support vectors in Ranking SVM. We derive the CCCP for the ramp loss Ranking SVM and compare it with Ranking SVM on synthetic data.

2. RANKING SVM

Let us briefly summarize the Ranking SVM formulation [2]. We are given a set of queries {q_k} and documents {d_i}. For each query, a partial ranking of the corresponding documents can be specified as ordered pairs (q_k, d_i) > (q_k, d_j), which represent that d_i is more relevant than d_j for the query q_k. For simplicity, let us denote the vector corresponding to each query-document pair (q, d) as x and the set of all ordered pairs as R. The goal of the ranking problem is to find the ranking function that generalizes the ranking from the given training samples to test samples. Ranking SVM finds a linear function f(x) = w^T Phi(x) such that

  for all (x_i > x_j) in R,  f(x_i) > f(x_j),

where Phi(x) is a feature mapping function and w is a weight vector learned by the algorithm. The margin for Ranking SVM is the minimum difference f(x_i) - f(x_j) over ordered pairs (x_i > x_j) in R. We can find the w that maximizes the margin by solving the constrained optimization problem

  min_{w, xi} (1/2) ||w||^2 + C Sum_{ij} xi_ij
  subject to w^T (Phi(x_i) - Phi(x_j)) >= 1 - xi_ij,  xi_ij >= 0,  for all (x_i > x_j) in R,  (1)

where C is a constant and xi_ij are slack variables.

3. CCCP-RANKING SVM

Using the hinge loss function H_s(t) = max(0, s - t), we can express the optimization problem (1) as

  min_w (1/2) ||w||^2 + C Sum H_1(w^T (Phi(x_i) - Phi(x_j))).  (2)

The ramp loss function

  R_s(t) = min(1 - s, max(0, 1 - t))

is the non-convex function used to reduce the number of support vectors in [7]. Replacing the hinge loss in (2) with the ramp loss, we obtain

  min_w (1/2) ||w||^2 + C Sum R_s(w^T (Phi(x_i) - Phi(x_j)))  (3)

for a predefined parameter s < 1. Because the ramp loss function can be expressed as R_s(t) = H_1(t) - H_s(t), the objective function (3) can be decomposed as

  J(w) = (1/2) ||w||^2 + C Sum H_1(w^T (Phi(x_i) - Phi(x_j)))   [J_convex(w)]
         - C Sum H_s(w^T (Phi(x_i) - Phi(x_j))),                [J_concave(w)]

i.e., a sum of a convex function J_convex and a concave function J_concave. We can therefore apply the concave-convex procedure (CCCP) [8] to derive an iterative optimization algorithm [7]. Let us denote

  beta_ij = dJ_concave(w) / df_w(Phi(x_i), Phi(x_j)) = C if w^T (Phi(x_i) - Phi(x_j)) < s, and 0 otherwise,

where f_w(Phi(x_i), Phi(x_j)) = w^T (Phi(x_i) - Phi(x_j)). Then, at each iteration of CCCP for the ramp loss Ranking SVM, we solve the convex optimization problem

  w* = arg min_w [ (1/2) ||w||^2 + C Sum H_1(w^T (Phi(x_i) - Phi(x_j))) + Sum beta_ij w^T (Phi(x_i) - Phi(x_j)) ],

where beta_ij is determined by the w of the previous iteration. This problem can be reformulated as a dual problem similar to that of the standard SVM, so we can solve it with the same method used for the standard SVM.

Fig. 1. Number of support vectors vs. number of training samples, for Ranking SVM and the ramp loss Ranking SVM with s = -1, -0.5, and -0.3.
Fig. 2. Averaged performance of Ranking SVM and ramp loss Ranking SVM.

5. CONCLUSION

We implemented the ramp loss Ranking SVM using CCCP and compared its complexity and accuracy with Ranking SVM on two ranking datasets. Compared to Ranking SVM, the proposed algorithm reduced the number of support vectors while keeping the three performance measures approximately the same. The algorithm improves the scalability of Ranking SVM at the prediction stage.

REFERENCES
[1] R. Herbrich, T. Graepel, and K. Obermayer, "Large margin rank boundaries for ordinal regression," MIT Press, Cambridge, MA, 2000.
[2] T. Joachims, "Optimizing search engines using clickthrough data," in KDD '02: Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA: ACM, 2002, pp. 133-142.
[3] C. Burges, T. Shaked, E. Renshaw, A. Lazier, M. Deeds, N. Hamilton, and G. Hullender, "Learning to rank using gradient descent," Bonn, Germany, 2005, pp. 89-96.
[4] Y. Freund and R. E. Schapire, "A decision-theoretic generalization of on-line learning and an application to boosting," in European Conference on Computational Learning Theory, 1995, pp. 23-37.
[5] V. N. Vapnik, The Nature of Statistical Learning Theory. New York, NY, USA: Springer-Verlag, 1995.
[6] I. Steinwart, "Sparseness of support vector machines—some asymptotically sharp bounds," in Advances in Neural Information Processing Systems 16, S. Thrun, L. Saul, and B. Schölkopf, Eds. Cambridge, MA: MIT Press, 2004.
[7] R. Collobert, F. Sinz, J. Weston, and L. Bottou, "Trading convexity for scalability," in ICML '06: Proceedings of the 23rd International Conference on Machine Learning, New York, NY, USA: ACM, 2006, pp. 201-208.
[8] A. L. Yuille and A. Rangarajan, "The concave-convex procedure (CCCP)," in NIPS, 2001, pp. 1033-1040.
[9] K. Järvelin and J. Kekäläinen, "IR evaluation methods for retrieving highly relevant documents," in SIGIR '00: Proceedings of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, New York, NY, USA: ACM Press, 2000, pp. 41-48.

4. NUMERICAL EXPERIMENT

We randomly generated 10,000 10-dimensional vectors, each dimension drawn from a uniform distribution on [0, 1]. We assigned relevance scores using a linear function f(x) = w^T x + e, where w was also randomly generated from the uniform distribution and e is Gaussian noise drawn from the standard normal distribution.
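The data generation just described, together with the ramp loss R_s(t) = H_1(t) - H_s(t) and the CCCP weights beta_ij from Section 3, can be sketched as follows (a toy illustration under assumed values of C and s, not the authors' implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic ranking data as described in the experiment section.
X = rng.uniform(0.0, 1.0, size=(10_000, 10))       # 10,000 10-dim vectors
w_true = rng.uniform(0.0, 1.0, size=10)
scores = X @ w_true + rng.standard_normal(10_000)  # f(x) = w^T x + e

def hinge(t, s=1.0):
    """H_s(t) = max(0, s - t)."""
    return np.maximum(0.0, s - t)

def ramp(t, s=-1.0):
    """R_s(t) = min(1 - s, max(0, 1 - t)), which equals H_1(t) - H_s(t)."""
    return np.minimum(1.0 - s, np.maximum(0.0, 1.0 - t))

def cccp_beta(w, X_i, X_j, C=1.0, s=-1.0):
    """CCCP weights: beta_ij = C when w^T (x_i - x_j) < s, else 0."""
    margins = (X_i - X_j) @ w
    return np.where(margins < s, C, 0.0)

# Sanity check of the ramp-loss identity on a grid of margins.
t = np.linspace(-3, 3, 7)
assert np.allclose(ramp(t, -1.0), hinge(t, 1.0) - hinge(t, -1.0))
```

Each CCCP iteration would then solve the convex subproblem of Section 3 with these `beta` weights held fixed; flattening the ramp (more negative `s`) zeroes out more pairs and hence yields fewer support vectors.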
We compared the number of support vectors of Ranking SVM and the ramp loss Ranking SVM with three distinct values of s: -1, -0.5, and -0.3. We compared ranking quality using three measures: [email protected] ([email protected]), mean average precision (MAP), and normalized discounted cumulative gain ([email protected]) [9]. The ramp loss Ranking SVM reduced the number of support vectors (Fig. 1) while retaining the quality of the ranking (Fig. 2).

Nonnegative Tensor Factorization for EEG-based BCI

Hyekyoung Lee, Yong-Deok Kim, Seungjin Choi
Department of Computer Science, POSTECH, Korea
{leehk, karma13, [email protected]

Abstract

In this paper we present a method for continuous EEG classification, where we employ nonnegative tensor factorization (NTF) to determine discriminative spectral features and use the Viterbi algorithm to continuously classify multiple mental tasks. This is an extension of our previous work on the use of nonnegative matrix factorization (NMF) for EEG classification. Numerical experiments with two data sets from BCI competitions confirm the useful behavior of the method for continuous EEG classification.

1. INTRODUCTION

A brain-computer interface (BCI) is a system designed to translate a subject's intention or mind into a control signal for a device such as a computer, a wheelchair, or a neuroprosthesis. BCI provides a new communication channel between the human brain and a computer, and adds a new dimension to human-computer interfaces (HCI). The most popular sensory signal used for BCI is the electroencephalogram (EEG), multivariate time series data in which electrical potentials induced by brain activity are recorded on the scalp. Exemplary spectral characteristics of EEG in motor imagery tasks, which are considered in this paper, are the mu rhythm (8-12 Hz) and the beta rhythm (18-25 Hz), which decrease during movement or in preparation for movement (event-related desynchronization, ERD) and increase after movement and in relaxation (event-related synchronization, ERS). However, these phenomena can occur in different frequency bands depending on the subject; for instance, they might occur at 16-20 Hz rather than 8-12 Hz. Therefore, methods for determining meaningful discriminative features become important.

Nonnegative matrix factorization (NMF) is an interesting linear data model, appropriate for handling nonnegative data [3]. NMF allows only non-subtractive combinations of nonnegative basis vectors, providing a parts-based representation. The time-frequency representation of EEG data computed by the short-time Fourier transform or the wavelet transform can be cast into a nonnegative data matrix. Recently, NMF was shown to be useful in determining discriminative basis vectors which well reflect meaningful spectral characteristics, without cross-validation, in a motor imagery EEG task.

Multiway analysis extends the aforementioned linear methods, working with multiway data arrays, referred to as tensors. PARAFAC and multiway SVD (higher-order SVD) are exemplary methods extending factor analysis and SVD. The PARAFAC model has been exploited in EEG data analysis. Nonnegative tensor factorization (NTF) incorporates nonnegativity constraints into the PARAFAC model, extending NMF in the framework of tensor algebra. Various information divergences have been employed in NTF [1]. In this paper, we revisit the PARAFAC model and present an NTF algorithm in a compact form in the framework of tensor algebra. We then cast the time-frequency representation of multichannel EEG data into an N-way tensor and apply NTF to determine discriminative spectral features. With these features, we use the Viterbi algorithm for continuous EEG classification with no trial structure.

2.
METHOD AND EXPERIMENTAL RESULTS

For our empirical study, we used two data sets: dataset III of BCI competition II, provided by the Laboratory of Brain-Computer Interfaces (BCI-Lab), Graz University of Technology, and dataset V of BCI competition III, provided by the IDIAP Research Institute. The Graz dataset involves left/right imagery hand movements and consists of 140 labeled trials for training and 140 unlabeled trials for test. It contains EEG acquired from two channels, C3 and C4. The IDIAP dataset contains EEG data recorded from 3 normal subjects with eight channels (C3, Cz, C4, CP1, CP2, P3, Pz, and P4) and involves three tasks: the imagination of repetitive self-paced left/right hand movements and the generation of words beginning with the same random letter. The EEG data are not split into trials, since the subjects continuously perform the mental tasks (i.e., there is no trial structure).

Preprocessing. We construct various types of data tensors from spectral EEG data obtained by the short-time Fourier transform or the wavelet transform, as summarized in Table 1.

Figure 1. 3-way tensor (PARAFAC model). PARAFAC finds the meaningful component matrices for each mode.

Table 1. N-way data tensors used for NTF: 2-way, 3-way, 4-way (time), and 4-way (class) tensors for the Graz and IDIAP data.

Figure 2. An illustration of a four-way NTF of EEG data (frequency x time x channels x trials). In the left figure, the structure of the spectral EEG data tensor is illustrated. To display the four-way tensor structure, we place three-way tensors in a column (of trials): the upper 70 trials are imagination of left-hand movement, and the lower 70 trials are imagination of right-hand movement. In the right figure, the results of four-way NTF with four components are shown. The core tensor G is a super-diagonal tensor. The components of a factor matrix are displayed in a column. Each factor matrix represents the components of the frequency, time, space (C3 and C4 channels), and trial domains, respectively. Component 1 is generated by the imagination of left-hand movement, and Component 2 by that of right-hand movement. Component 3 shows that a subject cannot concentrate on the imagination for the specific hand after a few seconds have passed in a trial. The theta rhythm (4-8 Hz), which is related to quiet concentration, is shown in Component 4.

Feature Extraction. We extract discriminative spectral features by applying NTF to the preprocessed data. Depending on the way the data tensor is constructed, the classification results differ slightly. One exemplary result of NTF is shown in Figure 2.

Classification. For single-trial online classification of the Graz data (with trial structure), we use a Gaussian probabilistic model-based classifier, in which Gaussian class-conditional probabilities for a single point in time are integrated temporally by taking the expectation of the class probabilities with respect to the discriminative power at each point in time [2]. For online classification of the IDIAP data, which consist of uncued EEG signals, we use the Viterbi algorithm, a dynamic programming algorithm for finding the most probable sequence of hidden states that explains a sequence of observations. Classification results are shown in Figure 3 and Table 2.

Figure 3. Experimental results of the Graz data (left) and the IDIAP data (right) in the time domain.
Table 2. Classification accuracy of the IDIAP data.

3. CONCLUSIONS

We have presented an NTF-based method of feature extraction and the Viterbi algorithm for continuous EEG classification. The NMF-based method, a linear data model with nonnegativity constraints, could find discriminative and representative basis vectors (which reflected proper spectral characteristics) without cross-validation. NTF, a more general framework with multiway structure, can find hidden structure in additional dimensions such as time or class. Continuous EEG classification does not need the trial structure: the Viterbi algorithm can use the classifier to infer the hidden labels from the observed data sequence. Our experiments show that it can be applied successfully to uncued EEG classification.

ACKNOWLEDGMENTS

This work was supported by the KOSEF Basic Research Program (grant R01-2006-000-11142-0) and the National Core Research Center for Systems Bio-Dynamics.

REFERENCES
[1] H. Lee, Y.-D. Kim, A. Cichocki, and S. Choi, "Nonnegative tensor factorization for continuous EEG classification," International Journal of Neural Systems, vol. 17, no. 4, pp. 305-317, August 2007.
[2] S. Lemm, C. Schafer, and G. Curio, "BCI competition 2003—data set III: Probabilistic modeling of sensorimotor mu rhythms for classification of imaginary hand movements," IEEE Trans. Biomedical Engineering, 51(6), 2004.
[3] H. Lee, A. Cichocki, and S. Choi, "Nonnegative matrix factorization for motor imagery EEG classification," Proc. Int'l Conf. Artificial Neural Networks, 2006.

Orthogonal Nonnegative Matrix Factorization: Multiplicative Updates on Stiefel Manifolds

Jiho Yoo, Seungjin Choi
Department of Computer Science, POSTECH, Korea
{zentasis, [email protected]

Abstract

Nonnegative matrix factorization (NMF) is a popular method for multivariate analysis of nonnegative data, the goal of which is to decompose a data matrix into a product of two factor matrices with all entries restricted to be nonnegative. NMF has been shown to be useful in clustering tasks (especially document clustering). In this paper we present an algorithm for orthogonal nonnegative matrix factorization, where an orthogonality constraint is imposed on the nonnegative decomposition of a term-document matrix. We develop multiplicative updates directly from the true gradient on the Stiefel manifold, whereas existing algorithms consider additive orthogonality constraints. Experiments on several document data sets show that our orthogonal NMF algorithm performs better in a clustering task than standard NMF and an existing orthogonal NMF.

1. INTRODUCTION

Nonnegative matrix factorization (NMF) is a multivariate analysis method proven to be useful in learning a faithful representation of nonnegative data such as images, spectrograms, and documents [1]. Incorporating extra constraints such as locality and orthogonality was shown to improve the decomposition, identifying better local features or providing sparser representations [2]. Orthogonality constraints have been imposed on NMF [3], where a nice clustering interpretation was studied in the framework of NMF. A prominent application of NMF is document clustering [4, 5], where a decomposition of a term-document matrix is considered.

In this paper we consider orthogonal NMF and its application to document clustering, where an orthogonality constraint is imposed on the nonnegative decomposition of a term-document matrix. We develop new multiplicative updates for orthogonal NMF, derived directly from the true gradient on the Stiefel manifold, while existing algorithms consider additive orthogonality constraints. Experiments on several document data sets show that our orthogonal NMF algorithm performs better in a clustering task than standard NMF and an existing orthogonal NMF.

2. NMF FOR DOCUMENT CLUSTERING

NMF seeks a decomposition of X in R^{m x N} of the form

  X ~ U V^T,  (1)

where U in R^{m x K} and V in R^{N x K} are restricted to be nonnegative matrices and K corresponds to the number of clusters when NMF is used for clustering. Applying NMF to a term-document matrix for document clustering, each column of X is treated as a data point in m-dimensional space.
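As a toy sketch of decomposition (1), the standard multiplicative updates and the orthogonality-preserving V-update discussed in this paper look as follows. Matrix sizes and iteration counts are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
m, N, K = 50, 30, 4                      # hypothetical sizes
X = rng.random((m, N))                   # nonnegative "term-document" matrix

U = rng.random((m, K))
V = rng.random((N, K))
eps = 1e-9                               # guards against division by zero

for _ in range(200):
    # Standard NMF multiplicative updates (Lee-Seung); nonnegativity is
    # preserved because every factor in the ratio is nonnegative.
    U *= (X @ V) / (U @ V.T @ V + eps)
    V *= (X.T @ U) / (V @ U.T @ U + eps)

err_nmf = np.linalg.norm(X - U @ V.T)

# Orthogonal NMF variant: the V-update replaces the denominator,
#     V <- V * (X^T U) / (V U^T X V),
# so that the iterate tends toward the constraint V^T V = I.
Uo = rng.random((m, K))
Vo = rng.random((N, K))
for _ in range(200):
    Uo *= (X @ Vo) / (Uo @ Vo.T @ Vo + eps)
    Vo *= (X.T @ Uo) / (Vo @ Uo.T @ X @ Vo + eps)

orthogonality = np.linalg.norm(Vo.T @ Vo - np.eye(K))
```

In the clustering interpretation described next, each document (column of X, row of V) would be assigned to the cluster with the largest entry in its row of V.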
In such a case, the factorization (1) is interpreted as follows:

- U_ij corresponds to the degree to which term t_i belongs to cluster c_j. In other words, column j of U, denoted by u_j, is associated with a prototype vector (center) for cluster c_j.
- V_ij corresponds to the degree to which document d_i is associated with cluster j. With appropriate normalization, V_ij is proportional to the posterior probability of cluster c_j given document d_i.

2.1 Multiplicative updates for NMF

With the least squares error function, NMF involves the following optimization:

  arg min_{U>=0, V>=0} e = (1/2) ||X - U V^T||^2.  (2)

Gradient descent (an additive update) can be applied to determine a solution to (2); however, nonnegativity of U and V is not preserved without further operations at each iteration. On the other hand, the multiplicative method developed in [6] provides a simple algorithm for (2). We give a slightly different approach from [6] to derive the same multiplicative algorithm. Suppose that the gradient of the error function has a decomposition of the form

  grad e = [grad e]^+ - [grad e]^-,  (3)

where [grad e]^+ > 0 and [grad e]^- > 0. Then the multiplicative update for parameters Theta has the form

  Theta <- Theta .* ([grad e]^- / [grad e]^+)^.eta,  (4)

where .* represents the Hadamard product (element-wise product), (.)^.eta denotes the element-wise power, and eta is a learning rate (0 < eta <= 1). It can easily be seen that the multiplicative update (4) preserves the nonnegativity of the parameter Theta, while grad e = 0 when convergence is achieved. With the gradient calculations of (2), the rule (4) with eta = 1 yields the well-known multiplicative updates of Lee and Seung [6]:

  U <- U .* (X V) / (U V^T V),  (5)
  V <- V .* (X^T U) / (V U^T U),  (6)

where the division is also an element-wise operation.

3. ORTHOGONAL NMF FOR DOCUMENT CLUSTERING

Orthogonal NMF with V^T V = I is formulated as the following optimization problem:

  arg min_{U, V} e = (1/2) ||X - U V^T||^2
  subject to V^T V = I, U >= 0, V >= 0.  (7)

In general, such a constrained optimization problem is solved by introducing a Lagrangian with a penalty term tr{Lambda (V^T V - I)}, where Lambda is a symmetric matrix containing Lagrange multipliers. Ding et al. [3] took this approach with some approximation, developing multiplicative updates. Here we present a different approach, incorporating into (4) the gradient on the constraint surface on which V^T V = I is satisfied. With U fixed in (7), we treat the objective as a function of V. Minimizing it where V is constrained to the set of N x K matrices such that V^T V = I was well studied in [7]. Here we incorporate nonnegativity constraints on V to develop multiplicative updates that preserve the orthogonality constraint V^T V = I. The constraint surface, the set of N x K orthonormal matrices such that V^T V = I, is known as the Stiefel manifold [8]. The gradient on the Stiefel manifold, grad~_V e, can be computed as

  grad~_V e = grad_V e - V (grad_V e)^T V,  (8)

where grad_V e is the gradient computed in Euclidean space. Thus, with the partial derivatives of (7), the gradient on the Stiefel manifold is calculated as

  grad~_V e = (-X^T U + V U^T U) - V (-X^T U + V U^T U)^T V
            = V U^T X V - X^T U
            = [grad~_V e]^+ - [grad~_V e]^-.  (9)

Invoking the relation (4) with grad_V e replaced by grad~_V e yields

  V <- V .* (X^T U) / (V U^T X V),  (10)

which is our ONMF algorithm. The updating rule for U is the same as (5).

4. NUMERICAL EXPERIMENTS

We tested our orthogonal NMF algorithm on six standard document datasets (CSTR, wap, k1a, k1b, re0, and re1) and compared its performance with standard NMF and Ding's orthogonal NMF (DTPP) [3]. We applied stemming and stop-word removal to each dataset and selected 1,000 terms based on mutual information with the class labels. Normalized-cut weighting [4] was applied to the input data matrix. We use accuracy to compare the clustering performance of the different algorithms. Because the algorithms give different results depending on the initial conditions, we calculated the mean over 100 runs with different initial conditions. Our orthogonal NMF algorithm gave better performance than standard NMF and DTPP for most of the datasets (Table 1). The orthogonality of the matrix V was also measured using ||V^T V - I||; the change of this orthogonality over the iterations was measured and averaged over 100 trials.

Table 1. Mean clustering accuracies (n = 100) of standard NMF, Ding's orthogonal NMF (DTPP), and our orthogonal NMF (ONMF) for six document datasets.

  dataset  NMF     DTPP    ONMF
  cstr     0.7568  0.7268  0.7844
  wap      0.4744  0.4281  0.4917
  k1a      0.4773  0.4311  0.4907
  k1b      0.7896  0.6087  0.8109
  re0      0.3624  0.3384  0.3691
  re1      0.4822  0.4452  0.5090

5. CONCLUSIONS

We have developed multiplicative updates on the Stiefel manifold for orthogonal NMF. We have successfully applied the algorithm to a document clustering task, empirically indicating performance gains over standard NMF and an existing orthogonal NMF.

※ This paper is a shortened version of a recently accepted paper [9].

REFERENCES
[1] Lee, D.D., Seung, H.S.: Learning the parts of objects by non-negative matrix factorization. Nature 401 (1999) 788-791
[2] Li, S.Z., Hou, X.W., Zhang, H.J., Cheng, Q.S.: Learning spatially localized parts-based representation. In: Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), Kauai, Hawaii (2001) 207-212
[3] Ding, C., Li, T., Peng, W., Park, H.: Orthogonal nonnegative matrix tri-factorizations for clustering. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), Philadelphia, PA (2006)
[4] Xu, W., Liu, X., Gong, Y.: Document clustering based on non-negative matrix factorization. In: Proceedings of the ACM SIGIR International Conference on Research and Development in Information Retrieval (SIGIR), Toronto, Canada (2003)
[5] Shahnaz, F., Berry, M., Pauca, P., Plemmons, R.: Document clustering using nonnegative matrix factorization. Information Processing and Management 42 (2006) 373-386
[6] Lee, D.D., Seung, H.S.: Algorithms for non-negative matrix factorization. In: Advances in Neural Information Processing Systems (NIPS). Volume 13. MIT Press (2001)
[7] Edelman, A., Arias, T., Smith, S.T.: The geometry of algorithms with orthogonality constraints. SIAM J. Matrix Anal. Appl. 20(2) (1998) 303-353
[8] Stiefel, E.: Richtungsfelder und Fernparallelismus in n-dimensionalen Mannigfaltigkeiten. Commentarii Math. Helvetici 8 (1935-1936) 305-353
[9] Jiho Yoo and Seungjin Choi: Orthogonal nonnegative matrix factorization: Multiplicative updates on Stiefel manifolds. In: Proceedings of the 9th International Conference on Intelligent Data Engineering and Automated Learning (IDEAL 2008), Daejeon, Korea, November 2-5, 2008

Facial Expression Recognition using Feature Tracking

Jongju Shin, Daijin Kim
Department of Computer Science, POSTECH, Korea
{jjshin, [email protected]

Abstract

In this paper, we propose facial expression recognition using feature tracking. Facial features are tracked using the Inverse Compositional Image Alignment (ICIA) algorithm, which is faster than conventional tracking algorithms such as Lucas-Kanade [3]. To initialize the facial features, we use Active Appearance Models (AAMs): an AAM generates a model instance similar to the input face image and finds the facial features automatically. A support vector machine (SVM) is used to classify the facial expression from the shape parameters and the appearance parameters of the tracking result. The classification rate is more than 94%.

Fig. 1. The face with 18 landmark points. The 18 tracking regions are represented as squares.
Each square covers 11-by-11 pixels.

1. INTRODUCTION
Facial expression recognition is an important area of Human-Computer Interaction (HCI). Methods for facial expression recognition fall into two approaches: holistic and analytic. In the holistic approach, the face is represented as a whole unit; usually the intensities of the whole face are used to recognize the facial expression. In the analytic approach, the face is modeled as a set of facial points; Active Appearance Models (AAMs) [1] and Active Shape Models (ASMs) [2] are used to model the face. An AAM uses the whole face's intensities with 70 points to model the face, but to recognize the facial expression the whole face and all 70 feature points are not necessary: we only need to know where the eyes, eyebrows, and mouth are, because they are related to the facial expression. We use a feature tracking algorithm, Inverse Compositional Image Alignment (ICIA), on certain regions of interest (ROIs) for facial expression recognition. Before tracking the features of the face, AAM is used to detect them in the first frame; the features are then tracked in each subsequent frame.

2. 2D Face Model
The face is represented by 18 points as in figure (1); 18 points are enough to recognize the facial expression. The face shape model is represented by the combination of the mean shape and the shape basis:
$s = s_0 + \sum_i p_i s_i,$  (1)
where $p_i$ are the shape parameters and $s_i$ is the shape basis.

3. Inverse Compositional Image Alignment
To track the features of the face, we use an image alignment algorithm. The Lucas-Kanade (LK) algorithm [3] is the best-known image alignment algorithm, but it must compute the Jacobian, the steepest-descent images, and the Hessian in every iteration. To reduce this computational cost, the Inverse Compositional Image Alignment (ICIA) algorithm [4] was proposed by Matthews and Baker. ICIA exchanges the roles of the model and the input image, so the Jacobian, the steepest-descent images, and the Hessian are computed only once, before iteration. ICIA is therefore faster than LK while showing similar tracking results. The minimization term of the ICIA algorithm is
$\sum_{\mathbf{x}} \left[ T(W(\mathbf{x}; \Delta\mathbf{p})) - I(W(\mathbf{x}; \mathbf{p})) \right]^2,$  (2)
where $T$ is the model and $I$ is the input image. A first-order Taylor expansion of (2) gives
$\sum_{\mathbf{x}} \left[ T(W(\mathbf{x}; \mathbf{0})) + \nabla T \frac{\partial W}{\partial \mathbf{p}} \Delta\mathbf{p} - I(W(\mathbf{x}; \mathbf{p})) \right]^2.$  (3)
The least-squares solution for $\Delta\mathbf{p}$ is
$\Delta\mathbf{p} = H^{-1} \sum_{\mathbf{x}} \left[ \nabla T \frac{\partial W}{\partial \mathbf{p}} \right]^T \left[ I(W(\mathbf{x}; \mathbf{p})) - T(\mathbf{x}) \right],$  (4)
where $H$ is the Hessian matrix:
$H = \sum_{\mathbf{x}} \left[ \nabla T \frac{\partial W}{\partial \mathbf{p}} \right]^T \left[ \nabla T \frac{\partial W}{\partial \mathbf{p}} \right].$  (5)
$\Delta\mathbf{p}$ is applied iteratively through the warp update (6):
$W(\mathbf{x}; \mathbf{p}) \leftarrow W(\mathbf{x}; \mathbf{p}) \circ W(\mathbf{x}; \Delta\mathbf{p})^{-1}.$  (6)

4. Tracking
Instead of the model $T$, we use frame $n-1$ as the model, supposing that we know the parameters $\mathbf{p}$ for frame $n-1$. The minimization term is then
$\sum_{\mathbf{x}} \left[ I_{n-1}(W(\mathbf{x}; \Delta\mathbf{p})) - I_n(W(\mathbf{x}; \mathbf{p})) \right]^2,$  (7)
where $I_{n-1}$ is frame $n-1$ used as the model and $I_n$ is the current input frame. The error term from equation (7) is
$\left[ I_n(W(\mathbf{x}; \mathbf{p})) - I_{n-1}(\mathbf{x}) \right].$  (8)
To make the tracker robust, we add the first frame:
$(1-\varepsilon)\left( I_n(W(\mathbf{x}; \mathbf{p})) - I_{n-1}(\mathbf{x}) \right) + \varepsilon \left( I_n(W(\mathbf{x}; \mathbf{p})) - I_1(\mathbf{x}) \right).$  (9)
This error term reduces drift during tracking.

5. Initialization using AAM
In the tracking problem, we assumed that the parameters $\mathbf{p}$ are known for frame $n-1$, so a feature detection algorithm is needed for the first frame. We build Active Appearance Models of the face with 64 landmarks as in figure (2).
Figure (2). The mean shape of AAM with 64 landmarks.
In figure (2), the mean shape does not contain the cheek, which is not needed for facial expression recognition. The AAM algorithm detects the facial feature points automatically, so the feature points can then be tracked. However, AAM sometimes fails when the input face is statistically far from the mean shape; figure (3) shows such a failure case. Because of this problem, AAM is used only at the first frame, and the features from AAM are then tracked by the ICIA algorithm.

6. Facial Expression Recognition using SVM
To recognize the facial expression, we use the shape parameters and the appearance parameters simultaneously. The shape parameters are obtained during tracking; the appearance parameters are computed by projection onto the appearance basis:
$\lambda = A_{basis}^T (A - A_0),$  (10)
where $A$ is the appearance of the input image, $A_0$ is the mean appearance, and $A_{basis}$ is the appearance basis. The two parameter sets are concatenated as $[\mathbf{p}\ \lambda]$ and used as the input of the RBF kernel function:
$K(x_i, x_j) = \exp\left( -\frac{(x_i - x_j)^2}{2\sigma^2} \right).$  (11)
We classify four facial expressions: neutral, angry, happy, and surprise.

7. Experiments
We tested 200 images for each facial expression. The classification results are given in table (1).

  neutral   199/200
  angry     177/200
  happy     195/200
  surprise  188/200
  Total     759/800 (94.9%)
Table (1). The result of the facial expression recognition.

Neutral and happy are classified well, but the classification result for angry is lower than the others: angry is sometimes misclassified as neutral because the shape parameters of angry do not differ much from those of neutral.

8. Conclusions
We implemented the Inverse Compositional Image Alignment (ICIA) algorithm to track facial features, and a Support Vector Machine to classify the facial expression from the shape and appearance parameters. Because the facial feature tracking result is better than AAM alone, the facial expression recognition result using feature tracking is better than that using AAM.

References
[1] T.F. Cootes, G.J. Edwards and C.J. Taylor. Active appearance models. In Proceedings of the European Conference on Computer Vision, Vol. 2, pp. 484-498, Springer, 1998.
[2] T.F. Cootes, D. Cooper, C.J.
Taylor and J. Graham. Active Shape Models - their training and application. Computer Vision and Image Understanding, Vol. 61, No. 1, Jan. 1995, pp. 38-59.
[3] B.D. Lucas and T. Kanade. An iterative image registration technique with an application to stereo vision. In Proceedings of the 7th International Joint Conference on Artificial Intelligence (IJCAI '81), April 1981, pp. 674-679.
[4] S. Baker, R. Gross, and I. Matthews. Lucas-Kanade 20 years on: a unifying framework: Part 3. Robotics Institute, Carnegie Mellon University, Pittsburgh, PA, Tech. Rep. CMU-RI-TR-03-35, November 2003.

Figure (3). An example of AAM failure: AAM fails to detect the features of the mouth.

A Generative/Discriminative Hybrid Model for Discriminative Motif Discovery
Jong Kyoung Kim and Seungjin Choi
Department of Computer Science, POSTECH, Korea
{blkimjk, [email protected]

Abstract
We present a probabilistic model to identify the binding sites of transcription factors in a discriminative setting. We compare generative and discriminative approaches for learning and provide a hybrid generative/discriminative model to take advantage of both methods. In particular, our hybrid model can make use of unlabeled DNA sequences for semi-supervised learning. We demonstrate the performance of our hybrid model using both synthetic and real biological data for discovering DNA motifs. Our results show that the optimum performance is obtained between the purely generative and the purely discriminative settings, and that semi-supervised learning improves the performance when the labeled sequences are limited.

1. INTRODUCTION
The transcription of protein-coding genes is tightly regulated by DNA-binding transcription factors (TFs). TFs bind to specific DNA regulatory sequences which are often referred to as transcription factor binding sites (TFBSs). The precise identification of these regulatory sequences is the very first step toward a full understanding of the complex gene regulatory networks consisting of thousands of TFs and their target genes. The main bottleneck in searching for and predicting novel TFBSs is the fact that the binding sites of the same TF are short (typically less than 12-14 base pairs in length) and degenerate (allowing multiple mismatches) [1]. This property makes the prediction problem highly complicated because it leads to high false positive and false negative rates in genome-scale applications. In addition, randomly occurring sequence patterns and repeating elements in eukaryotic genomes make the pattern detection problem even harder. To reduce this intrinsic error, a variety of integrative approaches using additional biological information have been proposed. Recent methods for discriminative motif discovery, a specific example of the integrative approaches, attempt to find binding sites of an unknown TF that are relatively over-represented in a positive set of sequences compared to a negative set. The concept of relative over-representation, in contrast to classic non-discriminative motif discovery in which only one set of sequences is considered to find over-represented binding sites, can help distinguish genuine TFBSs from repeating elements or uninteresting TFBSs, because the negative set provides an informative criterion to separate them. The positive and negative sets of sequences for discriminative motif discovery can be easily obtained by combining additional experimental data. However, it is difficult to provide enough labeled sequences with high confidence, so a semi-supervised learning approach which uses unlabeled sequences should be incorporated. In this paper we propose a probabilistic model for discriminative motif discovery. Our model has two essential features that distinguish our approach from other existing methods. First, we develop a hybrid model combining generative and discriminative approaches to take advantage of both methods. Second, we can make use of unlabeled sequences to expand the training set of labeled sequences. We consider a motif model Θ as a position frequency matrix which specifies a probability distribution over the four nucleotides for each position within a motif site.

2. GENERATIVE/DISCRIMINATIVE HYBRID MODEL
Recently, several hybrid models of generative and discriminative learning have been proposed. The hybrid models can be divided into two groups according to how they combine the generative and discriminative likelihoods. The first approach considers the two likelihoods separately and then combines the log-transformed likelihoods linearly [2,3]. The second approach defines two probabilistic models for the generative and discriminative components, with two separate parameter sets, and then combines them through a joint prior distribution over the two parameters [4]. We adopt the second approach because it is based on the true likelihood of a well-defined model. We first consider a hybrid model with two separate motif models Θ_G and Θ_D for the generative and discriminative components, respectively. The two motif models are treated as random variables in order to impose a prior distribution: we consider a joint prior over Θ_G and Θ_D in which the variance s(α) of the conditional distribution is controlled by a parameter α ∈ [0,1]. For α → 0, our model corresponds to the generative model; conversely, for α → 1, we obtain the discriminative model. The joint distribution for this hybrid model combines the two likelihoods with this prior, and the marginal likelihood of the input sequences can be interpreted as parameter regularization. We estimate the parameters of the hybrid model by maximizing the log-likelihood using a gradient-based optimization algorithm.

3. EXPERIMENTAL RESULTS
In this section, we show experimental results that demonstrate the ability to discover motifs in synthetic and real biological data. In the first experiment, with synthetic data, we address the effect of α on identifying motif sites and the merit of semi-supervised learning. Figure 1(a) depicts the effect of α on the performance coefficient; we ran each setting 10 times with different random initializations and report the average values with ±1 standard deviation.
Fig. 1. Effect of α and of semi-supervised learning.
Table 1. Results on yeast ChIP-chip data. A: AlignACE, C: CONVERGE, D: MDScan, K: Kellis, M: MEME, N: MEME_c. α = 0.8.
From the results, we see that α strongly influences the performance of motif discovery, and the best performance is obtained between the generative and discriminative extremes. Figure 1(b) shows that semi-supervised learning greatly improves the performance when the number of labeled sequences is very small. In the second experiment, with yeast ChIP-chip data [5], we compare our method to six popular motif discovery algorithms to assess the merits of discriminative and semi-supervised learning for motif discovery. We define the positive set as the probe sequences found to bind a TF (p-value ≤ 0.001). The negative set consists of randomly selected probe sequences found not to bind the TF (p-value > 0.5), with twice as many sequences as the positive set. Unlabeled sequences are defined as probe sequences with marginal confidence (0.001 < p-value ≤ 0.002). We selected 76 TFs which have known binding specificity and non-empty unlabeled sequence sets. Table 1 shows the number of successfully identified TF motifs out of the 76 TFs.
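The construction of the positive, negative, and unlabeled probe sets from binding p-values can be sketched as follows; the probe names and p-values are made up for illustration, and the 2:1 random subsampling of the negative set described in the text is omitted:

```python
# Split probe sequences by binding p-value, using the thresholds in the text:
# positive: p <= 0.001, unlabeled: 0.001 < p <= 0.002, negative: p > 0.5.
probes = {"p1": 0.0005, "p2": 0.0015, "p3": 0.8, "p4": 0.0009, "p5": 0.3}

positive  = [name for name, pv in probes.items() if pv <= 0.001]
unlabeled = [name for name, pv in probes.items() if 0.001 < pv <= 0.002]
negative  = [name for name, pv in probes.items() if pv > 0.5]

# p5 (p-value 0.3) falls in none of the three bands and is discarded.
print(positive, unlabeled, negative)  # ['p1', 'p4'] ['p2'] ['p3']
```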
The motif discovery ability of our method is comparable to the other motif discovery algorithms, and using unlabeled sequences increases its performance by successfully identifying motifs with small training data.

4. CONCLUSION
We have presented a probabilistic model for discriminative motif discovery which can make use of unlabeled sequences and prior knowledge of known motif sites. We also provided a hybrid generative/discriminative model that takes advantage of both learning approaches. Our model was compared with six different probabilistic models for motif discovery in terms of a unifying probabilistic framework. We observed improved performance on synthetic and real biological data by combining generative and discriminative approaches and augmenting the training data with unlabeled sequences. In future work, we will incorporate the evolutionary conservation of motifs by introducing a joint prior distribution on the motif models of closely related species (transfer learning) and by using strongly labeled sites obtained from sequence conservation.

References
[1] G. Pavesi, G. Mauri, and G. Pesole. In silico representation and discovery of transcription factor binding sites. Brief Bioinform, 5:217-236, 2004.
[2] O. Yakhnenko, A. Silvescu, and V. Honavar. Discriminatively trained Markov model for sequence classification. In ICDM, 2005.
[3] A. McCallum, C. Pal, G. Druck, and X. Wang. Multi-conditional learning: generative/discriminative training for clustering and classification. In AAAI, 2006.
[4] J. A. Lasserre, C. M. Bishop, and T. P. Minka. Principled hybrids of generative and discriminative models. In CVPR, 2006.
[5] C. T. Harbison, D. B. Gordon, T. I. Lee, N. J. Rinaldi, K. D. Macisaac, T. W. Danford, N. M. Hannett, J. B. Tagne, D. B. Reynolds, J. Yoo, E. G. Jennings, J. Zeitlinger, D. K. Pokholok, M. Kellis, P. A. Rolfe, K. T. Takusagawa, E. S. Lander, D. K. Gifford, E. Fraenkel, and R. A. Young. Transcriptional regulatory code of a eukaryotic genome. Nature, 431:99-104, 2004.
Robust Lip Feature Tracking for Lip-Reading
Jin Lee, Daijin Kim
Department of Computer Science, POSTECH, Korea
{leejin, [email protected]

Abstract
Speech recognition tools show recognition rates above 90% under quiet conditions, but the recognition rate falls rapidly as the noise rises, so many researchers have studied audio-visual speech recognition systems, which are more stable in noisy environments. As a preliminary study for a visual speech recognition system, we study robust visual feature tracking methods. In this paper, we suggest a robust lip feature tracking method that uses an active appearance model to initialize the lip shape model and a Lucas-Kanade (LK) feature tracking algorithm to track the lip features. In our experiment, the fitting remains stable and robust for more than 2000 frames after initialization on the first frame.

1. INTRODUCTION
Audio-based speech recognition is very weak in noisy environments, so many researchers have studied audio-visual speech recognition for robustness under noise. Most studies of lip-reading have focused on lip feature tracking and analysis with recognition tools such as neural networks (NN), support vector machines (SVM), or hidden Markov models (HMM). Lip tracking, however, is not a simple problem. The lips are the most variable part of the face; in the general case the head is always moving in arbitrary directions and the lighting conditions can change; furthermore, the lips move very rapidly during speech. The lip tracking algorithm must therefore be very fast and robust to various lighting conditions and to movements of the head and lips. In this paper, we suggest an adaptive and robust lip feature tracking method that uses an active appearance model for the initial lip shape model fitting and an LK-based feature tracking algorithm.

2. INITIALIZING THE LIP MODEL WITH AAM
We made a 44-point lip shape model consisting of 24 points on the outer lip line and 20 points on the inner lip line (see Fig. 1) [1].
figure 1. (a) AAM lip model (b) Appearance parts.
To make the shape model, we first labeled the training images by hand to get the training data points $X_i$. If the $i$-th shape is
$X_i = (x_{i1}, y_{i1}, x_{i2}, y_{i2}, \ldots, x_{i44}, y_{i44})^T,$
then two similar shapes $X_1$ and $X_2$ can be aligned by minimizing
$E = (x_1 - M(s,\theta)[x_2] - t)^T W (x_1 - M(s,\theta)[x_2] - t),$  (1)
where the pose transform for scale $s$, rotation $\theta$, and translation $(t_x, t_y)$ is
$M(s,\theta)\begin{bmatrix} x_{jk} \\ y_{jk} \end{bmatrix} = \begin{pmatrix} (s\cos\theta)x_{jk} - (s\sin\theta)y_{jk} \\ (s\sin\theta)x_{jk} + (s\cos\theta)y_{jk} \end{pmatrix},$  (2)
$t = (t_{x1}, t_{y1}, t_{x2}, t_{y2}, \ldots, t_{xN}, t_{yN}),$  (3)
and $W$ is a diagonal weight matrix whose per-point weights are inversely proportional to the variance of each point. Each shape mode is obtained from principal component analysis (PCA) of the shape data. Any valid shape can be approximated by adding a reduced subset, $t$, of these modes to the mean shape:
$x_s = \bar{x}_s + P_s b_s,$  (4)
where $P_s = (p_1, p_2, \ldots, p_t)$ is the matrix of the first $t$ eigenvectors and $b_s = (b_1, b_2, \ldots, b_t)^T$ is a vector of $t$ weights, calculated from an example set of points as
$b_s = P_s^T (x_s - \bar{x}_s).$  (5)
The resulting shape model is shown in Fig. 2.
figure 2. Mean shape and top 3 modes of the shape.
The appearance modes of the AAM are likewise calculated by PCA of the training data (see Fig. 3).
figure 3. Mean and top 3 modes of the appearance.
For a more detailed explanation of AAM, refer to Matthews' paper [1].

3. FEATURE TRACKING WITH THE LK METHOD
For fast and robust lip feature tracking, we use another lip model that consists of the 24 outer lip shape points only. We do not use the inner lip points: from experiments, we found that the appearance around the inner lip lines varies so much that stable lip model fitting there is impossible.
figure 4. Lip shape model with 24 points; each point carries a 10-by-10 pixel window.
The Lucas-Kanade (LK) method is a two-frame differential method for optical flow estimation. In our algorithm, each 10x10-pixel window is tracked with the LK algorithm (see Fig. 4). The LK method calculates the motion between two image frames taken at times $t$ and $t+\delta t$ at every pixel position. It is based on local Taylor series approximations of the image signal, using partial derivatives with respect to the spatial and temporal coordinates. A voxel at location $(x,y,z,t)$ with intensity $I(x,y,z,t)$ will have moved by $\delta x$, $\delta y$, $\delta z$, $\delta t$ between the two frames, so the image constraint equation is
$I(x,y,z,t) = I(x+\delta x,\ y+\delta y,\ z+\delta z,\ t+\delta t).$  (6)
Assuming the movement is small enough, a Taylor series expansion gives
$I(x+\delta x,\ y+\delta y,\ z+\delta z,\ t+\delta t) = I(x,y,z,t) + \frac{\partial I}{\partial x}\delta x + \frac{\partial I}{\partial y}\delta y + \frac{\partial I}{\partial z}\delta z + \frac{\partial I}{\partial t}\delta t + \mathrm{H.O.T.},$  (7)
where H.O.T. denotes higher-order terms, small enough to be ignored. From these equations it follows that
$\frac{\partial I}{\partial x}\frac{\delta x}{\delta t} + \frac{\partial I}{\partial y}\frac{\delta y}{\delta t} + \frac{\partial I}{\partial z}\frac{\delta z}{\delta t} + \frac{\partial I}{\partial t} = 0,$  (8)
which results in
$\frac{\partial I}{\partial x}V_x + \frac{\partial I}{\partial y}V_y + \frac{\partial I}{\partial z}V_z + \frac{\partial I}{\partial t} = 0,$  (9)
that is,
$\nabla I \cdot \vec{V} = -I_t.$  (10)
Assuming that the flow $\vec{V}$ is constant in a small window of size $m \times m \times m$ with $m > 1$, we can calculate $\vec{V}$ easily with the pseudo-inverse of $\nabla I$. In addition, the center point of each window must follow the shape model variation condition, so each point can be tracked robustly. For a more detailed explanation of the LK algorithm, refer to [3].

4. EXPERIMENT RESULTS
The AAM lip model shows fairly good fitting performance on a single image (see Fig. 5), but AAM is very sensitive to the initial model position, and when we track the lips over sequential image frames, the error accumulates and the fitting points diverge (see Fig. 6).
figure 5. AAM lip fitting progress.
figure 6. AAM lip fitting diverging.
The LK feature tracking method, in contrast, gives very stable tracking results. From the experiment, we found that the feature points are tracked accurately over more than 2000 frames (Fig. 7).
figure 7. (a) 1st and (b) 2000th frame of LK feature tracking.

5. CONCLUSIONS
The LK-based feature tracking method shows the most stable tracking performance: over more than 2000 frames after initialization on the first frame, it did not miss the feature points. However, it needs a separate initialization algorithm, and we solve that problem with the Active Appearance Model (AAM). The AAM lip model itself did not show good fitting performance, although the AAM face model is more stable than the lip model.

Acknowledgments
This work was supported by grant No. RTI04-02-06 from the Regional Technology Innovation Program of the Ministry of Commerce, Industry and Energy (MOCIE).

References
[1] I. Matthews, T. F. Cootes, J. A. Bangham, S. Cox, and R. Harvey. Extraction of visual features for lipreading. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002, pp. 198-213.
[2] S. B. Gokturk, J. Bouguet, C. Tomasi and B. Girod. Model-based face tracking for view-independent facial expression recognition. In Proceedings of the IEEE International Conference on Automatic Face and Gesture Recognition, 2002, pp. 287-293.
[3] I. Matthews, T. Ishikawa and S. Baker. The template update problem. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004, Vol. 24, Issue 6, pp. 810-815.
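The per-window solve of the brightness-constancy relation $\nabla I \cdot \vec{V} = -I_t$ via a pseudo-inverse, as described above, can be sketched as follows. This is a minimal 2D illustration (no z axis) on synthetic gradients with a known flow; the function name and the random test data are assumptions for the sketch, not part of the paper:

```python
import numpy as np

def lk_flow(Ix, Iy, It):
    """Solve grad(I) . V = -It in the least-squares sense over one window."""
    A = np.column_stack([Ix.ravel(), Iy.ravel()])  # one row of gradients per pixel
    b = -It.ravel()
    return np.linalg.pinv(A) @ b                   # V = (Vx, Vy)

# Synthetic 10x10 window: random spatial gradients, temporal derivative
# consistent with a constant flow (dx, dy) = (0.5, 0.25) under brightness
# constancy, i.e. It = -(Ix*dx + Iy*dy).
rng = np.random.default_rng(1)
Ix = rng.normal(size=(10, 10))
Iy = rng.normal(size=(10, 10))
dx, dy = 0.5, 0.25
It = -(Ix * dx + Iy * dy)

V = lk_flow(Ix, Iy, It)
print(np.round(V, 6))  # recovers approximately (0.5, 0.25)
```

With purely constant gradients the system would be rank-deficient (the aperture problem); the varying gradients here keep the least-squares problem well posed.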
Car/Human Classification Using an Intuitive Feature
Jiman Kim and Daijin Kim
Department of Computer Science, POSTECH, Korea
{jmk, [email protected]

Abstract
Object classification is necessary for outdoor surveillance systems, and car/human classification is the most important case. Many methods have been proposed for car/human classification, but most cannot satisfy both accuracy and processing-speed requirements. In this paper we propose an efficient feature, based on an intuitive property, for object classification. The feature is robust to external environmental change and to the rotation of objects. The experimental results show that the proposed feature is very simple and has high classification performance.

1. Introduction
We consider the outdoor environment and assume that the detected objects are cars or humans. Object classification in an outdoor environment is difficult because the objects' properties change. Many approaches have been proposed for car/human classification. The first category is learning methods [1, 2, 3]. A learning method consists of four steps: feature extraction, dimension reduction, classifier training, and search/decision. But learning methods are time-consuming because of training and search, and a different classifier must exist for each view of the object. The other category is intuitive methods. Intuitive features represent overall properties such as aspect ratio, color, dispersedness, velocity, rigidity, and area ratio [4, 5, 6]. Intuitive methods are simple and fast, but the features are not robust to the distance between the camera and the object or to the quality of the camera image. To solve this problem, we propose an efficient edge-based feature. The proposed feature uses the lengths of the vertical and horizontal edges: the distributions of the vertical and horizontal edges of cars and humans differ because their appearances differ, and we classify objects using this property. Section 2 describes the edge detection processing and the edge length ratio (ELR) feature. Section 3 shows the experimental results for the proposed feature and compares it with other features. Section 4 concludes.

2. Car/Human Classification
We assume that object detection is done in advance, so we can obtain the bounding-box regions which contain foreground pixels. We then perform edge extraction for each bounding box, compute the edge length ratio (ELR), and classify the object.

2.1. Edge extraction
We extract the edge components in the vertical and horizontal directions. The Sobel operator, consisting of a pair of 3x3 convolution kernels Gx and Gy, is used to detect edges (Fig. 1).
Figure 1. Sobel convolution kernels Gx and Gy.
After the vertical and horizontal edge detection on the gray image, we select the strong edges whose intensity is greater than a threshold value. We then compute the lengths of the strong edges and, for noise elimination, select the major edges that contain 75% of the total length in each direction. The edge detection processing within a bounding box is shown in Fig. 2.
Figure 2. The edge detection processing.

2.2. Edge length ratio
The positions where vertical and horizontal edges are detected are shown in Fig. 3.
Figure 3. Vertical / horizontal edge positions.
Fig. 3 shows that a car has more horizontal edges than vertical edges, while a human has more vertical edges than horizontal edges. We classify objects using this property: we compute the total edge length in each direction and obtain the edge length ratio (ELR) as
ELR = (vertical edge length) / (horizontal edge length).  (1)
The ELR is robust to the distance between the object and the camera because it considers only major edges: the major edge of a distant human is the silhouette, while the major edges of a distant car are the silhouette and strong horizontal edges such as the windshield, side panel, and bumper. The ELR is also robust to rotation of the object as long as the object is not rotated by 45°. In addition, the ELR can classify people correctly.

3. Experimental results
We use the StreetScene database from the web. We decided the ELR threshold from many arbitrary car/human images; the threshold value is about 0.9~1. The ELR of a car is less than the threshold value, while the ELR of a human is greater. Fig. 4 shows the results of the edge detection processing for car/human images.
Figure 4. The results of edge detection.
For comparison, we also use other intuitive features: color histogram variance, velocity, dispersedness, and area ratio. Table 1 shows the classification accuracy for each feature.

  Feature         Accuracy (%)
  Color           72.78
  Area ratio      73.32
  Velocity        76.55
  Dispersedness   83.56
  ELR             92.37
Table 1. The classification results for each feature.

There are, however, incorrectly classified cases with much background edge information; Fig. 5 shows an incorrect classification caused by many vertical edges in the background. In the future, we will apply additional processing to eliminate the background effect.
Figure 5. The result of incorrect classification.

4. Conclusion
We proposed a simple and efficient intuitive feature for object classification. The feature uses the vertical and horizontal edges of the foreground bounding box. After edge detection in each direction, we select the strong edges using a threshold value, then select the major edges from the strong edges by length. Finally, the edge length ratio (ELR) is computed from the total lengths of the vertical and horizontal edges: if the ELR exceeds the threshold, the object is a human; otherwise, it is a car. The ELR is more robust to the distance and rotation of the object than other intuitive features, and the experimental results show that the ELR feature is more accurate than the other features. We will try other approaches using the distribution of vertical and horizontal edges, and additional post-processing to eliminate the background effect is needed.

Acknowledgments
This work was supported by the Intelligent Robotics Development Program, one of the 21st Century Frontier R&D Programs funded by the Ministry of Commerce, Industry and Energy (MOCIE). The authors would also like to thank the Ministry of Education of Korea for its financial support toward the Division of Mechanical and Industrial Engineering and the Division of Electrical and Computer Engineering at POSTECH through the BK21 program.

References
[1] B. Leung. Component-based car detection in street scene images. MIT, 2004.
[2] A. Mohan, C. Papageorgiou and T. Poggio. Example-based object detection in images by components. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2001.
[3] S. Agarwal, A. Awan and D. Roth. Learning to detect objects in images via a sparse, part-based representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004.
[4] A. J. Lipton, H. Fujiyoshi and R. S. Patil. Moving target classification and tracking from real-time video. Applications of Computer Vision, 1998.
[5] Y. K. Wang and Y. S. Lin. A real-time approach for classification and tracking of multiple moving objects. Graphics and Image Processing, 2003.
[6] P. Neskovic and L. N. Cooper. A recognition system that uses saccades to detect cars from real-time video streams. International Conference on Neural Information Processing, 2002.
of the 8th POSTECH-KYUTECH Joint Workshop on Neuroinformatics, Kitakyushu, Japan, 2008 A Graph-based Collaborative Filtering Algorithm for Recommender System Jinseok Lee, Daijin Kim Department of Computer Science, POSTECH, Korea {seogi81, [email protected] Abstract Recommender systems apply knowledge discovery techniques to the problem of making personalized recommendations for information. Collaborative filtering algorithm is one of the most widely used technique in recommender system, which uses user's previous preference to items. In this paper, we propose the graph based recommendation algorithm, which is a kind of a collaborative filtering algorithm that doesn't need pre-computation time. Our algorithm updates graphs in response to a user's expression to an item, and it uses graphs to give a recommendation to a user's request. Our experimental result shows that the quality of recommendation is as good as existing algorithm's quality without pre-computation. computer science and it can be used to directly reflect similarities between items. The algorithm updates graphs when a user expresses his preference on an item. When a user request a recommendation of some items, the algorithm uses graphs to find items that the user may prefer. Updating graphs can be done in real time, so this algorithm does not need time to pre-compute similarities. This paper is organized as follows. Section 2 explains how to construct graphs. In section 3, 4, we describe how to use these graphs to recommend items. Section 5 explains the experimental results. Finally, section 5 presents the meaning of result and conclusion for this paper. 2. MAKING GRAPHS A procedure for making (maintaining) a graph. The system maintains two graphs. The first is the preference graph, and the second is the relation graph. 1) Preference Graph: The preference graph is a bipartite graph. 
A bipartite graph has two disjoint sets of vertices, V1 and V2, such that every edge connects a vertex in V1 to a vertex in V2. In the preference graph, the two sets of vertices represent users and items, respectively. Each edge carries the rating value between the user and the item it connects. When a user rates his preference for an item (generally on a 1-5 scale, where 1 means the worst and 5 the best), the system looks up the preference graph to determine whether an edge already connects the user and the item. If there is no such edge, the system draws an edge between the user and the rated item. If the edge exists, the system updates its value.

2) Relation Graph: In the relation graph, the vertices represent items, and an edge between two items represents a relation between them. The system therefore has to update this graph whenever a user rates his preference for an item. First, the system finds the items that the user has rated before. The system then updates the edge weights using equation (1). Assume that a user ui rates an item ij with a value r, and that the user has already rated item ik with value r(ui, ik). Then the system updates its graphs according to equation (1).

1. INTRODUCTION
Collaborative filtering (CF) is one of the most promising technologies for distinguishing useful information from useless information. CF works by building a database of users' preferences for items. Two classes of CF algorithms exist. One is the user-based CF algorithm: each user is matched against the database to discover neighbors, i.e., other users who have preferences similar to the given user's, and the algorithm recommends items that those neighbors prefer. The other class is the item-based CF algorithm. It is similar to the user-based CF algorithm but differs in that the neighborhood relationship is built between items, not users. A user has his preferences, and the algorithm recommends the neighbors of the items he prefers.
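As a concrete sketch, the two graphs and their rating-driven updates might look as follows (illustrative Python, not the authors' code; since equation (1) is not reproduced here, the relation-graph update simply increments the co-rating count, an assumption consistent with the co-rating counts used by the similarity of equation (2)):

```python
from collections import defaultdict

class GraphRecommender:
    """Maintains the preference graph (bipartite user-item ratings) and the
    relation graph (item-item edge weights) described in the text."""

    def __init__(self):
        self.pref = defaultdict(dict)                       # user -> {item: rating}
        self.rel = defaultdict(lambda: defaultdict(float))  # item -> {item: weight}

    def rate(self, user, item, rating):
        # Update or create the user-item edge in the preference graph.
        previously_rated = [i for i in self.pref[user] if i != item]
        self.pref[user][item] = rating
        # Update the relation graph for every item this user rated before
        # (stand-in for equation (1): increment the co-rating count).
        for other in previously_rated:
            self.rel[item][other] += 1.0
            self.rel[other][item] += 1.0

g = GraphRecommender()
g.rate("u1", "i1", 5)
g.rate("u1", "i2", 4)   # creates the relation edge (i1, i2)
g.rate("u2", "i1", 3)
g.rate("u2", "i2", 5)   # strengthens the relation edge (i1, i2)
```

Both graphs are updated in a single pass over the user's earlier ratings, which is why no periodic pre-computation step is needed.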
CF has been successful in both research and practice. However, some challenges remain. Existing CF algorithms cannot update their similarities in real time because the time required to compute the similarities is quite long. Therefore, many existing systems that use CF algorithms pre-compute similarities at regular intervals (every midnight, for example), so a problem of similarity consistency can arise. Some kinds of items, such as music and movies, change rapidly, and new items appear frequently. Thus, similarity consistency is one of the most important problems of existing CF algorithms. In this paper, we propose a new algorithm that resolves the challenge of similarity consistency. Our new algorithm is called the "graph-based recommendation algorithm" and uses two graphs.

The system performs this updating process for every item that the user has already rated.

3. Similarity Computation
When the system gives recommendations to a user, the first step it performs is similarity computation. When the system computes the similarity between items ij and ik, it needs three values: the sum of all edge weights connected to ij, the sum of all edge weights connected to ik, and the edge weight between ij and ik. Formally, the similarity between ij and ik, denoted sim(ij, ik), is given by equation (2). The lower the MAE, the more accurately the recommendation engine predicts user ratings.

Our results are divided into two parts: recommendation quality and neighborhood size.
1) Recommendation quality: MAE was 0.72 for CF and 0.76 for GB. The hit ratio was 67% for CF and 64% for GB.
2) Neighborhood size: The neighborhood size is the number of items that the system uses when it recommends items to a user; we denote it by k. In the relation graph, the system keeps at most k items connected to each item, selected by edge weight.
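Putting the two steps together, the similarity-weighted prediction described for equations (2) and (3) can be sketched as follows (illustrative Python, not the authors' code; the normalization by the summed similarities is the standard item-based CF form and an assumption where the text says only "weighted sum"):

```python
def predict(user_ratings, sims):
    """Prediction value for a target item.

    user_ratings: {item: rating} -- items the active user has rated (Ii)
    sims:         {item: similarity to the target item} -- items connected
                  to the target (Ij); only the intersection contributes.
    """
    common = [i for i in user_ratings if i in sims]
    num = sum(sims[i] * user_ratings[i] for i in common)
    den = sum(sims[i] for i in common)
    return num / den if den else 0.0

# toy usage: the user rated i1=4 and i2=2; similarities to the target
# item are sim(target, i1)=0.5 and sim(target, i2)=0.25
p = predict({"i1": 4, "i2": 2}, {"i1": 0.5, "i2": 0.25})
# p = (0.5*4 + 0.25*2) / (0.5 + 0.25) = 2.5 / 0.75, about 3.33
```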
The size of the neighborhood has a significant impact on prediction quality and calculation time. To determine the sensitivity of this parameter, we performed an experiment in which we varied the number of neighbors used and measured the MAE and the calculation time of recommending items to a user. We can observe that the size of the neighborhood does affect the quality of prediction (Fig. 1).

In equation (2), e(ij, ik) denotes the edge weight between ij and ik, and e(ij) denotes the sum of the weights of all edges connected to ij. The numerator of equation (2) is the number of users who co-rated ij and ik. The denominator is the number of all possible combinations of edges from ij and ik, normalized by the numerator. Equation (2) therefore measures the possibility that ij and ik are co-related, given all possible combinations of edges from ij and ik.

4. Prediction Computation
The second step is prediction computation, in which the system computes a prediction value for how user ui will like item ij. It uses ui's rating values for the items ui has rated, together with the items connected to ij. Let Ii be the set of items that ui has rated and Ij the set of items connected to ij. The system computes similarities between ij and the items in the intersection of Ii and Ij. The system then computes the prediction value as a weighted sum: the sum of the ratings given by ui to the items similar to ij, each rating weighted by the corresponding similarity sim(ij, ik). The prediction value of ui for ij, denoted Pre(ui, ij), is given by equation (3):

Pre(ui, ij) = Σk sim(ij, ik) · r(ui, ik) / Σk sim(ij, ik),   (3)

where the sums run over the items ik in the intersection of Ii and Ij.

Figure 1. Change of MAE and calculation time over varying neighborhood size. Filled circles correspond to calculation time and open circles to MAE.

6. Conclusions
In this paper we presented and experimentally evaluated a new algorithm for recommender systems.
Our results show that the GB algorithm holds the promise of allowing recommendation algorithms to scale to large data sets while producing high-quality recommendations.

The system computes prediction values over all candidates, sorts them in descending order, and returns the items with the highest values to user ui.

5. Experiment
We used experimental data from the MovieLens recommender system web site to compare our algorithm with previous work (item-based collaborative filtering). The data set can be obtained from 'www.grouplens.org'. First, we constructed the two graphs from the training set. We then tested each user in the test set. Assume the current user has rated n items (i1, i2, ..., in). For each item the user has rated, we pretend that the user has not rated it and request recommendations from the system. We then check whether the recommendation list contains the item; if it does, we increase the hit count. After testing all items the user has rated, we compute the ratio of the hit count to the number of items the current user has rated. We repeat this for all users in the test set.

MAE is a measure of the deviation of recommendations from their true user-specified values. For each rating-prediction pair (pi for the rating value, qi for the prediction value), this metric takes the absolute error between them, i.e., |pi − qi|. The MAE is computed by summing these absolute errors over all N rating-prediction pairs (where N is the number of ratings) and then taking the average.

Acknowledgments
This work was supported by the Intelligent Robotics Development Program, one of the 21st Century Frontier R&D Programs funded by the Ministry of Commerce, Industry and Energy (MOCIE). The authors would also like to thank the Ministry of Education of Korea for its financial support of the Division of Mechanical and Industrial Engineering and the Division of Electrical and Computer Engineering at POSTECH through the BK21 program.

References
[1] B.
Sarwar, G. Karypis, J. Konstan, and J. Riedl, Item-Based Collaborative Filtering Recommendation Algorithms, in Proceedings of the International World Wide Web Conference, 2001.
[2] G. Linden, B. Smith, and J. York, Amazon.com Recommendations: Item-to-Item Collaborative Filtering, IEEE Internet Computing, 2003.
[3] C. C. Aggarwal, J. L. Wolf, K. Wu, and P. S. Yu, Graph-theoretic Approach to Collaborative Filtering, in Proceedings of the ACM KDD'99 Conference, 1999.

Exploration in a Maze
Jungtae Kim, Daijin Kim
Department of Computer Science, POSTECH, Korea
{postman, [email protected]

Abstract
In this paper we present a new exploration method that focuses on searching unknown areas economically, that is, on removing redundant paths. There are many previous works on exploration and exploitation, but most of them focus on the trade-off between exploration and exploitation, not on exploration itself. Our proposed method, Remembering Exploration (RE), is competitive in the length of the final converged path and finds the optimal path faster than the previous methods we tested.

In (a) of Fig. 1, the mover is located in the room numbered 1, and among the four movable directions the upper direction is blocked. Because the distances from the mover's current position to the three unknown lands are equal, we choose one at random; the result is (b) of Fig. 1, where the mover chose the right direction. In (b) of Fig. 1, one of the previous three unknown lands (the room numbered 2) is now known, and three new unknown lands have been added. From the room numbered 2, among the five unknown lands, the three unknown lands above, to the right of, and below room 2 are the nearest if the metric is Euclidean.
Here we assume that the maze is a Euclidean space. Among the three unknown lands, we choose one at random; the result is (c) of Fig. 1. As shown in (c) of Fig. 1, the chosen unknown land becomes the known land numbered 3. As before, we calculate which unknown lands are nearest to the room numbered 3 and choose one of them at random. Even if no unknown land neighbors the room numbered 3, we still choose one of the unknown lands at random and go there through the known path (3-2). By repeating this procedure we can explore the whole maze, and if we find the final position in the middle of exploration, we can choose whether to try more exploration or to stop and exploit. The following section summarizes this process.

1. Introduction
The exploration problem is as important as the exploitation problem in planning. Exploration means trying to visit unknown areas, while exploitation means using the already known areas as much as possible. In most cases exploration just spends resources and is not useful, but with small probability it yields high profit. Therefore, we cannot be satisfied with exploitation alone but must sometimes explore. Several exploration methods already exist [1, 2], but we want to explore more economically, not randomly. The simplest exploration is of course the random method. Because of its redundancy, this method is good for dynamic environments but not for stationary ones. Other methods are rather similar to the random method. To find the optimal exploration path, we use optimal search techniques such as breadth-first search, depth-first search, or depth-limited search: we regard the unknown areas as goals and apply optimal path planning.
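The exploration loop described above ("go to the nearest unknown land, travelling through known rooms") can be sketched as follows; this is an illustrative reading in Python, with the maze encoded as a dict of blocked/free cells, not the authors' implementation:

```python
from collections import deque

def remembering_exploration(grid, start):
    """Repeatedly move to the nearest unknown cell, reached via BFS over
    already-known cells.  grid: {(x, y): True if blocked}; cells missing
    from the dict are treated as blocked (maze boundary)."""
    known = {start}
    path = [start]
    def neighbors(c):
        x, y = c
        return [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    while True:
        cur = path[-1]
        # BFS from the current position toward the nearest unknown cell.
        frontier = deque([cur])
        prev = {cur: None}
        target = None
        while frontier:
            c = frontier.popleft()
            for n in neighbors(c):
                if n in prev or grid.get(n, True):   # visited or blocked
                    continue
                prev[n] = c
                if n not in known:                   # nearest unknown land
                    target = n
                    frontier.clear()
                    break
                frontier.append(n)
        if target is None:
            return path            # every reachable cell is known
        # Walk the BFS path to the target, remembering it as known.
        seg = []
        c = target
        while c is not None:
            seg.append(c)
            c = prev[c]
        path.extend(reversed(seg[:-1]))
        known.add(target)

grid = {(x, y): False for x in range(3) for y in range(3)}   # 3x3 open maze
route = remembering_exploration(grid, (0, 0))
# route starts at (0, 0) and eventually covers all nine free rooms
```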
This paper consists of five sections: the first section is the introduction, the second section gives a detailed explanation of our proposed method, the third section shows the outline of the algorithm, the fourth section explains our experiments and their results, and the fifth section concludes the paper.

2. Remembering Exploration (RE)
Our proposed method focuses on optimizing the mover's path by removing as many redundant paths as possible. We call our proposed method the Remembering Exploration method. Let us explain the method using Fig. 1. There is a maze, and each room is either empty or filled as follows: a blue circle means there is a block, so the mover cannot go there (or dies); dark green with a number marks the mover's path; light green without a number means there are unknown lands around the mover's path, and the mover will go there later.

3. Algorithm
The outline of our suggested algorithm is as follows:
=========================================
INITIALIZE
  Set the current position (xc, yc) randomly or to a fixed point, and set the target position (xt, yt) equal to (xc, yc).
LOOP
  Register the neighbor lands of (xc, yc) as unknown lands if they are not already registered.
  IF (xc, yc) is equal to (xt, yt)
    Set (xt, yt) to the nearest unknown land
  END_IF
  Move to the neighbor which is on the path toward (xt, yt)
  Update the current position (xc, yc)
END_LOOP
=========================================

Fig. 1. Exploration algorithm in a maze; a blue circle means a block, dark green with a number marks the path, and light green without a number marks an unknown land. Each time, the mover goes toward the nearest unknown land.

4. Experiments
To show the superiority over previous works, we compared against three well-known simple exploration methods: one is the random walk, another is following the left wall, and the third is counter-based exploration.
The random walk (RW) method [1] involves, as the name suggests, no specific planning: at each step the mover chooses its next position at random. The following-left-wall (FLW) method follows the wall connected to the start position in one fixed direction; the name FLW does not mean only the left direction, so even if we choose the right direction, we still call the method FLW. The counter-based exploration (CE) method [2] follows the rule "go to the least visited state": whenever selecting the next direction, the mover checks which neighboring room has been visited least often and goes there.

We built a test bed: a maze of 13x13 rooms containing randomly placed blocks through which the mover cannot pass (Fig. 2). If the mover tries to enter a block, it must go back to the start position and continue; each time this happens, the number of deaths is increased.

Table 1. Performance of the three previous methods and our method.
Method   # of moves   # of deaths   Final path length
RW       5000         3000          500
FLW      100          10            40
CE       800          100           28
RE       600          50            28

As the summary table shows, RW is the worst in this maze environment. FLW finds the final position very quickly if the start position is connected to the final position by walls and blocks, but its final path is not optimal. CE found an almost optimal path, but its procedure includes some redundant moves. Our proposed method, RE, also found an almost optimal path like CE, and it took even fewer moves than CE. Therefore, our proposed method is very competitive with the previous methods.

5. Conclusions
We have presented a new exploration method that works well in a maze. This simple and fast method showed better performance than the previous methods. We believe this method can be used in more complex environments such as 2D real environments.
For future work, we will study cooperation with Simultaneous Localization and Map-building (SLAM), for example a SLAM-exploration method.

Using the three previous methods and our proposed method, the mover finds the final position from the start position. The performance comparison is based on how fast, that is, with how little moving, the mover finds the final position. Whenever a method finds the route, the number of steps is recorded and a new map is generated. In total, 100 maps are used to obtain the average performance; the results are shown in Table 1.

Fig. 2. Maze with 13x13 rooms. The red circle marks the start position, the red cross marks the final position, and the green rooms are blocked rooms.

Acknowledgments
The authors would like to thank the Ministry of Education of Korea for its financial support of the Division of Mechanical and Industrial Engineering and the Division of Electrical and Computer Engineering at POSTECH through the BK21 program.

References
[1] S. B. Thrun, The role of exploration in learning control, in D. A. White and D. A. Sofge (eds.), Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches, Florence, Kentucky: Van Nostrand Reinhold, 1992.
[2] S. B. Thrun, Efficient exploration in reinforcement learning, in Proceedings of the 8th International Workshop on Machine Learning, pp. 358-362, 1991.

Action Detection Using Shape Matching
Ju-young Lee, Daijin Kim
Department of Computer Science, POSTECH, Korea
{zealota, [email protected]

Abstract
In the real world, actions occur in crowded, dynamic environments. We propose a technique for action detection in cluttered video that reliably identifies actions. Our approach is based on two key ideas: (1) we efficiently match the volumetric representation of an action against over-segmented spatio-temporal video; (2) we augment our shape-based features using flow.
We demonstrate our method on human actions, such as jumping jacks or waving two hands, in cluttered video.

y(i, j+1) = Σ_{k=1}^{n} f_k g(||(y(i, j) − f_k)/h||²) / Σ_{k=1}^{n} g(||(y(i, j) − f_k)/h||²),   (2)

where j = 1, 2, ..., g is a kernel, typically a Gaussian kernel, and h is the bandwidth. Fig. 1 shows the result of segmentation using mean shift.

1. INTRODUCTION
The goal of action detection is to localize and identify specified spatio-temporal patterns in video, such as jumping jacks. A recent trend in action detection and recognition has been techniques based on the volumetric analysis of video [1][2]. These approaches have the benefit of not being limited to a specific set of actors or actions; in principle, they extend to a variety of actions, given appropriate training data. Flow-based methods for action detection operate directly on the spatio-temporal sequence, attempting to recognize the specified pattern by brute-force correlation without segmentation. Shechtman and Irani propose an algorithm for correlating spatio-temporal action templates, which can be noisy on object boundaries [4]. We build on key ideas from these earlier approaches and propose an algorithm for action detection in real cluttered video. Section 2 describes the mean shift algorithm for image segmentation. Section 3 describes how to detect the action. Section 4 shows the experimental results. The final section concludes on action detection using shape matching.

Figure 1. Mean shift segmentation: (a) the result of the segmented image; (b) the original input image.

3. ACTION DETECTION
To detect the action, we measure the distance between the spatio-temporal template and the input image segmented with the mean shift algorithm described in Section 2. To calculate the distance, we use a shape matching technique.

3.1 SHAPE MATCHING
2. MEAN SHIFT SEGMENTATION
The mean shift algorithm is an iterative procedure that finds local modes in the data and has been used to segment images [3]. Let f_i be a d-dimensional point in a set of n feature points f_1, ..., f_n. To cluster the points with the mean shift algorithm, we find the mode near each f_i, starting from

y(i, 1) = f_i.   (1)

figure 2. Example showing how a template is matched to a segmented input.

When an object similar to a human appears in the video, the action detection can produce false detections. To overcome this limitation, we propose to use not only the spatial shape but also the temporal shape in the segmented video.

Acknowledgments
This work was supported by the Intelligent Robotics Development Program, one of the 21st Century Frontier R&D Programs funded by the Ministry of Commerce, Industry and Energy (MOCIE). The authors would also like to thank the Ministry of Education of Korea for its financial support of the Division of Mechanical and Industrial Engineering and the Division of Electrical and Computer Engineering at POSTECH through the BK21 program.

The shape matching metric is based on the region-intersection distance between the spatio-temporal template and the set of segmented input regions. The minimum set is defined as the set of regions closest to the template. There are four cases for determining the minimum set, as follows. If a region Vi is completely enclosed by the template, such as V5, then it is always chosen for the minimum set. In the opposite case, if a region Vi does not intersect the template, such as V11, then it is never considered among the elements of the minimum set. The interesting cases are when Vi partially intersects the template, such as V2. If the intersected region is smaller than half of the region, the distance is the number of elements of the intersected region.
Otherwise, the distance is the number of un-intersected elements in the region. More formally, the distance between the template T and the input V at location l is defined as

d(T, V; l) = Σ_{i=1}^{k} d(T, Vi; l),   (3)

where

d(T, Vi; l) = |T(l) ∩ Vi| if |T(l) ∩ Vi| < |Vi|/2, and |Vi| − |T(l) ∩ Vi| otherwise,   (4)

and T(l) denotes the template T placed at location l.

4. EXPERIMENTAL RESULTS
To evaluate the proposed action detection method, we used the KTH dataset [6]. The videos were captured with a hand-held camera in environments with moving people or cars in the background [1][2]. The videos were downscaled to a resolution of 160x120. The templates are typically 60x80x30 in size. Fig. 3 shows the results of action detection in cluttered video, where (a) shows the two-hand-waving action and (b) the jumping-jacks action.

figure 3. The result of the experiment, showing the template overlapped on the segmented input: (a) the action of waving two hands; (b) the action of jumping jacks.

5. Conclusions
We proposed an action detection method using mean shift segmentation and shape matching.

References
[1] Y. Ke, R. Sukthankar and M. Hebert, Event Detection in Crowded Videos, ICCV, 2007.
[2] Y. Ke, R. Sukthankar and M. Hebert, Spatio-temporal Shape and Flow Correlation for Action Recognition, Visual Surveillance Workshop, 2007.
[3] D. Comaniciu and P. Meer, Mean shift: A robust approach toward feature space analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 5, 2002.
[4] E. Shechtman and M. Irani, Space-time behavior based correlation, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2005.
[5] H. Jiang and D. R. Martin, Finding Actions Using Shape Flows, European Conference on Computer Vision, 2008.
[6] C. Schuldt, I. Laptev, and B. Caputo, Recognizing human actions: A local SVM approach, in Proceedings of the IEEE International Conference on Pattern Recognition, 2004.
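A minimal sketch of the region-intersection distance, with regions and the placed template represented as sets of spatio-temporal cells (illustrative Python under that representation, not the authors' implementation):

```python
def region_distance(template_at_l, region):
    """Per-region distance: if the overlap is small (less than half the
    region), count the intersected cells; otherwise count the cells of
    the region left uncovered by the template."""
    inter = len(template_at_l & region)
    if inter < len(region) / 2:
        return inter
    return len(region) - inter

def matching_distance(template_at_l, regions):
    """Total distance: sum of per-region distances over the segmentation."""
    return sum(region_distance(template_at_l, r) for r in regions)

# toy example with 1-D "cells": the placed template covers cells 0..4
T = {0, 1, 2, 3, 4}
segments = [{0, 1, 2}, {3, 4, 5, 6}, {10, 11}]
d = matching_distance(T, segments)
# {0,1,2} is fully enclosed -> 0; {3,4,5,6} overlaps half -> 4-2 = 2;
# {10,11} is disjoint -> 0; total distance d == 2
```

Note how a perfectly matched region contributes zero, so a low total distance indicates a good fit between the template and the segmentation.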
New Distance Measure for Capturing Global Geometry of Data
Kye-Hyeon Kim, Seungjin Choi
Department of Computer Science, POSTECH, Korea
{fenrir, [email protected]

Abstract
We present a method for determining k nearest neighbors that accurately recognizes the underlying clusters in a data set. We formulate neighborhood construction as an optimization problem, leading to a novel distance measure called the minimax distance. We further develop a message passing algorithm for efficient computation of the minimax distance. On several real data sets, our method outperformed the k-nearest neighbor method.

Figure 1. The difference between k-NN and our method. (Left) The green star denotes a test point, and the solid circle represents the standard k-nearest neighborhood of that point. In this case, many incorrect points (blue squares) fall in the neighborhood, since the intrinsic partition (dotted ellipse) is not considered. (Right) In contrast, our method exactly reconstructs the desirable neighborhood by joining a number of small circles with diameter ε.

1. INTRODUCTION
Finding the similar elements of an item plays a key role in most pattern recognition tasks, such as classification, clustering, and information retrieval, since a group of similar items generally forms a meaningful pattern. The k-nearest neighbor method (k-NN) was proposed from this perspective and has been widely used. Given a test data point x*, the method finds the k nearest points to x* in a given set of training data points using a distance metric. The success of k-NN depends on how many neighbors are "actually similar" to the test data point. Various methods for global or local metric learning have been developed to discover a proper distance measure for a given data set.
However, they have not achieved success in practical applications because (1) a real data set generally consists of several arbitrarily-shaped partitions, but those methods cannot reflect such nonlinearity and discontinuity; and (2) those methods take too much time compared to k-NN. We present a new way of determining neighborhoods that adequately captures and reflects the intrinsic partitions of a data set. To this end, we first represent a neighborhood as a connected set of small local balls, whereas the standard k-nearest neighborhood is determined by a single large ball; this leads to the minimax distance measure. We then derive a message passing algorithm that scales as O(N) in time, where N is the number of training data points. For details, see the full paper [1].

2. NEIGHBORHOOD CONSTRUCTION RULE
The main goal of our method is to find appropriate intrinsic clusters as neighborhoods, for successful nearest neighbor retrieval. The intuitive idea is shown in Fig. 1. The desirable neighborhood in Fig. 1 is constructed by joining several balls with diameter ε, following the rule: a ball whose center corresponds to a data point must contact or overlap one or more other balls.

Let us start from a ball initially attached to the test point, and attach balls by this rule until no further ball can be attached. Then the set of data points covered by balls can be seen as a cluster whose size and shape are determined by ε. By choosing a proper value of ε, we can obtain the correct result. Let V be a set of data points and let d(u,v) denote the Euclidean distance between two points u, v ∈ V. Then the above neighborhood construction rule can be restated as follows:

Definition 1 (ε-bounded pair). For ε ≥ 0, a pair u, v ∈ V is ε-bounded if d(u,v) ≤ ε.
Definition 2 (ε-bounded sequence). A sequence (s1, s2, …, sM) is ε-bounded if all adjoining pairs st, st+1 are ε-bounded.
Definition 3 (ε-neighbor). A pair u, v ∈ V are ε-neighbors of each other, denoted u~v, if there exists an ε-bounded sequence (u, …, v).
Theorem 1 (ε-neighborhood construction rule). For a pair u, v ∈ V, if there exists w ∈ V such that w~v and d(u,w) ≤ ε, then u~v.

The above theorem represents the transitive closure of the neighborhood, that is, "my neighbor's neighbor is also my neighbor."

3. MINIMAX DISTANCE
Now we formulate the problem of neighborhood construction as a minimax optimization. For a sequence S, denote by d(S) the distance between the farthest adjoining pair in S, i.e.,

d(S) = max_{(a,b) ∈ S} d(a,b);

then we can easily determine whether S is ε-bounded or not: if d(S) ≤ ε, there is no adjoining pair (a,b) in S such that d(a,b) > ε, so S = (u, …, v) is ε-bounded. One can enumerate arbitrarily many sequences between a pair u, v. Among those sequences, if there is at least one sequence S = (u, …, v) such that d(S) ≤ ε, i.e., min_{S=(u,…,v)} d(S) ≤ ε, then there is at least one ε-bounded sequence, so u and v are neighbors of each other. Now we define the minimax distance between u and v, denoted dmm(u,v):

dmm(u,v) = min_{S=(u,…,v)} { max_{(a,b) ∈ S} d(a,b) }.

For dmm(u,v), the corresponding ε-bounded sequence(s) is called a minimax path between u and v. A tree T is called a minimax spanning tree if all paths between pairs u, v in T are minimax paths. dmm(u,v) indicates that there are one or more sequences between u and v whose bound d(S) equals dmm(u,v).

By reformulating the above formula in recursive form, we finally develop a message passing algorithm. Let E be the edge set of the MST and Ii the index set {j | (i, j) ∈ E}. Then all minimax distances dmm(x1, x*), …, dmm(xN, x*) can be computed on the MST in O(N) time (Algorithm 3).

Figure 2. Image retrieval precision on (Left) Yale-B faces and (Right) USPS handwritten digits. "MM N" denotes our algorithm, where N is the number of training points. The vertical lines indicate the standard deviation.
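As an illustrative sketch, dmm can also be computed with a Dijkstra-style relaxation on the complete Euclidean graph, where a path's cost is its largest step. This O(N² log N) toy version is not the authors' O(N) message passing on the MST, but it yields the same minimax distances:

```python
import heapq
import math

def minimax_distances(points, test):
    """Minimax distance from the test point to every training point:
    dist[v] = min over paths of the largest single step on the path."""
    nodes = [tuple(p) for p in points] + [tuple(test)]
    dist = {p: math.inf for p in nodes}
    dist[tuple(test)] = 0.0
    heap = [(0.0, tuple(test))]
    done = set()
    while heap:
        du, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v in nodes:
            if v in done or v == u:
                continue
            cand = max(du, math.dist(u, v))   # largest step on this path
            if cand < dist[v]:
                dist[v] = cand
                heapq.heappush(heap, (cand, v))
    return {tuple(p): dist[tuple(p)] for p in points}

# two clusters on a line, {0, 1} and {10, 11}, with the test point at 0.5
pts = [(0.0,), (1.0,), (10.0,), (11.0,)]
dm = minimax_distances(pts, (0.5,))
# dm[(1.0,)] = 0.5, while dm[(10.0,)] = 9.0: the big gap dominates,
# so points across the gap are correctly treated as a different cluster
```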
Thus, if dmm(u,v) is small, both u and v are in a dense region, i.e., they are in the same cluster; if dmm(u,v) is large, u and v tend to be in different clusters. Consequently, our neighbor selection method can capture the intrinsic clusters by using dmm as the distance measure:

Algorithm 1 (ε-neighbor selection). For a test point x*, select xi if dmm(xi, x*) ≤ ε.
Algorithm 2 (k-nearest neighbor selection). For a test point x*, select the k smallest dmm(xi, x*).

4. MESSAGE PASSING ALGORITHM
Let X = {x1, …, xN} be a set of training points and x* a test point. We now derive an algorithm for computing dmm(xi, x*) for all xi ∈ X efficiently. One problem is that there are enormously many possible sequences between xi and x*. However, a minimum spanning tree (MST) is also a minimax spanning tree [2]; that is, every path between two nodes xi and xj in an MST, denoted M(i, j), is also a minimax path. Hence, we can obtain the minimax distance dmm(xi, xj) for any pair xi, xj ∈ X by simply computing d(M(i, j)), i.e.,

dmm(xi, xj) = max_{(a,b) ∈ M(i, j)} d(a,b).

Algorithm 3 (Minimax message passing).
1. For all edges (i, j) (note: N − 1 edges for a tree),
   m_{i→j} = max{ d(xi, xj), min[ d(xi, x*), min_k m_{k→i} ] },  where k ∈ Ii − {j}.
2. For all xi ∈ X,
   dmm(xi, x*) = min[ d(xi, x*), min_k m_{k→i} ],  where k ∈ Ii.
For more details, see our full paper [1].

5. NUMERICAL EXPERIMENTS
We applied our method to image retrieval tasks on two real-world data sets, USPS and Yale-B. We evaluated performance in terms of precision, i.e., [the number of correctly retrieved points for a query image] / k, where k is the number of retrieved points. Fig. 2 shows that our minimax algorithm outperformed the standard k-NN method.

6. CONCLUSION
We have presented a novel neighbor selection method that exploits the global geometry of data by choosing neighbors according to the minimax distance.
An efficient message passing algorithm was also presented, which computes the minimax distances between a query point and N data points in O(N) time.

References
[1] K.-H. Kim and S. Choi, Neighbor search with global geometry: a minimax message passing algorithm, in Proceedings of the International Conference on Machine Learning (ICML), Corvallis, OR, 2007.
[2] J. D. Carroll, Minimax length links of a dissimilarity matrix and minimum spanning trees, Psychometrika, vol. 60, pp. 371-374, 1995.

Now we derive a formula for dmm(xi, x*). There are N types of paths between xi and x*, namely (xi, …, x1, x*), (xi, …, x2, x*), …, (xi, …, xN, x*), and every path (xi, …, x*) must belong to one of them. Given the MST of the training set X = {x1, …, xN}, there exist only N such paths (M(i,1), x*), (M(i,2), x*), …, (M(i,N), x*); hence

dmm(xi, x*) = min_{j=1…N} { max[dmm(xi, xj), d(xj, x*)] }.

Neuro-Evolutionary Behavior Selection Network for Enhancing Performance of Cleaning Robots
Sanghoon Baek, Sanghyuk Park, Se-Young Oh
Department of Electrical and Electronic Engineering, POSTECH, Korea
{tkeb100, psh9924, [email protected]

Abstract
A new approach to the coverage problem using a neuro-evolutionary technique is proposed. There are some novel complete-coverage algorithms for cleaning robots, but all of them require solving the localization problem, which is hard to achieve with common cleaning robot hardware. Here, we propose a no-localization, no-complete-coverage approach so that the algorithm can run on common cleaning robot hardware that is cheap enough to be sold on the market. The proposed approach is similar to the evolutionary robotics approach, but in our approach only the behavior selection network, which selects among manually designed rule-based behavior modules, is evolved.
Input vectors that account for odometry errors and a fitness function that represents 'good' cleaning performance are selected carefully. To evaluate the behavior selection network, two other methods are chosen and tested at various places in both simulation and real robot experiments. Consequently, the proposed approach shows better performance in large and partitioned places.

1. INTRODUCTION

There has been research on cleaning robots in various directions. The most important issue in this research is the coverage problem, which determines the ability difference between cleaning robots. Cleaning robots are gaining commercial attention today, with their markets expanding and, at the same time, very intensive competition among manufacturers. To survive in this severe competition, we have to get the best result at the least cost. To make the robot cheaper, using low-cost sensors and controllers is inevitable, and this restricts the robot's ability. That is why random navigation, which does not need many high-performance sensors, is widely used by most sweeping robots; it can even be realized with only one contact sensor on the front.

Following Howie Choset, who organized the coverage problem according to its coverage methods, we can categorize coverage approaches into simple methods and complex methods [1]. A simple method consists of heuristic and randomized approaches. It has the advantage of low cost in computational power and sensors, but it cannot guarantee complete coverage. Complex methods form the other category; they fall into three classes according to the cell decomposition method: approximate cellular, semi-approximate cellular, and exact cellular decomposition. The Backtracking Spiral Algorithm (BSA), one of the approximate cellular decomposition methods, for example, decomposes the environment into square cells of robot size; an obstacle in a specific area is represented by an obstacle cell, and the robot moves along free cells [2]. In the semi-approximate cellular decomposition case, cells are defined as rectangular vertical slices of the environment [3]. In the exact cellular decomposition case, cells are rectangles of different sizes that fit the local environment [4]. Most complex methods suffer from the localization problem, which requires high computational power and expensive sensors such as a laser range finder or a vision system, so complex methods are not yet appropriate. In this paper, we propose a new method for enhancing coverage performance that is as simple as the simple methods and applicable to a cleaning robot with limited sensing capabilities.

2. NEURO-EVOLUTIONARY BEHAVIOR SELECTION NETWORK

1) Overall system
Usually, evolutionary robotics approaches play the role of connecting the sensor system directly to the motors [5]. In the cleaning problem, finding proper cleaning motions needs a more intelligent approach than a simple controller adjusting motor velocities. The proposed algorithm has the hierarchical structure depicted in Fig. 1.

Fig. 1. Overall block diagram of behavior selection.

2) Behavior module
Generally, there are four motions for a cleaning robot, as depicted in Fig. 2, and these can be divided into two functions: escaping & exploring, and filling. Escaping & exploring, which consists of the random and wall following motions, is utilized to move to another region not yet covered or to escape a region already covered. Filling, which consists of the back and forth and spiral motions, is utilized to cover an empty region.

Fig. 2. Cleaning motions: (a) random motion, (b) wall following motion, (c) back and forth motion, (d) spiral motion.

3) Evolution process
The neural network for selecting a behavior consists of 20 input nodes, 2 output nodes, and one hidden layer. To find the best weight set of the neural network in our behavior selection network, we use an evolutionary programming technique, as in Fig. 3.

Fig. 3. Flow chart of evolution process.

3. EXPERIMENTAL RESULTS

The robot platform has 7 infrared sensors covering -90° to +90° with a maximum 30 cm range, and a bumper sensor. To verify the performance of the proposed algorithm, we carried out experiments comparing against Roomba and the random motion of our robot. Figure 4 shows an experimental result.

Fig. 4. Experimental result on the real robot: (a), (b) environments used to verify the proposed algorithm; (c) coverage rate over time; (d) overlap ratio over time.

4. CONCLUSION

To achieve a complex task like cleaning, we have to choose a conceptually larger output, called a behavior, rather than left-right motor speed values. The proposed neuro-evolutionary behavior selection network is a proper algorithm for cleaning robots with low-cost sensors and little computational power.

Acknowledgments
This work was supported by the Korea Ministry of Knowledge Economy under the Intelligent Robot program.

References
[1] H. Choset, "Coverage for robotics - A survey of recent results," Annals of Mathematics and Artificial Intelligence, 2001, pp. 113-126.
[2] E. González, O. Álvarez, Y. Díaz, C. Parra, and C. Bustacara, "BSA: A complete coverage algorithm," in Proc. IEEE Int. Conf. on Robotics and Automation, Barcelona, Spain, April 2005.
[3] V. Lumelsky, S. Mukhopadhyay, and K. Sun, "Dynamic path planning in sensor-based terrain acquisition," IEEE Trans. Robotics and Automation, 6(4):462-472, 1990.
[4] H. Choset and P. Pignon, "Coverage path planning: the Boustrophedon cellular decomposition," in Proceedings of the International Conference on Field and Service Robotics, Canberra, Australia, December 1997.
[5] A. L. Nelson and E. Grant, "Using direct competition for competent controllers in evolutionary robotics," Robotics and Autonomous Systems, 54:840-857, 2006.

Gaussian Processes for Source Separation: Pseudo-likelihood Maximization
Sunho Park, Seungjin Choi
Department of Computer Science, POSTECH, Korea
{titan, [email protected]

ABSTRACT
In this paper we present a probabilistic method for source separation in the case where each source has a certain unknown temporal structure. We tackle the problem of source separation by maximum pseudo-likelihood estimation, representing the latent function which characterizes the temporal structure of each source by a random process with a Gaussian prior. The resulting pseudo-likelihood of the data is Gaussian, determined by a mixing matrix as well as by the predictive mean and covariance matrix that can be easily computed by Gaussian process (GP) regression. Gradient-based optimization is applied to estimate the demixing matrix by maximizing the log-pseudo-likelihood of the data. Numerical experiments confirm the useful behavior of our method compared to existing source separation methods.

1. INTRODUCTION

Source separation is a fundamental problem with wide applications in machine learning, pattern recognition, and signal processing. In the simplest form of source separation, the observation data x_t = [x_{1,t}, ..., x_{n,t}]^T (x_{i,t} represents the i-th element of x_t ∈ R^n) is assumed to be generated by

x_t = A s_t,  (1)

where A ∈ R^{n×n} is the nonsingular mixing matrix and s_t ∈ R^n is the source vector whose elements are assumed to be statistically independent. The task of source separation is to restore the unknown independent sources s_t, up to scaling and permutation ambiguities, without knowledge of the invertible mixing matrix A, given an ensemble of data {x_t}_{t=1}^N. In other words, source separation aims to estimate a demixing matrix W = A^{-1} such that W A = P Λ, where P is a permutation matrix and Λ is an arbitrary invertible diagonal matrix.

The Gaussian process (GP) model has been widely used in machine learning because of its flexible nonparametric nature and computational simplicity [1]. In this paper we use a GP model to characterize the temporal structure of a source, representing the postulated relationship by a distribution over latent functions. The latent function, which relates the current sample of a source to its past samples, is represented by a random process with a Gaussian prior. Integrating out the latent functions is tractable, leading to a Gaussian pseudo-likelihood that is determined by the mixing matrix as well as by the predictive mean and covariance matrix, which can be easily computed by GP regression. The demixing matrix is estimated by maximizing the log-pseudo-likelihood of the data. The flexible nonparametric nature allows sources to be nonlinear time series; this is confirmed by experimental results.

2. GP SOURCE GENERATIVE MODEL

Incorporating the temporal structure of each source, we model s_{i,t} by

s_{i,t} = f_i(s_{i,t-1}) + ε_{i,t},  (2)

where

s_{i,t-1} = [s_{i,t-1}, s_{i,t-2}, ..., s_{i,t-p}]^T ∈ R^p  (3)

is a collection of the past p samples and ε_{i,t} ~ N(0, 1) is white Gaussian noise with zero mean and unit variance. The function f_i(·), which relates the current sample s_{i,t} to the past samples s_{i,t-1}, is referred to as the latent function. In the case of a linear AR model, the latent function is written as

f_i(s_{i,t-1}) = Σ_{τ=1}^{p} h_{i,τ} s_{i,t-τ},  (4)

where {h_{i,τ}} are the AR coefficients.

The GP model represents the latent function f_i(·) by a random process with a Gaussian prior, instead of the parametric form (4). We place a GP prior over the function f_i(·), i.e.,

f_i ~ GP( 0, k(s_{i,t}, s_{i,τ}) ),  (5)

where k(s_{i,t}, s_{i,τ}) is a covariance function. We use the squared exponential covariance function

k(s_{i,t}, s_{i,τ}) = exp{ -λ_i || s_{i,t} - s_{i,τ} ||² },  (6)

where λ_i is a length-scale hyperparameter. Taking the source generative model (2) with the GP prior (5) into account, the mixing model (1) can be written as

x_t = A f_{t-1} + A ε_t,  (7)

where f_{t-1} = [f_{1,t-1}, ..., f_{n,t-1}]^T ∈ R^n (with f_{i,t-1} = f_i(s_{i,t-1})) and ε_t ∈ R^n follows an independent Gaussian distribution, i.e., p(ε_t) = N(0, I). The induced model (7) is one of the key ingredients of our method and will be used in the next section.

3. GP SOURCE SEPARATION

In maximum likelihood source separation, sources are treated as latent variables that are marginalized out. In our induced model (7) with GP prior (5), we consider the pseudo-likelihood, which approximates the likelihood with a product of conditional distributions of x_t given its neighbors,

pseudo-likelihood = Π_{t=1}^{N} p( x_t | X^{(\t)} ),

where X^{(\t)} = {x_1, ..., x_{t-1}, x_{t+1}, ..., x_N}. We estimate the demixing matrix W = A^{-1} by pseudo-likelihood maximization, integrating out the latent vector f_{t-1}. To this end, we consider a single factor of the pseudo-likelihood of the data,

p(x_t | X^{(\t)}) = ∫ p(x_t | f_{t-1}, X^{(\t)}) p(f_{t-1} | X^{(\t)}) df_{t-1},  (8)

where

p(x_t | f_{t-1}, X^{(\t)}) = N( A f_{t-1}, A A^T ),  (9)
p(f_{t-1} | X^{(\t)}) = N( μ_t, Σ_t ).  (10)

The predictive mean vector μ_t ∈ R^n and diagonal covariance matrix Σ_t ∈ R^{n×n} are calculated by GP regression; one can refer to [2] for the detailed derivations. It follows from (8)-(10) that a single factor of the pseudo-likelihood follows a Gaussian distribution,

p(x_t | X^{(\t)}) = N( A μ_t, Γ_t ),  (11)

where Γ_t = A (Σ_t + I) A^T. Thus we write the log-pseudo-likelihood of the data as

L = Σ_{t=1}^{N} log p( x_t | X^{(\t)} )
  = -(1/2) Σ_{t=1}^{N} { log 2π + log |Γ_t| + β_t^T (Σ_t + I)^{-1} β_t },  (12)

where [β_t]_i = [ K_i^{-1} s_{i,1:N} ]_t. We estimate the demixing matrix W by maximizing the log-pseudo-likelihood (12). A gradient-based optimization is applied to find a solution that maximizes (12); fminunc was used in our Matlab implementation (see [2]).

4. NUMERICAL EXPERIMENTS

We use nonlinear time series (Mackey-Glass MG30, the Santa Fe competition laser data, and the first variable of the chaotic Ikeda map) as sources. Our method shows the best performance compared to SOBI [3] and AR-BSS [4] (see Fig. 1), although the performance difference is not large. SOBI does not assume any linear temporal model, since it exploits only time-delayed covariance structure. Linear AR modeling seems to work even when the actual sources are nonlinear time series; however, its performance is degraded compared to our nonparametric source modeling.

Figure 1. Performance index (PI) of each algorithm.

5. CONCLUSIONS

We have presented a source separation method where each source is modeled by a GP and the demixing matrix is learned by maximizing the log-pseudo-likelihood of the data. Compared to source separation methods where a parametric model (e.g., an AR model) is used to capture the temporal structure of sources, our method is more flexible in the sense that sources are allowed to be nonlinear time series. We compared our method to two representative methods (SOBI and AR-BSS) which also exploit the temporal structure of sources.

ACKNOWLEDGMENTS
This work was supported by Korea MIC under the ITRC support program supervised by IITA (IITA-2007-C1090-0701-0045).

REFERENCES
[1] C. E. Rasmussen and C. K. I. Williams, Gaussian Processes for Machine Learning, MIT Press, 2006.
[2] S. Park and S. Choi, "Gaussian processes for source separation," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Las Vegas, USA, 2008.
[3] A. Belouchrani, K. Abed-Meraim, J. F. Cardoso, and E. Moulines, "A blind source separation technique using second order statistics," IEEE Trans. Signal Processing, vol. 45, pp. 434-444, Feb. 1997.
[4] Y. M. Cheung, "Dual auto-regressive modelling approach to Gaussian process identification," in Proceedings of the IEEE International Conference on Multimedia and Expo, 2001, pp. 1256-1259.

Motion Recognition with 3-D Acceleration Signal
Sangki Kim, Gunhyuk Park, Seungjin Choi, Seungmoon Choi
Department of Computer Science, POSTECH, Korea
{ihistory, maharaga, seungjin, [email protected]

Abstract
In this paper we present a motion recognition system with a hand-held controller. 3-D acceleration signals are generated by a 3-axis accelerometer in the controller. We extract motion segments from continuous acceleration signals and apply each segment to the motion models trained in the training phase. A hidden Markov model was used to model each motion. We applied the system to three motion sets with performance good enough for practical use.

Acceleration of gravity removal
An offset filter was used to remove the acceleration of gravity. We update the offset, that is, the acceleration of gravity, whenever the acceleration signal changes only slightly over a certain time interval, assuming that a controller at rest produces only small changes in the acceleration signal.

1.
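As a side illustration of the GP regression step mentioned in the abstract above, the following minimal NumPy sketch (function names and toy data are ours, not the paper's code) computes a predictive mean and covariance under a squared exponential covariance with length-scale λ:

```python
import numpy as np

def sq_exp_kernel(A, B, lam=1.0):
    # k(s, s') = exp(-lambda * ||s - s'||^2)  (squared exponential covariance)
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-lam * d2)

def gp_predict(S_train, y_train, S_test, lam=1.0, noise=1.0):
    # Standard GP regression: predictive mean and covariance of f at S_test.
    K = sq_exp_kernel(S_train, S_train, lam) + noise * np.eye(len(S_train))
    Ks = sq_exp_kernel(S_test, S_train, lam)
    Kss = sq_exp_kernel(S_test, S_test, lam)
    alpha = np.linalg.solve(K, y_train)
    mean = Ks @ alpha
    cov = Kss - Ks @ np.linalg.solve(K, Ks.T)
    return mean, cov

# Toy data: past-sample vectors (inputs) and current samples (targets).
S = np.array([[0.0], [0.5], [1.0], [1.5]])
y = np.sin(2 * S[:, 0])
mean, cov = gp_predict(S, y, np.array([[0.75]]))
print(mean.shape, cov.shape)
```

The predictive mean and (diagonal of the) predictive covariance are exactly the quantities the pseudo-likelihood later depends on; the noise term on the diagonal of K keeps the solve well conditioned.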
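To make the generative model concrete, here is a small NumPy sketch (ours, not from the paper) that draws two linear AR sources in the style of eq. (4), mixes them with a random nonsingular matrix as in eq. (1), and checks that the ideal demixing matrix W = A^{-1} recovers the sources (a special case of W A = P Λ with P = Λ = I):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 500                                    # number of samples

def ar_source(coeffs, n, rng):
    # s_t = sum_tau h_tau * s_{t-tau} + eps_t,  eps_t ~ N(0, 1)   (cf. eq. 4)
    s = np.zeros(n)
    p = len(coeffs)
    for t in range(p, n):
        s[t] = np.dot(coeffs, s[t - p:t][::-1]) + rng.normal()
    return s

S = np.vstack([ar_source([0.9, -0.2], N, rng),    # source 1
               ar_source([0.5,  0.3], N, rng)])   # source 2
A = rng.normal(size=(2, 2))                       # nonsingular mixing matrix
X = A @ S                                         # observations x_t = A s_t

# The ideal demixing matrix W = A^{-1} recovers the sources exactly.
W = np.linalg.inv(A)
print(np.allclose(W @ X, S))   # True
```

In practice A is unknown, which is why the paper estimates W by maximizing the pseudo-likelihood instead of inverting A directly.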
INTRODUCTION

New kinds of interaction technology that understand a user's motion have emerged due to the rapid development of sensor technology. An accelerometer measures the acceleration of a controller in motion, and analysis of the acceleration signals enables motion recognition. Several researchers have studied motion recognition with acceleration signals [1, 2]. In [1], each motion is modeled with a state machine, using transitions between states according to heuristic thresholds. This can recognize simple motions but cannot deal with complex ones; moreover, designing a state machine for each motion is not easy. In [2], the 3-D trajectory of the controller is estimated with a 3-D accelerometer and gyroscopes. Trajectory estimation is an effective approach for motion recognition, but the size and cost of gyroscopes hinder its practical use. In this paper we propose an HMM-based motion recognition system using 3-D acceleration signals.

2. OVERALL SYSTEM STRUCTURE

Figure 1 shows the overall structure of our motion recognition system: 3-D acceleration signals pass through acceleration of gravity removal and motion block extraction, then each of the N motion models produces a weighted likelihood, from which the recognized motion is selected.

Figure 1. Overall system structure.
Figure 2. Acceleration of gravity removal (before and after).

Motion block extraction
The signals in a motion block have large acceleration values and variances, so we assume that contiguous regions with large values and variances are motion blocks.

Figure 3. Motion block extraction.

Motion model
A continuous hidden Markov model was used to model each motion, with 15 hidden states and full-covariance Gaussian observation distributions. We assign an extracted motion block to the motion with the largest likelihood.

Acceleration of gravity
It is natural to regard the acceleration of gravity as noise, but this is not true in some situations. Consider two kinds of sword swing motions, an up-down swing and a left-right swing. Because accelerometers cannot detect rotation of the controller, these two swings have similar acceleration signals once the acceleration of gravity is removed. In such cases we use signals without gravity removal; we cannot distinguish the 7 sword swing motions in Ghost House without the acceleration of gravity.

Weighting
Before comparing the likelihoods of the models, we add a weight constant to each likelihood according to the motion model, for tuning. This helps reduce the recognition error rate, especially for similar motion pairs.

Early decision
For a long motion, it is uncomfortable to wait for motion block extraction to finish before recognition. If we have enough acceleration signal, we can recognize the motion before it is finished; this is useful for the sword swing motions in Ghost House. Figure 4 shows log-likelihood curves for the 7 sword swing motions; the blue curve comes from the correct model.

Figure 4. Time vs. log-likelihood for the 7 sword swing motions.

3. EXPERIMENT

We applied the proposed motion recognition system to three motion sets: 3-D Tetris, numbers, and Ghost House (a shooting game). 3-D Tetris consists of 12 motions, including 6 translations and 6 rotations. The numbers motion set consists of the 10 digits from 0 to 9. Ghost House is a game application that demonstrates the motion recognition system; it consists of 9 motions, including 7 sword swings, gun reload, weapon change, and a special command.

Figure 5. 3-D Tetris motion set.
Figure 6. Ghost House.

Each motion model was trained with 30 training motions made by one person. For testing, we collected 40 test motions per motion from 4 people. Figure 7 shows the experimental results for the three motion sets, on both training and test data.

Figure 7. Recognition rates.
Motion set  | Train | Test
3-D Tetris  | 100%  | 97.9%
Numbers     | 100%  | 94.2%
Ghost House | 100%  | 90.1%

4. DISCUSSION

We developed a motion recognition system based on 3-D acceleration signals. It was applied to three motion sets with acceptable recognition rates.

References
[1] J. Baek, I.-J. Jang, H. D. Kim, J.-S. Cho, and B.-J. Yun, in Proc. KES 2006.
[2] S.-J. Cho et al., "MagicWand: A hand-drawn gesture input device in 3-D space with inertial sensors," in Proc. IWFHR 2004.
[3] A. Wilson and S. Shafer, "XWand: UI for intelligent spaces," in Proc. CHI 2003.

Pedestrian Detection using Omega Shape
Sunki Kim, Daijin Kim
Department of Computer and Communication Engineering, POSTECH, Korea
{kimsk03, [email protected]

Abstract
In this paper we present a vision-based pedestrian detection and counting system. In surveillance applications, human detection and counting are very important problems. We find pedestrians in indoor/outdoor scenes using the human head shape. We begin with shape-based searching to detect human heads of various sizes and orientations, under the assumption that head and shoulder contours form an omega shape. The final step is head region verification. We show that a high detection rate and processing speed can be achieved.

1. Introduction

Human detection and counting for surveillance and security is a very active research area in the computer vision community. Applications range from pedestrian traffic management and detection of overcrowding in public buildings to tourist flow estimation. Detecting humans poses many problems because people wear various clothes and take various poses. The omega shape, formed by the head and shoulders, is largely independent of clothing, hairstyle, and orientation, so it is robust to head rotation, illumination changes, and variable head sizes. The details of head candidate region detection and verification are described in Sections 2 and 3, respectively. We discuss the experimental results of the proposed method in Section 4 and draw conclusions in Section 5.

2. Head Candidate Detection

We assume that object detection is done in advance, so we can obtain bounding box regions. We extract foreground pixels using background information and construct a binary image in which black represents background and white represents foreground. We then perform head candidate region detection and candidate region verification. The proposed head shape detection system is shown in Fig. 1.

Figure 1. The proposed head shape detection system.

2.1 Edge Detection and Angle Calculation

First, we extract edges from the input binary images, because we use only boundary information; we use the Canny edge detection algorithm. After edge detection, we calculate the angle of the normal vector along the edge outlines, obtaining edge images and corresponding angle images. Fig. 2 shows the normal vector directions of the mean shape, and Fig. 3 shows the templates for the Prewitt operator, with which we obtain the angle of the normal vector of the edge outlines as

θ(x, y) = tan^{-1}( M_y(x, y) / M_x(x, y) ).  (1)

Figure 2. Normal vector directions of the mean shape.
Figure 3. Templates for the Prewitt operator.

2.2 Candidate Region Detection

In the training step, we build a mean head-shoulder shape. The mean shape has 23 points (x_1, y_1, ..., x_23, y_23), each with x, y coordinates, so it is a 46-dimensional vector; it is the average omega shape. Since there is a difference between the edge shape and the mean shape, we consider an additional 5 pixels at each point of the mean shape along the normal vector direction to minimize this difference. The weights (w_1, ..., w_23) are used to assign more importance to the head points and reduce the influence of the shoulder points of the model. O is the edge image map, R is the angle map, and (x_s, y_s) is a point of the shape in the input image. A local angle α_i of a model point is defined as the direction of the vertex. Fig. 4 shows head candidate region detection. Finally we obtain the candidate vector x_i and compute the summation

S = (1/23) Σ_{i=1}^{23} w_i × max_{x_i}( g(x, y) * ( O(x_i, y_i) × cos(α_i − R(x_i, y_i)) ) ).  (2)

If S is greater than the candidate threshold, we accept the candidate region.

Figure 4. Human head candidate region.

3. Head Candidate Verification

Having found candidate regions in the previous step, we now determine whether a candidate region is a real human head shape region. A shape X consisting of vertex points X = {x_1, y_1, x_2, y_2, ..., x_23, y_23} can be approximated by X = X̄ + E b, where X̄ is the mean shape, E is the eigenvector matrix, and b is the shape parameter. The shape parameter is computed as b_i = E^T (X_i − X̄), and we compute its norm. If the norm of the shape parameter is greater than the verification threshold, we determine that the candidate region is a real human head shape region.

4. Experimental Result

We use the LG surveillance database. The candidate threshold is 5 and the verification threshold is 7.07. Fig. 5 shows the results of head shape detection for pedestrian detection.

Figure 5. The result of pedestrian detection.

5. Conclusions

We proposed a human head shape detection algorithm using the omega shape. Our implementation works well at various camera positions and under sparse occlusion, but we do not consider occlusion in densely crowded situations. In future work, we will improve the proposed algorithm to deal with densely crowded situations such as subway and railway stations.

Acknowledgments
This work was supported by the Intelligent Robotics Development Program, one of the 21st Century Frontier R&D Programs funded by the Ministry of Commerce, Industry and Energy (MOCIE), and the authors would like to thank the Ministry of Education of Korea for its financial support of the Division of Mechanical and Industrial Engineering and the Division of Electrical and Computer Engineering at POSTECH through the BK21 program.

References
[1] J. Garcia and M. Shah, "Automatic detection of heads in colored images," in Proceedings of the 2nd Canadian Conference on Computer and Robot Vision, 2005, pp. 276-281.
[2] J. Yuk and K. Wong, "Real-time multiple head shape detection and tracking system with decentralized trackers," in Proceedings of the Sixth International Conference on Intelligent Systems Design and Applications, 2006, pp. 384-389.
[3] S. Oliver and L. Yuriy, "Pedestrian detection and tracking for counting applications in crowded situations," in Proceedings of the IEEE International Conference on Advanced Video and Signal Based Surveillance, 2006.
[4] S. Cho and D. Kim, "Pose robust human detection and tracking using multiple oriented 2D elliptical filters," Pohang University of Science and Technology, 2007.
[5] H. Yoon and D. Kim, "A robust human head detection method for human tracking," in Proceedings of the IEEE International Conference on Intelligent Robots and Systems, 2006.
[6] T. Cootes and C. Taylor, "Active shape models: their training and application," Computer Vision and Image Understanding, vol. 16, pp. 38-59, January 1995.
[7] H. Ghassan and A. G. Raffeef, "Active shape models - Part I: Modeling shape and gray level variations," in Proceedings of the Swedish Symposium on Image Analysis, 1998.
[8] H. Ghassan and A. G. Raffeef, "Active shape models - Part II: Image search and classification," in Proceedings of the Swedish Symposium on Image Analysis, 1998.
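The assign-to-largest-likelihood step just described can be sketched as follows. This is a toy two-state, one-dimensional Gaussian HMM rather than the paper's 15-state full-covariance models, and all names and parameter values are ours:

```python
import math

def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def log_gauss(x, mu, var):
    # Log density of a univariate Gaussian observation.
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)

def log_likelihood(obs, model):
    # Forward algorithm in the log domain: returns log p(obs | HMM).
    pi, A, mu, var = model
    n = len(pi)
    alpha = [math.log(pi[i]) + log_gauss(obs[0], mu[i], var[i]) for i in range(n)]
    for x in obs[1:]:
        alpha = [logsumexp([alpha[j] + math.log(A[j][i]) for j in range(n)])
                 + log_gauss(x, mu[i], var[i]) for i in range(n)]
    return logsumexp(alpha)

def classify(obs, models):
    # Assign the motion block to the model with the largest likelihood.
    return max(models, key=lambda name: log_likelihood(obs, models[name]))

# Two toy motion models that differ only in where they start:
# "up" starts near the low-valued state, "down" near the high-valued one.
up   = ([0.9, 0.1], [[0.8, 0.2], [0.2, 0.8]], [-1.0, 1.0], [0.5, 0.5])
down = ([0.1, 0.9], [[0.8, 0.2], [0.2, 0.8]], [-1.0, 1.0], [0.5, 0.5])
models = {"up": up, "down": down}
print(classify([-1.0, -0.8, 0.2, 0.9, 1.1], models))   # -> up
```

A real system would train the per-motion parameters (e.g., by Baum-Welch) on the 30 training motions per class mentioned later, then apply `classify` to each extracted motion block.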
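The angle map of eq. (1) can be illustrated with the Prewitt templates as follows. This is a from-scratch NumPy sketch (function names ours); `arctan2` is used in place of a plain arctangent so that M_x = 0 causes no division error:

```python
import numpy as np

def prewitt_angles(img):
    # 3x3 Prewitt templates for the horizontal/vertical gradients Mx, My.
    kx = np.array([[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]], dtype=float)
    ky = kx.T
    h, w = img.shape
    mx = np.zeros((h, w))
    my = np.zeros((h, w))
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            patch = img[r - 1:r + 2, c - 1:c + 2]
            mx[r, c] = np.sum(kx * patch)
            my[r, c] = np.sum(ky * patch)
    # theta(x, y) = arctan( My / Mx )   (cf. eq. 1)
    return np.arctan2(my, mx)

# Vertical step edge: the edge normal points along +x, so theta is 0 there.
img = np.zeros((5, 5))
img[:, 3:] = 1.0
theta = prewitt_angles(img)
print(theta[2, 2])   # 0.0 for a purely horizontal gradient
```

In the paper's pipeline such an angle map R(x, y) is compared against the local model angles α_i in the matching score of eq. (2).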
of the 8th POSTECH-KYUTECH Joint Workshop on Neuroinformatics, Kitakyushu, Japan, 2008 Complete Coverage Algorithm for a Robot Cleaner: Enhanced Spiral Filling Algorithm Tae-Kyeong Lee, Young-ho Choi, Sanghoon Baek, Se-Young Oh Department of Electrical and Electronic Engineering, POSTECH, Korea {devilee, rockboy, tkeb100, [email protected] Abstract This paper presents an online complete coverage path planning algorithm suitable for cleaning robots equipped with low cost sensors, guaranteeing the robot to completely cover a closed and unstructured planar environment. The proposed Enhanced Spiral Filling (ESF) algorithm belongs to the approximate cellular decomposition approach which abstracts the environment as a set of grids. The algorithm generates a path composed of several simple spiral paths and linkage paths among them. To minimize the number of turns, we suggest aligning the x-axis of map coordinates to one of the walls of the target space. In addition, we propose an efficient path planner to link simple spiral paths using constrained inverse distance transform; this planner can find both the nearest non-contiguous uncovered cell and the shortest path to it, simultaneously. Experiments on real robot and simulator demonstrate the efficiency and robustness of proposed algorithm. we develop a map coordinate initialization scheme based on the history of sensor readings to improve the time-to-completion by reducing the number of turns on the generated path. Second, we propose constrained inverse distance transform (CIDT) to link simple spiral paths, which can find both the nearest non-contiguous uncovered cell and the shortest path to it, simultaneously. 2. ENHANCED SPIRAL FILLING ALGORITHM Enhanced Spiral Filling (ESF) algorithm basically covers simple region using spiral filling path, and uses constrained inverse distance transform (CIDT) to link these simple regions. Here, a simple region denotes an area that can be filled using a single spiral path. 
To this end, the environment is incrementally modeled by a coarse-grain occupancy grid and each cell is set to one of five types: unexplored, covered, partially covered, occupied and candidate. The overall process of ESF is shown in Fig. 1. 1) Map Coordinate Initialization ESF algorithm is focused on the indoor applications. In general, indoor environments (e.g. home, office, etc.) have a rectangular shape consisting of four walls. In these cases, we can drastically reduce the number of turns by aligning the x-axis of map coordinates to the one of the walls of a target environment. In our algorithm, we assume that the 1. INTRODUCTION Among several research topics related to the cleaning robot, an efficient path planning algorithm is described in this paper. The original path planning algorithms focused on the problem of finding a path between two points minimizing the path length, journey time and energy consumption. However, cleaning robot requires other kinds of path planning algorithms which are capable of finding paths which ensure the complete coverage of an environment. This is called a coverage path planning [1] whose goal is to find paths maximizing the coverage while minimizing the time-to-completion. Some approaches in coverage path planning rely on the heuristic and randomized components [2,3]. They cannot guarantee complete coverage. Other approaches are complete coverage path planning algorithms, which generate paths that completely cover the free space. They can achieve lower time-to-completion than heuristic approaches. In this kind of approach, some algorithms, such as Backtracking Spiral Algorithm [4], Spiral-STC [5], are already proposed. Backtracking spiral algorithm (BSA) is based on the execution of spiral filling paths and assures the completeness using backtracking mechanism to link simple spiral paths. This paper has two main contributions. 
First, Start Initialize map to unexplored Wall following & map coordinate initialization Does neighboring candidate cell exist? End No Constrained inverse distance transform(CIDT) Yes Does candidate cell exist? Spiral filling rule Move to the target cell Is the target cell fully accessible? Yes No Contour following Yes Fig. 1 Overall process of ESF - 41 - No Proc. of the 8th POSTECH-KYUTECH Joint Workshop on Neuroinformatics, Kitakyushu, Japan, 2008 Using these rules, we can efficiently constrain the searching boundary of IDT. Therefore, we call the overall process of searching the nearest candidate cell Constrained Inverse Distance Transform (CIDT). robot starts nearby the wall and ESF starts from following this wall. Map coordinate initialization procedure is activated in the initial stage of wall following. To follow the wall with the desired distance, first of all, the robot should access to the wall until satisfying the desired distance (accessing phase) and then align its orientation to the wall (aligning phase). When the robot’s orientation is stably aligned to the wall, we then use this robot pose as the reference frame of map coordinates 2) Wall Following and Contour Following We adopt wall following and contour following procedure to cover the partially occupied cells nearby the walls and the obstacles. Therefore, spiral filling rules guarantee the complete covering of a cell because the robot moves via the centers of cells but wall and contour following procedures do not guarantee it. To handle this problem, these cells are marked as partially covered type and then considered as cells of interest in applying both spiral filling rules and CIDT and these cells are revisited to be covered completely. 3) Constrained Inverse Distance Transform Constrained Inverse Distance Transform (CIDT) is a key component of our ESF algorithm to link simple spiral paths. 
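The CIDT linkage step can be illustrated with a toy grid search. This is our sketch, not the authors' implementation: a breadth-first wavefront from the current cell that records the nearest candidate cell found so far and stops once the wave distance D exceeds 2r, mirroring the paper's termination rules; the shortest path is then recovered from the wavefront's parent pointers:

```python
from collections import deque

def cidt(grid, start):
    # 4-connected BFS wavefront over non-occupied cells from `start`.
    # Returns the shortest path to the nearest "candidate" cell.
    h, w = len(grid), len(grid[0])
    parent = {start: None}
    queue = deque([(start, 0)])
    best, r = None, float("inf")
    while queue:
        (x, y), d = queue.popleft()
        if d > 2 * r:
            break                          # terminate_IDT
        if grid[x][y] == "candidate" and d < r:
            best, r = (x, y), d            # renew_r
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < h and 0 <= ny < w and (nx, ny) not in parent \
                    and grid[nx][ny] != "occupied":
                parent[(nx, ny)] = (x, y)
                queue.append(((nx, ny), d + 1))
    path, cell = [], best
    while cell is not None:
        path.append(cell)
        cell = parent[cell]
    return path[::-1]                      # start -> nearest candidate

grid = [["covered", "covered",  "occupied"],
        ["covered", "occupied", "candidate"],
        ["covered", "covered",  "candidate"]]
print(cidt(grid, (0, 0)))   # path around the obstacle to the nearer candidate
```

The 2r cutoff matters because, as the paper notes, the square-shaped wave may encounter a farther candidate first; continuing up to twice the recorded distance guarantees the truly nearest one is found before terminating.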
In the proposed inverse distance transform, region growing starts from the known current cell (starting point) and terminates upon meeting an unknown candidate cell (goal point) at the shortest distance; the transform then generates the inverse path from the goal back to the starting point. However, finding the truly nearest candidate first is not always guaranteed, because the distance wave propagates with a square shape in the IDT, as depicted in Fig. 2. The termination condition is therefore implemented using the following rules:
IF (D > 2r) THEN terminate_IDT
ELSE IF (nearer_candidate_cell_detected) THEN renew_r
OTHERWISE continue_IDT
where D is the distance of the cell being evaluated and r retains the distance of the nearest candidate cell found so far.
3. EXPERIMENTAL RESULTS
To show the robustness of ESF in various environments, we conducted experiments in four different real environments, each of size 4.2 m by 5 m, shown in the first column of Fig. 3.
4. CONCLUSION
The proposed Enhanced Spiral Filling (ESF) algorithm drastically reduces the number of turns on the generated path by applying a map coordinate initialization scheme. In addition, we propose the constrained inverse distance transform (CIDT) to link simple spiral paths. CIDT is a unified solution that finds a non-contiguous uncovered cell with the minimal path length from the cell being swept, as well as the path to it, simultaneously. Last, we demonstrate that our algorithm is applicable to embedded
Acknowledgments
This work was supported by the Korea Ministry of Knowledge Economy under the Intelligent Robot program.
References
[1] H. Choset, "Coverage for robotics - A survey of recent results," in Annals of Mathematics and Artificial Intelligence, 2001, pp. 113-126.
Fig.
3 Real-world experimental results. The first column shows the real environment setups, the second column the overall trajectories, the third column the maps, and the last column the coverage.
Fig. 2 Definition of the boundary of IDT: (a) before detecting a nearer candidate cell, (b) after detecting a nearer candidate cell
[2] R. A. Brooks, "A robust layered control system for a mobile robot," IEEE J. Robotics and Automation, 1986.
[3] D. Gage, "Randomized search strategies with imperfect sensors," in Proc. SPIE Mobile Robots VIII, Boston, MA, 1993, pp. 270-279.
[4] E. González, O. Álvarez, Y. Díaz, C. Parra, and C. Bustacara, "BSA: A Complete Coverage Algorithm," in Proc. IEEE Int. Conf. on Robotics and Automation, Barcelona, Spain, April 2005.
[5] Y. Gabriely and E. Rimon, "Spiral-STC: An On-Line Coverage Algorithm of Grid Environments by a Mobile Robot," in Proc. IEEE Int. Conf. on Robotics and Automation, Washington, DC, May 2002.
Proc. of the 8th POSTECH-KYUTECH Joint Workshop on Neuroinformatics, Kitakyushu, Japan, 2008
A MeanShift-Watershed embedded Particle Filter: Variable number of Target Detection and Tracking
Taewan Kim, Daijin Kim
Department of Computer Science, POSTECH, Korea
{taey16, [email protected]
version, MS-WS, uses a visual pattern clustering method that makes it possible to estimate the number and positions of the initial seeds, and then applies region-growing segmentation so that we can overcome the over-segmentation problem.
Abstract
Tracking a varying number of non-rigid objects is a difficult problem: there is no exact way to estimate the expected number of objects to track, and overlaps and visual ambiguities are present. To overcome these difficulties, we introduce the MeanShift-Watershed segmentation (MS-WS) embedded particle filter. Our approach incorporates a segmentation technique, which gives the regions into which particles should be scattered.
Our particle filter then infers the best regions where the objects are supposed to be. In other words, the MS-WS segmentation directly gives an estimate of the inference region in feature space. Using this prior information, the state transition, and a color cue, the particle filter approximates the best region via a sequential Monte Carlo method.
2.1. MeanShift Segmentation
The input image is first transformed into a 3-dimensional vector in LUV space. The image I(x, y) ∈ R^LUV is decomposed by singular value decomposition (SVD) such that I = UΣV^T. When the image I(x, y) is factorized by SVD, the columns of U represent the principal components, which contain the major chromatic and shading information, following the formula
A_{m×n} = UΣV^T = EV(AA^T)_{m×m} · Σ · EV(A^T A)^T_{n×n},  (1)
where A = (1/n)(I − μ) and EV(·) is the eigenvector matrix of the corresponding matrix. The eigen-image represented by the first column of U gives better computational efficiency, because we use only a gray-valued space that still contains rich chromatic and shading information with little loss of detail. This eigen-image is reconstructed as a 1D histogram, and the MeanShift algorithm then finds the modes (centers of the regions of high concentration) of this vector.
1. INTRODUCTION
Tracking multiple objects, which can appear and disappear in a scene, is an open problem in many vision tasks. The problem by nature involves both complex nonlinear properties and uncertainty. To estimate a varying number of targets, we introduce MeanShift-Watershed (MS-WS) segmentation and then track the targets within a particle filter framework. The remainder of this paper is organized as follows. Theoretical details of MeanShift and Watershed are given in Section 2. The particle filter framework is explained in Section 3. Section 4 is devoted to the experimental results of our tracking system as well as the experimental configuration.
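The eigen-image construction of Sec. 2.1 can be sketched as follows (an interpretation, not the authors' code): mean-center the pixel-by-channel data matrix, take its SVD, and project onto the leading component to obtain a single gray-valued channel.

```python
import numpy as np

def eigen_image(luv):
    """luv: (H, W, 3) LUV image; returns a single-channel eigen-image."""
    h, w, c = luv.shape
    A = luv.reshape(h * w, c).astype(float)
    A -= A.mean(axis=0)                     # mean-centering, A = I - mu
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # first left singular vector * first singular value = projection onto
    # the principal component, reshaped back into an image
    return (U[:, 0] * s[0]).reshape(h, w)
```

Note the usual SVD sign ambiguity: the eigen-image is defined up to a global sign flip, which does not affect the histogram-mode analysis that follows.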
Finally, we conclude our approach and discuss directions for improvement in Section 5.
2. MEANSHIFT-WATERSHED SEGMENTATION
As mentioned in Section 1, MeanShift-Watershed segmentation is used to detect all players in a scene. Detecting targets correctly is critical to achieving better tracking performance. We therefore introduce the MeanShift-Watershed (MS-WS) segmentation method, which combines the MeanShift and Watershed segmentation methods. MeanShift segmentation [1] is a simple but powerful method that clusters coherent visual patterns in feature space; however, it does not provide locally separable regions at all. Watershed segmentation [2] is not a feature-space analysis method but a region-growing approach, so it provides a regionally discriminative property; however, it suffers from over-segmentation, because the number of initial growing points is unknown. Hence the combined method, MS-WS, uses MeanShift to estimate the number and positions of the initial seeds and Watershed to grow the corresponding regions.
Figure 1. A histogram of the eigen-image, separating players and background using mean shift.
Figure 2. (left) An example of an eigen-image; (right) separation of players and background.
2.2. Watershed Segmentation
The watershed transform [2] is a method that considers any grey-scale image as a topographic surface. As depicted in Figure 3, the gray image can be visualized in three dimensions: two spatial coordinates versus gray level. Intuitively, we flood this topographic surface with water, and all local minima and their locally surrounded areas are regarded as perceptually homogeneous.
where α is a mixing proportion. Unlike a mixture particle filter, we know the number of targets and the corresponding likelihoods (observations) {p_j(y_t | x_t)}, j = 1, ..., M, by MS-WS segmentation.
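A toy region-growing pass in the spirit of the watershed flooding just described (a simplified sketch: the seeds play the role of local minima, and a plain FIFO queue replaces the priority queue of a true watershed):

```python
import numpy as np
from collections import deque

def grow_regions(height, seeds):
    """height: 2D array (topographic surface); seeds: {(y, x): label}.
    Grows each labeled region uphill until regions meet. Returns labels."""
    labels = np.zeros(height.shape, dtype=int)
    for (y, x), lab in seeds.items():
        labels[y, x] = lab
    q = deque(sorted(seeds))   # FIFO flood; true watershed orders by height
    while q:
        y, x = q.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < height.shape[0] and 0 <= nx < height.shape[1]:
                # grow only into unlabeled, non-decreasing-height pixels
                if labels[ny, nx] == 0 and height[ny, nx] >= height[y, x]:
                    labels[ny, nx] = labels[y, x]
                    q.append((ny, nx))
    return labels
```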
When one or more new objects appear in a scene, our MS-WS algorithm can detect these objects and automatically start to track them with the color-based measurement model.
3.2 State transition model: Autoregressive model
The autoregressive (AR) process is the easiest and simplest model for estimating the dynamic state transition, so we adopt
x̃^i_{t+1} = A x̃^i_t + B x̃^i_{t−1} + C v_t,  if s_t > 0.5,
x̃^i_{t+1} = x̃^i_{t−1} + C v_t,  otherwise,
where A, B, C are the AR coefficients and s_t ~ U(0, 1).
Figure 3. An illustration of watershed segmentation: (top-left) binarized image; (top-right) distance transform in 2D; (bottom-left) distance transform in 3D; (bottom-right) segmentation result.
Figure 4 shows a segmentation result of our MS-WS segmentation. The players and the background are separated by the MeanShift algorithm, as shown on the right of Figure 2. Even though we separate the players from the background, there is no perceptual homogeneity per player, so we make use of the watershed transform to segment each player with a region-growing approach. The result of our MS-WS segmentation is shown on the right of Figure 4. The segmentation result (the positions of all players) is explicitly used for estimating the number of players and as an observation cue.
3.3 Observation model: Color based tracking
Our observation model is the color-histogram-based Bhattacharyya coefficient metric introduced in [3].
4. EXPERIMENT
In Figure 5, we can see that the result of our MS-WS embedded particle filter is acceptable.
Figure 4. Segmentation result by the MS-WS method: (left) edge detection using MeanShift segmentation; (middle) distance transform; (right) labeling.
Figure 5. Tracking result for one object (x-axis: frame number, y-axis: position in the x direction). Blue line: estimated position; black line: ground truth.
3. PARTICLE FILTER FRAMEWORK
The particle filter is a successful tracking method, well suited to motion with nonlinearity and uncertainty.
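A minimal particle-filter sketch combining the AR transition of Sec. 3.2 with likelihood weighting and multinomial resampling. The AR coefficients, the noise scale, and the 1-D Gaussian stand-in for the color likelihood are illustrative assumptions, not the paper's tuned values.

```python
import numpy as np

rng = np.random.default_rng(0)

def ar_transition(x_t, x_tm1, A=1.2, B=-0.2, C=0.1):
    # AR state transition: with probability 1/2 take the AR step,
    # otherwise fall back to the previous state plus noise (s_t ~ U(0,1)).
    s = rng.uniform(0.0, 1.0, size=x_t.shape)
    v = rng.normal(size=x_t.shape)
    return np.where(s > 0.5, A * x_t + B * x_tm1 + C * v, x_tm1 + C * v)

def pf_step(particles, prev_particles, weights, y, sigma=0.5):
    new = ar_transition(particles, prev_particles)
    # weight update ~ p(y_t | x_t); a 1-D Gaussian likelihood as a stand-in
    w = weights * np.exp(-0.5 * ((y - new) / sigma) ** 2)
    w /= w.sum()
    # multinomial resampling
    idx = rng.choice(len(new), size=len(new), p=w)
    return new[idx], particles[idx], np.full(len(new), 1.0 / len(new))
```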
This particle filter obeys the following Bayesian inference with sequential Monte Carlo sampling.
Prediction:
p(x_t | y_{t−1}) = ∫ p(x_t | x_{t−1}) p(x_{t−1} | y_{t−1}) dx_{t−1}
Update:
p(x_t | y_t) = c p(y_t | x_t) p(x_t | y_{t−1}) ≈ Σ_{i=1}^{N} w̃^i_t δ(x_t − x̃^i_t),
where c = [ ∫ p(y_t | x_t) p(x_t | y_{t−1}) dx_t ]^{−1} and
w̃^i_t ∝ w̃^i_{t−1} p(y_t | x^i_t) p(x^i_t | x̃^i_{t−1}) / q(x^i_t | x^i_{t−1}, y_t).
3.1 Acquiring the optimal proposal distribution using MS-WS segmentation
The proposal distribution should be able to scatter the particles into regions with a high likelihood ratio, even when there is a big gap between the mode of the prior density and the likelihood. We adopt a two-component Gaussian mixture that combines the autoregressive prior and the MS-WS cue:
q(x_t | x̃_{t−1}) = α · q_{MS-WS}(x_t | x̃_{t−1}, y_t) + (1 − α) · p(x_t | x̃_{t−1}),  (2)
5. CONCLUSION AND DISCUSSION
There is no global solution for tracking a varying number of targets, which naturally involves complex interactions with overlap and ambiguities. To overcome this problem, MS-WS is used as our detection method. Even though there is no generally applicable criterion for segmentation, the combined method of feature-space analysis and region growing shows a better tracking result with the particle filter, as in the sports player case.
6. REFERENCES
[1] Y. Keselman and E. M. Tzanakou, "Extraction and characterization of regions of interest in biomedical images," IEEE Inter. Conf. on ITAB, pp. 87-90, 1998.
[2] J. B. T. M. Roerdink and A. Meijster, "The Watershed Transform: Definitions, Algorithms and Parallelization Techniques," Institute for Mathematics and Computer Science, University of Groningen, Groningen, The Netherlands, IWI 99-9-06, 1999.
[3] D. Comaniciu and P. Meer, "Real-Time Tracking of Non-Rigid Objects using Mean Shift," CVPR, 2000.
Proc.
of the 8th POSTECH-KYUTECH Joint Workshop on Neuroinformatics, Kitakyushu, Japan, 2008
NONNEGATIVE TUCKER DECOMPOSITION WITH ALPHA-DIVERGENCE
Yong-Deok Kim, Seungjin Choi
Department of Computer Science, POSTECH, Korea
{karma13,[email protected]
ABSTRACT
Nonnegative Tucker decomposition (NTD) is a recent multiway extension of nonnegative matrix factorization (NMF), in which nonnegativity constraints are incorporated into the Tucker model. In this paper we consider the α-divergence as a discrepancy measure and derive multiplicative updating algorithms for NTD. The proposed multiplicative algorithm includes some existing NMF and NTD algorithms as special cases, since the α-divergence is a one-parameter family of divergences which accommodates the KL-divergence, Hellinger divergence, χ² divergence, and so on. Numerical experiments on face images show how different values of α affect the factorization results under different types of noise.
1. INTRODUCTION
Nonnegative matrix factorization (NMF) is a widely used multivariate analysis of nonnegative data [1] with many potential applications in machine learning, pattern recognition, and signal processing. Successful applications of NMF result from its ability to learn a parts-based representation through a matrix factorization, X = AS, where X = [x_1, . . . , x_l] ∈ R+^{m×l} is a data matrix, A ∈ R+^{m×r} is a basis matrix, and S ∈ R+^{r×l} is an encoding variable matrix.
In many real applications, data has a multiway structure. Conventional methods preprocess multiway data by putting it into a matrix. Recently, there has been a great deal of research on multilinear analysis, which conserves the original multiway structure of the data. Motivated by multiway extensions of SVD, NMF was also extended to 2D-NMF, nonnegative tensor factorization (NTF), NTF2, higher-order NMF, and nonnegative Tucker decomposition (NTD) [2], which are based on the Tucker2, CANDECOMP/PARAFAC, PARAFAC2, and Tucker models [3], respectively.
2. BACKGROUND
The LS error function and the KL-divergence have been widely used as discrepancy measures in NMF, NTF, and NTD. Recently, various divergence measures, such as Csiszár's f-divergences, α-divergences [4], and Bregman divergences, have been considered as discrepancy measures in NMF and NTF. In this paper we further elaborate our recent work on NTD [2], considering the α-divergence. (In fact, this paper is a shorter version of [5].) We develop multiplicative updating algorithms, referred to as 'α-NTD', which iteratively minimize the α-divergence between a nonnegative data tensor and the Tucker model. Empirically we investigate the role of α in image de-noising by α-NTD, confirming that the parameter α is related to the characteristics of a learning machine, where the α-divergence of q from p (Dα[p||q]) emphasizes the part where p is small as α increases [6].
2.1. α-Divergence
Let us consider the α-divergence as an error measure. The objective function for NMF is given by
Dα[X||AS] = (1/(α(1−α))) Σ_{i,j} { αX_ij + (1−α)[AS]_ij − X_ij^α [AS]_ij^{1−α} }.
The α-divergence includes the KL-divergence (KL[q||p] for α → 0 or KL[p||q] for α → 1), the Hellinger divergence (α = 1/2), and the χ² divergence (α = 2) as special cases. This objective function is iteratively minimized by the following multiplicative updating algorithms:
S ← S ⊙ { A^T (X/[AS])^{.α} / (A^T 1 1^T) }^{.1/α},  (1)
A ← A ⊙ { (X/[AS])^{.α} S^T / (1 1^T S^T) }^{.1/α},  (2)
where ⊙ is the Hadamard product, / is element-wise division, 1 = [1, . . . , 1]^T, and [X]^{.α} = [X_ij^α]. The updating rules (1) and (2), referred to as 'α-NMF', can easily be derived using the same technique as in [1].
2.2. Nonnegative Tucker Decomposition
An N-way tensor X ∈ R^{I1×I2×···×IN} has N indices (i_1, i_2, . . . , i_N), and its elements are denoted by X_{i1 i2 ... iN}, where 1 ≤ i_n ≤ I_n. The mode-n matricization of X ∈ R^{I1×I2×···×IN} rearranges the elements of X to form the matrix X_(n) ∈ R^{In × I_{n+1} I_{n+2} ··· I_N I_1 I_2 ··· I_{n−1}}, where I_{n+1} I_{n+2} ··· I_N I_1 I_2 ··· I_{n−1} is in a cyclic order.
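The α-NMF updates (1)-(2) translate directly into numpy; in the sketch below, eps is an implementation guard against division by zero, not part of the derivation.

```python
import numpy as np

def alpha_nmf(X, r, alpha=0.5, iters=200, eps=1e-9):
    """Multiplicative alpha-NMF updates (1)-(2) for X ~ A @ S."""
    m, l = X.shape
    rng = np.random.default_rng(0)
    A = rng.random((m, r)) + eps
    S = rng.random((r, l)) + eps
    one = np.ones((m, l))
    for _ in range(iters):
        R = (X / (A @ S + eps)) ** alpha          # (X/[AS])^{.alpha}
        S *= (A.T @ R / (A.T @ one + eps)) ** (1.0 / alpha)
        R = (X / (A @ S + eps)) ** alpha
        A *= (R @ S.T / (one @ S.T + eps)) ** (1.0 / alpha)
    return A, S
```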
Nonnegative Tucker decomposition (NTD) seeks a decomposition of a nonnegative N-way tensor X ∈ R+^{I1×I2×···×IN} into mode products of a nonnegative core tensor S ∈ R+^{J1×J2×···×JN} and N nonnegative mode matrices A^(n) ∈ R+^{In×Jn}:
X ≈ X̂ = S ×1 A^(1) ×2 A^(2) ··· ×N A^(N),  (3)
which can be written in element-wise form as
X̂_{i1 i2 ··· iN} = Σ_{j1, j2, ..., jN} S_{j1 j2 ··· jN} A^(1)_{i1 j1} A^(2)_{i2 j2} ··· A^(N)_{iN jN}.  (4)
The mode-n matricization of X̂ in the Tucker model (3) is expressed by Kronecker products of the mode-n matricization of the core tensor and the mode matrices:
X̂_(n) ≈ A^(n) S_(n) [ A^(n−1) ⊗ ··· ⊗ A^(2) ⊗ A^(1) ⊗ A^(N) ⊗ ··· ⊗ A^(n+2) ⊗ A^(n+1) ]^T = A^(n) S_(n) A^(\n)T,  (5)
where S_(n) is the mode-n matricization of the core tensor S. The representation (5) plays a crucial role in deriving multiplicative updating algorithms for NTD.
3. α-NONNEGATIVE TUCKER DECOMPOSITION
α-NTD considers the objective function that is the α-divergence of the Tucker model X̂ (given in (3)) from the N-way data tensor X:
Dα[X||X̂] = (1/(α(1−α))) Σ_{i1, i2, ..., iN} { αX_{i1 i2 ··· iN} + (1−α)X̂_{i1 i2 ··· iN} − X^α_{i1 i2 ··· iN} X̂^{1−α}_{i1 i2 ··· iN} }.  (6)
Using (5), the Tucker model can be matricized and vectorized as
X_(n) ≈ A^(n) S_(n) A^(\n)T = A^(n) S_A^(n),  (7)
vec(X_(n)) ≈ vec(A^(n) S_(n) A^(\n)T) = (A^(\n) ⊗ A^(n)) vec(S_(n)),  (8)
where S_A^(n) = S_(n) A^(\n)T.
Table 1
            pepper (black)  pepper & salt  salt (white)
α-NTD  α = 0.5   22.18          25.04          25.54
       α = 1     24.82          25.99          25.09
       α = 2     25.97          25.49          23.59
α-NMF  α = 0.5   20.30          22.30          25.43
       α = 1     24.17          25.27          24.72
       α = 2     26.31          22.82          24.85
Multiplicative updating algorithms for NTD were elegantly derived in [2] when the LS error function or the KL-divergence was used as the objective function.
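As a concrete sketch of (3)-(5) for a 3-way tensor (not the authors' code): einsum implements the mode products of (4), and unfold performs the cyclic mode-n matricization. Note that with numpy's C-order reshape the Kronecker factors appear in the order kron(A2, A3) for mode 1, the reverse of the vec convention used in (5).

```python
import numpy as np

def tucker(S, A1, A2, A3):
    # X_hat = S x1 A1 x2 A2 x3 A3, the element-wise form (4)
    return np.einsum('abc,ia,jb,kc->ijk', S, A1, A2, A3)

def unfold(T, n):
    # mode-n matricization with column indices in the cyclic order
    # I_{n+1}, ..., I_N, I_1, ..., I_{n-1} described above
    N = T.ndim
    order = [n] + [(n + k) % N for k in range(1, N)]
    return np.transpose(T, order).reshape(T.shape[n], -1)
```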
We derive multiplicative updates for NTD in a similar way when the α-divergence objective function (6) is considered. The core idea in developing multiplicative updates for α-NTD is to use the mode-n matricization of the Tucker model X̂, given in (7) and (8). Note that these matricized equations have the same form as NMF. Thus, the update for the core tensor follows (1), and the updates for the mode matrices are derived from (2). With some calculations, the multiplicative updating rules for the core tensor and mode matrices of α-NTD are given by
S ← S ⊙ { [ (X/X̂)^{.α} ×1 A^(1)T ··· ×N A^(N)T ] / [ E ×1 A^(1)T ··· ×N A^(N)T ] }^{.1/α},  (9)
A^(n) ← A^(n) ⊙ { [ (X/X̂)_(n)^{.α} S_A^(n)T ] / [ 1 1^T S_A^(n)T ] }^{.1/α},  (10)
for n = 1, . . . , N, where E denotes the tensor of all ones.
Table 1. The relationship between types of noise and divergences. In the case of pepper (black) noise, a larger value of α gives a better result. In the case of salt (white) noise, a smaller value of α gives a better result.
In the case of pepper noise, noise-contaminated pixels are replaced by X_ij = 0, and the associated term in the error function is (1/α) X̂_ij. The larger value of α penalizes such terms, leading to better performance. In the case of salt noise, noise-contaminated pixels are given as X_ij = 255, and the associated term in the error function is (1/(α(1−α))) { α·255 + (1−α)X̂_ij − 255^α X̂_ij^{1−α} }. In this function, the smaller value of α de-emphasizes the outlier effect.
5. CONCLUSIONS
We have presented a method of nonnegative Tucker decomposition in which the α-divergence is used as the error function. We have derived multiplicative updating rules for α-NTD, which include various existing algorithms, such as NMF, nsNMF, NTF, and NTD, as special cases. We have also investigated the role of α in α-NTD with experiments on image de-noising under 3 different types of noise. The empirical results matched the known characteristics of the α-divergence well.
6. REFERENCES
4. NUMERICAL EXPERIMENTS
We investigate the role of α ∈ {0.5, 1, 2} in the tensor factorization.
To this end, we apply α-NMF and α-NTD to an image de-noising task with different types of noise or outliers. We use the ORL face DB (X ∈ R+^{48×48×400} collects 48 × 48 images of 400 people) to generate noise-contaminated images, where 3 different types of noise are considered: pepper (black), salt (white), and pepper & salt (black and white). For each image, 5% of the pixels are randomly chosen and converted to black or white pixels. In α-NMF, the number of basis vectors is set to 25. In α-NTD, the dimension of the core tensor is set to 24 × 24 × 25. Note that α-NTD requires a smaller number of parameters (about 38%) compared to α-NMF. Experiments are carried out 20 times independently for each type of noise and each value of α ∈ {0.5, 1, 2}. As a performance measure, the averaged peak signal-to-noise ratio (PSNR) is used; higher PSNR values represent better results. The results are shown in Table 1. In most cases, α-NTD is more robust to noise than α-NMF. Interesting observations in these experiments are as follows. A larger α results in better performance in the case of pepper noise, and a smaller α works better in the case of salt noise. In fact, these results are consistent with the characteristics of the α-divergence, where Dα[p||q] emphasizes the part where p is small as α increases.
[1] D. D. Lee and H. S. Seung, "Learning the parts of objects by non-negative matrix factorization," Nature, vol. 401, pp. 788-791, 1999.
[2] Y.-D. Kim and S. Choi, "Nonnegative Tucker decomposition," in Proceedings of the IEEE CVPR-2007 Workshop on Component Analysis Methods, Minneapolis, Minnesota, 2007.
[3] L. R. Tucker, "Some mathematical notes on three-mode factor analysis," Psychometrika, vol. 31, pp. 279-311, 1966.
[4] S. Amari and H. Nagaoka, Methods of Information Geometry, Oxford University Press, 2000.
[5] Y.-D. Kim, A. Cichocki, and S. Choi, "Nonnegative Tucker decomposition with alpha-divergence," in Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Las Vegas, Nevada, 2008.
[6] S. Amari, "Integration of stochastic models by minimizing α-divergence," Neural Computation, vol. 19, no. 10, pp. 2780-2796, 2007.
Proc.
of the 8th POSTECH-KYUTECH Joint Workshop on Neuroinformatics, Kitakyushu, Japan, 2008
Temporally Weighted Common Spatial Pattern for Uncued EEG Application
Yunjun Nam, Seungjin Choi
Department of Computer Science, POSTECH, Korea
{druid, [email protected]
ABSTRACT
In this paper we propose a method for adopting temporal structure into CSP by measuring the adaptability between each time point of the multi-trial structure and the covariance matrices.
and it represents the EEG signal of a single trial. In experiments, many trials are repeated to acquire sufficient information. If the trial is repeated N times, the total mean-centered data matrix X can be represented as X = [X̂_1, X̂_2, ..., X̂_N]. The original CSP does not take temporal structure into account, so the sequence of trials and the temporal sequence within each trial are ignored. The normalized spatial covariance of the EEG is
1. INTRODUCTION
A brain computer interface (BCI) is a system designed to translate a subject's intention or mind into a control signal for various devices. The electroencephalogram (EEG), which captures electrical potentials from the surface of the scalp, is the most popular signal used for BCI because of its portability and simplicity. A large number of techniques and algorithms for signal processing and pattern recognition have been employed in the BCI community. Among them, the common spatial pattern (CSP) technique is one of the most popular methods, since it was first introduced by Koles et al. in 1990 [1]. We are trying to solve drawbacks of CSP, such as outlier sensitivity, and to expand its limits by adopting a multi-class paradigm or local temporal structure. Our current issue is the uncued classification problem. Most research aims to design a good classifier for single-trial signals that are already partitioned in time, from the beginning of the stimulus to the attenuation of brain activity. However, in a real-world environment, the machine cannot measure these time points or durations, because the EEG signal is the only input to the machine.
This is also one of the prime issues of the BCI Competition 2007. In the previous competition, BCI Competition III (2005), this problem was invalidated because only one team participated [2].
as P = D^{−1/2}U^T, where D is the diagonal matrix of eigenvalues and U is the eigenvector matrix of Σ_T. Then each whitened matrix, K_L and K_R, is factorized by eigenvalue decomposition:
D^{−1/2}U^T Σ_L U D^{−1/2} = K_L = Ψ Λ_L Ψ^T
D^{−1/2}U^T Σ_R U D^{−1/2} = K_R = Ψ Λ_R Ψ^T
They share common eigenvectors Ψ, and the eigenvalues satisfy Λ_L + Λ_R = I, where I is the C × C identity matrix and the eigenvalues of Λ_L and Λ_R are inversely ordered [3]. The final projection matrix W is then W = U D^{−1/2} Ψ. The column vectors of W are the directions maximizing the difference in projected variance between X_L and X_R.
2. COMMON SPATIAL PATTERN
Geometrically, the discriminative features of CSP are obtained by finding directions which maximize the difference of projected variance between two EEG signals. CSP computes these directions by simultaneous diagonalization of the two covariance matrices associated with the two populations of EEG signals. For EEG analysis, the raw EEG signal of a single trial is a time course x_t = [x_1(t), x_2(t), ..., x_C(t)]^T for
3. REGARDING TEMPORAL STRUCTURE
Basically, CSP does not take temporal information into account in the estimation of the covariance matrices. A few studies have tried to adopt local temporal structure into the covariance matrices by weighting adjacent time samples [4]. In this study, we want to design a temporal weighting vector by calculating the adaptability of each time slice with respect to the covariance matrix. CSP analysis is based on the correlation structure of the data set. When the data set has a strong correlation structure, the sample group's projected covariance is increased, and
calculated as Σ = XX^T / tr(XX^T). The goal of this study is to design a spatial filter that can classify two populations of EEG signals, from left-hand and right-hand movement.
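A compact numpy sketch of the CSP computation described above (whitening by the composite covariance, eigendecomposition of the whitened class covariance, and W = UD^{−1/2}Ψ); the input covariances are assumed already estimated and normalized.

```python
import numpy as np

def csp(Sigma_L, Sigma_R):
    Sigma_T = Sigma_L + Sigma_R
    d, U = np.linalg.eigh(Sigma_T)           # Sigma_T = U diag(d) U^T
    P = np.diag(1.0 / np.sqrt(d)) @ U.T      # whitening matrix
    K_L = P @ Sigma_L @ P.T                  # whitened left-class covariance
    lam_L, Psi = np.linalg.eigh(K_L)         # K_L = Psi Lambda_L Psi^T
    W = U @ np.diag(1.0 / np.sqrt(d)) @ Psi  # projection W = U D^{-1/2} Psi
    return W, lam_L
```

Because K_L + K_R = I after whitening, the same Ψ diagonalizes K_R with eigenvalues 1 − Λ_L, which is the inverse ordering property cited above.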
For that, denote Σ_L as the covariance matrix from left-hand movement and Σ_R as the matrix from right-hand movement. For simultaneous diagonalization, Σ_L and Σ_R are whitened by the composite spatial covariance matrix Σ_T = Σ_L + Σ_R. The whitening matrix P is formed
t = 1, ..., T, with T the number of time slices and C the number of channels (i.e., recording electrodes). Denote the C × T matrix X̂ as X̂ = [x_1, ..., x_T],
discriminative information. The classification result also tends to be enhanced. By contrast, if the data from a specific time point across trials has a weak relation with the covariance matrix, that time point has little relation with the target spatial pattern. This adaptability between specific time points and the spatial pattern can be measured by the distance between the sample points and their projections onto the eigenvector associated with the largest eigenvalue of the covariance matrix, as shown in <Fig. 1>. This eigenvector is the vector that best represents the correlation structure of the covariance matrix, and data points highly related to this covariance matrix have a low value of d, the distance to the eigenvector.
Figure 2. Projected variance by CSP
Figure 3. Topological representation of CSP
<Fig. 4> shows the inverse temporal adaptability 1/w_t for each group; the inverse value was used for better plotting. The horizontal axis spans the adaptability for the 0-3 sec. time range from the stimulus.
Figure 1. Distance between data point and eigenvector
When the total data matrix X has covariance matrix Σ = XX^T, Σ can be decomposed as Σ = VΦV^T, and the first eigenvector of V, associated with the largest eigenvalue of Φ, is denoted by v. The distance d between the eigenvector and a data point can then be represented as
d² = ‖x − vv^T x‖₂².
Figure 4. Inverse adaptability
The selected 1-2 sec. range shows lower distance, while the 0-1 and 2-3 sec. ranges show higher distance.
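A sketch of this adaptability measure (an interpretation of the formulas above; the small eps in the denominator is an implementation guard against division by zero, not part of the definition):

```python
import numpy as np

def temporal_adaptability(X_trials):
    """X_trials: (N, C, T) array - N trials, C channels, T time slices.
    Returns w_t, the inverse summed squared distance of the samples at each
    time slice to their projections on the leading eigenvector v."""
    N, C, T = X_trials.shape
    X = X_trials.transpose(1, 0, 2).reshape(C, N * T)   # stack all samples
    Sigma = X @ X.T
    vals, vecs = np.linalg.eigh(Sigma)
    v = vecs[:, -1]                     # eigenvector of the largest eigenvalue
    w = np.empty(T)
    for t in range(T):
        x = X_trials[:, :, t]           # (N, C) samples at time slice t
        resid = x - np.outer(x @ v, v)  # x - v v^T x for each sample
        w[t] = 1.0 / (np.sum(resid ** 2) + 1e-12)
    return w
```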
So, this measure could be used for weighting the temporal structure, and the temporal adaptability w_t of each time slice t can be measured by
w_t = ( Σ_{n=1}^{N} ‖x_nt − vv^T x_nt‖₂² )^{−1}.
5. FUTURE WORKS
This measure will be adopted for predicting stimulus points and solving the uncued classification problem.
4. EXPERIMENT RESULT
The data set is from BCI Competition 2007, data set I <motor imagery, uncued classifier application> [5]. This data set consists of 1905.94 seconds of continuous EEG, recorded at 100 Hz from 59 electrode channels. Marker data is also included, denoting the time points of the stimuli. Most stimuli were repeated every 8 seconds, and the subject imagined movement of the left or right hand as ordered by the stimulus. As preprocessing, the signal was band-pass filtered to 8-16 Hz, and the signal 1-2 s after the stimulus was selected as the brain-activity signal by heuristic selection. <Fig. 2> shows the projected variance of the left and right covariances for the most discriminative column-vector pairs of the common spatial pattern W. Red circles are the projection results of left trials, and blue 'x' marks are the results of right trials. The projected points were classified by LDA, giving 82% (164/200) correct classification. <Fig. 3> shows a topological representation of the used vector pairs of W. Heavily colored regions carry more
References
[1] Z. J. Koles, M. S. Lazar, and S. Z. Zhou, "Spatial patterns underlying population differences in the background EEG," Brain Topogr., vol. 2, pp. 275-284, 1990.
[2] B. Blankertz, K.-R. Müller, D. Krusienski, G. Schalk, J. R. Wolpaw, A. Schlögl, G. Pfurtscheller, J. del R. Millán, M. Schröder, and N. Birbaumer, "The BCI competition III: Validating alternative approaches to actual BCI problems," IEEE Trans. Neural Sys. Rehab. Eng., 14(2):153-159, 2006.
[3] K. Fukunaga, Introduction to Statistical Pattern Recognition, 2nd ed. Boston, MA: Academic, 1990.
[4] H. Wang and W. Zheng, "Local temporal common spatial patterns for robust single-trial EEG classification," IEEE Trans. Neural Systems and Rehabilitation Engineering, vol. 16, no. 2, April 2008.
[5] B. Blankertz, G. Dornhege, M. Krauledat, K.-R. Müller, and G. Curio, "The non-invasive Berlin Brain-Computer Interface: Fast acquisition of effective performance in untrained subjects," NeuroImage, 37(2):539-550, 2007.
Proc. of the 8th POSTECH-KYUTECH Joint Workshop on Neuroinformatics, Kitakyushu, Japan, 2008
Commute Time Distance On Isomap For Pose Estimation
Yoonseop Kang, Seungjin Choi
Department of Computer Science, POSTECH, Korea
{e0en, [email protected]
Abstract
In this paper we present a new application of isomap, using commute time distance instead of geodesic distance. We used isomap with commute time distance to estimate pose from pre-processed video sequences of the HUMANEVA-I data set. The same experiment was also done using the original isomap algorithm for comparison. Isomap using commute time distance showed more accurate results than the original isomap algorithm.
1. INTRODUCTION
Isomap [1] is a widely used dimensionality reduction method. Isomap finds the embedding which best represents the geodesic distances between data points. Embedding with isomap is done in three steps:
1. Construct a neighborhood graph. ε-neighborhoods or k-nearest neighbors can be used. The graph is represented as a sparse n × n matrix.
2. Compute the shortest-distance matrix D. Dijkstra's algorithm or Floyd's algorithm can be used to compute the shortest distances between nodes on the neighborhood graph.
3. Perform classical multidimensional scaling (MDS) using the calculated shortest distances. This is done by calculating the matrix M = −(1/2) H D H^T, where H is a centering matrix, and computing the largest eigenvectors of M.
However, isomap fails when the data is noisy or when the data contains outliers.
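The three steps above, with commute time distance substituted for step 2 as proposed in this paper, can be sketched as follows (a self-contained interpretation; the symmetrized kNN adjacency and unit edge weights are assumptions):

```python
import numpy as np

def commute_time_embedding(X, k=4, dim=2):
    n = len(X)
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.zeros((n, n))
    for i in range(n):                     # step 1: symmetrized kNN graph
        for j in np.argsort(D2[i])[1:k + 1]:
            W[i, j] = W[j, i] = 1.0
    L = np.diag(W.sum(1)) - W              # graph Laplacian
    Lp = np.linalg.pinv(L)                 # pseudoinverse L+
    d = np.diag(Lp)
    CT = d[:, None] + d[None, :] - 2 * Lp  # (e_i - e_j)^T L+ (e_i - e_j)
    H = np.eye(n) - np.ones((n, n)) / n    # centering matrix
    M = -0.5 * H @ CT @ H                  # step 3: classical MDS
    vals, vecs = np.linalg.eigh(M)
    return vecs[:, -dim:] * np.sqrt(np.maximum(vals[-dim:], 0))
```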
This is because isomap considers only the shortest path, and the shortest path on a graph can change dramatically with a slight change to the graph. This weakness of isomap can be managed by using commute time distance instead of the shortest distance on the graph.
Figure 1. The extracted silhouettes of the first subject, from randomly chosen frames of the video sequence of the HUMANEVA-I data set
where e_i is an n-dimensional vector filled with 0 except for a 1 in the i-th element. We can use commute time distance for isomap; isomap using commute time distance is also called commute time embedding [2]. Performing classical MDS on the matrix D is equivalent to finding the largest eigenvectors of L⁺, which are also the smallest eigenvectors of L. Isomap with commute time distance has several known properties. The method has a tendency to cluster nodes [2]. As commute time distance takes account of every possible path between two nodes, isomap using commute time distance shows a robust performance on data with noise or outliers. Here we seek an application of isomap using commute time distance: pose estimation.
3. EXPERIMENTS
We estimated the positions of the joints from the video sequence. The HUMANEVA-I data set [3] was used for our experiment. The data set contains video sequences of four people performing various actions while wearing several markers for motion capture devices. The same actions were recorded by seven cameras, three of them color and the other four black-and-white. The data set also contains the motion capture data for some of the video sequences. We extracted the silhouettes from the frames of the video sequence containing the first subject walking, taken by the second color camera, and resized them to 100 by 75 pixels (Fig. 1).
2.
ISOMAP US ING COMMUT E TIME DIS TANCE Commute time between two nodes xi and x j on a graph is defined as the average number of steps required for a random walker to start from u , visit v , and return to u . Calculating the commute time distance between xi and x j on the graph can be done by using pseudoinverse of the graph laplacian L of the graph: Dij d ( xi , x j ) (ei e j )T L (ei e j ) , - 49 - Proc. of the 8th POSTECH-KYUTECH Joint Workshop on Neuroinformatics, Kitakyushu, Japan, 2008 Using Commute Times”, IEEE Transactios on Pattern Analysis and Machine Intelligence, pp. 1873-1890, 2007 [3] L. Sigal and M. J. Black. “HumanEva: Synchronized Video and M otion Capture Dataset for Evaluation of Articulated Human Motion”, Technical Report CS-06-28, 2006. 1.4 1.2 1 0.8 0.6 0.4 0.2 0 0 50 100 150 figure 2. The embeddi ng di mension (horizontal axis) vs. the mean s quared error of pose estimation (vertical axis) by the original isomap (dashed line) and isomap wi th commute time distance (soli d line). Our task was to estimate the position of the markers using the original isomap and isomap using commute time distance. For both algorithms 4-nearest neighbor graph was constructed. Total 200 frames were randomly chosen for the experiment. Among these frames, 150 of them were used as a training set, and the rest 50 of them were used as a test set. The embedding dimension was varied from 2 to 150. The embedding was done using both training and test set. Estimating the position of markers of a test sample frame was done by taking average of marker positions of 8 nearest neighbors of the embedded coordinate of the test sample frame. The estimated marker positions were compared to the actual marker positions from HUMANEVA-I data set and the difference between those two values were recorded. Though the differences between the error rates of two methods used were varying, isomap using commute time distance showed significantly lower error rate than the original isomap (Fig. 
2). It seems due to the fact that the automatically extracted silhouettes from a video sequence contained a remarkable noise. The background was not removed perfectly, and the silhouettes of the two frames with the same pose can be different. Using commute time distance would have removed the effect of such kind of noises. 4. Conclusions We have applied isomap using commute time distance on the problem of estimating the person’s pose from the video sequence and compared it with the original isomap which uses geodesic distance. We found that using commute time distance is helpful for solving this problem, and we also expect that the isomap with using commute time distance can be helpful on the other problems dealing with video sequence data. References [1] J.B. Tenenbaum, V. de Silva, and J. C. Langford. “A global geometric framework for nonlinear dimensionality reduction”, Science, 290(5500) , pp. 2319-2323, 2000. [2] Huaijun Qiu, Hancock. E. R. “Clustering and Embedding - 50 - Proc. of the 8th POSTECH-KYUTECH Joint Workshop on Neuroinformatics, Kitakyushu, Japan, 2008 Network Decomposition: A New Framework for Analyzing Dynamic Microarray Data Yongsoo Kim, Daehee Hwang, and Seungjin Choi Department of Computer Science, POSTECH, Korea {anoyaro, dhhwang, [email protected] Abstract We present a new network-based analysis method of microarray data set. Global protein-protein interaction(PPI) data is weighted using DNA microarray data with various conditions. We applied orthogonal non-negative matrix factorization (ONMF-A), NMF with orthogonality constraint on basis matrix, on this matrix. Factorization gives sub-network clusters and activation patterns of each cluster, which is very suitable to interpret. Non-negative matrix factorization (NMF)[1] is a group of algorithms in multivariate analysis and linear algebra where a matrix, X , is factorized into (usually) two matrices, A and S . 
X = AS (1)

NMF has been successfully applied to analyzing microarray data, especially for clustering [2], where it gave more robust clustering results than other methods such as hierarchical clustering and self-organizing maps. Lately, non-smooth NMF has been successfully applied to bicluster (simultaneously cluster the rows and columns of) microarray data [3]. One drawback of NMF is its non-negativity constraint. Although microarray data is by nature non-negative, fold-change values, the differences of intensity values between data from different conditions, are not, so NMF cannot be applied to them directly. We suggest a framework that constructs a weighted global PPI matrix and decomposes it using a variant of NMF, orthogonal NMF, which yields sub-networks of the PPI data and their activation patterns, both of which are very easy to interpret.

1. INTRODUCTION
The interactions between proteins are important for many biological functions. For example, signals from the exterior of a cell are mediated to the inside of that cell by protein-protein interactions of the signaling molecules. This process, called signal transduction, plays a fundamental role in many biological processes and in many diseases (e.g. cancer). Because of this importance, protein-protein interaction (PPI) data is frequently used in systems biology, a relatively new biological field that focuses on the systematic study of complex interactions in biological systems.

A DNA microarray is a high-throughput technology used in molecular biology and in medicine. It consists of an arrayed series of thousands of microscopic spots of DNA oligonucleotides, called features, each containing picomoles of a specific DNA sequence. Using a DNA microarray we can simultaneously measure the expression levels of many genes. A DNA microarray does not give precise amounts for individual genes, but it gives high-throughput measurements for a large set of genes, which is suitable for the systems biology approach.
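Returning to Eq. (1): the plain factorization X = AS is commonly fit with the multiplicative updates of Lee and Seung [1]. A minimal sketch, in which the rank r, the iteration count, and the random test matrix are illustrative choices rather than values from the paper:

```python
import numpy as np

# Minimal Lee-Seung multiplicative-update NMF for X = AS (Eq. (1)).
# Both factors stay nonnegative because the updates only multiply by
# nonnegative ratios; eps guards against division by zero.
def nmf(X, r, iters=300, eps=1e-9):
    rng = np.random.default_rng(0)
    n, m = X.shape
    A = rng.random((n, r)) + eps   # nonnegative basis matrix
    S = rng.random((r, m)) + eps   # nonnegative encoding matrix
    for _ in range(iters):
        S *= (A.T @ X) / (A.T @ A @ S + eps)   # encoding update
        A *= (X @ S.T) / (A @ S @ S.T + eps)   # basis update
    return A, S
```

Each update monotonically decreases the squared reconstruction error ||X - AS||^2, which is why the simple fixed iteration count suffices for a sketch.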
However, because of its high dimensionality, there are several obstacles to analyzing DNA microarray data. Furthermore, as the importance of DNA microarrays has grown, many biologists use many chips for a single system to observe dynamic changes in the biological system (e.g. time-course microarray data). There is a need for a sound framework to analyze these data.

2. WEIGHTING THE GLOBAL PPI
Global PPI data is obtained from two sources, NCBI and the KEGG pathway database. These data sets list the protein pairs that can physically interact. We assumed that the chance of an interaction depends on the amounts of the interactants, and we used fold-change values to weight the global PPI data. For the i-th interaction, we give the weight

I_i = sum_{j=1}^{n} ( sigma( log2(T_ij) - log2(C_ij) ) - 0.5 ), (2)

where sigma(.) is the logistic function, log2(T_ij) - log2(C_ij) is the fold-change value of the j-th gene in the i-th interaction, and n is the number of interactants (actually 2). More precisely, T_ij is the intensity value of the j-th gene in the i-th interaction in the treatment sample, and C_ij is that of the control sample. This weight ranges from -1 to 1 according to the fold-change values: it is close to 1 if both interactants are highly overexpressed, and close to -1 in the opposite case. We can compute this interaction vector I for every control-treatment pair of DNA microarrays, so we obtain an n-by-m matrix H, where n is the number of interactions and m is the number of control-treatment sample pairs. However, H may have negative values. We resolve this problem by separating the negative and positive values. That is, we let there exist non-negative matrices A and B that satisfy

H = A - B (3)

We finally construct a non-negative matrix by concatenating these two matrices.
That is, the final matrix X is given as

X = [A, B] (4)

Now the number of columns is doubled and X is non-negative.

3. ORTHOGONAL NMF
Applying NMF to the constructed matrix X, we get two matrices A and S. From A we obtain clusters of interaction sets, and S gives the activation pattern of each cluster. For interpretation, we want to make the clusters as disjoint as possible (not strictly disjoint, because that would be unnatural for a biological problem). An orthogonality constraint on A answers this requirement. [4] suggested a way to impose the orthogonality constraint using the true gradient on the Stiefel manifold, the parameter space with the orthogonality constraint A^T A = I. The multiplicative update rules for ONMF are

S <- S (*) (A^T X) / (A^T A S) (5)
A <- A (*) (X S^T) / (A S X^T A) (6)

where (*) and / denote element-wise multiplication and division. Note that the update rule for S is the same as in standard NMF.

4. EXPERIMENTS
We used microarray data from the Gene Expression Omnibus (GEO): a data set that profiles MCF-7 breast cancer cells treated with various doses of epidermal growth factor (EGF) or heregulin (HRG) for up to 90 minutes (accession GDS252). There are 4 different doses with 7 time points each (28 samples in total). Fig. 1 shows the result of ONMF (matrix S). The right half of the columns is down-regulated and the other half up-regulated (A and B from (3)). ONMF separated the sub-network clusters into 3 patterns: constantly active (Factors 5, 6), time-course (Factors 3, 4), and dose dependent (Factors 1, 2).

Fig. 1. Network activation patterns (matrix S in X = AS).
Fig. 2. A sample sub-network, from basis 1 in matrix A.

5. CONCLUSION
We have presented a new framework for performing network-based analysis of dynamic microarray data using ONMF. This framework successfully summarized large microarray data sets into several sub-networks while providing the activation pattern of each network.

References
[1] D. D. Lee and H. S. Seung (1999), "Learning the parts of objects by non-negative matrix factorization", Nature 401(6755), pp. 788-791.
[2] J.-P. Brunet, P. Tamayo, T. R. Golub, and J. P. Mesirov (2004), "Metagenes and molecular pattern discovery using matrix factorization", Proceedings of the National Academy of Sciences of the United States of America 101(12), pp. 4164-4169.
[3] P. Carmona-Saez, R. D. Pascual-Marqui, F. Tirado, J. M. Carazo, and A. Pascual-Montano (2006), "Biclustering of gene expression data by non-smooth non-negative matrix factorization", BMC Bioinformatics 7, art. no. 78.
[4] Seungjin Choi, "Algorithms for orthogonal nonnegative matrix factorization", in Proceedings of the International Joint Conference on Neural Networks (IJCNN-2008), Hong Kong, June 1-6, 2008.


The POSTECH Subtle Facial Expression Database 2007 (SFED07)

Sungsoo Park, Hyung-Soo Lee, Jongju Shin, Daijin Kim
Dept. of Computer Science and Engineering, POSTECH, Korea
{sungs, sooz, jjshin, [email protected]

Abstract
We constructed the POSTECH Subtle Facial Expression Database 2007 (SFED07). SFED07 contains true-color face images of 20 people with 4 expression variations (neutral, subtle smile, subtle surprise, subtle anger) per person. All of the people in the database are Korean. We also present the results of a subtle facial expression recognition experiment using Support Vector Machines (SVMs) in order to provide an example evaluation protocol for the database. The database is expected to be used to evaluate subtle facial expression recognition algorithms.

Figure 1. Some facial expression images in SFED07.
1. Introduction
Recently, facial expression analysis has attracted a great deal of interest in the area of computer vision because it plays an important role in human-machine communication [1]. Most existing facial expression analysis methods attempt to recognize extreme facial expressions. In real situations, however, people do not always express their facial expressions in an extreme way. To tackle this problem, several researchers have tried to recognize subtle facial expressions [2, 3], but there was not yet an appropriate database for evaluating the recognition performance of subtle facial expressions. To contribute to the evaluation of subtle facial expression recognition, we have built a new database called the POSTECH Subtle Facial Expression Database 2007 (SFED07).

2. Overview of the POSTECH Subtle Facial Expression Database 2007 (SFED07)
Four kinds of facial expressions, namely neutral, subtle smile, subtle surprise, and subtle anger, are considered. The data can be used for evaluating the robustness of subtle facial expression recognition algorithms.

3. Capturing Environment
We describe the capturing environment. To obtain images of a subject, we used CNB-AN202L CCD cameras. We asked the subject to sit in a chair and look at the frontal camera with one of the four expressions (neutral, subtle smile, subtle surprise, subtle anger), and then captured images from the camera. For each expression we obtained a sequence of images, from which we chose a before-expression image and a well-expressed image for each subtle expression. Consequently, we obtained 4 kinds of expressions x 2 kinds of images (before-expression and well-expressed) = 8 images per subject. SFED07 therefore includes 160 images, since there are 20 subjects with 8 images each. Each image is recorded in BMP format, is of size 640 by 480 pixels, and is in true color. Figure 1 shows the image set of a subject in the database; the four image groups represent the different expressions.
4. Performance Evaluation
To show the difficulty of the database for subtle facial expression recognition algorithms and to provide an example evaluation protocol, we evaluated an SVM-based recognition method on the database. To recognize the facial expressions, we extracted facial expression features by fitting the Active Appearance Model (AAM) [4, 5] to the SFED07 images, as shown in Figure 2. The AAM is a suitable method for extracting the important facial shape features by fitting the model to the facial expression images.

Figure 2. The mean shape of the AAM and an AAM fitting result.

To classify the extracted facial expression features, we used a Support Vector Machine (SVM) based classifier [6], as shown in Figure 3. It consists of 4 facial expression SVM classifiers, namely the neutral, smile, surprise, and angry classifiers, and determines the facial expression of the input facial features as the label of the classifier whose output value is maximal.

Figure 3. Facial expression classification based on the SVM classifiers.

5. Evaluation Results
To evaluate subtle facial expression recognition performance on SFED07, we used half of the subjects as the training set and the remaining subjects as the test set, with 2-fold cross validation. Table 1 shows the confusion matrix of the facial expression recognition when we take the SVM as the facial expression classifier.

Table 1: Recognition rates of SVM-based facial expression recognition.

6. Conclusions
In this paper, we described the SFED07 database, which contains 8 images each of 20 subjects with 4 kinds of subtle expressions. We also evaluated an SVM-based facial expression recognition algorithm on this database. We expect the database to be used for evaluating subtle facial expression recognition algorithms. The database is now available for research purposes on our web site (http://imlab.postech.ac.kr).

References
[1] B. Fasel and J. Luettin, "Automatic facial expression analysis: A survey", Pattern Recognition, vol. 36, no. 1, pp. 259-275, 2003.
[2] M. Song, H. Wang, J. Bu, C. Chen, and Z. Liu, "Subtle facial expression modeling with vector field decomposition", Proceedings of the IEEE International Conference on Image Processing, pp. 2102-2104, 2006.
[3] J. R. Cowell and A. Ayesh, "Extracting subtle facial expression for emotional analysis", Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, vol. 1, pp. 677-681, 2004.
[4] G. Edwards, C. Taylor, and T. Cootes, "Active appearance models", IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 6, pp. 681-685, 2001.
[5] I. Matthews and S. Baker, "Active appearance models revisited", International Journal of Computer Vision, vol. 60, no. 2, pp. 135-164, 2004.
[6] V. N. Vapnik, Statistical Learning Theory, Wiley-Interscience Press, 1998.


COLOR CONSTANCY ALGORITHM USING SOM FOR AUTONOMOUS SOCCER ROBOTS

Yasunori Takemura and Kazuo Ishii
Department of Brain Science and Engineering, KYUTECH
[email protected], [email protected]

1. INTRODUCTION
RoboCup is an international joint project to promote Artificial Intelligence (AI), robotics, and related fields. It is an attempt to foster AI and intelligent robotics research by providing standard problems in which a wide variety of technologies can be integrated and examined. In RoboCup, the soccer game was selected as the main research topic, aiming at innovations to be applied to socially significant problems and industries in the future [1]. "Hibikino-Musashi" is a joint RoboCup middle-size league (MSL) soccer team [2, 3]. Members of the team come from three different research and educational organizations located in the Kitakyushu Science and Research Park, Kitakyushu, Japan.
The "Musashi" robot is developed based on the concepts of "omnidirectionality" and "modularity" [4]. In this paper, we present a color recognition algorithm based on a color constancy algorithm using a Self-Organizing Map (SOM) for our robot vision system. The SOM is an unsupervised learning algorithm that performs a topology-preserving transformation from a higher-dimensional vector data space to a low-dimensional map space. The SOM has become a powerful tool in many areas such as data mining, data analysis, data classification, and data visualization [5, 6, 7]. Our robot vision system is based on the YUV and HSV color map spaces. The two color maps have different vector spaces, and some objects are recognized by applying thresholds in both spaces and taking the logical conjunction of the YUV and HSV thresholds [8]. To realize a robot vision system that is robust against changing lighting environments, our new robot vision algorithm estimates both the light color environment and the threshold parameters using a SOM.

2. CONVENTIONAL "MUSASHI" ROBOT VISION SYSTEM
The conventional vision system of the "Musashi" robot consists of an omni-directional mirror and an IEEE 1394 digital camera. The robots recently acquired strong kicking devices that kick high loop shots, so the camera is protected with a plastic cover; an acrylic tube is used for this purpose. Data transmission from the camera to the laptop PC is in YUV format, which we convert into HSV format. Upper and lower thresholds in the YUV and HSV formats are used to extract the colors, and the extracted regions are combined by the logical conjunction of both formats. This method improves the robustness and accuracy of color extraction under variable lighting conditions. Presently, we set the thresholds manually [8].

3. COLOR CONSTANCY ALGORITHM USING SELF-ORGANIZING MAPS (SOM)
A. Basic input data and output data
In this paper, the SOM is used for recognizing the light environment condition. The basic four kinds of colors are prepared around the camera.
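The dual-threshold extraction of Section 2 can be sketched as follows: a pixel is accepted only if it falls inside both the YUV threshold box and the HSV threshold box (logical conjunction). The threshold boxes and the two test pixels below are illustrative, not the team's calibrated values.

```python
import numpy as np

# Hypothetical sketch of conjunctive YUV/HSV thresholding: each pixel
# must lie inside BOTH axis-aligned boxes to be kept in the mask.
def extract_color(yuv, hsv, yuv_lo, yuv_hi, hsv_lo, hsv_hi):
    in_yuv = np.all((yuv >= yuv_lo) & (yuv <= yuv_hi), axis=-1)
    in_hsv = np.all((hsv >= hsv_lo) & (hsv <= hsv_hi), axis=-1)
    return in_yuv & in_hsv   # logical conjunction of both color spaces
```

Requiring agreement in both color spaces discards pixels that happen to pass one box by accident, which is the source of the improved robustness mentioned above.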
Here, the input data is x = (Y_g, U_g, V_g, Y_r, U_r, V_r, Y_b, U_b, V_b, Y_w, U_w, V_w), where Y, U, and V are the values of the 8-bit YUV color map and the subscripts index the four reference colors. This input vector is normalized to the range -1.0 to 1.0 and then input to the SOM. The output data consists of three kinds of color thresholds, for orange (ball), white (lines), and green (field), because the RoboCup Middle Size League (MSL) changed its rules this year, replacing the colored goals (blue and yellow) with colorless goals. Orange is recognized with the thresholds Umin, Umax, Hmin = 0.0 (constant), and Hmax. Green is recognized with Ymin = 0.0 (constant), Ymax, Umin = 0.0 (constant), and Umax. White is recognized with Vmin and Vmax = 255 (constant). H, S, and V denote hue, saturation, and value (8 bit). The output data is then o = (H^o_max, U^o_min, U^o_max, Y^g_max, U^g_max, V^w_min), where the upper indices denote o (orange), g (green), and w (white). All parameters are normalized to the range -1.0 to 1.0.

B. Color constancy algorithm using SOM
In a self-organizing map, the neurons are placed at the nodes of a lattice that is usually two-dimensional. Higher-dimensional maps are also possible but not as common. The neurons become selectively tuned to various input patterns (stimuli) or classes of input patterns in the course of a competitive learning process. The locations of the neurons so tuned (i.e., the winning neurons) become ordered with respect to each other in such a way that a meaningful coordinate system for different input features is created over the lattice [9]. A self-organizing map is therefore characterized by the formation of a topographic map of the input patterns, in which the spatial locations (i.e., coordinates) of the neurons in the lattice are indicative of intrinsic statistical features contained in the input patterns; hence the name "self-organizing map".
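The competitive learning just described can be sketched on a small 1-D lattice: each input picks a winning neuron, and a Gaussian neighborhood around the winner is pulled toward the input, which orders the lattice topographically. The lattice size, learning-rate and radius schedules, and data are illustrative choices, not the parameters used for the robot.

```python
import numpy as np

# Toy online SOM: winner selection plus Gaussian neighborhood update.
def train_som(data, n_units=10, iters=2000, lr=0.5, sigma=2.0):
    rng = np.random.default_rng(0)
    w = rng.random((n_units, data.shape[1]))   # reference vectors
    grid = np.arange(n_units)                  # 1-D lattice coordinates
    for t in range(iters):
        x = data[rng.integers(len(data))]
        k_star = np.argmin(((w - x) ** 2).sum(axis=1))   # winning neuron
        frac = t / iters
        s = sigma * np.exp(-3.0 * frac) + 0.1            # shrinking radius
        a = lr * (1.0 - frac) + 0.01                     # decaying rate
        phi = np.exp(-(grid - k_star) ** 2 / (2 * s ** 2))
        w += a * phi[:, None] * (x - w)                  # neighborhood update
    return w
```

The shrinking neighborhood radius is what lets the map first unfold globally and then fine-tune locally.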
The SOM algorithm has 4 processes: an evaluative process, a competitive process, a cooperative process, and an adaptive process. In the following, we describe each process. Table I explains the variables used in the color constancy algorithm with a batch-type SOM. The learning data is defined by

theta_i = [x_i, y_i] (0)

TABLE I. VARIABLES USED IN THE ALGORITHM
Symbol   | Quantity
theta    | Learning data
x        | Input vector
y        | Output vector
i        | Index expressing classes (i = 1, ..., I)
w        | Reference vector
k        | Index expressing units (k = 1, ..., K)
E        | Distance between input vector and reference vector
k*       | Best Matching Unit (BMU)
phi      | Neighbor function
d(a, b)  | Euclidean distance between a and b
psi      | Learning rate
sigma    | Neighbor radius
tau      | Time constant

(a) Evaluative process. For all classes of learning data, the distance between the input and each reference vector is calculated:

E_i^k = || w^k - x_i ||^2 (1)

(b) Competitive process. To find the best match of the input vector x with the reference vectors w, the best matching unit (k*) is defined as the unit with the minimum distance E_i^k:

k*_i = arg min_k E_i^k (2)

(c) Cooperative process. A typical choice of the neighbor function phi_i^k that satisfies the requirements is the Gaussian function

phi_i^k = exp( -d(k, k*_i)^2 / (2 sigma^2) ) (3)

The parameter sigma is the "effective width" of the topological neighborhood, called the neighbor radius. Its use also makes the SOM algorithm converge more quickly than a rectangular topological neighborhood. Another unique feature of the SOM algorithm is that the size of the topological neighborhood shrinks with time. A popular choice for the dependence of sigma on discrete time t is the exponential decay

sigma = sigma_min + (sigma_max - sigma_min) exp(-t / tau) (4)

In the batch SOM, each unit learning rate is defined by psi_i^k, which is phi_i^k normalized by the summation over the learning data:

psi_i^k = phi_i^k / sum_{i'} phi_{i'}^k (5)

(d) Adaptive process. In Kohonen's (online) SOM, all reference vectors are adjusted by the update formula

w^k(t+1) = w^k(t) + psi_i^k (x_i - w^k) (6)

In this color constancy algorithm, the learning data theta is described by eq. (0), so each reference vector carries both an input part and an output part; in other words, the reference vectors have the same dimensions as the input and output vectors together. Here we use the batch SOM, so the reference vectors w are updated by

w^k(t+1) = sum_i psi_i^k theta_i (7)

4. CONCLUSION
In this paper, we described a new color recognition algorithm that achieves color constancy by using a SOM. Experiments with this algorithm will be introduced at the POSTECH-KYUTECH joint workshop.

REFERENCES
[1] RoboCup homepage, http://www.robocup.org/
[2] H. Matsubara, M. Asada, et al., "The RoboCup: The Robot World Cup Initiative", in Proceedings of IJCAI-95.
[3] Y. Takemura, A. A. F. Nassiraei, et al., "HIBIKINO-MUSASHI", RoboCup 2008 Suzhou, CD-ROM Proc.
[4] A. A. F. Nassiraei, Y. Takemura, et al., "Concept of Mechatronics Modular Design for an Autonomous Mobile Soccer Robot", CIRA 2007, Jacksonville, pp. 178-183.
[5] T. Kohonen, "Self-organized formation of topologically correct feature maps", Biol. Cybernetics, vol. 43, pp. 59-69, 1982.
[6] A. Saalbach, G. Heidemann, and H. Ritter, "Parametrized SOMs for Object Recognition and Pose Estimation", Proc. of ICANN 2002, pp. 902-907, 2002.
[7] G. A. Barreto, A. F. R. Araujo, et al., "A distribution of a simulator for robots and its application to position estimation by self-organization", Technical Report of IEICE, vol. 100, no. 687, NC2000-147, pp. 149-156.
[8] K. Azeura and I. Godler, "Color Sampling with Omni-Directional Camera in Soccer Robot", Proc. of RSJ 2006, Okayama, 1B25.pdf, 2006 (in Japanese).
[9] T. Kohonen, "The Self-Organizing Map", Proceedings of the IEEE, vol. 78, pp. 1460-1480.
A Pixel-parallel Anisotropic Diffusion Algorithm for Subjective Contour Generation

Youngjae Kim and Takashi Morie
Graduate School of Life Science and Systems Engineering, Kyushu Institute of Technology
[email protected]

Introduction: In natural-scene images obtained by image sensors, edge information is often lacking due to illumination conditions and poor contrast with the background. Even in such cases, humans can recognize an object by generating imaginary contours, known as subjective contours. Typical figures that induce subjective contours are the Kanizsa figures shown in Fig. 1, in which humans perceive a white triangle or square in the center region [1]. Regarding subjective contour generation, physiological, psychophysical, and computational models have been proposed. Some physiological experiments revealed that neurons in the visual cortex (V1 or V2) respond to subjective contours as well as to real contours [2]. On the other hand, in some computational models, the physically existing points are connected with a spline function to realize a subjective contour [3]. These models are useful, but it is difficult to apply them to VLSI implementation because of their complexity. We therefore use a model in which subjective contours grow from the physically existing points. In order to implement such a model, we have proposed a pixel-parallel directional (anisotropic) diffusion algorithm (DDA) to generate subjective contours, which is suitable for VLSI implementation because it is based on the framework of cellular neural networks (CNNs) and cellular automata (CA) with nearest-neighbor connections [4]. Furthermore, we have proposed a pixel-parallel anisotropic diffusion circuit based on a merged analog-digital circuit approach for subjective contour generation [5].
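The idea of directional diffusion with nearest-neighbor connections can be sketched for the horizontal case: each pixel copies a decayed state from its neighbor along the direction, so two confronting sources grow rays toward each other. This is a simplified toy of the scheme, not the paper's full algorithm; the grid size, decay factor, and source placement are illustrative.

```python
import numpy as np

# Toy horizontal directional diffusion: v1 propagates toward -x and v2
# toward +x; each step a pixel takes a decayed copy of its neighbor's
# state plus the neighbor's source value, so rays of decaying amplitude
# extend from each source. The total state is their sum.
def directional_diffusion(sources, steps, alpha=0.9):
    U0 = sources.astype(float)
    v1 = np.zeros_like(U0)    # state propagating toward -x
    v2 = np.zeros_like(U0)    # state propagating toward +x
    for _ in range(steps):
        v1[:, :-1] = alpha * v1[:, 1:] + U0[:, 1:]
        v2[:, 1:] = alpha * v2[:, :-1] + U0[:, :-1]
    return v1 + v2
```

With two sources placed on the same row, the summed state becomes positive along the whole segment between them while other rows stay at zero, which is the ridge-like behavior the full algorithm exploits.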
In this paper, we describe the pixel-parallel anisotropic diffusion algorithm and show numerical simulation results for subjective contour generation. Our algorithm consists of two parts: the directional (anisotropic) diffusion algorithm (DDA) and a ridge-line detection algorithm (RDA). The DDA generates analog ridge-like functions of the diffusion state by propagating the internal states of the PUs in the given direction. The RDA generates binary line segments by tracing the ridge lines generated by the DDA.

Fig. 1: Kanizsa figures: triangle (a) and square (b).

Directional-diffusion algorithm (DDA): The DDA is a CNN-based algorithm that propagates the PUs' analog internal states in a given direction from a given diffusion source. The proposed algorithm is given as follows:

V(i, j; t) = V1(i, j; t) + V2(i, j; t), (1)
Vk(i, j; t) = Vkx(i, j; t) + Vky(i, j; t), (2)

where V(i, j; t) represents the internal state of a PU at coordinates (i, j) at time t; Vk(i, j; t) (k = 1, 2) are the two diffusion states propagating in opposite directions; and Vkx(i, j; t) and Vky(i, j; t) are their x- and y-directional components. As an initial state, U(i, j; 0) is set to '1' at the pixel position of the diffusion source and to '0' at the other pixels. The updating rule is as follows; the number of updating iterations is determined by the diffusion distance:

Vkx(i, j; t+1) = alpha Vk(i + ak, j + bk; t) + U(i + ak, j + bk; 0), (3)
Vky(i, j; t+1) = beta Vk(i + ck, j + dk; t) + U(i + ck, j + dk; 0), (4)

for k = 1, 2, where alpha and beta are non-negative constants that determine the diffusion direction, given by the following equations:

|tan theta| = beta / alpha for theta != 90 deg, (5)
alpha = 0 for theta = 90 deg, (6)
alpha + beta = p, (7)

therefore,

alpha = p / (1 + |tan theta|), (8)

where p is a constant that affects the amplitude of the diffusion states but hardly affects the diffusion direction; p is therefore set to unity here. The indices are determined by the diffusion orientation as shown in Table 1. Fig. 2 shows the relation between the variables and a numerical simulation result (for theta = 60 deg).

Table 1: Relation between diffusion direction theta and indices (ak, bk, ck, dk) for the DDA.
theta (deg)        | a1 b1 c1 d1
0 <= theta < 90    |  0  1  1  0
90 <= theta < 180  |  0 -1  1  0

theta (deg)        | a2 b2 c2 d2
theta = 0          |  0 -1  1  0
0 < theta <= 90    |  0 -1 -1  0
90 < theta < 180   |  0  1 -1  0

Fig. 2: (a) Relation between the variables and (b) a numerical simulation result.

Ridge-line detection algorithm (RDA): The RDA is an algorithm for obtaining binary line-segment information U(i, j; t) along the diffusion directions from V(i, j) ≡ V(i, j; t) after executing the DDA (Fig. 2(b)). This is a CA-based algorithm, given as follows. At pixel (i, j), if all the following conditions are satisfied, then U(i, j; t+1) = 1:

- U(i, j; t) = 0,
- there exists (i', j') in N_ij with U(i', j'; t) = 1,
- V(i, j) > V(i + m1, j + n1),
- V(i, j) > V(i + m2, j + n2),

where N_ij is the set of neighboring pixels of (i, j), and the index sets (m1, n1) and (m2, n2) are given in Table 2. This updating is repeated for the given number of iterations. Fig. 3 shows an example of the result of ridge-line detection.

Table 2: Relation between diffusion orientation theta and indices (mk, nk) for the RDA.
theta (deg)        | m1 n1 m2 n2
theta = 0          | -1  0  1  0
theta = 90         |  0 -1  0  1
0 < theta < 90     |  1 -1 -1  1
90 < theta < 180   | -1 -1  1  1

Fig. 3: Ridge-line extraction result.

Generation of subjective contours: The algorithm described above has been applied to subjective contour generation in the Kanizsa figures shown in Fig. 1. In order to obtain the diffusion source positions from the given image, orientation-selective Gabor filtering is performed. Using the obtained diffusion source positions and orientations, directional diffusion is executed by the DDA, and ridge lines are generated by the RDA. If the ridge lines generated from two confronting diffusion sources are connected, the pair of ridge lines is considered a subjective contour. After the above processing is performed for various orientations, if a subjective region is formed by the set of extracted subjective contours, perception of the Kanizsa figures is achieved. The subjective region can easily be extracted by ordinary labeling processing. Fig. 4 shows the simulation results of subjective contour generation. Using the proposed algorithm, the Kanizsa triangle and square have successfully been extracted.

Fig. 4: Simulation results of subjective contour generation: (i) input image (100x100 pixels), (ii) diffusion sources, (iii) diffusion results by the DDA, (iv) ridge-line detection results, (v) subjective contour extraction, (vi) subjective region extraction.

Conclusion: We proposed a pixel-parallel anisotropic diffusion algorithm for subjective contour generation. The DDA generates analog ridge-like functions from the given diffusion source positions in the given direction, and the RDA traces the ridge lines. Therefore, even if a pair of diffusion ridge lines from confronting diffusion sources does not meet exactly, the connected line can be detected by the RDA. The proposed algorithms can be implemented in a pixel-parallel manner, and since they use only neighboring pixel states, they are suitable for VLSI implementation.

References
[1] G. Kanizsa, Scientific American, 234, 48-52, 1976.
[2] Tai Sing Lee, Neuron, 33(5), 667-668, 2002.
[3] H. Yasuda, et al., IEICE Trans., J73-D-II(6), 906-913, 1990.
[4] Y. Kim and T. Morie, ISCAS 2005, 4237-4240, 2005.
[5] Y. Kim and T. Morie, J. Signal Processing, 10, 259-262, 2006.
RESEARCH ON CONTROL OF AN UNDERACTUATED MANIPULATOR
Yuya Nishida and Masahiro Nagamatsu
Department of Brain Science and Engineering, Kyushu Institute of Technology
Email: [email protected]

1. INTRODUCTION
Recently, the control and analysis of underactuated manipulators (UAMs), which are nonholonomic systems, have attracted the attention of researchers. A UAM has passive joints with no motors, in contrast to a generic manipulator in which every joint is actuated. By controlling a manipulator that has fewer actuators than degrees of freedom, savings in overall weight and energy are expected. In a UAM, torque is transmitted to the passive joints through inertia, because the passive joints have no driving force of their own; control of a UAM is therefore a difficult problem. This research studies a control system that drives a UAM from a random initial state to a target state. As control methods that can stop the movement of the manipulator once it reaches the target state, we can consider trajectory planning and reinforcement learning [1][2]. Trajectory planning searches for a trajectory that optimizes an objective function under the constraint of stopping at the target state, but it cannot handle unexpected changes of state. Reinforcement learning, on the other hand, has an impressive self-directed ability and can handle unexpected changes of state, but it needs a relatively long time to learn an optimal trajectory by trial and error. In this paper, we explain a control method that combines trajectory planning and reinforcement learning, and we report on the trajectory planning part of this control method.

2. CONTROL METHOD
We use the actor-critic method as the reinforcement learning component. The actor-critic method learns by updating a value function (critic) through the interaction between the agent (actor) and the environment.
The actor cannot learn an optimal action while the learning of the critic has not advanced, because the actor, which decides the current action, is evaluated and altered by the critic. The critic cannot learn an optimal value function while the actor has never obtained a reward from the environment, because the critic is updated by the TD-error. It is difficult for the actor to bring the state of a UAM to the target state and obtain a reward from the environment, so controlling the attitude of a UAM by the actor-critic method alone needs a lot of time. We therefore suggest the following control method for a UAM (Fig. 1):

1. We obtain an optimal trajectory from a random initial state using trajectory planning.
2. We increase the value of the obtained trajectory in the value function of the actor-critic method.
3. We learn using the actor-critic method with this value function.

We expect that the convergence rate of learning is increased by giving the trajectory from trajectory planning to the initial value function as above.

Fig. 1: Image of this control method.

3. TRAJECTORY PLANNING
We consider neural networks concatenated in a time series, as shown in Fig. 2. Each neural network has learned a forward model of the UAM: given the current state vector st and the current input torque τt, it outputs the next state vector st+1. By concatenating n of these networks, we obtain the state vector sn after n steps, together with the following sets:

  T = {τ0, τ1, ..., τn−1, τn}    (1)
  S = {s0, s1, ..., sn−1, sn}    (2)

Equation (1) is the set of input torques T, and Eq. (2) is the set of states S. We obtain the set of state vectors S by feeding the set of input torques T into the concatenated neural networks. If the final state vector sn output by the n concatenated networks equals the target state vector s*, we obtain the set of input torques T* = {τ*0, τ*1, ..., τ*n−1, τ*n} that brings the final state vector sn to the target state vector s*, and the set of state vectors S* = {s*0, s*1, ..., s*n−1, s*n} between the initial state vector s0 and the target state s*.
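As an illustration (not the authors' implementation), the rollout through the concatenated forward models, and one descent step on the final-state error with respect to the torques, can be sketched as follows. The linear forward model, gains, and step sizes are stand-in assumptions, and finite differences replace the LPPH scheme used in the paper.

```python
import numpy as np

def rollout(forward_model, s0, torques):
    """Concatenate the forward model n times: returns S = [s0, s1, ..., sn]."""
    states = [s0]
    for tau in torques:
        states.append(forward_model(states[-1], tau))
    return states

# Stand-in "learned" forward model: damped linear dynamics driven by torque.
A = np.array([[1.0, 0.1],
              [0.0, 0.9]])
B = np.array([0.0, 0.1])

def f(s, tau):
    return A @ s + B * tau

s0 = np.zeros(2)
T = [1.0] * 10                       # candidate torque sequence (Eq. (1))
S = rollout(f, s0, T)                # resulting state sequence (Eq. (2))

# One descent step on E(T) = ||s* - s_n||^2 with respect to the torques,
# via finite differences (the paper instead solves this with LPPH).
s_target = np.array([1.0, 0.0])

def E(torques):
    return float(np.sum((s_target - rollout(f, s0, torques)[-1]) ** 2))

eta, eps = 0.1, 1e-5
grad = [(E(T[:k] + [T[k] + eps] + T[k + 1:]) - E(T)) / eps
        for k in range(len(T))]
T_new = [tk - eta * g for tk, g in zip(T, grad)]
```

Each descent step reduces the final-state error, which is the role played by the updating equations of the optimization method in the paper.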
Fig. 2: Concatenated neural networks in a time series.

In this research, trajectory planning means obtaining the set of state vectors S*, and S* is the trajectory that we give to the value function. This problem can be converted into the following optimization problem:

  Constraint  h = 0    (3)
  h = s* − sn          (4)

Equations (3) and (4) are the constraint conditions of the optimization problem. We apply LPPH (Lagrange Programming neural network with Polarized High-order connections) to this optimization problem. LPPH is an algorithm developed by Nagamatu's laboratory at KIT [3]. Its evaluation function is

  E(τ, wLPPH) = wLPPH^T h²    (5)

where wLPPH is the weight vector in LPPH and h² denotes the elementwise square. We need the set of input torques T that minimizes the evaluation function if we want to obtain the set of state vectors S* between the initial state and the target state. The updating equations for the evaluation function are given as follows:

  dτ/dt = −η ∂E(τ, wLPPH)/∂τ                    (6)
  dwLPPH/dt = ∂E(τ, wLPPH)/∂wLPPH − α wLPPH     (7)

In the above equations, η is the learning rate and α is the cutting rate. The cutting rate is used so that the weight wLPPH does not increase infinitely.

4. SIMULATION
We planned a trajectory for ten seconds using Eqs. (6) and (7) as the updating equations of the UAM. A term for the total input torque was newly added to the evaluation function (5), because we want a trajectory that minimizes the total input torque. With s = [θ1 θ2 θ̇1 θ̇2]^T, we set the initial state s0 = [0 0 0 0]^T and the target state s* = [π 0 0 0]^T. The result of trajectory planning is shown in Fig. 3. We obtained a trajectory that brings the state of the UAM near the target state s*, but the final state included a steady-state error and its velocity was not zero. Thus, we expect that the final state changes after ten seconds.

Fig. 3: Result of simulation (trajectory in the X–Y plane from the initial state to the final state).

5. CONCLUSION
We explained a control method that combines trajectory planning and reinforcement learning, and we suggested a trajectory planning scheme for this control method. However, the obtained trajectory only brings the state near the target state. In future work, we will readjust the constraint or the evaluation function, and will run a simulation of the control method.

6. ACKNOWLEDGMENT
This research was supported by a Grant-in-Aid for the 21st Century COE Program.

7. REFERENCES
[1] N. Shiroma, K. M. Lynch, H. Arai, and K. Tanie, "Motion Planning for a Three-Axis Planar Manipulator with a Passive Revolute Joint", Transactions of the Japan Society of Mechanical Engineers (C), Vol. 66, No. 642, pp. 545-552.
[2] J. Yoshimoto, S. Ishii, and M. Sato, "Application of Reinforcement Learning Based on On-Line EM Algorithm to Balancing of Acrobot", The Institute of Electrical Engineers of Japan, Vol. J83-D-II, No. 3.
[3] K. Zhang and M. Nagamatu, "Solving Satisfiability Problem by Parallel Execution of Neural Networks with Biases", Proceedings of the 15th International Conference on Artificial Neural Networks, pp. 969-974.

A CMOS ELEMENT CIRCUIT FOR PHASE-LOCKED LOOP NEURAL NETWORKS WITH PULSE-MODULATION TECHNIQUE
Daisuke Atuti, Kazuki Nakada, Takashi Morie
Department of Brain Science and Engineering, Kyushu Institute of Technology
E-mail: [email protected]

1 INTRODUCTION
Nonlinear phenomena in ensembles of oscillatory elements in physical, chemical, and biological systems can be modeled as coupled phase oscillators with periodic phase variables and periodic nonlinear interaction [1]. For instance, the oscillatory neural network is a class of coupled phase oscillators in which each oscillator is regarded as a spiking neuron in a periodic firing state. The dynamical and statistical properties of oscillatory neural networks have been investigated in detail as a model of associative memory [2, 3].
In previous work, we proposed a CMOS pulse-modulation circuit technique for coupled phase oscillators with arbitrary nonlinear interactions, in which states and signals are represented by pulse-width modulation (PWM) and pulse-phase modulation (PPM) schemes [4]. With this pulse-modulation circuit technique, analog signal processing can be achieved by binary pulses in the time domain, and therefore we can construct an analog dynamical system compatible with conventional digital systems. In this study, we propose a phase oscillator circuit [5] using the pulse-modulation approach for implementing the PLL neural networks proposed by Hoppensteadt and Izhikevich [6]. We designed an element circuit and confirmed the fundamental operation of two coupled element circuits through SPICE simulations.

2 MODEL DESCRIPTION
The PLL neural network is an oscillatory neural network that can generate arbitrary phase-locked patterns of output waveforms equivalent to embedded memories. In a specific case, the dynamics of the PLL neural network is written as follows:

  φ̇i = ω − sin(φi) Σj sij cos(φj)    (1)

where φi denotes the phase variable of the i-th oscillator, ω the angular frequency of the oscillator, and sij the coupling strength between the PLLs. In order to realize the multiplication terms for circuit implementation, we rewrite (1) as follows:

  φ̇i = ω + Σj (sij/2) [sin(φj − φi) − sin(φj + φi)]    (2)

3 CIRCUIT IMPLEMENTATION
Using the pulse-modulation approach, we designed an element circuit for PLL neural networks. The block diagram of the element circuit is shown in Fig. 1, and the circuit timing diagram is shown in Fig. 2. In our circuit implementation, the continuous-time dynamics is discretized by the Euler method. The phase variables φi,j are represented by PWM signals. A PWM signal is transformed into a PPM signal by a PWM-PPM converter (PPC). The PPM signal switches a pair of current sources, which perform a positive or negative update according to the nonlinear waveform Vnon_p or Vnon_n. After an update operation, φi,j are stored in capacitor Cφ as the terminal voltage Vφ of each element circuit. Vφ is transformed into a PWM signal again by a voltage-pulse converter (VPC). By repeating this update operation, the discretized dynamics is achieved.
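A plain numerical sketch of the Euler-discretized dynamics of Eq. (2) for two oscillators is given below. This is software only, not the circuit itself; the frequency, step size, and initial phases are assumed values.

```python
import math

def simulate(s, omega=1.0, dt=0.01, steps=20000, phi0=(0.0, 2.0)):
    """Euler-integrate Eq. (2) for two coupled oscillators with s = s12 = s21."""
    phi = list(phi0)
    for _ in range(steps):
        dphi = []
        for i in range(2):
            j = 1 - i
            coupling = 0.5 * s * (math.sin(phi[j] - phi[i])
                                  - math.sin(phi[j] + phi[i]))
            dphi.append(omega + coupling)
        phi = [p + dt * d for p, d in zip(phi, dphi)]
    return phi

# s = +1 drives the pair toward in-phase locking, s = -1 toward anti-phase,
# matching the in-/anti-phase synchronization reported in the simulation results.
phi_in = simulate(s=1.0)
phi_anti = simulate(s=-1.0)
```

In this two-oscillator case the phase difference ψ = φ1 − φ2 obeys ψ̇ = −s sin ψ, so ψ converges to 0 for s = 1 and to ±π for s = −1.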
4 SIMULATION RESULTS
We simulated two coupled element circuits with HSPICE using the TSMC 0.25 µm technology. We set the coupling strength sij = 1 and -1, and changed the angular frequency ω from 10 to 2.5 kHz. The waveforms of the voltages Vφi and Vφj are shown in Fig. 3. The voltages synchronized in-phase or anti-phase depending on the coupling strength sij.

5 CONCLUSION
We designed an element circuit for implementing PLL neural networks using the pulse-modulation approach. We verified the fundamental operation of two coupled element circuits with HSPICE. The results demonstrated that the circuits synchronize in-phase or anti-phase depending on the coupling strength.

Figure 1: Circuit configuration of two coupled element circuits for PLL neural networks.
Figure 2: Timing diagram of the two coupled element circuits.

REFERENCES
[1] Y. Kuramoto, "Chemical Oscillations, Waves, and Turbulence," Springer, Berlin, 1984.
[2] T. Aoyagi and K. Kitano, "Retrieval dynamics in oscillator neural networks," Neural Computation, vol. 10, pp. 1527–1546, 1998.
[3] T. Aonishi, K. Kurata, and M. Okada, "A statistical mechanics of an oscillator associative memory with scattered natural frequencies," Phys. Rev. Lett., vol. 82, pp. 2800–2803, 1999.
[4] D. Atuti, T. Morie, T. Nakano, and K. Nakada, "An element circuit for nonlinear dynamical systems using phase variables," Abst. of The 3rd Int. Conf. on Brain-Inspired Information Technology (BrainIT2006), p. 66, Sept. 2006.
[5] D. Atuti, K. Nakada, and T.
Morie, "CMOS pulse-modulation circuit implementation of phase-locked loop neural networks," IEEE Int. Symp. on Circuits and Systems (ISCAS2008), pp. 2174–2177, May 2008.
[6] F. C. Hoppensteadt and E. M. Izhikevich, "Pattern recognition via synchronization in phase-locked loop neural networks," IEEE Trans. Neural Networks, no. 11, pp. 734–738, 2000.

Figure 3: HSPICE circuit simulation results.

Underwater Sound Source Localization System using Particle Filter
Yu Eriguchi, Kazuo Ishii
Department of Brain Science and Engineering, LSSE, Kyushu Institute of Technology
[email protected], [email protected]

1. INTRODUCTION
As tools for ocean surveys and searches for natural energy resources, underwater robots are expected to carry out missions that are too dangerous for humans. One of the key issues for underwater robots is the self-localization problem in the ocean: self-localization systems are required to navigate underwater robots. There are various self-localization methods, such as dead reckoning, GPS, and sound source localization. However, dead reckoning accumulates errors easily, and GPS is unusable underwater because of radio attenuation. Although Underwater Sound source Localization Systems (USLS) are often used for communication in the ocean because sound propagates well underwater, they suffer from problems caused by multipath. Accordingly, a robust USLS is desired for underwater robots. In this paper, we describe the basic method and design of a USLS, and report experiments and simulations using a USLS into which a particle filter has been introduced to predict locations from a series of observed data.

2) Particle Filter
A Particle Filter (PF) [2]-[5] is introduced into the USLS to predict the direction from the series of measured data. The system model and the observation model of the PF are set as follows:

  xt = xt−1 + ut−1 + νt,  νt ~ N(μs, σs²)    (3)
  yt = h(xt) + εt,  εt ~ N(μo, σo²)    (4)

where xt is the sound source direction at time t and yt is the observed value. In Eq. (3), ut−1 is the rotation angle of the underwater robot, and νt is the process noise, assumed to be normally distributed with variance σs². In Eq. (4), the observation noise εt is assumed to be normally distributed with variance σo², and the observation function h(xt) is determined from the results of the experimental observations. The weighting of particles is performed by P(εt). Each particle corresponds to a target direction, with estimated direction x̃t(i) and weight wt(i) given by Eq. (5):

  wt(i) = f(εt(i)) = (1 / (√(2π) σo)) exp(−(εt(i) − μo)² / (2σo²))    (5)

This equation is the probability density function of the normal distribution with mean μo and variance σo². The PF is calculated recursively by the Sampling Importance Resampling (SIR) method [2][3].

2. ALGORITHMS AND METHODS OF USLS
1) Calculation of Sound Source Direction
We obtain the sound source direction with a pair of hydrophones using the Super Short Base Line (SSBL) method [1], as shown in Fig. 1. The direction θ is given by Eq. (1), where f is the frequency of the sound wave, c is the velocity of acoustic waves underwater, d is the baseline length between the two hydrophones, and δt = δφ/2πf is the time lag:

  θ = cos−1(c·δt / d),  d ≤ λ/2 (= c/2f)    (1)

The Cross-Correlation Function (CCF) is used for the time-lag estimation according to Eq. (2). The delay time τ is obtained as the lag for which R(k) takes its maximum value, where V1 and V2 are the acoustic voltages of the two hydrophones:

  R(k) = (1/N) Σi=0..N−1 V1(i) V2(i + k)    (2)
  τ = argmaxk R(k)    (2')

3. SYSTEM ARCHITECTURE OF USLS
The system architecture of the USLS is illustrated in Fig. 2. Each hydrophone (Reson TC4013) is connected to a small pressure-tight hull which contains an amplifier, a filter, and batteries. The hulls are connected to a large pressure hull which contains a signal processing circuit board using a dsPIC. The signals from these circuits are acquired by a fast 10-bit A/D converter and detected by a threshold value. The target direction is transmitted to a PC via RS232 serial communication.
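The SIR recursion of Eqs. (3)-(5) can be sketched as follows. This is an illustrative toy, not the dsPIC implementation: the observation function h is taken as the identity, and the noise parameters are assumed values.

```python
import math
import random

def pf_step(particles, u_prev, y_obs, sigma_s=0.1, sigma_o=0.1, mu_o=0.0):
    """One predict-weight-resample cycle of the SIR particle filter."""
    # Predict with the system model (3): x_t = x_{t-1} + u_{t-1} + v_t.
    pred = [x + u_prev + random.gauss(0.0, sigma_s) for x in particles]
    # Weight with the Gaussian likelihood (5) of the residual y_t - h(x_t);
    # the observation function h is taken as the identity here.
    w = [math.exp(-((y_obs - x) - mu_o) ** 2 / (2 * sigma_o ** 2)) for x in pred]
    total = sum(w)
    w = [wi / total for wi in w]
    estimate = sum(x * wi for x, wi in zip(pred, w))
    # Resample: draw N particles with probability proportional to the weights.
    return random.choices(pred, weights=w, k=len(pred)), estimate

random.seed(0)
particles = [random.uniform(0.0, math.pi) for _ in range(50)]   # N = 50
for _ in range(10):
    particles, estimate = pf_step(particles, u_prev=0.0, y_obs=math.pi / 3)
```

After a few observations of a fixed source, the weighted mean converges near the true direction even though the particles start uniformly over [0, π].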
Fig. 1: Calculation of the sound source direction.
Fig. 2: System architecture of the USLS.

4. EXPERIMENTS AND SIMULATIONS
1) Observations of Sound Source Direction
The developed hydrophone modules and a pinger were set up as shown in Fig. 3. The experimental results of the USLS at each observation point are shown in Fig. 4, where the observed directions of the pinger are plotted along the ordinate and the observation numbers along the abscissa. The observed directions at Points 2-4 coincided with the theoretical directions π/3, π/2, and 2π/3 [rad]. On the other hand, at Points 1 and 5, the observed directions differed from the theoretical directions. The errors are attributed to multipath from the water surface, the wall surfaces, and the bottom.

Table 1: Parameters for the PF simulations.

5. CONCLUSIONS
In this paper, we discussed the development of a basic Underwater Sound source Localization System designed for mounting on underwater robots. According to the experimental and simulation results, the detection ability of the developed modules lies within the region of ±π/3 [rad] from the forward direction, and robust estimation against noise is technically feasible.

2) Estimating Simulation using the Particle Filter
Figures 5 and 6 show simulated results of the USLS using the PF. The underwater sound source directions were simulated with the parameters shown in Table 1, with the arrangement shown in Fig. 3. The average values of the likelihood were defined by Eq. (6), which was derived from the histogram of direction errors between the observed and theoretical directions in Fig. 4.
In every simulation, the starting particles {x̃0(i)}i=1..N were distributed between 0 [rad] and π [rad] by a uniform distribution. In the figures, the directions of the pinger are plotted along the ordinate and the observation number along the abscissa. In Fig. 5 and Fig. 6, the pinger was fixed at each direction point; particle data appear only in Fig. 5. As a result, the estimated values converged on the theoretical values using the PF, even in the presence of noise such as multipath.

REFERENCES
[1] The Marine Acoustics Society of Japan, "Fundamentals and Applications of Marine Acoustics", 2004.
[2] R. Karlsson and F. Gustafsson, "Particle Filter for Underwater Terrain Navigation", Statistical Signal Processing Workshop, 2003.
[3] N. Gordon, D. Salmond, and A. Smith, "Novel approach to nonlinear/non-Gaussian Bayesian state estimation", IEE Proceedings-F, Vol. 140, No. 2, April 1993.
[4] N. Ikoma, "Tutorial: Particle Filters", PF Study Group and Allied HIF Council, Lecture Document, KIT, Jan. 26, 2007, Japan.
[5] T. Maki, H. Kondo, T. Ura, and T. Sakamaki, "Observation of Breakwater Caissons by the AUV "Tri-Dog 1": Dives by Autonomous Navigation based on Particle Filter", Proc. of the 2005 JSME Conference on Robotics and Mechatronics, Kobe, Japan, June 9-11.
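For illustration, the front end of Section 2 — Eqs. (1), (2), and (2') — can also be sketched: estimate the sample lag between the two hydrophone signals by cross-correlation, then convert it to a direction. The sampling rate, signal length, and baseline are assumed values.

```python
import math

def ccf_lag(v1, v2, max_lag):
    """tau = argmax_k R(k), with R(k) = (1/N) * sum_i v1(i) * v2(i + k)."""
    N = len(v1)
    best_k, best_r = 0, -float("inf")
    for k in range(-max_lag, max_lag + 1):
        r = sum(v1[i] * v2[i + k] for i in range(N) if 0 <= i + k < N) / N
        if r > best_r:
            best_k, best_r = k, r
    return best_k

def direction(delta_t, d, c=1500.0):
    """theta = arccos(c * delta_t / d), Eq. (1); argument clamped to [-1, 1]."""
    return math.acos(max(-1.0, min(1.0, c * delta_t / d)))

fs = 100e3                           # assumed sampling rate [Hz]
f = 10e3                             # pinger frequency [Hz]
v1 = [math.sin(2 * math.pi * f * i / fs) for i in range(200)]
v2 = [math.sin(2 * math.pi * f * (i - 3) / fs) for i in range(200)]  # 3-sample lag
lag = ccf_lag(v1, v2, max_lag=5)
theta = direction(lag / fs, d=0.075)  # d = lambda/2 for c = 1500 m/s, f = 10 kHz
```

The baseline d = λ/2 makes the arccos argument unambiguous over [0, π], which is the d ≤ λ/2 condition in Eq. (1).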
  ⎛ μ11 μ12 μ13 ⎞   ⎛ 0   2.157   0.622 ⎞
  ⎜ μ21 μ22 μ23 ⎟   ⎜ 0   0.456     −   ⎟
  ⎜ μ31 μ32 μ33 ⎟ = ⎜ 0     −       −   ⎟    (6)
  ⎜ μ41 μ42 μ43 ⎟   ⎜ 0     −       −   ⎟
  ⎝ μ51 μ52 μ53 ⎠   ⎝ 0  −2.157  −0.622 ⎠

Table 1: Parameters for the PF simulations.
  Particle Number     N = 50
  Trial               T = 10 or 16
  System Model        ut−1 = 0.0, σs² = 0.1
  Observation Model   μij ~ Eq. (6), σo² = 0.1

Fig. 3: Experimental arrangement of the pinger and the USLS.
Fig. 4: Sound source observation at five points (Point 1 = π/6, Point 2 = π/3, Point 3 = π/2, Point 4 = 2π/3, Point 5 = 5π/6 [rad]).
Fig. 5: Simulated result of the USLS at Point 1 (π/6 [rad]).
Fig. 6: Simulated result of the USLS at Point 5 (5π/6 [rad]).

SURFACE ROUGHNESS MEASUREMENT USING SHADOWS PRODUCED BY CIRCULATING LIGHT SOURCES
Tomokazu HIRATSUKA, Keiichi HORIO and Takeshi YAMAKAWA
Department of Brain Science and Engineering, Kyushu Institute of Technology
E-mail: [email protected], [email protected], [email protected]

I. INTRODUCTION
Various oral mucosal diseases, including carcinoma (cancer), often appear inside the mouth. Squamous cell carcinoma, which accounts for 2% of the cancer population, neighbors various maxillofacial organs, so significant functional and aesthetic disabilities often occur in these organs. One of the features of squamous cell carcinoma is that the mucosa surface has a concavo-convex shape, as shown in Fig. 1. Therefore, affected parts are diagnosed by measuring the roughness of the mucosa surface to determine whether the affected part is squamous cell carcinoma or not.
In this study, we present a new imaging device with circulating light sources (CLS) for surface roughness measurement. The features of surface roughness are extracted by Wavelet Multi-Resolution Analysis (MRA) from the images produced by the proposed device, and are categorized by a Self-Organizing Map (SOM). After categorization, the roughness of unknown surfaces can be estimated by the SOM.

Fig. 1: Squamous cell carcinoma case.

II. CIRCULATING LIGHT SOURCES
The structure of the circulating light sources (CLS) is shown in Fig. 2. Multiple light sources aligned in a circle illuminate the object surface sequentially, and produce shadow images along the concavo-convex shapes, as shown in Fig. 3. The shadow images produced by the CLS contain concavo-convex information: if the shadow area is large, the degree of surface roughness is large, and if the shadow area is small, the degree of surface roughness is small. The features of surface roughness are extracted from these shadow images.

III. FOREIGN PATTERN EXTRACTION AND ELIMINATION
Blood vessel patterns sometimes appear in the oral images produced by the CLS. These blood vessel patterns (foreign patterns) affect the surface roughness measurement based on the shadow images (original texture) produced by the CLS, so the foreign patterns should be eliminated from the original texture. Figure 4 shows the framework of foreign pattern extraction. The procedure for extracting the foreign pattern area is as follows:

1. Binarize the eight shadow images ('1': black, '0': white).
2. Focusing on each pixel (Pni,j), sum the pixel values of the eight images (Si,j), where i and j are the pixel location:

     Si,j = Σn=1..8 Pni,j    (1)

3. Decide the pixel value of the foreign patterns image (Fi,j) as follows:

     Fi,j = 0 if Si,j = 0 (background)
            1 if Si,j = 8 (foreign pattern)    (2)
            0 otherwise (shadow)

Finally, the brightness of the foreign pattern area is corrected, and the pattern is eliminated. Figure 5 shows an example of foreign pattern extraction.

IV. FEATURE EXTRACTION AND DATA CATEGORIZATION
The Wavelet Multi-Resolution Analysis (MRA) [1] resolves the original signal into a high-frequency component and a low-frequency component. The resolved low-frequency component is then further resolved into a high-frequency component and a low-frequency component by the MRA. The features of surface roughness are extracted by the MRA as wavelet coefficients. Figure 6 shows an example of the MRA.

The Self-Organizing Map (SOM) [2] is biologically motivated by the ability of the brain to map the outside world onto the cortex, where nearby stimuli are coded in nearby cortical areas. The SOM algorithm is a simplified model of such a mapping process; its structure is shown in Fig. 7. The wavelet coefficients of various surface roughnesses extracted by the MRA are categorized by the SOM. After categorization, the roughness of unknown surfaces can be estimated by the SOM.

ACKNOWLEDGMENT
This work was partially supported by a 21st Century COE program granted to Kyushu Institute of Technology by MEXT of Japan.

REFERENCES
[1] Charles K. Chui, An Introduction to Wavelets, Academic Press, 1992.
[2] T. Kohonen, Self-Organizing Maps, Springer-Verlag, 1995.

Fig. 2: The structure of the Circulating Light Sources (CLS).
Fig. 3: An example of the shadow images produced by the CLS. The object is sandpaper No. P16; images (a)-(h) are illuminated by L1-L8, respectively.
Fig. 4: The framework of the foreign pattern extraction.
Fig. 5: An example of the foreign pattern extraction: (a) the shadow image with a foreign pattern, (b) the extracted foreign pattern area, (c) the eliminated result.
Fig. 6: An example of the Multi-Resolution Analysis (MRA).
Fig. 7: The structure of the Self-Organizing Map (SOM).
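Steps 1-3 of the foreign pattern extraction can be sketched as follows; the 4×4 toy images and the binarization threshold are hypothetical, while the real inputs come from the CLS device.

```python
import numpy as np

def foreign_pattern_mask(shadow_images, threshold=128):
    """shadow_images: the eight grayscale shadow images as an (8, H, W) array.
    Binarize ('1': black), sum per pixel (Eq. (1)), and mark as foreign
    pattern the pixels that are dark in all eight images (Eq. (2))."""
    P = (np.asarray(shadow_images) < threshold).astype(int)   # step 1
    S = P.sum(axis=0)                                         # step 2, Eq. (1)
    F = np.where(S == 8, 1, 0)                                # step 3, Eq. (2)
    return F

imgs = np.full((8, 4, 4), 255)   # eight all-white images
imgs[:, 1, 1] = 0                # dark under every light source -> foreign pattern
imgs[0:4, 2, 2] = 0              # dark in only four images -> shadow, not foreign
F = foreign_pattern_mask(imgs)
```

A pixel that is dark regardless of the light source position cannot be a shadow of the relief, which is why S = 8 identifies blood-vessel-like foreign patterns.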
ENVIRONMENT RECOGNITION SYSTEM BASED ON MULTIPLE CLASSIFICATION ANALYSES FOR MOBILE ROBOT
Atushi Kanda, Masanori Sato, and Kazuo Ishii
Department of Brain Science and Engineering, Kyushu Institute of Technology, Japan
[email protected]

1. INTRODUCTION
In previous research, we developed a 6-wheeled mobile robot, "Zaurus", employing a passive linkage mechanism, to evaluate maneuverability on rough terrain. Figures 1 and 2 show the overview and the degrees of freedom of Zaurus, respectively. A neural network controller for velocity control was introduced into the robot, and on rough terrain the neural network controllers showed better performance than a well-tuned PID controller [1]. To further improve maneuverability, a function that switches to the controller suited to the current environment is also indispensable [2]. In this paper, we propose an environment recognition system which selects the controller adjusted to each environment. The input data to the system are the linkage angles, which are fed to a Self-Organizing Map (SOM), the basic information processing engine of the environment recognition system; the SOM is compared with other clustering methods, PCA and k-means.

2. ENVIRONMENT RECOGNITION SYSTEM
The environment recognition system needs a function that classifies the environment using the linkage angle data and the posture of the robot to estimate the current terrain. Figure 3 shows the concept of the environment recognition system. The system recognizes various environments using the data of the passive link joints: the angles of the passive linkage mechanism and the attitude of the robot can express the current environment, that is, the robot itself can be used as an environment recognition sensor.
In this paper, we compare the classification performance of three multiple classification analyses: Principal Component Analysis (PCA), a basic multiple classification analysis; the k-means method, a basic nonhierarchical cluster analysis; and the SOM [3].

1) Basic Environment Data
To realize better recognition capability, previous sets of state variables are also included for classification, so the input data to the system become high-dimensional. We define our environment recognition as "measuring environmental height data using joint and attitude angle data, making basic environment data by clustering, and identifying the travelled environment". To make the basic data set, several time series of climbing up and down bumps were recorded: the front link angle (θf), the side link angle (θs), and the pitch attitude angle (θp). We chose four one-step bump heights, 0.0 [m], 0.06 [m], 0.12 [m] and 0.18 [m], as the basic environments, and a PID controller was employed during data sampling.

2) Principal Component Analysis: PCA
Principal Component Analysis (PCA) is a feature extraction algorithm for data with few deficits. The advantages of PCA are the possibility of compressing the data and extracting features that can be evaluated independently; the method detects characteristic points statistically. Figure 4 shows the results of the analysis using PCA; the horizontal axis shows the first principal component. In Fig. 4, the climbing-up and climbing-down data are spread over a wide area of the graph. Therefore, PCA using principal components 1 and 2 is not suitable as the basis of the environment recognition system in this case.

3) k-means Method
Among the multiple classification analyses, the k-means method is one of the most popular nonhierarchical analyses. The number of clusters, k, is assumed, and each input datum is classified into the most similar cluster. Figure 5 shows the results of the analysis using k-means with 9 and 27 clusters; the horizontal axis shows the sampling step and the vertical axis the cluster number. With 9 clusters it is assumed that the robot has nine situations; with 27 clusters, nine situations for each of the bump heights. The data are classified according to the attitude of the robot while climbing up and down. However, the height of the bumps is not clustered, because the results with 9 and with 27 clusters are almost the same.

4) Self-Organizing Map
The Self-Organizing Map, proposed by T. Kohonen, is a well-known method for classifying data while preserving topology in a feature map. The SOM is trained by unsupervised learning to produce a low-dimensional representation of the training data. Figure 6 shows the obtained feature map; the color of each unit indicates the distance between the reference vectors of neighboring units, where white means far apart and black means that similar units are located close together. As shown in Fig. 6(a), the map is classified based on the attitude of the robot, and the movements of climbing up and climbing down are arranged on opposite sides; the feature map obtained by the SOM thus captures the dynamics of the robot. In Fig. 6(b), we evaluate the interpolation capability using an unlearned experimental result (0.10 [m] height). The trajectory for 0.10 [m] is similar to those for 0.06 [m] and 0.12 [m], and the unlearned data are assigned in the middle of those lines. The areas where the trajectories overlap are the conditions where the differences are not clear. As a result, the SOM showed the best performance in classifying both bump height and posture, so the SOM is introduced as the basic processing system.

Fig. 1: Overview of Zaurus.
Fig. 2: DOF of Zaurus.
Fig. 3: The concept of the proposed system.
Fig. 4: The result of PCA: (a) 0.18 [m], (b) all heights.
Fig. 5: The result of k-means: (a) 9 clusters, (b) 27 clusters.
3. SYSTEM EVALUATION
The developed environment recognition system was introduced into Zaurus to evaluate its classification ability online. Figure 7 shows the velocity of the robot and the environment into which the system categorized the current condition. The conditions are classified into three states: climbing up, climbing down, and flat. The velocity of the robot is controlled properly, and the environment recognition works online. Because the input data to the system are high-dimensional, we combined PCA and the SOM for system miniaturization: we determined the number of dimensions from the contribution ratio of the basic environment data and compressed the input from 24 to 5 dimensions. As shown in Fig. 8, the PCA-SOM map is classified based on the attitude of the robot. For this reason, the PCA-SOM map is expected to give the same performance as the map using the original basic environment data.

4. CONCLUSION
In this research, we proposed an environment recognition system that classifies the linkage angle data. The basic environment map can classify the attitude of the robot and the height of bumps, and the system is able to recognize simple terrain while travelling. The environment recognition system using the compressed basic environment data shows the same performance as the proposed system using the original high-dimensional environment recognition data.

Fig. 6: The result of SOM: (a) attitudes of the robot, (b) interpolation (0.10 [m]).
Fig. 7: The experimental result of the environment recognition system.
Fig. 8: The result of (a) SOM and (b) PCA-SOM: classification by the attitude of the robot.

ACKNOWLEDGEMENT
This work was partly supported by a grant of the Knowledge Cluster Initiative implemented by the Ministry of Education, Culture, Sports, Science and Technology (MEXT).

REFERENCES
[1] Masanori Sato, Kazuo Ishii, "A neural network based controller for a wheel type mobile robot", Brain-Inspired IT 3, pp.
261-264, 2006
[2] Masanori Sato, Atushi Kanda, Kazuo Ishii, "A Switching Controller System for a Wheeled Mobile Robot", Journal of Bionic Engineering, Vol.4, No.4, pp.281-289, 2007
[3] T. Kohonen, "Self-Organizing Maps", Springer-Verlag, 2001

Emotional Expression Model Inspired by the Amygdala
Satoshi Sonoh and Takeshi Yamakawa
Department of Brain Science and Engineering, Kyushu Institute of Technology
Email: [email protected], yamakawa@}.brain.kyutech.ac.jp

1 Introduction

EMA consists of the architecture illustrated in Figure 1 and realizes two functions: recognition of sensory stimuli, and classical conditioning between sensory stimuli and emotional stimuli. The recognition of a sensory stimulus is achieved by the LA layer. In the LA layer, many functional units with reference vectors are arranged on a two-dimensional array and learn to extract features of the sensory stimulus using an extended Kohonen rule. The best matching unit (BMU) is selected when a sensory stimulus is presented, and the reference vector of the BMU is used as the recognized stimulus. In the learning of EMA, adaptation to environmental change is achieved by adjusting the learning rate and the neighborhood width based on an adaptation degree of the BMU. For example, Figure 2 shows an obtained feature map for color sensory stimuli; neighboring units have similar reference vectors (colors), which enhances robustness to noise in the sensory stimuli. The relationship between the recognized sensory stimulus and the presented emotional stimulus is acquired by a classical conditioning model. The emotional stimulus unconditionally induces the emotional response, whereas the sensory stimulus naturally has no emotional value. The emotional value is the output of EMA and represents the strength of the emotional response.
The LA units have emotional weights, and the emotional response is calculated as the weighted sum over the BMU and its neighboring units. The emotional weights are then updated by a delta rule (the difference between the emotional stimulus and the output emotional value). Figure 3 shows a learning curve of the emotional value in a computational simulation; the simulation emulates acquisition and extinction, two stages of classical conditioning. Although fear may be the emotion processed mainly in the amygdala, many other emotions, such as pleasure and surprise, can also be handled by EMA. This extension is effective for generating various emotions and behaviors in robot applications.

Animals generate a wide variety of emotions through interactions with their environments and can adapt themselves to new environments. It is well known that emotions are controlled by the amygdala, a part of the limbic system [1]. In the amygdala, a sensory stimulus is evaluated according to its emotional value: how strongly does the stimulus evoke pleasure or displeasure? The emotional response is then transmitted from the amygdala to the corresponding systems. The emotional response relates very closely to the learning and memory systems of animals, which means that emotions play a significant role in animal behavior. The aim of this research is the development of an emotional expression algorithm from the viewpoint of brain-inspired technology. We propose a novel neural network model based on neuroscience findings about the architecture and function of the amygdala. In line with the philosophy of the Kyutech COE program, "World of Brain Computing Interwoven out of Animals and Robots", we have proposed this emotional expression model of the amygdala and have also developed a specific digital hardware implementation of the model.
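A rough sketch of the conditioning mechanism just described (BMU selection, neighborhood-weighted emotional response, delta-rule weight update). The grid size, Gaussian neighborhood, learning rate, normalization of the response, and the stimulus values are illustrative assumptions, not EMA's actual parameters:

```python
import numpy as np

# Sketch of an EMA-style conditioning step: a grid of LA units holds
# emotional weights; the response is a neighborhood-weighted average
# around the BMU (the normalization is an assumption), and the weights
# are updated by a delta rule.
GRID = 5
rng = np.random.default_rng(0)
ref = rng.random((GRID, GRID, 3))        # reference vectors (colors)
w = np.zeros((GRID, GRID))               # emotional weights

def neighborhood(bmu, sigma=1.0):
    ys, xs = np.mgrid[0:GRID, 0:GRID]
    d2 = (ys - bmu[0]) ** 2 + (xs - bmu[1]) ** 2
    return np.exp(-d2 / (2 * sigma ** 2))

def respond_and_learn(stimulus, emotional_stimulus, lr=0.3):
    # BMU = unit whose reference vector best matches the stimulus
    d = np.linalg.norm(ref - stimulus, axis=2)
    bmu = np.unravel_index(d.argmin(), d.shape)
    h = neighborhood(bmu)
    value = (h * w).sum() / h.sum()                # emotional value
    w[:] += lr * h * (emotional_stimulus - value)  # delta rule
    return value

red = np.array([1.0, 0.0, 0.0])
for _ in range(30):          # acquisition: pair red with "fear" = 1
    v = respond_and_learn(red, 1.0)
print(round(v, 2))
for _ in range(30):          # extinction: pair red with 0
    v = respond_and_learn(red, 0.0)
print(round(v, 2))
```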
Furthermore, an implementation of the proposed model was successfully achieved in order to make an autonomous robot generate emotions.

2 Neural Network Model of the Amygdala

The emotional expression model of the amygdala (EMA) proposed in this research is based on the neural circuit of the amygdala [2].

Figure 1: The architecture of EMA (sensory input layer, LA layer with weights $w_i(t)$ and units $u_k(t)$, and CE layer; sensory input $x(t) = (x_1(t), \dots, x_M(t))$, LA outputs $V(t) = (V_1(t), \dots, V_K(t))$, emotional weights $E(t) = (E_1(t), \dots, E_K(t))$).

3 Implementation in Robot

In this research, we implemented the proposed model EMA on an autonomous robot. The autonomous robot "WITH" was developed by the Ishii laboratory as an experimental platform for brain-like information processing [3].

4 Experimental Results

An experiment on emotional interaction was carried out using the developed robot. Actions from the human to "WITH" are presenting a color marker as the sensory stimulus, and hitting and gentle stroking as the emotional stimulus. "WITH" detects these stimuli and induces the emotional response based on the emotional value that EMA outputs. The emotional interaction is shown in Figure 4. In the beginning, "WITH" did not induce any emotional response against the presented red marker. However, as a result of the classical conditioning, "WITH" gradually induced the emotional response of fear against the presented red marker.

Figure 2: A feature map of the sensory stimulus (BMU_B, BMU_R, BMU_G).
Figure 3: A learning curve in acquisition and extinction.
Figure 4: The emotional interaction between human and "WITH". (a) During conditioning. (b) After conditioning.

In order to establish an emotional interaction between human and the robot, we extended
"WITH" as follows:
1. Developing real-time hardware for EMA
2. Adding a touch sensor array that can detect the emotional stimulus
3. Adding ear- and tail-like devices that universally express the emotions

The EMA hardware (EMAHW), employing a massively parallel architecture, was developed using an FPGA (Xilinx Spartan-3E, XC3S1600E). Both speed-up of the calculations and small-scale integration were achieved by simplifying the calculation algorithms and replacing all multipliers with bit shifters. The proposed EMAHW can process the calculations of 81 units at a clock frequency of up to 35 MHz; the calculation speed is twenty times faster than that of a portable PC (Intel 1.1 GHz, 512 MB memory).

The touch sensor array on the head of "WITH" can detect emotional stimuli such as hitting and gentle stroking by a human. In addition, we added an image sensor for detection of the sensory stimulus (the color marker).

The ear- and tail-like devices are emotional expression devices modeled on the behavior of dogs (one of the most familiar pets). If the emotions are represented by movements of the ears and tail, non-verbal interaction becomes available. Non-verbal interaction is independent of language, nationality, and age, and is therefore highly universal; such universal communication is considered an effective way to reduce techno-stress.

In the end, when the emotional stimulus is changed from hitting to gentle stroking, another emotional response, pleasure, is induced.

5 Conclusion

In this research, we proposed a neural network model, the Emotional expression Model of the Amygdala (EMA), from a brain information engineering point of view. Emotion expression in EMA is achieved by cooperation between recognition of the sensory stimulus and classical conditioning. In addition, we implemented EMA as the emotion expression system of the autonomous robot "WITH". As a result, we successfully made the robot generate principal emotions through interaction between the human and the robot.

References
[1] J. P. Aggleton, The Amygdala: A Functional Analysis, Oxford University Press, 2000.
[2] S. Sonoh, et al., "Emotional behavior and expression based on a neural network model of amygdala," Abstract of the 4th International Conference on Brain-Inspired Information Technology (BrainIT 2007), p.36, 2007.
[3] Y. Takemura, et al., "Toward Realization of Swarm Intelligence Mobile Robots," Lecture Notes in Computer Science, Vol.4232, pp.273-276, 2006.

Dynamics of Link Mechanisms Employing COG Jacobian Matrices
*Takashi Sonoda 1, Kazuo Ishii 1 and Daigoro Isobe 2
1 Department of Brain Science and Engineering, Kyushu Institute of Technology
2 Department of Engineering Mechanics and Energy, University of Tsukuba
E-mail: [email protected], [email protected] and [email protected]

1. INTRODUCTION
Analyzing the dynamics of robots is important for their control, development, and design. However, when the analysis targets become complex, the calculation cost increases and the expressions become complicated. Therefore, a unified calculation method that does not depend on the target's structure is desired, and a parallel computation method using matrix forms has been proposed [1]. The proposed method expresses the equation of motion in a simple form and employs a Jacobian matrix (the COG Jacobian matrix) relating the centers of gravity (COGs) of the links to the active joints. In this paper, we derive the equation of motion employing the COG Jacobian matrix from the principle of virtual work, and we show simulations of inverse dynamics using the proposed method.
2. INVERSE DYNAMICS
2.1 Deriving COG Jacobian matrices
Fig. 1 shows the relationship between the forces applied around the COGs of each link and the torques needed at the active joints during the motion of a closed-link mechanism. In this paper, we do not discuss non-holonomic mechanisms, since they are easy to handle.

Fig. 1 Model of a closed-link mechanism (passive and active joints; link i has COG $G_i$ and applied force $F_i$)

First, the numbers of links and active joints are defined as N and M. The coordinates of the COG of link i (i = 1, ..., N) are expressed as $q_i = [p_{Gi}^T, \theta_{Gi}^T]^T$, and all coordinates are grouped as $q = [q_1^T, \dots, q_N^T]^T$. The coordinates $\theta_m \in R^M$ of the active joints uniquely determine the motion of the mechanism, so the COG coordinates q are expressed by a function $g(\theta_m)$ of $\theta_m$:

$q = g(\theta_m)$   (1)

Differentiating eq. (1) with respect to time, we obtain the velocities of the COGs as

$\dot{q} = G \dot{\theta}_m$   (2)

where

$G \equiv G(\theta_m) = \frac{\partial q}{\partial \theta_m}$   (3)

Eq. (3) expresses the COG Jacobian matrix of the link mechanism. Moreover, differentiating eq. (2) with respect to time, we obtain the relationship between the accelerations of the COGs as

$\ddot{q} = G \ddot{\theta}_m + \dot{G} \dot{\theta}_m$   (4)

In brief, the COG Jacobian matrix expresses the relation between the velocities of the COGs and the velocities of the active joints, as described in eq. (2).

2.2 Inverse dynamics employing the COG Jacobian
The relationship between the torque vector $\tau_m \in R^M$ applied to the active joints and the inertia forces originated by the acceleration vector $\ddot{q}$ obtained above can be expressed by the following equations, using the principle of virtual work:

$\partial r^T F = 0$   (5)

where

$\partial r = [\partial q^T, \partial h^T, \partial \theta_m^T]^T$   (6)

and

$F = [-(M\ddot{q} + b)^T, -f^T, \tau_m^T]^T$   (7)

Here $\partial q$, $\partial h$, and $\partial \theta_m$ are the virtual displacement vectors of the COGs of the links, of arbitrary points, and of the active joints. b is the external force vector that does not proceed from the active joints, such as gravity; f is the force vector applied to the arbitrary point coordinates h; and M is the mass matrix of the links. Expanding eq. (5), we obtain

$-\partial q^T (M\ddot{q} + b) - \partial h^T f + \partial \theta_m^T \tau_m = 0$   (8)

Solving for $\tau_m$,

$\tau_m = \left(\frac{\partial q}{\partial \theta_m}\right)^T (M\ddot{q} + b) + \left(\frac{\partial h}{\partial \theta_m}\right)^T f \equiv G^T (M\ddot{q} + b) + J^T f$   (9)

J is a Jacobian matrix for the external forces applied to arbitrary points other than the COGs. For example, applying this to a robot arm, we can obtain the torques balancing the external forces when the coordinate vector h is taken at the end-effector of the arm. Eq. (9) means that the inverse dynamics of link mechanisms can be solved using COG Jacobian matrices. However, the virtual displacements have to satisfy the constraints when the mechanisms have constraints on displacements and motions. Next, substituting eq. (4) into eq. (9), we obtain

$\tau_m = G^T M G \ddot{\theta}_m + G^T M \dot{G} \dot{\theta}_m + G^T b + J^T f = H(\theta_m)\ddot{\theta}_m + B(\theta_m, \dot{\theta}_m) + J^T f$   (10)

which is the general form of the equation of motion, where $H(\theta_m)$ is the inertia matrix and $B(\theta_m, \dot{\theta}_m)$ is a term grouping together gravity, Coriolis, and centrifugal forces.

3. SIMULATION EXAMPLES
In this section, we show inverse dynamics simulations employing COG Jacobian matrices. Fig. 2 shows two models of simple link mechanisms that have four joints: model (a) has no redundant actuator, while model (b) has one redundant actuator. Both models have the same sizes, masses, and materials.

Fig. 2 Two types of link mechanisms for the dynamic analysis examples employing the COG Jacobian matrices: (a) a mechanism without redundant actuators, (b) a mechanism with a redundant actuator.

In case (a), the active joint of the model is $\theta_m \equiv \theta_3$, and using the COG Jacobian matrix the active-joint torque is obtained by

$[\tau_3] = \begin{bmatrix} G_1(\theta_3) \\ G_2(\theta_3) \\ G_3(\theta_3) \end{bmatrix}^T \begin{bmatrix} F_1 \\ F_2 \\ F_3 \end{bmatrix}$   (11)

In case (b), the active joints of the model are $\theta_m \equiv [\theta_1, \theta_3]^T$, and the active-joint torques are similarly obtained by

$\begin{bmatrix} \tau_1 \\ \tau_3 \end{bmatrix} = \begin{bmatrix} G_1(\theta_1, \theta_3) \\ G_2(\theta_1, \theta_3) \\ G_3(\theta_1, \theta_3) \end{bmatrix}^T \begin{bmatrix} F_1 \\ F_2 \\ F_3 \end{bmatrix}$   (12)

In both cases, each joint displacement $\theta_i$ (i = 1, ..., 4) has to satisfy the closed-loop constraints. Fig. 3 shows the target motion applied to both cases, and Fig. 4 shows the results of the inverse dynamics simulations.

Fig. 3 Target motion
Fig. 4 Torque curves

4. CONCLUSIONS
We showed inverse dynamics calculations employing the COG Jacobian matrix, which expresses the relationship between the velocities of the COGs and those of the active joints. Moreover, deriving the COG Jacobian matrix from the principle of virtual work, we showed that the equations of motion can be expressed in matrix form.

[1] D. Isobe: "A Unified Numerical Scheme for Calculating Inverse Dynamics of Open/Closed Link Mechanisms," Proceedings of the 27th Annual Conference of the IEEE Industrial Electronics Society IECON'01, pp.341-344, 2001.
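Eq. (9) can be checked numerically on a toy case. The sketch below computes static gravity-compensation torques, $\tau_m = G^T b$, for a hypothetical planar 2-link serial arm (not the closed-link mechanisms of Fig. 2), with the COG Jacobian G obtained by finite differences; the link lengths, masses, and COG placement at the link midpoints are assumptions:

```python
import numpy as np

# Static gravity-compensation torques via eq. (9) with no motion
# (q_dd = 0, f = 0): tau = G^T b, where G = dq/dtheta is the COG
# Jacobian and b holds the gravity loads acting at each link's COG.
# Link lengths, masses, and midpoint COGs are assumptions.
L = np.array([1.0, 0.8])      # link lengths [m]
m = np.array([2.0, 1.0])      # link masses [kg]
g = 9.81

def cog_positions(theta):
    """x,y of each link's COG (taken at the link midpoint), stacked."""
    a1, a12 = theta[0], theta[0] + theta[1]
    c1 = 0.5 * L[0] * np.array([np.cos(a1), np.sin(a1)])
    c2 = L[0] * np.array([np.cos(a1), np.sin(a1)]) \
         + 0.5 * L[1] * np.array([np.cos(a12), np.sin(a12)])
    return np.concatenate([c1, c2])

def cog_jacobian(theta, eps=1e-6):
    """G = dq/dtheta by central finite differences."""
    G = np.zeros((4, 2))
    for j in range(2):
        d = np.zeros(2)
        d[j] = eps
        G[:, j] = (cog_positions(theta + d) - cog_positions(theta - d)) / (2 * eps)
    return G

theta = np.array([np.deg2rad(30.0), np.deg2rad(45.0)])
G = cog_jacobian(theta)
b = np.array([0.0, m[0] * g, 0.0, m[1] * g])   # gravity load at each COG (y)
tau = G.T @ b                                  # eq. (9) with q_dd = 0, f = 0
print(tau)
```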
BIOMIMETIC MOTION CONTROL FOR AMPHIBIOUS MULTI LINK MOBILE ROBOT
Takayuki Matsuo, Takeshi Yokoyama, Daishi Ueno and Kazuo Ishii
Department of Brain Science and Engineering, Kyushu Institute of Technology
Email: {matsuo-takayuki, [email protected] [email protected]

MLMR II consists of two kinds of cylinders: a cylinder for the head (head module) and seven cylinders for the joints (motor modules). A motor module has a motor to control the joint angle and a circuit board; the circuit has an MPU, a potentiometer to measure the joint angle, an RS485 transceiver for communication, and a current sensor to measure the joint torque. The block diagram of the circuit is shown in Fig. 2. The head module is the interface device between the robot and the host PC, and transfers the target behavior to the other modules.

1. INTRODUCTION
Robots and robotics technologies are expected to serve as new tools for inspection and manipulation, especially in extreme environments. Robots for such extreme environments should be robust and strong against disturbances. As seen in animals, they adapt well to environmental changes over both short and long time periods. Here, we investigate the motions of snakes and eels, since they can move through various fields, on land and underwater, by wriggling their bodies. A snake's body is covered by special scales which have low friction in the tangential direction and high friction in the normal direction [1]; this feature enables it to produce thrust from wriggling motion. An eel swims underwater by generating propulsive force using hydrodynamic forces [2]; a certain phase difference is observed along the body, and the amplitude of wriggling increases along the bodyline. In the nervous systems of animals, it has been proven that motion patterns, such as walking, respiration, and flapping, are controlled by rhythm generator mechanisms called Central Pattern Generators (CPGs).
A CPG consists of many neural oscillators, and rhythm patterns are generated by the neural oscillators affecting one another. In this research, we introduce the Matsuoka model [3] as the neuron model of the CPG. We realize phase differences and bias by adjusting parameters, and the periodic signals from the CPG are used as the target joint angles of the robot. We developed an amphibious multi-link mobile robot (MLMR II), and a robot motion control system using the CPG is investigated and applied to MLMR II.

Table 1 Specification of MLMR II
Length [m]: 2.2
Weight [kg]: 12.8
Number of joints: 7
MPU: PIC18F452 x8
Communication: RS485
Sensors: current sensor, potentiometer
Actuators: DC motor x7

Fig. 1 Overview of MLMR II

2. AMPHIBIOUS MULTI LINK MOBILE ROBOT
The specification and overview of the amphibious multi-link mobile robot (MLMR II) are shown in Table 1 and Fig. 1. MLMR II is able to move in both environments, on the ground and underwater; therefore, waterproofing is a very important point in the development of the robot. The robot is constructed from 8 cylinders containing joints that rotate around the yaw axis using DC motors, gearboxes, and control circuits. O-rings are fitted on the joint shafts and the cylinder lids for waterproofing. Underwater, hydrodynamic forces caused by the fins and body produce thrust, while on the ground passive wheels are used.

Fig. 2 Block diagram of the circuit

3. BIOMIMETIC MOTION CONTROL USING CPG
The robot has 7 joints; therefore, a set of neural oscillators is assigned to each joint. The CPG for MLMR II is shown in Fig. 3. The Matsuoka model is expressed in Eqs. (1)-(3), where $u_i$ is the membrane potential of the i-th neuron, $v_i$ is
the variable that represents the degree of adaptation, $u_0$ is the external input with a constant rate, $f_i$ is the feedback signal from a sensory input, $\beta$, $\tau_u$, and $\tau_v$ are parameters that specify the time constants of the adaptation, $w_{ij}$ is the connection weight between neurons, and n is the number of neurons:

$\tau_u \dot{u}_i = -u_i - \beta v_i + \sum_{j=1}^{n} w_{ij} y_j + u_0 + f_i$   (1)

$\tau_v \dot{v}_i = -v_i + y_i$   (2)

$y_i = \max(0, u_i)$   (3)

Table 2 shows the parameters for forward motion, and Fig. 4 shows the outputs of each neural oscillator. MLMR II can change heading and rotate by changing the parameter $u_0$ so as to shift the neutral position; if the parameters are set as in Table 3, the robot turns toward the right. Figure 5 shows the output of the CPG network in the right-turn motion.

We analyzed the motion of the robot using a motion capture system underwater and on the ground. Figure 6(a) shows the motion capture data of forward motion on the ground and underwater, and Figure 6(b) shows those of the right-turn motions. In forward motion, the robot travels farther on the ground than underwater: about 5.5 meters on the ground versus about 2.0 meters underwater. In the right-turn motion, the robot can complete a turn on the ground but not underwater; the turning radii are 1.6 meters on the ground and 1.2 meters underwater.

4. CONCLUSION
A multi-link mobile robot which has 7 joints and moves on the ground and underwater has been developed, together with a distributed control system implementing biomimetic motion control using a CPG. The system was applied to MLMR II, and motion control was carried out on the ground and underwater by adjusting the CPG parameters.
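Eqs. (1)-(3) can be simulated directly. The sketch below integrates a single extensor-flexor pair of Matsuoka neurons with forward Euler; the mutual-inhibition weight is taken as -wef from Table 2, while the step size, simulation length, and initial conditions are assumptions, so this is an illustration rather than the robot's controller code:

```python
import numpy as np

# Sketch: two mutually inhibiting Matsuoka neurons (extensor/flexor),
# Eqs. (1)-(3), integrated by forward Euler.  tau_u, tau_v, beta, u0
# loosely follow Table 2; negative off-diagonal weights implement the
# mutual inhibition (w_ef = w_fe = 1.5).  dt and initial conditions
# are assumptions.
tau_u, tau_v, beta, u0 = 0.44, 0.54, 1.0, 0.75
w = np.array([[0.0, -1.5],
              [-1.5, 0.0]])
dt, steps = 0.01, 6000
u = np.array([0.1, 0.0])
v = np.zeros(2)
ys = []
for _ in range(steps):
    y = np.maximum(0.0, u)                      # Eq. (3)
    du = (-u - beta * v + w @ y + u0) / tau_u   # Eq. (1), with f_i = 0
    dv = (-v + y) / tau_v                       # Eq. (2)
    u = u + dt * du
    v = v + dt * dv
    ys.append(y.copy())
ys = np.array(ys)
# inspect the late-time range of each neuron's output
print(ys[-1000:].max(axis=0), ys[-1000:].min(axis=0))
```

Whether the pair settles into sustained antiphase oscillation depends on the parameter regime; the original analysis is given in Matsuoka's 1985 paper cited below.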
Additionally, the motions of the robot were analyzed using a motion capture system.

Table 2 Parameters of the CPG network in forward motion: τu = 0.44, τv = 0.54, β = 1, wef = 1.5, wfe = 1.5, w1 = 0.3, w2 = 0, u0_e = 0.75, u0_f = 0.75
Table 3 Parameters of the CPG network in right-turn motion: τu = 0.44, τv = 0.54, β = 1, wef = 1.5, wfe = 1.5, w1 = 0.3, w2 = 0, u0_e = 1.02, u0_f = 0.95

Fig. 3 CPG network for biomimetic motion control
Fig. 4 Output of the CPG network in forward motion
Fig. 5 Output of the CPG network in right-turn motion
Fig. 6 Motion capture data of forward motion (a) and right-turn motion (b), on the ground and underwater, using the CPG

ACKNOWLEDGMENT
This work was supported by a 21st Century Center of Excellence Program, "World of Brain Computing Interwoven out of Animals and Robots (PI: T. Yamakawa)" (center #J19), granted to Kyushu Institute of Technology by MEXT of Japan.

REFERENCES
[1] S. Hirose, "Bionic Machine Engineering", Kogyo Chosakai, 1987 (in Japanese)
[2] A. Azuma, "The Subject-Book of Animal's Motion", Asakura, 1997
[3] K. Matsuoka, "Sustained oscillations generated by mutually inhibiting neurons with adaptation", Biological Cybernetics 52, pp.367-376, 1985.

A Central Pattern Generator Network Model with Independent Controllability
Tatsuji TOKIWA, Takeshi YAMAKAWA
Department of Brain Science and Engineering, Kyushu Institute of Technology
Email: [email protected], [email protected]

1. INTRODUCTION
We have proposed a central pattern generator (CPG) network model with independent controllability in terms of the amplitude and period of each output signal and the phase difference between CPGs [1]. Here the parameter ε is a small constant called the nonlinearity coefficient; the value of $x_i$ is the output signal of CPG$_i$ (i = 1, 2, ..., n), and A and $B_i$ determine the values of the amplitude and period of $x_i$ in the stable
state, respectively [1][3][4]. In particular, the period is inversely proportional to the parameter $B_i$, and the amplitude reaches 2A in the stable state. n denotes the required number of CPGs in the CPG network. In order to control the phase difference between CPGs, the phase of each CPG must be temporally shifted. The parameter $B_i$, given by the following equations, is utilized to control the phase:

$B_i = B + b_i$   (2)

$b_i = k\,(x_i - X_i)$   (3)

where B and k denote the natural frequency of the CPGs and a gain factor, respectively. The value of $b_i$ determines the amount of phase shift of CPG$_i$; after the phase difference $(x_i - X_i)$ becomes 0, the value of $b_i$ approaches 0. The value of k controls the time needed to reach the stable state. The block diagram of CPG$_i$ is shown in Figure 1. The period of the output signal $x_i$ is controlled through $b_i$ so that the phase difference between the control and target signals becomes 0.

In this paper, we introduce the CPG network with independent controllability as a quadruped locomotion signal generator. The proposed model is comprised of several CPGs and one Rhythm Generator (RG). Each CPG and the RG are described by the Van der Pol (VDP) equation. The VDP equation has a limit cycle and the feature of independent controllability in terms of the amplitude and period of the limit cycle. The amplitude and period of the output signal from each CPG and the RG are independently controlled by external signals, because the CPG and RG are designed so as to keep this feature of the VDP equation. In order to control the phase shift between CPGs, the period of the output signal from each CPG is temporarily controlled through connections that exist only between the RG and each CPG. The proposed CPG network is applied to a quadruped locomotion signal generator and can generate the typical quadruped locomotion signals of the walk, trot, bound and gallop modes.
2. CENTRAL PATTERN GENERATOR
Locomotion signals such as walking, swimming and flying are generated and controlled by a central nervous system called the central pattern generator (CPG) [2].

2.1. CPG MODEL
The ith CPG model (CPG$_i$) is written as follows:

$\ddot{x}_i - \varepsilon (A^2 - x_i^2)\,\dot{x}_i + B_i^2 x_i = 0$   (1)

Figure 1: CPG model

2.2. RHYTHM GENERATOR
The Rhythm Generator (RG) is designed as the target signal generator and is also based on the VDP equation, using the subscript R:

$\ddot{x}_R - \varepsilon (A^2 - x_R^2)\,\dot{x}_R + B^2 x_R = 0$   (4)

Figure 2: Rhythm Generator

The target signal $X_i$ is described by equation (5). The values of $c_{i1}$ and $c_{i2}$ take -1, 0, or 1 and are decided based on the gait transitions which we intend to control; either $c_{i1}$ or $c_{i2}$ always takes 0.

$X_i = c_{i1} x_R + c_{i2}\,\tau \frac{dx_R}{dt}$   (5)

where τ arranges the dimension of $x_R$ and $dx_R/dt$ and adjusts the amplitude of $dx_R/dt$, so that the amplitudes of $x_R$ and $dx_R/dt$ stay constant even if the parameters A and B are changed.

2.3. CPG NETWORK
The proposed CPG network is composed of one RG and n CPGs, according to a gait transition. The connections of the CPG network can be logically and uniquely decided based on the gait transition; the composition method is reported in [1].

3. SIMULATION RESULTS
In order to verify the effectiveness of the proposed method, the output signal of each CPG was investigated with the 4th-order Runge-Kutta method. The phases of each $x_i$ were controlled by the target signal $X_i$. In this simulation, since the value of $X_i$ did not always correspond to the value of $x_i$, values of $b_i$ of 0.3 or less were omitted; in Figure 3(b), a horizontal line indicates the value $b_i$ = 0.3.

Figure 3: Output signals of each CPG. (a) Transition of $x_i$. (b) Transition of $b_i$.

4. CONCLUSION
We introduced a Central Pattern Generator (CPG) network with independent controllability for a quadruped locomotion signal generator. The network generated and controlled the output signals for the walk mode. In this paper,
we aim to obtain the typical quadruped gaits, especially the walk mode. The parameters A and B are provided by external signals.

Figure 3 shows the simulation results for the walk mode. In this simulation, ε, k, and the initial values of xR, x1, x2, x3, x4, dxR/dt, dx1/dt, dx2/dt, dx3/dt and dx4/dt were 0.2, 1, 0.1, 0.1, -0.5, 0.3, 0.7, 0.1, 0.1, 0.3, 0.3 and 0.2, respectively; A and B were 0.5 and 1. The output values of the CPGs and $b_i$ were measured for 1200 steps. In Figure 3(a), it is confirmed that the amplitude of the limit cycle becomes almost 2A and that the phase differences between CPGs are controlled to the desired phase differences. In Figure 3(b), it is found that $b_i$ becomes 0 after the phase difference becomes 0.

REFERENCE
[1] Tatsuji Tokiwa and Takeshi Yamakawa, "Central Pattern Generator network model with independent controllability," RISP International Workshop on Nonlinear Circuits and Signal Processing (NCSP'08), pp.156-159, 2008.
[2] F. Delcomyn, "Neural basis of rhythmic behavior in animals," Science, Vol.210, pp.492-498, 1980.
[3] B. van der Pol, "On relaxation-oscillations," Phil. Mag., No.2, pp.987-993, 1926.
[4] Louis A. Pipes and Lawrence R. Harvill, Mathematics for Engineers and Physicists, McGraw-Hill, Third Edition, 1970.
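The amplitude property quoted above (the output amplitude approaches 2A in the stable state) can be checked by integrating the VDP equation (1) with the classical 4th-order Runge-Kutta method; the step size, simulation length, and initial conditions below are assumptions, while ε, A, and B follow the paper's values:

```python
import numpy as np

# Sketch: integrate one VDP-type CPG unit of eq. (1),
#   x'' - eps*(A^2 - x^2)*x' + B^2*x = 0,
# with classical 4th-order Runge-Kutta and check that the
# steady-state amplitude approaches 2A.
eps, A, B = 0.2, 0.5, 1.0

def f(state):
    x, xd = state
    return np.array([xd, eps * (A**2 - x**2) * xd - B**2 * x])

def rk4_step(state, dt):
    k1 = f(state)
    k2 = f(state + 0.5 * dt * k1)
    k3 = f(state + 0.5 * dt * k2)
    k4 = f(state + dt * k3)
    return state + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

state = np.array([0.1, 0.0])   # small initial displacement
xs = []
for _ in range(20000):
    state = rk4_step(state, 0.01)
    xs.append(state[0])
amplitude = max(xs[-5000:])    # steady-state peak
print(round(amplitude, 2))
```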
