Cross-media retrieval is an interesting research topic that seeks to remove the barriers among different modalities. To enable cross-media retrieval, it is necessary to find correlation measures between heterogeneous low-level features and to judge semantic similarity. This paper presents a novel approach to learning the cross-media correlation between visual features and auditory features for image-audio retrieval. A semi-supervised correlation preserving mapping (SSCPM) method is described to construct the isomorphic SSCPM subspace, in which canonical correlations between the original visual and auditory features are further preserved. A subspace optimization algorithm is proposed to improve the quality of local image clusters and audio clusters in an interactive way. A unique relevance feedback strategy is developed to update the knowledge of cross-media correlation by learning from user behaviors, so that retrieval performance is enhanced progressively. Experimental results show that our approach is effective.
Recently, a new clustering algorithm called 'affinity propagation' (AP) has been proposed, which efficiently clusters sparsely related data by passing messages between data points. However, we want to cluster large-scale data whose similarities are, in many cases, not sparse. This paper presents two variants of AP for grouping large-scale data with a dense similarity matrix. The local approach is partition affinity propagation (PAP) and the global method is landmark affinity propagation (LAP). PAP first passes messages within subsets of the data and then merges them as the initial step of the iterations; this effectively reduces the number of clustering iterations. LAP first passes messages between the landmark data points and then clusters the non-landmark data points; it is a global approximation method for speeding up clustering. Experiments are conducted on many datasets, such as random data points, manifold subspaces, images of faces and Chinese calligraphy, and the results demonstrate that the two approaches are feasible and practicable.
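The second stage of LAP, in which non-landmark points are clustered after message passing among the landmarks, can be illustrated with a minimal sketch. It assumes that the exemplars have already been selected among the landmarks (the message-passing stage is not shown), that similarity is the negative squared Euclidean distance, and that each remaining point simply joins its most similar exemplar. The function names and the assignment rule are illustrative assumptions, not the procedure specified in the paper.

```python
from typing import Dict, List, Sequence

Point = Sequence[float]


def similarity(a: Point, b: Point) -> float:
    """Negative squared Euclidean distance (assumed similarity measure)."""
    return -sum((x - y) ** 2 for x, y in zip(a, b))


def assign_to_exemplars(points: List[Point], exemplar_ids: List[int]) -> Dict[int, int]:
    """Map every point index to the index of its most similar exemplar.

    exemplar_ids is assumed to come from an earlier clustering of the
    landmark points only; that stage is not implemented here.
    """
    labels: Dict[int, int] = {}
    for i, p in enumerate(points):
        if i in exemplar_ids:
            labels[i] = i  # an exemplar represents itself
        else:
            labels[i] = max(exemplar_ids, key=lambda k: similarity(p, points[k]))
    return labels


# Hypothetical data: two well-separated groups, with points 0 and 2 as exemplars.
points = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.8), (0.1, 0.3)]
labels = assign_to_exemplars(points, exemplar_ids=[0, 2])
print(labels)
```

Because only the landmark-to-landmark similarities enter the expensive stage, the cost of this assignment step grows with the number of points times the number of exemplars rather than with the full dense similarity matrix.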
Automatic web image annotation is a practical and effective approach to both web image retrieval and image understanding. However, current annotation techniques do not investigate the statement-level syntactic correlation among the annotated words, which makes it very difficult to render a natural language interpretation for images, such as "pandas eat bamboo". In this paper, we propose an approach to interpreting image semantics by mining the visible and textual information hidden in images. This approach consists of two main parts: first, the annotated words of target images are ranked according to two factors, namely the visual correlation and the pairwise co-occurrence; then, the statement-level syntactic correlation among the annotated words is explored and a natural language interpretation of the target image is obtained. Experiments conducted on real-world web images show the effectiveness of the proposed approach.
P2P systems are categorized into tree-based and mesh-based systems according to their topologies. Mesh-based systems are considered more suitable for large-scale Internet applications, but their latency requires optimization. This paper proposes a content subscribing mechanism (CSM) to eliminate unnecessary time delays during data relaying. A node can send content data to its neighbors as soon as it receives the data segment. No additional time is taken by the interactive stages that precede data segment transmission of streaming content. CSM consists of three steps. First, every node records its historical segment latencies and uses a gamma distribution, which is highly expressive, to model the latency statistics. Second, before selecting a neighbor node from which to subscribe to a data segment, a node predicts the subscribing success ratio of every neighbor by comparing the gamma distribution parameters of the node and its neighbors. These two steps do not increase latency, as they are executed before the data segments are ready at the neighbor nodes. Finally, the node that was subscribed to sends the subscribed data segment to the subscriber as soon as it has the data segment. Experiments show that CSM significantly reduces the content data transmission latency.
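The first two CSM steps can be sketched as follows, under stated assumptions. The gamma parameters are fitted here by the method of moments, and the "subscribing success ratio" is approximated as the probability that a neighbor's latency is lower than the node's own, estimated by Monte Carlo sampling. Both choices, as well as the function names and the sample latencies, are hypothetical; the abstract does not specify how the parameters are estimated or compared.

```python
import random
import statistics
from typing import List, Tuple


def fit_gamma(latencies: List[float]) -> Tuple[float, float]:
    """Method-of-moments gamma fit; returns (shape, scale)."""
    mean = statistics.mean(latencies)
    var = statistics.pvariance(latencies)
    if mean <= 0 or var <= 0:
        raise ValueError("need positive, non-constant latency samples")
    return mean * mean / var, var / mean


def success_ratio(neighbor: Tuple[float, float], own: Tuple[float, float],
                  trials: int = 2000, seed: int = 0) -> float:
    """Estimate P(neighbor latency < own latency) by sampling both gammas.

    This is an assumed stand-in for the paper's success-ratio prediction.
    """
    rng = random.Random(seed)
    wins = sum(rng.gammavariate(*neighbor) < rng.gammavariate(*own)
               for _ in range(trials))
    return wins / trials


# Hypothetical historical segment latencies in milliseconds.
own = fit_gamma([25, 27, 30, 26, 28])
fast = fit_gamma([10, 12, 11, 13, 14])
slow = fit_gamma([40, 45, 50, 42, 48])

fast_ratio = success_ratio(fast, own)
slow_ratio = success_ratio(slow, own)
print(fast_ratio, slow_ratio)
```

In this sketch a node would subscribe to the neighbor with the higher estimated ratio. Only the two fitted parameters per node need to be exchanged, which is consistent with the abstract's point that the prediction runs before the segment is available.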
As historical Chinese calligraphy works are being digitized, retrieval becomes a new challenge. However, no current OCR technique can convert calligraphy character images into text, nor do existing handwritten character recognition approaches work for them. This paper proposes a novel approach to efficiently retrieving Chinese calligraphy characters on the basis of similarity: each calligraphy character image is represented by a collection of discriminative features, and high retrieval speed with reasonable effectiveness is achieved. First, calligraphy characters that cannot be similar to the query are filtered out step by step by comparing the character complexity, stroke density and stroke protrusion. Then, similar calligraphy characters are retrieved and ranked according to their matching cost produced by approximate shape matching. To speed up retrieval, we employ a high-dimensional data structure, the PK-tree. Finally, the efficiency of the algorithm is demonstrated by a preliminary experiment with 3012 calligraphy character images.
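The filter-then-rank pipeline described above can be sketched as follows. The sketch assumes that each of the three filtering features is a single number, that a candidate is discarded when a feature differs from the query by more than a fixed tolerance, and that the matching cost is an L1 distance between shape descriptors. These representations, the tolerance, and all names are hypothetical placeholders; the paper's actual features, its approximate shape matching, and the PK-tree index are not reproduced.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Glyph:
    name: str
    complexity: float         # hypothetical scalar feature
    stroke_density: float     # hypothetical scalar feature
    stroke_protrusion: float  # hypothetical scalar feature
    shape: Sequence[float]    # hypothetical shape descriptor


# Filters are applied one after another, cheapest information first.
FILTER_FEATURES = ("complexity", "stroke_density", "stroke_protrusion")


def shape_cost(a: Glyph, b: Glyph) -> float:
    """Stand-in for the approximate shape matching cost (L1 distance)."""
    return sum(abs(x - y) for x, y in zip(a.shape, b.shape))


def retrieve(query: Glyph, database: List[Glyph], tol: float = 0.2) -> List[Glyph]:
    """Discard dissimilar candidates step by step, then rank the survivors."""
    candidates = list(database)
    for feature in FILTER_FEATURES:
        target = getattr(query, feature)
        candidates = [g for g in candidates
                      if abs(getattr(g, feature) - target) <= tol]
    return sorted(candidates, key=lambda g: shape_cost(query, g))


query = Glyph("q", 0.5, 0.4, 0.3, (1.0, 2.0, 3.0))
database = [
    Glyph("a", 0.55, 0.45, 0.35, (1.0, 2.0, 3.5)),
    Glyph("b", 0.45, 0.35, 0.25, (1.1, 2.0, 3.0)),
    Glyph("c", 0.90, 0.40, 0.30, (1.0, 2.0, 3.0)),  # fails the complexity filter
    Glyph("d", 0.50, 0.40, 0.80, (1.0, 2.0, 3.0)),  # fails the protrusion filter
]
results = retrieve(query, database)
print([g.name for g in results])
```

The design point is that the costly shape comparison is computed only for candidates that survive the inexpensive scalar filters.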
With the development of motion capture techniques, more and more 3D motion databases are becoming available. In this paper, a novel approach is presented for motion recognition and retrieval based on ensemble HMM (hidden Markov model) learning. Because motion features are high-dimensional, Isomap nonlinear dimensionality reduction is applied to the training data for ensemble HMM learning. To handle new motion data, Isomap is generalized based on the estimation of the underlying eigenfunctions. Each action class is then learned with one HMM. Since ensemble learning can effectively enhance supervised learning, ensembles of weak HMM learners are built. Experimental results show that the approaches are effective for motion data recognition and retrieval.
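Recognition with one HMM per action class can be illustrated by a minimal sketch, assuming discrete observation symbols and classification by the highest forward-algorithm likelihood. The model parameters and class names below are invented for illustration, and the sketch covers neither the Isomap step nor the ensemble of weak learners. For long sequences a scaled or log-space forward pass would be needed to avoid underflow.

```python
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]
HMM = Tuple[List[float], Matrix, Matrix]  # (start, transition, emission)


def forward_likelihood(obs: Sequence[int], hmm: HMM) -> float:
    """P(obs | hmm) via the forward algorithm; obs must be non-empty."""
    start, trans, emit = hmm
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][o]
                 for s in range(n)]
    return sum(alpha)


def classify(obs: Sequence[int], models: Dict[str, HMM]) -> str:
    """Return the class whose HMM assigns the observation the highest likelihood."""
    return max(models, key=lambda name: forward_likelihood(obs, models[name]))


# Hypothetical two-state models over two observation symbols (0 and 1).
start = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
models = {
    "walk": (start, trans, [[0.9, 0.1], [0.8, 0.2]]),  # mostly emits symbol 0
    "jump": (start, trans, [[0.1, 0.9], [0.2, 0.8]]),  # mostly emits symbol 1
}

label = classify([0, 0, 1, 0], models)
print(label)
```

For retrieval rather than recognition, the same likelihoods could be used to rank stored motions against a query class, though that use is an assumption here.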
In this paper, we propose a highly automatic approach to 3D photorealistic face reconstruction from a single frontal image. The key element of our work is an adaptive manifold learning approach. Beforehand, an active appearance model (AAM) is trained for automatic feature extraction and an adaptive locally linear embedding (ALLE) algorithm is used to reduce the dimensionality of the 3D database. Then, given an input frontal face image, the corresponding weights between the 3D samples and the image are synthesized adaptively according to the AAM-selected facial features. Finally, geometry reconstruction is achieved by a linear weighted combination of the adaptively selected samples. A radial basis function (RBF) is adopted to map facial texture from the frontal image to the reconstructed face geometry. The texture of invisible regions between the face and the ears is interpolated by sampling from the frontal image. This approach has several advantages: (1) only a single frontal face image is needed for highly automatic face reconstruction; (2) compared with previous work, our reconstruction approach provides higher accuracy; (3) constraint-based RBF texture mapping provides a natural appearance for the reconstructed face.