Celso M. de Melo

1 Evolution of indirect reciprocity under emotion expression
Fonseca, H., de Melo, C., Terada, K., Gratch, J., Paiva, A., & Santos, F., Scientific Reports, 2025

Do emotion expressions impact the evolution of cooperation? Indirect reciprocity (IR) offers a solution to the cooperation dilemma, with prior work focusing on the role of social norms in propagating others’ reputations and contributing to evolutionarily stable cooperation. Recent experimental studies, however, show that emotion expressions shape pro-social behaviour, communicate one’s intentions to others, and serve an error-correcting function; yet, the role of emotion signals in the evolution of cooperation remains unexplored. We present the first model of IR based on evolutionary game theory that exposes how emotion expressions positively influence the evolution of cooperation, particularly in scenarios of frequent errors. Our findings provide evolutionary support for the existence of emotion-based social norms, which help foster cooperation among unrelated individuals.

2 A Compound 3D-Informed Design toward Spatially-Intelligent Large Multimodal Models
Ma, W., de Melo, C., Yuille, A., Chen, J., Proceedings of CVPR, 2025

Humans naturally understand 3D spatial relationships, enabling complex reasoning like predicting collisions of vehicles from different directions. Current large multimodal models (LMMs), however, lack this capability for 3D spatial reasoning. This limitation stems from the scarcity of 3D training data and the bias in current model designs toward 2D data. In this paper, we systematically study the impact of 3D-informed data, architecture, and training setups, introducing 3DI-LMM, an LMM with advanced 3D spatial reasoning abilities. To address data limitations, we develop two types of 3D-informed training datasets: (1) 3D-informed probing data focused on objects’ 3D location and orientation, and (2) 3D-informed conversation data for complex spatial relationships. Notably, we are the first to curate VQA data that incorporate 3D orientation relationships. Furthermore, we systematically integrate these two types of training data with the architectural and training designs of LMMs, providing a roadmap for optimal design aimed at achieving superior 3D reasoning capabilities. Our 3DI-LMM advances machines toward highly capable 3D-informed reasoning, surpassing GPT-4o performance by 8.7%. Our systematic empirical design and the resulting findings offer valuable insights for future research in this direction.

3 PulseCheck457: A Diagnostic Benchmark for Comprehensive Spatial Reasoning of Large Multimodal Models
Wang, X., Ma, W., Zhang, T., de Melo, C., Chen, J., Yuille, A., Proceedings of CVPR, 2025

Although large multimodal models (LMMs) have demonstrated remarkable capabilities in visual scene interpretation and reasoning, their capacity for complex and precise 3-dimensional spatial reasoning remains uncertain. Existing benchmarks focus predominantly on 2D spatial understanding and lack a framework to comprehensively evaluate 6D spatial reasoning across varying complexities. To address this limitation, we present PulseCheck457, a scalable and unbiased synthetic dataset designed around 4 key capabilities for spatial reasoning: multi-object recognition, 2D location, 3D location, and 3D orientation. We develop a cascading evaluation structure, constructing 7 question types across 5 difficulty levels that range from basic single-object recognition to our newly proposed complex 6D spatial reasoning tasks. We evaluated various LMMs on PulseCheck457, observing a general decline in performance as task complexity increases, particularly in 3D reasoning and 6D spatial tasks. To quantify these challenges, we introduce the Relative Performance Dropping Rate (RPDR), highlighting key weaknesses in 3D reasoning capabilities. Leveraging the unbiased attribute design of our dataset, we also uncover prediction biases across different attributes, with similar patterns observed in real-world image settings.

4 Video-ColBERT: Contextualized Late Interaction for Text-to-Video Retrieval
Reddy, A., et al., Proceedings of CVPR, 2025

In this work, we tackle the problem of text-to-video retrieval (T2VR). Inspired by the success of late interaction techniques in text-document, text-image, and text-video retrieval, our approach, Video-ColBERT, introduces a simple and efficient mechanism for fine-grained similarity assessment between queries and videos. Video-ColBERT is built upon three main components: a fine-grained spatial and temporal token-wise interaction, query and visual expansions, and a dual sigmoid loss during training. We find that this interaction and training paradigm leads to strong individual, yet compatible, representations for encoding video content. These representations lead to increases in performance on common text-to-video retrieval benchmarks compared to other bi-encoder methods.
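The late-interaction idea mentioned in this abstract can be illustrated with ColBERT-style MaxSim scoring, in which each query token is matched to its most similar video token and the per-token maxima are summed. The following is a minimal sketch under stated assumptions, not the Video-ColBERT implementation: it omits the separate spatial and temporal interactions, the query and visual expansions, and the dual sigmoid loss, and the function names (`l2_normalize`, `dot`, `max_sim_score`) and toy 2-D embeddings are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def max_sim_score(query_tokens, video_tokens):
    """Late-interaction score: for each query token, take the maximum cosine
    similarity to any video token, then sum those maxima over the query."""
    q = [l2_normalize(t) for t in query_tokens]
    v = [l2_normalize(t) for t in video_tokens]
    return sum(max(dot(qt, vt) for vt in v) for qt in q)


# Toy example: two query token embeddings and two candidate videos.
query = [[1.0, 0.0], [0.0, 1.0]]
video_a = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]  # has a close match for each query token
video_b = [[1.0, 1.0]]                          # one token, partially matching both

score_a = max_sim_score(query, video_a)
score_b = max_sim_score(query, video_b)
print(score_a, score_b)
```

Because the query and video embeddings are computed independently and only interact through this cheap token-level comparison, video embeddings can be precomputed and indexed, which is the usual motivation for bi-encoder retrieval.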

5 ConceptAgent: LLM-Driven Precondition Grounding and Tree Search for Robust Task Planning and Execution
Rivera, C., et al., Proceedings of ICRA, 2025

Robotic planning and execution in open-world environments is a complex problem due to the vast state spaces and high variability of task embodiment. Recent advances in perception algorithms, combined with Large Language Models (LLMs) for planning, offer promising solutions to these challenges, as the common sense reasoning capabilities of LLMs provide a strong heuristic for efficiently searching the action space. However, prior work fails to address the possibility of hallucinations from LLMs, which results in failures to execute the planned actions, largely due to logical fallacies at high or low levels. To contend with automation failure due to such hallucinations, we introduce ConceptAgent, a natural language-driven robotic platform designed for task execution in unstructured environments. With a focus on scalability and reliability of LLM-based planning in complex state and action spaces, we present innovations designed to limit these shortcomings, including 1) Predicate Grounding to prevent and recover from infeasible actions, and 2) an embodied version of LLM-guided Monte Carlo Tree Search with self-reflection. In simulation experiments, ConceptAgent achieved a 19% task completion rate across three room layouts and 30 easy-level embodied tasks, outperforming other state-of-the-art LLM-driven reasoning baselines that scored 10.26% and 8.11% on the same benchmark. Additionally, ablation studies on moderate to hard embodied tasks revealed a 20% increase in task completion from the baseline agent to the fully enhanced ConceptAgent, highlighting the individual and combined contributions of Predicate Grounding and LLM-guided Tree Search to enable more robust automation in complex state and action spaces.

6 A Mamba-based Siamese Network for Remote Sensing Change Detection
Paranjape, J., de Melo, C., Patel, V., Proceedings of WACV, 2025

Change detection in remote sensing images is an essential tool for analyzing a region at different times. It finds varied applications in monitoring environmental and man-made changes, as well as in the corresponding decision-making and prediction of future trends. Deep learning methods like Convolutional Neural Networks (CNNs) and Transformers have achieved remarkable success in detecting significant changes, given two images at different times. In this paper, we propose a Mamba-based Change Detector (M-CD) that segments the regions of interest more accurately. Mamba-based architectures demonstrate linear-time training capabilities and an improved receptive field over transformers. Our experiments on four widely used change detection datasets demonstrate significant improvements over existing state-of-the-art (SOTA) methods.

7 ConceptGraphs: Open-vocabulary 3D scene graphs for perception and planning
Kuwajerwala, A., Gu, Q., Morin, S., Jatavallabhula, K., Sen, B., Agarwal, A., Rivera, C., Paul, W., Ellis, K., Chellappa, R., Gan, C., de Melo, C., Tenenbaum, J., Torralba, A., Shkurti, F., Paull, L., Proceedings of ICRA, 2024

For robots to perform a wide variety of tasks, they require a 3D representation of the world that is semantically rich, yet compact and efficient for task-driven perception and planning. Recent approaches have attempted to leverage features from large vision-language models to encode semantics in 3D representations. However, these approaches tend to produce maps with per-point feature vectors, which do not scale well in larger environments, nor do they contain semantic spatial relationships between entities in the environment, which are useful for downstream planning. In this work, we propose ConceptGraphs, an open-vocabulary graph-structured representation for 3D scenes. ConceptGraphs is built by leveraging 2D foundation models and fusing their output to 3D by multiview association. The resulting representations generalize to novel semantic classes, without the need to collect large 3D datasets or finetune models. We demonstrate the utility of this representation through a number of downstream planning tasks that are specified through abstract (language) prompts and require complex reasoning over spatial and semantic concepts.

8 ViLCo-Bench: Video language continual learning benchmark
Tang, T., Deldari, S., Xue, H., de Melo, C., Salim, F., Proceedings of NeurIPS, 2024

Video language continual learning involves continuously adapting to information from video and text inputs, enhancing a model’s ability to handle new tasks while retaining prior knowledge. This field is a relatively under-explored area, and establishing appropriate datasets is crucial for facilitating communication and research in this field. In this study, we present the first dedicated benchmark, ViLCo-Bench, designed to evaluate continual learning models across a range of video-text tasks. The dataset comprises ten-minute-long videos and corresponding language queries collected from publicly available datasets. Additionally, we introduce a novel memory-efficient framework that incorporates self-supervised learning and mimics long-term and short-term memory effects. This framework addresses challenges including memory complexity from long video clips, natural language complexity from open queries, and text-video misalignment. We posit that ViLCo-Bench, with greater complexity compared to existing continual learning benchmarks, would serve as a critical tool for exploring the video-language domain, extending beyond conventional class-incremental tasks, and addressing complex and limited annotation issues. The curated data, evaluations, and our novel method are available at https://github.com/cruiseresearchgroup/ViLCo.

9 Exploring the impact of rendering method and motion quality on model performance when using multi-view synthetic data for action recognition
Panev, S., Kim, E., Namburu, S., Nikolova, D., de Melo, C., De la Torre, F., Hodgins, J., Proceedings of WACV, 2024

This paper explores the use of synthetic data in a human action recognition (HAR) task to avoid the challenges of obtaining and labeling real-world datasets. We introduce a new dataset suite comprising five datasets, eleven common human activities, three synchronized camera views (aerial and ground) in three outdoor environments, and three visual domains (real and two synthetic). For the synthetic data, two rendering methods (standard computer graphics and neural rendering) and two sources of human motions (motion capture and video-based motion reconstruction) were employed. We evaluated each dataset type by training popular activity recognition models and comparing the performance on the real test data. Our results show that synthetic data achieve slightly lower accuracy (4-8%) than real data. On the other hand, a model pre-trained on synthetic data and fine-tuned on limited real data surpasses the performance of either domain alone. Standard computer graphics (CG)-rendered data delivers better performance than the data generated from the neural-based rendering method. The results suggest that the quality of the human motions in the training data also affects the test results: motion capture delivers higher test accuracy. Additionally, a model trained on CG aerial view synthetic data exhibits greater robustness against camera viewpoint changes than one trained on real data.

10 Unsupervised video domain adaptation with masked pre-training and collaborative self-training
Reddy, A., Paul, W., Rivera, C., Shah, K., de Melo, C., Chellappa, R., Proceedings of CVPR, 2024

In this work, we tackle the problem of unsupervised domain adaptation (UDA) for video action recognition. Our approach, which we call UNITE, uses an image teacher model to adapt a video student model to the target domain. UNITE first employs self-supervised pre-training to promote discriminative feature learning on target domain videos using a teacher-guided masked distillation objective. We then perform self-training on masked target data, using the video student model and image teacher model together to generate improved pseudolabels for unlabeled target videos. Our self-training process successfully leverages the strengths of both models to achieve strong transfer performance across domains. We evaluate our approach on multiple video domain adaptation benchmarks and observe significant improvements upon previously reported results.

11 Entropic open-set active learning
Safaei, B., Vibashan, V.S., de Melo, C., Patel, V., Proceedings of AAAI, 2024

Active Learning (AL) aims to enhance the performance of deep models by selecting the most informative samples for annotation from a pool of unlabeled data. Despite impressive performance in closed-set settings, most AL methods fail in real-world scenarios where the unlabeled data contains unknown categories. Recently, a few studies have attempted to tackle the AL problem for the open-set setting. However, these methods focus more on selecting known samples and do not efficiently utilize unknown samples obtained during AL rounds. In this work, we propose an Entropic Open-set AL (EOAL) framework which leverages both known and unknown distributions effectively to select informative samples during AL rounds. Specifically, our approach employs two different entropy scores. One measures the uncertainty of a sample with respect to the known-class distributions. The other measures the uncertainty of the sample with respect to the unknown-class distributions. By utilizing these two entropy scores we effectively separate the known and unknown samples from the unlabeled data resulting in better sampling. Through extensive experiments, we show that the proposed method outperforms existing state-of-the-art methods on CIFAR-10, CIFAR-100, and TinyImageNet datasets.
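The two entropy scores described in this abstract can be illustrated with plain Shannon entropy. The following is a minimal sketch under stated assumptions, not the EOAL implementation: the abstract does not say how the unknown-class distributions are modeled, so the second score here uses a hypothetical stand-in (soft assignment of a feature vector to centroids of previously found unknown samples), and all function names are illustrative.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def known_class_entropy(logits):
    """Uncertainty of a sample with respect to the known classes,
    computed from the classifier's logits over those classes."""
    return entropy(softmax(logits))


def unknown_cluster_entropy(feature, centroids):
    """ASSUMPTION (not from the abstract): model the unknown-class
    distributions as cluster centroids and soft-assign the feature to them
    using negative squared Euclidean distance as the score."""
    scores = [
        -sum((f - c) ** 2 for f, c in zip(feature, centroid))
        for centroid in centroids
    ]
    return entropy(softmax(scores))


# A uniform prediction over 4 known classes is maximally uncertain: log(4).
uniform_entropy = known_class_entropy([0.0, 0.0, 0.0, 0.0])
# A confident prediction has lower entropy than a hesitant one.
confident_entropy = known_class_entropy([10.0, 0.0, 0.0])
hesitant_entropy = known_class_entropy([1.0, 0.0, 0.0])
# A feature equidistant from two unknown centroids: entropy log(2).
ambiguous_unknown = unknown_cluster_entropy([0.0, 0.0], [[1.0, 0.0], [-1.0, 0.0]])
# A feature sitting on one centroid and far from the other: near-zero entropy.
clear_unknown = unknown_cluster_entropy([1.0, 0.0], [[1.0, 0.0], [-5.0, 0.0]])
print(uniform_entropy, confident_entropy, ambiguous_unknown, clear_unknown)
```

How the two scores are combined into a single selection criterion is left out on purpose, since the abstract does not specify it.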

12 Synthetic-to-Real Adaptation for Complex Action Recognition in Surveillance Applications
Lu, S., Jin, Z., Rajendran, V., Harari, M., Feng, A., and de Melo, C., Proceedings of Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications, 2024

In this paper, we propose to enhance action recognition accuracy by leveraging synthetic data and domain adaptation. Specifically, we achieve this through the creation of a synthetic dataset mimicking the Multi-View Extended Video with Activities (MEVA) dataset and the introduction of a multi-modal model for domain adaptation. This synthetic-to-real adaptation approach improves recognition accuracy by leveraging the synthetic data to enhance model generalization. Firstly, we focus on creating and utilizing synthetic datasets generated through a high-fidelity physically-based rendering system. The sensor simulation incorporates domain randomization and photo-realistic rendering to reduce the domain gap between the synthetic and real data, effectively addressing the persistent challenges of real data scarcity in action recognition. Complementing the synthetic dataset generation, we leverage multi-modal models that utilize RGB images and skeleton features in the synthetic-to-real adaptation experiments. Our experiments show that even relatively straightforward techniques, such as synthetic data pre-training, provide improvements to the models. Our work highlights the effectiveness of the approach and its practical applications across various domains, including surveillance systems, threat identification, and disaster response.

13 Enhancing human action recognition with GAN-based data augmentation
Pulakurthi, P., de Melo, C., Rao, R., and Rabbani, M., Proceedings of Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications, 2024

Deep Neural Networks (DNNs) have emerged as powerful tools for human action recognition, yet their reliance on vast amounts of high-quality labeled data poses significant challenges. The traditional approach of collecting and labeling large volumes of real-world data is not only costly but also raises ethical concerns. A promising alternative is to generate synthetic data. However, the existing synthetic data generation pipelines require complex simulation environments. Our work presents a novel solution by employing Generative Adversarial Networks (GANs) to generate synthetic yet realistic training data from a small existing real-world dataset, thereby bypassing the need for elaborate simulation environments. Central to our approach is a training pipeline that extracts motion from each training video and augments it across varied subject appearances within the training set. This method increases the diversity in both motion and subject representation, thus significantly enhancing the model’s performance in accurately recognizing human gestures. The model’s performance is rigorously evaluated in diverse scenarios, including ground and aerial views, to demonstrate the method’s versatility and effectiveness. The findings of our study highlight the efficiency of GAN-based data augmentation, utilizing a minimal real dataset to create synthetic data without relying on complex simulators. Moreover, useful insights are provided by analyzing the critical factors influencing gesture recognition performance, such as the diversity in gesture motion and the diversity in subject appearance.

14 An evaluation of large pre-trained models for gesture recognition using synthetic videos
Reddy, A., Shah, K., Rivera, C., Paul, W., de Melo, C., and Chellappa, R., Proceedings of Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications, 2024

In this work, we explore the possibility of using synthetically generated data for video-based gesture recognition with large pre-trained models. We consider whether these models have sufficiently robust and expressive representation spaces to enable “training-free” classification. Specifically, we utilize various state-of-the-art video encoders to extract features for use in k-nearest neighbors classification, where the training data points are derived from synthetic videos only. We compare these results with another training-free approach: zero-shot classification using text descriptions of each gesture. In our experiments with the RoCoG-v2 dataset, we find that using synthetic training videos yields significantly lower classification accuracy on real test videos compared to using a relatively small number of real training videos. We also observe that video backbones that were fine-tuned on classification tasks serve as superior feature extractors, and that the choice of fine-tuning data has a substantial impact on k-nearest neighbors performance. Lastly, we find that zero-shot text-based classification performs poorly on the gesture recognition task, as gestures are not easily described through natural language.

15 Real-time human action recognition from aerial videos using autozoom and synthetic data
Xian, R., Vogel, B., de Melo, C., Harrison, A., Manocha, D., Proceedings of Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications, 2024

In this paper, we propose a novel approach for real-time human action recognition (HAR) on resource-constrained UAVs. Our approach tackles the limited availability of labeled UAV video data (compared to ground-based datasets) by incorporating synthetic data augmentation to improve the performance of a lightweight action recognition model. This combined strategy offers a robust and efficient solution for UAV-based HAR. We evaluate our method on the RoCoG-v2 and UAV-Human datasets, showing a notable increase in top-1 accuracy across all scenarios on RoCoG-v2: 9.1% improvement when training with synthetic data only, 6.9% with real data only, and the highest improvement of 11.8% with a combined approach. Additionally, using an X3D backbone further improves accuracy on the UAV-Human dataset by 5.5%. Our models deployed on a Qualcomm Robotics RB5 platform achieve real-time predictions at approximately 10 frames per second (fps) and demonstrate a superior trade-off between performance and inference rate on both low-power edge devices and high-end desktops.

16 Incorporating physics into data-driven computer vision
Kadambi, A., de Melo, C., Hsieh, C.-J., Srivastava, M., & Soatto, S., Nature Machine Intelligence, 2023

Many computer vision techniques infer properties of our physical world from images. While images are formed through the physics of light and mechanics, computer vision techniques are typically data-driven. This trend is mostly driven by performance: classical techniques from physics-based vision often do not score as high in metrics, compared to modern deep learning. However, recent research, covered in this Perspective, has shown that physical models can be included as a constraint in data-driven pipelines. In doing so, one can combine the performance benefits of a data-driven method with advantages offered by a physics-based method, such as interpretability, falsifiability, and generalizability. The aim of this Perspective is to provide an overview of specific approaches for integrating physical models into artificial intelligence (AI) pipelines, referred to as physics-based machine learning. We discuss technical approaches that range from modifications to the dataset, network design, loss functions, optimization, and regularization schemes.
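One of the approaches listed in this abstract, modifying the loss function, can be illustrated by adding a physics residual term to an ordinary data-fitting loss. The following is a minimal sketch under stated assumptions, not a method taken from the article: the free-fall scenario, the finite-difference residual, the weight of 0.1, and all function names are hypothetical choices made for illustration.

```python
def data_loss(pred, target):
    """Mean squared error between predicted and observed values."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def physics_residual_loss(heights, dt, g=9.81):
    """Penalize violations of free fall (acceleration = -g).
    The second finite difference of the predicted heights approximates the
    acceleration; the residual is its deviation from -g. Needs >= 3 samples."""
    residuals = [
        (heights[i + 1] - 2 * heights[i] + heights[i - 1]) / dt ** 2 + g
        for i in range(1, len(heights) - 1)
    ]
    return sum(r * r for r in residuals) / len(residuals)


def total_loss(pred, target, dt, weight=0.1):
    """Data term plus a weighted physics term (the weight is an assumption)."""
    return data_loss(pred, target) + weight * physics_residual_loss(pred, dt)


dt = 0.1
times = [i * dt for i in range(6)]
# A physically consistent trajectory: h(t) = 100 - 0.5 * g * t^2.
free_fall = [100.0 - 0.5 * 9.81 * t * t for t in times]
# A physically inconsistent prediction: constant-velocity descent.
straight_line = [100.0 - 5.0 * t for t in times]

consistent_penalty = physics_residual_loss(free_fall, dt)
inconsistent_penalty = physics_residual_loss(straight_line, dt)
combined = total_loss(straight_line, free_fall, dt)
print(consistent_penalty, inconsistent_penalty, combined)
```

In a trained model the same residual would be computed on the network's predictions at each optimization step, so that outputs inconsistent with the assumed physics are penalized even where labeled data are sparse.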

17 Social functions of machine emotional expressions
de Melo, C., Gratch, J., Marsella, S., & Pelachaud, C., Proceedings of the IEEE, 2023

Virtual humans and social robots frequently generate behaviors that human observers naturally see as expressing emotion. In this review article, we highlight that these expressions can have important benefits for human-machine interaction. We first summarize the psychological findings on how emotional expressions achieve important social functions in human relationships and highlight that artificial emotional expressions can serve analogous functions in human-machine interaction. We then review computational methods for determining what expressions make sense to generate within the context of an interaction and how to realize those expressions across multiple modalities such as facial expressions, voice, language and touch. The use of synthetic expressions raises a number of ethical concerns and we conclude with a discussion of principles to achieve the benefits of machine emotion in ethical ways.

18 Emotion expression and cooperation under collective risks
de Melo, C., Santos, F. C., Terada, K., iScience, 2023

The difficulties associated with solving humanity's major global challenges have increasingly led world leaders and everyday citizens to publicly adopt strong emotional responses, with either mixed or unknown impacts on others' actions. Here, we present two experiments showing that non-verbal emotional expressions in group interactions play a critical role in determining how individuals behave when contributing to public goods entailing future and uncertain returns. Participants' investments were not only shaped by emotional expressions but also enhanced by anger when compared with joy. Our results suggest that global coordination may benefit from interactions in which emotion expressions play a paramount role.

19 ConceptFusion: Open-set multimodal 3D mapping
Jatavallabhula, K., Kuwajerwala, A., Gu, Q., Omama, M., Chen, T., Maalouf, A., Li, S., Iyer, G., Saryazdi, S., Keetha, N., Tewari, A., Tenenbaum, J., de Melo, C., Krishna, M., Paull, L., Shkurti, F., Torralba, A., Proceedings of Robotics: Science and Systems (RSS), 2023

Building 3D maps of the environment is central to robot navigation, planning, and interaction with objects in a scene. Most existing approaches that integrate semantic concepts with 3D maps largely remain confined to the closed-set setting: they can only reason about a finite set of concepts, pre-defined at training time. Further, these maps can only be queried using class labels, or in recent work, using text prompts. We address both these issues with ConceptFusion, a scene representation that is: (i) fundamentally open-set, enabling reasoning beyond a closed set of concepts, and (ii) inherently multi-modal, enabling a diverse range of possible queries to the 3D map, from language, to images, to audio, to 3D geometry, all working in concert. ConceptFusion leverages the open-set capabilities of today’s foundation models pre-trained on internet-scale data to reason about concepts across modalities such as natural language, images, and audio. We demonstrate that pixel-aligned open-set features can be fused into 3D maps via traditional SLAM and multi-view fusion approaches. This enables effective zero-shot spatial reasoning, not needing any additional training or finetuning, and retains long-tailed concepts better than supervised approaches, outperforming them by a margin of more than 40% on 3D IoU. We extensively evaluate ConceptFusion on a number of real-world datasets, simulated home environments, a real-world tabletop manipulation task, and an autonomous driving platform. We showcase new avenues for blending foundation models with 3D open-set multimodal mapping.

20 STMT: A spatial-temporal mesh transformer for mocap-based action recognition
Zhu, X., Huang, P.-Y., Liang, J., de Melo, C., Hauptmann, A., Proceedings of CVPR, 2023

21 Open-Set automatic target recognition
Safaei, B., Vibashan, V.S., de Melo, C., Hu, S., Patel, V., Proceedings of ICASSP, 2023

Automatic Target Recognition (ATR) is a category of computer vision algorithms that attempt to recognize targets in data obtained from different sensors. ATR algorithms are extensively used in real-world scenarios such as military and surveillance applications. Existing ATR algorithms are developed for the traditional closed-set setting, where training and testing have the same class distribution. Thus, these algorithms have not been robust to unknown classes not seen during the training phase, limiting their utility in real-world applications. To this end, we propose an Open-set Automatic Target Recognition framework where we enable open-set recognition capability for ATR algorithms. In addition, we introduce a plugin Category-aware Binary Classifier (CBC) module to effectively tackle unknown classes seen during inference. The proposed CBC module can be easily integrated with any existing ATR algorithms and can be trained in an end-to-end manner. Experimental results show that the proposed approach outperforms many open-set methods on the DSIAC and CIFAR-10 datasets. To the best of our knowledge, this is the first work to address the open-set classification problem for ATR algorithms. Source code is available at: https://github.com/bardisafa/Open-set-ATR.

22 Synthetic-to-real domain adaptation for action recognition: A dataset and baseline performances
Reddy, A., Shah, K., Paul, W., Mocharla, R., Hoffman, J., Katyal, K., Manocha, D., de Melo, C., & Chellappa, R., Proceedings of International Conference on Robotics and Automation (ICRA), 2023

Human action recognition is a challenging problem, particularly when there is high variability in factors such as subject appearance, backgrounds and viewpoint. While deep neural networks (DNNs) have been shown to perform well on action recognition tasks, they typically require large amounts of high-quality labeled data to achieve robust performance across a variety of conditions. Synthetic data has shown promise as a way to avoid the substantial costs, and potential practical and ethical issues associated with collecting and labeling enormous amounts of data in the real world. However, synthetic data may differ from real data in important ways. This phenomenon, known as domain shift, can limit the utility of synthetic data in robotics applications. To mitigate the effects of domain shift, substantial effort is being dedicated to the development of domain adaptation (DA) techniques. Yet, much remains to be understood on how best to develop these techniques. In this paper, we introduce a new dataset, called Robot Control Gestures (RoCoG-v2), composed of corresponding real and synthetic videos, to support the study of synthetic-to-real domain shift in video action recognition. Our work expands upon existing datasets by focusing the action classes on gestures for human-robot teaming, as well as by enabling investigation of domain shift in both ground and aerial views. We present baseline results using state-of-the-art action recognition and domain adaptation algorithms and offer initial insight on tackling the synthetic-to-real and ground-to-air domain shifts. A link to the dataset and corresponding documentation can be found at https://github.com/reddyav1/RoCoG-v2.

23 AZTR: Aerial video action recognition with auto zoom and temporal reasoning
Wang, X., Xian, R., Guan, T., de Melo, C., Nogar, S., Bera, A., & Manocha, D., Proceedings of International Conference on Robotics and Automation (ICRA), 2023

We propose a novel approach for aerial video action recognition. Our method is designed for videos captured using UAVs and can run on edge or mobile devices. We present a learning-based approach that uses customized auto zoom to automatically identify the human target and scale it appropriately. This makes it easier to extract the key features and reduces the computational overhead. We also present an efficient temporal reasoning algorithm to capture the action information along the spatial and temporal domains within a controllable computational cost. Our approach has been implemented and evaluated both on the desktop with high-end GPUs and on the low-power Robotics RB5 Platform for robots and drones. In practice, we achieve 6.1-7.4% improvement over SOTA in Top-1 accuracy on the RoCoG-v2 dataset, 8.3-10.4% improvement on the UAV-Human dataset and 3.2% improvement on the Drone Action dataset.

24 Multi-view action recognition using contrastive learning
Shah, K., Shah, A., Lau, C., de Melo, C., & Chellappa, R., Proceedings of Winter Conference on Applications of Computer Vision (WACV), 2023

In this work, we present a method for RGB-based action recognition using multi-view videos. We present a supervised contrastive learning framework to learn a feature embedding robust to changes in viewpoint, by effectively leveraging multi-view data. We use an improved supervised contrastive loss and augment the positives with those coming from synchronized viewpoints. We also propose a new approach to use classifier probabilities to guide the selection of hard negatives in the contrastive loss, to learn a more discriminative representation. Negative samples from confusing classes based on posterior are weighted higher. We also show that our method leads to better domain generalization compared to the standard supervised training based on synthetic multi-view data. Extensive experiments on real (NTU-60, NTU-120, NUMA) and synthetic (RoCoG) data demonstrate the effectiveness of our approach.

25 Synthetic data for automatic target recognition from small drones
de Melo, C., Conover, D., Poster, D., Leung, S., Nguyen, R., Conroy, J., Proceedings of Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications, 2023

Automatic target recognition (ATR) technology is likely to play an increasingly prevalent role in maintaining situational awareness on the modern battlefield. Advances in deep learning have enabled considerable progress in the development of ATR algorithms; however, these algorithms require large amounts of high-quality annotated data to train, and that is often the main bottleneck. Synthetic data offers a potential solution to this problem, especially given the recent proliferation of tools and techniques to synthesize custom data. Here, we focus on ATR, in the visible domain, from the perspective of a small drone, which represents a domain of growing importance to the Army. We describe custom simulators built to support synthetic data for multiple targets in a variety of environments. We describe a field experiment where we compared a baseline (YOLOv5) model, trained on off-the-shelf large generic public datasets, with a model augmented with specialized synthetic data. We deployed the models on a VOXL platform in a small drone. Our results showed that using synthetic data boosted target detection accuracy (average precision with at least 50% overlap) by over 40%. We discuss the value of synthetic data for this domain, the opportunities it creates, but also the novel challenges it introduces.
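The "at least 50% overlap" criterion is commonly computed as intersection over union (IoU) between a predicted and a ground-truth box. The sketch below shows only that overlap test, not the full average-precision computation; the box format `(x1, y1, x2, y2)` and the function names are assumptions for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the intersection rectangle (zero if disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, truth_box, threshold=0.5):
    """A detection counts as correct when IoU meets the threshold."""
    return iou(pred_box, truth_box) >= threshold


truth = (0, 0, 10, 10)
print(is_true_positive((1, 1, 10, 10), truth))   # large overlap
print(is_true_positive((5, 0, 15, 10), truth))   # half-shifted box
```

Average precision then summarizes the precision-recall curve obtained by ranking detections by confidence and labeling each one with this test.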

26 Adversarial learning using synthetic IR imagery
Uplinger, J., Schesser, D., Meyer, C., Conroy, J., de Melo, C., Proceedings of Synthetic Data for Artificial Intelligence and Machine Learning: Tools, Techniques, and Applications, 2023

27 The influence of emotional expressions of an industrial robot on human collaborative decision-making
Usui, K., Terada, K., & de Melo, C., Proceedings of Affective Computing and Intelligent Interaction (ACII), 2022

In recent years, robots have been equipped with the ability to express emotions and have begun building social relationships with people. However, the significance and effectiveness of incorporating emotion in industrial robots, which have a strong instrumental nature, is not fully understood. We investigated how emotional expressions of an industrial robot influence human collaborative decision-making. The participants (n=52), in a laboratory experiment, engaged in a desert survival task with an arm robot in a 2 (emotion expression: present vs. absent) × 2 (competence: high vs. low) between-participants study. Emotion was expressed using color through an LED light strip (e.g., anger was conveyed by flashing red). The results showed that emotion expression and competence did not influence the final agreement and, in fact, emotion expressions made the interaction longer, emphasizing the difficulty in communicating emotion and the reason for those expressions. We discuss lessons learnt and provide insight on improving the value of emotion expression in industrial robots.

28 Not just streaks: Towards ground truth for single image deraining
Ba, Y., Zhang, H., Yang, E., Suzuki, A., Pfahnl, A., Chandrappa, C., de Melo, C., You, S., Soatto, S., Wong, A., & Kadambi, A., Proceedings of European Conference on Computer Vision (ECCV), 2022

We propose a large-scale dataset of real-world rainy and clean image pairs and a method to remove degradations, induced by rain streaks and rain accumulation, from the image. As there exists no real-world dataset for deraining, current state-of-the-art methods rely on synthetic data and thus are limited by the sim2real domain gap; moreover, rigorous evaluation remains a challenge due to the absence of a real paired dataset. We fill this gap by collecting a real paired deraining dataset through meticulous control of non-rain variations. Our dataset enables paired training and quantitative evaluation for diverse real-world rain phenomena (e.g. rain streaks and rain accumulation). To learn a representation robust to rain phenomena, we propose a deep neural network that reconstructs the underlying scene by minimizing a rain-robust loss between rainy and clean images. Extensive experiments demonstrate that our model outperforms the state-of-the-art deraining methods on real rainy images under various conditions. Project website: https://visual.ee.ucla.edu/gt_rain.htm/.

29 Next-generation deep learning based on simulators and synthetic data
de Melo, C., Torralba, A., Guibas, L., DiCarlo, J., Chellappa, R., & Hodgins, J., Trends in Cognitive Sciences, 2021

Deep learning (DL) is being successfully applied across multiple domains, yet these models learn in a most artificial way: they require large quantities of labeled data to grasp even simple concepts. Thus, the main bottleneck is often access to supervised data. Here, we highlight a trend in a potential solution to this challenge: synthetic data. Synthetic data are becoming accessible due to progress in rendering pipelines, generative adversarial models, and fusion models. Moreover, advancements in domain adaptation techniques help close the statistical gap between synthetic and real data. Paradoxically, this artificial solution is also likely to enable more natural learning, as seen in biological systems, including continual, multimodal, and embodied learning. Complementary to this, simulators and deep neural networks (DNNs) will also have a critical role in providing insight into the cognitive and neural functioning of biological systems. We also review the strengths of, and opportunities and novel challenges associated with, synthetic data.

30 Emotion expressions shape human social norms and reputations
de Melo, C., Terada, K., & Santos, F., iScience, 2021

The emergence of pro-social behaviors remains a key open challenge across disciplines. In this context, there is growing evidence that expressing emotions may foster human cooperation. However, it remains unclear how emotions shape individual choices and interact with other cooperation mechanisms. Here, we provide a comprehensive experimental analysis of the interplay of emotion expressions with two important mechanisms: direct and indirect reciprocity. We show that cooperation in an iterated prisoner's dilemma emerges from the combination of the opponent's initial reputation, past behaviors, and emotion expressions. Moreover, all factors influenced the social norm adopted when assessing the action of others – i.e., how their counterparts' reputations are updated – thus reflecting longer-term consequences. We expose a new class of emotion-based social norms, where emotions are used to forgive those that defect but also punish those that cooperate. These findings emphasize the importance of emotion expressions in fostering, directly and indirectly, cooperation in society.

31 Heuristic Thinking and Altruism towards Machines in People Impacted by Covid-19
de Melo, C., Gratch, J., & Krueger, F., iScience, 2021

Autonomous machines are poised to become pervasive, but most people treat machines differently from humans: we are willing to violate social norms and less likely to display altruism toward machines. Here, we report an unexpected effect: those impacted by Covid-19 (as measured by a Post-Traumatic Stress Disorder scale) show a sharp reduction in this difference. Participants engaged in the dictator game with humans and machines and, consistent with prior research on disasters, those impacted by Covid-19 displayed more altruism to other humans. Unexpectedly, participants impacted by Covid-19 displayed equal altruism toward human and machine partners. A mediation analysis suggests that altruism toward machines was explained by an increase in heuristic thinking (reinforcing prior theory that heuristic thinking encourages people to treat machines like people) and faith in technology (perhaps reflecting longer-term consequences on how we act with machines). These findings give insight, but also raise concerns, for the design of technology.

32 Risk of injury in moral dilemmas with autonomous vehicles
de Melo, C., Marsella, S., & Gratch, J., Frontiers in Robotics and AI, 2021

As autonomous machines, such as automated vehicles (AVs) and robots, become pervasive in society, they will inevitably face moral dilemmas where they must make decisions that risk injuring humans. However, prior research has framed these dilemmas in starkly simple terms, i.e., framing decisions as life and death and neglecting the influence of risk of injury to the involved parties on the outcome. Here, we focus on this gap and present experimental work that systematically studies the effect of risk of injury on the decisions people make in these dilemmas. In four experiments, participants were asked to program their AVs to either save five pedestrians, which we refer to as the utilitarian choice, or save the driver, which we refer to as the nonutilitarian choice. The results indicate that most participants made the utilitarian choice but that this choice was moderated in important ways by perceived risk to the driver and risk to the pedestrians. As a second contribution, we demonstrate the value of formulating AV moral dilemmas in a game-theoretic framework that considers the possible influence of others’ behavior. In the fourth experiment, we show that participants were more (less) likely to make the utilitarian choice, the more utilitarian (nonutilitarian) other drivers behaved; furthermore, unlike the game-theoretic prediction that decision-makers inevitably converge to nonutilitarianism, we found significant evidence of utilitarianism. We discuss theoretical implications for our understanding of human decision-making in moral dilemmas and practical guidelines for the design of autonomous machines that solve these dilemmas while, at the same time, being likely to be adopted in practice.

33 Social factors in human-agent teaming
de Melo, C., Files, B., Pollard, K., & Khooshabeh, P., A. Moallem (Ed.), Smart and Intelligent Systems, 2021

Recent decades have seen impressive progress in the development of autonomous technology, such as robots, drones, self-driving cars, and personal assistants. These intelligent agents are able to engage with their surrounding environment in increasingly sophisticated ways. However, as this technology becomes pervasive in society, its success hinges on effective and efficient collaboration with humans. To accomplish this, agents need to understand not only the functional aspects of the task, but also the broader social context. Here, we first review relevant psychological theory explaining why and when humans treat agents in a social manner and are socially influenced by them. Second, we summarize experimental evidence showing the importance of verbal (e.g., natural language conversation) and nonverbal (e.g., emotion expressions) communication for successful collaboration between humans and agents. Third, we review recent work showing how perceptions of social group membership with agents influence cooperation. Fourth, we cover research on key individual differences – e.g., anthropomorphic tendency – shaping social interaction with agents. Finally, we identify open challenges and opportunities in this emerging field.

34 The impact of partner expressions on felt emotion in the iterated prisoner’s dilemma: An event-level analysis
Angelika-Nikita, M., de Melo, C., Terada, K., & Gratch, J., Proceedings of the Ninth Annual Conference on Advances in Cognitive Systems (ACS), 2021

Social games like the prisoner’s dilemma are often used to develop models of the role of emotion in social decision-making. Here we examine an understudied aspect of emotion in such games: how an individual’s feelings are shaped by their partner’s expressions. Prior research has tended to focus on other aspects of emotion. Research on felt-emotion has focused on how an individual’s feelings shape how they treat their partner, or whether these feelings are authentically expressed. Research on expressed-emotion has focused on how an individual’s decisions are shaped by their partner’s expressions, without regard for whether these expressions actually evoke feelings. Here, we use computer-generated characters to examine how an individual’s moment-to-moment feelings are shaped by (1) how they are treated by their partner and (2) what their partner expresses during this treatment. Surprisingly, we find that partner expressions are far more important than actions in determining self-reported feelings. In other words, our partner can behave in a selfish and exploitive way, but if they show a collaborative pattern of expressions, we will feel greater pleasure collaborating with them. These results also emphasize the importance of context in determining how someone will feel in response to an expression (i.e., knowing a partner is happy is insufficient; we must know what they are happy-at). We discuss the implications of this work for cognitive-system design, emotion theory, and methodological practice in affective computing.

35 The interplay of emotion expressions and strategy in promoting cooperation in the iterated prisoner's dilemma
de Melo, C., & Terada, K., Scientific Reports, 2020

The iterated prisoner's dilemma has been used to study human cooperation for decades. The recent discovery of extortion and generous strategies renewed interest in the role of strategy in shaping behavior in this dilemma. But what if players could perceive each other's emotional expressions? Despite increasing evidence that emotion signals influence decision making, the effects of emotion in this dilemma have been mostly neglected. Here we show that emotion expressions moderate the effect of generous strategies, increasing or reducing cooperation according to the intention communicated by the signal; in contrast, expressions by extortionists had no effect on participants' behavior, revealing a limitation of highly competitive strategies. We provide evidence that these effects are mediated mostly by inferences about others' intentions made from strategy and emotion. These findings provide insight into the value, as well as the limits, of behavioral strategies and emotion signals for cooperation.

36 Reducing cognitive load and improving warfighter problem solving with intelligent virtual assistants
de Melo, C., Kim, K., Norouzi, N., Bruder, G., & Welch, G., Frontiers in Psychology, 2020

Recent times have seen increasing interest in conversational assistants (e.g., Amazon Alexa) designed to help users in their daily tasks. In military settings, it is critical to design assistants that are, simultaneously, helpful and able to minimize the user’s cognitive load. Here, we show that embodiment plays a key role in achieving that goal. We present an experiment where participants engaged in an augmented reality version of the relatively well-known desert survival task. Participants were paired with a voice assistant, an embodied assistant, or no assistant. The assistants made suggestions verbally throughout the task, whereas the embodied assistant further used gestures and emotion to communicate with the user. Our results indicate that both assistant conditions led to higher performance over the no assistant condition, but the embodied assistant achieved this with less cognitive burden on the decision maker than the voice assistant, which is a novel contribution. We discuss implications for the design of intelligent collaborative systems for the warfighter.

37 Vision-based gesture recognition in human-robot teams using synthetic data
de Melo, C., Rothrock, B., Gurram, P., Ulutan, O., & Manjunath, B. S., Proceedings of International Conference on Intelligent Robots and Systems (IROS), 2020

Building successful collaboration between humans and robots requires efficient, effective, and natural communication. Here we study an RGB-based deep learning approach for controlling robots through gestures (e.g., “follow me”). To address the challenge of collecting high-quality annotated data from human subjects, synthetic data is considered for this domain. We contribute a dataset of gestures that includes real videos with human subjects and synthetic videos from our custom simulator. A solution is presented for gesture recognition based on the state-of-the-art I3D model. Comprehensive testing was conducted to optimize the parameters for this model. Finally, to gather insight on the value of synthetic data, several experiments are described that systematically study the properties of synthetic data (e.g., gesture variations, character variety, generalization to new gestures). We discuss practical implications for the design of effective human-robot collaboration and the usefulness of synthetic data for deep learning.

38 Reducing task load with an embodied intelligent virtual assistant for improved performance in collaborative decision making
Kim, K., de Melo, C., Norouzi, N., Bruder, G., & Welch, G., Proceedings of IEEE Conference on Virtual Reality and 3D User Interfaces (IEEE VR), 2020

Collaboration in a group has the potential to achieve more effective solutions for challenging problems, but collaboration per se is not an easy task and can become a stressful burden if the collaboration partners do not communicate well with each other. As Intelligent Virtual Assistants (IVAs), such as Amazon Alexa, become part of our daily lives, we increasingly collaborate with them on our daily tasks. Although IVAs can provide important support to users, the limited verbal interface of current IVAs cannot provide effective non-verbal social cues, which are critical for improving collaborative performance and reducing task load. In this paper, we investigate the effects of IVA embodiment on collaborative decision making. In a within-subjects study, participants performed a desert survival task in three conditions: (1) performing the task alone, (2) working with a disembodied voice assistant, and (3) working with an embodied assistant. Our results show that both assistant conditions led to higher performance than performing the task alone, but interestingly the reported task load with the embodied assistant was significantly lower than with the disembodied voice assistant. We discuss the findings with implications for effective and efficient collaborations with IVAs while also emphasizing the increased social presence and richness of the embodied assistant.

39 Human cooperation when acting through autonomous machines
de Melo, C., Marsella, S., & Gratch, J., Proceedings of the National Academy of Sciences U.S.A., 116, 3482-3487, 2019

Recent times have seen an emergence of intelligent machines that act autonomously on our behalf, such as autonomous vehicles. Despite promises of increased efficiency, it is not clear whether this paradigm shift will change how we decide when our self-interest (e.g., comfort) is pitted against the collective interest (e.g., environment). Here we show that acting through machines changes the way people solve these social dilemmas and we present experimental evidence showing that participants program their autonomous vehicles to act more cooperatively than if they were driving themselves. We show this happens because programming causes selfish short-term rewards to become less salient, leading to considerations of broader societal goals. We also show that the programmed behavior is influenced by past experience. Finally, we report evidence that the effect generalizes beyond the domain of autonomous vehicles. We discuss implications for designing autonomous machines that contribute to a more cooperative society.

40 Cooperation with autonomous machines through culture and emotion
de Melo, C., & Terada, K., PLOS ONE, 2019

As machines that act autonomously on behalf of others – e.g., robots – become integral to society, it is critical we understand the impact on human decision-making. Here we show that people readily engage in social categorization distinguishing humans (“us”) from machines (“them”), which leads to reduced cooperation with machines. However, we show that a simple cultural cue – the ethnicity of the machine’s virtual face – mitigated this bias for participants from two distinct cultures (Japan and United States). We further show that situational cues of affiliative intent – namely, expressions of emotion – overrode expectations of coalition alliances from social categories: When machines were from a different culture, participants showed the usual bias when competitive emotion was shown (e.g., joy following exploitation); in contrast, participants cooperated just as much with humans as machines that expressed cooperative emotion (e.g., joy following cooperation). These findings reveal a path for increasing cooperation in society through autonomous machines.

41 Toward a unified theory of learned trust in interpersonal and human-machine interactions
Juvina, I., Collins, M., Larue, O., Kennedy, W., De Visser, E., & de Melo, C., ACM Transactions on Interactive Intelligent Systems, 9, 24-31, 2019

A proposal for a unified theory of learned trust implemented in a cognitive architecture is presented. The theory is instantiated as a computational cognitive model of learned trust that integrates several seemingly unrelated categories of findings from the literature on interpersonal and human-machine interactions and makes unintuitive predictions for future studies. The model relies on a combination of learning mechanisms to explain a variety of phenomena such as trust asymmetry, the higher impact of early trust breaches, the black-hat/white-hat effect, the correlation between trust and cognitive ability, and the higher resilience of interpersonal as compared to human-machine trust. In addition, the model predicts that trust decays in the absence of evidence of trustworthiness or untrustworthiness. The implications of the model for the advancement of the theory on trust are discussed. Specifically, this work suggests two more trust antecedents on the trustor’s side: perceived trust necessity and cognitive ability to detect cues of trustworthiness.

42 Inferring intentions from emotion expressions in social decision making
Gratch, J., & de Melo, C., U. Hess, & S. Hareli (Eds.), The Social Nature of Emotion Expression, 141-160, 2019

In the last decade we have seen increasing experimental evidence that people make important inferences from emotion expressions about others' intentions in situations of interdependent decision making. Reverse appraisal has been proposed as one mechanism whereby people retrieve, from emotion displays, information about how others are appraising the ongoing interaction (e.g., does my counterpart find the current outcome to be goal conducive? Does s/he blame me for it?); in turn, from these appraisal attributions, people make inferences about the others' goals (e.g., is my counterpart likely to cooperate?) that shape their decision making. Here we review experimental evidence and progress that has been made in understanding this inferential mechanism and its relationship to other mechanisms for the interpersonal effects of emotion (e.g., emotional contagion and social appraisal). We discuss theoretical implications for our understanding of the role of emotion expression in human decision making, but also practical implications for the growing industry of socially intelligent machines (e.g., personal digital assistants and social robots).

43 Shaping cooperation between humans and agents with emotion expressions and framing
de Melo, C., Khooshabeh, P., Amir, O., & Gratch, J., Proceedings of Autonomous Agents and Multiagent Systems (AAMAS 18), 2018

Emotion expressions can help solve social dilemmas where individual interest is pitted against the collective interest. Building on research that shows that emotions communicate intentions to others, we reinforce that people can infer whether emotionally expressive computer agents intend to cooperate or compete. We further show important distinctions between computer agents that are perceived to be driven by humans (i.e., avatars) vs. by algorithms (i.e., agents). Our results reveal that, when the emotion expression reflects an intention to cooperate, participants will cooperate more with avatars than with agents; however, when the emotion reflects an intention to compete, participants cooperate just as little with avatars as with agents. Finally, we present first evidence that the way the dilemma is described – or framed – can influence people’s decision-making. We discuss implications for the design of autonomous agents that foster cooperation with humans, beyond what game theory predicts in social dilemmas.

44 People do not feel guilty about exploiting machines
de Melo, C., Marsella, S., & Gratch, J., ACM Transactions on Computer-Human Interaction, 23, 2017

Guilt and envy play an important role in social interaction. Guilt occurs when individuals cause harm to others or break social norms. Envy occurs when individuals compare themselves unfavorably to others and desire to benefit from the others’ advantage. In both cases, these emotions motivate people to act and change the status quo: following guilt, people try to make amends for the perceived transgression and, following envy, people try to harm envied others. In this paper, we present two experiments that study participants' experience of guilt and envy when engaging in social decision making with machines and humans. The results showed that, though experiencing the same level of envy, people felt considerably less guilt with machines than with humans. These effects occurred both with subjective and behavioral measures of guilt and envy, and in three different economic games: public goods, ultimatum, and dictator game. This poses an important challenge for human-computer interaction because, as shown here, it leads people to systematically exploit machines, when compared to humans. We discuss theoretical and practical implications for the design of human-machine interaction systems that hope to achieve the kind of efficiency – cooperation, fairness, reciprocity, etc. – we see in human-human interaction.

45 Social decisions and fairness change when people's interests are represented by autonomous agents
de Melo, C., Marsella, S., & Gratch, J., Journal of Autonomous Agents and Multiagent Systems, 2017

In the realms of AI and science fiction, agents are fully-autonomous systems that can be perceived as acting of their own volition to achieve their own goals. But in the real world, the term “agent” more commonly refers to a person that serves as a representative for a human client and works to achieve this client’s goals (e.g., lawyers and real estate agents). Yet, until the day that computers become fully autonomous, agents in the first sense are really agents in the second sense as well: computer agents that serve the interests of the human user or corporation they represent. In a series of experiments, we show that human decision-making and fairness are significantly altered when agent representatives are inserted into common social decisions such as the ultimatum game. Similar to how they behave with human representatives, people show less regard for other people (e.g., exhibit more self-interest and less fairness), when the other is represented by an agent. However, in contrast to the human literature, people show more regard for others and increased fairness when “programming” an agent to represent their own interests. This finding confirms the conjecture by some in the autonomous agent community that the very act of programming an agent changes how people make decisions. Our findings provide insight into the cognitive mechanisms that underlie these effects and we discuss the implication for the design of autonomous agents that represent the interests of humans.

46 Increasing fairness by delegating decisions to autonomous agents
de Melo, C., Marsella, S., & Gratch, J., Proceedings of Autonomous Agents and Multiagent Systems (AAMAS 17), 2017

There has been growing interest in autonomous agents that act on our behalf, or represent us, across various domains such as negotiation, transportation, health, finance, defense, etc. As these agent representatives become immersed in society, it is critical we understand whether and, if so, how they disrupt the traditional patterns of interaction with others. In this paper we study how programming agents to represent us shapes our decisions in social settings. Here we show that, when acting through agent representatives, people are considerably less likely to accept unfair offers from others, when compared to direct interaction with others. This result, thus, demonstrates that agent representatives have the potential to promote fairer outcomes. Moreover, we show that this effect can also occur when people are asked to “program” human representatives, thus revealing that the effect is caused by the act of programming itself. We argue this happens because programming requires the programmer to deliberate on all possible situations that might arise and, thus, promotes consideration of social norms – such as fairness – when making their decisions. These results have important theoretical, practical, and ethical implications for designing agents that act on our behalf and for understanding the nature of people's decision making when they act through such agents.

47 "Do as I say, not as I do:" Challenges in delegating decisions to automated agents
de Melo, C., Marsella, S., & Gratch, J., Proceedings of Autonomous Agents and Multiagent Systems (AAMAS 16), 2016

There has been growing interest, across various domains, in computer agents that can decide on behalf of humans. These agents have the potential to save considerable time and help humans reach better decisions. One implicit assumption, however, is that, as long as the algorithms that simulate decision-making are correct and capture how humans make decisions, humans will treat these agents similarly to other humans. Here we show that interaction with agents that act on our behalf or on behalf of others is richer and more interesting than initially expected. Our results show that, on the one hand, people are more selfish with agents acting on behalf of others, than when interacting directly with others. We propose that agents increase the social distance with others which, subsequently, leads to increased demand. On the other hand, when people task an agent to interact with others, people show more concern for fairness than when interacting directly with others. In this case, higher psychological distance leads people to consider their social image and the long-term consequences of their actions and, thus, behave more fairly. To support these findings, we present an experiment where people engaged in the ultimatum game, either directly or via an agent, with others or agents representing others. We show that these patterns of behavior also occur in a variant of the ultimatum game – the impunity game – where others have minimal power over the final outcome. Finally, we study how social value orientation – i.e., people’s propensity for cooperation – impacts these effects. These results have important implications for our understanding of the psychological mechanisms underlying interaction with agents, as well as practical implications for the design of successful agents that act on our behalf or on behalf of others.
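The difference between the ultimatum game and its impunity variant comes down to what a rejection costs the proposer. The sketch below uses the standard textbook payoff rules; the function names and the 10-unit endowment are illustrative assumptions and are not taken from the experiment.

```python
def ultimatum_payoffs(total, offer, accepted):
    """Return (proposer, responder) payoffs in the ultimatum game.

    A rejection leaves both players with nothing, so the responder
    can punish an unfair offer.
    """
    if not 0 <= offer <= total:
        raise ValueError("offer must lie between 0 and total")
    if accepted:
        return total - offer, offer
    return 0, 0


def impunity_payoffs(total, offer, accepted):
    """Return (proposer, responder) payoffs in the impunity game.

    The proposer keeps their share regardless; a rejection only
    forfeits the responder's own share.
    """
    if not 0 <= offer <= total:
        raise ValueError("offer must lie between 0 and total")
    if accepted:
        return total - offer, offer
    return total - offer, 0


# Hypothetical example: a 10-unit endowment and an unfair offer of 2.
print(ultimatum_payoffs(10, 2, accepted=False))  # (0, 0)
print(impunity_payoffs(10, 2, accepted=False))   # (8, 0)
```

Because rejection no longer hurts the proposer in the impunity game, any fairness that proposers still show there cannot be explained by fear of punishment alone.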

48 Toward a unified theory of learned trust
Juvina, I., Collins, M., Larue, O., & de Melo, C., International Conference on Cognitive Modeling (ICCM 16), 2016

A proposal for a unified theory of learned trust is presented. A number of limitations of a published computational cognitive model of learned trust are discussed. A solution is proposed to overcome these limitations and expand the model’s scope of applicability. The revised model integrates several seemingly unrelated categories of findings from the literature and makes unintuitive predictions for future studies. The implications of the model for the advancement of the theory on trust are discussed.

49 Neurophysiological effects of negotiation framing
Khooshabeh, P., Lin, R., de Melo, C., Gratch, J., Ouimette, B., et al., Annual Meeting of the Cognitive Science Society (CogSci 16), 2016

In this study, we manipulated gain/loss framing context during a simulated negotiation between a human user and a virtual agent. Task instructions placed users either in a loss or gain framed context, such that those in the loss frame had to minimize expenses whereas those in the gain frame had to maximize profits. The virtual agent displayed facial emotions so that we could also test how interpersonal emotions interact with framing. Results suggest that individuals are more motivated to minimize their losses than to maximize their gains. The loss frame caused individuals to demand more during the negotiation, hence to minimize expenses. Neurophysiological results suggest that cardiovascular patterns of challenge (i.e., positive motivations) were present in the loss frame condition, most strongly when the virtual human smiled. We discuss these results with regard to Prospect Theory. This work also has implications for designing and rigorously evaluating humanlike virtual agents.

50 Physiological evidence for a dual process model of the social effects of emotion in computers.
Choi, A., de Melo, C., Khooshabeh, P., Woo, W., & Gratch, J., International Journal of Human-Computer Studies, 74, 41-53, 2015

There has been recent interest in the impact of emotional expressions of computers on people's decision making. However, despite a growing body of empirical work, the mechanism underlying such effects is still not clearly understood. To address this issue, the paper explores two kinds of processes studied by emotion theorists in human-human interaction: inferential processes, whereby people retrieve information from emotion expressions about others' beliefs, desires, and intentions; and affective processes, whereby emotion expressions evoke emotions in others, which then influence their decisions. To tease apart these two processes as they occur in human-computer interaction, we looked at physiological measures (electrodermal activity and heart rate deceleration). We present two experiments where participants engaged in social dilemmas with embodied agents that expressed emotion. Our results show, first, that people's decisions were influenced by affective and cognitive processes and, according to the prevailing process, people behaved differently and formed contrasting subjective ratings of the agents; second, we show that an individual trait known as electrodermal lability, which measures people's physiological sensitivity, predicted the extent to which affective or inferential processes dominated the interaction. We discuss implications for the design of embodied agents and decision making systems that use emotion expression to enhance interaction between humans and computers.

51 Beyond believability: Quantifying the differences between real and virtual humans.
de Melo, C., & Gratch, J., Proceedings of the 15th International Conference on Intelligent Virtual Agents (IVA 15), 2015

“Believable” agents are supposed to “suspend the audience's disbelief” and provide the “illusion of life”. However, beyond such high-level definitions, which are prone to subjective interpretation, there is not much more to help researchers systematically create or assess whether their agents are believable. In this paper we propose a more pragmatic and useful benchmark than believability for designing virtual agents. This benchmark requires people, in a specific social situation, to act with the virtual agent in the same manner as they would with a real human. We propose that perceptions of mind in virtual agents, especially pertaining to agency – the ability to act and plan – and experience – the ability to sense and feel emotion – are critical for achieving this new benchmark. We also review current computational systems that fail, pass, and even surpass this benchmark and show how a theoretical framework based on perceptions of mind can shed light on these systems. We also discuss a few important cases where it is better if virtual humans do not pass the benchmark. We discuss implications for the design of virtual agents that can be as natural and efficient to interact with as real humans.

52 People show envy, not guilt, when making decisions with machines.
de Melo, C., & Gratch, J., Proceedings of the 6th International Conference on Affective Computing and Intelligent Interaction (ACII 15), 2015

Research shows that people consistently reach more efficient solutions than those predicted by standard economic models, which assume people are selfish. Artificial intelligence, in turn, seeks to create machines that can achieve these levels of efficiency in human-machine interaction. However, as reinforced in this paper, people's decisions are systematically less efficient – i.e., less fair and favorable – with machines than with humans. To understand the cause of this bias, we resort to a well-known experimental economics model: Fehr and Schmidt's inequity aversion model. This model accounts for people's aversion to disadvantageous outcome inequality (envy) and aversion to advantageous outcome inequality (guilt). We present an experiment where participants engaged in the ultimatum and dictator games with human or machine counterparts. By fitting this data to Fehr and Schmidt's model, we show that people acted as if they were just as envious of humans as of machines; but, in contrast, people showed less guilt when making unfavorable decisions to machines. This result, thus, provides critical insight into this bias people show, in economic settings, in favor of humans. We discuss implications for the design of machines that engage in social decision making with humans.
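
As a rough illustration of the model named in this abstract, the standard two-player form of Fehr and Schmidt's utility subtracts an envy penalty when the other party is ahead and a guilt penalty when one is ahead oneself. The sketch below is not the paper's fitting procedure; the function name, the parameter values, and the 10-unit split are assumptions chosen only for illustration.

```python
def fehr_schmidt_utility(own: float, other: float, envy: float, guilt: float) -> float:
    """Two-player Fehr-Schmidt utility for the player who receives `own`."""
    disadvantage = max(other - own, 0.0)  # how far behind the other player we are
    advantage = max(own - other, 0.0)     # how far ahead of the other player we are
    return own - envy * disadvantage - guilt * advantage


# Hypothetical ultimatum-game split of 10 units: proposer keeps 8, responder gets 2.
# The envy and guilt weights are made-up values, not estimates from the paper.
responder = fehr_schmidt_utility(own=2, other=8, envy=0.5, guilt=0.25)
proposer = fehr_schmidt_utility(own=8, other=2, envy=0.5, guilt=0.25)
```

With these assumed weights the responder's utility for accepting is negative, so rejecting (utility 0 for both) would be preferred. Lowering `guilt` raises the proposer's utility for the same unequal split, which is the direction of the effect the abstract reports for machine counterparts.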

53 Reading people's minds from emotion expressions in interdependent decision making.
de Melo, C., Carnevale, P., Read, S., & Gratch, J., Journal of Personality and Social Psychology, 106(1), 73-88, 2014

How do people make inferences about other people's minds from their emotion displays? The ability to infer others' beliefs, desires and intentions from their facial expressions should be especially important in interdependent decision making when people make decisions from beliefs about the others' intention to cooperate. Five experiments tested the general proposition that people follow principles of appraisal when making inferences from emotion displays, in context. Experiment 1 found that the same emotion display produced opposite effects depending on context: when the other was competitive, a smile on the other's face evoked a more negative response than when the other was cooperative. Experiment 2 found that the essential information from emotion displays was derived from appraisals (e.g., is the current state-of-affairs conducive to my goals? Who is to blame for it?); facial displays of emotion had the same impact on people's decision making as textual expressions of the corresponding appraisals. Experiments 3, 4 and 5 used multiple mediation analyses and a causal-chain design: Results supported the proposition that beliefs about others' appraisals mediate the effects of emotion displays on expectations about others' intentions. We suggest a model based on appraisal theories of emotion that posits an inferential mechanism whereby people retrieve, from emotion expressions, information about others' appraisals, which then lead to inferences about others' mental states. This work has implications for the design of algorithms that drive agent behavior in human-agent strategic interaction, an emerging domain at the interface of computer science and social psychology.

54 Humans vs. computers: Impact of emotion expressions on people's decision making.
de Melo, C., Carnevale, P., & Gratch, J., IEEE Transactions on Affective Computing, 6(2), 127-136, 2014

Recent research in perception and theory of mind reveals that people show different behavior and lower activation of brain regions associated with mentalizing (i.e., the inference of others' mental states) when engaged in decision making with computers, when compared to humans. These findings are important for affective computing because they suggest people's decisions might be influenced differently according to whether they believe emotional expressions shown in computers are being generated by algorithms or humans. To test this, we had people engage in a social dilemma (Experiment 1) or negotiation (Experiment 2) with virtual humans that were either perceived to be agents (i.e., controlled by computers) or avatars (i.e., controlled by humans). The results showed that such perceptions have a deep impact on people's decisions: in Experiment 1, people cooperated more with virtual humans that showed cooperative facial displays (e.g., joy after mutual cooperation) than competitive displays (e.g., joy when the participant was exploited) but the effect was stronger with avatars (d = .601) than with agents (d = .360); in Experiment 2, people conceded more to angry than neutral virtual humans but, again, the effect was much stronger with avatars (d = 1.162) than with agents (d = .066). Participants also showed less anger towards avatars and formed more positive impressions of avatars when compared to agents.

55 Emotion in games.
de Melo, C., Paiva, A., & Gratch, J., M. Angelides, H. Agius (Eds.), The Handbook of Digital Games, 575-592, 2014

Growing interest in the study of emotion in the behavioral sciences has led to the development of several psychological theories of human emotion. These theories, in turn, inspired computer scientists to propose computational models that synthesize, express, recognize and interpret emotion. This cross-disciplinary research on emotion introduces new possibilities for digital games. Complementing techniques from the arts for drama and storytelling, these models can be used to drive believable non-player characters that experience properly-motivated emotions and express them appropriately at the right time; these theories can also help interpret the emotions the human player is experiencing and suggest adequate reactions in the game. This chapter reviews relevant psychological theories of emotion as well as computational models of emotion and discusses implications for games. We give special emphasis to appraisal theories of emotion, undeniably one of the most influential theoretical perspectives within computational research. In appraisal theories, emotions arise from cognitive appraisal of events (e.g., is this event conducive to my goals? Who is responsible for this event? Can I cope with this event?). According to the pattern of appraisals that occur, different emotions are experienced and expressed. Appraisal theories can, therefore, be used to synthesize emotions in games, which are then expressed in different ways. Complementarily, reverse appraisal has been recently proposed as a theory for the interpretation of emotion. Accordingly, people are argued to retrieve, from emotion displays, information about how others are appraising the ongoing interaction, which then leads to inferences about the others' intentions. Reverse appraisal can, thus, be used to infer how human players, from their emotion displays, are appraising the game experience and, from this information, what their intentions in the game are. This information can then be used to adjust game parameters or have non-player characters react to the player's intentions and, thus, contribute to improve the player's overall experience.

56 The importance of cognition and affect for artificially intelligent decision makers.
de Melo, C., Gratch, J., Carnevale, P., Proceedings of the 28th Conference on Artificial Intelligence (AAAI 14), 2014

Agency – the capacity to plan and act – and experience – the capacity to sense and feel – are two critical aspects that determine whether people will perceive non-human entities, such as autonomous agents, to have a mind. There is evidence that the absence of either can reduce cooperation. We present an experiment that tests the necessity of both for cooperation with agents. In this experiment we manipulated people's perceptions about the cognitive and affective abilities of agents, when engaging in the ultimatum game. The results indicated that people offered more money to agents that were perceived to make decisions according to their intentions (high agency), rather than randomly (low agency). Additionally, the results showed that people offered more money to agents that expressed emotion (high experience), when compared to agents that did not (low experience). We discuss the implications of this agency-experience theoretical framework for the design of artificially intelligent decision makers.

57 Using virtual confederates to research intergroup bias and conflict.
de Melo, C., Carnevale, P., & Gratch, J., Best Paper Proceedings of the Annual Meeting of the Academy of Management (AOM 14), 2014

Virtual confederates–i.e., three-dimensional virtual characters that look and act like humans–have been gaining in popularity as a research method in the social and medical sciences. Interest in this research method stems from the potential for increased experimental control, ease of replication, facilitated access to broader samples and lower costs. We argue that virtual confederates are also a promising research tool for the study of intergroup behavior. To support this claim we replicate and extend with virtual confederates key findings in the literature. In Experiment 1 we demonstrate that people apply racial stereotypes to virtual confederates, and show a corresponding bias in terms of money offered in the dictator game. In Experiment 2 we show that people also show an in-group bias when group membership is artificially created and based on interdependence through shared payoffs in a nested social dilemma. Our results further demonstrate that social categorization and bias can occur not only when people believe confederates are controlled by humans (i.e., they are avatars), but also when confederates are believed to be controlled by computer algorithms (i.e., they are agents). The results, nevertheless, show a basic bias in favor of avatars (the in-group in the “human category”) over agents (the out-group). Finally, our results (Experiments 2 and 3) establish that people can combine, in additive fashion, the effects of these social categories; a mechanism that, accordingly, can be used to reduce intergroup bias. We discuss implications for research in social categorization, intergroup bias and conflict.

58 Bridging the gap between human and non-human decision makers.
de Melo, C., Carnevale, P., & Gratch, J., Annual Meeting of International Association for Conflict Management (IACM 14), 2014

59 Social categorization and cooperation between humans and computers.
de Melo, C., Carnevale, P., & Gratch, J., Annual Meeting of the Cognitive Science Society (CogSci 14), 2014

60 The effect of agency on the impact of emotion expressions on people's decision making.
de Melo, C., Gratch, J., Carnevale, P., Proceedings of the International Conference of Affective Computing and Intelligent Interaction (ACII 13), 2013

Recent research in neuroeconomics reveals that people show different behavior and lower activation of brain regions associated with mentalizing (i.e., the inference of others' mental states) when engaged in decision making tasks with a computer, when compared to a human. These findings are important for affective computing because they suggest people's decision making might be influenced differently according to whether they believe the emotional expressions shown by a computer are being generated by a computer algorithm or a human. To test this, we had people engage in a social dilemma (Experiment 1) or a negotiation (Experiment 2) with virtual humans that were either agents (i.e., controlled by computers) or avatars (i.e., controlled by humans). The results show a clear agency effect: in Experiment 1, people cooperated more with virtual humans that showed facial cooperative displays (e.g., joy after mutual cooperation) rather than competitive displays (e.g., joy when the participant was exploited) but the effect was only significant with avatars; in Experiment 2, people conceded more to an angry than a neutral virtual human but, once again, the effect was only significant with avatars.

61 Agent or avatar? Using virtual confederates in conflict management research.
de Melo, C., Carnevale, P., & Gratch, J., Annual Meeting of the Academy of Management (AOM 13), 2013

Virtual confederates–i.e., three-dimensional virtual characters that look and act like humans–are used in a growing number of empirical studies, especially in the behavioral and medical sciences. The growing popularity of this research method stems from increased experimental control, ease of replication, facilitated access to broader samples and lower costs. In this paper we investigate the plausibility of virtual confederates for conducting research in conflict management. We posit that generality studies that compare findings with human and virtual confederates are required to determine the merits of virtual confederates. To accomplish this we present two novel studies where people engaged in a social dilemma (Experiment 1) and in a negotiation (Experiment 2) with virtual confederates that expressed emotions in their faces. Experiment 1 showed that people cooperated more with a virtual confederate that showed cooperative displays (e.g., smile in mutual cooperation) than one that showed competitive displays (e.g., smile after exploiting the participant). Experiment 2 showed that people conceded more to an angry virtual confederate than to a neutral one. These results comport with previous findings from similar studies with humans, thus supporting the viability of virtual confederates as a research tool. Our results also reveal that virtual confederates are more successful in achieving social influence when participants are convinced that humans control the virtual images (i.e., the confederate is an avatar), rather than computer programs (i.e., the confederate is an agent). We discuss implications for research in conflict management.

62 Cooperative strategies with incongruent facial expressions cause cardiovascular threat.
Khooshabeh, P., de Melo, C., Volkman, B., Gratch, J., Blascovich, J., & Carnevale, P., Annual Meeting of the Cognitive Science Society (CogSci 13), 2013

Affect is important in motivated performance situations such as negotiation. Longstanding theories of emotion suggest that facial expressions provide enough information to perceive another person's internal affective state. Alternatively, the contextual emotion hypothesis posits that situational factors bias the perception of emotion in others' facial displays. This hypothesis predicts that individuals will have different perceptions of the same facial expression depending upon the context in which the expression is displayed. In this study, cardiovascular indexes of motivational states (i.e., challenge vs. threat) were recorded while players engaged in a multi-issue negotiation where the opposing negotiator (confederate) displayed emotional facial expressions (angry vs. happy); the confederate's negotiation strategy (cooperative vs. competitive) was factorially crossed with his facial expression. During the game, participants' eye fixations and cardiovascular responses, indexing task engagement and challenge/threat motivation, were recorded. Results indicated that participants playing confederates with incongruent facial expressions (e.g., cooperative strategy, angry face) exhibited a greater threat response, which arises due to increased uncertainty. Eye fixations also suggest that participants look at the face more in order to acquire information to reconcile their uncertainty in the incongruent condition. Taken together, these results suggest that context matters in the perception of emotion.

63 People's biased decisions to trust and cooperate with agents that express emotions.
de Melo, C., Carnevale, P., & Gratch, J., Trust Workshop at the Autonomous Agents and Multiagent Systems (AAMAS) Conference, 2013

Research in the behavioral sciences shows that emotion expressions impact people's decisions to trust and cooperate with others in situations where self and collective interests collide. Building on such findings, computer scientists have shown that emotion expressions in agents can also impact people's decision making. However, recent findings in neuroeconomics reveal that people systematically show different behavior and brain activation patterns in decision making tasks with computers, when compared to humans. These findings suggest a bias people might have with respect to autonomous agents and, in particular, agents that express emotions. To clarify this, the paper presents a novel experiment where participants engaged in the iterated prisoner's dilemma, for clear financial stakes, with counterparts, either agents or humans, that showed facial displays of emotion that were compatible with a cooperative (e.g., smile after mutual cooperation) or competitive (e.g., smile after exploiting the participant) goal orientation. The results showed that participants cooperated, as expected, more with cooperative than competitive counterparts but also revealed that people trusted and cooperated more with a human that showed cooperative displays than an agent that showed the exact same displays. We discuss implications of such a bias for trust and cooperation in human-agent interaction.

64 The impact of emotion displays in embodied agents on emergence of cooperation with people.
de Melo, C., Carnevale, P., Gratch, J., Presence: Teleoperators and Virtual Environments Journal, 20(5), 449-465, 2012

Acknowledging the social functions of emotion in people, there has been growing interest in the interpersonal effect of emotion on cooperation in social dilemmas. This article explores whether and how facial displays of emotion in embodied agents impact cooperation with human users. The article describes an experiment where participants play the iterated prisoner's dilemma against two different agents that play the same strategy (tit-for-tat), but communicate different goal orientations (cooperative vs. individualistic) through their patterns of facial displays. The results show that participants are sensitive to differences in the emotion displays and cooperate significantly more with the cooperative agent. The results also reveal that cooperation rates are only significantly different when people play first with the individualistic agent. This is in line with the well-known black-hat/white-hat effect from the negotiation literature. However, this study emphasizes that people can discern a cooperator (white-hat) from a non-cooperator (black-hat) based only on emotion displays. We propose that people are able to identify the cooperator by inferring, from the emotion displays, the agent's goals. We refer to this as reverse appraisal, as it reverses the usual process in which appraising relevant events with respect to one's goals leads to specific emotion displays. We discuss implications for designing human-computer interfaces and understanding human-human interaction.

65 Affective engagement to emotional facial expressions of embodied social agents in a decision-making game.
Choi, A., de Melo, C., Woo, W., & Gratch, J., Computer Animation and Virtual Worlds, 23(3-4), 331-342, 2012

Previous research illustrates that people can be influenced by the emotional displays of computer-generated agents. What is less clear is if these influences arise from cognitive or affective processes (i.e., do people use agent displays as information or do they provoke user emotions). To unpack these processes, we examine the decisions and physiological reactions of participants (heart rate and electrodermal activity) when engaged in a decision task (prisoner's dilemma game) with emotionally expressive agents. Our results replicate findings that people's decisions are influenced by such emotional displays, but these influences differ depending on the extent to which these displays provoke an affective response. Specifically, we show that an individual difference known as electrodermal lability predicts the extent to which people will engage affectively or strategically with such agents, thereby better predicting their decisions. We discuss implications for designing agent facial expressions to enhance social interaction between humans and agents.

66 The effect of virtual agent's emotion displays and appraisals on people's decision making in negotiation.
de Melo, C., Carnevale, P., & Gratch, J., Proceedings of The 12th International Conference on Intelligent Virtual Agents (IVA 12), 2012

There is growing evidence that emotion displays can impact people's decision making in negotiation. However, despite increasing interest in AI and HCI on negotiation as a means to resolve differences between humans and agents, emotion has been largely ignored. We explore how emotion displays in virtual agents impact people's decision making in human-agent negotiation. This paper presents an experiment (N=204) that studies the effects of virtual agents' displays of joy, sadness, anger and guilt on people's decision to counteroffer, accept or drop out from the negotiation, as well as on people's expectations about the agents' decisions. The paper also presents evidence for a mechanism underlying such effects based on appraisal theories of emotion whereby people retrieve, from emotion displays, information about how the agent is appraising the ongoing interaction and, from this information, make inferences about the agent's intentions and reach decisions themselves. We discuss implications for the design of intelligent virtual agents that can negotiate effectively.

67 Bayesian model of the social effects of emotion in decision-making in multiagent systems.
de Melo, C., Carnevale, P., Read, S., Antos, D., & Gratch, J., Proceedings of Autonomous Agents and Multiagent Systems (AAMAS 12), 2012

Research in the behavioral sciences suggests that emotion can serve important social functions and that, more than a simple manifestation of internal experience, emotion displays communicate one's beliefs, desires and intentions. In a recent study we have shown that, when engaged in the iterated prisoner's dilemma with agents that display emotion, people infer, from the emotion displays, how the agent is appraising the ongoing interaction (e.g., is the situation favorable to the agent? Does it blame me for the current state-of-affairs?). From these appraisals people, then, infer whether the agent is likely to cooperate in the future. In this paper we propose a Bayesian model that captures this social function of emotion. The model supports probabilistic predictions, from emotion displays, about how the counterpart is appraising the interaction which, in turn, lead to predictions about the counterpart's intentions. The model's parameters were learnt using data from the empirical study. Our evaluation indicated that considering emotion displays improved the model's ability to predict the counterpart's intentions, in particular, how likely it was to cooperate in a social dilemma. Using data from another empirical study where people made inferences about the counterpart's likelihood of cooperation in the absence of emotion displays, we also showed that the model could, from information about appraisals alone, make appropriate inferences about the counterpart's intentions. Overall, the paper suggests that appraisals are valuable for computational models of emotion interpretation. The relevance of these results for the design of multiagent systems where agents, human or not, can convey or recognize emotion is discussed.

68 Reverse appraisal: The importance of appraisals for the effect of emotion displays on people's decision-making in a social dilemma.
de Melo, C., Carnevale, P., Read, S., & Gratch, J., Annual Meeting of the Cognitive Science Society (CogSci 12), 2012

Two studies are presented that explore the interpersonal effect of emotion displays in decision making in a social dilemma. Experiment 1 (N=405) showed that facial displays of emotion (joy, sadness, anger and guilt) had an effect on perception of how the person was appraising the social dilemma outcomes (perception of appraisals) and on perception of how likely the person was to cooperate in the future (perception of cooperation). Experiment 1 also showed that perception of appraisals (partially and, in some cases, fully) mediated the effect of emotion displays on perception of cooperation. Experiment 2 (N=202) showed that manipulating perception of appraisals, by expressing them textually, produced an effect on perception of cooperation, thus providing evidence for a causal model where emotion displays cause perception of appraisals which, in turn, cause perception of cooperation. In line with Hareli and Hess' (2010) findings and a social-functions view of emotion, we advance the reverse appraisal proposal that argues people can infer, from emotion displays, how others are appraising a situation which, in turn, supports inferences that are relevant for decision making. We discuss implications of these results and proposal for decision and emotion theory.

69 A computer model of the interpersonal effect of emotion displayed in a social dilemma.
de Melo, C., Carnevale, P., Antos, D., & Gratch, J., Proceedings of Affective Computing and Intelligent Interaction (ACII 11), 2011

The paper presents a computational model for decision-making in a social dilemma that takes into account the other party's emotion displays. The model is based on data collected in a series of recent studies where participants play the iterated prisoner's dilemma with agents that, even though following the same action strategy, show different emotion displays according to how the game unfolds. We collapse data from all these studies and fit, using maximum likelihood estimation, probabilistic models that predict likelihood of cooperation in the next round given different features. Model 1 predicts based on round outcome alone. Model 2 predicts based on outcome and emotion displays. Model 3 also predicts based on outcome and emotion but considers contrast effects found in the empirical studies regarding the order in which participants play cooperators and non-cooperators. To evaluate the models, we replicate the original studies but substitute the models for the humans. The results reveal that Model 3 best replicates human behavior in the original studies and Model 1 does the worst. The results, first, emphasize recent research about the importance of nonverbal cues in social dilemmas and, second, reinforce that people attend to contrast effects in their decision-making. Theoretically, the model provides further insight into how people behave in social dilemmas. Pragmatically, the model could be used to drive an agent that is engaged in a social dilemma with a human (or another agent).
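
To make the idea of an outcome-only predictor such as Model 1 concrete, here is a minimal sketch that assumes one Bernoulli parameter per round outcome. Under that assumption the maximum likelihood estimate is simply the observed cooperation frequency for each outcome. The parameterization, the function name, and the toy data are illustrative assumptions, not the paper's actual model or data.

```python
from collections import defaultdict


def fit_cooperation_rates(rounds):
    """MLE of P(cooperate next | outcome) for a per-outcome Bernoulli model.

    `rounds` is an iterable of (outcome, cooperated_next) pairs. The maximum
    likelihood estimate for each outcome is the observed cooperation frequency.
    """
    counts = defaultdict(lambda: [0, 0])  # outcome -> [cooperations, total]
    for outcome, cooperated_next in rounds:
        counts[outcome][0] += int(cooperated_next)
        counts[outcome][1] += 1
    return {outcome: c / n for outcome, (c, n) in counts.items()}


# Made-up observations; outcomes are (participant move, agent move),
# with C = cooperate and D = defect.
toy_rounds = [
    ("CC", True), ("CC", True), ("CC", True), ("CC", False),
    ("CD", False), ("CD", True),
    ("DD", False), ("DD", False),
]
rates = fit_cooperation_rates(toy_rounds)
```

Extending the key from the outcome alone to an (outcome, emotion display) pair would give a sketch in the spirit of Model 2, again under the same per-cell Bernoulli assumption.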

70 The effect of expression of anger and happiness in computer agents on negotiations with humans.
de Melo, C., Carnevale, P., & Gratch, J., Proceedings of Autonomous Agents and Multiagent Systems (AAMAS 11), 2011

There is now considerable evidence in social psychology, economics, and related disciplines that emotion plays an important role in negotiation. For example, humans make greater concessions in negotiation to an opposing human who expresses anger, and they make fewer concessions to an opponent who expresses happiness, compared to a no-emotion-expression control. However, in AI, despite the wide interest in negotiation as a means to resolve differences between agents and humans, emotion has been largely ignored. This paper explores whether expression of anger or happiness by computer agents, in a multi-issue negotiation task, can produce effects that resemble effects seen in human-human negotiation. The paper presents an experiment where participants play with agents that express emotions (anger vs. happiness vs. control) through different modalities (text vs. facial displays). An important distinction in our experiment is that participants are aware that they negotiate with computer agents. The data indicate that the emotion effects observed in past work with humans also occur in agent-human negotiation, and occur independently of modality of expression. The implications of these results are discussed for the fields of automated negotiation, intelligent virtual agents and artificial intelligence.

71 The influence of emotion expression on perceptions of trustworthiness in negotiation.
Antos, D., de Melo, C., Gratch, J., & Grosz, B., Proceedings of The 25th Conference on Artificial Intelligence (AAAI 11), 2011

When interacting with computer agents, people make inferences about various characteristics of these agents, such as their reliability and trustworthiness. These perceptions are significant, as they influence people's behavior towards the agents, and may foster or inhibit repeated interactions between them. In this paper we investigate whether computer agents can use the expression of emotion to influence human perceptions of trustworthiness. In particular, we study human-computer interactions within the context of a negotiation game, in which players make alternating offers to decide on how to divide a set of resources. A series of negotiation games between a human and several agents is then followed by a “trust game.” In this game people have to choose one among several agents to interact with, as well as how much of their resources they will trust to it. Our results indicate that, among those agents that displayed emotion, those whose expression was in accord with their actions (strategy) during the negotiation game were generally preferred as partners in the trust game over those whose emotion expressions and actions did not mesh. Moreover, we observed that when emotion does not carry useful new information, it fails to strongly influence human decision-making behavior in a negotiation setting.

72 Reverse appraisal: Inferring from emotion displays who is the cooperator and the competitor in a social dilemma.
de Melo, C., Carnevale, P., & Gratch, J., Annual Meeting of the Cognitive Science Society (CogSci 11), 2011

This paper explores whether and how facial displays of emotion can impact emergence of cooperation in a social dilemma. Three experiments are described where participants play the iterated prisoner's dilemma with (computer) players that display emotion. Experiment 1 compares a cooperative player, whose displays reflect a goal of mutual cooperation, with a control player that shows no emotion. Experiment 2 compares a competitive player, whose displays reflect a goal of getting more points than the participant, and the control player. Experiment 3 compares the cooperative and competitive players. Results show that people: cooperate more with the cooperative than the control player (Experiment 1), do not cooperate differently with the competitive and control players (Experiment 2), and cooperate more with the cooperative than the competitive player, when they play the latter first (Experiment 3). In line with a social functions view of emotion, we argue people infer, from emotion displays, the other player's propensity to cooperate by reversing the emotion appraisal process. Post-game surveys show that people interpret the emotion displays according to appraisal variables (desirability, responsibility and controllability) in ways that are consistent with predictions from appraisal theories of emotion.

73 These are ours: The effects of ownership and groups on property negotiation.
Carnevale, P., Kim, Y., de Melo, C., Dehghani, M., & Gratch, J., Annual Conference of the International Association for Conflict Management (IACM 11), 2011

Ownership tends to affect negotiation by increasing the value that the negotiator places on the objects being negotiated. In this study, we invented a new computer-controlled negotiation task that presents negotiators pictures of objects on a screen and the negotiators grab the objects, or give them to an opponent, using a mouse. We experimentally varied ownership, telling negotiators in one case that they owned the objects (but needed the other's agreement on the distribution of the objects), or the other owned the objects (but their agreement was needed for distribution), or neither party owned the objects (and both had to agree on the distribution). We also varied whether negotiations were conducted by 3-person groups, or by individuals, and we varied the opponent's behavior in the negotiation (the other consistently demanded almost all the objects, hardly demanded any, or was totally responsive with a Tit-for-Tat strategy on the objects). We also varied the value of the objects, thus giving the task an integrative structure. One result was that groups were more likely than individuals to match the opponent's competitiveness, but only when ownership of the objects was undefined. Ownership, either self, or other, attenuated differences between groups and individuals, an effect not observable in studies that use abstract negotiation tasks or prisoner-dilemma–type games.

74 The influence of autonomic signals on perception of emotions in embodied agents.
de Melo, C., Kenny, P., & Gratch, J., Applied Artificial Intelligence, 24(6), 494-509, 2010

Specific patterns of autonomic activity have been reported when people experience emotions. Typical autonomic signals that change with emotion are wrinkles, blushing, sweating, tearing, and respiration. This article explores whether these signals can also influence the perception of emotion in embodied agents. The article first reviews the literature on specific autonomic signal patterns associated with certain affective states. Next, it proceeds to describe a real-time model for wrinkles, blushing, sweating, tearing, and respiration that is capable of implementing those patterns. Two studies are then described. In the first, subjects compare surprise, sadness, anger, shame, pride, and fear expressed in an agent with or without blushing, wrinkles, sweating, or tears. In the second, subjects compare excitement, relaxation, focus, pain, relief, boredom, anger, fear, panic, disgust, surprise, startle, sadness, and joy expressed in an agent with or without typical respiration patterns. The first study shows a statistically significant positive effect on perception of surprise, sadness, anger, shame, and fear. The second study shows a statistically significant positive effect on perception of excitement, pain, relief, boredom, anger, fear, panic, disgust, and startle. The relevance of these results to artificial intelligence and intelligent virtual agents is discussed.

75 Real-time expression of affect through respiration.
de Melo, C., Kenny, P., & Gratch, J., Computer Animation and Virtual Worlds, 21(3-4), 225-234, 2010

Affect has been shown to influence respiration in people. This paper takes this insight and proposes a real-time model to express affect through respiration in virtual humans. Fourteen affective states are explored: excitement, relaxation, focus, pain, relief, boredom, anger, fear, panic, disgust, surprise, startle, sadness, and joy. Specific respiratory patterns are described from the literature for each of these affective states. Then, a real-time model of respiration is proposed that uses morphing to animate breathing and provides parameters to control respiration rate, respiration depth and the respiration cycle curve. These parameters are used to implement the respiratory patterns. Finally, a within-subjects study is described where subjects are asked to classify videos of the virtual human expressing each affective state with or without the specific respiratory patterns. The study was presented to 41 subjects and the results show that the model improved perception of excitement, pain, relief, boredom, anger, fear, panic, disgust, and startle.

76 The influence of emotions in embodied agents on human decision-making.
de Melo, C., Carnevale, P., & Gratch, J., Proceedings of Intelligent Virtual Agents (IVA 10), 2010

Acknowledging the social functions that emotions serve, there has been growing interest in the interpersonal effect of emotion in human decision making. Following the paradigm of experimental games from social psychology and experimental economics, we explore the interpersonal effect of emotions expressed by embodied agents on human decision making. The paper describes an experiment where participants play the iterated prisoner's dilemma against two different agents that play the same strategy (tit-for-tat), but communicate different goal orientations (cooperative vs. individualistic) through their patterns of facial displays. The results show that participants are sensitive to differences in the facial displays and cooperate significantly more with the cooperative agent. The data indicate that emotions in agents can influence human decision making and that the nature of the emotion, as opposed to mere presence, is crucial for these effects. We discuss the implications of the results for designing human-computer interfaces and understanding human-human interaction.
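
For readers unfamiliar with the game, the following minimal sketch shows the action side of such an agent: tit-for-tat in the iterated prisoner's dilemma. The payoff values are the conventional textbook ones (5, 3, 1, 0) and are an assumption, as are the function names; the paper's actual payoffs and its facial-display logic are not reproduced here.

```python
# Assumed payoff matrix: (participant move, agent move) -> (participant, agent) points.
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

def tit_for_tat(opponent_history):
    """Cooperate on the first round, then copy the opponent's previous move."""
    return "C" if not opponent_history else opponent_history[-1]

def play(participant_moves):
    """Play a fixed sequence of participant moves against a tit-for-tat agent."""
    agent_moves = []
    participant_score = agent_score = 0
    for i, move in enumerate(participant_moves):
        # The agent only sees the participant's moves from earlier rounds.
        agent_move = tit_for_tat(participant_moves[:i])
        agent_moves.append(agent_move)
        p, a = PAYOFFS[(move, agent_move)]
        participant_score += p
        agent_score += a
    return agent_moves, participant_score, agent_score

agent_moves, participant_score, agent_score = play(["C", "D", "D", "C"])
print(agent_moves)                     # ['C', 'C', 'D', 'D']
print(participant_score, agent_score)  # 9 9
```

Because both agents in the experiment play this same strategy, any difference in participant cooperation has to come from the emotion displays rather than from the agents' actions.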

77 Evolving expression of emotions through color in virtual humans using genetic algorithms.
de Melo, C., & Gratch, J., Proceedings of the 1st International Conference on Computational Creativity (ICCC 10), 2010

For centuries artists have been exploring the formal elements of art (lines, space, mass, light, color, sound, etc.) to express emotions. This paper takes this insight to explore new forms of expression for virtual humans which go beyond the usual bodily, facial and vocal expression channels. In particular, the paper focuses on how to use color to influence the perception of emotions in virtual humans. First, a lighting model and filters are used to manipulate color. Next, an evolutionary model, based on genetic algorithms, is developed to learn novel associations between emotions and color. An experiment is then conducted where non-experts evolve mappings for joy and sadness, without being aware that genetic algorithms are used. In a second experiment, the mappings are analyzed with respect to their features and how general they are. Results indicate that the average fitness increases with each new generation, thus suggesting that people are succeeding in creating novel and useful mappings for the emotions. Moreover, the results show consistent differences between the evolved images of joy and the evolved images of sadness.

78 Expression of emotions using wrinkles, blushing, sweating and tears.
de Melo, C., & Gratch, J., Proceedings of the Intelligent Virtual Agents (IVA 09), 2009

Wrinkles, blushing, sweating and tears are physiological manifestations of emotions in humans. Therefore, the simulation of these phenomena is important for the goal of building believable virtual humans which interact naturally and effectively with humans. This paper describes a real-time model for the simulation of wrinkles, blushing, sweating and tears. A study is also conducted to assess the influence of the model on the perception of surprise, sadness, anger, shame, pride and fear. The study follows a repeated-measures design where subjects compare how well each emotion is expressed by virtual humans with or without these phenomena. The results reveal a significant positive effect on the perception of surprise, sadness, anger, shame and fear. The relevance of these results is discussed for the fields of virtual humans and expression of emotions.

79 Expression of moral emotions in cooperating agents.
de Melo, C., Zheng, L., & Gratch, J., Proceedings of Intelligent Virtual Agents (IVA 09), 2009

Moral emotions have been argued to play a central role in the emergence of cooperation in human-human interactions. This work describes an experiment which tests whether this insight carries over to virtual human-human interactions. In particular, the paper describes a repeated-measures experiment where subjects play the iterated prisoner's dilemma with two versions of the virtual human: (a) neutral, which is the control condition; (b) moral, which is identical to the control condition except that the virtual human expresses gratitude, distress, remorse, reproach and anger through the face according to the action history of the game. Our results indicate that subjects cooperate more with the virtual human in the moral condition and that they perceive it to be more human-like. We discuss the relevance these results have for building agents which are successful in cooperating with humans.

80 The effect of color on expression of joy and sadness in virtual humans.
de Melo, C., & Gratch, J., Proceedings of the Affective Computing and Intelligent Interaction (ACII 09), 2009

For centuries artists have been exploring color to express emotions. Following this insight, the paper describes an approach to learn how to use color to influence the perception of emotions in virtual humans. First, a model of lighting and filters inspired by the visual arts is integrated with a virtual human platform to manipulate color. Next, an evolutionary model, based on genetic algorithms, is created to evolve mappings between emotions and lighting and filter parameters. A first study is then conducted where subjects evolve mappings for joy and sadness without being aware of the evolutionary model. In a second study, the features which characterize the mappings are analyzed. Results show that virtual human images of joy tend to be brighter, more saturated and have more colors than images of sadness. The paper discusses the relevance of the results for the fields of expression of emotions and virtual humans.

81 Creative expression of emotions in virtual humans.
de Melo, C., & Gratch, J., Proceedings of the International Conference on the Foundations of Digital Games (FDG 09), 2009

We summarize our work on creative expression of emotion based on techniques from the arts.

82 Modeling gesticulation expression in virtual humans.
de Melo, C., & Paiva, A., N. Magnenat-Thalmann, L. Jain, & N. Ichalkaranje (Eds.), New Advances in Virtual Humans, 133-151, 2008

Gesticulation is the kind of unconscious, idiosyncratic and unconventional gestures humans do in conversation or narration. This chapter reviews efforts made to harness the expressiveness of gesticulation in virtual humans and proposes one such model. First, psycholinguistics research is overviewed so as to understand how gesticulation occurs in humans. Then, relevant computer graphics and computational psycholinguistics systems are reviewed. Finally, a model for virtual human gesticulation expression is presented which supports: (a) real-time gesticulation animation described as sequences of constraints on static (Portuguese Sign Language hand shapes, orientation palm axis, orientation angle and handedness) and dynamic features; (b) synchronization between gesticulation and synthesized speech; (c) automatic reproduction of annotations in GestuRA, a gesticulation transcription algorithm; (d) expression control through an abstract integrated synchronized language – Expression Markup Language (EML). Two studies, which were conducted to evaluate the model in a storytelling context, are also described.

83 Evolutionary expression of emotions in virtual humans using lights and pixels.
de Melo, C., & Paiva, A., J. Tao & T. Tan (Eds.), Affective Information Processing, 313-336, 2008

Artists express emotions through art. To accomplish this they rely on lines, shapes, textures, color, light, sounds, music, words and the body. The virtual humans field has been neglecting the kind of expression we see in the arts. In fact, researchers have tended to focus on gesture, face and voice for the expression of emotions. But why limit ourselves to the body? In this context, drawing on accumulated knowledge from the arts, this chapter describes an evolutionary model for the expression of emotions in virtual humans using lights, shadows, filters and composition. Lighting expression uses lighting techniques from the visual arts to convey emotions through the lights in the environment. Screen expression uses filters and composition to manipulate the virtual human's pixels themselves in a way akin to painting. Emotions are synthesized using the OCC model. To learn how to map affective states into lighting and screen expression, an evolutionary model which relies on genetic algorithms is used. The crossover and mutation operators generate alternatives for the expression of some affective state and a critic ensemble, composed of artificial and human critics, selects among the alternatives.
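
The evolutionary loop described above can be sketched roughly as follows. This is a generic genetic algorithm under stated assumptions, not the chapter's implementation: the genome is a short list of values in [0, 1] standing in for lighting and filter parameters, the two critics are hypothetical artificial stand-ins (the chapter also uses human critics), the ensemble simply averages critic scores, and the best genome is carried over unchanged between generations.

```python
import random

def crossover(a, b, rng):
    """Single-point crossover: head of one parent, tail of the other."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(genome, rng, rate=0.1):
    """Resample each gene with probability `rate`."""
    return [rng.random() if rng.random() < rate else g for g in genome]

def ensemble_score(genome, critics):
    """Assumed ensemble rule: average of the critics' scores."""
    return sum(critic(genome) for critic in critics) / len(critics)

def evolve(critics, genome_len=4, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(genome_len)]
                  for _ in range(pop_size)]
    best_per_generation = []
    for _ in range(generations):
        ranked = sorted(population,
                        key=lambda g: ensemble_score(g, critics),
                        reverse=True)
        best_per_generation.append(ensemble_score(ranked[0], critics))
        parents = ranked[: pop_size // 2]  # the critics select the top half
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - 1)
        ]
        population = [ranked[0]] + children  # keep the current best unchanged
    best = max(population, key=lambda g: ensemble_score(g, critics))
    return best, best_per_generation

# Hypothetical critics: one rewards a high first gene ("brightness"),
# the other a high second gene ("saturation").
critics = [lambda g: g[0], lambda g: g[1]]
best, history = evolve(critics)
print(history[0] <= history[-1])  # True: the best score never decreases
```

Keeping the best genome each generation is a design choice of this sketch; it guarantees that the best ensemble score is non-decreasing, which makes the loop easy to check.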

84 Evolving expression of emotions in virtual humans using lights and pixels.
de Melo, C., & Gratch, J., Proceedings of Intelligent Virtual Agents (IVA 08), 2008

We summarize our work on using genetic algorithms to evolve emotion expression through lighting and color.

85 Expression of emotions in virtual humans using lights, shadows, composition and filters.
de Melo, C., & Paiva, A., Proceedings of Affective Computing and Intelligent Interaction (ACII 07), 2007

Artists use words, lines, shapes, color, sound and their bodies to express emotions. Virtual humans use postures, gestures, face and voice to express emotions. Why are they limiting themselves to the body? The digital medium affords the expression of emotions using lights, camera, sound and the pixels in the screen itself. Thus, leveraging accumulated knowledge from the arts, this work proposes a model for the expression of emotions in virtual humans which goes beyond embodiment and explores lights, shadows, composition and filters to convey emotions. First, the model integrates the OCC emotion model for emotion synthesis. Second, the model defines a pixel-based lighting model which supports extensive expressive control of lights and shadows. Third, the model explores the visual arts techniques of composition in layers and filtering to manipulate the virtual human pixels themselves. Finally, the model introduces a markup language to define mappings between emotional states and multimodal expression.

86 Multimodal expression in virtual humans.
de Melo, C., & Paiva, A., Computer Animation and Virtual Worlds, 17(3-4), 1-10, 2006

This work proposes a real-time virtual human multimodal expression model. Five modalities explore the affordances of the body: deterministic, non-deterministic, gesticulation, facial, and vocal expression. Deterministic expression is keyframe body animation. Non-deterministic expression is robotics-based procedural body animation. Vocal expression is voice synthesis, through Festival, and parameterization, through SABLE. Facial expression is lip-synch and emotion expression through a parametric muscle-based face model. Inspired by psycholinguistics, gesticulation expression is the animation of unconventional, idiosyncratic, and unconscious hand gestures, described as sequences of Portuguese Sign Language hand shapes, positions and orientations. Inspired by the arts, one modality goes beyond the body to explore the affordances of the environment and express emotions through camera, lights, and music. To control multimodal expression, this work proposes a high-level integrated synchronized markup language: the expressive markup language. Finally, three studies, involving a total of 197 subjects, evaluated the model in storytelling contexts and produced promising results.

87 Mainstream games in the multi-agent classroom.
de Melo, C., Prada, R., Raimundo, G., Pardal, J., Pinto, H., & Paiva, A., Proceedings of IEEE/WIC/ACM Intelligent Agent Technology (IAT 06), 2006

Computer games make learning fun and support learning through doing. Edutainment software tries to capitalize on this; however, it has failed to reach the levels of motivation and engagement seen in mainstream games. In this context, we have integrated a mainstream first-person shooter game, Counter-Strike, into the curriculum of our Autonomous Agents and Multi-agent Systems course. In this paper we describe this integration and a platform to support the creation of Counter-Strike agents. In addition, a questionnaire was posed to our students to assess the success of our approach. Results show that students found the idea of applying a first-person-shooter game motivating and the integration with the curriculum useful for their education.

88 A story about gesticulation expression.
de Melo, C., & Paiva, A., Proceedings of the Intelligent Virtual Agents Conference (IVA 06), 2006

Gesticulation is essential for the storytelling experience; thus, virtual storytellers should be endowed with gesticulation expression. This work proposes a gesticulation expression model based on psycholinguistics. The model supports: (a) real-time gesticulation animation described as sequences of constraints on static (Portuguese Sign Language hand shapes, orientations and positions) and dynamic (motion profiles) features; (b) multimodal synchronization between gesticulation and speech; (c) automatic reproduction of annotated gesticulation according to GestuRA, a gesture transcription algorithm. To evaluate the model, two studies, involving 147 subjects, were conducted. In both cases, the idea consisted of comparing the narration of the Portuguese traditional story “The White Rabbit” by a human storyteller with a version by a virtual storyteller. Results indicate that synthetic gestures fared well when compared to real gestures; however, subjects preferred the human storyteller.

89 Environment expression: Expressing emotions through cameras, lights and music.
de Melo, C., & Paiva, A., Proceedings of Affective Computing and Intelligent Interaction (ACII 05), 2005

Environment expression is about going beyond the usual human emotion expression channels in virtual worlds. This work proposes an integrated storytelling model – the environment expression model – capable of expressing emotions through three channels: cinematography, illumination and music. Stories are organized into prioritized points of interest which can be characters or dialogues. Characters synthesize cognitive emotions based on the OCC emotion theory. Dialogues have collective emotional states which reflect the participants' emotional state. During storytelling, at each instant, the highest priority point of interest is focused through the expression channels. The cinematography channel and the illumination channel reflect the point of interest's strongest emotion type and intensity. The music channel reflects the valence of the point of interest's mood. Finally, a study was conducted to evaluate the model. Results confirm the influence of environment expression on emotion perception and reveal moderate success of this work's approach.
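
The per-instant selection rule in this abstract can be illustrated with a small sketch. The class `PointOfInterest`, its fields, the function `select_expression`, and the example values are hypothetical; only the rule itself (focus the highest priority point of interest, drive cinematography and illumination from its strongest emotion, drive music from its mood valence) comes from the abstract.

```python
from dataclasses import dataclass

@dataclass
class PointOfInterest:
    name: str
    priority: int
    emotions: dict       # emotion type -> intensity (hypothetical encoding)
    mood_valence: float  # negative or positive mood (hypothetical encoding)

def select_expression(points):
    """Pick what each expression channel should reflect at this instant."""
    focus = max(points, key=lambda p: p.priority)
    emotion, intensity = max(focus.emotions.items(), key=lambda kv: kv[1])
    return {
        "focus": focus.name,
        "cinematography": (emotion, intensity),
        "illumination": (emotion, intensity),
        "music": "positive" if focus.mood_valence >= 0 else "negative",
    }

# Hypothetical story state with one character and one dialogue.
points = [
    PointOfInterest("hero", 1, {"joy": 0.4, "hope": 0.2}, 0.5),
    PointOfInterest("argument", 3, {"anger": 0.8, "fear": 0.3}, -0.6),
]
expression = select_expression(points)
print(expression["focus"])           # argument
print(expression["cinematography"])  # ('anger', 0.8)
print(expression["music"])           # negative
```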

90 Environment expression: Telling stories through cameras, lights and music.
de Melo, C., & Paiva, A., Proceedings of The International Conference on Virtual Storytelling (ICVS 05), 2005

This work proposes an integrated model – the environment expression model – which supports storytelling through three channels: cinematography, illumination and music. Stories are modeled as a set of points of interest which can be characters, dialogues or sceneries. At each instant, the audience's focus is drawn to the highest priority point of interest. Expression channels reflect the type and emotional state of this point of interest. A study, using a cartoon-like application, was also conducted to evaluate the model. Results were inconclusive regarding influence on story interpretation but succeeded in showing preference for stories told with environment expression.