How Cosmopolitan Are Emojis?
Exploring Emojis Usage and Meaning over Different Languages with Distributional Semantics
Francesco Barbieri*, German Kruszewski+, Francesco Ronzano* and Horacio Saggion*
In the proceedings of the
ACM Multimedia Conference for 2016
Download the paper
Choosing the right emoji to visually complement or condense the meaning of a message has become part of our daily life. Emojis are pictures, which are naturally combined with plain text, thus creating a new form of language. These pictures are the same independently of where we live, but they can be interpreted and used in different ways. In this paper we compare the meaning and the usage of emojis across different languages. Our results suggest that the overall semantics of the subset of the emojis we studied is preserved across all the languages we analysed. However, some emojis are interpreted in a different way from language to language, and this could be related to socio-geographical differences.
We gathered a dataset of about 4.5 million tweets, each containing at least one emoji and geolocated in the USA, the United Kingdom, Spain or Italy. To compare the use and the meaning of emojis across languages, we created semantic models of emojis for American English, British English, Spanish and Italian using the skip-gram algorithm. The following figure shows the t-SNE (300->2) visualisation of the 150 most frequent emojis for each language. Similar emojis get clustered together, although t-SNE can sometimes be misleading...
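For readers unfamiliar with skip-gram, the model learns a vector for each token by predicting the tokens that surround it. The sketch below only shows how (target, context) training pairs could be extracted from one tokenised tweet. The window size, the example tweet and the function name `skipgram_pairs` are illustrative assumptions, not settings taken from the paper.

```python
def skipgram_pairs(tokens, window=2):
    """Return (target, context) pairs for each token and its neighbours within `window`."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a token is never its own context
                pairs.append((target, tokens[j]))
    return pairs


# Hypothetical tokenised tweet; the emoji is written as a plain token.
tweet = ["sunny", "day", "eoji1f60e", "!"]
pairs = skipgram_pairs(tweet, window=1)
for target, context in pairs:
    print(target, "->", context)
```

Because emojis are treated as ordinary tokens, they end up in the same vector space as the words and punctuation they co-occur with.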
In the paper we compare the four vector spaces more formally, but even just looking at the t-SNE plots we can see interesting patterns. For example, it seems that Twitter users from the UK associate the

emoji with beaches and weather (like

,

, and

), while other (Mediterranean) countries like Spain and Italy don't.
Also the emoji

is used differently by UK users, as it's similar to Christmas emojis, while in the other languages

is just another tree (clustered close to the vegetation-like emojis).
Many other emojis are also used differently, like

and

(you can spot the differences in the graphs!).
However, the overall semantics of the emojis seems to be similar, and most of the emojis are used in the same way across the four languages, like

and

,
but also

and

!
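One simple way to put a number on "used in the same way" is to compare an emoji's nearest neighbours in two language-specific spaces. Independently trained spaces cannot be compared coordinate by coordinate, but neighbour sets can. The sketch below is only an illustration of that idea and is not necessarily the formal comparison used in the paper. The 2-dimensional vectors and the token names are made-up toy data.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest(space, token, k):
    """Return the set of the k tokens closest to `token` in `space`."""
    others = [t for t in space if t != token]
    others.sort(key=lambda t: cosine(space[token], space[t]), reverse=True)
    return set(others[:k])


def neighbour_overlap(space_a, space_b, token, k=2):
    """Jaccard overlap of the k nearest neighbours of `token` in two spaces."""
    a = nearest(space_a, token, k)
    b = nearest(space_b, token, k)
    return len(a & b) / len(a | b)


# Toy spaces (invented values): "tree" sits near Christmas tokens in one
# space and near vegetation tokens in the other.
space_uk = {"tree": [1.0, 0.0], "santa": [0.9, 0.1], "gift": [0.8, 0.2],
            "leaf": [0.0, 1.0], "flower": [0.1, 0.9]}
space_es = {"tree": [0.0, 1.0], "santa": [1.0, 0.0], "gift": [0.9, 0.1],
            "leaf": [0.1, 0.9], "flower": [0.2, 0.8]}

tree_overlap = neighbour_overlap(space_uk, space_es, "tree")
santa_overlap = neighbour_overlap(space_uk, space_es, "santa")
print(tree_overlap, santa_overlap)
```

An overlap close to 1 means the emoji keeps the same neighbours in both spaces, while an overlap close to 0 flags an emoji whose usage differs between the two languages.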
(You might also be interested in
this work where we study how to build good vector spaces for emojis).
Downloads
You can find
here a Google Sheets document containing all the results of our experiments that we couldn't fit in the paper.
The dimension of the emoji vectors is 300. We trained the embeddings using all the tokens: words, emojis and punctuation (only links are removed).
In these models, the emoji characters have been replaced by
eojiCODE where
CODE is the Unicode code point of the emoji in hexadecimal (for instance eoji1f60e is
this emoji). See
this for the possible Twitter emojis.
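To look up an emoji in the models you first need its token. A minimal sketch of the conversion follows, assuming lowercase hexadecimal as in the example above and an emoji made of a single code point. How multi-code-point emojis (such as flags) are encoded is not stated here, so the helper rejects them. The function name `emoji_token` is hypothetical.

```python
def emoji_token(emoji, prefix="eoji"):
    """Build the model token for a single-code-point emoji, e.g. eoji1f60e."""
    if len(emoji) != 1:
        # Multi-code-point emojis are out of scope for this sketch.
        raise ValueError("expected an emoji made of exactly one code point")
    return prefix + format(ord(emoji), "x")


token = emoji_token("\U0001F60E")  # U+1F60E, the example used above
print(token)
```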
Gensim binaries
Use the gensim library to load them.
United States of America
United Kingdom
Spain
Italy
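A minimal loading sketch follows. It assumes the files were saved with gensim's own `Word2Vec.save`, which the page does not state explicitly, and files written by an older gensim release may need a compatible gensim version to load. The path in the usage comment and both function names are placeholders.

```python
def load_vectors(path):
    """Load a saved gensim Word2Vec model and return its word vectors."""
    # Imported here so the helper below can be used without gensim installed.
    from gensim.models import Word2Vec
    return Word2Vec.load(path).wv


def emoji_neighbours(vectors, token, topn=10, prefix="eoji"):
    """Return the neighbours of `token` that are themselves emoji tokens."""
    hits = vectors.most_similar(token, topn=topn)
    return [(word, score) for word, score in hits if word.startswith(prefix)]


# Usage (hypothetical file name; replace with a downloaded model):
# vectors = load_vectors("path/to/downloaded_model")
# print(emoji_neighbours(vectors, "eoji1f60e"))
```

Filtering on the `eoji` prefix keeps only emoji neighbours; drop the filter to see the words and punctuation closest to an emoji as well.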
If you use these resources, please cite us

@inproceedings{barbieri2016cosmopolitan,
title={How Cosmopolitan Are Emojis?: Exploring Emojis Usage and Meaning over Different Languages with Distributional Semantics},
author={Barbieri, Francesco and Kruszewski, German and Ronzano, Francesco and Saggion, Horacio},
booktitle={Proceedings of the 2016 ACM on Multimedia Conference},
pages={531--535},
year={2016},
organization={ACM}
}