Geographical Topic Discovery and Comparison


Abstract
We study the problem of discovering and comparing geographical topics from GPS-associated documents. GPS-associated documents have become popular with the pervasiveness of location-acquisition technologies. For example, in Flickr, geo-tagged photos are associated with tags and GPS locations. In Twitter, the locations of tweets can be identified from the GPS locations reported by smartphones. Many interesting concepts, including cultures, scenes, and product sales, correspond to specialized geographical distributions. In this work, we are interested in two questions: (1) how to discover different topics of interest that are coherent in geographical regions, and (2) how to compare several topics across different geographical locations. To answer these questions, this work proposes and compares three ways of modeling geographical topics: a location-driven model, a text-driven model, and a novel joint model called LGTA (Latent Geographical Topic Analysis) that combines location and text. To make a fair comparison, we collect several representative datasets from the Flickr website, including Landscape, Activity, Manhattan, National park, Festival, Car, and Food. The results show that the first two methods work on some datasets but fail on others. LGTA works well on all these datasets, not only finding regions of interest but also providing effective comparisons of the topics across different locations. The results confirm our hypothesis that geographical distributions can help in modeling topics, while topics provide important cues for grouping different geographical regions.
Citation
Zhijun Yin, Liangliang Cao, Jiawei Han, Chengxiang Zhai, and Thomas S. Huang
Geographical Topic Discovery and Comparison
World Wide Web (WWW), Hyderabad, India, March 28-April 1, 2011. [pdf]

Working Flow


Model
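To give a rough feel for the location-driven baseline named in the abstract, here is a minimal sketch. It is not the paper's implementation. It assumes that a location-driven approach can be approximated by clustering documents on their GPS coordinates alone and then summarizing each cluster by its most frequent tags. The use of k-means, the Euclidean distance on raw latitude/longitude, the function names, and the toy documents are all assumptions made for illustration.

```python
from collections import Counter
from math import dist


def kmeans_labels(points, k, iters=20):
    """Plain Lloyd's k-means on (lat, lon) tuples with a deterministic
    farthest-point initialization. Euclidean distance on raw coordinates
    is a simplification that ignores the Earth's curvature."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from its nearest existing seed.
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids))
        )

    def assign():
        return [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]

    for _ in range(iters):
        labels = assign()
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return assign()


def regional_topics(docs, labels, top_n=2):
    """Summarize each location cluster by its most frequent tags."""
    counts = {}
    for (_, tags), lab in zip(docs, labels):
        counts.setdefault(lab, Counter()).update(tags)
    return {lab: [t for t, _ in c.most_common(top_n)] for lab, c in counts.items()}


# Hypothetical geo-tagged documents: ((latitude, longitude), tags).
docs = [
    ((40.78, -73.97), ["centralpark", "manhattan"]),
    ((40.75, -73.99), ["manhattan", "skyline"]),
    ((40.71, -74.01), ["manhattan", "bridge"]),
    ((37.82, -122.48), ["goldengate", "bridge"]),
    ((37.80, -122.42), ["goldengate", "bay"]),
]

labels = kmeans_labels([loc for loc, _ in docs], k=2)
topics = regional_topics(docs, labels)
print(topics)
```

Because this sketch looks only at coordinates, a tag such as "bridge" that appears in both regions is never related across them. That limitation is consistent with the abstract's observation that the location-driven and text-driven models work on some datasets but fail on others, which motivates a joint model of location and text.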


Geographical Topics of Food


Geographical Topics of Landscapes


Geographical Topics of Cars