GCOTraj: A storage approach for historical trajectory data sets using grid cells ordering

Research paper by Shengxun Yang, Zhen He; Yi-Ping Phoebe Chen

Indexed on: 01 Jun '18Published on: 28 May '18Published in: Information Sciences


Publication date: August 2018 Source:Information Sciences, Volume 459 Author(s): Shengxun Yang, Zhen He, Yi-Ping Phoebe Chen Vast amounts of trajectory data have been collected due to the popularity of GPS devices. Analyzing this wealth of data is important, thus highlighting the need to efficiently index and store this large amount of data on secondary storage to allow for efficient retrieval. Existing approaches index trajectories based on data partitioning index structures such as R-trees or space partitioning index structures such as quad-trees. R-tree like data structures, when used for indexing trajectories, result in large overlapping minimum bounding boxes and are therefore inefficient for the indexing and storage of large trajectory data sets. Existing approaches based on space partitioning do not allow the tradeoff of time versus space constraints in a way that is sensitive to query patterns. This paper proposes a new indexing and storage approach called GCOTraj, which partitions a large spatio-temporal data space into multi-dimensional grid cells and orders these grid cells in two different ways, first via traditional space filling curves which are not sensitive to query patterns; and second, via the Graph-Based Ordering approach (GBO) which is a state-of-the-art workload-based ordering technique for multidimensional data. GCOTraj clusters and stores trajectories to secondary storage based on the ordering produced by ordering algorithms. A good ordering will result in less disk seeks when retrieving disk blocks to answer a query. In addition, GCOTraj uses an index to spot the targeted data on disk and reduce the redundant data retrieved therefore reducing disk IO. Extensive experiments suggest that GCOTraj outperforms the state-of-the-art trajectory storage scheme TrajStore by a factor of up to 16.07 in IO time to answer range queries.