Indexed on: 01 May '04
Published on: 01 May '04
Published in: Distributed and Parallel Databases
The advent of the Internet and the Web, and their subsequent ubiquity, have created opportunities to connect information sources across all types of boundaries (local, regional, organizational, etc.). Examples of such information sources include databases, XML documents, and unstructured sources. Uniform querying of those information sources has been extensively investigated. A major challenge relates to query optimization: querying multiple information sources scattered on the Web raises several barriers to efficiency. These barriers stem from the characteristics of Web information sources, which include volatility, heterogeneity, and autonomy. Those characteristics impede a straightforward application of classical query optimization techniques. They add new dimensions to the optimization problem, such as the choice of objective function, the selection of relevant information sources, limited query capabilities, and unpredictable events. In this paper, we survey current research on the fundamental problems of efficiently processing queries over Web data integration systems. We also outline a classification of optimization techniques and a framework for evaluating them.