Research Interests in Databases

My primary research area is in query languages and query optimization techniques for object-oriented databases.

Commercial database systems offer a high degree of data independence but they provide limited modeling and expressive power. However, modern database applications require more advanced data structures and more expressive operations than those provided by the relational model. Object-oriented databases (OODBs) provide powerful data abstractions and modeling facilities but they usually lack a suitable framework for query processing and optimization.

My research work on databases is aimed at

The following summarizes my most recent research projects.

Algebraic Query Optimization.
My doctoral dissertation, supervised by Professor David Stemple, is chiefly aimed at generating efficient database systems from high-level specifications. In my thesis, I developed a query algebra that supports a very expressive higher-order traversal mechanism, called the fold operation, that can be defined for many different collection types (such as a set, a list, or a tree) [DBPL93]. If some simple, easily recognizable, syntactic restrictions to this algebra are imposed, then some very nice properties result. For example, any operation expressed in this algebra can always be reduced, by a very simple and efficient algorithm, to a canonical form. This algorithm is called the normalization algorithm. This canonical form is, in a way, optimal, since programs expressed in that form do not materialize any intermediate data structure (which might be generated when expressions are nested, passing intermediate results from one to another), and have fewer nested loops than the original programs. The normalization algorithm is used as the basis for a query optimizer that maps queries (expressed as folds over the conceptual structures) to physical algorithms (expressed as folds over the physical structures).

An Effective Framework for Processing OODB Languages.
I am a co-principal investigator of a new NSF project [NSF95], which is focused on query algebras and optimization techniques for OODB languages. David Maier, the principal investigator of this project, and I introduced a new query calculus, called the monoid comprehension calculus, based on monoids [SIGMOD95, OGI94]. A monoid is a general template for a data type. It can capture most collection and aggregate operators currently in use for relational and object-oriented databases. Monoid comprehensions - similar to set-former notation but applicable to types other than sets - give us a uniform way to express queries that simultaneously deal with more than one collection type and also naturally compose in a way that mirrors the allowable query nesting in many recent object-oriented database languages, such as the object query language (OQL) of ODMG-93. The monoid comprehension calculus has sufficient expressive power to represent most language constructs present in these languages. To demonstrate the effectiveness of this calculus, we proposed a framework for physical database design that automates the query translation process [DBPL95]. In this framework, the physical database design is specified in a declarative manner. This specification is used for generating an efficient query transformer that translates logical queries into programs that manipulate the physical database. The physical design language consists of a small but extensible repertoire of commands for specifying the implementation for various parts of a database. The query translator uses these commands to translate queries against the conceptual database into queries against the physical database. Our physical design language captures most recent proposals for OODB physical designs, including clustering, horizontal and vertical partitioning, normalization, join indices, and multiple access paths via secondary indices. Future research includes building a query processor for the object query language (OQL) of the ODMG-93 standard based on this framework.

Extensible Query Optimizers.
Another research work at OGI was focused on architectures for extensible query optimizers. It was part of the EREQ project and it is funded by ARPA. The long-term goal of this project was to develop an architecture for query processing in persistent object bases and object-oriented databases. This architecture consists of three components: an object query language and algebra, a storage management system, and a query optimization module. David Maier and I were leading the research for the development of the query optimizer. We have already developed a framework for specifying rule-based query optimizers in declarative form [DOOD93]. This framework is reminiscent of inherited and synthesized attributes for attribute grammars, and it is expressive of a wide range of information: logical and physical properties, both desired and delivered, cost estimates, optimization contexts, and control strategies. Moreover, we have developed a mechanism for processing optimizer specifications, which is based on compile-time reflection. This mechanism proves to be succinct and flexible, allowing modifications of the specification syntax, incorporation of new capabilities into generated optimizers, and retargeting the translation to a variety of optimization frameworks. The optimizer specification tool has been used by me and my colleagues at OGI in specifying and implementing a number of query optimizers. I have already developed four experimental optimizers: one for a subset of the SQL query language, one for an OODB language (based on TI's open OODB architecture), one for a pure algebraic language based on iterations over lists, and one for the the EREQ query algebra. The latter optimizer is capable of optimizing almost all ODMG'93 OQL queries, by mapping object traversals into efficient pointer joins.


Back to Fegaras' Home Page