Research Interests in Databases
My primary research area is in query languages and query optimization
techniques for object-oriented databases.
Commercial database systems offer a high degree of data independence
but they provide limited modeling and expressive power. However, modern
database applications require more advanced data structures and
more expressive operations than those provided by the relational
model. Object-oriented databases (OODBs) provide powerful data
abstractions and modeling facilities but they usually lack a suitable
framework for query processing and optimization.
My research work on databases is aimed at
- developing query algebras and calculi for OODBs that have
enough modeling power to capture multiple collection types, such as
trees and arrays, and that facilitate query evaluation and optimization;
- developing extensible query optimizers for these algebras.
The following summarizes my most recent research projects.
- Algebraic Query Optimization.
- My doctoral dissertation,
supervised by Professor David Stemple,
is chiefly aimed at generating
efficient database systems from high-level specifications. In my
thesis,
I developed a query algebra that supports a very expressive
higher-order traversal mechanism, called the fold operation,
that can be defined for many different collection types (such as a
set, a list, or a tree) [DBPL93].
If some simple, easily recognizable,
syntactic restrictions to this algebra are imposed, then some very
nice properties result. For example, any operation expressed in this
algebra can always be reduced, by a very simple and efficient
algorithm, to a canonical form. This algorithm is called the
normalization algorithm. This canonical form is, in a way, optimal,
since programs expressed in that form do not materialize any
intermediate data structure (which might be generated when expressions
are nested, passing intermediate results from one to another), and
have fewer nested loops than the original programs. The normalization
algorithm is used as the basis for a query optimizer that maps queries
(expressed as folds over the conceptual structures) to physical
algorithms (expressed as folds over the physical structures).
- An Effective Framework for Processing OODB Languages.
- I am a co-principal investigator of a new NSF project
[NSF95],
which is focused on query algebras and optimization techniques for OODB languages.
David Maier,
the principal investigator of this project, and I
introduced a new query calculus, called the monoid comprehension calculus,
based on monoids [SIGMOD95,
OGI94].
A monoid is a general template for a data
type. It can capture most collection and aggregate operators currently
in use for relational and object-oriented databases. Monoid comprehensions
- similar to set-former notation but applicable to
types other than sets - give us a uniform way to express queries that
simultaneously deal with more than one collection type and also
naturally compose in a way that mirrors the allowable query nesting in
many recent object-oriented database languages, such as the object
query language (OQL) of ODMG-93.
The monoid comprehension calculus
has sufficient expressive power to represent most language constructs
present in these languages. To demonstrate the effectiveness of this
calculus, we proposed a framework for physical database design that
automates the query translation process
[DBPL95].
In this framework, the
physical database design is specified in a declarative manner. This
specification is used for generating an efficient query transformer
that translates logical queries into programs that manipulate the
physical database. The physical design language consists of a small
but extensible repertoire of commands for specifying the
implementation for various parts of a database. The query translator
uses these commands to translate queries against the conceptual
database into queries against the physical database. Our physical
design language captures most recent proposals for OODB physical
designs, including clustering, horizontal and vertical partitioning,
normalization, join indices, and multiple access paths via secondary
indices. Future research includes building a query processor for the
object query language (OQL) of the ODMG-93
standard based on this framework.
- Extensible Query Optimizers.
- Another research work at OGI was
focused on architectures for extensible query optimizers. It was part
of the EREQ
project and it is funded by ARPA. The long-term goal of
this project was to develop an architecture for query processing in
persistent object bases and object-oriented databases. This
architecture consists of three components: an object query language
and algebra, a storage management system, and a query optimization
module. David Maier
and I were leading the research for the
development of the query optimizer. We have already developed a
framework for specifying rule-based query optimizers in declarative
form [DOOD93].
This framework is reminiscent of inherited and synthesized
attributes for attribute grammars, and it is expressive of a wide
range of information: logical and physical properties, both desired
and delivered, cost estimates, optimization contexts, and control
strategies. Moreover, we have developed a mechanism for processing
optimizer specifications, which is based on compile-time reflection.
This mechanism proves to be succinct and flexible, allowing
modifications of the specification syntax, incorporation of new
capabilities into generated optimizers, and retargeting the
translation to a variety of optimization frameworks. The optimizer
specification tool has been used by me and my colleagues at OGI in
specifying and implementing a number of query optimizers. I have
already developed four experimental optimizers: one for a subset of
the SQL query language, one for an OODB language (based on TI's open
OODB architecture), one for a pure algebraic language based on
iterations over lists, and one for the the EREQ query algebra. The
latter optimizer is capable of optimizing almost all ODMG'93 OQL
queries, by mapping object traversals into efficient pointer joins.
Back to Fegaras' Home Page