Advanced Information Management Research Group
Database technology has been extended into many new application areas such as spatio-temporal systems, stream and sensor database systems. Such applications bring with them demands for richer information models, and capabilities to handle larger quantities of heterogeneous data. All of these factors combine to produce a need for improved performance and additional functionality. Faculty and students at the Computer Science Dept. at UNO are actively engaged in research in many such areas including geospatial and spatio-temporal data management, and stream and sensor data management. This research utilizes UNO's 62-node Beowulf cluster and benefits from a close relationship with Naval Research Lab at the Stennis Space Center and the New Orleans District of US Army Corps of Engineers (USACE).
Temporal databases and spatial databases have long been separate, important areas of database research. It is only in recent years that a strong interaction between the two fields has taken place. This is motivated by the well established fact that many real-life problems require spatio-temporal concepts that go beyond traditional research in spatial and temporal databases. The primary goal of spatio-temporal databases is the modeling of a dynamic world, which involves objects whose position, shape and size change over time. Real life examples that need to handle spatio-temporal data include storage and manipulation of ship and plane trajectories, battlefield monitoring, fire or hurricane front monitoring and weather forecasting. Such databases need underlying systems with extended features like query languages, data models, and indexing methods, as compared to traditional databases, mainly because of the complexity of representing and retrieving data. Dr. Abdelguerfi's current work includes the design and implementation of spatio-temporal indexing schemes and the efficient processing of spatial network proximity queries. The Geospatial Information Database (GIDB) Portal System is an outcome of joint research between UNO and NRL's DMAP. investigates native file system support for DEBs, which has a number of benefits over ad hoc modification of digital evidence bags. We are developing an API for DEB-enabled applications and methods for providing DEB access to legacy applications through a DEB-aware file system. This work addresses an urgent need for digital-forensics-aware operating system components that can enhance the consistency, security and performance of investigations.
The size of many geospatial and spatio-temporal databases has grown exponentially in recent years. This increase in size brings with it an increased requirement for additional CPU and I/O resources to handle the querying, retrieval, and viewing of this data. To achieve the required performance levels, large database systems have been increasingly required to make use of parallelism. The parallel shared-nothing architecture, in which the main source of parallelism is partitioned parallelism, is ideally suited for such tasks. Beowulf compute clusters are probably the best-known example of shared nothing machines in existence today. UNO's 62-node Beowulf cluster is currently being used as a parallel geospatial database engine that allows for the seamless viewing, querying, and retrieval of desired geospatial data in a parallel fashion i.e. utilizing the compute and I/O resources of multiple nodes in the cluster.
Researchers in data management have recently recognized the importance of a new class of data-intensive applications. Data for these applications are best modeled via streams of data that require continuous monitoring, rather than persistent data stored in relational tables. Many applications require management of data streams. These include finance, security, and sensor monitoring. Management of data streams poses a number of interesting challenges that are not met by traditional database management systems (DBMS). This has led to the development of Data Stream Management Systems (DSMS) that provide adequate mechanisms to query data streams, monitor data streams and alert humans (or other software) of abnormal activity. An additional requirement for DSMS is to cater to real-time requirements since the decisions made on the basis of the data need to be acted upon in close to real-time. Drs. Chaudhry. Abdelguerfi and Fu are actively involved in stream data management research. This research investigates issues in both system development and algorithm development.
In DSMS main memory indexes are created on data streams. Such indexes undergo very frequent insertion and deletion. This dynamic nature of the index is fundamentally different from traditional disk-based indexing in DBMS, where data is relatively static. Dr. Chaudhry's current work investigates efficient indexing structures for data streams. In particular, techniques are being developed so an index on a data stream is restructured based on stochastic properties of that data stream.
Drs. Bin Fu and Abdelguerfi are interested in developing streaming algorithms for some important problems. Streaming algorithms are becoming more and more important due to the development and the rapid growth of the Internet. These algorithms have applications in many areas such as on-line detection of frequent moments in database, fast identification of timing statistics about networking traffic streams, and real-time discovery of anomalies in video surveillance streams. In the model of streaming computation, an algorithm reads the input data items (or points) one by one in order of their arrival, but the algorithm cannot save all of the input data. It only has some rough sketch about the input data. So, a streaming algorithm is in essence a sub-linear space algorithm. Due to such a sub-linear space requirement, it is often challenging to design efficient streaming algorithms for practical applications. Recently, Dr. Fu applied his width-bounded geometric separator theory to streaming algorithm design. He also works on the lower bound theory of streaming computation, in which he uses his background of complexity theory.
Continuing advances in miniaturization of sensors and accompanying reduction in their cost has led to significant interest in wireless sensor networks (WSN). Applications using WSN require mechanisms to efficiently query and process the data produced by sensors that are distributed over the network. Over the last twenty-five years, researchers in distributed database systems have been developing techniques for processing data distributed over a network. However, there are major differences between WSN and traditional distributed database systems that require major work in adapting existing techniques to WSN and creating new techniques more suitable for querying data in WSN. Aspects such as resource constraints, intermittent network connectivity, vastly differing devices, etc. are intrinsic to WSN but weren't important to traditional distributed database systems. Dr. Chaudhry is investigating execution of join queries in WSN and is examining how the implementation of the traditional join operator in distributed database systems needs to be modified to account for the new aspects specific to WSN.
Sample publications:
- Abdelguerfi, M., Wong, K. F., (Editors), Parallel Database Techniques, IEEE Computer Science Press, "Advances" Series, July 1998.
- Abdelguerfi, M., Lavington, S. (Editors), Emerging Trends in Database and Knowledge-Base Machines: the application of parallel architectures to smart information systems, IEEE Computer Science Press, "Advances" Series, 276 pages, 1995.
- Abdelguerfi, M, et al., A High Performance System for Viewing, Querying, and Retrieval of Geospatial Data Across a Beowulf Cluster, Sixth International Meeting on High Performance Computing for Computational Science Spain, June 2004.
- Abdelguerfi, M., Sood, K., Computational Complexity of Sorting and Joining Relations with Duplicates, IEEE Trans on Knowledge and Data Engineering, 1991, pp.496-503.
- Abdelguerfi, M., Ioup, E., Shaw, K., Sample, J., Tabone, O., Computing Spatial Network Queries using the M-Tree with Road Network Embedding, submitted for publication .
- Abdelguerfi, M., Chen, Z., Fu, B., Almost Tight Bounds on Space Complexity of Approximation Streaming Algorithms for the k-Center Problem, submitted for publication.
- Shaw, K., Sample, J., Abdelguerfi, M., Mahadevan, V., An R*-tree Based Semi-Dynamic Clustering Method for the Efficient Processing of Spatial Join in a Shared-Nothing Parallel Database System, accepted for publication, 7th International Meeting High Performance Computing for Computational Science, Brazil, July 2006.
- Ladner, R, Shaw, K., Abdelguerfi, M: (Editors), Mining Spatio-Temporal Information Systems, Kluwer Academic Publishers, 2002.
- Abdelguerfi, M., Givaudan, J., Shaw, K., Ladner, R., The 2-3TR-tree, A Trajectory-Oriented Index Structure for Fully Evolving Valid-Time Spatio-Temporal Datasets, The 10th ACM Int. Symp. on Advances in GIS, VA, 2002.
- Digital Mapping, Charting and Geodesy Analysis Program (DMAP) Team, http://dmap.nrlssc.navy.mil
- N. Chaudhry, Issues in Stream Data Management, chapter in Stream Data Managment.
- N. Chaudhry, K. Shaw, M. Abdelguerfi, (editors) Stream Data Management, Springer., March 2005.
- N. Chaudhry, Performance of Index Structures for Stochastic Streams, in preparation.
- N. Chaudhry, Execution of Join Queries in Sensor Networks. Funding proposal submitted to LaSPACE-NASA DART Program 2006 Round.
- Bin Fu, Theory and Application of Width Bounded Geometric Separator, In Proceedings of the 23rd International Symposium on Theoretical Aspects of Computer Science (STACS'06), 2006, Lecture Notes in Computer Science 3884, Springer, pp. 277-288.
- Richard Beigel and Bin Fu, Circuits Over PP and PL, Journal of Computer and Systems Sciences, 60(2000), pp. 422-441.
- Bin Fu, With Quasi-linear Queries EXP is Not Polynomial-time Turning Reducible to Sparse Sets. SIAM Journal on Computing, October, 1995, pp.1082-1090.
- Bin Fu, Closeness of NP-hard Sets to Other Complexity Classes. SIAM Journal on Computing, 1994, pp.255-260.
