Fault tolerance via diversity for off-the-shelf products: A study with SQL database servers

Gashi, I., Popov, P. T. & Strigini, L. (2007). Fault tolerance via diversity for off-the-shelf products: A study with SQL database servers. IEEE Transactions on Dependable and Secure Computing, 4(4), pp. 280-294. doi: 10.1109/TDSC.2007.70208


Abstract

If an off-the-shelf software product exhibits poor dependability due to design faults, then software fault tolerance is often the only way available to users and system integrators to alleviate the problem. Thanks to low acquisition costs, even using multiple versions of software in a parallel architecture, which is a scheme formerly reserved for few and highly critical applications, may become viable for many applications. We have studied the potential dependability gains from these solutions for off-the-shelf database servers. We based the study on the bug reports available for four off-the-shelf SQL servers plus later releases of two of them. We found that many of these faults cause systematic noncrash failures, which is a category ignored by most studies and standard implementations of fault tolerance for databases. Our observations suggest that diverse redundancy would be effective for tolerating design faults in this category of products. Only in very few cases would demands that triggered a bug in one server cause failures in another one, and there were no coincident failures in more than two of the servers. Use of different releases of the same product would also tolerate a significant fraction of the faults. We report our results and discuss their implications, the architectural options available for exploiting them, and the difficulties that they may present.
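The abstract describes running several diverse SQL servers in parallel so that their outputs can be compared. The following is a minimal Python sketch of such an adjudicator. It is not the architecture evaluated in the paper, and it rests on these assumptions: each server's reply is a list of row tuples with hashable values, or `None` if the server crashed or raised an error; row order is ignored, because a query without ORDER BY need not return rows in the same order on different servers; and results are compared by exact equality. The names `normalize`, `adjudicate` and `Verdict` are hypothetical. Comparing full results is what lets such a scheme notice noncrash failures (wrong results returned without any error), which crash-only detection misses.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Row = Tuple[object, ...]
Result = Tuple[Row, ...]


def normalize(rows: Sequence[Row]) -> Result:
    """Make an unordered result set comparable across servers (assumption:
    row order is irrelevant). Rows are sorted by repr so that NULLs (None)
    and mixed types still sort; duplicate rows are kept."""
    return tuple(sorted((tuple(r) for r in rows), key=repr))


@dataclass
class Verdict:
    result: Optional[Result]  # adjudicated result, or None if no strict majority
    unanimous: bool           # True only if every replica returned the same result
    suspects: List[int]       # indices of dissenting or crashed replicas


def adjudicate(replies: Sequence[Optional[Sequence[Row]]]) -> Verdict:
    """Vote over replies from hypothetical diverse replicas (None = crash).

    A strict majority of *all* replicas is required to return a result:
    with two replicas a disagreement is only detected, while with three a
    single deviating or crashed replica is outvoted.
    """
    normalized = [None if r is None else normalize(r) for r in replies]
    counts = Counter(r for r in normalized if r is not None)
    if not counts:
        # Every replica crashed: nothing to vote on.
        return Verdict(None, False, list(range(len(replies))))
    winner, votes = counts.most_common(1)[0]
    if votes * 2 > len(replies):
        suspects = [i for i, r in enumerate(normalized) if r != winner]
        return Verdict(winner, not suspects, suspects)
    # Disagreement detected but cannot be masked: every replica is suspect.
    return Verdict(None, False, list(range(len(replies))))
```

For example, if three replicas return `[(1, 'a'), (2, 'b')]`, `[(2, 'b'), (1, 'a')]` and `[(1, 'a'), (2, 'x')]`, the first two agree after normalization and the third is flagged as a suspect. Exact comparison is a simplification: diverse products can differ in details such as floating-point formatting, so a practical comparator would need product-aware normalization.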

Item Type: Article
Additional Information: © 2007 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works for resale or redistribution to servers or lists, or reuse of any copyrighted components of this work in other works.
Uncontrolled Keywords: fault tolerance, reliability, availability, serviceability, relational databases, error processing, design diversity, COTS software, fault records, noncrash failures, database availability, experimental results, software, systems
Subjects: Q Science > QA Mathematics > QA76 Computer software
Divisions: School of Informatics > Centre for Software Reliability
URI: http://openaccess.city.ac.uk/id/eprint/518
