Liat Ein-Dor, Y. Goldschmidt, et al.
IBM J. Res. Dev
In response to the strong desire of customers to be provided with advance notice of unplanned outages, techniques were developed that detect the occurrence of software aging due to resource exhaustion, estimate the time remaining until the exhaustion reaches a critical level, and automatically perform proactive software rejuvenation of an application, process group, or entire operating system. The resulting techniques are very general and can capture a multitude of cluster system characteristics, failure behavior, and performability measures.
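The abstract does not say how the time remaining until exhaustion is estimated. As a rough illustration only, the sketch below assumes a linear trend: it fits a least-squares line to sampled resource usage and extrapolates to a critical level. The function name `estimate_time_to_exhaustion` and its parameters are hypothetical and do not come from the paper.

```python
from typing import Optional, Sequence


def estimate_time_to_exhaustion(
    times: Sequence[float],
    usage: Sequence[float],
    critical_level: float,
) -> Optional[float]:
    """Estimate the time left until usage reaches critical_level.

    Assumption (not from the paper): usage grows roughly linearly, so a
    least-squares line through the samples can be extrapolated.
    Returns None when the fitted trend is flat or decreasing.
    """
    n = len(times)
    if n < 2 or n != len(usage):
        raise ValueError("need at least two paired samples")

    mean_t = sum(times) / n
    mean_u = sum(usage) / n
    var_t = sum((t - mean_t) ** 2 for t in times)
    if var_t == 0:
        raise ValueError("sample times must not all be equal")

    # Ordinary least-squares slope and intercept.
    slope = sum((t - mean_t) * (u - mean_u) for t, u in zip(times, usage)) / var_t
    if slope <= 0:
        return None  # no upward trend, so no exhaustion is predicted

    intercept = mean_u - slope * mean_t
    t_critical = (critical_level - intercept) / slope

    # Time remaining, measured from the most recent sample.
    return max(0.0, t_critical - times[-1])
```

A monitoring loop could compare the returned value against a rejuvenation lead time and schedule a restart when the estimate falls below it. Real usage data is often noisy or non-linear, so a more robust trend estimator may be more appropriate in practice.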
Fan Jing Meng, Ying Huang, et al.
ICEBE 2007
Michael D. Moffitt
ICCAD 2009
Anupam Gupta, Viswanath Nagarajan, et al.
Operations Research