License: Creative Commons Attribution 3.0 Unported license (CC BY 3.0)
When quoting this document, please refer to the following
DOI: 10.4230/OASIcs.ICCSW.2018.8
URN: urn:nbn:de:0030-drops-101899

Alnafessah, Ahmad ; Casale, Giuliano

Anomaly Detection for Big Data Technologies



The main goal of this research is to contribute to automated performance anomaly detection for large-scale, complex distributed systems, especially Big Data applications in cloud computing. The main points that we will investigate are:
- Automated detection of anomalous performance behavior by identifying the performance metrics that best characterize system behavior.
- Performance anomaly localization: pinpointing the cause of a performance anomaly, whether it arises from internal or external faults.
- Investigation of the feasibility of anomaly prediction. Failure prediction aims to anticipate catastrophic events in the near future, which will enable system developers to use effective monitoring solutions to guarantee system availability.
- Assessment of the potential of hybrid methods for anomaly detection that combine machine learning with traditional methods used in performance analysis.
This research proposal offers me the opportunity to pursue my interest in performance anomaly detection and prediction in greater depth by investigating and applying novel optimization strategies. It also provides an interesting case for applying anomaly detection techniques in a large-scale Big Data and cloud computing environment. Among the various Big Data technologies, in-memory processing technology such as Apache Spark has been widely adopted in industry as a result of its speed, generality, ease of use, and compatibility with other Big Data systems. Although Spark continues to evolve, there is still a shortage of comprehensive performance analyses that are built specifically for Spark and used to detect performance anomalies. This motivates me to address the challenge by investigating new hybrid learning techniques for anomaly detection in large-scale, complex systems, especially in-memory processing Big Data platforms within cloud computing.

BibTeX - Entry

  author =	{Ahmad Alnafessah and Giuliano Casale},
  title =	{{Anomaly Detection for Big Data Technologies}},
  booktitle =	{2018 Imperial College Computing Student Workshop (ICCSW 2018)},
  pages =	{8:1--8:1},
  series =	{OpenAccess Series in Informatics (OASIcs)},
  ISBN =	{978-3-95977-097-2},
  ISSN =	{2190-6807},
  year =	{2019},
  volume =	{66},
  editor =	{Edoardo Pirovano and Eva Graversen},
  publisher =	{Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
  address =	{Dagstuhl, Germany},
  URL =		{},
  URN =		{urn:nbn:de:0030-drops-101899},
  doi =		{10.4230/OASIcs.ICCSW.2018.8},
  annote =	{Keywords: Performance anomalies, Apache Spark, Neural Network, Resilient Distributed Dataset (RDD)}

Keywords: Performance anomalies, Apache Spark, Neural Network, Resilient Distributed Dataset (RDD)
Collection: 2018 Imperial College Computing Student Workshop (ICCSW 2018)
Issue Date: 2019
Date of publication: 25.01.2019
