StreamAlert: A serverless framework for real-time data analysis and alerting

Dec. 19, 2017, 12:54 p.m. By: Kirti Bakshi


StreamAlert, an open source release from Airbnb, is a serverless, real-time data analysis framework that lets users ingest, analyze, and alert on data from any environment, using data sources and alerting logic that they define.

What sets StreamAlert apart is that it is serverless, scales to terabytes per hour, and automates its infrastructure deployment. It is also secure by default.

StreamAlert came about because Airbnb needed a product that empowered both engineers and administrators to ingest, analyze, and alert on data in real time from their respective environments.

No existing product met that need. This led to StreamAlert, an open source release designed around the following requirements:

  • Deployment should be automated: simple, safe, and repeatable for any AWS account.

  • It should be easily scalable.

  • Infrastructure maintenance should be minimal, with no DevOps expertise required.

  • Infrastructure should be secure by default, with no security expertise required.

  • It should support data from different environments (e.g., IT, Engineering).

  • It should support data from different environment types (e.g., Cloud, Datacenter, Office).

  • It should support different types of data (e.g., JSON, CSV, Key-Value, or Syslog).

  • It should support different use cases, such as security, infrastructure, compliance, and more.

As partially outlined above, StreamAlert offers some unique benefits:

  • Serverless: StreamAlert uses AWS Lambda, so there are no new servers to manage, patch, or harden.

  • Scalable: It uses Amazon Kinesis Streams, which scales from megabytes to terabytes per hour and from thousands to millions of PUT records per second.

  • Automated: It uses Terraform, so infrastructure and supporting services are represented as code and deployed through automation.

  • Secure: It uses secure transport (TLS), performs data analysis in a container or sandbox, segments data per defined environment, and uses role-based access control (RBAC).

  • Open Source: Anyone can use or contribute to StreamAlert.
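To make the Lambda and Kinesis pairing concrete, here is a minimal sketch of a Lambda function that decodes a batch of Kinesis records. It is not StreamAlert's actual handler: the function name `handler` and the assumption that every payload is a JSON object are illustrative only. The event layout (`Records[].kinesis.data`, base64-encoded) is the standard shape Lambda receives from a Kinesis stream.

```python
import base64
import json


def handler(event, context=None):
    """Decode every Kinesis record in a Lambda event into a dict.

    Assumption for illustration: each record payload is a JSON object.
    """
    decoded = []
    for record in event.get("Records", []):
        # Lambda delivers Kinesis payloads base64-encoded.
        raw = base64.b64decode(record["kinesis"]["data"])
        decoded.append(json.loads(raw.decode("utf-8")))
    return decoded
```

Because the function only touches the event it is given, it can be exercised locally with a hand-built event before any infrastructure exists.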

Turning to the use cases:

The image below shows some of the example datasets that StreamAlert can analyze:

[Image: example datasets that StreamAlert can analyze]

The ultimate aim of StreamAlert is to be as agnostic as possible in order to support the widest range of data analysis and alerting use cases.

At a high level, StreamAlert supports the following:

  • Any Source: StreamAlert can accept data from an S3 bucket or from any agent or service that supports sending to Amazon Kinesis Streams. Examples: fluentd, logstash, aws-kinesis-agent, or any language supported by the AWS SDK.

  • Any Operating System: StreamAlert can accept data from any device that supports log forwarding (Linux, macOS, Windows, …).

  • Any Environment: StreamAlert can accept data from any environment that has internet connectivity (Cloud, Datacenter, Office, Hybrid).
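As an illustration of the "any language supported by the AWS SDK" path, the sketch below sends one event to a Kinesis stream from Python using boto3's `put_record` call. The helper names, the stream name, and the choice of partition key are hypothetical; the send step also assumes boto3 is installed and AWS credentials are configured.

```python
import json


def build_put_record_args(event, stream_name, partition_key):
    """Build the keyword arguments for the Kinesis PutRecord call."""
    return {
        "StreamName": stream_name,
        "Data": json.dumps(event).encode("utf-8"),  # Kinesis expects bytes
        "PartitionKey": partition_key,
    }


def send_event(event, stream_name, partition_key):
    """Send one event to a Kinesis stream (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helper above works without it

    client = boto3.client("kinesis")
    return client.put_record(
        **build_put_record_args(event, stream_name, partition_key)
    )
```

A caller might use `send_event({"host": "web-1", "msg": "login failed"}, "example_stream", "web-1")`, where `example_stream` stands in for whatever stream the deployment created.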

From a data perspective, StreamAlert supports formats such as JSON, CSV, Key-Value, and Syslog. It also provides a flexible alerting framework that can be integrated with new or existing case and incident management tools. StreamAlert already supports PagerDuty, Slack, and S3, and it can be extended to support any API. In keeping with the secure-by-default principle, all API credentials are encrypted and decrypted using AWS Key Management Service (KMS).
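The idea of normalizing several formats and then applying alerting logic can be sketched as follows. This is not StreamAlert's parser or rule API: the parser functions, the `failed_root_login` rule, and the field names are all hypothetical, and Syslog parsing is left out for brevity.

```python
import csv
import io
import json


def parse_json(line):
    """Parse a JSON object into a dict."""
    return json.loads(line)


def parse_csv(line, fieldnames):
    """Parse one CSV line, mapping columns to the given field names."""
    return dict(next(csv.DictReader(io.StringIO(line), fieldnames=fieldnames)))


def parse_key_value(line, delimiter=" ", separator="="):
    """Parse 'k1=v1 k2=v2' into {'k1': 'v1', 'k2': 'v2'}."""
    return dict(
        pair.split(separator, 1) for pair in line.split(delimiter) if pair
    )


def failed_root_login(record):
    """Hypothetical rule: match a failed login by the root user."""
    return record.get("user") == "root" and record.get("result") == "failure"


def evaluate(record, rules):
    """Return the names of all rules that match the record."""
    return [rule.__name__ for rule in rules if rule(record)]
```

Once every format is reduced to a plain dict, the same rule can run against any source, which is the property that makes a format-agnostic alerting framework practical.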

Looking ahead, StreamAlert will soon support comparing logs against traditional indicators of compromise (IOCs), which can number in the millions. This will be built in a provider-agnostic way, allowing the use of ThreatStream, ThreatExchange, or any other provider. StreamAlert will also support receiving data through an HTTP endpoint. For historical searching, StreamAlert will use AWS Athena, which allows data to be analyzed using SQL in both ad-hoc and scheduled queries.
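Since IOC comparison is described here only as a planned feature, the following is a generic sketch of the technique rather than StreamAlert's implementation. It assumes the indicators are loaded into an in-memory Python set, which keeps each lookup constant-time even with millions of entries; the function name and the field names `src_ip`, `dst_ip`, and `domain` are hypothetical.

```python
def find_ioc_matches(record, indicators, fields=("src_ip", "dst_ip", "domain")):
    """Return {field: value} for every checked field whose value is a known IOC.

    `indicators` should be a set so that membership tests stay fast.
    """
    return {
        field: record[field]
        for field in fields
        if record.get(field) in indicators
    }
```

Because the function only needs a set of indicator strings, the feed behind it can come from any provider.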

The team hopes that StreamAlert serves as an example of making deployment simple, repeatable, and safe, so that anyone can use it easily.

For More Information: GitHub