Mastering Elastic Stack

Exploring Filter Plugins

A filter plugin is used to perform transformations on the data. If the data fetched by your input needs to be processed before it is sent to the output, a filter plugin lets you do so. It acts as the intermediate section between the input and output sections of the Logstash configuration file.
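To make the position of the filter section concrete, here is a minimal, illustrative pipeline skeleton; the log file path and the index name are placeholder values, not part of the original example:

input { 
        file { 
                path => "/var/log/sample.log"    # placeholder input file 
             } 
        } 

filter { 
        # filter plugins are applied here, between input and output 
        } 

output { 
        elasticsearch { 
                index => "sample-index"          # placeholder index name 
             } 
        } 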

Let's have a look at a few of the filter plugins.

grok

The grok plugin is the most commonly used filter in Logstash, with powerful capabilities to transform unstructured data into structured data. Even if your data is already structured, you can streamline it using grok patterns. Due to the powerful nature of grok, Logstash is often referred to as a Swiss Army knife. Grok is used to parse data and structure it the way you want, and it can parse any type of log that is in a human-readable format. It combines text patterns into a structure that helps you match log lines and group their contents into fields. Logstash ships with more than 120 grok patterns that you can use readily, and you can also create your own patterns for grok to match.

Note

Grok patterns can be found at https://github.com/logstash-plugins/logstash-patterns-core/tree/master/patterns. They are also present within the Logstash installation folder and can be found by searching for the grok patterns file.

The basic syntax for a grok pattern is as follows:

 %{SYNTAX:SEMANTIC} 

Here, SYNTAX is the name of the pattern that matches the data, and SEMANTIC is the identifier, or field name, under which the matched text is stored.
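As an illustrative sketch, applying the following pattern to the text 55.3.244.1 GET /index.html:

    %{IP:client} %{WORD:method} %{URIPATHPARAM:request}

uses the IP, WORD, and URIPATHPARAM patterns as the SYNTAX parts and stores the matched values in the client, method, and request fields (the SEMANTIC parts).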

With grok, you can use regular expressions, or you can even create custom patterns. The regular expression library used is Oniguruma, whose regex syntax is available at https://github.com/kkos/oniguruma/blob/master/doc/RE. You can even create a custom patterns file, which contains your custom-created patterns to match the data. Grok matches the data from left to right, applying the patterns one by one.
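As a hedged sketch (the queue_id field, the POSTFIX_QUEUEID pattern name, and the ./patterns directory are only illustrative), a custom pattern can either be embedded inline as an Oniguruma named capture, such as (?<queue_id>[0-9A-F]{10,11}), or defined in a pattern file inside a directory that you reference with patterns_dir:

# ./patterns/extra contains the line: POSTFIX_QUEUEID [0-9A-F]{10,11} 
grok { 
        patterns_dir => ["./patterns"] 
        match => { "message" => "%{POSTFIX_QUEUEID:queue_id}" } 
     } 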

Let's have a look at a few log lines and apply grok patterns to them.

Pick a sample log line:

Log line: Jun 19 02:11:30 This is sample log. 

Let us use existing grok patterns to create a matching grok pattern for the preceding log line:

    %{CISCOTIMESTAMP:timestamp} %{GREEDYDATA:log}

Now let's have a look at how these two patterns are defined in order to understand them:

    CISCOTIMESTAMP %{MONTH} +%{MONTHDAY}(?: %{YEAR})? %{TIME}
    GREEDYDATA .*
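Applied to the sample log line above, this pattern would produce fields roughly as follows (shown in a simplified key-value form rather than the exact Logstash event output):

    timestamp => "Jun 19 02:11:30" 
    log       => "This is sample log." 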

This looks so tough to come up with. How do I know which pattern matches my data?

I agree, it's difficult to know which patterns are available and which ones will match. But don't worry; the following tools will help you discover an initial pattern, which you can then modify as per your requirements. You can visit either of the following two websites for help with building patterns to match your data:

The basic configuration for grok is as follows:

grok { 
} 

In this plugin, no settings are mandatory. The additional configuration settings are as follows:

  • add_field: This is used to add a field to the incoming data.
  • add_tag: This is used to add a tag to the incoming data. Tags can be static or dynamic based on keys present in the incoming data.
  • break_on_match: If set to true, grok finishes the filter on the first successful pattern match. If set to false, grok tries to match all of the configured patterns.
  • keep_empty_captures: This is used to keep empty captures as event fields if set to true.
  • match: This is used to match a field against a value, where the value is a single pattern or an array of multiple patterns.
  • named_captures_only: This is used to keep only the fields that have been named in the grok pattern, if set to true.
  • overwrite: This is used to overwrite the existing value of a field.
  • patterns_dir: This is used to specify the directory, or multiple directories, containing your custom-created pattern files.
  • patterns_files_glob: This is a glob expression used to select which files to load from the directories specified in patterns_dir.
  • periodic_flush: This is used to call the flush method at regular intervals.
  • remove_field: This is used to remove a field from the incoming data.
  • remove_tag: This is used to remove a tag from the incoming data.
  • tag_on_failure: This is used to add a tag to the event when none of the grok patterns match successfully.
  • tag_on_timeout: This is used to add a tag if the grok regular expression times out.
  • timeout_millis: This is used to terminate the match attempt if it takes longer than the specified time, in milliseconds. It applies to each pattern individually if multiple patterns are configured for grok.


Configuration example:

filter { 
        grok { 
                add_field => { "current_time" => "%{@timestamp}" } 
                match => { "message" => "%{CISCOTIMESTAMP:timestamp} %{HOST:host} %{WORD:program}: \[%{NUMBER:duration}\] %{GREEDYDATA:log}" } 
                remove_field => ["host"] 
                remove_tag => ["grok","test"] 
             } 
        } 

In the preceding configuration, we match the data in the message field against the defined pattern. If the pattern does not match, a _grokparsefailure tag is added to the event. The configuration also adds a current_time field, removes the host field, and removes the grok and test tags.
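As a further hedged sketch (the field names, pattern choices, and patterns directory are illustrative), several of the settings described above can be combined, for example to try multiple patterns against the same field:

filter { 
        grok { 
                patterns_dir => ["/etc/logstash/patterns"] 
                match => { "message" => ["%{SYSLOGTIMESTAMP:timestamp} %{GREEDYDATA:log}", 
                                         "%{TIMESTAMP_ISO8601:timestamp} %{GREEDYDATA:log}"] } 
                break_on_match => true 
                tag_on_failure => ["unparsed"] 
             } 
        } 

With break_on_match set to true (the default), grok stops at the first pattern that matches; if neither pattern matches, the unparsed tag is added instead of the default _grokparsefailure.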

mutate

The mutate filter is used to perform various mutations on a field, such as renaming it, joining array values, converting it to uppercase or lowercase, splitting a string, converting its datatype, and so on.

The basic configuration for mutate is as follows:

mutate { 
} 

In this plugin, no settings are mandatory. The additional configuration settings are as follows:

  • add_field: This is used to add a field to the incoming data.
  • add_tag: This is used to add a tag to the incoming data. Tags can be static or dynamic based on keys present in incoming data.
  • convert: This is used to convert the datatype of the field. You can convert the field value into an integer, float, string, or Boolean.
    Note

    Fields parsed from incoming data are strings by default. To change the datatype of a field, use convert to specify the datatype for that field.

  • gsub: This is used to search and replace a value within a field using a regular expression. It is similar to the sed command in Unix. It takes an array of three elements per substitution, namely the field name, the pattern to search for, and the replacement string. It can only be applied to fields whose datatype is a string.
    Note

    Do not forget to escape the backslash for the search pattern.

  • join: This is used to join the values of an array with the specified separator character. It only applies to fields whose datatype is an array.
  • lowercase: This is used to convert the value of the string to lowercase.
  • merge: This is used to merge two fields whose datatype is an array or a hash. An array field cannot be merged with a hash field. A field with a string datatype is converted into an array so that the string can be merged with an array field.
  • periodic_flush: This is used to call the flush method at regular intervals.
  • remove_field: This is used to remove a field from the incoming data.
  • remove_tag: This is used to remove a tag from the incoming data.
  • rename: This is used to rename the field name.
  • replace: This is used to replace the value of the field.
  • split: This is used to split a field into an array using the specified separator. It only works on fields whose datatype is a string.
  • strip: This is used to remove leading and trailing whitespace from the field.
  • update: This is used to update the value of an existing field. If the field doesn't exist, update will not work.
  • uppercase: This is used to convert the value of the string to uppercase.


Configuration example:

filter { 
        mutate { 
                convert => {"field1" => "float" } 
                gsub => ["field2","!","+"]   
                lowercase => ["field2"] 
                rename => {"field1" => "newfield"} 
                strip => ["field1", "field2"] 
                update => {"field2" => "It's easy to update"} 
             } 
        } 

In the preceding configuration, we are showcasing a few of the mutations applied on the fields.
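As a further hedged sketch (the field names are illustrative), here are some of the settings not covered above, such as split, merge, and uppercase:

filter { 
        mutate { 
                split => { "tags_csv" => "," } 
                merge => { "all_tags" => "tags_csv" } 
                uppercase => ["level"] 
                add_tag => ["mutated"] 
             } 
        } 

Here, split turns a comma-separated string in tags_csv into an array, merge appends that array to all_tags, and uppercase converts the value of level to uppercase.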

csv

The csv filter is used to parse CSV input data, that is, values separated by commas. Although it is called the csv filter, it is worth mentioning that it can parse data with any separator, not just commas.

The basic configuration for csv is as follows:

csv { 
} 

In this plugin, no settings are mandatory. The additional configuration settings are as follows:

  • add_field: This is used to add a field to the incoming data.
  • add_tag: This is used to add a tag to the incoming data. Tags can be static or dynamic based on keys present in the incoming data.
  • autogenerate_column_names: This is used to autogenerate column names, such as column1 and column2, when set to true. If set to false, columns without a name defined will not be parsed.
  • columns: This is used to define the names of the columns that appear in the data. If the column names are not specified, the default names "column1", "column2", and so on are used. If there are more columns than names defined, the extra columns are auto-numbered. Defining this is useful when there is no header in the data.
  • convert: This is used to convert the datatype of the field. You can convert the field value into either an integer, string, Boolean, date, or float.
  • periodic_flush: This is used to call the flush method at regular intervals.
  • quote_char: This is used to specify the character that quotes the values of the CSV fields.
  • remove_field: This is used to remove a field from the incoming data.
  • remove_tag: This is used to remove a tag from the incoming data.
  • separator: This is used to specify the character that separates the columns. The default is a comma.
  • skip_empty_columns: This is used to specify whether to skip the empty columns or not.
  • source: This is used to specify the source field containing the CSV data to parse; its value will be expanded into a data structure.
  • target: This is used to specify the target field in which data will be stored.


Configuration example:

filter {   
        csv { 
                columns => ["id","name","money"] 
                convert => {"id" => "integer", "money" => "float"} 
                quote_char => "#" 
                separator => "	"    # a literal tab character 
             } 
        } 

In the preceding configuration, we specified the names of the columns, converted the datatypes of the id and money fields, and changed the quote character. We also set the separator to a tab character so that the columns are split on tabs.
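As a final hedged sketch (the csv_payload and csv_parsed field names are illustrative), source and target can be used to parse CSV data held in a field other than message and to store the parsed columns under a separate field:

filter { 
        csv { 
                source => "csv_payload" 
                target => "csv_parsed" 
                skip_empty_columns => true 
             } 
        } 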