Splunk String Likeness Algorithms
Splunk apps can greatly extend what you can do in SPL searches. Using a Splunk app called Jellyfisher, we can take advantage of several string likeness algorithms to help us identify data that is similar.
Jellyfisher App
Jellyfisher is a Splunk app version of the Jellyfish Python library. It can be installed on modern versions of Splunk Enterprise as well as Splunk Cloud. The app is installed in the usual ways: either find it by browsing or searching apps in the Splunk UI, or download it from Splunkbase and upload it to your Splunk instance.
Jellyfish Algorithms
The Jellyfish library (and therefore the Jellyfisher Splunk app) includes several different comparison algorithms to choose from.[1]
- String comparison algorithms:
  - Levenshtein Distance
  - Damerau-Levenshtein Distance
  - Jaro Distance
  - Jaro-Winkler Distance
  - Match Rating Approach Comparison
  - Hamming Distance
- Phonetic encoding algorithms:
  - American Soundex
  - Metaphone
  - NYSIIS (New York State Identification and Intelligence System)
  - Match Rating Codex
- Example:
  - The Levenshtein distance between "kitten" and "sitting" is 3.
  - The Soundex representation of both "Robert" and "Rupert" is "R163".
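To make the two examples above concrete, here is a minimal pure-Python sketch of Levenshtein distance and American Soundex. It is for illustration only and is not the Jellyfish or Jellyfisher implementation; the function names `levenshtein` and `soundex` are chosen here for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Count the single-character insertions, deletions, and
    substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep) the character
            ))
        previous = current
    return previous[-1]


# Consonant groups used by American Soundex.
_SOUNDEX_CODES = {
    letter: digit
    for digit, letters in {
        "1": "bfpv", "2": "cgjkqsxz", "3": "dt",
        "4": "l", "5": "mn", "6": "r",
    }.items()
    for letter in letters
}


def soundex(name: str) -> str:
    """Return the four-character American Soundex code for name."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    encoded = letters[0].upper()
    previous_code = _SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        code = _SOUNDEX_CODES.get(c, "")
        if code and code != previous_code:
            encoded += code
        if c not in "hw":  # h and w do not separate repeated codes
            previous_code = code
    return (encoded + "000")[:4]  # pad or truncate to four characters


print(levenshtein("kitten", "sitting"))      # 3
print(soundex("Robert"), soundex("Rupert"))  # R163 R163
```

The distance is an edit count (lower means more alike), while Soundex reduces each name to a code so that similar-sounding names compare as equal.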
Personally, I prefer using the Jaro-Winkler distance algorithm when comparing strings in Splunk.
Usage in Splunk
Once Jellyfisher is installed, a simple example looks like this.
| makeresults
| eval domain1 = "mktbs.net"
| eval domain2 = "mkts.net"
| eval domain3 = "gmail.com"
| jellyfisher jaro_winkler(domain1,domain2)
| rename jaro_winkler AS jaro_winkler_1_and_2
| jellyfisher jaro_winkler(domain1,domain3)
| rename jaro_winkler AS jaro_winkler_1_and_3
This search will result in the following table.
domain1 | domain2 | domain3 | jaro_winkler_1_and_2 | jaro_winkler_1_and_3 |
---|---|---|---|---|
mktbs.net | mkts.net | gmail.com | 0.9740740740740741 | 0.43703703703703706 |
In SPL, `| eval` creates a Splunk field, so here we set `domain1`, `domain2`, and `domain3` to our example domains. Running `jellyfisher` against `domain1` and `domain2` (`mktbs.net` and `mkts.net`), which differ by only a single character, gives a value of 0.9740... In Jaro-Winkler, the closer a comparison value is to 1, the more alike the two strings are.
In our second example we compare `domain1` and `domain3` (`mktbs.net` and `gmail.com`). These strings have little in common, and the resulting value of 0.4370... shows that.
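To show where a score like 0.9740... comes from, here is a minimal pure-Python sketch of the textbook Jaro-Winkler calculation. It is an illustration under stated assumptions, not the code that Jellyfisher runs: the function names, the 0.1 prefix scaling factor, and the four-character prefix cap are the commonly cited defaults, and implementations differ in details such as when the prefix bonus is applied.

```python
def jaro(s1: str, s2: str) -> float:
    """Textbook Jaro similarity: 0.0 means no similarity, 1.0 means identical."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only count as matching if they are within this many positions.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Walk both strings' matched characters in order and count mismatches;
    # each pair of out-of-order characters is one transposition.
    out_of_order = 0
    j = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[j]:
                j += 1
            if s1[i] != s2[j]:
                out_of_order += 1
            j += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, scaling: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro similarity plus a bonus for a shared prefix (assumed defaults)."""
    score = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return score + prefix * scaling * (1 - score)


print(round(jaro_winkler("mktbs.net", "mkts.net"), 4))  # 0.9741
```

For `mktbs.net` and `mkts.net`, eight characters match with no transpositions, so the Jaro score is (8/9 + 8/8 + 8/8) / 3 ≈ 0.963. The shared three-character prefix `mkt` then lifts it to about 0.974, which matches the first value in the table above.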
Use Cases
Jellyfisher algorithms are useful when we need to determine how similar two strings are, especially when we are looking for slight changes that are difficult to notice with the human eye.
- Account takeover (ATO) detection
  - Specifically, cases where an attacker changes the account email address to something very similar to the existing email address.
- Phishing detection
  - Jellyfisher algorithms can be used to compare senders, email subjects, or email body content to detect variations on known phishing words or phrases.
- Password disclosure detection
  - If you are collecting SSO logs from a provider such as Okta, Microsoft Entra ID, or Ping Identity, users will sometimes enter their password in the username field. Normally this wouldn't be a big deal, but your SSO provider likely logs an event such as `No User Found` and likely includes the "username" in that log. If the user is remote (for example, working from home), it is very likely that you can determine from those identity provider logs which user entered their password in the username field. In this case, you will want to ensure that the user resets their password.
  - Jellyfisher can assist here by helping us recognize when the user did not enter a password in the username field. We do this to avoid being inundated with false-positive alerts.
  - We can do this by comparing the data that was entered to expected data. For example, if your email domain is `mycompany.com`, it is not uncommon to see users make minor errors such as entering `mycompany,com` or `mycompany.con`. Jellyfisher will help us understand whether the incorrect username was very similar to their actual username, and so likely not a password.
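The triage logic for the password disclosure case can be sketched outside Splunk as follows. This is a hedged illustration only: it uses `difflib.SequenceMatcher` from the Python standard library as a stand-in similarity measure rather than Jaro-Winkler or Jellyfisher. The `KNOWN_USERNAMES` list, the `looks_like_typo` helper, and the 0.85 threshold are all hypothetical and would need tuning against your own data.

```python
from difflib import SequenceMatcher

# Hypothetical list of valid usernames; in practice this might come
# from a lookup of your identity provider's directory.
KNOWN_USERNAMES = ["alice@mycompany.com", "bob@mycompany.com"]

# Assumed cutoff for "close enough to be a typo"; tune for your environment.
TYPO_THRESHOLD = 0.85


def best_similarity(entered: str, known=KNOWN_USERNAMES) -> float:
    """Return the highest similarity ratio between the entered value
    and any known username (0.0 to 1.0, case-insensitive)."""
    entered = entered.lower()
    return max(
        SequenceMatcher(None, entered, name.lower()).ratio()
        for name in known
    )


def looks_like_typo(entered: str, known=KNOWN_USERNAMES,
                    threshold: float = TYPO_THRESHOLD) -> bool:
    """True if the failed username closely resembles a real username,
    which suggests a typing mistake rather than a disclosed password."""
    return best_similarity(entered, known) >= threshold


for value in ["alice@mycompany,com", "bob@mycompany.con", "Tr0ub4dor&3"]:
    verdict = "likely typo" if looks_like_typo(value) else "review: possible password"
    print(value, "->", verdict)
```

Only values that fall below the threshold would be passed on for review, which keeps simple typos such as `mycompany,com` from generating alerts.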
[1] Most of this language comes from the Splunkbase page for Jellyfisher. Take a look at the docs there for more details on the specifics of the Jellyfisher supported algorithms.