Struct aho_corasick::AhoCorasick
source · [−]Expand description
An automaton for searching multiple strings in linear time.
The AhoCorasick
type supports a few basic ways of constructing an
automaton, including
AhoCorasick::new
and
AhoCorasick::new_auto_configured
.
However, there are a fair number of configurable options that can be set
by using
AhoCorasickBuilder
instead. Such options include, but are not limited to, how matches are
determined, simple case insensitivity, whether to use a DFA or not and
various knobs for controlling the space-vs-time trade offs taken when
building the automaton.
If you aren’t sure where to start, try beginning with
AhoCorasick::new_auto_configured
.
Resource usage
Aho-Corasick automatons are always constructed in O(p)
time, where p
is the combined length of all patterns being searched. With that said,
building an automaton can be fairly costly because of high constant
factors, particularly when enabling the
DFA
option (which is disabled by default). For this reason, it’s generally a
good idea to build an automaton once and reuse it as much as possible.
Aho-Corasick automatons can also use a fair bit of memory. To get a
concrete idea of how much memory is being used, try using the
AhoCorasick::heap_bytes
method.
Examples
This example shows how to search for occurrences of multiple patterns simultaneously in a case insensitive fashion. Each match includes the pattern that matched along with the byte offsets of the match.
use aho_corasick::AhoCorasickBuilder;
let patterns = &["apple", "maple", "snapple"];
let haystack = "Nobody likes maple in their apple flavored Snapple.";
let ac = AhoCorasickBuilder::new()
.ascii_case_insensitive(true)
.build(patterns);
let mut matches = vec![];
for mat in ac.find_iter(haystack) {
matches.push((mat.pattern(), mat.start(), mat.end()));
}
assert_eq!(matches, vec![
(1, 13, 18),
(0, 28, 33),
(2, 43, 50),
]);
This example shows how to replace matches with some other string:
use aho_corasick::AhoCorasick;
let patterns = &["fox", "brown", "quick"];
let haystack = "The quick brown fox.";
let replace_with = &["sloth", "grey", "slow"];
let ac = AhoCorasick::new(patterns);
let result = ac.replace_all(haystack, replace_with);
assert_eq!(result, "The slow grey sloth.");
Implementations
Create a new Aho-Corasick automaton using the default configuration.
The default configuration optimizes for less space usage, but at the
expense of longer search times. To change the configuration, use
AhoCorasickBuilder
for fine-grained control, or
AhoCorasick::new_auto_configured
for automatic configuration if you aren’t sure which settings to pick.
This uses the default
MatchKind::Standard
match semantics, which reports a match as soon as it is found. This
corresponds to the standard match semantics supported by textbook
descriptions of the Aho-Corasick algorithm.
Examples
Basic usage:
use aho_corasick::AhoCorasick;
let ac = AhoCorasick::new(&[
"foo", "bar", "baz",
]);
assert_eq!(Some(1), ac.find("xxx bar xxx").map(|m| m.pattern()));
Build an Aho-Corasick automaton with an automatically determined configuration.
Specifically, this requires a slice of patterns instead of an iterator since the configuration is determined by looking at the patterns before constructing the automaton. The idea here is to balance space and time automatically. That is, when searching a small number of patterns, this will attempt to use the fastest possible configuration since the total space required will be small anyway. As the number of patterns grows, this will fall back to slower configurations that use less space.
If you want auto configuration but with match semantics different from
the default MatchKind::Standard
, then use
AhoCorasickBuilder::auto_configure
.
Examples
Basic usage is just like new
, except you must provide a slice:
use aho_corasick::AhoCorasick;
let ac = AhoCorasick::new_auto_configured(&[
"foo", "bar", "baz",
]);
assert_eq!(Some(1), ac.find("xxx bar xxx").map(|m| m.pattern()));
Returns true if and only if this automaton matches the haystack at any position.
haystack
may be any type that is cheaply convertible to a &[u8]
.
This includes, but is not limited to, String
, &str
, Vec<u8>
, and
&[u8]
itself.
Examples
Basic usage:
use aho_corasick::AhoCorasick;
let ac = AhoCorasick::new(&[
"foo", "bar", "quux", "baz",
]);
assert!(ac.is_match("xxx bar xxx"));
assert!(!ac.is_match("xxx qux xxx"));
Returns the location of the first detected match in haystack
.
This method has the same behavior regardless of the
MatchKind
of this automaton.
haystack
may be any type that is cheaply convertible to a &[u8]
.
This includes, but is not limited to, String
, &str
, Vec<u8>
, and
&[u8]
itself.
Examples
Basic usage:
use aho_corasick::AhoCorasick;
let ac = AhoCorasick::new(&[
"abc", "b",
]);
let mat = ac.earliest_find("abcd").expect("should have match");
assert_eq!(1, mat.pattern());
assert_eq!((1, 2), (mat.start(), mat.end()));
Returns the location of the first match according to the match semantics that this automaton was constructed with.
When using MatchKind::Standard
, this corresponds precisely to the
same behavior as
earliest_find
.
Otherwise, match semantics correspond to either
leftmost-first
or
leftmost-longest.
haystack
may be any type that is cheaply convertible to a &[u8]
.
This includes, but is not limited to, String
, &str
, Vec<u8>
, and
&[u8]
itself.
Examples
Basic usage, with standard semantics:
use aho_corasick::{AhoCorasickBuilder, MatchKind};
let patterns = &["b", "abc", "abcd"];
let haystack = "abcd";
let ac = AhoCorasickBuilder::new()
.match_kind(MatchKind::Standard) // default, not necessary
.build(patterns);
let mat = ac.find(haystack).expect("should have a match");
assert_eq!("b", &haystack[mat.start()..mat.end()]);
Now with leftmost-first semantics:
use aho_corasick::{AhoCorasickBuilder, MatchKind};
let patterns = &["b", "abc", "abcd"];
let haystack = "abcd";
let ac = AhoCorasickBuilder::new()
.match_kind(MatchKind::LeftmostFirst)
.build(patterns);
let mat = ac.find(haystack).expect("should have a match");
assert_eq!("abc", &haystack[mat.start()..mat.end()]);
And finally, leftmost-longest semantics:
use aho_corasick::{AhoCorasickBuilder, MatchKind};
let patterns = &["b", "abc", "abcd"];
let haystack = "abcd";
let ac = AhoCorasickBuilder::new()
.match_kind(MatchKind::LeftmostLongest)
.build(patterns);
let mat = ac.find(haystack).expect("should have a match");
assert_eq!("abcd", &haystack[mat.start()..mat.end()]);
Returns an iterator of non-overlapping matches, using the match semantics that this automaton was constructed with.
haystack
may be any type that is cheaply convertible to a &[u8]
.
This includes, but is not limited to, String
, &str
, Vec<u8>
, and
&[u8]
itself.
Examples
Basic usage, with standard semantics:
use aho_corasick::{AhoCorasickBuilder, MatchKind};
let patterns = &["append", "appendage", "app"];
let haystack = "append the app to the appendage";
let ac = AhoCorasickBuilder::new()
.match_kind(MatchKind::Standard) // default, not necessary
.build(patterns);
let matches: Vec<usize> = ac
.find_iter(haystack)
.map(|mat| mat.pattern())
.collect();
assert_eq!(vec![2, 2, 2], matches);
Now with leftmost-first semantics:
use aho_corasick::{AhoCorasickBuilder, MatchKind};
let patterns = &["append", "appendage", "app"];
let haystack = "append the app to the appendage";
let ac = AhoCorasickBuilder::new()
.match_kind(MatchKind::LeftmostFirst)
.build(patterns);
let matches: Vec<usize> = ac
.find_iter(haystack)
.map(|mat| mat.pattern())
.collect();
assert_eq!(vec![0, 2, 0], matches);
And finally, leftmost-longest semantics:
use aho_corasick::{AhoCorasickBuilder, MatchKind};
let patterns = &["append", "appendage", "app"];
let haystack = "append the app to the appendage";
let ac = AhoCorasickBuilder::new()
.match_kind(MatchKind::LeftmostLongest)
.build(patterns);
let matches: Vec<usize> = ac
.find_iter(haystack)
.map(|mat| mat.pattern())
.collect();
assert_eq!(vec![0, 2, 1], matches);
pub fn find_overlapping_iter<'a, 'b, B: ?Sized + AsRef<[u8]>>(
&'a self,
haystack: &'b B
) -> FindOverlappingIter<'a, 'b, S>ⓘNotable traits for FindOverlappingIter<'a, 'b, S>impl<'a, 'b, S: StateID> Iterator for FindOverlappingIter<'a, 'b, S> type Item = Match;
pub fn find_overlapping_iter<'a, 'b, B: ?Sized + AsRef<[u8]>>(
&'a self,
haystack: &'b B
) -> FindOverlappingIter<'a, 'b, S>ⓘNotable traits for FindOverlappingIter<'a, 'b, S>impl<'a, 'b, S: StateID> Iterator for FindOverlappingIter<'a, 'b, S> type Item = Match;
impl<'a, 'b, S: StateID> Iterator for FindOverlappingIter<'a, 'b, S> type Item = Match;
Returns an iterator of overlapping matches in the given haystack
.
Overlapping matches can only be detected using
MatchKind::Standard
semantics. If this automaton was constructed with
leftmost semantics, then this method will panic. To determine whether
this will panic at runtime, use the
AhoCorasick::supports_overlapping
method.
haystack
may be any type that is cheaply convertible to a &[u8]
.
This includes, but is not limited to, String
, &str
, Vec<u8>
, and
&[u8]
itself.
Panics
This panics when AhoCorasick::supports_overlapping
returns false
.
That is, this panics when this automaton’s match semantics are not
MatchKind::Standard
.
Examples
Basic usage, with standard semantics:
use aho_corasick::AhoCorasick;
let patterns = &["append", "appendage", "app"];
let haystack = "append the app to the appendage";
let ac = AhoCorasick::new(patterns);
let matches: Vec<usize> = ac
.find_overlapping_iter(haystack)
.map(|mat| mat.pattern())
.collect();
assert_eq!(vec![2, 0, 2, 2, 0, 1], matches);
Replace all matches with a corresponding value in the replace_with
slice given. Matches correspond to the same matches as reported by
find_iter
.
Replacements are determined by the index of the matching pattern.
For example, if the pattern with index 2
is found, then it is
replaced by replace_with[2]
.
Panics
This panics when replace_with.len()
does not equal the total number
of patterns that are matched by this automaton.
Examples
Basic usage:
use aho_corasick::{AhoCorasickBuilder, MatchKind};
let patterns = &["append", "appendage", "app"];
let haystack = "append the app to the appendage";
let ac = AhoCorasickBuilder::new()
.match_kind(MatchKind::LeftmostFirst)
.build(patterns);
let result = ac.replace_all(haystack, &["x", "y", "z"]);
assert_eq!("x the z to the xage", result);
Replace all matches using raw bytes with a corresponding value in the
replace_with
slice given. Matches correspond to the same matches as
reported by find_iter
.
Replacements are determined by the index of the matching pattern.
For example, if the pattern with index 2
is found, then it is
replaced by replace_with[2]
.
Panics
This panics when replace_with.len()
does not equal the total number
of patterns that are matched by this automaton.
Examples
Basic usage:
use aho_corasick::{AhoCorasickBuilder, MatchKind};
let patterns = &["append", "appendage", "app"];
let haystack = b"append the app to the appendage";
let ac = AhoCorasickBuilder::new()
.match_kind(MatchKind::LeftmostFirst)
.build(patterns);
let result = ac.replace_all_bytes(haystack, &["x", "y", "z"]);
assert_eq!(b"x the z to the xage".to_vec(), result);
Replace all matches using a closure called on each match.
Matches correspond to the same matches as reported by
find_iter
.
The closure accepts three parameters: the match found, the text of
the match and a string buffer with which to write the replaced text
(if any). If the closure returns true
, then it continues to the next
match. If the closure returns false
, then searching is stopped.
Examples
Basic usage:
use aho_corasick::{AhoCorasickBuilder, MatchKind};
let patterns = &["append", "appendage", "app"];
let haystack = "append the app to the appendage";
let ac = AhoCorasickBuilder::new()
.match_kind(MatchKind::LeftmostFirst)
.build(patterns);
let mut result = String::new();
ac.replace_all_with(haystack, &mut result, |mat, _, dst| {
dst.push_str(&mat.pattern().to_string());
true
});
assert_eq!("0 the 2 to the 0age", result);
Stopping the replacement by returning false
(continued from the
example above):
let mut result = String::new();
ac.replace_all_with(haystack, &mut result, |mat, _, dst| {
dst.push_str(&mat.pattern().to_string());
mat.pattern() != 2
});
assert_eq!("0 the 2 to the appendage", result);
Replace all matches using raw bytes with a closure called on each
match. Matches correspond to the same matches as reported by
find_iter
.
The closure accepts three parameters: the match found, the text of
the match and a byte buffer with which to write the replaced text
(if any). If the closure returns true
, then it continues to the next
match. If the closure returns false
, then searching is stopped.
Examples
Basic usage:
use aho_corasick::{AhoCorasickBuilder, MatchKind};
let patterns = &["append", "appendage", "app"];
let haystack = b"append the app to the appendage";
let ac = AhoCorasickBuilder::new()
.match_kind(MatchKind::LeftmostFirst)
.build(patterns);
let mut result = vec![];
ac.replace_all_with_bytes(haystack, &mut result, |mat, _, dst| {
dst.extend(mat.pattern().to_string().bytes());
true
});
assert_eq!(b"0 the 2 to the 0age".to_vec(), result);
Stopping the replacement by returning false
(continued from the
example above):
let mut result = vec![];
ac.replace_all_with_bytes(haystack, &mut result, |mat, _, dst| {
dst.extend(mat.pattern().to_string().bytes());
mat.pattern() != 2
});
assert_eq!(b"0 the 2 to the appendage".to_vec(), result);
pub fn stream_find_iter<'a, R: Read>(
&'a self,
rdr: R
) -> StreamFindIter<'a, R, S>ⓘNotable traits for StreamFindIter<'a, R, S>impl<'a, R: Read, S: StateID> Iterator for StreamFindIter<'a, R, S> type Item = Result<Match>;
pub fn stream_find_iter<'a, R: Read>(
&'a self,
rdr: R
) -> StreamFindIter<'a, R, S>ⓘNotable traits for StreamFindIter<'a, R, S>impl<'a, R: Read, S: StateID> Iterator for StreamFindIter<'a, R, S> type Item = Result<Match>;
impl<'a, R: Read, S: StateID> Iterator for StreamFindIter<'a, R, S> type Item = Result<Match>;
Returns an iterator of non-overlapping matches in the given
stream. Matches correspond to the same matches as reported by
find_iter
.
The matches yielded by this iterator use absolute position offsets in
the stream given, where the first byte has index 0
. Matches are
yieled until the stream is exhausted.
Each item yielded by the iterator is an io::Result<Match>
, where an
error is yielded if there was a problem reading from the reader given.
When searching a stream, an internal buffer is used. Therefore, callers should avoiding providing a buffered reader, if possible.
Searching a stream requires that the automaton was built with
MatchKind::Standard
semantics. If this automaton was constructed
with leftmost semantics, then this method will panic. To determine
whether this will panic at runtime, use the
AhoCorasick::supports_stream
method.
Memory usage
In general, searching streams will use a constant amount of memory for its internal buffer. The one requirement is that the internal buffer must be at least the size of the longest possible match. In most use cases, the default buffer size will be much larger than any individual match.
Panics
This panics when AhoCorasick::supports_stream
returns false
.
That is, this panics when this automaton’s match semantics are not
MatchKind::Standard
. This restriction may be lifted in the future.
Examples
Basic usage:
use aho_corasick::AhoCorasick;
let patterns = &["append", "appendage", "app"];
let haystack = "append the app to the appendage";
let ac = AhoCorasick::new(patterns);
let mut matches = vec![];
for result in ac.stream_find_iter(haystack.as_bytes()) {
let mat = result?;
matches.push(mat.pattern());
}
assert_eq!(vec![2, 2, 2], matches);
Search for and replace all matches of this automaton in
the given reader, and write the replacements to the given
writer. Matches correspond to the same matches as reported by
find_iter
.
Replacements are determined by the index of the matching pattern.
For example, if the pattern with index 2
is found, then it is
replaced by replace_with[2]
.
After all matches are replaced, the writer is not flushed.
If there was a problem reading from the given reader or writing to the
given writer, then the corresponding io::Error
is returned and all
replacement is stopped.
When searching a stream, an internal buffer is used. Therefore, callers should avoiding providing a buffered reader, if possible. However, callers may want to provide a buffered writer.
Searching a stream requires that the automaton was built with
MatchKind::Standard
semantics. If this automaton was constructed
with leftmost semantics, then this method will panic. To determine
whether this will panic at runtime, use the
AhoCorasick::supports_stream
method.
Memory usage
In general, searching streams will use a constant amount of memory for its internal buffer. The one requirement is that the internal buffer must be at least the size of the longest possible match. In most use cases, the default buffer size will be much larger than any individual match.
Panics
This panics when AhoCorasick::supports_stream
returns false
.
That is, this panics when this automaton’s match semantics are not
MatchKind::Standard
. This restriction may be lifted in the future.
Examples
Basic usage:
use aho_corasick::AhoCorasick;
let patterns = &["fox", "brown", "quick"];
let haystack = "The quick brown fox.";
let replace_with = &["sloth", "grey", "slow"];
let ac = AhoCorasick::new(patterns);
let mut result = vec![];
ac.stream_replace_all(haystack.as_bytes(), &mut result, replace_with)?;
assert_eq!(b"The slow grey sloth.".to_vec(), result);
Search the given reader and replace all matches of this automaton
using the given closure. The result is written to the given
writer. Matches correspond to the same matches as reported by
find_iter
.
The closure accepts three parameters: the match found, the text of the match and the writer with which to write the replaced text (if any).
After all matches are replaced, the writer is not flushed.
If there was a problem reading from the given reader or writing to the
given writer, then the corresponding io::Error
is returned and all
replacement is stopped.
When searching a stream, an internal buffer is used. Therefore, callers should avoiding providing a buffered reader, if possible. However, callers may want to provide a buffered writer.
Searching a stream requires that the automaton was built with
MatchKind::Standard
semantics. If this automaton was constructed
with leftmost semantics, then this method will panic. To determine
whether this will panic at runtime, use the
AhoCorasick::supports_stream
method.
Memory usage
In general, searching streams will use a constant amount of memory for its internal buffer. The one requirement is that the internal buffer must be at least the size of the longest possible match. In most use cases, the default buffer size will be much larger than any individual match.
Panics
This panics when AhoCorasick::supports_stream
returns false
.
That is, this panics when this automaton’s match semantics are not
MatchKind::Standard
. This restriction may be lifted in the future.
Examples
Basic usage:
use std::io::Write;
use aho_corasick::AhoCorasick;
let patterns = &["fox", "brown", "quick"];
let haystack = "The quick brown fox.";
let ac = AhoCorasick::new(patterns);
let mut result = vec![];
ac.stream_replace_all_with(
haystack.as_bytes(),
&mut result,
|mat, _, wtr| {
wtr.write_all(mat.pattern().to_string().as_bytes())
},
)?;
assert_eq!(b"The 2 1 0.".to_vec(), result);
Returns the match kind used by this automaton.
Examples
Basic usage:
use aho_corasick::{AhoCorasick, MatchKind};
let ac = AhoCorasick::new(&[
"foo", "bar", "quux", "baz",
]);
assert_eq!(&MatchKind::Standard, ac.match_kind());
Returns the length of the longest pattern matched by this automaton.
Examples
Basic usage:
use aho_corasick::AhoCorasick;
let ac = AhoCorasick::new(&[
"foo", "bar", "quux", "baz",
]);
assert_eq!(4, ac.max_pattern_len());
Return the total number of patterns matched by this automaton.
This includes patterns that may never participate in a match. For
example, if
MatchKind::LeftmostFirst
match semantics are used, and the patterns Sam
and Samwise
were
used to build the automaton, then Samwise
can never participate in a
match because Sam
will always take priority.
Examples
Basic usage:
use aho_corasick::AhoCorasick;
let ac = AhoCorasick::new(&[
"foo", "bar", "baz",
]);
assert_eq!(3, ac.pattern_count());
Returns true if and only if this automaton supports reporting overlapping matches.
If this returns false and overlapping matches are requested, then it will result in a panic.
Since leftmost matching is inherently incompatible with overlapping
matches, only
MatchKind::Standard
supports overlapping matches. This is unlikely to change in the future.
Examples
Basic usage:
use aho_corasick::{AhoCorasickBuilder, MatchKind};
let ac = AhoCorasickBuilder::new()
.match_kind(MatchKind::Standard)
.build(&["foo", "bar", "baz"]);
assert!(ac.supports_overlapping());
let ac = AhoCorasickBuilder::new()
.match_kind(MatchKind::LeftmostFirst)
.build(&["foo", "bar", "baz"]);
assert!(!ac.supports_overlapping());
Returns true if and only if this automaton supports stream searching.
If this returns false and stream searching (or replacing) is attempted, then it will result in a panic.
Currently, only
MatchKind::Standard
supports streaming. This may be expanded in the future.
Examples
Basic usage:
use aho_corasick::{AhoCorasickBuilder, MatchKind};
let ac = AhoCorasickBuilder::new()
.match_kind(MatchKind::Standard)
.build(&["foo", "bar", "baz"]);
assert!(ac.supports_stream());
let ac = AhoCorasickBuilder::new()
.match_kind(MatchKind::LeftmostFirst)
.build(&["foo", "bar", "baz"]);
assert!(!ac.supports_stream());
Returns the approximate total amount of heap used by this automaton, in units of bytes.
Examples
This example shows the difference in heap usage between a few configurations:
use aho_corasick::{AhoCorasickBuilder, MatchKind};
let ac = AhoCorasickBuilder::new()
.dfa(false) // default
.build(&["foo", "bar", "baz"]);
assert_eq!(10_336, ac.heap_bytes());
let ac = AhoCorasickBuilder::new()
.dfa(false) // default
.ascii_case_insensitive(true)
.build(&["foo", "bar", "baz"]);
assert_eq!(10_384, ac.heap_bytes());
let ac = AhoCorasickBuilder::new()
.dfa(true)
.ascii_case_insensitive(true)
.build(&["foo", "bar", "baz"]);
assert_eq!(1_248, ac.heap_bytes());
Trait Implementations
Auto Trait Implementations
impl<S> RefUnwindSafe for AhoCorasick<S> where
S: RefUnwindSafe,
impl<S> Send for AhoCorasick<S> where
S: Send,
impl<S> Sync for AhoCorasick<S> where
S: Sync,
impl<S> Unpin for AhoCorasick<S> where
S: Unpin,
impl<S> UnwindSafe for AhoCorasick<S> where
S: UnwindSafe,
Blanket Implementations
Mutably borrows from an owned value. Read more