Monday, 29 October 2018

regex - From within Java, how can I create a list of all possible numbers from a specific regular expression?




I have a strange problem, at least one I've never come across. I have a precondition where customers have simple regular expressions associated with labels. The labels are all they care about. What I would like to do is create a list of all possible numbers that would match each of these regular expressions. I would have logic that would warn me when the list is beyond a certain threshold.



Here is an example of the regular expression: 34.25.14.(227|228|229|230|243|244|245|246)



Let's say that these IPs are associated with ACME. Behind the scenes, when the user selects ACME (in our UI), I'm filling out a filter object that contains all of those possible numbers and submitting them as an OR query to a highly specialized Vertica database.



I just can't determine an elegant way of creating a list of numbers from said regular expressions.



The other aspect of this is that the Java code in another portion of the product uses those regular expressions to show ACME via Pattern.compile(), which means the customer 'could' create a complex regular expression. So far I've only seen them use something simple, as shown above.




Is there a method that will generate a list based on a regular expression?



Thanks for your time.


Answer



Related:



A library that generates data matching a regular expression (with limitations):
http://code.google.com/p/xeger/



Several solutions, such as converting a regex into a grammar:

Using Regex to generate Strings rather than match them






EDIT: Actually, you can make it work! The one thing you have to address is imposing some domain-specific constraints to prevent combinatorial explosion from patterns like a+, which match an unbounded number of strings.
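
One way to impose such constraints, independent of Xeger, is to cap both the length of the candidate strings and the number of results. The following is only a minimal sketch under assumptions that are not from the original post: the class name BoundedRegexEnumerator, the explicit alphabet, and the two limits are all hypothetical. It walks candidate strings depth-first against a plain java.util.regex.Pattern and uses Matcher.hitEnd() to drop prefixes that no amount of extra input could turn into a match. That pruning should hold for simple patterns like the one in the question; for patterns with lookarounds or anchors it would need testing first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Xeger or the JDK.
final class BoundedRegexEnumerator {
    private final Pattern pattern;
    private final char[] alphabet;   // characters to try at each position (assumption)
    private final int maxLength;     // longest candidate string to consider
    private final int maxResults;    // threshold beyond which enumeration is abandoned

    BoundedRegexEnumerator(String regex, String alphabet, int maxLength, int maxResults) {
        this.pattern = Pattern.compile(regex);
        this.alphabet = alphabet.toCharArray();
        this.maxLength = maxLength;
        this.maxResults = maxResults;
    }

    /** Returns every full match, or throws once more than maxResults are found. */
    List<String> enumerate() {
        List<String> results = new ArrayList<>();
        expand(new StringBuilder(), results);
        return results;
    }

    private void expand(StringBuilder prefix, List<String> results) {
        Matcher matcher = pattern.matcher(prefix);
        boolean matched = matcher.matches();
        boolean couldGrow = matcher.hitEnd();
        if (matched) {
            if (results.size() >= maxResults) {
                throw new IllegalStateException("more than " + maxResults + " matches");
            }
            results.add(prefix.toString());
        }
        // A failed match that never reached the end of the input cannot be
        // repaired by appending characters, so that branch is dropped.
        if ((!matched && !couldGrow) || prefix.length() >= maxLength) {
            return;
        }
        for (char c : alphabet) {
            prefix.append(c);
            expand(prefix, results);
            prefix.setLength(prefix.length() - 1);
        }
    }

    public static void main(String[] args) {
        String regex = "34\\.25\\.14\\.(227|228|229|230|243|244|245|246)";
        new BoundedRegexEnumerator(regex, "0123456789.", 15, 1000)
                .enumerate()
                .forEach(System.out::println);
    }
}
```

With these limits, a pattern such as a+ trips the result threshold instead of running forever, which is the point at which the caller could warn that the list is too large.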



If you add to the Xeger class something like this:



// Entry point: prints every string reached by the walk below.
public void enumerate() {
    System.out.println("enumerate: \"" + regex + "\"");

    int level = 0;
    String accumulated = "";
    enumerate(level, accumulated, automaton.getInitialState());
}

// Depth-first walk over the automaton. It stops at the first accepting state
// on each path, so longer matches that pass through an accepting state
// (as with a+) are not listed.
private void enumerate(int level, String accumulated, State state) {
    if (state.isAccept()) {
        System.out.println(accumulated);
        return;
    }
    List<Transition> transitions = state.getSortedTransitions(true);
    for (Transition transition : transitions) {
        // int counter: a char counter would wrap around if getMax() is '\uffff'
        for (int choice = transition.getMin(); choice <= transition.getMax(); choice++) {
            enumerate(level + 1, accumulated + (char) choice, transition.getDest());
        }
    }
}


... and something like this to XegerTest:



@Test
public void enumerateAllVariants() {
    //String regex = "[ab]{4,6}c";
    String regex = "34\\.25\\.14\\.(227|228|229|230|243|244|245|246)";
    Xeger generator = new Xeger(regex);
    generator.enumerate();
}


... you will get this:



-------------------------------------------------------
T E S T S
-------------------------------------------------------
Running nl.flotsam.xeger.XegerTest
enumerate: "34\.25\.14\.(227|228|229|230|243|244|245|246)"
34.25.14.227
34.25.14.228
34.25.14.229
34.25.14.243
34.25.14.244
34.25.14.245
34.25.14.246
34.25.14.230
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.114 sec


... and, guess what. For "[ab]{4,6}c" it correctly produces 112 variants.
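
The figure of 112 can be checked independently of Xeger: the pattern matches four, five, or six characters drawn from {a, b} followed by c, which gives 2^4 + 2^5 + 2^6 = 16 + 32 + 64 = 112 strings. As a rough cross-check, the sketch below (the class and method names are made up for illustration) brute-forces every string over a, b, c of up to seven characters and counts the full matches with java.util.regex.

```java
import java.util.regex.Pattern;

// Hypothetical cross-check, not part of Xeger.
final class VariantCount {
    /** Counts strings over the alphabet, up to maxLength characters, that fully match the regex. */
    static int countMatches(String regex, String alphabet, int maxLength) {
        return count(Pattern.compile(regex), alphabet, new StringBuilder(), maxLength);
    }

    private static int count(Pattern pattern, String alphabet, StringBuilder candidate, int maxLength) {
        int total = pattern.matcher(candidate).matches() ? 1 : 0;
        if (candidate.length() < maxLength) {
            for (int i = 0; i < alphabet.length(); i++) {
                candidate.append(alphabet.charAt(i));
                total += count(pattern, alphabet, candidate, maxLength);
                candidate.setLength(candidate.length() - 1);
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(countMatches("[ab]{4,6}c", "abc", 7)); // prints 112
    }
}
```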



It's really a quick and dirty experiment, but it seems to work ;).

