Pattern
objects.
The idea is not to go overboard and reimplement an SQL lexer, but to
capture in one place the rules for those bits of SQL snippets that are
likely to be human-supplied in annotations and need to be checked for
correctness when emitted into deployment descriptors. Identifiers, for a
start.
Supplied in the API module so they are available to javac to compile
and generate DDR when the rest of PL/Java is not necessarily present.
Of course backend code such as SQLDeploymentDescriptor can also refer
to these.
Nested Class Summary
static class Lexicals.Identifier
    Class representing a SQL identifier.
Field Summary
static final Pattern BRACKETED_COMMENT_INSIDE
    Most of the inside of a bracketed comment, defined in an odd way.
static final Pattern ISO_AND_PG_IDENTIFIER_CAPTURING
    Pattern that matches any identifier valid by both ISO and PG rules, with the presence of named capturing groups indicating which kind it is: i for a regular identifier, xd for a delimited identifier (still needing "" replaced with "), or xui (with or without an explicit uec) for a Unicode identifier (still needing "" to " and decoding of Unicode escape values).
static final Pattern ISO_AND_PG_REGULAR_IDENTIFIER
    A regular identifier that satisfies both ISO and PostgreSQL rules.
static final Pattern ISO_AND_PG_REGULAR_IDENTIFIER_CAPTURING
    A regular identifier that satisfies both ISO and PostgreSQL rules, in a single capturing group named i.
static final Pattern ISO_DELIMITED_IDENTIFIER
    A complete delimited identifier as allowed by ISO.
static final Pattern ISO_DELIMITED_IDENTIFIER_CAPTURING
    An ISO delimited identifier with a single capturing group that captures the content (which still needs to have "" replaced with " throughout).
static final Pattern ISO_PG_JAVA_IDENTIFIER
    An identifier by ISO SQL, PostgreSQL, and Java (not SQL at all) rules.
static final Pattern ISO_REGULAR_IDENTIFIER
    A complete regular identifier as allowed by ISO.
static final Pattern ISO_REGULAR_IDENTIFIER_CAPTURING
    A complete ISO regular identifier in a single capturing group.
static final Pattern ISO_REGULAR_IDENTIFIER_PART
    Allowed as any non-first character of a regular identifier by ISO.
static final Pattern ISO_REGULAR_IDENTIFIER_START
    Allowed as the first character of a regular identifier by ISO.
static final Pattern ISO_UNICODE_ESCAPE_SPECIFIER
    The escape-specifier part of a Unicode delimited identifier or string.
static final Pattern ISO_UNICODE_IDENTIFIER
    A Unicode delimited identifier.
static final String ISO_UNICODE_REPLACER
    A compilable pattern to match a Unicode escape value.
static final Pattern NEWLINE
    A newline, in any of the various forms recognized by the Java regex engine, letting it handle the details.
static final Pattern PG_OPERATOR
    An operator by PostgreSQL rules.
static final Pattern PG_REGULAR_IDENTIFIER
    A complete regular identifier as allowed by PostgreSQL (PG 7.4 -).
static final Pattern PG_REGULAR_IDENTIFIER_CAPTURING
    A complete PostgreSQL regular identifier in a single capturing group.
static final Pattern PG_REGULAR_IDENTIFIER_PART
    Allowed as any non-first character of a regular identifier by PostgreSQL (PG 7.4 -).
static final Pattern PG_REGULAR_IDENTIFIER_START
    Allowed as the first character of a regular identifier by PostgreSQL (PG 7.4 -).
static final Pattern SEPARATOR
    SQL's SEPARATOR, which can include any amount of whitespace, simple comments, or bracketed comments.
static final Pattern SIMPLE_COMMENT
    The kind of comment that extends from -- to the end of the line.
static final Pattern WHITESPACE_NO_NEWLINE
    White space except newline, for any Java-recognized newline.
Method Summary
static Lexicals.Identifier.Simple identifierFrom(Matcher m)
    Return an Identifier.Simple, given a Matcher that has matched an ISO_AND_PG_IDENTIFIER_CAPTURING.
static boolean separator(Matcher m, boolean significant)
    Consume any SQL SEPARATOR at the beginning of Matcher m's current region.
Field Details
ISO_REGULAR_IDENTIFIER_START
Allowed as the first character of a regular identifier by ISO.
ISO_REGULAR_IDENTIFIER_PART
Allowed as any non-first character of a regular identifier by ISO.
ISO_REGULAR_IDENTIFIER
A complete regular identifier as allowed by ISO.
ISO_REGULAR_IDENTIFIER_CAPTURING
A complete ISO regular identifier in a single capturing group.
ISO_DELIMITED_IDENTIFIER
A complete delimited identifier as allowed by ISO. As it happens, this is also the form PostgreSQL uses for elements of a LIST_QUOTE-typed GUC.
ISO_DELIMITED_IDENTIFIER_CAPTURING
An ISO delimited identifier with a single capturing group that captures the content (which still needs to have "" replaced with " throughout). The capturing group is named xd.
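
The post-processing that a delimited identifier "still needs" can be sketched as follows. The regex below is a hypothetical stand-in written for illustration, not the actual value of ISO_DELIMITED_IDENTIFIER_CAPTURING, and the class and method names are invented here; only the group name xd and the "" to " replacement come from the description above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DelimitedIdentifierSketch {
    // Hypothetical stand-in (assumption): the real constant's regex may differ.
    // The content is a run of non-quote characters or doubled quotes,
    // captured in the group named xd.
    static final Pattern DELIMITED =
        Pattern.compile("\"(?<xd>(?:[^\"]|\"\")++)\"");

    // Returns the identifier's name with "" collapsed to ", or null if s
    // is not a complete delimited identifier under the stand-in pattern.
    static String unquote(String s) {
        Matcher m = DELIMITED.matcher(s);
        if (!m.matches())
            return null;
        return m.group("xd").replace("\"\"", "\"");
    }

    public static void main(String[] args) {
        System.out.println(unquote("\"say \"\"hi\"\"\"")); // say "hi"
    }
}
```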
ISO_UNICODE_ESCAPE_SPECIFIER
The escape-specifier part of a Unicode delimited identifier or string. The escape character itself is in the capturing group named uec. The group can be absent, in which case \ should be used as the uec.
What makes this implementable as a regular expression is that what precedes/follows UESCAPE is restricted to simple white space, not the more general separator (which can include nesting comments and therefore isn't a regular language). PostgreSQL enforces the same restriction, and a bit of language lawyering does confirm it's what ISO entails. ISO says "any <token> may be followed by a <separator>", and enumerates the expansions of <token>. While an entire <Unicode character string literal> or <Unicode delimited identifier> is a <token>, the constituent pieces of one, like UESCAPE here, are not.
ISO_UNICODE_IDENTIFIER
A Unicode delimited identifier. The body is in capturing group xui and the escape character in group uec. The body still needs to have "" replaced with ", and Unicode escape values decoded and replaced, and then it has to be verified to be no longer than 128 codepoints.
ISO_UNICODE_REPLACER
A compilable pattern to match a Unicode escape value. A match should have one of three named capturing groups. If cev, substitute the uec itself. If u4d or u6d, substitute the codepoint represented by the hex digits. A match with none of those capturing groups indicates an ill-formed string.
Make a Pattern from this by supplying the right uec, so:
Pattern.compile(String.format(ISO_UNICODE_REPLACER, Pattern.quote(uec)));
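
The substitution rules above can be sketched as a decoding loop. The format string below is a hypothetical stand-in for ISO_UNICODE_REPLACER (the real constant's text is not shown on this page), and the class and method names are assumptions. The group names cev, u4d, and u6d and the compile recipe are the ones described above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeReplacerSketch {
    // Hypothetical stand-in (assumption): the uec followed by a second uec,
    // by four hex digits, or by + and six hex digits. A bare uec still
    // matches, with no group set, which marks an ill-formed body.
    static final String REPLACER =
        "%1$s(?:(?<cev>%1$s)|(?<u4d>\\p{XDigit}{4})|\\+(?<u6d>\\p{XDigit}{6}))?";

    static String decode(String body, String uec) {
        Pattern p = Pattern.compile(String.format(REPLACER, Pattern.quote(uec)));
        Matcher m = p.matcher(body);
        StringBuilder sb = new StringBuilder();
        int last = 0;
        while (m.find()) {
            sb.append(body, last, m.start()); // copy text before the escape
            last = m.end();
            if (m.group("cev") != null)
                sb.append(uec);               // doubled escape character
            else if (m.group("u4d") != null)
                sb.appendCodePoint(Integer.parseInt(m.group("u4d"), 16));
            else if (m.group("u6d") != null)
                sb.appendCodePoint(Integer.parseInt(m.group("u6d"), 16));
            else
                throw new IllegalArgumentException(
                    "ill-formed escape at offset " + m.start());
        }
        return sb.append(body, last, body.length()).toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("d\\0061t\\+000061", "\\")); // data
    }
}
```

This sketch does not collapse "" to " or check the 128-codepoint limit; a caller would still need to do both for a Unicode identifier body.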
PG_REGULAR_IDENTIFIER_START
Allowed as the first character of a regular identifier by PostgreSQL (PG 7.4 -).
PG_REGULAR_IDENTIFIER_PART
Allowed as any non-first character of a regular identifier by PostgreSQL (PG 7.4 -).
PG_REGULAR_IDENTIFIER
A complete regular identifier as allowed by PostgreSQL (PG 7.4 -).
PG_REGULAR_IDENTIFIER_CAPTURING
A complete PostgreSQL regular identifier in a single capturing group.
ISO_AND_PG_REGULAR_IDENTIFIER
A regular identifier that satisfies both ISO and PostgreSQL rules.
ISO_AND_PG_REGULAR_IDENTIFIER_CAPTURING
A regular identifier that satisfies both ISO and PostgreSQL rules, in a single capturing group named i.
ISO_AND_PG_IDENTIFIER_CAPTURING
Pattern that matches any identifier valid by both ISO and PG rules, with the presence of named capturing groups indicating which kind it is: i for a regular identifier, xd for a delimited identifier (still needing "" replaced with "), or xui (with or without an explicit uec) for a Unicode identifier (still needing "" to " and decoding of Unicode escape values).
ISO_PG_JAVA_IDENTIFIER
An identifier by ISO SQL, PostgreSQL, and Java (not SQL at all) rules. (Not called REGULAR because Java allows no other form of identifier.) This restrictive form is the safest for identifiers being generated into a deployment descriptor file that an old version of PL/Java might load, because through 1.4.3 PL/Java used the Java identifier rules to recognize identifiers in deployment descriptors.
PG_OPERATOR
An operator by PostgreSQL rules. The length limit (NAMELEN - 1) is not applied here. The match will not include a - followed by - or a / followed by *, and a multicharacter match will not end with + or - unless it also contains one of ~ ! @ # % ^ & | ` ?.
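
The rules above can be restated as a plain whole-string check, sketched here without any regex. This is not how PG_OPERATOR itself works (the pattern stops a match early rather than rejecting input), and the full set of operator characters is taken from PostgreSQL's documentation rather than from this page. The class and method names are invented for illustration.

```java
class OperatorNameSketch {
    // Characters PostgreSQL allows in operator names (an assumption
    // relative to this page; taken from the PostgreSQL documentation).
    static final String OPERATOR_CHARS = "+-*/<>=~!@#%^&|`?";
    // Characters whose presence permits a trailing + or -.
    static final String LIFTING_CHARS = "~!@#%^&|`?";

    static boolean isOperatorName(String s) {
        if (s.isEmpty())
            return false;
        for (int i = 0; i < s.length(); i++)
            if (OPERATOR_CHARS.indexOf(s.charAt(i)) < 0)
                return false;
        // Comment introducers may not appear anywhere in the name.
        if (s.contains("--") || s.contains("/*"))
            return false;
        char last = s.charAt(s.length() - 1);
        if (s.length() > 1 && (last == '+' || last == '-')) {
            for (int i = 0; i < s.length(); i++)
                if (LIFTING_CHARS.indexOf(s.charAt(i)) >= 0)
                    return true;
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isOperatorName("@-")); // true
        System.out.println(isOperatorName("=-")); // false
    }
}
```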
NEWLINE
A newline, in any of the various forms recognized by the Java regex engine, letting it handle the details.
WHITESPACE_NO_NEWLINE
White space except newline, for any Java-recognized newline.
SIMPLE_COMMENT
The kind of comment that extends from -- to the end of the line. This pattern does not eat the newline (though the ISO production does).
BRACKETED_COMMENT_INSIDE
Most of the inside of a bracketed comment, defined in an odd way. It expects both characters of the /* introducer to have been consumed already. This pattern will then eat the whole comment including both closing characters if it encounters no nested comment; otherwise it will consume everything including the / of the nested introducer, but leaving the *, and the <nest> capturing group will be present in the result. That signals the caller to increment the nesting level, consume one * and invoke this pattern again. If the nested match succeeds (without again setting the <nest> group), the caller should then decrement the nest level and match this pattern again to consume the rest of the comment at the original level.
This pattern leaves the * unconsumed upon finding a nested comment introducer as a way to end the repetition in the SEPARATOR pattern, as nothing the SEPARATOR pattern can match can begin with a *.
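
The caller's side of that protocol can be sketched as a loop over a nesting level. The regex here is a hypothetical stand-in that follows the behavior described for BRACKETED_COMMENT_INSIDE; it is not the actual constant, and the class and method names are assumptions.

```java
import java.util.InputMismatchException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NestedCommentSketch {
    // Hypothetical stand-in (assumption): eat comment text, then either the
    // closing */ or the / of a nested /* (leaving the *), setting <nest>.
    static final Pattern COMMENT_INSIDE = Pattern.compile(
        "(?:[^*/]|\\*(?!/)|/(?!\\*))*+(?:\\*/|(?<nest>/)(?=\\*))");

    // start is the offset just past an already-consumed /* introducer.
    // Returns the offset just past the matching */ at the outermost level.
    static int skipComment(CharSequence s, int start) {
        Matcher m = COMMENT_INSIDE.matcher(s);
        int level = 1;
        int pos = start;
        while (level > 0) {
            m.region(pos, s.length());
            if (!m.lookingAt())
                throw new InputMismatchException("unclosed /*-style comment");
            pos = m.end();
            if (m.group("nest") != null) {
                ++level; // entered a nested comment
                ++pos;   // consume the * that the pattern left behind
            } else {
                --level; // a closing */ was consumed at this level
            }
        }
        return pos;
    }

    public static void main(String[] args) {
        String s = "/* a /* b */ c */ rest";
        System.out.println(s.substring(skipComment(s, 2))); // prints " rest"
    }
}
```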
SEPARATOR
SQL's SEPARATOR, which can include any amount of whitespace, simple comments, or bracketed comments. This pattern will consume as much of all that as it can in one match. There are two capturing groups that might be set in a match result: <nl> if there was at least one newline matched among the whitespace (which needs to be known to get the continuation of string literals right), and <nest> if the start of a bracketed comment was encountered.
In the <nest> case, the / of the comment introducer will have been consumed but the * will remain to consume (as described above for BRACKETED_COMMENT_INSIDE); the caller will need to increment a nest level, consume the *, and match BRACKETED_COMMENT_INSIDE to handle the nesting comment. Assuming that completes without another <nest> found, the level should be decremented and BRACKETED_COMMENT_INSIDE matched again to match the rest of the outer comment. When that completes (without a <nest>) at the outermost level, this pattern should be matched again to mop up any remaining SEPARATOR content.
Method Details
separator
Consume any SQL SEPARATOR at the beginning of Matcher m's current region.
The region start is advanced to the character following any separator (or not at all, if no separator is found).
The meaning of the return value is altered by the significant parameter: when significant is true (meaning the very presence or absence of a separator is significant at that point in the grammar), the result will be true if any separator was found, false otherwise. When significant is false, the result does not reveal whether any separator was found, but will be true only if a separator was found that includes at least one newline. That information is needed for the grammar of string and binary-string literals.
Parameters:
m - a Matcher whose current region should have any separator at the beginning consumed. The region start is advanced past any separator found. The Pattern associated with the Matcher may be changed.
significant - when true, the result should report whether any separator was found or not; when false, the result should report only whether a separator containing at least one newline was found, or not.
Returns:
whether any separator was found, or whether any separator containing a newline was found, as selected by significant.
Throws:
InputMismatchException - if an unclosed /*-style comment is found.
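
The two meanings of the return value can be sketched with a deliberately simplified separator. This is a toy under stated assumptions, not the real method: the stand-in pattern handles only white space and -- comments (no bracketed comments, so nothing is ever thrown), and it detects a newline by scanning the consumed text rather than through an <nl> group. The class name and the pattern are invented here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SeparatorSketch {
    // Hypothetical stand-in (assumption): white space and -- comments only.
    static final Pattern SIMPLE_SEPARATOR =
        Pattern.compile("(?:\\s++|--[^\\n\\r]*+)*+");

    static boolean separator(Matcher m, boolean significant) {
        m.usePattern(SIMPLE_SEPARATOR); // the Matcher's Pattern is changed
        if (!m.lookingAt())
            return false;               // not reached: the pattern can match empty
        String eaten = m.group();
        m.region(m.end(), m.regionEnd()); // advance the region start
        if (significant)
            return !eaten.isEmpty();      // was there any separator at all?
        // Otherwise report only whether a newline was consumed.
        return eaten.indexOf('\n') >= 0 || eaten.indexOf('\r') >= 0;
    }

    public static void main(String[] args) {
        Matcher m = Pattern.compile("").matcher("  -- note\n  SELECT 1");
        System.out.println(separator(m, false)); // true: a newline was consumed
        System.out.println(m.regionStart());     // offset of the S in SELECT
    }
}
```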
identifierFrom
Return an Identifier.Simple, given a Matcher that has matched an ISO_AND_PG_IDENTIFIER_CAPTURING. Will determine from the matching named groups which type of identifier it was, process the matched sequence appropriately, and return it.
Parameters:
m - A Matcher known to have matched an identifier.
Returns:
Identifier.Simple made from the recovered string.