java.lang.Object
    org.postgresql.pljava.sqlgen.Lexicals
public abstract class Lexicals extends Object
A few useful SQL lexical definitions supplied as Pattern objects. The idea is not to go overboard and reimplement an SQL lexer, but to capture in one place the rules for those bits of SQL snippets that are likely to be human-supplied in annotations and need to be checked for correctness when emitted into deployment descriptors. Identifiers, for a start. Supplied in the API module so they are available to javac to compile and generate DDR when the rest of PL/Java is not necessarily present. Of course, backend code such as SQLDeploymentDescriptor can also refer to these.
Nested Class Summary
static class Lexicals.Identifier
    Class representing a SQL identifier.
-
Field Summary
static Pattern BRACKETED_COMMENT_INSIDE
    Most of the inside of a bracketed comment, defined in an odd way.
static Pattern ISO_AND_PG_IDENTIFIER_CAPTURING
    Pattern that matches any identifier valid by both ISO and PG rules, with the presence of named capturing groups indicating which kind it is: i for a regular identifier, xd for a delimited identifier (still needing "" replaced with "), or xui (with or without an explicit uec) for a Unicode identifier (still needing "" replaced with " and Unicode escape values decoded).
static Pattern ISO_AND_PG_REGULAR_IDENTIFIER
    A regular identifier that satisfies both ISO and PostgreSQL rules.
static Pattern ISO_AND_PG_REGULAR_IDENTIFIER_CAPTURING
    A regular identifier that satisfies both ISO and PostgreSQL rules, in a single capturing group named i.
static Pattern ISO_DELIMITED_IDENTIFIER
    A complete delimited identifier as allowed by ISO.
static Pattern ISO_DELIMITED_IDENTIFIER_CAPTURING
    An ISO delimited identifier with a single capturing group that captures the content (which still needs to have "" replaced with " throughout).
static Pattern ISO_PG_JAVA_IDENTIFIER
    An identifier by ISO SQL, PostgreSQL, and Java (not SQL at all) rules.
static Pattern ISO_REGULAR_IDENTIFIER
    A complete regular identifier as allowed by ISO.
static Pattern ISO_REGULAR_IDENTIFIER_CAPTURING
    A complete ISO regular identifier in a single capturing group.
static Pattern ISO_REGULAR_IDENTIFIER_PART
    Allowed as any non-first character of a regular identifier by ISO.
static Pattern ISO_REGULAR_IDENTIFIER_START
    Allowed as the first character of a regular identifier by ISO.
static Pattern ISO_UNICODE_ESCAPE_SPECIFIER
    The escape-specifier part of a Unicode delimited identifier or string.
static Pattern ISO_UNICODE_IDENTIFIER
    A Unicode delimited identifier.
static String ISO_UNICODE_REPLACER
    A compilable pattern to match a Unicode escape value.
static Pattern NEWLINE
    A newline, in any of the various forms recognized by the Java regex engine, letting it handle the details.
static Pattern PG_OPERATOR
    An operator by PostgreSQL rules.
static Pattern PG_REGULAR_IDENTIFIER
    A complete regular identifier as allowed by PostgreSQL (PG 7.4 -).
static Pattern PG_REGULAR_IDENTIFIER_CAPTURING
    A complete PostgreSQL regular identifier in a single capturing group.
static Pattern PG_REGULAR_IDENTIFIER_PART
    Allowed as any non-first character of a regular identifier by PostgreSQL (PG 7.4 -).
static Pattern PG_REGULAR_IDENTIFIER_START
    Allowed as the first character of a regular identifier by PostgreSQL (PG 7.4 -).
static Pattern SEPARATOR
    SQL's SEPARATOR, which can include any amount of whitespace, simple comments, or bracketed comments.
static Pattern SIMPLE_COMMENT
    The kind of comment that extends from -- to the end of the line.
static Pattern WHITESPACE_NO_NEWLINE
    White space except newline, for any Java-recognized newline.
-
Method Summary
static Lexicals.Identifier.Simple identifierFrom(Matcher m)
    Return an Identifier.Simple, given a Matcher that has matched an ISO_AND_PG_IDENTIFIER_CAPTURING.
static boolean separator(Matcher m, boolean significant)
    Consume any SQL SEPARATOR at the beginning of Matcher m's current region.
Field Detail
-
ISO_REGULAR_IDENTIFIER_START
public static final Pattern ISO_REGULAR_IDENTIFIER_START
Allowed as the first character of a regular identifier by ISO.
-
ISO_REGULAR_IDENTIFIER_PART
public static final Pattern ISO_REGULAR_IDENTIFIER_PART
Allowed as any non-first character of a regular identifier by ISO.
-
ISO_REGULAR_IDENTIFIER
public static final Pattern ISO_REGULAR_IDENTIFIER
A complete regular identifier as allowed by ISO.
-
ISO_REGULAR_IDENTIFIER_CAPTURING
public static final Pattern ISO_REGULAR_IDENTIFIER_CAPTURING
A complete ISO regular identifier in a single capturing group.
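For illustration only, the START/PART split can be expressed with Java's Unicode general-category classes. The category lists are an assumption modeled on SQL:2003's identifier start (Lu, Ll, Lt, Lm, Lo, Nl) and identifier extend (U+00B7, Mn, Mc, Nd, Pc, Cf). They are not copied from PL/Java, and the class IsoIdentifierSketch is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical sketch; the category lists are assumptions, not PL/Java's patterns.
public class IsoIdentifierSketch {
    // Assumed ISO identifier start: letters and letter numbers.
    static final String START = "[\\p{Lu}\\p{Ll}\\p{Lt}\\p{Lm}\\p{Lo}\\p{Nl}]";
    // Assumed ISO identifier extend: middle dot, combining marks, decimal digits,
    // connector punctuation (such as '_'), and format characters.
    static final String EXTEND = "[\\x{B7}\\p{Mn}\\p{Mc}\\p{Nd}\\p{Pc}\\p{Cf}]";
    // A non-first character may be either a start character or an extend character.
    static final String PART = "(?:" + START + "|" + EXTEND + ")";

    static final Pattern REGULAR = Pattern.compile(START + PART + "*+");

    static boolean isRegular(String s) {
        return REGULAR.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isRegular("foo_bar1")); // true
        System.out.println(isRegular("_foo"));     // false: '_' is not an identifier start
        System.out.println(isRegular("1foo"));     // false: a digit is not an identifier start
    }
}
```

The possessive `*+` is safe here because no trailing element of the pattern ever needs the loop to give characters back.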
-
ISO_DELIMITED_IDENTIFIER
public static final Pattern ISO_DELIMITED_IDENTIFIER
A complete delimited identifier as allowed by ISO. As it happens, this is also the form PostgreSQL uses for elements of a LIST_QUOTE-typed GUC.
-
ISO_DELIMITED_IDENTIFIER_CAPTURING
public static final Pattern ISO_DELIMITED_IDENTIFIER_CAPTURING
An ISO delimited identifier with a single capturing group that captures the content (which still needs to have "" replaced with " throughout). The capturing group is named xd.
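Here is a minimal sketch of capturing and unescaping a delimited identifier. The pattern is a hypothetical stand-in written to the description, not the actual ISO_DELIMITED_IDENTIFIER_CAPTURING text. The `""` to `"` replacement is the post-processing step the description calls for.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not PL/Java's pattern.
public class DelimitedIdentifierSketch {
    // A double quote, one or more characters that are either not a quote or a
    // doubled quote, then a closing double quote. Content goes in group "xd".
    static final Pattern DELIMITED =
        Pattern.compile("\"(?<xd>(?:[^\"]|\"\")++)\"");

    // Returns the identifier text with each "" collapsed to ", or null if s
    // is not a complete delimited identifier.
    static String content(String s) {
        Matcher m = DELIMITED.matcher(s);
        if (!m.matches())
            return null;
        return m.group("xd").replace("\"\"", "\"");
    }

    public static void main(String[] args) {
        System.out.println(content("\"say \"\"hi\"\"\"")); // say "hi"
    }
}
```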
-
ISO_UNICODE_ESCAPE_SPECIFIER
public static final Pattern ISO_UNICODE_ESCAPE_SPECIFIER
The escape-specifier part of a Unicode delimited identifier or string. The escape character itself is in the capturing group named uec. The group can be absent, in which case \ should be used as the uec.
What makes this implementable as a regular expression is that what precedes/follows UESCAPE is restricted to simple white space, not the more general separator (which can include nesting comments and therefore isn't a regular language). PostgreSQL enforces the same restriction, and a bit of language lawyering does confirm it's what ISO entails. ISO says "any <token> may be followed by a <separator>", and enumerates the expansions of <token>. While an entire <Unicode character string literal> or <Unicode delimited identifier> is a <token>, the constituent pieces of one, like UESCAPE here, are not.
-
ISO_UNICODE_IDENTIFIER
public static final Pattern ISO_UNICODE_IDENTIFIER
A Unicode delimited identifier. The body is in capturing group xui and the escape character in group uec. The body still needs to have "" replaced with ", and Unicode escape values decoded and replaced, and then it has to be verified to be no longer than 128 codepoints.
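The following sketch matches a Unicode delimited identifier and extracts the xui body and the uec escape character. When no UESCAPE clause is present, uec defaults to a backslash. Only simple white space is allowed around UESCAPE, so a comment there makes the match fail, as the ISO_UNICODE_ESCAPE_SPECIFIER discussion explains.

Some caveats:

* The pattern and the class name are hypothetical.
* The restriction on the escape character (no hex digit, `+`, quote, or white space) follows PostgreSQL's documented rule.
* Decoding the escapes and the 128-codepoint check are deliberately left out.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not PL/Java's ISO_UNICODE_IDENTIFIER.
public class UnicodeIdentifierSketch {
    // U&"body" optionally followed by UESCAPE 'c', with only simple white
    // space (never comments) allowed around UESCAPE.
    static final Pattern UNICODE_IDENT = Pattern.compile(
        "[Uu]&\"(?<xui>(?:[^\"]|\"\")++)\"" +
        "(?:\\s*+(?i:UESCAPE)\\s*+'(?<uec>[^0-9A-Fa-f+'\"\\s])')?+");

    // Returns {body, uec}: the body with "" already collapsed to ", and the
    // escape character, defaulting to backslash when UESCAPE is absent.
    // Escape decoding and the 128-codepoint limit are not handled here.
    static String[] parse(String s) {
        Matcher m = UNICODE_IDENT.matcher(s);
        if (!m.matches())
            throw new IllegalArgumentException("not a Unicode identifier: " + s);
        String uec = m.group("uec") == null ? "\\" : m.group("uec");
        return new String[] { m.group("xui").replace("\"\"", "\""), uec };
    }
}
```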
-
ISO_UNICODE_REPLACER
public static final String ISO_UNICODE_REPLACER
A compilable pattern to match a Unicode escape value. A match should have one of three named capturing groups. If cev, substitute the uec itself. If u4d or u6d, substitute the codepoint represented by the hex digits. A match with none of those capturing groups indicates an ill-formed string.
Make a Pattern from this by supplying the right uec, so: Pattern.compile(String.format(ISO_UNICODE_REPLACER, Pattern.quote(uec)));
See Also: Constant Field Values
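This sketch shows how such a format string might be built and used in a decoding loop. The String.format and Pattern.quote usage and the group names cev, u4d, and u6d come from the description above. The REPLACER text itself, including its %1$s placeholder, is a hypothetical stand-in.

The loop does three things:

* For cev, it substitutes uec.
* For u4d or u6d, it substitutes the code point.
* For a match with none of the groups, it throws.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; REPLACER is not PL/Java's ISO_UNICODE_REPLACER text.
public class UnicodeReplacerSketch {
    // %1$s receives the quoted escape character. Groups: cev (uec doubled),
    // u4d (uec + 4 hex digits), u6d (uec + '+' + 6 hex digits). The final
    // empty alternative lets a lone uec match with none of those groups.
    static final String REPLACER =
        "%1$s(?:(?<cev>%1$s)|(?<u4d>\\p{XDigit}{4})|\\+(?<u6d>\\p{XDigit}{6})|)";

    static String decode(String body, String uec) {
        Pattern p = Pattern.compile(String.format(REPLACER, Pattern.quote(uec)));
        Matcher m = p.matcher(body);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String replacement;
            if (m.group("cev") != null) {
                replacement = uec;
            } else {
                String hex = m.group("u4d") != null ? m.group("u4d") : m.group("u6d");
                if (hex == null)
                    throw new IllegalArgumentException("ill-formed Unicode escape at " + m.start());
                // Character.toChars rejects values above U+10FFFF.
                replacement = new String(Character.toChars(Integer.parseInt(hex, 16)));
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Pattern.quote matters here because the escape character could be a regex metacharacter, such as the default backslash.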
-
PG_REGULAR_IDENTIFIER_START
public static final Pattern PG_REGULAR_IDENTIFIER_START
Allowed as the first character of a regular identifier by PostgreSQL (PG 7.4 -).
-
PG_REGULAR_IDENTIFIER_PART
public static final Pattern PG_REGULAR_IDENTIFIER_PART
Allowed as any non-first character of a regular identifier by PostgreSQL (PG 7.4 -).
-
PG_REGULAR_IDENTIFIER
public static final Pattern PG_REGULAR_IDENTIFIER
A complete regular identifier as allowed by PostgreSQL (PG 7.4 -).
-
PG_REGULAR_IDENTIFIER_CAPTURING
public static final Pattern PG_REGULAR_IDENTIFIER_CAPTURING
A complete PostgreSQL regular identifier in a single capturing group.
-
ISO_AND_PG_REGULAR_IDENTIFIER
public static final Pattern ISO_AND_PG_REGULAR_IDENTIFIER
A regular identifier that satisfies both ISO and PostgreSQL rules.
-
ISO_AND_PG_REGULAR_IDENTIFIER_CAPTURING
public static final Pattern ISO_AND_PG_REGULAR_IDENTIFIER_CAPTURING
A regular identifier that satisfies both ISO and PostgreSQL rules, in a single capturing group named i.
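One way to express "satisfies both" in Java is character-class intersection (`&&`), applied separately to the start class and the part class.

The sketch rests on two assumptions:

* The ISO side uses the SQL:2003 categories.
* The PostgreSQL side approximates the scanner rule per character: ASCII letters, underscore, or any non-ASCII character may start an identifier, and digits and `$` may also follow.

The class name is hypothetical, and this is not PL/Java's actual definition.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not PL/Java's ISO_AND_PG_REGULAR_IDENTIFIER_CAPTURING.
public class IsoAndPgIdentifierSketch {
    // Assumed ISO classes (SQL:2003 identifier start / identifier extend).
    static final String ISO_START = "\\p{Lu}\\p{Ll}\\p{Lt}\\p{Lm}\\p{Lo}\\p{Nl}";
    static final String ISO_EXTEND = "\\x{B7}\\p{Mn}\\p{Mc}\\p{Nd}\\p{Pc}\\p{Cf}";
    // Assumed PostgreSQL classes, approximated at the character level.
    static final String PG_START = "A-Za-z_\\x{80}-\\x{10FFFF}";
    static final String PG_PART = PG_START + "0-9$";

    // && keeps only characters allowed by both rule sets.
    static final String START = "[[" + ISO_START + "]&&[" + PG_START + "]]";
    static final String PART = "[[" + ISO_START + ISO_EXTEND + "]&&[" + PG_PART + "]]";

    static final Pattern CAPTURING = Pattern.compile("(?<i>" + START + PART + "*+)");

    // Returns the identifier captured in group "i", or null if s is not one.
    static String identifier(String s) {
        Matcher m = CAPTURING.matcher(s);
        return m.matches() ? m.group("i") : null;
    }
}
```

Under these assumptions, `_` cannot start the identifier (ISO forbids it) and `$` cannot appear at all (ISO forbids it), even though PostgreSQL alone would accept both.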
-
ISO_AND_PG_IDENTIFIER_CAPTURING
public static final Pattern ISO_AND_PG_IDENTIFIER_CAPTURING
Pattern that matches any identifier valid by both ISO and PG rules, with the presence of named capturing groups indicating which kind it is: i for a regular identifier, xd for a delimited identifier (still needing "" replaced with "), or xui (with or without an explicit uec) for a Unicode identifier (still needing "" replaced with " and Unicode escape values decoded).
-
ISO_PG_JAVA_IDENTIFIER
public static final Pattern ISO_PG_JAVA_IDENTIFIER
An identifier by ISO SQL, PostgreSQL, and Java (not SQL at all) rules. (Not called REGULAR because Java allows no other form of identifier.) This restrictive form is the safest for identifiers being generated into a deployment descriptor file that an old version of PL/Java might load, because through 1.4.3 PL/Java used the Java identifier rules to recognize identifiers in deployment descriptors.
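Here is a hedged sketch of layering Java's rules on top of an ISO-and-PG check. Each code point is tested with Character.isJavaIdentifierStart or Character.isJavaIdentifierPart.

Caveats:

* The ISO and PostgreSQL character classes are assumptions: the SQL:2003 categories, and PostgreSQL's scanner rule approximated per character.
* Java reserved words are not rejected.
* The class name is hypothetical, and nothing here claims this is how PL/Java builds ISO_PG_JAVA_IDENTIFIER.

```java
import java.util.regex.Pattern;

// Hypothetical sketch, not PL/Java's ISO_PG_JAVA_IDENTIFIER.
public class IsoPgJavaIdentifierSketch {
    // Assumed ISO classes intersected (&&) with assumed PostgreSQL classes.
    static final String START =
        "[[\\p{Lu}\\p{Ll}\\p{Lt}\\p{Lm}\\p{Lo}\\p{Nl}]&&[A-Za-z_\\x{80}-\\x{10FFFF}]]";
    static final String PART =
        "[[\\p{Lu}\\p{Ll}\\p{Lt}\\p{Lm}\\p{Lo}\\p{Nl}\\x{B7}\\p{Mn}\\p{Mc}\\p{Nd}\\p{Pc}\\p{Cf}]"
        + "&&[A-Za-z_0-9$\\x{80}-\\x{10FFFF}]]";
    static final Pattern ISO_AND_PG = Pattern.compile(START + PART + "*+");

    static boolean isIsoPgJava(String s) {
        if (!ISO_AND_PG.matcher(s).matches())
            return false;
        // Java's own rules: the first code point must start an identifier and
        // the rest must continue one. Reserved words like "class" are not checked.
        return Character.isJavaIdentifierStart(s.codePointAt(0))
            && s.codePoints().skip(1).allMatch(Character::isJavaIdentifierPart);
    }
}
```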
-
PG_OPERATOR
public static final Pattern PG_OPERATOR
An operator by PostgreSQL rules. The length limit (NAMELEN - 1) is not applied here. The match will not include a - followed by another -, or a / followed by a * (either pair would begin a comment), and a multicharacter match will not end with + or - unless it also contains one of ~ ! @ # % ^ & | ` ?.
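The rules above can also be applied procedurally, which may be easier to follow than a single regular expression. This sketch works in three steps:

1. Take the longest leading run of PostgreSQL operator characters. The set `~ ! @ # ^ & | ` ? + - * / % < > =` is an assumption taken from PostgreSQL's scanner.
2. Cut the run at any `--` or `/*`.
3. Strip trailing `+` and `-` from a multicharacter result that contains none of `~ ! @ # % ^ & | ` ?`.

Like the pattern, it ignores the NAMELEN - 1 limit. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the PG_OPERATOR rules, not PL/Java's pattern.
public class PgOperatorSketch {
    // Assumed PostgreSQL operator characters (a single '&' is literal in a class).
    private static final Pattern OP_CHARS = Pattern.compile("[~!@#^&|`?+\\-*/%<>=]+");
    // If any of these appear, trailing + and - are kept.
    private static final String NON_TRIMMING = "~!@#%^&|`?";

    // Returns the operator at the start of s, or "" if there is none.
    static String leadingOperator(String s) {
        Matcher m = OP_CHARS.matcher(s);
        if (!m.lookingAt())
            return "";
        String op = m.group();
        // An operator never contains the start of a comment.
        int cut = op.length();
        int dashes = op.indexOf("--");
        if (dashes >= 0) cut = Math.min(cut, dashes);
        int slashStar = op.indexOf("/*");
        if (slashStar >= 0) cut = Math.min(cut, slashStar);
        op = op.substring(0, cut);
        // Multicharacter operators may not end in + or - unless they contain
        // one of the NON_TRIMMING characters.
        if (op.length() > 1 && op.chars().noneMatch(c -> NON_TRIMMING.indexOf(c) >= 0)) {
            int end = op.length();
            while (end > 1 && (op.charAt(end - 1) == '+' || op.charAt(end - 1) == '-'))
                end--;
            op = op.substring(0, end);
        }
        return op;
    }
}
```

For example, in `x=-1` the lexer should see `=` followed by a unary minus, which is why `=-` trims to `=`.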
-
NEWLINE
public static final Pattern NEWLINE
A newline, in any of the various forms recognized by the Java regex engine, letting it handle the details.
-
WHITESPACE_NO_NEWLINE
public static final Pattern WHITESPACE_NO_NEWLINE
White space except newline, for any Java-recognized newline.
-
SIMPLE_COMMENT
public static final Pattern SIMPLE_COMMENT
The kind of comment that extends from -- to the end of the line. This pattern does not eat the newline (though the ISO production does).
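The phrase "letting it handle the details" suggests Java's `\R` linebreak matcher, which treats `\r\n` as a single newline. The sketch assumes that and builds WHITESPACE_NO_NEWLINE and SIMPLE_COMMENT around it with a negative lookahead. The simple-comment pattern stops before the newline rather than eating it. This is an illustration only, not PL/Java's source, and the class name is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of NEWLINE, WHITESPACE_NO_NEWLINE and SIMPLE_COMMENT.
public class LineSketch {
    // Any Unicode linebreak sequence; \r\n counts as one newline.
    static final Pattern NEWLINE = Pattern.compile("\\R");
    // White space that does not begin a newline.
    static final Pattern WHITESPACE_NO_NEWLINE = Pattern.compile("(?:(?!\\R)\\s)++");
    // From -- up to, but not including, the next newline (or end of input).
    static final Pattern SIMPLE_COMMENT = Pattern.compile("--(?:(?!\\R)[\\s\\S])*+");

    static int countNewlines(String s) {
        Matcher m = NEWLINE.matcher(s);
        int n = 0;
        while (m.find())
            n++;
        return n;
    }
}
```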
-
BRACKETED_COMMENT_INSIDE
public static final Pattern BRACKETED_COMMENT_INSIDE
Most of the inside of a bracketed comment, defined in an odd way. It expects both characters of the /* introducer to have been consumed already. This pattern will then eat the whole comment including both closing characters if it encounters no nested comment; otherwise it will consume everything including the / of the nested introducer, but leaving the *, and the <nest> capturing group will be present in the result. That signals the caller to increment the nesting level, consume one * and invoke this pattern again. If the nested match succeeds (without again setting the <nest> group), the caller should then decrement the nest level and match this pattern again to consume the rest of the comment at the original level.
This pattern leaves the * unconsumed upon finding a nested comment introducer as a way to end the repetition in the SEPARATOR pattern, as nothing the SEPARATOR pattern can match can begin with a *.
-
SEPARATOR
public static final Pattern SEPARATOR
SQL's SEPARATOR, which can include any amount of whitespace, simple comments, or bracketed comments. This pattern will consume as much of all that as it can in one match. There are two capturing groups that might be set in a match result: <nl> if there was at least one newline matched among the whitespace (which needs to be known to get the continuation of string literals right), and <nest> if the start of a bracketed comment was encountered.
In the <nest> case, the / of the comment introducer will have been consumed but the * will remain to consume (as described above for BRACKETED_COMMENT_INSIDE); the caller will need to increment a nest level, consume the *, and match BRACKETED_COMMENT_INSIDE to handle the nesting comment. Assuming that completes without another <nest> found, the level should be decremented and BRACKETED_COMMENT_INSIDE matched again to match the rest of the outer comment. When that completes (without a <nest>) at the outermost level, this pattern should be matched again to mop up any remaining SEPARATOR content.
Method Detail
-
separator
public static boolean separator(Matcher m, boolean significant)
Consume any SQL SEPARATOR at the beginning of Matcher m's current region.
The region start is advanced to the character following any separator (or not at all, if no separator is found).
The meaning of the return value is altered by the significant parameter: when significant is true (meaning the very presence or absence of a separator is significant at that point in the grammar), the result will be true if any separator was found, false otherwise. When significant is false, the result does not reveal whether any separator was found, but will be true only if a separator was found that includes at least one newline. That information is needed for the grammar of string and binary-string literals.
Parameters:
m - a Matcher whose current region should have any separator at the beginning consumed. The region start is advanced past any separator found. The Pattern associated with the Matcher may be changed.
significant - when true, the result should report whether any separator was found or not; when false, the result should report only whether a separator containing at least one newline was found, or not.
Returns:
whether any separator was found, or whether any separator containing a newline was found, as selected by significant.
Throws:
InputMismatchException - if an unclosed /*-style comment is found.
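The nesting protocol described for SEPARATOR and BRACKETED_COMMENT_INSIDE, together with the contract of separator, can be sketched as follows. Both patterns here are hypothetical stand-ins written to follow those descriptions; the real ones are not reproduced. SeparatorSketch is not PL/Java's implementation.

Two behaviors to note:

* The sketch counts only newlines matched as white space by its own separator pattern, not newlines inside bracketed comments.
* Like the documented method, it changes the Matcher's pattern via usePattern.

```java
import java.util.InputMismatchException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the separator(Matcher, boolean) contract.
public class SeparatorSketch {
    // Stand-in for SEPARATOR: newlines (group nl), other white space, simple
    // comments, or the '/' of a "/*" (group nest), which leaves the '*'
    // unconsumed and so ends the repetition.
    static final Pattern SEPARATOR = Pattern.compile(
        "(?:(?<nl>\\R)|(?:(?!\\R)\\s)|--(?:(?!\\R)[\\s\\S])*+|(?<nest>/(?=\\*)))*+");

    // Stand-in for BRACKETED_COMMENT_INSIDE, used after "/*" is consumed: eats
    // through "*/", or stops after the '/' of a nested "/*" (group nest).
    static final Pattern INSIDE = Pattern.compile(
        "(?:[^*/]|\\*(?!/)|/(?!\\*))*+(?:\\*/|(?<nest>/))");

    static boolean separator(Matcher m, boolean significant) {
        int begin = m.regionStart();
        int end = m.regionEnd();
        int pos = begin;
        boolean sawNewline = false;
        while (true) {
            m.usePattern(SEPARATOR);
            m.region(pos, end);
            m.lookingAt(); // always succeeds: SEPARATOR can match the empty string
            if (m.group("nl") != null)
                sawNewline = true;
            pos = m.end();
            if (m.group("nest") == null)
                break; // no bracketed comment started: done
            pos++; // consume the '*' of the introducer
            m.usePattern(INSIDE);
            for (int level = 1; level > 0; ) {
                m.region(pos, end);
                if (!m.lookingAt())
                    throw new InputMismatchException("unclosed /* comment");
                pos = m.end();
                if (m.group("nest") != null) {
                    level++;
                    pos++; // consume the '*' of the nested introducer
                } else {
                    level--; // matched a closing "*/"
                }
            }
            // Loop around to mop up any separator after the comment.
        }
        m.region(pos, end);
        return significant ? pos > begin : sawNewline;
    }
}
```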
-
identifierFrom
public static Lexicals.Identifier.Simple identifierFrom(Matcher m)
Return an Identifier.Simple, given a Matcher that has matched an ISO_AND_PG_IDENTIFIER_CAPTURING. Will determine from the matching named groups which type of identifier it was, process the matched sequence appropriately, and return it.
Parameters:
m - A Matcher known to have matched an identifier.
Returns:
Identifier.Simple made from the recovered string.
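The dispatch that identifierFrom describes can be sketched as follows, under heavy simplification:

* The combined pattern is a hypothetical stand-in for ISO_AND_PG_IDENTIFIER_CAPTURING, with an ASCII-only regular-identifier branch.
* The result is a plain String rather than an Identifier.Simple, so no case folding or other Identifier.Simple semantics are modeled.
* The escape pattern's group names follow the ISO_UNICODE_REPLACER description, but its exact text is an assumption.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of identifierFrom-style dispatch, not PL/Java's code.
public class IdentifierFromSketch {
    // After a match, exactly one of the groups i, xd, or xui is present.
    static final Pattern IDENT = Pattern.compile(
        "(?<i>[A-Za-z][A-Za-z0-9_]*+)" +
        "|\"(?<xd>(?:[^\"]|\"\")++)\"" +
        "|[Uu]&\"(?<xui>(?:[^\"]|\"\")++)\"" +
        "(?:\\s*+(?i:UESCAPE)\\s*+'(?<uec>[^0-9A-Fa-f+'\"\\s])')?+");
    // One Unicode escape value for a given (quoted) escape character.
    static final String ESCAPE =
        "%1$s(?:(?<cev>%1$s)|(?<u4d>\\p{XDigit}{4})|\\+(?<u6d>\\p{XDigit}{6})|)";

    // Decide the identifier kind from which group matched, then post-process.
    static String recover(Matcher m) {
        if (m.group("i") != null)
            return m.group("i");
        if (m.group("xd") != null)
            return m.group("xd").replace("\"\"", "\"");
        String uec = m.group("uec") != null ? m.group("uec") : "\\";
        String body = m.group("xui").replace("\"\"", "\"");
        Matcher e = Pattern.compile(String.format(ESCAPE, Pattern.quote(uec))).matcher(body);
        StringBuilder out = new StringBuilder();
        while (e.find()) {
            String hex = e.group("u4d") != null ? e.group("u4d") : e.group("u6d");
            String rep;
            if (e.group("cev") != null)
                rep = uec;
            else if (hex != null)
                rep = new String(Character.toChars(Integer.parseInt(hex, 16)));
            else
                throw new IllegalArgumentException("ill-formed Unicode escape");
            e.appendReplacement(out, Matcher.quoteReplacement(rep));
        }
        e.appendTail(out);
        if (out.codePointCount(0, out.length()) > 128)
            throw new IllegalArgumentException("identifier longer than 128 codepoints");
        return out.toString();
    }

    static String recover(String s) {
        Matcher m = IDENT.matcher(s);
        if (!m.matches())
            throw new IllegalArgumentException("not an identifier: " + s);
        return recover(m);
    }
}
```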