org.postgresql.pljava.sqlgen.Lexicals

public abstract class Lexicals extends Object

A few useful SQL lexical definitions supplied as Pattern objects. The idea is not to go overboard and reimplement an SQL lexer, but to capture in one place the rules for those bits of SQL that are likely to be human-supplied in annotations and need to be checked for correctness when emitted into deployment descriptors: identifiers, for a start. These are supplied in the API module so that they are available to javac when compiling and generating DDR, even when the rest of PL/Java is not necessarily present. Backend code such as SQLDeploymentDescriptor can also refer to them.

Nested Class Summary

Nested Classes

Modifier and Type

Class

Description

static class

Lexicals.Identifier

Class representing a SQL identifier.
Field Summary

Fields

Modifier and Type

Field

Description

static final Pattern

BRACKETED_COMMENT_INSIDE

Most of the inside of a bracketed comment, defined in an odd way.

static final Pattern

ISO_AND_PG_IDENTIFIER_CAPTURING

Pattern that matches any identifier valid by both ISO and PG rules, with the presence of named capturing groups indicating which kind it is: i for a regular identifier, xd for a delimited identifier (still needing "" replaced with "), or xui (with or without an explicit uec) for a Unicode identifier (still needing "" replaced with " and decoding of Unicode escape values).

static final Pattern

ISO_AND_PG_REGULAR_IDENTIFIER

A regular identifier that satisfies both ISO and PostgreSQL rules.

static final Pattern

ISO_AND_PG_REGULAR_IDENTIFIER_CAPTURING

A regular identifier that satisfies both ISO and PostgreSQL rules, in a single capturing group named i.

static final Pattern

ISO_DELIMITED_IDENTIFIER

A complete delimited identifier as allowed by ISO.

static final Pattern

ISO_DELIMITED_IDENTIFIER_CAPTURING

An ISO delimited identifier with a single capturing group that captures the content (which still needs to have "" replaced with " throughout).

static final Pattern

ISO_PG_JAVA_IDENTIFIER

An identifier by ISO SQL, PostgreSQL, and Java (not SQL at all) rules.

static final Pattern

ISO_REGULAR_IDENTIFIER

A complete regular identifier as allowed by ISO.

static final Pattern

ISO_REGULAR_IDENTIFIER_CAPTURING

A complete ISO regular identifier in a single capturing group.

static final Pattern

ISO_REGULAR_IDENTIFIER_PART

Allowed as any non-first character of a regular identifier by ISO.

static final Pattern

ISO_REGULAR_IDENTIFIER_START

Allowed as the first character of a regular identifier by ISO.

static final Pattern

ISO_UNICODE_ESCAPE_SPECIFIER

The escape-specifier part of a Unicode delimited identifier or string.

static final Pattern

ISO_UNICODE_IDENTIFIER

A Unicode delimited identifier.

static final String

ISO_UNICODE_REPLACER

A compilable pattern to match a Unicode escape value.

static final Pattern

NEWLINE

A newline, in any of the various forms recognized by the Java regex engine, letting it handle the details.

static final Pattern

PG_OPERATOR

An operator by PostgreSQL rules.

static final Pattern

PG_REGULAR_IDENTIFIER

A complete regular identifier as allowed by PostgreSQL (PG 7.4 -).

static final Pattern

PG_REGULAR_IDENTIFIER_CAPTURING

A complete PostgreSQL regular identifier in a single capturing group.

static final Pattern

PG_REGULAR_IDENTIFIER_PART

Allowed as any non-first character of a regular identifier by PostgreSQL (PG 7.4 -).

static final Pattern

PG_REGULAR_IDENTIFIER_START

Allowed as the first character of a regular identifier by PostgreSQL (PG 7.4 -).

static final Pattern

SEPARATOR

SQL's SEPARATOR, which can include any amount of whitespace, simple comments, or bracketed comments.

static final Pattern

SIMPLE_COMMENT

The kind of comment that extends from -- to the end of the line.

static final Pattern

WHITESPACE_NO_NEWLINE

White space except newline, for any Java-recognized newline.
Method Summary

Modifier and Type

Method

Description

static Lexicals.Identifier.Simple

identifierFrom(Matcher m)

Return an Identifier.Simple, given a Matcher that has matched an ISO_AND_PG_IDENTIFIER_CAPTURING.

static boolean

separator(Matcher m, boolean significant)

Consume any SQL SEPARATOR at the beginning of Matcher m's current region.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Field Details
- ISO_REGULAR_IDENTIFIER_START
  
  public static final Pattern ISO_REGULAR_IDENTIFIER_START
  
  Allowed as the first character of a regular identifier by ISO.
- ISO_REGULAR_IDENTIFIER_PART
  
  public static final Pattern ISO_REGULAR_IDENTIFIER_PART
  
  Allowed as any non-first character of a regular identifier by ISO.
- ISO_REGULAR_IDENTIFIER
  
  public static final Pattern ISO_REGULAR_IDENTIFIER
  
  A complete regular identifier as allowed by ISO.
- ISO_REGULAR_IDENTIFIER_CAPTURING
  
  public static final Pattern ISO_REGULAR_IDENTIFIER_CAPTURING
  
  A complete ISO regular identifier in a single capturing group.
- ISO_DELIMITED_IDENTIFIER
  
  public static final Pattern ISO_DELIMITED_IDENTIFIER
  
  A complete delimited identifier as allowed by ISO. As it happens, this is also the form PostgreSQL uses for elements of a LIST_QUOTE-typed GUC.
- ISO_DELIMITED_IDENTIFIER_CAPTURING
  
  public static final Pattern ISO_DELIMITED_IDENTIFIER_CAPTURING
  
  An ISO delimited identifier with a single capturing group that captures the content (which still needs to have "" replaced with " throughout). The capturing group is named xd.
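  The post-processing step mentioned above (replacing "" with ") can be sketched as follows. This is only an illustration: the DELIMITED pattern and the content method are hypothetical stand-ins written for this example, not the definitions PL/Java ships.

  ```java
  import java.util.regex.Matcher;
  import java.util.regex.Pattern;

  class DelimitedIdentifierSketch {
      // Hypothetical stand-in, NOT the pattern PL/Java defines: a double-quoted
      // body in which a literal quote is written as two consecutive quotes.
      static final Pattern DELIMITED =
          Pattern.compile("\"(?<xd>(?:[^\"]|\"\")+)\"");

      /** Returns the identifier's actual content, or null if s is not delimited. */
      static String content(String s) {
          Matcher m = DELIMITED.matcher(s);
          if (!m.matches()) {
              return null;
          }
          // The captured body still has doubled quotes; collapse each pair.
          return m.group("xd").replace("\"\"", "\"");
      }

      public static void main(String[] args) {
          System.out.println(content("\"say \"\"hi\"\"\""));
      }
  }
  ```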
- ISO_UNICODE_ESCAPE_SPECIFIER
  
  public static final Pattern ISO_UNICODE_ESCAPE_SPECIFIER
  
  The escape-specifier part of a Unicode delimited identifier or string. The escape character itself is in the capturing group named uec. The group can be absent, in which case \ should be used as the uec.
  What makes this implementable as a regular expression is that what precedes/follows UESCAPE is restricted to simple white space, not the more general separator (which can include nesting comments and therefore isn't a regular language). PostgreSQL enforces the same restriction, and a bit of language lawyering does confirm it's what ISO entails. ISO says "any <token> may be followed by a <separator>", and enumerates the expansions of <token>. While an entire <Unicode character string literal> or <Unicode delimited identifier> is a <token>, the constituent pieces of one, like UESCAPE here, are not.
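  A minimal sketch of the "absent group means backslash" convention, assuming a simplified stand-in pattern. ESCAPE_SPECIFIER and escapeCharacter are hypothetical names for this example; the excluded characters (hex digits, plus sign, quotes, white space) follow the PostgreSQL documentation for UESCAPE, and the real pattern may differ in detail.

  ```java
  import java.util.regex.Matcher;
  import java.util.regex.Pattern;

  class UescapeSketch {
      // Hypothetical stand-in: optional white space, UESCAPE, optional white
      // space, then one quoted character captured in the group named uec.
      static final Pattern ESCAPE_SPECIFIER = Pattern.compile(
          "(?:\\s*+UESCAPE\\s*+'(?<uec>[^\\p{XDigit}+'\"\\s])')?",
          Pattern.CASE_INSENSITIVE);

      /** Returns the escape character that applies to the text before tail. */
      static String escapeCharacter(String tail) {
          Matcher m = ESCAPE_SPECIFIER.matcher(tail);
          m.lookingAt(); // always succeeds: the whole specifier is optional
          String uec = m.group("uec");
          return uec != null ? uec : "\\"; // absent group: default to backslash
      }

      public static void main(String[] args) {
          System.out.println(escapeCharacter(" UESCAPE '!'"));
          System.out.println(escapeCharacter(" AS x"));
      }
  }
  ```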
- ISO_UNICODE_IDENTIFIER
  
  public static final Pattern ISO_UNICODE_IDENTIFIER
  
  A Unicode delimited identifier. The body is in capturing group xui and the escape character in group uec. The body still needs to have "" replaced with ", and Unicode escape values decoded and replaced, and then it has to be verified to be no longer than 128 codepoints.
- ISO_UNICODE_REPLACER
  
  public static final String ISO_UNICODE_REPLACER
  
  A compilable pattern to match a Unicode escape value. A match should have one of three named capturing groups. If cev, substitute the uec itself. If u4d or u6d, substitute the codepoint represented by the hex digits. A match with none of those capturing groups indicates an ill-formed string.
  Make a Pattern from this by supplying the right uec, so: Pattern.compile(String.format(ISO_UNICODE_REPLACER, Pattern.quote(uec)));
  See Also:
  
  Constant Field Values
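  The decoding loop implied by the three groups might look like the sketch below. The REPLACER format string is a hypothetical stand-in with the same group names (cev, u4d, u6d), not the actual value of ISO_UNICODE_REPLACER, and the sketch does not attempt to validate surrogate pairs written as two four-digit escapes.

  ```java
  import java.util.regex.Matcher;
  import java.util.regex.Pattern;

  class UnicodeEscapeSketch {
      // Hypothetical stand-in format string; %1$s receives the quoted uec.
      static final String REPLACER =
          "%1$s(?:(?<cev>%1$s)|(?<u4d>\\p{XDigit}{4})|\\+(?<u6d>\\p{XDigit}{6}))?";

      /** Decodes the escape values in body, using uec as the escape character. */
      static String decode(String body, String uec) {
          Pattern p = Pattern.compile(String.format(REPLACER, Pattern.quote(uec)));
          Matcher m = p.matcher(body);
          StringBuilder sb = new StringBuilder();
          int last = 0;
          while (m.find()) {
              sb.append(body, last, m.start()); // copy text before the escape
              if (m.group("cev") != null) {
                  sb.append(uec);               // doubled uec stands for itself
              } else if (m.group("u4d") != null) {
                  sb.appendCodePoint(Integer.parseInt(m.group("u4d"), 16));
              } else if (m.group("u6d") != null) {
                  sb.appendCodePoint(Integer.parseInt(m.group("u6d"), 16));
              } else {
                  // uec matched, but nothing valid followed it
                  throw new IllegalArgumentException(
                      "ill-formed escape at offset " + m.start());
              }
              last = m.end();
          }
          sb.append(body, last, body.length());
          return sb.toString();
      }

      public static void main(String[] args) {
          System.out.println(decode("d\\0061t\\+000061", "\\"));
      }
  }
  ```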
- PG_REGULAR_IDENTIFIER_START
  
  public static final Pattern PG_REGULAR_IDENTIFIER_START
  
  Allowed as the first character of a regular identifier by PostgreSQL (PG 7.4 -).
- PG_REGULAR_IDENTIFIER_PART
  
  public static final Pattern PG_REGULAR_IDENTIFIER_PART
  
  Allowed as any non-first character of a regular identifier by PostgreSQL (PG 7.4 -).
- PG_REGULAR_IDENTIFIER
  
  public static final Pattern PG_REGULAR_IDENTIFIER
  
  A complete regular identifier as allowed by PostgreSQL (PG 7.4 -).
- PG_REGULAR_IDENTIFIER_CAPTURING
  
  public static final Pattern PG_REGULAR_IDENTIFIER_CAPTURING
  
  A complete PostgreSQL regular identifier in a single capturing group.
- ISO_AND_PG_REGULAR_IDENTIFIER
  
  public static final Pattern ISO_AND_PG_REGULAR_IDENTIFIER
  
  A regular identifier that satisfies both ISO and PostgreSQL rules.
- ISO_AND_PG_REGULAR_IDENTIFIER_CAPTURING
  
  public static final Pattern ISO_AND_PG_REGULAR_IDENTIFIER_CAPTURING
  
  A regular identifier that satisfies both ISO and PostgreSQL rules, in a single capturing group named i.
- ISO_AND_PG_IDENTIFIER_CAPTURING
  
  public static final Pattern ISO_AND_PG_IDENTIFIER_CAPTURING
  
  Pattern that matches any identifier valid by both ISO and PG rules, with the presence of named capturing groups indicating which kind it is: i for a regular identifier, xd for a delimited identifier (still needing "" replaced with "), or xui (with or without an explicit uec) for a Unicode identifier (still needing "" replaced with " and decoding of Unicode escape values).
- ISO_PG_JAVA_IDENTIFIER
  
  public static final Pattern ISO_PG_JAVA_IDENTIFIER
  
  An identifier by ISO SQL, PostgreSQL, and Java (not SQL at all) rules. (Not called REGULAR because Java allows no other form of identifier.) This restrictive form is the safest for identifiers being generated into a deployment descriptor file that an old version of PL/Java might load, because through 1.4.3 PL/Java used the Java identifier rules to recognize identifiers in deployment descriptors.
- PG_OPERATOR
  
  public static final Pattern PG_OPERATOR
  
  An operator by PostgreSQL rules. The length limit (NAMELEN - 1) is not applied here. The match will not include a - followed by - or a / followed by *, and a multicharacter match will not end with + or - unless it also contains one of ~ ! @ # % ^ & | ` ?.
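  The rules above can be restated as a plain whole-string check, which may be easier to read than a regular expression. This sketch is not how PG_OPERATOR is implemented (the pattern matches a prefix of the input, while this method tests an entire candidate name); the class and method names are hypothetical, and the operator character set is the one listed in the PostgreSQL lexical-structure documentation.

  ```java
  class PgOperatorSketch {
      // Characters PostgreSQL allows in operator names.
      static final String OPERATOR_CHARS = "+-*/<>=~!@#%^&|`?";
      // Characters whose presence permits a trailing + or -.
      static final String UNLOCKING_CHARS = "~!@#%^&|`?";

      /** Whole-string check of the operator-name rules; length limit not applied. */
      static boolean isOperatorName(String s) {
          if (s.isEmpty()) {
              return false;
          }
          boolean unlocked = false;
          for (int i = 0; i < s.length(); i++) {
              char c = s.charAt(i);
              if (OPERATOR_CHARS.indexOf(c) < 0) {
                  return false;
              }
              if (UNLOCKING_CHARS.indexOf(c) >= 0) {
                  unlocked = true;
              }
          }
          // Comment introducers may not appear anywhere in the name.
          if (s.contains("--") || s.contains("/*")) {
              return false;
          }
          char last = s.charAt(s.length() - 1);
          boolean endsPlusMinus = last == '+' || last == '-';
          return s.length() == 1 || !endsPlusMinus || unlocked;
      }

      public static void main(String[] args) {
          for (String op : new String[] {"+", "<=", "@-", "*-", "<--"}) {
              System.out.println(op + " : " + isOperatorName(op));
          }
      }
  }
  ```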
- NEWLINE
  
  public static final Pattern NEWLINE
  
  A newline, in any of the various forms recognized by the Java regex engine, letting it handle the details.
- WHITESPACE_NO_NEWLINE
  
  public static final Pattern WHITESPACE_NO_NEWLINE
  
  White space except newline, for any Java-recognized newline.
- SIMPLE_COMMENT
  
  public static final Pattern SIMPLE_COMMENT
  
  The kind of comment that extends from -- to the end of the line. This pattern does not eat the newline (though the ISO production does).
- BRACKETED_COMMENT_INSIDE
  
  public static final Pattern BRACKETED_COMMENT_INSIDE
  
  Most of the inside of a bracketed comment, defined in an odd way. It expects both characters of the /* introducer to have been consumed already. This pattern will then eat the whole comment including both closing characters if it encounters no nested comment; otherwise it will consume everything including the / of the nested introducer, but leaving the *, and the <nest> capturing group will be present in the result. That signals the caller to increment the nesting level, consume one * and invoke this pattern again. If the nested match succeeds (without again setting the <nest> group), the caller should then decrement the nest level and match this pattern again to consume the rest of the comment at the original level.
  This pattern leaves the * unconsumed upon finding a nested comment introducer as a way to end the repetition in the SEPARATOR pattern, as nothing the SEPARATOR pattern can match can begin with a *.
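  The caller-side protocol described above (increment on <nest>, consume one *, match again, decrement on a plain match) can be sketched as a loop. The INSIDE pattern here is a hypothetical stand-in written to behave as the description says; it is not the actual BRACKETED_COMMENT_INSIDE definition, and skipComment is an illustrative helper name.

  ```java
  import java.util.InputMismatchException;
  import java.util.regex.Matcher;
  import java.util.regex.Pattern;

  class NestedCommentSketch {
      // Hypothetical stand-in: eat comment text, then either the closing */
      // or only the / of a nested /* introducer (setting the nest group).
      static final Pattern INSIDE = Pattern.compile(
          "(?:[^*/]++|\\*(?!/)|/(?!\\*))*+(?:\\*/|(?<nest>/(?=\\*)))");

      /**
       * Given the offset just past an opening slash-star, returns the offset
       * just past the matching close, honoring nested comments.
       */
      static int skipComment(CharSequence s, int start) {
          Matcher m = INSIDE.matcher(s);
          int pos = start;
          int level = 1;
          while (level > 0) {
              m.region(pos, s.length());
              if (!m.lookingAt()) {
                  throw new InputMismatchException("unclosed comment");
              }
              pos = m.end();
              if (m.group("nest") != null) {
                  level++;   // entered a nested comment
                  pos++;     // consume the * left behind by the pattern
              } else {
                  level--;   // a closing */ was consumed
              }
          }
          return pos;
      }

      public static void main(String[] args) {
          String sql = "/* a /* b */ c */SELECT 1";
          System.out.println(sql.substring(skipComment(sql, 2)));
      }
  }
  ```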
- SEPARATOR
  
  public static final Pattern SEPARATOR
  
  SQL's SEPARATOR, which can include any amount of whitespace, simple comments, or bracketed comments. This pattern will consume as much of all that as it can in one match. There are two capturing groups that might be set in a match result: <nl> if there was at least one newline matched among the whitespace (which needs to be known to get the continuation of string literals right), and <nest> if the start of a bracketed comment was encountered.
  In the <nest> case, the / of the comment introducer will have been consumed but the * will remain to consume (as described above for BRACKETED_COMMENT_INSIDE); the caller will need to increment a nest level, consume the *, and match BRACKETED_COMMENT_INSIDE to handle the nesting comment. Assuming that completes without another <nest> found, the level should be decremented and BRACKETED_COMMENT_INSIDE matched again to match the rest of the outer comment. When that completes (without a <nest>) at the outermost level, this pattern should be matched again to mop up any remaining SEPARATOR content.
Method Details
- separator
  
  public static boolean separator(Matcher m, boolean significant)
  
  Consume any SQL SEPARATOR at the beginning of Matcher m's current region.
  The region start is advanced to the character following any separator (or not at all, if no separator is found).
  The meaning of the return value is altered by the significant parameter: when significant is true (meaning the very presence or absence of a separator is significant at that point in the grammar), the result will be true if any separator was found, false otherwise. When significant is false, the result does not reveal whether any separator was found, but will be true only if a separator was found that includes at least one newline. That information is needed for the grammar of string and binary-string literals.
  
  Parameters:
  
  m - a Matcher whose current region should have any separator at the beginning consumed. The region start is advanced past any separator found. The Pattern associated with the Matcher may be changed.
  
  significant - when true, the result should report whether any separator was found or not; when false, the result should report only whether a separator containing at least one newline was found, or not.
  
  Returns:
  
  whether any separator was found, or whether any separator containing a newline was found, as selected by significant.
  
  Throws:
  
  InputMismatchException - if an unclosed /*-style comment is found.
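  To illustrate how the significant flag changes the meaning of the result, here is a simplified sketch of the same contract. It is an assumption-laden stand-in, not the PL/Java implementation: SIMPLE_SEPARATOR is a hypothetical pattern that handles only white space, newlines, and -- comments, so bracketed comments (and the InputMismatchException case) are left out.

  ```java
  import java.util.regex.Matcher;
  import java.util.regex.Pattern;

  class SeparatorContractSketch {
      // Hypothetical simplified separator: horizontal white space, newlines
      // (remembered in group nl), and -- comments. No bracketed comments.
      static final Pattern SIMPLE_SEPARATOR =
          Pattern.compile("(?:\\h++|(?<nl>\\R)|--[^\\r\\n]*+)*+");

      static boolean separator(Matcher m, boolean significant) {
          m.usePattern(SIMPLE_SEPARATOR); // the Matcher's Pattern is changed
          m.lookingAt();                  // cannot fail: also matches empty input
          boolean found = m.end() > m.start();
          boolean sawNewline = m.group("nl") != null;
          m.region(m.end(), m.regionEnd()); // advance the region start
          return significant ? found : sawNewline;
      }

      public static void main(String[] args) {
          String sql = "'a'  -- note\n 'b'";
          Matcher m = Pattern.compile("x").matcher(sql);
          m.region(3, sql.length()); // just past the first literal
          System.out.println(separator(m, false));
          System.out.println(sql.substring(m.regionStart()));
      }
  }
  ```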
- identifierFrom
  
  public static Lexicals.Identifier.Simple identifierFrom(Matcher m)
  
  Return an Identifier.Simple, given a Matcher that has matched an ISO_AND_PG_IDENTIFIER_CAPTURING. Will determine from the matching named groups which type of identifier it was, process the matched sequence appropriately, and return it.
  
  Parameters:
  
  m - A Matcher known to have matched an identifier.
  
  Returns:
  
  Identifier.Simple made from the recovered string.
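  Determining the identifier kind from the named groups might look roughly like this. The ANY_IDENTIFIER pattern is a hypothetical, ASCII-only stand-in for ISO_AND_PG_IDENTIFIER_CAPTURING that reuses its group names (i, xd, xui); the classify method only reports the kind and does none of the processing that identifierFrom performs.

  ```java
  import java.util.regex.Matcher;
  import java.util.regex.Pattern;

  class IdentifierKindSketch {
      // Hypothetical, ASCII-only stand-in with the same group names.
      static final Pattern ANY_IDENTIFIER = Pattern.compile(
          "(?<i>[A-Za-z][A-Za-z0-9_]*)"
          + "|\"(?<xd>(?:[^\"]|\"\")+)\""
          + "|[Uu]&\"(?<xui>(?:[^\"]|\"\")+)\"");

      /** Reports which named group matched, or "none" if s is no identifier. */
      static String classify(String s) {
          Matcher m = ANY_IDENTIFIER.matcher(s);
          if (!m.matches()) {
              return "none";
          }
          if (m.group("i") != null) {
              return "regular";
          }
          if (m.group("xd") != null) {
              return "delimited";   // body still needs "" collapsed to "
          }
          return "unicode";         // body also needs escape values decoded
      }

      public static void main(String[] args) {
          System.out.println(classify("foo"));
          System.out.println(classify("\"Foo Bar\""));
          System.out.println(classify("U&\"d\\0061ta\""));
      }
  }
  ```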
