Exploring Apache TinkerPop 3.4.8’s new features in Amazon Neptune

Stephen Mallette November 18, 2020

Amazon Neptune engine version 1.0.4.0 supports Apache TinkerPop 3.4.8, which introduces some new features and bug fixes. This post outlines these features, like the new elementMap() step and the improved behavior for working with map instances, and provides some examples to demonstrate their capabilities with Neptune. Upgrading your drivers to 3.4.8 should be straightforward and typically require no changes to your Gremlin code.

This post demonstrates commands using the Gremlin Console and relies on the sample air routes dataset used in earlier posts. For instructions on loading it through the Neptune workbench, see Visualize query results using the Amazon Neptune workbench.

elementMap()

Users have long utilized valueMap() to transform graph elements (vertices, edges, and vertex properties) to a map representation. For example, see the following code:

gremlin> g.V().has('airport','code','BOS').valueMap('code','city','runways')
==>[code:[BOS],city:[Boston],runways:[6]]
gremlin> g.V().has('airport','code','BOS').
......1>   valueMap('code','city','runways').with(WithOptions.tokens)
==>[id:5,label:airport,code:[BOS],city:[Boston],runways:[6]]
gremlin> g.V().has('airport','code','BOS').outE().limit(1).
......1>   valueMap('code','city','runways').with(WithOptions.tokens)
==>[id:190,label:route]
gremlin> g.V().has('airport','code','BOS').outE().limit(1).valueMap()
==>[dist:612]
gremlin> g.V().has('airport','code','BOS').outE().limit(1).
......1>   valueMap('dist').with(WithOptions.tokens)
==>[id:190,label:route,dist:612]

This approach has the desired effect, but it typically runs into two issues. The first is that valueMap() assumes multi-properties even when a property’s cardinality is single, so each map entry value is wrapped in a List. The List makes the results unwieldy and forces you to unpack it to get your single value. Although you can do this unpacking directly in Gremlin with valueMap().by(unfold()), it’s inconvenient because that extra by() modulator ends up being the rule rather than the exception.
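If you can’t switch to elementMap() right away, the unpacking described in the first issue can also be done in client code. The following is a minimal sketch in plain Python, assuming the valueMap() result has already been deserialized into a dict whose property values are lists, as in the console output above. The helper name unwrap_value_map is hypothetical and not part of any Gremlin driver, and the string keys "id" and "label" only mirror the console output; a real driver may use token objects for them.

```python
from typing import Any, Dict


def unwrap_value_map(value_map: Dict[Any, Any]) -> Dict[Any, Any]:
    """Replace each single-element list with its only element.

    Lists holding more than one value (true multi-properties) and non-list
    values, such as the id and label added by with(WithOptions.tokens),
    are left untouched so no data is silently dropped.
    """
    result: Dict[Any, Any] = {}
    for key, value in value_map.items():
        if isinstance(value, list) and len(value) == 1:
            result[key] = value[0]
        else:
            result[key] = value
    return result


# Modeled on the valueMap(...).with(WithOptions.tokens) output for BOS above.
bos = {"id": 5, "label": "airport", "code": ["BOS"], "city": ["Boston"], "runways": [6]}
print(unwrap_value_map(bos))
# {'id': 5, 'label': 'airport', 'code': 'BOS', 'city': 'Boston', 'runways': 6}
```

Leaving multi-valued lists intact is a deliberate choice in this sketch: it only removes wrapping that carries no information.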

The second issue concerns edges: valueMap() produces output for an edge similar to that of a vertex or vertex property, so it doesn’t include any reference to the incident vertices the edge connects. Without that data, you have to write some form of custom project() for your edge, like the following code:

gremlin> g.V().has('airport','code','BOS').outE().limit(1).
......1>   project('properties','in','out').
......2>     by(valueMap('dist').with(WithOptions.tokens)).
......3>     by(inV().id()).
......4>     by(outV().id())
==>[properties:[id:190,label:route,dist:612],in:21,out:5]

To address these two shortcomings, TinkerPop introduced elementMap(), which assumes single cardinality for properties and returns the in and out reference vertices of an edge:

gremlin> g.V().has('airport','code','BOS').outE().limit(1).elementMap('dist')
==>[id:190,label:route,IN:[id:21,label:airport],OUT:[id:5,label:airport],dist:612]
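Because the incident vertices now arrive in the same map as the edge properties, consuming code no longer needs to stitch together the separate in and out values produced by the project() workaround. The following plain-Python sketch shows one way a client might split an elementMap() edge result into an (out, in, properties) triple. It assumes the string keys "id", "label", "IN", and "OUT" exactly as printed by the console; a real driver may represent these keys as token or enum objects, and to_route is a hypothetical helper name.

```python
from typing import Any, Dict, Tuple

# Keys that elementMap() adds for an edge, as printed by the console (assumption).
RESERVED_KEYS = {"id", "label", "IN", "OUT"}


def to_route(edge_map: Dict[str, Any]) -> Tuple[Any, Any, Dict[str, Any]]:
    """Split an elementMap()-style edge map into (out_id, in_id, properties)."""
    properties = {k: v for k, v in edge_map.items() if k not in RESERVED_KEYS}
    return edge_map["OUT"]["id"], edge_map["IN"]["id"], properties


# Modeled on the elementMap('dist') output for the BOS route above.
route = {
    "id": 190,
    "label": "route",
    "IN": {"id": 21, "label": "airport"},
    "OUT": {"id": 5, "label": "airport"},
    "dist": 612,
}
print(to_route(route))  # (5, 21, {'dist': 612})
```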

When upgrading, consider replacing calls to valueMap() with elementMap() wherever your properties have single cardinality. The replacement is an especially nice improvement where you use the valueMap().by(unfold()) pattern, because the traversal reads more clearly without the by() modulator and there is no secondary transformation of the map to its final result.

While refactoring, it’s also a good opportunity to look for valueMap() calls that don’t specify property keys and, when replacing them with elementMap(), to list the keys explicitly, as shown in the examples. Specifying keys explicitly is good practice for the same reasons that you name the columns in a SQL statement rather than using a wildcard such as SELECT * FROM table.

Working with map instances

Graph elements (vertices, edges, and vertex properties) behave similarly to map objects in the sense that their contents are accessed by key. In Gremlin, the access patterns for elements and maps have drawn closer together with this release, because the by(String) modulator now works equally well on both. Prior to this change, by(String) worked on an element but failed on a map. For example, see the following code:

gremlin> g.V().has('airport','code','BOS').project('c').by('code')
==>{c=BOS}
gremlin> g.V().has('airport','code','BOS').valueMap().project('c').by('code')
{"detailedMessage":"PropertyMap cannot have properties","requestId":"36a9279d-b515-4b43-84b6-63621251f94d","code":"UnsupportedOperationException"}
Type ':help' or ':h' for help.
Display stack trace? [yN]n

The error isn’t terribly informative, but rest assured that it stems from using by(String) where it wasn’t supported. Now that by(String) behaves consistently, you can use it in a variety of contexts involving map objects. See the following code:

gremlin> g.V().has('airport','code','BOS').elementMap().project('c').by('code')
==>[c:BOS]
gremlin> g.V().hasLabel('airport').limit(10).
......1>   elementMap('code','country').
......2>   order().by('code',desc)
==>[id:25,label:airport,country:US,code:TPA]
==>[id:28,label:airport,country:US,code:SNA]
==>[id:24,label:airport,country:US,code:SJC]
==>[id:23,label:airport,country:US,code:SFO]
==>[id:22,label:airport,country:US,code:SEA]
==>[id:26,label:airport,country:US,code:SAN]
==>[id:44,label:airport,country:US,code:SAF]
==>[id:45,label:airport,country:US,code:PHL]
==>[id:27,label:airport,country:US,code:LGB]
==>[id:46,label:airport,country:US,code:DTW]
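Here order().by('code',desc) simply reads the code key of each map and sorts on it. The following plain-Python sketch reproduces that ordering on the ten maps from the console output above. The starting order (ascending id) is an arbitrary choice for illustration, since the traversal’s pre-sort order isn’t shown.

```python
from typing import Any, Dict, List

# The ten maps from the console output above, listed in ascending id order
# (an arbitrary starting order chosen for illustration).
airports: List[Dict[str, Any]] = [
    {"id": 22, "label": "airport", "country": "US", "code": "SEA"},
    {"id": 23, "label": "airport", "country": "US", "code": "SFO"},
    {"id": 24, "label": "airport", "country": "US", "code": "SJC"},
    {"id": 25, "label": "airport", "country": "US", "code": "TPA"},
    {"id": 26, "label": "airport", "country": "US", "code": "SAN"},
    {"id": 27, "label": "airport", "country": "US", "code": "LGB"},
    {"id": 28, "label": "airport", "country": "US", "code": "SNA"},
    {"id": 44, "label": "airport", "country": "US", "code": "SAF"},
    {"id": 45, "label": "airport", "country": "US", "code": "PHL"},
    {"id": 46, "label": "airport", "country": "US", "code": "DTW"},
]

# Equivalent of order().by('code', desc): sort maps by their 'code' value, descending.
ordered = sorted(airports, key=lambda m: m["code"], reverse=True)
print([m["code"] for m in ordered])
# ['TPA', 'SNA', 'SJC', 'SFO', 'SEA', 'SAN', 'SAF', 'PHL', 'LGB', 'DTW']
```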

When evaluating whether this change helps your existing code, look for places where select(String) is used inside a by() modulator, which was the usual way to grab a value from a map in this context. For example, see the following query:

gremlin> g.V().has('airport','code','BOS').elementMap().project('c').by(select('code'))
==>[c:BOS]

You can replace it with:

gremlin> g.V().has('airport','code','BOS').elementMap().project('c').by('code')
==>[c:BOS]

Edge property equality

TinkerPop established more predictable behavior for edge property equality. Edge properties, which unlike vertex properties don’t have a unique identifier, no longer take their parent element (the edge itself) into account when compared for equality. In other words, if the key and the value are the same, the properties are considered equal regardless of whether they belong to the same edge. See the following code:

gremlin> g.E().has('dist',2300).properties()
==>p[dist->2300]
==>p[dist->2300]
==>p[dist->2300]
==>p[dist->2300]
gremlin> g.E().has('dist',2300).properties().dedup().count()
==>1

Prior to this upgrade, the count would have been 4 because each property came from a different edge. As the preceding example demonstrates, this alteration might represent a breaking change, because the behavior of the traversal has been modified. If you relied on the old behavior, your results might differ after upgrading. TinkerPop considered the old behavior bad enough to classify it as a bug and chose to introduce the fix even though it changed behavior.

If you need the old behavior, you must pair each property with its edge so that dedup() can tell the properties apart. The following code presents one way of doing that:

gremlin> g.E().has('dist',2300).as('e').
......1>   properties().
......2>   map(union(select('e'),
......3>             identity()).fold()).
......4>   dedup().count()
==>4
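To make the two counts concrete, the following plain-Python sketch models the new equality rule: a property compares equal on key and value only, while the owning edge is carried along but ignored. It is a model of the semantics, not TinkerPop code. The EdgeProperty class and dedup_count helper are hypothetical, and the edge ids 1001 through 1004 are invented, because the console output above doesn’t show them.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class EdgeProperty:
    """Model of an edge property under the TinkerPop 3.4.8 equality rule.

    Only key and value take part in __eq__ and __hash__; compare=False
    excludes edge_id from both, mirroring the new behavior.
    """
    key: str
    value: object
    edge_id: int = field(compare=False)


def dedup_count(items) -> int:
    # Like dedup().count(): keep the first occurrence of each distinct item, then count.
    return len(dict.fromkeys(items))


# Four distinct (hypothetical) route edges, each carrying dist=2300.
props = [EdgeProperty("dist", 2300, edge_id) for edge_id in (1001, 1002, 1003, 1004)]

print(dedup_count(props))  # 1: properties are equal by key and value alone
print(dedup_count((p.edge_id, p) for p in props))  # 4: pairing with the edge restores the old count
```

Pairing each property with its edge in a tuple plays the same role as the union(...).fold() list in the Gremlin workaround: the pair is only equal to another pair if both the edge and the property match.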

Setting timeouts

There are times when you might want a particular request to have a timeout different from the server’s default setting. This upgrade affects how those per-request timeouts are set in two specific contexts:

A Gremlin bytecode-based request using with() syntax.
A request using the Java driver, where the RequestMessage is constructed manually and the timeout is provided to that message using the Builder object’s add() or addArg() methods. This form is considerably less common and is usually reserved for advanced cases or inherited from older code that was never updated to take advantage of newer APIs (such as RequestOptions, introduced in TinkerPop 3.4.2).

TinkerPop has long used the scriptEvaluationTimeout option to control how long, in milliseconds, a request is allowed to run before timing out. That is the String form of the configuration option; in the Java driver it’s also available as the constant Tokens.ARGS_SCRIPT_EVAL_TIMEOUT. This option is now deprecated, though still supported.

The preferred name is simply evaluationTimeout in String form or, if you use the constant in the Java driver, Tokens.ARGS_EVAL_TIMEOUT. The name was changed to better reflect the general nature of the timeout, which applies to both scripts and bytecode. Consider converting your code to the preferred naming when you upgrade, because support for the deprecated forms may be removed in a future version.

Typically, look for Java code lines like the following:

// "g" is a GraphTraversalSource constructed by traversal().withRemote(…)
List<Vertex> vertices = g.with("scriptEvaluationTimeout", 500L).V().out("knows").toList();

// or, equivalently, using the constant from the Java driver's Tokens class
List<Vertex> vertices = g.with(Tokens.ARGS_SCRIPT_EVAL_TIMEOUT, 500L).V().out("knows").toList();

Then change them to code like the following:

// "g" is a GraphTraversalSource constructed by traversal().withRemote(…)
List<Vertex> vertices = g.with("evaluationTimeout", 500L).V().out("knows").toList();

// or, equivalently, using the constant from the Java driver's Tokens class
List<Vertex> vertices = g.with(Tokens.ARGS_EVAL_TIMEOUT, 500L).V().out("knows").toList();
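Beyond call sites like the Java ones above, some applications assemble per-request options from configuration, for example a dict loaded from a settings file. In that case a small normalization step can keep older configuration files working while your code only emits the preferred name. The following is a minimal Python sketch under that assumption; DEPRECATED_OPTION_NAMES and normalize_request_options are hypothetical names, not part of any Gremlin driver, and the timeout value is in milliseconds as described above.

```python
from typing import Any, Dict

# Deprecated String option name -> preferred name, as described above.
DEPRECATED_OPTION_NAMES = {"scriptEvaluationTimeout": "evaluationTimeout"}


def normalize_request_options(options: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of options with deprecated timeout keys renamed.

    Raises ValueError if the deprecated and preferred keys are both present
    with different values, since it is unclear which one was intended.
    """
    normalized: Dict[str, Any] = {}
    for key, value in options.items():
        new_key = DEPRECATED_OPTION_NAMES.get(key, key)
        if new_key in normalized and normalized[new_key] != value:
            raise ValueError(f"conflicting values for {new_key!r}")
        normalized[new_key] = value
    return normalized


print(normalize_request_options({"scriptEvaluationTimeout": 500}))
# {'evaluationTimeout': 500}
```

Failing loudly on conflicting values, rather than silently picking one, is a design choice of this sketch.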

Improved error handling for JavaScript

The Gremlin JavaScript driver now produces a ResponseError instead of packing the server’s statusMessage and statusCode into the message string of a generic Error. The ResponseError also includes the statusAttributes, which carry additional information about server-side exceptions. Although the contents of Error.message haven’t changed at this time, it’s worth identifying any code that parses that string and replacing it with the new fields.

Session support in GLVs

Some use cases simply require sessions, and until now Java was the only language driver that supported that mode of operation. The latest versions of the Python, JavaScript, and .NET drivers, however, all support sessions. As with Java, session support in these languages is meant for script submission only (not bytecode). The following examples demonstrate how to establish a session:

// javascript
const sessionId = utils.getUuid().toString();
const client = new Client('wss://<neptune-host>:8182/gremlin', { traversalSource: 'g', session: sessionId });

# python
client = Client('wss://<neptune-host>:8182/gremlin', 'g', session=str(uuid.uuid4()))

// C#
var gremlinServer = new GremlinServer("<neptune-host>", 8182, enableSsl: true);
var client = new GremlinClient(gremlinServer, sessionId: Guid.NewGuid().ToString());

Conclusion

This post was designed to call attention to some of the key changes from TinkerPop that are now officially compatible with Neptune. There were many other changes that offered bug fixes and minor enhancements. If you’re interested in learning more about any of those improvements, see the official TinkerPop GitHub repo and view the CHANGELOG.

About the Author

Stephen Mallette is a member of the Amazon Neptune team at AWS. He has developed graph database and graph processing technology for many years. He is a decade-long contributor to the Apache TinkerPop project, the home of the Gremlin graph query language, and is currently serving as its PMC Chair.