public abstract class AbstractRDFParser<T extends AbstractRDFParser<T>> extends Object implements RDFParser, Cloneable
This abstract class keeps its settings in protected fields, such as sourceFile, wrapped in Optional. Some basic checking, such as checkIsAbsolute(IRI), is performed.
This class and its subclasses are Cloneable, immutable and (therefore) thread-safe: each call to an option method such as contentType(String) or source(IRI) returns a cloned, mutated copy.
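The clone-on-write behaviour described above can be sketched with plain JDK types. This is a minimal illustration, not the Commons RDF source: the class name SnapshotBuilder and its two String options are hypothetical, and the real class additionally uses its self type T (via asT()) so that subclasses get their own type back from each option method.

```java
import java.util.Optional;

// Hypothetical, simplified stand-in for AbstractRDFParser's option handling.
// Only two options are modelled; the real class has many more.
class SnapshotBuilder implements Cloneable {
    private Optional<String> contentType = Optional.empty();
    private Optional<String> base = Optional.empty();

    @Override
    public SnapshotBuilder clone() {
        try {
            // A shallow copy is enough here: Optional and String are immutable.
            return (SnapshotBuilder) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError("Cloneable is implemented", e);
        }
    }

    // Each option method mutates a clone and returns it; "this" is never changed.
    SnapshotBuilder contentType(String contentType) {
        SnapshotBuilder copy = clone();
        copy.contentType = Optional.of(contentType);
        return copy;
    }

    SnapshotBuilder base(String base) {
        SnapshotBuilder copy = clone();
        copy.base = Optional.of(base);
        return copy;
    }

    Optional<String> getContentType() {
        return contentType;
    }

    Optional<String> getBase() {
        return base;
    }
}
```

Because every option method returns a new object, a partially configured builder can be shared between threads and specialised independently without any locking.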
By default, parsing is done by the abstract method parseSynchronusly(), which is executed on a cloned snapshot; hence multiple parse() calls are thread-safe. The default parse() uses a thread pool in threadGroup, but implementations can override parse() (e.g. because they have their own threading model or use asynchronous remote execution).
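A sketch of the default threading model described above, assuming a JDK cached thread pool whose threads are created in one shared ThreadGroup. AsyncParseSketch, SynchronousParse and failureCause are hypothetical names; SynchronousParse stands in for parseSynchronusly() running on an already-cloned snapshot. The sketch also shows how an exception thrown during parsing surfaces as the cause of the ExecutionException thrown by Future.get().

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadFactory;

// Hypothetical sketch of a default parse(); not the Commons RDF source.
class AsyncParseSketch {
    // Plays the role of the public threadGroup field.
    static final ThreadGroup PARSER_GROUP = new ThreadGroup("rdf-parsers");

    // Daemon threads in PARSER_GROUP, so an idle pool does not keep the JVM alive.
    private static final ThreadFactory FACTORY = runnable -> {
        Thread thread = new Thread(PARSER_GROUP, runnable);
        thread.setDaemon(true);
        return thread;
    };
    private static final ExecutorService POOL = Executors.newCachedThreadPool(FACTORY);

    // Stand-in for parseSynchronusly(), which may throw a checked IOException.
    interface SynchronousParse {
        String run() throws IOException;
    }

    static Future<String> parse(SynchronousParse snapshot) {
        // A Callable may throw checked exceptions; the Future records them.
        Callable<String> task = snapshot::run;
        return POOL.submit(task);
    }

    // Waits for the Future; returns the cause of a failed parse, or null on success.
    static Throwable failureCause(Future<?> future) {
        try {
            future.get();
            return null;
        } catch (ExecutionException e) {
            return e.getCause();               // e.g. the IOException thrown by run()
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting", e);
        }
    }
}
```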
Nested classes/interfaces inherited from interface RDFParser: RDFParser.ParseResult
Modifier and Type | Field and Description |
---|---|
static ThreadGroup | threadGroup |
Constructor and Description |
---|
AbstractRDFParser() |
Modifier and Type | Method and Description |
---|---|
protected T | asT() |
T | base(IRI base) Specify a base IRI to use for parsing any relative IRI references. |
T | base(String base) Specify a base IRI to use for parsing any relative IRI references. |
protected void | checkBaseRequired() Check if base is required. |
protected void | checkContentType() Subclasses can override this method to check compatibility with the contentType setting. |
protected void | checkIsAbsolute(IRI iri) Check if an IRI is absolute. |
protected void | checkSource() Check that one and only one source is present and valid. |
protected void | checkTarget() Subclasses can override this method to check the target is valid. |
T | clone() |
T | contentType(RDFSyntax rdfSyntax) Specify the content type of the RDF syntax to parse. |
T | contentType(String contentType) Specify the content type of the RDF syntax to parse. |
protected RDF | createRDFTermFactory() Create a new RDF for a parse session. |
Optional<IRI> | getBase() Get the set base IRI, if present. |
Optional<String> | getContentType() Get the set content-type String, if any. |
Optional<RDFSyntax> | getContentTypeSyntax() Get the set content-type RDFSyntax, if any. |
Optional<RDF> | getRdfTermFactory() Get the set RDF, if any. |
Optional<Path> | getSourceFile() Get the set source Path. |
Optional<InputStream> | getSourceInputStream() Get the set source InputStream. |
Optional<IRI> | getSourceIri() Get the set source IRI. |
Consumer<Quad> | getTarget() Get the target to consume parsed Quads. |
Optional<Dataset> | getTargetDataset() Get the target dataset as set by target(Dataset). |
Optional<Graph> | getTargetGraph() Get the target graph as set by target(Graph). |
protected static Optional<RDFSyntax> | guessRDFSyntax(Path path) Guess RDFSyntax from a local file's extension. |
Future<RDFParser.ParseResult> | parse() Parse the specified source. |
protected abstract void | parseSynchronusly() |
protected T | prepareForParsing() Prepare a clone of this RDFParser which has been checked and completed. |
T | rdfTermFactory(RDF rdfTermFactory) Specify the RDF to use for generating RDFTerms. |
protected void | resetSource() Reset all source* fields to Optional.empty(). |
protected void | resetTarget() Reset all optional target* fields to Optional.empty(). |
T | source(InputStream inputStream) Specify a source InputStream to parse. |
T | source(IRI iri) Specify an absolute source IRI to retrieve and parse. |
T | source(Path file) Specify a source file Path to parse. |
T | source(String iri) Specify an absolute source IRI to retrieve and parse. |
T | target(Consumer<Quad> consumer) Specify a consumer for parsed quads. |
T | target(Dataset dataset) Specify a Dataset to add parsed quads to. |
T | target(Graph graph) Specify a Graph to add parsed triples to. |
public static final ThreadGroup threadGroup
public AbstractRDFParser()
public Optional<RDF> getRdfTermFactory()
Get the set RDF, if any.
Returns: The RDF to use, or Optional.empty() if it has not been set
public Optional<RDFSyntax> getContentTypeSyntax()
Get the set content-type RDFSyntax, if any.
If this is Optional.isPresent(), then getContentType() contains the value of RDFSyntax.mediaType().
Returns: The RDFSyntax of the content type, or Optional.empty() if it has not been set
public final Optional<String> getContentType()
Get the set content-type String, if any.
If this is Optional.isPresent() and is recognized by RDFSyntax.byMediaType(String), then the corresponding RDFSyntax is set on getContentTypeSyntax(); otherwise that is Optional.empty().
Returns: The content type, e.g. text/turtle, or Optional.empty() if it has not been set
public Consumer<Quad> getTarget()
Get the target to consume parsed Quads.
From the call to parseSynchronusly() onwards, this will be a non-null value (as a target is a required setting).
Returns: The consumer of Quads, or null if it has not yet been set.
public Optional<Dataset> getTargetDataset()
Get the target dataset as set by target(Dataset).
The return value is Optional.isPresent() if and only if target(Dataset) has been set, meaning that the implementation may choose to append parsed quads to the Dataset directly instead of relying on the generated getTarget() consumer.
If this value is present, then getTargetGraph() MUST be Optional.empty().
Returns: The target Dataset, or Optional.empty() if another kind of target has been set.
public Optional<Graph> getTargetGraph()
Get the target graph as set by target(Graph).
The return value is Optional.isPresent() if and only if target(Graph) has been set, meaning that the implementation may choose to append parsed triples to the Graph directly instead of relying on the generated getTarget() consumer.
If this value is present, then getTargetDataset() MUST be Optional.empty().
Returns: The target Graph, or Optional.empty() if another kind of target has been set.
public Optional<IRI> getBase()
Get the set base IRI, if present.
Returns: The base IRI, or Optional.empty() if it has not been set
public Optional<InputStream> getSourceInputStream()
Get the set source InputStream.
If this is Optional.isPresent(), then getSourceFile() and getSourceIri() are Optional.empty().
Returns: The source InputStream, or Optional.empty() if it has not been set
public Optional<Path> getSourceFile()
Get the set source Path.
If this is Optional.isPresent(), then getSourceInputStream() and getSourceIri() are Optional.empty().
Returns: The source Path, or Optional.empty() if it has not been set
public Optional<IRI> getSourceIri()
Get the set source IRI.
If this is Optional.isPresent(), then getSourceInputStream() and getSourceFile() are Optional.empty().
Returns: The source IRI, or Optional.empty() if it has not been set
public T rdfTermFactory(RDF rdfTermFactory)
Description copied from interface: RDFParser
Specify the RDF to use for generating RDFTerms.
This option may be used together with RDFParser.target(Graph) to override the implementation's default factory and graph.
Warning: Using the same RDF for multiple RDFParser.parse() calls may accidentally merge BlankNodes having the same label, as the parser may call RDF.createBlankNode(String) with the parsed blank node labels.
Specified by: rdfTermFactory in interface RDFParser
Parameters: rdfTermFactory - RDF to use for generating RDFTerms.
Returns: An RDFParser that will use the specified rdfTermFactory
See Also: RDFParser.target(Graph)
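To illustrate the blank node warning, here is a hypothetical factory that, like a factory implementing RDF.createBlankNode(String), returns the same node for the same label. Blank nodes are modelled as Strings, an assumption made to keep the sketch self-contained. Reusing one instance across two parses makes the label b0 in both documents denote the same node, while a fresh factory per parse (which createRDFTermFactory() is expected to return) keeps them apart.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical label-based blank node factory; not a Commons RDF class.
class LabelledBlankNodeFactory {
    // One entry per label seen by this factory instance.
    private final Map<String, String> nodesByLabel = new HashMap<>();

    // Same label -> same node, for the whole lifetime of this factory.
    String createBlankNode(String label) {
        return nodesByLabel.computeIfAbsent(label, l -> "_:" + UUID.randomUUID());
    }
}
```

The identity of a labelled blank node is therefore scoped to the factory instance, which is why a factory shared between parse sessions can silently join unrelated documents.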
public T contentType(RDFSyntax rdfSyntax) throws IllegalArgumentException
Description copied from interface: RDFParser
Specify the content type of the RDF syntax to parse.
This option can be used to select the RDFSyntax of the source, overriding any Content-Type headers or equivalent.
The character set of the RDFSyntax is assumed to be StandardCharsets.UTF_8 unless overridden within the document (e.g. <?xml version="1.0" encoding="iso-8859-1"?> in RDFSyntax.RDFXML).
This method will override any contentType set with RDFParser.contentType(String).
Specified by: contentType in interface RDFParser
Parameters: rdfSyntax - An RDFSyntax to parse the source according to, e.g. RDFSyntax.TURTLE.
Returns: An RDFParser that will use the specified content type.
Throws: IllegalArgumentException - If this RDFParser does not support the specified RDFSyntax.
See Also: RDFParser.contentType(String)
public T contentType(String contentType) throws IllegalArgumentException
Description copied from interface: RDFParser
Specify the content type of the RDF syntax to parse.
This option can be used to select the RDFSyntax of the source, overriding any Content-Type headers or equivalent.
The content type MAY include a charset parameter if the RDF media types permit it; the default charset is StandardCharsets.UTF_8 unless overridden within the document.
This method will override any contentType set with RDFParser.contentType(RDFSyntax).
Specified by: contentType in interface RDFParser
Parameters: contentType - A content-type string, e.g. application/ld+json or text/turtle;charset="UTF-8" as specified by RFC 7231.
Returns: An RDFParser that will use the specified content type.
Throws: IllegalArgumentException - If the contentType has an invalid syntax, or this RDFParser does not support the specified contentType.
See Also: RDFParser.contentType(RDFSyntax)
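The relationship between contentType(String), getContentType() and getContentTypeSyntax() can be approximated as follows: the media type (the part before any ';') selects a syntax, and an optional charset parameter overrides the UTF-8 default. ContentTypeSketch and its four-entry table are assumptions for illustration; the real lookup is RDFSyntax.byMediaType(String), and this simplified splitter does not handle ';' inside quoted parameter values.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper; syntax names are plain Strings, not RDFSyntax constants.
class ContentTypeSketch {
    private static final Map<String, String> SYNTAX_BY_MEDIA_TYPE = new HashMap<>();
    static {
        SYNTAX_BY_MEDIA_TYPE.put("text/turtle", "TURTLE");
        SYNTAX_BY_MEDIA_TYPE.put("application/n-triples", "NTRIPLES");
        SYNTAX_BY_MEDIA_TYPE.put("application/ld+json", "JSONLD");
        SYNTAX_BY_MEDIA_TYPE.put("application/rdf+xml", "RDFXML");
    }

    // Base media type: the text before the first ';', trimmed and lower-cased.
    static String mediaType(String contentType) {
        int semi = contentType.indexOf(';');
        String base = semi < 0 ? contentType : contentType.substring(0, semi);
        return base.trim().toLowerCase(Locale.ROOT);
    }

    // Empty when the media type is not recognized, like getContentTypeSyntax().
    static Optional<String> syntax(String contentType) {
        return Optional.ofNullable(SYNTAX_BY_MEDIA_TYPE.get(mediaType(contentType)));
    }

    // The charset parameter if present (surrounding quotes removed), else UTF-8.
    static Charset charset(String contentType) {
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq > 0 && param.substring(0, eq).trim().equalsIgnoreCase("charset")) {
                String value = param.substring(eq + 1).trim().replace("\"", "");
                return Charset.forName(value);
            }
        }
        return StandardCharsets.UTF_8;
    }
}
```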
public T base(IRI base)
Description copied from interface: RDFParser
Specify a base IRI to use for parsing any relative IRI references.
Setting this option will override any protocol-specific base IRI (e.g. the Content-Location header) or the RDFParser.source(IRI) IRI, but does not override any base IRIs set within the source document (e.g. @base in Turtle documents).
If the source is in a syntax that does not support relative IRI references (e.g. RDFSyntax.NTRIPLES), setting the base has no effect.
This method will override any base IRI set with RDFParser.base(String).
Specified by: base in interface RDFParser
Parameters: base - An absolute IRI to use as a base.
Returns: An RDFParser that will use the specified base IRI.
See Also: RDFParser.base(String)
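The base IRI rules spread across base(...), the source(...) methods, checkBaseRequired() and prepareForParsing() can be summarised as a precedence order. This is a hedged sketch that uses java.net.URI in place of IRI; effectiveBase and its boolean flag are hypothetical, and the IOException from Path.toRealPath() is wrapped only to keep the sketch compact (the real prepareForParsing() declares IOException).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical summary of the documented base IRI rules.
class BaseResolutionSketch {
    static Optional<URI> effectiveBase(Optional<URI> explicitBase,
                                       Optional<URI> sourceIri,
                                       Optional<Path> sourceFile,
                                       boolean syntaxAllowsRelativeIris) {
        if (explicitBase.isPresent()) {
            return explicitBase;                       // base(...) overrides any default
        }
        if (sourceIri.isPresent()) {
            return sourceIri;                          // source(IRI): the IRI itself
        }
        if (sourceFile.isPresent()) {
            try {
                // source(Path): file: IRI of the real path, symbolic links resolved
                return Optional.of(sourceFile.get().toRealPath().toUri());
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        if (syntaxAllowsRelativeIris) {
            // e.g. an InputStream source in Turtle with no base: checkBaseRequired() fails
            throw new IllegalStateException("base is required but not set");
        }
        return Optional.empty();                       // e.g. N-Triples needs no base
    }
}
```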
public T base(String base) throws IllegalArgumentException
Description copied from interface: RDFParser
Specify a base IRI to use for parsing any relative IRI references.
Setting this option will override any protocol-specific base IRI (e.g. the Content-Location header) or the RDFParser.source(IRI) IRI, but does not override any base IRIs set within the source document (e.g. @base in Turtle documents).
If the source is in a syntax that does not support relative IRI references (e.g. RDFSyntax.NTRIPLES), setting the base has no effect.
This method will override any base IRI set with RDFParser.base(IRI).
Specified by: base in interface RDFParser
Parameters: base - An absolute IRI to use as a base.
Returns: An RDFParser that will use the specified base IRI.
Throws: IllegalArgumentException - If the base is not a valid absolute IRI string
See Also: RDFParser.base(IRI)
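checkIsAbsolute(IRI) guards base(String) and source(String). A minimal sketch of the same check with java.net.URI, followed by resolving a relative reference against the accepted base. AbsoluteIriCheck and requireAbsolute are hypothetical names, and java.net.URI follows URI rather than IRI parsing rules, so some edge cases differ from a real IRI implementation.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical stand-in for checkIsAbsolute(IRI), using java.net.URI.
class AbsoluteIriCheck {
    static URI requireAbsolute(String iri) {
        final URI uri;
        try {
            uri = new URI(iri);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Invalid IRI: " + iri, e);
        }
        // Absolute means a scheme is present, e.g. "http:" or "file:".
        if (!uri.isAbsolute()) {
            throw new IllegalArgumentException("IRI is not absolute: " + iri);
        }
        return uri;
    }
}
```

Once accepted, the base is what relative references in the document are resolved against; for example, resolving item1 against http://example.com/data/ gives http://example.com/data/item1.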
public T source(InputStream inputStream)
Description copied from interface: RDFParser
Specify a source InputStream to parse.
The source set will not be read before the call to RDFParser.parse().
The InputStream will not be closed after parsing. The InputStream does not need to support InputStream.markSupported().
The parser might not consume the complete stream (e.g. an RDF/XML parser may not read beyond the closing tag of </rdf:Description>).
The RDFParser.contentType(RDFSyntax) or RDFParser.contentType(String) SHOULD be set before calling RDFParser.parse().
The character set is assumed to be StandardCharsets.UTF_8 unless the RDFParser.contentType(String) specifies otherwise or the document declares its own charset (e.g. RDF/XML with a <?xml version="1.0" encoding="iso-8859-1"?> header).
The RDFParser.base(IRI) or RDFParser.base(String) MUST be set before calling RDFParser.parse(), unless the RDF syntax does not permit relative IRIs (e.g. RDFSyntax.NTRIPLES).
This method will override any source set with RDFParser.source(IRI), RDFParser.source(Path) or RDFParser.source(String).
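Each source(...) method replaces any previously set source, and checkSource() requires one and only one. A sketch of that bookkeeping, assuming mutable fields for brevity (the real class mutates a clone, and its checkSource() may also verify that a source file is readable); SourceFields is a hypothetical name and java.net.URI stands in for IRI.

```java
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Path;
import java.util.Optional;

// Hypothetical model of the source* fields and their mutual exclusion.
class SourceFields {
    Optional<InputStream> sourceInputStream = Optional.empty();
    Optional<Path> sourceFile = Optional.empty();
    Optional<URI> sourceIri = Optional.empty();

    // Analogue of resetSource(): clear every source* field.
    void resetSource() {
        sourceInputStream = Optional.empty();
        sourceFile = Optional.empty();
        sourceIri = Optional.empty();
    }

    // Every setter resets first, so the latest source wins.
    SourceFields source(InputStream in) {
        resetSource();
        sourceInputStream = Optional.of(in);
        return this;
    }

    SourceFields source(Path file) {
        resetSource();
        sourceFile = Optional.of(file);
        return this;
    }

    SourceFields source(URI iri) {
        resetSource();
        sourceIri = Optional.of(iri);
        return this;
    }

    // Analogue of checkSource(): one and only one source must be present.
    void checkSource() {
        int present = (sourceInputStream.isPresent() ? 1 : 0)
                + (sourceFile.isPresent() ? 1 : 0)
                + (sourceIri.isPresent() ? 1 : 0);
        if (present != 1) {
            throw new IllegalStateException("Expected exactly one source, found " + present);
        }
    }
}
```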
public T source(Path file)
Description copied from interface: RDFParser
Specify a source file Path to parse.
The source set will not be read before the call to RDFParser.parse().
The RDFParser.contentType(RDFSyntax) or RDFParser.contentType(String) SHOULD be set before calling RDFParser.parse().
The character set is assumed to be StandardCharsets.UTF_8 unless the RDFParser.contentType(String) specifies otherwise or the document declares its own charset (e.g. RDF/XML with a <?xml version="1.0" encoding="iso-8859-1"?> header).
The RDFParser.base(IRI) or RDFParser.base(String) MAY be set before calling RDFParser.parse(); otherwise Path.toUri() will be used as the base IRI.
This method will override any source set with RDFParser.source(IRI), RDFParser.source(InputStream) or RDFParser.source(String).
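When no content type is set for a file source, a subclass can fall back on guessRDFSyntax(Path). A rough sketch of extension matching; the extension table and the String syntax names are assumptions for illustration, whereas the real method compares against RDFSyntax.fileExtension().

```java
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical extension table; not taken from RDFSyntax.
class ExtensionGuess {
    private static final Map<String, String> SYNTAX_BY_EXTENSION = new HashMap<>();
    static {
        SYNTAX_BY_EXTENSION.put(".ttl", "TURTLE");
        SYNTAX_BY_EXTENSION.put(".nt", "NTRIPLES");
        SYNTAX_BY_EXTENSION.put(".nq", "NQUADS");
        SYNTAX_BY_EXTENSION.put(".trig", "TRIG");
        SYNTAX_BY_EXTENSION.put(".jsonld", "JSONLD");
        SYNTAX_BY_EXTENSION.put(".rdf", "RDFXML");
    }

    static Optional<String> guessSyntax(Path path) {
        Path name = path.getFileName();
        if (name == null) {
            return Optional.empty();           // e.g. a file system root
        }
        String fileName = name.toString().toLowerCase(Locale.ROOT);
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return Optional.empty();           // no extension to match
        }
        // Look up the extension including the dot, e.g. ".ttl".
        return Optional.ofNullable(SYNTAX_BY_EXTENSION.get(fileName.substring(dot)));
    }
}
```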
public T source(IRI iri)
Description copied from interface: RDFParser
Specify an absolute source IRI to retrieve and parse.
The source set will not be read before the call to RDFParser.parse().
If this builder does not support the given IRI protocol (e.g. urn:uuid:ce667463-c5ab-4c23-9b64-701d055c4890), this method should succeed, while RDFParser.parse() should throw an IOException.
The RDFParser.contentType(RDFSyntax) or RDFParser.contentType(String) MAY be set before calling RDFParser.parse(), in which case that type MAY be used for content negotiation (e.g. the Accept header in HTTP), and SHOULD be used for selecting the RDFSyntax.
The character set is assumed to be StandardCharsets.UTF_8 unless the protocol's equivalent of Content-Type specifies otherwise or the document declares its own charset (e.g. RDF/XML with a <?xml version="1.0" encoding="iso-8859-1"?> header).
The RDFParser.base(IRI) or RDFParser.base(String) MAY be set before calling RDFParser.parse(); otherwise the source IRI will be used as the base IRI.
This method will override any source set with RDFParser.source(Path), RDFParser.source(InputStream) or RDFParser.source(String).
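The deferred-failure contract for unsupported IRI protocols can be sketched as follows: recording the IRI never fails, and the scheme check happens only when parsing starts. DeferredSource and its set of supported schemes (http, https, file) are hypothetical; an actual implementation decides for itself which protocols it can retrieve.

```java
import java.io.IOException;
import java.net.URI;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical source holder illustrating "succeed now, fail in parse()".
class DeferredSource {
    private static final Set<String> SUPPORTED_SCHEMES =
            new HashSet<>(Arrays.asList("http", "https", "file"));

    private final URI iri;

    DeferredSource(URI iri) {
        this.iri = iri;                        // no I/O and no scheme check here
    }

    // Called when parsing starts; unsupported protocols become an IOException.
    void checkRetrievable() throws IOException {
        String scheme = iri.getScheme();
        if (scheme == null || !SUPPORTED_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT))) {
            throw new IOException("Unsupported IRI protocol: " + iri);
        }
    }
}
```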
public T source(String iri) throws IllegalArgumentException
Description copied from interface: RDFParser
Specify an absolute source IRI to retrieve and parse.
The source set will not be read before the call to RDFParser.parse().
If this builder does not support the given IRI (e.g. urn:uuid:ce667463-c5ab-4c23-9b64-701d055c4890), this method should succeed, while RDFParser.parse() should throw an IOException.
The RDFParser.contentType(RDFSyntax) or RDFParser.contentType(String) MAY be set before calling RDFParser.parse(), in which case that type MAY be used for content negotiation (e.g. the Accept header in HTTP), and SHOULD be used for selecting the RDFSyntax.
The character set is assumed to be StandardCharsets.UTF_8 unless the protocol's equivalent of Content-Type specifies otherwise or the document declares its own charset (e.g. RDF/XML with a <?xml version="1.0" encoding="iso-8859-1"?> header).
The RDFParser.base(IRI) or RDFParser.base(String) MAY be set before calling RDFParser.parse(); otherwise the source IRI will be used as the base IRI.
This method will override any source set with RDFParser.source(Path), RDFParser.source(InputStream) or RDFParser.source(IRI).
Specified by: source in interface RDFParser
Parameters: iri - An IRI to retrieve and parse
Returns: An RDFParser that will use the specified source.
Throws: IllegalArgumentException - If the iri is not a valid absolute IRI string
protected void checkIsAbsolute(IRI iri) throws IllegalArgumentException
Check if an IRI is absolute.
Used by source(String) and base(String).
Parameters: iri - IRI to check
Throws: IllegalArgumentException - If the IRI is not absolute
protected void checkSource() throws IOException
Check that one and only one source is present and valid.
Used by parse().
Subclasses might override this method, e.g. to support other source combinations, or to check if the sourceIri is resolvable.
Throws: IOException - If a source file can't be read
protected void checkBaseRequired() throws IllegalStateException
Check if base is required.
Throws: IllegalStateException - if base is required, but not set.
protected void resetSource()
Reset all source* fields to Optional.empty().
Subclasses should override this and call super.resetSource() if they need to reset any additional source* fields.
protected void resetTarget()
Reset all optional target* fields to Optional.empty().
Note that the consumer set for getTarget() is not reset.
Subclasses should override this and call super.resetTarget() if they need to reset any additional target* fields.
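A sketch of the target bookkeeping implied by resetTarget() and the target(...) methods: a graph target also installs a generated consumer, switching to a plain consumer clears the graph field, and resetTarget() itself leaves the consumer alone. TargetFields is hypothetical, a List<String> stands in for Graph, fields are mutable for brevity, and the Dataset case is omitted.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Consumer;

// Hypothetical model of the target* fields and the target consumer.
class TargetFields {
    Optional<List<String>> targetGraph = Optional.empty();
    Consumer<String> consumer;                     // what getTarget() would return; null until set

    // Analogue of resetTarget(): clears target* fields but keeps the consumer.
    void resetTarget() {
        targetGraph = Optional.empty();
    }

    TargetFields target(Consumer<String> newConsumer) {
        resetTarget();                             // forget any earlier graph target
        consumer = newConsumer;
        return this;
    }

    TargetFields target(List<String> graph) {
        target(graph::add);                        // generated consumer for the graph
        targetGraph = Optional.of(graph);          // set after target(Consumer) cleared it
        return this;
    }
}
```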
protected abstract void parseSynchronusly() throws IOException, RDFParseException
Parse sourceInputStream, sourceFile or sourceIri.
One of the source fields MUST be present, as checked by checkSource().
checkBaseRequired() is called to verify if getBase() is required.
Throws: IOException - If the source could not be read
RDFParseException - If the source could not be parsed (e.g. a .ttl file was not valid Turtle)
protected T prepareForParsing() throws IOException, IllegalStateException
Prepare a clone of this RDFParser which has been checked and completed.
The returned clone will always have getTarget() and getRdfTermFactory() present.
If getSourceFile() is present but getBase() is not, the base will be set to the file:/// IRI for the Path's real path (e.g. resolving any symbolic links).
Throws: IOException - If the source was not accessible (e.g. a file was not found)
IllegalStateException - If the parser was not in a compatible setting (e.g. contentType was an invalid string)
protected void checkTarget()
Subclasses can override this method to check the target is valid.
The default implementation throws an IllegalStateException if the target has not been set.
protected void checkContentType() throws IllegalStateException
Subclasses can override this method to check compatibility with the contentType setting.
Throws: IllegalStateException - if the getContentType() or getContentTypeSyntax() is not compatible or invalid
protected static Optional<RDFSyntax> guessRDFSyntax(Path path)
Guess RDFSyntax from a local file's extension.
This method can be used by subclasses if getContentType() is not present and getSourceFile() is set.
Parameters: path - Path whose extension should be checked
Returns: The RDFSyntax which has a matching RDFSyntax.fileExtension(), otherwise Optional.empty().
protected RDF createRDFTermFactory()
Create a new RDF for a parse session.
This is called by parse() to set rdfTermFactory(RDF) if getRdfTermFactory() is Optional.empty().
As parsed blank nodes might be made with RDF.createBlankNode(String), each call to this method SHOULD return a new RDF instance.
Returns: A new RDF
public Future<RDFParser.ParseResult> parse() throws IOException, IllegalStateException
Description copied from interface: RDFParser
Parse the specified source.
A source method (e.g. RDFParser.source(InputStream), RDFParser.source(IRI), RDFParser.source(Path), RDFParser.source(String) or an equivalent subclass method) MUST have been called before calling this method, otherwise an IllegalStateException will be thrown.
A target method (e.g. RDFParser.target(Consumer), RDFParser.target(Dataset), RDFParser.target(Graph) or an equivalent subclass method) MUST have been called before calling parse(), otherwise an IllegalStateException will be thrown.
It is undefined whether this method is thread-safe; however, the RDFParser may be reused (e.g. setting a different source) as soon as the Future has been returned from this method.
The RDFParser SHOULD perform the parsing as an asynchronous operation, and return the Future as soon as preliminary checks (such as validity of the RDFParser.source(IRI) and RDFParser.contentType(RDFSyntax) settings) have finished. The future SHOULD NOT mark Future.isDone() before parsing is complete. A synchronous implementation MAY block on the parse() call and return a Future that is already Future.isDone().
The returned Future contains an RDFParser.ParseResult. Implementations may subclass this interface to provide any parser details, e.g. a list of warnings. null is a possible return value if no details are available, but parsing succeeded.
If an exception occurs during parsing (e.g. IOException or org.apache.commons.rdf.simple.experimental.RDFParseException), it should be indicated as the Throwable.getCause() in the ExecutionException thrown on Future.get().
Specified by: parse in interface RDFParser
Returns: A Future that will return the populated Graph when the parsing has finished.
Throws: IOException - If an error occurred while starting to read the source (e.g. file not found, unsupported IRI protocol). Note that IO errors during parsing would instead be the Throwable.getCause() of the ExecutionException thrown on Future.get().
IllegalStateException - If the builder is in an invalid state, e.g. a source has not been set.
public T target(Consumer<Quad> consumer)
Description copied from interface: RDFParser
Specify a consumer for parsed quads.
The quads will include triples in all named graphs of the parsed source, including any triples in the default graph. When parsing a source format which does not support datasets, all quads delivered to the consumer will be in the default graph (e.g. their Quad.getGraphName() will be Optional.empty()).
It is undefined whether any quads are consumed if RDFParser.parse() throws any exceptions. On the other hand, if RDFParser.parse() does not indicate an exception, the implementation SHOULD have delivered all parsed quads to the specified consumer.
Calling this method will override any earlier targets set with RDFParser.target(Graph), RDFParser.target(Consumer) or RDFParser.target(Dataset).
The consumer is not assumed to be thread-safe: only one Consumer.accept(Object) is delivered at a time for a given RDFParser.parse() call.
This method is typically called with a functional consumer, for example:
List<Quad> quads = new ArrayList<>();
parserBuilder.target(quads::add).parse();
public T target(Dataset dataset)
Description copied from interface: RDFParser
Specify a Dataset to add parsed quads to.
It is undefined whether any quads are added to the specified Dataset if RDFParser.parse() throws any exceptions. (However, implementations are free to prevent this using transaction mechanisms or similar.) On the other hand, if RDFParser.parse() does not indicate an exception, the implementation SHOULD have inserted all parsed quads into the specified dataset.
Calling this method will override any earlier targets set with RDFParser.target(Graph), RDFParser.target(Consumer) or RDFParser.target(Dataset).
The default implementation of this method calls RDFParser.target(Consumer) with a Consumer that does Dataset.add(Quad).
public T target(Graph graph)
Description copied from interface: RDFParser
Specify a Graph to add parsed triples to.
If the source supports datasets (e.g. the RDFSyntax set with RDFParser.contentType(RDFSyntax) has RDFSyntax.supportsDataset() true), then only quads in the default graph will be added to the Graph as Triples.
It is undefined whether any triples are added to the specified Graph if RDFParser.parse() throws any exceptions. (However, implementations are free to prevent this using transaction mechanisms or similar.) If Future.get() does not indicate an exception, the parser implementation SHOULD have inserted all parsed triples into the specified graph.
Calling this method will override any earlier targets set with RDFParser.target(Graph), RDFParser.target(Consumer) or RDFParser.target(Dataset).
The default implementation of this method calls RDFParser.target(Consumer) with a Consumer that does Graph.add(Triple) with Quad.asTriple() if the quad is in the default graph.
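The default target(Dataset) and target(Graph) implementations both reduce to a Consumer<Quad>. A self-contained sketch with hypothetical stand-in types (SimpleQuad, and Strings as triples) rather than the Commons RDF Quad, Graph and Dataset interfaces: the dataset consumer adds every quad, while the graph consumer keeps only default-graph quads and converts them with an asTriple()-style method.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Consumer;

// Hypothetical quad stand-in; an empty graphName means the default graph.
final class SimpleQuad {
    final Optional<String> graphName;
    final String subject;
    final String predicate;
    final String object;

    SimpleQuad(Optional<String> graphName, String subject, String predicate, String object) {
        this.graphName = graphName;
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }

    // Stand-in for Quad.asTriple(): drop the graph name.
    String asTriple() {
        return subject + " " + predicate + " " + object;
    }
}

class TargetAdapters {
    // Like the default target(Dataset): every quad is added.
    static Consumer<SimpleQuad> datasetTarget(List<SimpleQuad> dataset) {
        return dataset::add;
    }

    // Like the default target(Graph): only default-graph quads, as triples.
    static Consumer<SimpleQuad> graphTarget(List<String> graph) {
        return quad -> {
            if (!quad.graphName.isPresent()) {
                graph.add(quad.asTriple());
            }
        };
    }
}
```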
Copyright © 2015–2018 The Apache Software Foundation. All rights reserved.