Configuring Search Fields, Facets, and Relationships
These instructions describe how to configure standard and custom search fields, facet categories, and relationships for any XML framework that is made available through the Search API. This information is provided for system administrators who are installing or managing a DDS repository system, which includes the Digital Discovery System (DDS) and the NSDL Catalog System (NCS). A framework can be used effectively in the repository without being configured, but configuring it adds search functionality that may be useful.
This document assumes familiarity with Apache Tomcat, Lucene, servlet configurations, and XML.
How search fields, facets, and relationships are generated
At index creation time, each record is inserted in the repository in its native XML format. The indexer extracts standard, custom, and XPath search fields and facet categories from the contents of the XML and establishes any relevant relationships. It then generates a single entry containing each of the fields, facet categories, and data from related records, and inserts that entry into the index. All records are guaranteed to contain certain fields such as the
For detailed information about search fields and the content within them, see the Search Service documentation (Search fields section).
How to configure search fields, facets, and relationships
Each XML framework in the DDS can have a corresponding configuration file that defines standard and custom search fields, facet categories, and relationships for that framework. Standard search fields include title, description, ID, URL, and geospatial bounding box coordinates. Custom search fields and facet categories can be defined for any content extracted from the XML document, its related documents, or both. Relationships can be defined to connect records in one XML framework with records in another for the purpose of optimized searching.
To configure a specific XML framework, follow these steps:
1. Add XML frameworks to the configuration index file
Add the given XML framework to the search fields configuration index file, which contains a list of the individual configuration files for each XML framework. Entries in the index may contain relative or absolute URIs to the individual framework configuration files, which may be located on the local file system (file://) or anywhere on the Web (http://).
The index file is named
Example index file:
<?xml version="1.0" encoding="ISO-8859-1"?>
<XMLIndexerFieldsConfigIndex>
  <!-- List the location of each framework-specific configuration file -->
  <configurationFiles>
    <configurationFile>xmlIndexerFieldsConfigs/oai_dc_search_fields.xml</configurationFile>
    <configurationFile>xmlIndexerFieldsConfigs/my_framework_search_fields.xml</configurationFile>
  </configurationFiles>
</XMLIndexerFieldsConfigIndex>
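The entries in the example index file use relative paths. Since entries may also be absolute URIs, the following sketch shows how file:// and http:// entries might look. The directory, host name, and file names here are hypothetical and exist only for illustration. The element names are the ones used in the example index file.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<XMLIndexerFieldsConfigIndex>
  <configurationFiles>
    <!-- Relative URI, as in the example index file -->
    <configurationFile>xmlIndexerFieldsConfigs/oai_dc_search_fields.xml</configurationFile>
    <!-- Absolute URI on the local file system (hypothetical path) -->
    <configurationFile>file:///opt/dds/config/my_framework_search_fields.xml</configurationFile>
    <!-- Absolute URI on the Web (hypothetical host and path) -->
    <configurationFile>http://example.org/dds/config/another_framework_search_fields.xml</configurationFile>
  </configurationFiles>
</XMLIndexerFieldsConfigIndex>
```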
2. Define search fields, facets, and relationships for each XML framework
Each configuration file describes the standard and/or custom search fields and facet categories for an XML framework and where the content for those fields resides in the XML instance documents, as well as relationships across XML frameworks in the repository. For the following discussion, see the example configuration file below.
Standard search fields
Standard search fields are processed by the indexer uniformly, which allows clients to search them consistently across frameworks.
The standard fields are the following:
To configure a standard search field for a framework, add a
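As an illustration of standard field configuration, the following sketch maps the ID and geospatial bounding box fields for a hypothetical framework. The xmlFormat value and every XPath are assumptions invented for this example and must be replaced with values that match the actual framework. The element names and standard field names follow the oai_dc example configuration in this document.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- Hypothetical framework: "my_framework" and all XPaths below are illustrative assumptions -->
<XMLIndexerFieldsConfig xmlFormat="my_framework">
  <standardFields>
    <standardField name="id">
      <xpaths>
        <xpath>/record/recordID</xpath>
      </xpaths>
    </standardField>
    <!-- Geospatial bounding box: one standard field per edge -->
    <standardField name="geoBBNorth">
      <xpaths>
        <xpath>/record/coverage/boundingBox/north</xpath>
      </xpaths>
    </standardField>
    <standardField name="geoBBSouth">
      <xpaths>
        <xpath>/record/coverage/boundingBox/south</xpath>
      </xpaths>
    </standardField>
    <standardField name="geoBBWest">
      <xpaths>
        <xpath>/record/coverage/boundingBox/west</xpath>
      </xpaths>
    </standardField>
    <standardField name="geoBBEast">
      <xpaths>
        <xpath>/record/coverage/boundingBox/east</xpath>
      </xpaths>
    </standardField>
  </standardFields>
</XMLIndexerFieldsConfig>
```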
Custom fields and facet categories
Custom fields and facet categories can be defined for any content extracted from the XML document.
To define a custom field or facet category, add a
Attributes that may appear on the
Note that the Lucene Analyzer that is defined for a given field is automatically applied both in the indexer and the searcher.
Relationships for a given XML framework can be defined that will connect the records written in that framework with other records in the repository. Related records may be connected by either record ID or URL, as defined by the standard
For example, an annotation framework might define the relationship
To define a relationship for a given XML framework, add a
As the indexer processes the XML records, it first removes namespaces from the documents. This simplifies the XPath notation necessary to select the desired elements and attributes within. Therefore, do not include namespaces in your XPath notation.
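To illustrate, consider an oai_dc record as it might appear before indexing, with a namespace prefix on every element. This is only a sketch: the title and identifier values are made up, and the effect of namespace removal is assumed from the description above.

```xml
<!-- Illustrative oai_dc record; the title and identifier values are invented -->
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Introduction to Plate Tectonics</dc:title>
  <dc:identifier>http://example.org/plate-tectonics</dc:identifier>
</oai_dc:dc>
<!-- Because the indexer removes namespaces first, configure the XPath as
       /dc/title
     rather than a namespace-qualified form such as
       /oai_dc:dc/dc:title -->
```

This matches the XPaths used in the oai_dc example configuration in this document, which select /dc/title and /dc/identifier without prefixes.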
To specify the content elements that should be pulled from an
It is also possible to pull in custom field content from related documents, e.g. a document that is associated with the one being indexed by way of a relation. To specify custom field content that should be pulled from a related document, add an XPath in the framework configuration file that starts with the relation prefix specifier (e.g. '
For example, the following notation would be used to index content for the given record from all
Example search configuration for the
<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- XMLIndexerFieldsConfig attributes: [xmlFormat OR schema] -->
<XMLIndexerFieldsConfig xmlFormat="oai_dc">
  <standardFields>
    <!-- standardField attributes include: name=[id|url|title|description|geoBBNorth|geoBBSouth|geoBBWest|geoBBEast] -->
    <standardField name="url">
      <xpaths>
        <xpath>/dc/identifier</xpath>
      </xpaths>
    </standardField>
    <standardField name="title">
      <xpaths>
        <xpath>/dc/title</xpath>
      </xpaths>
    </standardField>
    <standardField name="description">
      <xpaths>
        <xpath>/dc/description</xpath>
      </xpaths>
    </standardField>
  </standardFields>
  <customFields>
    <!-- customField attributes include: [name OR facetCategory], [store], [type OR analyzer], [indexFieldPreprocessor], [facetCategory] -->
    <!-- Regular custom fields (use the name attribute) -->
    <customField name="dcIdentifier" store="yes" type="key">
      <xpaths>
        <xpath>/dc/identifier</xpath>
      </xpaths>
    </customField>
    <customField name="dcType" store="yes" type="text">
      <xpaths>
        <xpath>/dc/type</xpath>
      </xpaths>
    </customField>
    <customField name="dcMySubjectTags" store="yes" analyzer="org.example.MySubjectTagAnalyzer" indexFieldPreprocessor="org.example.MySubjectTagIndexFieldPreprocessor">
      <xpaths>
        <xpath>/dc/subject</xpath>
      </xpaths>
    </customField>
    <!-- Facet category fields (use the facetCategory attribute) -->
    <customField facetCategory="dcTypeFacets">
How to verify it's working
Follow these steps to verify that the desired content is being indexed for search as expected:
Last revised: $Date: 2012/08/15 23:11:06 $