Tuesday, May 6, 2008

Oracle Text -- FILE_DATASTORE

Recently, I have been working on Oracle Text as the index engine for our website's file system. In general, the files and the Oracle database reside on different servers. For Oracle to index the files, it must reach them either through URLs that need no authentication (URL_DATASTORE) or through the file system. Since access to our website requires authentication, we have to mount the file system on the Oracle server. The files then appear to be local to Oracle, and Oracle Text can read them through a FILE_DATASTORE preference.

Oracle Text provides a default FILE_DATASTORE preference (CTXSYS.FILE_DATASTORE), but with it you have to store the full path of every file you want indexed in the text table. Alternatively, you can create your own FILE_DATASTORE preference, set its path attribute, and store only the file names in the text table.

BEGIN
  CTX_DDL.CREATE_PREFERENCE('my_datastorage', 'FILE_DATASTORE');
  -- file names in the text table are looked up under /home/doc
  CTX_DDL.SET_ATTRIBUTE('my_datastorage', 'path', '/home/doc');
END;
/
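A minimal sketch of how such a preference is then used. The table doc_files, its file_name column, the sample file names and the index name doc_files_idx are all hypothetical. Only bare file names are stored, and the index picks up the custom preference through the datastore keyword in PARAMETERS:

```sql
-- Hypothetical text table: each row holds only a file name,
-- which Oracle Text resolves against the preference's path (/home/doc).
CREATE TABLE doc_files (
  id        NUMBER PRIMARY KEY,
  file_name VARCHAR2(400)
);

INSERT INTO doc_files VALUES (1, 'readme.txt');
INSERT INTO doc_files VALUES (2, 'faq.html');
COMMIT;

-- Build a CONTEXT index that reads the files through my_datastorage.
CREATE INDEX doc_files_idx ON doc_files (file_name)
  INDEXTYPE IS CTXSYS.CONTEXT
  PARAMETERS ('datastore my_datastorage');

-- Query the contents of the files, not the file names themselves.
SELECT id FROM doc_files WHERE CONTAINS(file_name, 'oracle') > 0;
```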


It is not allowed to give a partial path in the text table if you specify the path attribute for the FILE_DATASTORE preference. Each location in the text table is either just a file name or a full path to the file. If it is just a file name, Oracle Text should look through every directory listed in the preference's path attribute for a file with that name, so this may not be a good idea if different directories contain files with the same name. If a full path is stored in the table, it might be ignored when the preference already sets a path. It would be helpful if Oracle Text accepted partial paths in the text table.
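To see why duplicate names matter, here is a sketch that gives one preference two directories. The path attribute accepts several directories separated by a colon on UNIX (a semicolon on Windows); the preference name my_multi_path is hypothetical.

```sql
BEGIN
  -- Hypothetical preference that searches two directories for bare file names.
  CTX_DDL.CREATE_PREFERENCE('my_multi_path', 'FILE_DATASTORE');
  -- UNIX syntax: directories are separated by ':' (use ';' on Windows).
  CTX_DDL.SET_ATTRIBUTE('my_multi_path', 'path', '/home/doc:/home/doc_copy');
END;
/
```

If both /home/doc/readme.txt and /home/doc_copy/readme.txt exist, a row containing just readme.txt cannot say which of the two files it means, so file names should be unique across all listed directories.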

Here is a scenario where partial paths would help. The whole file set may also be placed at a different location. If Oracle Text supported partial paths in the text table, Oracle could index both file sets with one table (and one index) by creating a second FILE_DATASTORE preference. (This should already work if your file set consists only of plain files directly under its root directory, because then the file names alone are enough.)
BEGIN
  CTX_DDL.CREATE_PREFERENCE('my_datastorage1', 'FILE_DATASTORE');
  -- same file names, resolved against the second location
  CTX_DDL.SET_ATTRIBUTE('my_datastorage1', 'path', '/home/doc_copy');
END;
/


The text table would then store paths relative to /home/doc or /home/doc_copy, so both file sets could share the same text table and index. You would just switch the index metadata to the matching datastore preference every time you synchronize the index.
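A rough, untested sketch of that per-synchronization switch, reusing the hypothetical index name doc_files_idx. It assumes your Oracle Text release supports the REPLACE METADATA rebuild parameter, which changes index preferences without reindexing existing documents; check the ALTER INDEX documentation for your version before relying on it.

```sql
-- Assumption: REPLACE METADATA is available in your release.
-- Point the index at the copy under /home/doc_copy, then index pending rows.
ALTER INDEX doc_files_idx REBUILD PARAMETERS ('replace metadata datastore my_datastorage1');
EXEC CTX_DDL.SYNC_INDEX('doc_files_idx');

-- Switch back to the original location before the next synchronization.
ALTER INDEX doc_files_idx REBUILD PARAMETERS ('replace metadata datastore my_datastorage');
EXEC CTX_DDL.SYNC_INDEX('doc_files_idx');
```

Note that SYNC_INDEX only processes rows that are pending, so each switch applies to whatever rows were inserted or updated since the last synchronization.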

Note: This is just my idea. I haven't tested it, nor measured the performance cost of changing the FILE_DATASTORE preference every time you synchronize the index.
