Our goal is to enable interactive search of terabyte-scale, non-indexed collections of complex data (such as photo collections, satellite pictures and medical images) by exploiting recent advances in active disk technology. The Diamond System provides a common infrastructure and API for building search applications in a variety of domains (e.g., medical imaging or oceanography). This allows application developers to focus their efforts on addressing the domain-specific aspects of the problem while relying on Diamond to provide an efficient, parallelized implementation of the search task.
Tasks that satisfy the following criteria can be expressed well in Diamond: (1) objects can be processed independently and in any order; (2) the search task can be decomposed into a sequence of filter steps. The first property allows Diamond to search objects spread across many storage devices efficiently and in parallel. The second enables Diamond to execute some of the (simpler) search steps on the active storage devices and the rest on the user's machine. Diamond can dynamically balance this computation so that more powerful active storage devices execute a greater fraction of the task. The collection of filters is termed a searchlet, and it encapsulates all of the domain-specific aspects of the search application. For example, in a homeland security application, the searchlet could contain specialized routines for face recognition. The searchlet API insulates the application programmer from the back-end, so applications do not need to be re-implemented as active storage systems evolve.
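The filter-chain model described above can be illustrated with a small Python sketch. It is not the Diamond searchlet API: every name here (`Filter`, `run_searchlet`, `choose_split`), the per-filter `cost` values, and the split-point heuristic are hypothetical assumptions made for illustration. The sketch shows why the two criteria matter: because objects are independent, each one is filtered in isolation and iteration order does not change the result; because the search is a sequence of filters, the chain can be cut at a split point, with the leading filters running "on the device" and the remainder "on the host".

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Sequence


@dataclass
class Filter:
    """Hypothetical filter: a named predicate plus an assumed relative cost."""
    name: str
    predicate: Callable[[dict], bool]
    cost: float  # illustrative per-object cost, not a measured value


# In this sketch a "searchlet" is simply an ordered list of filters.
Searchlet = List[Filter]


def passes(obj: dict, filters: Sequence[Filter]) -> bool:
    """Apply filters in order; all() stops at the first rejection (early discard)."""
    return all(f.predicate(obj) for f in filters)


def run_searchlet(objects: Iterable[dict], searchlet: Searchlet, split: int) -> List[dict]:
    """Run filters [0, split) 'on the device' and [split, n) 'on the host'.

    Each object is processed independently, so the set of results does not
    depend on the order in which objects are visited.
    """
    device_stage, host_stage = searchlet[:split], searchlet[split:]
    # Device side: only objects that survive the early filters are shipped on.
    shipped = [obj for obj in objects if passes(obj, device_stage)]
    # Host side: the remaining filters run on the user's machine.
    return [obj for obj in shipped if passes(obj, host_stage)]


def choose_split(searchlet: Searchlet, device_speed: float, host_speed: float) -> int:
    """Pick the split point that minimises the slower of the two stages.

    Toy cost model (an assumption): stage time = sum of filter costs / speed.
    It ignores filter selectivity and data-transfer cost.
    """
    best_split, best_time = 0, float("inf")
    for split in range(len(searchlet) + 1):
        device_time = sum(f.cost for f in searchlet[:split]) / device_speed
        host_time = sum(f.cost for f in searchlet[split:]) / host_speed
        stage_time = max(device_time, host_time)
        if stage_time < best_time:
            best_split, best_time = split, stage_time
    return best_split


if __name__ == "__main__":
    photos = [
        {"id": 1, "width": 1024, "has_face": True},
        {"id": 2, "width": 320, "has_face": True},
        {"id": 3, "width": 2048, "has_face": False},
    ]
    searchlet = [
        Filter("large_enough", lambda o: o["width"] >= 640, cost=1.0),
        Filter("face_detector", lambda o: o["has_face"], cost=10.0),
    ]
    split = choose_split(searchlet, device_speed=1.0, host_speed=4.0)
    hits = run_searchlet(photos, searchlet, split)
    print(split, [o["id"] for o in hits])  # prints "1 [1]"
```

Under this toy model, a faster device moves the split point later. With `device_speed=10.0` and `host_speed=1.0`, `choose_split` places both filters on the device, which mirrors the idea that more powerful active storage devices take on a greater share of the work.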
For further details, please visit the Diamond Project Home Page.