Overview

ATLAS (Architecture and Tools for Linguistic Analysis Systems) grew out of a joint initiative involving NIST, LDC and MITRE. ATLAS addresses an array of application needs spanning corpus construction, evaluation infrastructure, and multi-modal visualization.

The ATLAS framework provides an architecture designed to facilitate the development of linguistic applications. The principal goal of ATLAS is to provide an abstraction over the diversity of linguistic annotations. This abstraction, which extends Bird and Liberman's Annotation Graphs (see history for more details), can represent complex annotations on signals of arbitrary dimensionality.
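To make the idea of one abstraction over many annotation styles more concrete, here is a minimal Java sketch. It assumes an annotation is a typed label attached to a region of a signal, and that a region is delimited by anchors carrying one coordinate per signal dimension. All class and method names (AnnotationSketch, Anchor, Region, Annotation, span, example) are hypothetical and chosen for illustration; they are not the ATLAS Java API.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: these classes are hypothetical, not the ATLAS Java API.
class AnnotationSketch {

    /** A point in a signal: one coordinate per dimension (1 for audio time, 2 for image x/y). */
    static final class Anchor {
        final String signalId;
        final double[] coords;

        Anchor(String signalId, double... coords) {
            this.signalId = signalId;
            this.coords = coords.clone();
        }

        int dimensions() {
            return coords.length;
        }
    }

    /** A stretch of signal delimited by two anchors that lie in the same signal. */
    static final class Region {
        final Anchor start;
        final Anchor end;

        Region(Anchor start, Anchor end) {
            if (!start.signalId.equals(end.signalId) || start.dimensions() != end.dimensions()) {
                throw new IllegalArgumentException("anchors must lie in the same signal");
            }
            this.start = start;
            this.end = end;
        }
    }

    /** A typed label on a region, optionally decomposed into child annotations. */
    static final class Annotation {
        final String type;
        final String content;
        final Region region;
        final List<Annotation> children = new ArrayList<>();

        Annotation(String type, String content, Region region) {
            this.type = type;
            this.content = content;
            this.region = region;
        }

        /** Counts this annotation plus all of its descendants. */
        int size() {
            int n = 1;
            for (Annotation child : children) {
                n += child.size();
            }
            return n;
        }
    }

    /** Convenience helper for a one-dimensional (time) region. */
    static Region span(String signalId, double from, double to) {
        return new Region(new Anchor(signalId, from), new Anchor(signalId, to));
    }

    /** Builds an utterance over an audio signal with two word-level children. */
    static Annotation example() {
        Annotation utterance = new Annotation("utterance", "hello world", span("audio1", 0.0, 1.2));
        utterance.children.add(new Annotation("word", "hello", span("audio1", 0.0, 0.5)));
        utterance.children.add(new Annotation("word", "world", span("audio1", 0.6, 1.2)));
        return utterance;
    }

    public static void main(String[] args) {
        Annotation utterance = example();
        System.out.println(utterance.type + " with " + utterance.children.size() + " children");
    }
}
```

Because an anchor stores an array of coordinates, the same three classes would cover a time span in audio or a rectangular area in an image; only the number of coordinates changes.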

ATLAS is made of four main components:

  • an annotation ontology,
  • an Application Programming Interface,
  • an interchange format for linguistic data, and
  • MAIA, a type definition infrastructure.

The annotation ontology at the core of ATLAS provides the abstractions on which the rest of the framework is built. These abstractions can be implemented in a variety of programming languages. NIST has created a Java instantiation of the data model and provides an Application Programming Interface (API) to the core objects that makes them easy to manipulate. This API is a prototype for a language-independent API we are working on.

Moreover, linguistic data expressed using ATLAS abstractions can be serialized to XML using the ATLAS Interchange Format (AIF) to facilitate their exchange and reuse.
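As a rough illustration of what serializing an annotation to XML involves, the sketch below writes a single annotation with the standard Java DOM and Transformer classes. The element and attribute names (annotation, type, start, end) and the XmlSketch class are invented for this example and do not follow the actual AIF schema.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Illustrative sketch only: the XML vocabulary here is made up and is not AIF.
class XmlSketch {

    /** Serializes one time-aligned annotation to a single XML element. */
    static String toXml(String type, double start, double end, String content) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();

            // Hypothetical element layout: region bounds as attributes, label as text.
            Element annotation = doc.createElement("annotation");
            annotation.setAttribute("type", type);
            annotation.setAttribute("start", Double.toString(start));
            annotation.setAttribute("end", Double.toString(end));
            annotation.setTextContent(content);
            doc.appendChild(annotation);

            // Write the DOM tree to a string without the XML declaration.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("XML serialization failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toXml("word", 0.0, 0.5, "hello"));
    }
}
```

Building the document through DOM rather than string concatenation means that special characters in the annotation content are escaped by the serializer.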

Finally, an important new dimension was recently added to ATLAS: meta-annotation. The Meta-Annotation Infrastructure for ATLAS (MAIA) allows the generic abstractions to be constrained for specific needs.
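One way to picture constraining a generic abstraction is a type definition that lists what a given annotation type must contain, plus a check of generic data against that definition. The Java sketch below assumes annotations carry a free-form map of features. The names TypeConstraintSketch, TypeDefinition and missingFrom are hypothetical, and the sketch does not reflect how MAIA itself expresses or enforces type definitions.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Illustrative sketch only: a hypothetical constraint check, not the MAIA mechanism.
class TypeConstraintSketch {

    /** A named annotation type together with the features it requires. */
    static final class TypeDefinition {
        final String name;
        final Set<String> requiredFeatures;

        TypeDefinition(String name, Set<String> requiredFeatures) {
            this.name = name;
            this.requiredFeatures = requiredFeatures;
        }

        /** Returns the required features that a generic feature map does not provide. */
        Set<String> missingFrom(Map<String, String> features) {
            Set<String> missing = new TreeSet<>(requiredFeatures);
            missing.removeAll(features.keySet());
            return missing;
        }
    }

    public static void main(String[] args) {
        TypeDefinition word = new TypeDefinition("word", Set.of("orthography", "partOfSpeech"));

        Map<String, String> complete = Map.of("orthography", "hello", "partOfSpeech", "UH");
        Map<String, String> incomplete = Map.of("orthography", "hello");

        System.out.println("missing from complete: " + word.missingFrom(complete));
        System.out.println("missing from incomplete: " + word.missingFrom(incomplete));
    }
}
```

The generic feature map stays unrestricted; the type definition is a separate object layered on top, so different applications could apply different constraints to the same underlying data.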