Lekta User Manual (Version 1.0)

    Index

    Introduction

    The Internal Syntax of Lekta
      • Identifiers, Literals, Reserved Words & Commands

    The Lekta Environment

    Defining a Language in Lekta
      • Analysis Grammar
      • Analysis Lexicon
        • Macros
        • Categorial & Functional Ambiguity
      • Functional Equations
        • Assignment
        • Mathematical, Logical & Comparison Operators; Special Functions
        • Conditional Structures
        • Coherence & Completeness
      • Not Found Words
      • Transfer Module
        • Lexical Transfer Rules
        • Structural Transfer Rules
      • Generation Grammar
        • Generation Blocks
        • Deletion & Recursion in Generation Rules
        • Empty GFs
      • Generation Lexicon

    Technical Details
      • Compilation
      • Parsing Strategy
        • Creation & Propagation of Events
        • ex Phase: Deterministic Execution of Events
        • pr Phase: Restriction Propagation
        • ps Phase: Psycholinguistic Preferences
      • Heuristics
        • Heuristics & Elimination of Events
        • Definition of Heuristics

    Output Commands

    Statistics Module

    Help Commands

    Summary of Configuration Options

    Appendix 1: Simple Spanish Grammar. Version 1

    Appendix 2: Simple Spanish Grammar. Version 2

    Appendix 3: Simple Spanish Grammar. Version 3

    Appendix 4: Simple Spanish Grammar. Version 4

    Appendix 5: Simple Spanish Grammar. Version 5


    Introduction

    Lekta is a shell tool for the development of Machine Translation (MT) prototypes between two given languages. Lekta follows a transfer approach to MT, and it is inspired by classical LFG theory. The tool has been implemented in C, and is currently running under Sun Solaris 2.x.

    The characteristics of the system make it especially suitable for rapid prototyping of small-to-medium scale MT systems and for students of MT.

    The system allows the user to define his/her own source language (SL) grammar with its corresponding lexicon, transfer module and target language (TL) grammar and lexicon. The syntax which the user must follow is described below under "Specification Languages".

    Lekta may be used as a full MT system (i.e. performing the three translation phases) or as a parser only. As a full MT system, it may translate at a speed of up to 700 words per second.

    The following sections illustrate the functioning of the system, configuration options and simple examples.


    The Internal Syntax of Lekta

    Identifiers, literals, reserved words and commands

    From a syntactic point of view, four types of tokens are identified in Lekta:
      • Identifiers: A sequence of characters delimited by one or more blanks, tabs or newlines. The following special characters cannot be used as identifiers:
          ^ [ ] , ~ | & + * ' " / ( ) { } : ; ! . < = > - %
        Commands and reserved words are not taken as identifiers either.
      • Commands: All Lekta commands are preceded by the $ sign.
      • Reserved words: Symbols with a special meaning within Lekta:
          COHERENCE
          COMPLETENESS
          DO
          ELSE
          ELIM-PR
          GF
          HPATTERN
          IF
          NONULL
          NULL
          ON-NODE
          RG
          SELF
          THEN
          WHEN
      • Literals: Sequences of characters, without any restrictions, inside quotes (single or double). Quotes within a literal must be escaped with a backslash (\"). Semantically, literals and identifiers behave exactly the same. Syntactically, literals enable the user to create sequences which would otherwise conflict with reserved words or built-in commands.
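
    The four token types can be told apart with a few simple checks. The Python sketch below is illustrative only: classify_token is a hypothetical helper, not part of Lekta, and it assumes that the token has already been isolated from the surrounding whitespace and that reserved words are case-sensitive.

```python
# Illustrative sketch only: classify_token is a hypothetical helper, not Lekta code.
RESERVED_WORDS = {
    "COHERENCE", "COMPLETENESS", "DO", "ELSE", "ELIM-PR", "GF", "HPATTERN",
    "IF", "NONULL", "NULL", "ON-NODE", "RG", "SELF", "THEN", "WHEN",
}

def classify_token(token):
    """Return the token type described above for an already isolated token."""
    if len(token) >= 2 and token[0] in "'\"" and token[-1] == token[0]:
        return "literal"        # quoted sequence, e.g. "IF" or 'there be'
    if token.startswith("$"):
        return "command"        # all commands are preceded by the $ sign
    if token in RESERVED_WORDS:
        return "reserved word"  # assumption: matching is case-sensitive
    return "identifier"

for tok in ["$f", "IF", '"IF"', "subj"]:
    print(tok, "->", classify_token(tok))
```

    Quoting a reserved word turns it into a literal, which is how a sequence such as "IF" can be used where an identifier is expected.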


    The Lekta Environment

    The system is invoked with the command:
        lekta [filename]
      If a filename has been specified, Lekta will execute all the instructions within the file.

      If no file has been specified, Lekta will run in interactive mode, waiting for a command from the input device.

      The start-up file lekta.ini: Once invoked, Lekta looks for a file called lekta.ini in the working directory. If it exists, all the instructions within the file will be executed. This option allows the user to have different configurations for different applications or developments, and it provides great flexibility for the use of Lekta in a command sequence or pipe.

      The following is an example of a lekta.ini file. Some of the commands will be clarified below.

        % Load sp Language
        $f "sp"
        % Load eng Language
        $f "eng"
        % Configuration options
        $c:apch
        $c:apcp
        $c:transl sp -> eng
        $c:auo
        $c:agen
        $c:atra
      File inclusion: Files can be included at any level of embedding with the $f IDENTIFIER command. Lekta will execute the content of the corresponding file. When the execution is over, the control will be passed on to the original file or to the interactive mode (depending on where the call was made from).

      Comments: There are two types of comments. The first type starts with a % sign and runs until the end of the line. The second type treats as a comment everything enclosed between the sequences /% and %/.

      Messages: The $m IDENTIFIER command shows the identifier on the output device. It can be used at any point and serves as a basic tracing device.
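
      The two comment styles can be illustrated with a short Python sketch. It is not Lekta's own reader: strip_comments is a hypothetical helper, and it makes the simplifying assumptions that block comments are removed before line comments and that the % sign never appears inside a quoted literal.

```python
import re

def strip_comments(text):
    """Remove /% ... %/ block comments first, then % line comments.

    Simplification: a % inside a quoted literal is not protected.
    """
    without_blocks = re.sub(r"/%.*?%/", "", text, flags=re.DOTALL)
    return re.sub(r"%[^\n]*", "", without_blocks)

source = '% Load sp Language\n$f "sp"\n/% not\nused %/\n$c:auo\n'
commands = [line.strip() for line in strip_comments(source).splitlines() if line.strip()]
print(commands)
```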


    Defining a Language in Lekta

    The basic object in Lekta is a Language. A language consists of an analysis grammar, a set of (optional) analysis heuristics, an analysis lexicon, a transfer module, a generation grammar and a generation lexicon. All these components are included between the $$LANG and $$ELANG identifiers. In turn, each component starts and finishes with its corresponding identifier. The following example illustrates a possible file organization for a Spanish to English MT application:
        % The sp language specification
        $$LANG sp
        % Analysis grammar is loaded from file "Analysisrules.esp"
        $f "Analysisrules.esp"
        % Analysis Lexicon
        % beginning of analysis lexicon
        $al
        $f "Analysis_lexicon.sp"
        % End of analysis lexicon
        $eal
        % Transfer Spanish to English (language eng)
        $t eng
        $f "transfer.sp-eng"
        $et
        $$ELANG
        %end of sp language
        %Beginning of eng language
        $$LANG eng
        
        % Generation grammar
        $gg
        $f "Generationrules.eng"
        $egg
        $gl
        $f "Generation_lexicon.eng"
        $egl
        $$ELANG

    Analysis Grammar

    The following construction is used to define an analysis grammar:
        $ag RGD GFD LPROD $eag
      • RGD defines the roots of the grammar, i.e. those nodes which may constitute an utterance. Its syntax is:
          (RG: LIDENTIFIER)
        RG is one of the reserved words in Lekta, and LIDENTIFIER is a list of identifiers. Example:
          (RG: S NP VP)
      • GFD defines the grammatical functions of the grammar. Its syntax is:
          (GF: LIDENTIFIER)
        Example:
          (GF: subj obj obj2 xcomp)
      • LPROD is the list of productions that make up the grammar. Each production has the following syntax:
          (IDENTIFIER : IDENTIFIER -> LIDENTIFIER)
        The first identifier works as a production label and is used by the trace module to indicate the actions executed during analysis. The second identifier is the left-hand side of the production, while the final list of identifiers specifies its right-hand side.

      For a complete example, see Appendix 1: Simple Spanish Grammar. Version 1.

    Analysis Lexicon

    The basic construction for the lexicon specification in Lekta is:
    $al LENTLEX $eal
    LENTLEX is a list of lexical entries. Each lexical entry consists of a feature structure, that is, a list of features separated by commas and within parentheses. Every feature has the following syntax:
    ATTRIBUTE: VALUE
    ATTRIBUTE is an identifier and VALUE can have any of the following values:
    • Atomic: an identifier
    • List: a list of identifiers within brackets and separated by commas
    • Complex: its value is in turn a feature structure
    • Negation: of an atomic value with the ~ symbol
    • Disjunction: separating different values with the | symbol
    Lekta defines two special features: LU (lexical unit) and CAT (syntactic category). These two features are required in every entry. They are used by the lexical component of the system to generate the string of syntactic categories associated with the input string. For instance, the Spanish determiner 'los' would appear in the Lekta formalism as:
      (LU: los, CAT:det, agr:(gen:masc,num:pl))
     where agr stands for agreement, and so on.

    Macros

    The masculine-plural agreement is an example of a feature that will be repeated again and again throughout the specification of any lexicon. To avoid this unnecessary rewriting of identical features, Lekta is equipped with a recursive system for the definition of macros.

    For example, if the macro <MP> has been defined as

      <MP> = (agr:(gen:masc,num:pl))
    The lexical entry for 'los' could be simplified as
      (LU: los, CAT:det, <MP>)
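
    Because macro definitions are recursive, a macro body may itself mention another macro. The Python sketch below illustrates the idea with plain text substitution; it is not Lekta's implementation. The helper expand_macros and the nested macro <M> are hypothetical, and the sketch assumes that the outer parentheses of a macro body are dropped when it is spliced into an entry and that definitions are not cyclic.

```python
def expand_macros(entry, macros):
    """Splice macro bodies into an entry until no macro name is left."""
    changed = True
    while changed:
        changed = False
        for name, body in macros.items():
            if name in entry:
                # Assumption: the body's outer parentheses are dropped.
                entry = entry.replace(name, body[1:-1])
                changed = True
    return entry

MACROS = {
    "<MP>": "(agr:(<M>,num:pl))",  # hypothetical nested variant of <MP>
    "<M>": "(gen:masc)",           # hypothetical helper macro
}

expanded = expand_macros("(LU: los, CAT:det, <MP>)", MACROS)
print(expanded)
```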

    Categorial and functional ambiguity

    There are different linguistic situations which may be captured in Lekta:
    • Categorial ambiguity: an LU is associated with different CATs. For instance, para is both a verb and a preposition in Spanish.
    • Functional ambiguity: an LU and CAT are associated with different feature structures. For example, fue is the past tense of both ir and ser in Spanish.

    Functional Equations

    As mentioned above, Lekta is inspired by LFG theory. According to this theory, there are two levels of analysis for each grammatical sentence. The constituent structure (c-structure) shows the phrasal configuration. The functional structure (f-structure) is obtained after the functional equations associated with each production in the grammar have been solved. The f-structure contains functional information in the form of attribute-value pairs.

    The LFG literature uses two metavariables (^ and v, called UP and SELF in Lekta) to refer to the mother and daughter nodes of a production. Thus, a classic LFG rule like:

      S ->  NP            VP
            ^ subj = v    ^ = v

    would appear in Lekta as:
      (1 : S -> NP VP)
        { UP.subj = SELF-1;
          UP = SELF-2 }
    Each production may be associated with a group of functional equations that control the unifier's behavior. Their syntax is as follows.

    Assignment

    The basic equation is the assignment operation. The following example shows a very simple case:
    (12 : SS -> SS idiom )
    {UP = SELF-1;
    UP.idiom = SELF-2 }
    In this example, the feature structure associated with the SS on the production's right-hand side (SELF-1) is passed directly to the symbol created by the production (UP), while the feature structure associated with idiom (SELF-2) is passed upwards as the value of the idiom feature.
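
    The effect of such equations can be pictured as unification over nested attribute-value structures. The Python sketch below is a minimal illustration, not Lekta's unifier: it assumes that feature structures are nested dicts with atomic leaves, that an assignment unifies rather than overwrites, and it ignores disjunction and negation. The function unify and the sample structures are hypothetical. It replays the S -> NP VP rule shown earlier (UP.subj = SELF-1; UP = SELF-2).

```python
def unify(a, b):
    """Unify two feature structures; return None if atomic values clash."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for key, value in b.items():
            if key in result:
                merged = unify(result[key], value)
                if merged is None:
                    return None
                result[key] = merged
            else:
                result[key] = value
        return result
    return a if a == b else None

# Hypothetical feature structures for "pedro" (NP) and "come" (VP).
np = {"pred": "pedro", "agr": {"num": "sing", "per": 3}}
vp = {"pred": "comer", "subj": {"agr": {"num": "sing", "per": 3}}}

# UP.subj = SELF-1; UP = SELF-2
up = unify({"subj": np}, vp)
print(up)

# A plural verb clashes with the singular subject, so unification fails.
vp_plural = {"pred": "comer", "subj": {"agr": {"num": "pl", "per": 3}}}
print(unify({"subj": np}, vp_plural))
```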

    Mathematical, logical and comparison operators; Special functions

    The following operators have been defined:
    • Mathematical: Addition (+), subtraction (-), multiplication (*), division (/);
    • Logical: And (&&), Or (||) and Not (!);
    • Comparison: Equal (==), different (!=), greater than (>>), greater than or equal to (>=), and less than or equal to (<=);
    • Special functions: the NULL symbol denotes a non-existing value, while NONULL denotes an existing one, regardless of its content.
    The following example uses some of the mathematical operators in a rule which computes the numeric value of an expression and stores it in the quant feature.
    (87 : THOUSAND -> HUNDRED Q1000 HUNDRED)
    {UP.quant = ((SELF-1.quant * SELF-2.quant) + SELF-3.quant)}
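
    As a worked illustration of rule 87, the arithmetic can be mirrored in Python. The function name and the sample input are assumptions made for this sketch: a numeral such as "trescientos mil doscientos" is taken to yield the quant values 300, 1000 and 200 for the three right-hand side symbols.

```python
def thousand_quant(hundred_left, q1000, hundred_right):
    """Mirror of UP.quant = ((SELF-1.quant * SELF-2.quant) + SELF-3.quant)."""
    return (hundred_left * q1000) + hundred_right

# Hypothetical input "trescientos mil doscientos": 300 * 1000 + 200
value = thousand_quant(300, 1000, 200)
print(value)  # 300200
```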
    In addition, Lekta has been equipped with the following special functions:
    • CONCAT(argument list): returns the concatenation of all arguments provided. For example:
        UP.pred = CONCAT(SELF-1.pred,^,SELF-2.pred)
    • COUNT(list valued feature): returns the number of elements in a list.
    • MEMBER(atom, list): returns a logical value (NULL or NONULL) depending on whether atom is found or not in the list. Example:
        IF MEMBER(obj, SELF-1.ggf) THEN ....
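
    The behavior of the three special functions can be approximated in Python as follows. This is only a sketch under stated assumptions: the lower-case function names are hypothetical, NULL and NONULL are modelled as plain strings, and MEMBER is assumed to return NONULL when the atom is found and NULL otherwise.

```python
NULL, NONULL = "NULL", "NONULL"  # modelled as plain strings in this sketch

def concat(*args):
    """CONCAT: join all arguments into a single string."""
    return "".join(str(arg) for arg in args)

def count(values):
    """COUNT: number of elements in a list-valued feature."""
    return len(values)

def member(atom, values):
    """MEMBER: assumed to yield NONULL if atom is in the list, else NULL."""
    return NONULL if atom in values else NULL

ggf = ["subj", "obj"]
print(concat("dar", "^", "cuenta"))  # hypothetical pred values
print(count(ggf))
print(member("obj", ggf), member("pobj", ggf))
```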

    Conditional structures

    The conditional structure IF - THEN - ELSE has also been defined in order to control the application of certain equations according to the features of the input structures. The following example shows a sophisticated use of these constructions.
    (1: SS -> QS)
    {
      UP = SELF-1;
      IF ((SELF-1.subj.pred == NULL))
      THEN {
        UP.subj.pred = pro };
      IF (((SELF-1.agr.per == 1) &&
           (SELF-1.agr.per == 2)))
      THEN {
        UP.agr.per = 2 };
      UP.stype = quest
    }

    Coherence and Completeness

    In LFG, a functional structure is well formed if it satisfies the coherence and completeness requirements. Both requirements are checked over the grammatical functions only, since the number of adjuncts or modifiers of a head may be infinite. As we mentioned above, grammatical functions are defined after the declaration of root nodes.
    (GF: subj, obj, obj2, acomp, ncomp, scomp, pobj)
    The coherence test checks whether the grammatical function that is going to be created is compatible with the functions required by the subcategorization of the head as it appears in the ggf feature.
    ggf:[list of grammatical functions]
    The completeness test checks whether all the grammatical functions required by a verb are locally satisfied in the current functional structure. Together, the two tests ensure that no grammatical function required by the predicate is missing and that no superfluous grammatical function is present.

    Our syntax has been enhanced so that the user may control a partial coherence or completeness. For instance, rule 23 below checks that all the grammatical functions local to the verb phrase have been completed but it also indicates that the subject doesn't have to be checked since it has not been consumed yet.

    (23 : VP -> VG NP)
    { UP = SELF-1;
      IF (MEMBER(ncomp, SELF-1.ggf))
      THEN {
        UP.ncomp = SELF-2;
        COMPLETENESS(GF-[subj]) }
      ELSE {
        IF (MEMBER(obj, SELF-1.ggf))
        THEN {
          UP.obj = SELF-2;
          COMPLETENESS(GF-[subj]) }
        ELSE {
          UP.subj = SELF-2 } };
      COHERENCE(GF-[subj]) }

    (14 : CL -> NP VP)
    {
      UP = SELF-2;
      UP.subj = SELF-1;
      COHERENCE(GF);
      COMPLETENESS(GF)
    }
    As indicated in rule 14, full completeness and coherence may be checked on all the grammatical functions.
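
    The two tests, including their partial variants, can be sketched in Python as set comparisons between the functions listed in ggf and the functions actually present. This is an illustration under assumptions, not Lekta's implementation: f-structures are modelled as dicts, the helper names are hypothetical, and GF-[subj] is modelled as a set difference.

```python
GF = {"subj", "obj", "obj2", "acomp", "ncomp", "scomp", "pobj"}

def coherent(fstruct, checked=GF):
    """No checked grammatical function is present unless ggf licenses it."""
    required = set(fstruct.get("ggf", []))
    present = {feature for feature in fstruct if feature in checked}
    return present <= required

def complete(fstruct, checked=GF):
    """Every checked grammatical function listed in ggf is present."""
    required = set(fstruct.get("ggf", [])) & set(checked)
    return all(feature in fstruct for feature in required)

# Hypothetical VP f-structure: the object is filled, the subject not yet.
vp = {"pred": "comer", "ggf": ["subj", "obj"], "obj": {"pred": "pastel"}}

print(complete(vp, GF - {"subj"}))  # partial check, as in rule 23
print(complete(vp))                 # full check, as in rule 14
print(coherent(vp))
```

    The partial check succeeds because the subject is excluded from the functions being verified, while the full check fails until the subject has been attached.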

    Not found words

    If a word in the input string is not found in the lexicon, Lekta will stop and generate an error. Alternatively, the configuration command $c:anfw allows the parser and unifier to assume that the not found word belongs to one (or more) specific category and tries to go ahead with the analysis. The command has the following syntax:

    $c:anfw <ldf> where <ldf> is a list of generated forms for not found words:

    <ldf> = <df1>, <df2>, ...

    Each definition, <dfN> consists of a syntactic category and a feature (between parentheses):

    <dfN> = <catN>(<featureN>)

    Examples:

    • $c:anfw n(pred): for each not found word, Lekta will generate an item whose syntactic category is n and whose feature structure is (pred:NOTFOUNDWORD). Thus, given the input string El rio Tajo desemboca en Lisboa, if Tajo is not in the lexicon, it will generate the lexical item (LU: Tajo, CAT:n, pred:Tajo)
    • $c:anfw n(pred), v(pred), adj(pred): In this case, the following three lexical items will be generated automatically:
        (LU: Tajo, CAT:n, pred:Tajo)
        (LU: Tajo, CAT:v, pred:Tajo)
        (LU: Tajo, CAT:adj, pred:Tajo)
    The command is deactivated with $c:dnfw.
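
    The expansion performed for a not found word can be pictured with the following Python sketch. The helpers parse_anfw and entries_for_unknown are hypothetical, and lexical items are modelled as dicts; the sketch only reproduces the mapping from the <ldf> list to the generated items described above.

```python
def parse_anfw(spec):
    """Parse 'n(pred), v(pred)' into [('n', 'pred'), ('v', 'pred')]."""
    pairs = []
    for item in spec.split(","):
        cat, _, rest = item.strip().partition("(")
        pairs.append((cat.strip(), rest.rstrip(")").strip()))
    return pairs

def entries_for_unknown(word, spec):
    """Build one lexical item per <cat>(<feature>) definition."""
    return [{"LU": word, "CAT": cat, feature: word}
            for cat, feature in parse_anfw(spec)]

items = entries_for_unknown("Tajo", "n(pred), v(pred), adj(pred)")
for item in items:
    print(item)
```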

    Transfer Module

    As mentioned above, each language definition may contain one (or more) transfer modules to other languages. In the example above, the sp language contained a transfer module for the eng language, which would be loaded from the transfer.sp-eng file as follows:
      $t eng
      $f "transfer.sp-eng"
      $et
    The transfer phase takes an SL feature structure as input and returns a TL feature structure as output. Note that categorial information (contained in the CAT: feature) is no longer available. As in most transfer-based MT systems, two types of transfer rules may be defined in Lekta, structural transfer rules and lexical transfer rules. Lexical transfer is applied before structural transfer.

    Lexical Transfer Rules

    Lexical transfer rules consist of translation rules headed by the feature triggering them. The triggering feature is preceded by the reserved word FTRANSFER. The next block of lexical transfer rules will start with another FTRANSFER instruction and so on.

    The system traverses the input f-structure, finds the first feature (say, pred:) and looks for translation rules for that feature. If no transfer rules have been defined for that feature, it will be copied onto the target f-structure (for example, most tense features do not need translation rules).

    Transfer rules are enclosed in parentheses. Each rule consists of a source item, a target item (separated by the => sign) and a set of (optional) conditions and actions. Conditions start with the reserved word WHEN and, basically, consist of a path of feature-value pairs which must be satisfied in the input f-structure. Actions start with DO, and may call other functions such as TRANSFERAS and NOTRANSFER, as illustrated below. The ordering of rules is important. Once a condition has been satisfied, the corresponding actions will be executed. Order your translation rules from most specific to most general. If none of the conditions apply, the default translation will be chosen.

    Below are some examples.

      % Spanish-English rules for Predicates.
      FTRANSFER pred
      (abrir => open)
      (cambiar => change)
      (cobrar => charge WHEN (ggf:[subj,obj,pobj],
                  pobj:(pcase:de,pred:comisión))
              DO (pobj:TRANSFERAS(obj),
                  pobj:(pcase:NOTRANSFER()),
                  pobj:(pcase:of),
                  obj:TRANSFERAS(pobj),
                  obj:(spec:a))
           charge )
      (cerrar => close)
      (decir => say)
      (haber => 'there be')
    In the example above, the verb 'cobrar' is translated as 'charge' and triggers some special actions. The rule may be glossed as follows: if the verb cobrar takes an object and a prepositional object (that is, its subcategorization feature is ggf:[subj,obj,pobj]), the head preposition is 'de' and the head noun is 'comisión', as for example in "Cobramos 400 pesetas (obj) de comisión (pobj)", then perform the following actions: transfer the original pobj as an object, do not transfer the original preposition, transfer the original obj as a pobj, and include the preposition 'of' and the determiner (spec) 'a'. The resulting translation will be: "We charge a commission (obj) of 400 pesetas (pobj)".

    If the condition does not apply, translate cobrar as charge, in order to account for uses such as Cobramos 400 pesetas => We charge 400 pesetas.

    Additionally, we may check whether a specific feature has a non-null value (i.e. it exists in the input feature structure with any value), as in the following example:

      (número => number WHEN (spec:el,app:(quant:NONULL))
               DO (spec:NOTRANSFER())
            number)
    In this rule, número translates as number, but if it is followed by an appositional quantifier, the determiner does not translate. For example: "la número 372 => number 372".

    Conversely, if we wish to check that a specific feature does not exist in the input f-structure, the reserved word NULL is used instead. NULL may also be used as a translation if we don't want to translate a specific feature. For example, the time expression "a las 7.45" should be translated as "at 7.45", where the specifier does not translate. The following rule obtains this result:

      FTRANSFER spec
      % a las 4.45 -> at 4.45
      (el => NULL WHEN (time:yes)
      the)
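
    The way conditions and defaults interact can be sketched in Python. This is an illustration under several assumptions rather than Lekta's transfer engine: f-structures are dicts, only atomic feature values are transferred, DO actions are left out, a WHEN condition is checked against the f-structure containing the triggering feature, and a value without a rule is copied unchanged. The names matches, transfer and RULES are hypothetical.

```python
NULL, NONULL = "NULL", "NONULL"  # modelled as plain strings in this sketch

def matches(condition, fstruct):
    """Check a WHEN path of feature-value pairs against an f-structure."""
    for feature, expected in condition.items():
        actual = fstruct.get(feature) if isinstance(fstruct, dict) else None
        if expected == NULL:
            if actual is not None:
                return False
        elif expected == NONULL:
            if actual is None:
                return False
        elif isinstance(expected, dict):
            if not matches(expected, actual):
                return False
        elif actual != expected:
            return False
    return True

# FTRANSFER spec: (el => NULL WHEN (time:yes) the)
RULES = {"spec": {"el": ([({"time": "yes"}, NULL)], "the")}}

def transfer(feature, fstruct, rules=RULES):
    """Return the first target whose condition holds, else the default."""
    value = fstruct[feature]
    block = rules.get(feature, {})
    if value not in block:
        return value  # no rule defined: the value is copied
    conditional, default = block[value]
    for condition, target in conditional:
        if matches(condition, fstruct):
            return target
    return default

print(transfer("spec", {"spec": "el", "time": "yes", "pred": "7.45"}))
print(transfer("spec", {"spec": "el", "pred": "casa"}))
print(transfer("tense", {"tense": "past"}))
```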

    Structural Transfer Rules

    Structural transfer rules follow the same logic. They are headed by the reserved word STRANSFER followed by the complex feature being transferred. Effectively, this feature behaves as the left-hand side (the source item) of the translation rule, which therefore starts directly with the '=>' sign. In the example below, Spanish postmodifying prepositional phrases are transferred as noun-noun sequences called descr(iptors) in Lekta. This rule only applies if there is no possessive and no demonstrative.
      STRANSFER pmod
      ( => descr WHEN (pmod:(pcase:de,poss:NULL,dem:NULL))
        DO (pmod:(pcase:NOTRANSFER( )),
        pmod:(spec:NOTRANSFER()))
        pmod)
    Simple Spanish Grammar. Version 3. Contains an analysis grammar with a transfer module.

    Generation Grammar

    The generation phase takes a target f-structure as input and returns a c-structure as output. The algorithm applies recursively over portions of the f-structure. Two types of features are distinguished: atomic-valued and complex-valued, which are declared before the generation grammar itself as HG and GF respectively. The example below shows the heading of the generation grammar for the 'eng' language.
    $$LANG eng
    % Generation Grammar
    $gg (GF: subj obj pobj ncomp acomp padj descr pmod mods agr)
    (HG: form pred pcase quant coor)
    Generation rules are built taking into account this distinction. Each generation block defines a group of GF features and HG features which must be found in the input f-structure. GF and HG features are separated by the / symbol. The next line consists of a production rule which defines the portion of the tree that will be created if this generation rule is applied. Additional conditions similar to those in the transfer phase may be added at this point.

    Finally, if the (optional) condition is met, the rule specifies how each of the nodes will be created. Non-terminal nodes are created through the Generate() function and terminal nodes through the Synthesis() function, as follows:

    [agr mods]/[pred]
    (93:NP -> det ADJP n) WHEN (spec:NONULL)
    { SELF-1 = Synthesis(UP.spec);
    SELF-2 = Generate(UP.mods);
    SELF-3 = Synthesis(UP.pred) }
    This rule creates NPs with a determiner and an ADJP if there is a specifier in the input f-structure. The triggering GF features are agr and mods, while the only relevant HG feature is pred. The determiner is identified as SELF-1 since it is the first element in the right-hand side of the production, and it will be synthesized through the application of the Synthesis() function over the value of the input spec(ifier).

    The adjective phrase (SELF-2) will be generated recursively from the information contained in the mods feature.

    Generation Blocks

    A pattern of GF and HG features may generate different types of nodes, hence the concept of generation block. Generation rules must be ordered from most specific to most general since the first one meeting the conditions will be applied. Below is a set of generation rules 'hanging' from the same GF/HG pattern. This is common in LFG since adjectives, verbs and nouns are 'preds'.
    [agr]/[pred]
    % NP is a personal pronoun
    (82:NP -> pron) WHEN (pred:pro)
    { SELF-1 = Synthesis(UP.pred) }
    % Adjective phrase with a single adjective
    (84:ADJP -> adj) WHEN (deg:NONULL)
    { SELF-1 = Synthesis(UP.pred) }
    % NP with a determiner and a noun
    (86:NP -> det n) WHEN (spec:NONULL)
    { SELF-1 = Synthesis(UP.spec);
    SELF-2 = Synthesis(UP.pred) }
    % NP with a demonstrative determiner
    (88:NP -> det n) WHEN (spec:NULL,dem:NONULL)
    { SELF-1 = Synthesis(UP.dem);
    SELF-2 = Synthesis(UP.pred) }
    % Verb phrase with a single verb
    (90:VP -> v) WHEN (ggf:NONULL)
    { SELF-1 = Synthesis(UP.pred) }
    % NP with a single head noun
    (92:NP -> n)
    { SELF-1 = Synthesis(UP.pred) }
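
    The first-match behavior of a generation block can be sketched in Python. The sketch is illustrative only: the block above is modelled as an ordered list, WHEN conditions are limited to flat feature tests, and the helper names holds and select_rule are hypothetical.

```python
NULL, NONULL = "NULL", "NONULL"  # modelled as plain strings in this sketch

# The [agr]/[pred] block above, in its original order.
BLOCK = [
    (82, "NP -> pron", {"pred": "pro"}),
    (84, "ADJP -> adj", {"deg": NONULL}),
    (86, "NP -> det n", {"spec": NONULL}),
    (88, "NP -> det n", {"spec": NULL, "dem": NONULL}),
    (90, "VP -> v", {"ggf": NONULL}),
    (92, "NP -> n", {}),
]

def holds(condition, fstruct):
    """Evaluate a flat WHEN condition against an f-structure (a dict)."""
    for feature, expected in condition.items():
        present = feature in fstruct
        if expected == NULL:
            if present:
                return False
        elif expected == NONULL:
            if not present:
                return False
        elif fstruct.get(feature) != expected:
            return False
    return True

def select_rule(fstruct):
    """Return the label of the first rule whose condition is met."""
    for label, _production, condition in BLOCK:
        if holds(condition, fstruct):
            return label
    return None

print(select_rule({"pred": "pro", "agr": {"per": 1}}))
print(select_rule({"pred": "house", "spec": "the"}))
print(select_rule({"pred": "house"}))
```

    Rule 92 has no condition, so it acts as the fallback and must come last; placing it first would shadow every other rule in the block.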

    Deletion and Recursion in generation rules

    Not all GF features have to be consumed by a single generation rule. Once a GF feature has been generated, it is deleted from the input f-structure; otherwise, the generation algorithm would never finish. Once a feature has been consumed, the remaining features may be passed on to be handled by a different generation rule. For example:
    % declarative sentences
    [subj agr obj pobj]/[pred]
    (8:S -> NP VP) WHEN (stype :~ quest)
    { SELF-1 = Generate(UP.subj);
    SELF-2 = Generate(UP) }
    In this rule, even though four GF features have been defined, only the subject will be generated as an NP at this point. The remaining features will be managed recursively by the generation algorithm (Generate(UP)). This is just for 'cosmetic' reasons, since the input f-structure is by definition unordered and with no internal hierarchy between grammatical functions.

    Empty GFs

    The triggering GF portion in the generation rule may be empty. For example, adverb phrases consisting of an adverb only do not contain any other GF feature (they do not require agreement, as adjective phrases do), and hence there is no triggering GF feature:
    % adverb phrases
    []/[form]
    (158:ADV -> adv)
    { SELF-1 = Synthesis(UP.form) }
    Finally, the generation grammar finishes with the $egg symbol.

    Generation Lexicon

    The generation lexicon is similar to the analysis lexicon. A generation lexicon starts with the $gl (generation lexicon) symbol and finishes with the $egl symbol (end of generation lexicon). Lexical information is processed by the Synthesis function described above. This function takes a set of features as input and returns a character string as output.

    In addition to the special LU and CAT features found in the analysis lexicon, the special feature RS is also necessary. The CAT value must coincide with the terminal node being synthesized. RS stands for semantic root. Its value is the same as that of the feature calling the Synthesis function. Finally, LU will be the word returned by the synthesis algorithm. For example, given the generation instruction

     % NP is a personal pronoun
    (82:NP -> pron) WHEN (pred:pro)
    { SELF-1 = Synthesis(UP.pred) }
    and the generation entries
    (LU:I,CAT:pron,RS:pro,agr:(per:1,num:sing))
    (LU:you,CAT:pron,RS:pro,agr:(per:2))
    (LU:we,CAT:pron,RS:pro,agr:(per:1,num:pl))
    (LU:me,CAT:pron,RS:pro,case:dat)
    (LU:it,CAT:pron,RS:pro,agr:(num:sing,per:3))
    the generation algorithm will return I, you, we, me or it depending on the information found in the input f-structure.
     
    If no lexical entry is found which matches the content of the input f-structure, the system will return as LU the value of the RS feature. Effectively, the generation lexicon must contain non-base forms only, since base forms are generated by default.
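
    The lookup with its default can be sketched in Python. This is an assumption-laden illustration, not the actual Synthesis algorithm: entries are tried in lexicon order, an entry matches when all its additional features are found in the input f-structure, and the names GEN_LEXICON, subsumes and synthesis are hypothetical.

```python
GEN_LEXICON = [
    {"LU": "I", "CAT": "pron", "RS": "pro", "agr": {"per": 1, "num": "sing"}},
    {"LU": "you", "CAT": "pron", "RS": "pro", "agr": {"per": 2}},
    {"LU": "we", "CAT": "pron", "RS": "pro", "agr": {"per": 1, "num": "pl"}},
    {"LU": "me", "CAT": "pron", "RS": "pro", "case": "dat"},
    {"LU": "it", "CAT": "pron", "RS": "pro", "agr": {"num": "sing", "per": 3}},
]

def subsumes(pattern, fstruct):
    """True if every feature in pattern occurs in fstruct with that value."""
    for key, value in pattern.items():
        if isinstance(value, dict):
            if not isinstance(fstruct.get(key), dict):
                return False
            if not subsumes(value, fstruct[key]):
                return False
        elif fstruct.get(key) != value:
            return False
    return True

def synthesis(rs, cat, fstruct):
    """Return the LU of the first matching entry, or rs by default."""
    for entry in GEN_LEXICON:
        if entry["RS"] == rs and entry["CAT"] == cat:
            extra = {k: v for k, v in entry.items()
                     if k not in ("LU", "CAT", "RS")}
            if subsumes(extra, fstruct):
                return entry["LU"]
    return rs  # base forms are generated by default

print(synthesis("pro", "pron", {"pred": "pro", "agr": {"per": 1, "num": "pl"}}))
print(synthesis("house", "n", {"pred": "house"}))
```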

    Simple Spanish Grammar, version 4 contains an analysis grammar, transfer module and a generation grammar. Version 5 displays source c-structure, source f-structure, target f-structure and target c-structure in an X windows environment.

    Technical Details

    Compilation

    Lekta's parser may be described as a deterministic bottom-up analyzer with a strong top-down prediction component. Functionally, it is a bidirectional parser, controlled by events, with propagation of restrictions and capable of using heuristics.

    From the grammar specification, Lekta obtains a symbolic representation which is then manipulated by the parser. Computationally, it consists of a representational model which avoids search operations over the symbols and productions of the grammar and which reduces the number of character-string comparisons. Functionally, the grammar compilation involves the generation of a series of tables (of coverage, derivation and adjacency) that will later control the analyzer.

    The $pag command shows the grammar in use, while $ptc and $ptd display, respectively, the tables of coverage and the tables of derivation and adjacency of the grammar. These are the tables for the simple grammar of Spanish shown above.

    Parsing Strategy

    $prs(LIDENTIFIER) invokes the Lekta parser providing it with a list of identifiers as the input string. First a lexical component works on the input string. The lexical component sends to the parser the sequence of syntactic categories which correspond to the input string.

    Let's assume we have loaded the grammar mentioned above and we type the command:

    $prs(pedro come los pasteles)
    Even though the lexicon will be dealt with later, let's assume we get the following sequence back
    np v det n
    corresponding to the syntactic categories of the identifiers in the input string. The parser will then work on this string until it has been reduced to one of the root symbols of the grammar (S in this example); otherwise, it will reject the input as grammatically incorrect. In this case the parser would get the following representation:

    If we activate the $c:aat configuration command we will get the following output from the trace mode.

    The parsing module in Lekta consists of five concentric layers, each of which corresponds to a phase in the parsing process.
     

    Stage  Description                              Activate  Deactivate
    ex     Deterministic execution of events        $c:apex   $c:dpex
    pr     Restriction propagation                  $c:aprp   $c:dprp
    ps     Psycholinguistic preferences             $c:appp   $c:dppp
    hr     Heuristics                               $c:aphe   $c:dphe
    un     Verification of unification operations   $c:apun   $c:dpun
    The last two columns display the configuration commands which activate and deactivate each of the phases.

    Creation and propagation of events

    At the beginning of the analysis and every time an event is executed (ex phase) the module of creation of events is applied. This module generates all the events which may be applied to the surface analysis configuration at that point. The trace module shows this operation with the message NewEvent>. Further information is supplied regarding the event number (e=), identifier of the production applying the event (p=), symbol over which it is applied (s=) and direction of application (d=). This direction can be 1 (left to right) or -1 (right to left).

    Next, the analysis module starts. Each phase applies certain criteria in an attempt to execute an event. If an event is executed, control returns to the module of creation and propagation of events, which propagates the events that were waiting for the change that took place, creates the new applicable events, and then launches the analysis phase again.

    This cycle will go on until the surface of analysis contains one of the root symbols of the grammar.

    Therefore, each event represents a possible analysis of an interval of the analysis surface. The more events the parser can reject in the first stages of the analysis the more efficient it will be. With this goal, Lekta has been equipped with a series of success filters which each event must satisfy. Furthermore, due to the different computational costs of each filter, they are executed following a certain hierarchy associated with each stage.

    At the very beginning, while events are being created, two filters are applied which can cancel the creation of an event, even though the trace mode will show both its creation and its later cancellation. These filters are:

    • dup: If the event is a duplicate of an existing event (this can actually happen due to the bidirectional strategy of the parser).
    • out: If the event is waiting for more symbols at the (left or right) end of the analysis string.

    ex Phase: Deterministic execution of events

    This phase starts by applying a series of filters to the events that are active at this point:
    • der: Analysis of derivations
    • iel: Analysis at the link start
    • fel: Analysis at the link end.
    If, at the end of the process, any of the events can be applied, it is executed, which modifies the analysis surface, and control returns to the module of creation and propagation of events.

    pr Phase: Restriction propagation

    The goal of this phase is to eliminate events. This stage applies the same analysis of derivations, link start and link end but, instead of making local verifications, it propagates the restrictions imposed by the control devices all through the surface of analysis:
    • pde: Propagation of analysis of derivation
    • pei: Propagation of link start analysis
    • pfe: Propagation of link end analysis

    ps Phase: Psycholinguistic preferences

    If neither of the two previous phases has been successful and this phase is active, the psycholinguistic preferences described in Shieber (1983) and Pereira (1985) are applied. They form a single filter:
    • fps: Elimination of events attending to psycholinguistic preferences.
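    The filter codes introduced so far can be collected in a small lookup table. The sketch below is only a summary aid written in Python; the stage name "creation" and the helper names are hypothetical, while the filter codes and their grouping are the ones listed above.

```python
# Filter codes grouped by the stage in which they are applied, in order.
FILTERS_BY_STAGE = {
    "creation": ["dup", "out"],
    "ex": ["der", "iel", "fel"],
    "pr": ["pde", "pei", "pfe"],
    "ps": ["fps"],
}


def stage_of(filter_code: str) -> str:
    """Return the stage to which a filter code belongs."""
    for stage, codes in FILTERS_BY_STAGE.items():
        if filter_code in codes:
            return stage
    raise KeyError(filter_code)


def filters_up_to(stage: str) -> list:
    """List the filters an event may meet if parsing proceeds through `stage`."""
    collected = []
    for name, codes in FILTERS_BY_STAGE.items():
        collected.extend(codes)
        if name == stage:
            return collected
    raise KeyError(stage)
```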

    Heuristics

    Heuristics and elimination of events

    Sometimes a structural ambiguity generated by a grammar cannot be solved with the analysis described in the previous sections. This is why Lekta allows the use of heuristics associated with a grammar. These heuristics will be used as a new filter for the elimination of events. To get a better grasp of their usefulness, assume the following grammar has been defined:
    $ag
    (RG:O)
    (GF:)
    (1: O -> X Y) (2: X -> a)
    (3: X -> a B1) (4: Y -> c)
    (5: Y -> B2 c) (6: B1 -> b1)
    (7: B2 -> b2)
    $eag
    Assume now that we wish to parse the string (a b c) and that the lexicon generates the following string of terminal symbols for this input: (a b1||b2 c). That is, a and c belong to the syntactic categories a and c respectively, while b is ambiguous between b1 and b2.
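    To make the ambiguity concrete, the sketch below expands a layer written in the a b1||b2 c notation into the unambiguous category sequences it stands for. It is an illustrative Python helper, not a Lekta command; the function name readings is hypothetical, and it assumes that symbols are separated by blanks and alternatives by ||.

```python
from itertools import product
from typing import List


def readings(layer: str) -> List[List[str]]:
    """Expand a layer such as 'a b1||b2 c' into all its category sequences."""
    slots = [token.split("||") for token in layer.split()]
    return [list(choice) for choice in product(*slots)]
```

    For the input above this gives two candidate sequences, one with b1 and one with b2, and the grammar offers a production for each of them.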

    The application of the first three stages yields the following result:

    @LktTrace> InputParser> a b c
    @LktTrace> CurrentParsingLayer> a b2||b1 c
    @LktTrace> NewEvent> (e=1,p=2,s=a,d=1)
    @LktTrace> NewEvent> (e=2,p=3,s=a,d=1)
    @LktTrace> NewEvent> (e=3,p=7,s=b2,d=1)
    @LktTrace> NewEvent> (e=4,p=6,s=b1,d=1)
    @LktTrace> NewEvent> (e=5,p=4,s=c,d=1)
    @LktTrace> NewEvent> (e=6,p=5,s=c,d=-1)
    @LktRe> Input Incorrect

    Definition of Heuristics

    In order to resolve the situation above, we may include the following heuristic:
    $ah
    (1: HPATTERN (a b1||b2 c) -> ELIM-PR 6 ON-NODE 2)
    $eah
    Heuristics are defined after the analysis grammar. They consist of a numbered list of rules. Each heuristic specifies a pattern (HPATTERN()) which may be found in any interval of the surface of analysis. If the pattern is found, the rule applies the deletion operations specified on its right-hand side. ELIM-PR n ON-NODE m states that the event associated with production number n over node m must be deleted. In our case, event number 4 (the event for production 6, B1 -> b1) will be deleted. The result of applying this heuristic is the following:
    @LktTrace> InputParser> a b c
    @LktTrace> CurrentParsingLayer> a b2||b1 c
    @LktTrace> NewEvent> (e=1,p=2,s=a,d=1)
    @LktTrace> NewEvent> (e=2,p=3,s=a,d=1)
    @LktTrace> NewEvent> (e=3,p=7,s=b2,d=1)
    @LktTrace> NewEvent> (e=4,p=6,s=b1,d=1)
    @LktTrace> NewEvent> (e=5,p=4,s=c,d=1)
    @LktTrace> NewEvent> (e=6,p=5,s=c,d=-1)
    @LktTrace> Heuristic> (1)
    @LktTrace> DeleteEvent:Heurist> (e=4,p=6)
    @LktTrace> RunningEvent:ExecStage> (e=3,p=7)
    @LktTrace> CurrentParsingLayer> a B2 c
    @LktTrace> DeleteEvent:Deriv> (e=2,p=3) D:(B1,B2)
    @LktTrace> NewEvent> (e=7,p=5,s=B2,d=1)
    @LktTrace> DeleteEvent:Duplied> (e=7,p=5,s=B2,d=1) -> 6
    @LktTrace> RunningEvent:ExecStage> (e=1,p=2)
    @LktTrace> CurrentParsingLayer> X B2 c
    @LktTrace> NewEvent> (e=8,p=1,s=X,d=1)
    @LktTrace> DeleteEvent:InitLink(Prop)> (e=5,p=4)
    @LktTrace> RunningEvent:ExecStage> (e=6,p=5)
    @LktTrace> CurrentParsingLayer> X Y
    @LktTrace> NewEvent> (e=9,p=1,s=Y,d=-1)
    @LktTrace> DeleteEvent:Duplied> (e=9,p=1,s=Y,d=-1) -> 8
    @LktTrace> RunningEvent:ExecStage> (e=8,p=1)
    @LktTrace> CurrentParsingLayer> O
    @LktRe> Input Correct
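    The way a heuristic selects the event to delete can be modelled with a short Python sketch. It is not Lekta's implementation and rests on several assumptions: a rule contains a single ELIM-PR n ON-NODE m operation, m is a 1-based position inside the pattern, a pattern node matches a surface symbol when both offer the same set of alternatives (so b1||b2 matches b2||b1), and events are given as hypothetical (event number, production, surface position) triples.

```python
import re
from typing import List, NamedTuple, Set, Tuple


class Heuristic(NamedTuple):
    number: int
    pattern: List[Set[str]]   # one set of alternatives per pattern node
    production: int           # n in ELIM-PR n
    node: int                 # m in ON-NODE m (assumed 1-based)


_RULE = re.compile(
    r"\(\s*(\d+)\s*:\s*HPATTERN\s*\(([^)]*)\)\s*->\s*"
    r"ELIM-PR\s+(\d+)\s+ON-NODE\s+(\d+)\s*\)"
)


def parse_heuristic(text: str) -> Heuristic:
    """Parse a rule such as '(1: HPATTERN (a b1||b2 c) -> ELIM-PR 6 ON-NODE 2)'."""
    match = _RULE.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a heuristic rule: {text!r}")
    number, pattern, production, node = match.groups()
    nodes = [set(token.split("||")) for token in pattern.split()]
    return Heuristic(int(number), nodes, int(production), int(node))


def events_to_delete(h: Heuristic, surface: List[Set[str]],
                     events: List[Tuple[int, int, int]]) -> List[int]:
    """Return the numbers of the events the heuristic would delete."""
    doomed = []
    width = len(h.pattern)
    for start in range(len(surface) - width + 1):
        if all(surface[start + i] == h.pattern[i] for i in range(width)):
            target = start + h.node - 1
            doomed += [e for (e, p, pos) in events
                       if p == h.production and pos == target]
    return doomed
```

    Applied to the six events created in the trace above, with the surface a b2||b1 c, the rule selects event 4, which agrees with the DeleteEvent:Heurist> line.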

    Output Commands

    Lekta offers the following kinds of output:
    • Time: indicates whether the analysis is correct and the timing of the process.
    • List: analysis tree in the form of a bracketed list.
    • Tree: analysis tree in ASCII.
    • Incorrect: displays the parser status at the end of an incorrect input.
    • Unification: displays the feature structures from the unifier.
    • Graphic: in an X environment, it generates windows containing the analysis tree (c-structure) and the feature structure (f-structure) for a correct analysis.
    • Trace: displays multiple messages regarding each module's performance.

    These outputs are activated and deactivated with the $c: commands listed under TRACE and OUTPUT in the Summary of Configuration Options.

    Statistics Module

    Lekta is equipped with a module to obtain statistical results. The commands associated with it are the following:
    • $is: starts collecting data.
    • $pfs([file]): sends the statistics to a file. The first block gives details of each production: the number of events generated by it, how many of them have been rejected by each filter, and the relative use of the production. The second block shows a summary of the level of efficiency achieved by the system as a whole.
    • $ire([file]): prints only the summary block described above.

    Help Commands

    Lekta offers detailed help about the different configuration options and command instructions, as follows:
    LektaII> $h
    LektaII> HELP - Main Topics
    Type for help about:
    ==== =======================
    $h1 Language Specification
    $h2 Translation Setup
    $h3 Execution
    $h4 Translation Stages
    $h5 Output
    $h6 Trace
    $h7 Printing
    $h8 Statistics
    $h9 Others
     

    Summary of Configuration Options

    Configuration options can be set in the initialization file (lekta.ini) or interactively from the command line.

    TRACE

    $c:aat    Activate analysis trace
    $c:dat    Deactivate parsing trace
    $c:att    Activate transfer trace
    $c:dtt    Deactivate transfer trace
    $c:agt    Activate generation trace
    $c:dgt    Deactivate generation trace
    $c:aft    Activate full trace
    $c:dft    Deactivate full trace

    OUTPUT

    $c:ago      Activate graphic output
    $c:dgo      Deactivate graphic output
    $c:alo      Activate list output
    $c:dlo      Deactivate list output
    $c:ato      Activate CPU time consumed output
    $c:dto      Deactivate CPU time consumed output
    $c:aps      Activate parsing status when incorrect input
    $c:dps      Deactivate parsing status
    $c:auo      Activate unification output
    $c:duo      Deactivate unification output
    $c:atra     Activate translation
    $c:dtra     Deactivate translation
    $c:adtra    Activate direct translation
    $c:ddtra    Deactivate direct translation

    PARSING STAGES

    These options are ordered: each option includes the ones listed above it (e.g. activating unification activates all previous parsing stages).

    $c:apex    Activate parsing execution
    $c:dpex    Deactivate parsing execution
    $c:aprp    Activate restriction propagation
    $c:dprp    Deactivate restriction propagation
    $c:appp    Activate psycholinguistic preferences
    $c:dppp    Deactivate psycholinguistic preferences
    $c:aphe    Activate parsing heuristics
    $c:dphe    Deactivate parsing heuristics
    $c:apun    Activate unification
    $c:dpun    Deactivate unification
    $c:apch    Activate coherence
    $c:dpch    Deactivate coherence
    $c:apcp    Activate completeness
    $c:dpcp    Deactivate completeness

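    The ordering of the parsing stages can be illustrated with a small Python sketch that lists which activations are implied by a given one. It only restates the ordering of the options above; the helper name implied_activations is hypothetical, and whether deactivation cascades in the opposite direction is not stated in this manual, so the sketch does not model it.

```python
# Stage suffixes in the order in which the options are listed.
PARSING_STAGES = ["pex", "prp", "ppp", "phe", "pun", "pch", "pcp"]


def implied_activations(stage: str) -> list:
    """Return the $c:a... commands implied by activating `stage`."""
    index = PARSING_STAGES.index(stage)   # raises ValueError for unknown stages
    return ["$c:a" + suffix for suffix in PARSING_STAGES[: index + 1]]
```

    For example, activating unification ($c:apun) implies parsing execution, restriction propagation, psycholinguistic preferences and parsing heuristics as well.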
    TRANSFER STAGE

    $c:atrf    Activate transfer
    $c:dtrf    Deactivate transfer

    GENERATION STAGE

    $c:agen    Activate generation phase
    $c:dgen    Deactivate generation phase

    CURRENT LANGUAGE

    $c:clen    Change current language

    NOT FOUND WORDS

    $c:anfw    Activate not found words
    $c:dnfw    Deactivate not found words

    TRANSLATION SOURCE AND TARGET

    $c:transl SOURCE_LANG -> TARGET_LANG    Translation source and target

    Printing Options

    $pcc      Display current configuration
    $ptc      Display Tables of Coverage
    $ptd      Display Tables of Derivation
    $pal      Display Analysis Lexicon
    $pag      Display Analysis Grammar
    $phe      Display Heuristics
    $ptr      Display Transfer Rules
    $pgl      Display Generation Lexicon
    $pgg      Display Generation Grammar
    $pfull    Display Full Language (Analysis lexicon, grammar, etc.)

    Execution Options

    $prs(LIDENTS)    Parse a string of words
    $trf(LIDENTS)    Analyze and transfer a string of words
    $gen(LIDENTS)    Analyze, transfer and generate a string of words
    $ptg(LIDENTS)    Full translation of a string of words
    $sal(IDENT)      Search for Lexical Entry in Analysis Lexicon
    $sgl(IDENT)      Search for Lexical Entry in Generation Lexicon

    Statistics Options

    $is           Initialize statistics
    $ps(FILE)     Display full statistics (to file)
    $pls()        Display full statistics (to screen)
    $pss(FILE)    Display short statistics (to file)
    $pss()        Display short statistics (to screen)

    Other options

    $f    File Inclusion
    $q    Quit
    $h    Help

     

    Appendix 1: Simple Spanish Grammar. Version 1

    Appendix 2: Simple Spanish Grammar. Version 2

    Appendix 3: Simple Spanish Grammar. Version 3

    Appendix 4: Simple Spanish Grammar. Version 4
