Commit dbd7f026 authored by Jan Trávníček

improvements in user documentation

parent 6c660ebc
Pipeline #30665 passed
......@@ -189,7 +189,7 @@ pdfauthor={I am the Author} % author
\maketitle
 
\begin{abstract}
The algorithms library toolkit is a collection of datatypes and algorithms. The datatypes cover various kinds of automata, grammars, regexps, trees, tree regular expressions, strings, and some indexing structures. Algorithms cover manipulation and conversion algorithms of automata, grammars, regexps, tree regular expressions, some basic string and tree index creation algorithms, and tree and string matching algorithms. The implementation is in c++14 standard and heavily using templates. The toolkit comes with a command line interface and with still evolving graphical interface.
The Algorithms Library Toolkit is a collection of datatypes and algorithms. The datatypes cover various kinds of automata, grammars, regexps, trees, tree regular expressions, strings, and some indexing structures. The algorithms cover manipulation and conversion of automata, grammars, regexps, and tree regular expressions, some basic string and tree index creation algorithms, and tree and string matching algorithms. The implementation follows the c++17 standard and relies heavily on templates. The toolkit comes with a command line interface and a still-evolving graphical interface.
\end{abstract}
\clearpage
 
......@@ -200,51 +200,49 @@ The algorithms library toolkit is a collection of datatypes and algorithms. The
 
The Algorithms Library Toolkit is an open-source project aiming to provide implementations of algorithms from areas including automata theory, stringology, arbology, and others. The project started with bachelor theses of students of the Faculty of Information Technology of the Czech Technical University in Prague. The original idea behind the library, formerly called Automata Library, was to provide additional source material for students by showing the implementation of algorithms that manipulate automata. Soon it was extended with algorithms that use automata, mainly from the areas of stringology and arbology. Later, the library served as a base for implementing some state-of-the-art algorithms, including efficient determinisation of a subclass of pushdown automata and some tree searching and indexing algorithms.
 
The library is designed to be mostly about stateless algorithms which manipulate datastructures. Data manipulated by algorithms can vary from automata through regexps to string and others. The design allows simple extension of functionality. If a new algorithm manipulates already existing datatype, it immediately fits into the ecosystem. Of course, new datatypes can be introduced as well. Datatypes and algorithms are designed to be maximally independent of each other, which allows easier maintenance.
The library is designed to be mostly about stateless algorithms which manipulate data structures. Data manipulated by algorithms can vary from automata through regexps to strings and others. The design allows simple extension of functionality. If a new algorithm manipulates an already existing datatype, it immediately fits into the ecosystem. Of course, new datatypes can be introduced as well. Datatypes and algorithms are designed to be maximally independent of each other, which allows easier maintenance.
 
Besides using algorithms to manipulate data, the datatypes allow casting when appropriate, e.g. DFA to NFA and similar.
 
The user interaction with the library is through binaries using pipes and filters philosophy, through a builtin interactive command line interface, or through a graphical interface.
The user interaction with the library is through a built-in interactive command line interface or a graphical interface.
 
The binaries are the oldest approach of user interface supported by the library and as they are designed as filters, they use some common communication format -- XML. The binaries and command line interface providing binary are compatible, meaning the binary providing command line interface can itself be used as one of the filters. Also following the pipes and filters philosophy, each binary is accumulating some related operations and combining more binaries together allows implementing more complicated algorithms.
The command line interface binary allows interaction with all algorithms through an interactive command line interface. The syntax of the language is similar to that of the shells known from Unix operating systems. Internally, it passes algorithm results directly to algorithm parameters without any transformation or manipulation, which keeps the interpretation of interconnected algorithms fast. Even though the command line interface tries to hide implementation details, in some cases the user can tell that the library is implemented in c++. For example, the command line interface is aware of datatypes, and it can handle templates (used to design datatypes and algorithms) only statically. This, however, does not influence the overall functionality.
 
The command line interface binary is newer in design and it allows interaction with all algorithms through interactive command line interface. The syntax of the language similar to shell known from Unix. Internally it does not use any common communication format. Such an approach allows speeding up the interpretation of the algorithm description. Even though the command line interface tries to hide some implementation details, a user can feel the fact that the implementation language of the library is c++ in some cases. For example, the command line interface is aware of datatypes and it is also unable to handle templates (used to design datatypes and algorithms) otherwise than statically. This, however, does not influence the overall functionality.
On startup, the command line interface binary automatically loads the existing datatypes and algorithms provided by the linked library. The command line interface also provides the load and unload commands, which load and unload an arbitrary library that can contribute to the currently available datatypes and algorithms.
 
The command line interface binary can at start react to changes of algorithms provided by the library -- it detects newly registered algorithms automatically. The command line interface is planned to be extended to support procedural like language which may cause a drop of backward compatibility.
The command line interface is planned to be extended to support a procedural-like language, which may break backward compatibility.
 
A limited graphical interface allowing to design complex algorithm is also provided. The limitation is in the number of available algorithms, which is a static subset of all algorithms. The graphical interface is planned to be extended to support all registered algorithm similarly to the command line interface.
A limited graphical interface for designing complex algorithms is also provided. The limitation is in the number of available algorithms, which is a static subset of all algorithms. Currently, the set is limited to algorithms manipulating automata, grammars, and regexps. The graphical interface is planned to be extended to support all registered algorithms, similarly to the command line interface.
 
The toolkit is extensible and as long as the added datatype or algorithm connects itself to already existing code via casts or conversion algorithms, the extension can benefit from features already implemented.
\mainmatter
 
\chapter{Structure of the code}
 
The library consists of modules compiled to libraries and some accessors represented by binaries. Modules provide the functionality - either some core code, datatypes, or algorithms. Accessors are usually programmed as a single file with the main function. The accessor binary allows interaction with implemented algorithms from bash, or another shell like environment. One more complex accessor binary \emph{aql} exists. This aql accessor itself provides command line interface to all algorithms.
The library consists of modules compiled to libraries and some accessors represented by binaries. Modules provide the functionality -- either some core code, datatypes, or algorithms. Accessors are either the command line interface binary \emph{aql} or the graphical interface binary \emph{agui}.
 
\section{Structure and overall description of modules}
 
Each module can be compiled separately. It shares a common makefile, which is parametrized with a configuration. Source files of the module are placed into \emph{src} directory. Modules are tested with appended unit tests. Unit tests are placed in a separate directory \emph{test-src}.
Each module can be compiled separately. It shares a common configuration file used by the cmake makefile generator. Source files of the module are placed into the \emph{src} directory. Modules are tested with accompanying unit tests. Unit tests are placed in a separate directory \emph{test-src}.
 
When compiled, the module's object files are, depending on the target type, placed into the \emph{obj-debug} or \emph{obj-release} directory. The resulting shared library is placed into \emph{lib-debug} or \emph{lib-release}. Similarly, object files of unit tests are placed into \emph{obj-test-debug} or \emph{obj-test-release}. An executable binary representing unit tests can be located in the \emph{test-bin-debug} or \emph{test-bin-release} directory.
The library uses a generator that prepares a CMakeLists.txt file for each module and binary. CMake is then used to generate makefiles which provide targets for compiling everything, for compiling individual modules and binaries, or for installation.
 
Compilation of each module requires its dependencies to be compiled as well. One can decide whether to compile only the module itself, only unit tests, or both. When both the code and tests are compiled, the tests are also executed. The respective targets are \emph{build-code-debug} (\emph{build-code-release}), \emph{build-test-debug} (\emph{build-test-release}), or simply \emph{debug} (\emph{release}).
Compilation may be tuned with some parameters in the configuration file. Library name can be specified with LIBRARY parameter, the name of the test binary with TESTBIN. Dependencies split into those of the module and some additional ones required by unit tests. Of those the compilation environment also distinguishes other linked modules (LINK\_LIBRARIES) and system libraries which require to be linked to the shared library of the module (SYSTEM\_LIBRARIES). Include paths can also be specified for system libraries with SYSTEM\_INCLUDE\_PATHS. Prefix TEST\_ is used for both types of linked libraries and specification of includes for dependencies of unit tests.
The content of the module or binary configuration file specifies the category, i.e. library for module and executable for binary. Additionally, it may specify compilation groups. So far only gui and cli groups exist. The configuration file also contains a dependencies list of the module or binary. The dependencies can be on other project modules or on system libraries. Possible system libraries currently are threads, xml2, Qt5Widgets, Qt5Xml, graphviz, json, readline, and stdc++fs.
 
\section{Existing modules}
 
The algorithm library toolkit consists of three types of modules: core, feature, experimental.
The algorithm library toolkit currently consists of three types of modules: core, feature, and experimental.
 
The core modules include: alib2std (c++ standard library extensions), alib2measure (support for measurements), alib2abstraction (storage of registered datatypes and algorithms), alib2common (base datatypes), alib2xml (xml export and import of basic datatypes), and alib2cli (command line interface implementation).
 
The feature modules include alib2data (automata, grammar, regexps, and other datatypes), alib2aux (helper algorithms), alib2str (parsing and composing a string representation of some datatypes), alib2raw (parsing and composing of a raw representation of some datatypes), alib2algo (most of the algorithms).
The feature modules include alib2data (automata, grammar, regexps, and other datatypes), alib2aux (helper algorithms), alib2str (parsing and composing a string representation of some datatypes), alib2raw (parsing and composing of a raw representation of tree and string datatypes), alib2algo (most of the algorithms).
 
The experimental modules available mainly for testing purposes: alib2data\_experimental (experimental datatypes), alib2algo\_experimental (experimental algorithms), alib2elgo (more efficient implementation of some algorithms), alib2graph\_data (graph datatypes), alib2graph\_algo (graph algorithms), and alib2dummy (playground library).
The experimental modules are available mainly for testing purposes: alib2data\_experimental (experimental datatypes), alib2algo\_experimental (experimental algorithms), alib2elgo (more efficient implementation of some algorithms), alib2graph\_data (graph datatypes), alib2graph\_algo (graph algorithms), and alib2dummy (playground library).
 
\subsection{Standard library extension}
The module alib2std provides extensions to the c++ standard library. The extensions are placed in namespace \emph{ext}, not to collide with the same classes, functions ... in namespace \emph{std} and also because the standard disallows placing anything new into the std namespace except specialisations of templated types.
The module alib2std provides extensions to the c++ standard library. The extensions are placed in namespace \emph{ext}, so as not to collide with identically named classes, functions, etc. in namespace \emph{std}, and also because the standard disallows placing anything new into the std namespace except specialisations of templated types.
 
This module exists to simplify some common operations with standard library containers, to serve as a place for backported code from newer standards then used currently. To name a few examples now consider a three-way comparison of standard library containers, a print of containers to standard stream with an overloaded operator, implementation of copy on write shared pointer, and some standard library containers modified to store an array of pointers instead of an array of values, while providing almost the same interface.
This module exists to simplify some common operations with standard library containers or to serve as a place for code backported from standards newer than the one currently used. Examples include a three-way comparison of standard library containers, printing of containers to a standard stream with an overloaded operator, an implementation of a copy-on-write shared pointer, and some standard library containers modified to store an array of pointers instead of an array of values while providing almost the same interface.
 
To use these extended features, use the prepared includes. For example, to use the extensions of a standard vector, include \emph{alib2/vector}.
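The idea of printing a container through an overloaded operator can be sketched as follows. This is only an illustration under stated assumptions: the namespace \emph{sketch}, the wrapper class, and the helper \emph{render} are hypothetical names and do not reproduce the actual contents of alib2std.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical namespace playing the role of a separate extension namespace.
namespace sketch {

// Thin wrapper that keeps the interface of std::vector.
template <class T>
class vector : public std::vector<T> {
public:
    using std::vector<T>::vector; // inherit all constructors
};

// Overloaded stream operator: prints the container as "[a, b, c]".
template <class T>
std::ostream& operator<<(std::ostream& out, const vector<T>& data) {
    out << "[";
    bool first = true;
    for (const T& item : data) {
        if (!first) out << ", ";
        out << item;
        first = false;
    }
    return out << "]";
}

// Helper (hypothetical) that renders any streamable value to a string.
template <class T>
std::string render(const T& value) {
    std::ostringstream stream;
    stream << value;
    return stream.str();
}

} // namespace sketch
```

Because the operator lives in the same namespace as the wrapper, argument-dependent lookup finds it without touching namespace \emph{std}; a call such as sketch::render applied to a wrapper holding 1, 2, 3 yields the text [1, 2, 3].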
 
......@@ -289,21 +287,21 @@ There is also a registration facility present in the module to allow interfacing
Some datatypes like strings and trees correspond to some basic file formats. Strings correspond to any file, trees to XML. This module allows interaction between the mentioned datatypes and files of the given format -- either reading the file and creating the representation of data or creating the file based on some provided data.
 
\subsection{alib2algo}
Main module where algorithms are implemented. Right now most of algorithms are present in this module. Algorithms include automata, grammar, regexp operation and conversions. String and tree matching and indexing. Some random data generators are implemented for trees and automata.
This is the main module where algorithms are implemented. Right now most of the algorithms are present in this module. The algorithms include automata, grammar, and regexp operations and conversions, as well as string and tree matching and indexing. Some random data generators are implemented for trees and automata.
 
Most of algorithms are also registered so that command line interface can call those. Those not registered ones are either support algorithms, where it does not make much sence or algorithms accepting something not yet supported on the command line interface level (like nontrivial function callback).
Most of the algorithms are also registered so that the command line interface can call them. The unregistered ones are either support algorithms, for which registration does not make much sense, or algorithms accepting something not yet supported at the command line interface level (like a nontrivial function callback).
 
\subsection{alib2cli}
The module is responsible for parsing the command line interface commands. The module implements a simple LL(1) grammar-based parser (and lexer) as a recursive descent parser producing an internal representation of the command as an abstract syntax tree. The representation can be converted to a graph consisting of abstractions over algorithms, file reads and writes, etc.
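A recursive descent parser with one token of lookahead can be sketched as below. The toy grammar, the \emph{Command} node, and the \emph{Parser} class are assumptions made for illustration; they are not the grammar or the classes used by alib2cli.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Toy grammar (hypothetical, NOT the aql grammar):
//   pipeline : command ( '|' command )*
//   command  : WORD WORD*

// AST node: a name followed by its arguments.
struct Command {
    std::string name;
    std::vector<std::string> args;
};

class Parser {
public:
    explicit Parser(std::string text) : m_text(std::move(text)) { next(); }

    // pipeline : command ( '|' command )*
    std::vector<Command> pipeline() {
        std::vector<Command> result;
        result.push_back(command());
        while (m_kind == Kind::Pipe) { // one token of lookahead decides
            next();
            result.push_back(command());
        }
        if (m_kind != Kind::End) throw std::runtime_error("unexpected token");
        return result;
    }

private:
    enum class Kind { Word, Pipe, End };

    // command : WORD WORD*
    Command command() {
        if (m_kind != Kind::Word) throw std::runtime_error("expected a name");
        Command cmd{m_word, {}};
        next();
        while (m_kind == Kind::Word) {
            cmd.args.push_back(m_word);
            next();
        }
        return cmd;
    }

    static bool isSpace(char c) { return std::isspace(static_cast<unsigned char>(c)) != 0; }

    // Lexer: stores the next token in m_kind (and m_word for words).
    void next() {
        while (m_pos < m_text.size() && isSpace(m_text[m_pos])) ++m_pos;
        if (m_pos == m_text.size()) { m_kind = Kind::End; return; }
        if (m_text[m_pos] == '|') { ++m_pos; m_kind = Kind::Pipe; return; }
        m_word.clear();
        while (m_pos < m_text.size() && m_text[m_pos] != '|' && !isSpace(m_text[m_pos]))
            m_word += m_text[m_pos++];
        m_kind = Kind::Word;
    }

    std::string m_text;
    std::size_t m_pos = 0;
    Kind m_kind = Kind::End;
    std::string m_word;
};
```

Each grammar rule maps to one member function, and the current token alone decides which branch to take, which is what makes the grammar LL(1). Parsing the text "a x y | b" with this sketch yields two nodes, the first named a with arguments x and y.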
 
The command line interface language is limited, however, the language will be extended to fully support procedures in future releases. It supports some introspection into registered algorithm, their overloads, known datatypes, and casts. Execution of colon of commands is line-by-line. Each line can be quite complex, but there is no procedural extension, except variables.
The command line interface language is limited; however, it will be extended to support procedures in future releases. It supports some introspection into the registered algorithms, their overloads, known datatypes, and casts. A sequence of commands is executed line by line. Each line can be quite complex, but there is no procedural extension, except variables.
 
The language is described in more detail later.
 
\chapter{Concepts used in the implementation}
 
\section{Components}
Many data structures are defined as an n-tuple where the components are of different types and have some defined constraints. The components concept is present to aid with the design of data structures having this exact definition. A components concept is still in a stage of proof of concept. It does not support different types than sets and values. When extended, it will support maps, vectors, and trees to allow complete definition of most of data structures.
Many data structures are defined as an n-tuple where the components are of different types and have some defined constraints. The components concept is present to aid with the design of data structures having this exact definition. The components concept is still at the proof-of-concept stage. It does not support types other than sets and values. When extended, it will support maps, vectors, and trees to allow a complete definition of most data structures.
 
\subsection{Component as a base of datastructure}
 
......@@ -316,9 +314,9 @@ class DFA final : public AutomatonBase, public core::Components < DFA < SymbolTy
 
In this example, the deterministic finite automaton is constructed from a set of symbols represented by InputAlphabet component, two sets of states represented by States and FinalStates components, and a state represented by InitialState component.
 
The internal type of the component must provide a common interface required by the component behaviour scheme. So far supported schemes are \emph{component::Set} and \emph{component::Value}. The component set scheme expects the internal datatype to implement the interface of a set from the standard library. The component value scheme expects the internal datatype to behave like a primitive type.
The internal type of component must provide a common interface required by the component behaviour scheme. So far supported schemes are \emph{component::Set} and \emph{component::Value}. The component set scheme expects the internal datatype to implement the interface of a set from the standard library. The component value scheme expects the internal datatype to behave like a primitive type.
 
The name of the component can be specified in two ways. Either it can be specified by class name, where typically the class used here is incomplete. Or more names can be given at once to signal there are more components having the same structure and behaviour in the datatype definition. If more names are provided, the resulting components are instantiated for each name in a set and they are therefore independent instances.
The name of the component is specified by a class name, where the class used is typically incomplete. The class name can be used directly when the component is contained only once. If several components in the datatype definition have the same structure and behaviour, a tuple of class names can be used instead. If more names are provided, the resulting components are instantiated for each of the names, and they are therefore independent instances.
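The naming of components by incomplete classes can be illustrated with a small sketch. All names here (\emph{SetComponent}, \emph{ValueComponent}, \emph{ToyAutomaton}, \emph{setOf}, \emph{valueOf}) are hypothetical and only mimic the idea; the real \emph{core::Components} template has a different interface.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>

// Incomplete classes that serve purely as component names.
class States;
class FinalStates;
class InitialState;

// Set-like component: wraps a std::set and is tagged by a name type.
template <class Name, class T>
class SetComponent {
public:
    bool add(const T& value) { return m_data.insert(value).second; }
    const std::set<T>& get() const { return m_data; }
private:
    std::set<T> m_data;
};

// Value-like component: wraps a single value and is tagged by a name type.
template <class Name, class T>
class ValueComponent {
public:
    explicit ValueComponent(T value) : m_data(std::move(value)) {}
    const T& get() const { return m_data; }
private:
    T m_data;
};

// Toy datatype assembled from two set components and one value component.
// The two sets share structure and behaviour but are independent instances.
struct ToyAutomaton : SetComponent<States, std::string>,
                      SetComponent<FinalStates, std::string>,
                      ValueComponent<InitialState, std::string> {
    explicit ToyAutomaton(std::string initial)
        : ValueComponent<InitialState, std::string>(std::move(initial)) {}
};

// Access by name: the name is given explicitly, the value type is deduced
// from the matching base class.
template <class Name, class T>
SetComponent<Name, T>& setOf(SetComponent<Name, T>& component) { return component; }

template <class Name, class T>
ValueComponent<Name, T>& valueOf(ValueComponent<Name, T>& component) { return component; }
```

With this sketch, adding a state through the accessor instantiated for States leaves the component named FinalStates untouched, which shows that components with the same structure remain separate instances distinguished only by their name.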
 
\subsection{Access to component content}
 
......@@ -328,9 +326,9 @@ The \emph{component::Set} scheme additionally supports \emph{empty}, \emph{remov
 
\subsection{Constraint specification}
 
To maintain consistency of datatypes constructed from components, some constraints need to be specified. These constraints are specified by additional code inside a class specialized with the concrete data type and component name. Given the component behaviour scheme, the constraints may differ.
To maintain consistency of datatypes constructed from components, some constraints need to be specified. These constraints are specified by additional code inside a class specialised with the concrete data type and component name. Given the component behaviour scheme, the constraints may differ.
 
The \emph{component::Value} behavior scheme requires existence of available and valid static methods inside the \emph{ElementContraint} class. The method available represents a check that the value to be set is available in other related components. The valid method represents additional check for validity of the set object. If needed, the method is supposed to throw an exception.
The \emph{component::Value} behaviour scheme requires the existence of the static methods \emph{available} and \emph{valid} inside the \emph{ElementContraint} class. The \emph{available} method checks that the value to be set is available in other related components. The \emph{valid} method performs an additional check of the validity of the set object. If needed, the method is supposed to throw an exception.
 
\begin{lstlisting}
template<class SymbolType, class StateType >
......@@ -373,11 +371,11 @@ As mentioned the alib2abstraction module provides a facility to register algorit
First, the documentation focuses on an overview of the concept. Next, it covers the registration of algorithms. Last, it covers casts, printing of variables, and other possibilities.
 
\subsection{Abstraction concept overview}
To provide an on-demand execution of algorithm based on the algorithm name and algorithm parameters, a lookup within available algorithm must be possible. The c++ language does not provide any form of introspection that would allow detection of the existence of methods within a class. Similarly, detection of the number of parameters, their types, qualifications of a given method is not available in the c++ language as well.
To provide on-demand execution of an algorithm based on the algorithm name and algorithm parameters, a lookup within the available algorithms must be possible. The c++ language does not provide any form of introspection that would allow detection of the existence of methods within a class. Similarly, detection of the number of parameters, their types, and the qualifications of a given method is not available in the c++ language either.
 
The abstraction concept is designed to address this limitation of c++ language by registering available algorithms, casts, etc. internally, via some registration calls, to retrieve registered callable.
The abstraction concept is designed to address this limitation of the c++ language by registering available algorithms, casts, etc. internally, via registration calls, and by maintaining the registered callables.
 
There are two approaches to registration in the abstraction module. Variables of registration class type, when constructed, carry on information about the registered algorithm to the inside of the abstraction module via a call of registration function. Such an approach was chosen to easily hook some code before the execution of the main function so that all registrations are done beforehand at a load time of a shared library. Registration function can also be called directly from any context to avoid the creation of global variable. Hence any algorithm can be registered at any time before or during the execution of the main function.
There are two approaches to registration in the abstraction module. Variables of a registration class type, when constructed, pass information about the registered algorithm into the abstraction module via a call of a registration function. This approach was chosen to easily hook some code before the execution of the main function so that all registrations are done beforehand, at the load time of a shared library. The registration function can also be called directly from any context to avoid the creation of a global variable. Hence any algorithm can be registered at any time before or during the execution of the main function.
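Both registration approaches can be sketched in a few lines. The names \emph{registry}, \emph{registerAlgorithm}, and \emph{Registrator} are hypothetical and the callable signature is simplified to a function from int to int; the actual registration classes of alib2abstraction are more general.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Simplified callable type (assumption): every algorithm maps int to int.
using Algorithm = std::function<int(int)>;

// Function-local static, so the table exists before any registration runs.
inline std::map<std::string, Algorithm>& registry() {
    static std::map<std::string, Algorithm> instance;
    return instance;
}

// Direct approach: a plain function callable from any context.
inline void registerAlgorithm(const std::string& name, Algorithm algorithm) {
    registry()[name] = std::move(algorithm);
}

// Variable approach: constructing the object performs the registration.
struct Registrator {
    Registrator(const std::string& name, Algorithm algorithm) {
        registerAlgorithm(name, std::move(algorithm));
    }
};

int doubleIt(int x) { return 2 * x; }
int negateIt(int x) { return -x; }

// Constructed during static initialisation, i.e. before main starts
// (or when the shared library containing it is loaded).
static Registrator doubleItRegistration("doubleIt", doubleIt);
```

In this sketch doubleIt is already present in the table when main begins, while negateIt becomes available only after an explicit call of registerAlgorithm at run time.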
 
\subsection{Execution of algorithms}
The abstraction is in general mostly about algorithms and their execution. The algorithm to execute is specified by its name and parameter types (and also by a category, which is currently unused). Then a list of overloads of the algorithm is selected from the registered ones. The list is filtered based on the number of actual parameters and their types. The best candidate is selected and returned for execution. Effectively, the overload resolution implemented within the abstraction concept is capable of multiple dispatch.
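A minimal sketch of selecting an overload by name and by the run-time types of all arguments follows. It assumes exact type matches only and uses std::any as a stand-in for the library's own value wrappers; \emph{OverloadTable} and \emph{makeDemoTable} are hypothetical names, and the ranking of candidates and the use of casts are omitted.

```cpp
#include <any>
#include <cassert>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <typeindex>
#include <typeinfo>
#include <utility>
#include <vector>

using Args = std::vector<std::any>;
using Signature = std::vector<std::type_index>;
using Callable = std::function<std::any(const Args&)>;

struct Overload {
    Signature params; // expected parameter types
    Callable call;    // wrapper that unpacks the arguments
};

class OverloadTable {
public:
    void registerOverload(const std::string& name, Signature params, Callable call) {
        m_overloads[name].push_back({std::move(params), std::move(call)});
    }

    // Looks up overloads by name, then filters by the number and the
    // run-time types of the actual arguments (exact match only).
    std::any execute(const std::string& name, const Args& args) const {
        auto found = m_overloads.find(name);
        if (found == m_overloads.end()) throw std::invalid_argument("unknown algorithm: " + name);
        Signature actual;
        for (const std::any& arg : args) actual.emplace_back(arg.type());
        for (const Overload& candidate : found->second)
            if (candidate.params == actual) return candidate.call(args);
        throw std::invalid_argument("no matching overload of " + name);
    }

private:
    std::map<std::string, std::vector<Overload>> m_overloads;
};

// Demo table with two overloads of a hypothetical algorithm "add".
inline OverloadTable makeDemoTable() {
    OverloadTable table;
    table.registerOverload("add", {typeid(int), typeid(int)}, [](const Args& a) -> std::any {
        return std::any_cast<int>(a[0]) + std::any_cast<int>(a[1]);
    });
    table.registerOverload("add", {typeid(std::string), typeid(std::string)}, [](const Args& a) -> std::any {
        return std::any_cast<std::string>(a[0]) + std::any_cast<std::string>(a[1]);
    });
    return table;
}
```

The choice depends on the dynamic types of every argument rather than on a single receiver object, which is the sense in which such a lookup behaves like multiple dispatch.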
......@@ -567,17 +565,12 @@ arg
| IDENTIFIER
;
 
optional_arg
: arg
|
;
template_arg
: AT_SIGN arg
;
 
in_redirect_file
: ( LEFT_BRACKET arg RIGHT_BRACKET )? ( COLON_SIGN ( INTEGER | IDENTIFIER ) )? template_arg* arg
: ( LEFT_BRACKET arg RIGHT_BRACKET )? ( COLON_SIGN ( INTEGER | IDENTIFIER ) )? template_arg* ( arg | STRING )
;
 
in_redirect
......@@ -591,7 +584,7 @@ common
| STRING
| INTEGER
| HASH_SIGN ( INTEGER | IDENTIFIER )
| LEFT_BRACE ( COLON_SIGN ( INTEGER | IDENTIFIER ) )? ( CARET_SIGN? param ) * RIGHT_BRACE
| LEFT_BRACE ( COLON_SIGN ( INTEGER | IDENTIFIER ) ) ( CARET_SIGN? param ) * RIGHT_BRACE
;
 
param
......@@ -612,7 +605,7 @@ statement_list
;
 
out_redirect_file
: ( LEFT_BRACKET arg RIGHT_BRACKET )? arg
: ( LEFT_BRACKET arg RIGHT_BRACKET )? ( arg | STRING )
;
 
out_redirect
......@@ -623,7 +616,6 @@ out_redirect
 
result
| out_redirect
|
;
 
introspect_cast_from_to
......@@ -631,19 +623,23 @@ introspect_cast_from_to
;
 
introspect_command
: KW_ALGORITHMS optional_arg END
: KW_ALGORITHMS arg? END
| KW_OVERLOADS arg template_arg* END
| KW_DATATYPES optional_arg END
| KW_CASTS introspect_cast_from_to optional_arg END
| KW_DATATYPES arg? END
| KW_CASTS introspect_cast_from_to arg? END
| KW_VARIABLES ( DOLAR_SIGN arg )?
| KW_BINDINGS ( HASH_SIGN ( INTEGER | IDENTIFIER ) )?
;
 
parse
: EXECUTE statement_list result END
| QUIT END
| HELP optional_arg END
| QUIT statement_list? END
| EXIT statement_list? END
| HELP arg? END
| INTROSPECT introspect_command END
| SET ( INTEGER | IDENTIFIER ) ( INTEGER | IDENTIFIER | STRING ) END
| LOAD ( INTEGER | IDENTIFIER | STRING ) END
| UNLOAD ( INTEGER | IDENTIFIER | STRING ) END
;
\end{lstlisting}
 
......@@ -657,7 +653,11 @@ Overloads introspection requires exact algorithm name. The result is a list of s
 
Datatypes introspection allows printing datatypes that can be exported or imported as an XML file.
 
Casts introspection prints all available casts. Each cast is represented by to type and from type. The query may be limited to casts from or to the specific type
Casts introspection prints all available casts. Each cast is represented by its to type and from type. The query may be limited to casts from or to a specific type.
An introspection command can list variables available in the environment of the command line interface. The type of a variable can also be introspected by appending the variable name to the introspect variables command.
Environment bindings are handled in the introspection similarly to variables.
 
Some examples of introspection commands follow.
 
......@@ -669,6 +669,8 @@ introspect datatypes
introspect casts
introspect casts :from int
introspect casts :to int
introspect variables
introspect bindings
\end{lstlisting}
 
\subsection{Execute}
......