* * * * * * * * * * * * * * * * * * * * * *
 * CRPC  - C-based remote procedure call   *
 * Version 0.7.6                           *
 *                                         *
 * MANUAL INSTRUCTIONS                     *
 * Copyright (C) 2006-2009                 *
 * Andrey V. Babanin. All rights reserved. *
 * * * * * * * * * * * * * * * * * * * * * *

 Contents:

 1. Technical overview, user instructions
    a. Crpcc wrapper-compiler and language extension
    b. Running applications
    c. CRPC function unique identifiers
    d. Utilities
 2. Compilation and installation
 3. Running attached test

* 1. Technical overview and user instructions **********************************

  CRPC is an open source remote procedure call system for the C language.
 CRPC automates the programming of network and multi-threaded applications
 in a Unix environment. It extends the standard C language with new
 modifiers: two declare client-side and server-side functions, and two
 control network traffic. Further modifiers are available for creating
 parallel functions and mutual variables (parallelism is based on POSIX
 Threads).

  The CRPC wrapper-compiler supports all the C base data types and all the
 data types defined in the source code (i.e. all data types). The system can
 send data addressed by a pointer of any type. Functions are distinguished
 automatically by a checksum-based method. The default network protocol is
 TCP/IP. CRPC also makes it easy to parallelize functions and can place mutex
 operations automatically before and after each use of a mutual variable
 (for POSIX Threads only). Data can also be partitioned for use in threaded
 functions. Two types of parallelism are available: network and non-network.
 The system uses only the C standard library and can be compiled on FreeBSD,
 Mac OS X and Linux.

 ******************************************************	
 ** a. Crpcc wrapper-compiler and language extension **
 ******************************************************

  The CRPC system contains three components: Crpcc (the C wrapper-compiler),
 the CRPC library and the CRPC kernel module.

  The core component of the system is the C wrapper-compiler, which reads your
 source program and extends it with network-specific code. It is a simplified
 C compiler that wraps the ordinary GCC C compiler: it takes an extended C
 program as input, produces an ordinary C program as output and calls GCC on
 it. Crpcc also performs preprocessing and linking if needed, and it can be
 used with make.

  Crpcc is built on a purpose-designed syntax analyzer, which scans the given
 source text for suitable function declarations, checks data types and
 computes checksums of the function declaration text. The resulting program,
 extended with network or threaded code, is passed to GCC for processing.
 Crpcc's input/output is based on the mmap system call and is fast enough to
 make the additional processing almost "invisible".

  Crpcc extends the C syntax with the following new reserved words:

 __remote - new storage class qualifier for marking remote functions
            (remote execution);
 __local -  new storage class qualifier for functions which are to be
            called remotely (local execution);
 __attribute__
  (( __format_ptr(...) )) - new function attribute for linking pointers
                            with the appropriate size parameters;

 __in - new modifier, which tells the system to send the data addressed
        by the marked parameter to the server only and not to receive it back
        (used only with function parameters);
 __out - new modifier, which tells the system not to send the data
         addressed by the marked parameter to the server but only to receive
         it back (used only with function parameters);
 __threaded - new modifier for marking functions that are to be executed
              in several threads using the POSIX Threads library. Can be
              combined with the __local modifier;
 __mutual or
 __ptmutual - new storage class qualifier for marking variables that should
              be mutual in functions executed in parallel (currently
              disabled).

  Crpcc does not support old-style K&R C prototypes, so they should be
 converted first.

 Notes on using the above modifiers:

 - __remote and __attribute__((__format_ptr())) can be applied only to
 function prototypes; __local can be used with function definitions as
 well.

 - Attribute parameters are written in an array-like notation: "1[0]" says
 that the size of the array (pointer) at parameter position 1 is stored in
 the parameter at position 0 (positions are counted from 0). The size
 parameter must be an integer. A single pointer can have any data type
 declared in the source code, but a double pointer can only have the type
 char** (double char pointers are not fully implemented). Crpcc rejects
 pointers of a higher degree in function declarations extended with the new
 modifiers.

 - When a function parameter is a pointer with no size parameter associated
 with it, only 'sizeof(type)' bytes are sent to the remote host. Also note
 that an array parameter declared with '[]' will be automatic in a __local
 function.

 - The __remote qualifier cannot be combined with any other storage class
 qualifier. __local, however, can be combined with any standard storage
 class qualifier (but not with __remote), so static and extern functions can
 be declared with the __local qualifier.

 - The __in and __out modifiers are used only with function parameters and
 are intended to minimize network traffic. By default all remote function
 parameters are in in/out mode: when a remote function is called, all the
 data is sent to the server and, after the function has run, received back
 by the client. char** parameters are always automatically __in. __in also
 protects the client's source data from changes made on the server. __out
 arrays, however, can be filled on the server and shared between server
 functions.

 - The __threaded modifier can be applied to __local function prototypes and
 to ordinary (non-extended) functions. A function marked this way is
 parallelized automatically by the Crpcc compiler.
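
  The notes on __format_ptr, __in and __out can be summarized as a small
 traffic model. The sketch below is plain C, not CRPC code: the enum, the
 struct and both function names are hypothetical. It only counts the payload
 bytes that the rules above imply for one pointer parameter and ignores any
 protocol overhead. It also assumes that the size parameter counts array
 elements rather than bytes, and that a non-positive size means "nothing to
 send"; the manual does not state either.

```c
#include <stddef.h>

/* Hypothetical transfer modes mirroring the default mode, __in and __out. */
enum xfer_mode { XFER_INOUT, XFER_IN, XFER_OUT };

/* Payload bytes moved for one pointer parameter in each direction. */
struct xfer_bytes {
    size_t to_server;
    size_t to_client;
};

/*
 * elem_size : sizeof(type) of the pointed-to type
 * has_size  : non-zero if __format_ptr links the pointer to a size parameter
 * count     : value of that size parameter (ignored when has_size is 0);
 *             ASSUMPTION: it counts elements, not bytes
 */
size_t payload_size(size_t elem_size, int has_size, int count)
{
    if (!has_size)
        return elem_size;           /* only sizeof(type) bytes travel */
    if (count <= 0)
        return 0;                   /* ASSUMPTION: nothing to transfer */
    return elem_size * (size_t)count;
}

struct xfer_bytes param_traffic(enum xfer_mode mode, size_t elem_size,
                                int has_size, int count)
{
    size_t n = payload_size(elem_size, has_size, count);
    struct xfer_bytes t = { n, n }; /* default in/out: data goes both ways */

    if (mode == XFER_IN)
        t.to_client = 0;            /* __in: sent, never received back */
    else if (mode == XFER_OUT)
        t.to_server = 0;            /* __out: only received back */
    return t;
}
```

  Under these assumptions, marking a large read-only array as __in halves
 the payload moved for that parameter compared with the default in/out mode.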

  There are two possible scenarios for building distributed applications
 with CRPC. The first is writing the program from scratch, and the second
 is "refactoring" existing source code.

  Consider the second, more complicated scenario.
  To create the client and server parts of the application, examine the
 program and decide which functions are suitable for distribution. Then
 leave only the prototypes and calls of these functions in the source file
 and move the function definitions to another file. Mark the prototypes in
 the first file with the __remote qualifier and the functions in the second
 file with the __local qualifier. If pointers are used as parameters of
 remote functions, the new __format_ptr attribute should be applied. For
 example:

 Client side:                          | Server side:
 --------------------------------------|--------------------------------------- 
 int __remote func(int, double *)      |  __local int func(int n,double *data);
  __attribute__((__format_ptr(1[0]))); |
                                       |
 int main(int argc, char **argv) {     |  int func(int n, double *data) {
    ...                                |    ...
    func(n,da);                        |  }
    ...                                |
 }                                     |
 ------------------------------------------------------------------------------

  Threaded functions must be declared with the __threaded modifier and can
 be of two types: network and non-network. Network threaded functions can
 be used only on the server side and must also be declared with the __local
 modifier. Non-network threaded functions must be declared with the
 __threaded modifier only. The difference is that network functions are
 called through the network and executed in threads automatically, and the
 data they return is sent back to the client; such functions are not meant
 to be called directly. Non-network functions cannot be called through the
 network and must be called directly; the caller can be any function in a
 client, a server or a non-network application. Non-network threaded
 functions are suitable for automatic parallelization of any ordinary
 program. Network threaded functions are suitable for automatic
 parallelization on the server side.

 For example, a threaded declaration:

 ------------------------------------
 __local __threaded int mul(int n);

 int mul(int n) {
     int _res;
     ...
     res+=_res;
 }
 ------------------------------------

  Every threaded function has two constant built-in variables, CRPC_THR_I and
 CRPC_THR_NO, which store the identifier of the current thread and the total
 number of threads. These two variables can be used for data partitioning,
 execution control and statistics. By default, every thread gets the same
 parameters as initially declared in the prototype. Also note that the '...'
 parameter cannot be used in threaded functions.
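
  As an illustration of data partitioning with these two variables, the
 sketch below splits n array elements into contiguous slices, one per thread.
 It is plain C rather than CRPC code: thr_i and thr_no are ordinary
 parameters standing in for CRPC_THR_I and CRPC_THR_NO, the struct and the
 function name are hypothetical, and thread identifiers are assumed to run
 from 0 to thr_no-1, which this manual does not state.

```c
#include <stddef.h>

/* Half-open range [begin, end) of array indices owned by one thread. */
struct slice {
    size_t begin;
    size_t end;
};

/*
 * Split n elements as evenly as possible between thr_no threads.
 * The first (n % thr_no) threads get one extra element each.
 * ASSUMPTION: thr_i is in the range 0 .. thr_no-1.
 */
struct slice thread_slice(size_t n, size_t thr_i, size_t thr_no)
{
    struct slice s = { 0, 0 };
    size_t base, extra;

    if (thr_no == 0 || thr_i >= thr_no)
        return s;                       /* invalid input: empty slice */

    base  = n / thr_no;
    extra = n % thr_no;

    s.begin = thr_i * base + (thr_i < extra ? thr_i : extra);
    s.end   = s.begin + base + (thr_i < extra ? 1 : 0);
    return s;
}
```

  Inside a threaded function, each thread would then loop from s.begin up to
 s.end-1 only. For example, thread_slice(10, 1, 4) yields the range [3, 6).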

  Threaded functions cannot call __remote functions, because the CRPC
 protocol does not support it.

  For finer control over threaded functions a special partitioning function
 is available. A partitioning function should be declared after the
 declaration of its threaded function, and each threaded function has its
 own. The name of the function must be 'CRPC_THR_PARTITION_name', where
 'name' is the name of the corresponding threaded function. The type must be
 void(*)(void). This function also has the built-in variable CRPC_THR_NO,
 but here the variable is not constant and can be used to set the number of
 threads. Its default value is the number of jobs given at server startup.
 A partitioning function contains the special mark CRPC_THR_CREATE_JOIN,
 which sets the position of the thread create and join loops. The mark can
 be placed inside loops or conditional statements, which lets you control
 the create and join loops and change the number of threads to start. The
 parameters of the threaded function are available as pointers of the
 appropriate type, named *ARG0, *ARG1, and so on. A partitioning function is
 especially useful when the threaded function is of the network type:
 because you cannot call any function of your own before a network threaded
 function runs, the corresponding partitioning function is called
 automatically and controls the execution of the threaded function. For
 example, a partitioning function for the above 'mul' function could look
 like this:

 --------------------------------
 void
 CRPC_THR_PARTITION_mul(void) {
     
     int i;
     /* *ARG0 is the first parameter of 'mul', i.e. n */
     CRPC_THR_NO=*ARG0%2;

     for(i=0;i<*ARG0;i++) {
        CRPC_THR_CREATE_JOIN
     }
 }
 --------------------------------			 

  The approach based on a partitioning function and the special built-in
 variables is intended to solve two problems, dividing the source data arrays
 between several threads and controlling the number of threads, without
 interfering with the original logic of the function.

  On the server side a special CRPC_MAIN function is available. Because
 'main' is generated automatically, CRPC_MAIN serves as its substitute and is
 called at server startup. CRPC_MAIN must be unique in the resulting program.

  Server-side applications can call __remote functions. To connect to the
 remote host, such an application must use a communication group defined in
 the configuration file. Note that __remote functions can be called by any
 kind of server. If you call them from a CRPC server, give the communication
 group at startup. If __remote functions are called from a non-CRPC server,
 the connection must be set with the crpc_set_cgroup() function before the
 remote call, and the server must be compiled with Crpcc.

  Compilation produces either the client or the server part of the
 application. If the source file contains any __local declaration, the
 resulting program is a server; if it contains any __remote declaration, the
 program is a client. The client part must define the 'main' function, but
 the server part must not, because the server is a daemon process and its
 entry point is generated by Crpcc. Client applications must also declare
 the 'argc' and 'argv' parameters of 'main'. A program that contains both
 __local and __remote declarations will not compile.

  Crpcc has the same interface as GCC and can be used in its place. Crpcc
 does not modify the GCC command line arguments in any way. Its input is a C
 source file or a preprocessed file. To use Crpcc with make, set the 'CC'
 variable to 'Crpcc' in the makefile. All programs compiled by Crpcc are
 linked with the libcrpc shared library. If needed, the program will also be
 linked with the POSIX Threads library automatically.

 *****************************
 ** b. Running applications **
 *****************************

  There are three ways to run the client application: with a configuration
 file, with a command line parameter, or with the default settings.

  The configuration file crpc.conf must be placed in either /etc/ or
 /usr/local/etc/. The file contains a set of entries, each of the format:

 ------------------------------------
 @hostname:port [domain,type,proto]
    prog1 prog2
    ...
    progN;
 ------------------------------------

  The '@' symbol is the entry marker; the hostname parameter is a CRPC server
 hostname or IP address. The port parameter is optional; if it is omitted,
 the default port 50000 is used. The next three parameters describe the
 socket: domain, type and protocol. They are optional as well; by default
 the triple IPv4, SOCK_STREAM and IPPROTO_TCP is used, written as 'inet',
 'stream' and 'tcp' respectively. This release supports IPv4 and SOCK_STREAM
 only.

  The list of programs follows on a new line and ends with ';'. A program can
 appear in several lists with different communication data; the client first
 tries to connect using the data in the uppermost entry and, if the
 connection fails, moves on to the next entry.
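
  To make the entry format concrete, the sketch below parses the
 "@hostname[:port]" part of an entry line and applies the default port 50000
 when no port is given. It is not the parser used by CRPC: the function name
 and its signature are hypothetical, and the optional [domain,type,proto]
 part, the program list and any other entry syntax are deliberately not
 handled.

```c
#include <stdlib.h>
#include <string.h>

#define DEFAULT_PORT 50000   /* default port named in this manual */

/*
 * Parse the "@hostname[:port]" part of an entry line (hypothetical helper).
 * Returns 0 on success and -1 on a malformed line.
 */
int parse_entry_head(const char *line, char *host, size_t host_len, int *port)
{
    const char *p, *end;
    size_t n;

    if (line == NULL || line[0] != '@')
        return -1;                      /* every entry starts with '@' */
    p = line + 1;

    /* The hostname ends at ':', at whitespace or at the end of the line. */
    n = strcspn(p, ": \t\r\n");
    if (n == 0 || n >= host_len)
        return -1;
    memcpy(host, p, n);
    host[n] = '\0';
    end = p + n;

    *port = DEFAULT_PORT;
    if (*end == ':') {
        char *stop;
        long v = strtol(end + 1, &stop, 10);
        if (stop == end + 1 || v < 1 || v > 65535)
            return -1;                  /* missing or out-of-range port */
        *port = (int)v;
    }
    return 0;
}
```

  With this sketch, "@localhost:1236 [inet,stream,tcp]" gives the host
 "localhost" and port 1236, while "@192.0.0.1" gives port 50000.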

  Communication groups can also be defined. A group can be of one of two
 types:
 Simple - communication is set up with the first available host, and
 Rand - communication is set up with a randomly selected host.
  Sample entries are:

 -----------------------------------
 
 # Simple entry

 @localhost:1236 [inet,stream,tcp]
  clnt;
 
 @192.0.0.1
  clnt
  clnt1 clnt2;

 # Group g1 entries

 @g1@localhost:5432 sort_clnt;
 @g1@localhost:9876 sort_clnt;

 -----------------------------------
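
  The difference between the two group types can be sketched as a selection
 function over the hosts of one group, such as the two g1 entries above. This
 is an illustration only: the function name, the availability flags and the
 caller-supplied random number are hypothetical, and the manual does not say
 what CRPC does when a randomly selected host is unreachable, so the sketch
 simply returns the chosen index.

```c
#include <stddef.h>

/*
 * Choose a host index from a group of n hosts (hypothetical helper).
 * available[i] is non-zero if host i can currently be connected to.
 * type is 'R' for Rand; anything else is treated as Simple, the default.
 * r is a random number supplied by the caller.
 * Returns the chosen index, or -1 if no host can be chosen.
 */
int pick_group_host(const int *available, size_t n, char type, unsigned r)
{
    size_t i;

    if (n == 0)
        return -1;
    if (type == 'R')
        return (int)(r % n);    /* Rand: any member, chosen at random */

    for (i = 0; i < n; i++)     /* Simple: first host that is available */
        if (available[i])
            return (int)i;
    return -1;
}
```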

  To connect the client to a specific server, the command line parameter
 '-crpc-serv' is available. Its argument is the hostname and port, for
 example:

 $clnt -crpc-serv localhost:1236

  To connect through a communication group, use the '-crpc-group' parameter,
 which takes two attributes: the group name and the group type. If no type
 is given, the default, Simple, is used. The group types are written as 'S'
 (Simple) and 'R' (Rand).

 $clnt Param1 Param2 -crpc-group g1,S

 Note that a communication parameter is deleted from the argv of the client
 program and does not affect its argv parsing. Communication parameters are
 considered before the configuration file is parsed.
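
  The effect on argv can be pictured with the following sketch, which removes
 the two communication parameters and their arguments from an argument
 vector in place. It is not CRPC's own code: the function name and its
 signature are hypothetical, and it only shows one way such filtering could
 be done before the client's 'main' sees its arguments.

```c
#include <stddef.h>
#include <string.h>

/*
 * Remove "-crpc-serv <arg>" and "-crpc-group <arg>" pairs from argv in place
 * (hypothetical helper). The last value of each option is stored in *serv
 * and *group (NULL if the option is absent). Returns the new argc.
 */
int strip_crpc_args(int argc, char **argv,
                    const char **serv, const char **group)
{
    int in, out = 0;

    *serv = NULL;
    *group = NULL;
    for (in = 0; in < argc; in++) {
        int is_serv  = strcmp(argv[in], "-crpc-serv") == 0;
        int is_group = strcmp(argv[in], "-crpc-group") == 0;

        if ((is_serv || is_group) && in + 1 < argc) {
            if (is_serv)
                *serv = argv[in + 1];
            else
                *group = argv[in + 1];
            in++;                   /* skip the option's argument too */
            continue;
        }
        argv[out++] = argv[in];     /* keep everything else, in order */
    }
    if (out < argc)
        argv[out] = NULL;           /* keep argv NULL-terminated */
    return out;
}
```

  Applied to the command line "clnt Param1 -crpc-group g1,S Param2", the
 sketch leaves "clnt Param1 Param2" for the program's own argument parsing.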

  Running the client without a communication parameter and without a
 corresponding entry in the configuration file results in the default
 settings being used: port number 50000 and the current hostname, or
 localhost if no hostname is available.

  The server-side options are '-h', '-p', '-s', '-j', '-X', '-R' and '-C'.
 The '-h' option is the server hostname or IP address; if it is omitted, the
 system's default hostname or localhost is used.
 The '-p' option is the port to listen on; if it is omitted, the default port
 50000 is used.
 The '-s' option is the server socket domain, type and protocol. The notation
 for its arguments is the same as for the client socket parameters.
 The '-j' option is the default number of threads used to run parallelized
 functions.
 The '-X' option forces the server to run in the calling process instead of
 forking; this option is intended for debugging.
 With the '-R' option the server redirects stdout and stderr to the client's
 socket, so the client can print the output generated in the remote function.
 The '-C' option enables transmission over SSL; its parameter is a tag that
 identifies the key and certificate files in the /etc/crpc/ directory.
 To generate these files, use gen-cert.

 For example, the server could be started with:

 $serv -h localhost -p 1236 -s inet,stream,tcp -j 4 -R

  Running the server without communication parameters results in the
 defaults being used, the same as for the client.

  Every CRPC server is an ordinary daemon process, so it uses syslog(3) to
 report to the outside world. To read the server messages, syslogd must be
 configured accordingly.

 *****************************************
 ** c. CRPC function unique identifiers **
 *****************************************

  Every function in the system is distinguished by a unique identifier, which
 is a 32-bit integer. The identifier is a checksum computed from the text of
 the function prototype. The checksum is not computed over the raw text:
 only the function name, the pointer operators and the return and parameter
 types are significant. The client and the server can therefore be compiled
 at different times and on different machines; the approach is designed so
 that only identical functions get identical identifiers, and the
 identifiers are calculated automatically. The checksum algorithm is derived
 from the rsync program.
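
  For orientation, the sketch below computes a 32-bit weak checksum in the
 style of rsync's rolling checksum over a prototype string. It is not the
 algorithm used by Crpcc, which is only said to be derived from rsync: the
 function names are hypothetical, and the way Crpcc reduces a prototype to
 its significant parts before checksumming is not described here, so the
 sketch expects an already normalized string.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Weak checksum in the style of rsync's rolling checksum:
 *   a = sum of all bytes               (mod 2^16)
 *   b = sum of (len - i) * byte[i]     (mod 2^16)
 *   result = a + (b << 16)
 */
uint32_t weak_checksum(const unsigned char *buf, size_t len)
{
    uint32_t a = 0, b = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        a += buf[i];
        b += a;     /* adding the running sum weights byte i by (len - i) */
    }
    return (a & 0xffffu) | ((b & 0xffffu) << 16);
}

/* Hypothetical helper: identifier of an already normalized prototype. */
uint32_t prototype_id(const char *normalized_prototype)
{
    return weak_checksum((const unsigned char *)normalized_prototype,
                         strlen(normalized_prototype));
}
```

  Because the value depends only on the text, a client and a server built
 separately from the same normalized prototype obtain the same number. A
 32-bit checksum of this kind can still collide for different inputs.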

  On startup every server builds a list of its available functions. Each time
 a client calls a function, the server looks up the corresponding identifier
 in its local list; if the identifier does not exist, the process dies and no
 further network communication takes place.
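
  The lookup step can be sketched as a search in a table of identifiers. The
 struct, the handler signature and the function names below are hypothetical
 and chosen only for illustration; the actual list layout and search method
 used by a CRPC server are not described in this manual.

```c
#include <stddef.h>
#include <stdint.h>

typedef int (*handler_fn)(void);    /* hypothetical handler signature */

struct func_entry {
    uint32_t   id;          /* checksum of the function prototype */
    handler_fn handler;     /* function to run for that identifier */
};

/* Stand-in for a server-side function, used only to fill the table. */
int sample_handler(void)
{
    return 0;
}

/* Linear search of the list built at startup; NULL means "unknown id". */
handler_fn find_function(const struct func_entry *list, size_t n, uint32_t id)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (list[i].id == id)
            return list[i].handler;
    return NULL;
}
```

  A NULL result corresponds to the case described above, in which the
 requested identifier is not in the server's list.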

 ******************
 ** d. Utilities **
 ******************

  A tool called crpcutil accompanies the package. crpcutil is a helper
 program with three command line options: '-p', '-i' and '-d'. Run it with
 the '-p' option to print the function list of the server specified by pid.
  With the '-i' option the utility recursively walks the directory tree,
 looks for all Makefiles and BSDmakefiles and appends the line 'CC= Crpcc'
 to them.
  With the '-d' option crpcutil deletes the 'CC= Crpcc' line from the
 makefiles.