Here is a brief guide on how to build the SRI LM tools and associated
libraries.

1 - Unpack.  This should give a top-level directory with the subdirectories
    listed in README, as well as a few documentation files and a Makefile.
    For an overview of SRILM, see the paper in doc/paper.ps .
    For reference information, look in man/html .

2 - Set the SRILM variable in the top-level Makefile to point to this
    top-level directory (an absolute path).

3 - You need a SunOS (Sparc or i386), IRIX 5.x, Alpha OSF, Linux i686,
    Mac OSX, or CYGWIN platform to compile out of the box.

    For other OS/cpu combinations you will have to modify the
    sbin/machine-type script to detect (and name) the platform type, and
    create a file common/Makefile.machine. that defines platform-dependent
    makefile variables.
    As a workaround, the MACHINE_TYPE variable can also be set on the
    make command line

        make MACHINE_TYPE=foo ...

    in which case no changes to sbin/machine-type are needed.
    Some platform-specific notes may be found in doc/README..

    Even on the known platforms you might have to edit
    common/Makefile.machine. .  Candidates for changes are

    CC, CXX:  choose the compiler or compiler version.  For example,
        you will probably have to change the path to the compiler driver.
    DEMANGLE_FILTER:  if the "c++filt" program is not installed on your
        system, set this variable to empty.
    TCL_INCLUDE, TCL_LIBRARY:  set to whatever is needed to find the Tcl
        header files and library.  If Tcl is not available, set NO_TCL=X
        and leave the above variables empty.

4 - You need the following free third-party software to build SRILM:

    - gcc version 3.4.3 or higher (older versions might work as well,
      but are no longer supported).  SRILM is occasionally tested with
      other compilers; see the portability notes in the CHANGES file.
    - GNU make
    - John Ousterhout's Tcl toolkit, version 7.3 or higher (this is
      currently used only for some test programs, but is needed for
      the build to go through without manual intervention).
    - Additional platform-dependent prerequisites are mentioned in
      doc/README.-, e.g., doc/README.windows-cygwin.

    The following tools are needed at runtime only:

    - GNU awk (gawk), to interpret many of the utility scripts
    - gzip, to read/write compressed files
    - bzip2, to read/write .bz2 compressed files (optional)
    - p7zip, to read/write .7z compressed files (optional)

    For Windows 9x or NT, you will need the CYGWIN UNIX compatibility
    environment, which includes all of the above.  The MinGW and Visual C
    platforms will also work, but with some loss of functionality.
    See doc/README.windows-* for more information.

    Links to these packages can be found on the SRILM download page
    (http://www.speech.sri.com/projects/srilm/download.html).

5 - In the top-level directory, run

        gnumake World

    or

        make World      (if the GNU version is the system default)

    This will create the directories

        bin/  lib/  include/

    build everything, and install public commands, libraries and headers
    in these directories.  Binaries are actually installed in
    subdirectories indicating the platform type.

    To create binaries for a platform that is not the default on your
    system, use make MACHINE_TYPE=xxx, e.g.

        make MACHINE_TYPE=i686-m64 World    # 64-bit binaries for Linux
        make MACHINE_TYPE=msvc World        # MS Visual C++ on Windows

6 - The result of the above should be a fair number of .h and .cc files
    in include/, libraries in lib/$MACHINE_TYPE, and programs in
    bin/$MACHINE_TYPE.

    In your shell, set the following environment variables:

        PATH        add $SRILM/bin/$MACHINE_TYPE and $SRILM/bin
        MANPATH     add $SRILM/man

7 - To test the compiled tools, change into the $SRILM/test directory
    and run

        gnumake all

    This exercises the most important (though not all) functionality in
    SRILM and compares the results to reference outputs.  If discrepancies
    are reported, examine the output files in $SRILM/test/output and
    compare them to the corresponding files in $SRILM/test/reference.
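The environment setup from step 6 can be sketched for a Bourne-style shell
as follows.  This is only an illustration under stated assumptions: the
install location $HOME/srilm is a placeholder, and the sketch assumes that
running sbin/machine-type without arguments prints the platform name (if the
script is not found, the placeholder value "unknown" is used and should be
replaced by hand).  The LC_NUMERIC setting anticipates the gawk note in
step 10.

```shell
# Assumption: SRILM was unpacked in $HOME/srilm (placeholder; adjust as needed).
SRILM="${SRILM:-$HOME/srilm}"

# Keep MACHINE_TYPE if it is already set; otherwise ask sbin/machine-type
# (assumed to print the platform name), falling back to a placeholder.
if [ -z "${MACHINE_TYPE:-}" ]; then
    if [ -x "$SRILM/sbin/machine-type" ]; then
        MACHINE_TYPE="$("$SRILM/sbin/machine-type")"
    else
        MACHINE_TYPE="unknown"
    fi
fi

# Step 6: put the platform-specific and generic bin directories on PATH,
# and the manual pages on MANPATH.
PATH="$SRILM/bin/$MACHINE_TYPE:$SRILM/bin:$PATH"
MANPATH="$SRILM/man:${MANPATH:-}"

# Step 10: force the C numeric locale for the gawk-based scripts.
LC_NUMERIC=C

export SRILM MACHINE_TYPE PATH MANPATH LC_NUMERIC
```

These lines would typically go into a shell startup file such as ~/.profile,
so that the tools and manual pages are found in every new session.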

8 - After a successful build, clean up the source directories of object
    and binary files that are no longer needed:

        gnumake cleanest

9 - (Optional) To build versions of the libraries and executables that
    are optimized for space rather than speed, run

        gnumake World OPTION=_c
        gnumake cleanest OPTION=_c

    The libraries will appear in ${SRILM}/lib/${MACHINE_TYPE}_c, with
    executables in ${SRILM}/bin/${MACHINE_TYPE}_c .  The data structures
    used in these versions use sorted arrays rather than hash tables,
    which wastes less memory, but is also somewhat slower.  The directory
    suffix "_c" stands for "compact".

    Other versions of the binaries can be built in a similar manner.
    The compile options currently supported are

        OPTION=_c       "compact" data structures
        OPTION=_s       "short" count representation
        OPTION=_l       "long long" count representation
        OPTION=_g       debuggable, non-optimized code
        OPTION=_p       profiling executables

10 - Recent versions of gawk may not perform correct floating-point
    arithmetic unless either LC_NUMERIC=C or LC_ALL=C is set in the
    environment.  This affects many of the scripts in utils/.

11 - Be sure to let me know if I left something out.

Andreas Stolcke
stolcke@speech.sri.com
$Date: 2007/11/11 16:47:05 $