Ruby tools for handling floating point representations
[![Gem Version](https://badge.fury.io/rb/float-formats.svg)](http://badge.fury.io/rb/float-formats)
[![Build Status](https://travis-ci.org/jgoizueta/float-formats.svg)](https://travis-ci.org/jgoizueta/float-formats)

# Introduction

Float-Formats is a Ruby package with methods to handle diverse floating-point formats. These are some of the things that can be done with it:

* Encoding and decoding numerical values in specific floating-point representations.
* Conversion of floating-point data between different formats.
* Obtaining properties of floating-point formats (ranges, precision, etc.).
* Exploring and learning about floating-point representations.
* Definition and testing of new floating-point formats.

# Installation

To install the gem manually:

    gem install float-formats

You can find the code on GitHub:

* http://github.com/jgoizueta/float-formats/

# Predefined formats

A number of common formats are defined as constants in the Flt module:

## IEEE 754-2008

**Binary** floating point representations in little endian order:

* IEEE_binary16 (half precision),
* IEEE_binary32 (single precision),
* IEEE_binary64 (double precision),
* IEEE_binary80 (extended),
* IEEE_binary128 (quadruple precision)

and in big endian order: IEEE_binary16_BE, etc.

**Decimal** formats (using DPD):

* IEEE_decimal32, IEEE_decimal64 and IEEE_decimal128.

**Interchange binary & decimal** formats:

* IEEE_binary256, IEEE_binary512, IEEE_binary1024, IEEE_decimal192, IEEE_decimal256.

Others can be defined with IEEE.interchange_binary and IEEE.interchange_decimal (see the IEEE module).

## Legacy

Formats of historical interest, some of which are found in file formats still in use.

**Mainframe/supercomputer** formats: Univac 1100 (UNIVAC_SINGLE, UNIVAC_DOUBLE), IBM 360 etc. (IBM32, IBM64 and IBM128), CDC 6600/7600: (CDC_SINGLE, CDC_DOUBLE), Cray-1: (CRAY).

**Minis**: PDP11 and Vaxes: (PDP11_F, PDP11_D, VAX_F, VAX_D, VAX_G and VAX_H), HP3000: (XS256, XS256_DOUBLE), Wang 2200: (WANG2200).

**Microcomputers** (software implementations): Apple II: (APPLE), Microsoft Basic, Spectrum, etc.: (XS128), Microsoft Quickbasic: (MBF_SINGLE, MBF_DOUBLE), Borland Pascal: (BORLAND48).

**Embedded systems**: Formats used in the Intel 8051 by the C51 compiler: (C51_BCD_FLOAT, C51_BCD_DOUBLE and C51_BCD_LONG_DOUBLE).

**Minifloats**: AI Formats: (BFLOAT16, MSFP8, MSFP9, MSFP10, MSFP11), Khronos Vulkan unsigned formats: (KHRONOS_VULKAN_UNSIGNED10, KHRONOS_VULKAN_UNSIGNED11).

## Calculators

Formats used in HP RPL calculators: (RPL, RPL_X), HP-71B formats (HP71B, HP71B_X) and classic HP 10 digit calculators: (HP_CLASSIC).

# Using the pre-defined formats

    require 'rubygems'
    require 'float-formats'
    include Flt

The properties of the floating point formats can be queried (which can be used for tables or reports comparing different formats):

Size in bits of the representations:

    puts IEEE_binary32.total_bits                      # -> 32

Numeric radix:

    puts IEEE_binary32.radix                           # -> 2

Digits of precision (radix-based):

    puts IEEE_binary32.significand_digits              # -> 24

Minimum and maximum values of the radix-based exponent:

    puts IEEE_binary32.radix_min_exp                   # -> -126
    puts IEEE_binary32.radix_max_exp                   # -> 127

Decimal precision:

    puts IEEE_binary32.decimal_digits_stored           # -> 6
    puts IEEE_binary32.decimal_digits_necessary        # -> 9

Minimum and maximum decimal exponents:

    puts IEEE_binary32.decimal_min_exp                 # -> -37
    puts IEEE_binary32.decimal_max_exp                 # -> 38

## Encode and decode numbers

For each floating-point format class there is a constructor method with the same name which can build a floating-point value from a variety of parameters:

* Using three integers: the sign (+1 for +, -1 for -), the significand (coefficient or mantissa) and the exponent.
* From a text numeral (with an optional `Numerals` format specifier).
* From a number: converts a numerical value to a floating point representation.
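
To make the three-integer form concrete, the following is a minimal sketch that uses only Ruby core methods on the native Float (IEEE binary64). It does not use float-formats, and `split_float` and `join_float` are hypothetical helper names introduced for illustration only. It assumes the convention that the value equals sign × significand × 2^exponent with an integer significand.

```ruby
# Hypothetical helpers, not part of float-formats: they only illustrate
# what a (sign, significand, exponent) triple means for a native Float.

# Split a finite Float into [sign, integer significand, binary exponent]
# such that x == sign * significand * 2**exponent.
def split_float(x)
  raise ArgumentError, 'finite values only' unless x.finite?

  bits     = [x].pack('G').unpack1('Q>')   # 64-bit image as an unsigned integer
  sign     = bits[63].zero? ? +1 : -1      # most significant bit is the sign
  biased   = (bits >> 52) & 0x7FF          # 11-bit biased exponent field
  fraction = bits & ((1 << 52) - 1)        # 52 stored significand bits

  if biased.zero?
    # Zero or subnormal: no hidden bit, exponent fixed at the minimum.
    [sign, fraction, -1074]
  else
    # Normal number: restore the hidden leading 1 and remove the bias (1023),
    # then shift by 52 so the significand is an integer.
    [sign, fraction | (1 << 52), biased - 1023 - 52]
  end
end

# Rebuild the Float from the triple; exact because the significand fits in 53 bits.
def join_float(sign, significand, exponent)
  sign * Math.ldexp(significand.to_f, exponent)
end

p split_float(0.1)                        # -> [1, 7205759403792794, -56]
p join_float(*split_float(0.1)) == 0.1    # -> true
```

The significand 7205759403792794 is 0x1999999999999a in hexadecimal, so the triple describes 0.1 as 0x1999999999999a × 2^-56.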

Examples:

    File.open('binary_file.dat','wb'){|f| f.write IEEE_binary80('0.1').to_bytes}
    puts IEEE_binary80('0.1').to_hex(true)             # -> CD CC CC CC CC CC CC CC FB 3F
    puts IEEE_binary80(0.1).to_hex(true)               # -> CD CC CC CC CC CC CC CC FB 3F
    puts IEEE_binary80(+1,123,-2).to_hex(true)         # -> 00 00 00 00 00 00 00 F6 03 40
    puts IEEE_decimal32('1.234').to_hex(true)          # -> 22 20 05 34

A floating-point encoded value can be converted to useful formats with the to_ and similar methods:

* `split` (split as integral sign, significand, exponent)
* `to_text`
* `to(num_class)`

Examples:

    v = IEEE_binary80.from_bytes(File.read('binary_file.dat'))
    puts v.to(Rational)                                # -> 1/10
    puts v.split.inspect                               # -> [1, 14757395258967641293, -67]
    puts v.to_text                                     # -> 0.1
    puts v.to(Float)                                   # -> 0.1
    puts v.to_hex                                      # -> CDCCCCCCCCCCCCCCFB3F
    puts v.to_bits                                     # -> 00111111111110111100110011001100110011001100110011001100110011001100110011001101
    puts v.to_bits_text(16)                            # -> 3ffbcccccccccccccccd

## Special values

Let's show the decimal expression of some interesting values using 3 significant digits:

    fmt = Numerals::Format[mode: :general, rounding: [precision: 3]]
    puts IEEE_SINGLE.min_value.to_text(fmt)            # -> 1e-45
    puts IEEE_SINGLE.min_normalized_value.to_text(fmt) # -> 1.18e-38
    puts IEEE_SINGLE.max_value.to_text(fmt)            # -> 3.40e38
    puts IEEE_SINGLE.epsilon.to_text(fmt)              # -> 1.19e-7

## Convert between formats

    v = IEEE_EXTENDED.from_text('1.1')
    v = v.convert_to(IEEE_SINGLE)
    v = v.convert_to(IEEE_DEC64)

# Tools for the native floating point format

This is an optional module to perform conversions and manipulate the native Float format.

    require 'float-formats/native'
    include Flt

    puts float_shortest_dec(0.1)                       # -> 0.1
    puts float_significant_dec(0.1)                    # -> 0.10000000000000001
    puts float_dec(0.1)                                # -> 0.1000000000000000055511151231257827021181583404541015625
    puts float_bin(0.1)                                # -> 1.100110011001100110011001100110011001100110011001101E-4
    puts hex_from_float(0.1)                           # -> 0x1999999999999ap-56

    puts float_significant_dec(Float::MIN_D)           # -> 5E-324
    puts float_significant_dec(Float::MAX_D)           # -> 2.2250738585072009E-308
    puts float_significant_dec(Float::MIN_N)           # -> 2.2250738585072014E-308

Together with flt/sugar (from Flt), it can be used to explore or work with Floats:

    require 'flt/sugar'

    puts 1.0.next_plus-1 == Float::EPSILON             # -> true
    puts float_shortest_dec(1.0.next_plus)             # -> 1.0000000000000002
    puts float_dec(1.0.next_minus)                     # -> 0.99999999999999988897769753748434595763683319091796875
    puts float_dec(1.0.next_plus)                      # -> 1.0000000000000002220446049250313080847263336181640625
    puts float_bin(1.0.next_plus)                      # -> 1.0000000000000000000000000000000000000000000000000001E0
    puts float_bin(1.0.next_minus)                     # -> 1.1111111111111111111111111111111111111111111111111111E-1

    puts float_significant_dec(Float::MIN_D.next_plus)  # -> 1.0E-323
    puts float_significant_dec(Float::MAX_D.next_minus) # -> 2.2250738585072004E-308

# Defining new formats

New formats are defined using one of the classes defined in float-formats/classes.rb and passing the necessary parameters in a hash to the constructor.

For example, here we define a 32-bit binary floating point format with 22 bits for the significand, 9 for the exponent and 1 for the sign (these fields are allocated from least to most significant bits).
We'll use excess notation with bias 127 for the exponent, interpreting the significand bits as a fractional number with the radix point after the first bit, which will be hidden:

    Flt.define(
      :MY_FP, BinaryFormat,
      fields: [:significand,22,:exponent,9,:sign,1],
      bias: 127, bias_mode: :scientific_significand,
      hidden_bit: true
    )

Now we can encode values in this format, decode values, convert to other formats, query its range, etc:

    Numerals::Format[mode: :general, rounding: [precision: 3]]
    puts MY_FP('0.1').to_bits_text(16)                 # -> 1EE66666
    puts MY_FP.max_value.to_text                       # -> 7.8804e115

You can look at float-formats/formats.rb to see how the built-in formats are defined.

# License

This code is free to use under the terms of the MIT license.

# References

[*Floating Point Representations.* C.B. Silio.](http://www.ece.umd.edu/class/enpm607.S2000/fltngpt.pdf)
Description of formats used in UNIVAC 1100, CDC 6600/7600, PDP-11, IEEE754, IBM360/370

[*Floating-Point Formats.* John Savard.](http://www.quadibloc.com/comp/cp0201.htm)
Description of formats used in VAX and PDP-11

### IEEE754 binary formats

[*IEEE-754 References.* Christopher Vickery.](http://babbage.cs.qc.edu/courses/cs341/IEEE-754references.html)

[*What Every Computer Scientist Should Know About Floating-Point Arithmetic.* David Goldberg.](http://docs.sun.com/source/806-3568/ncg_goldberg.html)

### DPD/IEEE754r decimal formats

[*Decimal Arithmetic Encoding. Strawman 4d.* Mike Cowlishaw.](http://www2.hursley.ibm.com/decimal/decbits.pdf)

[*A Summary of Densely Packed Decimal encoding.* Mike Cowlishaw.](http://www2.hursley.ibm.com/decimal/DPDecimal.html)

[*Packed Decimal Encoding IEEE-754-r.* J.H.M. Bonten.](http://home.hetnet.nl/mr_1/81/jhm.bonten/computers/bitsandbytes/wordsizes/ibmpde.htm)

[*DRAFT Standard for Floating-Point Arithmetic P754.* IEEE.](http://www.validlab.com/754R/drafts/archive/2007-10-05.pdf)

### HP 10 digits calculators

[*HP CPU and Programming*.
David G.Hicks.](http://www.hpmuseum.org/techcpu.htm)
Description of calculator CPUs from the Museum of HP Calculators.

[*HP 35 ROM step by step.* Jacques Laporte](http://www.jacques-laporte.org/HP35%20ROM.htm)
Description of HP35 registers.

*Scientific Pocket Calculator Extends Range of Built-In Functions.* Eric A. Evett, Paul J. McClellan, Joseph P. Tanzini. Hewlett Packard Journal 1983-05 pgs 27-28.
Describes format used in HP-15C.

### HP 12 digits calculators

*Software Internal Design Specification Volume I For the HP-71*. Hewlett Packard.
Available from http://www.hpmuseum.org/cd/cddesc.htm

*RPL PROGRAMMING GUIDE* Excerpted from *RPL: A Mathematical Control Language*. by W. C. Wickes.
Available at http://www.hpcalc.org/details.php?id=1743

### HP-3000

*A Pocket Calculator for Computer Science Professionals.* Eric A. Evett. Hewlett Packard Journal 1983-05 pg 37.
Describes format used in HP-3000

### IBM

[*IBM Floating Point Architecture.* Wikipedia.](http://en.wikipedia.org/wiki/IBM_Floating_Point_Architecture)

[*The IBM eServer z990 floating-point unit*. G. Gerwig, H. Wetter, E. M. Schwarz, J. Haess, C. A. Krygowski, B. M. Fleischer and M. Kroener.](http://www.research.ibm.com/journal/rd/483/gerwig.html)

### MBF

[*Microsoft Knowledgebase Article 35826*](http://support.microsoft.com/?scid=kb%3Ben-us%3B35826&x=17&y=12)

[*Microsoft MBF2IEEE library*](http://download.microsoft.com/download/vb30/install/1/win98/en-us/mbf2ieee.exe)

### Borland

*An Overview of Floating Point Numbers.* Borland Developer Support Staff

[*Pascal Floating-Point Page.* J R Stockton.](http://www.merlyn.demon.co.uk/pas-real.htm)

### 8-bit micros

This is the MS Basic format (BASIC09 for TRS-80 Color Computer, Dragon), also used in the Sinclair Spectrum.

*Numbers are followed by information not in listings* Sinclair User October 1983
http://www.sincuser.f9.co.uk/019/helplne.htm

*Sinclair ZX Spectrum / Basic Programming.* Steven Vickers. Chapter 24.
http://www.worldofspectrum.org/ZXBasicManual/zxmanchap24.html

### Apple II

*Floating Point Routines for the 6502* Roy Rankin and Steve Wozniak. Dr. Dobb's Journal, August 1976, pages 17-19.

### C51

[*Advanced Development System* Franklin Software, Inc.](http://www.fsinc.com/reference/html/com9anm.htm)

### CDC6600

*CONTROL DATA 6400/6500/6600 COMPUTER SYSTEMS Reference Manual*
Manuals available at http://bitsavers.org/

### Cray

*CRAY-1 COMPUTER SYSTEM Hardware Reference Manual*
See pg 3-20 from 2240004 or pg 4-30 from HR-0808 or pg 4-21 from HP-0032.
Manuals available at http://bitsavers.org/

### Wang 2200

[*Internal Floating Point Representation*](http://www.wang2200.org/fp_format.html)