Floating point:

Note

bit 0 = sign (0 = positive, 1 = negative)
bits 1-8 = exponent + 127
bits 9-31 = the portion of the normalized value AFTER the binary point (the leading 1 is not stored)
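
As a rough check of the layout above, the three fields can be pulled out of a value with Python's struct module. This is only a sketch: the function name fields is made up for illustration, and note that the notes count bit 0 from the left (the sign bit), while the shifts below count from the right.

```python
import struct

def fields(x):
    """Split x, stored as a 32-bit float, into (sign, biased exponent, fraction)."""
    # Pack as a big-endian single, then reread the same 4 bytes as an unsigned int.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                 # leftmost bit
    biased_exp = (bits >> 23) & 0xFF  # next 8 bits, still holding exponent + 127
    fraction = bits & 0x7FFFFF        # last 23 bits, leading 1 not stored
    return sign, biased_exp, fraction

sign, biased_exp, fraction = fields(0.625)
# Show the sign, the actual exponent, and the 23 stored fraction bits.
print(sign, biased_exp - 127, format(fraction, "023b"))
```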

Solutions to homework, pg 53.


a = truncated version, b = rounded version
7.    .625=   a) 0011 1111 0010 0000 0000 0000 0000 0000 b)  same as (a)
                 +123 4567 8    |---- truncated/rounded off here
                 
                 .625 = 1*1/2 + 0/(2*2) + 1/(2^3) = .101 in binary. Normalizing puts the
                 binary point after the first 1, giving 1.01. To get back to the actual
                 value we have to move the point left 1 place, so we need an exponent of -1.
                 ActualExp + 127 = -1 + 127 = 126, which converts to binary as
                 16*7 + 14 = 112 + 14 = 126 = 0x7E = 0111 1110
                 
                 Now add the leading digit (0) for the sign (+), giving 0011 1111 0 (9 bits).
                 Now tack on the fractional part of 1.01, omitting the leading one (01),
                 and fill with zeros on the right to get 0011 1111 0 01 0000 0...
                 Regroup, then round off (or truncate) the trailing bits: 0011 1111 0010 0000
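
The hand procedure above (normalize, bias the exponent, drop the leading 1, pad with zeros) can be sketched in Python. The function name encode_by_hand is invented for this example. It assumes a positive value in the normal range, and it truncates any fraction bits past the 23rd instead of rounding.

```python
def encode_by_hand(x):
    """Build the 32-bit pattern for a positive value, following the manual steps."""
    if x <= 0:
        raise ValueError("this sketch only handles positive values")
    # Normalize: scale x into [1, 2) and count the shifts as the actual exponent.
    exp = 0
    while x >= 2:
        x /= 2
        exp += 1
    while x < 1:
        x *= 2
        exp -= 1
    # Store the exponent as ActualExp + 127 in 8 bits.
    exp_bits = format(exp + 127, "08b")
    # Drop the leading 1, then peel off 23 fraction bits one at a time.
    x -= 1
    frac_bits = ""
    for _ in range(23):
        x *= 2
        bit = int(x)
        frac_bits += str(bit)
        x -= bit
    bits = "0" + exp_bits + frac_bits  # sign bit 0 for a positive number
    # Regroup into blocks of four, the way the answers are written.
    return " ".join(bits[i:i + 4] for i in range(0, 32, 4))

print(encode_by_hand(0.625))  # 0011 1111 0010 0000 0000 0000 0000 0000
```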

8.  25.625=   a) 0100 0001 1100 1101 0000 0000 0000 0000 b) same as (a)
                 +123 4567 8       |--- trunc/round here
  1. 25.625 = 16+8+0+0+1 + .5+0+.125 = 11001.101, which becomes 1.1001101 * 2^4
  2. Encoding the exponent as RealExp+127, we get 4+127 = 131, which is 128+3, or 1000 0011 in binary.
  3. Now put the sign in front (0) and get 0100 0001 1 (9 bits).
  4. Next we tack on the fractional part, 1001101, to get 0100 0001 1 1001101.
  5. Regrouping gives the answer above: 0100 0001 1100 1101
  6. To round this using the "nearest" (or "unbiased") approach, drop the trailing bits as in the previous problem. They are all zero, so rounding and truncating give the same result.
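
To see when answers (a) and (b) would differ, here is a small sketch that cuts a 32-bit pattern down to its first 16 bits, once by truncating and once by rounding to nearest (ties to even). The 16-bit cut and the helper names float_bits, truncate16 and round16 are assumptions made for illustration, not something the homework specifies. The value 0.1 is added only as a contrast case.

```python
import struct

def float_bits(x):
    """Return the 32-bit single-precision pattern of x as an integer."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits

def truncate16(bits):
    """Keep the first 16 bits and throw the rest away."""
    return bits >> 16

def round16(bits):
    """Keep the first 16 bits, rounding to nearest with ties going to even."""
    kept, dropped = bits >> 16, bits & 0xFFFF
    # 0x8000 is exactly half of the last kept bit.
    if dropped > 0x8000 or (dropped == 0x8000 and kept & 1):
        kept += 1
    return kept

for x in (0.625, 25.625, 0.1):
    b = float_bits(x)
    print(x, format(truncate16(b), "04X"), format(round16(b), "04X"))
# 0.625 3F20 3F20
# 25.625 41CD 41CD
# 0.1 3DCC 3DCD
```

For .625 and 25.625 every dropped bit is zero, so both columns agree. For 0.1 the dropped bits are more than half of the last kept bit, so the rounded version is one larger.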

Rounding a floating-point number takes four steps:

  1. normalization,
  2. significand rounding,
  3. post normalization,
  4. exponent rounding
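
The first three steps can be sketched on a small example. The function round_significand and its parameters are hypothetical. It treats the value as sig * 2**exp with sig a positive integer, rounds to nearest with ties to even, and leaves out the exponent handling of step 4.

```python
def round_significand(sig, exp, keep_bits=24):
    """Round sig * 2**exp so that the significand has exactly keep_bits bits."""
    # Step 1, normalization: find how many low bits are beyond keep_bits.
    extra = sig.bit_length() - keep_bits
    if extra <= 0:
        # Nothing to drop: pad with zeros on the right, value unchanged.
        return sig << -extra, exp + extra
    # Step 2, significand rounding: nearest, ties to even.
    kept = sig >> extra
    dropped = sig & ((1 << extra) - 1)
    half = 1 << (extra - 1)
    if dropped > half or (dropped == half and kept & 1):
        kept += 1
    exp += extra
    # Step 3, post-normalization: rounding up can carry out of the top bit
    # (1.111 becomes 10.000), so shift right once and bump the exponent.
    if kept.bit_length() > keep_bits:
        kept >>= 1
        exp += 1
    return kept, exp

# 1.1111 (binary) rounded to 4 significant bits becomes 1.000 * 2^1.
print(round_significand(0b11111, -4, keep_bits=4))  # (8, -2)
```

The example shows why the post-normalization step exists: the rounded significand no longer fits in 4 bits until it is shifted.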

The IEEE standard has four different rounding modes:

Unbiased (or "nearest")
  Rounds to the nearest representable value. If the number falls exactly midway between two representable values, it is rounded to the one with an even (zero) least significant bit. This mode is required to be the default in any IEEE-compliant implementation.
Towards zero
  Drops the extra bits (truncation), so the magnitude never increases.
Towards positive infinity
  Similar to ceiling.
Towards negative infinity
  Similar to floor.
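
The four directions can be tried out on ordinary numbers by rounding to whole integers instead of to the last significand bit. This is only an analogy, using Python's built-in round (which sends ties to the even neighbor), math.trunc, math.ceil and math.floor. The names MODES and round_all are made up for the example.

```python
import math

# One integer-rounding function per IEEE rounding direction.
MODES = {
    "nearest (ties to even)": round,
    "toward zero": math.trunc,
    "toward +infinity": math.ceil,
    "toward -infinity": math.floor,
}

def round_all(x):
    """Round x to an integer under each of the four directions."""
    return {name: fn(x) for name, fn in MODES.items()}

for x in (2.5, 3.5, -2.5):
    print(x, list(round_all(x).values()))
# 2.5 [2, 2, 3, 2]
# 3.5 [4, 3, 4, 3]
# -2.5 [-2, -2, -2, -3]
```

The midway cases 2.5 and 3.5 show the "even" rule: one rounds down to 2, the other up to 4.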