Gregory Hildstrom

float composition

Float composition is a small program that demonstrates the bit-level composition of a 32-bit IEEE 754 single-precision floating-point number. This C++ program extracts the sign bit, the 8 exponent bits, and the 23 fraction (mantissa/significand) bits into separate integers, then rebuilds the float from those integers to verify the round trip. This is useful for experimentation, fast computations, and possibly compressing 32-bit floating-point numbers by applying lossless integer compression to the individual components.
//floatcomposition.cpp
#include <cstdint>
#include <cstring>
#include <iostream>
using namespace std;

/*
This code picks the sign bit, exponent bits, and mantissa bits out of a floating point number.
It then prints out the corresponding integers/bits.
Then it reconstructs a floating point number from the parsed integer/bit components.
It then prints out the resulting floating point number to verify that it is the same as the original.

One reason for doing this is to be able to compress 32-bit float data by applying lossless integer
compression to fields that are less than 32 bits wide.
*/

static_assert(sizeof(float) == sizeof(uint32_t), "this code assumes a 32-bit float");

int main(void){
	float n1;
	uint32_t n1int;//raw bits of the original float
	uint32_t n2int;//raw bits of the rebuilt float
	float n2;

	uint32_t exp,mant,sign;//extracted from original float as unsigned integers
	uint32_t exps,mants,signs;//shifted back to original bit locations for rebuilding

	for(float i=-2; i < 2; i+=0.25f){
		n1=i;

		//copy the float's bits into an integer; memcpy avoids the undefined behavior of casting float* to int*
		memcpy(&n1int, &n1, sizeof(n1int));

		//parse float components
		sign = (n1int >> 31) & 0x1;//right-shift by 31 bits, leaving sign bit in bit 0, keep 1 bit
		exp = (n1int >> 23) & 0xFF;//right-shift by 23 bits, leaving exponent in bits 0-7, keep 8 bits
		mant = n1int & 0x7FFFFF;//no shift, mantissa is in bits 0-22, keep 23 bits

		//print out float value and component values
		cout << "n1: " << n1 << " sign: " << sign << " exp: " << exp << " mant: " << mant << endl;

		//rebuild new float from components
		signs = (sign << 31) & 0x80000000;//left-shift sign bit back to original location
		exps = (exp << 23) & 0x7F800000;//left-shift exponent bits back to original location
		mants = mant & 0x007FFFFF;//mantissa is already in original location
		n2int=signs|exps|mants;//bit-wise OR places all bits into correct float locations
		memcpy(&n2, &n2int, sizeof(n2));//copy the rebuilt bits back into a float

		//print out reconstructed float value to verify method worked
		cout << "n2: " << n2 << endl;

	}//end for

	return 0;
}//end main