Applying Compression to Accelerate Big Data Searches


Guest_Jim_*

Big data, the practice of building very large datasets for analysis, has come to improve many fields of study. Large datasets can be a double-edged sword though, as they can be slow to search through. Researchers at MIT found that compressing these datasets can actually speed searches up, and now they have explained why that is and how it can be applied to other kinds of data.

The compression scheme the researchers have been employing takes advantage of two properties of many datasets. One is low metric entropy and the other is low fractal dimension. Low metric entropy means the data occupies only a small portion of the total space of possibilities, while low fractal dimension means the density of data points does not vary greatly as you move through the set. These properties allowed the researchers to effectively compress the datasets they were working with, which contained genomic information, by identifying spheres of related data points and representing each sphere with a single example. This makes searching faster, as examining only the representative points allows much of the data to be skipped over entirely.
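As a rough illustration of the idea (not the researchers' actual implementation, which works on genomic sequences with their own similarity measures), here is a minimal Python sketch of representative-based search: points within a chosen radius of a representative form a sphere, and at query time only spheres whose representative lies close enough to possibly contain a match are examined. The Euclidean distance, the radius and threshold values, and all names here are illustrative assumptions.

```python
# Illustrative sketch only: simple numeric vectors, Euclidean distance,
# and made-up radius/threshold parameters stand in for the real work's
# biological sequences and similarity measures.
import math
import random

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_clusters(points, radius):
    """Greedily group points into spheres of the given radius.

    Each sphere is stored as (representative, members); a point within
    'radius' of an existing representative joins that sphere, otherwise
    it becomes the representative of a new sphere.
    """
    clusters = []
    for p in points:
        for rep, members in clusters:
            if dist(p, rep) <= radius:
                members.append(p)
                break
        else:
            clusters.append((p, [p]))
    return clusters

def search(clusters, query, threshold, radius):
    """Return all points within 'threshold' of the query.

    Only spheres whose representative is within threshold + radius can
    contain a match (triangle inequality), so the rest are skipped.
    """
    hits = []
    for rep, members in clusters:
        if dist(query, rep) <= threshold + radius:
            hits.extend(p for p in members if dist(query, p) <= threshold)
    return hits

if __name__ == "__main__":
    random.seed(0)
    data = [(random.random(), random.random()) for _ in range(10_000)]
    clusters = build_clusters(data, radius=0.05)
    matches = search(clusters, query=(0.5, 0.5), threshold=0.02, radius=0.05)
    print(len(clusters), "spheres,", len(matches), "matches")
```

The reason the skipping is safe in this sketch is the triangle inequality: any point within the search threshold of the query belongs to a sphere whose representative is within the threshold plus the sphere radius, so spheres outside that bound cannot hide a match.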

While the work began with genomic sequences, which do tend to be very similar to each other, it should be possible to successfully apply this method to other datasets. One example could be the analysis of Internet usage, where behavior is likely to be similar along biological and/or cultural lines.

Source: MIT


