Big data is a term used to refer to data sets that are too large or complex for traditional data-processing application software to deal with adequately. Data with many cases (rows) offer greater statistical power, while data with higher complexity (more attributes or columns) may lead to a higher false discovery rate.
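To make the false-discovery point concrete, the following sketch (an illustration, not a method from the source) builds one target column of random noise and many unrelated noise columns, then counts how many noise columns pass an approximate 5% significance cutoff on Pearson correlation. The function names, the use of a fixed random seed, and the normal-approximation cutoff 1.96/sqrt(n) are assumptions chosen for illustration.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def count_false_discoveries(n_rows, n_cols, seed=0):
    """Count pure-noise columns that look 'significant' against a pure-noise target.

    No column is truly related to the target, so every hit is a false discovery.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_rows)]
    # Approximate two-sided 5% cutoff for |r| when the true correlation is zero.
    cutoff = 1.96 / math.sqrt(n_rows)
    hits = 0
    for _ in range(n_cols):
        column = [rng.gauss(0, 1) for _ in range(n_rows)]
        if abs(pearson_r(column, target)) > cutoff:
            hits += 1
    return hits


if __name__ == "__main__":
    for cols in (10, 100, 1000):
        print(cols, "columns ->", count_false_discoveries(200, cols), "false discoveries")
```

Because each noise column has roughly a 5% chance of crossing the cutoff, the absolute number of spurious "findings" grows roughly in proportion to the number of columns, even though no real relationship exists in the data.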

Big data was originally associated with three key concepts: volume, variety, and velocity. Current usage of the term "big data" tends to refer to the use of predictive analytics, user behavior analytics, or certain other advanced data analytics methods that extract value from data, and seldom to a particular size of data set. Scientists encounter the limitations of traditional data-processing software in e-Science work, in fields including meteorology, genomics, connectomics, complex physics simulations, biology, and environmental research. Data sets grow rapidly, in part because they are increasingly gathered by cheap and numerous information-sensing Internet of Things devices such as mobile devices, aerial remote sensing, software logs, cameras, microphones, radio-frequency identification (RFID) readers, and wireless sensor networks.

Relational database management systems, desktop statistical software, and software packages used to visualize data often have difficulty handling big data. The work may require "massively parallel software running on tens, or even thousands of servers". What qualifies as big data varies with the capabilities of those analyzing it; for some organizations, it may take tens or hundreds of terabytes before data size becomes a significant consideration. The term has been in use since the s, with some giving credit to John Mashey for popularizing it.
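One common way to structure such parallel software is to split the data into partitions, process each partition independently, and merge the partial results. The sketch below shows that split/map/reduce pattern on a single machine. Threads from Python's standard concurrent.futures module stand in for separate servers (an assumption for illustration; in CPython they will not actually speed up this CPU-bound work), and the helper names shard, partial_stats, and parallel_mean are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def shard(data, n_shards):
    """Split data into n_shards interleaved partitions (round-robin)."""
    return [data[i::n_shards] for i in range(n_shards)]


def partial_stats(chunk):
    """'Map' step: each worker summarises only its own partition."""
    return (sum(chunk), len(chunk))


def parallel_mean(data, n_workers=4):
    """Mean of data computed from per-partition partial results.

    Threads stand in for separate servers in this single-machine sketch.
    """
    shards = shard(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_stats, shards))
    # 'Reduce' step: combine (sum, count) pairs into one global result.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


if __name__ == "__main__":
    print(parallel_mean(list(range(1, 101))))
```

Each worker returns a (sum, count) pair rather than a per-partition mean, because sums and counts combine exactly in the reduce step even when partitions have different sizes; averaging per-partition means would give a wrong answer for uneven partitions.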

One definition states that "Big data represents the information assets characterized by such a high volume, velocity and variety to require specific technology and analytical methods for its transformation into value".

The growing maturity of the concept more starkly delineates the difference between "big data" and "Business Intelligence". Big data is commonly described by characteristics such as volume, velocity, and variety, and such data must be processed with advanced tools (analytics and algorithms) to reveal meaningful information.

For example, managing a factory requires considering both visible and invisible issues with its components. Information-generation algorithms must detect and address invisible issues such as machine degradation and component wear.
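A minimal way to surface one such invisible issue is to compare recent sensor readings against a healthy baseline. The sketch below assumes a hypothetical stream of vibration readings in which degradation shows up as a rising level; the function name detect_degradation, the window sizes, and the 3-sigma threshold are illustrative assumptions, not a method from the source.

```python
from statistics import mean, stdev


def detect_degradation(readings, baseline_len=20, window=5, z_threshold=3.0):
    """Return the index where a recent window drifts above the baseline, or None.

    The first baseline_len readings are assumed to come from a healthy machine.
    The check is one-sided: only an *increase* is treated as degradation.
    """
    if len(readings) < baseline_len or baseline_len < 2:
        raise ValueError("need at least baseline_len (>= 2) readings")
    baseline = readings[:baseline_len]
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        raise ValueError("baseline has no variation")
    # Standard error of a window mean under the healthy baseline distribution.
    se = sigma / window ** 0.5
    for end in range(baseline_len + window, len(readings) + 1):
        recent = readings[end - window:end]
        z = (mean(recent) - mu) / se
        if z > z_threshold:
            return end - 1  # index of the reading that tipped the window
    return None


if __name__ == "__main__":
    healthy = [1.0, 1.01, 0.99] * 10
    drifting = healthy + [1.0 + 0.05 * k for k in range(1, 11)]
    print(detect_degradation(healthy), detect_degradation(drifting))
```

On readings that stay near the baseline the function returns None; once a window's mean rises well above the baseline it returns the index of the reading that tipped it. A real deployment would choose the monitored signal, the direction of the test, and the thresholds from domain knowledge.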

Big data repositories have existed in many forms, often built by corporations with a special need. Commercial vendors historically offered parallel database management systems for big data beginning in the s. For many years, WinterCorp published a report on the largest databases. Teradata Corporation in marketed the parallel processing DBC system.

