Technical challenges for big data in biomedicine and health: data sources, infrastructure, and analytics.

Research paper by N Peek, J H Holmes, J Sun

Indexed on: 16 Aug '14
Published on: 16 Aug '14
Published in: Yearbook of medical informatics


This paper reviews technical and methodological challenges for big data research in biomedicine and health.

We discuss sources of big datasets, survey infrastructures for big data storage and big data processing, and describe the main challenges that arise when analyzing big data.

The life and biomedical sciences are contributing massively to the big data revolution through secondary use of data that were collected during routine care and through new data sources such as social media. Efficient processing of big datasets is typically achieved by distributing computation over a cluster of computers. Data analysts should be aware of pitfalls related to big data, such as bias in routine care data and the risk of false-positive findings in high-dimensional datasets.

The major challenge for the near future is to transform the analytical methods used in the biomedical and health domain to fit the distributed storage and processing model that is required to handle big data, while ensuring confidentiality of the data being analyzed.
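The risk of false-positive findings in high-dimensional datasets can be illustrated with a small calculation. The sketch below is not taken from the paper; it assumes independent tests in which every null hypothesis is true, and the function names `familywise_error_rate` and `bonferroni_threshold` are hypothetical helpers introduced only for this illustration. The Bonferroni correction is shown as one common remedy, not necessarily the one the authors recommend.

```python
def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across num_tests independent
    tests of true null hypotheses, each run at significance level alpha.
    (Independence is an assumption made for this illustration.)"""
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_threshold(num_tests: int, alpha: float = 0.05) -> float:
    """Per-test significance level under the Bonferroni correction, which
    keeps the family-wise error rate at or below alpha."""
    return alpha / num_tests


if __name__ == "__main__":
    # As the number of tested variables grows, an uncorrected analysis
    # becomes almost certain to report at least one spurious finding.
    for m in (1, 10, 100, 10_000):
        uncorrected = familywise_error_rate(m)
        corrected = familywise_error_rate(m, bonferroni_threshold(m))
        print(f"tests={m:>6}  uncorrected={uncorrected:.4f}  "
              f"bonferroni={corrected:.4f}")
```

With a few hundred variables tested at the conventional 0.05 level, the uncorrected chance of at least one spurious finding is already close to 1, which is why analyses of high-dimensional data usually need some form of multiple-testing control.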