1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames.
Updated Apr 22, 2026 - Python
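As a rough illustration of what "data quality profiling" computes per column, here is a minimal standard-library sketch. It is not the listed project's API. The function name `profile_columns`, the record-dict input, and the chosen statistics (non-missing count, missing count, distinct values, most common value) are all assumptions made for illustration.

```python
from collections import Counter
from typing import Any, Dict, List


def profile_columns(rows: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Hypothetical helper: simple per-column quality stats for a list of record dicts.

    A value counts as missing if it is None or the key is absent from a row.
    Values must be hashable (they are tallied with a Counter).
    """
    columns = sorted({key for row in rows for key in row})
    report: Dict[str, Dict[str, Any]] = {}
    for col in columns:
        values = [row.get(col) for row in rows]  # absent keys become None
        present = [v for v in values if v is not None]
        counts = Counter(present)
        report[col] = {
            "count": len(present),
            "missing": len(values) - len(present),
            "distinct": len(counts),
            "most_common": counts.most_common(1)[0][0] if counts else None,
        }
    return report


# Made-up sample records.
rows = [
    {"city": "Bogota", "temp": 14.0},
    {"city": "Medellin", "temp": None},
    {"city": "Bogota"},
]
for name, stats in profile_columns(rows).items():
    print(name, stats)
```

Real profiling tools report far more (types, distributions, correlations, alerts). This sketch only shows the basic idea of one call that summarizes every column.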
A multi-cloud framework for big data analytics and embarrassingly parallel jobs that provides a universal API for building parallel applications in the cloud ☁️🚀
Graph Sampling is a Python package containing various approaches for sampling an original graph at different sample sizes.
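One of the simplest graph sampling approaches is random node sampling: pick a fixed number of nodes uniformly at random and keep only the edges among them (the induced subgraph). The sketch below is an illustrative assumption, not the package's API. The adjacency-set representation and the name `random_node_sample` are hypothetical.

```python
import random
from typing import Dict, Set

Graph = Dict[int, Set[int]]  # undirected graph as symmetric adjacency sets


def random_node_sample(graph: Graph, sample_size: int, seed: int = 0) -> Graph:
    """Hypothetical helper: random node sampling with an induced subgraph."""
    if not 0 <= sample_size <= len(graph):
        raise ValueError("sample_size must be between 0 and the number of nodes")
    rng = random.Random(seed)  # seeded so a sample is reproducible
    chosen = set(rng.sample(sorted(graph), sample_size))
    # Keep only neighbours that were also sampled, so the result stays a valid graph.
    return {node: graph[node] & chosen for node in chosen}


# A 6-cycle: 0-1-2-3-4-5-0
cycle = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
for size in (2, 4, 6):
    sub = random_node_sample(cycle, size, seed=42)
    edges = sum(len(nbrs) for nbrs in sub.values()) // 2
    print(size, sorted(sub), edges, "edges")
```

Random node sampling is cheap, but it tends to break up connectivity at small sample sizes. Traversal-based approaches such as random-walk sampling exist partly to preserve local structure better.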
Bucketize images based on exhaust data and AI-generated data. Topics: industry-solutions, azure, azure machine learning services, computer-vision, big data, big data analytics, machine learning, image recognition, manufacturing, quality control, cognitive services.
The goal of this project is to offer an AWS EMR template, using Spot Fleet and On-Demand Instances, that you can deploy quickly so you can focus on writing PySpark code.
A plugin offering views, operators, sensors, and more, developed at Pandora Media.
ARAKAT - Big Data Analysis and Business Intelligence Application Development Platform
Support material for courses, Facultad de Minas, Universidad Nacional de Colombia
This project analyzes student performance and correlates it with various attributes, then determines the most suitable algorithm from a set of candidates.
YiraBot: Simplifying Web Scraping for All. A user-friendly tool for developers and enthusiasts, offering command-line ease and Python integration. Ideal for research, SEO, and data collection.
IoT and big data analytics using Apache Kafka, Spark, and other AWS services
This project builds a scalable log analytics pipeline using the Lambda architecture for real-time and batch processing of NASA server logs.
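The Lambda architecture splits work into a batch layer (periodically recomputes a view over the full, immutable dataset), a speed layer (incrementally updates a real-time view for events that arrived since the last batch run), and a serving layer (answers queries by merging the two views). Below is a minimal in-memory sketch of that split, counting requests per HTTP status. It assumes Common Log Format lines. All names and the sample log lines are hypothetical, and the real project runs these layers on distributed tooling rather than Python lists.

```python
from collections import Counter
from typing import Iterable


def status_of(line: str) -> str:
    """Extract the HTTP status code from a Common Log Format line."""
    # The last two space-separated fields are the status code and the byte count ("-" if none).
    _, status, _ = line.rsplit(" ", 2)
    return status


def batch_view(master_log: Iterable[str]) -> Counter:
    """Batch layer: recompute counts from scratch over the full log."""
    return Counter(status_of(line) for line in master_log)


class SpeedLayer:
    """Speed layer: incrementally count events not yet covered by the batch view."""

    def __init__(self) -> None:
        self.realtime_view: Counter = Counter()

    def on_event(self, line: str) -> None:
        self.realtime_view[status_of(line)] += 1

    def reset(self) -> None:
        # Called after a batch run has absorbed these events into the master log.
        self.realtime_view.clear()


def query(batch: Counter, speed: SpeedLayer) -> Counter:
    """Serving layer: merge the batch view with the real-time view."""
    return batch + speed.realtime_view


# Made-up log lines in Common Log Format.
master = [
    'host1 - - [01/Aug/1995:00:00:01 -0400] "GET /images/a.gif HTTP/1.0" 200 1839',
    'host2 - - [01/Aug/1995:00:00:07 -0400] "GET /missing.html HTTP/1.0" 404 -',
]
speed = SpeedLayer()
speed.on_event('host3 - - [01/Aug/1995:00:01:00 -0400] "GET /index.html HTTP/1.0" 200 7074')
print(query(batch_view(master), speed))
```

The design trade-off is that the batch view is always recomputable and correct but stale, while the speed layer is fresh but only covers recent events. Merging them at query time gives answers that are both complete and current.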
Repository for the Big Data Specialization from the University of California San Diego on Coursera
Big Data Analytics lab repository with weekly tutorials, exercises and other resources.
Real-time YouTube comment sentiment analysis using Kafka, Spark, and a Streamlit dashboard.
In this tutorial, we explain how to get real-time analytics of the energy produced and consumed by two solar-station simulators, using InfluxDB together with Grafana hosted on Google Kubernetes Engine.
Building a next-generation hybrid data pipeline architecture that combines the power of Microsoft Fabric, Azure Cloud, and Power BI. This pipeline is engineered to tackle the challenges of real-time data ingestion, multi-layered processing, and analytics, delivering business-critical insights.
EpiData IoT Data Science Platform - Community Edition
This repository contains the reimplementation and extension of ViHateT5 (ACL 2024 Findings), a unified text-to-text framework for Vietnamese hate speech detection. Developed for the course DS200.Q21 – Big Data Analysis at the University of Information Technology (UIT – VNU-HCM).
SSVC Ore Miner - www.rapticore.com