Showing 10 of 10 results
sparklyr
sparklyr: R Interface to Apache Spark
R interface to Apache Spark, a fast and general engine for big data processing; see <https://spark.apache.org/>. This package supports connecting to local and remote Apache Spark clusters, provides a 'dplyr'-compatible back-end, and offers an interface to Spark's built-in machine learning algorithms.
Maintained by Edgar Ruiz. Last updated 12 days ago.
apache-spark, distributed, dplyr, ide, livy, machine-learning, remote-clusters, spark, sparklyr
959 stars 15.20 score 4.0k scripts 21 dependents
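A minimal sketch of the workflow the description refers to: connect, copy a local data frame into Spark, run 'dplyr' verbs against it, and fit one of Spark's built-in ML algorithms. The local master and the mtcars data are illustrative placeholders, not part of the listing:

    library(sparklyr)
    library(dplyr)

    sc <- spark_connect(master = "local")            # local or remote cluster
    cars_tbl <- copy_to(sc, mtcars, overwrite = TRUE)

    # dplyr verbs are translated to Spark SQL and executed on the cluster
    cars_tbl %>%
      group_by(cyl) %>%
      summarise(avg_mpg = mean(mpg, na.rm = TRUE))

    # Spark MLlib through sparklyr's ml_* interface
    fit <- ml_linear_regression(cars_tbl, mpg ~ wt + cyl)

    spark_disconnect(sc)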
r-spark
sparklyr.flint: Sparklyr Extension for 'Flint'
This sparklyr extension makes 'Flint' time series library functionalities (<https://github.com/twosigma/flint>) easily accessible through R.
Maintained by Edgar Ruiz. Last updated 3 years ago.
apache-spark, data-analysis, data-mining, data-science, distributed, distributed-computing, flint, remote-clusters, spark, sparklyr, statistical-analysis, statistics, stats, summarization, summary-statistics, time-series, time-series-analysis, twosigma-flint
9 stars 6.46 score 54 scripts
bnosac
spark.sas7bdat: Read in 'SAS' Data ('.sas7bdat' Files) into 'Apache Spark'
Read in 'SAS' data ('.sas7bdat' files) into 'Apache Spark' from R. 'Apache Spark' is an open source cluster computing framework available at <http://spark.apache.org>. This R package uses the 'spark-sas7bdat' 'Spark' package (<https://spark-packages.org/package/saurfang/spark-sas7bdat>) to import and process 'SAS' data in parallel using 'Spark', thereby allowing 'dplyr' statements to be executed in parallel on top of 'SAS' data.
Maintained by Jan Wijffels. Last updated 4 years ago.
26 stars 6.01 score 23 scripts
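A short sketch of the usage pattern described above, assuming the spark_read_sas() function exported by the package; the connection, file path and table name are placeholders:

    library(sparklyr)
    library(spark.sas7bdat)
    library(dplyr)

    sc <- spark_connect(master = "local")

    # Register the .sas7bdat file as a Spark table
    sas_tbl <- spark_read_sas(sc, path = "path/to/data.sas7bdat", table = "sas_example")

    # dplyr statements then run in parallel on top of the SAS data
    sas_tbl %>% summarise(n = n())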
eu-ecdc
epitweetr: Early Detection of Public Health Threats from 'Twitter' Data
It allows you to automatically monitor trends of tweets by time, place and topic, with the aim of detecting public health threats early through signals such as an unusual increase in the number of tweets. It was designed to focus on infectious diseases, and it can be extended to all hazards or other fields of study by modifying the topics and keywords. More information is available in the peer-reviewed 'epitweetr' publication (doi:10.2807/1560-7917.ES.2022.27.39.2200177).
Maintained by Laura Espinosa. Last updated 1 year ago.
early-warning-systems, epidemic-surveillance, lucene, machine-learning, signal-detection, spark, twitter
56 stars 5.98 score 86 scripts
mlverse
pysparklyr: Provides a 'PySpark' Back-End for the 'sparklyr' Package
It enables 'sparklyr' to integrate with 'Spark Connect' and 'Databricks Connect' by providing a wrapper over the 'PySpark' 'Python' library.
Maintained by Edgar Ruiz. Last updated 5 days ago.
databricks, pyspark, spark, spark-connect
15 stars 5.58 score 13 scripts
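A hedged sketch of connecting through this back-end: with pysparklyr loaded, sparklyr's spark_connect() can target a Spark Connect endpoint via the PySpark wrapper. The method value, master URL and Spark version below follow my reading of the package documentation and are assumptions, not part of the listing:

    library(sparklyr)
    library(pysparklyr)

    # Spark Connect session brokered through the PySpark library
    sc <- spark_connect(
      master  = "sc://localhost",
      method  = "spark_connect",
      version = "3.5"
    )

    spark_disconnect(sc)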
rstudio
graphframes: Interface for 'GraphFrames'
A 'sparklyr' <https://spark.rstudio.com/> extension that provides an R interface for 'GraphFrames' <https://graphframes.github.io/>. 'GraphFrames' is a package for 'Apache Spark' that provides a DataFrame-based API for working with graphs. Functionality includes motif finding and common graph algorithms, such as PageRank and breadth-first search.
Maintained by Kevin Kuo. Last updated 6 years ago.
graphframes, graphs, pagerank, spark, sparklyr
37 stars 5.19 score 84 scripts
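A brief sketch of the DataFrame-based graph API as exposed through this extension, assuming the gf_graphframe() and gf_pagerank() wrappers; the toy vertex and edge tables (and the parameter values passed to PageRank) are illustrative:

    library(sparklyr)
    library(graphframes)
    library(dplyr)

    sc <- spark_connect(master = "local")

    vertices <- copy_to(sc, data.frame(id = c("a", "b", "c")), name = "vertices")
    edges    <- copy_to(sc, data.frame(src = c("a", "b", "c"),
                                       dst = c("b", "c", "a")), name = "edges")

    # Build a GraphFrame and run PageRank on it
    g <- gf_graphframe(vertices = vertices, edges = edges)
    gf_pagerank(g, reset_probability = 0.15, tol = 0.01)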
mlr-org
mlr3db: Data Base Backend for 'mlr3'
Extends the 'mlr3' package with a backend to transparently work with databases such as 'SQLite', 'DuckDB', 'MySQL', 'MariaDB', or 'PostgreSQL'. The package provides two additional backends: 'DataBackendDplyr' relies on the abstraction of package 'dbplyr' to interact with most DBMS, while 'DataBackendDuckDB' operates on 'DuckDB' databases and also on Apache Parquet files.
Maintained by Michel Lang. Last updated 1 year ago.
bigquery, data-backend, database, duckdb, machine-learning, mariadb, mlr3, mysql, odbc, postgresql, spark, sqlite
21 stars 4.77 score 17 scripts
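A sketch of the 'DataBackendDplyr' route described above, assuming as_data_backend() accepts a lazy 'dbplyr' table when given an explicit primary key; the in-memory 'DuckDB' database, the iris copy and the row_id column are illustrative:

    library(mlr3)
    library(mlr3db)
    library(DBI)
    library(dplyr)

    # Illustrative DuckDB database holding a copy of iris with an explicit id column
    con <- DBI::dbConnect(duckdb::duckdb())
    DBI::dbWriteTable(con, "iris", cbind(row_id = seq_len(nrow(iris)), iris))

    # Wrap the lazy table in a DataBackendDplyr; mlr3 then queries it transparently
    backend <- as_data_backend(dplyr::tbl(con, "iris"), primary_key = "row_id")
    backend$head()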
miraisolutions
sparkbq: Google 'BigQuery' Support for 'sparklyr'
A 'sparklyr' extension package providing an integration with Google 'BigQuery'. It supports direct import/export where records are directly streamed from/to 'BigQuery'. In addition, data may be imported/exported via intermediate data extracts on Google 'Cloud Storage'.
Maintained by Martin Studer. Last updated 3 years ago.
19 stars 4.58 score 4 scripts
chezou
sparkavro: Load Avro file into 'Apache Spark'
Load Avro files into 'Apache Spark' using 'sparklyr'. This allows reading files in the 'Apache Avro' format <https://avro.apache.org/>.
Maintained by Aki Ariga. Last updated 5 years ago.
12 stars 4.53 score 14 scripts
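A small sketch of loading an Avro file through this extension, assuming the spark_read_avro() function the package exports; the file path and table name are placeholders:

    library(sparklyr)
    library(sparkavro)

    sc <- spark_connect(master = "local")

    # Register the Avro file as a Spark DataFrame named "events"
    events <- spark_read_avro(sc, name = "events", path = "data/events.avro")

    head(events)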
nathaneastwood
catalog: Access the 'Spark Catalog' API via 'sparklyr'
Gain access to the 'Spark Catalog' API making use of the 'sparklyr' API. 'Catalog' <https://spark.apache.org/docs/2.4.3/api/java/org/apache/spark/sql/catalog/Catalog.html> is the interface for managing a metastore (aka metadata catalog) of relational entities (e.g. database(s), tables, functions, table columns and temporary views).
Maintained by Nathan Eastwood. Last updated 3 years ago.
spark, sparklyr, sparklyr-extension
4 stars 3.63 score 16 scripts
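A hedged sketch of how the extension is meant to be used, assuming wrapper functions such as list_databases() and list_tables() that mirror the underlying 'Catalog' methods; the copied mtcars table is only there to give the metastore something to list:

    library(sparklyr)
    library(catalog)

    sc <- spark_connect(master = "local")
    copy_to(sc, mtcars)

    # Inspect the metastore through the Catalog API
    list_databases(sc)
    list_tables(sc)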