Big data execution time based on Spark Machine Learning Libraries

Title	Big data execution time based on Spark Machine Learning Libraries
Publication Type	Conference Paper
Year of Publication	2019
Authors	Garate-Escamilla AKaren, Hassani AHajjam El, Andres E
Conference Name	Proceedings of 2019 3rd International Conference on Cloud and Big Data Computing (ICCBDC 2019)
Publisher	Association for Computing Machinery
Conference Location	1515 BROADWAY, NEW YORK, NY 10036-9998 USA
ISBN Number	978-1-4503-7165-0
Keywords	Apache Spark, Execution time prediction, Machine learning, Performance prediction model
Abstract	The paper explores the time consumption of supervised and unsupervised models in the Apache Spark framework on massive datasets. Big data analytics has become relevant in industry because of the need to convert information into knowledge. Among the challenges of big data is the creation of strategies to reduce the execution cost of running machine learning models to make a prediction. Apache Spark is a powerful in-memory platform that offers an extensive machine learning library for regression, classification, clustering, and rule extraction. This investigation performs different experiments on real datasets from a computation cost perspective. The main contribution of the paper is a comparison of the execution time of different machine learning models, such as random forests, decision trees, logistic regression, linear support vector machines, and kNN. The work aims to combine the areas of big data and machine learning, comparing results across different configurations and with the optimization methods cache and persist. The evaluation experiments show that logistic regression had the shortest execution time of the Spark MLlib models.
DOI	10.1145/3358505.3358519