Accelerating R Analytics

Preparing Large Data Volumes and Running R in Eclipse

R is a free programming language and software environment that statisticians and data miners use for analysis and prediction, and it has become known as a visualization tool for large data sets. However, because R holds all of its objects in memory, it cannot work effectively with very large data sets.

The SortCL program in the IRI Voracity big data management platform, or in the standalone IRI CoSort package, is a fast, simple, and cost-effective way to prepare big data for R efficiently, in terms of both job design and runtime performance.

When we ran a SortCL sort, join, and aggregation job before (and instead of) R, the time-to-visualization in tools such as ggplot or qplot was cut in half.

R works only on several small chunks of data and needs several code files to achieve the same result as a single SortCL job. Hadoop is another way to prepare large data sets quickly for R, and Voracity users can run most SortCL jobs seamlessly in MapReduce 2, Spark, Spark Stream, Storm, or Tez without additional programming.
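The general idea of reducing data before it reaches the analysis tool can be sketched in a few lines of Python. This is not SortCL syntax and not the benchmark job. It is a minimal illustration, and the column names (`region`, `amount`) and the sample rows are hypothetical. The function reads one row at a time and keeps only per-group running totals, so memory use depends on the number of groups rather than the number of input rows.

```python
import csv
import io
from collections import defaultdict


def aggregate_stream(lines, key_field, value_field):
    """Sum and count value_field per key_field, reading one row at a time.

    Only the running totals are kept in memory, so memory use grows with
    the number of distinct keys, not with the number of input rows.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(lines):
        key = row[key_field]
        totals[key] += float(row[value_field])
        counts[key] += 1
    # Emit the groups in sorted key order.
    return [(key, totals[key], counts[key]) for key in sorted(totals)]


# Hypothetical sample input; a real job would pass an open file handle instead.
sample = io.StringIO("region,amount\neast,10.5\nwest,4\neast,2.5\nnorth,1\n")
summary = aggregate_stream(sample, "region", "amount")
for region, total, count in summary:
    print(region, total, count)
```

A small summary like this could then be written out and loaded into R for plotting, instead of loading the full detail file.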

More information about the benchmark, and about how SortCL can prepare data in the same Eclipse environment (via the StatET for R plug-in for IRI Workbench), is available here.
