Automatic taxonomy generation: Issues and possibilities
Raghu Krishnapuram, Krishna Kummamuru
IFSA 2003
The rising need for custom machine learning (ML) algorithms and the growing data sizes that require the exploitation of distributed, data-parallel frameworks such as MapReduce or Spark, pose significant productivity challenges to data scientists. Apache SystemML addresses these challenges through declarative ML by (1) increasing the productivity of data scientists as they are able to express custom algorithms in a familiar domain-specific language covering linear algebra primitives and statistical functions, and (2) transparently running these ML algorithms on distributed, data-parallel frameworks by applying cost-based compilation techniques to generate efficient, low-level execution plans with in-memory single-node and large-scale distributed operations. This paper describes SystemML on Apache Spark, end to end, including insights into various optimizer and runtime techniques as well as performance characteristics. We also share lessons learned from porting SystemML to Spark and declarative ML in general. Finally, SystemML is open-source, which allows the database community to leverage it as a testbed for further research.
Raghu Krishnapuram, Krishna Kummamuru
IFSA 2003
Limin Hu
IEEE/ACM Transactions on Networking
Corneliu Constantinescu
SPIE Optical Engineering + Applications 2009
Erich P. Stuntebeck, John S. Davis II, et al.
HotMobile 2008