- Giannis Siglidis
- Giannis Nikolentzos
- Stratis Limnios
- Christos Giatsidis
- Konstantinos Skianis
Host laboratory: LIX
Graph-structured data has become increasingly important in the recent literature. Measuring the similarity between such data has always been considered a difficult task, as it is related to the graph isomorphism problem. As a result, machine learning applications have focused on vector representations of data. A field of research known as “graph kernels” tries to address that problem by balancing complexity with expressivity. With efficient and successful applications ranging from predicting the toxicity of compounds in chemoinformatics and the mutagenicity of compounds in bioinformatics to malware detection and text categorization, it has emerged as a promising approach. As these tools have not yet been collected in a complete and modern programming library, building one is an important and promising task as they mature. GraKeL has been proposed as the project that will both meet and stimulate the need for graph kernels in contemporary machine learning research and applications.
Technical description of the project
GraKeL is a library written in Python. As scikit-learn has become a very important package for machine learning in Python, GraKeL has been designed to be scikit-learn-compatible, allowing kernels to be integrated almost transparently into pipelines for complicated machine learning tasks. GraKeL currently implements 15 diverse graph kernels and 3 frameworks (which operate on top of the other kernels). Like many modern scientific computing libraries, it uses tools such as Cython to include fast and reliable code from C++. The project supports collaborative work through a GitHub repository (see https://github.com/ysig/GraKeL) and is released under a free-software license (namely BSD 3-clause). It also includes semi-automatically generated Sphinx documentation (uploaded at https://ysig.github.io/GraKeL/dev/) and uses continuous integration tools to ensure that it builds and runs successfully on all supported operating systems: Linux, Windows and OSX. It can be found on both PyPI and Anaconda Cloud (https://pypi.org/project/grakel-dev/ and https://anaconda.org/ysig/grakel-dev), with binaries provided on both platforms for all the supported operating systems. Finally, the project has been submitted to JMLR; the submission paper can be found at https://arxiv.org/abs/1806.02193.
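To make the scikit-learn-style design more concrete, here is a minimal sketch of what a graph kernel exposing the `fit` / `transform` / `fit_transform` convention could look like. It is an illustration only and not GraKeL's implementation: the class name `VertexHistogramKernel`, the `normalize` parameter, and the representation of a graph as a dictionary mapping vertices to labels are all assumptions made for this example. The kernel used is a simple vertex-label histogram kernel (the inner product of label counts), chosen because it fits in a few lines of pure Python.

```python
from collections import Counter


class VertexHistogramKernel:
    """Toy graph kernel following the fit/transform convention.

    Hypothetical illustration, not GraKeL's code. A graph is assumed to be
    a dict mapping each vertex to its label; edges are ignored by this kernel.
    """

    def __init__(self, normalize=False):
        self.normalize = normalize

    @staticmethod
    def _histogram(graph):
        # Count how many vertices carry each label.
        return Counter(graph.values())

    @staticmethod
    def _dot(h1, h2):
        # Inner product of two sparse label histograms
        # (a Counter returns 0 for labels it has not seen).
        return sum(count * h2[label] for label, count in h1.items())

    def fit(self, graphs, y=None):
        # Remember the histograms of the training graphs.
        self.fit_histograms_ = [self._histogram(g) for g in graphs]
        self.fit_diagonal_ = [self._dot(h, h) for h in self.fit_histograms_]
        return self

    def transform(self, graphs):
        # Kernel matrix of shape (n_new_graphs, n_fitted_graphs).
        rows = []
        for g in graphs:
            h = self._histogram(g)
            row = [self._dot(h, hf) for hf in self.fit_histograms_]
            if self.normalize:
                # Cosine-style normalization; empty graphs map to 0.0.
                self_sim = self._dot(h, h)
                row = [
                    k / (self_sim * d) ** 0.5 if self_sim and d else 0.0
                    for k, d in zip(row, self.fit_diagonal_)
                ]
            rows.append(row)
        return rows

    def fit_transform(self, graphs, y=None):
        return self.fit(graphs).transform(graphs)


# Usage: three tiny labelled graphs (vertex -> label).
train = [
    {0: "C", 1: "C", 2: "O"},
    {0: "C", 1: "O", 2: "O"},
]
test = [{0: "C", 1: "N"}]

kernel = VertexHistogramKernel()
K_train = kernel.fit_transform(train)  # [[5, 4], [4, 5]]
K_test = kernel.transform(test)        # [[2, 1]]
```

Matrices of this shape (training graphs against training graphs, then new graphs against training graphs) are what scikit-learn estimators such as `SVC(kernel="precomputed")` accept, which is why following this convention lets a kernel be placed in a pipeline.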
GraKeL can be a first step both for its own expansion and for a more general project, namely machine learning with graphs. Concerning the first, GraKeL can be further optimized in terms of kernel performance (e.g. scalability) as well as code structure. Its object-oriented structure can also be extended to integrate successful graph kernel frameworks (such as deep graph kernels or optimal-assignment kernels) in a consistent way. Concerning the second, GraKeL can become part of a much bigger library that brings together methods for machine learning with graphs, as this field is becoming more and more promising.
- T. Gartner et al. “On graph kernels: Hardness results and efficient alternatives.” COLT 2003
- K. Borgwardt et al. “Protein function prediction via graph kernels.” Bioinformatics 2005
- Gascon et al. “Structural detection of android malware using embedded call graphs.” AISec 2013
- G. Nikolentzos et al. “Shortest-Path Graph Kernels for Document Similarity.” EMNLP 2017
- S. V. N. Vishwanathan et al. “Graph Kernels.” JMLR 2010
- M. Sugiyama et al. “graphkernels: R and Python packages for graph comparison.” Bioinformatics 2018
- F. Pedregosa et al. “Scikit-learn: Machine Learning in Python.” JMLR 2011
- G. Siglidis et al. “GraKeL: A Graph Kernel Library in Python.” arXiv 2018 (Submitted to JMLR)
- N. M. Kriege et al. “On Valid Optimal Assignment Kernels and Applications to Graph Classification.” NIPS 2016
- P. Yanardag and S.V.N. Vishwanathan. “Deep Graph Kernels.” KDD 2015