Over the last 6 months Raphtory has gone through a full rebuild, with the majority of the original project deprecated as
raphtory-akka. This includes replacing all the underlying tech stack, remaking both ingestion and analysis API’s, totally reworking local and distributed deployment and adding on a host of wonderful bells and whistles to boot. As such we have had a bit of a rebrand and are considering this the official Raphtory 0.1.0 release!
Below is a small summary of the new features that have been introduced. We shall soon be following up with individual blog posts on each. You can also read an in-depth dive in our Documentation.
Thanks to everyone who helped bring this together! Looking forward to the many exciting things we have planned for the future of Raphtory
Getting data in and out of Raphtory
Spouts and Graph Builders have been rewritten to be more flexible.
- They now interact via any serialisable class instead of just strings.
- Spouts now extend Iterable, requiring only a
hasNext()function - making them far easier to get non-standard datasources into.
- Output of algorithms is now handled by the Sink and Format interfaces. These define how to output your results to a location and the serialised format this should take.
- This allows code to connect to say AWS S3 to be defined once, and used in combination with a format for CSV, TSV, JSON, XML, etc
- This also enables us to define global formats, including all perspectives i.e. a singular valid JSON object for your whole query instead of individual JSON objects per-vertex as before.
- We have added a new connectors sub-project to the repository containing all the different Spouts and Sinks we support out of the box. These can be imported into your project as required and support a variety of things from AWS S3 to the Twitter Firehose.
Graph and Algorithmic Engine
tabularise()as the main algorithmic flow operators can now be performed in any sequence.
- Algorithms may be composed together with the
- Small amounts of global graph state can be stored as aggregators such as sum, product, min, max, any, all, within the algorithmic flow.
- Histogram API for storing distributions of vertex and edge quantities for algorithms and extracting quantiles.
- Introduction of vertex and edge filters for creating subgraph views of a perspective.
- Many new algorithms including three-node motifs, temporal taint tracking, max flow, prisoners’ dilemma and more.
- Temporal Multilayer View for modelling a temporal graph as a multilayer graph of snapshots with interlayer edges.
- Support and convenience functions for weighted networks and merging strategies for converting multiple temporal edges into a single weighted edge within a perspective.
- A host of convenience functions within the Vertex and Edge visitor objects, enabling cleaner algorithm code.
Graph perspective API
- More flexible ways of time slicing for expressing temporal queries using function composition. (see the documentation page for a full description of the new API on this).
Name changes of
- Support for natural language time descriptors on top of existing long timestamp specification. E.g.
25 June 2022, windowsize = 1 day.
- Increments and windows may be composed of multiple time frames with commas and ‘and’ to allow natural text. For instance, the interval “1 month 1 week 3 days” can equally be rewritten as “1 month, 1 week, and 3 days”
- Full handling of time including leap years, different month lengths and time zones.
- Raphtory now runs predominantly on top of Apache Pulsar, with Akka being used for analysis control messages.
- This means that all components are decoupled and can message eachother without fear of data being lost or causing a crash due to huge amounts of backpressure.
- This messaging is built on top of a communication layer abstraction which allows the medium for each topic (comms between two component types) to be set within conf. In later versions of Raphtory this will be exposed to the user, allowing Raphtory to run on other message brokers/technology stacks.
- Cluster management is now handled by zookeeper, which provides a central location to track partition IDs and store addresses for service discovery.
- We have begun a transition into Typelevel Cats for better state management and execution. This will be finalised in the next version.
- Raphtory has a new deployment API where local deployments are created through either
Raphtory.load()for closed datasets or
Raphtory.stream()for streaming datasets.
- This returns a Temporal Graph where queries can be built up in a much more expressive fashion as explored above.
- The Raphtory Service for distributed deployments has been fully overhauled, allowing all components to be easily spun up on bare-metal or as a container. For those wanting to give this a try, the process is fully documented on our ReadTheDocs page described below.
- To support automation and large scale deployments Raphtory is fully integrated with Kubernetes.
- We have even created a Deploy sub-project in the repo that will allow you to automatically spin up and shut down Raphtory components via fabric8.
- Whether you are running a local deployment or a distributed deployment you can now spin up a Client which can attach and submit new queries, with the results output to any Sink specified.
- This client interacts with Raphtory through the same TemporalGraph API as the local deployments.
- Raphtory 0.1.0 is now available on Maven, meaning you no longer need to build any of the jars. This includes the core, connectors and deploy packages.
Testing, Logging and Metrics
- Testing suite for algorithmic correctness and integration of all Raphtory components.
- CI/CD pipeline for all PR’s to Raphtory complete with test and build of core, connectors and the examples.
- The scourge of println’s has been replaced with logging throughout all packages, appropriately levelled and configurable through environment variables
- Metrics have been reenabled and expanded, encompassing the full ingestion and analysis pipeline. These are handled by Telemetry and scraped via Prometheus.
Documentation and Examples
- Getting set up with Raphtory, understanding the underlying frameworks and creating your own projects on top is fully explained in our new ReadTheDocs tutorials.
- Several example projects are now available within the repository covering social networks, cryptocurrencies, interaction networks and more. All of these can be used as a basis for your own applications.
- Every user facing function is fully commented and searchable via ScalaDocs.
- All algorithms included as part of Raphtory core have their purpose and parameters fully explained here.
Python Raphtory Client (Alpha)
- A Raphtory client for Python has been created for submitting established queries to a running Raphtory instance, meaning it is not necessary to interact with the Scala code after the initial setup, if preferred.
- Algorithm results can be outputted to this client for postprocessing and visualisation in Python.
- This is a new feature and will become more established with functionality and stability in the next release.
- Far too many to list here due to the full rewrite. But trust me there were a lot.