Wiktor K. Macura
Manhattan, New York
wmacura@gmail.com
Work and Projects
Square / Engineering Manager (2013–2016)
- I joined as the first engineer in the NYC office to help build our engineering presence: sourced, recruited, interviewed, and managed engineers. Worked with leadership in SF to establish the local hiring/training/promoting process, stretching existing processes when appropriate. NYC engineering grew to 30 people with half reporting to me.
- Employee Management and Multi-Location Support: Recognizing that we could not get merchants with multiple locations to use Square, I built and led a team to add support for employee authentication and tracking, as well as multi-location management. We launched in 2015, and within a year it was our largest SaaS product. As a full-stack team, we worked in Rails, Go, Java, JavaScript, iOS, and Android.
- Retail: I took over the largest engineering team in NYC and led development of a new point of sale targeted at retail merchants. I was responsible for our overall technical direction as well as managing the 15-person engineering team. We launched in early 2017.
- Large Merchant Platform: Served as a senior technical advisor in our relationships with large merchants using Square (Uniqlo, Whole Foods, etc.). Regularly met with executives on both sides and was responsible for our technical infrastructure.
Tumblr / Lead, Platform Infrastructure (2011–2013)
- Led the Data Pipeline project (4 FTEs), responsible for the live stream of user events that feeds other services (spam, search, analytics, etc.).
- Built the metrics and real-time analytics service at Tumblr (on top of OpenTSDB).
- Acknowledging Kafka Consumer: Built a custom Kafka consumer that implements individual message acknowledgment semantics within each client. Following the model of ZookeeperConsumer, I implemented all logic in the client with ZooKeeper providing synchronization. Messages are elastically sharded across clients within the same consumer group. Combined with efforts to tune our hardware configuration, this improved message lag from ~4 seconds to 75ms (at the 99.9th percentile).
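The heart of per-message acknowledgment is tracking out-of-order acks and committing only the highest contiguous offset. A minimal Python sketch of that bookkeeping (the production client was JVM-based with ZooKeeper coordination; `AckTracker` and its methods are illustrative names):

```python
class AckTracker:
    """Tracks per-message acknowledgments for one partition and reports the
    highest offset safe to commit: every message at or below it is acked,
    so a restart redelivers only genuinely unacknowledged messages."""

    def __init__(self, start_offset):
        self._committed = start_offset - 1  # last contiguously acked offset
        self._pending = set()               # acks received out of order

    def ack(self, offset):
        if offset <= self._committed:
            return  # duplicate acknowledgment
        self._pending.add(offset)
        # Advance the commit point across any now-contiguous run of acks.
        while self._committed + 1 in self._pending:
            self._committed += 1
            self._pending.remove(self._committed)

    def committable(self):
        """Offset to persist (e.g. to ZooKeeper) as the consumer's position."""
        return self._committed
```

Acknowledging offsets 0 and 2 leaves the committable position at 0; only once offset 1 arrives does it advance to 2.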
- Metrics Aggregator and Dashboarding: Built a collectd extension in Scala to filter, aggregate, and write metrics into an OpenTSDB cluster. The extension also exposes an endpoint for aggregating business metrics from the web application. The largest deployed Scala service at Tumblr, it runs on every node, producing 150,000 datapoints per second. Built a dashboard frontend to OpenTSDB that replaced various Graphite, Munin, and Cacti installations.
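The aggregate-before-write step can be sketched as follows (a toy Python stand-in for the Scala collectd extension; averaging per flush is an assumed policy):

```python
from collections import defaultdict

class MetricAggregator:
    """Buffers datapoints and emits one aggregated value per (metric, tags)
    key on each flush, cutting write volume to the time-series store.
    Averaging is an assumed policy; real deployments choose per metric."""

    def __init__(self):
        self._sums = defaultdict(float)
        self._counts = defaultdict(int)

    def record(self, metric, value, tags=()):
        key = (metric, tags)
        self._sums[key] += value
        self._counts[key] += 1

    def flush(self):
        """Return one averaged datapoint per key and reset the buffers."""
        out = {k: self._sums[k] / self._counts[k] for k in self._sums}
        self._sums.clear()
        self._counts.clear()
        return out
```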
- Dense, Distributed ID Generation: Designed a distributed, fault-tolerant scheme for generating IDs while minimizing waste of the keyspace (IDs are nearly consecutive under stable operation). It uses ZooKeeper for synchronization across hosts and to allow elastically adding and removing machines from the cluster. Has served 5 billion requests so far.
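One way to keep IDs nearly consecutive with bounded waste is to lease small contiguous blocks from a coordinator. A sketch with an in-memory stand-in for the ZooKeeper role (class names and block size are hypothetical):

```python
class BlockCoordinator:
    """In-memory stand-in for the coordinator: hands out small contiguous
    ID blocks atomically. Waste after a crash is bounded by one partial
    block per generator, keeping the keyspace densely used."""

    def __init__(self, block_size=10):
        self.block_size = block_size
        self._next = 0

    def lease(self):
        block = range(self._next, self._next + self.block_size)
        self._next += self.block_size
        return block


class IdGenerator:
    """Serves IDs from its current leased block, leasing a new block only
    when the current one is exhausted."""

    def __init__(self, coordinator):
        self._coord = coordinator
        self._block = iter(())  # start exhausted; lease on first request

    def next_id(self):
        try:
            return next(self._block)
        except StopIteration:
            self._block = iter(self._coord.lease())
            return next(self._block)
```

Small blocks trade coordinator round-trips for density: the smaller the block, the fewer IDs lost when a node disappears mid-block.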
Wolfram Alpha / Senior Engineering Lead (2007–2011)
- Joined a team of ten working on the project that became Wolfram Alpha. My primary role was building software to augment the work of chemists, physicists, and other domain experts in organizing and exposing their field's data and computations.
- Built three of the five natural language parsers in use.
- Designed and led the effort to build a database of poorly structured "real-world facts": for example, how to represent countries' capital cities intelligently when South Africa has three. This let us answer questions like "what is the largest country in Europe".
- Led development of new technologies, working with teams across the company (e.g., quality assurance, legal, and business development).
- Managed eight developers, trained them in best practices, and worked with them to break their projects down into weekly deliverables.
- Dynamic Data Storage Model: I architected and led the development of an internal toolset to model and query highly relational datasets, designed as the primary datastore for the Wolfram Alpha dataset (scalability goals of 1,000 entity types, 100,000 relations, and 10 TB of data, contributed to by 100 developers). Conceptually related to an ORM, but at a much larger scale: the schema was inspected and modified through APIs in a REPL rather than the declarative style common to ORMs. I designed a query language around satisfying relation dependencies, agnostic of the underlying implementation, whether SQL or a graph database. The project was integrated as a core technology of Wolfram Alpha and forms the foundation for other, currently unannounced products.
- built to store messy, real-world data: we had to support relational queries on strongly typed data while still being able to store and output the occasional arbitrarily typed value (e.g., South Africa has three capital cities)
- used almost exclusively by content experts with no database experience, so we had to abstract away native database types and automatically choose indexes through heuristics; the API tried to lead users naturally toward thinking about problems like data normalization
- a significant graphical component was developed to present schema organization and hierarchy visually
- built to support querying across traditional RDBMSs and non-SQL databases, especially graph databases; the implementation heavily abstracted concepts like primary keys and one-to-many relationships
- most of the code, and all of the logic, was in Java
- the API was exposed to developers through Mathematica
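The flavor of relation-dependency querying over multi-valued data can be sketched in Python (the real API was exposed through Mathematica; `query` and the toy dataset are illustrative only):

```python
# Relations are stored as sets, so multi-valued facts (South Africa's three
# capital cities) fall out naturally rather than breaking the schema.
DATA = {
    "South Africa": {"capital": {"Pretoria", "Cape Town", "Bloemfontein"}},
    "France": {"capital": {"Paris"}},
    "Paris": {"type": {"city"}},
}

def query(entity, *relations):
    """Follow a chain of relations from an entity, accumulating every value
    at each hop; callers never see whether storage is SQL or a graph."""
    frontier = {entity}
    for rel in relations:
        frontier = {v for e in frontier
                      for v in DATA.get(e, {}).get(rel, set())}
    return frontier
```

A single-hop query returns all three South African capitals as a set; chained hops (`"capital"`, then `"type"`) traverse the relation graph without the caller naming joins or keys.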
- Data Versioning System: The data storage model (above) was designed for tens of developers concurrently modifying interconnected datasets, so it was apparent we would have to deal robustly with conflicting changes. I designed the data-storage API as a thin layer over a foundation of operational transforms for both schema and data modifications. This gave us a strong, formalized core, around which we built a distributed version control system of transcripts of transform operations, along with tools to resolve simple conflicts. We developed a testing framework to enforce each operation's precondition validation and its execution, both in unit tests and in integration tests through to the database.
- A separate team implemented the data archiving and network-transport components, as well as a web viewer and graphical interface for analyzing diffs between commits.
- We built optimizations allowing a developer to perform a "narrow" checkout of only the portion of the dataset they intended to modify, while still letting them run relational queries over their local and remote data.
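The operational-transform core can be sketched for the simplest case, two concurrent writes to the same field (the value-based tiebreak here is invented for illustration; the real system recorded transcripts of operations and surfaced non-trivial conflicts to tooling):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SetField:
    """One transform operation: set `field` of `entity` to `value`."""
    entity: str
    field: str
    value: object

def transform(incoming, applied):
    """Rewrite `incoming` so it applies correctly after `applied` has taken
    effect. When both write the same (entity, field), a deterministic
    tiebreak (lexicographic on the value; illustrative only) picks the same
    winner on every replica, so all replicas converge; the loser becomes a
    no-op (None)."""
    if (incoming.entity, incoming.field) == (applied.entity, applied.field):
        return incoming if str(incoming.value) > str(applied.value) else None
    return incoming

def apply_op(store, op):
    """Apply an op (or a no-op, None) to an entity -> {field: value} store."""
    if op is not None:
        store.setdefault(op.entity, {})[op.field] = op.value
    return store
```

Two sites that apply the same pair of conflicting ops in opposite orders end up with identical stores, which is the convergence property the formalized core guaranteed.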
- Packrat Parser: I developed a recursive-descent, memoizing parser (i.e., a "packrat" parser), built to improve performance on a highly ambiguous subsection of the Wolfram Alpha grammar. Left recursion was supported robustly and correctly. Performance was heavily optimized both algorithmically, particularly on sequences of highly ambiguous tokens, and at a low level by eliminating hotspots. I invented a few optimizations for parsing short inputs against a truly massive grammar. The parser's asymptotic improvements allowed Wolfram Alpha to finally handle cross-sectional queries. It was a core component of Wolfram Alpha for two years until it was phased out as part of a general rearchitecting of the parser.
- developed completely in Mathematica
- I also developed the grammar for inputs like “largest countries in europe with gdp below 35 billion”
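A packrat parser is recursive descent plus memoization of each (rule, position) pair, which keeps backtracking linear-time in the input. A toy Python sketch over a tiny arithmetic grammar (the actual parser was in Mathematica, handled left recursion, and targeted a vastly larger grammar; none of that is shown here):

```python
from functools import lru_cache

def make_parser(text):
    """Build a memoizing recursive-descent parser over `text` for:
        expr <- term '+' expr / term
        term <- number '*' term / number
    Each rule returns (value, end_position) on success, None on failure;
    lru_cache gives the packrat property: every (rule, pos) runs once."""

    @lru_cache(maxsize=None)
    def number(pos):
        end = pos
        while end < len(text) and text[end].isdigit():
            end += 1
        return (int(text[pos:end]), end) if end > pos else None

    @lru_cache(maxsize=None)
    def term(pos):
        left = number(pos)
        if left and left[1] < len(text) and text[left[1]] == "*":
            rest = term(left[1] + 1)
            if rest:
                return (left[0] * rest[0], rest[1])
        return left  # fall back to the bare number alternative

    @lru_cache(maxsize=None)
    def expr(pos):
        left = term(pos)
        if left and left[1] < len(text) and text[left[1]] == "+":
            rest = expr(left[1] + 1)
            if rest:
                return (left[0] + rest[0], rest[1])
        return left

    return expr
```

For example, `make_parser("2+3*4")(0)` returns `(14, 5)`: the evaluated result and the end position, which equals the input length when the whole string parses.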
- Programmatic Data Visualization: I developed a framework and a set of heuristics to choose data visualizations for arbitrarily structured data. It supported a variety of core visualizations: time-series scatter plots, potentially in multiple dimensions; pie charts for data grouped into a small set of categories; log or linear histograms; bubble charts; and some more esoteric 3D visualizations. The emphasis was on the framework itself, which other developers used to add their own visualizations. It allowed easy specification of heuristics based on data size, relative category size, log/linear growth, number of datasets under comparison, etc.
- developed entirely with functional code in Mathematica
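The heuristic-selection idea reduces to a rule table mapping structural features of a dataset to a chart type. A Python sketch (the feature names and thresholds are invented for illustration; the real framework let developers register visualizations with their own heuristics, in Mathematica):

```python
def choose_visualization(n_points, n_categories, spans_orders_of_magnitude):
    """Pick a chart type from structural features of the data:
    - categorical data with few categories reads best as parts-of-a-whole;
    - wide value ranges call for a log scale;
    - large point counts overwhelm scatter plots, so bin instead."""
    if n_categories is not None:
        return "pie chart" if n_categories <= 6 else "bar chart"
    if spans_orders_of_magnitude:
        return "log histogram"
    return "histogram" if n_points > 1000 else "scatter plot"
```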
Internships
- TechGuard Security (2006): Developed neural networks for use in firewalls for a contractor to the DIA. We ultimately used Bayesian methods, since they were more accurate at the time.
- NASA Goddard (2005): Updated 2D engineering designs of the TRMM satellite into 3D to allow computer modeling and simulation of behavior.
Patents
- Method and System for Analyzing Data Using a Query Answering System (Wolfram Alpha)
- Dynamic example generation for queries (Wolfram Alpha)
Education
University of Maryland, Baltimore County
unfinished B.S., Computer Science (2002–2006). Started academic leave senior year to work for Wolfram Research on what became Wolfram Alpha.
Logistics
- U.S. Citizen
- English (native)
- Polish (passable conversationally; slow reader)