WePay uses various machine-learning models to detect fraudulent payments and manage risk for payers, merchants and their platforms. In a previous blog post, our Data Science team described how we use a Random Forest algorithm to achieve an optimal combination of model and system performance in building an automated machine learning pipeline that refreshes daily. We are able to use the refreshed models to fight fraudsters who commit collusion fraud, perform credit card testing using stolen cards, or take over accounts.

As our data grows, we need to retrain the models faster while consuming fewer resources. We also want to refresh the models more frequently, so as to make use of newly detected fraud patterns to fight more complex attacks.

WePay started off by using a single server node to handle the entire workflow, as described in Figure 1:

![Figure 1]()

1. Signals pull from BigQuery - Run a few long BigQuery queries to pull multi-day transactional information from various tables, so that we have all the existing data stored in a flat file local to the single server node.
2. Rollup and Merge - Transpose the key-value pairs of transaction signal data into a multi-column, sparsely populated data table. Each row of the table represents an event/payment and has all the features needed for further processing of the event.
3. Tagging Fraud and down-sampling - Apply WePay-specific rules for tagging certain payments as fraudulent, and target a mixture of fraud and non-fraud payments as the source of training data.
4. Variable creation - The first step in preparing feature data as input for model training.
5. Imputation and aggregation - Interpolate, aggregate and apply custom logic to existing variable data to create direct input data as features.
6. Random Forest training and model validation - The core step of model retraining, which takes the refreshed data, retrains the model, and validates the model performance before persisting the model and the output files to external storage.

Steps 2 and 3 handle pure data pre-processing tasks, but take more than 50% of the total processing time. They also rely on the server's local disk, which does not scale well.
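To make the two pre-processing steps concrete (Rollup and Merge, then Tagging Fraud and down-sampling), here is a minimal single-node sketch in plain Python. It is not WePay's code: the signal names, the `rollup` and `tag_and_downsample` helpers, the amount-based fraud rule, and the choice to keep every fraud row while sampling non-fraud rows are all assumptions made for illustration.

```python
import random
from collections import defaultdict

# Hypothetical key-value signal rows: (payment_id, signal_name, value).
signals = [
    ("p1", "amount", 120.0),
    ("p1", "card_country", "US"),
    ("p2", "amount", 15.5),
    ("p3", "amount", 980.0),
    ("p3", "account_age_days", 2),
]


def rollup(rows):
    """Transpose key-value rows into one wide record per payment."""
    table = defaultdict(dict)
    for payment_id, name, value in rows:
        table[payment_id][name] = value
    columns = sorted({name for _, name, _ in rows})
    # Signals a payment never emitted stay None, so the table is sparse.
    return [
        {"payment_id": pid, **{col: feats.get(col) for col in columns}}
        for pid, feats in sorted(table.items())
    ]


def tag_and_downsample(records, is_fraud, non_fraud_ratio, seed=0):
    """Tag each record, keep all fraud rows, and sample non-fraud rows.

    Keeping every fraud row is an assumption of this sketch, not a
    documented WePay rule.
    """
    rng = random.Random(seed)  # seeded so the sample is reproducible
    tagged = [{**rec, "is_fraud": bool(is_fraud(rec))} for rec in records]
    fraud = [rec for rec in tagged if rec["is_fraud"]]
    non_fraud = [rec for rec in tagged if not rec["is_fraud"]]
    keep = min(len(non_fraud), int(len(fraud) * non_fraud_ratio))
    return fraud + rng.sample(non_fraud, keep)


wide = rollup(signals)
# Placeholder rule only; the real tagging rules are WePay-specific.
training = tag_and_downsample(
    wide, lambda rec: (rec["amount"] or 0) > 500, non_fraud_ratio=1
)
```

Both helpers hold the full dataset in the memory of one process, which mirrors the single-node constraint described above: the work grows with the number of signal rows and cannot be spread across machines without restructuring.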