Big Data Revolution

As we welcome the new year 2018, I clearly see the emerging trends converging on Big Data. I call it a revolution because it has changed the whole perspective and paradigm of how we look at and treat data. Several forces are driving the growing demand for Big Data: high-speed internet is reaching the masses through FTTH (Fiber To The Home) technology and falling broadband prices; AI and Augmented Intelligence with Machine Learning are gaining wider acceptance across the enterprise IT world; apprehensions about data security are being dispelled; and the low cost of high-speed processing and storage is producing high volumes of data.


Having started with the 4Vs (Volume, Velocity, Variety and Veracity), Big Data has come a long way and is still evolving fast, adding more and more tools and techniques to its ecosystem. The misconceptions and myths about its limitations and usage are fading fast, and the world is gearing up to embrace it wholeheartedly.

Let’s talk about the myths about limitations of Big Data:

Myth 1: Hadoop is only for batch processing. The fact is that the Hadoop ecosystem does provide real-time analytics and works well with real-time big data tools like Spark.

Myth 2: Data security is not good enough. Big Data has come a long way, and the ecosystem has built enterprise-grade security into Hadoop platforms, including Cloudera, Hortonworks and MapR. There are now excellent data governance capabilities as well.

Myth 3: Big Data is for unstructured data only. Big Data is widely used for unstructured data, where it provides alternative strategies for storing huge volumes and uses parallel processing to reduce storage and access time, but it is equally good for structured data.

In fact, many of these myths apply only to MapReduce, but as Big Data evolves, plenty of alternative technologies and tools have become part of the ecosystem. While MapReduce, a Java-based framework, is powerful enough to chomp through Big Data and flexible enough to allow good progress in doing so, the coding is anything but easy. Below are a few effective and popular alternatives to MapReduce:

  1. Pig Latin, the language of Apache Pig, was originally developed at Yahoo to maximize productivity and accommodate complex procedural data flows. Pig eventually became an Apache project, and Pig Latin has characteristics that resemble both scripting languages (like Python and Perl) and SQL. In fact, many of the operations look like SQL: load, sort, aggregate, group, join, etc. It just isn’t as limited as SQL. Pig allows for input from multiple databases and output into a single data set.
  2. Hive also looks a lot like SQL at first glance. It accepts SQL-like statements and uses those statements to output Java MapReduce code. It requires little in the way of actual programming, so it’s a useful tool for teams that don’t have high-level Java skills or have fewer programmers with which to produce code. Initially developed by the folks at Facebook, Hive is now an Apache project.
  3. Spark has been widely hailed as the end of MapReduce. Born in the AMPLab at the University of California, Berkeley, Spark replaces the execution framework of MapReduce entirely, unlike Pig and Hive, which are merely programming interfaces to that framework. Spark makes more effective use of memory and resources and is claimed to run up to 100 times faster than MapReduce for in-memory workloads. Spark also provides many features, including stream processing, data transfer, fast fault recovery, optimized scheduling, and a lot more.

These alternatives have effectively reduced an organization’s dependency on Java.
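To see why hand-written MapReduce feels heavy, here is a minimal sketch in plain Python (not Hadoop code) of the map, shuffle and reduce steps behind a simple word count. The function names are my own illustrative assumptions. In Hive or Pig, the same result is essentially a single group-and-count statement.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word, as a mapper would."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the counts for each word, as a reducer would."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(lines):
    """Chain the three phases together on an in-memory list of lines."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    sample = ["big data is big", "data is everywhere"]
    print(word_count(sample))
```

Even in this toy form, three separate steps are needed for one aggregation, which is the boilerplate the higher-level tools hide.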

If you are a late entrant to the Big Data space, one benefit is that you won’t have to wade through all of the platforms and products that came and went during the early years.

Initially there was just the Hadoop Distributed File System (HDFS); then came MapReduce, YARN and a plethora of other products, some of which blossomed and became mature parts of the Hadoop ecosystem. Others petered out or are still puttering around wondering what they’re going to be when they grow up.


Today, there are quite a number of Big Data products and platforms to pick from to assemble an infrastructure that meets your needs.

Below are a few significant players that have carved out a niche in the Big Data space:

Tez:  A generalized data-flow programming framework, built on Hadoop YARN, Tez is being adopted by Hive™, Pig™ and other frameworks in the Hadoop ecosystem, and also by other commercial software (e.g. ETL tools), to replace Hadoop™ MapReduce as the underlying execution engine.

Hive: A data warehouse infrastructure that provides data summarization and SQL-style querying of data in HDFS. It is meant for structured data, is best suited to BI applications, and can be used independently as well.

Pig Latin: A high-level data flow language for querying HDFS, with its own shell. It can be used independently as well and is best suited to ETL.

HBase: A scalable, distributed NoSQL database that supports structured data storage for large tables. It can be used independently as well. It is fast but can have downtime. Best used as a NoSQL database.

Cassandra: A scalable, multi-master NoSQL database with no single point of failure. It can be used independently as well. It is highly scalable and highly available, at the cost of read performance that can lag behind its write performance. Best used for large or sensitive NoSQL databases.

Mahout: A scalable machine learning and data mining library; an ML programming framework.

Oozie: A workflow scheduler system used for managing Apache Hadoop jobs.

Sqoop: An import/export utility for moving data between Hadoop and various databases.

Flume: A robust, mature and proven tool for streaming data, used to import and export real-time data.

Hue: Hadoop User Experience, a web interface that supports Hadoop and its ecosystem.

Spark: A fast and general compute engine for Hadoop data. Spark provides a simple and expressive programming model that supports a wide range of applications, including ETL, machine learning, stream processing, and graph computation. The framework can be used independently as well.

Scala: The language in which Spark is written. It is a general-purpose language and can be used independently as well. Through Spark SQL, Spark can run unmodified Hive queries on an existing Hadoop deployment, which makes it an alternative to HQL.

Spark Streaming: The Spark module for streaming analytics. It enables analytical and interactive apps on live streaming data and is sometimes used as a faster alternative to Flume.

MLlib: The Spark module for machine learning, a set of ML libraries built on top of Spark. It is an ML programming framework that serves as a replacement for Mahout in the Hadoop world.

GraphX: Spark’s graph computation engine, which combines data-parallel and graph-parallel concepts.

SparkR: An R package that lets R users leverage the power of Spark and access Spark data from the R shell.

PySpark: The Python API and shell for accessing Spark data from Python.

Julia: An analytical and mathematical language used for data modeling and visualization. It can be used independently as well, for both structured and unstructured data.

R: An analytical and mathematical language used for data modeling and visualization. It can be used independently as well, mainly for structured data.

Python: A general-purpose language that can be used for data modeling. It can be used independently as well, for both structured and unstructured data.

While Hive, Pig Latin, HBase, Cassandra, Mahout, Oozie, Sqoop, Flume and Hue are Hadoop components, many of the other tools, like R, Python and Julia, originated independently and are now part of the ecosystem.

For more information contact me at atul@nexcen.in.


Automation To Succeed

Why HR/Payroll/Accounting must be automated

Process (HR/Payroll/Accounting) Automation

Behind every successful business is an effective Human Resource department. HR ensures that employees’ goals are aligned with the organization’s goals, so that the company gets the most from its most important asset: people!

HR is responsible for hiring the right candidate, managing employee details, tracking time, handling employee benefits, measuring employee performance, planning succession and much, much more. For a small or medium-sized business, deploying HR software for these activities might seem expensive. In reality, it’s not! Rather, it’s a great accelerator. Automating your HR can help you save time and money in a lot of ways, and it helps you focus your energies on your core business activities.

An automated payroll system enables the employer to process its payroll through a computerized system making payroll processing simpler, defect free, and faster.

Accounting involves not just preparing statutory reports, ledgers, the trial balance, the P&L statement or the balance sheet, but also filing all the requisite forms and returns with the various authorities on time.

Human Resource, Payroll and Accounting software are automated, integrated products designed specifically to manage the complex HR and accounting activities of an organization. Together they take care of the company’s entire Human Capital Management. They help automate all HRMS and accounting functions, such as staffing, tests, induction, training, appraisals, attendance and leave management, payroll, accounting, tax filing, and communication with candidates and employees.

HRIS:

Get all the employee information at your fingertips. You can query details such as employee name, address, telephone numbers, email address, branch, department, designation, qualifications, experience, birth date, marital status, blood group, allergies, serious illnesses, and other emergency information. So when you want a particular job done, you know exactly whom to get in touch with, in seconds. The system can generate HR & Payroll MIS reports; to keep things simple, it comes with approximately 150 reports that can be generated easily. It also has a powerful search engine and report designers in all the modules, so you can create any number of additional reports.

Attendance:

Managing employee attendance is easy, since any report can be exported to an MDB file or a text file. These files can then be imported into spreadsheets for graphical reports and analysis. Overtime and benefits can be easily calculated, and late arrivals or early departures can be factored into the salary.
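As a rough illustration of the kind of calculation such a module performs, here is a minimal Python sketch that derives worked hours, late minutes and overtime from one day's clock-in and clock-out times. The shift start, standard hours and grace period are assumptions of mine, not settings of any particular product.

```python
from datetime import datetime

SHIFT_START = "09:00"   # assumed shift start time
SHIFT_HOURS = 8         # assumed standard working hours per day
GRACE_MINUTES = 10      # assumed grace period before lateness counts


def summarize_day(clock_in, clock_out):
    """Summarize one day's attendance from "HH:MM" clock-in/out strings."""
    fmt = "%H:%M"
    start = datetime.strptime(SHIFT_START, fmt)
    t_in = datetime.strptime(clock_in, fmt)
    t_out = datetime.strptime(clock_out, fmt)

    worked_hours = (t_out - t_in).total_seconds() / 3600
    # Minutes late beyond the grace period; never negative.
    minutes_after_start = int((t_in - start).total_seconds() // 60)
    late_minutes = max(0, minutes_after_start - GRACE_MINUTES)
    # Hours worked beyond the standard shift length; never negative.
    overtime_hours = max(0.0, worked_hours - SHIFT_HOURS)

    return {
        "worked_hours": round(worked_hours, 2),
        "late_minutes": late_minutes,
        "overtime_hours": round(overtime_hours, 2),
    }


if __name__ == "__main__":
    print(summarize_day("09:25", "18:55"))
```

A real system would also handle night shifts, breaks and holidays, which this sketch deliberately leaves out.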

Leave:

Employees can see their leave status and apply online. Leave authorization can be done online. Attendance entry and shift details are totally automated. Manual calculation of leave is no longer required. All you have to do is enter the from date and to date for the leave taken, and whether it is CL (casual leave), SL (sick leave), etc. The software automatically calculates the opening balance of the different types of leave due at the beginning of the period.
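The from-date and to-date entry can be sketched in a few lines of Python. This is only an illustration under my own assumptions: weekends are excluded, public holidays are ignored, and balances are kept in a simple dictionary keyed by leave type.

```python
from datetime import date, timedelta


def leave_days(from_date, to_date):
    """Count working days (Monday to Friday) in the inclusive date range."""
    if to_date < from_date:
        raise ValueError("to_date must not be before from_date")
    days = 0
    current = from_date
    while current <= to_date:
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            days += 1
        current += timedelta(days=1)
    return days


def apply_leave(balances, leave_type, from_date, to_date):
    """Return new balances after deducting the leave, or raise if short."""
    days = leave_days(from_date, to_date)
    available = balances.get(leave_type, 0)
    if available < days:
        raise ValueError(f"insufficient {leave_type} balance")
    updated = dict(balances)  # leave the caller's dictionary untouched
    updated[leave_type] = available - days
    return updated


if __name__ == "__main__":
    opening = {"CL": 7, "SL": 5}
    # Thursday 4 January to Tuesday 9 January 2018 spans one weekend.
    print(apply_leave(opening, "CL", date(2018, 1, 4), date(2018, 1, 9)))
```

The closing balance returned here would become the opening balance of the next period.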

Payroll:

The payroll process runs in easy steps, yet it is in-depth and capable of handling all kinds of complicated conditions. TDS calculation is accurate and compliant with statutory requirements. The system calculates increments and arrears, and all payroll and statutory reports, such as the salary slip, salary register, IT estimation report, Form 16, e-Form 24Q, and PF / PT / ESIC related reports, can also be generated.
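To give a feel for the gross-to-net arithmetic behind a salary slip, here is a minimal Python sketch. The rates and the flat professional tax below are placeholders I have assumed for illustration; actual PF, PT and TDS amounts follow statutory rules and slabs that a real payroll system encodes.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical figures for illustration only, not statutory values.
PF_RATE = Decimal("0.12")          # assumed PF contribution on basic pay
TDS_RATE = Decimal("0.10")         # assumed flat TDS rate on gross pay
PROFESSIONAL_TAX = Decimal("200")  # assumed flat monthly PT

TWO_PLACES = Decimal("0.01")


def net_pay(basic, allowances):
    """Compute a simplified monthly salary slip from basic pay and allowances."""
    gross = basic + allowances
    pf = (basic * PF_RATE).quantize(TWO_PLACES, rounding=ROUND_HALF_UP)
    tds = (gross * TDS_RATE).quantize(TWO_PLACES, rounding=ROUND_HALF_UP)
    deductions = pf + tds + PROFESSIONAL_TAX
    return {
        "gross": gross,
        "pf": pf,
        "tds": tds,
        "pt": PROFESSIONAL_TAX,
        "net": gross - deductions,
    }


if __name__ == "__main__":
    slip = net_pay(Decimal("30000"), Decimal("10000"))
    for head, amount in slip.items():
        print(f"{head:>5}: {amount}")
```

Decimal is used instead of float so that money values round predictably.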

Appraisal:

Customize appraisals for each position whenever required. The solution cuts down the time taken for appraisals and saves on printing and courier costs. Weightages can also be assigned to each KRA, and the recommendations can help identify the hotshot professionals.
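The weightage idea boils down to a weighted average. Here is a minimal Python sketch, assuming ratings on a 1 to 5 scale and weightages that add up to 100; the KRA names are made up for the example.

```python
def appraisal_score(ratings, weightages):
    """Weighted average of KRA ratings; weightages must add up to 100."""
    if set(ratings) != set(weightages):
        raise ValueError("ratings and weightages must cover the same KRAs")
    total_weight = sum(weightages.values())
    if total_weight != 100:
        raise ValueError("weightages must add up to 100")
    weighted_sum = sum(ratings[kra] * weightages[kra] for kra in ratings)
    return weighted_sum / total_weight


if __name__ == "__main__":
    ratings = {"sales": 4, "quality": 5, "teamwork": 3}        # hypothetical KRAs
    weightages = {"sales": 50, "quality": 30, "teamwork": 20}  # percent per KRA
    print(appraisal_score(ratings, weightages))
```

Changing the weightages per position is what lets the same rating form serve different roles.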

While manual systems are slow, tardy and error-prone (all the data has to be tracked by hand), automation not only makes the processes much simpler, faster and better managed but also effectively eliminates the possibility of defects. However, it’s important to maintain the accuracy of the input. For example, if a terminated employee is due severance pay but the payroll representative neglects to make the entry, the system will not pay it. Typically, the system is reliable so long as the entries are correct.

You can derive the following major benefits from automation:


  1. HR Automation: Automating HR allows you to free up your human resources workers. This doesn’t mean that you won’t need them anymore – automating human resources just means that they will actually be able to do their job much more efficiently. An HR manager who wastes time looking through time-log spreadsheets, files or emails might end up doing nothing else but that! An average manager spends more than 3 hours a week just sorting out employee timesheets. Automation tools help in tracking and calculating employee time, running it through approvals and sending it to payroll or billing. This can also be done in real-time and by configuring multi-level approvals. Time is not just saved, but is made productive. For example, instead of having to spend hours or even full days keying in payroll information, a few simple steps will be all that they have to do in order to complete the payroll process thanks to automating human resources. Instead of mindlessly typing numbers into the system they can focus on other efforts that benefit your company, like designing new HR strategies that can lead your company into the future.
  2. Productivity Boost: Once you begin automating human resources you’ll see an increase in both productivity and profit. Not only are your HR employees free to get more done thanks to automating human resources, they’ll also be more motivated to work. When people feel like they’re really accomplishing goals, they have more morale. And since they’ll get more done in the same amount of time, you’ll get more for your money. Plus, automating human resources is much more precise.
  3. Precision: Precision makes a huge difference. If you take the steps for automating human resources you’ll notice big differences in payroll, attendance, and benefit info. And along with precision comes simplicity. Automating human resources like employee benefits, for example, makes the process of managing and checking health insurance or vacation days as simple as making a few clicks with a mouse. No pestering the HR department, no need to pull files or check forms. All of the information is easily accessible to anybody who needs it thanks to automating human resources.
  4. Make your first impression count: Recent research indicates that an employee decides to stay or leave an organization in the first 90 days. In Fortune 500 companies alone it has been estimated that close to 50% of the outside hires quit in less than 2 years.
  5. Minimize Attrition: Onboarding ensures minimal attrition. However, it includes many forms, induction programs, salary contracts, IT system allocations and new hire training. With automation, you can structure workflows to trigger multiple actions. For instance, automatically send out requests for IT equipment, ID cards and provide employees access to directly add or edit their personal data. This streamlines new hire onboarding and reduces the time taken to induct a new employee.
  6. Security: Security is another benefit of automating processes, and probably the most important. When automating human resources you can choose to back up your data to online servers, ensuring that in a fire or computer failure you won’t lose years of important information. And you’ll also get security for your company, since human resources errors can lead to tax issues, legal troubles, and unnecessary expenses. The rate of error drops tremendously when automating human resources, and you’ll be able to help your company avoid any issues that may arise in the future, in some cases without even realizing it. Obviously automating human resources is one investment that can help lead your company into the future, and one that you shouldn’t ignore. For the best systems for automating human resources you can trust Unicorn HRO. Automating human resources will never be simpler or more effective.
  7. Employee Empowerment: Every employee has different requirements. Some travel and have to apply for travel requests and submit expense reports. Others may contact the HR constantly to update their personal and professional information. As an organization grows there will be more and more employees that require HR assistance. Managing hundreds of changing employee data and other requests manually can be difficult. Nowadays, HR systems let employees manage all these activities themselves. For instance, an employee applies for an internal training directly on the HR system. HR can create a workflow to automatically add the training details to the employee record when it is completed. This way the employee is empowered and the HR burden is reduced tremendously.
  8. Interact with any third-party system: Automation is key to third-party integration. With the help of APIs (Application Programming Interfaces) and webhooks, information can be easily exchanged with any third-party application. For example, when a travel expense record is approved, an instant notification is sent to the accounting software to process the reimbursement. It is also useful in cases where organizations use more than one system to manage their HR activities.
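The multi-level approval and webhook ideas from the list above can be sketched together in Python. Everything named here (the approval chain, the event name, the payload fields) is a hypothetical example of mine rather than any vendor's API, and the actual HTTP call is left to a `send` function passed in by the caller.

```python
import json

APPROVAL_CHAIN = ["manager", "finance"]  # assumed two-level approval chain


def record_approval(request, role):
    """Return a copy of the request with this role's approval recorded in order."""
    approvals = request["approvals"]
    if len(approvals) >= len(APPROVAL_CHAIN):
        raise ValueError("request is already fully approved")
    expected = APPROVAL_CHAIN[len(approvals)]
    if role != expected:
        raise ValueError(f"expected approval from {expected}, got {role}")
    return {**request, "approvals": approvals + [role]}


def is_fully_approved(request):
    """True once every level in the chain has approved."""
    return request["approvals"] == APPROVAL_CHAIN


def build_webhook_payload(request):
    """Build the JSON body that would be posted to the accounting system."""
    return json.dumps(
        {
            "event": "expense.approved",  # hypothetical event name
            "expense_id": request["id"],
            "employee": request["employee"],
            "amount": request["amount"],
        },
        sort_keys=True,
    )


def process(request, role, send):
    """Record an approval and notify accounting once the chain is complete."""
    updated = record_approval(request, role)
    if is_fully_approved(updated):
        send(build_webhook_payload(updated))
    return updated


if __name__ == "__main__":
    expense = {"id": "EXP-1", "employee": "E042", "amount": 1250, "approvals": []}
    expense = process(expense, "manager", print)  # nothing sent yet
    expense = process(expense, "finance", print)  # payload is printed
```

In a real deployment, `send` would post the payload to the accounting software's webhook URL; passing it in as a function keeps the workflow logic easy to test.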

Other than these major direct benefits, there are plenty of intangible process benefits, to name a few:

  1. HR, Payroll & Accounting software is completely web based, which gives tremendous cost benefits. There is no headache with client PCs and no reloading of executables when the package is updated.
  2. Employee / Manager self-service makes approvals quite fast, without the need to go through multiple pages. Some organizations still manage their leave requests through traditional means: emails, word of mouth and sometimes even sticky notes. Now, what if the managerial hierarchy demands approval at multiple levels? Or HR gets left out of the application and approval emails? The entire process becomes time consuming, with loads of untraceable requests and approvals. This is why 50% of the global HR workforce today prefers automated time-off management that updates employee records instantly. This can be managed through mobile HR apps as well. Now, that’s automation for you!
  3. All statutory reports and challans for PF, ESIC and income tax are available and fully integrated with e-TDS.
  4. PF Trust, superannuation and gratuity management.
  5. Claim management: very complicated issues of exempted and carry-forwardable claim heads, based on monthly and yearly calculations, can be handled easily.
  6. Salary processing in very easy steps, as it is highly automated.
  7. Highly automated MIS.

Service providers like PayBooks, PenSoft, Z-Pay, Ultipro and Sage Peachtree calculate gross-to-net earnings for the SME segment based on the data the payroll representative inputs, while AON Hewitt, KPMG, Accenture, ADP and Mercer give you total outsourcing options for complete HR and payroll operations. Preeminent Business Solutions (PBS) offers complete HR, Payroll and Accounting services that go beyond the usual HR & Payroll support and handle all your accounting and tax filings as well.

Automating your HR and Accounting departments helps reduce the strain on your HR and accounting personnel, increase productivity and improve employee participation. Companies all over the globe are increasingly adopting HR automation tools to keep their employees ticking and bring them closer to the organizational goals. It’s time you did too.