Experienced
As a seasoned analytical technology professional with business acumen and proven success in accurately interpreting complex data to deliver actionable insights for the decision-making process, I decided to develop this web site to demonstrate some of these skills. Also, developing statistical and mathematical models for predictive analysis of operational data is simply something I love to do.
Data Integrity
Data integrity, or veracity, has become my mantra. Analyzing bad data is a waste of time and can do more harm than good, especially if the analyst is determined to find something of value. Over the last 15 years, I have had the privilege of working with engineers from MITRE, DARPA, UT Austin’s Applied Research Laboratory, and numerous defense contractors. The takeaway from this work can be summed up in two words: data integrity!
Unlock the Secrets
For me, there are few things more rewarding than gathering data, managing it properly, and extracting actionable information from it. The power of prediction is an amazing thing, and with today’s technologies we can predict certain types of events with a relatively high degree of accuracy. This is precisely what keeps me focused on data.
Purpose
Obviously, this site is self-promoting, with one objective being to attract consulting engagements and employment opportunities. However, both can be pursued without an elaborate web site. The primary purpose of this site is to share information and, hopefully, to learn and exchange ideas within the community of IT professionals.
Technologies at a Glance
Choosing the right technologies for deploying your big data and analytical platforms is critical.
Data Management
The data that flows through an organization is known as the data pipeline. Applying data management best practices requires that this data be identified and understood. Identifying and documenting business processes and their associated data flows provides valuable insights, enables rapid change, and minimizes disruptions. This does not rule out schema-on-read or schema-on-write operations; rather, it facilitates the use of both. One problem with unsupervised learning, a common data mining approach, is that the results are often difficult to interpret. This problem is compounded if you are not even sure what was in the data being mined. Proper data management is a fundamental first step in managing data as an asset.
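The difference between the two schema approaches can be sketched in a few lines of Python. This is a minimal illustration rather than a production design: the `ORDER_SCHEMA` fields and all function names are hypothetical, and a plain list stands in for a real data store.

```python
import json

# Hypothetical schema for illustration: field name -> expected Python type.
ORDER_SCHEMA = {"order_id": int, "amount": float}


def conforms(record, schema):
    """Return True if record has every schema field with the expected type."""
    return all(
        name in record and isinstance(record[name], expected)
        for name, expected in schema.items()
    )


def write_validated(store, record, schema):
    """Schema on write: reject a nonconforming record before it is stored."""
    if not conforms(record, schema):
        raise ValueError(f"record does not match schema: {record}")
    store.append(record)


def read_with_schema(raw_lines, schema):
    """Schema on read: raw text is stored as-is; the schema is applied at query time."""
    for line in raw_lines:
        record = json.loads(line)
        if conforms(record, schema):
            yield record


# Schema on write: only validated records reach the curated store.
curated = []
write_validated(curated, {"order_id": 1, "amount": 19.99}, ORDER_SCHEMA)

# Schema on read: everything lands in the raw store, good or bad.
raw_lake = [
    '{"order_id": 2, "amount": 5.0}',
    '{"order_id": "three", "amount": 7.5}',  # wrong type, filtered out on read
]
usable = list(read_with_schema(raw_lake, ORDER_SCHEMA))
```

In both cases the schema has to be known and documented somewhere. The only question is whether it is enforced when the data arrives or when it is consumed.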
Ontology
So what is an ontology in the context of data management and predictive analytics? Most people have heard the word in association with Artificial Intelligence (AI) or the Semantic Web (SW). An ontology is a domain-specific vocabulary of concepts used to model the properties and relationships within that domain. It is very similar in nature and terminology to data modeling. Entities are defined, along with their properties and their relationships with other entities. The final product allows the seamless exchange of information between systems whose data is managed using the same ontology. For example, two healthcare facilities using the EHROntology for Electronic Health Records could “speak” to one another.
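As a rough illustration of the idea, the sketch below models a tiny shared vocabulary in Python and checks whether a record uses only terms that the vocabulary defines. The two concepts and their field names are invented for this example and are not taken from any published health-record ontology. Real ontologies are usually expressed in languages such as OWL rather than in application code.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """An entity type in the ontology, with its properties and relationships."""
    name: str
    properties: set = field(default_factory=set)
    relations: dict = field(default_factory=dict)  # relation name -> target concept


# Hypothetical two-concept health-record vocabulary, for illustration only.
ONTOLOGY = {
    "Patient": Concept("Patient", {"patient_id", "birth_date"},
                       {"has_encounter": "Encounter"}),
    "Encounter": Concept("Encounter", {"encounter_id", "date", "diagnosis"}),
}


def unknown_terms(concept_name, record, ontology):
    """Return the keys in record that the shared ontology does not define."""
    concept = ontology[concept_name]
    allowed = concept.properties | set(concept.relations)
    return sorted(set(record) - allowed)


# A record that uses only shared terms can be exchanged without translation.
shared = {"patient_id": "P-001", "birth_date": "1980-04-12"}
# A record that uses a local field name the ontology does not know.
local = {"patient_id": "P-002", "dob": "1975-09-30"}

print(unknown_terms("Patient", shared, ONTOLOGY))  # []
print(unknown_terms("Patient", local, ONTOLOGY))   # ['dob']
```

The second record shows what the shared vocabulary prevents: a receiving system has no way to know that a local field such as `dob` means the same thing as `birth_date`.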
Big Data
At the heart of a Big Data project usually sits a cluster of Apache™ Hadoop® servers that provides a distributed framework capable of processing large data sets. There are numerous packaged distributions of Hadoop from vendors like Hortonworks and Cloudera, as well as pay-as-you-go services from companies like Amazon Web Services and Google Cloud Platform. Each comes with its own component stack and customized solution. The key to a successful implementation is architecting and sizing the platform properly to ensure that the solution will solve the problem.
Data Science
Data Science is an interdisciplinary field requiring in-depth knowledge of various scientific methods that, when applied to processes and systems, will hopefully result in the extraction of actionable information. These methods include statistical and mathematical models applied to large and small data sets alike, in both structured and unstructured formats. Big Data and Data Science are often spoken of as one and the same, but they are not. They intersect for some problems, but one can exist without the other. Performing good scientific analysis requires a strong statistical and mathematical background, computer programming skills in languages such as R and Python, and an understanding of the data within a given domain. The better the data is managed within the domain, the more likely the data scientist is to obtain meaningful results.